### *Google Trends* data for "Kaggle" in Finland, Norway, and Sweden (2015-2019)
This short notebook is inspired by an observation made by [John Mitchell](https://www.kaggle.com/jbomitchell) in the topic ["We need more external data"](https://www.kaggle.com/c/tabular-playground-series-jan-2022/discussion/302694) by [AmbrosM](https://www.kaggle.com/ambrosm) regarding the need for Kaggle-related data. Here we look at the [*Google Trends*](https://trends.google.com/trends/) data for the keyword `kaggle` in Finland, Norway, and Sweden, as well as worldwide, using the unofficial [pytrends API](https://github.com/GeneralMills/pytrends).

Each dataframe is saved as a `.csv` file and is available in the Data section of this notebook.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
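# apply the style first: 'fivethirtyeight' sets its own font size,
# which would otherwise override the value set below
plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 18})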

In [None]:
!pip install -q pytrends
from pytrends.request import TrendReq

# date range
dates = '2015-01-01 2019-12-31'

# keyword
kw_list=['kaggle']

### Finland

In [None]:
pytrends = TrendReq(geo='FI')
pytrends.build_payload(kw_list=kw_list, timeframe=dates)
Finland_df = pytrends.interest_over_time()
Finland_df.plot(y=kw_list, kind='line',figsize=(15,5), lw=2, title="Kaggle in Finland");

### Norway

In [None]:
pytrends = TrendReq(geo='NO')
pytrends.build_payload(kw_list=kw_list, timeframe=dates)
Norway_df = pytrends.interest_over_time()
Norway_df.plot(y=kw_list, kind='line',figsize=(15,5), lw=2, title="Kaggle in Norway");

### Sweden

In [None]:
pytrends = TrendReq(geo='SE')
pytrends.build_payload(kw_list=kw_list, timeframe=dates)
Sweden_df = pytrends.interest_over_time()
Sweden_df.plot(y=kw_list, kind='line',figsize=(15,5), lw=2, title="Kaggle in Sweden");

### Worldwide

In [None]:
pytrends = TrendReq(hl='en-US', tz=360)
pytrends.build_payload(kw_list=kw_list, timeframe=dates)
Worldwide_df = pytrends.interest_over_time()
Worldwide_df.plot(y=kw_list, kind='line',figsize=(15,5), lw=2, title="Kaggle worldwide");

### Observations
First, note that each query is scaled from 0 to 100 independently, so the values are not directly comparable between countries, or between a country and the worldwide series. The per-country data is also rather sparse, so it may not be particularly useful.
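To illustrate why independently scaled series cannot be compared, here is a minimal sketch using made-up numbers. The `scale_to_100` helper and the raw counts are hypothetical, and the rescaling is a deliberate simplification of what Google Trends actually does. Two series that differ in magnitude by a factor of 1000 become identical after each is rescaled to its own maximum.

```python
import pandas as pd

def scale_to_100(raw: pd.Series) -> pd.Series:
    """Rescale a series so that its own maximum becomes 100 (simplified model)."""
    return (100 * raw / raw.max()).round().astype(int)

# hypothetical raw weekly search counts for two regions (not real data)
raw_small = pd.Series([2, 4, 1, 3])
raw_large = pd.Series([2000, 4000, 1000, 3000])

scaled_small = scale_to_100(raw_small)
scaled_large = scale_to_100(raw_large)

# after per-series scaling the difference in magnitude is no longer visible
print(scaled_small.tolist())
print(scaled_large.tolist())
```

Both scaled series contain the same values, so a value of 100 in one country says nothing about how it compares to a value of 100 in another.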

The data is also sampled weekly, whereas the competition data is daily. For convenience we also resample the worldwide data to daily frequency and forward fill it (`Worldwide_trends_data_daily.csv`), in case it proves useful.
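The effect of resampling and forward filling can be seen on a small synthetic example. The dataframe below is made up and merely stands in for a weekly Trends dataframe, and the Sunday timestamps are an assumption. Note that `resample('D')` stops at the last weekly timestamp, so any days after it are not created. If the full date range is needed, one option is to reindex onto an explicit daily range instead.

```python
import pandas as pd

# synthetic weekly series (three consecutive Sundays), not real Trends data
weekly = pd.DataFrame(
    {"kaggle": [10, 20, 30]},
    index=pd.date_range("2019-12-15", periods=3, freq="W-SUN"),
)

# weekly -> daily: each day takes the value of the most recent weekly sample
daily = weekly.resample("D").ffill()
print(len(daily))  # runs from 2019-12-15 to 2019-12-29 inclusive

# optional: extend to the end of the year by reindexing onto a full daily range
full_range = pd.date_range("2019-12-15", "2019-12-31", freq="D")
daily_full = weekly.reindex(full_range, method="ffill")
print(daily_full.tail(3))
```

Here `daily` ends on the last weekly timestamp, whereas `daily_full` carries the final weekly value forward to 31 December.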

### Save the data to `.csv` files

In [None]:
Finland_df.to_csv('Finland_trends_data.csv', index=True)
Norway_df.to_csv('Norway_trends_data.csv', index=True)
Sweden_df.to_csv('Sweden_trends_data.csv', index=True)
Worldwide_df.to_csv('Worldwide_trends_data.csv', index=True)

# forward-fill the missing days
Worldwide_df_daily = Worldwide_df.resample('D').ffill()
Worldwide_df_daily.to_csv('Worldwide_trends_data_daily.csv', index=True)


### Related reading

* [pytrends](https://github.com/GeneralMills/pytrends): an unofficial API for Google Trends

### Related notebooks

* ["The latest Trends in data science"](https://www.kaggle.com/carlmcbrideellis/the-latest-trends-in-data-science)