### Plot of the GDP of Finland, Norway, and Sweden (2015-2019)
This notebook plots the [gross domestic product](https://en.wikipedia.org/wiki/Gross_domestic_product) (in billion US$) of Finland, Norway, and Sweden for the years 2015-2019, taken from the dataset ["GDP 2015-2019: Finland, Norway, and Sweden"](https://www.kaggle.com/carlmcbrideellis/gdp-20152019-finland-norway-and-sweden), along with the yearly total of `num_sold` from the [Tabular Playground Series - Jan 2022](https://www.kaggle.com/c/tabular-playground-series-jan-2022) competition.

We can see that, for all three countries, there is a noticeable drop in the GDP in the year 2019 with respect to 2018.

In [None]:
import numpy  as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 16})
plt.rcParams["figure.figsize"] = (10, 5)

In [None]:
train    = pd.read_csv("../input/tabular-playground-series-jan-2022/train.csv", parse_dates=['date'],index_col="row_id")
GDP_data = pd.read_csv("../input/gdp-20152019-finland-norway-and-sweden/GDP_data_2015_to_2019_Finland_Norway_Sweden.csv",index_col="year")

In [None]:
GDP_data

### GDP Finland 

In [None]:
GDP_data["GDP_Finland"].plot(kind='bar').legend(loc='center left',bbox_to_anchor=(1.0, 0.5));

### GDP Norway

In [None]:
GDP_data["GDP_Norway"].plot(kind='bar').legend(loc='center left',bbox_to_anchor=(1.0, 0.5));

### GDP Sweden

In [None]:
GDP_data["GDP_Sweden"].plot(kind='bar').legend(loc='center left',bbox_to_anchor=(1.0, 0.5));

### Kaggle merchandise sales (2015-2018)
Here is a plot of the yearly total of `num_sold` for the [Tabular Playground Series - Jan 2022](https://www.kaggle.com/c/tabular-playground-series-jan-2022) competition data:

In [None]:
train['year'] = train['date'].dt.year
# pass the aggregation by name; the builtin `sum` triggers a FutureWarning in recent pandas
pivot_table = pd.pivot_table(train, index=['year'], values=['num_sold'], aggfunc='sum')
pivot_table.plot(kind='bar').legend(loc='center left',bbox_to_anchor=(1.0, 0.5));

### Pearson correlation coefficients (2015-2018)
We shall now calculate the [Pearson correlation coefficients and the p-values](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html) between the total yearly `num_sold` for each country, and the GDP for each country for the years 2015 to 2018 inclusive:

In [None]:
yearly_total_sales_data = train.groupby(['country','year'])['num_sold'].sum().reset_index()
# `Styler.hide_index()` was removed in pandas 2.0; `hide(axis="index")` needs pandas >= 1.4
yearly_total_sales_data.style.hide(axis="index")

In [None]:
from scipy import stats
# The sales rows are sorted by country and then by year, so each country occupies
# four consecutive rows (2015-2018); GDP_data.iloc[0:4] selects the same four years.
# Finland
pearson, p_value = stats.pearsonr(GDP_data.iloc[0:4, 0], yearly_total_sales_data.iloc[0:4, 2])
print(f"Finland correlation {round(pearson, 3)} p-value {round(p_value, 3)}")
# Norway
pearson, p_value = stats.pearsonr(GDP_data.iloc[0:4, 1], yearly_total_sales_data.iloc[4:8, 2])
print(f"Norway correlation  {round(pearson, 3)} p-value {round(p_value, 3)}")
# Sweden
pearson, p_value = stats.pearsonr(GDP_data.iloc[0:4, 2], yearly_total_sales_data.iloc[8:12, 2])
print(f"Sweden correlation  {round(pearson, 3)} p-value {round(p_value, 3)}")
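
The positional slices in the cell above depend on the row order of `yearly_total_sales_data`. A minimal sketch of a label-based alternative is given below; the helper name `gdp_sales_correlation` and its `years` argument are hypothetical, and it assumes the GDP columns are still named `GDP_<country>` (that is, it is called before the columns are renamed in the addendum).

```python
import pandas as pd
from scipy import stats


def gdp_sales_correlation(gdp, sales, years):
    """Pearson r and p-value between yearly GDP and yearly sales, per country.

    Assumes `gdp` is indexed by year with one `GDP_<country>` column per
    country, and `sales` has `country`, `year` and `num_sold` columns.
    """
    rows = []
    for country, group in sales.groupby("country"):
        # Align both series on the year label instead of on row position.
        sold = group.set_index("year")["num_sold"].loc[years]
        gdp_values = gdp.loc[years, "GDP_" + country]
        r, p = stats.pearsonr(gdp_values, sold)
        rows.append({"country": country, "pearson_r": r, "p_value": p})
    return pd.DataFrame(rows).set_index("country")
```

In this notebook it could be called as `gdp_sales_correlation(GDP_data, yearly_total_sales_data, [2015, 2016, 2017, 2018])`. With only four yearly points per country, the p-values should be read with caution.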

#### Hypothesis: There could be a drop in the yearly total sales in 2019 for each country, corresponding to the drop in the GDP in the year 2019 for each country.
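
One way to put a number on the drop mentioned in this hypothesis is the year-over-year percentage change. The sketch below is an illustration only; the helper name `yearly_pct_change` is hypothetical, and it assumes a frame indexed by year with one numeric column per country.

```python
import pandas as pd


def yearly_pct_change(yearly):
    """Year-over-year percentage change for a frame indexed by year."""
    # Sort by year first so each row is compared with the preceding year.
    return yearly.sort_index().pct_change().mul(100).round(2)
```

It could be applied directly as `yearly_pct_change(GDP_data)`, or to the sales after reshaping them to one column per country with `yearly_total_sales_data.pivot(index="year", columns="country", values="num_sold")`. The first year has no predecessor, so its row is `NaN`.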

### Addendum: How to add the GDP data to the competition `train.csv` file to make a new feature

In [None]:
# rename the columns in the GDP dataframe to match the values in train['country']
GDP_data.columns = ['Finland', 'Norway', 'Sweden']
# create a dictionary keyed by (country, year) tuples
GDP_dictionary = GDP_data.unstack().to_dict()
# now create a new `GDP_value` column (uses the `year` column created earlier)
train["GDP_value"] = train.set_index(['country','year']).index.map(GDP_dictionary.get)

Take a quick look at the result:

In [None]:
train
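
As an alternative to the dictionary lookup used in this addendum, the same feature can be built with a merge. This is a sketch under stated assumptions rather than the method used above: the helper name `add_gdp_feature` is hypothetical, the GDP columns are assumed to be already renamed to the country names, and `train` is assumed to already have its `year` column.

```python
import pandas as pd


def add_gdp_feature(train, gdp):
    """Return a copy of `train` with a `GDP_value` column joined on (country, year).

    Assumes `gdp` is indexed by year with one column per country, named
    exactly as the values in `train['country']`.
    """
    # Reshape the wide GDP table to long form: one row per (year, country).
    gdp_long = (
        gdp.rename_axis(index="year")
           .reset_index()
           .melt(id_vars="year", var_name="country", value_name="GDP_value")
    )
    # Drop any earlier GDP_value column so the merge does not create suffixed duplicates.
    base = train.drop(columns="GDP_value", errors="ignore")
    # validate="many_to_one" raises if a (country, year) pair is duplicated in gdp_long.
    merged = base.merge(gdp_long, on=["country", "year"], how="left", validate="many_to_one")
    # merge() resets the index, so restore the original one.
    merged.index = train.index
    return merged
```

A call such as `add_gdp_feature(train, GDP_data)` leaves `NaN` wherever a (country, year) pair has no GDP entry, which can be counted with `.isna().sum()` on the new column.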