# EPA112A - Programming for Data Science - Group 31

- Georges Puttaert - 4686160
- Thijs Roolvink - 4961382
- Gijs de Werd - 4717775

## Research Question

 **What is the relative influence of social, economic, and environmental indicators on a country's GDP, and can we accurately predict GDP based on these indicators?**

*Chosen Countries per category:*
- The Netherlands 
- Germany
- Greece
- Ireland

### Packages

In [47]:
import wbdata
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
import matplotlib.pyplot as plt
import country_converter as coco
import warnings
warnings.filterwarnings('ignore')

### Indicators from the World Bank

In [48]:
# Define indicators Inequality and Social Welfare
health_indicators = {'SH.XPD.CHEX.GD.ZS': "Health Expenditure as a Percentage of GDP", "SH.IMM.IDPT": "Immunization"}
GDP_indicator = {'NY.GDP.PCAP.CD': 'gdppc'}
life_exp_indicator = {'SP.DYN.LE00.IN': 'Life Expectancy at Birth', 'SH.DYN.MORT': 'Child Mortality'}
disease_indicator = {'SH.TBS.INCD': 'Indicence of Diseases'} 

#Define indicators Import & Export
export_indicator = {'NE.EXP.GNFS.KD.ZG': 'Export'}
import_indicator = {'NE.IMP.GNFS.KD.ZG': 'Import'}
freight_indicator = {'IS.AIR.GOOD.MT.K1': 'Freight'}

#Define indicators Evironmental
renewable_energy_indicator = {'EG.FEC.RNEW.ZS': 'Renewable energy consumption (% of total final energy consumption)'}

### DataFrames for chosen indicators and countries from the World Bank

In [49]:
countries = ['NLD', 'DEU', 'GRC', 'IRL']

#Dataframes Inequality and Social Welfare
df_health = wbdata.get_dataframe(health_indicators, country=countries, convert_date=True)
df_gdp = wbdata.get_dataframe(GDP_indicator, country=countries, convert_date=True)
df_life_exp = wbdata.get_dataframe(life_exp_indicator, country=countries, convert_date=True)
df_diseases = wbdata.get_dataframe(disease_indicator, country=countries, convert_date=True)

#Dataframes Import & Export
df_export = wbdata.get_dataframe(export_indicator, country=countries, convert_date=True)
df_import = wbdata.get_dataframe(import_indicator, country=countries, convert_date=True)
df_freight = wbdata.get_dataframe(freight_indicator, country=countries, convert_date=True)

#Dataframes Environmental
df_renewable = wbdata.get_dataframe(renewable_energy_indicator, country=countries, convert_date=True)

### Data Cleaning

In [50]:
### Inequality and Social Welfare ###
# Reset index of the dataframes
df_health = df_health.reset_index()
df_gdp = df_gdp.reset_index()
df_life_exp = df_life_exp.reset_index()
df_diseases = df_diseases.reset_index()

#Formatting the date column to year
df_health['date'] = df_health['date'].dt.year
df_health = df_health.drop(['Immunization'], axis = 1)
df_gdp['date'] = df_gdp['date'].dt.year
df_life_exp['date'] = df_life_exp['date'].dt.year
df_diseases['date'] = df_diseases['date'].dt.year 

In [51]:
### Import & Export ###
# Reset index of the dataframes
df_export = df_export.reset_index()
df_import = df_import.reset_index()
df_freight = df_freight.reset_index()

#Formatting the date column to year
df_export['date'] = df_export['date'].dt.year
df_import['date'] = df_import['date'].dt.year
df_freight['date'] = df_freight['date'].dt.year

In [52]:
### Environmental ###
# Reset index of the dataframes
df_renewable = df_renewable.reset_index()

#Formatting the date column to year
df_renewable['date'] = df_renewable['date'].dt.year

### Indicators of the European Data Bank 
- Air GHG - https://ec.europa.eu/eurostat/databrowser/view/sdg_13_10__custom_8184934/default/table?lang=en 
- The recycling rate of municipal waste - https://ec.europa.eu/eurostat/databrowser/view/cei_wm011/default/table?lang=en

In [53]:
df_emissions = df = pd.read_csv('Datasets/sdg_13_10_linear.csv')
df_recycling = pd.read_csv('Datasets/cei_wm011_linear.csv')

cc = coco.CountryConverter()
df_emissions['geo'] = df_emissions['geo'].replace('EL', 'GR')
iso3_codes_emissions = cc.pandas_convert(series=df_emissions['geo'], to='ISO3')

df_recycling['geo'] = df_recycling['geo'].replace('EL', 'GR')
iso3_codes_recycling = cc.pandas_convert(series=df_recycling['geo'], to='ISO3')

df_emissions['geo_3'] = iso3_codes_emissions
GHG = df_emissions[df_emissions['geo_3'].isin(countries) & (df_emissions['airpol'] == 'GHG') & (df_emissions['unit'] == 'T_HAB') & (df_emissions['src_crf'] == 'TOTXMEMONIA')]

df_recycling['geo_3'] = iso3_codes_recycling
recycling = df_recycling[df_recycling['geo_3'].isin(countries)]



### Visualizing indicators

Inequality and Social Welfare

In [54]:
#GDP per capita
fig1 = px.line(df_gdp, x = 'date', y = 'gdppc', color = 'country', title = 'GDP per Capita')

### Inequality and Social Welfare ###

#Health expenditure as a percentae of GDP
fig2 = px.line(df_health, x = 'date', y = 'Health Expenditure as a Percentage of GDP', color = 'country', title = 'Health Expenditure as a Percentage of GDP')


#Life Expectancy at Birth
fig4 = px.line(df_life_exp, x = 'date', y = 'Life Expectancy at Birth', color = 'country', title = 'Life Expectancy at Birth')

#Child Mortality
fig5 = px.line(df_life_exp, x = 'date', y = 'Child Mortality', color = 'country', title = 'Child Mortality')

# Indices of Diseases
fig6  = px.line(df_diseases, x = 'date', y = 'Indicence of Diseases', color = 'country', title = 'Indices of Diseases')

fig1.show()
fig2.show()
fig4.show()
fig5.show()
fig6.show()

Import & Export

In [55]:
#GDP per capita
fig1 = px.line(df_gdp, x = 'date', y = 'gdppc', color = 'country', title = 'GDP per Capita')

### Import & Export ###
#Export
fig2 = px.line(df_export, x = 'date', y = 'Export', color = 'country', title = 'Exports of goods and services (annual percentage growth)')

#Import 
fig3 = px.line(df_import, x = 'date', y = 'Import', color = 'country', title = 'Imports of goods and services (annual percentage growth)')

#Freight
fig4 = px.line(df_freight, x = 'date', y = 'Freight', color = 'country', title = 'Air transport, freight (million ton-km)')

fig1.show()
fig2.show()
fig3.show()
fig4.show()

Environmental

In [56]:
recycling

Unnamed: 0,DATAFLOW,LAST UPDATE,freq,wst_oper,unit,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG,geo_3
144,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,DE,2000,52.5,s,DEU
145,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,DE,2001,52.3,s,DEU
146,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,DE,2002,56.1,,DEU
147,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,DE,2003,57.8,,DEU
148,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,DE,2004,56.4,,DEU
...,...,...,...,...,...,...,...,...,...,...
523,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,NL,2017,54.6,,NLD
524,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,NL,2018,55.9,,NLD
525,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,NL,2019,56.9,,NLD
526,ESTAT:CEI_WM011(1.0),21/08/23 23:00:00,A,RCY,PC,NL,2020,56.9,,NLD


In [57]:
#GDP per capita
fig1 = px.line(df_gdp, x = 'date', y = 'gdppc', color = 'country', title = 'GDP per Capita')

#Renewable Energy Consumption
fig2 = px.line(df_renewable, x = 'date', y = 'Renewable energy consumption (% of total final energy consumption)', color = 'country', title = 'Renewable energy consumption (% of total final energy consumption)')

#Air GHG
fig3 = px.line(GHG, x = 'TIME_PERIOD', y = 'OBS_VALUE', color = 'geo_3', title = 'Air GHG')

#Recycling
fig4 = px.line(recycling, x = 'TIME_PERIOD', y = 'OBS_VALUE', color = 'geo_3', title = 'Recycling rate')


fig1.show()
fig2.show()
fig3.show()
fig4.show()

### Setting up timeframes

Inequality and Social Welfare (2000 - 2020)

In [58]:
### Inequality and Social Welfare ###
#Dataframes with dates starting in 2000 unitll 2020
df_health_filtered = df_health[(df_health['date'] >= 2000) & (df_health['date'] <= 2020)]
df_gdp_filtered = df_gdp[(df_gdp['date'] >= 2000) & (df_gdp['date'] <= 2020)]
df_life_exp_filtered = df_life_exp[(df_life_exp['date'] >= 2000) & (df_life_exp['date'] <= 2020)]
df_diseases_filtered = df_diseases[(df_diseases['date'] >= 2000) & (df_diseases['date'] <= 2020)]

In [59]:
#GDP per capita
fig1 = px.line(df_gdp_filtered, x = 'date', y = 'gdppc', color = 'country', title = 'GDP per Capita')

### Inequality and Social Welfare ###
#Health expenditure as a percentae of GDP
fig2 = px.line(df_health_filtered, x = 'date', y = 'Health Expenditure as a Percentage of GDP', color = 'country', title = 'Health Expenditure as a Percentage of GDP')


#Life Expectancy at Birth
fig4 = px.line(df_life_exp_filtered, x = 'date', y = 'Life Expectancy at Birth', color = 'country', title = 'Life Expectancy at Birth')

#Child Mortality
fig5 = px.line(df_life_exp_filtered, x = 'date', y = 'Child Mortality', color = 'country', title = 'Child Mortality')

# Indices of Diseases
fig6  = px.line(df_diseases_filtered, x = 'date', y = 'Indicence of Diseases', color = 'country', title = 'Indices of Diseases')

fig1.show()
fig2.show()
fig4.show()
fig5.show()
fig6.show()

Import & Export (2000-2020)

In [60]:
### Import & Export ###
#Dataframes with dates starting in 1975 unitll 2020
df_gdp_filtered_ie = df_gdp[(df_gdp['date'] >= 2000) & (df_gdp['date'] <= 2020)]
df_export_filtered = df_export[(df_export['date'] >= 2000) & (df_export['date'] <= 2020)]
df_import_filtered = df_import[(df_import['date'] >= 2000) & (df_import['date'] <= 2020)]
df_freight_filtered = df_freight[(df_freight['date'] >= 2000) & (df_freight['date'] <= 2020)]

In [61]:
#GDP per capita
fig1 = px.line(df_gdp_filtered_ie, x = 'date', y = 'gdppc', color = 'country', title = 'GDP per Capita')

### Import & Export ###
#Export
fig2 = px.line(df_export_filtered, x = 'date', y = 'Export', color = 'country', title = 'Exports of goods and services (annual percentage growth)')

#Import 
fig3 = px.line(df_import_filtered, x = 'date', y = 'Import', color = 'country', title = 'Imports of goods and services (annual percentage growth)')

#Freight
fig4 = px.line(df_freight_filtered, x = 'date', y = 'Freight', color = 'country', title = 'ir transport, freight (million ton-km)')

fig1.show()
fig2.show()
fig3.show()
fig4.show()

Environmental (2000 - 2020)

In [62]:
df_gdp_filtered_env = df_gdp[(df_gdp['date'] >= 2000) & (df_gdp['date'] <= 2020)]
df_renewable_filtered = df_renewable[(df_renewable['date'] >= 2000) & (df_renewable['date'] <= 2020)]
GHG_filtered = GHG[(GHG['TIME_PERIOD'] >= 2000) & (GHG['TIME_PERIOD'] <= 2020)]
recycling_filtered = recycling[(recycling['TIME_PERIOD'] >= 2000) & GHG['TIME_PERIOD'] <= 2020]

In [63]:
#GDP per capita
fig1 = px.line(df_gdp_filtered_env, x = 'date', y = 'gdppc', color = 'country', title = 'GDP per Capita')

#Renewable Energy Consumption
fig2 = px.line(df_renewable_filtered, x = 'date', y = 'Renewable energy consumption (% of total final energy consumption)', color = 'country', title = 'Renewable energy consumption (% of total final energy consumption)')

#Air GHG
fig3 = px.line(GHG_filtered, x = 'TIME_PERIOD', y = 'OBS_VALUE', color = 'geo_3', title = 'Air GHG')

#Recycling
# fig4 = px.line(recycling_filtered, x = 'TIME_PERIOD', y = 'OBS_VALUE', color = 'geo_3', title = 'Recycling rate')


fig1.show()
fig2.show()
fig3.show()

## The Netherlands

### Inequality and Social Welfare

Data Preparation

In [64]:
#Combining all dataframes
dfs_nld = [df_gdp_filtered[df_gdp_filtered['country'] == 'Netherlands'], df_health_filtered[df_health_filtered['country'] == 'Netherlands'], df_life_exp_filtered[df_life_exp_filtered['country'] == 'Netherlands'], df_diseases_filtered[df_diseases_filtered['country'] == 'Netherlands']]
df_combined_nld = pd.concat(dfs_nld, axis = 1)

#Merge columns date
df_combined_nld = df_combined_nld.drop(['date'], axis = 1)
df_combined_nld['date'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Netherlands']['date']

#Merge columns country
df_combined_nld = df_combined_nld.drop(['country'], axis = 1)
df_combined_nld['country'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Netherlands']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_nld.isnull().sum().sum()}')

#Copy df_combined_nld to make train set
df_combined_nld_train = df_combined_nld.copy()
df_combined_nld_train =  df_combined_nld_train[df_combined_nld_train['date'] <= 2017]

#Copy df_combined_nld to make test set
df_combined_nld_test = df_combined_nld.copy()
df_combined_nld_test = df_combined_nld_test[df_combined_nld_test['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [65]:
df_corr_nld = df_combined_nld_test.drop('date', axis = 1)
corr = df_corr_nld.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Import & Export

Data Preparation

In [66]:
#Combining all dataframes
dfs_nld_ie = [df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Netherlands'], df_export_filtered[df_export_filtered['country'] == 'Netherlands'], df_import_filtered[df_import_filtered['country'] == 'Netherlands'], df_freight_filtered[df_freight_filtered['country'] == 'Netherlands']]
df_combined_nld_ie = pd.concat(dfs_nld_ie, axis = 1)


#Merge columns date
df_combined_nld_ie = df_combined_nld_ie.drop(['date'], axis = 1)
df_combined_nld_ie['date'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Netherlands']['date']

#Merge columns country
df_combined_nld_ie = df_combined_nld_ie.drop(['country'], axis = 1)
df_combined_nld_ie['country'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Netherlands']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_nld_ie.isnull().sum().sum()}')

#Copy df_combined_nld_ie to make train set
df_combined_nld_train_ie = df_combined_nld_ie.copy()
df_combined_nld_train_ie = df_combined_nld_train_ie[df_combined_nld_train_ie['date'] <= 2017]

#Copy df_combined_nld_ie to make test set
df_combined_nld_test_ie = df_combined_nld_ie.copy()
df_combined_nld_test_ie = df_combined_nld_test_ie[df_combined_nld_test_ie['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [67]:
df_corr_nld = df_combined_nld_test_ie.drop('date', axis = 1)
corr = df_corr_nld.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Environmental

Data Preparation

In [68]:
df_combined_nld_env = df_gdp_filtered_env[df_gdp_filtered_env['country'] == 'Netherlands']
df_combined_nld_env['GHG'] = GHG_filtered[GHG_filtered['geo_3'] == 'NLD']['OBS_VALUE'][::-1].values
df_combined_nld_env['Renewable'] = df_renewable_filtered[df_renewable_filtered['country'] == 'Netherlands']['Renewable energy consumption (% of total final energy consumption)'].values

#Check for missing data
print(f'Number of missing data: {df_combined_nld_env.isnull().sum().sum()}')

#Copy df_combined_nld_env to make train set
df_combined_nld_train_env = df_combined_nld_env.copy()
df_combined_nld_train_env = df_combined_nld_train_env[df_combined_nld_train_env['date'] <= 2017]

#Copy df_combined_nld_env to make test set
df_combined_nld_test_env = df_combined_nld_env.copy()
df_combined_nld_test_env = df_combined_nld_test_env[df_combined_nld_test_env['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [69]:
df_corr_nld = df_combined_nld_test_env.drop('date', axis = 1)
corr = df_corr_nld.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

## Germany

### Inequality and Social Welfare

Data Preparation

In [70]:
#Combining all dataframes
dfs_deu = [df_gdp_filtered[df_gdp_filtered['country'] == 'Germany'], df_health_filtered[df_health_filtered['country'] == 'Germany'], df_life_exp_filtered[df_life_exp_filtered['country'] == 'Germany'], df_diseases_filtered[df_diseases_filtered['country'] == 'Germany']]
df_combined_deu = pd.concat(dfs_deu, axis = 1)

#Merge columns date
df_combined_deu = df_combined_deu.drop(['date'], axis = 1)
df_combined_deu['date'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Germany']['date']

#Merge columns country
df_combined_deu = df_combined_deu.drop(['country'], axis = 1)
df_combined_deu['country'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Germany']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_deu.isnull().sum().sum()}')

#Copy df_combined_deu to make train set
df_combined_deu_train = df_combined_deu.copy()
df_combined_deu_train =  df_combined_deu_train[df_combined_deu_train['date'] <= 2017]

#Copy df_combined_arg to make test set
df_combined_deu_test = df_combined_deu.copy()
df_combined_deu_test = df_combined_deu_test[df_combined_deu_test['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [71]:
df_corr_deu = df_combined_deu_test.drop('date', axis = 1)
corr = df_corr_deu.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Import & Export


Data Preparation

In [72]:
#Combining all dataframes
dfs_deu_ie = [df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Germany'], df_export_filtered[df_export_filtered['country'] == 'Germany'], df_import_filtered[df_import_filtered['country'] == 'Germany'], df_freight_filtered[df_freight_filtered['country'] == 'Germany']]
df_combined_deu_ie = pd.concat(dfs_deu_ie, axis = 1)

#Merge columns date
df_combined_deu_ie = df_combined_deu_ie.drop(['date'], axis = 1)
df_combined_deu_ie['date'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Germany']['date']

#Merge columns country
df_combined_deu_ie = df_combined_deu_ie.drop(['country'], axis = 1)
df_combined_deu_ie['country'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Germany']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_deu_ie.isnull().sum().sum()}')

#Copy df_combined_deu_ie to make train set
df_combined_deu_train_ie = df_combined_deu_ie.copy()
df_combined_deu_train_ie = df_combined_deu_train_ie[df_combined_deu_train_ie['date'] <= 2017]

#Copy df_combined_deu_ie to make test set
df_combined_deu_test_ie = df_combined_deu_ie.copy()
df_combined_deu_test_ie = df_combined_deu_test_ie[df_combined_deu_test_ie['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [73]:
df_corr_deu = df_combined_deu_test_ie.drop('date', axis = 1)
corr = df_corr_deu.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Environmental

Data Preparation

In [74]:
df_combined_deu_env = df_gdp_filtered_env[df_gdp_filtered_env['country'] == 'Germany']
df_combined_deu_env['GHG'] = GHG_filtered[GHG_filtered['geo_3'] == 'DEU']['OBS_VALUE'][::-1].values
df_combined_deu_env['Renewable'] = df_renewable_filtered[df_renewable_filtered['country'] == 'Germany']['Renewable energy consumption (% of total final energy consumption)'].values

#Check for missing data
print(f'Number of missing data: {df_combined_deu_env.isnull().sum().sum()}')

#Copy df_combined_deu_env to make train set
df_combined_deu_train_env = df_combined_deu_env.copy()
df_combined_deu_train_env = df_combined_deu_train_env[df_combined_deu_train_env['date'] <= 2017]

#Copy df_combined_deu_ie to make test set
df_combined_deu_test_env = df_combined_deu_env.copy()
df_combined_deu_test_env = df_combined_deu_test_env[df_combined_deu_test_env['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [75]:
df_corr_deu = df_combined_deu_test_env.drop('date', axis = 1)
corr = df_corr_deu.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

## Greece

### Inequality and Social Welfare

Data Preparation

In [76]:
#Combining all dataframes
dfs_grc = [df_gdp_filtered[df_gdp_filtered['country'] == 'Greece'], df_health_filtered[df_health_filtered['country'] == 'Greece'], df_life_exp_filtered[df_life_exp_filtered['country'] == 'Greece'], df_diseases_filtered[df_diseases_filtered['country'] == 'Greece']]
df_combined_grc = pd.concat(dfs_grc, axis = 1)

#Merge columns date
df_combined_grc = df_combined_grc.drop(['date'], axis = 1)
df_combined_grc['date'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Greece']['date']

#Merge columns country
df_combined_grc = df_combined_grc.drop(['country'], axis = 1)
df_combined_grc['country'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Greece']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_grc.isnull().sum().sum()}')

#Copy df_combined_grc to make train set
df_combined_grc_train = df_combined_grc.copy()
df_combined_grc_train =  df_combined_grc_train[df_combined_grc_train['date'] <= 2017]

#Copy df_combined_grc to make test set
df_combined_grc_test = df_combined_grc.copy()
df_combined_grc_test = df_combined_grc_test[df_combined_grc_test['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [77]:
df_corr_grc = df_combined_grc_test.drop('date', axis = 1)
corr = df_corr_grc.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Import & Export

Data Preparation

In [78]:
#Combining all dataframes
dfs_grc_ie = [df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Greece'], df_export_filtered[df_export_filtered['country'] == 'Greece'], df_import_filtered[df_import_filtered['country'] == 'Greece'], df_freight_filtered[df_freight_filtered['country'] == 'Greece']]
df_combined_grc_ie = pd.concat(dfs_grc_ie, axis = 1)

#Merge columns date
df_combined_grc_ie = df_combined_grc_ie.drop(['date'], axis = 1)
df_combined_grc_ie['date'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Greece']['date']

#Merge columns country
df_combined_grc_ie = df_combined_grc_ie.drop(['country'], axis = 1)
df_combined_grc_ie['country'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Greece']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_grc_ie.isnull().sum().sum()}')

#Copy df_combined_grc_ie to make train set
df_combined_grc_train_ie = df_combined_grc_ie.copy()
df_combined_grc_train_ie = df_combined_grc_train_ie[df_combined_grc_train_ie['date'] <= 2017]

#Copy df_combined_grc_ie to make test set
df_combined_grc_test_ie = df_combined_grc_ie.copy()
df_combined_grc_test_ie = df_combined_grc_test_ie[df_combined_grc_test_ie['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [79]:
df_corr_grc = df_combined_grc_test_ie.drop('date', axis = 1)
corr = df_corr_grc.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Environmental

Data Preparation

In [80]:
df_combined_grc_env = df_gdp_filtered_env[df_gdp_filtered_env['country'] == 'Greece']
df_combined_grc_env['GHG'] = GHG_filtered[GHG_filtered['geo_3'] == 'GRC']['OBS_VALUE'][::-1].values
df_combined_grc_env['Renewable'] = df_renewable_filtered[df_renewable_filtered['country'] == 'Greece']['Renewable energy consumption (% of total final energy consumption)'].values

#Check for missing data
print(f'Number of missing data: {df_combined_grc_env.isnull().sum().sum()}')

#Copy df_combined_grc_env to make train set
df_combined_grc_train_env = df_combined_grc_env.copy()
df_combined_grc_train_env = df_combined_grc_train_env[df_combined_grc_train_env['date'] <= 2017]

#Copy df_combined_grc_ie to make test set
df_combined_grc_test_env = df_combined_grc_env.copy()
df_combined_grc_test_env = df_combined_grc_test_env[df_combined_grc_test_env['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [81]:
df_corr_grc = df_combined_grc_test_env.drop('date', axis = 1)
corr = df_corr_grc.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

## Ireland

### Inequality and Social Welfare

Data Preparation

In [82]:
#Combining all dataframes
dfs_irl = [df_gdp_filtered[df_gdp_filtered['country'] == 'Ireland'], df_health_filtered[df_health_filtered['country'] == 'Ireland'], df_life_exp_filtered[df_life_exp_filtered['country'] == 'Ireland'], df_diseases_filtered[df_diseases_filtered['country'] == 'Ireland']]
df_combined_irl = pd.concat(dfs_irl, axis = 1)

#Merge columns date
df_combined_irl = df_combined_irl.drop(['date'], axis = 1)
df_combined_irl['date'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Ireland']['date']

#Merge columns country
df_combined_irl = df_combined_irl.drop(['country'], axis = 1)
df_combined_irl['country'] = df_gdp_filtered[df_gdp_filtered['country'] == 'Ireland']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_irl.isnull().sum().sum()}')

#Copy df_combined_irl to make train set
df_combined_irl_train = df_combined_irl.copy()
df_combined_irl_train =  df_combined_irl_train[df_combined_irl_train['date'] <= 2017]

#Copy df_combined_irl to make test set
df_combined_irl_test = df_combined_irl.copy()
df_combined_irl_test = df_combined_irl_test[df_combined_irl_test['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [83]:
df_corr_irl = df_combined_irl_test.drop('date', axis = 1)
corr = df_corr_irl.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Import & Export

Data Preparation

In [84]:
#Combining all dataframes
dfs_irl_ie = [df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Ireland'], df_export_filtered[df_export_filtered['country'] == 'Ireland'], df_import_filtered[df_import_filtered['country'] == 'Ireland'], df_freight_filtered[df_freight_filtered['country'] == 'Ireland']]
df_combined_irl_ie = pd.concat(dfs_irl_ie, axis = 1)

#Merge columns date
df_combined_irl_ie = df_combined_irl_ie.drop(['date'], axis = 1)
df_combined_irl_ie['date'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Ireland']['date']

#Merge columns country
df_combined_irl_ie = df_combined_irl_ie.drop(['country'], axis = 1)
df_combined_irl_ie['country'] = df_gdp_filtered_ie[df_gdp_filtered_ie['country'] == 'Ireland']['country']

#Check for missing data
print(f'Number of missing data: {df_combined_irl_ie.isnull().sum().sum()}')

#Copy df_combined_irl_ie to make train set
df_combined_irl_train_ie = df_combined_irl_ie.copy()
df_combined_irl_train_ie = df_combined_irl_train_ie[df_combined_irl_train_ie['date'] <= 2017]

#Copy df_combined_irl_ie to make test set
df_combined_irl_test_ie = df_combined_irl_ie.copy()
df_combined_irl_test_ie = df_combined_irl_test_ie[df_combined_irl_test_ie['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [85]:
df_corr_irl = df_combined_irl_test_ie.drop('date', axis = 1)
corr = df_corr_irl.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

### Environmental

Data Preparation

In [86]:
df_combined_irl_env = df_gdp_filtered_env[df_gdp_filtered_env['country'] == 'Ireland']
df_combined_irl_env['GHG'] = GHG_filtered[GHG_filtered['geo_3'] == 'IRL']['OBS_VALUE'][::-1].values
df_combined_irl_env['Renewable'] = df_renewable_filtered[df_renewable_filtered['country'] == 'Ireland']['Renewable energy consumption (% of total final energy consumption)'].values

#Check for missing data
print(f'Number of missing data: {df_combined_irl_env.isnull().sum().sum()}')

#Copy df_combined_irl_env to make train set
df_combined_irl_train_env = df_combined_irl_env.copy()
df_combined_irl_train_env = df_combined_irl_train_env[df_combined_irl_train_env['date'] <= 2017]

#Copy df_combined_irl_ie to make test set
df_combined_irl_test_env = df_combined_irl_env.copy()
df_combined_irl_test_env = df_combined_irl_test_env[df_combined_irl_test_env['date'] > 2017]

Number of missing data: 0


Correlation between datasets

In [87]:
df_corr_irl = df_combined_irl_test_env.drop('date', axis = 1)
corr = df_corr_irl.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
corr.where(~mask, inplace=True)

fig = px.imshow(corr, text_auto=True, color_continuous_scale= px.colors.sequential.Emrld, width= 1500, height=750)

fig.show()

## Random Forest

Model 

In [88]:
#Sorted data of the countries in a dictonary
dict_countries_data = {'The Netherlands - Inequality and Social Welfare': [df_combined_nld_train, df_combined_nld_test, df_combined_nld],
                       'Germany - Inequality and Social Welfare': [df_combined_deu_train, df_combined_deu_test, df_combined_deu],
                       'Greece - Inequality and Social Welfare': [df_combined_grc_train, df_combined_grc_test, df_combined_grc],
                       'Ireland - Inequality and Social Welfare': [df_combined_irl_train, df_combined_irl_test, df_combined_irl], 
                       'The Netherlands - Import and Export': [df_combined_nld_train_ie, df_combined_nld_test_ie, df_combined_nld_ie],
                       'Germany - Import and Export': [df_combined_deu_train_ie, df_combined_deu_test_ie, df_combined_deu_ie],
                       'Greece - Import and Export': [df_combined_grc_train_ie, df_combined_grc_test_ie, df_combined_grc_ie],
                       'Ireland - Import and Export': [df_combined_irl_train_ie, df_combined_irl_test_ie, df_combined_irl_ie], 
                       'The Netherlands - Environmental': [df_combined_nld_train_env, df_combined_nld_test_env, df_combined_nld_env],
                       'Germany - Environmental': [df_combined_deu_train_env, df_combined_deu_test_env, df_combined_deu_env],
                       'Greece - Environmental': [df_combined_grc_train_env, df_combined_grc_test_env, df_combined_grc_env],
                       'Ireland - Environmental': [df_combined_irl_train_env, df_combined_irl_test_env, df_combined_irl_env]
                       }

In [89]:
#Results Machine Learning methods Dataframe
df_ML_results = pd.DataFrame()
df_ML_results['Datasets'] = dict_countries_data.keys()
df_ML_results.set_index('Datasets')
df_ML_results['Random Forest - MAE'] = np.nan
df_ML_results['Random Forest - RMSE'] = np.nan
df_ML_results['Linear Regression - MAE'] = np.nan
df_ML_results['Linear Regression - RMSE'] = np.nan
df_ML_results['Neural Network - MAE'] = np.nan
df_ML_results['Neural Network - RMSE'] = np.nan

df_ML_results

Unnamed: 0,Datasets,Random Forest - MAE,Random Forest - RMSE,Linear Regression - MAE,Linear Regression - RMSE,Neural Network - MAE,Neural Network - RMSE
0,The Netherlands - Inequality and Social Welfare,,,,,,
1,Germany - Inequality and Social Welfare,,,,,,
2,Greece - Inequality and Social Welfare,,,,,,
3,Ireland - Inequality and Social Welfare,,,,,,
4,The Netherlands - Import and Export,,,,,,
5,Germany - Import and Export,,,,,,
6,Greece - Import and Export,,,,,,
7,Ireland - Import and Export,,,,,,
8,The Netherlands - Environmental,,,,,,
9,Germany - Environmental,,,,,,


In [90]:
mae_list_RF = []
rmse_list_RF = []
pred_list = []


for key in dict_countries_data.keys():
    #Setting up X and Y
    X_test = dict_countries_data[key][1].drop(['gdppc', 'date', 'country'], axis=1)
    X_train = dict_countries_data[key][0].drop(['gdppc', 'date', 'country'], axis=1)
    y_test = dict_countries_data[key][1]['gdppc']
    y_train = dict_countries_data[key][0]['gdppc']

    scaler = StandardScaler()
    X_test_scaled = scaler.fit_transform(X_test)
    X_train_scaled = scaler.fit_transform(X_train)

    model = RandomForestRegressor()

    #Checking for best grid
    # param_grid = {
    #     'n_estimators': [500, 1000, 2000],
    #     'max_features': [1, 'sqrt'],
    #     'max_depth': [10, 20, 30],
    #     'min_samples_split': [2, 5, 10],
    #     'min_samples_leaf': [1, 2, 4]
    # }

    # grid_search = GridSearchCV(estimator= model, param_grid=param_grid, cv=3, n_jobs=-1)
    # grid_search.fit(X_train_scaled, y_train)

    # # Then you could fit the model with the best parameters
    # best_grid = grid_search.best_estimator_

    ##Fit model
    model.fit(X_train_scaled, y_train)

    score = model.score(X_test_scaled, y_test)
    predictions = model.predict(X_test_scaled)

    errors = model.predict(X_test_scaled) - y_test.values

    #Mean Absolute Error
    mae = np.mean(np.abs(errors))
    mae_list_RF.append(mae)

    #Root Mean Square Error
    rmse = np.sqrt(np.mean(np.abs((errors) ** 2)))
    rmse_list_RF.append(rmse)

    #Visaulizing results
    list_predicted_gdp =  list(predictions) + list(y_train)
    df_results = pd.DataFrame()
    df_results['gdppc'] = list_predicted_gdp
    df_results['date'] = list(dict_countries_data[key][2]['date'])

    # Initialise the figure
    fig = go.Figure()

    # Add the line for the actual GDP
    fig.add_trace(
        go.Scatter(
            x=dict_countries_data[key][2]['date'],
            y=dict_countries_data[key][2]['gdppc'],
            mode='lines',
            name='Actual GDP',
            line=dict(color='blue')
        )
    )

    # Add the line for the predicted GDP
    fig.add_trace(
        go.Scatter(
            x=df_results['date'],
            y=df_results['gdppc'],
            mode='lines',
            name='Predicted GDP',
            line=dict(color='red', dash='dash')
        )
    )


    title = "GDP prediction of " + str(key) + " using Random Forest"  
    
    fig.update_layout(
    title=dict(text= title))

    # Show the figure
    fig.show()

    print(f'The Mean Absolute Error (MAE) for Random Forest is {mae}')
    print(f'The Root Mean Square Error (RMSE) for Random Forest is {rmse}')

df_ML_results['Random Forest - MAE'] = mae_list_RF
df_ML_results['Random Forest - RMSE'] = rmse_list_RF

The Mean Absolute Error (MAE) for Random Forest is 8105.023199580767
The Root Mean Square Error (RMSE) for Random Forest is 9722.25398019746


The Mean Absolute Error (MAE) for Random Forest is 8139.794968064875
The Root Mean Square Error (RMSE) for Random Forest is 10897.986898387244


The Mean Absolute Error (MAE) for Random Forest is 4323.443010535605
The Root Mean Square Error (RMSE) for Random Forest is 5169.40207037214


The Mean Absolute Error (MAE) for Random Forest is 32593.180389567464
The Root Mean Square Error (RMSE) for Random Forest is 32848.032604843516


The Mean Absolute Error (MAE) for Random Forest is 9967.927904731161
The Root Mean Square Error (RMSE) for Random Forest is 12930.827247456604


The Mean Absolute Error (MAE) for Random Forest is 11339.093676537435
The Root Mean Square Error (RMSE) for Random Forest is 11723.602775121995


The Mean Absolute Error (MAE) for Random Forest is 4320.763145503867
The Root Mean Square Error (RMSE) for Random Forest is 4433.402944369803


The Mean Absolute Error (MAE) for Random Forest is 29186.36650130397
The Root Mean Square Error (RMSE) for Random Forest is 29869.91903840264


The Mean Absolute Error (MAE) for Random Forest is 8229.187367631437
The Root Mean Square Error (RMSE) for Random Forest is 10004.001494493716


The Mean Absolute Error (MAE) for Random Forest is 9272.982975456938
The Root Mean Square Error (RMSE) for Random Forest is 11419.269957778812


The Mean Absolute Error (MAE) for Random Forest is 4460.494631059875
The Root Mean Square Error (RMSE) for Random Forest is 5431.181067067712


The Mean Absolute Error (MAE) for Random Forest is 31953.999199662296
The Root Mean Square Error (RMSE) for Random Forest is 32979.20422647564


## Linear Regression

Model

In [91]:
mae_list_LR = []
rmse_list_LR = []

for key in dict_countries_data.keys():
    #Setting up X and Y
    X_test = dict_countries_data[key][1].drop(['gdppc', 'date', 'country'], axis=1)
    X_train = dict_countries_data[key][0].drop(['gdppc', 'date', 'country'], axis=1)
    y_test = dict_countries_data[key][1]['gdppc']
    y_train = dict_countries_data[key][0]['gdppc']

    scaler = StandardScaler()
    X_test_scaled = scaler.fit_transform(X_test)
    X_train_scaled = scaler.fit_transform(X_train)

    model = LinearRegression()

    ##Fit model
    model.fit(X_train_scaled, y_train)

    score = model.score(X_test_scaled, y_test)
    predictions = model.predict(X_test_scaled)

    errors = model.predict(X_test_scaled) - y_test.values

    #Mean Absolute Error
    mae = np.mean(np.abs(errors))
    mae_list_LR.append(mae)

    #Root Mean Square Error
    rmse = np.sqrt(np.mean(np.abs((errors) ** 2)))
    rmse_list_LR.append(rmse)

    #Visaulizing results
    list_predicted_gdp =  list(predictions) + list(y_train)
    df_results = pd.DataFrame()
    df_results['gdppc'] = list_predicted_gdp
    df_results['date'] = list(dict_countries_data[key][2]['date'])

    # Initialise the figure
    fig = go.Figure()

    # Add the line for the actual GDP
    fig.add_trace(
        go.Scatter(
            x=dict_countries_data[key][2]['date'],
            y=dict_countries_data[key][2]['gdppc'],
            mode='lines',
            name='Actual GDP',
            line=dict(color='blue')
        )
    )

    # Add the line for the predicted GDP
    fig.add_trace(
        go.Scatter(
            x=df_results['date'],
            y=df_results['gdppc'],
            mode='lines',
            name='Predicted GDP',
            line=dict(color='red', dash='dash')
        )
    )

    title = "GDP prediction of " + str(key) + " using Linear Regression"
    
    fig.update_layout(
    title=dict(text= title))

    # Show the figure
    fig.show()

    print(f'The Mean Absolute Error (MAE) for Linear Regression is {mae}')
    print(f'The Root Mean Square Error (RMSE) for Linear Regression is {rmse}')

df_ML_results['Linear Regression - MAE'] = mae_list_LR
df_ML_results['Linear Regression - RMSE'] = rmse_list_LR

The Mean Absolute Error (MAE) for Linear Regression is 16116.766039388698
The Root Mean Square Error (RMSE) for Linear Regression is 20102.151113702


The Mean Absolute Error (MAE) for Linear Regression is 8776.173809638285
The Root Mean Square Error (RMSE) for Linear Regression is 11542.811740965959


The Mean Absolute Error (MAE) for Linear Regression is 10848.374959351704
The Root Mean Square Error (RMSE) for Linear Regression is 11998.750109667633


The Mean Absolute Error (MAE) for Linear Regression is 31397.182729425782
The Root Mean Square Error (RMSE) for Linear Regression is 32805.329962554046


The Mean Absolute Error (MAE) for Linear Regression is 7658.554841590089
The Root Mean Square Error (RMSE) for Linear Regression is 8389.362179304788


The Mean Absolute Error (MAE) for Linear Regression is 8776.173809638014
The Root Mean Square Error (RMSE) for Linear Regression is 8999.28829879586


The Mean Absolute Error (MAE) for Linear Regression is 2835.2780545478126
The Root Mean Square Error (RMSE) for Linear Regression is 4205.97620413371


The Mean Absolute Error (MAE) for Linear Regression is 31397.182729425735
The Root Mean Square Error (RMSE) for Linear Regression is 31713.7122134397


The Mean Absolute Error (MAE) for Linear Regression is 8626.67690439613
The Root Mean Square Error (RMSE) for Linear Regression is 10562.324622154123


The Mean Absolute Error (MAE) for Linear Regression is 9397.09557747498
The Root Mean Square Error (RMSE) for Linear Regression is 11600.3435350453


The Mean Absolute Error (MAE) for Linear Regression is 2792.3789101286857
The Root Mean Square Error (RMSE) for Linear Regression is 2887.704770846174


The Mean Absolute Error (MAE) for Linear Regression is 31397.18272942574
The Root Mean Square Error (RMSE) for Linear Regression is 31956.930306942846


## Neural Network

Model

In [92]:
mae_list_NN = []
rmse_list_NN = []

for key in dict_countries_data.keys():
    #Setting up X and Y
    X_test = dict_countries_data[key][1].drop(['gdppc', 'date', 'country'], axis=1)
    X_train = dict_countries_data[key][0].drop(['gdppc', 'date', 'country'], axis=1)
    y_test = dict_countries_data[key][1]['gdppc']
    y_train = dict_countries_data[key][0]['gdppc']

    scaler = StandardScaler()
    X_test_scaled = scaler.fit_transform(X_test)
    X_train_scaled = scaler.fit_transform(X_train)

    model = MLPRegressor(hidden_layer_sizes=(128, 64, 32), activation='relu', solver='adam',
                            max_iter= 50000)

    ##Fit model
    model.fit(X_train_scaled, y_train)

    score = model.score(X_test_scaled, y_test)
    predictions = model.predict(X_test_scaled)

    errors = model.predict(X_test_scaled) - y_test.values

    #Mean Absolute Error
    mae = np.mean(np.abs(errors))
    mae_list_NN.append(mae)

    #Root Mean Square Error
    rmse = np.sqrt(np.mean(np.abs((errors) ** 2)))
    rmse_list_NN.append(rmse)

    #Visaulizing results
    list_predicted_gdp =  list(predictions) + list(y_train)
    df_results = pd.DataFrame()
    df_results['gdppc'] = list_predicted_gdp
    df_results['date'] = list(dict_countries_data[key][2]['date'])

    # Initialise the figure
    fig = go.Figure()

    # Add the line for the actual GDP
    fig.add_trace(
        go.Scatter(
            x=dict_countries_data[key][2]['date'],
            y=dict_countries_data[key][2]['gdppc'],
            mode='lines',
            name='Actual GDP',
            line=dict(color='blue')
        )
    )

    # Add the line for the predicted GDP
    fig.add_trace(
        go.Scatter(
            x=df_results['date'],
            y=df_results['gdppc'],
            mode='lines',
            name='Predicted GDP',
            line=dict(color='red', dash='dash')
        )
    )

    title = "GDP prediction of " + str(key) + " using Neural Network"
    
    fig.update_layout(
    title=dict(text= title))

    # Show the figure
    fig.show()

    print(f'The Mean Absolute Error (MAE) for Neural Network is {mae}')
    print(f'The Root Mean Square Error (RMSE) for Neural Network is {rmse}')

df_ML_results['Neural Network - MAE'] = mae_list_NN
df_ML_results['Neural Network - RMSE'] = rmse_list_NN

The Mean Absolute Error (MAE) for Neural Network is 16026.530919027333
The Root Mean Square Error (RMSE) for Neural Network is 19958.18147404349


The Mean Absolute Error (MAE) for Neural Network is 12232.408785808153
The Root Mean Square Error (RMSE) for Neural Network is 13492.440638097905


The Mean Absolute Error (MAE) for Neural Network is 14303.066756447755
The Root Mean Square Error (RMSE) for Neural Network is 15813.256708329318


The Mean Absolute Error (MAE) for Neural Network is 28503.696294350168
The Root Mean Square Error (RMSE) for Neural Network is 30643.255478163734


The Mean Absolute Error (MAE) for Neural Network is 9291.677766717652
The Root Mean Square Error (RMSE) for Neural Network is 13334.658657129172


The Mean Absolute Error (MAE) for Neural Network is 15532.181160409062
The Root Mean Square Error (RMSE) for Neural Network is 16241.831263767323


The Mean Absolute Error (MAE) for Neural Network is 6066.731571414696
The Root Mean Square Error (RMSE) for Neural Network is 6242.966854892982


The Mean Absolute Error (MAE) for Neural Network is 28003.13016877125
The Root Mean Square Error (RMSE) for Neural Network is 29996.30556158755


The Mean Absolute Error (MAE) for Neural Network is 7538.183469757368
The Root Mean Square Error (RMSE) for Neural Network is 9460.98374594521


The Mean Absolute Error (MAE) for Neural Network is 8911.430282708168
The Root Mean Square Error (RMSE) for Neural Network is 11121.77073001829


The Mean Absolute Error (MAE) for Neural Network is 2859.599551413305
The Root Mean Square Error (RMSE) for Neural Network is 2988.316801126215


The Mean Absolute Error (MAE) for Neural Network is 32636.247524980226
The Root Mean Square Error (RMSE) for Neural Network is 32823.0495526665


## Results

In [93]:
df_ML_results

Unnamed: 0,Datasets,Random Forest - MAE,Random Forest - RMSE,Linear Regression - MAE,Linear Regression - RMSE,Neural Network - MAE,Neural Network - RMSE
0,The Netherlands - Inequality and Social Welfare,8105.0232,9722.25398,16116.766039,20102.151114,16026.530919,19958.181474
1,Germany - Inequality and Social Welfare,8139.794968,10897.986898,8776.17381,11542.811741,12232.408786,13492.440638
2,Greece - Inequality and Social Welfare,4323.443011,5169.40207,10848.374959,11998.75011,14303.066756,15813.256708
3,Ireland - Inequality and Social Welfare,32593.18039,32848.032605,31397.182729,32805.329963,28503.696294,30643.255478
4,The Netherlands - Import and Export,9967.927905,12930.827247,7658.554842,8389.362179,9291.677767,13334.658657
5,Germany - Import and Export,11339.093677,11723.602775,8776.17381,8999.288299,15532.18116,16241.831264
6,Greece - Import and Export,4320.763146,4433.402944,2835.278055,4205.976204,6066.731571,6242.966855
7,Ireland - Import and Export,29186.366501,29869.919038,31397.182729,31713.712213,28003.130169,29996.305562
8,The Netherlands - Environmental,8229.187368,10004.001494,8626.676904,10562.324622,7538.18347,9460.983746
9,Germany - Environmental,9272.982975,11419.269958,9397.095577,11600.343535,8911.430283,11121.77073
