<a href="https://colab.research.google.com/github/nicolesaade/WorldHappinessReportAnalysis/blob/main/WorldHappinessReport_Time_Series_method1_(one_feature).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#3. Predict 2023-28 Life Ladder Method 1

##3(a) Libraries

In [21]:
import numpy as np
import pandas as pd
import plotly.express as px
import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LogisticRegression, LinearRegression
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

##3(b) Load Life Ladder Score



In [24]:
# Load data
data = pd.read_csv('happiness_dataset.csv')

# Normalize data
scaler = MinMaxScaler()
data['Life Ladder'] = scaler.fit_transform(data[['Life Ladder']])

# Dataframe with life ladder score 2018-2023
ladder_data = data[['Country name', 'year', 'Life Ladder']].loc[data['year'].isin([2018, 2019, 2020, 2021, 2022, 2023])]

# Remove countries missing any of the 2018-2023 life ladder scores
countries = set(ladder_data.loc[ladder_data['year']==2018]['Country name'].unique()).intersection(
    ladder_data.loc[ladder_data['year']==2019]['Country name'].unique(),
    ladder_data.loc[ladder_data['year']==2020]['Country name'].unique(),
    ladder_data.loc[ladder_data['year']==2021]['Country name'].unique(),
    ladder_data.loc[ladder_data['year']==2022]['Country name'].unique(),
    ladder_data.loc[ladder_data['year']==2023]['Country name'].unique()
)
ladder_data = ladder_data.loc[ladder_data['Country name'].isin(countries)]

# Show all rows except the last one (head(-1) drops the final row)
ladder_data.head(-1)

Unnamed: 0,Country name,year,Life Ladder
25,Albania,2018,0.552538
26,Albania,2019,0.551202
27,Albania,2020,0.606115
28,Albania,2021,0.589789
29,Albania,2022,0.583408
...,...,...,...
2357,Zimbabwe,2018,0.346542
2358,Zimbabwe,2019,0.209706
2359,Zimbabwe,2020,0.278866
2360,Zimbabwe,2021,0.278124
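
The six-way set intersection used above to keep only countries with complete 2018-2023 data can also be written as a group-by count. This is a minimal sketch, assuming the same column names (`Country name`, `year`) and using a small made-up frame in place of `ladder_data`. The helper `keep_complete_countries` is hypothetical and not part of the notebook.

```python
import pandas as pd

def keep_complete_countries(df, years):
    """Keep only countries that have a row for every year in `years`."""
    in_range = df[df['year'].isin(years)]
    # Count distinct years per country and broadcast the count back to each row
    n_years = in_range.groupby('Country name')['year'].transform('nunique')
    return in_range[n_years == len(set(years))]

# Made-up toy data: country B is missing 2019
toy = pd.DataFrame({
    'Country name': ['A', 'A', 'B'],
    'year': [2018, 2019, 2018],
    'Life Ladder': [0.5, 0.6, 0.4],
})
complete = keep_complete_countries(toy, [2018, 2019])
```

On the real data the call would presumably be `keep_complete_countries(data, range(2018, 2024))`, which avoids repeating the filter once per year.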


Visualize Life Ladder Score Trajectory

In [25]:
fig = px.line(ladder_data, x='year', y='Life Ladder', color='Country name', title='Life Ladder Score by Country (Normalized)')
fig.show()

##3(c) LSTM

In [26]:
prediction = {}
for country in countries:
    country_data = ladder_data.loc[ladder_data['Country name']==country]

    X = country_data['Life Ladder'].values.reshape(-1, 1)

    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(1, 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(50, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X[:-1], X[:-1], epochs=100, verbose=0) #fit on 2018-2022 data; input and target are the same values

    last_ladder = X[-2] #2022 data
    forecast = [] #stores predicted 2023-28 data (six values)
    for _ in range(6):
        lstm_preds = model.predict(last_ladder.reshape(-1, 1))
        forecast.append(lstm_preds[0,0])
        last_ladder = lstm_preds[0,0]

    prediction[country] = forecast

#print(prediction)
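
In the training call above, the same array is passed as both input and target, so the network is fit to reproduce each year's score rather than to map one year to the next. One possible alternative, shown only as a sketch, is to pair each year with the following year before fitting. The helper name `make_one_step_pairs` is an assumption for illustration, and the results reported in this notebook were not produced this way.

```python
import numpy as np

def make_one_step_pairs(series):
    """Return (inputs, targets) where targets[i] is the value one step after inputs[i]."""
    series = np.asarray(series, dtype=float)
    inputs = series[:-1].reshape(-1, 1, 1)   # (samples, timesteps=1, features=1) for an LSTM
    targets = series[1:].reshape(-1, 1)      # next-year value for each input
    return inputs, targets

# Albania's normalized 2018-2022 scores from the table in 3(b)
train_series = [0.552538, 0.551202, 0.606115, 0.589789, 0.583408]
X_train, y_train = make_one_step_pairs(train_series)
# X_train holds 2018-2021 and y_train holds 2019-2022; they could be passed as
# model.fit(X_train, y_train, ...) in place of model.fit(X[:-1], X[:-1], ...)
```

With only four training pairs per country, a model fit this way would still be very data-poor, so any gain over the current setup would need to be checked against the 2023 hold-out.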



In [27]:
#Convert dictionary into Dataframe
p1_df = pd.DataFrame.from_dict(prediction, orient='index', columns=['Pred 2023', 'Pred 2024', 'Pred 2025', 'Pred 2026', 'Pred 2027', 'Pred 2028'])

#Store True 2023 values and the squared error (True 2023 vs Predicted 2023) into a list
true_2023 = []
error = []
for country in p1_df.index:
  y_true = ladder_data.loc[(ladder_data['Country name'] == country) & (ladder_data['year'] == 2023), 'Life Ladder'].values[0]
  y_pred = p1_df['Pred 2023'][country]
  true_2023.append(y_true)
  error.append((y_true-y_pred)**2)

#Add new columns 'True 2023' and 'Squared Error' to the DataFrame
p1_df['True 2023'] = true_2023
p1_df['Squared Error'] = error

#Move country names in the index into a column
p1_df = p1_df.reset_index()
p1_df.columns=['Country name', 'Pred 2023', 'Pred 2024', 'Pred 2025', 'Pred 2026', 'Pred 2027', 'Pred 2028', 'True 2023', 'Squared Error']

from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/My Drive/Colab Notebooks/DS Group Project/'
p1_df.to_csv(path+'prediction_2023final.csv', index=False)

p1_df.head()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Unnamed: 0,Country name,Pred 2023,Pred 2024,Pred 2025,Pred 2026,Pred 2027,Pred 2028,True 2023,Squared Error
0,Jordan,0.462766,0.464904,0.465619,0.465859,0.465939,0.465966,0.446869,0.000253
1,Croatia,0.692028,0.718007,0.730669,0.736888,0.739954,0.741468,0.694123,4e-06
2,Peru,0.598564,0.564627,0.551555,0.546572,0.544681,0.543964,0.690858,0.008518
3,Nepal,0.632289,0.636756,0.63876,0.63966,0.640064,0.640246,0.609676,0.000511
4,Saudi Arabia,0.647288,0.60692,0.592511,0.587425,0.585636,0.585007,0.841793,0.037832


##3(d) Evaluation

In [16]:
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/My Drive/Colab Notebooks/DS Group Project/'
p1_df = pd.read_csv(path+'prediction_2023final.csv')



In [28]:
#Initialize list of mean squared errors
mse = []

for country in p1_df['Country name']:
  y_true = p1_df.loc[p1_df['Country name']==country]['True 2023']
  y_pred = p1_df.loc[p1_df['Country name']==country]['Pred 2023']
  mse.append(mean_squared_error(y_true, y_pred)) #Store MSEs for each country

p1_df['MSE'] = mse
print('Average MSE: ', sum(mse)/len(mse))

r_squared = r2_score(p1_df['True 2023'], p1_df['Pred 2023'])
print('R Squared: ', r_squared)

p1_df

Average MSE:  0.0043610372879435855
R Squared:  0.7979561002552203


Unnamed: 0,Country name,Pred 2023,Pred 2024,Pred 2025,Pred 2026,Pred 2027,Pred 2028,True 2023,Squared Error,MSE
0,Jordan,0.462766,0.464904,0.465619,0.465859,0.465939,0.465966,0.446869,0.000253,0.000253
1,Croatia,0.692028,0.718007,0.730669,0.736888,0.739954,0.741468,0.694123,0.000004,0.000004
2,Peru,0.598564,0.564627,0.551555,0.546572,0.544681,0.543964,0.690858,0.008518,0.008518
3,Nepal,0.632289,0.636756,0.638760,0.639660,0.640064,0.640246,0.609676,0.000511,0.000511
4,Saudi Arabia,0.647288,0.606920,0.592511,0.587425,0.585636,0.585007,0.841793,0.037832,0.037832
...,...,...,...,...,...,...,...,...,...,...
99,Bulgaria,0.617603,0.621443,0.622990,0.623614,0.623865,0.623967,0.639507,0.000480,0.000480
100,Senegal,0.553162,0.558354,0.560155,0.560781,0.560999,0.561075,0.565747,0.000158,0.000158
101,Nicaragua,0.750818,0.747105,0.745323,0.744468,0.744058,0.743862,0.754081,0.000011,0.000011
102,Norway,0.691329,0.623320,0.601494,0.594609,0.592450,0.591774,0.885723,0.037789,0.037789
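
Because each country contributes a single 2023 prediction, the per-country MSE is the same number as the squared error, and the average MSE is the mean of the squared-error column. The sketch below computes the mean squared error and R² directly with NumPy on the first five rows of the table above. The function names are illustrative, and the values printed by the notebook come from all countries, not from these five.

```python
import numpy as np

def mse_np(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2_np(y_true, y_pred):
    """R squared = 1 - (residual sum of squares / total sum of squares)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Jordan, Croatia, Peru, Nepal, Saudi Arabia (True 2023 and Pred 2023 columns)
true_2023 = np.array([0.446869, 0.694123, 0.690858, 0.609676, 0.841793])
pred_2023 = np.array([0.462766, 0.692028, 0.598564, 0.632289, 0.647288])

sq_err = (true_2023 - pred_2023) ** 2   # matches the 'Squared Error' column
mse = mse_np(true_2023, pred_2023)      # equals sq_err.mean()
r2 = r2_np(true_2023, pred_2023)
```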


###3(d).1 Scatter plot of True 2023 vs Predicted 2023

In [30]:
import plotly.graph_objects as go

#Scatter plot of True 2023 vs Predicted 2023 by country
scatter = go.Scatter(
    x=p1_df['True 2023'],
    y=p1_df['Pred 2023'],
    mode='markers',
    marker=dict(color='red', size=8),
    text=p1_df['Country name'])

#Add a layout
layout = go.Layout(
    title='True 2023 vs Predicted 2023',
    xaxis=dict(title='True 2023'),
    yaxis=dict(title='Predicted 2023'),
    width=600,
    height=600)

fig = go.Figure(data=scatter, layout=layout)

#Add a y=x line
fig.add_trace(go.Scatter(
        x=np.linspace(0, 1, 400),
        y=np.linspace(0, 1, 400),
        mode='lines',
        line=dict(color='black'),
        name='y=x'))

fig.show()

###3(d).2 Squared Error

In [32]:
line_plot = go.Scatter(
    x=p1_df['Country name'],
    y=p1_df['Squared Error'],
    mode='lines',
    line=dict(color='red'))

layout = go.Layout(
    title='Squared Error',
    xaxis=dict(title='Countries'),
    yaxis=dict(title='Squared Error'))

fig = go.Figure(data=line_plot, layout=layout)

#Add an average line (computed from the data instead of a hard-coded value)
avg_sq_error = p1_df['Squared Error'].mean()
fig.add_shape(type="line",
              x0=0, x1=len(p1_df)-1,
              y0=avg_sq_error, y1=avg_sq_error,
              line=dict(color="black", width=2))

fig.update_xaxes(tickangle=30)
fig.show()

###3(d).3 2023-2028 Forecast (not the final results used in the slides; improved in the following sections)

In [33]:
# Calculate the happiness score increase/decrease and final happiness score for each country
country_stats = []
for country, forecast in prediction.items():
  initial_score = forecast[0]
  final_score = forecast[-1]
  change = final_score - initial_score
  country_stats.append((country, change, final_score))

# Sort the countries based on the happiness score increase/decrease and final happiness score
country_stats.sort(key=lambda x: (x[1], x[2]), reverse=True)

# Create traces for each country's predicted happiness scores
traces = []
for country, _, _ in country_stats:
  predicted = pd.DataFrame({'Year': range(2023, 2029), 'Life Ladder': prediction[country]})
  trace = go.Scatter(x=predicted['Year'], y=predicted['Life Ladder'], mode='lines+markers', name=country)
  traces.append(trace)

# Create the layout for the plot
layout = go.Layout(
  title='Predicted Life Ladder for All Countries',
  xaxis=dict(title='Year'),
  yaxis=dict(title='Life Ladder'),
  hovermode='closest',
  width=800,
  height=600)

# Create the figure and display the plot
fig = go.Figure(data=traces, layout=layout)
fig.show()

# Print the ranking of countries based on happiness score increase/decrease and final happiness score
print("Ranking of Countries:")
for i, (country, change, final_score) in enumerate(country_stats, start=1):
    print(f"{i}. {country}: Increase/Decrease = {change:.4f}, Final Score = {final_score:.4f}")

Ranking of Countries:
1. Malta: Increase/Decrease = 0.0591, Final Score = 0.8533
2. Hungary: Increase/Decrease = 0.0568, Final Score = 0.7869
3. Croatia: Increase/Decrease = 0.0494, Final Score = 0.7415
4. Finland: Increase/Decrease = 0.0456, Final Score = 1.0275
5. Mauritius: Increase/Decrease = 0.0435, Final Score = 0.7507
6. Slovakia: Increase/Decrease = 0.0400, Final Score = 0.7914
7. Ghana: Increase/Decrease = 0.0368, Final Score = 0.5399
8. Dominican Republic: Increase/Decrease = 0.0345, Final Score = 0.7090
9. Bangladesh: Increase/Decrease = 0.0323, Final Score = 0.4453
10. Lebanon: Increase/Decrease = 0.0313, Final Score = 0.3229
11. Tajikistan: Increase/Decrease = 0.0307, Final Score = 0.6490
12. United States: Increase/Decrease = 0.0254, Final Score = 0.8515
13. Argentina: Increase/Decrease = 0.0244, Final Score = 0.7857
14. Uruguay: Increase/Decrease = 0.0225, Final Score = 0.8402
15. Mali: Increase/Decrease = 0.0201, Final Score = 0.4943
16. Latvia: Increase/Decrease = 0.01
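
The scores in this ranking are on the MinMax-normalized scale, so a value such as Finland's 1.0275 lies above the largest Life Ladder score seen in the data. To read forecasts on the original ladder scale, the scaling can be inverted. This is a sketch that assumes `data_min` and `data_max` are the minimum and maximum of the raw `Life Ladder` column; the numbers below are placeholders, not the dataset's real bounds. With the fitted `scaler` from section 3(b), `scaler.inverse_transform` should give the same result.

```python
import numpy as np

def unscale(scaled, data_min, data_max):
    """Invert MinMax scaling, where scaled = (raw - min) / (max - min)."""
    return np.asarray(scaled, dtype=float) * (data_max - data_min) + data_min

# Placeholder bounds for illustration only; use the raw column's real min and max
data_min, data_max = 2.0, 8.0
raw = unscale([0.0, 0.5, 1.0275], data_min, data_max)
# A normalized value above 1 maps to a raw score above data_max
```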

#Findings

The LSTM model using only the life ladder score yielded a 2023 prediction with **R-Squared: 0.7979561002552203**. The **squared errors** of the models built for each country were all lower than 0.1, with an average of **0.0043610372879435855**. Israel had the highest squared error (0.09711011), followed by Saudi Arabia (0.03783214) and Norway (0.03778912).

These results show that the predictions for 2023 are fairly good, but they could be improved by adding other features such as social support, freedom to make life choices, and generosity, which the OLS regression identified as important.