<a href="https://colab.research.google.com/github/seanmcalevey/example-repo/blob/master/Predicting_Happiness_Mini_Hackithon_for_tabular_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Objective: Predict World Happiness Rankings 

What makes the citizens of one country happier than the citizens of another? Do variables measuring perceptions of corruption, GDP, a healthy lifestyle, or social support associate with a country's happiness ranking?

Let's use the United Nations' World Happiness Rankings country-level data to experiment with models that predict happiness rankings well.


---

**Data**: 2019 World Happiness Survey Rankings
*(Data can be found on Advanced Projects in ML courseworks site)*

**Features**
*   Country or region
*   GDP per capita
*   Social support
*   Healthy life expectancy
*   Freedom to make life choices
*   Generosity
*   Perceptions of corruption

**Target**
*   Happiness_level (Very High = Top 20% and Very Low = Bottom 20%)

Source: https://worldhappiness.report/
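The five `Happiness_level` classes that appear later in this notebook (Very Low, Low, Average, High, Very High) are consistent with a quintile split of the ranking. As a rough sketch of how such labels could be derived, assuming hypothetical happiness scores (the raw scores are not part of the CSV used here) and equal 20% bins:

```python
import pandas as pd

# Hypothetical happiness scores, sorted from happiest to least happy.
scores = pd.Series([7.8, 7.6, 7.5, 6.9, 6.3, 5.9, 5.2, 4.7, 4.1, 3.2])

# Labels run from the lowest-scoring bin to the highest-scoring bin.
labels = ["Very Low", "Low", "Average", "High", "Very High"]

# qcut assigns each score to one of five equally populated bins (quintiles).
levels = pd.qcut(scores, q=5, labels=labels)
print(levels.value_counts())
```

Whether the course data was binned exactly this way is an assumption; the sketch only illustrates the "Top 20% / Bottom 20%" idea.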




# Mini-Hackathon In Class Tasks



1.   Build, save, and submit at least one Keras model.
2.   Build, save, and submit at least one Scikit-learn model.
3.   Seek advice through collaboration via Github:

*      Save notebook w/ best model to private repo
*      Invite a collaborator
*      Collaborator should submit at least two issues w/ suggestions for model improvement

4.   If time, improve model further!











# Import the data




In [1]:
! pip install scikit-learn --upgrade # load newest version of sklearn

Requirement already up-to-date: scikit-learn in /usr/local/lib/python3.6/dist-packages (0.22.1)


In [32]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data=pd.read_csv("worldhappiness2019.csv")

data.head()

Unnamed: 0,Happiness_level,Country or region,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
0,Very High,Finland,1.34,1.587,0.986,0.596,0.153,0.393
1,Very High,Denmark,1.383,1.573,0.996,0.592,0.252,0.41
2,Very High,Norway,1.488,1.582,1.028,0.603,0.271,0.341
3,Very High,Iceland,1.38,1.624,1.026,0.591,0.354,0.118
4,Very High,Netherlands,1.396,1.522,0.999,0.557,0.322,0.298


# Build a model to predict happiness rankings

In [34]:
# Set up training and test data
from sklearn.model_selection import train_test_split

y=data['Happiness_level']
X=data.drop(['Happiness_level'],axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print(X_train.shape)
print(y_train.shape)
print(X_train.columns.tolist())

(117, 7)
(117,)
['Country or region', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
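With so few rows per class, a purely random split can leave the classes unevenly represented in the test set. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses a synthetic stand-in data frame (`toy`, with a made-up `feature` column) rather than the real data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the happiness data: 50 rows, 10 per class.
classes = ["Very Low", "Low", "Average", "High", "Very High"]
toy = pd.DataFrame({
    "feature": range(50),
    "Happiness_level": [classes[i % 5] for i in range(50)],
})

X_toy = toy.drop(columns=["Happiness_level"])
y_toy = toy["Happiness_level"]

# stratify=y_toy keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42)

print(y_te.value_counts())
```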


## Preprocess data using Column Transformer and save fit preprocessor to ".pkl" file

In [0]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# We create the preprocessing pipelines for both numeric and categorical data.

numeric_features=X.columns.tolist()
numeric_features.remove('Country or region')

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['Country or region']

# Replace missing values with the modal value, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# final preprocessor object set up with ColumnTransformer

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])


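# Fit the preprocessor object on the training data only
prediction_input_preprocessor=preprocessor.fit(X_train) 

# Save the fitted preprocessor so the same transformation can be reused at prediction time
import pickle
with open("preprocessor.pkl", "wb") as f:
    pickle.dump(prediction_input_preprocessor, f)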
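A quick way to confirm that a pickled preprocessor behaves the same after reloading is a round-trip check. The sketch below uses a small `StandardScaler` and a hypothetical file name (`toy_scaler.pkl`) instead of the full column transformer:

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a small scaler as a stand-in for the fitted preprocessor.
values = np.array([[1.0], [2.0], [3.0]])
scaler = StandardScaler().fit(values)

# Save the fitted object, then load it back from disk.
with open("toy_scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)
with open("toy_scaler.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored object should transform data exactly like the original.
same = np.allclose(scaler.transform(values), restored.transform(values))
print(same)
```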

In [36]:
# Check shape for keras input:
prediction_input_preprocessor.transform(X_train).shape # pretty small dataset

(117, 123)

In [37]:
# Check shape for keras output:
pd.get_dummies(y_train).shape

(117, 5)
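The 123 input columns are presumably the 6 scaled numeric features plus 117 one-hot columns, one for each country in the 117-row training split. Because `handle_unknown='ignore'` is set, a country that was not seen during fitting is encoded as all zeros. A minimal sketch with three made-up training rows:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Three training countries and one country the encoder has never seen.
train = pd.DataFrame({"Country or region": ["Finland", "Denmark", "Norway"]})
unseen = pd.DataFrame({"Country or region": ["Iceland"]})

encoder = OneHotEncoder(handle_unknown="ignore").fit(train)

encoded_train = encoder.transform(train).toarray()
encoded_unseen = encoder.transform(unseen).toarray()

print(encoded_train.shape)  # one column per training country
print(encoded_unseen)       # all zeros for the unseen country
```

If every test-set country is absent from the training split, the one-hot columns carry no information at prediction time, so the model has to rely on the six numeric features.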

## Fit a neural network with Keras

In [43]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
import keras
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(128, input_dim=123, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))

model.add(Dense(5, activation='softmax'))
                                            
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Fitting the NN to the Training set
model.fit(prediction_input_preprocessor.transform(X_train), pd.get_dummies(y_train), 
               batch_size = 31, 
               epochs = 300, validation_split=0.2)



Train on 93 samples, validate on 24 samples
Epoch 1/300
...
Epoch 74/300
[remaining training log truncated]

<keras.callbacks.History at 0x7f41fe88c390>

## An important aside for production-ready Keras models
*Keras classification model objects return the predicted probabilities of each class for every prediction.  How do we return a target label instead?*

In [44]:
# Use predict_classes() on multi-class data to return the predicted class index for each row.
prediction_index=model.predict_classes(prediction_input_preprocessor.transform(X_test))
print(prediction_index)

# Now let's map each predicted index back to its label.

# Get labels from the one-hot encoded y_train columns
labels=pd.get_dummies(y_train).columns

# Helper that returns the label at a given column index
def index_to_label(labels,index_n): 
    return labels[index_n]
    
# Example: return the label at predicted index location 1
index_to_label(labels,1)

# Map every predicted index to its label
predicted_labels=[index_to_label(labels, i) for i in prediction_index]
print(predicted_labels)

[1 0 0 1 4 1 0 0 1 2 1 0 3 4 1 1 4 4 4 0 4 3 1 3 1 2 1 1 1 4 0 2 1 4 2 4 4
 4 1]
['High', 'Average', 'Average', 'High', 'Very Low', 'High', 'Average', 'Average', 'High', 'Low', 'High', 'Average', 'Very High', 'Very Low', 'High', 'High', 'Very Low', 'Very Low', 'Very Low', 'Average', 'Very Low', 'Very High', 'High', 'Very High', 'High', 'Low', 'High', 'High', 'High', 'Very Low', 'Average', 'Low', 'High', 'Very Low', 'Low', 'Very Low', 'Very Low', 'Very Low', 'High']
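In more recent Keras releases `predict_classes()` is no longer available (it was removed in TensorFlow 2.6). The same index can be recovered by taking the `argmax` of the predicted probabilities. The sketch below uses hypothetical probabilities in place of `model.predict(...)`; the label order matches the alphabetical column order produced by `pd.get_dummies(y_train)`:

```python
import numpy as np

# Same order as the get_dummies columns: index 0 is "Average", index 4 is "Very Low".
labels = ["Average", "High", "Low", "Very High", "Very Low"]

# Hypothetical softmax outputs for three observations (each row sums to 1).
probabilities = np.array([
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.10, 0.70],
])

# argmax over the class axis gives the index of the most probable class per row.
predicted_index = np.argmax(probabilities, axis=1)
predicted = [labels[i] for i in predicted_index]

print(predicted_index)
print(predicted)
```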


# Evaluate Keras Model using model_eval_metrics(), assign result to modelevalobject
We will use this object for the leaderboard submission in a bit.

In [45]:
# Now we can extract some evaluative metrics to use for model submission

import numpy as np
import pandas as pd
from math import sqrt
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def model_eval_metrics(y_true, y_pred, classification="TRUE"):
    # Every metric starts at 0; only the metrics that apply to the task are filled in.
    metricdata = {'accuracy': [0], 'f1_score': [0], 'precision': [0], 'recall': [0], 'mse': [0], 'rmse': [0], 'mae': [0], 'r2': [0]}
    if classification == "TRUE":
        # Classification metrics; macro averaging weights every class equally.
        metricdata['accuracy'] = [accuracy_score(y_true, y_pred)]
        metricdata['f1_score'] = [f1_score(y_true, y_pred, average="macro", zero_division=0)]
        metricdata['precision'] = [precision_score(y_true, y_pred, average="macro", zero_division=0)]
        metricdata['recall'] = [recall_score(y_true, y_pred, average="macro", zero_division=0)]
    else:
        # Regression metrics
        mse_eval = mean_squared_error(y_true, y_pred)
        metricdata['mse'] = [mse_eval]
        metricdata['rmse'] = [sqrt(mse_eval)]
        metricdata['mae'] = [mean_absolute_error(y_true, y_pred)]
        metricdata['r2'] = [r2_score(y_true, y_pred)]
    return pd.DataFrame.from_dict(metricdata)

model_eval_metrics( y_test,predicted_labels,classification="TRUE")


Unnamed: 0,accuracy,f1_score,precision,recall,mse,rmse,mae,r2
0,0.487179,0.482559,0.576623,0.511111,0,0,0,0


In [46]:
# add metrics to submittable object
modelevalobject=model_eval_metrics( y_test,predicted_labels,classification="TRUE")

modelevalobject


Unnamed: 0,accuracy,f1_score,precision,recall,mse,rmse,mae,r2
0,0.487179,0.482559,0.576623,0.511111,0,0,0,0
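The `f1_score`, `precision`, and `recall` columns above are macro averages: each score is computed per class and the per-class values are averaged with equal weight. A small sketch with made-up labels shows the relationship:

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up true and predicted labels for six observations.
y_true_toy = ["High", "High", "Low", "Low", "Average", "Average"]
y_pred_toy = ["High", "Low", "Low", "Low", "Average", "High"]

# average=None returns one F1 score per class (classes sorted alphabetically).
per_class = f1_score(y_true_toy, y_pred_toy, average=None, zero_division=0)

# average="macro" is the unweighted mean of those per-class scores.
macro = f1_score(y_true_toy, y_pred_toy, average="macro", zero_division=0)

print(per_class)
print(macro)
```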


# Save Keras model to an ONNX file. We will use this file to make predictions within a production-ready, scalable REST API.

In [47]:
# Load libraries for onnx model conversion (keras to onnx)
! pip3 install keras2onnx
! pip3 install onnxruntime



In [0]:
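# Convert the Keras model object to ONNX, then save it to a .onnx file
import os

# The conversion is skipped when the file already exists; delete mymodel.onnx to re-export a retrained model.
if not os.path.exists('mymodel.onnx'):
    from keras2onnx import convert_keras
    onx = convert_keras(model, 'mymodel.onnx')  # the second argument is the name given to the ONNX graph
    with open("mymodel.onnx", "wb") as f:
        f.write(onx.SerializeToString())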

## Aside: Example of code similar to what is run behind the scenes within our REST api:

In [49]:
# With ONNX you can make predictions in the following manner. This is what happens behind the scenes in our live web application.
# The JSON input data is sent to a REST API, transformed to a pandas DataFrame, and preprocessed; predictions are then generated from our ONNX model.

import onnxruntime as rt
sess= rt.InferenceSession("mymodel.onnx")
input_name = sess.get_inputs()[0].name
bodydict={'Country or region': 'United States', 'GDP per capita': [1], 'Social support': [1], 'Healthy life expectancy': [1], 'Freedom to make life choices': [1], 'Generosity': [1], 'Perceptions of corruption': [1]}
bodynew = pd.DataFrame.from_dict(bodydict)

input_data=preprocessor.transform(bodynew).astype("float32").toarray()
input_data

array([[ 0.21686903, -0.6897203 ,  1.0740974 ,  4.0057316 ,  8.259227  ,
         8.666142  ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0. 

In [50]:
# Here is the resulting predicted probability for each of the five categories of our target variable
res = sess.run(None,  {input_name: input_data})
res[0]

array([[3.4816905e-03, 8.3222622e-03, 1.9751673e-04, 9.8796755e-01,
        3.1087362e-05]], dtype=float32)
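The request flow described above (JSON in, data frame, preprocessing, model, label out) can be sketched as a single function. This is only an illustration: `predict_from_json`, `fake_transform`, and `fake_predict_proba` are hypothetical names, the stand-ins replace the fitted preprocessor and the ONNX session, and the `{"data": ...}` envelope is assumed from the curl example at the end of this notebook.

```python
import json

import numpy as np
import pandas as pd

LABELS = ["Average", "High", "Low", "Very High", "Very Low"]

def predict_from_json(body, transform, predict_proba, labels=LABELS):
    """Parse a JSON request body, preprocess it, and return predicted labels."""
    payload = json.loads(body)["data"]
    frame = pd.DataFrame.from_dict(payload)
    features = np.asarray(transform(frame), dtype="float32")
    probabilities = predict_proba(features)
    return [labels[i] for i in np.argmax(probabilities, axis=1)]

# Stand-ins so the sketch runs without preprocessor.pkl or mymodel.onnx.
def fake_transform(frame):
    return frame.drop(columns=["Country or region"]).to_numpy()

def fake_predict_proba(features):
    # Probabilities copied from the ONNX output shown above.
    return np.array([[3.4816905e-03, 8.3222622e-03, 1.9751673e-04,
                      9.8796755e-01, 3.1087362e-05]])

body = json.dumps({"data": {
    "Country or region": "United States", "GDP per capita": [1],
    "Social support": [1], "Healthy life expectancy": [1],
    "Freedom to make life choices": [1], "Generosity": [1],
    "Perceptions of corruption": [1]}})

result = predict_from_json(body, fake_transform, fake_predict_proba)
print(result)
```

In the notebook itself, `transform` would correspond to the fitted preprocessor's `transform` method (followed by `.toarray()`), and `predict_proba` to `sess.run(None, {input_name: input_data})[0]`.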

# Submit model to live REST API @ [World Happiness Prediction Model Detail Page](http://mlsitetest.com.s3-website-us-east-1.amazonaws.com/detail/World%20Happiness%20Prediction%20Model/c6f76d3649fe11ea9b520242ac1c0002)

In [51]:
#install aimodelshare library
! pip3 install https://github.com/mikedparrott/aimodelshare/blob/master/aimodelshare-0.0.2.tar.gz?raw=true

Collecting https://github.com/mikedparrott/aimodelshare/blob/master/aimodelshare-0.0.2.tar.gz?raw=true
  Using cached https://github.com/mikedparrott/aimodelshare/blob/master/aimodelshare-0.0.2.tar.gz?raw=true
Building wheels for collected packages: aimodelshare
  Building wheel for aimodelshare (setup.py) ... done
  Created wheel for aimodelshare: filename=aimodelshare-0.0.2-cp36-none-any.whl size=5375 sha256=59b1a1cd929daf854b1051c10fedbf468ab2c49af29b22bbafc0a0b5341bd137
  Stored in directory: /root/.cache/pip/wheels/31/8d/ac/09cb6ef7374ec79e02843c347195e5478144006b11def6799a
Successfully built aimodelshare


### To submit a model you need to sign up for username and password at:
[AI Model Share Initiative Site](http://mlsitetest.com.s3-website-us-east-1.amazonaws.com/login)

# Set up necessary arguments for model submission using aimodelshare python library.

## Required information for tabular models:
* api_url (the API URL for whichever aimodelshare project you are submitting a model to)
* AWS key and password (provided for you)
* model file path
* preprocessor file path
* training data (a pandas data frame such as X_train)
* model evaluation object (we created this using the model_eval_metrics() function above)



In [0]:
import pickle

# Loading AWS keys necessary to submit model.  Loading to object, so we don't print them out in our notebook

aws_key_password_region = pickle.load( open( "worldhappiness_modelsubmission_keys.pkl", "rb" ) )


In [0]:
# Example Model Pre-launched into Model Share Site
apiurl="https://btuvanmi55.execute-api.us-east-1.amazonaws.com/prod/m"
username = "seanmcalevey"
password = "George14"

region='us-east-1'
model_filepath="mymodel.onnx"   
preprocessor_filepath="preprocessor.pkl"
preprocessor="TRUE"

trainingdata=X_train

# Set aws keys for this project (these keys give you access to collaborate on a single project)

#Importing from object that stores keys so we do not print out keys for others to see.

aws_key_password_region = pickle.load( open( "worldhappiness_modelsubmission_keys.pkl", "rb" ) )

aws_key=aws_key_password_region[0]
aws_password=aws_key_password_region[1]
region=aws_key_password_region[2]
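The AWS keys are loaded from a file so that they never appear in the notebook. The same idea can be applied to the site username and password. The sketch below reads them from environment variables; the variable names `AIMODELSHARE_USERNAME` and `AIMODELSHARE_PASSWORD` and the helper `read_credentials` are hypothetical and not part of the aimodelshare library:

```python
import os

def read_credentials(env=os.environ):
    """Return (username, password) from environment variables, or raise if one is missing."""
    try:
        return env["AIMODELSHARE_USERNAME"], env["AIMODELSHARE_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Set {missing.args[0]} before submitting a model") from None

# Demonstration with an in-memory mapping instead of the real environment.
demo_env = {
    "AIMODELSHARE_USERNAME": "example_user",
    "AIMODELSHARE_PASSWORD": "example_password",
}
demo_username, demo_password = read_credentials(demo_env)
print(demo_username)
```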

In [54]:
# Submit your model using submit_model() function
# Works with models and preprocessors. 
import aimodelshare as ai

ai.submit_model(model_filepath=model_filepath, model_eval_metrics=modelevalobject,apiurl=apiurl, username=username, password=password, aws_key=aws_key,aws_password=aws_password, region=region, trainingdata=trainingdata,preprocessor_filepath=preprocessor_filepath,preprocessor=preprocessor)

"mymodel.onnx" has been loaded to version 49 of your prediction API.
This version of the model will be used by your prediction api for all future predictions automatically.
If you wish to use an older version of the model, please reference the getting started guide at aimodelshare.com.


# Now you can check the leaderboard!

In [0]:
# arguments required to get leaderboard below
apiurl="https://btuvanmi55.execute-api.us-east-1.amazonaws.com/prod/m"
username = "seanmcalevey"
password = "George14"

In [56]:
import aimodelshare as ai

leaderboard = ai.get_leaderboard(apiurl, username, password, aws_key, aws_password, region)

LEADERBOARD RANKINGS:


Unnamed: 0,accuracy,f1_score,precision,recall,mse,rmse,mae,r2,username,model_version,avg_ranking_classification,avg_ranking_regression
3,0.512821,0.50806,0.625909,0.544444,0,0,0,0,Yihui_Wang,19,1.0,1.0
25,0.512821,0.504464,0.604242,0.544444,0,0,0,0,Paarth_Malkan,37,2.0,1.0
30,0.487179,0.482925,0.605195,0.511111,0,0,0,0,bavilaa,21,2.333333,1.0
16,0.487179,0.482925,0.605195,0.511111,0,0,0,0,bavilaa,20,2.333333,1.0
0,0.487179,0.482559,0.576623,0.511111,0,0,0,0,jaeham,22,3.333333,1.0
45,0.487179,0.482559,0.576623,0.511111,0,0,0,0,Taketo,26,3.333333,1.0
44,0.487179,0.482559,0.576623,0.511111,0,0,0,0,Taketo,43,3.333333,1.0
34,0.487179,0.482559,0.576623,0.511111,0,0,0,0,zivzach,15,3.333333,1.0
26,0.487179,0.482559,0.576623,0.511111,0,0,0,0,AlisaAi,24,3.333333,1.0
17,0.487179,0.482559,0.576623,0.511111,0,0,0,0,yaowang126,46,3.333333,1.0


In [24]:
# Build, save, and submit a sklearn model

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Create a random forest classifier
model=RandomForestClassifier(n_estimators=1000, random_state = 0)
# Train the model on the preprocessed training set, then predict on the test set
model.fit(prediction_input_preprocessor.transform(X_train), y_train)
y_pred=model.predict(prediction_input_preprocessor.transform(X_test))

# Compare 10-fold cross-validation accuracy on the training data with accuracy on the held-out test data
print("Random Forest Classifier's cross validation accuracy:", np.mean(cross_val_score(model, prediction_input_preprocessor.transform(X_train), y_train, cv=10)))
print("Random Forest Classifier's Test-Data prediction accuracy: {:.5f}".format(model.score(prediction_input_preprocessor.transform(X_test), y_test)))


Random Forest Classifier's cross validation accuracy: 0.5909090909090908
Random Forest Classifier's Test-Data prediction accuracy: 0.38462
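The cross-validation call above scores features that were transformed by a preprocessor fit on all of `X_train`, so each fold's validation rows have already influenced the imputation, scaling, and encoding. One alternative is to wrap the preprocessing and the model in a single `Pipeline`, which `cross_val_score` refits on every fold. The sketch below uses synthetic data and only two numeric columns, so its numbers say nothing about the happiness data:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: the label depends only on the first feature.
rng = np.random.RandomState(0)
X_toy = pd.DataFrame({"GDP per capita": rng.rand(60), "Social support": rng.rand(60)})
y_toy = np.where(X_toy["GDP per capita"] > 0.5, "High", "Low")

numeric = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())])
preprocess = ColumnTransformer(transformers=[
    ("num", numeric, ["GDP per capita", "Social support"])])

# Preprocessing and model are fit together, separately for each training fold.
clf = Pipeline(steps=[
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0))])

scores = cross_val_score(clf, X_toy, y_toy, cv=5)
print(scores.mean())
```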


In [0]:
# Save sklearn model to pkl file
import pickle
with open("rff_model.pkl", "wb") as f:
    pickle.dump(model, f)

In [26]:
# To submit a sklearn model, update the model evaluation metrics object, then change the file paths for your sklearn model and (if new) your preprocessor file.
# Add metrics to the submittable object
modelevalobject=model_eval_metrics(y_test,y_pred,classification="TRUE")
modelevalobject

Unnamed: 0,accuracy,f1_score,precision,recall,mse,rmse,mae,r2
0,0.384615,0.397653,0.482937,0.383333,0,0,0,0


In [0]:
# Example Model Pre-launched into Model Share Site
apiurl="https://btuvanmi55.execute-api.us-east-1.amazonaws.com/prod/m"
username = "seanmcalevey"
password = "George14"

region='us-east-1'
model_filepath="rff_model.pkl"   
preprocessor_filepath="preprocessor.pkl"
preprocessor="TRUE"

trainingdata=X_train

# Set aws keys for this project (these keys give you access to collaborate on a single project)

#Importing from object that stores keys so we do not print out keys for others to see.
aws_key_password_region = pickle.load( open( "worldhappiness_modelsubmission_keys.pkl", "rb" ) )

aws_key=aws_key_password_region[0]
aws_password=aws_key_password_region[1]
region=aws_key_password_region[2]

In [29]:
# Submit new model
import aimodelshare as ai

ai.submit_model(model_filepath=model_filepath, model_eval_metrics=modelevalobject,apiurl=apiurl, username=username, password=password, aws_key=aws_key,aws_password=aws_password, region=region, trainingdata=trainingdata,preprocessor_filepath=preprocessor_filepath,preprocessor=preprocessor)

"rff_model.pkl" has been loaded to version 29 of your prediction API.
This version of the model will be used by your prediction api for all future predictions automatically.
If you wish to use an older version of the model, please reference the getting started guide at aimodelshare.com.


In [30]:
# Check leaderboard
import aimodelshare as ai

leaderboard = ai.get_leaderboard(apiurl, username, password, aws_key, aws_password, region)

LEADERBOARD RANKINGS:


Unnamed: 0,accuracy,f1_score,precision,recall,mse,rmse,mae,r2,username,model_version,avg_ranking_classification,avg_ranking_regression
2,0.512821,0.50806,0.625909,0.544444,0,0,0,0,Yihui_Wang,19,1.0,1.0
18,0.487179,0.482925,0.605195,0.511111,0,0,0,0,bavilaa,21,2.0,1.0
10,0.487179,0.482925,0.605195,0.511111,0,0,0,0,bavilaa,20,2.0,1.0
22,0.487179,0.482559,0.576623,0.511111,0,0,0,0,zivzach,15,2.666667,1.0
15,0.487179,0.482559,0.576623,0.511111,0,0,0,0,AlisaAi,24,2.666667,1.0
28,0.487179,0.482559,0.576623,0.511111,0,0,0,0,Taketo,26,2.666667,1.0
0,0.487179,0.482559,0.576623,0.511111,0,0,0,0,jaeham,22,2.666667,1.0
5,0.461538,0.456259,0.576623,0.486111,0,0,0,0,Nayyer-Qureshi,13,4.333333,1.0
24,0.461538,0.456845,0.557576,0.486111,0,0,0,0,SUN-Wenjun,14,4.666667,1.0
12,0.461538,0.446907,0.561497,0.494444,0,0,0,0,XU,16,5.0,1.0


In [31]:
! # Live REST API example for tabular data!
! curl -X POST -H "Content-Type: application/json" -d '{"data":{"Country or region": "Mexico", "GDP per capita": [-10000],"Social support": [1],"Healthy life expectancy": [1],"Freedom to make life choices": [-1000],"Generosity": [1], "Perceptions of corruption": [-1000]}}' "https://btuvanmi55.execute-api.us-east-1.amazonaws.com/prod/m"

["Very Low"]
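The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl command above; it constructs the request without sending it, and the `send` helper is a hypothetical convenience that needs network access and a live endpoint:

```python
import json
import urllib.request

API_URL = "https://btuvanmi55.execute-api.us-east-1.amazonaws.com/prod/m"

# Same payload as the curl example above.
payload = {"data": {
    "Country or region": "Mexico", "GDP per capita": [-10000],
    "Social support": [1], "Healthy life expectancy": [1],
    "Freedom to make life choices": [-1000], "Generosity": [1],
    "Perceptions of corruption": [-1000]}}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def send(req, timeout=10):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# print(send(request))  # uncomment to call the live endpoint
```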