# Analyze Results

Overall, the model performs very well, with an RMSE of only 2.0291 years on the test set. However, it is still worth exploring the results to identify the model's strengths and weaknesses.

In [1]:
# global imports
import numpy as np
import pandas as pd
# visualizations 
import altair as alt
from scipy.stats import gaussian_kde

# suppress scientific notation (display floats with two decimals)
pd.options.display.float_format = '{:.2f}'.format
# show all columns
pd.set_option('display.max_columns', None)

## Import Data

In [2]:
%store -r dfs
# obtain test data
test = dfs[1]

In [3]:
%store -r model_lst
# obtain predictions
predictions = model_lst[0]
# obtain model
gb_model = model_lst[1]

In [4]:
%store -r scaled_dfs
# obtain X_train (only its column names are used below)
X_train = scaled_dfs[0]

## Feature Importance

In [5]:
# obtain variables and feature importance
variable = X_train.columns
importance = gb_model.feature_importances_
# zip together
zipped = zip(importance,variable)
feat_importances = sorted(zipped)
# output feature importance
for feature in feat_importances:
    print(feature[1],': ',feature[0])

Region_North America :  0.00012508980378222597
Region_South Asia :  0.00027573743874612615
Region_Middle East & North Africa :  0.0008947438950863178
Region_Latin America & Caribbean :  0.0033000095351710403
Region_Europe & Central Asia :  0.003783923497691004
Measles :  0.007420597258653234
Population :  0.008095750474849941
HepatitisB :  0.009580413305215276
PercentExpenditure :  0.011181779649114433
Polio :  0.014066273058858604
TotalExpenditure :  0.016809585401358
Alcohol :  0.017463880081336997
Diphtheria :  0.018286575567201344
Status_Developing :  0.01965937843717777
GDP :  0.03242893614750072
BMI :  0.034061298645379705
Thinness1_19 :  0.03521957028484361
Schooling :  0.08706208951026029
Region_Sub-Saharan Africa :  0.12677581364823926
HIV/AIDS :  0.23343819171471392
IncomeComposition :  0.32007036264482014


In [6]:
# create data frame
feature_df = pd.DataFrame({'Variable':variable,'Importance':importance})
# create bar chart
alt.Chart(feature_df,title='Bar Chart of Feature Importance').mark_bar().encode(
    x=alt.X('Variable:N',sort='y'),
    y='Importance:Q'
).properties(
    width=500,
    height=300
).display()

Income composition of resources (importance 0.32), HIV/AIDS (0.23), whether an observation is in Sub-Saharan Africa (0.13), and schooling (0.09) are the most important features for predicting life expectancy.
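To see how concentrated the importance is, the values can be ranked and accumulated into a running share of the total. This is a minimal sketch that hard-codes the printed importances (rounded to four decimals) instead of reading `gb_model.feature_importances_`; the helper `cumulative_importance` is hypothetical.

```python
import pandas as pd

def cumulative_importance(importances: dict) -> pd.DataFrame:
    """Rank features by importance and add each feature's running share of the total."""
    s = pd.Series(importances, name='Importance').sort_values(ascending=False)
    # normalize in case the (rounded) importances do not sum exactly to 1
    share = s / s.sum()
    return pd.DataFrame({'Importance': s, 'CumulativeShare': share.cumsum()})

# values copied (rounded) from the printed feature importances above
importances = {
    'IncomeComposition': 0.3201, 'HIV/AIDS': 0.2334, 'Region_Sub-Saharan Africa': 0.1268,
    'Schooling': 0.0871, 'Thinness1_19': 0.0352, 'BMI': 0.0341, 'GDP': 0.0324,
    'Status_Developing': 0.0197, 'Diphtheria': 0.0183, 'Alcohol': 0.0175,
    'TotalExpenditure': 0.0168, 'Polio': 0.0141, 'PercentExpenditure': 0.0112,
    'HepatitisB': 0.0096, 'Population': 0.0081, 'Measles': 0.0074,
    'Region_Europe & Central Asia': 0.0038, 'Region_Latin America & Caribbean': 0.0033,
    'Region_Middle East & North Africa': 0.0009, 'Region_South Asia': 0.0003,
    'Region_North America': 0.0001,
}
ranked = cumulative_importance(importances)
print(ranked.head(4))
```

With these values, the four features named above account for roughly 77% of the total importance.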

## Absolute Difference Between Predicted and Observed

I explored the results to determine which types of observations the model predicts particularly well or poorly.

In [7]:
# add predictions to test data
test['Predictions'] = predictions
# get the residual (observed - predicted) and its absolute value
test['Difference'] = test['LifeExpectancy']-test['Predictions']
test['AbsDiff'] = test['Difference'].abs()
# drop extra columns
test.drop(columns=['Year','Thinness5_9'],inplace=True)
# output data frame
test.head()

| | Status | LifeExpectancy | Alcohol | PercentExpenditure | HepatitisB | Measles | BMI | Polio | TotalExpenditure | Diphtheria | HIV/AIDS | GDP | Population | Thinness1_19 | IncomeComposition | Schooling | Region | Predictions | Difference | AbsDiff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008 | Developing | 74.9 | 5.14 | 885.99 | 95.0 | 0 | 53.6 | 94.0 | 5.18 | 95.0 | 0.1 | 6387.79 | 3158966.0 | 1.1 | 0.72 | 13.4 | Latin America & Caribbean | 75.23 | -0.33 | 0.33 |
| 825 | Developing | 71.2 | 2.83 | 52.3 | 99.0 | 0 | 5.8 | 99.0 | 6.32 | 99.0 | 0.3 | 334.84 | 683475.0 | 1.8 | 0.66 | 12.9 | Latin America & Caribbean | 74.11 | -2.91 | 2.91 |
| 605 | Developing | 59.5 | 0.08 | 29.42 | 54.5 | 0 | 18.0 | 98.0 | 3.39 | 89.0 | 0.1 | 433.27 | 569479.0 | 7.7 | 0.0 | 8.8 | Sub-Saharan Africa | 61.55 | -2.05 | 2.05 |
| 2844 | Developing | 71.2 | 0.85 | 457.97 | 63.0 | 0 | 49.9 | 65.0 | 3.85 | 65.0 | 0.1 | 3275.92 | 241871.0 | 1.5 | 0.59 | 10.8 | East Asia & Pacific | 72.16 | -0.96 | 0.96 |
| 1861 | Developing | 74.5 | 3.55 | 473.12 | 98.0 | 0 | 53.2 | 99.0 | 9.4 | 98.0 | 0.1 | 1975.46 | 613997.0 | 1.8 | 0.64 | 11.6 | Latin America & Caribbean | 74.11 | 0.39 | 0.39 |
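The RMSE quoted at the top of this notebook is the square root of the mean squared `Difference`. A minimal sketch of that formula, using the observed and predicted values from the five rows above as toy data; the helper `rmse` is hypothetical and not part of the original pipeline.

```python
import numpy as np

def rmse(observed, predicted) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))

# toy data: LifeExpectancy and Predictions from the five rows of test.head() above
observed = [74.9, 71.2, 59.5, 71.2, 74.5]
predicted = [75.23, 74.11, 61.55, 72.16, 74.11]
print(round(rmse(observed, predicted), 4))
```

Assuming `predictions` are the same values used for the reported metric, `rmse(test['LifeExpectancy'], test['Predictions'])` should reproduce the 2.0291 figure on the full test set.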


### Categorical Variables

In [8]:
# function for visualizing the absolute difference for a categorical variable
def abs_diff_violin(var, width):
    """Display one violin (density of AbsDiff) per level of the categorical column `var`."""
    alt.Chart(test, title='Violin Plot of Absolute Difference').transform_density(
        'AbsDiff',
        as_=['AbsDiff', 'density'],
        groupby=[var]
    ).mark_area(orient='horizontal').encode(
        y=alt.Y('AbsDiff:Q', title='Absolute Difference'),
        color=f'{var}:N',
        x=alt.X(
            'density:Q',
            stack='center',
            impute=None,
            title=None,
            axis=alt.Axis(labels=False, values=[0], grid=False, ticks=True)
        ),
        column=alt.Column(
            f'{var}:N',
            header=alt.Header(
                labelOrient='bottom',
                labelAngle=-90,
                labelPadding=0,
                labelAlign='right')
        )
    ).properties(
        width=width
    ).configure_facet(
        spacing=0
    ).configure_view(
        stroke=None
    ).display()

In [9]:
# create violin plot for status
abs_diff_violin('Status',300)

The distribution of absolute differences is similar for developed and developing countries (both values of `Status`).
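The violin plots can be complemented by a numeric summary per group. This is a minimal sketch, assuming `test` has the `AbsDiff` column created above; the helper `abs_diff_summary` is hypothetical. Because all five toy rows (taken from `test.head()` above) are developing countries, the demo groups by `Region` instead of `Status`.

```python
import pandas as pd

def abs_diff_summary(df: pd.DataFrame, var: str) -> pd.DataFrame:
    """Count, mean, and median absolute difference for each level of a categorical column."""
    return (df.groupby(var)['AbsDiff']
              .agg(['count', 'mean', 'median'])
              .sort_values('median'))

# toy data: Region and AbsDiff from the five rows of test.head() above
toy = pd.DataFrame({
    'Region': ['Latin America & Caribbean', 'Latin America & Caribbean',
               'Sub-Saharan Africa', 'East Asia & Pacific', 'Latin America & Caribbean'],
    'AbsDiff': [0.33, 2.91, 2.05, 0.96, 0.39],
})
print(abs_diff_summary(toy, 'Region'))
```

In the notebook, `abs_diff_summary(test, 'Status')` or `abs_diff_summary(test, 'Region')` would give a tabular counterpart to each violin plot, including group sizes, which the plots do not show.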

In [10]:
# create violin plot for region
abs_diff_violin('Region',100)

The model predicts most accurately for North America and South Asia, and least accurately for Sub-Saharan Africa, Middle East & North Africa, and Europe & Central Asia.

The regions with the worst performance contain countries with a wide range of life expectancies, so it is plausible that the model would struggle with these areas.
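One way to check this explanation is to place a measure of each region's spread in life expectancy next to its mean absolute difference. This is a minimal sketch, assuming `test` contains `Region`, `LifeExpectancy`, and `AbsDiff`; the helper `spread_vs_error` is hypothetical, and the toy regions `A` and `B` use made-up numbers, not results from this model.

```python
import pandas as pd

def spread_vs_error(df: pd.DataFrame, var: str) -> pd.DataFrame:
    """Per group: spread of observed life expectancy (std, IQR) next to mean absolute difference."""
    grouped = df.groupby(var)
    life = grouped['LifeExpectancy']
    out = pd.DataFrame({
        'LifeExpStd': life.std(),
        'LifeExpIQR': life.quantile(0.75) - life.quantile(0.25),
        'MeanAbsDiff': grouped['AbsDiff'].mean(),
    })
    # groups with the largest errors first
    return out.sort_values('MeanAbsDiff', ascending=False)

# hypothetical toy data: region A is widely spread, region B is tightly clustered
toy = pd.DataFrame({
    'Region': ['A'] * 4 + ['B'] * 4,
    'LifeExpectancy': [50, 60, 70, 80, 75, 76, 77, 78],
    'AbsDiff': [3, 2, 4, 3, 0.5, 1, 0.5, 1],
})
print(spread_vs_error(toy, 'Region'))
```

Running `spread_vs_error(test, 'Region')` would show whether the regions with the largest `MeanAbsDiff` also have the largest `LifeExpStd` and `LifeExpIQR`, as the explanation above suggests.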

### Numeric Variables

In [11]:
# numeric predictors, excluding the response, prediction, and error columns
all_columns = test.select_dtypes(include=np.number).columns.tolist()
response_columns = ['LifeExpectancy','Predictions','Difference','AbsDiff']
columns = [col for col in all_columns if col not in response_columns]
# one bar chart per predictor: mean absolute difference within each bin
lst_of_charts = []
for col in columns:
    chart = alt.Chart(test, title='Mean AbsDiff by '+str(col)).mark_bar().encode(
        x=alt.X(col, bin=True),
        y=alt.Y('mean(AbsDiff):Q', title='Mean Absolute Difference')
    ).properties(
        width=165,
        height=165
    )
    lst_of_charts.append(chart)
# output charts in rows of four
alt.vconcat(*[alt.hconcat(*lst_of_charts[i:i+4])
              for i in range(0, len(lst_of_charts), 4)])

The model's error is similar across all levels of alcohol consumption and schooling, and it predicts very well for large values of BMI and thinness. It has more difficulty with middle values of health expenditure as a percentage of GDP, hepatitis B immunization coverage, polio immunization coverage, diphtheria immunization coverage, and HIV/AIDS.
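The same pattern can be read off as a table by binning a numeric column and averaging `AbsDiff` within each bin. This is a minimal sketch using `pd.cut` with equal-width bins, which will not line up exactly with Altair's "nice" bin boundaries from `bin=True`; the helper `binned_abs_diff`, the column `x`, and the toy values are hypothetical.

```python
import pandas as pd

def binned_abs_diff(df: pd.DataFrame, col: str, bins: int = 5) -> pd.DataFrame:
    """Count and mean absolute difference within equal-width bins of a numeric column."""
    binned = pd.cut(df[col], bins=bins)
    # observed=False keeps empty bins (count 0, mean NaN) so every bin is reported
    return df.groupby(binned, observed=False)['AbsDiff'].agg(['count', 'mean'])

# hypothetical toy data: errors are largest for middle values of x
toy = pd.DataFrame({
    'x': list(range(10)),
    'AbsDiff': [0.5, 0.5, 1, 3, 4, 4, 3, 1, 0.5, 0.5],
})
print(binned_abs_diff(toy, 'x', bins=5))
```

In the notebook, calling `binned_abs_diff(test, 'HIV/AIDS')` (or any other predictor in `columns`) would show whether the middle bins carry the largest mean error, along with how many observations fall in each bin.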