# WHO LIVES: Feature Engineering, Modelling and Testing Notebook
##### This notebook walks through the feature engineering, modelling and testing process.

## Importing Libraries and Data

In [106]:
# Importing necessary packages and modules
import numpy as np     
import pandas as pd
import seaborn as sns             
import matplotlib.pyplot as plt 
import joblib


# for train-test splitting
from sklearn.model_selection import train_test_split  # to perform our train-test split

# for modelling
import statsmodels.api as sm  # for the linear regression model
import statsmodels.tools      # for the evaluation of our model
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# to scale features
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

In [42]:
df = pd.read_csv('Life Expectancy Data.csv') # reading in the necessary dataset

## Splitting the data: Train-Test-Split

In [187]:
# Separating into necessary columns for X and y
feature_cols = list(df.columns)  # get ALL the columns (features)
feature_cols.remove('Life_expectancy')  # taking out the Life Expectancy as the target

In [189]:
# Split the dataset to create X, and y
X = df[feature_cols]   
y = df['Life_expectancy']

In [191]:
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X,   # features
                                                    y,   # target
                                                    test_size=0.2,  # 20% for testing
                                                    random_state=23,
                                                    stratify=X['Country'])
# Stratifying on 'Country' keeps each country's share of rows the same in train and test,
# so every country appears in both sets

## Checking values after splitting


In [193]:
# Testing that we have the correct number of observations across Train and Test
print(f'Same number of records in Train: {X_train.shape[0] == y_train.shape[0]}')
print(f'Same number of records in Test: {X_test.shape[0] == y_test.shape[0]}')

Same number of records in Train: True
Same number of records in Test: True


In [195]:
# checking number of unique countries in Train
X_train['Country'].nunique()

179

In [197]:
# Checking same in Test
X_test['Country'].nunique()

179
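
To illustrate what `stratify` does in the split above, here is a minimal sketch on a hypothetical toy frame (`toy`, with made-up countries `A`, `B` and `C`; none of these names come from the real dataset). It assumes the same `test_size` and `random_state` as the notebook and simply counts how many rows of each country land in each split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy panel: 3 made-up countries, 10 yearly rows each
toy = pd.DataFrame({
    "Country": ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
    "Year": list(range(2000, 2010)) * 3,
})

# Stratify on the country label, as in the notebook's split
toy_train, toy_test = train_test_split(
    toy, test_size=0.2, random_state=23, stratify=toy["Country"]
)

# Count rows per country in each split
train_counts = toy_train["Country"].value_counts().sort_index()
test_counts = toy_test["Country"].value_counts().sort_index()
print(train_counts.to_dict())
print(test_counts.to_dict())
```

Each country contributes the same share of its rows to the test set, which is why the number of unique countries matches between train and test.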

# Robust function
This function prepares the data for analysis or modeling by:
* Dropping unnecessary columns
* Creating dummy variables for categorical features
* Generating interaction terms
* Scaling numerical features
* Adding a constant column for statistical modeling


### **Code Explanation**:

- **Dropping Region**: The **Region** column was dropped because **Country** already provides the geographic information, so one-hot encoding (OHE) **Region** as well would add nothing.  
- **One-Hot Encoding (OHE)**: Applied OHE to the **Country** column to handle categorical data for countries.  
- **Interaction Terms**:  
  - Created an interaction term between **Hepatitis_B**, **Polio**, and **Diphtheria** due to high multicollinearity among these features.  
  - Added an interaction term for **Infant_deaths** and **Under_five_deaths** to combine these closely related features into a more representative variable.  
- **Feature Scaling**: Scaled numerical features to ensure they are on a comparable scale, which is essential for robust model performance.  
- **Adding a Constant**: Included a constant term for the regression model.
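
As a side note on the scaling step: `feature_eng` refits its scaler on whichever frame it receives, so the train and test sets are each scaled with their own mean and standard deviation. A common alternative, sketched here on hypothetical toy values (the `train` and `test` frames are made up for illustration and are not the notebook's data), is to fit the scaler on the training data only and reuse it for the test data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values standing in for one numeric feature
train = pd.DataFrame({"Adult_mortality": [100.0, 200.0, 300.0]})
test = pd.DataFrame({"Adult_mortality": [200.0, 400.0]})

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the train mean and std

print(scaler.mean_)         # mean learned from the training rows
print(test_scaled.ravel())  # test rows expressed in training units
```

With this pattern a test value equal to the training mean maps to zero, so the coefficients learned on the scaled training data keep the same meaning on the test data.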

### **Choice of Features**:
The following features were selected for the model:  
- **Year**  
- **Adult_mortality**  
- **Illnesses** (interaction of **Hepatitis_B**, **Polio**, and **Diphtheria**)  
- **Mortality_rates** (interaction of **Infant_deaths** and **Under_five_deaths**)  
- **Countries**: All countries were included using OHE, excluding one as a reference.
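
The effect of excluding one country as a reference can be seen in a minimal sketch. The toy frame below is hypothetical (three country names, four rows) and only mirrors the `pd.get_dummies` arguments used in `feature_eng`.

```python
import pandas as pd

# Hypothetical toy frame with three countries
toy = pd.DataFrame({"Country": ["Albania", "Algeria", "Angola", "Albania"]})

# Same arguments as in feature_eng: the first category is dropped
dummies = pd.get_dummies(
    toy, columns=["Country"], drop_first=True, prefix="Country", dtype=int
)

print(list(dummies.columns))
print(dummies)
```

The first category in sorted order gets no column of its own: its rows are all zeros, and its effect is absorbed by the constant term, so each country coefficient is read relative to that reference country.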

### **Results**:
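- **RMSE (Training)**: 0.51  
- **RMSE (Test)**: 0.56  
- **R-squared**: 99.7%  
- **Condition Number**: 295  

**Conclusion**:  
The high R-squared value and low RMSE indicate a very close fit on the training data, and the small gap between training and test RMSE (0.51 vs 0.56) shows that the fit holds on the held-out rows. However, the split is stratified by country and the model includes a dummy variable for each country, so the test set contains only countries the model has already seen. The near-perfect R-squared may therefore reflect **overfitting** to these specific countries, and the model may not generalize well to new data.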

In [150]:
def feature_eng(df):

    # Make a copy of the dataframe to avoid altering the original data
    df = df.copy()

    # Drop the 'Region' column as 'Country' provides sufficient geographic information for analysis
    df = df.drop('Region', axis=1)
    
    # One-Hot Encode (OHE) the 'Country' column
    # Each unique country becomes a new column, with binary values (0 or 1)
    # Setting 'drop_first=True' removes the first country to avoid the dummy variable trap.
    # The dropped country's presence can be inferred from the other country columns.   
    df = pd.get_dummies(df, columns=['Country'], drop_first=True, prefix='Country', dtype=int)
    
    # Interaction terms between specific columns
    # Create a combined effect of illness-related features
    df['illnesses'] = df['Hepatitis_B'] * df['Polio'] * df['Diphtheria']
    
    # Interaction term for mortality-related features
    df['mortality_rates'] = df['Infant_deaths'] * df['Under_five_deaths']
    
    # Scaling numerical features to standardize their ranges
    # Note: the scaler is refit on whichever dataframe is passed in,
    # so train and test are each scaled with their own mean and standard deviation
    ss = StandardScaler()
    
    # Scale the raw mortality columns using StandardScaler
    df[['Infant_deaths', 'Under_five_deaths', 'Adult_mortality']] = ss.fit_transform(df[['Infant_deaths', 'Under_five_deaths', 'Adult_mortality']])
    
    # Scale the interaction terms, also using StandardScaler
    df[['illnesses', 'mortality_rates']] = ss.fit_transform(df[['illnesses', 'mortality_rates']])
    
    # Scale 'Year' using StandardScaler 
    df[['Year']] = ss.fit_transform(df[['Year']])
    
    # Add a constant column to the dataframe
    df = sm.add_constant(df)
    
    return df

In [152]:
X_train_fe = feature_eng(X_train)

In [154]:
feature_cols =  [
 'const',
 'Year',
 # 'Infant_deaths',
 # 'Under_five_deaths',
 'Adult_mortality',
 # 'Alcohol_consumption',
 # 'Hepatitis_B',
 # 'Measles',
 # 'BMI',
 # 'Polio',
 # 'Diphtheria',
 # 'Incidents_HIV',
 # 'GDP_per_capita',
 # 'Population_mln',
 # 'Thinness_ten_nineteen_years',
 # 'Thinness_five_nine_years',
 # 'Schooling',
 # 'Economy_status_Developed',
 # 'Economy_status_Developing',
 # 'GDP_Schooling_interaction',
 'illnesses',
 'mortality_rates',
 'Country_Albania', 'Country_Algeria', 'Country_Angola', 'Country_Antigua and Barbuda', 'Country_Argentina', 'Country_Armenia',
 'Country_Australia', 'Country_Austria', 'Country_Azerbaijan', 'Country_Bahamas, The', 'Country_Bahrain', 'Country_Bangladesh',
 'Country_Barbados', 'Country_Belarus', 'Country_Belgium', 'Country_Belize', 'Country_Benin', 'Country_Bhutan', 'Country_Bolivia',
 'Country_Bosnia and Herzegovina', 'Country_Botswana', 'Country_Brazil', 'Country_Brunei Darussalam', 'Country_Bulgaria',
 'Country_Burkina Faso', 'Country_Burundi', 'Country_Cabo Verde', 'Country_Cambodia', 'Country_Cameroon', 'Country_Canada',
 'Country_Central African Republic', 'Country_Chad', 'Country_Chile', 'Country_China', 'Country_Colombia', 'Country_Comoros',
 'Country_Congo, Dem. Rep.', 'Country_Congo, Rep.', 'Country_Costa Rica', "Country_Cote d'Ivoire", 'Country_Croatia', 'Country_Cuba',
 'Country_Cyprus', 'Country_Czechia', 'Country_Denmark', 'Country_Djibouti', 'Country_Dominican Republic', 'Country_Ecuador', 
 'Country_Egypt, Arab Rep.', 'Country_El Salvador', 'Country_Equatorial Guinea', 
 'Country_Eritrea', 'Country_Estonia', 'Country_Eswatini',
 'Country_Ethiopia', 'Country_Fiji', 'Country_Finland', 'Country_France', 'Country_Gabon', 'Country_Gambia, The', 'Country_Georgia',
 'Country_Germany', 'Country_Ghana', 'Country_Greece', 'Country_Grenada', 'Country_Guatemala', 'Country_Guinea', 'Country_Guinea-Bissau',
 'Country_Guyana', 'Country_Haiti', 'Country_Honduras', 'Country_Hungary', 'Country_Iceland', 'Country_India', 'Country_Indonesia',
 'Country_Iran, Islamic Rep.', 'Country_Iraq', 'Country_Ireland', 'Country_Israel', 'Country_Italy', 'Country_Jamaica', 'Country_Japan',
 'Country_Jordan', 'Country_Kazakhstan', 'Country_Kenya', 'Country_Kiribati', 'Country_Kuwait', 'Country_Kyrgyz Republic', 'Country_Lao PDR',
 'Country_Latvia', 'Country_Lebanon', 'Country_Lesotho', 'Country_Liberia', 'Country_Libya', 'Country_Lithuania', 'Country_Luxembourg',
 'Country_Madagascar', 'Country_Malawi', 'Country_Malaysia', 'Country_Maldives', 'Country_Mali', 'Country_Malta', 'Country_Mauritania',
 'Country_Mauritius', 'Country_Mexico', 'Country_Micronesia, Fed. Sts.', 'Country_Moldova', 'Country_Mongolia', 'Country_Montenegro',
 'Country_Morocco', 'Country_Mozambique', 'Country_Myanmar', 'Country_Namibia', 'Country_Nepal', 'Country_Netherlands', 'Country_New Zealand',
 'Country_Nicaragua', 'Country_Niger', 'Country_Nigeria', 'Country_North Macedonia', 'Country_Norway', 'Country_Oman', 'Country_Pakistan',
 'Country_Panama', 'Country_Papua New Guinea', 'Country_Paraguay', 'Country_Peru', 'Country_Philippines', 'Country_Poland', 'Country_Portugal',
 'Country_Qatar', 'Country_Romania', 'Country_Russian Federation', 'Country_Rwanda', 
 'Country_Samoa', 'Country_Sao Tome and Principe',
 'Country_Saudi Arabia', 'Country_Senegal', 'Country_Serbia', 'Country_Seychelles', 'Country_Sierra Leone', 'Country_Singapore', 'Country_Slovak Republic',
 'Country_Slovenia', 'Country_Solomon Islands', 'Country_Somalia', 'Country_South Africa', 'Country_Spain', 'Country_Sri Lanka', 'Country_St. Lucia',
 'Country_St. Vincent and the Grenadines', 'Country_Suriname', 'Country_Sweden', 'Country_Switzerland', 'Country_Syrian Arab Republic', 
 'Country_Tajikistan', 'Country_Tanzania', 'Country_Thailand', 'Country_Timor-Leste', 'Country_Togo', 'Country_Tonga', 'Country_Trinidad and Tobago',
 'Country_Tunisia', 'Country_Turkiye', 'Country_Turkmenistan', 'Country_Uganda', 'Country_Ukraine', 'Country_United Arab Emirates', 
 'Country_United Kingdom', 'Country_United States', 'Country_Uruguay', 'Country_Uzbekistan', 'Country_Vanuatu', 'Country_Venezuela, RB', 
 'Country_Vietnam',  'Country_Yemen, Rep.', 'Country_Zambia', 'Country_Zimbabwe'
]


# Train the model and check its summary
# Create the model object
lin_reg = sm.OLS(y_train, X_train_fe[feature_cols])
# fit the model
results = lin_reg.fit() 
#calculate desired metrics
results.summary()

0,1,2,3
Dep. Variable:,Life_expectancy,R-squared:,0.997
Model:,OLS,Adj. R-squared:,0.997
Method:,Least Squares,F-statistic:,3924.0
Date:,"Mon, 09 Dec 2024",Prob (F-statistic):,0.0
Time:,16:28:15,Log-Likelihood:,-1708.6
No. Observations:,2291,AIC:,3783.0
Df Residuals:,2108,BIC:,4833.0
Df Model:,182,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,64.4984,0.160,404.181,0.000,64.185,64.811
Year,0.6009,0.015,39.455,0.000,0.571,0.631
Adult_mortality,-4.7977,0.067,-71.908,0.000,-4.929,-4.667
illnesses,0.2716,0.029,9.266,0.000,0.214,0.329
mortality_rates,-1.2022,0.040,-30.428,0.000,-1.280,-1.125
Country_Albania,5.9670,0.239,25.003,0.000,5.499,6.435
Country_Algeria,5.3851,0.228,23.619,0.000,4.938,5.832
Country_Angola,-3.7193,0.214,-17.359,0.000,-4.139,-3.299
Country_Antigua and Barbuda,7.7535,0.229,33.831,0.000,7.304,8.203

0,1,2,3
Omnibus:,457.283,Durbin-Watson:,2.035
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4264.444
Skew:,0.666,Prob(JB):,0.0
Kurtosis:,9.55,Cond. No.,295.0


In [156]:
## Let's check the performance of our model

y_pred = results.predict(X_train_fe[feature_cols])
rmse = statsmodels.tools.eval_measures.rmse(y_train, y_pred)
print(f'The TRAINING RMSE is: {rmse}')

The TRAINING RMSE is: 0.5100976837091634


In [158]:
def mape(y, y_pred):
    y, y_pred = np.array(y), np.array(y_pred)
    return np.mean(np.abs((y - y_pred) / y)) * 100
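
As a quick worked example of the MAPE formula, the sketch below repeats the same calculation under a hypothetical name (`mape_demo`) on two made-up values. The errors are 1/50 = 2% and 4/80 = 5%, so the mean absolute percentage error is 3.5%.

```python
import numpy as np

def mape_demo(y, y_pred):
    # Same calculation as mape(): mean of |error / actual|, as a percentage
    y, y_pred = np.array(y), np.array(y_pred)
    return np.mean(np.abs((y - y_pred) / y)) * 100

actual = [50.0, 80.0]     # made-up actual values
predicted = [49.0, 84.0]  # made-up predictions

value = mape_demo(actual, predicted)
print(round(value, 2))
```

Because each error is divided by the actual value, the measure is undefined when an actual value is zero. That is not a concern for life expectancy but matters if the function is reused elsewhere.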

In [160]:
## We apply feature_eng to the X_test set! 
X_test_fe = feature_eng(X_test)
X_test_fe = X_test_fe[feature_cols]


In [162]:

y_test_pred = results.predict(X_test_fe)
rmse = statsmodels.tools.eval_measures.rmse(y_test, y_test_pred)
print(rmse)

0.5568559807600197


In [76]:
print(f'The Training MAE is: {statsmodels.tools.eval_measures.meanabs(y_train, y_pred)}')
print(f'The Training MSE is: {statsmodels.tools.eval_measures.mse(y_train, y_pred)}')
print(f'The Training mean error is:{statsmodels.tools.eval_measures.bias(y_train, y_pred)}')
print(f'The Training Mape is: {round(mape(y_train, y_pred), 2)}% off the actual value')

The Training MAE is: 0.34445609749008704
The Training MSE is: 0.2601996469254537
The Training mean error is:-2.7702172482798842e-14
The Training Mape is: 0.54% off the actual value


In [78]:
# Create the comparison DataFrame
train_comparison = pd.DataFrame({
    'Actual': y_train,         # True target values
    'Predicted': y_pred  # Predicted values
})

# Display a random sample of 15 rows
train_comparison.sample(15)

Unnamed: 0,Actual,Predicted
2814,72.4,72.591837
1479,57.2,57.581244
2046,74.7,74.465256
218,71.9,72.088312
795,72.6,70.904272
2688,72.9,72.489149
745,82.2,82.291478
1163,78.8,79.907304
1763,75.2,75.158449
2276,67.3,66.86242


# Ethical function

### **Code Explanation**:

- **Dropping Region and Country**: Both **Region** and **Country** columns were dropped as they are non-numerical and not relevant for this analysis. Removing these features avoids potential bias and simplifies the model by focusing on numerical predictors.  
- **Adding a Constant**: A constant term was included in the regression model to account for the intercept.

### **Choice of Features**:
The selected features are entirely numerical and focus solely on mortality rates to minimize bias from external factors such as vaccination rates, country, or region:  
- **Adult_mortality**  
- **Infant_deaths**

This selection provides a direct, unbiased approach to analyzing life expectancy trends based solely on death rates.

### **Results**:
- **RMSE (Training)**: 1.60  
- **RMSE (Test)**: 1.61  
- **R-squared**: 97.1%  
- **Condition Number**: 451  

**Conclusion**:  
The model achieves strong performance with high R-squared and low RMSE, and the training and test RMSE are nearly identical (1.60 vs 1.61), which on its own does not point to overfitting. Because the test rows come from the same countries and years as the training rows, however, the model may still perform less effectively on new, unseen data.

In [205]:
def feature_eng_e(df):
    
    # Create a copy of the dataframe to avoid altering the original
    df = df.copy() 

    # Drop Unnecessary Columns
    df= df.drop(['Country', 'Region'], axis = 1)
    df = sm.add_constant(df)                 # adding the constant
    return df

In [211]:
X_train_fe = feature_eng_e(X_train)

In [213]:
feature_cols =  [
 'const',
 'Infant_deaths',
 'Adult_mortality'
]

# Train the model and check its summary
# Create the model object
lin_reg = sm.OLS(y_train, X_train_fe[feature_cols])
# fit the model
results = lin_reg.fit() 
#calculate desired metrics
results.summary()

0,1,2,3
Dep. Variable:,Life_expectancy,R-squared:,0.971
Model:,OLS,Adj. R-squared:,0.971
Method:,Least Squares,F-statistic:,38470.0
Date:,"Mon, 09 Dec 2024",Prob (F-statistic):,0.0
Time:,16:44:44,Log-Likelihood:,-4324.6
No. Observations:,2291,AIC:,8655.0
Df Residuals:,2288,BIC:,8672.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,82.7258,0.066,1249.042,0.000,82.596,82.856
Infant_deaths,-0.1571,0.002,-78.607,0.000,-0.161,-0.153
Adult_mortality,-0.0474,0.000,-99.378,0.000,-0.048,-0.046

0,1,2,3
Omnibus:,0.205,Durbin-Watson:,2.044
Prob(Omnibus):,0.903,Jarque-Bera (JB):,0.163
Skew:,-0.017,Prob(JB):,0.922
Kurtosis:,3.025,Cond. No.,451.0


In [215]:
## Let's check the performance of our model

y_pred = results.predict(X_train_fe[feature_cols])
rmse = statsmodels.tools.eval_measures.rmse(y_train, y_pred)
print(f'The TRAINING RMSE for the Ethical Model is: {rmse}')

The TRAINING RMSE for the Ethical Model is: 1.5979415632915601


In [217]:
## We apply feature_eng_e to the X_test set, matching the function used for training
X_test_fe = feature_eng_e(X_test)
X_test_fe = X_test_fe[feature_cols]


In [219]:
y_test_pred = results.predict(X_test_fe)
rmse = statsmodels.tools.eval_measures.rmse(y_test, y_test_pred)
print(rmse)

1.6135714583846015


In [89]:
print(f'The Training MAE is: {statsmodels.tools.eval_measures.meanabs(y_train, y_pred)}')
print(f'The Training MSE is: {statsmodels.tools.eval_measures.mse(y_train, y_pred)}')
print(f'The Training mean error is:{statsmodels.tools.eval_measures.bias(y_train, y_pred)}')
print(f'The Training Mape is: {round(mape(y_train, y_pred), 2)}% off the actual value')

The Training MAE is: 1.2618436753553444
The Training MSE is: 2.5534172396946757
The Training mean error is:-2.0509904459107562e-14
The Training Mape is: 1.91% off the actual value


In [139]:
# Create the comparison DataFrame
train_comparison = pd.DataFrame({
    'Actual': y_train,         # True target values
    'Predicted': y_pred  # Predicted values
})

# Display a random sample of 15 rows
train_comparison.sample(15)

Unnamed: 0,Actual,Predicted
300,53.1,53.612137
1974,75.7,75.458601
2502,70.4,68.745453
1882,62.7,63.414069
2732,58.0,57.833023
1205,75.7,76.18954
111,80.8,79.080405
1203,74.2,73.806169
2185,80.6,78.855021
2484,66.6,67.933566


## Variance inflation factor

In [99]:
# Run a variance inflation factor (VIF) check to identify the degree of multicollinearity between features

cols =  [
 'const',
 'Year',
 'Adult_mortality',
 'illnesses',
 'mortality_rates',
 'Country_Albania', 'Country_Algeria', 'Country_Angola', 'Country_Antigua and Barbuda', 'Country_Argentina', 'Country_Armenia',
 'Country_Australia', 'Country_Austria', 'Country_Azerbaijan', 'Country_Bahamas, The', 'Country_Bahrain', 'Country_Bangladesh',
 'Country_Barbados', 'Country_Belarus', 'Country_Belgium', 'Country_Belize', 'Country_Benin', 'Country_Bhutan', 'Country_Bolivia',
 'Country_Bosnia and Herzegovina', 'Country_Botswana', 'Country_Brazil', 'Country_Brunei Darussalam', 'Country_Bulgaria',
 'Country_Burkina Faso', 'Country_Burundi', 'Country_Cabo Verde', 'Country_Cambodia', 'Country_Cameroon', 'Country_Canada',
 'Country_Central African Republic', 'Country_Chad', 'Country_Chile', 'Country_China', 'Country_Colombia', 'Country_Comoros',
 'Country_Congo, Dem. Rep.', 'Country_Congo, Rep.', 'Country_Costa Rica', "Country_Cote d'Ivoire", 'Country_Croatia', 'Country_Cuba',
 'Country_Cyprus', 'Country_Czechia', 'Country_Denmark', 'Country_Djibouti', 'Country_Dominican Republic', 'Country_Ecuador', 
 'Country_Egypt, Arab Rep.', 'Country_El Salvador', 'Country_Equatorial Guinea', 
 'Country_Eritrea', 'Country_Estonia', 'Country_Eswatini',
 'Country_Ethiopia', 'Country_Fiji', 'Country_Finland', 'Country_France', 'Country_Gabon', 'Country_Gambia, The', 'Country_Georgia',
 'Country_Germany', 'Country_Ghana', 'Country_Greece', 'Country_Grenada', 'Country_Guatemala', 'Country_Guinea', 'Country_Guinea-Bissau',
 'Country_Guyana', 'Country_Haiti', 'Country_Honduras', 'Country_Hungary', 'Country_Iceland', 'Country_India', 'Country_Indonesia',
 'Country_Iran, Islamic Rep.', 'Country_Iraq', 'Country_Ireland', 'Country_Israel', 'Country_Italy', 'Country_Jamaica', 'Country_Japan',
 'Country_Jordan', 'Country_Kazakhstan', 'Country_Kenya', 'Country_Kiribati', 'Country_Kuwait', 'Country_Kyrgyz Republic', 'Country_Lao PDR',
 'Country_Latvia', 'Country_Lebanon', 'Country_Lesotho', 'Country_Liberia', 'Country_Libya', 'Country_Lithuania', 'Country_Luxembourg',
 'Country_Madagascar', 'Country_Malawi', 'Country_Malaysia', 'Country_Maldives', 'Country_Mali', 'Country_Malta', 'Country_Mauritania',
 'Country_Mauritius', 'Country_Mexico', 'Country_Micronesia, Fed. Sts.', 'Country_Moldova', 'Country_Mongolia', 'Country_Montenegro',
 'Country_Morocco', 'Country_Mozambique', 'Country_Myanmar', 'Country_Namibia', 'Country_Nepal', 'Country_Netherlands', 'Country_New Zealand',
 'Country_Nicaragua', 'Country_Niger', 'Country_Nigeria', 'Country_North Macedonia', 'Country_Norway', 'Country_Oman', 'Country_Pakistan',
 'Country_Panama', 'Country_Papua New Guinea', 'Country_Paraguay', 'Country_Peru', 'Country_Philippines', 'Country_Poland', 'Country_Portugal',
 'Country_Qatar', 'Country_Romania', 'Country_Russian Federation', 'Country_Rwanda', 
    'Country_Samoa', 'Country_Sao Tome and Principe',
 'Country_Saudi Arabia', 'Country_Senegal', 'Country_Serbia', 'Country_Seychelles', 'Country_Sierra Leone', 'Country_Singapore', 'Country_Slovak Republic',
 'Country_Slovenia', 'Country_Solomon Islands', 'Country_Somalia', 'Country_South Africa', 'Country_Spain', 'Country_Sri Lanka', 'Country_St. Lucia',
 'Country_St. Vincent and the Grenadines', 'Country_Suriname', 'Country_Sweden', 'Country_Switzerland', 'Country_Syrian Arab Republic', 
 'Country_Tajikistan', 'Country_Tanzania', 'Country_Thailand', 'Country_Timor-Leste', 'Country_Togo', 'Country_Tonga', 'Country_Trinidad and Tobago',
 'Country_Tunisia', 'Country_Turkiye', 'Country_Turkmenistan', 'Country_Uganda', 'Country_Ukraine', 'Country_United Arab Emirates', 
 'Country_United Kingdom', 'Country_United States', 'Country_Uruguay', 'Country_Uzbekistan', 'Country_Vanuatu', 'Country_Venezuela, RB', 
 'Country_Vietnam',  'Country_Yemen, Rep.', 'Country_Zambia', 'Country_Zimbabwe'
]

## We can create an indexed list (a series) where we list the VIF of each of the columns. Note the use of '.shape' in the second part of the loop
VIF = pd.Series([variance_inflation_factor(X_train_fe[cols].values, i) for i in range(X_train_fe[cols].shape[1])], index = X_train_fe[cols].columns)


In [100]:
VIF.sort_values(ascending=False).head(20)

const                   206.305313
Adult_mortality          36.064632
mortality_rates          12.647653
illnesses                 6.958947
Country_Zimbabwe          3.739019
Country_Lesotho           3.596560
Country_Eswatini          3.382141
Country_Sierra Leone      2.912591
Country_Botswana          2.881096
Country_Namibia           2.790616
Country_Malawi            2.707909
Country_South Africa      2.687476
Country_Cyprus            2.661745
Country_Kuwait            2.647322
Country_Italy             2.645833
Country_Singapore         2.632978
Country_Bahrain           2.631282
Country_Qatar             2.625373
Country_Iceland           2.614046
Country_Spain             2.609485
dtype: float64

## Creating Coefficients csv
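The cells below perform the following tasks:

1. **Retrieve Model Coefficients**:
   - The fitted coefficients are read from `results.params` and converted to a DataFrame with the columns `Feature_cols` and `Coefficient`.
2. **Save Coefficients to a CSV File**:
   - The DataFrame is saved as a CSV file named `feature_coefficients.csv` for the robust model and `feature_coefficients_ethical.csv` for the ethical model.
   - The `index=False` parameter ensures the index column is not included in the CSV file.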


In [126]:
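# coefficients for robust data
# Note: this scaler is newly created and never fitted, so scaler.pkl holds no learned means or variances
scaler = StandardScaler()
joblib.dump(scaler, "scaler.pkl")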
# Save the trained model to a file
joblib.dump(results, "results.pkl")
coefficients = results.params
print(coefficients)
coefficients_df = results.params.reset_index()
coefficients_df.columns = ['Feature_cols', 'Coefficient']
coefficients_df.to_csv('feature_coefficients.csv', index=False)
print("Coefficients saved to feature_coefficients.csv")

const                    64.498370
Year                      0.600948
Adult_mortality          -4.797702
illnesses                 0.271556
mortality_rates          -1.202238
                           ...    
Country_Venezuela, RB     6.121338
Country_Vietnam           6.722470
Country_Yemen, Rep.       2.218612
Country_Zambia           -0.006606
Country_Zimbabwe          0.763254
Length: 183, dtype: float64
Coefficients saved to feature_coefficients.csv


In [132]:
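# coefficients for ethical model
# Note: this cell writes to the same scaler.pkl and results.pkl as the robust-model cell and overwrites them
scaler = StandardScaler()
joblib.dump(scaler, "scaler.pkl")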
# Save the trained model to a file
joblib.dump(results, "results.pkl")
coefficients = results.params
print(coefficients)
coefficients_df = results.params.reset_index()
coefficients_df.columns = ['Feature_cols', 'Coefficient']
coefficients_df.to_csv('feature_coefficients_ethical.csv', index=False)
print("Coefficients saved to feature_coefficients_ethical.csv")

const              68.821694
Infant_deaths      -4.317817
Adult_mortality    -5.458750
dtype: float64
Coefficients saved to feature_coefficients_ethical.csv
