# <font color="red"> <div align="center"> REGRESSION PROJECT  
    
## <div align="center"> Life Expectancy Data


## Contents
1.  Introduction
2.  The Aim of Analysis
3.  General Information of the Data
4.  Data Exploration
     * 4.1. Importing an External Data Frame
     * 4.2. Merging Two Data Frames into One
5.  Cleaning of the Raw Data
6.  Filling of the Raw Data
7.  General Look at Life Expectancy Values Based on Regions and Years
8.  Overview of Outliers
     * 8.1 Winsorization
9.  Feature Engineering
     * 9.1 Getting PCA Values
     * 9.2 Getting PCA Values for all Elements by Switching Variables to Dummies
10.  Building Models
     * 10.1 Building Model with All Numerical Variables
      * 10.1. a) Residual Distributions on the Model
      * 10.1. b) Jarque Bera Test
     * 10.2 Adding Polynomial Features
     * 10.3 Building Polynomial Regression Models
      * 10.3 a) Checking the Best Polynomial Degree
      * 10.3 b) Checking the Performance of Models by Polynomial Degree
     * 10.4 Building Ridge Regression Models
     * 10.5 Building Lasso Regression Models
     * 10.6 Building ElasticNet Regression Models
11. Evaluating the Model
12. Predicting with the Best Model
13. Conclusions

### 0. Importing Packages

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model as lm
import statsmodels.api as sm

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

from statsmodels.tools.eval_measures import mse, rmse

import warnings
warnings.filterwarnings(action= "ignore")

In [None]:
from matplotlib import style
style.use('fivethirtyeight')

# <div align="center">  **1. Introduction**

### <font color="gray"> **Provided Information about the Preparation of the Raw Data:** The Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of the health status and many other related factors for all countries. The datasets are made available to the public for the purpose of health data analysis. The dataset on life expectancy and health factors for 193 countries was collected from the same WHO data repository website, and its corresponding economic data was collected from the United Nations website. Among all categories of health-related factors, only the most representative critical factors were chosen. It has been observed that in the past 15 years there has been a huge development in the health sector, resulting in an improvement of human mortality rates, especially in developing nations, in comparison to the past 30 years. Therefore, this project considers data from 2000 to 2015 for 193 countries. The individual data files have been merged into a single dataset.

### <font color="gray"> Initial visual inspection of the data showed some missing values. As the datasets came from the WHO, no evident errors were found. According to the publishers, missing data was handled in R using the Missmap command. The result indicated that most of the missing data was for Population, Hepatitis B and GDP. The missing data came from less known countries such as Vanuatu, Tonga, Togo and Cabo Verde. Finding all data for these countries was difficult, so it was decided to exclude them from the final model dataset. The final merged file (final dataset) consists of 22 columns and 2938 rows, which means 20 predicting variables. All predicting variables were then divided into several broad categories: immunization-related factors, mortality factors, economic factors and social factors.

### <font color="gray"> Before generating the regression models, I merged in an external dataset so that values can be examined by region, sub-region and country for a deeper view of the data. It also helped me fill missing values more accurately by using the 'Sub-Region' values.

### <font color="gray"> **Preparation of the Observations before Machine Learning:** Missing values were first filled with the interpolate method within each country; the remaining ones were filled after grouping by the 'Sub-Region' and 'Year' columns.

# <div align="center"> **2. The Aim of Analysis**

### <font color="gray"> This study aims to find the factors that affect life expectancy by comparing different regression models with statistical measures such as MSE, RMSE and R squared.

# <div align="center">  **3. General Information of the Data**

<font color="gray">Country : Country
 
Year : Year 

Status : Developed or Developing status

Life expectancy : Life Expectancy in age

Adult Mortality : Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population)

infant deaths : Number of Infant Deaths per 1000 population


Alcohol : Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol)

Percentage expenditure : Expenditure on health as a percentage of Gross Domestic Product per capita (%)

Hepatitis B : Hepatitis B (HepB) immunization coverage among 1-year-olds (%)

Measles : Measles - number of reported cases per 1000 population

BMI : Average Body Mass Index of entire population

Under-five deaths : Number of under-five deaths per 1000 population

Polio : Polio (Pol3) immunization coverage among 1-year-olds (%)

Total expenditure : General government expenditure on health as a percentage of total government expenditure (%)

Diphtheria :  Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)

HIV/AIDS : Deaths per 1 000 live births HIV/AIDS (0-4 years)

GDP : Gross Domestic Product per capita (in USD)

Population : Population of the country

Thinness 1-19 years : Prevalence of thinness among children and adolescents for Age 10 to 19 (% )

Income composition of resources : Human Development Index in terms of income composition of resources (index ranging from 0 to 1)

Thinness 5-9 years   : Prevalence of thinness among children for Age 5 to 9(%)

Schooling : Number of years of Schooling(years)

# <div align="center"> **4. Data Exploration**

#### ***Getting Data***

In [None]:
from subprocess import check_output

print(check_output(["ls", "../input/allcsv"]).decode("utf8"))

In [None]:
LifeExpectancyData = pd.read_csv('../input/life-expectancy-who/led.csv')
regions = pd.read_csv('../input/allcsv/all.csv')

#### ***First 5 rows***

In [None]:
LifeExpectancyData.head()

#### ***About data***

In [None]:
LifeExpectancyData.info()

#### ***Checking for null values***

In [None]:
LifeExpectancyData.isnull().sum()

#### ***Checking for column names for further steps***

In [None]:
LifeExpectancyData.columns 

#### ***Manipulating column names for future steps***

In [None]:
LifeExpectancyData.columns= ['Country', 'Year', 'Status', 'Life_Expectancy', 'Adult_Mortality',
       'infant_deaths', 'Alcohol', 'percentage_expenditure', 'Hepatitis_B',
       'Measles', 'BMI', 'under_five_deaths', 'Polio', 'Total_Expenditure',
       'Diphtheria', 'HIV/AIDS', 'GDP','Population', 'thinness_1_19_years', 'thinness_5_9_years',
       'Income_composition_of_resources', 'Schooling']

#### ***Visualizing NaN values with a heatmap***

In [None]:
total_missing_values = LifeExpectancyData.isnull().sum()
missing_values_per = LifeExpectancyData.isnull().sum()/LifeExpectancyData.isnull().count()
null_values = pd.concat([total_missing_values, missing_values_per], axis=1, keys=['total_null', 'total_null_perc'])
null_values = null_values.sort_values('total_null', ascending=False)

In [None]:
def null_cell(LifeExpectancyData):
    total_missing_values = LifeExpectancyData.isnull().sum()
    missing_values_per = LifeExpectancyData.isnull().sum()/LifeExpectancyData.isnull().count()
    null_values = pd.concat([total_missing_values, missing_values_per], axis=1, keys=['total_null', 'total_null_perc'])
    null_values = null_values.sort_values('total_null', ascending=False)
    return null_values[null_values['total_null'] > 0]

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(LifeExpectancyData.isnull(), cmap='viridis')

## **4.1. Importing an External Data Frame**

<font color="green"> ***The next steps need extra fields to compare by, such as regions and sub-regions. Without that information we cannot group the data in the ways we need. Therefore, I imported an external data frame and kept only the necessary columns.***

#### ***Getting new dataset***

In [None]:
#regions = pd.read_csv('./data/all.csv')

#### ***Looking new dataset - first 5 rows***

In [None]:
regions.head()

#### ***Checking NULL values***

In [None]:
regions[['name', 'region', 'sub-region']].isnull().sum()

#### ***Last check on column names***

In [None]:
regions.columns

## **4.2. Merging Two Data Frames into One**

Merging the two data frames into one, LifeExpectancyData_merged, will help us fill in missing values accurately.

#### ***Merging datasets***

In [None]:
LifeExpectancyData_merged = pd.merge(LifeExpectancyData, regions[['name', 'region', 'sub-region']],
                                     left_on='Country', right_on='name')
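
Because `pd.merge` defaults to an inner join, any country whose spelling differs between the two files is dropped without warning. The sketch below shows one way to list such countries before merging. It uses toy frames with made-up values; the names `life`, `regions_demo`, `checked` and `unmatched` are hypothetical and not part of this notebook.

```python
import pandas as pd

# Toy stand-ins for the two frames (hypothetical values, not the real data)
life = pd.DataFrame({
    "Country": ["Togo", "Bolivia (Plurinational State of)", "Fiji"],
    "Life_Expectancy": [59.9, 70.7, 69.4],
})
regions_demo = pd.DataFrame({
    "name": ["Togo", "Bolivia", "Fiji"],
    "region": ["Africa", "Americas", "Oceania"],
})

# how="left" keeps every row of `life`; indicator=True adds a `_merge` column
checked = pd.merge(life, regions_demo, left_on="Country", right_on="name",
                   how="left", indicator=True)

# Countries that found no partner in the regions table
unmatched = checked.loc[checked["_merge"] == "left_only", "Country"].unique()
print(unmatched)
```

If the same check on the real frames returns country names, they could be renamed in one of the files before the inner join so that their rows are not lost.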

#### ***Looking at NAN values***

In [None]:
null_cell(LifeExpectancyData_merged)

#### ***Checking merged dataset - first 5 rows***

In [None]:
LifeExpectancyData_merged.head()

# <div align="center"> **5. Cleaning of the Raw Data**

#### <font color="green">***Population values are missing for many countries. However, GDP is already reported per capita, so it carries the population information for each country, and we also have the status (Developed or Developing) of each country. Therefore, I preferred to drop the Population column from the data frame.***

#### ***Dropping Population column***

In [None]:
LifeExpectancyData_merged.drop('Population', inplace=True, axis=1)

#### ***Looking at columns of the new merged dataset***

In [None]:
LifeExpectancyData_merged.columns

#### ***Getting NAN values from index***

In [None]:
fill_list = (null_cell(LifeExpectancyData_merged)).index

# <div align="center"> **6. Filling of the Raw Data**

#### ***Filling NaN values per country with the interpolate method in both directions, since each country has values in at least some rows***

In [None]:
df_interpolate = LifeExpectancyData_merged.copy()

for col in fill_list:
    df_interpolate[col] = df_interpolate.groupby(['Country'])[col].transform(lambda x: x.interpolate(limit_direction = 'both'))

#### ***Checking NAN values after interpolate***

In [None]:
null_cell(df_interpolate)

In [None]:
df_interpolate[df_interpolate['Adult_Mortality'].isna()]

<font color="green">  ***Interpolating in both directions within each country reduces the number of missing values but does not remove them all: in the remaining rows, the country has no observed value at all to interpolate from. Thus, I applied the interpolate method again, this time grouping by sub-region and Year.***

In [None]:
for col in fill_list:
    df_interpolate[col] = df_interpolate.groupby(['sub-region', 'Year'])[col].transform(lambda x: x.interpolate(limit_direction='both'))
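
Why the per-country pass leaves gaps can be seen on a toy frame. This is a minimal sketch with made-up numbers; `toy` and `filled` are hypothetical names. Country "A" has two observed values, so its gap is interpolated, while country "B" has none, so nothing can be filled from within that country.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "A" has observed values, "B" has none at all
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "GDP": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan],
})

# Same call pattern as the per-country pass above
filled = toy.groupby("Country")["GDP"].transform(
    lambda s: s.interpolate(limit_direction="both"))

print(filled)
```

Rows like those of country "B" are the ones that the second pass, grouped by sub-region and Year, is meant to cover.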

#### ***Checking for NAN values***

In [None]:
null_cell(df_interpolate)

#### <font color="green"> ***Now data is ready for further steps.***

####  ***Getting numeric values for mathematical and statistical operations.***

In [None]:
LifeExpectancyData_num = df_interpolate.select_dtypes(include='number')  # public API instead of the private _get_numeric_data()

#### ***Correlations Between All Variables.***

In [None]:
corr_matrix = LifeExpectancyData_num.corr()
corr_list = corr_matrix.Life_Expectancy.abs().sort_values(ascending=False).index[1:]

In [None]:
corr_list

In [None]:
plt.figure(figsize=(15,15))
sns.heatmap(corr_matrix, annot=True, cmap='RdBu_r')
plt.title('Correlation Matrix')

<font color="green">   ***As we see above, 'Income_composition_of_resources' and 'Schooling' have a high positive correlation with Life Expectancy, while 'Adult_Mortality' has a high negative correlation with it.***


***'HIV/AIDS', 'BMI', 'Diphtheria', 'thinness_1_19_years', 'thinness_5_9_years', 'Polio', 'GDP', and 'Alcohol' have a medium correlation with Life Expectancy.***

***The rest of our columns, 'percentage_expenditure', 'Hepatitis_B', 'Total_Expenditure', 'under_five_deaths', 'infant_deaths', 'Year', and 'Measles', have a low correlation with Life Expectancy.***

#### ***Correlations between illnesses***

In [None]:
corr_matrix = LifeExpectancyData_num[['Hepatitis_B','Measles', 'Polio','Diphtheria','HIV/AIDS', 'thinness_1_19_years',
                                      'thinness_5_9_years','Life_Expectancy']].corr()
corr_list = corr_matrix.Life_Expectancy.abs().sort_values(ascending=False).index[1:]

In [None]:
corr_list

In [None]:
plt.figure(figsize=(10,10))
sns.heatmap(corr_matrix, annot=True, cmap='RdBu_r')
plt.title('Correlation Matrix')

<font color="green">   ***When only the illness-related variables are considered, Life Expectancy has a high negative correlation with HIV/AIDS.***
    
<font color="green">   ***It also has a medium correlation with thinness_1_19_years, thinness_5_9_years, Diphtheria and Polio.***
    
<font color="green">   ***Life Expectancy has a low correlation with Hepatitis_B and Measles.***


###  <div align="center"> 7. General Look at Life Expectancy Values Based on Regions and Years 

### Life Expectancy over the Years

In [None]:
plt.figure(figsize=(20,10))
sns.violinplot(x="Year", y="Life_Expectancy", data=df_interpolate)
plt.title('Life Expectancy over the Years')

<font color="green"> ***As the violin plot shows, overall Life Expectancy appears to decrease between 2010 and 2014. Let's take a more detailed look.***

### **Life Expectancy Values in Years by Regions**

In [None]:
plt.figure(figsize=(20,10))
# Use one consistent subset (2010 onwards) for x, y, hue and data
recent = df_interpolate.loc[df_interpolate['Year'] > 2009]
sns.violinplot(x="Year",
               y="Life_Expectancy",
               hue="region",
               data=recent,
               palette="muted")

plt.title('Life Expectancy Values in Years by Regions')

### <font color="green"> ***Life Expectancy in the Africa and Asia regions is generally stable, while the Oceania and Europe regions show a decreasing trend between 2010 and 2014.***

### General View in Life Expectancy by Grouping Countries with GDP Values Based on Regions

In [None]:
plt.figure(figsize=(20,10))
sns.scatterplot(x='Life_Expectancy', 
                y='Alcohol', 
                hue='region',
                data=df_interpolate, 
                s=df_interpolate.GDP/100);
plt.xlabel('Life_expectancy',size=15)
plt.ylabel('Alcohol', size =10)
plt.show()

# <div align="center"> 8. Overview of Outliers and Dealing with Them

In [None]:
plt.rcParams['figure.dpi'] = 60
plt.rcParams['figure.figsize'] = (8,5.5)

In [None]:
outliers_by_nineteen_variables = ['Year', 'Life_Expectancy','Adult_Mortality', 'infant_deaths', 'Alcohol', 'percentage_expenditure',
                                    'Hepatitis_B','Measles', 'BMI',
                                    'under_five_deaths', 'Polio', 'Total_Expenditure','Diphtheria', 'HIV/AIDS', 'GDP',
                                    'thinness_1_19_years', 'thinness_5_9_years', 'Income_composition_of_resources', 'Schooling'] 
plt.figure(figsize=(25,25))

for i, col in enumerate(outliers_by_nineteen_variables):
    plt.subplot(5, 4, i+1)
    plt.boxplot(df_interpolate[col].dropna())  # NaN values would leave the box empty
    plt.title(col)

##  **8.1 Winsorization**

In [None]:
from scipy.stats.mstats import winsorize

#### ***Finding the Best Winsorization Limit for Each Variable***

In [None]:
def winsor(x, multiplier=3): 
    upper= x.median() + x.std()*multiplier
    for limit in np.arange(0.001, 0.20, 0.001):
        if np.max(winsorize(x,(0,limit))) < upper:
            return limit
    return None 

In [None]:
#An example to get limit value for winsorization
limit= winsor(df_interpolate['infant_deaths'])
print(limit)
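
To make the `(0, limit)` tuple concrete, here is a minimal sketch on a made-up array (`values` and `capped` are hypothetical names). The first entry is the fraction capped at the lower end and the second the fraction capped at the upper end, so `(0, 0.1)` leaves the low values alone and replaces the top 10% with the largest remaining value.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Ten made-up observations with one extreme value at the top
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# Cap only the upper tail: the top 10% (one value here) is pulled down
capped = winsorize(values, limits=(0, 0.1))

print(capped)
```

The `winsor` helper above searches for the smallest upper limit at which the capped maximum falls below the median plus three standard deviations.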

In [None]:
df_interpolate["Adult_Mortality"]        = winsorize(df_interpolate["Adult_Mortality"], (0, 0.018))
df_interpolate["infant_deaths"]          = winsorize(df_interpolate["infant_deaths"], (0, 0.018))
df_interpolate["percentage_expenditure"] = winsorize(df_interpolate["percentage_expenditure"], (0, 0.036))
df_interpolate["Hepatitis_B"]            = winsorize(df_interpolate["Hepatitis_B"], (0,0.001))
df_interpolate["Measles"]                = winsorize(df_interpolate["Measles"], (0, 0.018))
df_interpolate["under_five_deaths"]      = winsorize(df_interpolate["under_five_deaths"], (0, 0.013))
df_interpolate["Polio"]                  = winsorize(df_interpolate["Polio"], (0, 0.001))
df_interpolate["Total_Expenditure"]      = winsorize(df_interpolate["Total_Expenditure"], (0, 0.011))
df_interpolate["Diphtheria"]             = winsorize(df_interpolate["Diphtheria"], (0, 0.001))
df_interpolate["HIV/AIDS"]               = winsorize(df_interpolate["HIV/AIDS"], (0, 0.030))
df_interpolate["GDP"]                    = winsorize(df_interpolate["GDP"], (0, 0.43))
df_interpolate["thinness_1_19_years"]    = winsorize(df_interpolate["thinness_1_19_years"], (0, 0.026))
df_interpolate["thinness_5_9_years"]     = winsorize(df_interpolate["thinness_5_9_years"], (0, 0.27))
df_interpolate["Income_composition_of_resources"] = winsorize(df_interpolate["Income_composition_of_resources"], (0, 0.001))
df_interpolate["Schooling"]              = winsorize(df_interpolate["Schooling"], (0, 0.001))


# <div align="center"> 9. Feature Engineering

## **9.1 PCA Results with only numeric variables**

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

#### ***Getting PCA Model***

In [None]:
LifeExpectancyData_num = df_interpolate.select_dtypes(include='number')  # public API instead of the private _get_numeric_data()

In [None]:
LifeExpectancyData_num = LifeExpectancyData_num.dropna()

X = StandardScaler().fit_transform(LifeExpectancyData_num) #standardize the feature matrix

pca = PCA(n_components=0.90, whiten=True)

X_pca = pca.fit_transform(X)

#### ***Looking at the explained variance ratios***

In [None]:
print(pca.explained_variance_ratio_)

#### ***Looking at results***

In [None]:
print('Original Number of Features', X.shape[1]) 
print('Reduced Number of Features',X_pca.shape[1])
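
When `n_components` is a float between 0 and 1, scikit-learn's `PCA` keeps the smallest number of components whose cumulative explained variance reaches that fraction. The sketch below illustrates this on synthetic data in which half of the columns are near-copies of the others; all names (`base`, `X_demo`, `pca_demo`, and so on) are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
# Six columns: three independent ones plus three noisy copies of them
X_demo = np.hstack([base, base + 0.01 * rng.normal(size=(200, 3))])

X_scaled = StandardScaler().fit_transform(X_demo)

# Keep as many components as needed to explain at least 90% of the variance
pca_demo = PCA(n_components=0.90)
X_reduced = pca_demo.fit_transform(X_scaled)

print('Original Number of Features', X_scaled.shape[1])
print('Reduced Number of Features', X_reduced.shape[1])
print('Variance explained', pca_demo.explained_variance_ratio_.sum())
```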

In [None]:
#Creating a scaler object
sc = StandardScaler()

#fit the scaler to the features and transform
X_std = sc.fit_transform(X)

# Fit the PCA and transform the data
X_std_pca = pca.fit_transform(X_std)

# View the new feature data's shape
X_std_pca.shape

In [None]:
from sklearn.decomposition import PCA
from sklearn import decomposition, datasets

In [None]:
#Creating a PCA object with 12 components as a parameter
pca = decomposition.PCA(n_components=12) 
# Fit the PCA and transform the data
X_std_pca = pca.fit_transform(X_std)

# View the new feature data's shape
X_std_pca.shape

In [None]:
plt.figure(figsize = (10,5))
plt.plot(pca.explained_variance_ratio_)
plt.title('Total variance explained: {}'.format(pca.explained_variance_ratio_.sum()))
plt.show()

### <font color="green"> In the next steps, I will search for the best model based on the number of variables. The PCA above is just an example to get results quickly. In this project I would rather pick the best model by comparing MSE and other related metrics across different regression models.

## **9.2 Getting PCA Values for all Elements by Switching Variables to Dummies**

In [None]:
df_interpolate.info()

In [None]:
df_dummies = pd.get_dummies(df_interpolate)
df_dummies.head()

###  PCA Results with all features with dummies

In [None]:
df_dummies = df_dummies.dropna()

X = StandardScaler().fit_transform(df_dummies)#standardize the feature matrix

pca = PCA(n_components=0.95, whiten=True)

X_pca = pca.fit_transform(X)

In [None]:
print('Original Number of Features', X.shape[1]) 
print('Reduced Number of Features',X_pca.shape[1])

In [None]:
#Creating a scaler object
sc = StandardScaler()

#fit the scaler to the features and transform
X_std = sc.fit_transform(X)

In [None]:
#Creating a PCA object with 178 components as a parameter
pca = decomposition.PCA(n_components=178) 
# Fit the PCA and transform the data
X_std_pca = pca.fit_transform(X_std)

# View the new feature data's shape
X_std_pca.shape

In [None]:
plt.figure(figsize = (10,5))
plt.plot(pca.explained_variance_ratio_)
plt.title('Total variance explained: {}'.format(pca.explained_variance_ratio_.sum()))
plt.show()

With 178 components, PCA explains about 95% of the total variance.

# <div align="center"> 10. Building Regression Models

## **10.1 Building Model with All Numerical Variables**

In [None]:
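# corr_list was overwritten by the illness-only correlation cell, so rebuild it from all numeric columns
corr_list = LifeExpectancyData_num.corr()['Life_Expectancy'].abs().sort_values(ascending=False).index[1:]

y_allValues = LifeExpectancyData_num['Life_Expectancy']
X_allValues = LifeExpectancyData_num[corr_list]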

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_allValues, y_allValues, test_size = 0.2, random_state = 101)

print(" Observations in Training Group : {}".format(X_train.shape[0]))
print(" Observations in Test Group     : {}".format(X_test.shape[0]))

<font color="green"> ***We're splitting the data 80/20: out of every 100 rows, 80 go into the training set and 20 go into the test set.***

In [None]:
X_train = sm.add_constant(X_train)

Model_all = sm.OLS(y_train, X_train).fit()

Model_all.summary()

In [None]:
pValue = Model_all.pvalues
significant_values = list(pValue[pValue<= 0.05].index)

### 10.1. a) Residual Distributions on the Model

In [None]:
from sklearn import linear_model

In [None]:
Model_all = linear_model.LinearRegression()
Model_all.fit(X_allValues, y_allValues)

In [None]:
pred = Model_all.predict(X_allValues)
Residuals = y_allValues - pred

In [None]:
from statsmodels.tsa.stattools import acf

acf_data = acf(Residuals)

plt.figure(figsize=(9,6))
plt.plot(acf_data[1:])
plt.show()

In [None]:
rand_nums = np.random.normal(np.mean(Residuals), np.std(Residuals), len(Residuals))

plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.scatter(np.sort(rand_nums), np.sort(Residuals))
plt.xlabel("Normally Distributed Random Variable")
plt.ylabel("Residuals")
plt.title("QQ Plot")

plt.subplot(1,2,2)
plt.hist(Residuals)
plt.xlabel("Residuals")
plt.title("Residuals Histogram")

plt.tight_layout()
plt.show()

### 10.1. b) Jarque Bera Test

In [None]:
from scipy.stats import jarque_bera
from scipy.stats import normaltest

In [None]:
jb_stats = jarque_bera(Residuals)
norm_stats = normaltest(Residuals)

print("Jarque-Bera test value : {0} and p-value : {1}".format(jb_stats[0], jb_stats[1]))
print("Normal test value      : {0} and p-value : {1:.30f}".format(norm_stats[0], norm_stats[1]))

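<font color="green"> ***The null hypothesis of the Jarque-Bera test is that the residuals are normally distributed, so the residuals can be treated as normal only if the p-value printed above is larger than 0.05.***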
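
As a reading aid for the test output, the following sketch runs `jarque_bera` on two synthetic samples, one drawn from a normal distribution and one from a clearly skewed (exponential) distribution. The sample names and sizes are hypothetical and only serve to show how the statistic and p-value behave.

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=2000)        # should look normal to the test
skewed_sample = rng.exponential(size=2000)   # strongly right-skewed

stat_n, p_normal = jarque_bera(normal_sample)
stat_s, p_skewed = jarque_bera(skewed_sample)

print("normal sample : statistic {0:.2f}, p-value {1:.4f}".format(stat_n, p_normal))
print("skewed sample : statistic {0:.2f}, p-value {1:.4g}".format(stat_s, p_skewed))
```

A large statistic with a p-value near zero, as for the skewed sample, is evidence against normality.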

## **10.2 Adding Polynomial Features**

In [None]:
df = LifeExpectancyData_num.drop(["Life_Expectancy", "Year"], axis=1)

In [None]:
df.shape

## **10.3 Building Polynomial Regression Models**

**10.3 a) Checking the Performance of Models by Polynomial Degree**

In [None]:
from sklearn.preprocessing import PolynomialFeatures 

In [None]:
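def polynomial(df, pol):
    # Expand every predictor (all columns except the target) to the given polynomial degree
    df_dropped = df.drop('Life_Expectancy', axis=1)
    poly = PolynomialFeatures(pol)
    poly_array = poly.fit_transform(df_dropped)
    # get_feature_names was removed in scikit-learn 1.2; get_feature_names_out replaces it.
    # index=df.index keeps the rows aligned with the target after dropna()
    df_pol = pd.DataFrame(poly_array, columns=poly.get_feature_names_out(df_dropped.columns), index=df.index)
    df_pol = pd.concat([df_pol, df['Life_Expectancy']], axis=1)
    # Order the features by absolute correlation with the target, strongest first
    Feature_list = df_pol.corr()['Life_Expectancy'].abs().sort_values(ascending=False)[1:].index
    return pd.concat([df_pol[Feature_list], df['Life_Expectancy']], axis=1)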

In [None]:
df_pol1 = polynomial(LifeExpectancyData_num,1)

In [None]:
def model_pol(df,pol):
    y = df['Life_Expectancy']
    Feature_list = df.columns[:500]  # overfitting starts after about 200 variables, so at most 500 are considered
    MSE_list_test=[]
    R_list=[]
    number_of_variables=[]
    MAE_list=[]
    RMSE_list=[]
    MAPE_list=[]
    R_train_list=[]
    MSE_train_list=[]
    adj_R_test=[]
    adj_R_train=[]
    for variable in range(1,len(Feature_list)-1, pol**pol*2):
        selected_features =  Feature_list[:(-1*variable)]
        X_poly=df[selected_features]
        X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size = 0.2, random_state = 0)
        
        
        model_poly = LinearRegression()
        results = model_poly.fit(X_train, y_train)
        y_pred  = model_poly.predict(X_test)
        y_pred_train = model_poly.predict(X_train)

        MSE_list_test.append(mse(y_test, y_pred))
        MSE_train_list.append(mse(y_train, y_pred_train))

        R_list.append(model_poly.score(X_test, y_test))
        adj_R_test.append(1 - (1-model_poly.score(X_test, y_test))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
        
        R_train_list.append(model_poly.score(X_train, y_train))
        adj_R_train.append(1 - (1-model_poly.score(X_train, y_train))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))


        number_of_variables.append(len(selected_features))

        MAE_list.append(mean_absolute_error(y_test, y_pred))

        RMSE_list.append(rmse(y_test, y_pred))

        MAPE_list.append(np.mean(np.abs((y_test-y_pred) / y_test)) * 100)
        
    model_means = list(zip(number_of_variables, R_list,R_train_list,MSE_list_test,MSE_train_list,MAE_list,RMSE_list,MAPE_list,adj_R_test,adj_R_train))
    poly_means = pd.DataFrame(model_means, columns= ['number_of_variables','R_list','R_train_list',
                                                            'MSE_list_test','MSE_train_list','MAE_list', 'RMSE_list', 'MAPE_list','adj_R_test', 'adj_R_train'])
    
    return poly_means
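
The adjusted R squared computed inside `model_pol` penalizes R squared for the number of predictors. As a small sketch, the same formula can be written as a standalone helper; `adjusted_r2` is a hypothetical name and the numbers below are made up for illustration.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R squared for a model with n_features predictors fitted on n_obs rows."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# Hypothetical numbers: same R squared, different model sizes
small_model = adjusted_r2(0.90, n_obs=500, n_features=10)
large_model = adjusted_r2(0.90, n_obs=500, n_features=200)

print(small_model, large_model)
```

For the same raw R squared, the model with more predictors gets the lower adjusted value, which makes this metric useful when comparing models with very different numbers of polynomial features.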

In [None]:
df_poly_transform1 = polynomial(LifeExpectancyData_num,1)
df_pol1 = model_pol(df_poly_transform1,2)

In [None]:
df_poly_transform2 = polynomial(LifeExpectancyData_num,2)
df_pol2 = model_pol(df_poly_transform2,2)

In [None]:
#%%time  # uncomment to measure the total run time of this cell
df_poly_transform3 = polynomial(LifeExpectancyData_num,3)
df_pol3 = model_pol(df_poly_transform3,3)

#### Displaying 3 polynomial models with data frames 

In [None]:
display(df_pol1.sort_values(by='MSE_list_test').head())
display(df_pol2.sort_values(by='MSE_list_test').head())
display(df_pol3.sort_values(by='MSE_list_test').head())

In [None]:
plt.figure(1, figsize = (25,10))
plt.suptitle('MSE TEST TRAIN VALUES', size=20)



plt.subplot(1,3,1)
plt.plot(df_pol1.number_of_variables,df_pol1.MSE_list_test, label  = 'MSE Values', color='blue', linewidth=5)
plt.plot(df_pol1.number_of_variables,df_pol1.MSE_train_list, label = 'MSE_train Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values ')
plt.title('POL 1 MSE Test/Train Values')
plt.ylim(0,30)
plt.legend()

plt.subplot(1,3,2)
plt.plot(df_pol2.number_of_variables, df_pol2.MSE_list_test, label = 'MSE Values', color='blue', linewidth=5)
plt.plot(df_pol2.number_of_variables, df_pol2.MSE_train_list,label = 'MSE_train Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values ')
plt.ylim(0,30)
plt.title('POL 2 MSE Test/Train Values')
plt.legend()

plt.subplot(1,3,3)
plt.plot(df_pol3.number_of_variables, df_pol3.MSE_list_test, label = 'MSE Values', color='blue', linewidth=5)
plt.plot(df_pol3.number_of_variables, df_pol3.MSE_train_list,label = 'MSE_train Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values ')
plt.ylim(0,30)
plt.title('POL 3 MSE Test/Train Values')


plt.subplots_adjust()
plt.legend()
plt.show()



In [None]:
plt.figure(figsize=(15,10))
objects = ('df_pol1', 'df_pol2', 'df_pol3')

y_pos = np.arange(len(objects)) 
performance  =[df_pol1.MSE_list_test.min() ,df_pol2.MSE_list_test.min(), df_pol3.MSE_list_test.min()]
performance2 =[df_pol1.MSE_train_list.min(), df_pol2.MSE_train_list.min(), df_pol3.MSE_train_list.min()]

plt.subplot(121)
plt.bar(y_pos, performance, align='center')
plt.xticks(y_pos, objects,size=10)
plt.xlabel('Model',size=10)
plt.ylabel('MSE Values',size=10)
plt.title('MSE TEST Values \n', fontsize=10)

plt.subplot(122)
plt.bar(y_pos, performance2, align='center')
plt.xticks(y_pos, objects,size=10)
plt.title('MSE TRAIN Values \n', size = 10)


plt.xlabel('Model',size=10)
plt.ylabel('MSE Values',size=10)

plt.show()



In [None]:
plt.figure(figsize=(15,10))
objects = ('df_pol1', 'df_pol2', 'df_pol3')

y_pos = np.arange(len(objects)) 
performance  =[df_pol1.adj_R_test.max() ,df_pol2.adj_R_test.max(), df_pol3.adj_R_test.max()]
performance2 =[df_pol1.adj_R_train.max(), df_pol2.adj_R_train.max(), df_pol3.adj_R_train.max()]

plt.subplot(121)
plt.bar(y_pos, performance, align='center')
plt.xticks(y_pos, objects,size=10)
plt.xlabel('Model',size=10)
plt.ylabel('Adj R Squared Test Values',size=10)
plt.title('Adj R Squared Test Values \n', fontsize=10)

plt.subplot(122)
plt.bar(y_pos, performance2, align='center')
plt.xticks(y_pos, objects,size=10)
plt.ylabel('Adj R Squared Train Values',size=10)
plt.title('Adj R Squared Train Values \n', size = 10)
plt.xlabel('Model',size=10)


plt.show()

# **Visualizing the Three Polynomial Models**

In [None]:
df = LifeExpectancyData_num.drop(["Life_Expectancy", "Year"], axis=1)

In [None]:
poly = PolynomialFeatures(2)
poly_array = poly.fit_transform(df)

In [None]:
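# get_feature_names was removed in scikit-learn 1.2; index=df.index keeps X aligned with y
df_poly2 = pd.DataFrame(poly_array, columns=poly.get_feature_names_out(), index=df.index)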

In [None]:
y = LifeExpectancyData_num['Life_Expectancy']
X = df_poly2

X_train_pol2, X_test_pol2, y_train_pol2, y_test_pol2 = train_test_split(X, y, test_size = 0.2, random_state = 101)

print("Observations in Train Group : {}".format(X_train_pol2.shape[0]))
print("Observations in Test Group  : {}".format(X_test_pol2.shape[0]))

# PolynomialFeatures already adds the bias column "1", so no extra constant is added here

poly_model_2 = sm.OLS(y_train_pol2, X_train_pol2).fit()
y_preds_pol2 = poly_model_2.predict(X_test_pol2)
y_preds_train_pol2 = poly_model_2.predict(X_train_pol2)

In [None]:
poly = PolynomialFeatures(3)
poly_array = poly.fit_transform(df)
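# get_feature_names was removed in scikit-learn 1.2; index=df.index keeps X aligned with y
df_poly3 = pd.DataFrame(poly_array, columns=poly.get_feature_names_out(), index=df.index)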

y = LifeExpectancyData_num['Life_Expectancy']
X = df_poly3

X_train_pol3, X_test_pol3, y_train_pol3, y_test_pol3 = train_test_split(X, y, test_size = 0.2, random_state = 101)

print("Observations in Train Group : {}".format(X_train_pol3.shape[0]))
print("Observations in Test Group  : {}".format(X_test_pol3.shape[0]))

# PolynomialFeatures already adds the bias column "1", so no extra constant is added here

poly_model_3 = sm.OLS(y_train_pol3, X_train_pol3).fit()
y_preds_pol3 = poly_model_3.predict(X_test_pol3)
y_preds_train_pol3 = poly_model_3.predict(X_train_pol3)

In [None]:
poly = PolynomialFeatures(1)
poly_array = poly.fit_transform(df)
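# get_feature_names was removed in scikit-learn 1.2; index=df.index keeps X aligned with y
df_poly1 = pd.DataFrame(poly_array, columns=poly.get_feature_names_out(), index=df.index)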

y = LifeExpectancyData_num['Life_Expectancy']
X = df_poly1

X_train_pol1, X_test_pol1, y_train_pol1, y_test_pol1 = train_test_split(X, y, test_size = 0.2, random_state = 101)

print("Observations in Train Group : {}".format(X_train_pol1.shape[0]))
print("Observations in Test Group  : {}".format(X_test_pol1.shape[0]))

# PolynomialFeatures already adds the bias column "1", so no extra constant is added here

poly_model_1 = sm.OLS(y_train_pol1, X_train_pol1).fit()
y_preds_pol1 = poly_model_1.predict(X_test_pol1)
y_preds_train_pol1 = poly_model_1.predict(X_train_pol1)


In [None]:
plt.figure(figsize=(18,8))
plt.suptitle('Scatter Plots of Life Expectancy Predictions', size = 16)

plt.subplot(1,3,1)
plt.title('Poly 1 Model \n', size = 14)
plt.scatter(y_test_pol1, y_preds_pol1)
plt.scatter(y_train_pol1, y_preds_train_pol1,alpha=0.10)
plt.plot(y_test_pol1, y_test_pol1, color="red")
plt.ylim(0,90)
plt.xlabel("True Values")
plt.ylabel("Predictions")

plt.subplot(1,3,2)
plt.title('Poly 2 Model \n', size = 14)
plt.scatter(y_test_pol2, y_preds_pol2 )
plt.scatter(y_train_pol2, y_preds_train_pol2,alpha=0.10)
plt.plot(y_test_pol2, y_test_pol2, color="red")
plt.xlabel("True Values")
plt.ylabel("Predictions")

plt.subplot(1,3,3)
plt.title('Poly 3 Model \n', size = 14)
plt.scatter(y_test_pol3, y_preds_pol3)
plt.scatter(y_train_pol3, y_preds_train_pol3,alpha=0.10)
plt.plot(y_test_pol3, y_test_pol3, color="red")
plt.ylim(0,90)
plt.xlabel("True Values")
plt.ylabel("Predictions")




plt.subplots_adjust()
plt.show()

<font color="green"> There are two critical characteristics of an estimator to consider: the bias and the variance. The bias is the difference between the expected value of the estimator and the true population parameter; it measures the accuracy of the estimates. The variance, on the other hand, measures the spread, or uncertainty, in these estimates. 

<font color="green"> In regularized regression the penalty strength λ trades one against the other. Setting λ to 0 is the same as using OLS, while the larger its value, the more strongly the size of the coefficients is penalized: as λ becomes larger, the variance decreases and the bias increases.
    
A more traditional approach would be to choose λ such that some information criterion, Akaike or Bayesian (AIC or BIC), is the smallest. A more machine-learning-like approach is to perform cross-validation and select the value of λ that minimizes the cross-validated sum of squared residuals.
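
The cross-validation approach can be sketched as follows. This is a minimal illustration on synthetic data, not on the Life Expectancy data set: the names `X_demo`, `y_demo` and the candidate list `alphas` are assumptions made for the example, and scikit-learn calls the penalty strength `alpha` rather than λ.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (hypothetical, only to illustrate the selection step).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

# Candidate penalties; RidgeCV scores each one with 5-fold cross-validation
# and keeps the value with the best cross-validated score.
alphas = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_demo, y_demo)

print("alpha chosen by 5-fold CV:", ridge_cv.alpha_)
```

Unlike a single train/test split, this scores every candidate on several held-out folds, so the chosen value depends less on one particular split.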

As we see on the scatter plots, the predictions of the Poly 2 model follow the true values better than those of the Poly 3 model in both the test and train groups. The Poly 3 model is not able to explain some of the higher values. 

# <div align="center">  10.4 Building Ridge Regression Models

#### <font color="green">Least Squares determines the values of the parameters in an equation by minimizing the sum of the squared residuals. Ridge Regression, on the other hand, minimizes the sum of the squared residuals plus lambda times the squared slope of the regression line (with several predictors, the sum of the squared coefficients).
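
To make the penalty concrete, here is a small sketch of the ridge solution in closed form. It is an illustration under simplifying assumptions (synthetic data, no intercept term, hypothetical helper name `ridge_closed_form`), not the code that scikit-learn runs internally.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||^2 (no intercept, for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (hypothetical).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

# Ordinary least squares for comparison.
ols_coef = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

# With lam = 0 the ridge solution coincides with OLS; larger lam shrinks it.
for lam in [0.0, 1.0, 100.0]:
    coef = ridge_closed_form(X_demo, y_demo, lam)
    print(lam, np.round(coef, 3), "norm:", round(float(np.linalg.norm(coef)), 3))
```

The printed norms shrink as `lam` grows, which is the shrinkage effect described above.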

#### <font color="green"> Since most of the parameters are important for my prediction, I also want to use the Ridge model, which keeps all of the components in the model. 
    
#### <font color="red"> I preferred not to use "sklearn.feature_selection.RFE(estimator, n_features_to_select=None, step=1, verbose=0)" considering the size of my data set. However, this function can also help to find the most suitable variables for a given number of features in the selected model. 
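
For reference, a minimal sketch of how RFE could be used is shown below. It runs on a small synthetic data set (the names `X_demo` and `y_demo` are assumptions for the example), so it says nothing about how long RFE would take on the polynomial features of this project.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): only columns 0 and 3 drive the target.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(150, 6))
y_demo = 4.0 * X_demo[:, 0] - 3.0 * X_demo[:, 3] + rng.normal(scale=0.1, size=150)

# RFE refits the estimator repeatedly, dropping the weakest feature each step
# until n_features_to_select features remain.
selector = RFE(LinearRegression(), n_features_to_select=2, step=1)
selector.fit(X_demo, y_demo)

print("kept columns:", np.flatnonzero(selector.support_))
print("ranking     :", selector.ranking_)
```

Because the estimator is refitted once per eliminated feature, the cost grows with the number of features, which is the reason given above for not using it here.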

In [None]:
from sklearn.linear_model import Ridge

In [None]:
def Ridge_model(df,pol, alpha, col=None):

    y = df['Life_Expectancy']
    Feature_list = df.columns[:500]
    
    MSE_list_test=[]
    R_list=[]
    adj_R_test=[]
    number_of_variables=[]
    MAE_list=[]
    RMSE_list=[]
    MAPE_list=[]
    R_train_list=[]
    adj_R_train=[]
    MSE_train_list=[]
    model_list=[]
    feature_list=[]
        
    
    for variable in range(1,len(Feature_list)-1, pol**pol*2):
        selected_features =  Feature_list[:(-1*variable)]
        X_poly=df[selected_features]
        X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size = 0.2, random_state = 0)
                
        model_poly = Ridge(alpha= alpha) 
        model_poly.fit(X_train, y_train)  # fit once; the second fit was redundant
               
        y_pred  = model_poly.predict(X_test)
        
        y_pred_train = model_poly.predict(X_train)
      
        MSE_list_test.append(mse(y_test, y_pred))
        
        MSE_train_list.append(mse(y_train, y_pred_train))
        R_list.append(model_poly.score(X_test, y_test))
        adj_R_test.append(1 - (1-model_poly.score(X_test, y_test))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
        
        R_train_list.append(model_poly.score(X_train, y_train))
        adj_R_train.append(1 - (1-model_poly.score(X_train, y_train))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))
        
        number_of_variables.append(len(selected_features))
        MAE_list.append(mean_absolute_error(y_test, y_pred))
        
        RMSE_list.append(rmse(y_test, y_pred))
        
        MAPE_list.append(np.mean(np.abs((y_test-y_pred) / y_test)) * 100)
        model_list.append(model_poly)
        feature_list.append(selected_features)
        
        
    
        
        
    model_means = list(zip(number_of_variables, R_list, adj_R_test, R_train_list, adj_R_train, MSE_list_test,
                           MSE_train_list,MAE_list,RMSE_list,MAPE_list,model_list,feature_list))
    
    poly_means = pd.DataFrame(model_means, columns= ['number_of_variables', 'R_list','adj_R_test',
                                                     'R_train_list','adj_R_train',
                                                     'MSE_list_test','MSE_train_list','MAE_list','RMSE_list','MAPE_list',
                                                     'model_list', 'feature_list'])
    
    
    return poly_means, (y_pred,y_pred_train, X_train,y_train, X_test, y_test, MSE_list_test,MSE_train_list)

In [None]:
%%time
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 1, 10, 100, 1000]: 
    df, _  = Ridge_model(df_poly_transform2,2,alpha)
    print(alpha, df.MSE_list_test.min())

In [None]:
# The best model option: minimum MSE_test value at alpha 10⁻⁶ with the 2nd polynomial degree.

df_Ridge_alpha_pol2, degerler1_2 = Ridge_model(df_poly_transform2,2,0.000001)

In [None]:
df_Ridge_alpha_pol2.head()

In [None]:
%%time
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 1, 10, 100, 1000]:
    df, _  = Ridge_model(df_poly_transform3,3,alpha)
    print(alpha, df.MSE_list_test.min())

In [None]:
#The Best Model option with minimum MSE_test Value on Alpha 10³ and polynomial 3rd degree.
df_Ridge_alpha_pol3, degerler1_3 = Ridge_model(df_poly_transform3,3,1000)

In [None]:
df_Ridge_alpha_pol3.head()

In [None]:
MSE_list_test_alpha_pol2  = df_Ridge_alpha_pol2['MSE_list_test']
MSE_train_test_alpha_pol2 = df_Ridge_alpha_pol2['MSE_train_list']
MSE_list_test_alpha_pol3  = df_Ridge_alpha_pol3['MSE_list_test']
MSE_train_test_alpha_pol3 = df_Ridge_alpha_pol3['MSE_train_list']

In [None]:
plt.figure(1, figsize = (15,8))

plt.subplot(1,2,1)
plt.plot(df_Ridge_alpha_pol2.number_of_variables, MSE_list_test_alpha_pol2,label  = 'MSE Test Alpha Pol2 Values', color='blue', linewidth=5)
plt.plot(df_Ridge_alpha_pol2.number_of_variables, MSE_train_test_alpha_pol2,label = 'MSE Train  Alpha Pol2 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values')
plt.title('POLY 2 MSE Test/Train Values')
plt.legend()

plt.subplot(1,2,2)
plt.plot(df_Ridge_alpha_pol3.number_of_variables,MSE_list_test_alpha_pol3,label  = 'MSE Test Alpha Pol3 Values', color='blue', linewidth=5)
plt.plot(df_Ridge_alpha_pol3.number_of_variables, MSE_train_test_alpha_pol3,label = 'MSE Train Alpha Pol3 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values')
plt.title('POLY 3 MSE Test/Train Values')


plt.subplots_adjust()
plt.legend()
plt.show()

In [None]:
adj_R_test_alpha_pol2  = df_Ridge_alpha_pol2['adj_R_test']
adj_R_train_alpha_pol2 = df_Ridge_alpha_pol2['adj_R_train']
adj_R_test_alpha_pol3  = df_Ridge_alpha_pol3['adj_R_test']
adj_R_train_alpha_pol3 = df_Ridge_alpha_pol3['adj_R_train']




plt.figure(1, figsize = (15,8))

plt.subplot(1,2,1)
plt.plot(df_Ridge_alpha_pol2.number_of_variables, adj_R_test_alpha_pol2,label  = 'Adjusted R² Test Alpha Pol2 Values', color='blue', linewidth=5)
plt.plot(df_Ridge_alpha_pol2.number_of_variables, adj_R_train_alpha_pol2,label = 'Adjusted R² Train  Alpha Pol2 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Adjusted Values')
plt.title('Ridge POLY 2 Adjusted R² Test/Train Values')
plt.legend()

plt.subplot(1,2,2)
plt.plot(df_Ridge_alpha_pol3.number_of_variables,adj_R_test_alpha_pol3,label  = 'Adjusted R² Test Alpha Pol3 Values', color='blue', linewidth=5)
plt.plot(df_Ridge_alpha_pol3.number_of_variables, adj_R_train_alpha_pol3,label = 'Adjusted R² Train Alpha Pol3 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Adjusted Values')
plt.title('Ridge POLY 3 Adjusted R² Test/Train Values')


plt.subplots_adjust()
plt.legend()
plt.show()

#### <font color="green"> The 2nd polynomial degree gives higher adjusted R² values compared with the 3rd polynomial degree.

#### <font color="green"> The Poly 2 MSE results keep the same trend up to the 125th variable, whereas the Poly 3 MSE results show that the trend deteriorates after the 200th variable. 
    
#### <font color="green"> Because it has the lower MSE value, I will continue with the 2nd polynomial degree Ridge model. Later on, we will also compare the R squared values.
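
The adjusted R² plotted above follows the formula coded inside `Ridge_model`. As a small sketch, the helper below (the name `adjusted_r2` and the numbers are hypothetical, chosen only for illustration) shows how the same R² is penalized more heavily as the number of features grows.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# Same R² and sample size, increasing number of features p.
for p in [10, 100, 400]:
    print(p, round(adjusted_r2(0.95, n_obs=500, n_features=p), 4))
```

Note that the formula is only meaningful while the number of features stays below `n_obs - 1`; close to that limit the adjusted value drops sharply.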


# 10.5 Building Lasso Regression Models

#### <font color="green"> While Ridge Regression minimizes the sum of the squared residuals plus lambda times the squared slope of the regression line, Lasso Regression minimizes the sum of the squared residuals plus lambda times the absolute value of the slope.
    
#### <font color="green">In contrast to Ridge, which shrinks the parameters but keeps all of them, Lasso Regression can eliminate parameters and so creates a simpler model to explain. Therefore, I would like to have the results of this model as well, to have a wider range of candidates for my prediction.
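
The difference can be seen on a small synthetic example. This is a sketch under assumed data (`X_demo`, `y_demo` and the alpha values are hypothetical), not a result from the Life Expectancy data.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (hypothetical): only the first two of eight columns matter.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(200, 8))
y_demo = 5.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Ridge keeps every coefficient (small but non-zero);
# Lasso sets the coefficients of the irrelevant columns exactly to zero.
print("zero coefficients, Ridge:", int(np.sum(ridge.coef_ == 0)))
print("zero coefficients, Lasso:", int(np.sum(lasso.coef_ == 0)))
print("Lasso coefficients      :", np.round(lasso.coef_, 3))
```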

In [None]:
from sklearn.linear_model import Lasso

In [None]:
def Lasso_model(df,pol, alpha):

    y = df['Life_Expectancy']
    Feature_list = df.columns[:500]
    
    MSE_list_test=[]
    R_list=[]
    adj_R_test=[]
    number_of_variables=[]
    MAE_list=[]
    RMSE_list=[]
    MAPE_list=[]
    R_train_list=[]
    adj_R_train=[]
    MSE_train_list=[]
    
    for variable in range(1,len(Feature_list)-1, pol**pol*2):
        selected_features =  Feature_list[:(-1*variable)]
        X_poly=df[selected_features]
        X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size = 0.2, random_state = 0)
                
        model_poly = Lasso(alpha= alpha) 
        model_poly.fit(X_train, y_train)  # fit once; the second fit was redundant
               
        y_pred  = model_poly.predict(X_test)
        
        y_pred_train = model_poly.predict(X_train)
      
        MSE_list_test.append(mse(y_test, y_pred))
        
        MSE_train_list.append(mse(y_train, y_pred_train))
        
        R_list.append(model_poly.score(X_test, y_test))
        adj_R_test.append(1 - (1-model_poly.score(X_test, y_test))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
        
        R_train_list.append(model_poly.score(X_train, y_train))
        adj_R_train.append(1 - (1-model_poly.score(X_train, y_train))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))
        
        number_of_variables.append(len(selected_features))
        MAE_list.append(mean_absolute_error(y_test, y_pred))
        
        RMSE_list.append(rmse(y_test, y_pred))
        
        MAPE_list.append(np.mean(np.abs((y_test-y_pred) / y_test)) * 100)
        
        
    model_means = list(zip(number_of_variables, R_list, adj_R_test, R_train_list, adj_R_train, MSE_list_test,MSE_train_list,MAE_list,RMSE_list,MAPE_list))
    
    poly_means = pd.DataFrame(model_means, columns= ['number_of_variables', 'R_list', 'adj_R_test', 'R_train_list', 'adj_R_train','MSE_list_test','MSE_train_list','MAE_list','RMSE_list','MAPE_list'])
    
    
    return poly_means, (y_pred,y_pred_train, X_train,y_train, X_test, y_test, model_poly, MSE_list_test,MSE_train_list)

In [None]:
%%time
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 1, 10, 100, 1000]:
    df, _  = Lasso_model(df_poly_transform2,2,alpha)
    print(alpha, df.MSE_list_test.min())

In [None]:
# The best model option: minimum MSE_test value at alpha 10⁻⁶ with the 2nd polynomial degree.

df_Lasso_alpha_pol2, degerler1_2 = Lasso_model(df_poly_transform2,2,0.000001)

In [None]:
df_Lasso_alpha_pol2.head()

In [None]:
%%time
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 1, 10, 100, 1000]:
    df, _  = Lasso_model(df_poly_transform3,3,alpha)
    print(alpha, df.MSE_list_test.min())

In [None]:
# The Best Model option with minimum MSE_test Value on Alpha 10³ and polynomial 3 degree.

df_Lasso_alpha_pol3, degerler1_3 = Lasso_model(df_poly_transform3,3,1000)

In [None]:
df_Lasso_alpha_pol3.head()

In [None]:
MSE_list_test_Lasso_alpha_pol2  = df_Lasso_alpha_pol2['MSE_list_test']
MSE_train_test_Lasso_alpha_pol2 = df_Lasso_alpha_pol2['MSE_train_list']
MSE_list_test_Lasso_alpha_pol3  = df_Lasso_alpha_pol3['MSE_list_test']
MSE_train_test_Lasso_alpha_pol3 = df_Lasso_alpha_pol3['MSE_train_list']

In [None]:
plt.figure(1, figsize = (15,8))

plt.subplot(1,2,1)
plt.plot(df_Lasso_alpha_pol2.number_of_variables,MSE_list_test_Lasso_alpha_pol2, label = 'MSE Test  Alpha Pol2 Values', color='blue', linewidth=5)
plt.plot(df_Lasso_alpha_pol2.number_of_variables,MSE_train_test_Lasso_alpha_pol2,label = 'MSE Train  Alpha Pol2 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values')
plt.title('Lasso POLY 2 MSE Test/Train Values')
plt.legend()

plt.subplot(1,2,2)
plt.plot(df_Lasso_alpha_pol3.number_of_variables, MSE_list_test_Lasso_alpha_pol3,label = 'MSE Test  Alpha Pol3 Values', color='blue', linewidth=5)
plt.plot(df_Lasso_alpha_pol3.number_of_variables, MSE_train_test_Lasso_alpha_pol3,label = 'MSE Train  Alpha Pol3 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values')
plt.title('Lasso POLY 3 MSE Test/Train Values')
plt.legend()

plt.subplots_adjust()

plt.show()

In [None]:
adj_R_test_Lasso_alpha_pol2  = df_Lasso_alpha_pol2['adj_R_test']
adj_R_train_Lasso_alpha_pol2 = df_Lasso_alpha_pol2['adj_R_train']
adj_R_test_Lasso_alpha_pol3  = df_Lasso_alpha_pol3['adj_R_test']
adj_R_train_Lasso_alpha_pol3 = df_Lasso_alpha_pol3['adj_R_train']

plt.figure(1, figsize = (15,8))

plt.subplot(1,2,1)
plt.plot(df_Lasso_alpha_pol2.number_of_variables,adj_R_test_Lasso_alpha_pol2, label = 'Adjusted R² Test  Alpha Pol2 Values', color='blue', linewidth=5)
plt.plot(df_Lasso_alpha_pol2.number_of_variables,adj_R_train_Lasso_alpha_pol2,label = 'Adjusted R² Train  Alpha Pol2 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Adjusted R² Values')
plt.title('Lasso POLY 2 Adjusted R² Test/Train Values')
plt.legend()

plt.subplot(1,2,2)
plt.plot(df_Lasso_alpha_pol3.number_of_variables, adj_R_test_Lasso_alpha_pol3,label = 'Adjusted R² Test  Alpha Pol3 Values', color='blue', linewidth=5)
plt.plot(df_Lasso_alpha_pol3.number_of_variables, adj_R_train_Lasso_alpha_pol3,label = 'Adjusted R² Train  Alpha Pol3 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Adjusted R² Values')
plt.title('Lasso POLY 3 Adjusted R² Test/Train Values')
plt.legend()

plt.subplots_adjust()

plt.show()



#### <font color="green"> The Lasso model with the 2nd polynomial degree gives higher adjusted R² values than with the 3rd polynomial degree. With the same variables, the 3rd polynomial degree is not a good option for our regression model. 

#### <font color="green"> As we see, the Poly 2 model breaks down after the 130th variable, and the train and test values lose their trend in the Poly 3 model after the 400th variable. 
    
#### <font color="green"> Because the Lasso model can eliminate features while fitting, overfitting does not occur as it did before.

# 10.6 Building ElasticNet Regression Models

#### <font color="green"> Elastic Net regression is a mix of the Ridge and Lasso regression models, suited to a large data set where not all elements need to be kept in the model. Like Lasso, it can also eliminate unnecessary variables. 
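
In scikit-learn the mix is controlled by `l1_ratio`: the penalty is `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`, so `l1_ratio=1` is the Lasso penalty and smaller values move towards a Ridge-style penalty. The sketch below uses synthetic data (`X_demo`, `y_demo` and the alpha value are assumptions for illustration); the models in this notebook use `l1_ratio=0.5`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data (hypothetical): only the first two of eight columns matter.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(200, 8))
y_demo = 5.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Compare how many coefficients are zeroed for different mixes.
for l1_ratio in [0.1, 0.5, 1.0]:
    enet = ElasticNet(alpha=0.5, l1_ratio=l1_ratio).fit(X_demo, y_demo)
    print(l1_ratio, "zeroed:", int(np.sum(enet.coef_ == 0)),
          "coef[0]:", round(float(enet.coef_[0]), 3))

# With l1_ratio=1.0 the Elastic Net reduces to the Lasso.
lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X_demo, y_demo)
print("same as Lasso:", np.allclose(lasso.coef_, enet_l1.coef_))
```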

In [None]:
from sklearn.linear_model import ElasticNet

In [None]:
def ElasticNet_model(df,pol, alpha):

    y = df['Life_Expectancy']
    Feature_list = df.columns[:500]
    
    MSE_list_test=[]
    R_list=[]
    adj_R_test=[]
    number_of_variables=[]
    MAE_list=[]
    RMSE_list=[]
    MAPE_list=[]
    R_train_list=[]
    adj_R_train=[]
    MSE_train_list=[]
    
    for variable in range(1,len(Feature_list)-1, pol**pol*2):
        selected_features =  Feature_list[:(-1*variable)]
        X_poly=df[selected_features]
        X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size = 0.2, random_state = 0)
                
        model_poly = ElasticNet(alpha=alpha, l1_ratio=0.5)
        model_poly.fit(X_train, y_train)  # fit once; the second fit was redundant
               
        y_pred  = model_poly.predict(X_test)
        
        y_pred_train = model_poly.predict(X_train)
      
        MSE_list_test.append(mse(y_test, y_pred))
        
        MSE_train_list.append(mse(y_train, y_pred_train))
        
        R_list.append(model_poly.score(X_test, y_test))
        adj_R_test.append(1 - (1-model_poly.score(X_test, y_test))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
        
        R_train_list.append(model_poly.score(X_train, y_train))
        adj_R_train.append(1 - (1-model_poly.score(X_train, y_train))*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))
                
        number_of_variables.append(len(selected_features))
        MAE_list.append(mean_absolute_error(y_test, y_pred))
        
        RMSE_list.append(rmse(y_test, y_pred))
        
        MAPE_list.append(np.mean(np.abs((y_test-y_pred) / y_test)) * 100)
        
        
    model_means = list(zip(number_of_variables, R_list, adj_R_test, R_train_list, adj_R_train, MSE_list_test,MSE_train_list,MAE_list,RMSE_list,MAPE_list))
    
    poly_means = pd.DataFrame(model_means, columns= ['number_of_variables', 'R_list', 'adj_R_test', 'R_train_list', 'adj_R_train', 'MSE_list_test','MSE_train_list','MAE_list','RMSE_list','MAPE_list'])
    
    
    return poly_means, (y_pred,y_pred_train, X_train,y_train, X_test, y_test, model_poly, MSE_list_test,MSE_train_list)

In [None]:
%%time
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 1, 10, 100, 1000]:
    df, _  = ElasticNet_model(df_poly_transform3,3,alpha)
    print(alpha, df.MSE_list_test.min())

In [None]:
# The best model: minimum MSE_test value at alpha 10⁻⁵ with the 3rd polynomial degree.
df_ElasticNet_alpha_pol3, degerler1_3 = ElasticNet_model(df_poly_transform3,3,0.00001)

In [None]:
%%time
for alpha in [0.000001, 0.00001, 0.0001, 0.001, 0.01, 1, 10, 100, 1000]:
    df, _  = ElasticNet_model(df_poly_transform2,2,alpha)
    print(alpha, df.MSE_list_test.min())

In [None]:
# The best model: minimum MSE_test value at alpha 10⁻⁶ with the 2nd polynomial degree.

df_ElasticNet_alpha_pol2, degerler1_2 = ElasticNet_model(df_poly_transform2,2,0.000001)

In [None]:
MSE_list_test_ElasticNet_alpha_pol2  = df_ElasticNet_alpha_pol2['MSE_list_test']
MSE_list_train_ElasticNet_alpha_pol2 = df_ElasticNet_alpha_pol2['MSE_train_list']
MSE_list_test_ElasticNet_alpha_pol3  = df_ElasticNet_alpha_pol3['MSE_list_test']
MSE_list_train_ElasticNet_alpha_pol3 = df_ElasticNet_alpha_pol3['MSE_train_list']

In [None]:
plt.figure(1, figsize = (15,8))

plt.subplot(1,2,1)
plt.plot(df_ElasticNet_alpha_pol2.number_of_variables,MSE_list_test_ElasticNet_alpha_pol2, label = 'MSE Test  Alpha Pol2 Values', color='blue', linewidth=5)
plt.plot(df_ElasticNet_alpha_pol2.number_of_variables,MSE_list_train_ElasticNet_alpha_pol2,label = 'MSE Train  Alpha Pol2 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values')
plt.title('POLY 2 MSE Test/Train Values')
plt.legend()

plt.subplot(1,2,2)
plt.plot(df_ElasticNet_alpha_pol3.number_of_variables, MSE_list_test_ElasticNet_alpha_pol3,label = 'MSE Test  Alpha Pol3 Values', color='blue', linewidth=5)
plt.plot(df_ElasticNet_alpha_pol3.number_of_variables, MSE_list_train_ElasticNet_alpha_pol3,label = 'MSE Train  Alpha Pol3 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Values')
plt.title('POLY 3 MSE Test/Train Values')
plt.legend()

plt.subplots_adjust()

plt.show()

In [None]:
adj_R_test_ElasticNet_alpha_pol2  = df_ElasticNet_alpha_pol2['adj_R_test']
adj_R_train_ElasticNet_alpha_pol2 = df_ElasticNet_alpha_pol2['adj_R_train']
adj_R_test_ElasticNet_alpha_pol3  = df_ElasticNet_alpha_pol3['adj_R_test']
adj_R_train_ElasticNet_alpha_pol3 = df_ElasticNet_alpha_pol3['adj_R_train']


plt.figure(1, figsize = (15,8))

plt.subplot(1,2,1)
plt.plot(df_ElasticNet_alpha_pol2.number_of_variables,adj_R_test_ElasticNet_alpha_pol2, label = 'Adjusted R² Test  Alpha Pol2 Values', color='blue', linewidth=5)
plt.plot(df_ElasticNet_alpha_pol2.number_of_variables,adj_R_train_ElasticNet_alpha_pol2,label = 'Adjusted R² Train  Alpha Pol2 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Adjusted R²')
plt.title('Elastic Net POLY 2 Adjusted R² Test/Train Values')
plt.legend()

plt.subplot(1,2,2)
plt.plot(df_ElasticNet_alpha_pol3.number_of_variables, adj_R_test_ElasticNet_alpha_pol3,label = 'Adjusted R² Test  Alpha Pol3 Values', color='blue', linewidth=5)
plt.plot(df_ElasticNet_alpha_pol3.number_of_variables, adj_R_train_ElasticNet_alpha_pol3,label = 'Adjusted R² Train  Alpha Pol3 Values', color='red', linewidth=5)
plt.xlabel('Number of Variable')
plt.ylabel('Adjusted R²')
plt.title('Elastic Net POLY 3 Adjusted R² Test/Train Values')
plt.legend()

plt.subplots_adjust()

plt.show()

#### <font color="green"> This model gives its best adjusted R² values with the 2nd polynomial degree.

# <div align="center">  11. Evaluating the Model

#### Comparing All Results of our Models in one BarPlot

In [None]:
plt.figure(figsize=(25,20))

objects=('df_pol1', 'df_pol2', 'df_pol3',
           'df_Ridge_alpha_pol2', 'df_Ridge_alpha_pol3',
           'df_Lasso_alpha_pol2', 'df_Lasso_alpha_pol3',
           'df_ElasticNet_alpha_pol2', 'df_ElasticNet_alpha_pol3' )

y_pos = np.arange(len(objects)) 
performance  =[df_pol1.MSE_list_test.min() ,df_pol2.MSE_list_test.min(), df_pol3.MSE_list_test.min(),
               df_Ridge_alpha_pol2.MSE_list_test.min(),df_Ridge_alpha_pol3.MSE_list_test.min(),
               df_Lasso_alpha_pol2.MSE_list_test.min(), df_Lasso_alpha_pol3.MSE_list_test.min(),
               df_ElasticNet_alpha_pol2.MSE_list_test.min(), df_ElasticNet_alpha_pol3.MSE_list_test.min()]

performance2 =[df_pol1.MSE_train_list.min(), df_pol2.MSE_train_list.min(), df_pol3.MSE_train_list.min(),
               df_Ridge_alpha_pol2.MSE_train_list.min(),df_Ridge_alpha_pol3.MSE_train_list.min(),
               df_Lasso_alpha_pol2.MSE_train_list.min(), df_Lasso_alpha_pol3.MSE_train_list.min(),
               df_ElasticNet_alpha_pol2.MSE_train_list.min(), df_ElasticNet_alpha_pol3.MSE_train_list.min()]
               
               
performance3 = [df_pol1.R_list.max() ,df_pol2.R_list.max(), df_pol3.R_list.max(),
               df_Ridge_alpha_pol2.R_list.max(),df_Ridge_alpha_pol3.R_list.max(),
               df_Lasso_alpha_pol2.R_list.max(), df_Lasso_alpha_pol3.R_list.max(),
               df_ElasticNet_alpha_pol2.R_list.max(), df_ElasticNet_alpha_pol3.R_list.max()]

performance4 = [df_pol1.adj_R_test.max() ,df_pol2.adj_R_test.max(), df_pol3.adj_R_test.max(),
               df_Ridge_alpha_pol2.adj_R_test.max(),df_Ridge_alpha_pol3.adj_R_test.max(),
               df_Lasso_alpha_pol2.adj_R_test.max(), df_Lasso_alpha_pol3.adj_R_test.max(),
               df_ElasticNet_alpha_pol2.adj_R_test.max(), df_ElasticNet_alpha_pol3.adj_R_test.max()]

plt.subplot(411)
plt.bar(y_pos, performance, align='center')
plt.xticks(y_pos, objects,size=13)

plt.ylabel('MSE Values',size=15)
plt.title('MSE TEST Values \n', fontsize=15)


plt.subplots_adjust()
plt.subplot(412)
plt.bar(y_pos, performance2, align='center')
plt.xticks(y_pos, objects,size=13)

plt.ylabel('MSE Values',size=15)
plt.title('MSE TRAIN Values \n', size = 15)

plt.subplot(413)
plt.bar(y_pos, performance3, align='center')
plt.xticks(y_pos, objects,size=13)
plt.title('R Squared Values \n', size = 15)

plt.ylabel('R Squared Values',size=15)

plt.subplot(414)
plt.bar(y_pos, performance4, align='center')
plt.xticks(y_pos, objects,size=13)
plt.title('Adjusted R Squared Values \n', size = 15)

plt.ylabel('Adjusted R Squared Values',size=15)


plt.subplots_adjust()
plt.show()

#### Getting the Best Values of Each Model in One Data Frame

In [None]:
objects =(df_pol1, df_pol2, df_pol3,
             df_Ridge_alpha_pol2, df_Ridge_alpha_pol3,
             df_Lasso_alpha_pol2, df_Lasso_alpha_pol3,
             df_ElasticNet_alpha_pol2, df_ElasticNet_alpha_pol3)

# DataFrame.append was removed in pandas 2.0; take each model's best row and concatenate.
df_results = pd.concat([df.sort_values(by='MSE_list_test').head(1) for df in objects],
                       ignore_index=True)
    

df_results['Model'] = ['Linear Regression (Polynomial 1)',
                           'Linear Regression (Polynomial 2)',
                           'Linear Regression (Polynomial 3)',
                           'Ridge Regression (Polynomial 2)',
                           'Ridge Regression (Polynomial 3)',
                           'Lasso Regression (Polynomial 2)',
                           'Lasso Regression (Polynomial 3)',                           
                           'ElasticNet Regression(Polynomial 2)',
                           'ElasticNet Regression(Polynomial 3)']
    
df_results.sort_values('MSE_list_test')[['Model', 'number_of_variables', 'MSE_list_test','MSE_train_list', 'R_list','adj_R_test', 'adj_R_train']]

### <font color="green"> After trying different types of regression models, we get the minimum MSE and the best adjusted R² values from Linear Regression and Ridge Regression with the 2nd polynomial degree. The polynomial degree has the same effect on the values across the different types of regression models. 


### <font color="green"> The lowest MSE values and the highest adjusted R² came from the models with the 2nd polynomial degree. Applying the other types of regression with the 3rd polynomial degree only raised the test MSE values. Therefore, I choose the Ridge Regression with the 2nd polynomial degree.

### <font color="green">  **After selecting the Ridge Regression with the 2nd polynomial degree and alpha 0.000001 as the best model, we will check its performance by applying its coefficients to the variables of an example row.**
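
"Applying coefficients on each variable" means that a linear model's prediction is the dot product of the feature values with the fitted coefficients, plus the intercept. A minimal sketch on synthetic data (all names and numbers here are hypothetical) shows that this manual calculation matches `predict`:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (hypothetical), with the same tiny alpha as the selected model.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([1.0, 2.0, -1.0, 0.5]) + 60.0

model = Ridge(alpha=1e-6).fit(X_demo, y_demo)

row = X_demo[:1]                                  # one example row, shape (1, 4)
manual = row @ model.coef_ + model.intercept_     # coefficients applied by hand
print(manual, model.predict(row))
```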

# <div align="center">  12. Predicting with the Best Model

#### An Example from a Random Row to Check the Model Performance

In [None]:
# As the graph of this model shows, the best performance starts after the 125th variable.
# Thus, I selected the first 126 variables from our model.

df_Ridge_alpha_pol2[df_Ridge_alpha_pol2['number_of_variables']== 126 ]

In [None]:
# The model stored at position 5 of the results table, used below with row 5 of our data set as an example:

Selected_Model = df_Ridge_alpha_pol2.iloc[5].model_list

In [None]:
# Here are the first 5 coefficients from our model. 

Selected_Model.coef_[:5]

In [None]:
# Converting our values to a dictionary for the next step.
LifeExpectancyData_num.iloc[5].to_dict()

In [None]:
#Creating a dictionary to have values for each variables.

dictionary = {'Year': 2010.0,
 'Adult_Mortality': 279.0,
 'infant_deaths': 74.0,
 'Alcohol': 0.01,
 'percentage_expenditure': 79.67936736,
 'Hepatitis_B': 66.0,
 'Measles': 1989.0,
 'BMI': 16.7,
 'under_five_deaths': 102.0,
 'Polio': 66.0,
 'Total_Expenditure': 9.2,
 'Diphtheria': 66.0,
 'HIV/AIDS': 0.1,
 'GDP': 553.32894,
 'thinness_1_19_years': 16.6,
 'thinness_5_9_years': 6.9,
 'Income_composition_of_resources': 0.45,
 'Schooling': 9.2}

In [None]:
Example = np.array(list(dictionary.values())).reshape(1,-1)
poly = PolynomialFeatures(2)
df = LifeExpectancyData_num.drop('Life_Expectancy', axis=1)
poly.fit(df)  # only the fitted transformer is needed here

# get_feature_names_out requires scikit-learn >= 1.0 (older versions: get_feature_names)
df_example = pd.DataFrame(poly.transform(Example), columns= poly.get_feature_names_out(df.columns))

df_Ridge_alpha_pol2, degerler1_2 = Ridge_model(df_poly_transform2,2,0.000001)
selected_features = df_Ridge_alpha_pol2.iloc[5]['feature_list']
selected_model = df_Ridge_alpha_pol2.iloc[5]['model_list']

# Predict with the model and the feature list taken from the same results row
selected_model.predict(df_example[selected_features]) 

# <div align="center">  13. Conclusions

### <font color="green">  We can see that with the following values: 'Year': 2010,  'Adult_Mortality': 279.0,  'infant_deaths': 74.0, 'Alcohol': 0.01, 'percentage_expenditure': 79.67936736, 'Hepatitis_B': 66.0, 'Measles': 1989.0, 'BMI': 16.7, 'under_five_deaths': 102.0, 'Polio': 66.0, 'Total_Expenditure': 9.2, 'Diphtheria': 66.0, 'HIV/AIDS': 0.1, 'GDP': 553.32894, 'thinness_1_19_years': 16.6, 'thinness_5_9_years': 6.9, 'Income_composition_of_resources': 0.45, 'Schooling': 9.2, the model predicts a Life Expectancy of about 61. 
 
### <font color="green">  The actual value of Life Expectancy was 58.8 in 2010. The test MSE is 6.367, which corresponds to an RMSE of about ±2.52. Subtracting 2.52 from the predicted 61 gives around 58, close to the real value of Life Expectancy.
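
The arithmetic can be checked with a few lines. The numbers are the ones quoted in the text; the prediction of 61 is a rounded value, so the result should be read as approximate.

```python
import math

mse_test = 6.367     # test MSE quoted above
prediction = 61.0    # predicted Life Expectancy (rounded, as quoted)
actual = 58.8        # actual Life Expectancy for the example row

rmse_test = math.sqrt(mse_test)   # RMSE is the square root of MSE
error = prediction - actual       # error of this single prediction

print("RMSE             :", round(rmse_test, 2))
print("prediction error :", round(error, 2))
print("within one RMSE  :", abs(error) <= rmse_test)
```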
    
### <font color="green">  Regression models help us to predict our dependent variable using many parameters. To get an accurate result, we need to check as many regression models as possible. Having the lowest MSE and the highest R squared values guides us on our way. 
 