### **Read and Explore the dataset**



>The dataset on life expectancy and health factors for 193 countries was collected from the WHO data repository, and the corresponding economic data was collected from the United Nations website. Among all categories of health-related factors, only the most representative critical factors were chosen.

>This dataset covers the years 2000-2015 for 193 countries. The individual data files were merged into a single dataset. Initial visual inspection of the data showed some missing values.


In [None]:
import pandas as pd
df = pd.read_csv('/content/LifeExpectancyData.csv')
df.head()

| | Country | Year | Status | Life expectancy | Adult Mortality | infant deaths | Alcohol | percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | thinness 1-19 years | thinness 5-9 years | Income composition of resources | Schooling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.25921 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
| 1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
| 2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.47 | 9.9 |
| 3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
| 4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |
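
The missing values noted during the initial inspection can be quantified before any cleaning. The sketch below uses a small made-up frame (the `toy` name and its values are illustrative, not taken from the dataset) to show one way of listing both the count and the share of missing entries per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Alcohol': [0.01, np.nan, 4.5, np.nan],
    'GDP': [584.3, 612.7, np.nan, 669.9],
})

# Count and percentage of missing values per column, worst column first.
missing = pd.DataFrame({
    'missing': toy.isnull().sum(),
    'percent': (toy.isnull().mean() * 100).round(1),
}).sort_values('missing', ascending=False)

print(missing)
```

Sorting by the count puts the columns that need the most attention at the top.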


### **Predictive Model**

>Compare different prediction models across different feature selection techniques (using charts or a table).

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.feature_selection import VarianceThreshold
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.feature_selection import RFE
from sklearn.feature_selection import SelectFromModel

In [None]:
# Data Cleaning Process.

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                     10
Adult Mortality                     10
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
 BMI                                34
under-five deaths                    0
Polio                               19
Total expenditure                  226
Diphtheria                          19
 HIV/AIDS                            0
GDP                                448
Population                         652
 thinness  1-19 years               34
 thinness 5-9 years                 34
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df.columns = df.columns.str.strip()

In [None]:
df['Life expectancy'] = df['Life expectancy'].fillna(df['Life expectancy'].mean())
df['Life expectancy'].isnull().sum()

0
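
The mean imputation above is repeated column by column in the cells that follow. As a sketch only, the same idea can be written as a loop that assigns the filled column back, rather than calling `fillna(..., inplace=True)` on a column selection, which recent pandas versions warn about. The `toy` frame and the `mean_cols` list are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps.
toy = pd.DataFrame({
    'Life expectancy': [65.0, np.nan, 59.9],
    'Adult Mortality': [263.0, 271.0, np.nan],
    'BMI': [19.1, np.nan, 17.6],
})

# Columns to fill with their own mean (an assumed selection for this sketch).
mean_cols = ['Life expectancy', 'Adult Mortality', 'BMI']
for col in mean_cols:
    # Assign the result back instead of modifying a column selection in place.
    toy[col] = toy[col].fillna(toy[col].mean())

print(toy.isnull().sum())
```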

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                     10
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                 34
under-five deaths                    0
Polio                               19
Total expenditure                  226
Diphtheria                          19
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                34
thinness 5-9 years                  34
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df['Adult Mortality'] = df['Adult Mortality'].fillna(df['Adult Mortality'].mean())
df['Adult Mortality'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                 34
under-five deaths                    0
Polio                               19
Total expenditure                  226
Diphtheria                          19
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                34
thinness 5-9 years                  34
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df['BMI'] = df['BMI'].fillna(df['BMI'].mean())
df['BMI'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                               19
Total expenditure                  226
Diphtheria                          19
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                34
thinness 5-9 years                  34
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df['Polio']

0        6.0
1       58.0
2       62.0
3       67.0
4       68.0
        ... 
2933    67.0
2934     7.0
2935    73.0
2936    76.0
2937    78.0
Name: Polio, Length: 2938, dtype: float64

In [None]:
df['Polio'].value_counts()

Polio
99.0    376
98.0    255
96.0    207
97.0    205
95.0    180
       ... 
48.0      2
39.0      2
23.0      1
17.0      1
33.0      1
Name: count, Length: 73, dtype: int64

In [None]:
df['Polio'] = df['Polio'].fillna(round(df['Polio'].mean(), 0))
df['Polio'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                  226
Diphtheria                          19
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                34
thinness 5-9 years                  34
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df['Diphtheria']

0       65.0
1       62.0
2       64.0
3       67.0
4       68.0
        ... 
2933    65.0
2934    68.0
2935    71.0
2936    75.0
2937    78.0
Name: Diphtheria, Length: 2938, dtype: float64

In [None]:
df['Diphtheria'].value_counts()

Diphtheria
99.0    350
98.0    254
97.0    205
96.0    201
95.0    200
       ... 
16.0      1
56.0      1
21.0      1
19.0      1
27.0      1
Name: count, Length: 81, dtype: int64

In [None]:
df['Diphtheria'] = df['Diphtheria'].fillna(round(df['Diphtheria'].mean(), 0))
df['Diphtheria'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                  226
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                34
thinness 5-9 years                  34
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df['thinness  1-19 years']

0       17.2
1       17.5
2       17.7
3       17.9
4       18.2
        ... 
2933     9.4
2934     9.8
2935     1.2
2936     1.6
2937    11.0
Name: thinness  1-19 years, Length: 2938, dtype: float64

In [None]:
df['thinness  1-19 years'].value_counts()

thinness  1-19 years
1.0     74
1.9     65
0.8     64
0.7     63
1.2     62
        ..
16.5     1
16.7     1
16.9     1
17.1     1
15.8     1
Name: count, Length: 200, dtype: int64

In [None]:
df['thinness  1-19 years'] = df['thinness  1-19 years'].fillna(df['thinness  1-19 years'].mean())
df['thinness  1-19 years'].isnull().sum()

0

In [None]:
df['thinness 5-9 years']

0       17.3
1       17.5
2       17.7
3       18.0
4       18.2
        ... 
2933     9.4
2934     9.9
2935     1.3
2936     1.7
2937    11.2
Name: thinness 5-9 years, Length: 2938, dtype: float64

In [None]:
df['thinness 5-9 years'].value_counts()

thinness 5-9 years
0.9     69
1.1     67
0.5     63
1.9     63
1.0     62
        ..
16.9     1
17.2     1
17.4     1
17.6     1
27.9     1
Name: count, Length: 207, dtype: int64

In [None]:
df['thinness 5-9 years'] = df['thinness 5-9 years'].fillna(df['thinness 5-9 years'].mean())
df['thinness 5-9 years'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                            194
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                  226
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df.head()

| | Country | Year | Status | Life expectancy | Adult Mortality | infant deaths | Alcohol | percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | thinness 1-19 years | thinness 5-9 years | Income composition of resources | Schooling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.25921 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
| 1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
| 2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.47 | 9.9 |
| 3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
| 4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |


In [None]:
df['Alcohol'].value_counts()

Alcohol
0.01    288
0.03     15
0.04     13
0.02     12
0.09     12
       ... 
4.33      1
7.09      1
5.54      1
3.46      1
4.57      1
Name: count, Length: 1076, dtype: int64

In [None]:
df['Alcohol'] = df['Alcohol'].fillna(df['Alcohol'].mode()[0])
df['Alcohol'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                              0
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                  226
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources    167
Schooling                          163
dtype: int64

In [None]:
df['Schooling']

0       10.1
1       10.0
2        9.9
3        9.8
4        9.5
        ... 
2933     9.2
2934     9.5
2935    10.0
2936     9.8
2937     9.8
Name: Schooling, Length: 2938, dtype: float64

In [None]:
df['Schooling'].value_counts()

Schooling
12.9    58
13.3    52
12.5    49
12.8    46
12.3    44
        ..
20.7     1
19.8     1
3.4      1
3.6      1
2.8      1
Name: count, Length: 173, dtype: int64

In [None]:
df['Schooling'] = df['Schooling'].fillna(df['Schooling'].mean())
df['Schooling'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                              0
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                  226
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources    167
Schooling                            0
dtype: int64

In [None]:
df['Income composition of resources']

0       0.479
1       0.476
2       0.470
3       0.463
4       0.454
        ...  
2933    0.407
2934    0.418
2935    0.427
2936    0.427
2937    0.434
Name: Income composition of resources, Length: 2938, dtype: float64

In [None]:
df['Income composition of resources'].value_counts()

Income composition of resources
0.000    130
0.700     17
0.739     13
0.714     12
0.636     12
        ... 
0.933      1
0.930      1
0.925      1
0.347      1
0.460      1
Name: count, Length: 625, dtype: int64

In [None]:
df['Income composition of resources'] = df['Income composition of resources'].fillna(df['Income composition of resources'].mode()[0])
df['Income composition of resources'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                              0
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                  226
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources      0
Schooling                            0
dtype: int64

In [None]:
df['Total expenditure']

0       8.16
1       8.18
2       8.13
3       8.52
4       7.87
        ... 
2933    7.13
2934    6.52
2935    6.53
2936    6.16
2937    7.10
Name: Total expenditure, Length: 2938, dtype: float64

In [None]:
df['Total expenditure'].value_counts()

Total expenditure
4.60     15
6.70     12
5.60     11
9.10     10
3.40     10
         ..
12.24     1
12.23     1
13.66     1
9.00      1
3.52      1
Name: count, Length: 818, dtype: int64

In [None]:
df['Total expenditure'] = df['Total expenditure'].fillna(df['Total expenditure'].mean())
df['Total expenditure'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                              0
percentage expenditure               0
Hepatitis B                        553
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                    0
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources      0
Schooling                            0
dtype: int64

In [None]:
df['Hepatitis B']

0       65.0
1       62.0
2       64.0
3       67.0
4       68.0
        ... 
2933    68.0
2934     7.0
2935    73.0
2936    76.0
2937    79.0
Name: Hepatitis B, Length: 2938, dtype: float64

In [None]:
pd.set_option('display.max_rows', None)

In [None]:
df['Hepatitis B'].value_counts()

Hepatitis B
99.0    240
98.0    210
96.0    167
97.0    155
95.0    149
94.0    127
93.0    101
92.0     92
91.0     75
89.0     71
9.0      65
88.0     65
83.0     44
87.0     42
84.0     41
82.0     39
8.0      39
86.0     35
81.0     35
85.0     31
75.0     30
78.0     30
77.0     27
64.0     27
76.0     22
73.0     22
74.0     22
79.0     21
7.0      20
67.0     18
72.0     17
6.0      17
66.0     17
62.0     17
65.0     16
63.0     15
68.0     13
71.0     11
42.0     11
69.0     11
5.0       9
61.0      8
47.0      8
46.0      7
57.0      7
14.0      7
56.0      7
59.0      6
48.0      6
49.0      6
51.0      6
55.0      5
44.0      5
39.0      5
52.0      5
28.0      5
43.0      5
4.0       4
45.0      4
58.0      4
53.0      4
41.0      4
54.0      4
2.0       4
22.0      3
31.0      3
29.0      3
36.0      3
27.0      3
35.0      3
38.0      2
33.0      2
25.0      2
17.0      2
24.0      2
37.0      2
18.0      2
21.0      2
26.0      1
16.0      1
23.0      1
1.0       1
15.0

In [None]:
df['Hepatitis B'] = df['Hepatitis B'].fillna(df['Hepatitis B'].mean())
df['Hepatitis B'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                              0
percentage expenditure               0
Hepatitis B                          0
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                    0
Diphtheria                           0
HIV/AIDS                             0
GDP                                448
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources      0
Schooling                            0
dtype: int64

In [None]:
pd.set_option('display.max_rows', 25)

In [None]:
df['GDP'].value_counts()

GDP
584.259210     1
354.818600     1
358.997310     1
43.646498      1
416.148380     1
              ..
4274.376857    1
4142.869175    1
3725.632210    1
2964.477340    1
547.358878     1
Name: count, Length: 2490, dtype: int64

In [None]:
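import random as rd
# Fill every missing GDP value with one value drawn at random from the observed ones.
# Converting to a list makes random.choice index by position, not by DataFrame label.
df['GDP'] = df['GDP'].fillna(rd.choice(df['GDP'].dropna().tolist()))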
df['GDP'].isnull().sum()

0
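
The cell above fills every missing `GDP` entry with the same randomly chosen value, and the result changes from run to run. One possible variation, shown here on made-up data, draws a separate observed value for each missing row from a seeded NumPy generator so that the fill is reproducible. The `toy` series and the seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the draw is repeatable
toy = pd.Series([584.3, np.nan, 631.7, np.nan, 63.5], name='GDP')

observed = toy.dropna().to_numpy()  # pool of values to draw from
mask = toy.isnull()                 # rows that need a value

filled = toy.copy()
# One independent draw per missing row.
filled[mask] = rng.choice(observed, size=int(mask.sum()))

print(filled)
```

Random draws keep the spread of the observed values, but they ignore which country and year a row belongs to.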

In [None]:
df.isnull().sum()

Country                              0
Year                                 0
Status                               0
Life expectancy                      0
Adult Mortality                      0
infant deaths                        0
Alcohol                              0
percentage expenditure               0
Hepatitis B                          0
Measles                              0
BMI                                  0
under-five deaths                    0
Polio                                0
Total expenditure                    0
Diphtheria                           0
HIV/AIDS                             0
GDP                                  0
Population                         652
thinness  1-19 years                 0
thinness 5-9 years                   0
Income composition of resources      0
Schooling                            0
dtype: int64

In [None]:
df['Population'].value_counts()

Population
444.0         4
292.0         2
127445.0      2
26868.0       2
1141.0        2
             ..
4136.0        1
482.0         1
43.0          1
3978.0        1
12222251.0    1
Name: count, Length: 2278, dtype: int64

In [None]:
# Same approach as for GDP: one random observed value, chosen by position from a list
df['Population'] = df['Population'].fillna(rd.choice(df['Population'].dropna().tolist()))
df['Population'].isnull().sum()

0

In [None]:
df.isnull().sum()

Country                            0
Year                               0
Status                             0
Life expectancy                    0
Adult Mortality                    0
infant deaths                      0
Alcohol                            0
percentage expenditure             0
Hepatitis B                        0
Measles                            0
BMI                                0
under-five deaths                  0
Polio                              0
Total expenditure                  0
Diphtheria                         0
HIV/AIDS                           0
GDP                                0
Population                         0
thinness  1-19 years               0
thinness 5-9 years                 0
Income composition of resources    0
Schooling                          0
dtype: int64

In [None]:
# dropping target 'Life expectancy' & categorical columns
X = df.drop(['Life expectancy', 'Country', 'Status'], axis = 1)
y = df['Life expectancy']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 0)

In [None]:
# First feature selection method: variance threshold
variances = X.var()
var_selector = VarianceThreshold()
var_selector.fit(X_train)
var_selected_features = X.columns[var_selector.get_support()]

print('Selected Features and Variance: ', len(var_selected_features))

Selected Features and Variance:  19
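
With its default `threshold=0.0`, `VarianceThreshold` removes only constant columns, which would explain why all 19 features are retained above. The sketch below illustrates this on a made-up frame with one constant column; the column names, values, and the stricter threshold of `0.01` are illustrative assumptions.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy frame: one varying, one constant, one nearly constant column.
toy = pd.DataFrame({
    'varying': [1.0, 2.0, 3.0, 4.0],
    'constant': [5.0, 5.0, 5.0, 5.0],
    'almost_constant': [0.0, 0.0, 0.0, 0.1],
})

# Default threshold (0.0): only zero-variance columns are dropped.
selector = VarianceThreshold()
selector.fit(toy)
kept_default = list(toy.columns[selector.get_support()])

# A stricter threshold also drops the nearly constant column.
stricter = VarianceThreshold(threshold=0.01).fit(toy)
kept_strict = list(toy.columns[stricter.get_support()])

print(kept_default)
print(kept_strict)
```

Because variance depends on a feature's units, a non-zero threshold is easier to interpret when the features share a comparable scale.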


In [None]:
X_train_var = X_train[var_selected_features]
X_test_var = X_test[var_selected_features]

In [None]:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only, then reuse its statistics on the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
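
Scaling matters most for distance-based models such as KNN. As a sketch under assumptions (synthetic data from `make_regression`, an arbitrary `n_neighbors`), a `Pipeline` can bundle the scaler with the model so that the scaling statistics are learned from the training split only and reused on the test split.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real features and target.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_demo[:, 0] *= 1000  # put one feature on a much larger scale

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.1, random_state=0)

# fit() scales with training statistics; predict() reuses them on the test split.
knn_pipe = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
knn_pipe.fit(X_tr, y_tr)
y_pred = knn_pipe.predict(X_te)

print(y_pred.shape)
```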

In [None]:
# Linear Regression Model
LinearModel = LinearRegression()
LinearModel.fit(X_train_var, y_train)

In [None]:
# Predicting Linear Regression's Y
y_linear_pred = LinearModel.predict(X_test_var)

In [None]:
# Calculating Metrics
linear_mae = round(mean_absolute_error(y_test, y_linear_pred), 2)
linear_mse = round(mean_squared_error(y_test, y_linear_pred), 2)
linear_rmse = round(np.sqrt(mean_squared_error(y_test, y_linear_pred)), 2)
linear_r2 = round(r2_score(y_test, y_linear_pred), 2)

# Printing Metrics
print('Selected features (Linear Regression): ', X.columns[var_selector.get_support()])
print('\nMAE: ', linear_mae)
print('MSE: ', linear_mse)
print('RMSE: ', linear_rmse)
print('R2: ', linear_r2)

Selected features (Linear Regression):  Index(['Year', 'Adult Mortality', 'infant deaths', 'Alcohol',
       'percentage expenditure', 'Hepatitis B', 'Measles', 'BMI',
       'under-five deaths', 'Polio', 'Total expenditure', 'Diphtheria',
       'HIV/AIDS', 'GDP', 'Population', 'thinness  1-19 years',
       'thinness 5-9 years', 'Income composition of resources', 'Schooling'],
      dtype='object')

MAE:  3.02
MSE:  16.81
RMSE:  4.1
R2:  0.81
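
The same four metrics are computed for every model in this notebook. A small helper, sketched here with the hypothetical name `regression_metrics`, keeps that calculation in one place. RMSE is taken as the square root of MSE, so the helper does not depend on the `squared` argument.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2, each rounded to two decimals."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        'MAE': round(float(mean_absolute_error(y_true, y_pred)), 2),
        'MSE': round(float(mse), 2),
        'RMSE': round(float(np.sqrt(mse)), 2),
        'R2': round(float(r2_score(y_true, y_pred)), 2),
    }

# Made-up values, only to show the call.
demo = regression_metrics([60.0, 70.0, 80.0], [62.0, 69.0, 77.0])
print(demo)
```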


In [None]:
# KNN Regressor Model
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors = 3)
knn.fit(X_train_var, y_train)

In [None]:
# Predicting KNN's Y
y_knn_pred = knn.predict(X_test_var)

In [None]:
# Calculating Metrics
knn_mae = round(mean_absolute_error(y_test, y_knn_pred), 2)
knn_mse = round(mean_squared_error(y_test, y_knn_pred), 2)
knn_rmse = round(np.sqrt(mean_squared_error(y_test, y_knn_pred)), 2)
knn_r2 = round(r2_score(y_test, y_knn_pred), 2)

# Printing Metrics
print('Selected Features (KNN): ', X.columns[var_selector.get_support()])
print('\nMAE: ', knn_mae)
print('MSE: ', knn_mse)
print('RMSE: ', knn_rmse)
print('R2: ', knn_r2)

Selected Features (KNN):  Index(['Year', 'Adult Mortality', 'infant deaths', 'Alcohol',
       'percentage expenditure', 'Hepatitis B', 'Measles', 'BMI',
       'under-five deaths', 'Polio', 'Total expenditure', 'Diphtheria',
       'HIV/AIDS', 'GDP', 'Population', 'thinness  1-19 years',
       'thinness 5-9 years', 'Income composition of resources', 'Schooling'],
      dtype='object')

MAE:  6.77
MSE:  89.44
RMSE:  9.46
R2:  -0.02


In [None]:
# Second feature selection method: SelectKBest
kbest_selector = SelectKBest(f_regression, k = 3)
kbest_selector.fit(X_train, y_train)

kbest_selected_features = X.columns[kbest_selector.get_support()]
print('Selected Features and KBest: ', len(kbest_selected_features))

Selected Features and KBest:  3
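
`SelectKBest` with `f_regression` scores each feature with a univariate F-statistic against the target and keeps the `k` highest. A sketch on synthetic data (the column names `f0` to `f5` and the data itself are assumptions) shows how the scores behind the selection could be inspected.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: 6 features, of which 3 actually drive the target.
X_arr, y_demo = make_regression(
    n_samples=300, n_features=6, n_informative=3, noise=5.0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f'f{i}' for i in range(6)])

selector = SelectKBest(f_regression, k=3).fit(X_demo, y_demo)

# F-scores per feature, highest first, and the names of the kept features.
scores = pd.Series(selector.scores_, index=X_demo.columns).sort_values(ascending=False)
selected = list(X_demo.columns[selector.get_support()])

print(scores)
print(selected)
```

Looking at the full score list helps judge whether `k = 3` is a natural cut-off or whether several more features score almost as high.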


In [None]:
X_train_kbest = kbest_selector.transform(X_train)
X_test_kbest = kbest_selector.transform(X_test)

In [None]:
# Linear Model
LinearModel2 = LinearRegression()
# Train on the SelectKBest features rather than the variance-selected ones
LinearModel2.fit(X_train_kbest, y_train)

In [None]:
# Predicting Linear Regression's Y
y_linear_pred2 = LinearModel2.predict(X_test_kbest)

In [None]:
# Calculating Metrics
linear2_mae = round(mean_absolute_error(y_test, y_linear_pred2), 2)
linear2_mse = round(mean_squared_error(y_test, y_linear_pred2), 2)
linear2_rmse = round(np.sqrt(mean_squared_error(y_test, y_linear_pred2)), 2)
linear2_r2 = round(r2_score(y_test, y_linear_pred2), 2)

# Printing Metrics
print('Selected features (Linear Regression): ', X.columns[kbest_selector.get_support()])
print('\nMAE: ', linear2_mae)
print('MSE: ', linear2_mse)
print('RMSE: ', linear2_rmse)
print('R2: ', linear2_r2)

Selected features (Linear Regression):  Index(['Adult Mortality', 'Income composition of resources', 'Schooling'], dtype='object')

MAE:  3.02
MSE:  16.81
RMSE:  4.1
R2:  0.81


In [None]:
# KNN Regressor Model
knn2 = KNeighborsRegressor(n_neighbors = 3)
# Train on the SelectKBest features rather than the variance-selected ones
knn2.fit(X_train_kbest, y_train)

In [None]:
# Predicting KNN's Y
y_knn_pred2 = knn2.predict(X_test_kbest)

In [None]:
# Calculating Metrics
knn2_mae = round(mean_absolute_error(y_test, y_knn_pred2), 2)
knn2_mse = round(mean_squared_error(y_test, y_knn_pred2), 2)
knn2_rmse = round(np.sqrt(mean_squared_error(y_test, y_knn_pred2)), 2)
knn2_r2 = round(r2_score(y_test, y_knn_pred2), 2)

# Printing Metrics
print('Selected Features (KNN): ', X.columns[kbest_selector.get_support()])
print('\nMAE: ', knn2_mae)
print('MSE: ', knn2_mse)
print('RMSE: ', knn2_rmse)
print('R2: ', knn2_r2)

Selected Features (KNN):  Index(['Adult Mortality', 'Income composition of resources', 'Schooling'], dtype='object')

MAE:  6.77
MSE:  89.44
RMSE:  9.46
R2:  -0.02
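
To present the comparison as a table, the metrics for each combination of feature selection method and model can be collected into a single DataFrame. The sketch below does this on synthetic data, so its numbers say nothing about the life expectancy dataset. The selector and model choices mirror the ones used above, and the helper name `evaluate` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned features and target.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.1, random_state=0)

# Factories, so that every combination gets fresh, unfitted objects.
selectors = {
    'VarianceThreshold': lambda: VarianceThreshold(),
    'SelectKBest (k=3)': lambda: SelectKBest(f_regression, k=3),
}
models = {
    'Linear Regression': lambda: LinearRegression(),
    'KNN (k=3)': lambda: KNeighborsRegressor(n_neighbors=3),
}

def evaluate(selector, model):
    """Fit scaler -> selector -> model on the training split, score on the test split."""
    pipe = make_pipeline(StandardScaler(), selector, model)
    pipe.fit(X_tr, y_tr)
    pred = pipe.predict(X_te)
    mse = mean_squared_error(y_te, pred)
    return {
        'MAE': round(float(mean_absolute_error(y_te, pred)), 2),
        'MSE': round(float(mse), 2),
        'RMSE': round(float(np.sqrt(mse)), 2),
        'R2': round(float(r2_score(y_te, pred)), 2),
    }

rows = []
for s_name, make_selector in selectors.items():
    for m_name, make_model in models.items():
        row = {'Feature selection': s_name, 'Model': m_name}
        row.update(evaluate(make_selector(), make_model()))
        rows.append(row)

results = pd.DataFrame(rows)
print(results)
```

Each row of `results` corresponds to one selector and model pair, which makes it straightforward to sort by a metric or to plot the table as a bar chart.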
