# Evaluation of the Model

### Three Ways to Evaluate a Scikit-Learn Model / Estimator

**A. Estimator Score Method**

**B. The Scoring Parameter**

**C. Problem Specific Metric Function**
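As a rough orientation, the three approaches can be sketched side by side. This is a minimal sketch on synthetic data from `make_regression` (an assumption, not the data set used in this notebook), and the names `X_demo`, `y_demo`, `score_a`, `scores_b` and `score_c` are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption: not the notebook's data set)
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

model = Ridge().fit(X_tr, y_tr)

# A. Estimator score method: for regressors this returns R^2
score_a = model.score(X_te, y_te)

# B. Scoring parameter: the metric is chosen by name inside cross validation
scores_b = cross_val_score(Ridge(), X_demo, y_demo, cv=5, scoring="r2")

# C. Metric function: called directly on true values and predictions
score_c = r2_score(y_te, model.predict(X_te))

print(f"A: {score_a:.3f}  B (mean of 5 folds): {scores_b.mean():.3f}  C: {score_c:.3f}")
```

A and C report the same number here because a regressor's `score` method computes R^2 on the data it is given.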

## Evaluation of a Regression Model

### Regression Model Evaluation Metrics

**1. R^2 : R Squared or Coefficient of Determination**

**2. MAE : Mean Absolute Error**

**3. MSE : Mean Squared Error**

**Import Libraries**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

**Import Data Set**

In [2]:
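# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release to run as written.
from sklearn.datasets import load_boston

Data = load_boston()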

**Convert Data into Data Frame**

In [3]:
Boston = pd.DataFrame(data=Data.data, columns=Data.feature_names)

Boston['Target'] = Data.target

In [4]:
Boston.head()

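| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |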


**Splitting the Data into Independent Features (X) and the Dependent Variable (Y)**

In [5]:
X = Boston.drop('Target', axis = 'columns')

Y = Boston['Target']

**Splitting Data into Training Set and Testing Set**

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
np.random.seed(101)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3)

**X_Train : The Set of Features on which the Model is Trained.**

**Y_Train : The Set of Target Values (Labels) Corresponding to the X_Train Data.**

**X_Test : The Set of Features on which the Trained Model is to be Tested to Check its Performance.**

**Y_Test : The Set of Target Values (Labels) which is to be Compared with the Predictions of the Trained Model to Check the Score as well as the Evaluation Metrics.**
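The split above is made repeatable by seeding NumPy's global random generator. An alternative, shown here as a minimal sketch on a tiny made-up array (`X_demo` and `y_demo` are illustrative names, not part of the notebook), is to pass `random_state` directly to `train_test_split`, which keeps the seed local to that one call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Tiny illustrative arrays (assumption: stand-ins for X and Y)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# The same random_state gives the same split on every call
split_1 = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)
split_2 = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)

same_split = all(np.array_equal(a, b) for a, b in zip(split_1, split_2))
print("Splits identical:", same_split)
print("Train rows:", split_1[0].shape[0], "Test rows:", split_1[1].shape[0])
```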

In [8]:
print(f'70% Train Set : {X_Train.shape} and Labels of Train Set {Y_Train.shape}')

print(f'30% Test Set : {X_Test.shape} and Labels of Test Set {Y_Test.shape}')

70% Train Set : (354, 13) and Labels of Train Set (354,)
30% Test Set : (152, 13) and Labels of Test Set (152,)


### According to Scikit Learn Algorithm Map

**Ridge Regression**

In [9]:
from sklearn.linear_model import Ridge

**Instantiate the Ridge Model Object**

In [10]:
Ridge_Model = Ridge()

**Fit the Ridge Model to the Training Data to Find Patterns**

In [11]:
Ridge_Model.fit(X_Train, Y_Train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

### A. Estimator Score Method

**Check the R Square of Model on Training Data and Testing Data**

In [12]:
print(f'Score (R^2) of Model on Training Data : {Ridge_Model.score(X_Train, Y_Train)*100:.2f}%')

print()

print(f'Score (R^2) of Model on Testing Data : {Ridge_Model.score(X_Test, Y_Test)*100:.2f}%')

Score (R^2) of Model on Training Data : 74.42%

Score (R^2) of Model on Testing Data : 71.03%


### How to Improve the Score ?

**According to the Scikit Learn Algorithm Map, let's Try the Random Forest Regressor**

**Splitting the Data Set into Independent and Dependent Features**

In [13]:
X = Boston.drop('Target', axis = 'columns')

Y = Boston['Target']

**Splitting the Data into Training and Testing Data Sets**

In [14]:
np.random.seed(101)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size = 0.3)

**X_Train : The Set of Features on which the Model will be Trained.**

**Y_Train : The Set of the Dependent Variable, i.e. the Labels Corresponding to the X_Train Data, which is Used together with X_Train for Training.**

**X_Test : The Set of Features on which the Trained Model is to be Tested to Check its Performance.**

**Y_Test : The Set of the Dependent Variable which will be Compared with the Predictions for X_Test to Check the Score and the Evaluation Metrics.**

**Import the Random Forest Regressor**

In [15]:
from sklearn.ensemble import RandomForestRegressor

**Instantiate the Random Forest Regressor Object**

In [16]:
RFR_Model = RandomForestRegressor(n_estimators=100)

**Fit the Model with the Data to Find the Patterns**

In [17]:
RFR_Model.fit(X_Train, Y_Train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)

**Check the R^2 of Model on Training and Testing Data Set**

In [18]:
print(f'R^2 of Random Forest Regressor on Train Set : {RFR_Model.score(X_Train, Y_Train)*100:.2f}%')

print()

print(f'R^2 of Random Forest Regressor on Test Set : {RFR_Model.score(X_Test, Y_Test)*100:.2f}%')

R^2 of Random Forest Regressor on Train Set : 98.13%

R^2 of Random Forest Regressor on Test Set : 86.76%


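**On the Test Set, the R^2 of the Random Forest Regressor (86.76%) is better than that of Ridge (71.03%), although the Gap between its Train Score (98.13%) and Test Score suggests some Overfitting.**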

**Prediction of Model on X_Test Data**

In [19]:
RFR_Prediction = RFR_Model.predict(X_Test)

**To Judge the Model more closely, we will Check further Evaluation Metrics of the Regression Model**

### Evaluation of Random Forest Regressor 

**[Metrics and Scoring : Quantifying the Quality of Predictions](https://scikit-learn.org/stable/modules/model_evaluation.html)**

**Random Forest Regressor Evaluation Metrics.**

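**1. R^2 : R Squared or Coefficient of Determination**

**R^2 Compares the Predictions of the Model with a Baseline that always Predicts the Mean of the Target Variable.**

**A Model that Predicts Perfectly Scores 1.0, a Model that only Predicts the Mean Scores 0.0, and a Worse Model can Score below 0.**

**Like Accuracy in Classification, R^2 is a Quick Indication of How well the Model might be doing.**

**But it does not tell Exactly how wrong the Model is, i.e. How much Difference there is between the Predictions and the Actual Values.**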
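The comparison with the mean can be made concrete by computing R^2 by hand. The sketch below uses four made-up values (an assumption, chosen only to keep the arithmetic easy to follow) and checks the result against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# Squared error of the model: 0.25 + 0.25 + 0 + 1 = 1.5
ss_res = np.sum((y_true - y_pred) ** 2)

# Squared error of always predicting the mean (6.0): 9 + 1 + 1 + 9 = 20
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot

# A "model" that always predicts the mean of the targets
mean_baseline = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, mean_baseline)

print(f"Manual R^2 : {r2_manual:.3f}")
print(f"sklearn R^2 : {r2_score(y_true, y_pred):.3f}")
print(f"R^2 of the mean baseline : {r2_baseline:.3f}")
```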

In [20]:
print(f'R^2 of Random Forest Regressor on Train Set : {RFR_Model.score(X_Train, Y_Train)*100:.2f}%')

print()

print(f'R^2 of Random Forest Regressor on Test Set : {RFR_Model.score(X_Test, Y_Test)*100:.2f}%')

R^2 of Random Forest Regressor on Train Set : 98.13%

R^2 of Random Forest Regressor on Test Set : 86.76%


**2.MAE : Mean Absolute Error**

**The Average of the Absolute Difference between the Predictions and the Actual Values**

**It gives us Idea of How Wrong our Model Predictions are.**

In [21]:
from sklearn.metrics import mean_absolute_error

In [22]:
print(f'Mean Absolute Error : {round(mean_absolute_error(Y_Test,RFR_Prediction),2)}')

Mean Absolute Error : 2.6


**To Understand it Better we will Compare the Actual Value and the predicted Value by the Model in a Data Frame.**

In [23]:
MAE = pd.DataFrame({'Actual Value' : Y_Test,
                    'Predicted Value' : RFR_Prediction})

MAE['Absolute Error'] = abs(MAE['Predicted Value'] - MAE['Actual Value'])

print(MAE[:5])

print()

print('Mean Absolute Error of the Predictions made by the Model :',round(MAE['Absolute Error'].mean(),2))

     Actual Value  Predicted Value  Absolute Error
195          50.0           47.155           2.845
4            36.2           33.137           3.063
434          11.7           13.534           1.834
458          14.9           15.435           0.535
39           30.8           28.998           1.802

Mean Absolute Error of the Predictions made by the Model : 2.6


**3.MSE : Mean Squared Error**

**The Average of the Squared Difference between the Predictions and the Actual Values**

**Squaring the Error Amplifies the Large Difference**
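A small made-up example (the arrays below are assumptions, not notebook data) shows the amplification: two sets of predictions with the same MAE can have very different MSE when one of them concentrates the error in a single large miss.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 10.0, 10.0, 10.0])

# Four small errors of size 2 each
many_small = np.array([12.0, 8.0, 12.0, 8.0])

# Three perfect predictions and one large error of size 8
one_large = np.array([10.0, 10.0, 10.0, 18.0])

mae_small = mean_absolute_error(y_true, many_small)
mae_large = mean_absolute_error(y_true, one_large)
mse_small = mean_squared_error(y_true, many_small)
mse_large = mean_squared_error(y_true, one_large)

print(f"MAE : {mae_small} vs {mae_large}")
print(f"MSE : {mse_small} vs {mse_large}")
```

Both sets of predictions have an MAE of 2.0, but the single large miss raises the MSE from 4.0 to 16.0.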

In [24]:
from sklearn.metrics import mean_squared_error

In [25]:
print(f'Mean Squared Error : {round(mean_squared_error(Y_Test,RFR_Prediction),2)}')

print()

print(f'Root Mean Squared Error : {round(np.sqrt(mean_squared_error(Y_Test, RFR_Prediction)),2)}')

Mean Squared Error : 13.14

Root Mean Squared Error : 3.63


**To Understand it better we will compare the Actual Values and the Predictions**

In [26]:
MSE = pd.DataFrame({'Actual Values' : Y_Test,
                    'Predicted Values' : RFR_Prediction})

MSE['Squared Error'] = pow(MSE['Predicted Values'] - MSE['Actual Values'],2)

print(MSE[:5])

print()

print('Mean of Squared Error :',round(MSE['Squared Error'].mean(),2))

     Actual Values  Predicted Values  Squared Error
195           50.0            47.155       8.094025
4             36.2            33.137       9.381969
434           11.7            13.534       3.363556
458           14.9            15.435       0.286225
39            30.8            28.998       3.247204

Mean of Squared Error : 13.14


**For Better Model, Our Aim should be to Maximize R^2 and Minimize the MAE and MSE**

## B. The Scoring Parameter

**Model Evaluation Tools that use Cross Validation, such as `cross_val_score` and `GridSearchCV`, rely on the `scoring` Parameter to Choose the Metric.**
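The notebook only demonstrates cross validation scores, so here is a minimal sketch of the same `scoring` idea with Grid Search CV. The synthetic data, the `Ridge` estimator and the `alpha` grid are all assumptions picked to keep the example small and fast.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: not the notebook's data set)
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

# The scoring string decides which metric ranks the candidates
search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},  # illustrative grid
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["alpha"]
best_mae = -search.best_score_  # undo the negation to read it as an error

print(f"Best alpha : {best_alpha}")
print(f"Cross validated MAE of the best candidate : {best_mae:.2f}")
```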

### Cross Validation Score for Regression Model

In [27]:
from sklearn.model_selection import cross_val_score

np.random.seed(101)

CVS = cross_val_score(RFR_Model, X, Y, cv=5)

**5 Fold Cross Validation Score for Random Forest Regressor**

In [28]:
print(CVS)

[0.78220258 0.86330421 0.7390564  0.4749719  0.25629191]


**Cross Validation Mean Score of Random Forest Regressor**

In [29]:
print(f'Cross Validation Mean Score : {np.mean(CVS)*100:.2f}%')

Cross Validation Mean Score : 62.32%
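The fold scores above range from about 0.26 to 0.86. One possible contributor is that an integer `cv` for a regressor uses consecutive, unshuffled folds, so any ordering in the rows carries over into the folds. The sketch below, on synthetic data with a smaller forest (both assumptions, chosen for speed), shows how a shuffled `KFold` splitter could be passed instead; it does not reproduce the notebook's numbers.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption: not the notebook's data set)
X_demo, y_demo = make_regression(n_samples=150, n_features=5,
                                 noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=20, random_state=0)

# cv=5 on a regressor: five consecutive folds in row order, no shuffling
default_scores = cross_val_score(forest, X_demo, y_demo, cv=5)

# Explicit splitter: rows are shuffled once before being cut into folds
shuffled_cv = KFold(n_splits=5, shuffle=True, random_state=101)
shuffled_scores = cross_val_score(forest, X_demo, y_demo, cv=shuffled_cv)

print("Unshuffled folds :", default_scores.round(3))
print("Shuffled folds :", shuffled_scores.round(3))
```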


**Default Scoring Parameter = None, which Uses the Estimator's own `score` Method (R^2 for Regressors)**

In [30]:
np.random.seed(101)

CVS_R2 = cross_val_score(RFR_Model, X, Y, cv=5, scoring='r2')

print(f'Cross Validation Mean Score : {np.mean(CVS_R2)*100:.2f}%')

Cross Validation Mean Score : 62.32%


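**Scoring Parameter = MAE (`neg_mean_absolute_error`)**

**Scikit Learn Negates Error Metrics in the Scoring Parameter so that a Higher Score is always Better, which is why the Values below are Negative.**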

In [31]:
np.random.seed(101)

CVS_MAE = cross_val_score(RFR_Model, X, Y, cv = 5, scoring='neg_mean_absolute_error')

print(f'Cross Validation Scores (Negative MAE) : {CVS_MAE}')

Cross Validation Scores (Negative MAE) : [-2.04432353 -2.56029703 -3.40577228 -3.76956436 -3.10521782]


**Scoring Parameter = MSE (`neg_mean_squared_error`, the Negated Mean Squared Error)**

In [32]:
np.random.seed(101)

CVS_MSE = cross_val_score(RFR_Model, X, Y, cv = 5, scoring='neg_mean_squared_error')

print(f'Cross Validation Scores (Negative MSE) : {CVS_MSE}')

Cross Validation Scores (Negative MSE) : [ -7.52167634 -12.44434115 -20.89824282 -46.05118669 -19.77394643]
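The two cells above print the per-fold scores as negative numbers. A minimal sketch of turning them back into ordinary errors follows; the arrays are copied from the printed outputs above, and the variable names are illustrative.

```python
import numpy as np

# Per-fold scores copied from the printed outputs above
neg_mae = np.array([-2.04432353, -2.56029703, -3.40577228,
                    -3.76956436, -3.10521782])
neg_mse = np.array([-7.52167634, -12.44434115, -20.89824282,
                    -46.05118669, -19.77394643])

# Flip the sign to read the scores as errors again
mean_mae = (-neg_mae).mean()
mean_mse = (-neg_mse).mean()

# RMSE per fold is in the same units as the target
rmse_per_fold = np.sqrt(-neg_mse)

print(f"Cross Validation Mean MAE : {mean_mae:.2f}")
print(f"Cross Validation Mean MSE : {mean_mse:.2f}")
print(f"RMSE per Fold : {rmse_per_fold.round(2)}")
```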


## C. Problem Specific Metric Function

**Regression Evaluation Functions**

In [33]:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [34]:
print(f'R^2 Score : {r2_score(Y_Test, RFR_Prediction)*100:.2f}%')

print()

print(f'Mean Absolute Error : {round(mean_absolute_error(Y_Test, RFR_Prediction),2)}')

print()

print(f'Mean Squared Error : {round(mean_squared_error(Y_Test, RFR_Prediction),2)}')

print()

print(f'Root Mean Squared Error : {round(np.sqrt(mean_squared_error(Y_Test, RFR_Prediction)),2)}')

R^2 Score : 86.76%

Mean Absolute Error : 2.6

Mean Squared Error : 13.14

Root Mean Squared Error : 3.63
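Since the same four metric calls recur throughout this notebook, they could be gathered into one helper. `evaluate_regression` is a hypothetical name introduced here for illustration, and the sample values are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate_regression(y_true, y_pred):
    """Return R^2, MAE, MSE and RMSE for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }


# Made-up values for illustration only
report = evaluate_regression([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])

for name, value in report.items():
    print(f"{name} : {value:.3f}")
```

In the notebook the call would take the form `evaluate_regression(Y_Test, RFR_Prediction)`.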
