# 4. Evaluating a Model
Once you've trained a model, you'll want a way to measure how trustworthy its predictions are.

Scikit-Learn has three different APIs for evaluating the quality of a model's predictions:
1. **Estimator score method**: Estimators have a score() method that provides a default evaluation criterion for the problem they are designed to solve. It is built into the chosen machine learning estimator, so each estimator has its own score() method.
1. **Scoring parameter**: Model evaluation tools that use cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy, which is selected through the scoring parameter.
1. **Metric functions**: The sklearn.metrics module implements functions that assess prediction error for specific purposes. These metrics are detailed in the sections on Classification metrics, Multilabel ranking metrics, Regression metrics and Clustering metrics.

Finally, dummy estimators are useful for getting a baseline value of those metrics for random predictions.
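
To make the baseline idea concrete, here is a minimal sketch that compares a dummy estimator with a real one. The synthetic data from make_classification and the "most_frequent" strategy are assumptions chosen to keep the example deterministic; they are not the heart disease data used later in this section.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the heart disease data)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: always predict the most frequent class seen during training
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# A real model, scored with the same default metric (mean accuracy)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
model_acc = model.score(X_test, y_test)

print(f"baseline accuracy:      {baseline_acc:.3f}")
print(f"random forest accuracy: {model_acc:.3f}")
```

A model is only worth keeping if it clearly beats this kind of baseline. DummyClassifier also offers random strategies such as "stratified" and "uniform" if a random-prediction baseline is preferred.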

In this session we will evaluate models in more depth. The classification case uses the heart disease data; the regression case uses the Boston Housing data.

## 4.1 Default score() method
This is the method attached to the selected estimator.

In our case these are the score() methods of the RandomForestClassifier() and RandomForestRegressor() estimators.

In [19]:
import numpy as np, pandas as pd 
# import the heart disease data from csv
heart_df = pd.read_csv('../data/heart-disease.csv')
heart_df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


Since we already used this data in previous sessions, there should be no need to validate it again. However, as a best practice I will still check that the data contains no NaN or null values.

In [20]:
# check for any Null data inside the df
heart_df.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [21]:
# check for any NaN data in the df (isna() is an alias of isnull(), so the result is the same)
heart_df.isna().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

All clear: there are no null or NaN values in the data frame. We can start splitting it into label data and feature data.

In [22]:
heart_y = heart_df['target']
heart_y.head()

0    1
1    1
2    1
3    1
4    1
Name: target, dtype: int64

In [23]:
heart_X = heart_df.drop('target', axis=1)
heart_X.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2


In [24]:
# split the data into train and tests
from sklearn.model_selection import train_test_split
# set random seed
np.random.seed(42)
heart_X_train, heart_X_test, heart_y_train, heart_y_test = train_test_split(heart_X, heart_y, test_size=0.2)
(heart_X_train.shape, heart_y_train.shape, heart_X_test.shape, heart_y_test.shape)

((242, 13), (242,), (61, 13), (61,))
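
The split above relies on the global NumPy seed. As a hedged alternative, train_test_split also accepts a random_state argument, which pins the split regardless of what other code does to the global random state. The toy arrays below are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption): 20 samples, 2 features, binary labels
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Passing random_state makes each call reproducible on its own
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train_a.shape, X_test_a.shape)
print(np.array_equal(X_test_a, X_test_b))
```

With test_size=0.2, one fifth of the rows (4 of 20 here) go to the test set, and both calls return the same rows.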

Now that the split into train and test sets is done, the next step is to select a model, fit it, and then score it.

Here I simply select the RandomForestClassifier.

In [25]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
# fit the model on the training data, then score it on the test data
rfc.fit(heart_X_train, heart_y_train)
rfc.score(heart_X_test, heart_y_test)

0.8524590163934426

Before going further, I think it is better to set up the regression case as well. This will make things easier later, since I will explore more model evaluation for both classification and regression, and the progression of the model evaluation material will be more seamless.

In [26]:
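# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version
from sklearn.datasets import load_boston
boston = load_boston()
# boston is a dictionary-like Bunch object, so I need to transform it into a dataframe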
boston_df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
boston_df.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33


In [27]:
# Now I need to add the target column to the boston df
boston_df['target'] = boston['target']
boston_df.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33,36.2
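
The same Bunch-to-DataFrame pattern works for any dataset bundled with scikit-learn. As a minimal sketch, assuming load_boston is not available in the installed version, the block below applies it to load_diabetes, a different regression dataset swapped in purely for illustration. Its columns and target are not comparable to the Boston Housing data.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# load_diabetes is a stand-in (an assumption); it returns a Bunch like load_boston did
diabetes = load_diabetes()

# Build the feature columns from the data array and the feature names
diabetes_df = pd.DataFrame(diabetes["data"], columns=diabetes["feature_names"])

# Add the target as an extra column
diabetes_df["target"] = diabetes["target"]

print(diabetes_df.shape)
print(list(diabetes_df.columns))
```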


As usual, as a best practice I will check for NaN and null values, but this time more quickly by summing the per-column counts into a single total.

In [28]:
(boston_df.isna().sum().sum(), boston_df.isnull().sum().sum())

(0, 0)

Okay, all the data is valid since there are no NaN or null values in boston_df.
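
To see what the double sum does, here is a small sketch on a toy data frame (the frame and its values are assumptions for illustration). The first sum() counts missing values per column and the second adds those counts into one number. It also shows that isna() and isnull() give identical results.

```python
import numpy as np
import pandas as pd

# Toy frame (an assumption) with three missing values in total
toy_df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [None, 2.0, np.nan]})

# First sum(): missing values per column
per_column = toy_df.isna().sum()

# Second sum(): total across all columns
total_missing = toy_df.isna().sum().sum()

# isnull() is an alias of isna(), so both masks are identical
same_mask = toy_df.isna().equals(toy_df.isnull())

print(per_column.to_dict())
print(total_missing, same_mask)
```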

Now we are ready to split the data into the label data y and the feature data X.

Then we split both into train and test data.

In [29]:
boston_y = boston_df['target']
boston_X = boston_df.drop('target', axis=1)
boston_X_train, boston_X_test, boston_y_train, boston_y_test = train_test_split(boston_X, boston_y, test_size=0.2)
# np.random.seed(42) was set above and the global NumPy random state persists across cells, so I don't set it again here
# validate the shape
(boston_X_train.shape, boston_y_train.shape, boston_X_test.shape, boston_y_test.shape)

((404, 13), (404,), (102, 13), (102,))

Okay, the shapes of the train/test split results are validated.

Now we are ready to fit the model and score it.

In [30]:
from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor()
rfr.fit(boston_X_train, boston_y_train)
rfr.score(boston_X_test, boston_y_test)

0.8494501439301596

Here the model evaluations use the score() method of each estimator.

NOTE: different estimators will most likely use different metrics in their score() methods. For instance:

RandomForestClassifier's score() returns the mean accuracy, while RandomForestRegressor's score() returns the coefficient of determination (R^2).
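
As a quick check of that note, the sketch below compares score() with the matching functions from sklearn.metrics. The synthetic data and the hyperparameters are assumptions for illustration, so the printed values will differ from the heart disease and Boston results above.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Classification: score() should match accuracy_score on the predictions
Xc, yc = make_classification(n_samples=200, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xc_tr, yc_tr)
clf_score = clf.score(Xc_te, yc_te)
clf_acc = accuracy_score(yc_te, clf.predict(Xc_te))

# Regression: score() should match r2_score on the predictions
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.2, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xr_tr, yr_tr)
reg_score = reg.score(Xr_te, yr_te)
reg_r2 = r2_score(yr_te, reg.predict(Xr_te))

print(np.isclose(clf_score, clf_acc), np.isclose(reg_score, reg_r2))
```

Note the argument order: the metric functions take the true values first and the predictions second.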

Calling the score() method on any model instance and passing it test data is a good quick way to see how the model is going. However, as you get further into a problem, it's likely you'll want to start using more powerful metrics to evaluate your model's performance.

## 4.2 Evaluating the model using the scoring parameter
The next step up from using score() is to use a custom scoring parameter with cross_val_score() or GridSearchCV.

As you may have guessed, the scoring parameter you set will be different depending on the problem you're working on.

We'll see some specific examples of different parameters in a moment but first let's check out cross_val_score().

We will use the heart disease data frame and the random forest classifier estimator we instantiated previously.

In [31]:
from sklearn.model_selection import cross_val_score
# note: the random forest classifier was instantiated as rfc above, thus:
cross_val_score(rfc, heart_X, heart_y)

array([0.81967213, 0.86885246, 0.80327869, 0.86666667, 0.78333333])

The result is an array of five numbers, each of which resembles the score() result obtained earlier from the random forest classifier.

As a refresher, here is the score of the random forest classifier on the heart disease test data:

In [32]:
rfc.score(heart_X_test, heart_y_test)

0.8524590163934426

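The array contains five numbers because cross_val_score() uses 5-fold cross-validation by default (cv=5 since scikit-learn 0.22): the data is split into five folds, and the model is trained on four folds and scored on the remaining one, five times. This is unrelated to the test_size=0.2 used in train_test_split, although each fold likewise holds out one fifth of the data.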

Let's see their mean:

In [33]:
np.mean(cross_val_score(rfc, heart_X, heart_y))

0.8282513661202187
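
The number of scores returned is controlled by the cv parameter of cross_val_score(). A minimal sketch, assuming scikit-learn 0.22 or later (where the default is five folds) and using synthetic data in place of the heart disease data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption)
X, y = make_classification(n_samples=150, random_state=1)
clf = RandomForestClassifier(n_estimators=30, random_state=1)

# Default: cv=None, which means 5-fold cross-validation
scores_default = cross_val_score(clf, X, y)

# Explicit: 10 folds gives 10 scores
scores_10 = cross_val_score(clf, X, y, cv=10)

print(len(scores_default), len(scores_10))
print(scores_default.mean(), scores_10.mean())
```

Each score comes from a fresh copy of the estimator fitted on the remaining folds, so the rfc object passed in does not need to be fitted beforehand.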

We will learn more about this later on.
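
As a short preview of the scoring parameter itself, the sketch below passes different scoring strings to cross_val_score(). The synthetic data is an assumption for illustration. "accuracy" and "f1" are two of scikit-learn's predefined scoring names for classification; leaving scoring as None falls back to the estimator's own score() method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption)
X, y = make_classification(n_samples=150, random_state=2)
clf = RandomForestClassifier(n_estimators=30, random_state=2)

# scoring=None uses clf.score(), which is mean accuracy for classifiers
scores_none = cross_val_score(clf, X, y, cv=5, scoring=None)

# Asking for accuracy explicitly gives the same numbers
scores_acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# A different metric on the same folds
scores_f1 = cross_val_score(clf, X, y, cv=5, scoring="f1")

print(np.allclose(scores_none, scores_acc))
print(scores_acc.mean(), scores_f1.mean())
```

The estimator is given a fixed random_state so that the two accuracy runs are directly comparable.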