# Evaluating an ML Model
![](https://media.giphy.com/media/mnwc6vn9T8dag/giphy.gif)<br>
###### Give this a read - https://scikit-learn.org/stable/modules/model_evaluation.html
<p>There are 3 different APIs for evaluating the quality of a model&rsquo;s predictions:</p>
<ul class="simple">
<li>
<p><strong>Estimator score method</strong>: Estimators have a&nbsp;<code class="docutils literal notranslate"><span class="pre">score</span></code>&nbsp;method providing a default evaluation criterion for the problem they are designed to solve. This is not discussed on this page, but in each estimator&rsquo;s documentation.</p>
</li>
<li>
<p><strong>Scoring parameter</strong>: Model-evaluation tools using&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation"><span class="std std-ref">cross-validation</span></a>&nbsp;(such as&nbsp;<a class="reference internal" title="sklearn.model_selection.cross_val_score" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score"><code class="xref py py-func docutils literal notranslate"><span class="pre">model_selection.cross_val_score</span></code></a>&nbsp;and&nbsp;<a class="reference internal" title="sklearn.model_selection.GridSearchCV" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV"><code class="xref py py-class docutils literal notranslate"><span class="pre">model_selection.GridSearchCV</span></code></a>) rely on an internal&nbsp;<em>scoring</em>&nbsp;strategy. This is discussed in the section&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter"><span class="std std-ref">The scoring parameter: defining model evaluation rules</span></a>.</p>
</li>
<li>
<p><strong>Metric functions</strong>: The&nbsp;<code class="xref py py-mod docutils literal notranslate"><span class="pre">metrics</span></code>&nbsp;module implements functions assessing prediction error for specific purposes. These metrics are detailed in sections on&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics"><span class="std std-ref">Classification metrics</span></a>,&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/model_evaluation.html#multilabel-ranking-metrics"><span class="std std-ref">Multilabel ranking metrics</span></a>,&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics"><span class="std std-ref">Regression metrics</span></a>&nbsp;and&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/model_evaluation.html#clustering-metrics"><span class="std std-ref">Clustering metrics</span></a>.</p>
</li>
</ul>
<p>Finally,&nbsp;<a class="reference internal" href="https://scikit-learn.org/stable/modules/model_evaluation.html#dummy-estimators"><span class="std std-ref">Dummy estimators</span></a>&nbsp;are useful to get a baseline value of those metrics for random predictions.</p>
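
To make the three APIs concrete, here is a minimal sketch that uses each of them on the same classifier, plus a dummy baseline. The synthetic data from `make_classification`, the variable names, and the `most_frequent` baseline strategy are assumptions chosen for illustration; they are not part of the heart-disease walkthrough in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (an assumption, not the heart-disease dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

# 1. Estimator score method: the default criterion (mean accuracy for classifiers)
score_method = demo_clf.score(X_te, y_te)

# 2. Scoring parameter: cross-validation tools take the metric by name
cv_scores = cross_val_score(
    RandomForestClassifier(random_state=42), X_demo, y_demo, cv=5, scoring="accuracy"
)

# 3. Metric function: compare the true labels with the predictions directly
metric_fn = accuracy_score(y_te, demo_clf.predict(X_te))

# Baseline: a dummy estimator that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr).score(X_te, y_te)

print("score():          ", score_method)
print("cross_val_score:  ", cv_scores.mean())
print("accuracy_score(): ", metric_fn)
print("dummy baseline:   ", baseline)
```

For a classifier, the first and third numbers coincide because `score()` reports accuracy by default. A real model is only interesting if it clearly beats the dummy baseline.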

In [1]:
# standard imports
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

In [2]:
#import dataset
hd = pd.read_csv("https://raw.githubusercontent.com/ineelhere/Machine-Learning-and-Data-Science/master/scikit-learn/heart-disease.csv")
hd.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [3]:
#get-set-go!

# import the ensemble classifiers
from sklearn.ensemble import RandomForestClassifier

# setup a random seed
np.random.seed(42)

# create the data
x = hd.drop("target", axis=1)
y = hd["target"]

# split into test and train sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2)

# fit the model to the data, i.e. train the ML model
clf = RandomForestClassifier()  # instantiate
clf.fit(x_train, y_train)  # fit

# evaluate the fitted RandomForestClassifier, i.e. use the patterns the model has learnt
clf.score(x_test, y_test)

0.8524590163934426

### 3 ways to evaluate sklearn models/estimators
* Estimator `score()` method.
* The `scoring` parameter.
* Problem specific metric functions.

#### The `score()` method (already used above)

In [4]:
clf.score(x_train, y_train)  # mean accuracy on the training data and labels

1.0

In [5]:
clf.score(x_test,y_test) #Return the mean accuracy on the given test data and labels.

0.8524590163934426
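
As a rough illustration of what "mean accuracy" means, the sketch below recomputes it by hand as the fraction of predictions that match the true labels, and compares training and test accuracy. It runs on synthetic data from `make_classification` (an assumption, so the numbers will differ from the heart-disease results above), and all variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the heart-disease dataset)
X_acc, y_acc = make_classification(n_samples=300, n_features=10, random_state=0)
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_acc, y_acc, test_size=0.2, random_state=0
)

acc_clf = RandomForestClassifier(random_state=0).fit(Xa_train, ya_train)

# Mean accuracy by hand: the fraction of samples whose prediction equals the label
manual_test_acc = np.mean(acc_clf.predict(Xa_test) == ya_test)
builtin_test_acc = acc_clf.score(Xa_test, ya_test)

# Training accuracy is usually higher, because the model has already seen this data
train_acc = acc_clf.score(Xa_train, ya_train)

print("manual test accuracy: ", manual_test_acc)
print("score() test accuracy:", builtin_test_acc)
print("train accuracy:       ", train_acc)
```

A large gap between training and test accuracy, like the 1.0 versus 0.85 seen above, is a common sign of overfitting, which is why the test score is the one to report.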

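Let us do the same, but now with regression. The `target` column is binary, so a regressor is used here only to show how `score()` behaves for regression estimators.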

In [7]:
# import the ensemble regressors
from sklearn.ensemble import RandomForestRegressor

# setup a random seed
np.random.seed(42)

# create the data
x = hd.drop("target", axis=1)
y = hd["target"]

# split into test and train sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2)

# fit the model to the data, i.e. train the ML model
model = RandomForestRegressor()  # instantiate
model.fit(x_train, y_train)  # fit

# evaluate the fitted RandomForestRegressor, i.e. use the patterns the model has learnt
model.score(x_test, y_test)

0.5106393318965518

In [8]:
model.score(x_train,y_train) #Return the coefficient of determination R^2 of the prediction.

0.924203269641995

In [9]:
model.score(x_test,y_test) #Return the coefficient of determination R^2 of the prediction.

0.5106393318965518
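
For regressors, `score()` should agree with `sklearn.metrics.r2_score` applied to the model's predictions. The sketch below checks this on synthetic data from `make_regression`; the data and the variable names are assumptions for illustration, not the heart-disease setup used above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the heart-disease dataset)
X_reg, y_reg = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

reg = RandomForestRegressor(random_state=42).fit(Xr_train, yr_train)

# score() on a regressor reports R^2 ...
r2_from_score = reg.score(Xr_test, yr_test)

# ... which matches the r2_score metric computed on the predictions
r2_from_metric = r2_score(yr_test, reg.predict(Xr_test))

print("score():   ", r2_from_score)
print("r2_score():", r2_from_metric)
```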

See the difference?
* For classification, `score()` returns the `mean accuracy` on the given test data and labels.
<br>
<pre>
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
</pre>
* For regression, `score()` returns the `coefficient of determination` R^2 of the prediction.
<br>
<pre>The coefficient R^2 is defined as (1 - u/v), where u is the residual
sum of squares ((y_true - y_pred) ** 2).sum() and v is the total
sum of squares ((y_true - y_true.mean()) ** 2).sum().
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always
predicts the expected value of y, disregarding the input features,
would get a R^2 score of 0.0.
</pre>
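
Both definitions quoted above can be checked by hand on tiny arrays. The sketch below is only an illustration: the toy values and the helper name `r2_by_hand` are made up for this example. It computes R^2 as `1 - u/v`, shows that a constant mean prediction scores 0.0 and that a bad model can go negative, and then compares subset accuracy with per-label accuracy for a multi-label case.

```python
import numpy as np
from sklearn.metrics import accuracy_score, r2_score


def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - u/v, following the definition quoted above (hypothetical helper)."""
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - u / v


# Toy regression values, made up for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

r2_manual = r2_by_hand(y_true, y_pred)   # about 0.9486
r2_sklearn = r2_score(y_true, y_pred)    # same value from the metric function

# A constant model that always predicts the mean of y gets exactly 0.0
r2_constant = r2_by_hand(y_true, np.full_like(y_true, y_true.mean()))

# A model worse than the constant one gets a negative score
r2_bad = r2_by_hand(y_true, y_true[::-1])

# Toy multi-label case: 3 samples with 2 labels each; only the last sample has one wrong label
Y_true = np.array([[1, 0], [0, 1], [1, 1]])
Y_pred = np.array([[1, 0], [0, 1], [1, 0]])

subset_acc = accuracy_score(Y_true, Y_pred)  # 2/3: the whole label set must match
per_label_acc = (Y_true == Y_pred).mean()    # 5/6: each label is counted separately

print(r2_manual, r2_sklearn, r2_constant, r2_bad)
print(subset_acc, per_label_acc)
```

The last two numbers show why subset accuracy is called harsh: one wrong label out of six costs a third of the score.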

<br>
<br><strong>Yeah, you need to know the math!</strong><br>

![](https://media.giphy.com/media/bupsZiBKn7vAk/giphy.gif)
<br>
If you want to learn these statistical and mathematical concepts, the following resources are very helpful (both are available for free):<br><br>

* MIT 18.650 Statistics for Applications, Fall 2016 - https://www.youtube.com/playlist?list=PLUl4u3cNGP60uVBMaoNERc6knT_MgPKS0
* MIT OCW - https://ocw.mit.edu/courses/mathematics/18-650-statistics-for-applications-fall-2016/