# Evaluation Metrics

So far, we have mainly used the $R^2$ metric to evaluate our models. scikit-learn provides many other evaluation metrics, all of which are found in the `metrics` module.

### Score vs Error/Loss metrics

Take a look at the [metrics module][1] in the API. You will see a number of different evaluation metrics for classification, clustering, and regression. Most of these end with either the word 'score' or 'error'/'loss'. Those functions that end in 'score' return a metric where **greater is better**. For example, the `r2_score` function returns $R^2$ in which a greater value corresponds with a better model.

Other metrics that end in 'error' or 'loss' return a metric where **lesser is better**. That should make sense intuitively, as minimizing error or loss is what we naturally desire for our models.
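
To make the naming convention concrete, here is a small sketch that scores two sets of predictions against the same ground truth. The numbers are made up purely for illustration and are not from the housing data.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical ground truth and two sets of predictions (made-up numbers)
y_true = np.array([100.0, 150.0, 200.0, 250.0])
close_pred = np.array([105.0, 145.0, 205.0, 245.0])  # off by 5 everywhere
far_pred = np.array([140.0, 110.0, 240.0, 210.0])    # off by 40 everywhere

# 'score' metric: the closer predictions receive the greater value
r2_close = r2_score(y_true, close_pred)
r2_far = r2_score(y_true, far_pred)

# 'error' metric: the closer predictions receive the lesser value
mse_close = mean_squared_error(y_true, close_pred)
mse_far = mean_squared_error(y_true, far_pred)

print(r2_close, r2_far)
print(mse_close, mse_far)
```

The better predictions win on both metrics, but "winning" means a greater $R^2$ and a lesser MSE.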

### Regression Metrics

Take a look at the regression metrics section of the scikit-learn API. These are all functions that accept the ground truth y values along with the predicted y values and return a metric. Let's see a few of these in action. We will read in the data, build a model with a few variables using one of the supervised regression models we've covered and then use one of the metric functions.

[1]: https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics

In [None]:
import pandas as pd
import numpy as np
housing = pd.read_csv('../data/housing_sample.csv')
X = housing[['GrLivArea', 'GarageArea', 'FullBath']]
y = housing['SalePrice']
X.head()

Let's use a random forest to model the relationship between the input and sale price and complete our standard three-step process.

In [None]:
from sklearn.ensemble import RandomForestRegressor
rfr = RandomForestRegressor(n_estimators=50)
rfr.fit(X, y);

First, use the built-in `score` method, which returns $R^2$ for every regression estimator.

In [None]:
rfr.score(X, y)

Let's verify that we get the same result with the corresponding `r2_score` function from the metrics module. We need to get the predicted y-values and pass them to the function along with the ground truth.

In [None]:
from sklearn.metrics import r2_score
y_pred = rfr.predict(X)
r2_score(y, y_pred)
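
For reference, $R^2$ is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it with numpy alone. The function name `r2_manual` and the small demo arrays are assumptions made for illustration.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (hypothetical helper, not part of scikit-learn)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always guessing the mean
    return 1 - ss_res / ss_tot

# Made-up values, not the housing data
y_demo = [3, -0.5, 2, 7]
y_demo_pred = [2.5, 0.0, 2, 8]
r2_demo = r2_manual(y_demo, y_demo_pred)
print(round(r2_demo, 3))
```

Calling `r2_manual(y, y_pred)` on the housing predictions should agree with `r2_score(y, y_pred)` up to floating point error.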

Let's use a different metric such as mean squared error (MSE).

In [None]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y, y_pred)

### Easy to construct our own function

Most of these metrics are easy to compute on your own. The function below reproduces the MSE computed above.

In [None]:
def mse(y_true, y_pred):
    error = y_true - y_pred
    return np.mean(error ** 2)

In [None]:
mse(y, y_pred)

Taking the square root of the MSE gives the root mean squared error (RMSE), which indicates the typical size of an error. It is always at least as large as the mean absolute error, because squaring gives large errors more weight. Older versions of scikit-learn have no dedicated function for the RMSE (version 1.4 added `root_mean_squared_error`), so we use the numpy `sqrt` function to calculate it.

In [None]:
rmse = np.sqrt(mean_squared_error(y, y_pred))
rmse

The units of this metric are the same as the target variable, so we can think of our model as "averaging" an error of about \$18,000. The word averaging is in quotes because this isn't the actual average error, but it will be somewhat near it. Use the `mean_absolute_error` function to calculate the actual average error per observation.

In [None]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y, y_pred)

We can compute this manually as well.

In [None]:
(y - y_pred).abs().mean()
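
The relationship between the RMSE and the mean absolute error can be seen on a toy example. In the sketch below, the helper `rmse_and_mae` and the demo numbers are assumptions for illustration only. Both sets of predictions have the same average error, but one concentrates all of it in a single observation.

```python
import numpy as np

def rmse_and_mae(y_true, y_pred):
    """Return (RMSE, MAE) for the given values (hypothetical helper)."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean(error ** 2))
    mae = np.mean(np.abs(error))
    return rmse, mae

y_demo = np.array([10.0, 20.0, 30.0, 40.0])

# Every prediction is off by 5: RMSE and MAE are equal
even_rmse, even_mae = rmse_and_mae(y_demo, y_demo + np.array([5, -5, 5, -5]))

# One prediction is off by 20, the rest are exact: same MAE, larger RMSE
uneven_rmse, uneven_mae = rmse_and_mae(y_demo, y_demo + np.array([0, 0, 0, 20]))

print(even_rmse, even_mae)
print(uneven_rmse, uneven_mae)
```

The two metrics coincide only when every error has the same magnitude. The more uneven the errors, the further the RMSE rises above the mean absolute error.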

## Different metrics with cross validation

It is possible to use these scores when doing cross validation with the `cross_val_score` function. It has a `scoring` parameter that accepts a string naming the type of score you want returned. Let's see an example with the default $R^2$ and then with other metrics. We use a linear regression here and shuffle the data as before, setting the `random_state` parameter to 123 so that the splits are reproducible.

In [None]:
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
kf = KFold(n_splits=5, shuffle=True, random_state=123)

By default, if no scoring method is given, `cross_val_score` uses the estimator's own `score` method, which is $R^2$ for regressors.

In [None]:
cross_val_score(lr, X, y, cv=kf).round(2)
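
To see what the default scoring amounts to, the sketch below repeats the cross validation by hand: fit on each training fold, then call the `score` method on the held-out fold. It uses synthetic data (an assumption, so that the cell runs without the housing file), and the `_demo` names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the housing data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=123)

# Manual loop: one fresh model per fold, scored on the held-out rows
manual_scores = []
for train_idx, test_idx in kf_demo.split(X_demo):
    model = LinearRegression()
    model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))

# The same folds through cross_val_score with no scoring argument
cv_scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=kf_demo)

print(np.allclose(manual_scores, cv_scores))
```

Because `random_state` is an integer, the splitter produces the same folds both times, so the two sets of scores match.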

Use the string 'r2' to return $R^2$ values, which is the default and will be the same as above.

In [None]:
cross_val_score(lr, X, y, cv=kf, scoring='r2').round(2)

### Use the documentation to find the string names

The possible strings for each metric are found in the [user guide section of the official documentation][1]. The string 'neg_mean_squared_error' is used to return the negative of the mean squared error.

[1]: https://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values

In [None]:
cross_val_score(lr, X, y, cv=kf, scoring='neg_mean_squared_error')

### Why are negative values returned?

In an upcoming chapter, we cover model selection. scikit-learn selects models based on their scores and treats higher scores as better. With mean squared error, however, lower values are better. To make this metric work with model selection, scikit-learn negates it when doing cross validation so that higher scores are indeed better. For instance, a score of -9 is better than -10.
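
If you want the errors back in their usual positive form, flip the sign of the returned scores. The sketch below does this and also takes the square root to get an RMSE per fold. It runs on synthetic data (an assumption, so that it is self-contained), and the `_demo` names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the housing data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=123)

# Scores come back negated so that greater is better
neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=kf_demo, scoring='neg_mean_squared_error')

mse_per_fold = -neg_mse                # ordinary (positive) MSE for each fold
rmse_per_fold = np.sqrt(mse_per_fold)  # RMSE, in the units of the target

print(neg_mse.round(3))
print(rmse_per_fold.round(3))
```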

### Mean squared log error

Another popular regression metric is the mean squared log error. It takes the natural logarithm of both the predicted value and the ground truth (scikit-learn adds one to each value first), computes the error between the two logs, squares it, and takes the mean. Let's import the function from the metrics module and use it.

In [None]:
from sklearn.metrics import mean_squared_log_error
mean_squared_log_error(y, y_pred)
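
One reason to reach for this metric is that it responds to relative rather than absolute error. The sketch below compares a 10% miss on a small value with a 10% miss on a large value, using made-up numbers that are not from the housing data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

# Two hypothetical predictions, each 10% too high
cheap_true, cheap_pred = np.array([100.0]), np.array([110.0])
pricey_true, pricey_pred = np.array([100_000.0]), np.array([110_000.0])

# MSE grows with the scale of the target
mse_cheap = mean_squared_error(cheap_true, cheap_pred)
mse_pricey = mean_squared_error(pricey_true, pricey_pred)

# Mean squared log error is nearly the same for both
msle_cheap = mean_squared_log_error(cheap_true, cheap_pred)
msle_pricey = mean_squared_log_error(pricey_true, pricey_pred)

print(mse_cheap, mse_pricey)
print(round(msle_cheap, 4), round(msle_pricey, 4))
```

With a target like sale price, this means an expensive house does not dominate the metric just because its errors are larger in dollar terms.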

We can use the metric with `cross_val_score` by passing it the string 'neg_mean_squared_log_error'. Again, greater scores here are better.

In [None]:
cross_val_score(lr, X, y, cv=kf, scoring='neg_mean_squared_log_error')

### Finding the error metrics

You can find all the error metrics by navigating to the scikit-learn API or the user guide, but you can also find them directly in the `SCORERS` dictionary in the `metrics` module. The keys of the dictionary are the string names of the metrics. If you are on Python 3.7 or later, the dictionary will be ordered. In the version used here, there are eight regression metrics and they are listed first. Let's take a look at their names. Note that `SCORERS` was removed in scikit-learn 1.3; on newer versions, the `get_scorer_names` function in the same module returns the names instead.

In [None]:
from sklearn.metrics import SCORERS
list(SCORERS)[:8]
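
On a scikit-learn version where `SCORERS` is no longer available, the lookup might be sketched as follows. The fallback import and the filter on the 'neg_' prefix are assumptions made for illustration. The names come back sorted here, so the regression metrics are not grouped first.

```python
# Prefer get_scorer_names; fall back to SCORERS on older versions
try:
    from sklearn.metrics import get_scorer_names
    scorer_names = sorted(get_scorer_names())
except ImportError:
    from sklearn.metrics import SCORERS
    scorer_names = sorted(SCORERS)

# Keep the negated error metrics, e.g. 'neg_mean_squared_error'
error_scorers = [name for name in scorer_names
                 if name.startswith('neg_') and name.endswith('_error')]

print(len(scorer_names))
print(error_scorers)
```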

Let's use the maximum error as our cross validation metric, which returns the largest absolute error among the predictions in each fold. As with the other error metrics, the scorer negates the value.

In [None]:
cross_val_score(lr, X, y, cv=kf, scoring='max_error').round(-3)

Most of the built-in scoring metrics are for classification or clustering and not for regression. Let's find the total number of scoring metrics.

In [None]:
len(SCORERS)

### Custom scoring functions

If you desire to use a scoring metric not built into scikit-learn, you can create your own custom scoring function. This is a bit more advanced and will be presented in a later chapter.

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">Use some of the regression scoring metrics available in the `metrics` module to compute scores on various models. Use these metrics again when doing cross validation.</span>

### Exercise 2

<span  style="color:green; font-size:16px">Write a function that computes the mean squared log error. scikit-learn adds one to each value before taking the log.</span>