
# **Regression Metrics Cheat Sheet**

*Adapted from Coding Dojo lesson in Machine Learning stack on regression metrics.*

To evaluate a machine learning model, we should use multiple metrics. Each metric gives us a value that tells us in some sense how well the model predicted the data.

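All error metrics below (MAE, MSE, RMSE) are scale-dependent: their values depend on the scale and units of the target data and are meaningless out of context. The exception is R^2, which is unitless. It is at most 1, usually falls between 0 and 1, and can be negative when a model predicts worse than always guessing the mean of the target.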
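To illustrate the scale dependence described above, here is a minimal sketch using made-up values (the arrays and the meters/centimeters units are assumptions for illustration only). Changing the units of the target changes MAE by the same factor, while R^2 stays the same.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical targets and predictions, measured in meters
y_true_m = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_m = np.array([1.5, 2.0, 2.5, 4.0])

# The same data expressed in centimeters
y_true_cm = y_true_m * 100
y_pred_cm = y_pred_m * 100

# MAE follows the units of the target
mae_m = mean_absolute_error(y_true_m, y_pred_m)     # in meters
mae_cm = mean_absolute_error(y_true_cm, y_pred_cm)  # in centimeters, 100x larger

# R^2 is unitless, so rescaling the target leaves it unchanged
r2_m = r2_score(y_true_m, y_pred_m)
r2_cm = r2_score(y_true_cm, y_pred_cm)

print(mae_m, mae_cm, r2_m, r2_cm)
```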

# Import libraries


```
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
```

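Most metrics below can be calculated with either numpy or sklearn. For root mean squared error (RMSE), sklearn support depends on the version: 1.4 and later provide `root_mean_squared_error`, while older versions accept `mean_squared_error(..., squared=False)`. Taking `np.sqrt` of the MSE works in every version.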


# Mean Absolute Error (MAE)

"Mean absolute error measures the average of the *absolute values* of all of the errors our model makes."

Features:
- Penalizes every error in proportion to its size (large errors get no extra weight)
- Takes the absolute value of each error, so positive and negative errors cannot cancel out
- Simple, easy to interpret
- Result is in the same unit as the target data

```
# Shown for the test split; swap in y_train and train_pred for the training split.
# numpy
test_MAE = np.mean(np.abs(test_pred - y_test))

# sklearn (note: list the true y values first, then the predicted values)
test_MAE = mean_absolute_error(y_test, test_pred)
```
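As a small worked example of why the absolute value matters, the sketch below uses made-up arrays (not data from the lesson). The plain mean of the signed errors understates how far off the predictions are, because positive and negative errors partly cancel; MAE does not have that problem.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 6.0, 2.0, 9.0])

errors = y_pred - y_true               # signed errors: [-1, 1, 0, 2]
mean_error = np.mean(errors)           # signed errors partially cancel
mae_numpy = np.mean(np.abs(errors))    # average size of the errors
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mean_error, mae_numpy, mae_sklearn)
```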


# Mean Squared Error (MSE)

"Mean squared error is similar to mean absolute error, but it penalizes large errors more. This metric squares the error for each sample and then averages those squared errors."

Features:
- Punishes larger errors more than smaller errors
- Result is in squared units of the target (e.g. dollars squared for a target in dollars)
- More difficult to interpret than MAE

```
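# Shown for the test split; swap in y_train and train_pred for the training split.
# numpy (squaring already removes the sign, so np.abs is not needed)
test_MSE = np.mean((test_pred - y_test) ** 2)

# sklearn
test_MSE = mean_squared_error(y_test, test_pred)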
```
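The difference between MSE and MAE can be seen by comparing two sets of predictions with the same total absolute error. This is a sketch with made-up numbers: one set spreads the error evenly, the other concentrates it in a single large miss.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical data: every true value is 10
y_true = np.array([10.0, 10.0, 10.0, 10.0])
pred_even = np.array([12.0, 12.0, 12.0, 12.0])     # four errors of 2
pred_outlier = np.array([10.0, 10.0, 10.0, 18.0])  # one error of 8

# MAE treats both prediction sets the same
mae_even = mean_absolute_error(y_true, pred_even)
mae_outlier = mean_absolute_error(y_true, pred_outlier)

# MSE is much larger when the error is concentrated in one big miss
mse_even = mean_squared_error(y_true, pred_even)
mse_outlier = mean_squared_error(y_true, pred_outlier)

print(mae_even, mae_outlier, mse_even, mse_outlier)
```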



# Root Mean Squared Error (RMSE)

"Root mean squared error is the square root of the mean squared error."

Features:
- Punishes larger errors more than smaller errors
- Result unit is same unit as data (therefore easier to interpret than MSE)

```
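# Shown for the test split; swap in y_train and train_pred for the training split.
# numpy
test_RMSE = np.sqrt(np.mean((test_pred - y_test) ** 2))

# sklearn (works in any version; sklearn >= 1.4 also provides root_mean_squared_error)
test_RMSE = np.sqrt(mean_squared_error(y_test, test_pred))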
```
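The following sketch, again on made-up arrays, computes RMSE two ways and compares it with MAE on the same predictions. RMSE comes out in the same unit as the target and is never smaller than MAE, since the larger errors carry more weight.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 6.0, 2.0, 9.0])

# Squared errors are [1, 1, 0, 4], so the MSE is 1.5
mse = mean_squared_error(y_true, y_pred)

# RMSE from the sklearn MSE, and directly with numpy
rmse = np.sqrt(mse)
rmse_numpy = np.sqrt(np.mean((y_pred - y_true) ** 2))

mae = mean_absolute_error(y_true, y_pred)

print(rmse, rmse_numpy, mae)
```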



# R-Squared Score (R2, R^2)

"The R2 score ... describes the percentage of the variation in the target variable that a model can explain by using all the features together."

Features:
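- Expressed as a proportion (usually between 0 and 1; multiply by 100 for a percentage) of the variance in the target variable that the model explains
- Can be negative, for example on test data, when the model predicts worse than always guessing the mean of the target
- Higher R2 is not always better and lower R2 is not always worse (check residual plots; R2 can be inflated by overfitting, among other considerations)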

```
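# Shown for the test split; swap in y_train and train_pred for the training split.
# numpy (1 - residual sum of squares / total sum of squares)
# note: np.corrcoef(y, pred)[0][1]**2 matches this only for least-squares
# linear regression with an intercept, evaluated on its own training data
test_r2 = 1 - np.sum((y_test - test_pred) ** 2) / np.sum((y_test - np.mean(y_test)) ** 2)

# sklearn
test_r2 = r2_score(y_test, test_pred)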
```
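One caveat is worth checking numerically. `r2_score` computes 1 minus the ratio of the residual sum of squares to the total sum of squares, which can fall below 0 and does not always equal the squared correlation. The sketch below uses made-up arrays, and `r2_manual` is a hypothetical helper written for this example: always predicting the mean gives an R2 of 0, and predictions that are perfectly anti-correlated with the target give a squared correlation of 1 but a strongly negative R2.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Hypothetical helper: R2 as 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical target values
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Baseline: always predict the mean of the target
pred_mean = np.full_like(y_true, np.mean(y_true))
r2_mean = r2_score(y_true, pred_mean)

# Bad model: predictions run in the opposite direction of the target
pred_bad = np.array([4.0, 3.0, 2.0, 1.0])
r2_bad = r2_score(y_true, pred_bad)
r2_bad_manual = r2_manual(y_true, pred_bad)
corr_squared = np.corrcoef(y_true, pred_bad)[0, 1] ** 2

print(r2_mean, r2_bad, r2_bad_manual, corr_squared)
```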

More information and a helpful video about R2:
[https://statisticsbyjim.com/regression/interpret-r-squared-regression/](https://statisticsbyjim.com/regression/interpret-r-squared-regression/)

