In [1]:
%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython

config_ipython()
setup_matplotlib()
set_css_style()

# Performance Metrics in Regression

## RMSE: Root Mean Square Error

This is probably the most common metric used to assess the quality of a regression model. The RMSE is calculated as

$$
RMSE = \sqrt{\frac{ \sum_{i=1}^n (y_i - \hat{y_i})^2 }{n}} \ ,
$$

where $n$ is the number of samples in the set, $y_i$ the actual value and $\hat{y_i}$ the predicted value for sample $i$. The RMSE is thus the square root of the mean squared difference between the actual and the predicted values, and it is expressed in the same units as $y$.
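As a quick illustration, the formula above can be sketched with NumPy. The function name `rmse` and the toy arrays are made up for this example; they are not part of the notebook's `common` package.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the residuals, average them, then take the square root
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical toy data: residuals are 0.5, -0.5, 0 and -1
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(rmse(y_true, y_pred))  # sqrt(1.5 / 4), roughly 0.612
```

Because the residuals are squared before averaging, the single residual of size 1 contributes as much as four residuals of size 0.5 would: large errors weigh more heavily than small ones.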

## RSS: Residual Sum of Squares

The RSS is calculated as

$$
RSS = \sum_{i=1}^n (y_i - \hat{y_i})^2
$$

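The RSS measures the *unexplained* variation, that is, the part of the spread in the actual values that the model does not capture. Dividing it by $n$ gives the mean squared error, whose square root is the RMSE.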
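A minimal sketch of the RSS, again with NumPy and with an illustrative function name (`rss`) and toy data chosen for this example. It also checks the relation between the RSS and the RMSE, namely $RMSE = \sqrt{RSS / n}$.

```python
import numpy as np


def rss(y_true, y_pred):
    """Residual sum of squares: sum of squared (actual - predicted)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_true - y_pred) ** 2))


# Hypothetical toy data
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

total = rss(y_true, y_pred)
print(total)                          # 0.25 + 0.25 + 0 + 1 = 1.5
print(np.sqrt(total / len(y_true)))   # the RMSE recovered from the RSS
```

Note that, unlike the RMSE, the RSS grows with the number of samples, so it is only comparable between models evaluated on the same set.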

## $R^2$: Coefficient of Determination

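The coefficient of determination, usually denoted $R^2$, expresses the proportion of the variance in the dependent variable that is predictable from the independent variables. It is less than or equal to 1, with 1 corresponding to a perfect fit. It equals 0 for a model that always predicts the mean of the actual values, and it can be negative for a model that does worse than that.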

Calling $\hat{y}$ the predicted values and $y$ the actual values, we calculate the average of the actual values 

$$
\bar y = \frac{1}{n} \sum_{i=1}^n y_i \ ,
$$

the *total sum of squares*

$$
SS_{TOT} = \sum_{i=1}^n (y_i - \bar y)^2
$$

and the explained sum of squares

$$
SS_{exp} = \sum_{i=1}^n (\hat{y_i} - \bar y)^2
$$

With the definition of the RSS from above, we have

$$
R^2 = 1 - \frac{RSS}{SS_{TOT}}
$$

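The second term is the ratio of the unexplained variation to the total variation in the data, so $R^2$ is the fraction of the total variance that the model explains. For a least-squares linear fit with an intercept, $SS_{TOT} = SS_{exp} + RSS$, so equivalently $R^2 = SS_{exp} / SS_{TOT}$.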
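The definition $R^2 = 1 - RSS / SS_{TOT}$ can be sketched as follows. The function name `r_squared` and the data are illustrative assumptions; the sketch does not guard against a constant `y_true`, for which $SS_{TOT} = 0$ and $R^2$ is undefined.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination, computed as 1 - RSS / SS_TOT."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_true - y_pred) ** 2)             # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return float(1.0 - rss / ss_tot)


# Hypothetical toy data
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

print(r_squared(y_true, y_pred))                                  # close to 1: good fit
print(r_squared(y_true, np.full_like(y_true, y_true.mean())))     # 0: always predicting the mean
print(r_squared(y_true, -y_true))                                 # negative: worse than the mean
```

The last two calls show the two reference points of the scale: a model that always outputs $\bar y$ scores 0, and anything worse than that scores below 0.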

## MAE: Mean Absolute Error

The MAE is calculated as

$$
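MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y_i}| \ ,
$$

that is, as the average of the absolute differences between the actual and the predicted values. The absolute value is essential: without it, positive and negative errors would cancel each other out.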
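A minimal sketch of the MAE as the mean of the absolute residuals, using NumPy. The function name `mae` and the toy data are assumptions made for this example. The plain (signed) mean of the residuals is printed alongside it to show how errors of opposite sign cancel when the absolute value is left out.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical toy data: residuals are 0.5, -0.5, 0 and -1
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

print(mae(y_true, y_pred))        # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
print(np.mean(y_true - y_pred))   # signed mean: -0.25, errors partly cancel
```

Compared with the RMSE, the MAE weighs all residuals linearly, so it is less sensitive to a few large errors.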