# Regression metrics from scratch

The `sklearn.metrics` module implements several loss, score, and utility functions to measure the performance of our trained models. In the last notebook you defined functions for several classification metrics on your own and compared the results to the imported functions from `sklearn.metrics`. Now it's time to do the same for some of the most common regression metrics.

## Task

In this notebook you will write functions for three of the most commonly used regression metrics:
* MAE (`mean_absolute_error`)
* MSE (`mean_squared_error`)
* R-squared (`r2_score`)
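
For reference, the three metrics listed above can be written as follows. The notation is an assumption introduced here, not taken from the notebook: $y_i$ is the actual value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the actual values, and $n$ the number of samples.

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$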

To check whether your functions work as expected, load the **wine-quality** dataset from the data folder and split it into a train and a test set. Fit a linear regression model on the train set and make predictions for the test set.
Then import the three regression metrics from the `sklearn.metrics` module and compare their results with those of your own functions.

In [2]:
# importing the data
import pandas as pd
df = pd.read_csv('data/wine-quality.csv', sep=";")
df.head()

| Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


In [3]:
# split the data into train and test sets, fit a linear regression,
# and score its test predictions with the sklearn metrics

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X = df.drop(columns=['quality'])
y = df['quality']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae_sk = mean_absolute_error(y_test, y_pred)
mse_sk = mean_squared_error(y_test, y_pred)
r2_sk = r2_score(y_test, y_pred)
print("mae:", mae_sk, "mse:", mse_sk, "R²:", r2_sk)


mae: 0.5777235656921463 mse: 0.5542927330136752 R²: 0.26586871324780914


#### Mean Absolute Error (MAE)

The [mean absolute error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html) (or MAE) is the average of the absolute differences between predictions and actual values. It measures the typical magnitude of the error in the units of the target variable, but says nothing about its direction (over- or under-predicting).
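
To illustrate the point about direction, here is a minimal sketch with made-up toy values (not taken from the wine data). Predictions that are consistently too high and predictions that are consistently too low give the same MAE, while the mean signed error tells them apart.

```python
import numpy as np

# Toy values chosen for illustration only (an assumption, not wine data).
y_true = np.array([5.0, 6.0, 7.0])
over = y_true + 1.0   # every prediction is 1 too high
under = y_true - 1.0  # every prediction is 1 too low

# MAE ignores the sign of the error.
mae_over = np.mean(np.abs(y_true - over))
mae_under = np.mean(np.abs(y_true - under))

# The mean signed error keeps the direction.
mean_signed_over = np.mean(over - y_true)
mean_signed_under = np.mean(under - y_true)

print("MAE:", mae_over, mae_under)  # identical for both cases
print("mean signed error:", mean_signed_over, mean_signed_under)  # opposite signs
```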

In [4]:
import numpy as np

def mae(y_true, y_pred):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.mean(np.abs(y_true - y_pred))





#### Mean Squared Error (MSE)

The [mean squared error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) (or MSE) is the average of the squared differences between predictions and actual values. Like the mean absolute error, it summarizes the magnitude of the error, but squaring weights large errors more heavily and expresses the result in squared units of the output variable.
Taking the square root of the mean squared error converts the units back to the original units of the output variable, which makes the value easier to describe and present. This is called the **root mean squared error** (or RMSE).

In [5]:
# Your code for MSE (or RMSE)!
def mse(y_true, y_pred):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.mean((y_true - y_pred) ** 2)
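
The cell above covers the MSE only. A minimal sketch of the RMSE, assuming the same NumPy approach, could look like this. The helper name `rmse` and the toy values are illustrative assumptions. The example uses one large error among otherwise perfect predictions to show how squaring weights it more heavily than the MAE does.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Toy values for illustration: three perfect predictions and one error of 4.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [3.0, 5.0, 7.0, 13.0]

errors = np.asarray(y_true) - np.asarray(y_pred)
mae_value = np.mean(np.abs(errors))   # 4 / 4 = 1.0
mse_value = np.mean(errors ** 2)      # 16 / 4 = 4.0
rmse_value = rmse(y_true, y_pred)     # sqrt(4.0) = 2.0

print("MAE:", mae_value, "MSE:", mse_value, "RMSE:", rmse_value)
```

On the notebook's own predictions, the same value can be cross-checked with `np.sqrt(mean_squared_error(y_test, y_pred))`.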

#### R-squared

The [R^2](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html) (or R-squared) metric provides an indication of the goodness of fit of a set of predictions to the actual values. In statistical literature, this measure is called the coefficient of determination.
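It is 1 for a perfect fit and 0 for a model that always predicts the mean of the actual values. It is not bounded below: the value becomes negative when the predictions are worse than that mean baseline, which can happen on a test set.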

In [6]:
# Your code for R-squared!


def r2(y_true, y_pred):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot
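
The range of R-squared is easiest to see on a small example. The sketch below uses made-up toy values and a helper named `r2_sketch` (a hypothetical name that repeats the formula from the cell above so the block runs on its own). It evaluates a perfect prediction, a constant prediction of the mean, and a prediction that is worse than the mean.

```python
import numpy as np

def r2_sketch(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Toy values for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = y_true.copy()                          # no residual error
mean_only = np.full_like(y_true, y_true.mean())  # always predicts 2.5
reversed_pred = y_true[::-1]                     # worse than the mean baseline

r2_perfect = r2_sketch(y_true, perfect)        # 1 - 0 / 5
r2_mean = r2_sketch(y_true, mean_only)         # 1 - 5 / 5
r2_reversed = r2_sketch(y_true, reversed_pred) # 1 - 20 / 5

print(r2_perfect, r2_mean, r2_reversed)
```

Note that the formula divides by `ss_tot`, so it is undefined when all actual values are identical.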

#### Test your functions :) 
Start by importing the necessary functions and modules, then check that each of your functions returns the same value as its `sklearn.metrics` counterpart.

In [8]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test)

assert np.isclose(mae(y_test, y_pred), mean_absolute_error(y_test, y_pred))
assert np.isclose(mse(y_test, y_pred), mean_squared_error(y_test, y_pred))
assert np.isclose(r2(y_test, y_pred), r2_score(y_test, y_pred))
