# Model evaluation

Suppose that we have learned a model for a specific task, e.g. to estimate the quality of wine. How do we know how good the model actually is at this task? In the process of evaluating the model, we try to assess that. In practice, we may not only care about effectiveness, but also efficiency and a qualitative analysis of mistakes that the model makes. But for now, we will focus on effectiveness.

# Metrics

To describe how effective a classifier is, we could for instance report how often the model makes the right prediction. But beyond looking at one model, we can also compare two models to conclude if one model is better than the other. In both cases, we need a way to measure the extent to which we reach our objective: an **evaluation metric**.

There are many existing evaluation metrics. Some of them apply to just one particular type of task (e.g. regression, classification), and some to multiple types of tasks. For now, we focus on a few of the most commonly used evaluation metrics in Machine Learning for Regression:

- Root Mean Squared Error (RMSE)
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- $R^2$

# Data

We will start by loading the Wine data set for Linear Regression, i.e. using the quality score as the target variable.

In [1]:
from ml import advertising_pd

In [2]:
df = advertising_pd()

In [3]:
X = df[['TV']]
y = df.Sales

In [4]:
from sklearn.model_selection import train_test_split

train_X, valid_X, train_y, valid_y = train_test_split(X, y)

# Model

We use a linear_regression model with sgd. The `ml` library can compute several metrics during training, we just have to pass them to metrics in a List. Currently there is support for mse, r2, acc(uracy), recall, precision, f1. For Linear Regression, mse and r2 are useful.

In [5]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()

# Train

In [6]:
model.fit(train_X, train_y)

LinearRegression()

# Computing metrics with SKLearn

After learning the model, we can use it to predict values all training examples. The functions in SKLearn that compute these metrics  and compute the Mean Squared Error:

In [7]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

In [8]:
y_pred = model.predict(valid_X)
mean_squared_error(y_pred, valid_y)

6.462942205365798

In [9]:
mean_absolute_error(y_pred, valid_y)

2.019234825110543

In [10]:
r2_score(y_pred, valid_y)

0.6908504924280363