# Regression Evaluation

To demonstrate evaluation, we will create a simple *regression model* using **sklearn**. Let's begin by importing the required packages.

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression

We will generate a toy data set consisting of:
- 6 observations (`x`)
- their target values (`y`)

Together these form the data for our model. `y` will be considered our ground truth.

In [2]:
# Generate data

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

Now, we have two arrays:  
>   - **input** `x` 
>   - **target** `y`

We have to call `.reshape()` on `x` because scikit-learn requires the input array to be two-dimensional: one column and as many rows as necessary. That is exactly what `.reshape((-1, 1))` specifies, where `-1` tells NumPy to infer the number of rows from the length of the array.
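As a quick illustration of the reshape step, the sketch below compares the shape of the array before and after the call. The names `flat` and `column` are introduced here only for illustration.

```python
import numpy as np

# The same six observations, first as a flat one-dimensional array
flat = np.array([5, 15, 25, 35, 45, 55])

# -1 lets NumPy infer the number of rows; 1 fixes the number of columns
column = flat.reshape((-1, 1))

print(flat.shape)    # (6,)
print(column.shape)  # (6, 1)
```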

To check how `x` and `y` look now, we can use **print statements**:

In [3]:
print(x)

[[ 5]
 [15]
 [25]
 [35]
 [45]
 [55]]


In [4]:
print(y)

[ 5 20 14 32 22 38]


The next step is to create a **linear regression model** and **train** it using the existing data.

In [5]:
model = LinearRegression()
model.fit(x, y)

LinearRegression()
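After fitting, the model stores the estimated parameters of the line in its `intercept_` and `coef_` attributes. The following is a minimal sketch that refits the same data and checks that `.predict()` is simply intercept plus slope times `x`; the name `manual_pred` is only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# .fit() returns the fitted estimator itself
model = LinearRegression().fit(x, y)

print("intercept:", model.intercept_)
print("slope:", model.coef_[0])

# Reproduce the model's predictions by hand: y_hat = intercept + slope * x
manual_pred = model.intercept_ + model.coef_[0] * x.ravel()
print(np.allclose(manual_pred, model.predict(x)))
```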

**R^2**
> You can obtain the coefficient of determination (𝑅²) by calling `.score()` on the model:

In [6]:
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

coefficient of determination: 0.7158756137479542
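To make the number less of a black box, 𝑅² can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below refits the same data so that it runs on its own; the names `ss_res`, `ss_tot` and `r_sq_manual` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# Residual sum of squares: variation the model fails to explain
ss_res = np.sum((y - y_pred) ** 2)
# Total sum of squares: variation of y around its mean
ss_tot = np.sum((y - y.mean()) ** 2)

r_sq_manual = 1 - ss_res / ss_tot
print(r_sq_manual, model.score(x, y))
```

Both printed values should agree up to floating-point rounding.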


**Mean Squared Error (MSE)** and **Root Mean Squared Error (RMSE)**

In [7]:
# import MSE from sklearn
from sklearn.metrics import mean_squared_error

We will use the `.predict()` method to obtain the predicted response.

In [8]:
y_pred = model.predict(x)

In [9]:
y_pred

array([ 8.33333333, 13.73333333, 19.13333333, 24.53333333, 29.93333333,
       35.33333333])

Then, we compute evaluation metrics:

In [13]:
# compute MSE
MSE = mean_squared_error(y, y_pred)  

# print MSE
print("MSE: ", round(MSE,2))

MSE:  33.76
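MSE is the mean of the squared differences between the ground truth and the predictions, so it can be checked with plain NumPy. In the sketch below, `y_pred` is copied from the rounded output printed earlier, and `mse_manual` is an illustrative name.

```python
import numpy as np

y = np.array([5, 20, 14, 32, 22, 38])
# Predictions copied from the notebook output (rounded to 8 decimals)
y_pred = np.array([8.33333333, 13.73333333, 19.13333333,
                   24.53333333, 29.93333333, 35.33333333])

# Mean of the squared residuals
mse_manual = np.mean((y - y_pred) ** 2)
print("MSE by hand:", round(mse_manual, 2))
```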


All **regression evaluation functions** from `sklearn.metrics` take two mandatory arrays as parameters:  

The first is an array with the ground truth values (in our case the `y` variable) and the second is an array with our predictions (in our case the `y_pred` variable).  

To get **RMSE** from **MSE** we have two options: 
> * Compute the square root of the MSE with NumPy
> * Pass `squared=False` to `mean_squared_error`

Let's try both options:

In [17]:
# RMSE by Numpy
RMSE = np.sqrt(MSE)
print("RMSE using Numpy:", round(RMSE,5))

# RMSE by sklearn
RMSE = mean_squared_error(y,y_pred,squared=False)
print("RMSE using squared=False:", round(RMSE,5))

RMSE using Numpy: 5.80995
RMSE using squared=False: 5.80995


We can see that both options give the same result.  
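One caveat, stated as an assumption to verify against your installed version: to my knowledge, newer scikit-learn releases deprecate the `squared` argument and provide a dedicated `root_mean_squared_error` function instead. The sketch below defines a hypothetical helper, `rmse`, that uses the dedicated function when it can be imported and otherwise falls back to the NumPy square root, which works in any version.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

try:
    # Assumed to exist only in newer scikit-learn releases
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    root_mean_squared_error = None


def rmse(y_true, y_pred):
    """Hypothetical helper: RMSE that does not rely on `squared=False`."""
    if root_mean_squared_error is not None:
        return root_mean_squared_error(y_true, y_pred)
    # Fallback: square root of the MSE
    return np.sqrt(mean_squared_error(y_true, y_pred))


y = np.array([5, 20, 14, 32, 22, 38])
# Predictions copied from the notebook output (rounded to 8 decimals)
y_pred = np.array([8.33333333, 13.73333333, 19.13333333,
                   24.53333333, 29.93333333, 35.33333333])

print("RMSE:", round(rmse(y, y_pred), 5))
```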

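We choose the **MSE** metric when we want to penalize larger errors more heavily: because each error is squared, one large error contributes far more than several small ones.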
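The effect of squaring can be seen on a made-up example. The sketch below uses invented numbers (not the notebook's data) and brings in the mean absolute error (MAE) purely for comparison: both sets of predictions have the same total absolute error, but the one that concentrates it in a single large miss has a much higher MSE.

```python
import numpy as np

y_true = np.array([10.0, 10.0, 10.0, 10.0])

# Four small errors of size 2 each
small_errors = np.array([12.0, 8.0, 12.0, 8.0])
# One large error of size 8, the rest exact
one_big_error = np.array([10.0, 10.0, 10.0, 18.0])

for name, pred in [("small errors", small_errors),
                   ("one big error", one_big_error)]:
    mae = np.mean(np.abs(y_true - pred))   # treats all errors linearly
    mse = np.mean((y_true - pred) ** 2)    # squares each error first
    print(name, "MAE:", mae, "MSE:", mse)

# small errors MAE: 2.0 MSE: 4.0
# one big error MAE: 2.0 MSE: 16.0
```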