# Regression evaluation metrics
This notebook compares three evaluation metrics for regression models: mean absolute error (MAE), root mean squared error (RMSE) and R2. It uses the Boston house price dataset, which has the following features:

<pre>
CRIM    per capita crime rate by town
ZN      proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS   proportion of non-retail business acres per town
CHAS    Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX     nitric oxides concentration (parts per 10 million)
RM      average number of rooms per dwelling
AGE     proportion of owner-occupied units built prior to 1940
DIS     weighted distances to five Boston employment centres
RAD     index of accessibility to radial highways
TAX     full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
LSTAT   % lower status of the population
MEDV    Median value of owner-occupied homes in $1000's
</pre>

## Imports

In [1]:
# Core libraries
import pandas as pd

# Sklearn processing
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Sklearn regression algorithms
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Sklearn regression model evaluation functions
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Standard library (sqrt turns the mean squared error into RMSE)
from math import sqrt

## Load data

In [2]:
# Load Boston housing data set
boston = pd.read_csv("boston.csv")

## Inspect data

In [3]:
# View the features
boston.head()

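| Unnamed: 0 | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |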


## Split into X and y

In [4]:
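# Define the X (input) and y (target) features
# Note: X keeps every column except MEDV, including the "Unnamed: 0"
# row-index column that appears in the preview of the data
X = boston.drop("MEDV", axis=1)
y = boston["MEDV"]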

## Scale features to same range

In [5]:
# Rescale the input features
scaler = MinMaxScaler(feature_range=(0,1))
X = scaler.fit_transform(X)

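As a rough illustration of what the rescaling step does, the sketch below applies the min-max formula, (x - min) / (max - min), to a single column in plain Python. The helper name `min_max_scale` is hypothetical, and the input is the first five RM values from the preview of the data. `MinMaxScaler` applies the same idea to each column separately, using the minimum and maximum it saw during `fit`.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Rescale a list of numbers linearly into [low, high] (illustrative helper)."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant column has no spread; this sketch maps it to the lower bound
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]


# First five RM values from the preview of the data
rm = [6.575, 6.421, 7.185, 6.998, 7.147]
rm_scaled = min_max_scale(rm)
print(rm_scaled)
```

The smallest value in the column maps to 0 and the largest to 1, so features measured in very different units (for example TAX and NOX) end up on a comparable scale.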


## Split into train and test sets

In [6]:
# Split into train (about 2/3) and test (about 1/3) sets
test_size = 0.33
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)
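To make the split concrete, here is a minimal plain-Python sketch of a shuffled hold-out split. It is not scikit-learn's implementation and will not select the same rows for the same seed. The helper name `simple_split` is hypothetical, and the row count of 506 (the size of the standard Boston dataset) is an assumption about `boston.csv`.

```python
import math
import random


def simple_split(n_rows, test_size, seed):
    """Return (train_indices, test_indices) for a shuffled hold-out split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # seeded, so the split is repeatable
    n_test = math.ceil(n_rows * test_size)    # round the test share up to whole rows
    return indices[n_test:], indices[:n_test]


# Assumption: 506 rows, as in the standard Boston dataset
train_idx, test_idx = simple_split(n_rows=506, test_size=0.33, seed=7)
print(len(train_idx), len(test_idx))
```

Fixing the seed matters for the comparisons that follow: both models are then trained and scored on exactly the same rows.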

## Build two models and explore evaluation metrics on training set

In [7]:
# Build some models and check them against training data using MAE, RMSE and R2
models = [LinearRegression(), KNeighborsRegressor()]
for model in models:
    model.fit(X_train, y_train)
    predictions = model.predict(X_train)
    print(type(model).__name__)
    print("    MAE", mean_absolute_error(y_train, predictions))
    print("    RMSE", sqrt(mean_squared_error(y_train, predictions)))
    print("    R2", r2_score(y_train, predictions))

LinearRegression
    MAE 3.320553824991157
    RMSE 4.669735688876091
    R2 0.7538411248592967
KNeighborsRegressor
    MAE 2.5493215339233037
    RMSE 3.921007185039673
    R2 0.8264493776882362
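For reference, the three metrics can be written out by hand. The sketch below is plain Python with hypothetical helper names (`mae`, `rmse`, `r2`) and toy numbers rather than the Boston data. It follows the standard single-target definitions of the quantities that `mean_absolute_error`, `mean_squared_error` and `r2_score` compute.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but large errors are penalised more."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """R2: 1 minus the ratio of residual variation to variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values (not taken from the dataset)
y_true = [20.0, 30.0, 40.0, 50.0]
y_pred = [22.0, 28.0, 41.0, 45.0]
print("MAE", mae(y_true, y_pred))
print("RMSE", rmse(y_true, y_pred))
print("R2", r2(y_true, y_pred))
```

MAE and RMSE are in the units of the target, so with MEDV in $1000's an MAE of about 3.3 corresponds to a typical error of roughly $3,300. R2 is unitless, and a value of 1 means perfect predictions.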


## Explore evaluation metrics on test set

In [8]:
# Evaluate the models against test data using MAE, RMSE and R2
for model in models:
    predictions = model.predict(X_test)
    print(type(model).__name__)
    print("    MAE", mean_absolute_error(y_test, predictions))
    print("    RMSE", sqrt(mean_squared_error(y_test, predictions)))
    print("    R2", r2_score(y_test, predictions))

LinearRegression
    MAE 3.4097336094727595
    RMSE 5.086878583324625
    R2 0.6590081405512094
KNeighborsRegressor
    MAE 3.0038323353293412
    RMSE 4.546077074408997
    R2 0.7276578531589541
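One way to read the training and test results together is to look at how much a metric worsens from one set to the other. The sketch below copies the RMSE values reported above (rounded to four decimal places) and computes the relative increase; the variable names are illustrative.

```python
# RMSE values copied from the outputs above, rounded to four decimal places
rmse_scores = {
    "LinearRegression": {"train": 4.6697, "test": 5.0869},
    "KNeighborsRegressor": {"train": 3.9210, "test": 4.5461},
}

relative_increase = {}
for name, scores in rmse_scores.items():
    # Fraction by which the test RMSE exceeds the training RMSE
    relative_increase[name] = (scores["test"] - scores["train"]) / scores["train"]
    print(f"{name}: RMSE up {relative_increase[name]:.1%} on the test set")
```

On these numbers KNeighborsRegressor still has the lower test RMSE, but its error grows more between the two sets than LinearRegression's does.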
