# Evaluation of Regression Models

* Supervised learning: learn to predict a target y from features X.
* Metrics: r2 score vs. error
* Training and testing
* Cross validation
* Comparing to a baseline 

In [52]:
import pandas
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
abalone = pandas.read_csv('../Datasets/abalone.csv')
X = pandas.get_dummies(abalone.drop(columns=['Rings']))
scaled_X = pandas.DataFrame(scaler.fit_transform(X), columns=X.columns)
y = abalone['Rings']
X.sample()

Unnamed: 0,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Sex_F,Sex_I,Sex_M
1170,0.625,0.485,0.175,1.3745,0.7335,0.2715,0.332,1,0,0


In [2]:
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

model1 = LinearRegression()
model2 = KNeighborsRegressor(n_neighbors=9)
print(model1, model2)

LinearRegression() KNeighborsRegressor(n_neighbors=9)


In [3]:
new_data = X.sample(3)
new_data = pandas.DataFrame(
    scaler.transform(new_data), 
    columns=new_data.columns,
    index = new_data.index,
)
new_data

Unnamed: 0,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Sex_F,Sex_I,Sex_M
3740,0.777027,0.747899,0.150442,0.496724,0.466039,0.418038,0.320877,1.0,0.0,0.0
2700,0.804054,0.764706,0.159292,0.594298,0.622058,0.391047,0.390633,0.0,0.0,1.0
3918,0.77027,0.731092,0.168142,0.462015,0.321453,0.468729,0.342302,1.0,0.0,0.0


In [4]:
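# Note: both models are fit on the unscaled X, while new_data was scaled above.
# Fitting and predicting should use the same preprocessing; the Summary cell
# further down fits on scaled_X instead.
model1.fit(X,y)
model2.fit(X,y)
y1 = model1.predict(new_data)
y2 = model2.predict(new_data)
print(y1,y2)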


[7.28125 6.3125  9.53125] [9.55555556 9.11111111 9.66666667]


In [5]:
y.loc[new_data.index]

3740    11
2700    13
3918    18
Name: Rings, dtype: int64

### How good are these predictions? How good is the model's ability to make predictions?

Several things are needed. First, we need a metric.

In [6]:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
true_values = [1,1,1]
predict_values = [1,0,1]
print(mean_absolute_error(true_values,predict_values))

0.3333333333333333
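
To make the metrics concrete, here is a minimal sketch that computes mean absolute error, mean squared error, and r2 by hand in plain Python. The helper names (`mae`, `mse`, `r2`) are illustrative and are not part of sklearn; they follow the standard definitions behind `mean_absolute_error`, `mean_squared_error`, and `r2_score`. The ring counts and predictions are the three samples shown in the cells above.

```python
def mae(true_values, predicted_values):
    """Mean absolute error: average size of the mistakes, in the units of y."""
    pairs = zip(true_values, predicted_values)
    return sum(abs(t - p) for t, p in pairs) / len(true_values)


def mse(true_values, predicted_values):
    """Mean squared error: like MAE, but large mistakes are penalized more."""
    pairs = zip(true_values, predicted_values)
    return sum((t - p) ** 2 for t, p in pairs) / len(true_values)


def r2(true_values, predicted_values):
    """r2 = 1 - SS_res / SS_tot.

    1 is a perfect fit; 0 is as good as always predicting the mean of the
    true values; negative is worse than that. Undefined when all true
    values are identical (SS_tot is 0).
    """
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1 - ss_res / ss_tot


true_rings = [11, 13, 18]                     # actual ring counts
predicted_rings = [7.28125, 6.3125, 9.53125]  # linear regression predictions

print('MAE:', mae(true_rings, predicted_rings))
print('MSE:', mse(true_rings, predicted_rings))
print('r2 :', r2(true_rings, predicted_rings))
```

Error metrics (MAE, MSE) are lower-is-better and expressed in units of the target, whereas r2 is higher-is-better and unitless, which is why the two kinds of numbers should not be compared directly.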


In [7]:
y1, y2, y.loc[new_data.index].values

(array([7.28125, 6.3125 , 9.53125]),
 array([9.55555556, 9.11111111, 9.66666667]),
 array([11, 13, 18]))

In [8]:
mean_absolute_error(y.loc[new_data.index], y1)

6.291666666666667

In [9]:
mean_absolute_error(y.loc[new_data.index], y2)

4.555555555555556

In [10]:
y.loc[new_data.index]

3740    11
2700    13
3918    18
Name: Rings, dtype: int64

#### Summary

In [11]:
scaler = MinMaxScaler()
abalone = pandas.read_csv('../Datasets/abalone.csv')
X = pandas.get_dummies(abalone.drop(columns=['Rings']))
scaled_X = pandas.DataFrame(scaler.fit_transform(X), columns=X.columns)

model1 = LinearRegression()
model1.fit(scaled_X,y)
model2 = KNeighborsRegressor(n_neighbors=9)
model2.fit(scaled_X,y)

y1 = model1.predict(scaled_X)
y2 = model2.predict(scaled_X)

print(mean_absolute_error(y, y1))
print(mean_absolute_error(y, y2))

1.5790901065357912
1.3702284999866998


### Next, we need separate training and testing sets

To test a learner's ability to learn, we first give the learner data to learn from.

Then we test the learner on different data that it has not seen.
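
The idea can be sketched without sklearn: shuffle the row indices, then set aside a fraction of them for testing. The function name `simple_train_test_split` and its `seed` parameter are hypothetical, chosen for this illustration; this is a sketch of the concept, not sklearn's implementation. It assumes the test count is rounded up, which is consistent with the 42 test rows that appear in the cells below for `test_size=0.01`.

```python
import math
import random


def simple_train_test_split(n_samples, test_size, seed=None):
    """Return (train_idx, test_idx) from one random shuffle of the row indices."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = math.ceil(test_size * n_samples)  # assumption: round up
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return train_idx, test_idx


train_idx, test_idx = simple_train_test_split(4177, 0.01, seed=0)
print(len(train_idx), len(test_idx))

# No row is used for both training and testing.
print(set(train_idx).isdisjoint(test_idx))
```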

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, test_size=0.01)

In [13]:
len(y), 0.01*len(y)  # the test size is rounded up, giving 42 test rows

(4177, 41.77)

In [14]:
y_train.index

Int64Index([3008,  262, 1355,  875, 3020, 1845, 3541, 2980, 1395, 2884,
            ...
             829, 1275, 2339,  688, 1970, 1442,  766, 3558, 2466, 2446],
           dtype='int64', length=4135)

In [15]:
X_train.index

Int64Index([3008,  262, 1355,  875, 3020, 1845, 3541, 2980, 1395, 2884,
            ...
             829, 1275, 2339,  688, 1970, 1442,  766, 3558, 2466, 2446],
           dtype='int64', length=4135)

In [16]:
y_test.index

Int64Index([ 615, 2353, 1830, 1478, 1277, 3487, 2722,  516, 2607, 2960,  271,
            2599, 2811, 1247, 1523,  386,  480, 1300, 1331, 1563,  315,  267,
            3496, 1168, 3557, 2865,  871, 1756, 1686,  905, 3364, 2808,  792,
            2926, 1266, 3929, 2413, 3938, 2520, 2428, 3343, 3430],
           dtype='int64')

In [17]:
X_test.index

Int64Index([ 615, 2353, 1830, 1478, 1277, 3487, 2722,  516, 2607, 2960,  271,
            2599, 2811, 1247, 1523,  386,  480, 1300, 1331, 1563,  315,  267,
            3496, 1168, 3557, 2865,  871, 1756, 1686,  905, 3364, 2808,  792,
            2926, 1266, 3929, 2413, 3938, 2520, 2428, 3343, 3430],
           dtype='int64')

In [18]:

model1 = LinearRegression()
model1.fit(X_train, y_train)

model2 = KNeighborsRegressor(n_neighbors=9)
model2.fit(X_train, y_train)

y1 = model1.predict(X_test)
y2 = model2.predict(X_test)

print(mean_absolute_error(y_test, y1))
print(mean_absolute_error(y_test, y2))

1.7380952380952381
1.8439153439153435


This is a single round of validation.

#### Summary

To evaluate a model we need the following:
* A metric, e.g. mean_absolute_error
* Validation: two different datasets, a training dataset and a testing dataset.
* Cross validation: validating the model in multiple rounds.
* Other things...

In [19]:
# 1. prepare the data, select features

import pandas
from sklearn.preprocessing import MinMaxScaler

abalone = pandas.read_csv('../Datasets/abalone.csv')

y = abalone['Rings']

X = abalone.drop(columns=['Rings'])
X = pandas.get_dummies(X)

scaler = MinMaxScaler()
scaled_X = pandas.DataFrame(
    scaler.fit_transform(X), 
    columns=X.columns,
)




In [20]:
# 2. Create model

from sklearn.linear_model import LinearRegression
model = LinearRegression()



In [21]:
# 3. Plan for validation
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def one_round_validate(model, X, y, test_size):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size) 
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    error = mean_absolute_error(y_test, predictions)
    return error


In [22]:
one_round_validate(model, scaled_X, y, 0.1)

1.6927098908492824

In [23]:
# 4. Cross validation

# repeated sampling (this is equivalent to ShuffleSplit)

errors = []
for i in range(100):
    errors.append(one_round_validate(model, scaled_X, y, 0.1))

print('Average test error:', sum(errors) / len(errors))

Average test error: 1.5903280757471476


### Cross Validation with ShuffleSplit

Cross validation simply means that we validate a model across different splits.



In [24]:
from sklearn.model_selection import cross_validate, ShuffleSplit

In [25]:
ss = ShuffleSplit(n_splits=100, test_size=0.05)
splits = list(ss.split(scaled_X,y))
len(splits)

100

In [26]:
train_idx, test_idx = splits[0]

In [27]:
train_idx

array([ 160, 3923, 1342, ..., 4067, 1585, 1880])

In [28]:
len(test_idx), len(y)*0.05

(209, 208.85000000000002)

In [40]:
ss = ShuffleSplit(n_splits=100, test_size=0.05)
# for train_idx, test_idx in ss.split(scaled_X,y):
#     print(test_idx)

In [41]:
# to get all the names of "scoring"
#from sklearn.metrics import get_scorer_names
#get_scorer_names()

In [42]:
ss = ShuffleSplit(n_splits=100, test_size=0.05)
model = LinearRegression()

result = cross_validate(model, scaled_X, y, cv=ss,  scoring='neg_mean_absolute_error')
result.keys()

dict_keys(['fit_time', 'score_time', 'test_score'])

In [43]:
result['test_score'].mean().round(3), result['test_score'].std().round(3)

(-1.598, 0.108)

Using `cross_validate` with `ShuffleSplit`, the whole cross validation procedure runs in one step.
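
The mean score above is negative because the `neg_mean_absolute_error` scorer negates the error so that higher is always better. The sketch below flips the sign back and summarizes the spread across splits. The score values are made up for illustration; they are not results from this notebook.

```python
import statistics

# Hypothetical per-split scores, in the form 'neg_mean_absolute_error' reports them.
neg_mae_scores = [-1.52, -1.61, -1.70, -1.49, -1.66]

# Flip the sign to get ordinary (positive) mean absolute errors.
mae_per_split = [-score for score in neg_mae_scores]

mean_mae = statistics.mean(mae_per_split)
# Population standard deviation, matching the default of numpy's .std().
spread = statistics.pstdev(mae_per_split)

print('MAE per split:', mae_per_split)
print('mean:', round(mean_mae, 3), 'std:', round(spread, 3))
```

A small standard deviation relative to the mean suggests the estimate does not depend much on which rows happened to land in the test set.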


#### KFold Cross Validation

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

This picture shows the 5-fold cross validation procedure.
<img src="https://scikit-learn.org/stable/_images/grid_search_cross_validation.png" width="60%">

Benefits of K-fold compared to ShuffleSplit:
* Fewer splits.
* Each data point is tested exactly once.

Cons:
* Can be biased when the rows are ordered: neighboring rows are always trained on or tested together.
* We should shuffle the data first. Sklearn's `KFold` does not shuffle by default (`shuffle=False`).
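
A minimal sketch of K-fold splitting with shuffling turned on. The toy array `X_demo` is made up for illustration and stands in for the real feature table; `shuffle=True` and `random_state` are existing `KFold` parameters.

```python
import numpy as np
from sklearn.model_selection import KFold

# 10 toy samples with 2 features each (illustrative data only).
X_demo = np.arange(20).reshape(10, 2)

# shuffle=True randomizes row order before the folds are cut.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

tested = []
for train_idx, test_idx in kf.split(X_demo):
    print('train:', train_idx, 'test:', test_idx)
    tested.extend(test_idx.tolist())

# Every sample appears in exactly one test fold.
print(sorted(tested) == list(range(len(X_demo))))
```

The `kf` object can be passed as `cv=kf` to `cross_validate` in the same way as the `ShuffleSplit` object used earlier.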

### Baseline Comparison

In [51]:
ss = ShuffleSplit(n_splits=100, test_size=0.05)
model = LinearRegression()

result = cross_validate(model, scaled_X, y, cv=ss,  
                        scoring=['r2','neg_mean_absolute_error'])
#print(result.keys())
print(result['test_neg_mean_absolute_error'].mean().round(2))
print(result['test_r2'].mean().round(2))


-1.6
0.51


In [45]:
from sklearn.dummy import DummyRegressor
baseline = DummyRegressor()
baseline.strategy

'mean'

In [46]:
result = cross_validate(baseline, scaled_X, y, cv=ss,  
                        scoring=['r2','neg_mean_absolute_error'])
print(result['test_neg_mean_absolute_error'].mean().round(2))
print(result['test_r2'].mean().round(2))

-2.35
-0.01
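
One way to read these numbers, sketched below under simple assumptions: the mean-predicting baseline scores an r2 near zero by construction, since r2 measures improvement over predicting the mean, so the model's error is best judged relative to the baseline's. The two MAE values are taken from the outputs above; the helper `r2` and the toy target values are illustrative only.

```python
def r2(true_values, predicted_values):
    """r2 = 1 - SS_res / SS_tot (illustrative helper, not sklearn's)."""
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1 - ss_res / ss_tot


# Toy targets (made up): the baseline learns the training mean,
# then predicts that single value for every test row.
train_y = [9, 10, 7, 12]
test_y = [8, 11, 15]
mean_prediction = sum(train_y) / len(train_y)
baseline_r2 = r2(test_y, [mean_prediction] * len(test_y))
print('baseline r2 on held-out data:', baseline_r2)  # never above 0

# Relative improvement of the model over the baseline, using the MAEs above.
baseline_mae = 2.35  # DummyRegressor with strategy 'mean'
model_mae = 1.60     # LinearRegression
improvement = (baseline_mae - model_mae) / baseline_mae
print(f'MAE reduced by {improvement:.0%} relative to the baseline')
```

A constant prediction can never beat the test set's own mean, which is why the baseline's r2 lands at or slightly below zero.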


### Learning Curves

How does a model learn with more data? Does performance improve with more experience?

A learning curve gives insight into both the data and the learner: how good the learner (model) is, and/or how difficult the problem is.

Learning curve:
* x-axis: training size
* y-axis: score (higher is better)


In [47]:
def one_round_validate(model, X, y, test_size):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size) 
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    test_score = -mean_absolute_error(y_test, predictions)

    predictions = model.predict(X_train)
    train_score = -mean_absolute_error(y_train, predictions)
    
    return train_score, test_score



In [48]:
# from sklearn.metrics import get_scorer_names
# get_scorer_names()

In [49]:
import lcplot

model = LinearRegression()
lcplot.plot(model, scaled_X, y, scoring='neg_mean_squared_error')


ModuleNotFoundError: No module named 'lcplot'
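
`lcplot` appears to be a local helper module that is not available here. As a rough substitute, the sketch below computes the numbers behind a learning curve with sklearn's `learning_curve` function. The synthetic data (`X_demo`, `y_demo`) is made up so the example runs on its own; with the notebook's data, `scaled_X` and `y` would take their place. This is not a reconstruction of what `lcplot.plot` does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, learning_curve

# Synthetic stand-in data: a noisy linear relationship (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(300, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=300)

train_sizes, train_scores, test_scores = learning_curve(
    LinearRegression(),
    X_demo,
    y_demo,
    train_sizes=np.linspace(0.1, 1.0, 5),  # fractions of the training portion
    cv=ShuffleSplit(n_splits=20, test_size=0.2, random_state=0),
    scoring='neg_mean_squared_error',
)

# Each row of the score arrays holds one training size; average over the splits.
mean_train = train_scores.mean(axis=1)
mean_test = test_scores.mean(axis=1)
for size, train, test in zip(train_sizes, mean_train, mean_test):
    print(f'{int(size):4d} training samples: train={train:.3f}  test={test:.3f}')
```

Plotting `mean_train` and `mean_test` against `train_sizes` gives the learning curve: training size on the x-axis, score on the y-axis.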


<img src="https://scikit-learn.org/stable/_images/sphx_glr_plot_learning_curve_001.png" width="75%">


#### Exercise: evaluate the following learning curve

https://i.stack.imgur.com/uHDIM.png

My evaluation:
* learner's ability: test score may not get above 0.8
* problem difficulty: with additional data, scores may not get higher.



https://i.stack.imgur.com/uHDIM.png

https://i.stack.imgur.com/MHRKD.png

https://i.stack.imgur.com/VGhxI.png

https://i.stack.imgur.com/dDgMw.png