# Linear Regression Implementation
TOC:
* [Load and analyze data](#LoadData)
* [Features scaling](#FeatureScaling)
* [Features Engineering](#FeaturesEngineering)
* [Benchmark](#Benchmark)
* [Linear regression](#LinearRegression)
* [Cost function](#Cost)
* [Gradient Descent](#GradientDescent)

### Load data <a class="anchor" id="LoadData"></a>

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn import model_selection, metrics

# Use plotly for interactive plots
import plotly.express as px
# Subplots
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [2]:
# Load data from TF dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.boston_housing.load_data(
    path='boston_housing.npz', test_split=0.2, seed=113
)

In [3]:
# X_train is an np.array with 404 rows and 13 columns
print(f"X Shape: {X_train.shape}, X Type:{type(X_train)})")
# y_train is an np.array with 404 rows
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)})")

X Shape: (404, 13), X Type:<class 'numpy.ndarray'>)
y Shape: (404,), y Type:<class 'numpy.ndarray'>)


In [4]:
# Read the data specification first to make sure you understand the data.

# Variables in order:

#  X_dataset
#  CRIM     per capita crime rate by town
#  ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
#  INDUS    proportion of non-retail business acres per town
#  CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
#  NOX      nitric oxides concentration (parts per 10 million)
#  RM       average number of rooms per dwelling
#  AGE      proportion of owner-occupied units built prior to 1940
#  DIS      weighted distances to five Boston employment centres
#  RAD      index of accessibility to radial highways
#  TAX      full-value property-tax rate per $10,000
#  PTRATIO  pupil-teacher ratio by town
#  B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
#  LSTAT    % lower status of the population

#  y_dataset
#  MEDV     Median value of owner-occupied homes in $1000's

We see that most of the data is numerical. Only CHAS is categorical.

Create a pd.DataFrame from the np.array for readability

In [5]:
X_train_df = pd.DataFrame(X_train, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])
y_train_df = pd.DataFrame(y_train, columns=['MEDV'])
y_test_df = pd.DataFrame(y_test, columns=['MEDV'])

### Take a look at the dataset

In [6]:
X_train_df

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.23247 | 0.0 | 8.14 | 0.0 | 0.5380 | 6.142 | 91.7 | 3.9769 | 4.0 | 307.0 | 21.0 | 396.90 | 18.72 |
| 1 | 0.02177 | 82.5 | 2.03 | 0.0 | 0.4150 | 7.610 | 15.7 | 6.2700 | 2.0 | 348.0 | 14.7 | 395.38 | 3.11 |
| 2 | 4.89822 | 0.0 | 18.10 | 0.0 | 0.6310 | 4.970 | 100.0 | 1.3325 | 24.0 | 666.0 | 20.2 | 375.52 | 3.26 |
| 3 | 0.03961 | 0.0 | 5.19 | 0.0 | 0.5150 | 6.037 | 34.5 | 5.9853 | 5.0 | 224.0 | 20.2 | 396.90 | 8.01 |
| 4 | 3.69311 | 0.0 | 18.10 | 0.0 | 0.7130 | 6.376 | 88.4 | 2.5671 | 24.0 | 666.0 | 20.2 | 391.43 | 14.65 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 399 | 0.21977 | 0.0 | 6.91 | 0.0 | 0.4480 | 5.602 | 62.0 | 6.0877 | 3.0 | 233.0 | 17.9 | 396.90 | 16.20 |
| 400 | 0.16211 | 20.0 | 6.96 | 0.0 | 0.4640 | 6.240 | 16.3 | 4.4290 | 3.0 | 223.0 | 18.6 | 396.90 | 6.59 |
| 401 | 0.03466 | 35.0 | 6.06 | 0.0 | 0.4379 | 6.031 | 23.3 | 6.6407 | 1.0 | 304.0 | 16.9 | 362.25 | 7.83 |
| 402 | 2.14918 | 0.0 | 19.58 | 0.0 | 0.8710 | 5.709 | 98.5 | 1.6232 | 5.0 | 403.0 | 14.7 | 261.95 | 15.79 |


In [7]:
y_train_df

| | MEDV |
|---|---|
| 0 | 15.2 |
| 1 | 42.3 |
| 2 | 50.0 |
| 3 | 21.1 |
| 4 | 17.7 |
| ... | ... |
| 399 | 19.4 |
| 400 | 25.2 |
| 401 | 19.4 |
| 402 | 19.4 |


### Check the datatype and null values

In [8]:
X_train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 404 entries, 0 to 403
Data columns (total 13 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   CRIM     404 non-null    float64
 1   ZN       404 non-null    float64
 2   INDUS    404 non-null    float64
 3   CHAS     404 non-null    float64
 4   NOX      404 non-null    float64
 5   RM       404 non-null    float64
 6   AGE      404 non-null    float64
 7   DIS      404 non-null    float64
 8   RAD      404 non-null    float64
 9   TAX      404 non-null    float64
 10  PTRATIO  404 non-null    float64
 11  B        404 non-null    float64
 12  LSTAT    404 non-null    float64
dtypes: float64(13)
memory usage: 41.2 KB


All columns are stored as numeric float64 values (CHAS is already encoded as 0/1) and there are no null values.

So there is no need to handle nulls or encode categories.

### Plot and see relationship

In [9]:
def plot_relation(X: pd.DataFrame, y: pd.DataFrame, columns: int):
    '''
    Plot the relation between each X feature and the y target
    
    Args:
        X (pd.DataFrame (m,n))  : Data, m examples, n features
        y (pd.DataFrame (m,1))  : target values, m values
        columns (int)           : number of desired subplot columns
        
    Output
        Interactive graph
    
    '''
    
    n_features = X.shape[1]
    rows = n_features // columns     # Number of full rows
    frac = n_features % columns      # Leftover features that need one extra row
    row = 0
    col = 1

    if frac > 0:
        rows += 1
            
    fig = make_subplots(rows=rows, cols=columns)

    for i in range(n_features):
        
        # Fill the grid column by column: move to the next column when the current one is full
        if row >= rows:
            row = 1
            col += 1
        else:
            row += 1
        
        fig.add_trace(go.Scatter(
            x=X.iloc[:,i],
            y=y[y.columns[0]],
            mode='markers',
            name=X.columns[i],
            customdata=X.index.values,                                  # Row index of each point, shown on hover for easier analysis
            hovertemplate="index:%{customdata} (X: %{x}, y: %{y})"
        ), row=row, col=col)

    fig.update_layout(height=400 * rows, width=600 * columns, title_text='Relationship between All features / ' + y.columns.values[0])
    fig.show()

In [10]:
plot_relation(X_train_df, y_train_df, 2)

### Check min max for feature scaling

In [13]:
X_train_df.agg(['min', 'max'])

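| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 188.0 | 12.6 | 0.32 | 1.73 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.725 | 100.0 | 10.7103 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 |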


The values range from 0 (ZN, CHAS) up to 711 (TAX), so the features are on very different scales.

So scaling is needed.

In this case we will use z-score normalization.

## Features scaling <a class="anchor" id="FeatureScaling"></a>

### Z-score normalization

$ x^{(i)}_j = \frac{x^{(i)}_j - \mu_j }{ \sigma_j}$

$ \mu_j = \frac{1}{m} \sum_{i=0}^{m-1} x^{(i)}_j $

$ \sigma^2_j = \frac{1}{m} \sum_{i=0}^{m-1} (x^{(i)}_j - \mu_j)^2 $

In [14]:
# Z-score normalization loop
def zscore_normalize_features(X):
    
    m = X.shape[0]
    n = X.shape[1]
    
    mu = np.zeros(n)
    sigma = np.zeros(n)
    X_norm = np.zeros((m,n))
    

    
    for j in range(n):
        
        x_j_sum = 0
        sigma_j_sum = 0
        
        for i in range(m):
            x_j_sum += X[i][j]
        mu[j] = x_j_sum / m
        
        for i in range(m):
            sigma_j_sum += (X[i][j] - mu[j]) ** 2
        sigma[j] = (sigma_j_sum / m) ** (1/2)

        for i in range(m):
            X_norm[i][j] = (X[i][j] - mu[j]) / sigma[j]
            
    return (X_norm, mu, sigma)

In [15]:
# Z-score normalization np
def zscore_normalize_features(X: np.ndarray):
    '''
    Feature scaling: Z-score normalize
    Args:
        X       (np.ndarray (m,n))  : Data, m examples, n features
    Returns
        X_norm  (np.ndarray (m,n))  : Z-score normalized data
        mu      (np.ndarray (n,))   : mean of each feature
        sigma   (np.ndarray (n,))   : standard deviation of each feature
    '''
    
    # Mean of each feature (column)
    mu = np.mean(X, axis=0)
    # Standard deviation of each feature (column)
    sigma = np.std(X, axis=0)
    # Z-score normalize
    X_norm = (X - mu) / sigma   
            
    return X_norm, mu, sigma

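Compute the scaled features and check whether the min and max have changed

In [17]:
X_train_zscore, X_train_mu, X_train_sigma = zscore_normalize_features(X_train)
# Note: X_train_zscore[0] is the first example (row), so this is the min and max of that row only
armin, armax = np.min(X_train_zscore[0]), np.max(X_train_zscore[0])
print(f'min: {armin}, max: {armax}')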

min: -0.6262490526587586, max: 1.1485004386235735


In [18]:
# Z-score normalize the test set using the training-set mean and standard deviation
X_test_zscore = (X_test - X_train_mu) / X_train_sigma
X_test_zscore 

array([[ 1.55369355, -0.48361547,  1.0283258 , ...,  0.78447637,
        -3.48459553,  2.25092074],
       [-0.39242675, -0.48361547, -0.16087773, ..., -0.30759583,
         0.42733126,  0.47880119],
       [-0.39982927, -0.48361547, -0.86940196, ...,  0.78447637,
         0.44807713, -0.41415936],
       ...,
       [-0.20709507, -0.48361547,  1.24588095, ..., -1.71818909,
         0.37051949, -1.49344089],
       [-0.36698601, -0.48361547, -0.72093526, ..., -0.48960787,
         0.39275481, -0.41829982],
       [-0.0889679 , -0.48361547,  1.24588095, ..., -1.71818909,
        -1.21946544, -0.40449827]])

In [19]:
X_train_zscore_df  = pd.DataFrame(X_train_zscore, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])
X_test_zscore_df  = pd.DataFrame(X_test_zscore, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])

In [20]:
X_train_zscore_df.agg(['min','max'])

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| min | -0.405101 | -0.483615 | -1.564696 | -0.256833 | -1.471269 | -3.81725 | -2.369042 | -1.287503 | -0.971569 | -1.311311 | -2.673752 | -3.771101 | -1.519664 |
| max | 9.234847 | 3.72899 | 2.445374 | 3.893584 | 2.677335 | 3.467186 | 1.110488 | 3.437406 | 1.675886 | 1.836097 | 1.603531 | 0.448077 | 3.482019 |


You will see that the scaled features now range from about -3.8 to 9.2, a much narrower spread than the original 0 to 711.

Notice that our categorical feature CHAS, originally binary (0 or 1), now takes the values -0.26 and 3.89. Is that good or bad?
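
One way to explore that question, sketched here as an assumption rather than the approach this notebook takes, is to z-score only the continuous columns and leave a binary 0/1 column such as CHAS untouched. The helper name `zscore_except` and the toy array are hypothetical.

```python
import numpy as np

def zscore_except(X, skip_cols):
    """Z-score every column except those listed in skip_cols (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_norm = X.copy()
    scale_cols = [j for j in range(X.shape[1]) if j not in skip_cols]
    mu = X[:, scale_cols].mean(axis=0)
    sigma = X[:, scale_cols].std(axis=0)
    X_norm[:, scale_cols] = (X[:, scale_cols] - mu) / sigma
    return X_norm

# Toy data: column 0 is continuous, column 1 is a binary flag like CHAS
X_toy = np.array([[10.0, 0.0], [20.0, 1.0], [30.0, 0.0], [40.0, 0.0]])
X_scaled = zscore_except(X_toy, skip_cols=[1])
print(X_scaled)
```

The binary column keeps its 0/1 values, which are already on a scale similar to the normalized features. Whether this actually improves the fit would need to be checked against the metrics.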

If you're a visual thinker, plot the scaled features to compare their ranges.

### Feature engineering

We do feature engineering after scaling because powers of large raw values explode. For example, a raw value of $ X = 400 $
in a degree-6 polynomial becomes $400^6$, whereas after scaling the same value might be around $1.2$, and $1.2^6$ is much smaller than $400^6$.
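
To make the difference concrete, here is the arithmetic for the two values mentioned above. The value 1.2 is the illustrative scaled value from the text, not one computed from the dataset.

```python
raw_value = 400.0      # unscaled feature value from the text
scaled_value = 1.2     # the same value after scaling, as assumed in the text
degree = 6

raw_power = raw_value ** degree
scaled_power = scaled_value ** degree

print(f"{raw_power:.3e}")     # 4.096e+15
print(f"{scaled_power:.3f}")  # 2.986
```

A feature in the range of $10^{15}$ next to features near 1 would force a tiny learning rate, which is why the powers are taken after scaling.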

We saw in the previous plots (original dataset) that some of the charts are not straight lines.
The model will perform better if we engineer features by applying a transform that matches the shape of each chart.

For example, apply $ -X^{\frac{1}{2}} $ to CRIM and $ X^2 $ to DIS so that, after feature scaling, the chart becomes closer to the straight line $ y = w * x $.

This method requires you to inspect the data manually, and it may underfit if the chosen transforms do not match the data.

Another method is to raise every feature (x) to each power up to the polynomial degree,
for example $ [1, 2, 3] $ with degree 3 => $ [1^1, 1^2, 1^3, 2^1, 2^2, 2^3, 3^1, 3^2, 3^3] $ => $ [1, 1, 1, 2, 4, 8, 3, 9, 27] $,
and let gradient descent adjust each weight by increasing or decreasing $ w $.
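
The $ [1, 2, 3] $ example can be reproduced with a few lines of NumPy. This is a minimal sketch using broadcasting; the name `expand_powers` is hypothetical and it is only meant to confirm the expected column order.

```python
import numpy as np

def expand_powers(X, degree):
    """Return [x_j^1, ..., x_j^degree] for every feature j, grouped by feature."""
    X = np.asarray(X, dtype=float)
    powers = np.arange(1, degree + 1)          # [1, 2, ..., degree]
    # Shape (m, n, degree): each feature raised to each power
    expanded = X[:, :, np.newaxis] ** powers
    # Flatten the last two axes so the powers of one feature sit next to each other
    return expanded.reshape(X.shape[0], -1)

row = np.array([[1, 2, 3]])
expanded_row = expand_powers(row, degree=3)
print(expanded_row.tolist())  # [[1.0, 1.0, 1.0, 2.0, 4.0, 8.0, 3.0, 9.0, 27.0]]
```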

In [21]:
# X_train_feature = np.c_[-X_train[:,0] ** (1/2), X_train[:,1] ** 2, -X_train[:,2] ** (1/2), X_train[:,3], -X_train[:,4] ** (1/2), X_train[:,5], -X_train[:,6], X_train[:,7] ** 2, X_train[:,8], X_train[:,9], X_train[:,10], X_train[:,11] ** 2, -X_train[:,12] ** (1/2)]
# X_test_feature = np.c_[-X_test[:,0] ** (1/2), X_test[:,1] ** 2, -X_test[:,2] ** (1/2), X_test[:,3], -X_test[:,4] ** (1/2), X_test[:,5], -X_test[:,6], X_test[:,7] ** 2, X_test[:,8], X_test[:,9], X_test[:,10], X_test[:,11] ** 2, -X_test[:,12] ** (1/2)]

# X_train_feature_df = pd.DataFrame(X_train_feature)
# X_test_feature_df = pd.DataFrame(X_test_feature)

In [114]:
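def feature_engineering(X: np.ndarray, degree: int):
    '''
    Polynomial feature expansion: raise every feature to the powers 1..degree
    Args:
        X       (np.ndarray (m,n))          : Data, m examples, n features
        degree  (int)                       : highest power to generate
    Returns
        X_out   (np.ndarray (m,n*degree))   : for each feature j, columns x_j^1 .. x_j^degree
    '''
    m = X.shape[0]
    n = X.shape[1]

    X_out = np.zeros([m,n * degree], dtype=float)
        
    for j in range(n):              # each feature (13 here)
        for i in range(degree):     # each power 1..degree
            k = (j * degree) + i    # output column index
            X_out[:,k] = X[:,j] ** (i+1)
            
    return X_out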

In [209]:
X_train_z_feature = feature_engineering(X_train_zscore, 3)
X_test_z_feature = feature_engineering(X_test_zscore, 3)

X_train_z_feature_df = pd.DataFrame(X_train_z_feature)
X_test_z_feature_df = pd.DataFrame(X_test_z_feature)

In [210]:
# plot_relation(X_train_z_feature_df, y_train_df, 2)

You will see that every graph has the same proportions as in the original dataset.

Before going to the prediction, we need a reference value to benchmark our model against. This is called a baseline model.

# Baseline Model

In [211]:
# y_train_mean = y_train.mean()
# y_pred = np.full(len(y_train), y_train_mean)     # baseline: always predict the mean target
# y_pred
# mae = np.abs(y_train - y_pred).mean()
# mse = ((y_train - y_pred) ** 2).mean()

# print(f' MAE: {mae}, MSE: {mse} ')

# Linear regression

### Formula

$ f_{w,b}(x^{(i)}) = wx^{(i)} + b $

In [212]:
# Vectorized implementation
def compute_linear_regression_v(X, w, b):
    f_wb = X.dot(w.T) + b
    return f_wb

In [213]:
w_init = np.ones(X_train_z_feature.shape[1])
b_init = 1
f_wb = compute_linear_regression_v(X=X_train_z_feature, w=w_init , b=b_init)

# Cost function

$ J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2 $
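
As a quick sanity check of the formula, the cost can be worked out by hand on a tiny example. The arrays below are hypothetical toy values chosen so that every prediction is off by exactly 1.

```python
import numpy as np

# Toy data (hypothetical): 3 examples, 2 features
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_toy = np.array([5.0, 11.0, 17.0])
w_toy = np.array([1.0, 2.0])
b_toy = 1.0

f_wb = X_toy @ w_toy + b_toy              # predictions: [6, 12, 18]
squared_errors = (f_wb - y_toy) ** 2      # [1, 1, 1]
cost = squared_errors.sum() / (2 * len(y_toy))
print(cost)  # 0.5
```

With $ m = 3 $ and a total squared error of 3, the cost is $ 3 / (2 \cdot 3) = 0.5 $.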

In [214]:
def compute_cost_v(X, y, w, b):
    f_wb = compute_linear_regression_v(X, w, b)
    cost = ((f_wb - y) ** 2).mean() / 2
    return cost

In [215]:
compute_cost_v(X_train_z_feature, y_train, w=w_init, b=b_init)

1995.3750765121786

# Gradient Descent

Repeat until convergence $ \{ $
    
$ w_j := w_j - \alpha \frac{\partial}{\partial w_j}J(w, b) $

$ b := b - \alpha \frac{\partial}{\partial b}J(w, b) $

$ \} $ (simultaneous update of all $ w_j $ and $ b $)


$ \frac{\partial}{\partial w_j}J(w, b) = \frac{1}{m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)}_j $

$ \frac{\partial}{\partial b}J(w, b) = \frac{1}{m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)}) $
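
A common way to confirm that an analytic gradient matches the cost function is a finite-difference check. The sketch below uses hypothetical helper names and random toy data, and compares the analytic gradient of the squared-error cost with a central-difference estimate.

```python
import numpy as np

def cost(X, y, w, b):
    """Squared-error cost J(w, b) = 1/(2m) * sum((Xw + b - y)^2)."""
    return ((X @ w + b - y) ** 2).mean() / 2

def gradient(X, y, w, b):
    """Analytic gradient of the cost with respect to w and b."""
    error = X @ w + b - y
    return X.T @ error / len(y), error.mean()

def numeric_gradient_w(X, y, w, b, eps=1e-6):
    """Central-difference estimate of dJ/dw, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 3))
y_toy = rng.normal(size=20)
w_toy = np.ones(3)
b_toy = 1.0

dj_dw, dj_db = gradient(X_toy, y_toy, w_toy, b_toy)
dj_dw_numeric = numeric_gradient_w(X_toy, y_toy, w_toy, b_toy)
dj_db_numeric = (cost(X_toy, y_toy, w_toy, b_toy + 1e-6)
                 - cost(X_toy, y_toy, w_toy, b_toy - 1e-6)) / 2e-6
max_diff = np.max(np.abs(dj_dw - dj_dw_numeric))
print(max_diff)
```

If the two gradients disagree by more than a small tolerance, the analytic formula or its implementation is likely wrong.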

In [216]:
def gradient_function_v(X, y, w, b):
    '''
    Compute the gradient of the cost with respect to w and b (vectorized)
    '''
    
    m = X.shape[0]

    f_wb = compute_linear_regression_v(X, w, b)
    error = f_wb - y                # (m,)
    dj_dw = error.T.dot(X) / m      # (n,)
    dj_db = np.sum(error) / m       # scalar
        
    return dj_dw, dj_db

In [217]:
dj_dw, dj_db = gradient_function_v(X_train_z_feature, y_train, w=w_init, b=b_init)

In [218]:
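def gradient_descent(X, y, w, b, alpha, num_iters, cost_function, gradient_function, tol=None):
    '''
    Batch gradient descent
    Args:
        alpha (float)       : learning rate
        num_iters (int)     : maximum number of iterations
        tol (float or None) : stop early when the cost decreases by less than tol
                              between two iterations; None disables early stopping
    Returns
        w, b, j_hist, p_hist : final parameters, plus the cost and parameter
                               history recorded every 100 iterations
    '''
    
    j_hist = []
    p_hist = []
    j_prev = None

    for i in range(num_iters):
        
        dj_dw, dj_db = gradient_function(X, y, w, b)
        
        # Update w and b simultaneously
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        j = cost_function(X, y, w, b)
        
        # Record history every 100 iterations
        if i % 100 == 0:
            j_hist.append(j)
            p_hist.append([w,b])
        
        # Stop when converged (j_prev - j < tol)
        if tol is not None and j_prev is not None and (j_prev - j) < tol:
            if i % 100 != 0:        # record the final point if not already recorded
                j_hist.append(j)
                p_hist.append([w,b])
            break
        j_prev = j
        
    return w, b, j_hist, p_hist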

In [238]:
w_out, b_out, j_hist, p_hist = gradient_descent(X_train_z_feature, y_train, w=w_init, b=b_init,alpha=0.0003, num_iters=10000, cost_function=compute_cost_v, gradient_function=gradient_function_v)

Let's check that the cost decreases gradually

In [239]:
j_hist

[488.41897956266035,
 144.8695654089854,
 104.91291827359063,
 82.20794618232065,
 67.7628733223071,
 58.00296875256171,
 51.08684560249107,
 45.9745554122366,
 42.05053224922355,
 38.93792285134418,
 36.39910225359134,
 34.27990281395995,
 32.47740297644354,
 30.920883631931428,
 29.560333944865963,
 28.359375815578936,
 27.290822751650623,
 26.333837176827053,
 25.47207440210651,
 24.69244613028079,
 23.984279699026395,
 23.338734513642045,
 22.748388564980765,
 22.20693942276939,
 21.70898365138164,
 21.249850911345963,
 20.825476875473427,
 20.432304182190865,
 20.067203993356884,
 19.72741295063597,
 19.41048182761122,
 19.114233203465215,
 18.836726197722925,
 18.576226807483817,
 18.331182746403893,
 18.100201943239643,
 17.88203404706852,
 17.675554426712793,
 17.47975025736829,
 17.293708367648275,
 17.116604581970627,
 16.947694341246706,
 16.786304422609746,
 16.631825608940964,
 16.483706183033505,
 16.341446140726074,
 16.20459203324656,
 16.07273236209009,
 15.94549346059

In [240]:
print(f'result cost:{min(j_hist)} with parameter w:{w_out} b:{b_out}')

result cost:12.071988698456405 with parameter w:[-2.04392384 -0.09731873  0.02166244 -1.44755362  1.36283838 -0.44085799
 -0.55007519  2.01046659 -0.73570413 -0.41641102  1.07541507 -0.14214515
 -1.32615154  0.91297907 -0.40369603  1.26050446  1.42958626  0.13574167
  0.21686815  2.07680668  0.71280203 -0.0073348   1.16463715 -0.62372793
 -1.31170577  1.81075579  0.47950568 -1.41290689  2.33590436 -0.77044754
 -0.15242974  2.42313329  0.5816863   2.48873716  1.82824261  0.34252225
 -2.93537286  2.19182882 -0.7654755 ] b:6.226550267397597


Compute the cost on the training set with the w, b parameters returned by gradient descent

In [241]:
compute_cost_v(X_train_z_feature, y_train, w=w_out, b=b_out)

12.021605650644354

Now let's check with unseen data (test dataset)

In [242]:
compute_cost_v(X_test_z_feature, y_test, w=w_out, b=b_out)

21.376966853064253

#### Let's plot the cost function

In [243]:
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(
        x=list(range(len(j_hist))),     # index into j_hist; points are 100 iterations apart
        y=j_hist,
        mode='lines+markers'
        ))

# Set the figure size and title
fig.update_layout(height=400, width=600, title_text="Cost Function")
fig.show()

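The cost drops sharply at first, from about 488 to 82 over the first 300 iterations (each point in `j_hist` is 100 iterations apart), and decreases only slowly after that.

We could set a threshold ($ \epsilon $) and stop iterating once the decrease in cost between iterations falls below it.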

In [244]:
print(f'result cost:{min(j_hist)} with parameter w:{w_out} b:{b_out}')

result cost:12.071988698456405 with parameter w:[-2.04392384 -0.09731873  0.02166244 -1.44755362  1.36283838 -0.44085799
 -0.55007519  2.01046659 -0.73570413 -0.41641102  1.07541507 -0.14214515
 -1.32615154  0.91297907 -0.40369603  1.26050446  1.42958626  0.13574167
  0.21686815  2.07680668  0.71280203 -0.0073348   1.16463715 -0.62372793
 -1.31170577  1.81075579  0.47950568 -1.41290689  2.33590436 -0.77044754
 -0.15242974  2.42313329  0.5816863   2.48873716  1.82824261  0.34252225
 -2.93537286  2.19182882 -0.7654755 ] b:6.226550267397597


In [245]:
y_train_pred = np.dot(X_train_z_feature, w_out) + b_out
y_test_pred = np.dot(X_test_z_feature, w_out) + b_out
y_train_pred_df = pd.DataFrame(y_train_pred)
y_test_pred_df = pd.DataFrame(y_test_pred)

Let's check the predictions made with the learned parameters w, b

and plot them to see how well the model fits the targets

In [246]:
def plot_result(X: pd.DataFrame, y: pd.DataFrame, y_pred: pd.DataFrame, columns: int):
    '''
    Plot actual targets and model predictions against each X feature
    
    Args:
        X (pd.DataFrame (m,n))      : Data, m examples, n features
        y (pd.DataFrame (m,1))      : target values, m values
        y_pred (pd.DataFrame (m,1)) : predicted values, m values
        columns (int)               : number of desired subplot columns
        
    Output
        Interactive graph
    
    '''
    
    n_features = X.shape[1]
    rows = n_features // columns     # Number of full rows
    frac = n_features % columns      # Leftover features that need one extra row
    row = 0
    col = 1

    if frac > 0:
        rows += 1
            
    fig = make_subplots(rows=rows, cols=columns)

    for i in range(n_features):
        
        # Fill the grid column by column: move to the next column when the current one is full
        if row >= rows:
            row = 1
            col += 1
        else:
            row += 1
        
        # Actual targets
        fig.add_trace(go.Scatter(
            x=X.iloc[:,i],
            y=y[y.columns[0]],
            mode='markers',
            name=f'{X.columns[i]} (actual)',
            customdata=X.index.values,                                  # Row index of each point, shown on hover for easier analysis
            hovertemplate="index:%{customdata} (X: %{x}, y: %{y})"
        ), row=row, col=col)
        
        # Model predictions
        fig.add_trace(go.Scatter(
            x=X.iloc[:,i],
            y=y_pred[y_pred.columns[0]],
            mode='markers',
            name=f'{X.columns[i]} (predicted)',
            customdata=X.index.values,                                  # Row index of each point, shown on hover for easier analysis
            hovertemplate="index:%{customdata} (X: %{x}, y: %{y})"
        ), row=row, col=col)

    fig.update_layout(height=400 * rows, width=600 * columns, title_text='Relationship between All features / ' + y.columns.values[0])
    fig.show()

In [247]:
plot_result(X_train_z_feature_df, y_train_df, y_train_pred_df, 2)

Plot again with the test set

In [248]:
plot_result(X_test_z_feature_df, y_test_df, y_test_pred_df, 2)

In [249]:
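# Train model evaluation
print('R^2:',metrics.r2_score(y_train, y_train_pred))
# Note: the predictor count used here is the 13 original features; the model itself is fit on the expanded features in X_train_z_feature
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_train, y_train_pred))*(len(y_train)-1)/(len(y_train)-X_test_zscore.shape[1]-1))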
print('MAE:',metrics.mean_absolute_error(y_train, y_train_pred))
print('MSE:',metrics.mean_squared_error(y_train, y_train_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_train, y_train_pred)))

R^2: 0.7158760192693987
Adjusted R^2: 0.706405219911712
MAE: 3.6860953427284087
MSE: 24.043211301288707
RMSE: 4.903387737196469


In [250]:
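# Test model evaluation
print('R^2:',metrics.r2_score(y_test, y_test_pred))
# Note: the predictor count used here is the 13 original features; the model itself is fit on the expanded features in X_test_z_feature
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_test, y_test_pred))*(len(y_test)-1)/(len(y_test)-X_test_zscore.shape[1]-1))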
print('MAE:',metrics.mean_absolute_error(y_test, y_test_pred))
print('MSE:',metrics.mean_squared_error(y_test, y_test_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)))

R^2: 0.4864012841025819
Adjusted R^2: 0.4105287465268269
MAE: 4.5811915503569365
MSE: 42.75393370612851
RMSE: 6.538649226417372


The model performs noticeably worse on the test set than on the training set:
$ R^2 $ drops from 0.716 to 0.486, a gap of about 0.23, and RMSE rises from 4.90 to 6.54.
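
The adjusted $ R^2 $ printed above depends on how many predictors are counted. As a rough sketch, the helper below (a hypothetical name) applies the same formula to the test-set $ R^2 $ with 13 predictors, as in the evaluation cells, and with 39, the number of degree-3 expanded features. The test-set size of 102 is an assumption based on 506 rows in total minus the 404 training rows.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

test_r2 = 0.4864012841025819   # test-set R^2 printed above
n_test = 102                   # assumed test-set size (506 rows minus 404 training rows)

adj_13 = adjusted_r2(test_r2, n_test, 13)   # p = 13, as in the evaluation cells
adj_39 = adjusted_r2(test_r2, n_test, 39)   # p = 39 expanded features
print(round(adj_13, 4), round(adj_39, 4))   # 0.4105 0.1633
```

Counting all 39 expanded features penalizes the model much more heavily, which suggests the degree-3 expansion may be overfitting on a test set this small.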