# Linear Regression Implementation

This is the intermediate level.

Building on the basic level, I've implemented multiple variables, feature scaling, and feature engineering using plain math (NumPy) rather than a machine learning framework.

TOC:
* [Setup](#setup)
* ~~[Load and analyze data](#load-analyze)~~ (see the basic level)
* [Feature Scaling](#feature-scaling)
* [Feature Engineering](#feature-engineering)
* [Baseline](#baseline)
* [Linear regression](#linear-regression)
* [Cost function with Regularization](#cost)
* [Gradient Descent with Regularization](#gradient-descent)

## Set up <a id='setup'></a>

In [204]:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn import model_selection, metrics

# Use plotly for interactive plots
import plotly.express as px
# Subplots
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [205]:
# Load data from TF dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.boston_housing.load_data(
    path='boston_housing.npz', test_split=0.2, seed=113
)

Create a pd.DataFrame from each np.array for readability

In [240]:
X_train_df = pd.DataFrame(X_train, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])
y_train_df = pd.DataFrame(y_train, columns=['MEDV'])
y_test_df = pd.DataFrame(y_test, columns=['MEDV'])

In [242]:
# Plot and see relationship

def plot_relation(X: pd.DataFrame, y: pd.DataFrame, columns):
    '''
    Plot the relationship between each feature in X and the target y.
    
    Args:
        X (pd.DataFrame (m,n))  : Data, m examples, n features
        y (pd.DataFrame (m,1))  : Target values, m values
        columns (int)           : Number of subplot columns
        
    Output:
        Interactive graph
    
    '''
    
    m = X.shape[1]          # Number of features (one subplot each)
    rows = m // columns     # Full rows of subplots
    frac = m % columns      # Leftover subplots that need one extra row
    row = 0
    col = 1

    if frac > 0:
        rows += 1
            
    fig = make_subplots(rows=rows, cols=columns)

    for i in range(m):
            
        if row >= rows:
            row = 1
            col += 1
        else:
            row += 1
        
        fig.add_trace(go.Scatter(
            x=X.iloc[:,i],
            y=y[y.columns[0]],
            mode='markers',
            name=X.columns[i],
            customdata=X.index.values,                                  # Row index of each point, shown on hover for easier analysis
            hovertemplate="index:%{customdata} (X: %{x}, y: %{y})"
        ), row=row, col=col)

    fig.update_layout(height=400 * rows, width=600 * columns, title_text='Relationship between All features / ' + y.columns.values[0])
    fig.show()

In [243]:
plot_relation(X_train_df, y_train_df, 2)

## Feature scaling <a id='feature-scaling'></a>

Feature scaling rescales features whose values are much larger or smaller than the others so that all features fall in a similar range.
This prevents large values from dominating the parameters and helps the model converge faster.

In this dataset, some features range from 0 to 1 while others go up to about 700, so scaling should help.

### Z-score normalization

$ x^{(i)}_j = \frac{x^{(i)}_j - \mu_j }{ \sigma_j}$

$ \mu_j = \frac{1}{m} \sum_{i=0}^{m-1} x^{(i)}_j $

$ \sigma^2_j = \frac{1}{m} \sum_{i=0}^{m-1} (x^{(i)}_j - \mu_j)^2 $

In [207]:
# Z-score normalization loop
def zscore_normalize_features(X):
    
    m = X.shape[0]
    n = X.shape[1]
    
    mu = np.zeros(n)
    sigma = np.zeros(n)
    X_norm = np.zeros((m,n))
    

    
    for j in range(n):
        
        x_j_sum = 0
        sigma_j_sum = 0
        
        for i in range(m):
            x_j_sum += X[i][j]
        mu[j] = x_j_sum / m
        
        for i in range(m):
            sigma_j_sum += (X[i][j] - mu[j]) ** 2
        sigma[j] = (sigma_j_sum / m) ** (1/2)

        for i in range(m):
            X_norm[i][j] = (X[i][j] - mu[j]) / sigma[j]
            
    return (X_norm, mu, sigma)

In [208]:
# Z-score normalization np
def zscore_normalize_features(X: np.array):
    '''
    Feature scaling: z-score normalization
    Args:
        X       (np.array (m,n))    : Data, m examples, n features
    Returns:
        X_norm  (np.array (m,n))    : Z-score normalized data
        mu      (np.array (n,))     : Mean of each feature
        sigma   (np.array (n,))     : Standard deviation of each feature
    '''
    
    # Mean
    mu = np.mean(X, axis=0)
    # Standard deviation
    sigma = np.std(X, axis=0)
    # Z-score normalize
    X_norm = (X - mu) / sigma   
            
    return X_norm, mu, sigma

Compute the scaled data and check whether the min and max have changed (here for the first training example, `X_train_zscore[0]`)

In [209]:
X_train_zscore, X_train_mu, X_train_sigma = zscore_normalize_features(X_train)
armin, armax = np.min(X_train_zscore[0]), np.max(X_train_zscore[0])
print(f'min: {armin}, max: {armax}')

min: -0.6262490526587586, max: 1.1485004386235735


Why do we have to return X_train_mu and X_train_sigma?

Because the test set must be scaled with these same values: we have to treat the test set as unseen data, so its own statistics must not be used.

In [210]:
X_test_zscore = (X_test - X_train_mu) / X_train_sigma
X_test_zscore 

array([[ 1.55369355, -0.48361547,  1.0283258 , ...,  0.78447637,
        -3.48459553,  2.25092074],
       [-0.39242675, -0.48361547, -0.16087773, ..., -0.30759583,
         0.42733126,  0.47880119],
       [-0.39982927, -0.48361547, -0.86940196, ...,  0.78447637,
         0.44807713, -0.41415936],
       ...,
       [-0.20709507, -0.48361547,  1.24588095, ..., -1.71818909,
         0.37051949, -1.49344089],
       [-0.36698601, -0.48361547, -0.72093526, ..., -0.48960787,
         0.39275481, -0.41829982],
       [-0.0889679 , -0.48361547,  1.24588095, ..., -1.71818909,
        -1.21946544, -0.40449827]])

In [211]:
X_train_zscore_df  = pd.DataFrame(X_train_zscore, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])
X_test_zscore_df  = pd.DataFrame(X_test_zscore, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])

In [212]:
X_train_zscore_df.agg(['min','max'])

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
min,-0.405101,-0.483615,-1.564696,-0.256833,-1.471269,-3.81725,-2.369042,-1.287503,-0.971569,-1.311311,-2.673752,-3.771101,-1.519664
max,9.234847,3.72899,2.445374,3.893584,2.677335,3.467186,1.110488,3.437406,1.675886,1.836097,1.603531,0.448077,3.482019


You can see that the scaled X now ranges from about -4 to 10, much better than 0 to 700.

Notice that the categorical feature CHAS, originally binary 0/1, now takes the values -0.26 and 3.89. Is that good or bad?
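One way to explore that question is to leave binary columns untouched and scale only the continuous ones. The sketch below is an assumption, not what this notebook does: `zscore_normalize_selected` and `skip_cols` are hypothetical names, and the toy array stands in for the real data (in the dataset above, CHAS is column index 3).

```python
import numpy as np

def zscore_normalize_selected(X, skip_cols=()):
    """Z-score every column of X except the ones listed in skip_cols (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[1]

    # True for columns that should be scaled
    scale_mask = np.ones(n, dtype=bool)
    for c in skip_cols:
        scale_mask[c] = False

    # Skipped columns get mu=0 and sigma=1, i.e. an identity transform
    mu = np.zeros(n)
    sigma = np.ones(n)
    mu[scale_mask] = X[:, scale_mask].mean(axis=0)
    sigma[scale_mask] = X[:, scale_mask].std(axis=0)

    return (X - mu) / sigma, mu, sigma

# Toy data: column 0 is continuous, column 1 is a 0/1 flag like CHAS
X_demo = np.array([[10.0, 0.0], [20.0, 1.0], [30.0, 0.0], [40.0, 0.0]])
X_demo_norm, mu_demo, sigma_demo = zscore_normalize_selected(X_demo, skip_cols=[1])
print(X_demo_norm)
```

Because `mu` and `sigma` are still returned for every column, the test set can be transformed with the same `(X_test - mu) / sigma` expression used above. Whether keeping the flag at 0/1 actually helps this model would need to be checked by comparing the evaluation metrics.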

## Feature engineering <a id='feature-engineering'></a>

We do feature engineering after scaling because a large raw value, e.g. $ X = 400 $,
raised to the 6th power becomes $400^6$; after scaling, 400 might become about $1.2$, and $1.2^6$ is far smaller than $400^6$.

We saw in the previous plot (original dataset) that some of the charts do not follow a straight line.
The model will perform better if we engineer features by applying an equation that matches the shape of each chart.

For example, using $ -X^{\frac{1}{2}} $ for CRIM and $ X^2 $ for DIS together with feature scaling makes the charts closer to a straight line, like the equation $ y = w * x $

This method requires you to inspect the data manually and may cause underfitting.

Another method is to raise every feature (x) to each power up to the polynomial degree,
for example $ [1, 2, 3] $ with degree 3 => $ [1^1, 1^2, 1^3, 2^1, 2^2, 2^3, 3^1, 3^2, 3^3] $ => $ [1, 1, 1, 2, 4, 8, 3, 9, 27] $,
and let gradient descent adjust the weights by increasing or decreasing $ w $.
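The polynomial expansion described above can be sketched with NumPy broadcasting. This is a minimal illustration with a hypothetical name (`polynomial_expand`); it is meant to produce the same column order as the loop-based `feature_engineering` function in this notebook (all powers of feature 0, then all powers of feature 1, and so on).

```python
import numpy as np

def polynomial_expand(X, degree):
    """Return x_j^1 ... x_j^degree for every column j, grouped by feature."""
    X = np.asarray(X, dtype=np.float64)
    powers = np.arange(1, degree + 1)            # [1, 2, ..., degree]
    # (m, n, 1) ** (degree,) broadcasts to (m, n, degree);
    # flattening the last two axes yields columns ordered by feature, then power
    return (X[:, :, None] ** powers).reshape(X.shape[0], -1)

# The worked example from the text: one row [1, 2, 3] with degree 3
x_demo = np.array([[1, 2, 3]])
expanded = polynomial_expand(x_demo, degree=3)
print(expanded)
```

For the single row `[1, 2, 3]` this yields the nine values listed in the text, so it can be used to sanity-check the loop version on small inputs.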

In [213]:
# X_train_feature = np.c_[-X_train[:,0] ** (1/2), X_train[:,1] ** 2, -X_train[:,2] ** (1/2), X_train[:,3], -X_train[:,4] ** (1/2), X_train[:,5], -X_train[:,6], X_train[:,7] ** 2, X_train[:,8], X_train[:,9], X_train[:,10], X_train[:,11] ** 2, -X_train[:,12] ** (1/2)]
# X_test_feature = np.c_[-X_test[:,0] ** (1/2), X_test[:,1] ** 2, -X_test[:,2] ** (1/2), X_test[:,3], -X_test[:,4] ** (1/2), X_test[:,5], -X_test[:,6], X_test[:,7] ** 2, X_test[:,8], X_test[:,9], X_test[:,10], X_test[:,11] ** 2, -X_test[:,12] ** (1/2)]

# X_train_feature_df = pd.DataFrame(X_train_feature)
# X_test_feature_df = pd.DataFrame(X_test_feature)

In [214]:
def feature_engineering(X: np.array, degree: int):
    
    m = X.shape[0]
    n = X.shape[1]

    X_out = np.zeros([m,n * degree], dtype=np.float64)
        
    for j in range(n):              # 13
        for i in range(degree):     # say 2
            k = (j * degree) + i
            X_out[:,k] = X[:,j] ** (i+1)
            
    return X_out

In [215]:
#bypass scaling
# X_train_zscore = X_train
# X_test_zscore = X_test

degree = 2

X_train_z_feature = feature_engineering(X_train_zscore, degree)
X_test_z_feature = feature_engineering(X_test_zscore, degree)

X_train_z_feature_df = pd.DataFrame(X_train_z_feature)
X_test_z_feature_df = pd.DataFrame(X_test_z_feature)

In [216]:
X_test_z_feature_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,16,17,18,19,20,21,22,23,24,25
0,1.553694,2.413964,-0.483615,0.233884,1.028326,1.057454,-0.256833,0.065963,1.038381,1.078234,...,1.675886,2.808593,1.565287,2.450125,0.784476,0.615403,-3.484596,12.142406,2.250921,5.066644
1,-0.392427,0.153999,-0.483615,0.233884,-0.160878,0.025882,-0.256833,0.065963,-0.088401,0.007815,...,-0.396036,0.156844,0.157078,0.024674,-0.307596,0.094615,0.427331,0.182612,0.478801,0.229251
2,-0.399829,0.159863,-0.483615,0.233884,-0.869402,0.755860,-0.256833,0.065963,-0.361560,0.130725,...,-0.511142,0.261266,-1.094663,1.198287,0.784476,0.615403,0.448077,0.200773,-0.414159,0.171528
3,-0.267805,0.071720,-0.483615,0.233884,1.245881,1.552219,3.893584,15.160000,0.406700,0.165405,...,-0.511142,0.261266,-0.017443,0.000304,-1.718189,2.952174,-0.168767,0.028482,-0.999345,0.998691
4,-0.398037,0.158434,-0.483615,0.233884,-0.972300,0.945367,-0.256833,0.065963,-0.924950,0.855533,...,-0.741356,0.549608,-0.956249,0.914413,0.010925,0.000119,0.429459,0.184435,-0.593580,0.352337
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,-0.029345,0.000861,-0.483615,0.233884,1.028326,1.057454,3.893584,15.160000,1.371293,1.880445,...,1.675886,2.808593,1.565287,2.450125,0.784476,0.615403,-0.002481,0.000006,-1.028329,1.057460
98,-0.397230,0.157792,-0.483615,0.233884,0.253653,0.064340,-0.256833,0.065963,-1.027385,1.055520,...,-0.511142,0.261266,-0.047533,0.002259,0.101931,0.010390,0.427012,0.182339,-0.822685,0.676811
99,-0.207095,0.042888,-0.483615,0.233884,1.245881,1.552219,3.893584,15.160000,0.406700,0.165405,...,-0.511142,0.261266,-0.017443,0.000304,-1.718189,2.952174,0.370519,0.137285,-1.493441,2.230366
100,-0.366986,0.134679,-0.483615,0.233884,-0.720935,0.519748,3.893584,15.160000,-0.429849,0.184771,...,-0.165822,0.027497,-0.595170,0.354227,-0.489608,0.239716,0.392755,0.154256,-0.418300,0.174975


In [217]:
# plot_relation(X_train_z_feature_df, y_train_df, 2)

You will see that all graphs have the same proportions as the original dataset.

Before moving on to prediction, we need a reference value to benchmark our model against: a baseline.

## Baseline Model <a id='baseline'></a>

Baseline models are a fairly involved topic that I will explain separately.

In short, a baseline is a reference value you use to benchmark the performance of your model.

You might use the MSE of a simple linear regression as the baseline, or the mean of the target, as I show below.
The mean baseline has an MSE of 84.62 (RMSE 9.20), which I think is not a good baseline. MEDV is expressed in units of USD 1,000, so the typical error is roughly USD 9,200 per house. A company relying on such predictions would go bankrupt in no time.

So a simple linear regression may be a better baseline (for the sake of simplicity, I will add it in the advanced topic).

In [218]:
# Mean of y
y_train_mean = y_train.mean()
y_pred = [y_train_mean] * len(y_train)

In [219]:
# Baseline evaluation (mean predictor on the training set)
print('R^2:',metrics.r2_score(y_train, y_pred))
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_train, y_pred))*(len(y_train)-1)/(len(y_train)-X_test.shape[1]-1))
print('MAE:',metrics.mean_absolute_error(y_train, y_pred))
print('MSE:',metrics.mean_squared_error(y_train, y_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_train, y_pred)))

R^2: 0.0
Adjusted R^2: -0.03333333333333344
MAE: 6.647632585040682
MSE: 84.62225272032155
RMSE: 9.199035423364862
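The R^2 of exactly 0 above is expected rather than a coincidence: when every prediction equals the mean of the targets, the MSE is the variance of the targets and the RMSE is their standard deviation. A small check on made-up numbers (not taken from the dataset):

```python
import numpy as np

# Made-up targets, only for illustration
y_demo = np.array([10.0, 20.0, 30.0, 40.0])
y_mean_pred = np.full_like(y_demo, y_demo.mean())

# MSE of the mean predictor
mse = np.mean((y_demo - y_mean_pred) ** 2)

# R^2 = 1 - SS_res / SS_tot; both sums are identical for the mean predictor
ss_res = np.sum((y_demo - y_mean_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, np.var(y_demo), r2)
```

Any model worth keeping should therefore reach an R^2 above 0 and an RMSE below the standard deviation of the targets.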


## Linear regression <a id='linear-regression'></a>

### Formula

$ f_{w,b}(x^{(i)}) = w \cdot x^{(i)} + b $

In [220]:
# Vectorized implementation
def compute_linear_regression_v(X, w, b):
    f_wb = X.dot(w.T) + b
    return f_wb

In [221]:
w_init = np.ones(X_train_z_feature.shape[1])
b_init = 1
f_wb = compute_linear_regression_v(X=X_train_z_feature, w=w_init , b=b_init)

## Cost function with Regularization <a id='cost'></a>

Problem: if a parameter $w$ is too large, it heavily influences the model and can cause overfitting.
To fix this problem, regularization adds a penalty computed from $ w $ to the simple cost function, as shown below.

In plain terms, the $Cost$ (error) increases with the sum of the squared $w$ values, scaled by $ \frac{\lambda}{2m} $.

If $w$ is too large, the $Cost$ increases significantly, so gradient descent later decreases $w$ to balance all the parameters.

$ J(w, b) = \frac{1}{2m} \sum_{i=0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m}\sum^{n-1}_{j=0}w^2_j $

In [222]:
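def compute_cost_v(X, y, w, b, lambda_):
    f_wb = compute_linear_regression_v(X, w, b)
    # NOTE: sum(w ** 2) is already a scalar, so .mean() leaves it unchanged;
    # the penalty here is lambda_ / 2 * sum(w^2), without the 1/m of the formula.
    reg = lambda_ / 2 * sum(w ** 2).mean()
    cost = ((f_wb - y) ** 2).mean() / 2 + reg
    return cost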

In [223]:
compute_cost_v(X_train_z_feature, y_train, w=w_init, b=b_init, lambda_=0.01)

190.4103480110494
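The formula above scales the penalty by $ \frac{\lambda}{2m} $, whereas `compute_cost_v` applies `lambda_ / 2` to the summed squares without dividing by `m`. For comparison, here is a sketch that follows the formula literally; `compute_cost_formula` is a hypothetical name, and using it would give different numbers from the outputs recorded in this notebook.

```python
import numpy as np

def compute_cost_formula(X, y, w, b, lambda_):
    """Regularized squared-error cost with both terms divided by 2m, as in the formula."""
    m = X.shape[0]
    error = X @ w + b - y                            # shape (m,)
    data_term = np.sum(error ** 2) / (2 * m)
    reg_term = lambda_ / (2 * m) * np.sum(w ** 2)    # the bias b is not penalized
    return data_term + reg_term

# Tiny hand-checkable example
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo = np.array([1.0, 2.0])
w_demo = np.array([1.0, 1.0])
cost_demo = compute_cost_formula(X_demo, y_demo, w_demo, b=0.0, lambda_=2.0)
print(cost_demo)
```

By hand: the predictions are 3 and 7, the errors 2 and 5, so the data term is (4 + 25) / 4 = 7.25; the penalty is 2 / 4 * (1 + 1) = 1.0, for a total of 8.25.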

## Gradient Descent <a id='gradient-descent'></a>

Repeat until convergence:

$ \{ $
    
$ w_j := w_j - \alpha \frac{\partial}{\partial w_j} J(w, b) $

$ b := b - \alpha \frac{\partial}{\partial b} J(w, b) $

$ \} \text{ (simultaneous update)} $

where

$ \frac{\partial}{\partial w_j} J(w, b) = \frac{1}{m} \sum_{i=0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) x^{(i)}_j + \frac{\lambda}{m} w_j $

$ \frac{\partial}{\partial b} J(w, b) = \frac{1}{m} \sum_{i=0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) $

In [224]:
def gradient_function_w_reg_v(X, y, w, b, lambda_):
    
    m = X.shape[0]
    
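    dj_dw = 0
    dj_db = 0

    f_wb = compute_linear_regression_v(X, w, b)
    error = f_wb - y                # Prediction error per example, shape (m,)
    # NOTE: sum(w) is a scalar, so the same penalty is added to every component
    # of dj_dw; differentiating the penalty term by w_j would give lambda_ * w_j.
    reg = lambda_ * sum(w)
    dj_dw = error.T.dot(X)          # Sum over examples of error * x_j, shape (n,)
    dj_db = sum(error)
    dj_dw = (dj_dw + reg) / m
    dj_db = dj_db / m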
        
    return dj_dw, dj_db

In [225]:
dj_dw, dj_db = gradient_function_w_reg_v(X_train_z_feature, y_train, w=w_init, b=b_init, lambda_=0.01)
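Differentiating the penalty term $ \frac{\lambda}{2m}\sum_j w_j^2 $ of the cost formula with respect to a single $ w_j $ gives $ \frac{\lambda}{m} w_j $, whereas `gradient_function_w_reg_v` adds `lambda_ * sum(w)` to every component. The sketch below uses the per-weight penalty and verifies it against finite differences. The names `cost_demo_fn` and `gradient_per_weight` and the random data are assumptions for illustration; swapping this in would change the results recorded in this notebook.

```python
import numpy as np

def cost_demo_fn(X, y, w, b, lambda_):
    """Cost with both terms divided by 2m, matching the formula."""
    m = X.shape[0]
    error = X @ w + b - y
    return np.sum(error ** 2) / (2 * m) + lambda_ / (2 * m) * np.sum(w ** 2)

def gradient_per_weight(X, y, w, b, lambda_):
    """Gradient of cost_demo_fn; each w_j is penalized by (lambda_ / m) * w_j."""
    m = X.shape[0]
    error = X @ w + b - y
    dj_dw = (X.T @ error + lambda_ * w) / m
    dj_db = np.sum(error) / m
    return dj_dw, dj_db

# Random toy problem (seeded so the check is repeatable)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.normal(size=20)
w_demo = np.array([0.5, -1.0, 2.0])
b_demo = 0.3
lam = 0.7

dj_dw, dj_db = gradient_per_weight(X_demo, y_demo, w_demo, b_demo, lam)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
numeric_dw = np.zeros_like(w_demo)
for j in range(w_demo.size):
    step = np.zeros_like(w_demo)
    step[j] = eps
    numeric_dw[j] = (cost_demo_fn(X_demo, y_demo, w_demo + step, b_demo, lam)
                     - cost_demo_fn(X_demo, y_demo, w_demo - step, b_demo, lam)) / (2 * eps)
numeric_db = (cost_demo_fn(X_demo, y_demo, w_demo, b_demo + eps, lam)
              - cost_demo_fn(X_demo, y_demo, w_demo, b_demo - eps, lam)) / (2 * eps)

print(np.max(np.abs(dj_dw - numeric_dw)), abs(dj_db - numeric_db))
```

A finite-difference check like this is a cheap way to catch mismatches between a cost function and its hand-written gradient.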

In [226]:
def gradient_descent(X, y, w, b, alpha, num_iters, cost_function, gradient_function, lambda_):
    
    j_hist = []
    p_hist = []

    for i in range(num_iters):
        
        dj_dw, dj_db = gradient_function(X, y, w, b, lambda_)
        
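        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        j = cost_function(X, y, w, b, lambda_)
        
        # Stop when converged (j_old - j_new < threshold).
        # NOTE: j_hist is only sampled every 100 iterations, so for
        # num_iters <= 10000 it holds at most 100 entries and this branch never runs;
        # j_hist[i-1] also indexes by iteration number rather than by sample.
        if (len(j_hist) > 100):
            if ((j_hist[i-1] - j) < 0.01):
                if i % 1000 == 0:
                    j_hist.append(j)
                    p_hist.append([w,b])
                break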
        
        if i % 100 == 0:
            j_hist.append(j)
            p_hist.append([w,b])
        
    return w, b, j_hist, p_hist

In [227]:
w_out, b_out, j_hist, p_hist = gradient_descent(X_train_z_feature, y_train, w=w_init, b=b_init,alpha=0.01, num_iters=10000, cost_function=compute_cost_v, gradient_function=gradient_function_w_reg_v, lambda_=0.01)

Let's check that the cost decreases gradually

In [228]:
j_hist

[150.874144559467,
 20.418817689256002,
 15.037417452154353,
 12.882233416437064,
 11.564416759769951,
 10.667440246909857,
 10.024387375693907,
 9.547409371367829,
 9.183882388730192,
 8.89997486686741,
 8.67307565142146,
 8.487732191958623,
 8.333229831343731,
 8.202053751625554,
 8.08887613083035,
 7.989876030935744,
 7.902277680818495,
 7.824034907016077,
 7.753614416955466,
 7.689846365403927,
 7.63182090432487,
 7.5788162558735195,
 7.530248453556698,
 7.485636016983675,
 7.44457494806728,
 7.406720883916261,
 7.371776230631846,
 7.339480779093601,
 7.309604767801273,
 7.2819436763793215,
 7.256314252390853,
 7.232551424988013,
 7.210505863052492,
 7.190042007492695,
 7.171036457286366,
 7.153376623560934,
 7.136959590204697,
 7.121691136441771,
 7.107484888713464,
 7.094261577625915,
 7.081948381709632,
 7.070478344025242,
 7.059789850749057,
 7.04982616313536,
 7.040534995924244,
 7.031868136516182,
 7.023781100186067,
 7.016232817344681,
 7.009185349433018,
 7.002603630496374,

In [229]:
print(f'result cost:{min(j_hist)} with parameter w:{w_out} b:{b_out}')

result cost:6.910042312907027 with parameter w:[-3.7410071   0.3082413  -1.1034199   0.46741147  0.10404217  0.15602278
 -3.21162092  1.17250876 -2.92070294  0.03653542  1.89109421  0.96863414
  0.23873259  0.17787232 -3.18575185  0.56419148  2.83090339 -0.58075488
 -1.2340833   0.95661075 -1.06570447  0.61943985 -0.56472081 -0.36396258
 -5.68439843  1.35225351] b:16.548287773715884


Compute the cost with the w, b parameters returned by gradient descent

In [230]:
compute_cost_v(X_train_z_feature, y_train, w=w_out, b=b_out, lambda_=0.01)

6.909812033110577

Now let's check with unseen data (the test dataset)

In [231]:
compute_cost_v(X_test_z_feature, y_test, w=w_out, b=b_out, lambda_=0.01)

11.757004004660994

#### Let's plot Cost function

In [232]:
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(
        x=list(range(len(j_hist))),     # Sample index; one sample per 100 iterations
        y=j_hist,
        mode='lines+markers'
        ))

# Set the figure size and title
fig.update_layout(height=400, width=600, title_text="Cost Function")
fig.show()

The cost does not decrease significantly after about 300 iterations (note that each point on the x-axis is a sample taken every 100 iterations).

Maybe we should set a threshold ($ \epsilon $) and stop iterating once the change in cost drops below it.
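A minimal sketch of such a stopping rule follows. It is self-contained and assumes the per-weight form of the regularized gradient; `gradient_descent_with_tolerance`, `epsilon`, and the synthetic data are hypothetical and not part of the notebook's own `gradient_descent`. The loop compares the cost of consecutive iterations and stops as soon as the improvement is smaller than `epsilon`.

```python
import numpy as np

def gradient_descent_with_tolerance(X, y, w, b, alpha, max_iters, epsilon, lambda_=0.0):
    """Batch gradient descent that stops when the cost improves by less than epsilon."""
    m = X.shape[0]

    def cost(w, b):
        error = X @ w + b - y
        return np.sum(error ** 2) / (2 * m) + lambda_ / (2 * m) * np.sum(w ** 2)

    j_prev = cost(w, b)
    j_hist = [j_prev]                 # cost after every iteration, starting point included

    for _ in range(max_iters):
        error = X @ w + b - y
        dj_dw = (X.T @ error + lambda_ * w) / m
        dj_db = np.sum(error) / m

        # Simultaneous update of w and b
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        j = cost(w, b)
        j_hist.append(j)

        # Stop when the improvement is below epsilon (this also stops if the cost rises)
        if j_prev - j < epsilon:
            break
        j_prev = j

    return w, b, j_hist

# Synthetic, noise-free data following y = 2x + 1
X_demo = np.linspace(-1, 1, 50).reshape(-1, 1)
y_demo = 2 * X_demo[:, 0] + 1

w_fit, b_fit, j_hist_demo = gradient_descent_with_tolerance(
    X_demo, y_demo, w=np.zeros(1), b=0.0, alpha=0.1, max_iters=10000, epsilon=1e-9
)
print(w_fit, b_fit, len(j_hist_demo) - 1)
```

On this toy problem the loop ends long before `max_iters`. A suitable `epsilon` depends on the scale of the cost, so the value here is only a placeholder.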


In [234]:
y_train_pred = np.dot(X_train_z_feature, w_out) + b_out
y_test_pred = np.dot(X_test_z_feature, w_out) + b_out
y_train_pred_df = pd.DataFrame(y_train_pred)
y_test_pred_df = pd.DataFrame(y_test_pred)

Let's check our result with parameter w,b

and plot to see how the model fit the targets

In [235]:
def plot_result(X: pd.DataFrame, y: pd.DataFrame, y_pred: pd.DataFrame, columns):
    '''
    Plot actual and predicted targets against each feature in X.
    
    Args:
        X (pd.DataFrame (m,n))      : Data, m examples, n features
        y (pd.DataFrame (m,1))      : Target values, m values
        y_pred (pd.DataFrame (m,1)) : Predicted values, m values
        columns (int)               : Number of subplot columns
        
    Output:
        Interactive graph
    
    '''
    
    m = X.shape[1]          # Number of features (one subplot each)
    rows = m // columns     # Full rows of subplots
    frac = m % columns      # Leftover subplots that need one extra row
    row = 0
    col = 1

    if frac > 0:
        rows += 1
            
    fig = make_subplots(rows=rows, cols=columns)

    for i in range(m):
            
        if row >= rows:
            row = 1
            col += 1
        else:
            row += 1
        
        fig.add_trace(go.Scatter(
            x=X.iloc[:,i],
            y=y[y.columns[0]],
            mode='markers',
            name=X.columns[i],
            customdata=X.index.values,                                  # Row index of each point, shown on hover for easier analysis
            hovertemplate="index:%{customdata} (X: %{x}, y: %{y})"
        ), row=row, col=col)
        
        fig.add_trace(go.Scatter(
            x=X.iloc[:,i],
            y=y_pred[y_pred.columns[0]],
            mode='markers',
            name=f'{X.columns[i]} (pred)',                              # Distinguish predictions from actual targets in the legend
            customdata=X.index.values,                                  # Row index of each point, shown on hover for easier analysis
            hovertemplate="index:%{customdata} (X: %{x}, y: %{y})"
        ), row=row, col=col)

    fig.update_layout(height=400 * rows, width=600 * columns, title_text='Relationship between All features / ' + y.columns.values[0])
    fig.show()

In [236]:
plot_result(X_train_z_feature_df, y_train_df, y_train_pred_df, 2)

Plot again with test set

In [237]:
plot_result(X_test_z_feature_df, y_test_df, y_test_pred_df, 2)

In [238]:
# Train model Evaluation
print('R^2:',metrics.r2_score(y_train, y_train_pred))
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_train, y_train_pred))*(len(y_train)-1)/(len(y_train)-X_test.shape[1]-1))
print('MAE:',metrics.mean_absolute_error(y_train, y_train_pred))
print('MSE:',metrics.mean_squared_error(y_train, y_train_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_train, y_train_pred)))

R^2: 0.8482422381569039
Adjusted R^2: 0.8431836460954674
MAE: 2.555200584497176
MSE: 12.842083674956848
RMSE: 3.5835853101268356


In [239]:
# Test model Evaluation
print('R^2:',metrics.r2_score(y_test, y_test_pred))
print('Adjusted R^2:',1 - (1-metrics.r2_score(y_test, y_test_pred))*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))
print('MAE:',metrics.mean_absolute_error(y_test, y_test_pred))
print('MSE:',metrics.mean_squared_error(y_test, y_test_pred))
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)))

R^2: 0.7292716756998898
Adjusted R^2: 0.68927771870101
MAE: 3.102417282328132
MSE: 22.53646761805768
RMSE: 4.747258958394589
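The adjusted R^2 in the cells above uses `X_test.shape[1]` (the 13 original features) as the number of predictors, while the fitted model uses the 26 engineered columns (13 features at degree 2). A small NumPy-only helper makes the predictor count explicit; `adjusted_r2` is a hypothetical name and the numbers below are made up for illustration.

```python
import numpy as np

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p = n_features."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    n = y_true.shape[0]

    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# Made-up values: the same predictions scored with 1 and with 2 predictors
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_pred_demo = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])
adj_1 = adjusted_r2(y_true_demo, y_pred_demo, n_features=1)
adj_2 = adjusted_r2(y_true_demo, y_pred_demo, n_features=2)
print(adj_1, adj_2)
```

For the model in this notebook, the call would presumably pass `X_train_z_feature.shape[1]` as `n_features`; since more predictors lower the adjusted score for the same fit, the values would come out lower than those printed above.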


The model performs somewhat worse on the test set (R^2 of 0.73 versus 0.85 on the training set).