# Dataset: Car Prices

In this homework, we will use the Car price dataset. Download it from here.

Or you can do it with wget:

wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv
We'll keep working with the MSRP variable, and we'll transform it to a classification task.

In [295]:
import pandas as pd

df = pd.read_csv('car_price_data.csv')

# Features
For the rest of the homework, you'll need to use only these columns:

1) Make
2) Model
3) Year
4) Engine HP
5) Engine Cylinders
6) Transmission Type
7) Vehicle Style
8) highway MPG
9) city mpg
10) MSRP

# Data preparation
Select only the features from above and transform their names using the next line:
data.columns = data.columns.str.replace(' ', '_').str.lower()
Fill in the missing values of the selected features with 0.
Rename MSRP variable to price.

In [296]:
df.columns = df.columns.str.replace(' ', '_').str.lower()
df.columns 

Index(['make', 'model', 'year', 'engine_fuel_type', 'engine_hp',
       'engine_cylinders', 'transmission_type', 'driven_wheels',
       'number_of_doors', 'market_category', 'vehicle_size', 'vehicle_style',
       'highway_mpg', 'city_mpg', 'popularity', 'msrp'],
      dtype='object')

In [297]:
df=df.rename(columns={'msrp':'price'})
df.columns

Index(['make', 'model', 'year', 'engine_fuel_type', 'engine_hp',
       'engine_cylinders', 'transmission_type', 'driven_wheels',
       'number_of_doors', 'market_category', 'vehicle_size', 'vehicle_style',
       'highway_mpg', 'city_mpg', 'popularity', 'price'],
      dtype='object')

In [298]:
df.isnull().sum()

make                    0
model                   0
year                    0
engine_fuel_type        3
engine_hp              69
engine_cylinders       30
transmission_type       0
driven_wheels           0
number_of_doors         6
market_category      3742
vehicle_size            0
vehicle_style           0
highway_mpg             0
city_mpg                0
popularity              0
price                   0
dtype: int64

In [299]:
df['engine_hp'] = df['engine_hp'].fillna(0)
df['engine_cylinders'] = df['engine_cylinders'].fillna(0)

# Question 1
What is the most frequent observation (mode) for the column transmission_type?

1) AUTOMATIC
2) MANUAL
3) AUTOMATED_MANUAL
4) DIRECT_DRIVE

In [300]:
df['transmission_type'].value_counts().to_frame('count')

Unnamed: 0_level_0,count
transmission_type,Unnamed: 1_level_1
AUTOMATIC,8266
MANUAL,2935
AUTOMATED_MANUAL,626
DIRECT_DRIVE,68
UNKNOWN,19


In [301]:
df['transmission_type'].mode().to_frame('frequent_col')

Unnamed: 0,frequent_col
0,AUTOMATIC


# Question 2
Create the correlation matrix for the numerical features of your dataset. In a correlation matrix, you compute the correlation coefficient between every pair of features in the dataset.

What are the two features that have the biggest correlation in this dataset?

1) engine_hp and year
2) engine_hp and engine_cylinders
3) highway_mpg and engine_cylinders
4) highway_mpg and city_mpg

In [302]:
df[['engine_hp','year', 'engine_cylinders', 'highway_mpg', 'city_mpg']].corr()

Unnamed: 0,engine_hp,year,engine_cylinders,highway_mpg,city_mpg
engine_hp,1.0,0.338714,0.774851,-0.415707,-0.424918
year,0.338714,1.0,-0.040708,0.25824,0.198171
engine_cylinders,0.774851,-0.040708,1.0,-0.614541,-0.587306
highway_mpg,-0.415707,0.25824,-0.614541,1.0,0.886829
city_mpg,-0.424918,0.198171,-0.587306,0.886829,1.0


# Make price binary
Now we need to turn the price variable from numeric into a binary format.
Let's create a variable above_average which is 1 if the price is above its mean value and 0 otherwise.

In [303]:
mean_val = df['price'].mean()
df['above_average'] = (df['price'] > mean_val).astype(int).fillna(0)

In [304]:
mean_val, df[['price', 'above_average']]

(40594.737032063116,
        price  above_average
 0      46135              1
 1      40650              1
 2      36350              0
 3      29450              0
 4      34500              0
 ...      ...            ...
 11909  46120              1
 11910  56670              1
 11911  50620              1
 11912  50920              1
 11913  28995              0
 
 [11914 rows x 2 columns])

# Split the data
Split your data in train/val/test sets with 60%/20%/20% distribution.
Use Scikit-Learn for that (the train_test_split function) and set the seed to 42.
Make sure that the target value (above_average) is not in your dataframe.


In [340]:
from sklearn.model_selection import train_test_split

data = df.copy()
data = data[['make', 'model', 'year', 'engine_hp', 'engine_cylinders', 'transmission_type', 'vehicle_style', 
             'highway_mpg', 'city_mpg', 'price', 'above_average']]

X = data
y = data['above_average']

# del data['above_average']
data.columns, X, y

(Index(['make', 'model', 'year', 'engine_hp', 'engine_cylinders',
        'transmission_type', 'vehicle_style', 'highway_mpg', 'city_mpg',
        'price', 'above_average'],
       dtype='object'),
           make       model  year  engine_hp  engine_cylinders  \
 0          BMW  1 Series M  2011      335.0               6.0   
 1          BMW    1 Series  2011      300.0               6.0   
 2          BMW    1 Series  2011      300.0               6.0   
 3          BMW    1 Series  2011      230.0               6.0   
 4          BMW    1 Series  2011      230.0               6.0   
 ...        ...         ...   ...        ...               ...   
 11909    Acura         ZDX  2012      300.0               6.0   
 11910    Acura         ZDX  2012      300.0               6.0   
 11911    Acura         ZDX  2012      300.0               6.0   
 11912    Acura         ZDX  2013      300.0               6.0   
 11913  Lincoln      Zephyr  2006      221.0               6.0   
 
       t

In [341]:
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [342]:
X_train, X_val, y_train,y_val = train_test_split(X_train_full, y_train_full, test_size=0.25, random_state=42)

In [343]:
data.shape, len(X_train) , len(X_val) , len(X_test)

((11914, 11), 7148, 2383, 2383)

In [344]:
y_train, y_val, y_test

(3972     0
 1997     0
 5216     1
 2805     0
 11369    0
         ..
 9232     0
 5710     0
 11306    0
 4414     0
 10286    0
 Name: above_average, Length: 7148, dtype: int32,
 1918     0
 9951     1
 5486     0
 292      0
 3644     0
         ..
 4385     0
 7339     0
 9806     0
 11162    1
 3256     1
 Name: above_average, Length: 2383, dtype: int32,
 3995     0
 7474     0
 7300     0
 3148     0
 747      0
         ..
 267      0
 4320     1
 5799     0
 6080     0
 11511    1
 Name: above_average, Length: 2383, dtype: int32)

# Question 3
Calculate the mutual information score between above_average and other categorical variables in our dataset. Use the training set only.
Round the scores to 2 decimals using round(score, 2).
Which of these variables has the lowest mutual information score?

1) make
2) model
3) transmission_type
4) vehicle_style

In [345]:
from sklearn.metrics import mutual_info_score

def calculate_mi(series):
    return mutual_info_score(series, X_train['above_average'])

cols = ['make', 'model', 'transmission_type', 'vehicle_style']

df_mi = X_train[cols].apply(calculate_mi)
df_mi = df_mi.sort_values(ascending=False).to_frame('MI')
df_mi

Unnamed: 0,MI
model,0.462344
make,0.239769
vehicle_style,0.084143
transmission_type,0.020958


In [346]:
del X_train['above_average']
del X_test['above_average']
del X_val['above_average']

# Question 4
Now let's train a logistic regression. Remember that we have several categorical variables in the dataset. 

Include them using one-hot encoding.
Fit the model on the training dataset.
To make sure the results are reproducible across different versions of Scikit-Learn, fit the model with these parameters:
model = LogisticRegression(solver='liblinear', C=10, max_iter=1000, random_state=42)

Calculate the accuracy on the validation dataset and round it to 2 decimal digits.
What accuracy did you get?

1) 0.60
2) 0.72
3) 0.84
4) 0.95

In [347]:
from IPython.display import display
for col in data.columns:
    display(data[col].value_counts().to_frame(col))

Unnamed: 0_level_0,make
make,Unnamed: 1_level_1
Chevrolet,1123
Ford,881
Volkswagen,809
Toyota,746
Dodge,626
Nissan,558
GMC,515
Honda,449
Mazda,423
Cadillac,397


Unnamed: 0_level_0,model
model,Unnamed: 1_level_1
Silverado 1500,156
Tundra,140
F-150,126
Sierra 1500,90
Beetle Convertible,89
...,...
MKZ Hybrid,1
M4 GTS,1
LFA,1
Horizon,1


Unnamed: 0_level_0,year
year,Unnamed: 1_level_1
2015,2170
2016,2157
2017,1668
2014,589
2012,387
2009,379
2013,366
2008,349
2007,345
2010,298


Unnamed: 0_level_0,engine_hp
engine_hp,Unnamed: 1_level_1
200.0,456
170.0,351
210.0,320
240.0,268
285.0,246
...,...
557.0,1
661.0,1
451.0,1
660.0,1


Unnamed: 0_level_0,engine_cylinders
engine_cylinders,Unnamed: 1_level_1
4.0,4752
6.0,4489
8.0,2031
12.0,230
5.0,225
0.0,86
10.0,68
3.0,30
16.0,3


Unnamed: 0_level_0,transmission_type
transmission_type,Unnamed: 1_level_1
AUTOMATIC,8266
MANUAL,2935
AUTOMATED_MANUAL,626
DIRECT_DRIVE,68
UNKNOWN,19


Unnamed: 0_level_0,vehicle_style
vehicle_style,Unnamed: 1_level_1
Sedan,3048
4dr SUV,2488
Coupe,1211
Convertible,793
4dr Hatchback,702
Crew Cab Pickup,681
Extended Cab Pickup,623
Wagon,592
2dr Hatchback,506
Passenger Minivan,417


Unnamed: 0_level_0,highway_mpg
highway_mpg,Unnamed: 1_level_1
24,876
23,801
26,778
22,753
25,731
28,682
27,585
31,568
30,547
20,515


Unnamed: 0_level_0,city_mpg
city_mpg,Unnamed: 1_level_1
17,1230
16,1106
15,1038
18,997
19,841
...,...
89,1
7,1
129,1
86,1


Unnamed: 0_level_0,price
price,Unnamed: 1_level_1
2000,1036
29995,19
25995,19
20995,16
27995,16
...,...
18855,1
22575,1
20050,1
26965,1


Unnamed: 0_level_0,above_average
above_average,Unnamed: 1_level_1
0,8645
1,3269


In [348]:
from sklearn.feature_extraction import DictVectorizer

In [349]:
X_train

Unnamed: 0,make,model,year,engine_hp,engine_cylinders,transmission_type,vehicle_style,highway_mpg,city_mpg,price
3972,Mitsubishi,Endeavor,2011,225.0,6.0,AUTOMATIC,4dr SUV,19,15,33599
1997,Kia,Borrego,2009,276.0,6.0,AUTOMATIC,4dr SUV,21,17,26245
5216,Lamborghini,Gallardo,2012,570.0,10.0,MANUAL,Convertible,20,12,248000
2805,Chevrolet,Colorado,2016,200.0,4.0,AUTOMATIC,Crew Cab Pickup,27,20,24990
11369,Pontiac,Vibe,2009,158.0,4.0,AUTOMATIC,4dr Hatchback,26,20,20475
...,...,...,...,...,...,...,...,...,...,...
9232,Toyota,Sienna,2016,266.0,6.0,AUTOMATIC,Passenger Minivan,25,18,37655
5710,Chevrolet,HHR,2009,260.0,4.0,MANUAL,Wagon,29,21,25135
11306,Hyundai,Veracruz,2012,260.0,6.0,AUTOMATIC,4dr SUV,22,17,28345
4414,Mitsubishi,Expo,1993,136.0,4.0,MANUAL,2dr Hatchback,26,19,2000


In [378]:
def one_hot_encoding(X):
    X1 = X.copy()
    X1 = X1[['year', 'engine_hp', 'engine_cylinders', 'highway_mpg', 'city_mpg', 'price', 'transmission_type', 'vehicle_style']]
    X1['age'] = 2023 - X1['year']
    X1.drop(columns=['year'], inplace=True)
    X1=X1.fillna(0)
    X_dict = X1.to_dict(orient='records')
    dv = DictVectorizer(sparse=False)
    dv.fit(X_dict)
    return dv, dv.transform(X_dict)

In [379]:
dv, train_dataset=one_hot_encoding(X_train)
dv.get_feature_names_out()
dict(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))

{'age': -0.301,
 'city_mpg': -0.131,
 'engine_cylinders': -0.093,
 'engine_hp': -0.025,
 'highway_mpg': -0.164,
 'price': 0.0,
 'transmission_type=AUTOMATED_MANUAL': -0.003,
 'transmission_type=AUTOMATIC': -0.021,
 'transmission_type=DIRECT_DRIVE': 0.003,
 'transmission_type=MANUAL': -0.001,
 'transmission_type=UNKNOWN': -0.0,
 'vehicle_style=2dr Hatchback': -0.001,
 'vehicle_style=2dr SUV': -0.0,
 'vehicle_style=4dr Hatchback': 0.0,
 'vehicle_style=4dr SUV': -0.012,
 'vehicle_style=Cargo Minivan': -0.0,
 'vehicle_style=Cargo Van': -0.001,
 'vehicle_style=Convertible': -0.0,
 'vehicle_style=Convertible SUV': 0.0,
 'vehicle_style=Coupe': 0.007,
 'vehicle_style=Crew Cab Pickup': -0.002,
 'vehicle_style=Extended Cab Pickup': -0.003,
 'vehicle_style=Passenger Minivan': -0.004,
 'vehicle_style=Passenger Van': -0.007,
 'vehicle_style=Regular Cab Pickup': -0.002,
 'vehicle_style=Sedan': 0.006,
 'vehicle_style=Wagon': -0.002}

In [380]:
dv, test_dataset=one_hot_encoding(X_test)
dict(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))

{'age': -0.301,
 'city_mpg': -0.131,
 'engine_cylinders': -0.093,
 'engine_hp': -0.025,
 'highway_mpg': -0.164,
 'price': 0.0,
 'transmission_type=AUTOMATED_MANUAL': -0.003,
 'transmission_type=AUTOMATIC': -0.021,
 'transmission_type=DIRECT_DRIVE': 0.003,
 'transmission_type=MANUAL': -0.001,
 'transmission_type=UNKNOWN': -0.0,
 'vehicle_style=2dr Hatchback': -0.001,
 'vehicle_style=2dr SUV': -0.0,
 'vehicle_style=4dr Hatchback': 0.0,
 'vehicle_style=4dr SUV': -0.012,
 'vehicle_style=Cargo Minivan': -0.0,
 'vehicle_style=Cargo Van': -0.001,
 'vehicle_style=Convertible': -0.0,
 'vehicle_style=Convertible SUV': 0.0,
 'vehicle_style=Coupe': 0.007,
 'vehicle_style=Crew Cab Pickup': -0.002,
 'vehicle_style=Extended Cab Pickup': -0.003,
 'vehicle_style=Passenger Minivan': -0.004,
 'vehicle_style=Passenger Van': -0.007,
 'vehicle_style=Regular Cab Pickup': -0.002,
 'vehicle_style=Sedan': 0.006,
 'vehicle_style=Wagon': -0.002}

In [381]:
dv, val_dataset=one_hot_encoding(X_val)
dict(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))

{'age': -0.301,
 'city_mpg': -0.131,
 'engine_cylinders': -0.093,
 'engine_hp': -0.025,
 'highway_mpg': -0.164,
 'price': 0.0,
 'transmission_type=AUTOMATED_MANUAL': -0.003,
 'transmission_type=AUTOMATIC': -0.021,
 'transmission_type=DIRECT_DRIVE': 0.003,
 'transmission_type=MANUAL': -0.001,
 'transmission_type=UNKNOWN': -0.0,
 'vehicle_style=2dr Hatchback': -0.001,
 'vehicle_style=2dr SUV': -0.0,
 'vehicle_style=4dr Hatchback': 0.0,
 'vehicle_style=4dr SUV': -0.012,
 'vehicle_style=Cargo Minivan': -0.0,
 'vehicle_style=Cargo Van': -0.001,
 'vehicle_style=Convertible': -0.0,
 'vehicle_style=Convertible SUV': 0.0,
 'vehicle_style=Coupe': 0.007,
 'vehicle_style=Crew Cab Pickup': -0.002,
 'vehicle_style=Extended Cab Pickup': -0.003,
 'vehicle_style=Passenger Minivan': -0.004,
 'vehicle_style=Passenger Van': -0.007,
 'vehicle_style=Regular Cab Pickup': -0.002,
 'vehicle_style=Sedan': 0.006,
 'vehicle_style=Wagon': -0.002}

In [382]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='liblinear', C=10, max_iter=1000, random_state=42)
model.fit(train_dataset, y_train)

In [383]:
model.intercept_[0]

-0.021792062906585852

In [384]:
model.coef_[0].round(3)

array([-0.301, -0.131, -0.093, -0.025, -0.164,  0.   , -0.003, -0.021,
        0.003, -0.001, -0.   , -0.001, -0.   ,  0.   , -0.012, -0.   ,
       -0.001, -0.   ,  0.   ,  0.007, -0.002, -0.003, -0.004, -0.007,
       -0.002,  0.006, -0.002])

In [385]:
train_dataset.shape, val_dataset.shape

((7148, 27), (2383, 27))

In [386]:
y_pred=model.predict_proba(val_dataset)[:,1]
price=y_pred>0.5

In [388]:
round((y_val == price).mean(),2)

0.96

# Question 5
Let's find the least useful feature using the feature elimination technique.
Train a model with all these features (using the same parameters as in Q4).
Now exclude each feature from this set and train a model without it. Record the accuracy for each model.
For each feature, calculate the difference between the original accuracy and the accuracy without the feature.
Which of following feature has the smallest difference?

1) year
2) engine_hp
3) transmission_type
4) city_mpg

In [402]:
import numpy as np
def one_hot_encoding_v2(X, feature_set):
    X1 = X.copy()
    X1 = X1[feature_set]
    if 'year' in feature_set: 
        X1['age'] = 2023 - X1['year']
        X1.drop(columns=['year'], inplace=True)
#     if 'price' in feature_set:
#         X1['price'] = np.log1p(X1['price'])
    X1=X1.fillna(0)
    X_dict = X1.to_dict(orient='records')
    dv = DictVectorizer(sparse=False)
    dv.fit(X_dict)
    return dv.transform(X_dict)

In [403]:
from sklearn.linear_model import LogisticRegression
def train_return_accuracy(train, val, y_train, y_val):
    model = LogisticRegression(solver='liblinear', C=10, max_iter=1000, random_state=42)
    model.fit(train, y_train)
    y_pred=model.predict_proba(val)[:,1]
    price=y_pred>0.5
    return (y_val == price).mean()

In [406]:
feature_cols=['year', 'engine_hp', 'engine_cylinders', 'highway_mpg', 'city_mpg', 'price', 'transmission_type', 'vehicle_style']
train_dataset = one_hot_encoding_v2(X_train, feature_cols)
val_dataset = one_hot_encoding_v2(X_val, feature_cols)
original_accuracy = train_return_accuracy(train_dataset, val_dataset, y_train, y_val)
for f_col in feature_cols: 
    feature_set = [f for f in feature_cols if f != f_col]
    train_wo = one_hot_encoding_v2(X_train, feature_set)
    val_wo = one_hot_encoding_v2(X_val, feature_set)
    accuracy_wo = train_return_accuracy(train_wo, val_wo, y_train, y_val)
    print(f'The accuracy diff for {f_col} = {original_accuracy - accuracy_wo}')

The accuracy diff for year = 0.010071338648762085
The accuracy diff for engine_hp = 0.0012589173310952884
The accuracy diff for engine_cylinders = 0.0
The accuracy diff for highway_mpg = -0.002937473772555599
The accuracy diff for city_mpg = -0.0025178346621905767
The accuracy diff for price = 0.07469576164498526
The accuracy diff for transmission_type = -0.002937473772555599
The accuracy diff for vehicle_style = -0.002937473772555599


# Question 6
For this question, we'll see how to use a linear regression model from Scikit-Learn.
We'll need to use the original column price. 
Apply the logarithmic transformation to this column.
Fit the Ridge regression model on the training data with a solver 'sag'. 
Set the seed to 42.
This model also has a parameter alpha. Let's try the following values: [0, 0.01, 0.1, 1, 10].
Round your RMSE scores to 3 decimal digits.
Which of these alphas leads to the best RMSE on the validation set?


1) 0
2) 0.01
3) 0.1
4) 1
5) 10

In [436]:
import numpy as np
from sklearn.model_selection import train_test_split
feature_cols=['engine_hp', 'engine_cylinders', 'highway_mpg', 'city_mpg', 'transmission_type', 'vehicle_style', 'price', 'year']
def prepare_dataset():
   reg_data = df.copy()
   reg_data = reg_data[feature_cols]
   reg_data['price'] = np.log1p(reg_data['price'])
   if 'year' in feature_cols: 
        reg_data['age'] = 2023 - reg_data['year']
        reg_data.drop(columns=['year'], inplace=True)
   X = reg_data
   y = reg_data['price']
   return X, y

X, y = prepare_dataset()
X_rtrain_full, X_rtest, y_rtrain_full, y_rtest = train_test_split(X, y, test_size=0.2, random_state=42)
X_rtrain, X_rval, y_rtrain, y_rval = train_test_split(X_rtrain_full, y_rtrain_full, test_size=0.25, random_state=42)
del X_rtrain['price']
del X_rval['price']
del X_rtest['price']

In [437]:
def one_hot_encoding_v3(X):
    X1 = X.copy()
    X1=X1.fillna(0)
    X_dict = X1.to_dict(orient='records')
    dv = DictVectorizer(sparse=False)
    dv.fit(X_dict)
    return dv.transform(X_dict)

In [438]:
rtrain_dataset = one_hot_encoding_v3(X_rtrain)
rval_dataset = one_hot_encoding_v3(X_rval)
y_rtrain = y_rtrain.values
y_rval = y_rval.values

In [452]:
y_rval, y_rtrain

(array([10.26381581, 11.00544424,  9.90802723, ...,  9.99747868,
        11.72558222, 10.87749987]),
 array([10.42228135, 10.17526888, 12.42118806, ..., 10.25224121,
         7.60140233, 10.60214453]))

In [439]:
rtrain_dataset.shape, rval_dataset.shape

((7148, 26), (2383, 26))

In [443]:
def calc_rmse(y_pred, y_orig):
    yp = y_pred.copy()
    yo = y_orig.copy()
    y_diff = yp - yo
    mse = (y_diff ** 2).mean()
    return np.round(np.sqrt(mse),3)

In [449]:
from sklearn.linear_model import Ridge
def train_return_rmse(alpha):
    model = Ridge(solver='sag', alpha=alpha, random_state=42)
    model.fit(rtrain_dataset, y_rtrain)
    y_pred=model.predict(rval_dataset)
    print(model.coef_)
    return calc_rmse(y_pred, y_val)

In [450]:
model.coef_

array([[-3.00780468e-01, -1.30863616e-01, -9.28365045e-02,
        -2.53497318e-02, -1.63588812e-01,  4.15404253e-04,
        -2.51294434e-03, -2.11035771e-02,  2.57161153e-03,
        -7.45476881e-04, -1.67614000e-06, -7.43255058e-04,
        -4.26878202e-04,  1.42507274e-04, -1.23984483e-02,
        -4.04633710e-04, -1.11714193e-03, -1.38247775e-04,
         3.23956030e-05,  6.86564386e-03, -1.84438221e-03,
        -3.04365920e-03, -4.06768643e-03, -7.20888123e-03,
        -1.74036603e-03,  5.92174841e-03, -1.62077794e-03]])

In [451]:
alphas=[0, 0.01, 0.1, 1, 10]
for a in alphas: 
    rmse=train_return_rmse(a)
    print(f'The rmse for alpha {a} = {rmse}')



[-0.08964204  0.00982103  0.07629415  0.00352139 -0.0029892   0.09395597
  0.00365594  0.03262307 -0.11564706 -0.01458792 -0.01169672 -0.0155583
 -0.05427659  0.04794918  0.00528172 -0.03993496  0.20882252  0.00634363
  0.05677201 -0.07903619 -0.11449467  0.04781628 -0.03917726 -0.07175323
  0.00281268  0.05012992]
The rmse for alpha 0 = 9.86




[-0.08964203  0.00982102  0.07629393  0.0035214  -0.00298919  0.09395526
  0.00365613  0.03262278 -0.11564638 -0.01458779 -0.01169666 -0.01555817
 -0.05427619  0.04794889  0.00528167 -0.0399346   0.20882089  0.00634357
  0.0567715  -0.07903557 -0.11449378  0.0478159  -0.03917691 -0.07175265
  0.0028126   0.0501295 ]
The rmse for alpha 0.01 = 9.86




[-0.08964191  0.00982095  0.07629197  0.00352144 -0.00298907  0.09394886
  0.00365784  0.03262019 -0.11564026 -0.01458663 -0.01169606 -0.01555698
 -0.05427256  0.04794629  0.0052813  -0.03993141  0.20880622  0.00634305
  0.05676689 -0.07902993 -0.11448576  0.04781249 -0.03917378 -0.07174745
  0.00281189  0.0501258 ]
The rmse for alpha 0.1 = 9.86




[-0.08964078  0.00982023  0.07627234  0.00352188 -0.00298789  0.09388489
  0.00367487  0.03259429 -0.11557905 -0.01457501 -0.01169009 -0.0155451
 -0.05423629  0.04792029  0.00527759 -0.03989954  0.20865963  0.00633782
  0.05672086 -0.07897357 -0.11440555  0.04777846 -0.03914254 -0.07169546
  0.00280477  0.05008873]
The rmse for alpha 1 = 9.86
[-0.08962949  0.00981307  0.07607683  0.00352621 -0.00297615  0.0932486
  0.00384396  0.03233688 -0.11496996 -0.01445948 -0.0116307  -0.01542698
 -0.05387549  0.04766156  0.00524065 -0.03958262  0.20720164  0.00628592
  0.05626315 -0.07841307 -0.11360785  0.04743995 -0.03883194 -0.07117852
  0.00273413  0.04972016]
The rmse for alpha 10 = 9.86


