# Ridge Regression

* The aim is to find the coefficients of the independent variables that minimize the sum of squared errors plus a penalty on the size of those coefficients (an L2 penalty).



* Ridge Regression is resistant to overfitting.



* Ridge Regression is biased, but its variance is low.



* Ridge Regression builds a model with all independent variables in the data set.



* Ridge Regression does not remove irrelevant independent variables from the model; instead, it shrinks their coefficients toward zero.
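
The shrinkage described in the points above can be illustrated with a minimal sketch. It assumes centered synthetic data (not the Hitters data) and uses the textbook closed-form ridge solution; the helper name `ridge_closed_form` is made up for this illustration and is not part of scikit-learn.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Ridge coefficients for centered data: solve (X'X + alpha*I) b = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic data (an assumption for illustration): the second feature is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X -= X.mean(axis=0)
true_coef = np.array([3.0, 0.0, -2.0])
y = X @ true_coef + rng.normal(scale=0.5, size=50)
y -= y.mean()

coef_norms = []
for alpha in [0.0, 1.0, 100.0, 10000.0]:
    coef = ridge_closed_form(X, y, alpha)
    coef_norms.append(np.linalg.norm(coef))
    # Coefficients get smaller as alpha grows, but none is set exactly to zero.
    print(alpha, np.round(coef, 4))
```

With `alpha = 0` the result is ordinary least squares; larger values of `alpha` pull all coefficients toward zero while keeping every variable in the model.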

# 1-)Data Preprocessing

In [1]:
import pandas as pd
hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
y = df["Salary"]
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)


In [2]:
from sklearn.model_selection import train_test_split

In [3]:
X_train,X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.25, 
                                                    random_state=42)

print("X_train", X_train.shape)

print("y_train",y_train.shape)

print("X_test",X_test.shape)

print("y_test",y_test.shape)

training = df.copy()

print("training", training.shape)

X_train (197, 19)
y_train (197,)
X_test (66, 19)
y_test (66,)
training (263, 20)


# 2-)Model

In [4]:
from sklearn.linear_model import Ridge

In [5]:
ridge_model = Ridge(alpha = 0.1).fit(X_train, y_train)

* We chose alpha = 0.1 ourselves.


* We don't know whether this value (alpha = 0.1) is the best one.


* We will look for the optimal value of alpha with cross-validation, which is why we tune the model later.

In [6]:
ridge_model.intercept_

-4.578626905721308

In [7]:
ridge_model.coef_  # the coefficients of the independent variables

array([ -1.77435737,   8.80240528,   7.29595605,  -3.33257639,
        -2.08316481,   5.42531283,   7.58514945,  -0.13752764,
        -0.20779701,  -0.60361067,   1.7927957 ,   0.72866408,
        -0.68710375,   0.26153564,   0.26888652,  -0.52674278,
       112.14640272, -99.80997876, -48.07152768])

# 3-) Prediction

In [8]:
y_pred = ridge_model.predict(X_test)
y_pred[0:5]

array([ 611.91293736,  695.30187895, 1013.49571186,  409.96645806,
        416.01757052])

In [9]:
import numpy as np
from sklearn.metrics import mean_squared_error

In [10]:

test_error= np.sqrt(mean_squared_error(y_test, y_pred))
test_error

357.02392526076585
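
As a small sketch of what the cell above computes, RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as `Salary`. The helper `rmse` and the toy numbers below are assumptions for illustration only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, written out by hand."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Toy values, not taken from the Hitters data.
example_true = [100.0, 200.0, 300.0]
example_pred = [110.0, 190.0, 330.0]
example_rmse = rmse(example_true, example_pred)
print(example_rmse)
```

This should agree with `np.sqrt(mean_squared_error(y_true, y_pred))` for the same inputs.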

# 4-)Model tuning

* We will try to find the optimal alpha value using the cross-validation method.




* In other words, instead of picking the value of alpha (a hyperparameter) arbitrarily, we will choose it with cross-validation.



* Finally, we will build the final model using the optimal alpha value.
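
The idea in the points above can be sketched as an explicit loop: score each candidate alpha with k-fold cross-validation and keep the one with the lowest error. This sketch assumes synthetic data from `make_regression`, a hand-picked list of five candidates, and 5 folds; none of these come from the notebook itself.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training data (an assumption for illustration).
X_demo, y_demo = make_regression(n_samples=120, n_features=8,
                                 noise=15.0, random_state=0)

candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_rmse = {}
for alpha in candidate_alphas:
    # scikit-learn returns negated MSE so that higher scores are better.
    scores = cross_val_score(Ridge(alpha=alpha), X_demo, y_demo,
                             cv=5, scoring="neg_mean_squared_error")
    cv_rmse[alpha] = np.sqrt(-scores.mean())

# The candidate with the lowest cross-validated RMSE is selected.
best_alpha = min(cv_rmse, key=cv_rmse.get)
print(best_alpha, cv_rmse[best_alpha])
```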



In [11]:
alphas = 10**np.linspace(10,-2,100)*0.5 
len(alphas)



100

* We create 100 alpha values, spaced logarithmically from 5e9 down to 0.005, and will look for the optimal one among them.
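
To make the shape of this grid concrete, the short check below rebuilds it and inspects its endpoints and spacing. The names `alphas_demo` and `same_grid` are introduced here only for illustration.

```python
import numpy as np

# Same expression as in the cell above: ** binds tighter than *,
# so this is (10 ** exponents) * 0.5.
alphas_demo = 10 ** np.linspace(10, -2, 100) * 0.5

# An equivalent construction using np.logspace.
same_grid = np.logspace(10, -2, 100) * 0.5

# Consecutive values differ by a constant factor, i.e. the grid is log-spaced.
ratios = alphas_demo[1:] / alphas_demo[:-1]

print(alphas_demo[0], alphas_demo[-1])
print(np.allclose(alphas_demo, same_grid))
```

A log-spaced grid is a common choice here because useful values of alpha can span many orders of magnitude.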

In [12]:
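# Ridge regression with built-in cross-validation over the alpha grid

from sklearn.linear_model import RidgeCV

# Note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell assumes an older release.
ridge_cv = RidgeCV(alphas = alphas, 
                   scoring = "neg_mean_squared_error",
                   normalize = True)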

In [13]:
ridge_cv.fit(X_train, y_train)

RidgeCV(alphas=array([5.00000000e+09, 3.78231664e+09, 2.86118383e+09, 2.16438064e+09,
       1.63727458e+09, 1.23853818e+09, 9.36908711e+08, 7.08737081e+08,
       5.36133611e+08, 4.05565415e+08, 3.06795364e+08, 2.32079442e+08,
       1.75559587e+08, 1.32804389e+08, 1.00461650e+08, 7.59955541e+07,
       5.74878498e+07, 4.34874501e+07, 3.28966612e+07, 2.48851178e+07,
       1.88246790e+07, 1.42401793e+0...
       1.00461650e+00, 7.59955541e-01, 5.74878498e-01, 4.34874501e-01,
       3.28966612e-01, 2.48851178e-01, 1.88246790e-01, 1.42401793e-01,
       1.07721735e-01, 8.14875417e-02, 6.16423370e-02, 4.66301673e-02,
       3.52740116e-02, 2.66834962e-02, 2.01850863e-02, 1.52692775e-02,
       1.15506485e-02, 8.73764200e-03, 6.60970574e-03, 5.00000000e-03]),
        normalize=True, scoring='neg_mean_squared_error')

In [14]:
ridge_cv.alpha_  # the optimal alpha value selected by cross-validation

0.7599555414764666
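
On scikit-learn 1.2 and later, `RidgeCV` no longer accepts `normalize`. One possible substitute, sketched below on synthetic data, is to scale the features with `StandardScaler` inside a pipeline. This is an assumption about how one might adapt the cell, not the notebook's original method: `normalize=True` divided each centered feature by its L2 norm rather than its standard deviation, so the alpha selected this way is not directly comparable to the value above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / y_train (an assumption for illustration).
X_demo, y_demo = make_regression(n_samples=150, n_features=10,
                                 noise=20.0, random_state=42)
alphas_demo = 10 ** np.linspace(10, -2, 100) * 0.5

# Scaling is fitted as part of the pipeline, then RidgeCV picks alpha.
scaled_ridge_cv = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=alphas_demo, scoring="neg_mean_squared_error"),
)
scaled_ridge_cv.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name.
chosen_alpha = scaled_ridge_cv.named_steps["ridgecv"].alpha_
demo_pred = scaled_ridge_cv.predict(X_demo[:5])
print(chosen_alpha)
```

Because the scaler is part of the pipeline, the same scaling is applied automatically when calling `predict` on new data.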

## 4.1-)Tuned Model

In [15]:
ridge_tuned_model = Ridge(alpha = ridge_cv.alpha_, normalize = True).fit(X_train, y_train)

## 4.2-) Prediction with Tuned Model

In [21]:
y_pred1=ridge_tuned_model.predict(X_test)
y_pred1[0:5]

array([623.02977914, 684.23447509, 907.17971151, 417.90812634,
       548.22616565])

In [23]:
test_error = np.sqrt(mean_squared_error(y_test, y_pred1))
test_error

386.6826429756415

In [24]:
ridge_tuned_model.intercept_

-51.10797062603592

In [25]:
ridge_tuned_model.coef_

array([ 1.44624890e-01,  1.04791010e+00,  1.36872965e+00,  1.20061792e+00,
        7.82870997e-01,  1.55552882e+00,  3.76433192e+00,  1.20382822e-02,
        5.32415433e-02,  2.55307367e-01,  1.08012329e-01,  9.63035626e-02,
        8.68878351e-02,  1.62796122e-01,  3.20045418e-02, -1.01265296e+00,
        2.89891038e+01, -6.76670272e+01,  1.27302535e+01])