### Ridge Regression Using Wine Dataset

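Ridge regression is linear regression with an L2 penalty on the coefficients, whose strength
is set by the parameter alpha. Here it is treated purely as a machine learning model: we do
not perform any statistical diagnostics on the independent variables; we simply fit the model
on the training data and check the accuracy of the fit on the test data.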
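
To make the effect of the penalty concrete, here is a minimal sketch of ridge regression solved in closed form with NumPy. The function `ridge_closed_form` and the synthetic `X_demo`/`y_demo` data are illustrative assumptions, not part of this notebook; the sketch assumes an unpenalized intercept, which is handled by centering the data first.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xw - b||^2 + alpha * ||w||^2; the intercept b is not penalized."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                      # centering removes the intercept from the problem
    yc = y - y_mean
    n_features = X.shape[1]
    # Normal equations with the L2 penalty added to the diagonal
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

coef_norms = []
for alpha in [0.0, 1.0, 100.0, 10000.0]:
    w, b = ridge_closed_form(X_demo, y_demo, alpha)
    coef_norms.append(float(np.linalg.norm(w)))
    print("alpha:", alpha, "coefficient norm:", round(coef_norms[-1], 4))
```

The coefficient norm shrinks as alpha grows; with alpha set to 0 the solution reduces to ordinary least squares.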

In [1]:
#Importing Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

In [2]:
wine_quality = pd.read_csv("winequality-red.csv", sep=';')

In [4]:
wine_quality.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [5]:
wine_quality.rename(columns=lambda x: x.replace(" ", "_"), 
                    inplace=True)

In [6]:
all_colnms = ['fixed_acidity', 'volatile_acidity', 'citric_acid',
'residual_sugar', 'chlorides', 'free_sulfur_dioxide',
'total_sulfur_dioxide', 'density', 'pH', 'sulphates', 'alcohol']

In [9]:
pdx = wine_quality[all_colnms]
pdy = wine_quality["quality"]

Split the data randomly into training (70%) and test (30%) sets. The random_state (random seed) is fixed so that the results are reproducible:

In [10]:
x_train, x_test, y_train, y_test = train_test_split(pdx, pdy, train_size=0.7,
                                                    random_state=42)
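
As a quick check of what the seed does, the sketch below splits a small array twice with the same `random_state` and compares the results. The arrays `X_demo` and `y_demo` are hypothetical stand-ins for the wine data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows, 2 features
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# Two independent calls with the same seed
split_a = train_test_split(X_demo, y_demo, train_size=0.7, random_state=42)
split_b = train_test_split(X_demo, y_demo, train_size=0.7, random_state=42)

# Each call returns [x_train, x_test, y_train, y_test]; compare piece by piece
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same_split)
print("Train rows:", split_a[0].shape[0], "Test rows:", split_a[1].shape[0])
```

Leaving `random_state` unset would generally give a different shuffle on each call, so scores would vary from run to run.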

A simple grid search written from scratch follows: each of the alpha values below is tried in
turn, and the fit of the resulting model is compared:

In [11]:
alphas = [1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0]

The best R-squared seen so far is initialized to 0. Inside the loop it is updated, and a line
is printed, whenever a new test R-squared exceeds the current best:

In [12]:
initrsq = 0

In [23]:
print("\nRidge Regression: Best Parameters\n")
for alph in alphas:
    # Fit a ridge model for this alpha (printed below as "Lambda")
    ridge_reg = Ridge(alpha=alph)
    ridge_reg.fit(x_train, y_train)
    # score() returns R-squared for a regressor
    tr_rsqrd = ridge_reg.score(x_train, y_train)
    ts_rsqrd = ridge_reg.score(x_test, y_test)
    # Print only when the test R-squared beats the best value so far
    if ts_rsqrd > initrsq:
        print("Lambda: ", alph, "Train R-Squared value:", round(tr_rsqrd, 5),
              "Test R-squared value:", round(ts_rsqrd, 5))
        initrsq = ts_rsqrd


Ridge Regression: Best Parameters

Lambda:  0.0001 Train R-Squared value: 0.3612 Test R-squared value: 0.35135
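
The loop above compares alphas by their test-set R-squared. One possible alternative, not the method used in this notebook, is to choose alpha by cross-validation on the training data and keep the test set for a single final check. The sketch below does this with scikit-learn's `GridSearchCV`; the synthetic `X_demo`/`y_demo` data and the choice of 5 folds are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.7, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=300)

x_tr, x_te, y_tr, y_te = train_test_split(X_demo, y_demo, train_size=0.7,
                                          random_state=42)

# Same alpha grid as in the notebook
alphas = [1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 5.0, 10.0]

# 5-fold cross-validation on the training data only
search = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5, scoring="r2")
search.fit(x_tr, y_tr)

best_alpha = search.best_params_["alpha"]
test_r2 = search.score(x_te, y_te)   # R-squared of the refitted best model
print("Best alpha:", best_alpha)
print("Held-out test R-squared:", round(test_r2, 5))
```

Because the test rows play no part in choosing alpha here, the final R-squared is a less optimistic estimate of performance on new data.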


In [24]:
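# Refit with alpha=0.001 and list the coefficient of each feature.
# Note: the search above reported alpha=0.0001 as the best value; 0.001 is used here.
ridge_reg = Ridge(alpha=0.001)
ridge_reg.fit(x_train, y_train)
print("\nRidge Regression coefficient values of Alpha = 0.001\n")
for colnm, coef in zip(all_colnms, ridge_reg.coef_):
    print(colnm, ": ", coef)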


Ridge Regression coefficient values of Alpha = 0.001

fixed_acidity :  0.015506587508044107
volatile_acidity :  -1.1050982354876904
citric_acid :  -0.2487986553235121
residual_sugar :  0.004018895392835191
chlorides :  -1.684383962086345
free_sulfur_dioxide :  0.0046369017109631405
total_sulfur_dioxide :  -0.003283767904105511
density :  -5.5672717468030894
pH :  -0.3624800172040029
sulphates :  0.8009191228025626
alcohol :  0.2999182442952099
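
To show how such coefficients are used, the sketch below fits a ridge model on small synthetic data and reproduces its predictions by hand as the intercept plus the coefficient-weighted sum of the features. The data and the names `X_demo`, `y_demo`, and `model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([0.3, -1.1, 0.8]) + 5.0 + rng.normal(scale=0.2, size=50)

model = Ridge(alpha=0.001).fit(X_demo, y_demo)

# A prediction is the intercept plus the dot product of features and coefficients
manual_pred = X_demo @ model.coef_ + model.intercept_
matches = np.allclose(manual_pred, model.predict(X_demo))
print("Manual prediction matches predict():", matches)
```

Each coefficient is therefore the change in the predicted value per unit change in its feature, so its size depends on the units in which that feature is measured.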
