# Exercise 6

## SVM & Regularization


For this homework we consider a set of observations on a number of red and white wine varieties, covering their chemical properties and their ranking by tasters. The wine industry has seen a recent growth spurt as social drinking is on the rise. The price of wine depends in part on a rather abstract concept, wine appreciation by tasters, whose opinions can vary widely. Another key factor in wine certification and quality assessment is physicochemical testing, which is laboratory-based and takes into account factors such as acidity, pH level, sugar content and other chemical properties. For the wine market, it would be of interest if the quality perceived by human tasters could be related to the chemical properties of wine, so that the certification, quality assessment and assurance process is more controlled.

Two datasets are available: one on red wine with 1599 varieties and one on white wine with 4898 varieties. All wines are produced in a particular area of Portugal. Data are collected on 12 different properties of the wines, one of which is Quality, based on sensory data; the rest are chemical properties including density, acidity, alcohol content, etc. All chemical properties are continuous variables. Quality is an ordinal variable with possible rankings from 1 (worst) to 10 (best). Each variety of wine is tasted by three independent tasters and the final rank assigned is the median of their ranks.

A predictive model developed on this data is expected to give vineyards guidance on the quality and price to expect for their produce, without heavy reliance on the volatile judgments of wine tasters.

In [1]:
import pandas as pd
import numpy as np

In [2]:
%%html
<style>
table {float:left}
</style>

In [3]:
data_r = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/Wine_data_red.csv')
data_w = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/Wine_data_white.csv')

In [4]:
data = data_w.assign(type = 'white')

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([data, data_r.assign(type = 'red')], ignore_index=True)
data.sample(5)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,type
5036,7.8,0.56,0.19,2.1,0.081,15.0,105.0,0.9962,3.33,0.54,9.5,5,red
2124,7.7,0.39,0.28,4.9,0.035,36.0,109.0,0.9918,3.19,0.58,12.2,7,white
2885,6.9,0.4,0.3,10.6,0.033,24.0,87.0,0.99265,3.15,0.45,12.8,6,white
3636,6.5,0.26,0.39,1.4,0.02,12.0,66.0,0.99089,3.25,0.75,11.3,7,white
5749,9.3,0.43,0.44,1.9,0.085,9.0,22.0,0.99708,3.28,0.55,9.5,5,red


# Exercise 6.1

Show the frequency table of quality by type of wine

In [5]:
data.pivot_table(values='fixed acidity', index='quality',columns='type', aggfunc='count')

type,red,white
quality,Unnamed: 1_level_1,Unnamed: 2_level_1
3,10.0,20.0
4,53.0,163.0
5,681.0,1457.0
6,638.0,2198.0
7,199.0,880.0
8,18.0,175.0
9,,5.0
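
The pivot table above counts rows by counting a non-null column, which leaves `NaN` where a combination never occurs (red wines with quality 9). As a hedged alternative, `pd.crosstab` counts rows directly and fills missing combinations with 0. The sketch below uses a small made-up table (`toy` is a hypothetical stand-in, not the wine data):

```python
import pandas as pd

# Toy stand-in for the combined wine table (values are made up for illustration).
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 6, 7, 7, 9],
    "type": ["red", "white", "red", "white", "white", "red", "white", "white"],
})

# Count rows for every (quality, type) pair; absent pairs become 0 instead of NaN.
freq = pd.crosstab(toy["quality"], toy["type"])
print(freq)
```

On the real data the equivalent call would be `pd.crosstab(data['quality'], data['type'])`.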


# SVM

# Exercise 6.2

* Standardize the features (not the quality)
* Create a binary target for each type of wine
* Create two linear SVMs, for the white and red wines respectively.


In [6]:
#Create a binary target
data['quality2'] = [1 if i > 6 else 0 for i in data['quality']]

#Divide the df by type of wine
data_red = data[data['type'] == 'red']
data_white = data[data['type'] == 'white']

In [7]:
#Red wine df
data_red.sample(5)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,type,quality2
5010,8.4,0.6,0.1,2.2,0.085,14.0,111.0,0.9964,3.15,0.66,9.8,5,red,0
6152,7.8,0.7,0.06,1.9,0.079,20.0,35.0,0.99628,3.4,0.69,10.9,5,red,0
5170,10.9,0.37,0.58,4.0,0.071,17.0,65.0,0.99935,3.22,0.78,10.1,5,red,0
6384,6.8,0.68,0.21,2.1,0.07,9.0,23.0,0.99546,3.38,0.6,10.3,5,red,0
6118,10.9,0.32,0.52,1.8,0.132,17.0,44.0,0.99734,3.28,0.77,11.5,6,red,0


In [8]:
#White wine df
data_white.sample(5)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,type,quality2
3032,6.7,0.14,0.46,1.6,0.036,15.0,92.0,0.99264,3.37,0.49,10.9,5,white,0
737,6.9,0.38,0.25,9.8,0.04,28.0,191.0,0.9971,3.28,0.61,9.2,5,white,0
1107,7.2,0.37,0.15,2.0,0.029,27.0,87.0,0.9903,3.3,0.59,12.6,7,white,1
1754,6.4,0.16,0.28,2.2,0.042,33.0,93.0,0.9914,3.31,0.43,11.1,6,white,0
4513,6.8,0.4,0.29,2.8,0.044,27.0,97.0,0.9904,3.12,0.42,11.2,6,white,0


### Linear SVM for Red Wine

In [9]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
from sklearn import metrics

In [10]:
#Create X and y
X_red = data_red[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values

y_red = data_red['quality2'].values

#Standardize features (np.float was removed in NumPy 1.24; the builtin float is equivalent)
scaler = StandardScaler()
scaler.fit(X_red.astype(float))
X_red = scaler.transform(X_red.astype(float))

X_train_red, X_test_red, y_train_red, y_test_red = train_test_split(X_red, y_red, test_size=0.3, random_state=0)
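
The cell above fits the scaler on all red-wine rows before splitting. A common alternative, sketched here on synthetic data (the dataset from `make_classification` is an assumption, not the wine data), places the scaler in a pipeline so that its mean and variance are learned from the training split only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 11 features, like the wine tables.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The pipeline fits the scaler on X_train only, then applies it to X_test at predict time.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

scaler = model.named_steps["standardscaler"]
acc = model.score(X_test, y_test)
print("Test accuracy: {:.4f}".format(acc))
```

With this layout the test rows never influence the scaling statistics, which keeps the test accuracy a cleaner estimate.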

In [11]:
#SVM Model
clf = SVC(kernel='linear')
clf.fit(X_train_red, y_train_red)

#Accuracy
y_pred_red = clf.predict(X_test_red)
print('Accuracy of SVM classifier on test set: {:.4f}'.format(accuracy_score(y_test_red, y_pred_red)))
print('-----------------------------------------------')

#Confusion matrix
confusion_matrix1 = confusion_matrix(y_test_red, y_pred_red)
print('Confusion matrix')
print(confusion_matrix1)

Accuracy of SVM classifier on test set: 0.8854
-----------------------------------------------
Confusion matrix
[[410  20]
 [ 35  15]]
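
Accuracy alone hides how the model treats the minority class. The short sketch below recomputes accuracy and derives precision, recall and F1 for class 1 from the confusion matrix printed above (rows are true classes, columns are predicted classes, following scikit-learn's convention):

```python
import numpy as np

# Confusion matrix printed above for the red-wine linear SVM.
cm = np.array([[410, 20],
               [35, 15]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # share of predicted "good" wines that are truly good
recall = tp / (tp + fn)      # share of truly good wines that were found
f1 = 2 * precision * recall / (precision + recall)

print("accuracy={:.4f} precision={:.4f} recall={:.4f} f1={:.4f}".format(
    accuracy, precision, recall, f1))
```

Only 15 of the 50 high-quality red wines in the test set are identified, even though accuracy is about 0.89.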


### Linear SVM for White Wine

In [12]:
#Create X and y
X_white = data_white[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values
y_white = data_white['quality2'].values

#Standardize features of the white wine dataframe (np.float was removed in NumPy 1.24)
scaler_ = StandardScaler()
scaler_.fit(X_white.astype(float))
X_white = scaler_.transform(X_white.astype(float))

X_train_white, X_test_white, y_train_white, y_test_white = train_test_split(X_white, y_white, test_size=0.3, random_state=0)

In [13]:
#SVM Model
clf_ = SVC(kernel='linear')
clf_.fit(X_train_white, y_train_white)

#Accuracy
y_pred_white = clf_.predict(X_test_white)
print('Accuracy of SVM classifier on test set: {:.4f}'.format(accuracy_score(y_test_white, y_pred_white)))
print('-----------------------------------------------')

#Confusion matrix
confusion_matrix_ = confusion_matrix(y_test_white, y_pred_white)
print('Confusion matrix')
print(confusion_matrix_)

Accuracy of SVM classifier on test set: 0.7871
-----------------------------------------------
Confusion matrix
[[1157    0]
 [ 313    0]]
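
The matrix above shows that the white-wine model predicts class 0 for every test wine, so its accuracy (1157 / 1470 = 0.7871) is just the share of the majority class. One option to explore, which is an assumption here and not part of the original exercise, is `class_weight='balanced'`, which penalizes mistakes on the rarer class more heavily. A minimal sketch on synthetic imbalanced data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: roughly 80% class 0 and 20% class 1.
X, y = make_classification(n_samples=600, n_features=11, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

results = {}
for cw in (None, "balanced"):
    clf = SVC(kernel="linear", class_weight=cw)
    clf.fit(X_train, y_train)
    results[cw] = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1])
    print("class_weight =", cw)
    print(results[cw])
```

Comparing the two matrices shows how the weighting shifts predictions toward the minority class; whether that trade-off helps on the wine data would need to be checked.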


# Exercise 6.3

Test the two SVMs using the different kernels (‘poly’, ‘rbf’, ‘sigmoid’)


### SVM for Red Wine

In [14]:
kernels = ['poly','rbf','sigmoid']

svm_red = []

for i in kernels:
    clf_red = SVC(kernel= i, gamma='auto')
    clf_red.fit(X_train_red, y_train_red)
    y_pred_red = clf_red.predict(X_test_red)
    svm_red.append({'Kernel': i, 'Accuracy': accuracy_score(y_test_red, y_pred_red)})

df_svm_red = pd.DataFrame(svm_red)
df_svm_red = df_svm_red.set_index('Kernel')
df_svm_red

Unnamed: 0_level_0,Accuracy
Kernel,Unnamed: 1_level_1
poly,0.9
rbf,0.9125
sigmoid,0.8625


### SVM for White Wine

In [15]:
svm_white = []

for i in kernels:
    clf_white = SVC(kernel=i, gamma='auto')
    clf_white.fit(X_train_white, y_train_white)
    y_pred_white = clf_white.predict(X_test_white)
    svm_white.append({'Kernel': i, 'Accuracy': accuracy_score(y_test_white, y_pred_white)})

df_svm_white = pd.DataFrame(svm_white)
df_svm_white = df_svm_white.set_index('Kernel')
df_svm_white

Unnamed: 0_level_0,Accuracy
Kernel,Unnamed: 1_level_1
poly,0.802041
rbf,0.817687
sigmoid,0.74898


# Exercise 6.4
Using the best SVM kernel, find the parameters that give the best performance

'C': [0.1, 1, 10, 100, 1000], 'gamma': [0.01, 0.001, 0.0001]

### SVM for Red Wine

##### Best SVM: when using `rbf` kernel

In [16]:
C = [0.1, 1, 10, 100, 1000]
gamma = [0.01, 0.001, 0.0001]
rvf_red = []

for i in C:
    for j in gamma:
        clf_red = SVC(C=i, kernel='rbf', gamma=j)
        clf_red.fit(X_train_red, y_train_red)
        y_pred_red = clf_red.predict(X_test_red)
        rvf_red.append({'C': i, 'gamma': j,'Accuracy': accuracy_score(y_test_red, y_pred_red)})

df_rvf_red = pd.DataFrame(rvf_red)
df_rvf_red = df_rvf_red.sort_values(['Accuracy'], ascending=False)
df_rvf_red

Unnamed: 0,Accuracy,C,gamma
6,0.902083,10.0,0.01
13,0.902083,1000.0,0.001
14,0.902083,1000.0,0.0001
9,0.9,100.0,0.01
10,0.897917,100.0,0.001
0,0.895833,0.1,0.01
1,0.895833,0.1,0.001
2,0.895833,0.1,0.0001
3,0.895833,1.0,0.01
4,0.895833,1.0,0.001


* Three parameter combinations tie for the best test accuracy for Red Wine:

|     C     |   gamma   | Accuracy |
| :------- | :------- | :------ |
|    10.0   |    0.01   | 0.902083 |
|   1000.0  |   0.001   | 0.902083 |
|   1000.0  |   0.0001  | 0.902083 |


### SVM for White Wine

##### Best SVM: when using `rbf` kernel

In [17]:
rvf_white = []


for i in C:    
    for j in gamma:
        clf_white = SVC(C=i, kernel='rbf', gamma=j)
        clf_white.fit(X_train_white, y_train_white)
        y_pred_white = clf_white.predict(X_test_white)
        #print('-----------------------------------------------')
        #print('Using parameters C: ' + str(i) + " and gamma: " + str(j))
        #print('Accuracy: {:.4f}'.format(accuracy_score(y_test_white, y_pred_white)))
        rvf_white.append({'C': i, 'gamma': j,'Accuracy': accuracy_score(y_test_white, y_pred_white)})

df_rvf_white = pd.DataFrame(rvf_white)
df_rvf_white = df_rvf_white.sort_values(['Accuracy'], ascending=False)
df_rvf_white

Unnamed: 0,Accuracy,C,gamma
12,0.821088,1000.0,0.01
9,0.818367,100.0,0.01
6,0.813605,10.0,0.01
13,0.810204,1000.0,0.001
3,0.797279,1.0,0.01
10,0.797279,100.0,0.001
0,0.787075,0.1,0.01
1,0.787075,0.1,0.001
2,0.787075,0.1,0.0001
4,0.787075,1.0,0.001


* The parameters that give the best performance for White Wine are:

|     C     |   gamma   | Accuracy |
| :------- | :------- | :------ |
|    1000.0   |    0.01   | 0.821088 |
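
The loops above pick `C` and `gamma` by their accuracy on the test split. A hedged alternative is `GridSearchCV`, which selects the parameters by cross-validation on the training split and leaves the test split for a final check. The sketch below uses the same grid on synthetic data (the dataset is an assumption, not the wine data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in with 11 features.
X, y = make_classification(n_samples=200, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Same grid as in the exercise statement.
param_grid = {"C": [0.1, 1, 10, 100, 1000], "gamma": [0.01, 0.001, 0.0001]}

# 5-fold cross-validation on the training split only.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
test_accuracy = search.score(X_test, y_test)
print("Test accuracy of the refit best model: {:.4f}".format(test_accuracy))
```

`search.cv_results_` holds the score of every combination, which can be turned into a `DataFrame` like the tables above.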

# Exercise 6.5

Compare the results with other methods

### Logit for Red Wine

In [18]:
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train_red, y_train_red)
y_pred_log_red = logreg.predict(X_test_red)
print('Accuracy of logistic regression classifier on test set: {:.4f}'.format(accuracy_score(y_test_red, y_pred_log_red)))

Accuracy of logistic regression classifier on test set: 0.8979


* For Red Wine, the tuned SVM is slightly more accurate than the Logit model:

|     Method  | Accuracy |
| :------- | :------- |
|    SVM   |    0.9021   |
|    Logit   |    0.8979   |

### Logit for White Wine

In [19]:
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train_white, y_train_white)
y_pred_log_white = logreg.predict(X_test_white)
print('Accuracy of logistic regression classifier on test set: {:.4f}'.format(accuracy_score(y_test_white, y_pred_log_white)))

Accuracy of logistic regression classifier on test set: 0.7932


* For White Wine, the tuned SVM is more accurate than the Logit model:

|     Method  | Accuracy |
| :------- | :------- |
|    SVM   |    0.8211   |
|    Logit   |    0.7932   |

# Regularization

# Exercise 6.6


* Train a linear regression to predict wine quality (continuous)

* Analyze the coefficients

* Evaluate the RMSE

### Linear Regression for Red Wine

In [20]:
# examine the response variable
data_red['quality'].describe()

count    1599.000000
mean        5.636023
std         0.807569
min         3.000000
25%         5.000000
50%         6.000000
75%         6.000000
max         8.000000
Name: quality, dtype: float64

In [21]:
#Create X and y
X_red_lin = data_red[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values

y_red_lin = data_red['quality'].values
X_train_red_lin, X_test_red_lin, y_train_red_lin, y_test_red_lin = train_test_split(X_red_lin, y_red_lin, test_size=0.3, random_state=0)

# build a linear regression model
from sklearn.linear_model import LinearRegression
linreg_red = LinearRegression()
linreg_red.fit(X_train_red_lin, y_train_red_lin)

# analyze the coefficients
print(linreg_red.coef_)

[ 2.02362546e-02 -1.21385635e+00 -9.84560496e-02  2.21024824e-02
 -1.89761853e+00  1.99433159e-03 -3.00386885e-03 -1.67249074e+01
 -3.97705407e-01  8.54179474e-01  2.67506351e-01]


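*  Features with positive coefficients ($\beta_1, \beta_4, \beta_6, \beta_{10}, \beta_{11}$): for every 1-unit increase in the feature, holding the other features fixed, the predicted quality increases by the coefficient value.

*  Features with negative coefficients ($\beta_2, \beta_3, \beta_5, \beta_7, \beta_8, \beta_9$): for every 1-unit increase in the feature, holding the other features fixed, the predicted quality decreases by the absolute value of the coefficient.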
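
The "one unit more of a feature changes the prediction by its coefficient" reading can be checked numerically. The sketch below fits a linear regression on synthetic data (the three features and their true weights are made up for illustration) and compares the change in prediction with the fitted coefficient:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on three features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# Two inputs that differ only in feature 1, by exactly one unit.
x0 = np.array([[1.0, 1.0, 1.0]])
x1 = x0.copy()
x1[0, 1] += 1.0

delta = model.predict(x1)[0] - model.predict(x0)[0]
print("Change in prediction:", delta)
print("Coefficient of feature 1:", model.coef_[1])
```

Because the wine features are on very different scales (density varies only in the third decimal place), coefficient magnitudes on the raw features are not directly comparable across features.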

In [22]:
# make predictions
y_pred_red_lin = linreg_red.predict(X_test_red_lin)

# calculate RMSE
print(np.sqrt(metrics.mean_squared_error(y_test_red_lin, y_pred_red_lin)))

0.6330721652193952
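
RMSE is the square root of the mean squared difference between observed and predicted values, so it is expressed in the same units as the target (quality points here). A minimal NumPy sketch of the same computation, with a hypothetical helper named `rmse`:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse([5, 6, 7], [5, 6, 7]))  # perfect predictions
print(rmse([5, 6], [6, 5]))        # every prediction is off by one quality point
```

Read this way, an RMSE of about 0.63 means the red-wine predictions are typically off by a bit more than half a quality point.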


### Linear Regression for White Wine

In [23]:
# examine the response variable
data_white['quality'].describe()

count    4898.000000
mean        5.877909
std         0.885639
min         3.000000
25%         5.000000
50%         6.000000
75%         6.000000
max         9.000000
Name: quality, dtype: float64

In [24]:
#Create X and y
X_white_lin = data_white[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values

y_white_lin = data_white['quality'].values
X_train_white_lin, X_test_white_lin, y_train_white_lin, y_test_white_lin = train_test_split(X_white_lin, y_white_lin, test_size=0.3, random_state=0)

# build a linear regression model
linreg_white = LinearRegression()
linreg_white.fit(X_train_white_lin, y_train_white_lin)

# analyze the coefficients
print(linreg_white.coef_)

[ 1.15989956e-01 -1.80466188e+00 -1.50084077e-02  9.67717990e-02
 -4.22336380e-01  5.11799045e-03 -2.40509546e-04 -2.01070812e+02
  8.69262672e-01  6.58085048e-01  1.28438281e-01]


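*  Features with positive coefficients ($\beta_1, \beta_4, \beta_6, \beta_9, \beta_{10}, \beta_{11}$): for every 1-unit increase in the feature, holding the other features fixed, the predicted quality increases by the coefficient value.

*  Features with negative coefficients ($\beta_2, \beta_3, \beta_5, \beta_7, \beta_8$): for every 1-unit increase in the feature, holding the other features fixed, the predicted quality decreases by the absolute value of the coefficient.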

In [25]:
# make predictions
y_pred_white_lin = linreg_white.predict(X_test_white_lin)

# calculate RMSE
print(np.sqrt(metrics.mean_squared_error(y_test_white_lin, y_pred_white_lin)))

0.7797679548192903


# Exercise 6.7

* Estimate a ridge regression with alpha equals 0.1 and 1.
* Compare the coefficients with the linear regression
* Evaluate the RMSE

### Ridge Regression for Red Wine

In [26]:
from sklearn.linear_model import Ridge

alpha_r = [0.1, 1]
l=[]

for i in alpha_r:
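    # `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, so this call needs an older release
    ridgereg_red = Ridge(alpha=i, normalize=True)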
    ridgereg_red.fit(X_train_red_lin, y_train_red_lin)
    print('-----------------------------')
    print('Coefficients for alpha {}: \n \n {}'.format(i, ridgereg_red.coef_))
    y_pred_red_ridge = ridgereg_red.predict(X_test_red_lin)  
    print('\n RMSE: {}'.format(np.sqrt(metrics.mean_squared_error(y_test_red_lin, y_pred_red_ridge)))) 

-----------------------------
Coefficients for alpha 0.1: 
 
 [ 2.68276389e-02 -1.08428984e+00  5.65341014e-02  2.36543153e-02
 -1.76994118e+00  1.12334507e-03 -2.71388369e-03 -3.00009155e+01
 -2.54181268e-01  7.95554880e-01  2.33458671e-01]

 RMSE: 0.6344194631893589
-----------------------------
Coefficients for alpha 1: 
 
 [ 1.83712556e-02 -6.75208907e-01  2.49195784e-01  1.00770556e-02
 -1.01625393e+00 -8.36569442e-04 -1.63673406e-03 -2.68060154e+01
 -1.00168476e-01  4.58489960e-01  1.41011129e-01]

 RMSE: 0.6569663486954752


- Compared to Linear regression, Ridge regression **shrinks coefficients toward zero, but they rarely reach zero.**
- Best RMSE found when using alpha: 0.1 (0.6344, versus 0.6570 for alpha 1)
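
The `normalize=True` argument used above is not accepted by recent scikit-learn releases. A hedged way to get a similar effect is to scale the features with `StandardScaler` inside a pipeline before `Ridge`. This is not numerically equivalent to `normalize=True`, so the `alpha` values are not directly comparable with the ones above. The sketch uses synthetic data and shows the coefficient norm shrinking as `alpha` grows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with 11 features.
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=0)

norms = {}
for alpha in (0.1, 1, 100):
    # Scale the features, then fit ridge regression on the scaled values.
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X, y)
    coef = model.named_steps["ridge"].coef_
    norms[alpha] = float(np.linalg.norm(coef))
    print("alpha={:<5} ||coef||={:.3f}".format(alpha, norms[alpha]))
```

The overall size of the coefficient vector decreases with `alpha`, although individual coefficients can move in either direction, as citric acid does in the red-wine output.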

### Ridge Regression for White Wine

In [27]:
for i in alpha_r:
    ridgereg_white = Ridge(alpha=i, normalize=True)
    ridgereg_white.fit(X_train_white_lin, y_train_white_lin)
    print('-----------------------------')
    print('Coefficients for alpha {}: \n \n {}'.format(i, ridgereg_white.coef_))
    y_pred_white_ridge = ridgereg_white.predict(X_test_white_lin)  
    print('\n RMSE: {}'.format(np.sqrt(metrics.mean_squared_error(y_test_white_lin, y_pred_white_ridge))))

-----------------------------
Coefficients for alpha 0.1: 
 
 [-8.48885792e-03 -1.63467861e+00  9.50634401e-03  3.42636009e-02
 -1.75610555e+00  5.50671129e-03 -8.39582620e-04 -5.13078081e+01
  3.09478498e-01  4.17468570e-01  2.52495787e-01]

 RMSE: 0.7797233024278934
-----------------------------
Coefficients for alpha 1: 
 
 [-2.72539154e-02 -8.57988875e-01  5.04142556e-02  6.03593151e-03
 -2.45762035e+00  2.73257813e-03 -7.31960665e-04 -2.58883533e+01
  1.61366105e-01  2.21480222e-01  1.28845959e-01]

 RMSE: 0.812034744833556


- Compared to Linear regression, Ridge regression **shrinks coefficients toward zero, but they rarely reach zero.**
- Best RMSE found when using alpha: 0.1 (0.7797, versus 0.8120 for alpha 1)

# Exercise 6.8

* Estimate a lasso regression with alpha equals 0.01, 0.1 and 1.
* Compare the coefficients with the linear regression
* Evaluate the RMSE

### Lasso Regression for Red Wine

In [28]:
from sklearn.linear_model import Lasso

alpha_l = [0.01, 0.1, 1]

for i in alpha_l:
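    # `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, so this call needs an older release
    lassoreg_red = Lasso(alpha=i, normalize=True)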
    lassoreg_red.fit(X_train_red_lin, y_train_red_lin)
    print('-----------------------------')
    print('Coefficients for alpha {}: \n \n {}'.format(i, lassoreg_red.coef_))
    y_pred_red_lasso = lassoreg_red.predict(X_test_red_lin)
    print('\n RMSE: {}'.format(np.sqrt(metrics.mean_squared_error(y_test_red_lin, y_pred_red_lasso))))

-----------------------------
Coefficients for alpha 0.01: 
 
 [ 0.         -0.03242701  0.          0.         -0.         -0.
 -0.         -0.         -0.          0.          0.0421808 ]

 RMSE: 0.7463573531910375
-----------------------------
Coefficients for alpha 0.1: 
 
 [ 0. -0.  0.  0. -0. -0. -0. -0. -0.  0.  0.]

 RMSE: 0.7698374031730867
-----------------------------
Coefficients for alpha 1: 
 
 [ 0. -0.  0.  0. -0. -0. -0. -0. -0.  0.  0.]

 RMSE: 0.7698374031730867


- Compared to linear regression, Lasso regression **shrinks coefficients all the way to zero, thus removing them from the model**
- Best RMSE found when using alpha: 0.01 (0.7464); with alpha 0.1 and 1 every coefficient is zero and the RMSE rises to 0.7698

### Lasso Regression for White Wine

In [29]:
for i in alpha_l:
    lassoreg_white = Lasso(alpha=i, normalize=True)
    lassoreg_white.fit(X_train_white_lin, y_train_white_lin)
    print('-----------------------------')
    print('Coefficients for alpha {}: \n \n {}'.format(i, lassoreg_white.coef_))
    y_pred_white_lasso = lassoreg_white.predict(X_test_white_lin)
    print('\n RMSE: {}'.format(np.sqrt(metrics.mean_squared_error(y_test_white_lin, y_pred_white_lasso))))

-----------------------------
Coefficients for alpha 0.01: 
 
 [-0. -0. -0. -0. -0.  0. -0. -0.  0.  0.  0.]

 RMSE: 0.9016157829038678
-----------------------------
Coefficients for alpha 0.1: 
 
 [-0. -0. -0. -0. -0.  0. -0. -0.  0.  0.  0.]

 RMSE: 0.9016157829038678
-----------------------------
Coefficients for alpha 1: 
 
 [-0. -0. -0. -0. -0.  0. -0. -0.  0.  0.  0.]

 RMSE: 0.9016157829038678


- Compared to linear regression, Lasso regression **shrinks coefficients all the way to zero, thus removing them from the model**
- Same RMSE (0.9016) when using alpha: 0.01, 0.1, 1, because every coefficient is already zero at alpha 0.01
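
When Lasso drives every coefficient to zero, only the intercept remains, so the model predicts the training mean for every wine and the RMSE no longer changes with alpha. The sketch below reproduces this on synthetic data with a deliberately huge penalty (`alpha=1e6` is an illustrative choice, not a value from the exercise):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic regression problem with 11 features.
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A penalty this large removes every feature from the model.
lasso = Lasso(alpha=1e6)
lasso.fit(X_train, y_train)
pred = lasso.predict(X_test)

print("Non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
print("Intercept:", lasso.intercept_, "Training mean:", y_train.mean())
```

Smaller alpha values than those in the exercise would be needed to keep some features in the white-wine model.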

# Exercise 6.9

* Create a binary target

* Train a logistic regression to predict wine quality (binary)

* Analyze the coefficients

* Evaluate the F1 score

### Logistic Regression for Red Wine

In [30]:
#Create X and y
X_red_log = data_red[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values

y_red_log = data_red['quality2'].values
X_train_red_log, X_test_red_log, y_train_red_log, y_test_red_log = train_test_split(X_red_log, y_red_log, test_size=0.3, random_state=0)

#train a logistic regression
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train_red_log, y_train_red_log)
print('Coefficients: \n \n {}'.format(logreg.coef_))
y_pred_log_red = logreg.predict(X_test_red_log)
print('\n F1 Score: {:.4f}'.format(f1_score(y_test_red_log, y_pred_log_red)))

Coefficients: 
 
 [[-0.07774469 -3.32368312  0.37914938  0.09061365 -1.28169342  0.00812384
  -0.01440026 -1.35202529 -2.16117238  1.79127935  0.86386463]]

 F1 Score: 0.4304
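
One way to analyze logistic regression coefficients is to exponentiate them: `exp(coef)` is the multiplicative change in the odds of the positive class for a 1-unit increase in that feature, holding the others fixed. The sketch below applies this to the red-wine coefficients printed above (copied by hand; the features in this cell are not standardized, so a "unit" differs per feature):

```python
import numpy as np

features = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
            'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
            'pH', 'sulphates', 'alcohol']

# Coefficients copied from the red-wine logistic regression output above.
coef = np.array([-0.07774469, -3.32368312, 0.37914938, 0.09061365, -1.28169342,
                 0.00812384, -0.01440026, -1.35202529, -2.16117238, 1.79127935,
                 0.86386463])

# Odds ratio per feature: above 1 raises the odds of quality > 6, below 1 lowers them.
odds_ratios = np.exp(coef)
for name, ratio in sorted(zip(features, odds_ratios), key=lambda pair: pair[1]):
    print("{:>22}: {:.3f}".format(name, ratio))
```

Alcohol and sulphates have odds ratios above 1, while volatile acidity has the smallest ratio, which matches the signs of the coefficients.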


### Logistic Regression for White Wine

In [31]:
#Create X and y
X_white_log = data_white[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values

y_white_log = data_white['quality2'].values
X_train_white_log, X_test_white_log, y_train_white_log, y_test_white_log = train_test_split(X_white_log, y_white_log, test_size=0.3, random_state=0)

#train a logistic regression
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train_white_log, y_train_white_log)
print('Coefficients: \n \n {}'.format(logreg.coef_))
y_pred_log_white = logreg.predict(X_test_white_log)
print('\n F1 Score: {:.4f}'.format(f1_score(y_test_white_log, y_pred_log_white)))


Coefficients: 
 
 [[-1.65057680e-01 -3.40423726e+00 -6.12138324e-01  3.87768967e-02
  -1.20499079e+00  1.32155547e-02 -3.67968096e-03 -3.78662284e+00
  -3.40534311e-01  9.21998559e-01  8.32448223e-01]]

 F1 Score: 0.3052


# Exercise 6.10

* Estimate a regularized logistic regression using:
    * C = 0.01, 0.1 & 1.0
    * penalty = ['l1', 'l2']
* Compare the coefficients and the F1 score

### Regularized Logistic Regression for Red Wine

In [32]:
import warnings
warnings.filterwarnings('ignore')

C = [0.01, 0.1, 1.0]
penalty = ['l1','l2']

for i in C:
    for j in penalty:
        logreg = LogisticRegression(C=i, penalty=j,solver='liblinear',multi_class='auto')
        logreg.fit(X_train_red, y_train_red)
        print('----------------------')
        print('With C: {} and penalty: {}'.format(i, j))
        print('Coefficients: \n \n {}'.format(logreg.coef_))
        y_pred_red = logreg.predict(X_test_red)
        print('\n F1 Score: {:.4f}'.format(f1_score(y_test_red, y_pred_red)))

----------------------
With C: 0.01 and penalty: l1
Coefficients: 
 
 [[ 0.         -0.07507561  0.          0.          0.          0.
   0.          0.          0.          0.          0.30100863]]

 F1 Score: 0.0000
----------------------
With C: 0.01 and penalty: l2
Coefficients: 
 
 [[ 0.09817991 -0.25099     0.13390968  0.08485128 -0.11688921 -0.05023255
  -0.11901624 -0.16357189  0.00654934  0.2020224   0.38594502]]

 F1 Score: 0.3951
----------------------
With C: 0.1 and penalty: l1
Coefficients: 
 
 [[ 0.09899496 -0.67399756  0.          0.05314406 -0.13158183  0.
  -0.20342301  0.          0.          0.35136791  0.88032864]]

 F1 Score: 0.4096
----------------------
With C: 0.1 and penalty: l2
Coefficients: 
 
 [[ 0.33010525 -0.51765219  0.13033159  0.22794832 -0.24651697 -0.019544
  -0.27352367 -0.3594511   0.10830125  0.44025918  0.64887849]]

 F1 Score: 0.4286
----------------------
With C: 1.0 and penalty: l1
Coefficients: 
 
 [[ 0.54494397 -0.70662604  0.04554044  0.32

* When l1 is used as the penalty, some coefficients are shrunk all the way to zero, which removes those features from the model.
* When l2 is used, coefficients shrink but rarely reach zero.
* Best F1 score is reached at C: 1.0 for both types of penalty: l1 and l2

### Regularized Logistic Regression for White Wine

In [33]:
for i in C:
    for j in penalty:
        logreg = LogisticRegression(C=i, penalty=j,solver='liblinear',multi_class='auto')
        logreg.fit(X_train_white, y_train_white)
        print('----------------------')
        print('With C: {} and penalty: {}'.format(i, j))
        print('Coefficients: \n \n {}'.format(logreg.coef_))
        y_pred_white = logreg.predict(X_test_white)
        print('\n F1 Score: {:.4f}'.format(f1_score(y_test_white, y_pred_white)))

----------------------
With C: 0.01 and penalty: l1
Coefficients: 
 
 [[ 0.         -0.09848937  0.          0.          0.          0.
   0.          0.          0.          0.          0.7345353 ]]

 F1 Score: 0.2000
----------------------
With C: 0.01 and penalty: l2
Coefficients: 
 
 [[ 0.03167651 -0.22754801 -0.02581894  0.24660605 -0.19922089  0.15687366
  -0.07710349 -0.27086948  0.1423239   0.10611144  0.64025844]]

 F1 Score: 0.3019
----------------------
With C: 0.1 and penalty: l1
Coefficients: 
 
 [[ 0.04183204 -0.37978547 -0.04107979  0.38271498 -0.26697115  0.1633712
  -0.01898612 -0.25568514  0.17809811  0.12445252  0.96222614]]

 F1 Score: 0.3251
----------------------
With C: 0.1 and penalty: l2
Coefficients: 
 
 [[ 0.20299939 -0.38596816 -0.07504914  0.71718954 -0.28074904  0.19782331
  -0.05390251 -0.75416188  0.29712256  0.16813933  0.72992002]]

 F1 Score: 0.3229
----------------------
With C: 1.0 and penalty: l1
Coefficients: 
 
 [[ 0.41223831 -0.41503525 -0.07873

* When l1 is used as the penalty, some coefficients are shrunk all the way to zero, which removes those features from the model.
* When l2 is used, coefficients shrink but rarely reach zero.
* Best F1 score is reached at C: 1.0 and penalty: l1
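
The difference between the two penalties can be summarized by counting exact zeros in the fitted coefficients. The sketch below does this on synthetic data with strong regularization (`C=0.01`); the dataset and the `zeros` dictionary are illustrative assumptions, not part of the exercise:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 11 features, only a few of them informative.
X, y = make_classification(n_samples=300, n_features=11, n_informative=3, random_state=0)
X = StandardScaler().fit_transform(X)

zeros = {}
for penalty in ("l1", "l2"):
    clf = LogisticRegression(C=0.01, penalty=penalty, solver="liblinear")
    clf.fit(X, y)
    # Number of coefficients that are exactly zero under this penalty.
    zeros[penalty] = int(np.sum(clf.coef_ == 0))
    print("penalty={} zero coefficients={}".format(penalty, zeros[penalty]))
```

The same count applied to the wine models would make the l1 versus l2 comparison above easier to read than scanning the printed arrays.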