Training and Testing a Simple Logistic Regression Model on the Wine Dataset

This notebook trains and tests logistic regression models on a 70/30 train/test split. Three penalty settings are compared ("none", "l1" and "l2"), with different values of the regularization parameter C for the L1 and L2 penalties.

In [28]:
# imports 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

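Preprocess the data: encode the target labels and split into train/test sets at a 70/30 ratio. Note that StandardScaler is imported above but is never applied in the cell below, so the features are not standardized before fitting.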

In [35]:
# load the wine data set from the CSV file
df = pd.read_csv("wine_dataset.csv")

# encode the target column 'style' as integer labels
label_encoder = LabelEncoder()
df['style'] = label_encoder.fit_transform(df['style'])

# X is every column except 'style'; y is the target column 'style'
X = df.drop(columns=['style'])
y = df['style']

# split the data into training (70%) and test (30%) sets, stratified on the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, train_size=0.7, random_state=1, stratify=y)
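
The split above leaves the features on their original scales. A minimal sketch of how standardization could be added is shown here, assuming the scaler is fit on the training rows only and then reused on the test rows. The data is synthetic (make_classification) and the names X_demo, Xtr, Xte, Xtr_s and Xte_s are hypothetical stand-ins, not the wine data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for the wine features (assumption: 12 numeric columns)
X_demo, y_demo = make_classification(n_samples=500, n_features=12, random_state=1)
X_demo = X_demo * np.arange(1, 13)  # give each column a different scale

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# fit the scaler on the training rows only, then apply it to both sets
scaler = StandardScaler()
Xtr_s = scaler.fit_transform(Xtr)
Xte_s = scaler.transform(Xte)

print('train column means after scaling:', Xtr_s.mean(axis=0).round(3))
print('train column stds after scaling:', Xtr_s.std(axis=0).round(3))
```

Fitting the scaler on the training set alone keeps test-set statistics out of the model, which is the usual reason for doing it in this order.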

Part A. Fit 7 logistic regression models: one labelled "no penalty", plus L1 and L2 penalties with C = 1, 0.1 and 100.

In [36]:
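# first model is logistic regression with default parameters; note that scikit-learn's
# default is an L2 penalty with C=1.0, so this model is regularized despite its "no penalty" label
# (penalty=None, or penalty='none' in older versions, would turn regularization off)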
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print('accuracy_score for logistic regression with no penalty:', accuracy_score(y_test, y_pred))

# second model is logistic regression with L1 penalty
logreg2 = LogisticRegression(penalty = 'l1', solver = 'liblinear', C=1)
logreg2.fit(X_train, y_train)
y_pred2 = logreg2.predict(X_test)
print('accuracy_score for logistic regression with L1 penalty:', accuracy_score(y_test, y_pred2))

# third model is logistic regression with L1 penalty and C=0.1
logreg3 = LogisticRegression(penalty = 'l1', solver = 'liblinear', C=0.1)
logreg3.fit(X_train, y_train)
y_pred3 = logreg3.predict(X_test)
print('accuracy_score for logistic regression with L1 penalty and C=0.1:', accuracy_score(y_test, y_pred3))

# fourth model is logistic regression with L1 penalty and C = 100
logreg4 = LogisticRegression(penalty = 'l1', solver = 'liblinear', C=100)
logreg4.fit(X_train, y_train)
y_pred4 = logreg4.predict(X_test)
print('accuracy_score for logistic regression with L1 penalty and C=100:', accuracy_score(y_test, y_pred4))

# fifth model is logistic regression with L2 penalty
logreg5 = LogisticRegression(penalty = 'l2', solver = 'liblinear', C=1)
logreg5.fit(X_train, y_train)
y_pred5 = logreg5.predict(X_test)
print('accuracy_score for logistic regression with L2 penalty:', accuracy_score(y_test, y_pred5))

# sixth model is logistic regression with L2 penalty and C=0.1
logreg6 = LogisticRegression(penalty = 'l2', solver = 'liblinear', C=0.1)
logreg6.fit(X_train, y_train)
y_pred6 = logreg6.predict(X_test)
print('accuracy_score for logistic regression with L2 penalty and C=0.1:', accuracy_score(y_test, y_pred6))

# seventh model is logistic regression with L2 penalty and C=100
logreg7 = LogisticRegression(penalty = 'l2', solver = 'liblinear', C=100)
logreg7.fit(X_train, y_train)
y_pred7 = logreg7.predict(X_test)
print('accuracy_score for logistic regression with L2 penalty and C=100:', accuracy_score(y_test, y_pred7))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


accuracy_score for logistic regression with no penalty: 0.9743589743589743
accuracy_score for logistic regression with L1 penalty: 0.9805128205128205
accuracy_score for logistic regression with L1 penalty and C=0.1: 0.9733333333333334
accuracy_score for logistic regression with L1 penalty and C=100: 0.9871794871794872
accuracy_score for logistic regression with L2 penalty: 0.9758974358974359
accuracy_score for logistic regression with L2 penalty and C=0.1: 0.9666666666666667
accuracy_score for logistic regression with L2 penalty and C=100: 0.9861538461538462
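
The six penalized fits above differ only in penalty and C, so they can also be written as one loop. The sketch below is an illustration under assumptions: it uses synthetic data rather than wine_dataset.csv, adds a StandardScaler step through make_pipeline (which the cell above does not do), and the names X_demo, results and model are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for the wine data (assumption, not the real data set)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=12, n_informative=5, flip_y=0.05, random_state=1
)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

results = {}
for penalty in ('l1', 'l2'):
    for C in (0.1, 1, 100):
        # scale, then fit; liblinear supports both the l1 and l2 penalties
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty=penalty, solver='liblinear', C=C),
        )
        model.fit(Xtr, ytr)
        results[(penalty, C)] = model.score(Xte, yte)  # test accuracy

for (penalty, C), acc in results.items():
    print(f'penalty={penalty}, C={C}: accuracy={acc:.4f}')
```

Keeping the scores in a dictionary keyed by (penalty, C) makes it straightforward to pick the best setting per penalty afterwards.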


Part B. Calculate the L2 norm of the trained weights of the first model (labelled "no penalty").

In [43]:
# get the weights of the no penalty model
weights = logreg.coef_[0]

# calculate its L2 norm
l2_norm = np.linalg.norm(weights, 2)

print('L2 norm of the weights for logistic regression with no penalty:', l2_norm)

[0.6853496  8.24731058 1.69889098 0.1617846  1.30624021 0.04234366
 0.06281434 1.24339873 1.45310614 6.55438231 0.81554726 0.13315118]
L2 norm of the weights for logistic regression with no penalty: 10.973266367584284
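
As a quick check of the number above, the norm can be recomputed directly from the twelve weights printed in the output. This is a small sketch that copies those rounded values by hand (the name printed_weights is hypothetical) and also shows the L1 norm for comparison, since the two are easy to confuse.

```python
import numpy as np

# weights copied from the printed output above (rounded to 8 decimals)
printed_weights = np.array([
    0.6853496, 8.24731058, 1.69889098, 0.1617846, 1.30624021, 0.04234366,
    0.06281434, 1.24339873, 1.45310614, 6.55438231, 0.81554726, 0.13315118,
])

# ord=2 gives the Euclidean (L2) norm: sqrt of the sum of squared weights
l2_check = np.linalg.norm(printed_weights, 2)

# ord=1 gives the L1 norm: sum of absolute weights
l1_check = np.linalg.norm(printed_weights, 1)

print('L2 norm:', l2_check)
print('L1 norm:', l1_check)
```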


Part C. Choose the L1-penalty model with the highest test accuracy (C = 100) and report the L2 norm of its weights.

In [38]:
# weights of the best L1-penalty model (C = 100)
weights2 = logreg4.coef_[0]

# L2 norm of those weights (ord=2)
l2_norm1 = np.linalg.norm(weights2, 2)

print('L2 norm of the weights for L1-penalty logistic regression with C = 100:', l2_norm1)


L2 norm of the weights for L1-penalty logistic regression with C = 100: 38.35983887039696


Part D. Choose the L2-penalty model with the highest test accuracy (C = 100) and report the L2 norm of its weights.

In [39]:
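# weights of the best L2-penalty model (C = 100)
weights3 = logreg7.coef_[0]

# L2 norm of those weights (ord=2)
l2_norm2 = np.linalg.norm(weights3, 2)

print('L2 norm of the weights for L2-penalty logistic regression with C = 100:', l2_norm2)

L2 norm of the weights for L2-penalty logistic regression with C = 100: 30.666043579477833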
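
Both C = 100 models have a larger weight norm than the first model. That is consistent with how C works in scikit-learn: C is the inverse of the regularization strength, so a larger C penalizes the weights less. The sketch below illustrates this on synthetic, standardized data (an assumption, not the wine data) with the default L2 penalty; the names X_demo and norms are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# synthetic, noisy stand-in data so the classes are not perfectly separable
X_demo, y_demo = make_classification(
    n_samples=500, n_features=12, n_informative=5, flip_y=0.1, random_state=1
)
X_demo = StandardScaler().fit_transform(X_demo)

# the default C is 1.0, which already applies regularization
default_C = LogisticRegression().get_params()['C']
print('default C:', default_C)

norms = {}
for C in (0.1, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000)  # default penalty is L2
    clf.fit(X_demo, y_demo)
    norms[C] = np.linalg.norm(clf.coef_[0], 2)
    print(f'C={C}: L2 norm of weights = {norms[C]:.4f}')
```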


Part E. Count the number of zero weights in the three models above.

In [53]:
# absolute weights of the three models reported above
weights_log1 = np.abs(logreg.coef_[0])
weights_log4 = np.abs(logreg4.coef_[0])
weights_log7 = np.abs(logreg7.coef_[0])

# treat a weight as zero when its magnitude is at most this tolerance
# (one tolerance for all three models so the counts are comparable)
tol = 1e-5

count1 = int(np.sum(weights_log1 <= tol))
count2 = int(np.sum(weights_log4 <= tol))
count3 = int(np.sum(weights_log7 <= tol))

print('no penalty', count1)
print('l1 penalty', count2)
print('l2 penalty', count3)

no penalty 0
l1 penalty 0
l2 penalty 0
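
None of the three models has a zero weight, which is plausible because C = 100 is a very weak penalty. An L1 penalty tends to zero out weights only when C is small. The sketch below illustrates that on synthetic, standardized data (an assumption, not the wine data); the names X_demo and zero_counts are hypothetical, and the C values are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# synthetic stand-in data: 4 informative features, the rest are noise
X_demo, y_demo = make_classification(
    n_samples=500, n_features=12, n_informative=4, n_redundant=0,
    flip_y=0.05, random_state=1
)
X_demo = StandardScaler().fit_transform(X_demo)

zero_counts = {}
for C in (0.001, 0.1, 100.0):
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=C)
    clf.fit(X_demo, y_demo)
    # count weights whose magnitude is within 1e-5 of zero
    zero_counts[C] = int(np.sum(np.isclose(clf.coef_[0], 0.0, atol=1e-5)))
    print(f'C={C}: zero weights = {zero_counts[C]} of {clf.coef_.shape[1]}')
```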
