# Solvers ⚙️

In this exercise, you will investigate the effects of different `solvers` on `LogisticRegression` models.

👇 Run the code below

In [73]:
import pandas as pd

df = pd.read_csv("data.csv")

df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol,quality rating
0,9.47,5.97,7.36,10.17,6.84,9.15,9.78,9.52,10.34,8.8,6
1,10.05,8.84,9.76,8.38,10.15,6.91,9.7,9.01,9.23,8.8,7
2,10.59,10.71,10.84,10.97,9.03,10.42,11.46,11.25,11.34,9.06,4
3,11.0,8.44,8.32,9.65,7.87,10.92,6.97,11.07,10.66,8.89,8
4,12.12,13.44,10.35,9.95,11.09,9.38,10.22,9.04,7.68,11.38,3


- The dataset consists of different wines 🍷
- The features describe different properties of the wines 
- The target 🎯 is a quality rating given by an expert

## 1. Target engineering

In this section, you are going to transform the ratings into a binary target.

👇 How many observations are there for each rating?

In [74]:
df['quality rating'].value_counts()

10    10143
5     10124
1     10090
2     10030
8      9977
6      9961
9      9955
7      9954
4      9928
3      9838
Name: quality rating, dtype: int64

👇 Transform the target into a binary classification task where quality ratings below 6 are bad [0], and ratings of 6 and above are good [1]. Create your target `y` pandas Series

In [75]:
df['quality rating'] = pd.cut(df['quality rating'], bins = [0, 5, 10], labels = ['bad', 'good'])
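
How `pd.cut` splits the ratings is easier to see on a handful of values. The sketch below uses a hypothetical toy Series (not taken from `data.csv`) and also shows one possible way to build the numeric 0/1 target `y` directly. It is an illustration under those assumptions, not the reference solution.

```python
import pandas as pd

# Hypothetical toy ratings on both sides of the threshold
ratings = pd.Series([1, 5, 6, 10])

# Bins are right-inclusive by default: (0, 5] -> 'bad', (5, 10] -> 'good'
labels = pd.cut(ratings, bins=[0, 5, 10], labels=['bad', 'good'])

# The same split written directly as a 0/1 target
y = (ratings >= 6).astype(int)

print(labels.tolist())
print(y.tolist())
```

Because the bins are right-inclusive, a rating of exactly 5 lands in `bad` and a rating of 6 lands in `good`, which matches the rule stated above.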

In [76]:
df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol,quality rating
0,9.47,5.97,7.36,10.17,6.84,9.15,9.78,9.52,10.34,8.8,good
1,10.05,8.84,9.76,8.38,10.15,6.91,9.7,9.01,9.23,8.8,good
2,10.59,10.71,10.84,10.97,9.03,10.42,11.46,11.25,11.34,9.06,bad
3,11.0,8.44,8.32,9.65,7.87,10.92,6.97,11.07,10.66,8.89,good
4,12.12,13.44,10.35,9.95,11.09,9.38,10.22,9.04,7.68,11.38,bad


👇 Check the class balance of the new binary target

In [77]:
df['quality rating'].value_counts()

bad     50010
good    49990
Name: quality rating, dtype: int64

👇 Scale the features

In [78]:
from sklearn.preprocessing import MinMaxScaler

# Instantiate Scaler
scaler = MinMaxScaler()

# Transform features
X_scaled = scaler.fit_transform(df.drop(columns = 'quality rating'))
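
As a rough illustration of what `MinMaxScaler` does, the sketch below compares it with the formula `(x - min) / (max - min)` applied per column. The array `X_toy` and the name `scaler_toy` are made up for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy features: two columns on very different scales
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [4.0, 40.0]])

scaler_toy = MinMaxScaler()
scaled = scaler_toy.fit_transform(X_toy)

# Manual equivalent: (x - min) / (max - min), computed column by column
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
manual = (X_toy - col_min) / (col_max - col_min)

# A new row reuses the fitted min and max, so it can fall outside [0, 1]
new_row = scaler_toy.transform(np.array([[5.0, 5.0]]))

print(scaled)
print(new_row)
```

The last call is the reason new observations should go through `transform` on the already fitted scaler rather than through a fresh `fit_transform`.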

## 2. LogisticRegression solvers

👇 Logistic Regression models can be optimized using different **solvers**. Find out which solver produces:
- The best precision score
- The shortest training time
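
As a reminder of what the precision score measures, here is a minimal sketch on hypothetical labels: precision is the share of predicted positives that are actually positive, TP / (TP + FP).

```python
from sklearn.metrics import precision_score

# Hypothetical true and predicted labels (1 = good, 0 = bad)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

# Three wines are predicted good; two of them really are good
precision = precision_score(y_true, y_pred)
print(precision)
```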

In [79]:
# Encode the target
from sklearn.preprocessing import LabelEncoder

# Instantiate the encoder
le = LabelEncoder()

# Fit the encoder on the target column (classes are sorted, so 'bad' -> 0 and 'good' -> 1)
df['quality rating'] = le.fit_transform(df['quality rating'])

In [80]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

solver_list = ['sag', 'saga', 'newton-cg', 'lbfgs', 'liblinear']
Log_sag = LogisticRegression(solver = 'sag')

# 10-Fold Cross validate model
cv_results = cross_validate(Log_sag, X_scaled, df['quality rating'], cv=10, scoring=['precision','f1'])
cv_results

{'fit_time': array([0.37821984, 0.45716763, 0.58177686, 0.45120931, 0.50738144,
        0.56487823, 0.45089602, 0.54979753, 0.74216652, 0.49674511]),
 'score_time': array([0.00977063, 0.01081896, 0.00942969, 0.01033473, 0.01091027,
        0.00976944, 0.01328778, 0.01306319, 0.00990224, 0.00974274]),
 'test_precision': array([0.87923817, 0.87283719, 0.87823186, 0.86896409, 0.87206646,
        0.87769934, 0.87849687, 0.87030928, 0.87866281, 0.86563518]),
 'test_f1': array([0.85936381, 0.85483871, 0.86003063, 0.85545962, 0.85571632,
        0.86133469, 0.85974053, 0.85714286, 0.86500762, 0.85803653])}

In [88]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Create a DataFrame to store the results, one row per solver
solver_list = ['sag', 'saga', 'newton-cg', 'lbfgs', 'liblinear']
results = pd.DataFrame(solver_list, columns = ['solver'])

for i, solver in enumerate(solver_list):
    Log = LogisticRegression(solver = solver)
    cv_results = cross_validate(Log, X_scaled, df['quality rating'],
                                cv=10, scoring=['precision','f1'])
    results.loc[i,'precision'] = cv_results['test_precision'].mean()
    results.loc[i,'f1'] = cv_results['test_f1'].mean()
    results.loc[i,'fit time'] = cv_results['fit_time'].mean()
    results.loc[i,'score time'] = cv_results['score_time'].mean()
results

Unnamed: 0,solver,precision,f1,fit time,score time
0,sag,0.874232,0.858676,0.51234,0.01044
1,saga,0.874237,0.858699,0.906712,0.010447
2,newton-cg,0.874237,0.858699,0.374113,0.009819
3,lbfgs,0.874234,0.858687,0.436694,0.010446
4,liblinear,0.87434,0.858449,0.252519,0.010792


The `newton-cg`, `lbfgs`, and `liblinear` solvers should train noticeably faster than `sag` and `saga`. However, all solvers should produce similar precision and F1 scores.
 
For more information on solvers, check out [this Stack Overflow thread](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions).

## 3. Stochastic Gradient Descent

Logistic Regression models can also be optimized via Stochastic Gradient Descent.

👇 Evaluate a Logistic Regression model optimized via **Stochastic Gradient Descent**. How do its precision score and training time compare to those of the models trained in section 2?


<details>
<summary>💡 Hint</summary>

- Logistic Regression models optimized by Stochastic Gradient Descent can be trained using [`SGDClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html) by choosing the appropriate loss function

</details>



In [86]:
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_validate
solver_list = ['SGD reg']
results2 = pd.DataFrame(solver_list , columns = ['algo'])
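# 'log_loss' gives a logistic regression; scikit-learn versions before 1.1 name this loss 'log'
Log_SGD = SGDClassifier(loss = 'log_loss')
cv_results = cross_validate(Log_SGD, X_scaled, df['quality rating'],
                            cv=10, scoring=['precision','f1'])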
results2.loc[0,'precision'] = cv_results['test_precision'].mean()
results2.loc[0,'f1'] = cv_results['test_f1'].mean()
results2.loc[0,'fit time'] = cv_results['fit_time'].mean()
results2.loc[0,'score time'] = cv_results['score_time'].mean()

In [89]:
all_results = pd.concat([results,results2])
all_results

Unnamed: 0,solver,precision,f1,fit time,score time,algo
0,sag,0.874232,0.858676,0.51234,0.01044,
1,saga,0.874237,0.858699,0.906712,0.010447,
2,newton-cg,0.874237,0.858699,0.374113,0.009819,
3,lbfgs,0.874234,0.858687,0.436694,0.010446,
4,liblinear,0.87434,0.858449,0.252519,0.010792,
0,,0.88053,0.856757,0.166008,0.01003,SGD reg


The SGD model should train faster for similar performance. This is a direct effect of computing each gradient update on a single data point rather than on the whole dataset.
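
To make the "one data point per update" idea concrete, here is a minimal NumPy sketch of stochastic gradient descent for logistic regression. It is not how `SGDClassifier` is implemented internally; the toy data, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the label depends on the sign of x0 + x1
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)       # weights
b = 0.0               # intercept
learning_rate = 0.1   # assumed value, not tuned

for epoch in range(5):
    # One epoch is one pass over the shuffled data
    for i in rng.permutation(len(X)):
        # Gradient of the log loss for a single observation
        error = sigmoid(X[i] @ w + b) - y[i]
        w -= learning_rate * error * X[i]
        b -= learning_rate * error

accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(accuracy)
```

Each inner-loop step touches a single row, so an update is cheap. A full-batch method would need all 500 rows to compute one step.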

## 4. Predictions

👇 Use the best model to predict the quality of the following wine


<details>
    <summary>💡 Hint </summary>

- Since all solvers produce similar precision scores, you should pick the one that trains fastest
</details>

In [94]:
new_data = pd.read_csv('new_data.csv')

# Scale using the original scaler
new_scaled = scaler.transform(new_data)

# Train the fastest model (SGD) on the full scaled dataset
model = Log_SGD.fit(X_scaled, df['quality rating'])

# Predicted class (0 = bad, 1 = good), then the class probabilities
model.predict(new_scaled)[0]
model.predict_proba(new_scaled)[0]

array([0.95629392, 0.04370608])
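
The two probabilities follow the order of the encoded classes. The sketch below shows one way to turn them back into a label, assuming the `LabelEncoder` mapping used earlier (classes sorted alphabetically, so `bad` is 0 and `good` is 1). The name `le_toy` is introduced here only for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Stand-in encoder fitted on the two class names; classes are sorted alphabetically
le_toy = LabelEncoder()
le_toy.fit(['bad', 'good'])

# Probabilities copied from the output above: [P(class 0), P(class 1)]
proba = np.array([0.95629392, 0.04370608])

# The most probable class index, mapped back to its original label
predicted_class = int(np.argmax(proba))
predicted_label = le_toy.inverse_transform([predicted_class])[0]
print(predicted_class, predicted_label)
```

With these numbers the model assigns roughly a 96% probability to class 0, so the wine is predicted to be bad.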

### ⚠️ Please push your exercise when you are done 🙃

# 🏁 