# Solvers ⚙️

In this exercise, you will investigate the effects of different `solvers` on `LogisticRegression` models.

👇 Run the code below to import the dataset

In [1]:
import pandas as pd

df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers_dataset.csv")
df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol | quality rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.47 | 5.97 | 7.36 | 10.17 | 6.84 | 9.15 | 9.78 | 9.52 | 10.34 | 8.8 | 6 |
| 1 | 10.05 | 8.84 | 9.76 | 8.38 | 10.15 | 6.91 | 9.7 | 9.01 | 9.23 | 8.8 | 7 |
| 2 | 10.59 | 10.71 | 10.84 | 10.97 | 9.03 | 10.42 | 11.46 | 11.25 | 11.34 | 9.06 | 4 |
| 3 | 11.0 | 8.44 | 8.32 | 9.65 | 7.87 | 10.92 | 6.97 | 11.07 | 10.66 | 8.89 | 8 |
| 4 | 12.12 | 13.44 | 10.35 | 9.95 | 11.09 | 9.38 | 10.22 | 9.04 | 7.68 | 11.38 | 3 |


- The dataset consists of different wines 🍷
- The features describe different properties of the wines 
- The target 🎯 is a quality rating given by an expert

## 1. Target engineering

In this section, you are going to transform the ratings into a binary target.

👇 How many observations are there for each rating?

In [2]:
# Count the number of observations for each rating
rating_counts = df['quality rating'].value_counts()

rating_counts

10    10143
5     10124
1     10090
2     10030
8      9977
6      9961
9      9955
7      9954
4      9928
3      9838
Name: quality rating, dtype: int64

❓ Create `y` by transforming the target into a binary classification task: quality ratings below 6 are bad [0], and ratings of 6 and above are good [1].

In [3]:
# Create a binary classification target based on quality ratings
# Ratings below 6 are considered 'bad' (0), and 6 and above are 'good' (1)

df['y'] = df['quality rating'].apply(lambda x: 0 if x < 6 else 1)

# Display the first few rows to confirm the transformation
df[['quality rating', 'y']].head()


| | quality rating | y |
|---|---|---|
| 0 | 6 | 1 |
| 1 | 7 | 1 |
| 2 | 4 | 0 |
| 3 | 8 | 1 |
| 4 | 3 | 0 |
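
The `apply` + `lambda` approach works, but a vectorized comparison is the more idiomatic pandas way to build the same 0/1 target. Here is a minimal sketch on a small hypothetical DataFrame. The ratings are made up for illustration.

```python
import pandas as pd

# Hypothetical ratings, for illustration only
ratings = pd.DataFrame({"quality rating": [6, 7, 4, 8, 3, 5, 10, 1]})

# Vectorized: compare the whole column at once, then cast True/False to 1/0
ratings["y"] = (ratings["quality rating"] >= 6).astype(int)

# Row-by-row version used in the notebook, for comparison
ratings["y_apply"] = ratings["quality rating"].apply(lambda x: 0 if x < 6 else 1)

print(ratings)
```

Both columns hold the same labels. The vectorized form avoids calling a Python function once per row, which matters on 100,000 rows.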


❓ Check the class balance of the new binary target

In [4]:
# Check the class balance of the new binary target
class_balance = df['y'].value_counts(normalize=True)

class_balance


0    0.5001
1    0.4999
Name: y, dtype: float64

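❓ Create your `X` by normalising the features. This allows a fair comparison of the solvers: `sag` and `saga` in particular are only guaranteed to converge quickly when the features have approximately the same scale.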

In [5]:
df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol | quality rating | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.47 | 5.97 | 7.36 | 10.17 | 6.84 | 9.15 | 9.78 | 9.52 | 10.34 | 8.8 | 6 | 1 |
| 1 | 10.05 | 8.84 | 9.76 | 8.38 | 10.15 | 6.91 | 9.7 | 9.01 | 9.23 | 8.8 | 7 | 1 |
| 2 | 10.59 | 10.71 | 10.84 | 10.97 | 9.03 | 10.42 | 11.46 | 11.25 | 11.34 | 9.06 | 4 | 0 |
| 3 | 11.0 | 8.44 | 8.32 | 9.65 | 7.87 | 10.92 | 6.97 | 11.07 | 10.66 | 8.89 | 8 | 1 |
| 4 | 12.12 | 13.44 | 10.35 | 9.95 | 11.09 | 9.38 | 10.22 | 9.04 | 7.68 | 11.38 | 3 | 0 |


In [6]:
from sklearn.preprocessing import StandardScaler

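# Feature columns: everything except the original rating and the binary target y
# (selecting by name rather than position avoids silently depending on column order)
features = df.columns.drop(['quality rating', 'y'])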

# Initialize a StandardScaler object for normalization
scaler = StandardScaler()

# Apply the normalization to the feature columns
X = scaler.fit_transform(df[features])


In [7]:
# Convert back to DataFrame to show a sample
normalized_df = pd.DataFrame(X, columns=features)
normalized_df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.788603 | -1.528461 | -1.73318 | 0.46113 | -1.526653 | -0.852381 | -0.221393 | -0.478387 | 0.340231 | -0.489833 |
| 1 | -0.34686 | -0.462069 | -0.15829 | -0.783868 | 0.117066 | -3.102634 | -0.301357 | -0.986972 | -0.769429 | -0.489833 |
| 2 | 0.064417 | 0.232757 | 0.550411 | 1.017553 | -0.439117 | 0.423432 | 1.45785 | 1.246811 | 1.339925 | -0.307387 |
| 3 | 0.376684 | -0.610695 | -1.103224 | 0.099454 | -1.015163 | 0.92572 | -3.030126 | 1.06731 | 0.660133 | -0.426679 |
| 4 | 1.229704 | 1.247129 | 0.228871 | 0.308113 | 0.583862 | -0.621328 | 0.218409 | -0.957055 | -2.318954 | 1.320591 |
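
`StandardScaler` rescales each column to zero mean and unit variance using z = (x - mean) / std. It uses the population standard deviation (`ddof=0`). The sketch below reproduces that computation in NumPy on a small sample: the first four rows of the first two feature columns shown in the raw table earlier.

```python
import numpy as np

# First four wines, first two features (fixed acidity, volatile acidity)
X_raw = np.array([
    [9.47, 5.97],
    [10.05, 8.84],
    [10.59, 10.71],
    [11.00, 8.44],
])

# Column-wise mean and population standard deviation (ddof=0, as StandardScaler uses)
mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0)

# z-score: subtract the mean, then divide by the standard deviation
X_scaled = (X_raw - mean) / std

print(X_scaled.round(3))
print("column means:", X_scaled.mean(axis=0).round(6))
print("column stds:", X_scaled.std(axis=0).round(6))
```

In the notebook, the scaler is fitted on all 100,000 rows before `train_test_split`, so statistics from the test rows also shape the scaling. With this many rows the effect is likely negligible, but fitting the scaler on the training split only is the safer habit.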


## 2. LogisticRegression solvers

❓ Logistic Regression models can be optimized using different **solvers**. Compare the available solvers in terms of:
- Fit time: which solver is **the fastest**?
- Precision: **how different** are their respective precision scores?

The solvers to compare are `['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']`. scikit-learn 1.2 and later also offers `newton-cholesky`, which this exercise does not cover.
 
For more information on these 5 solvers, check out [this Stack Overflow thread](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions)

In [8]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
import time

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, df['y'], test_size=0.3, random_state=42)

# List of available solvers
solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']

# Dictionary to store fit times and precision scores for each solver
solver_stats = {}

# Iterate over each solver, train a logistic regression model, and record fit time and precision
for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=1000, random_state=42)
    start_time = time.time()
    model.fit(X_train, y_train)
    fit_time = time.time() - start_time
    y_pred = model.predict(X_test)
    precision = precision_score(y_test, y_pred)
    solver_stats[solver] = {'fit_time': fit_time, 'precision': precision}

solver_stats


{'newton-cg': {'fit_time': 0.6189272403717041,
  'precision': 0.8779372010813059},
 'lbfgs': {'fit_time': 0.12888598442077637, 'precision': 0.8779456612143055},
 'liblinear': {'fit_time': 0.05490422248840332,
  'precision': 0.8779372010813059},
 'sag': {'fit_time': 0.5510060787200928, 'precision': 0.8779372010813059},
 'saga': {'fit_time': 1.1104681491851807, 'precision': 0.8779372010813059}}
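
A single wall-clock measurement is noisy. Background load or caching can easily reorder solvers whose fit times differ by only a fraction of a second. The sketch below times each solver several times with `time.perf_counter`, keeps the median, and then picks the fastest solver programmatically. It uses a synthetic dataset from `make_classification` as a stand-in for the wine data, and `n_repeats=5` is an arbitrary choice, so the absolute timings (and possibly the ranking) will differ from the output above.

```python
import time
from statistics import median

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the wine data (assumption: 10 features, binary target)
X_demo, y_demo = make_classification(n_samples=2000, n_features=10, random_state=0)

solver_names = ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]
n_repeats = 5
median_fit_times = {}

for solver_name in solver_names:
    times = []
    for _ in range(n_repeats):
        demo_model = LogisticRegression(solver=solver_name, max_iter=1000)
        start = time.perf_counter()  # monotonic, high-resolution clock
        demo_model.fit(X_demo, y_demo)
        times.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one unusually slow run
    median_fit_times[solver_name] = median(times)

# Solver with the smallest median fit time
fastest = min(median_fit_times, key=median_fit_times.get)
print(median_fit_times)
print("fastest:", fastest)
```

Applied to the single-run `solver_stats` above, the same `min(..., key=...)` selection returns `liblinear`, which recorded the shortest fit time (about 0.055 s).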

In [9]:
# YOUR ANSWER
fastest_solver = "lbfgs"
fastest_solver

'lbfgs'

<details>
    <summary>ℹ️ Click here for our interpretation</summary>

All solvers should produce similar precision scores because the cost function minimized here (the L2-regularized logistic loss) is convex. It has a single global minimum, which all 5 solvers find. For very complex, non-convex cost functions such as those in Deep Learning, different solvers may stop at different values of the loss function.
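
The convexity argument can be checked numerically. In this NumPy sketch, plain gradient descent on an L2-regularized logistic loss is started from two very different points. It is a toy example on hypothetical data, with no intercept and an arbitrary `lam`, and it is not what any scikit-learn solver does internally. Because the loss is convex, both runs end at essentially the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 200 rows, 3 features, labels from a noisy linear rule
X_toy = rng.normal(size=(200, 3))
y_toy = (X_toy @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200) > 0).astype(float)

lam = 0.1  # L2 regularization strength, an arbitrary choice for this demo


def gradient(w):
    """Gradient of mean log-loss + (lam / 2) * ||w||^2 (no intercept, for simplicity)."""
    p = 1.0 / (1.0 + np.exp(-(X_toy @ w)))
    return X_toy.T @ (p - y_toy) / len(y_toy) + lam * w


def gradient_descent(w0, lr=0.5, n_steps=5000):
    w = w0.copy()
    for _ in range(n_steps):
        w -= lr * gradient(w)
    return w


# Two very different starting points
w_from_zero = gradient_descent(np.zeros(3))
w_from_far = gradient_descent(np.array([10.0, 10.0, -10.0]))

print(w_from_zero.round(4))
print(w_from_far.round(4))
```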

**The wine dataset**
    
If you check feature importance with sklearn's <a href="https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html">permutation_importance</a> on the current dataset, you'll see that many features have almost 0 importance. The liblinear solver is based on coordinate descent: it optimizes along only *one* direction (one coefficient) at a time while holding the others fixed. It also supports L1 regularization (`penalty='l1'`), which can set the coefficients (betas) of unhelpful features exactly to 0. This may suit a dataset where many features are not important for predicting the target. Note that the comparison above used the default L2 penalty.
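
To see L1 regularization zero out weak features, here is a sketch with `solver='liblinear'` and `penalty='l1'`. It uses a synthetic dataset from `make_classification` in which only the first 3 of 10 features are informative, which is an assumption chosen for illustration. `C=0.05` is also arbitrary; a smaller `C` means stronger regularization and more zeros.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 10 features, only 3 carry signal; shuffle=False keeps the informative
# columns first (columns 0-2) and the pure-noise columns last
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, shuffle=False, random_state=0,
)

l1_model = LogisticRegression(solver="liblinear", penalty="l1", C=0.05, random_state=0)
l1_model.fit(X_demo, y_demo)

coefs = l1_model.coef_.ravel()
n_zero = int(np.sum(coefs == 0))
print(coefs.round(3))
print(f"{n_zero} of {coefs.size} coefficients are exactly 0")
```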

❗️There is a cost to searching for the best solver, and sticking with the default (`lbfgs`) may save the most time overall. scikit-learn provides the chart below as a starting point for choosing a solver:

<img src="https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers-chart.png" width=700>



</details> 

In [10]:
from sklearn.inspection import permutation_importance

# Train a model with the default solver ('lbfgs')
model = LogisticRegression(solver='lbfgs', max_iter=1000, random_state=42)
model.fit(X_train, y_train)

# Calculate feature importance using permutation importance
perm_importance = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)

# Create a DataFrame for easier visualization
importance_df = pd.DataFrame({
    'Feature': features,
    'Importance': perm_importance.importances_mean,
    'Std': perm_importance.importances_std
}).sort_values(by='Importance', ascending=False)

importance_df


| | Feature | Importance | Std |
|---|---|---|---|
| 2 | citric acid | 0.12185 | 0.001748 |
| 1 | volatile acidity | 0.071927 | 0.001635 |
| 0 | fixed acidity | 0.036703 | 0.001081 |
| 4 | chlorides | 0.01311 | 0.000923 |
| 3 | residual sugar | 0.010357 | 0.000947 |
| 9 | alcohol | 0.000373 | 0.00041 |
| 8 | sulphates | 0.000203 | 0.000279 |
| 7 | density | 8.7e-05 | 0.000242 |
| 6 | total sulfur dioxide | 5.3e-05 | 0.000192 |
| 5 | free sulfur dioxide | -3.7e-05 | 9e-05 |


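The features with the highest importance scores are "citric acid", "volatile acidity" and "fixed acidity". Alcohol, sulphates, density and the two sulfur dioxide features score close to 0 (free sulfur dioxide is even slightly negative), which is consistent with the interpretation above.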
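
Permutation importance measures how much a score drops when a single feature column is randomly shuffled, which breaks that feature's link to the target. The sketch below does this from scratch in NumPy with a hypothetical fixed "model" (`toy_predict`) that uses only the first two of three features. sklearn's `permutation_importance` follows the same idea for any fitted estimator, scoring with the estimator's default `score` (accuracy for classifiers) unless `scoring` is given.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 2000 rows, 3 features
X_toy = rng.normal(size=(2000, 3))


def toy_predict(X):
    """Hypothetical fixed model: uses features 0 and 1, ignores feature 2."""
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)


# Labels follow the same rule, so the baseline accuracy is exactly 1.0
y_toy = toy_predict(X_toy)


def permutation_importances(X, y, n_repeats=10):
    baseline = np.mean(toy_predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column only, breaking its link with the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(toy_predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


toy_importances = permutation_importances(X_toy, y_toy)
print(toy_importances.round(3))
```

The ignored feature scores exactly 0. This is an idealized version of the near-zero rows in the table above.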

###  🧪 Test your code

In [11]:
from nbresult import ChallengeResult

result = ChallengeResult(
    'solvers',
    fastest_solver=fastest_solver
)
result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/ramzimalhas/.pyenv/versions/3.10.6/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /Users/ramzimalhas/code/ramzimalhas/05-ML/04-Under-the-hood/data-solvers/tests
plugins: asyncio-0.19.0, anyio-3.7.1, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_solvers.py::TestSolvers::test_fastest_solver PASSED                 [100%]



💯 You can commit your code:

git add tests/solvers.pickle

git commit -m 'Completed solvers step'

git push origin master



## 3. Stochastic Gradient Descent

Logistic Regression models can also be optimized via Stochastic Gradient Descent.

❓ Evaluate a Logistic Regression model optimized via **Stochastic Gradient Descent**. How do its precision score and training time compare to the performance of the models trained in section 2?


<details>
<summary>💡 Hint</summary>

- If you are stuck, look at the [SGDClassifier doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)!

</details>



In [12]:
from sklearn.linear_model import SGDClassifier

# Initialize the SGD classifier for logistic regression
sgd_model = SGDClassifier(loss='log_loss', random_state=42)

# Record the start time
start_time = time.time()

# Train the model
sgd_model.fit(X_train, y_train)

# Calculate the fit time
sgd_fit_time = time.time() - start_time

# Predict on the test set
sgd_y_pred = sgd_model.predict(X_test)

# Calculate the precision score
sgd_precision = precision_score(y_test, sgd_y_pred)


results_message = f"""
The logistic regression model optimized via Stochastic Gradient Descent (SGD) using the correct loss function (log_loss) achieved:

Fit Time: {sgd_fit_time:.2f} seconds
Precision Score: {sgd_precision:.4f}

"""
print(results_message)


# Create a DataFrame from the solver_stats dictionary
solver_df = pd.DataFrame(solver_stats).T
solver_df.reset_index(inplace=True)
solver_df.columns = ['Solver', 'Fit Time (seconds)', 'Precision']

# Append SGD model results to the DataFrame
sgd_stats = pd.DataFrame([{'Solver': 'sgd', 'Fit Time (seconds)': sgd_fit_time, 'Precision': sgd_precision}])
solver_df = pd.concat([solver_df, sgd_stats], ignore_index=True)

solver_df




The logistic regression model optimized via Stochastic Gradient Descent (SGD) using the correct loss function (log_loss) achieved:

Fit Time: 0.12 seconds
Precision Score: 0.8825




| | Solver | Fit Time (seconds) | Precision |
|---|---|---|---|
| 0 | newton-cg | 0.618927 | 0.877937 |
| 1 | lbfgs | 0.128886 | 0.877946 |
| 2 | liblinear | 0.054904 | 0.877937 |
| 3 | sag | 0.551006 | 0.877937 |
| 4 | saga | 1.110468 | 0.877937 |
| 5 | sgd | 0.11942 | 0.882514 |


☝️ The SGD model should have one of the shortest fit times, for similar (here slightly higher) precision. In the run above it took about 0.12 s, which is faster than every solver except `liblinear` (about 0.05 s). This is because SGD updates the coefficients after each individual row, instead of computing the gradient over the whole training set (70,000 rows here) before every step.
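
The per-row update behind `SGDClassifier(loss='log_loss')` can be sketched in NumPy. This is a simplified version on hypothetical data. It uses a constant learning rate and has no intercept or regularization, whereas sklearn's defaults use an L2 penalty and an `'optimal'` learning-rate schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 rows, 3 standardized features, labels from a linear rule
X_toy = rng.normal(size=(1000, 3))
y_toy = (X_toy @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


w = np.zeros(3)
learning_rate = 0.1  # constant step size (sklearn's default schedule differs)

for epoch in range(5):
    # One epoch = one pass over every row, in a random order
    for i in rng.permutation(len(y_toy)):
        # Log-loss gradient for a single row: (p_i - y_i) * x_i
        error = sigmoid(X_toy[i] @ w) - y_toy[i]
        w -= learning_rate * error * X_toy[i]

accuracy = np.mean((sigmoid(X_toy @ w) >= 0.5) == (y_toy == 1))
print("weights:", w.round(2), "training accuracy:", accuracy)
```

Each epoch still visits every row. The speed-up comes from making many cheap single-row updates rather than computing one full-dataset gradient per step.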

## 4. Predictions

❓ Use the best model (the best trade-off between a short fit time and high precision) to predict the binary quality (0 or 1) of the following wine. Store your:
- `predicted_class`
- `predicted_proba_of_class`: the probability the model assigns to the class it predicted (e.g. if it predicts class 1, the probability of class 1), a value between 0 and 1
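
For binary logistic regression, the model computes a score z = w·x + b and the probability p = P(class 1) = 1 / (1 + e^(-z)). It predicts class 1 when z > 0, and the probability of the predicted class is p for class 1 and 1 - p for class 0. The stdlib-only sketch below uses hypothetical weights, intercept and already-scaled feature values. Its variable names differ from the notebook's so it does not overwrite your answers.

```python
import math

# Hypothetical coefficients, intercept and scaled feature values (illustration only)
weights = [0.8, -1.2, 0.5]
intercept = -0.3
x = [-0.5, 1.0, 0.2]

# Linear score z = w . x + b, then the sigmoid gives P(class 1)
z = sum(w_j * x_j for w_j, x_j in zip(weights, x)) + intercept
proba_class_1 = 1 / (1 + math.exp(-z))

# Class 1 if the score is positive, otherwise class 0
demo_class = 1 if z > 0 else 0

# Probability of the predicted class: p for class 1, 1 - p for class 0
demo_proba = proba_class_1 if demo_class == 1 else 1 - proba_class_1

print(demo_class, round(demo_proba, 4))
```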

In [13]:
new_wine = pd.read_csv('https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers_new_wine.csv')
new_wine

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.54 | 13.5 | 12.35 | 8.78 | 14.72 | 9.06 | 9.67 | 10.15 | 11.17 | 12.17 |


In [14]:
# Normalize the new wine features using the previously fitted scaler
new_wine_normalized = scaler.transform(new_wine)

# Predict the class and probability using the best model (SGD in this case)
predicted_class = sgd_model.predict(new_wine_normalized)[0]
# predict_proba columns follow sgd_model.classes_ ([0, 1]), so the class label doubles as the column index
predicted_proba_of_class = sgd_model.predict_proba(new_wine_normalized)[0][predicted_class]

# Formatting the prediction information
answer_message = f"""
For the new wine sample:

Predicted Class: {predicted_class} ({'bad quality wine' if predicted_class == 0 else 'good quality wine'})
Predicted Probability of Class {predicted_class}: {predicted_proba_of_class:.4f}
This model (optimized via SGD) is quite confident that the wine belongs to class {predicted_class} with a probability of {predicted_proba_of_class:.4%}.
"""

print(answer_message)



For the new wine sample:

Predicted Class: 0 (bad quality wine)
Predicted Probability of Class 0: 0.9695
This model (optimized via SGD) is quite confident that the wine belongs to class 0 with a probability of 96.9545%.



# 🏁  Check your code and push your notebook

In [15]:
from nbresult import ChallengeResult

result = ChallengeResult(
    'new_data_prediction',
    predicted_class=predicted_class,
    predicted_proba_of_class=predicted_proba_of_class
)
result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/ramzimalhas/.pyenv/versions/3.10.6/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /Users/ramzimalhas/code/ramzimalhas/05-ML/04-Under-the-hood/data-solvers/tests
plugins: asyncio-0.19.0, anyio-3.7.1, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 2 items

test_new_data_prediction.py::TestNewDataPrediction::test_predicted_class PASSED [ 50%]
test_new_data_prediction.py::TestNewDataPrediction::test_predicted_proba PASSED [100%]



💯 You can commit your code:

git add tests/new_data_prediction.pickle

git commit -m 'Completed new_data_prediction step'

git push origin master

