# Solvers ⚙️

In this exercise, you will investigate the effects of different `solvers` on `LogisticRegression` models.

👇 Run the code below to import the dataset

In [1]:
import pandas as pd

df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers_dataset.csv")
df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol | quality rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.47 | 5.97 | 7.36 | 10.17 | 6.84 | 9.15 | 9.78 | 9.52 | 10.34 | 8.8 | 6 |
| 1 | 10.05 | 8.84 | 9.76 | 8.38 | 10.15 | 6.91 | 9.7 | 9.01 | 9.23 | 8.8 | 7 |
| 2 | 10.59 | 10.71 | 10.84 | 10.97 | 9.03 | 10.42 | 11.46 | 11.25 | 11.34 | 9.06 | 4 |
| 3 | 11.0 | 8.44 | 8.32 | 9.65 | 7.87 | 10.92 | 6.97 | 11.07 | 10.66 | 8.89 | 8 |
| 4 | 12.12 | 13.44 | 10.35 | 9.95 | 11.09 | 9.38 | 10.22 | 9.04 | 7.68 | 11.38 | 3 |


- The dataset consists of different wines 🍷
- The features describe different properties of the wines 
- The target 🎯 is a quality rating given by an expert

## 1. Target engineering

In this section, you are going to transform the ratings into a binary target.

👇 How many observations are there for each rating?

In [2]:
df["quality rating"].unique()

array([ 6,  7,  4,  8,  3,  1,  2, 10,  5,  9])
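
`unique()` lists the distinct ratings but not how often each one occurs. A minimal sketch of counting observations per rating, assuming a small hypothetical `ratings` Series standing in for `df["quality rating"]`:

```python
import pandas as pd

# Hypothetical toy ratings standing in for df["quality rating"]
ratings = pd.Series([6, 7, 4, 8, 3, 6, 6, 4], name="quality rating")

# Number of observations per rating, ordered by rating value
rating_counts = ratings.value_counts().sort_index()
print(rating_counts)
```

On the real dataset, the same call on `df["quality rating"]` should give one count per rating from 1 to 10.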

In [3]:
import numpy as np

df["quality rating"].astype(np.int32).head()

0    6
1    7
2    4
3    8
4    3
Name: quality rating, dtype: int32

In [4]:
type(df["quality rating"][0])

numpy.int64

❓ Create `y` by transforming the target into a binary classification task where quality ratings below 6 are bad [0], and ratings of 6 and above are good [1]

In [5]:
df["quality rating"] = df["quality rating"].map({1:0,2:0,3:0,4:0,5:0,6:1,7:1,8:1,9:1,10:1})
y = df["quality rating"]
y

0        1
1        1
2        0
3        1
4        0
        ..
99995    1
99996    1
99997    0
99998    1
99999    0
Name: quality rating, Length: 100000, dtype: int64
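
As an alternative to the explicit dictionary, the same binary target can probably be obtained with a simple threshold comparison. The sketch below uses a hypothetical toy Series (`ratings`) rather than the real column, and checks that both approaches agree:

```python
import pandas as pd

# Hypothetical toy ratings standing in for df["quality rating"]
ratings = pd.Series([6, 7, 4, 8, 3, 10, 5, 1], name="quality rating")

# Ratings of 6 and above become 1 (good); everything below becomes 0 (bad)
y_toy = (ratings >= 6).astype(int)

# The dictionary-based mapping gives the same labels
mapping = {rating: int(rating >= 6) for rating in range(1, 11)}
same_as_mapping = (y_toy == ratings.map(mapping)).all()
print(y_toy.tolist(), same_as_mapping)
```

Unlike `map`, the comparison does not overwrite the original column, so the cell can be re-run safely.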

❓ Check the class balance of the new binary target

In [6]:
y.value_counts()

0    50010
1    49990
Name: quality rating, dtype: int64

❓ Create your `X` by normalising the features. This will allow for fair comparison of different solvers.

In [7]:
df.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'sulphates', 'alcohol', 'quality rating'],
      dtype='object')

In [8]:
df['fixed acidity'].astype(float).head()


0     9.47
1    10.05
2    10.59
3    11.00
4    12.12
Name: fixed acidity, dtype: float64

In [9]:
type(df['fixed acidity'][0])

numpy.float64

In [11]:
from sklearn.preprocessing import MinMaxScaler

X = df[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'sulphates', 'alcohol']]


mm_scaler = MinMaxScaler()
X_norm = pd.DataFrame(mm_scaler.fit_transform(X))
X_norm.columns = X.columns

X_norm.head()


| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.531348 | 0.285244 | 0.265966 | 0.504968 | 0.229879 | 0.363248 | 0.451878 | 0.432173 | 0.557503 | 0.413523 |
| 1 | 0.576803 | 0.420113 | 0.459984 | 0.34327 | 0.412348 | 0.123932 | 0.442488 | 0.370948 | 0.435926 | 0.413523 |
| 2 | 0.619122 | 0.507989 | 0.547292 | 0.577236 | 0.350606 | 0.498932 | 0.649061 | 0.639856 | 0.667032 | 0.432028 |
| 3 | 0.651254 | 0.401316 | 0.343573 | 0.457995 | 0.286659 | 0.55235 | 0.122066 | 0.618247 | 0.592552 | 0.419929 |
| 4 | 0.739028 | 0.636278 | 0.50768 | 0.485095 | 0.464168 | 0.387821 | 0.503521 | 0.37455 | 0.266156 | 0.597153 |
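
To make the normalisation step concrete, here is a minimal sketch of what `MinMaxScaler` computes for each column, namely `(x - min) / (max - min)`. The small `X_toy` array is a hypothetical stand-in for the wine features:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy features (3 wines, 2 columns)
X_toy = np.array([[9.47, 5.97],
                  [10.05, 8.84],
                  [12.12, 13.44]])

# Column-wise min-max scaling done by hand
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_manual = (X_toy - col_min) / (col_max - col_min)

# The same transformation with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X_toy)

same = np.allclose(X_manual, X_sklearn)
print(same)
```

After scaling, every column lies in the range [0, 1], so no feature dominates the optimisation purely because of its units.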


## 2. LogisticRegression solvers

❓ Logistic Regression models can be optimized using different **solvers**. Compare the available solvers on:
- Fit time - which solver is **the fastest**?
- Precision - **how different** are their respective precision scores?

Available solvers for Logistic Regression are `['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']`
 
For more information on these 5 solvers, check out [this Stack Overflow thread](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions)

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import precision_score

In [13]:
# Create the training and test split (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.30, random_state=1)

In [14]:
%%time
logisticRegr = LogisticRegression(solver = 'lbfgs')

logisticRegr.fit(X_train, y_train)  # Fit the model on the training set

y_pred = logisticRegr.predict(X_test)

precision_score(y_test, y_pred)

CPU times: user 1.31 s, sys: 23.8 ms, total: 1.34 s
Wall time: 192 ms


0.8736907386990077

In [15]:
%%time
logisticRegr = LogisticRegression(solver = 'newton-cg')

logisticRegr.fit(X_train, y_train)  # Fit the model on the training set

y_pred = logisticRegr.predict(X_test)

precision_score(y_test, y_pred)

CPU times: user 1.77 s, sys: 57.2 ms, total: 1.82 s
Wall time: 255 ms


0.8736305381382209

In [16]:
%%time
logisticRegr = LogisticRegression(solver = 'liblinear')

logisticRegr.fit(X_train, y_train)  # Fit the model on the training set

y_pred = logisticRegr.predict(X_test)

precision_score(y_test, y_pred)

CPU times: user 117 ms, sys: 5.78 ms, total: 123 ms
Wall time: 100 ms


0.8736725968831885

In [17]:
%%time
logisticRegr = LogisticRegression(solver = 'sag')

logisticRegr.fit(X_train, y_train)  # Fit the model on the training set

y_pred = logisticRegr.predict(X_test)

precision_score(y_test, y_pred)

CPU times: user 154 ms, sys: 3.5 ms, total: 157 ms
Wall time: 148 ms


0.8734587035888958

In [18]:
%%time
logisticRegr = LogisticRegression(solver = 'saga')

logisticRegr.fit(X_train, y_train)  # Fit the model on the training set

y_pred = logisticRegr.predict(X_test)

precision_score(y_test, y_pred)

CPU times: user 205 ms, sys: 2.41 ms, total: 208 ms
Wall time: 204 ms


0.8736305381382209
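
The five cells above repeat the same steps, so they could be condensed into a loop. The sketch below is one possible way to do it, not the exercise's reference solution: it uses synthetic data from `make_classification` as a stand-in for the wine dataset, `time.perf_counter` instead of `%%time`, and `max_iter=1000` as an assumed setting to give `sag` and `saga` room to converge.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the wine data (assumption: 2000 rows, 10 features)
X_demo, y_demo = make_classification(n_samples=2000, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=1)

# Fit the scaler on the training split only, then apply it to both splits
scaler = MinMaxScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

results = {}
for solver in ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]:
    model = LogisticRegression(solver=solver, max_iter=1000)

    # Time only the fit, which is where the solvers differ
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    fit_time = time.perf_counter() - start

    precision = precision_score(y_te, model.predict(X_te))
    results[solver] = {"fit_time_s": fit_time, "precision": precision}

# Print the solvers from fastest to slowest
for solver, res in sorted(results.items(), key=lambda item: item[1]["fit_time_s"]):
    print(f"{solver:10s} fit: {res['fit_time_s']:.4f}s  precision: {res['precision']:.4f}")
```

A single timing is noisy, so on the real data it may be worth repeating each fit a few times before deciding which solver is fastest.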

In [19]:
# YOUR ANSWER
fastest_solver = "liblinear"

<details>
    <summary>ℹ️ Click here for our interpretation</summary>

All solvers should produce similar precision scores because our cost function is "easy" enough to have a global minimum, which all 5 solvers find. For very complex cost functions, such as those in Deep Learning, different solvers may stop at different values of the loss function.

**The wine dataset**
    
If you check feature importance with sklearn's <a href="https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html">permutation_importance</a> on the current dataset, you'll see that many features have an importance close to 0. The liblinear solver uses coordinate descent: it moves along only *one* direction (one coefficient) at a time. It also supports L1 regularization, which can set the betas of unimportant features to exactly 0 (note that scikit-learn's default penalty is L2). This might provide a good fit for a dataset where many features are not that important in predicting the target.

</details> 
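
The feature importance check mentioned in the interpretation could look like the sketch below. It is only an illustration: the data is synthetic (`make_classification` with 2 informative features out of 6), and `n_repeats=5` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: only the first 2 of the 6 features carry signal
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)

model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

# Shuffle each feature in turn and measure how much the score drops
importance = permutation_importance(model, X_demo, y_demo, n_repeats=5, random_state=0)

for i, (mean, std) in enumerate(zip(importance.importances_mean, importance.importances_std)):
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
```

Features whose mean importance stays near 0 contribute little to the predictions, which is the pattern the interpretation describes for the wine dataset.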

###  🧪 Test your code

In [20]:
from nbresult import ChallengeResult

result = ChallengeResult(
    'solvers',
    fastest_solver=fastest_solver
)
result.write()
print(result.check())



platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/gulecs/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /Users/gulecs/code/gulecsec/data-solvers/tests
plugins: anyio-3.6.1, dash-2.7.0, asyncio-0.19.0
asyncio: mode=strict
collecting ... collected 1 item

test_solvers.py::TestSolvers::test_fastest_solver PASSED                 [100%]



💯 You can commit your code:

git add tests/solvers.pickle

git commit -m 'Completed solvers step'

git push origin master



## 3. Stochastic Gradient Descent

Logistic Regression models can also be optimized via Stochastic Gradient Descent.

❓ Evaluate a Logistic Regression model optimized via **Stochastic Gradient Descent**. How do its precision score and training time compare to the performance of the models trained in section 2?


<details>
<summary>💡 Hint</summary>

- If you are stuck, look at the [SGDClassifier doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)!

</details>



In [21]:
%%time

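# 'log_loss' gives logistic regression; scikit-learn versions before 1.1 call it loss='log'
class_SGD = SGDClassifier(loss='log_loss')

class_SGD.fit(X_train, y_train)  # Fit the model on the training set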

y_pred = class_SGD.predict(X_test)

precision_score(y_test, y_pred)


CPU times: user 99.7 ms, sys: 6.69 ms, total: 106 ms
Wall time: 93.4 ms




0.8832508833922261

☝️ The SGD model should have one of the shortest fit times (maybe even shorter than `liblinear`), for similar performance. This is a direct effect of updating the weights from a single row at each step of the Gradient Descent, as opposed to computing the gradient over all the training rows at once.
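
To illustrate the single-row update, here is a minimal NumPy sketch of logistic regression trained with Stochastic Gradient Descent. It is a simplified illustration, not what `SGDClassifier` does internally (no regularization, and a fixed learning rate chosen arbitrarily); the toy data and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 rows, 2 features, label 1 when the features sum to > 0
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

w = np.zeros(2)       # coefficients (betas)
b = 0.0               # intercept
learning_rate = 0.1   # arbitrary fixed step size

for epoch in range(5):
    # One epoch visits every row once, in random order
    for i in rng.permutation(len(X_toy)):
        # Predicted probability for this single row
        p = 1.0 / (1.0 + np.exp(-(X_toy[i] @ w + b)))
        # Gradient of the log loss with respect to the linear score
        error = p - y_toy[i]
        # Update the parameters using this row only
        w -= learning_rate * error * X_toy[i]
        b -= learning_rate * error

accuracy = np.mean(((X_toy @ w + b) > 0) == (y_toy == 1))
print(w, b, accuracy)
```

Each update touches one row and costs a handful of multiplications, which is why the parameters can start improving long before the whole dataset has been processed.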

## 4. Predictions

❓ Use the best model (the best trade-off between short fit time and high precision) to predict the binary quality (0 or 1) of the following wine. Store your:
- `predicted_class`
- `predicted_proba_of_class`

In [22]:
new_wine = pd.read_csv('https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers_new_wine.csv')
new_wine

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.54 | 13.5 | 12.35 | 8.78 | 14.72 | 9.06 | 9.67 | 10.15 | 11.17 | 12.17 |


In [23]:
new_wine.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'sulphates', 'alcohol'],
      dtype='object')

In [24]:
from sklearn.preprocessing import MinMaxScaler

X_new = new_wine[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'sulphates', 'alcohol']]


X_norm_new = pd.DataFrame(mm_scaler.transform(X_new))
X_norm_new.columns = X_new.columns

X_norm_new.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.536834 | 0.639098 | 0.669361 | 0.379404 | 0.664278 | 0.353632 | 0.438967 | 0.507803 | 0.648412 | 0.653381 |


In [25]:
predicted_class = class_SGD.predict(X_norm_new)
predicted_class

array([0])

In [30]:
predicted_proba_of_class = class_SGD.predict_proba(X_norm_new)[0][predicted_class][0]
predicted_proba_of_class

0.9657510787211749
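
`predict_proba` returns one probability per class, with columns ordered as in the model's `classes_` attribute. The sketch below shows one way to pick out the probability of the predicted class explicitly; it uses synthetic data and a plain `LogisticRegression` as stand-ins for the wine data and the SGD model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

# A single "new" observation, kept 2D as scikit-learn expects
x_new = X_demo[:1]

proba = model.predict_proba(x_new)[0]   # one probability per class, ordered as model.classes_
predicted = model.predict(x_new)[0]     # predicted label (0 or 1)

# Find the column that corresponds to the predicted label
class_index = list(model.classes_).index(predicted)
proba_of_predicted = proba[class_index]
print(predicted, proba_of_predicted)
```

With labels 0 and 1 the label and the column index coincide, which is why indexing with `predicted_class` works in the cell above; looking the label up in `classes_` also covers targets with other label values.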

# 🏁  Check your code and push your notebook

In [31]:
from nbresult import ChallengeResult

result = ChallengeResult(
    'new_data_prediction',
    predicted_class=predicted_class,
    predicted_proba_of_class=predicted_proba_of_class
)
result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/gulecs/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /Users/gulecs/code/gulecsec/data-solvers/tests
plugins: anyio-3.6.1, dash-2.7.0, asyncio-0.19.0
asyncio: mode=strict
collecting ... collected 2 items

test_new_data_prediction.py::TestNewDataPrediction::test_predicted_class PASSED [ 50%]
test_new_data_prediction.py::TestNewDataPrediction::test_predicted_proba PASSED [100%]



💯 You can commit your code:

git add tests/new_data_prediction.pickle

git commit -m 'Completed new_data_prediction step'

git push origin master

