# Solvers ⚙️

In this exercise, you will investigate the effects of different `solvers` on `LogisticRegression` models.

👇 Run the code below

In [1]:
import pandas as pd

df = pd.read_csv("data.csv")

df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol,quality rating
0,9.47,5.97,7.36,10.17,6.84,9.15,9.78,9.52,10.34,8.8,6
1,10.05,8.84,9.76,8.38,10.15,6.91,9.7,9.01,9.23,8.8,7
2,10.59,10.71,10.84,10.97,9.03,10.42,11.46,11.25,11.34,9.06,4
3,11.0,8.44,8.32,9.65,7.87,10.92,6.97,11.07,10.66,8.89,8
4,12.12,13.44,10.35,9.95,11.09,9.38,10.22,9.04,7.68,11.38,3


- The dataset consists of different wines 🍷
- The features describe different properties of the wines 
- The target 🎯 is a quality rating given by an expert

## 1. Target engineering

In this section, you are going to transform the ratings into a binary target.

👇 How many observations are there for each rating?

In [10]:
sorted(df['quality rating'].unique())

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [11]:
df['quality rating'].value_counts()

10    10143
5     10124
1     10090
2     10030
8      9977
6      9961
9      9955
7      9954
4      9928
3      9838
Name: quality rating, dtype: int64

👇 Create `y` by transforming the target into a binary classification task where quality ratings below 6 are bad [0], and ratings of 6 and above are good [1]

In [12]:
df['quality rating'] = df['quality rating'].map(lambda x: 1 if x >= 6 else 0)
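The same thresholding can also be written as a vectorized comparison instead of a `lambda`. The sketch below uses a small made-up `ratings` Series (a hypothetical stand-in for `df['quality rating']`) to show the idea:

```python
import pandas as pd

# Toy ratings standing in for df['quality rating'] (values are made up)
ratings = pd.Series([3, 5, 6, 8, 10], name="quality rating")

# The comparison yields booleans; astype(int) turns True/False into 1/0
y_binary = (ratings >= 6).astype(int)

print(y_binary.tolist())
```

Ratings of 6 and above become 1 and everything below becomes 0, matching the rule stated in the exercise.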

👇 Check the class balance of the new binary target

In [14]:
df['quality rating'].value_counts()

0    50010
1    49990
Name: quality rating, dtype: int64

Create your `X` by scaling the features. This will allow for fair comparison of different solvers.

In [15]:
X = df.drop(columns='quality rating')
y = df['quality rating']

### Standard Scaler

In [16]:
from sklearn.preprocessing import StandardScaler

In [17]:
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

### MinMax Scaler

In [20]:
from sklearn.preprocessing import MinMaxScaler

In [21]:
minmax = MinMaxScaler()
X_minmax = minmax.fit_transform(X)
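To make the difference between the two scalers concrete, here is a minimal NumPy sketch of the arithmetic each one applies to a single feature column. The array `x` is made up for illustration; the formulas are the default behaviour of `StandardScaler` (population standard deviation) and `MinMaxScaler` (range 0 to 1).

```python
import numpy as np

# Toy feature column (made-up values, not taken from data.csv)
x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardization: subtract the mean, divide by the population std (ddof=0)
x_std = (x - x.mean()) / x.std()

# Min-max scaling: map the minimum to 0 and the maximum to 1
x_minmax = (x - x.min()) / (x.max() - x.min())

print(x_std)
print(x_minmax)
```

After standardization the column has mean 0 and standard deviation 1; after min-max scaling it lies between 0 and 1.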

## 2. LogisticRegression solvers

👇 Logistic Regression models can be optimized using different **solvers**. Find out:
- Which is the `fastest_solver`?
- What can you say about their respective precision scores?

`solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']`
 
For more information on these 5 solvers, check out [this Stack Overflow thread](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions)

In [22]:
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

# List solver types to loop over
solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']

# Initialize the lists that store the mean score and mean fit time of each model
scores = []
fit_times = []

# Loop over solvers
for solver in solvers:
    
    # Cross validate each model
    cv_log_s = cross_validate(LogisticRegression(solver=solver),
                    X_minmax, y,
                    cv = 5,
                    scoring = ['precision'])
    
    # Append mean score and mean fit time to lists
    scores.append(cv_log_s['test_precision'].mean())
    fit_times.append(cv_log_s['fit_time'].mean())
    
# Create dataframe with each model's performance
solvers_performance = pd.DataFrame({"precision score":scores, "fit time": fit_times}, index = solvers)
solvers_performance

Unnamed: 0,precision score,fit time
newton-cg,0.874386,0.327535
lbfgs,0.874389,0.436177
liblinear,0.874449,0.293458
sag,0.874381,0.713022
saga,0.874386,1.29224
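One way to read the answer off such a table programmatically is with `idxmin`. The sketch below rebuilds the DataFrame from the MinMax-scaled results printed above (the numbers are copied from that output; timings will differ on another machine) and measures how far apart the precision scores are.

```python
import pandas as pd

# Values copied from the MinMax-scaled results table above
solvers_performance = pd.DataFrame(
    {
        "precision score": [0.874386, 0.874389, 0.874449, 0.874381, 0.874386],
        "fit time": [0.327535, 0.436177, 0.293458, 0.713022, 1.29224],
    },
    index=["newton-cg", "lbfgs", "liblinear", "sag", "saga"],
)

# Index label of the row with the smallest mean fit time
fastest = solvers_performance["fit time"].idxmin()

# Gap between the best and worst precision across solvers
precision_spread = (
    solvers_performance["precision score"].max()
    - solvers_performance["precision score"].min()
)

print(fastest, precision_spread)
```

On these numbers the precision scores differ by less than 0.001, while the fit times differ by more than a factor of four.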


In [23]:
# Initialize the lists that store the mean score and mean fit time of each model
scores2 = []
fit_times2 = []

# Loop over solvers
for solver in solvers:
    
    # Cross validate each model
    cv_log_s = cross_validate(LogisticRegression(solver=solver),
                    X_std, y,
                    cv = 5,
                    scoring = ['precision'])
    
    # Append mean score and mean fit time to lists
    scores2.append(cv_log_s['test_precision'].mean())
    fit_times2.append(cv_log_s['fit_time'].mean())
    
# Create dataframe with each model's performance
solvers_performance = pd.DataFrame({"precision score":scores2, "fit time": fit_times2}, index = solvers)
solvers_performance



Unnamed: 0,precision score,fit time
newton-cg,0.873932,0.396414
lbfgs,0.873878,0.092513
liblinear,0.873878,0.244019
sag,0.873951,1.692364
saga,0.873932,2.764678


In [24]:
# YOUR ANSWER
fastest_solver = "liblinear"

<details>
    <summary>☝️ Intuition</summary>

All solvers should produce similar precision scores because our cost function is "easy": it is convex, so it has a single global minimum, which all 5 solvers find. For very complex cost functions, such as those in Deep Learning, different solvers may stop at different values of the loss function.

</details> 
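The intuition above can be checked numerically. The sketch below is an illustration under stated assumptions, not the method any scikit-learn solver uses: it runs plain gradient descent on an L2-regularized logistic loss over made-up data, starting from two very different points, and compares where they end up. The data, learning rate and regularization strength `lam` are all arbitrary choices.

```python
import numpy as np

# Made-up, overlapping two-class data (not the wine dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(float)

def loss_grad(w, lam=0.1):
    """Gradient of the mean log loss plus (lam / 2) * ||w||^2."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y) + lam * w

def gradient_descent(w0, lr=0.5, n_steps=2000):
    """Plain full-batch gradient descent from the starting point w0."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w -= lr * loss_grad(w)
    return w

# Two very different starting points
w_a = gradient_descent([5.0, -5.0])
w_b = gradient_descent([-3.0, 4.0])

print(w_a, w_b)
```

Because the loss is convex, both runs settle on the same weights, which is why the choice of solver affects training time much more than the final score.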

###  🧪 Test your code

In [25]:
from nbresult import ChallengeResult

result = ChallengeResult('solvers',
                         fastest_solver=fastest_solver
                         )
result.write()
print(result.check())

platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/matheus/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/matheus/code/matheussposito/data-challenges-869/05-ML/04-Under-the-hood/02-Solvers
plugins: anyio-3.4.0
collecting ... collected 1 item

tests/test_solvers.py::TestSolvers::test_fastest_solver PASSED           [100%]



💯 You can commit your code:

git add tests/solvers.pickle

git commit -m 'Completed solvers step'

git push origin master


## 3. Stochastic Gradient Descent

Logistic Regression models can also be optimized via Stochastic Gradient Descent.

👇 Evaluate a Logistic Regression model optimized via **Stochastic Gradient Descent**. How do its precision score and training time compare to those of the models trained in section 2?


<details>
<summary>💡 Hint</summary>

- If you are stuck, look at the [SGDClassifier doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)!

</details>



In [26]:
from sklearn.linear_model import SGDClassifier

In [30]:
# loss='log' selects Logistic Regression; scikit-learn >= 1.1 renames it to 'log_loss'
sgd_model = SGDClassifier(loss='log')

cv_sgd = cross_validate(sgd_model, X_minmax, y, cv = 5, scoring = ['precision'])
cv_sgd

{'fit_time': array([0.19424295, 0.20841289, 0.22468805, 0.23423481, 0.2237978 ]),
 'score_time': array([0.00930643, 0.00869799, 0.00890064, 0.00864053, 0.00996375]),
 'test_precision': array([0.88445917, 0.88279593, 0.89558986, 0.88188061, 0.88166423])}

In [31]:
cv_sgd['fit_time'].mean()

0.21707530021667482

In [32]:
cv_sgd['test_precision'].mean()

0.8852779589076363

☝️ The SGD model should have the shortest training time, for similar performance. This is a direct effect of computing each update of the Gradient Descent on a single data point instead of on the whole dataset.
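To illustrate what "one data point at a time" means, here is a minimal NumPy sketch of stochastic gradient descent for logistic regression. It is a simplified illustration, not `SGDClassifier`'s actual implementation: there is no regularization, the learning rate is constant, and `sgd_logistic`, the toy data and all parameter values are hypothetical.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.1, n_epochs=5, seed=0):
    """Fit weights and intercept with one parameter update per sample."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    n_updates = 0
    for _ in range(n_epochs):
        # One epoch = one shuffled pass over every sample
        for i in rng.permutation(len(y)):
            p = 1 / (1 + np.exp(-(X[i] @ w + b)))  # predicted probability
            error = p - y[i]                        # log-loss gradient w.r.t. the score
            w -= lr * error * X[i]
            b -= lr * error
            n_updates += 1
    return w, b, n_updates

# Made-up data: the class depends only on the sign of the first feature
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(500, 2))
y_toy = (X_toy[:, 0] > 0).astype(float)

w, b, n_updates = sgd_logistic(X_toy, y_toy)
accuracy = ((X_toy @ w + b > 0) == (y_toy == 1)).mean()

print(n_updates, accuracy)
```

Each update touches a single row, so it is very cheap. Five epochs over 500 samples already give 2500 updates, whereas full-batch gradient descent would have made only 5.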

## 4. Predictions

👇 Use the best model to predict the binary quality (0 or 1) of the following wine. Store your
- `predicted_class`
- `predicted_proba_of_class`

In [33]:
new_data = pd.read_csv('new_data.csv')

new_data

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol
0,9.54,13.5,12.35,8.78,14.72,9.06,9.67,10.15,11.17,12.17


In [43]:
new_data = minmax.transform(new_data)
new_data

array([[0.53683386, 0.63909774, 0.66936136, 0.37940379, 0.66427784,
        0.35363248, 0.43896714, 0.50780312, 0.64841183, 0.65338078]])

In [36]:
sgd_model = SGDClassifier(loss='log')
sgd_model.fit(X_minmax,y)

SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
              l1_ratio=0.15, learning_rate='optimal', loss='log', max_iter=1000,
              n_iter_no_change=5, n_jobs=None, penalty='l2', power_t=0.5,
              random_state=None, shuffle=True, tol=0.001,
              validation_fraction=0.1, verbose=0, warm_start=False)

In [44]:
predicted_class = sgd_model.predict(new_data)[0]
predicted_class

0

In [51]:
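# predict_proba columns follow sgd_model.classes_ (here [0, 1]), so index by the predicted class
predicted_proba_of_class = sgd_model.predict_proba(new_data)[0][predicted_class]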
predicted_proba_of_class

0.9682006377305865
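The columns of `predict_proba` are ordered like the model's `classes_` attribute, so the probability of the predicted class can be looked up without hard-coding an index. The sketch below shows this on made-up data; it uses `LogisticRegression` rather than `SGDClassifier` only to stay independent of the version-specific `loss` name, and `X_toy`, `y_toy` and `sample` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: the class depends on the sign of the first feature
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X_toy, y_toy)

# A hypothetical new observation, clearly on the class-0 side
sample = np.array([[-2.0, 0.0]])

pred = model.predict(sample)[0]
proba_row = model.predict_proba(sample)[0]

# Find the column that corresponds to the predicted class
col = list(model.classes_).index(pred)
proba_of_pred = proba_row[col]

print(pred, proba_of_pred)
```

For a binary target encoded as 0 and 1, `classes_` is `[0, 1]`, so the column index and the class label coincide.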

# 🏁  Check your code and push your notebook

In [52]:
from nbresult import ChallengeResult

result = ChallengeResult('new_data_prediction',
    predicted_class=predicted_class,
    predicted_proba_of_class=predicted_proba_of_class
)
result.write()
print(result.check())

platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/matheus/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /home/matheus/code/matheussposito/data-challenges-869/05-ML/04-Under-the-hood/02-Solvers
plugins: anyio-3.4.0
collecting ... collected 2 items

tests/test_new_data_prediction.py::TestNewDataPrediction::test_predicted_class PASSED [ 50%]
tests/test_new_data_prediction.py::TestNewDataPrediction::test_predicted_proba PASSED [100%]



💯 You can commit your code:

git add tests/new_data_prediction.pickle

git commit -m 'Completed new_data_prediction step'

git push origin master
