# Lasso Regularization

In [None]:
import pandas as pd
data = pd.read_csv('data.csv')
data

- Each row corresponds to the profile of a health insurance client
- The target is the `price_range` category
- The features are client characteristics


👇 Optimize the regularization penalty of a Lasso classification model. According to your optimal model, which features do not influence the charges paid by a client?

We won't do a train/test split for now: assume it was already done and that our data is the training set.

You can use `RandomizedSearchCV` or `GridSearchCV`, or a combination of both.

Note: not all solvers support all types of penalty. Look at the [documentation for Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)!

<details>
    <summary>Hints</summary>

- [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) uses **Ridge** (L2) regularization by default. You just have to tune the hyperparameter `C` = 1/`alpha`, the inverse of the regularization strength: a smaller `C` means a stronger penalty

- To use **Lasso**, simply change the `penalty` hyperparameter to `"l1"` and the solver to `'liblinear'` or `'saga'` (not all solvers support all penalty types)

``` python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(penalty='l1', solver='liblinear', C=1/10)
```
</details>
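
A minimal sketch of the grid search described above, under stated assumptions: the real columns of `data.csv` are not shown here, so a synthetic dataset from `make_classification` stands in for the scaled features and binary target, and the grid of `C` values is arbitrary. It is one possible approach, not a reference solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the insurance data (assumption: real columns unknown)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Search over C, the inverse of the regularization strength
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10]}
search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X_scaled, y)

# Coefficients of the model refitted with the best C
coefs = search.best_estimator_.coef_[0]

# Features whose coefficient is exactly zero were dropped by the L1 penalty
dropped = np.flatnonzero(coefs == 0)
print(search.best_params_, search.best_score_)
print("Indices of dropped features:", dropped)
```

On the real data, mapping `dropped` back to the feature column names answers the question of which features do not influence the target according to the selected model.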


In [None]:
# Grid Search

In [None]:
# Check Best regularization penalty and best score

In [None]:
# Rank the features by order of importance

# Ridge Regularization

Redo the same with Ridge regularization. You can simply change the penalty to `"l2"`.

In [None]:
# Grid Search

In [None]:
# Best regularization penalty and best score

⚠️ Please push the exercise once you have completed it 🙃

<span style="font-size:2em;">🏁</span>

# Regularization

## Import data

In [None]:
import pandas as pd
data = pd.read_csv('data.csv')
data

- Each row corresponds to the profile of a health insurance client
- The target is the `price_range` category
- The features are client characteristics


We won't do a train/test split for now: assume it was already done and that our data is the training set.


👇 Create your `X` and `y`. Encode your binary target, and scale your features.
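
As a rough illustration of this step, the sketch below builds `X` and `y` from a tiny hand-made DataFrame. Only the `price_range` column name comes from the exercise: the feature names, the values, and the `"cheap"` / `"expensive"` labels are hypothetical placeholders, so adapt them to what `data.csv` actually contains.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical miniature of the dataset: feature names, values and labels are invented
df = pd.DataFrame({
    "age": [19, 45, 33, 52, 28, 61],
    "bmi": [27.9, 30.1, 22.7, 35.2, 24.3, 29.8],
    "children": [0, 2, 1, 3, 0, 1],
    "price_range": ["cheap", "expensive", "cheap", "expensive", "cheap", "expensive"],
})

# Separate the features from the target
X = df.drop(columns="price_range")

# Encode the binary target as 0/1 (classes are sorted alphabetically)
y = LabelEncoder().fit_transform(df["price_range"])

# Standardize each feature to zero mean and unit variance
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
```

Keeping the scaled features in a DataFrame preserves the column names, which is convenient later when matching model coefficients to features.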

## Lasso

👇 Optimize the regularization penalty of a Lasso classification model. Don't forget to scale your features first: the penalty acts on the size of the coefficients, which depends on the scale of each feature

❓ According to your optimal model, which features do not influence the charges paid by a client?

You can use `RandomizedSearchCV` or `GridSearchCV`, or a combination of both.  
Note: not all solvers support all types of penalty. Look at the [documentation for Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)!

<details>
    <summary>Hints</summary>

- [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) uses **Ridge** (L2) regularization by default. You just have to tune the hyperparameter `C` = 1/`alpha`, the inverse of the regularization strength: a smaller `C` means a stronger penalty

- To use **Lasso**, simply change the `penalty` hyperparameter to `"l1"` and the solver to `'liblinear'` or `'saga'` (not all solvers support all penalty types)

``` python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(penalty='l1', solver='liblinear', C=1/10)
```
</details>
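
The randomized variant could look like the sketch below. It assumes a synthetic dataset in place of the real one and samples `C` from a log-uniform distribution (`scipy.stats.loguniform`), since useful values of `C` typically span several orders of magnitude. The bounds and `n_iter` are arbitrary choices for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled features and binary target (assumption)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Draw 20 candidate values of C between 1e-3 and 1e2 on a log scale
random_search = RandomizedSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=20,
    scoring="accuracy",
    cv=5,
    random_state=0,
)
random_search.fit(X_scaled, y)
print(random_search.best_params_, random_search.best_score_)
```

One way to combine both searches is to follow this coarse random pass with a `GridSearchCV` over a narrow range around the best `C` found.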


## Ridge

Redo the same with Ridge regularization. You can simply change the penalty to `"l2"`.
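
A possible Ridge counterpart, again on synthetic stand-in data with an arbitrary grid of `C` values. The default solver supports the L2 penalty, so only `penalty` changes. Unlike Lasso, Ridge shrinks coefficients toward zero without setting them exactly to zero, which the last line checks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled features and binary target (assumption)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Same search as for Lasso, with the L2 penalty
search = GridSearchCV(
    LogisticRegression(penalty="l2"),
    {"C": [0.001, 0.01, 0.1, 1, 10]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_scaled, y)

ridge_coefs = search.best_estimator_.coef_[0]
print(search.best_params_, search.best_score_)
print("Coefficients exactly zero:", int(np.sum(ridge_coefs == 0)))
```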

## Let's play with `GridSearchCV` a bit more

Grid searches can be computationally expensive. You don't want to run one multiple times just to measure multiple performance metrics.

👇 Can you make **one** `GridSearchCV` that logs the `accuracy`, `precision` and `recall` scores at each fit, while keeping `accuracy` as the decision metric used to automatically choose the `best_estimator_`? (Read the docs!)

<details><summary>Hints</summary>

Look at the `refit` argument
</details>

👇 Take some time to understand what's in the `cv_results_` attribute of your fitted `GridSearchCV`.
Can you rank, for instance, your fits by mean cross-validated `recall` score?
(Turn `cv_results_` into a DataFrame to make things clearer)
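
A minimal sketch of a multi-metric search, assuming synthetic stand-in data and an arbitrary grid. Passing a list to `scoring` records every metric at each fit, and `refit="accuracy"` tells `GridSearchCV` which one to use when selecting `best_estimator_`. The per-metric results appear in `cv_results_` under keys such as `mean_test_recall`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled features and binary target (assumption)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Track three metrics in a single search, but select the best model on accuracy
search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    {"C": [0.01, 0.1, 1, 10]},
    scoring=["accuracy", "precision", "recall"],
    refit="accuracy",
    cv=5,
)
search.fit(X_scaled, y)

# One row per candidate; rank the candidates by mean cross-validated recall
results = pd.DataFrame(search.cv_results_)
ranked = results.sort_values("mean_test_recall", ascending=False)
print(ranked[["param_C", "mean_test_accuracy", "mean_test_precision", "mean_test_recall"]])
```

The candidate chosen for `best_estimator_` is the one ranked first on `rank_test_accuracy`, which need not be the row with the highest recall.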

⚠️ Please push the exercise once you have completed it 🙃

<span style="font-size:2em;">🏁</span>