### Codio Activity 13.7: Multi-Class Logistic Regression

**Expected Time = 90 minutes**

**Total Points = 60**

This activity focuses on implementing the `LogisticRegression` estimator with three approaches to multi-class classification. Two of these, one vs. rest and multinomial, are available directly through the estimator. The third, one vs. one, is implemented with the scikit-learn `multiclass` module. Most important is that you consider each of these models as options when building classification models and select the best one according to the metric you identified.

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)
- [Problem 6](#Problem-6)

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

### The Data

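Below, the penguins data is loaded, rows with missing values are dropped, and the species labels are converted to integer codes with `pd.factorize`. Thus, we have three classes where 0, 1, and 2 represent Adelie, Chinstrap, and Gentoo respectively. The models below are trained on the string labels in `y`. scikit-learn stores these in sorted order in each fitted model's `classes_`, which for this data matches the integer codes.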
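Column order matters later when reading `predict_proba`, so it helps to see where each ordering comes from. `pd.factorize` assigns codes in order of first appearance, while a fitted scikit-learn classifier stores its labels sorted in `classes_`, and `predict_proba` columns follow `classes_`. A minimal sketch with a small hypothetical label series and toy feature values (not the penguins data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical labels whose first-appearance order is NOT alphabetical
labels = pd.Series(["Gentoo", "Adelie", "Chinstrap", "Adelie", "Gentoo"])

# factorize: codes follow the order in which each label first appears
codes, uniques = pd.factorize(labels)
print(codes)
print(list(uniques))

# Toy one-feature data, one rough cluster per label (hypothetical values)
X_toy = np.array([[5.0], [1.0], [3.0], [1.2], [5.2]])
clf = LogisticRegression().fit(X_toy, labels)

# classes_ holds the sorted unique labels; predict_proba columns use this order
print(list(clf.classes_))
```

For the penguins data the first appearances are Adelie, Chinstrap, Gentoo (see the printed `categories`), which is also alphabetical, so the factorized codes and `classes_` line up.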

In [2]:
penguins = sns.load_dataset("penguins").dropna()
X = penguins.drop(["species", "island", "sex"], axis=1)
y = penguins.species
y_num, categories = pd.factorize(y)  # integer codes and the label for each code
print(categories)

Index(['Adelie', 'Chinstrap', 'Gentoo'], dtype='object')


In [3]:
X.head()

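|  | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
| ------ | ------ | ------ | ------ | ------ |
| 0 | 39.1 | 18.7 | 181.0 | 3750.0 |
| 1 | 39.5 | 17.4 | 186.0 | 3800.0 |
| 2 | 40.3 | 18.0 | 195.0 | 3250.0 |
| 4 | 36.7 | 19.3 | 193.0 | 3450.0 |
| 5 | 39.3 | 20.6 | 190.0 | 3650.0 |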


In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=518)

[Back to top](#-Index)

### Problem 1

#### One vs. Rest Classification

**10 Points**

To begin, use the `LogisticRegression` estimator with the arguments `multi_class = 'ovr'` and `random_state = 42` to fit a model named `ovr_lgr` on the training data.

HINT: if you receive a `ConvergenceWarning: lbfgs failed to converge (status=1):` warning, use `StandardScaler()` from `sklearn.preprocessing` to scale the features in `X`, and use the scaled data in all subsequent models.
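One way to guarantee the scaler is fit only on the training split and reapplied consistently at prediction time is to bundle it with the estimator in a `Pipeline`. A minimal sketch, assuming scikit-learn's bundled iris data as an offline stand-in for the penguins data, that also checks the solver finished well inside its iteration budget:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The scaler learns mean/std from X_train only; score() reuses them on X_test
pipe = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)
)
pipe.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
lgr = pipe.named_steps["logisticregression"]
print("iterations used:", lgr.n_iter_)
print("test accuracy:", pipe.score(X_test, y_test))
```

The notebook's approach of calling `ss.transform` on both splits is equivalent for a single model; the pipeline form becomes more useful once cross-validation is involved.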

In [5]:
from sklearn.preprocessing import StandardScaler

ss = StandardScaler().fit(X_train)
X_train, X_test = ss.transform(X_train), ss.transform(X_test)

In [6]:
ovr_lgr = LogisticRegression(multi_class="ovr", random_state=42, max_iter=1000).fit(
    X_train, y_train
)

# Answer check
ovr_lgr

[Back to top](#-Index)

### Problem 2

#### Examining the Probabilities

**10 Points**

Examine the predicted probabilities on the testing data.  Assign these to `ovr_probs` as an array below.  
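In the one vs. rest setting each binary model produces its own sigmoid probability, and these need not sum to one across classes, so scikit-learn rescales each row by its sum. The sketch below reproduces that by hand. It uses `OneVsRestClassifier` from the same `multiclass` module rather than `multi_class='ovr'` (recent scikit-learn releases deprecate the `multi_class` argument), with iris standing in for the penguins data:

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

ovr = OneVsRestClassifier(LogisticRegression(random_state=42)).fit(X, y)

# One binary decision score per class: shape (n_samples, n_classes)
scores = ovr.decision_function(X)

# Sigmoid of each class's score, then divide each row by its sum
raw = expit(scores)
manual = raw / raw.sum(axis=1, keepdims=True)

print(raw.sum(axis=1)[:5])  # unnormalized rows generally do not sum to 1
print(np.allclose(manual, ovr.predict_proba(X)))
```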

In [7]:
ovr_probs = ovr_lgr.predict_proba(X_test)

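# Answer check
# predict_proba columns follow ovr_lgr.classes_ (sorted: Adelie, Chinstrap, Gentoo)
pd.DataFrame(ovr_probs, columns=["p(adelie)", "p(chinstrap)", "p(gentoo)"]).head()

|  | p(adelie) | p(chinstrap) | p(gentoo) |
| ------ | ------ | ------ | ------ |
| 0 | 0.986699 | 0.011002 | 0.002299 |
| 1 | 0.011486 | 0.009449 | 0.979065 |
| 2 | 0.002635 | 0.003707 | 0.993658 |
| 3 | 0.000266 | 0.03227 | 0.967464 |
| 4 | 0.977343 | 0.0005 | 0.022157 |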


[Back to top](#-Index)

### Problem 3

#### Trying Multinomial

**10 Points**

Now, fit a `LogisticRegression` estimator with `multi_class = 'multinomial'` and `random_state = 42` on the training data, and assign it to `multi_lgr` below.

In [8]:
multi_lgr = LogisticRegression(
    multi_class="multinomial", random_state=42, max_iter=10000
).fit(X_train, y_train)

# Answer check
multi_lgr

[Back to top](#-Index)

### Problem 4

#### Examining the Probabilities

**10 Points**

Again, examine the probabilities from the multinomial estimator above on the test data.  Assign them as an array to `multi_probs` below. 
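The multinomial model fits one coefficient vector per class jointly and turns the per-class linear scores into probabilities with the softmax function, so each row sums to one without a separate rescaling step. A minimal check, assuming a multi-class `LogisticRegression` with the default `lbfgs` solver (which uses the multinomial formulation for more than two classes), again on iris:

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Default lbfgs solver on a 3-class target -> multinomial (softmax) model
multi = LogisticRegression(random_state=42).fit(X, y)

# Linear score per class: X @ coef_.T + intercept_, shape (n_samples, 3)
scores = multi.decision_function(X)
manual = softmax(scores, axis=1)

print(np.allclose(scores, X @ multi.coef_.T + multi.intercept_))
print(np.allclose(manual, multi.predict_proba(X)))
```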

In [9]:
multi_probs = multi_lgr.predict_proba(X_test)

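# Answer check
# predict_proba columns follow multi_lgr.classes_ (sorted: Adelie, Chinstrap, Gentoo)
pd.DataFrame(multi_probs, columns=["p(adelie)", "p(chinstrap)", "p(gentoo)"]).head()

|  | p(adelie) | p(chinstrap) | p(gentoo) |
| ------ | ------ | ------ | ------ |
| 0 | 0.995683 | 0.003899 | 0.000418 |
| 1 | 0.004936 | 0.01486 | 0.980204 |
| 2 | 0.000418 | 0.001633 | 0.997949 |
| 3 | 6.7e-05 | 0.004861 | 0.995072 |
| 4 | 0.997691 | 0.000562 | 0.001747 |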


[Back to top](#-Index)

### Problem 5

#### One vs. One Classifier

**10 Points**

Similar in thinking to the one vs. rest approach, the one vs. one approach takes every pair of target classes and fits a logistic model on each of these binary problems. With K classes this gives K(K - 1)/2 models, so for three classes you would have 3 different logistic regressors.

The `LogisticRegression` estimator does not offer this strategy directly, but scikit-learn implements it through `OneVsOneClassifier`, which wraps any classification estimator. Below, instantiate a `OneVsOneClassifier` with a `LogisticRegression` estimator, fit it on the training data, and assign it to `ovo_clf`. In your logistic estimator, set `random_state = 42`.
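The pairwise count can be checked directly: `OneVsOneClassifier` keeps its fitted binary models in `estimators_`, one per pair of classes, and `predict` chooses the class that wins the most pairwise votes (using the models' confidences to break ties). A short sketch, assuming iris (three classes) as a stand-in for the penguins data:

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

ovo = OneVsOneClassifier(LogisticRegression(random_state=42)).fit(X, y)

n_classes = len(ovo.classes_)
# Every unordered pair (i, j) with i < j, one binary problem each
pairs = list(combinations(range(n_classes), 2))

print(pairs)
print(len(ovo.estimators_), n_classes * (n_classes - 1) // 2)
```

With four classes the same formula would give 6 models, which is where the count grows faster than one vs. rest (one model per class).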

In [10]:
ovo_clf = OneVsOneClassifier(
    estimator=LogisticRegression(max_iter=1000, random_state=42)
).fit(X_train, y_train)

# Answer check
ovo_clf

[Back to top](#-Index)

### Problem 6

#### Comparing Performance

**10 Points**

Create a DataFrame that contains each model's accuracy on the testing data and assign it to `eval_df` below. Which classifier performed best in terms of accuracy? Assign your answer to `best_acc` as one of the strings `ovr`, `multi`, or `ovo`.

| estimator | accuracy |
| ------ | ------ |
| ovr | - |
| multi | - |
| ovo | - |
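The accuracies reported in this activity show `ovr` and `multi` tied at 1.0. `np.argmax` returns only the first index of the maximum, so a tie silently resolves to whichever estimator is listed first. A small sketch that surfaces every tied estimator, using the accuracy values from this activity's output:

```python
import numpy as np
import pandas as pd

# Accuracy values as reported in this activity's output
eval_df = pd.DataFrame(
    {"estimator": ["ovr", "multi", "ovo"], "accuracy": [1.0, 1.0, 0.988095]}
)

# argmax picks only the first maximum
first_best = eval_df["estimator"].iloc[np.argmax(eval_df["accuracy"])]

# Keep every estimator whose accuracy matches the maximum
best_all = eval_df.loc[
    np.isclose(eval_df["accuracy"], eval_df["accuracy"].max()), "estimator"
].tolist()

print(first_best)
print(best_all)
```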

In [11]:
accs = [
    accuracy_score(y_test, ovr_lgr.predict(X_test)),
    accuracy_score(y_test, multi_lgr.predict(X_test)),
    accuracy_score(y_test, ovo_clf.predict(X_test)),
]

eval_df = pd.DataFrame({"estimator": ["ovr", "multi", "ovo"], "accuracy": accs})
display(eval_df)

# np.argmax returns the first maximum, so ties resolve to the earlier estimator
best_acc = list(eval_df["estimator"])[np.argmax(accs)]

# Answer check
print(best_acc)

|  | estimator | accuracy |
| ------ | ------ | ------ |
| 0 | ovr | 1.0 |
| 1 | multi | 1.0 |
| 2 | ovo | 0.988095 |


ovr


Hopefully this activity increased your familiarity with the `LogisticRegression` estimator and how it can be used in a multi-class setting. In practice, you may treat these options as choices in a grid search rather than fitting each one separately; note, however, that one vs. one has to be implemented as its own `OneVsOneClassifier` object. Many of the fitting procedures may also raise convergence warnings. As seen before, `LogisticRegression` applies L2 regularization by default, so the features should be scaled prior to fitting. You may also need to give the solver (`lbfgs` by default) more iterations to converge, which you can control with the `max_iter` argument.
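As a sketch of the grid-search idea, a `Pipeline` step can itself be a searched parameter, so the wrapped one vs. one object can be compared alongside plain `LogisticRegression` in a single search. This assumes iris as stand-in data and uses `OneVsRestClassifier` for one vs. rest rather than the deprecated `multi_class` argument; the step names `scale` and `clf` and the helper `make_lr` are arbitrary, hypothetical choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)


def make_lr():
    # Fresh, identically configured base estimator for each strategy
    return LogisticRegression(max_iter=1000, random_state=42)


# "clf" is a placeholder step; the grid swaps in each multi-class strategy
pipe = Pipeline([("scale", StandardScaler()), ("clf", make_lr())])

param_grid = {
    "clf": [
        make_lr(),                       # multinomial (default for multi-class lbfgs)
        OneVsRestClassifier(make_lr()),  # one vs. rest
        OneVsOneClassifier(make_lr()),   # one vs. one
    ]
}

search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

print(search.best_params_["clf"])
print("test accuracy:", search.score(X_test, y_test))
```

Because the scaler sits inside the pipeline, it is refit on each cross-validation training fold, which keeps the validation folds unseen during scaling.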