<a href="https://colab.research.google.com/github/revathys/AIMLLearn/blob/main/colab_activity13_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Colab Activity 13.4: Multiclass Logistic Regression

**Expected Time = 90 minutes**


This activity focuses on using the `LogisticRegression` estimator with three approaches to multiclass classification.  Two of these, one vs. rest and multinomial, are available from the estimator directly.  The third, one vs. one, comes from the scikit-learn `multiclass` module.  Most importantly, you can treat each of these models as an option when building classifiers and select the best one according to your chosen metric.

#### Index

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)
- [Problem 6](#Problem-6)

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

### The Data

Below, the `penguins` data is loaded and the species labels are also encoded as integers in `y_num`, where 0, 1, and 2 represent Adelie, Chinstrap, and Gentoo respectively.  The train/test split that follows uses the original string labels in `y`.

In [3]:
penguins = sns.load_dataset('penguins').dropna()
X = penguins.drop(['species', 'island', 'sex'], axis = 1)
y = penguins.species
# factorize returns the integer codes and the unique labels in order of first appearance
y_num, categories = pd.factorize(y)
print(categories)

Index(['Adelie', 'Chinstrap', 'Gentoo'], dtype='object')


In [4]:
X.head()

Unnamed: 0,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0


In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=518)

[Back to top](#-Index)

### Problem 1

#### One vs. Rest Classification


To begin, use the `LogisticRegression` estimator with the arguments `multi_class = 'ovr'`, `max_iter=1000`, and `random_state = 42` to instantiate a model named `ovr_lgr`. Then use it to fit the model on the training data.



In [6]:


ovr_lgr = LogisticRegression(multi_class='ovr', random_state=42, max_iter=1000)

ovr_lgr.fit(X_train,y_train)

# Answer check
ovr_lgr
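
Recent scikit-learn releases (1.5 and later, as far as I can tell) deprecate the `multi_class` argument and suggest wrapping the estimator in `OneVsRestClassifier` instead. The following is a minimal sketch of that alternative, not the activity's expected answer. It uses the bundled iris data as a stand-in (an assumption, so the block runs without downloading `penguins`), and the names `X_demo`, `y_demo`, and `ovr_wrapped` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Stand-in data (assumption): iris ships with scikit-learn and has three classes.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=518)

# Wrap a logistic regression so that one binary model is fit per class.
ovr_wrapped = OneVsRestClassifier(LogisticRegression(max_iter=1000, random_state=42))
ovr_wrapped.fit(X_tr, y_tr)

# One fitted binary estimator per class.
print(len(ovr_wrapped.estimators_))
```

The wrapper exposes `predict` and `predict_proba` in the same way as the estimator used in this problem.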



[Back to top](#-Index)

### Problem 2

#### Examining the Probabilities


Use the `predict_proba` function to examine the predicted probabilities on the testing data.  Assign these to `ovr_probs` as an array below.  

In [8]:


ovr_probs = ovr_lgr.predict_proba(X_test)

print(type(ovr_probs))

# Answer check
# Columns follow ovr_lgr.classes_, which sorts the string labels alphabetically
pd.DataFrame(ovr_probs, columns = ['p(adelie)', 'p(chinstrap)', 'p(gentoo)']).head()

<class 'numpy.ndarray'>


Unnamed: 0,p(adelie),p(chinstrap),p(gentoo)
0,0.9999683,2.351634e-05,8e-06
1,0.0002123863,6.596597e-05,0.999722
2,6.624596e-05,1.18969e-07,0.999934
3,2.560936e-07,0.0008895072,0.99911
4,0.9878666,1.682196e-10,0.012133
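
The columns returned by `predict_proba` follow the order of the estimator's `classes_` attribute, which for string labels is sorted order rather than order of first appearance. The sketch below illustrates this on stand-in data (an assumption: iris with made-up labels `'c'`, `'a'`, `'b'`), and builds the column names from `classes_` so they cannot drift out of sync. The names `clf`, `probs`, and `prob_df` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in data (assumption): relabel iris so the first label seen is 'c', then 'a', then 'b'.
iris = load_iris()
X_demo = iris.data
y_demo = np.array(['c', 'a', 'b'])[iris.target]

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
probs = clf.predict_proba(X_demo)

# classes_ is sorted, so the columns are a, b, c regardless of appearance order.
print(clf.classes_)

# Derive the column names from classes_ instead of typing them by hand.
prob_df = pd.DataFrame(probs, columns=[f'p({label})' for label in clf.classes_])
print(prob_df.head())
```

Each row of the result sums to 1, and the column with the largest value corresponds to the label returned by `predict`.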


[Back to top](#-Index)

### Problem 3

#### Trying multinomial


Now, instantiate a `LogisticRegression` estimator with `max_iter=10000`, `multi_class = 'multinomial'` and `random_state = 42`.  Fit the model on the training data as `multi_lgr` below.

In [10]:


multi_lgr = LogisticRegression(multi_class='multinomial', random_state=42, max_iter=10000)

multi_lgr.fit(X_train,y_train)



# Answer check
multi_lgr



[Back to top](#-Index)

### Problem 4

#### Examining the Probabilities



Again, use the `predict_proba` function to examine the probabilities from the multinomial estimator above on the test data.  Assign them as an array to `multi_probs` below.

In [12]:


multi_probs = multi_lgr.predict_proba(X_test)



# Answer check
# Columns follow multi_lgr.classes_, which sorts the string labels alphabetically
pd.DataFrame(multi_probs, columns = ['p(adelie)', 'p(chinstrap)', 'p(gentoo)']).head()

Unnamed: 0,p(adelie),p(chinstrap),p(gentoo)
0,0.99999,3.38238e-06,6e-06
1,0.000384,0.0004514612,0.999165
2,2.8e-05,3.972688e-07,0.999972
3,2e-06,0.000418124,0.99958
4,0.999915,3.208431e-10,8.5e-05
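
To see where the multinomial probabilities come from, the sketch below applies a softmax to the per-class linear scores and compares the result with `predict_proba`. It assumes the default solver, which fits a multinomial model when there are three classes, and uses iris as stand-in data. The names `scores`, `shifted`, and `manual_probs` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in data (assumption): three-class iris bundled with scikit-learn.
X_demo, y_demo = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=10000).fit(X_demo, y_demo)

# One linear score per class and sample: shape (n_samples, 3).
scores = clf.decision_function(X_demo)

# Softmax over the class axis; subtracting the row maximum keeps exp() stable.
shifted = scores - scores.max(axis=1, keepdims=True)
manual_probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(np.allclose(manual_probs, clf.predict_proba(X_demo)))
```

By contrast, the one vs. rest approach scores each class with its own sigmoid and then normalizes across classes, which is why the two probability tables differ slightly.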


[Back to top](#-Index)

### Problem 5

#### One vs. One Classifier


Similar in thinking to the one vs. rest approach, the one vs. one approach takes every pair of target classes and builds a logistic model on each resulting binary problem.  For k classes this gives k(k - 1)/2 models, so with three classes you would have 3 different logistic regressors.

The `LogisticRegression` estimator does not offer this approach itself. However, scikit-learn implements it through the `OneVsOneClassifier`, which accepts a classification estimator.

Below, instantiate a `OneVsOneClassifier` that wraps a `LogisticRegression` estimator with parameters `max_iter=1000` and `random_state = 42`, and assign it to `ovo_clf`. Fit this to the training data.

In [13]:


ovo_clf = OneVsOneClassifier(estimator=LogisticRegression(max_iter=1000, random_state=42))

ovo_clf.fit(X_train,y_train)



# Answer check
ovo_clf
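
As a quick check on how many binary models one vs. one builds, the sketch below lists the unordered class pairs and compares that count with the fitted wrapper's `estimators_` attribute. Iris is used as stand-in data (an assumption), and the names `ovo_demo` and `pairs` are hypothetical.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

# Stand-in data (assumption): three-class iris bundled with scikit-learn.
X_demo, y_demo = load_iris(return_X_y=True)

ovo_demo = OneVsOneClassifier(LogisticRegression(max_iter=1000, random_state=42))
ovo_demo.fit(X_demo, y_demo)

# Every unordered pair of classes gets its own binary classifier.
pairs = list(combinations(ovo_demo.classes_.tolist(), 2))
print(pairs)
print(len(ovo_demo.estimators_))
```

With k classes there are k(k - 1)/2 such pairs, so three classes yield three fitted estimators.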

[Back to top](#-Index)

### Problem 6

#### Comparing Performance


Create a DataFrame that contains the scores on testing data in terms of accuracy.  Assign to `eval_df` below.  Which classifier performed best in terms of accuracy?  Assign your answer as a string -- `ovr`, `multi`, or `ovo` -- below to `best_acc`.

| estimator | accuracy |
| ------ | ------ |
| ovr | - |
| multi | - |
| ovo | - |

In [18]:


best_acc = 'all'

eval_df = pd.DataFrame({
    'estimator': ['ovr', 'multi', 'ovo'],
    # accuracy_score expects (y_true, y_pred)
    'accuracy': [
        accuracy_score(y_test, ovr_lgr.predict(X_test)),
        accuracy_score(y_test, multi_lgr.predict(X_test)),
        accuracy_score(y_test, ovo_clf.predict(X_test)),
    ],
})
print(eval_df)
# Answer check
print(best_acc)

  estimator  accuracy
0       ovr  0.988095
1     multi  0.988095
2       ovo  0.988095
all
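
When accuracy does not separate the models, another metric may. The sketch below computes macro-averaged precision and recall, which weight every class equally. In the multiclass case `precision_score` and `recall_score` need an explicit `average` argument. Iris is used as stand-in data (an assumption), and the variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): three-class iris bundled with scikit-learn.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=518)

clf = LogisticRegression(max_iter=1000, random_state=42).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# average='macro' computes the metric per class, then takes the unweighted mean.
macro_precision = precision_score(y_te, y_pred, average='macro')
macro_recall = recall_score(y_te, y_pred, average='macro')
print(macro_precision, macro_recall)
```

The same two calls could be applied to the predictions of each of the three classifiers in this activity to extend the comparison table.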


Hopefully this activity increased your facility with the `LogisticRegression` estimator and how it can be used in a multiclass setting.  You might compare these options in a grid search rather than fitting each on its own, although one vs. one has to be implemented as its own object.  Some of the fits may also raise convergence warnings.  As seen before, regularization is applied behind the scenes, so the data should be scaled before fitting.  You may also need to give the solver more iterations to converge, which you control with the `max_iter` argument.
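
The closing suggestions can be sketched together: scale the features inside a `Pipeline`, then let `GridSearchCV` choose among the three strategies by swapping the final step. This is one possible arrangement, not the activity's required solution. It uses iris as stand-in data (an assumption), and the step names `'scale'` and `'clf'` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): three-class iris bundled with scikit-learn.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=518)

# Scaling happens inside the pipeline, so it is refit on each training fold.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Each candidate replaces the 'clf' step; one vs. one enters as its own object.
param_grid = {
    'clf': [
        LogisticRegression(max_iter=1000),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
        OneVsOneClassifier(LogisticRegression(max_iter=1000)),
    ]
}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
grid.fit(X_tr, y_tr)

print(grid.best_params_['clf'])
print(grid.score(X_te, y_te))
```

Putting the scaler in the pipeline keeps test-fold statistics out of the fit, and scaled features typically let the solver converge in fewer iterations.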