# Multi-class Logistic Regression

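scikit-learn's `LogisticRegression` can solve binary and multi-class problems. It checks whether the target has more than two distinct classes (`len(unique(y)) > 2`) to choose the underlying algorithm. On its own or inside the wrappers noted below, it can be used for:
1. Binary: Each sample belongs to exactly one of two classes, 0 or 1.
2. Multi-class: Each sample belongs to exactly one of more than two classes. For example `y = [0, 1, 2, 3, 0]`
    - Use `multi_class="ovr"` or `multi_class="multinomial"` (note that the `multi_class` parameter is deprecated in recent scikit-learn releases)
3. Multilabel: Each sample can belong to several classes at once, so each row of `y` is a binary indicator vector. For example, a movie can be __Action__, __not Romcom__, and __Sci-Fi__. Use `OneVsRestClassifier(LogisticRegression())` for this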
    ```python
    y = [
        [1, 0, 1], # Sample 1 belongs to classes 0 and 2
        [0, 1, 1], # Sample 2 belongs to classes 1 and 2
        [0, 1, 0], # Sample 3 belongs to class 1
    ]
    ```
4. Multi-output: A broader term than multilabel. Each sample has several independent outputs, and each output can itself take more than two classes. Examples are predicting the __minimum__ and __maximum__ temperature, or the __x__ and __y__ position of an object in an image. Use `MultiOutputClassifier(LogisticRegression())` for this
    ```python
    y = [
        [0, 1], # first task class-0, second class-1
        [2, 0], # first task class-2, second class-0
        [1, 2], # first task class-1, second class-2
    ]
    ```
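
The four target layouts above can also be told apart programmatically. The following is a minimal sketch, assuming scikit-learn's `type_of_target` helper from `sklearn.utils.multiclass`; the variable names are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative targets mirroring the four cases listed above.
targets = {
    "binary": np.array([0, 1, 1, 0]),
    "multi-class": np.array([0, 1, 2, 3, 0]),
    "multilabel": np.array([[1, 0, 1], [0, 1, 1], [0, 1, 0]]),
    "multi-output": np.array([[0, 1], [2, 0], [1, 2]]),
}

# type_of_target inspects the shape and values of y and returns a string tag.
detected = {name: type_of_target(y) for name, y in targets.items()}
for name, tag in detected.items():
    print(f"{name:>12}: {tag}")
```

A 2-D array with only 0/1 entries is reported as a multilabel indicator matrix, while a 2-D array with more than two distinct values is reported as multi-class multi-output.
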
## Objective
The objective of this notebook is to show the OvR (One-vs-Rest) strategy that scikit-learn applies to a multi-class example. OvR takes each class separately and trains one binary classifier for it (that class versus all the others), then chooses the class with the highest probability. To get an apples-to-apples comparison of the probability calculations, we need to provide exactly the same inputs and `random_state` at initialization. We use the iris dataset.
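
One way to write the OvR rule down is sketched below. The symbols are notation introduced here, not taken from the notebook: $w_k$ and $b_k$ are the weights and intercept of the binary classifier for class $k$, $\sigma$ is the logistic sigmoid, and $K$ is the number of classes. The row normalization corresponds to what the notebook later computes as `ovr_probs_normalized`.

$$
s_k(x) = \sigma\left(w_k^\top x + b_k\right), \qquad
P(y = k \mid x) = \frac{s_k(x)}{\sum_{j=0}^{K-1} s_j(x)}, \qquad
\hat{y} = \arg\max_{k} \; P(y = k \mid x)
$$

Because every entry in a row is divided by the same positive sum, the normalization changes the probabilities but not which class wins.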


## Example: Multi-class
The example uses the iris dataset to solve a multi-class classification problem.

The species are 0: Setosa, 1: Versicolor, 2: Virginica. For the manual OvR we create three binary label columns, one for each class:
 - Column 0: [0, 1]; Class-0 Setosa
 - Column 1: [0, 1]; Class-1 Versicolor
 - Column 2: [0, 1]; Class-2 Virginica

In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder, StandardScaler

In [2]:
# Load iris dataset
iris = load_iris()
X = iris.data
y_original = iris.target
np.unique(y_original), "classes = ", len(np.unique(y_original))  # Number of classes

(array([0, 1, 2]), 'classes = ', 3)

In [3]:
# Encode the target with LabelEncoder (iris labels are already 0, 1, 2, so the values do not change)
iris_label_encoder = LabelEncoder()
y = iris_label_encoder.fit_transform(y_original)
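
As a quick check of what the encoder does here, the sketch below refits a `LabelEncoder` on the iris target and compares the result with the original labels. It is a standalone illustration; the names `encoder` and `y_encoded` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder

y_original = load_iris().target

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_original)

# classes_ holds the sorted distinct labels seen during fit.
print(encoder.classes_)

# The iris target is already coded as 0, 1, 2, so encoding leaves it unchanged.
print(np.array_equal(y_encoded, y_original))

# inverse_transform maps encoded values back to the original labels.
print(encoder.inverse_transform([0, 2]))
```

The encoder matters more when the labels are strings or non-contiguous integers; for iris it mainly documents intent.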

In [4]:
# Select training row indices: 0–39, 50–89, 100–139 (first 40 rows of each species)
# Select test row indices: 40–49, 90–99, 140–149 (last 10 rows of each species)
training_indices = np.r_[0:40, 50:90, 100:140]
test_indices = np.setdiff1d(np.arange(len(y)), training_indices)
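
The split above can be sanity-checked without loading any data. This is a small sketch that assumes only that iris has 150 rows, ordered as 50 rows per species; `n_samples` is a stand-in for `len(y)`.

```python
import numpy as np

n_samples = 150  # assumption: iris has 150 rows, 50 per species, in class order

# np.r_ concatenates the half-open slices 0:40, 50:90, 100:140.
training_indices = np.r_[0:40, 50:90, 100:140]
# Everything not used for training becomes the test set.
test_indices = np.setdiff1d(np.arange(n_samples), training_indices)

print(len(training_indices), len(test_indices))
print(test_indices)
# No row should appear in both sets.
overlap = np.intersect1d(training_indices, test_indices)
print(overlap.size)
```

With this layout the test rows 0 to 9 are Setosa, 10 to 19 are Versicolor, and 20 to 29 are Virginica, which is why later cells slice the predictions at `[:2]`, `[10:12]`, and `[20:22]`.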

In [5]:
# Create 3 'binary' labels: one for each class.
y_bin_class_0 = (y_original == 0).astype(int)
y_bin_class_1 = (y_original == 1).astype(int)
y_bin_class_2 = (y_original == 2).astype(int)

In [6]:
# Split the features, the multi-class labels, and the per-class binary labels into train and test
X_train, X_test, y_train, y_test = X[training_indices], X[test_indices], y[training_indices], y[test_indices]
y_train_0, y_test_0 = y_bin_class_0[training_indices], y_bin_class_0[test_indices]
y_train_1, y_test_1 = y_bin_class_1[training_indices], y_bin_class_1[test_indices]
y_train_2, y_test_2 = y_bin_class_2[training_indices], y_bin_class_2[test_indices]
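
The three binary label columns can also be produced in one call. The sketch below uses scikit-learn's `label_binarize` as an alternative to the manual comparisons and checks that both give the same matrix; `manual_columns` and `y_onehot` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import label_binarize

y = load_iris().target

# One indicator column per class, in the order given by `classes`.
y_onehot = label_binarize(y, classes=[0, 1, 2])

# The manual construction used in the notebook, stacked column-wise.
manual_columns = np.column_stack([(y == k).astype(int) for k in (0, 1, 2)])

print(y_onehot.shape)
print(np.array_equal(y_onehot, manual_columns))
```

Column `k` of `y_onehot` plays the same role as `y_bin_class_k` in the notebook.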

## Trying multi-class Classification
This example fits a One-vs-Rest multi-class classifier with scikit-learn.

In [7]:
sk_pipeline = Pipeline([
    ("scalar", StandardScaler()),
    ("clf", OneVsRestClassifier(LogisticRegression(solver="liblinear", random_state=42)))
]).fit(X_train, y_train)

In [8]:
# Predict probabilities
y_pred_prob = sk_pipeline.predict_proba(X_test)
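
To relate the probability matrix to class predictions, the sketch below refits the same kind of pipeline on its own and compares `predict` with the arg-max of `predict_proba`. It is a standalone illustration under the notebook's split; `model`, `proba`, and `agreement` are names chosen here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
train = np.r_[0:40, 50:90, 100:140]
test = np.setdiff1d(np.arange(len(y)), train)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", OneVsRestClassifier(LogisticRegression(solver="liblinear", random_state=42))),
]).fit(X[train], y[train])

# One row per test sample, one column per class; each row sums to 1.
proba = model.predict_proba(X[test])
print(proba.shape, np.allclose(proba.sum(axis=1), 1.0))

# Map the column with the highest probability back to its class label.
classes = model.named_steps["clf"].classes_
labels_from_proba = classes[proba.argmax(axis=1)]

# Fraction of test rows where the arg-max label equals predict().
agreement = np.mean(labels_from_proba == model.predict(X[test]))
print(agreement)
```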

## Equivalence
Under the hood, `OneVsRestClassifier` fits one binary `LogisticRegression` per class and then combines their outputs to choose the winner. To compare apples-to-apples we use the same `solver="liblinear"` and the same pre-processing. Then we compare the `predict_proba` output of each approach.
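
The per-class models can be inspected directly on a fitted `OneVsRestClassifier` through its `estimators_` attribute. The following standalone sketch refits the pipeline under the notebook's split and lists one binary model per class; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
train = np.r_[0:40, 50:90, 100:140]

model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", OneVsRestClassifier(LogisticRegression(solver="liblinear", random_state=42))),
]).fit(X[train], y[train])

ovr = model.named_steps["clf"]

# estimators_ holds one fitted binary LogisticRegression per class.
print(len(ovr.estimators_))
for class_label, estimator in zip(ovr.classes_, ovr.estimators_):
    # Each binary model has a single row of coefficients, one per feature.
    print(class_label, estimator.coef_.shape, estimator.intercept_)
```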

In [9]:
# Show the binary labels for the first two samples of each class
idx = [0, 1, 50, 51, 100, 101]
(y_bin_class_0[idx], y_bin_class_1[idx], y_bin_class_2[idx])


(array([1, 1, 0, 0, 0, 0]),
 array([0, 0, 1, 1, 0, 0]),
 array([0, 0, 0, 0, 1, 1]))

In [10]:
c0_pipeline = Pipeline([
    ("scalar", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear", random_state=42))
]).fit(X_train, y_train_0)

In [11]:
c1_pipeline = Pipeline([
    ("scalar", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear", random_state=42))
]).fit(X_train, y_train_1)

In [12]:
c2_pipeline = Pipeline([
    ("scalar", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear", random_state=42))
]).fit(X_train, y_train_2)

In [13]:
# Get probabilities, and combine them into one matrix to compare with sklearn's OVR
y_pred_prob_class_0 = c0_pipeline.predict_proba(X_test)
y_pred_prob_class_1 = c1_pipeline.predict_proba(X_test)
y_pred_prob_class_2 = c2_pipeline.predict_proba(X_test)
ovr_probs = np.column_stack([
    y_pred_prob_class_0[:, 1],  # Probability of class 0
    y_pred_prob_class_1[:, 1],  # Probability of class 1
    y_pred_prob_class_2[:, 1]   # Probability of class 2
])
# Normalize each row to sum to 1, as OneVsRestClassifier does (a division by the row sum, not a softmax)
ovr_probs_normalized = ovr_probs / ovr_probs.sum(axis=1, keepdims=True)

In [14]:
# Compare the probabilities, and count the number of matches
np.equal(ovr_probs_normalized, y_pred_prob).sum(axis=0)

array([30, 30, 30])

__Remark:__ All 30 test-set probabilities match in each of the three class columns.
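
Exact equality of floating-point values can be fragile across library versions or platforms, so a tolerance-based comparison is a safer check. The sketch below repeats the comparison in compact form with `np.allclose`; `scaled_pipeline` and `base_lr` are hypothetical helpers written for this example, not part of the notebook or of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
train = np.r_[0:40, 50:90, 100:140]
test = np.setdiff1d(np.arange(len(y)), train)


def base_lr():
    """Hypothetical helper: the binary learner used everywhere in this sketch."""
    return LogisticRegression(solver="liblinear", random_state=42)


def scaled_pipeline(classifier):
    """Hypothetical helper: put the same scaling step in front of a classifier."""
    return Pipeline([("scaler", StandardScaler()), ("clf", classifier)])


# scikit-learn's OvR wrapper.
sk_probs = (
    scaled_pipeline(OneVsRestClassifier(base_lr()))
    .fit(X[train], y[train])
    .predict_proba(X[test])
)

# Manual OvR: one binary model per class, keeping the positive-class column.
manual_probs = np.column_stack([
    scaled_pipeline(base_lr())
    .fit(X[train], (y[train] == k).astype(int))
    .predict_proba(X[test])[:, 1]
    for k in np.unique(y)
])
# Divide each row by its sum so the three probabilities add up to 1.
manual_probs /= manual_probs.sum(axis=1, keepdims=True)

print(np.allclose(manual_probs, sk_probs))
print(np.abs(manual_probs - sk_probs).max())
```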

## Printing Samples
We look at the predictions for the first two test samples of each class.

In [15]:
# Print the first 2 test rows from each binary classifier; columns are [P(not class k), P(class k)]
(y_pred_prob_class_0[:2], y_pred_prob_class_1[:2], y_pred_prob_class_2[:2])


(array([[0.01108447, 0.98891553],
        [0.22734722, 0.77265278]]),
 array([[0.89539545, 0.10460455],
        [0.24965287, 0.75034713]]),
 array([[9.99638614e-01, 3.61386382e-04],
        [9.99494295e-01, 5.05705487e-04]]))

In [16]:
# Print the manual OvR probabilities before normalization: first 2 test rows of each class
(ovr_probs[:2], ovr_probs[10:12], ovr_probs[20:22])

(array([[9.88915525e-01, 1.04604552e-01, 3.61386382e-04],
        [7.72652776e-01, 7.50347131e-01, 5.05705487e-04]]),
 array([[0.0420859 , 0.59937954, 0.15443873],
        [0.04810585, 0.34441794, 0.27194538]]),
 array([[0.00245132, 0.22650742, 0.95718486],
        [0.00379213, 0.23693157, 0.90777971]]))

In [17]:
(ovr_probs_normalized[:2], ovr_probs_normalized[10:12], ovr_probs_normalized[20:22])

(array([[9.04042676e-01, 9.56269533e-02, 3.30370697e-04],
        [5.07154532e-01, 4.92513532e-01, 3.31935428e-04]]),
 array([[0.0528781 , 0.75308003, 0.19404187],
        [0.07239741, 0.51833547, 0.40926712]]),
 array([[0.00206663, 0.19096121, 0.80697216],
        [0.0033018 , 0.20629592, 0.79040228]]))

In [18]:
# Print sklearn's OVR
(y_pred_prob[:2], y_pred_prob[10:12], y_pred_prob[20:22])

(array([[9.04042676e-01, 9.56269533e-02, 3.30370697e-04],
        [5.07154532e-01, 4.92513532e-01, 3.31935428e-04]]),
 array([[0.0528781 , 0.75308003, 0.19404187],
        [0.07239741, 0.51833547, 0.40926712]]),
 array([[0.00206663, 0.19096121, 0.80697216],
        [0.0033018 , 0.20629592, 0.79040228]]))
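
The notebook imports `accuracy_score` without using it. As a possible closing step, the sketch below scores the OvR pipeline on the held-out rows. It is a standalone illustration that refits the model under the notebook's split; `model`, `y_pred`, and `test_accuracy` are names chosen here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
train = np.r_[0:40, 50:90, 100:140]
test = np.setdiff1d(np.arange(len(y)), train)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", OneVsRestClassifier(LogisticRegression(solver="liblinear", random_state=42))),
]).fit(X[train], y[train])

# Hard class predictions for the 30 held-out rows.
y_pred = model.predict(X[test])

# Fraction of held-out rows whose predicted class equals the true class.
test_accuracy = accuracy_score(y[test], y_pred)
print(test_accuracy)
```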