# Multiclass Logistic Regression

## Multiclass using scikit-learn's LogisticRegression

In the previous sections, we learned how to use scikit-learn's `LogisticRegression` class and how to fine-tune its parameters for a two-class (binary) problem.

In this section, we will learn how to use `LogisticRegression` for a multiclass problem involving three or more classes.

According to the scikit-learn documentation, in the multiclass case `LogisticRegression` uses the one-vs-rest (OvR) scheme when the `multi_class` option is set to `'ovr'`, and minimizes the multinomial cross-entropy loss when it is set to `'multinomial'`.

The `'multinomial'` option is supported only by certain solvers: `'lbfgs'`, `'sag'`, `'saga'` and `'newton-cg'`.
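
For reference, the two schemes can be written out as below. This is the usual textbook formulation rather than a quote from the scikit-learn documentation. The symbols $w_k$ and $b_k$ (weights and intercept for class $k$), $K$ (number of classes) and $n$ (number of samples) are notation introduced here, and the regularization term is left out for brevity.

$$
\text{OvR:}\quad P_k(x) = \frac{1}{1 + e^{-(w_k^\top x + b_k)}}, \qquad \hat{y} = \arg\max_{k} P_k(x)
$$

$$
\text{Multinomial:}\quad P(y = k \mid x) = \frac{e^{w_k^\top x + b_k}}{\sum_{j=1}^{K} e^{w_j^\top x + b_j}}, \qquad L = -\frac{1}{n} \sum_{i=1}^{n} \log P(y = y_i \mid x_i)
$$

In OvR each $P_k$ comes from a separately trained binary model, whereas in the multinomial case all $K$ weight vectors are fitted jointly by minimizing the single loss $L$.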

Let us start with a simple example, the iris dataset.

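```python
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Load the data, treating '?' as missing and dropping incomplete rows
iris = pd.read_csv('../../../data/iris.csv', na_values='?').dropna()
iris.info()
print(iris.shape)

# Assumption: the last column of the CSV holds the species label
X = iris.iloc[:, :-1]
y = iris.iloc[:, -1]

# Fit the model and predict on the training data
lr_iris = LogisticRegression()
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)
```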
Measure the performance of the trained model on the training set:
```python
# score() takes the features and the true labels and returns the mean accuracy
lr_iris.score(X, y)
0.96
```
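
As a sanity check on what `score` reports, the sketch below compares it with two other ways of computing accuracy. It loads the data with `sklearn.datasets.load_iris` rather than the CSV above, and `max_iter=1000` is an assumption chosen only to avoid convergence warnings. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# max_iter=1000 is an assumption, chosen to avoid convergence warnings
clf = LogisticRegression(max_iter=1000).fit(X, y)
y_pred = clf.predict(X)

# Three equivalent ways to get the training accuracy
acc_score_method = clf.score(X, y)          # features and true labels
acc_metric = accuracy_score(y, y_pred)      # true labels, then predictions
acc_manual = float(np.mean(y_pred == y))    # fraction of matching labels

print(acc_score_method, acc_metric, acc_manual)
```

All three values should agree, which is a quick way to confirm that `score` expects `(X, y)` and not a pair of label vectors.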


## Exercise

Train a model with `LogisticRegression`.

- Train using scikit-learn's logistic regression module.
- Get the predictions on the training set and print the score.

```python
# Load the iris dataset into a DataFrame and split it into features and target.
import numpy as np
import pandas as pd
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
iris_data = iris.data
iris_data = pd.DataFrame(iris_data, columns=iris.feature_names)
iris_data['species'] = iris.target
iris_data['species'].unique()

features = iris.feature_names
target = 'species'

X = iris_data[features]
y = iris_data[target]

# Write your code below
```

### Solution

```python
lr_iris = LogisticRegression()
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

# accuracy_score expects the true labels first, then the predictions
print(metrics.accuracy_score(y, y_pred))
```
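
A single accuracy number hides which classes get confused with each other. As a hedged sketch, a confusion matrix gives the per-class breakdown. The example reloads iris so that it is self-contained, and `max_iter=1000` is an assumption used only to avoid convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target

# max_iter=1000 is an assumption, chosen to avoid convergence warnings
clf = LogisticRegression(max_iter=1000).fit(X, y)
y_pred = clf.predict(X)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y, y_pred)
print(iris.target_names)
print(cm)

# The diagonal counts correct predictions, so its share is the accuracy
accuracy_from_cm = cm.trace() / cm.sum()
print(accuracy_from_cm)
```

Off-diagonal entries show which pairs of species the model mixes up.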


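## The multi_class Parameter

scikit-learn's `LogisticRegression` class takes a parameter called `multi_class` that selects how the algorithm handles the multiclass case. Note that scikit-learn 1.5 and later deprecate this parameter, so the calls in this section may emit a `FutureWarning` there.

Let us set this parameter explicitly, starting with the OvR approach. In scikit-learn releases before 0.22 the default solver was liblinear, which handles multiclass problems only through OvR; from 0.22 onward the default solver is lbfgs.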

## Exercise

Set `multi_class` to `'ovr'` and hit Run to execute.

```python
lr_iris = LogisticRegression(multi_class='ovr')
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

print(metrics.accuracy_score(y, y_pred))
```

```
0.96
```
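
To make the one-vs-rest idea concrete, here is a minimal sketch that builds it by hand: one binary model per class, then a pick of the most confident one. It is an illustration of the scheme, not a copy of scikit-learn's internals. The names `binary_models` and `class_scores` are made up for this example, and `max_iter=1000` is an assumption to avoid convergence warnings. The result is compared against `sklearn.multiclass.OneVsRestClassifier`, a generic wrapper that applies the same scheme.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# One binary "class k vs. everything else" model per class
binary_models = [
    LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int))
    for k in classes
]

# Column k holds the probability that a sample belongs to class k
class_scores = np.column_stack(
    [model.predict_proba(X)[:, 1] for model in binary_models]
)

# Predict the class whose binary model is most confident
y_pred_manual = classes[class_scores.argmax(axis=1)]
manual_accuracy = float(np.mean(y_pred_manual == y))

# Compare with scikit-learn's generic one-vs-rest wrapper
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
agreement = float(np.mean(ovr.predict(X) == y_pred_manual))

print(manual_accuracy, agreement)
```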


## Multinomial option

Now let us try the multinomial approach, which fits a single model with the cross-entropy loss instead of one binary model per class.

## Exercise

Set `multi_class` to `'multinomial'` and `solver` to `'newton-cg'`, then hit Run to execute.

```python
lr_iris = LogisticRegression(multi_class='multinomial', solver='newton-cg')
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

print(metrics.accuracy_score(y, y_pred))
```

```
0.9733333333333334
```
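
In the multinomial case, the class probabilities are a softmax over one linear score per class. The sketch below checks this by recomputing `predict_proba` from `decision_function`. It deliberately avoids the `multi_class` argument and assumes scikit-learn 0.22 or later, where the default configuration (lbfgs solver, three classes) already fits a multinomial model. `max_iter=1000` is also an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Assumption: scikit-learn 0.22 or later, where this default configuration
# (lbfgs solver, 3 classes) fits a multinomial model
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One linear score per class for every sample
scores = clf.decision_function(X)

# Softmax: exponentiate (shifted for numerical stability) and normalize
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax_probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

matches_predict_proba = np.allclose(softmax_probs, clf.predict_proba(X))
print(scores.shape, matches_predict_proba)
```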


## Other Solver options - lbfgs

Now let us try the other solvers that support the multinomial option, starting with lbfgs.

## Exercise

Set `multi_class` to `'multinomial'` and `solver` to `'lbfgs'`, then hit Run to execute.

```python
lr_iris = LogisticRegression(multi_class='multinomial', solver='lbfgs')
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

print(metrics.accuracy_score(y, y_pred))
```

```
0.9733333333333334
```


## Other Solver options - SAG

Next, let us try the sag solver with the multinomial option.

## Exercise

Set `multi_class` to `'multinomial'` and `solver` to `'sag'`, then hit Run to execute.

```python
lr_iris = LogisticRegression(multi_class='multinomial', solver='sag')
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

print(metrics.accuracy_score(y, y_pred))
```

```
0.9866666666666667
```




## Other Solver options - SAGA

Finally, let us try the saga solver with the multinomial option.

## Exercise

Set `multi_class` to `'multinomial'` and `solver` to `'saga'`, then hit Run to execute.

```python
lr_iris = LogisticRegression(multi_class='multinomial', solver='saga')
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

print(metrics.accuracy_score(y, y_pred))
```

```
0.9866666666666667
```
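
The four solvers can also be compared in one loop. The scikit-learn documentation notes that sag and saga are only guaranteed to converge quickly when the features are on a similar scale, so this sketch standardizes the inputs first. The pipeline, `max_iter=5000` and `random_state=0` are assumptions added for this illustration and are not part of the exercises above, so the scores need not match the outputs shown there.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

solver_scores = {}
for solver in ["newton-cg", "lbfgs", "sag", "saga"]:
    # Standardize features, then fit with the given solver
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(solver=solver, max_iter=5000, random_state=0),
    )
    solver_scores[solver] = model.fit(X, y).score(X, y)

for solver, score in solver_scores.items():
    print(f"{solver}: {score:.4f}")
```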




## GridSearchCV to tune the model

Now let us use GridSearchCV to tune `penalty`, `max_iter` and `C` for the saga solver with the multinomial option.

## Exercise

Keep `multi_class` as `'multinomial'` and `solver` as `'saga'`, then hit Run to execute.

```python
from sklearn.model_selection import GridSearchCV
import time

# Candidate values for each hyperparameter
penalty = ['l1', 'l2']
max_iter = [1, 10, 20, 40, 80, 100, 110, 120, 130, 140]
C = np.linspace(0.1, 1.0, num=5)

param_grid = dict(max_iter=max_iter, C=C, penalty=penalty)

lr_iris = LogisticRegression(multi_class='multinomial', solver='saga')

# 5-fold cross-validation over every combination in the grid
grid = GridSearchCV(estimator=lr_iris, param_grid=param_grid, cv=5)

start_time = time.time()
grid_result = grid.fit(X, y)
# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
# time.time() returns seconds, so the elapsed time is reported in seconds
print("Execution time: " + str(time.time() - start_time) + ' s')
```

```
Best: 0.986667 using {'C': 0.325, 'max_iter': 100, 'penalty': 'l2'}
Execution time: 2.9114737510681152 s
```
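
Note that `best_score_` is the mean cross-validated accuracy of the best parameter combination, not the training accuracy. The sketch below shows where it comes from by inspecting `cv_results_`. The grid here is a hypothetical, deliberately small one (three values of `C` with the default solver) so that the example runs quickly, and `max_iter=1000` is an assumption.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical, deliberately small grid so the example runs quickly
param_grid = {"C": [0.1, 1.0, 10.0]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X, y)

# One row per parameter combination, with its mean cross-validated accuracy
results = pd.DataFrame(grid.cv_results_)[
    ["param_C", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))

# best_score_ is the mean_test_score of the top-ranked row
best_row = results.loc[results["rank_test_score"].idxmin()]
print(best_row["mean_test_score"], grid.best_score_)
```

Looking at `std_test_score` next to the mean helps judge whether the gap between two parameter settings is larger than the fold-to-fold variation.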




### Solution

Just press Run to execute.

From the output above, the best parameters for the multiclass model are:

```
Best: 0.986667 using {'C': 0.325, 'max_iter': 100, 'penalty': 'l2'}
```



```python
lr_iris = LogisticRegression(multi_class='multinomial', solver='saga',
                             C=0.325, max_iter=100, penalty='l2')
lr_iris = lr_iris.fit(X, y)
y_pred = lr_iris.predict(X)

print(metrics.accuracy_score(y, y_pred))
```

```
0.9866666666666667
```




#### End of Notebook