<a href="https://colab.research.google.com/github/gtoubian/cce/blob/main/4_7_Multinomial_Logistic_Regression_and_Decision_Trees.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#4.7 Multinomial Logistic Regression and Decision Trees

In this lecture we extend logistic regression beyond the binary case, and then use decision trees for the same task.

Logistic regression is not limited to binary classification; it can also separate more than two classes.

What if we want an algorithm that discriminates between cats, dogs, birds and bees?

This is where multinomial classification comes in.

The multinomial regression function consists of two functional layers:

1. Linear prediction function (a.k.a. logit layer)
2. Softmax function (a.k.a. softmax layer)

The simplest way to think of it is as $k$ linear models being fit, one for each class. Each model produces a score for its class; we then take the softmax of these scores to turn them into probabilities, and pick the class with the highest probability.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [None]:
url = "https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv"
df = pd.read_csv(url)
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [None]:
df1 = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
X = df1.values
y = list(df['species'])

In [None]:
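# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Fit a multiclass logistic regression on the training rows
lr = LogisticRegression().fit(X_train, y_train)
# Predict on the training rows so we can compare with the known labels
yhat = lr.predict(X_train)

# Put the features, true labels and predictions side by side
dftrain = pd.DataFrame(X_train, columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
dftrain['Actual'] = y_train
dftrain['Predicted'] = yhat
dftrain.head()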

| | sepal_length | sepal_width | petal_length | petal_width | Actual | Predicted |
|---|---|---|---|---|---|---|
| 0 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor | versicolor |
| 1 | 6.5 | 3.0 | 5.5 | 1.8 | virginica | virginica |
| 2 | 6.7 | 3.3 | 5.7 | 2.5 | virginica | virginica |
| 3 | 6.0 | 2.2 | 5.0 | 1.5 | virginica | virginica |
| 4 | 6.7 | 2.5 | 5.8 | 1.8 | virginica | virginica |


In [None]:
lr.coef_

array([[-0.39770719,  0.83357708, -2.28875281, -0.98145916],
       [ 0.54449643, -0.29038568, -0.23369072, -0.65560788],
       [-0.14678924, -0.5431914 ,  2.52244353,  1.63706704]])
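Each row of `lr.coef_` holds the weights for one class and each column corresponds to one feature, which is why the array above has shape (3, 4). As a minimal sketch of the logit layer, the snippet below recomputes the class scores by hand and checks them against scikit-learn's `decision_function`. It assumes scikit-learn's bundled copy of the iris data (`load_iris`) rather than the CSV used above, and raises `max_iter` only to avoid a convergence warning, so the fitted numbers may differ slightly from those shown. The names `X_demo`, `lr_demo` and so on are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris dataset stands in for the CSV loaded above.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

# max_iter is raised only to avoid a convergence warning.
lr_demo = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# One row of weights per class, one column per feature.
print(lr_demo.coef_.shape)       # (n_classes, n_features)
print(lr_demo.intercept_.shape)  # (n_classes,)

# Logit layer: a linear score for every (sample, class) pair.
logits = X_te @ lr_demo.coef_.T + lr_demo.intercept_

# The manual scores should agree with scikit-learn's own decision_function.
print(np.allclose(logits, lr_demo.decision_function(X_te)))

# The predicted class is the one with the largest score.
manual_pred = lr_demo.classes_[np.argmax(logits, axis=1)]
print(np.array_equal(manual_pred, lr_demo.predict(X_te)))
```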

##The Softmax Function

Fitting the linear layer gives us one score per species for each data point. These scores can be any real number, so they need to be rescaled so that they are all positive and add up to 1. This is **normalizing** the scores. For a probability distribution to be valid, it must be normalized.

To normalize the scores, we use the softmax function.

$$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

where $x_i$ is the linear score (logit) computed for category $i$, and the sum in the denominator runs over all categories $j$.
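As a small illustration of the formula, here is a NumPy sketch of softmax. The input scores are made-up values for three classes, not output from the model above. Subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(scores):
    """Map a 1-D array of real-valued scores to probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical scores for three classes (illustrative values only).
example_scores = np.array([-4.0, 2.0, 5.0])
probs = softmax(example_scores)

print(probs)             # every entry lies between 0 and 1
print(probs.sum())       # the entries add up to 1 (up to floating-point error)
print(np.argmax(probs))  # index of the most probable class
```

Because softmax preserves the ordering of its inputs, the class with the largest score is also the class with the largest probability.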

In [None]:
lr.predict_proba(X_test)[0]

array([1.31143228e-04, 5.99446815e-02, 9.39924175e-01])

In [None]:
np.sum(lr.predict_proba(X_test)[0])

1.0

##Accuracy

Now let's test to see the accuracy of the model on both our training and test sets.

The **Accuracy** of a model is the ratio between the number of correct predictions of our model and the total number of predictions that the model makes.

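$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

where TN = True Negative, TP = True Positive, FP = False Positive and FN = False Negative. The formula is written for the binary case; with more than two classes, the same ratio of correct predictions to total predictions applies.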

Other terms related to this are **Precision** and **Recall**.

Precision is the fraction of positive predictions that are correct.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall is the fraction of actual positive points that the model correctly predicts as positive.

$$\text{Recall} = \frac{TP}{TP + FN}$$

**NOTE:** For further study you can also look at *sensitivity*, *specificity* and *bias*.
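For a multiclass problem, precision and recall are usually computed one class at a time, treating that class as "positive" and all the others as "negative". The sketch below does this from a confusion matrix. The label lists are invented for illustration and are not results from the iris model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels (illustrative only).
labels = ["setosa", "versicolor", "virginica"]
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

tp = np.diag(cm)            # correct predictions for each class
fp = cm.sum(axis=0) - tp    # predicted as the class but actually another
fn = cm.sum(axis=1) - tp    # actually the class but predicted as another

accuracy = tp.sum() / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(accuracy)
for name, p, r in zip(labels, precision, recall):
    print(name, "precision:", p, "recall:", r)
```

In this toy example one versicolor point is predicted as virginica, which lowers the recall for versicolor and the precision for virginica while leaving setosa untouched.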




In [None]:
accuracy_score(y_train, yhat)

0.9809523809523809

In [None]:
yhat = lr.predict(X_test)
accuracy_score(y_test, yhat)

0.9777777777777777

#Decision Tree

Decision trees are classifiers that split up our data points based on a sequence of distinct criteria. In doing so, the model learns a set of rules that it can use to classify new data. Watch the video below for an overview of decision trees.

https://www.youtube.com/watch?v=eKD5gxPPeY0
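To see the kind of criteria a tree learns, the following sketch fits a small tree and prints its splits as text using scikit-learn's `export_text`. It assumes the bundled iris data (`load_iris`) instead of the CSV above, and `max_depth=2` is chosen here only to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the bundled iris dataset stands in for the CSV loaded above.
iris = load_iris()

# A shallow tree (max_depth=2 is an illustrative choice) keeps the rules readable.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
small_tree.fit(iris.data, iris.target)

# Each indented line is a threshold test on one feature; leaves show the class.
rules = export_text(small_tree, feature_names=list(iris.feature_names))
print(rules)
```

Reading the printout from top to bottom follows the same path a new data point would take through the tree.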


In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
clf = DecisionTreeClassifier(random_state=0)

In [None]:
model = clf.fit(X_train, y_train)

In [None]:
yhat = model.predict(X_train)

In [None]:
# Show the true labels next to the predictions, one named column each
pd.DataFrame({'Actual': y_train, 'Predicted': yhat})

In [None]:
yhat = model.predict(X_test)

In [None]:
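# Show the true labels next to the predictions, one named column each
pd.DataFrame({'Actual': y_test, 'Predicted': yhat})

| | Actual | Predicted |
|---|---|---|
| 0 | virginica | virginica |
| 1 | versicolor | versicolor |
| 2 | setosa | setosa |
| 3 | virginica | virginica |
| 4 | setosa | setosa |
| 5 | virginica | virginica |
| 6 | setosa | setosa |
| 7 | versicolor | versicolor |
| 8 | versicolor | versicolor |
| 9 | versicolor | versicolor |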


In [None]:
model.score(X_train, y_train)

1.0

In [None]:
model.score(X_test, y_test)

0.9777777777777777