# Multiclass Classification: One vs All
### A study of the Iris dataset

Sometimes we need to do classification where the result can be one of several discrete types.
That is, y can take ***multiple discrete values***, and not only the two values 0 and 1, as before.

In this case we can use Logistic Regression to classify the data by using the ***one vs all approach***.
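
As a compact statement of the idea, here is a sketch in standard notation. The symbols $K$ (number of classes) and $\theta^{(k)}$ (the parameter vector of the classifier for class $k$) are introduced here for illustration and are not used elsewhere in this notebook. Each binary classifier estimates how likely an example is to belong to its own class, and the predicted class is the one whose classifier is most confident:

```latex
h^{(k)}_{\theta}(x) = \frac{1}{1 + e^{-\theta^{(k)\top} x}} \approx P(y = k \mid x),
\qquad k = 0, 1, \dots, K-1

\hat{y} = \arg\max_{k} \; h^{(k)}_{\theta}(x)
```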

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We are going to use the well-known Iris dataset. We will classify iris flowers into their three species, ***Setosa, Versicolour and Virginica***, based on the ***length and width of their petals and sepals***.

In [10]:
from sklearn.datasets import load_iris
iris = load_iris()
data = iris.data
print(type(data))
print(iris.feature_names,'are the feature names of the iris dataset')

<class 'numpy.ndarray'>
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'] are the feature names of the iris dataset


In [14]:
data = pd.DataFrame(data, columns=['s_l','s_w','p_l','p_w']) 
print(data.head(5))
print(data.tail(5))

   s_l  s_w  p_l  p_w
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2
     s_l  s_w  p_l  p_w
145  6.7  3.0  5.2  2.3
146  6.3  2.5  5.0  1.9
147  6.5  3.0  5.2  2.0
148  6.2  3.4  5.4  2.3
149  5.9  3.0  5.1  1.8


The columns above are the features used to classify an iris flower into one of three species. Now we will load iris.target into the vector Y, which takes one of three values repeatedly, each corresponding to a species:
    
    y = 0 --> Setosa
    y = 1 --> Versicolour
    y = 2 --> Virginica

In [16]:
Y = pd.Series(iris.target)
print(Y.head(5))
print(Y.tail(5))

0    0
1    0
2    0
3    0
4    0
dtype: int64
145    2
146    2
147    2
148    2
149    2
dtype: int64


In the one vs all technique, we run logistic regression once per class: one class is treated as the positive class and all the remaining classes are grouped together as a single negative class, which reduces the problem to binary Logistic Regression.
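
The relabelling step can be sketched as follows. This is a minimal illustration, not the notebook's own code: `one_vs_rest_targets` is a hypothetical helper, and `y` is a stand-in array with the same layout as `iris.target` (50 samples each of classes 0, 1 and 2), so the snippet runs without loading the dataset.

```python
import numpy as np

# Stand-in for iris.target: 50 samples each of classes 0, 1 and 2 (assumed layout).
y = np.repeat([0, 1, 2], 50)

def one_vs_rest_targets(y, positive_class):
    """Return an (m, 1) float array: 1.0 for positive_class, 0.0 for every other class."""
    return (y == positive_class).astype(float).reshape(-1, 1)

# One binary target vector per class; each could be fed to a binary classifier.
binary_targets = {int(k): one_vs_rest_targets(y, k) for k in np.unique(y)}

for k, t in binary_targets.items():
    print(k, t.shape, int(t.sum()))
```

Each of the three target vectors marks exactly one species as positive, so three separate binary problems come out of the single three-class problem.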

So let's do it for Setosa vs. rest first.

In [18]:
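def Gradient_Descent(x, y, l_t, itr):
    # Batch gradient descent for binary logistic regression.
    # x: (m,n) feature array; y: (m,) or (m,1) NumPy array of targets
    # l_t: learning rate; itr: number of iterations
    m = x.shape[0]
    y = y.reshape(m,1)
    # prepend a column of ones so that theta[0,0] acts as the intercept
    x = np.concatenate((np.ones((m,1)),x), axis=1)
    n = x.shape[1]
    # small random initial weights, shape (1,n); unseeded, so results vary between runs
    theta = np.random.random(size=(1,n))*0.01

    for i in range(itr):
        # sigmoid hypothesis for every row, shape (m,1)
        h_xi = (1/(1 + np.exp(-np.dot(x,np.transpose(theta)))))
        # step against the gradient of the mean cross-entropy loss
        theta = theta - (l_t/m) * np.dot(np.transpose(h_xi- y) , x)

    return theta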
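
For reference, the loop in `Gradient_Descent` corresponds to the following update, written here as a sketch. The symbols are a notational assumption: $X$ is the feature matrix with the column of ones prepended, $\theta$ is the $1 \times n$ row vector, $\alpha$ stands for `l_t`, and $m$ is the number of rows. The update is the gradient step for the mean cross-entropy loss $J(\theta)$:

```latex
h = \sigma\!\left(X \theta^{\top}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h_i + (1 - y_i) \log (1 - h_i) \right]

\theta \leftarrow \theta - \frac{\alpha}{m} \, (h - y)^{\top} X
```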

In [46]:
x1 = data.values
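# take the first 50 targets from the underlying NumPy array (tuple-style indexing on a Series fails in recent pandas)
y1 = np.concatenate((Y.values[0:50].reshape(50,1), (np.ones((100,1))*3)), axis=0)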
print( y1[:5,:], y1[145:150,:], sep='\n')

[[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
[[3.]
 [3.]
 [3.]
 [3.]
 [3.]]


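So in the array y1, indices 0 to 49 have value 0, indicating Setosa, and indices 50 to 149 have value 3, indicating the rest. Note that the sigmoid hypothesis only produces values between 0 and 1, so a target of 3 can never be matched; the two groups need to be encoded as 0 and 1 for the fitted theta to be meaningful as a binary logistic regression.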
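
Binary logistic regression expects targets in {0, 1}, because the sigmoid output stays in that range. The short sketch below illustrates what a target of 3 does to the residual `h - y` that drives the gradient step. The names `sigmoid`, `z`, `residual_target_3` and `residual_target_1` are introduced here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# A range of linear scores, from strongly negative to strongly positive.
z = np.array([-5.0, 0.0, 5.0, 50.0])
h = sigmoid(z)

# With a target of 3 the residual is at most -2, however large the score gets,
# so the gradient keeps pushing theta in the same direction.
residual_target_3 = h - 3.0

# With a target of 1 the residual shrinks towards 0 as the score grows,
# so the updates die down once the example is fitted.
residual_target_1 = h - 1.0

print(residual_target_3)
print(residual_target_1)
```

This is one way to see why the weights keep growing when the "rest" group is labelled 3, and why a 0/1 encoding is the usual choice.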

In [48]:
theta1 = Gradient_Descent(x1, y1, 0.01, 100)
print(theta1)

[[1.01939523 6.74629184 2.7235824  6.09503949 2.16395269]]
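
A fitted theta of shape (1, n+1) can be turned into probabilities and class decisions by applying the same hypothesis used during training. The sketch below assumes this layout; `predict_proba`, `theta_demo` and `x_demo` are hypothetical names, and the weights are hand-picked for illustration rather than taken from the output above.

```python
import numpy as np

def predict_proba(x, theta):
    """Return sigmoid(theta . [1, x]) for each row of x; theta has shape (1, n+1)."""
    m = x.shape[0]
    # Prepend the same column of ones that was used during training.
    x_b = np.concatenate((np.ones((m, 1)), x), axis=1)
    return 1.0 / (1.0 + np.exp(-np.dot(x_b, theta.T)))

# Hypothetical weights: intercept 0, then one weight per feature.
theta_demo = np.array([[0.0, 1.0, -1.0]])
x_demo = np.array([[2.0, 1.0],
                   [1.0, 3.0]])

proba = predict_proba(x_demo, theta_demo)   # shape (2, 1)
labels = (proba >= 0.5).astype(int)         # threshold at 0.5

print(proba.ravel())
print(labels.ravel())
```

The first row has a linear score of 1 and is assigned to the positive class, while the second has a score of -2 and is assigned to the negative class.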
