# 3.2.2 Logistic Regression - Multiclass Classification - One Vs All Implementation

### Challenge 3: [Implementing Regression Algorithms from Scratch]

This project is a part of #100MLProjects

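One-vs-All (OvA) is a strategy for extending logistic regression, which by default performs binary classification, to more than two classes. One binary classifier is trained per class ("this class" against all the others), and the class whose classifier gives the highest score is the prediction.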
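As a minimal sketch of the relabelling step behind One-vs-All, the snippet below turns a multiclass label vector into one binary target vector per class. The toy labels `y_toy` and the name `binary_targets` are made up for illustration and are not part of the notebook.

```python
import numpy as np

# Hypothetical toy labels for three classes (not the Iris data).
y_toy = np.array([0, 2, 1, 2, 0])

# One binary target vector per class: 1 for "this class", 0 for all the rest.
binary_targets = {int(c): np.where(y_toy == c, 1, 0) for c in np.unique(y_toy)}

for label, target in binary_targets.items():
    print(label, target)
```

Each of these binary vectors can then be used to train an ordinary binary classifier.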

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

In [2]:
dataset = pd.read_csv('iris.data')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [3]:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
y = encoder.fit_transform(y)

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [8]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train, y_train)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

In [9]:
y_pred = lr.predict(X_test)

In [10]:
from sklearn.metrics import accuracy_score, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

[[16  0  0]
 [ 0 14  2]
 [ 0  0 13]]
0.9555555555555556
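Whether a plain `LogisticRegression` uses One-vs-All internally depends on the scikit-learn version and its settings, so for an explicitly One-vs-All baseline it may be clearer to wrap it in `OneVsRestClassifier`. The sketch below assumes the copy of Iris bundled with scikit-learn (`load_iris`) as a stand-in for `iris.data`, and `max_iter=1000` is an arbitrary choice to give the solver room to converge; the resulting numbers may differ from the ones above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Assumption: the bundled Iris data stands in for the local 'iris.data' file.
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0
)

# One binary LogisticRegression is fitted per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_tr, y_tr)

ovr_accuracy = accuracy_score(y_te, ovr.predict(X_te))
print(len(ovr.estimators_), ovr_accuracy)
```

`ovr.estimators_` holds the fitted binary models, one per class, which is the same structure the custom class below builds by hand.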


## One vs All LogisticRegression Implementation

In [173]:
class CustomOvALinearRegression:
  """Custom One-vs-All classifier built from one binary model per class.

  Parameters
  ----------
  eta : float
    Learning rate (between 0.0 and 1.0)
  n_iter : int
    Number of epochs (passes over the training set)
  random_state : int
    Random number generator seed for random weight initialization.

  Attributes
  ----------
  w_array_ : list of (weights, label) tuples
    One entry per class. The first element of each weight vector is the
    bias; the rest are the coefficients.
  """
  def __init__(self, eta=0.03, n_iter=3000, random_state=123):
    self.eta = eta
    self.n_iter = n_iter
    self.random_state = random_state
    self.w_array_ = []

    
  def sigmoid(self, z):
    """Compute the sigmoid (logistic) function.

    Parameters
    ----------
    z : float or array-like
      Net input, e.g. np.dot(x, w).

    Returns
    -------
    sigmoid_value : float or ndarray, same shape as z
      Sigmoid of the input, between 0 and 1.
    """
    return 1 / (1 + np.exp(-z))

  def cost_function(self, y_hat, y):
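    """Compute the binary cross-entropy (log loss) cost.

    Parameters
    ----------
    y_hat : array-like, shape=[n_samples]
      Predicted probabilities.
    y : array-like, shape=[n_samples]
      Ground truth labels (0 or 1).

    Returns
    -------
    cost : float
      Mean cross-entropy over the samples.
    """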
    n = len(y)
    cost = (1/n)*(np.sum( np.dot(-y.T, np.log(y_hat)) - np.dot((1-y).T, np.log(1-y_hat)) )) 
    return cost
    
  def fit(self, X, y):
    """Fit training data.

    Parameters
    ----------
    X : {array-like}, shape=[n_samples, n_features]
      Training vectors where n_samples is the number of datapoints,
      and n_features is the number of features.
    y : array_like, shape=[n_samples]
      Target Values

    Returns
    -------
    self : object
    """
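    np.random.seed(self.random_state)
    # Start from an empty list so that refitting does not keep old classifiers.
    self.w_array_ = []

    # Prepend a column of ones so the first weight acts as the bias.
    X_mod = np.insert(X, 0, 1, axis=1)
    for target in np.unique(y):
        # Binary targets: 1 for the current class, 0 for all others.
        y_mod = np.where(y == target, 1, 0)
        w_ = np.random.rand(X_mod.shape[1])
        w_, cost = self.gradient_descent(X_mod, y_mod, w_)
        # Keep the label next to its weights so predict() can return it.
        self.w_array_.append((w_, target))
    return self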

  def gradient_descent(self, X, y, w):
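    """Run batch gradient descent for one binary (one-vs-all) problem.

    Note: the update below uses the linear output np.dot(X, w) and a
    squared-error cost; the sigmoid is applied only in predict().

    Parameters
    ----------
    X : {array-like}, shape=[n_samples, n_features + 1]
      Training vectors with a leading column of ones for the bias.
    y : array_like, shape=[n_samples]
      Binary target values (0 or 1).
    w : {array-like}, shape=[n_features + 1]
      Initial weights, bias first.

    Returns
    -------
    w : {array-like}, shape=[n_features + 1]
      Optimized coefficients with bias unit.
    cost : float
      Squared-error cost computed in the last iteration.
    """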
    n = y.size
    for _ in range(self.n_iter):
      y_pred = np.dot(X, w)
      error = y_pred - y
      cost = (1/(2*n)) * np.dot(error.T, error)
      w = w - (self.eta * (1/n) * np.dot(X.T, error))
    return w, cost

  def predict(self, X):
    """Predict class labels for new datapoints.

    Parameters
    ----------
    X : {array-like}, shape=[n_samples, n_features]
      Samples to classify.

    Returns
    -------
    y_pred : list, length n_samples
      Predicted class label for each sample: the label whose classifier
      gives the highest sigmoid score.
    """
    X = np.insert(X, 0, 1, axis=1)
    y_pred = [max( (self.sigmoid(np.dot(xi, w_)), target) for w_,target in self.w_array_)[1] for xi in X]
    print(y_pred)
    return y_pred
#     return [1 if i > 0.5 else 0 for i in self.sigmoid(y_pred)]

#     pred = [max(self.sigmoid(i.dot(w_)) for w_ in self.w_array_) for i in X]

In [174]:
clr = CustomOvALinearRegression()

In [175]:
clr = clr.fit(X_train, y_train)

In [176]:
pred = clr.predict(X_test)

# print(confusion_matrix(y_test, pred))
# print(accuracy_score(y_test, pred))

[1, 2, 1, 1, 0, 2, 2, 1, 1, 2, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0, 0, 0, 2, 0, 1, 1, 2, 0, 0, 2, 0, 1, 1, 2, 0, 2, 2, 1, 1, 0, 2, 0, 1, 2, 2]


In [179]:
print("Confusion Matrix: ")
print(confusion_matrix(y_test, pred))
print("\nAccuracy Score: ")
print(accuracy_score(y_test, pred))

Confusion Matrix: 
[[16  0  0]
 [ 0 12  4]
 [ 0  3 10]]

Accuracy Score: 
0.8444444444444444
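Note that `gradient_descent` in the class above computes its error from the linear output `np.dot(X, w)`, so each per-class model is effectively fitted by least squares and the sigmoid only enters in `predict`. A minimal sketch of a logistic version of that update is shown below, assuming the same learning rate and epoch defaults. The function name `logistic_gradient_descent` and the toy data are hypothetical, and no claim is made here about how it would change the accuracy above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, w, eta=0.03, n_iter=3000):
    """Batch gradient descent on the mean cross-entropy (hypothetical helper)."""
    n = y.size
    for _ in range(n_iter):
        y_hat = sigmoid(np.dot(X, w))   # predicted probabilities
        error = y_hat - y               # derivative of the cost w.r.t. the net input
        w = w - eta * (1.0 / n) * np.dot(X.T, error)
    # Cross-entropy of the final weights; clip to avoid log(0).
    y_hat = np.clip(sigmoid(np.dot(X, w)), 1e-12, 1 - 1e-12)
    cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return w, cost

# Toy data for illustration: the first column is the bias term.
X_toy = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])
w_toy, cost_toy = logistic_gradient_descent(X_toy, y_toy, np.zeros(2))
print(w_toy, cost_toy)
```

The update rule has the same form as in the class; the only difference is that the prediction passes through the sigmoid before the error is computed.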


## Conclusion

I have built a multiclass classifier using One-vs-All logistic regression. On this train/test split it is less accurate than scikit-learn's implementation (0.844 vs. 0.956 accuracy), but it was a great way to learn how logistic regression is extended to multiclass classification.

I used the following references:

- *Python Machine Learning* by Sebastian Raschka, a very good book.
- [Logistic regression from scratch: multi-classification with One-vs-All (Analytics Vidhya on Medium)](https://medium.com/analytics-vidhya/logistic-regression-from-scratch-multi-classification-with-onevsall-d5c2acf0c37c)

### Challenges Faced:

After reading the theory, I was able to implement the classifier up to the step where I compute the sigmoid score of each datapoint under every per-class model and take the maximum. However, my initial design did not keep track of which label each score belonged to, so the maximum could not be mapped back to a class.

Fixing that is a simple logic problem, but I was pressuring myself to find a clean solution. I later found the article listed in the references above. I'm not fully satisfied with that solution either, but it is simple and works fine: store the label alongside the coefficients.
```
self.w_array_.append((w_, target)) 
```
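One possible alternative to storing `(weights, label)` tuples, sketched here under assumptions, is to stack the per-class weight vectors into a matrix, keep the labels in a parallel array, and map the winning row index back to a label with `np.argmax`. The function `predict_ova`, the weight matrix `W_demo`, and the labels `classes_demo` are hypothetical and chosen only to make the example easy to follow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_ova(X, W, classes):
    """Return, for each row of X, the label whose classifier scores highest.

    W has one row of weights per class (bias first); classes[i] is the
    label that belongs to row i of W.
    """
    X_mod = np.insert(X, 0, 1, axis=1)      # add the bias column
    scores = sigmoid(X_mod @ W.T)           # shape: (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]

# Hypothetical weights and labels, for illustration only.
classes_demo = np.array([10, 20, 30])
W_demo = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, -1.0, -1.0]])
X_new = np.array([[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]])
print(predict_ova(X_new, W_demo, classes_demo))
```

Because row `i` of the weight matrix and entry `i` of the label array always refer to the same class, the association between a score and its label is kept by position rather than by pairing them in a tuple.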

