# Introduction

In this notebook I implement the Logistic Regression algorithm.<br>
To take advantage of sklearn's consistent, easy-to-use interface, I build the custom implementation on sklearn's base classes **BaseEstimator** and **ClassifierMixin**.

The name Logistic Regression is misleading, because Logistic Regression is a classification algorithm. It is a probability-based model: it estimates the probability that a sample belongs to class 1 and predicts the binary label from that probability.<br>

Logistic Regression can also be extended to multi-class classification.

In [67]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.base import BaseEstimator, ClassifierMixin

# Preprocessing

## Loading Dataset

In [68]:
breast = load_breast_cancer()
dataset = breast

X = dataset.data
y = dataset.target

xTrain, xTest, yTrain, yTest = train_test_split(X,y)

sc = StandardScaler().fit(xTrain)
xTrain = sc.transform(xTrain)
xTest = sc.transform(xTest)

# Logistic Regression

Logistic Regression is built on a mathematical function called the Logistic Function, whose output is always between 0 and 1. The output is interpreted as the probability that a sample belongs to class 1.<br>

Let's look at the formulas related to logistic regression. <br>

**Logistic Function:**
## $ f(x) = \frac{1}{1 + e^{-(b_0 + b_1 x_1)}} $
where $b_0$ is the intercept and $b_1$ is the slope, the same quantities we compute in Linear Regression (the code below stores them together in `ms`)
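
As a quick check of the formula, here is a minimal NumPy sketch. The function name `logistic` and the default values of `b0` and `b1` are illustrative assumptions, not part of the notebook's classes. It shows that the outputs stay strictly between 0 and 1 and how a 0.5 threshold turns them into labels.

```python
import numpy as np

def logistic(x, b0=0.0, b1=1.0):
    """Logistic function f(x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    z = b0 + b1 * np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# A few sample inputs (made-up values, for illustration only)
xs = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
probs = logistic(xs)                  # probabilities of class 1
labels = (probs >= 0.5).astype(int)   # threshold at 0.5 to get binary labels

print(probs)
print(labels)
```

Inputs far below zero map close to 0, inputs far above zero map close to 1, and `x = 0` (with `b0 = 0`) sits exactly on the 0.5 decision boundary.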

In [86]:
# Custom Logistic Regression
class CustomLogisticRegression(BaseEstimator, ClassifierMixin):
    def __init__(self, max_iter=1000, alpha=0.0001):
        self.max_iter = max_iter
        self.alpha = alpha
    
    
    def fit(self, X, y=None):
        self._check_params(X, y)
        self._logistic_regression(X, y)
        self.fitted_ = True
        
        return self
    
    
    def predict(self, X):
        # fitted_ only exists after fit(); getattr avoids an AttributeError here
        if not getattr(self, "fitted_", False):
            raise Exception('"predict()" called before fit()')
        else:
            ones = np.ones((X.shape[0], 1))
            X = np.hstack((ones, X))
            
            ys = np.dot(X, self.ms)
            ps = self._sigmoid(ys)
            y_pred = np.round(ps)
            
            return y_pred
    
    
    def _sigmoid(self, y):
        # tanh form of 1 / (1 + exp(-y)); avoids overflow in np.exp for large |y|
        return 0.5 * (1 + np.tanh(0.5 * np.asarray(y)))
    
    
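    def _logistic_regression(self, X,y=None):
        # Prepend a column of ones so ms[0] acts as the intercept
        ones = np.ones((X.shape[0], 1))
        X = np.hstack((ones, X))
        
        ms = np.zeros(X.shape[1])
        
        for _ in range(self.max_iter):
            # Predicted probabilities for the current weights
            ys = np.dot(X, ms)
            ps = self._sigmoid(ys)
            
            # Gradient of the log-likelihood over the full batch: X^T (y - p)
            errors = y - ps
            slope = np.dot(X.T, errors)
            
            # Gradient ascent step with learning rate alpha
            ms += self.alpha * slope
        self.ms = ms
        return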
    
    
    def _check_params(self, X, y=None):
        pass
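
The update in `_logistic_regression` can be read as gradient ascent on the log-likelihood. A short derivation sketch, writing $w$ for the weight vector the code calls `ms` and $\sigma$ for the logistic function (both symbols are introduced here for illustration and do not appear in the notebook):

$$ p_i = \sigma(x_i^\top w), \qquad \ell(w) = \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] $$

$$ \nabla_w \ell(w) = \sum_i (y_i - p_i)\, x_i = X^\top (y - p) $$

$$ w \leftarrow w + \alpha\, X^\top (y - p) $$

The last line corresponds to `ms += self.alpha * slope`, with `errors` playing the role of $y - p$.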

### LogisticRegression vs CustomLogisticRegression

In [87]:
skModel = LogisticRegression(max_iter=1000).fit(xTrain, yTrain)
custModel = CustomLogisticRegression().fit(xTrain, yTrain)

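# cross_val_score clones each estimator and refits it on folds of the test split,
# so these scores do not come from the models fitted on xTrain above
print(cross_val_score(cv=5,estimator=skModel,X=xTest,y=yTest))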
print(cross_val_score(cv=5,estimator=custModel,X=xTest,y=yTest))

[ 0.93333333  1.          1.          1.          0.89285714]
[ 0.96666667  1.          1.          1.          0.89285714]


# Stochastic Gradient Descent Classifier

In [107]:
class CustomSGDClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, max_iter=1000, alpha=0.0001):
        self.max_iter = max_iter
        self.alpha = alpha
    
    
    def fit(self, X, y=None):
        self._check_params(X, y)
        self._logistic_regression(X, y)
        self.fitted_ = True
        
        return self
    
    
    def predict(self, X):
        # fitted_ only exists after fit(); getattr avoids an AttributeError here
        if not getattr(self, "fitted_", False):
            raise Exception('"predict()" called before fit()')
        else:
            ones = np.ones((X.shape[0], 1))
            X = np.hstack((ones, X))
            
            ys = np.dot(X, self.ms)
            ps = self._sigmoid(ys)
            y_pred = np.round(ps)
            
            return y_pred
    
    
    def _sigmoid(self, y):
        # tanh form of 1 / (1 + exp(-y)); avoids overflow in np.exp for large |y|
        return 0.5 * (1 + np.tanh(0.5 * np.asarray(y)))
    
    
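    def _logistic_regression(self, X,y=None):
        # Prepend a column of ones so ms[0] acts as the intercept
        ones = np.ones((X.shape[0], 1))
        X = np.hstack((ones, X))
        
        ms = np.zeros(X.shape[1])
        
        for _ in range(self.max_iter):
            # One pass over the data, updating the weights after every sample
            for i in range(X.shape[0]):
                ys = np.dot(X[i,:], ms)
                ps = self._sigmoid([ys])[0]

                # Per-sample gradient: (y_i - p_i) * x_i
                error = y[i] - ps
                slope = np.dot(X[i,:].T, error)

                ms += self.alpha * slope
        self.ms = ms
        return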
    
    
    def _check_params(self, X, y=None):
        pass
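
`CustomSGDClassifier` visits the rows in the same order on every epoch. A common variation is to shuffle the visiting order each epoch. The sketch below is a standalone illustration of that idea, not the notebook's method: the function names `sigmoid` and `sgd_logistic_fit`, the `seed` parameter, the chosen `alpha` and `max_iter`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow in exp for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def sgd_logistic_fit(X, y, alpha=0.01, max_iter=50, seed=0):
    """Per-sample gradient updates with a fresh sample order each epoch."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack((np.ones((X.shape[0], 1)), X))  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iter):
        for i in rng.permutation(Xb.shape[0]):  # shuffled visiting order
            p = sigmoid(Xb[i] @ w)
            w += alpha * (y[i] - p) * Xb[i]     # update from a single row
    return w

# Tiny synthetic two-cluster problem (made-up data, for illustration only)
rng = np.random.default_rng(42)
X_demo = np.vstack((rng.normal(-2.0, 1.0, (50, 2)),
                    rng.normal(2.0, 1.0, (50, 2))))
y_demo = np.array([0] * 50 + [1] * 50)

w = sgd_logistic_fit(X_demo, y_demo)
Xb_demo = np.hstack((np.ones((100, 1)), X_demo))
acc = np.mean((sigmoid(Xb_demo @ w) >= 0.5).astype(int) == y_demo)
print(w, acc)
```

The per-row update is the same rule as in `CustomSGDClassifier`; only the order in which rows are visited changes between epochs.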

In [108]:
# Note: SGDClassifier defaults to loss="hinge" (a linear SVM), not the logistic loss
skModel = SGDClassifier(max_iter=1000).fit(xTrain, yTrain)
custModel = CustomSGDClassifier().fit(xTrain, yTrain)

print(cross_val_score(cv=5,estimator=skModel,X=xTest,y=yTest))
print(cross_val_score(cv=5,estimator=custModel,X=xTest,y=yTest))

[ 0.93333333  0.96551724  0.96428571  1.          0.92857143]
[ 0.96666667  1.          1.          1.          0.89285714]
