# Logistic Regression

Logistic regression is used to describe data and explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Logistic regression models the probability that an input belongs to a particular category: given a set of inputs X, we want to assign each of them to one of two possible categories (0 or 1).


A function takes some input and gives some output. To get probabilities, logistic regression uses a function that returns a value between 0 and 1 for every value of the input X. This function is called the `Sigmoid function` and is shown in the following image.


![sigmoid curve](https://miro.medium.com/max/700/1*HXCBO-Wx5XhuY_OwMl0Phw.png)

Sigmoid function:
![sigmoid function](https://miro.medium.com/max/292/1*p4hYc2VwJqoLWwl_mV0Vjw.png)
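
As a quick numeric check of the curve above, here is a minimal sketch of the sigmoid using only the standard library. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original notebook.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs: a negative value, zero, and a positive value.
for z in (-5.0, 0.0, 5.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

Negative inputs land close to 0, positive inputs close to 1, and an input of exactly 0 gives 0.5, which is why 0.5 is a natural default decision threshold.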


Functions have parameters/weights (represented by theta here) and we want to find the best values for them. To start, we pick initial values (random values, or zeros as in the code further down) and we need a way to measure how well the algorithm performs with those weights. That measure is computed using the loss function.


Loss function:
![](https://miro.medium.com/max/700/1*FdxEs8Iv_43Q8calTCjnow.png)
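
To make the loss concrete, the sketch below evaluates the mean log loss (binary cross-entropy), the same quantity the `loss` method in the class further down computes with numpy. The helper name `log_loss` and the probability lists are made up for illustration.

```python
import math

def log_loss(probs, labels):
    """Mean binary cross-entropy of predicted probabilities against 0/1 labels."""
    total = 0.0
    for h, y in zip(probs, labels):
        # Only one of the two terms is non-zero for each sample.
        total += -y * math.log(h) - (1 - y) * math.log(1 - h)
    return total / len(labels)

labels = [1, 0, 1]             # made-up true classes
confident = [0.9, 0.1, 0.8]    # probabilities that mostly agree with the labels
poor = [0.4, 0.6, 0.3]         # probabilities that mostly disagree

print(log_loss(confident, labels))
print(log_loss(poor, labels))
```

The second value is larger than the first: the loss grows as the predicted probabilities move away from the true labels, which is what makes it usable as a training signal.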


Our goal is to minimize this loss function by increasing or decreasing the weights. To decide which way to move each weight, we take the derivative of the loss function with respect to that weight, which tells us how the loss would change if we modified it. This is gradient descent: we get the updated weights by subtracting the derivative times the learning rate.

Partial derivative:
![gradient descent](https://miro.medium.com/max/536/1*gobKgGbRWDAoVFAan_HjxQ.png)
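
A single update can be traced by hand on a tiny dataset. The sketch below assumes the gradient used in the class further down, the mean over samples of `(h - y) * x_j`; the function name `gradient_step` and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(X, y, theta, lr):
    """Return the weights after one gradient descent update on the log loss."""
    m = len(y)
    # Predicted probability for every row: h = sigmoid(x . theta)
    h = [sigmoid(sum(xj * tj for xj, tj in zip(row, theta))) for row in X]
    # Partial derivative for weight j: mean over rows of (h - y) * x_j
    grad = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
            for j in range(len(theta))]
    # Move each weight against its derivative, scaled by the learning rate.
    return [tj - lr * gj for tj, gj in zip(theta, grad)]

# Toy data (made up): the first column is the constant 1 for the intercept.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
theta = gradient_step(X, y, [0.0, 0.0], lr=0.1)
print(theta)
```

Starting from zero weights every prediction is 0.5, so the intercept gradient cancels out, while the feature weight moves in the positive direction because larger feature values go with class 1.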


This process is repeated until the loss stops improving or, as in the code further down, for a fixed number of iterations.

For prediction, the sigmoid function gives us the probability that some input x belongs to class 1. Let's assign all probabilities ≥ 0.5 to class 1 and all probabilities < 0.5 to class 0. This threshold should be chosen to suit the business problem at hand.
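
The thresholding rule can be sketched in a few lines. The helper `classify` and the probabilities are illustrative only; the class further down does the same comparison with numpy in its `predict` method.

```python
def classify(probs, threshold=0.5):
    """Assign class 1 when the probability is >= threshold, otherwise class 0."""
    return [1 if p >= threshold else 0 for p in probs]

probs = [0.10, 0.45, 0.50, 0.80]   # made-up predicted probabilities

print(classify(probs))        # default threshold of 0.5
print(classify(probs, 0.3))   # a lower threshold flags more inputs as class 1
```

Lowering the threshold trades more false positives for fewer missed positives, which is why the right value depends on the cost of each kind of error in the problem being solved.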

Converting all this theory to code, we get:

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

In [2]:
class LogisticRegression:
    """
    Logistic regression from scratch using numpy
    """
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, theta=0, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.theta = theta
        self.verbose = verbose
    
    def add_intercept(self, X):
        """
        Prepend a column of ones to X so that the first weight in theta acts as the intercept.
        """
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)
    
    def sigmoid(self, z):
        """
        Map z to a probability between 0 and 1.
        """
        return 1 / (1 + np.exp(-z))
    
    def loss(self, h, y):
        """
        Mean log loss (binary cross-entropy) of predicted probabilities h against labels y.
        """
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    
    def fit(self, X, y):
        if self.fit_intercept:
            X = self.add_intercept(X)
        
        # Initialize weights
        self.theta = np.zeros(X.shape[1])
        
        for i in range(self.num_iter):
            z = np.dot(X, self.theta)
            h = self.sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.theta -= self.lr * gradient
            
            if self.verbose and i % 10000 == 0:
                z = np.dot(X, self.theta)
                h = self.sigmoid(z)
                print(f'loss: {self.loss(h, y)} \t')
    
    def predict_probablities(self, X):
        if self.fit_intercept:
            X = self.add_intercept(X)
    
        return self.sigmoid(np.dot(X, self.theta))
    
    def predict(self, X, threshold):
        return self.predict_probablities(X) >= threshold

In [3]:
model = LogisticRegression(lr=0.1, num_iter=300000)

iris = load_iris()
X = iris.data[:, :2]
y = (iris.target != 0) * 1

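# Note: iris is stored in class order, so the last 50 rows all have y == 1.
# This slice-based split therefore tests on a single class only.
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]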


In [4]:
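# Note: `%time` on a line by itself times an empty statement, so the figures below
# do not measure the fit; `%%time` as the first line of the cell would time the whole cell.
%time 
model.fit(X_train, y_train)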

CPU times: user 10 µs, sys: 1 µs, total: 11 µs
Wall time: 21.7 µs


In [5]:
preds = model.predict(X_test, threshold=0.5)
(preds == y_test).mean()  # Accuracy: fraction of correct predictions

1.0
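
To interpret this accuracy, it helps to look at the class balance of the slice-based split. The sketch below does not load the dataset; it assumes the label layout of `load_iris()`, 50 samples of each of three species stored in class order, and rebuilds the binary labels from that assumption.

```python
from collections import Counter

# Assumption: mirrors load_iris().target, 50 samples per species in class order.
target = [0] * 50 + [1] * 50 + [2] * 50

# Same binarization as in the notebook: class 0 vs. everything else.
y = [int(t != 0) for t in target]

y_train, y_test = y[:100], y[100:]
print(Counter(y_train))   # both classes are present in the training slice
print(Counter(y_test))    # only class 1 is present in the test slice
```

Because the test slice contains only class 1, the reported accuracy says how often the model predicts class 1 on those rows and nothing about class 0. A shuffled split, for example `train_test_split(X, y, stratify=y)` using the function already imported in the first cell, would put both classes in the test set.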