<a href="https://colab.research.google.com/github/soumyanamboo/Machine-Learning-Techniques/blob/main/LogisticRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logistic Regression

In [None]:
#import libraries
from IPython.display import display, Math, Latex

import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

np.random.seed(1234)

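Logistic regression computes the probability of the positive class in two steps:

1. Form a linear combination of the features: $z = w^T x$
2. Apply the sigmoid (logistic) activation to the linear combination $z$ to obtain the probability:   
$$Pr(y=1|x) = \text{sigmoid}(z) = \frac{1}{1+e^{-z}}$$   

In vectorised form, for $n$ samples and $m$ features: $$z_{(n,1)} = X_{(n,m)} w_{(m,1)}$$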

In [None]:
def linear_combination(X:np.ndarray, w:np.ndarray):
  ''' linear combination z = Xw'''
  z = X@w
  return z
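
As a quick shape check, the sketch below evaluates $z = Xw$ for a small made-up feature matrix. The values of `X` and `w` are illustrative assumptions, not data from this notebook.

```python
import numpy as np

# Illustrative values only: 3 samples (n=3) with 2 features (m=2)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])

# With a 1-D weight vector the result is 1-D, shape (n,)
z = X @ w

# With a column weight vector of shape (m, 1) the result has shape (n, 1)
z_col = X @ w.reshape(-1, 1)

print(z, z.shape, z_col.shape)
```

Either shape works with the functions in this notebook, as long as `y` uses the matching shape.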


The sigmoid function is then applied element-wise to $z$:  
$$ P(y=1|X) = \text{sigmoid}(z_{(n,1)}) $$

In [None]:
def sigmoid(z:np.ndarray):
  ''' Calculates sigmoid of linear combinations z'''
  s = 1/(1+np.exp(-z))
  return s
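
For large negative `z`, `np.exp(-z)` overflows and NumPy emits a runtime warning, although the result still rounds to 0. The sketch below shows one possible numerically stable variant that only ever exponentiates non-positive numbers. `stable_sigmoid` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow by never exponentiating a positive number."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) lies in (0, 1], so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite sigmoid(z) as exp(z) / (1 + exp(z)), with exp(z) in (0, 1)
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```

For moderate inputs this should agree with the plain formula; the difference only matters for extreme values of `z`.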

Apply a prediction (inference) function to the activations to obtain a class label: if the activation (probability) is greater than the threshold, label the sample as class 1; otherwise label it as class 0.

In [None]:
def predict(X:np.ndarray, w:np.ndarray, threshold:float):
  ''' Predicts the class label for each sample:
      label = 1 if sigmoid(Xw) > threshold, otherwise label = 0
  '''
  return np.where(sigmoid(linear_combination(X,w)) > threshold, 1, 0)
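
To illustrate how the threshold changes the predicted labels, here is a small sketch with made-up inputs. The values of `X` and `w` are assumptions chosen for illustration, and the helper functions are repeated so that the block runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, w, threshold):
    # Label 1 where the predicted probability exceeds the threshold, else 0
    return np.where(sigmoid(X @ w) > threshold, 1, 0)

# Illustrative values only
X = np.array([[1.0, 2.0],
              [3.0, -1.0],
              [0.0, 0.5]])
w = np.array([1.0, -1.0])

# Here z = [-1.0, 4.0, -0.5], so the probabilities are roughly [0.27, 0.98, 0.38]
labels_default = predict(X, w, threshold=0.5)
labels_lenient = predict(X, w, threshold=0.3)
print(labels_default, labels_lenient)
```

Lowering the threshold moves the third sample, whose probability is about 0.38, from class 0 to class 1.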

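**Loss function**  
Binary cross entropy (BCE) with an optional regularisation penalty:  
$$ \text{Loss} = \text{BCE on training samples} + \lambda \cdot \text{regularisation penalty}$$
1. With no regularisation, $\lambda = 0$ and the loss reduces to the BCE on the training samples.   
2. With L2 regularisation, the loss function is:  
$$ J(w) = -\sum_{i=1}^n \left[ y^{(i)} \log(\text{sigmoid}(w^Tx^{(i)})) + (1 - y^{(i)}) \log(1 - \text{sigmoid}(w^Tx^{(i)})) \right] + \lambda ||w||_2^2 $$   
3. With L1 regularisation, the loss function is:   
$$ J(w) = -\sum_{i=1}^n \left[ y^{(i)} \log(\text{sigmoid}(w^Tx^{(i)})) + (1 - y^{(i)}) \log(1 - \text{sigmoid}(w^Tx^{(i)})) \right] + \lambda ||w||_1 $$  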

The loss function in vectorised form is: 
$$ e = y \log(\text{sigmoid}(Xw)) + (1-y)\log(1-\text{sigmoid}(Xw)) $$
$$ J(w) = -1^T_{(1,n)} e_{(n,1)} $$
* Adding the L2 penalty: $$ J(w) = -1^Te + \lambda w^Tw $$
* Adding the L1 penalty: $$ J(w) = -1^Te + \lambda 1^T|w| $$
* Set whichever regularization rate (L1 or L2) is not required to 0.   
* Using both penalties together gives **elastic net regularization**; one common convention chooses the two rates so that they sum to 1.


In [None]:
def loss(y, sigmoid_vector, weight_vector, l1_reg_rate, l2_reg_rate):
  bce = -1 * (np.sum(y * np.log(sigmoid_vector)  + (1-y)* np.log(1-sigmoid_vector)))
  l2_reg = l2_reg_rate * np.transpose(weight_vector)@weight_vector
  l1_reg = l1_reg_rate * np.sum(np.abs(weight_vector))
  loss_value = bce + l2_reg + l1_reg
  return loss_value
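
A small worked example may help to see how each term contributes. The sketch below repeats the loss function so that it runs on its own and evaluates it on made-up labels, probabilities and weights (all illustrative assumptions).

```python
import numpy as np

def loss(y, sigmoid_vector, weight_vector, l1_reg_rate, l2_reg_rate):
    # Binary cross entropy summed over the samples
    bce = -np.sum(y * np.log(sigmoid_vector) + (1 - y) * np.log(1 - sigmoid_vector))
    l2_reg = l2_reg_rate * (weight_vector @ weight_vector)   # lambda_2 * w^T w
    l1_reg = l1_reg_rate * np.sum(np.abs(weight_vector))     # lambda_1 * 1^T |w|
    return bce + l2_reg + l1_reg

# Illustrative values only
y = np.array([1.0, 0.0])    # true labels
p = np.array([0.9, 0.1])    # predicted probabilities
w = np.array([1.0, -2.0])   # weights: w^T w = 5, sum |w| = 3

bce_only = loss(y, p, w, l1_reg_rate=0.0, l2_reg_rate=0.0)   # -2 * ln(0.9)
with_l2 = loss(y, p, w, l1_reg_rate=0.0, l2_reg_rate=0.01)   # adds 0.01 * 5
with_l1 = loss(y, p, w, l1_reg_rate=0.1, l2_reg_rate=0.0)    # adds 0.1 * 3
both = loss(y, p, w, l1_reg_rate=0.1, l2_reg_rate=0.01)      # adds both penalties
print(bce_only, with_l2, with_l1, both)
```

Note that `np.log` returns `-inf` if a predicted probability is exactly 0 or 1, so in practice the probabilities are often clipped slightly away from those values before the loss is computed.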

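**Optimization:**
1. Calculate the gradient of the loss function.
2. Scale the gradient by the learning rate and use it to update the weight vector.

Gradient of the loss function with L2 regularisation:  
$$ \frac{dJ(w)}{dw} = X^T (\text{sigmoid}(Xw)-y) + \lambda w  $$
Differentiating the penalty $\lambda w^Tw$ gives $2\lambda w$; the constant factor 2 is absorbed into $\lambda$ here.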


In [None]:
def calculate_gradient(X:np.ndarray, y:np.ndarray, w:np.ndarray, reg_rate:float):
  ''' Calculate the gradient of the loss function w.r.t. the weight vector on the training dataset.
      The gradient is np.transpose(X) @ (sigmoid(Xw) - y) + reg_rate * w '''

  # The residual (sigmoid(Xw) - y) must be formed before multiplying by X^T
  grad = np.transpose(X) @ (sigmoid(linear_combination(X,w)) - y) + reg_rate * w
  return grad
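
One way to gain confidence in a gradient formula is to compare it with a central finite-difference estimate. The sketch below does this on random data. It rests on two assumptions: the L2 penalty is written as $(\lambda/2)\, w^Tw$ so that its derivative is exactly $\lambda w$, and the names `analytic_gradient` and `numerical_gradient` are hypothetical helpers introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(X, y, w, reg_rate):
    p = sigmoid(X @ w)
    bce = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Assumption: penalty is (reg_rate / 2) * w^T w, whose derivative is reg_rate * w
    return bce + 0.5 * reg_rate * (w @ w)

def analytic_gradient(X, y, w, reg_rate):
    # X^T (sigmoid(Xw) - y) + reg_rate * w
    return X.T @ (sigmoid(X @ w) - y) + reg_rate * w

def numerical_gradient(X, y, w, reg_rate, eps=1e-6):
    # Central difference, one coordinate of w at a time
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (loss(X, y, w + step, reg_rate)
                   - loss(X, y, w - step, reg_rate)) / (2 * eps)
    return grad

# Random test data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(X, y, w, 0.1)
                         - numerical_gradient(X, y, w, 0.1)))
print(max_diff)
```

A very small `max_diff` suggests that the analytic expression matches the loss being differentiated.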

In [None]:
class Logistic_Regression(object):
  ''' Logistic regression model : y = sigmoid (X@w)'''

  def set_weight_vector(self,w):
    self.w = w
  
  def linear_combination(self, X:np.ndarray):
    return X @ self.w
  
  def sigmoid(self, z:np.ndarray):
    return (1/(1 + np.exp(-z)))
  
  def activation(self, X:np.ndarray):
    '''calculates sigmoid activation for logistic regression as act = sigmoid(Xw)'''

    return self.sigmoid(self.linear_combination(X))
  
  def predict(self, X:np.ndarray, threshold:float = 0.5):
    return (self.activation(X) > threshold).astype(int)
  
  def loss(self,X:np.ndarray,y:np.ndarray, reg_rate:float):
    predicted_prob = self.activation(X)
    # The outer parentheses let the expression continue onto the next line
    loss = (-1 * np.sum(y * np.log(predicted_prob) + (1-y)*np.log(1 - predicted_prob)) +
            reg_rate * (np.transpose(self.w) @ self.w))
    return loss
  
  def calculate_gradient(self, X:np.ndarray, y:np.ndarray,reg_rate:float):
    grad = np.transpose(X) @ (self.activation(X) - y) + reg_rate * self.w
    return grad
  
  def update_weight(self, grad:np.ndarray, lr:float):
    return (self.w - lr * grad)
  
  def gd(self, X:np.ndarray, y:np.ndarray, num_epochs:int, lr:float, reg_rate:float):
    self.w = np.zeros(X.shape[1])
    self.w_all = []
    self.err_all = []
    for i in np.arange(0, num_epochs):
      dJdw = self.calculate_gradient(X, y, reg_rate)
      self.w_all.append(self.w)
      self.err_all.append(self.loss(X, y, reg_rate))
      self.w = self.update_weight(dJdw, lr)
    return self.w
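
To see the gradient descent loop in action, the sketch below trains on a synthetic two-class dataset. It is a compact, function-based restatement of the `gd` logic so that it runs on its own. The helper names `bce_loss` and `fit_gd`, the data (two Gaussian blobs), and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce_loss(X, y, w, reg_rate):
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) + reg_rate * (w @ w)

def fit_gd(X, y, num_epochs, lr, reg_rate):
    """Batch gradient descent starting from zero weights (mirrors the gd method)."""
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(num_epochs):
        losses.append(bce_loss(X, y, w, reg_rate))         # loss before the update
        grad = X.T @ (sigmoid(X @ w) - y) + reg_rate * w
        w = w - lr * grad
    return w, losses

# Synthetic data (assumption): two Gaussian blobs centred at (-2, -2) and (2, 2)
rng = np.random.default_rng(0)
n_per_class = 50
X0 = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
features = np.vstack([X0, X1])
# Prepend a column of ones so that w[0] acts as the bias term
X = np.column_stack([np.ones(2 * n_per_class), features])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

w, losses = fit_gd(X, y, num_epochs=200, lr=0.001, reg_rate=0.1)
accuracy = np.mean((sigmoid(X @ w) > 0.5).astype(int) == y)
print(losses[0], losses[-1], accuracy)
```

The learning rate is kept small because the gradient here is a sum over samples rather than a mean, so its magnitude grows with the dataset size.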