In [1]:
import pandas as pd
import numpy as np

<h2>Logistic Regression with Probabilistic Approach </h2>

In theory, logistic regression takes an input and returns a probability, a value between 0 and 1. How does it do that? With the help of a function called the *logistic function*, most commonly known as the *sigmoid*. The sigmoid maps the model's raw score to a probability, which is then used for *predicting*, or classifying, a given input.
The logistic function, or sigmoid, is defined as:
![](https://imgur.com/Bw5gMJX.jpg)
Where:
* *e* = Euler's number, approximately **2.71828**.
* *x0* = the value of the sigmoid's midpoint on the x-axis.
* *L* = the maximum value.
* *k* = steepness of the curve.
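
To make the four parameters concrete, here is a minimal sketch of the general logistic curve. It assumes the image shows the usual form f(x) = L / (1 + e^(-k(x - x0))); the function name `logistic` and the default values are illustrative choices, not something the formula prescribes.

```python
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic curve: L / (1 + e^(-k * (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))

# At the midpoint x = x0 the curve sits at half of its maximum value L.
print(logistic(0.0))                  # standard sigmoid (L=1, k=1, x0=0)
print(logistic(2.0, L=4.0, x0=2.0))   # a curve with maximum 4 and midpoint 2

# A larger k makes the transition around x0 steeper.
print(logistic(1.0, k=1.0), logistic(1.0, k=5.0))
```

With L = 1, k = 1 and x0 = 0 this reduces to the sigmoid used in the rest of the notebook.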

For logistic regression, however, the logistic function takes this form:<br>
![](https://imgur.com/903IYoN.jpg)
Where:
* Θ = the weight vector, also written as *w*
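
As a rough illustration of how this form turns inputs into probabilities, the sketch below assumes the image shows the usual h(x) = 1 / (1 + e^(-Θᵀx)). The names `sigmoid` and `predict_proba`, the demo arrays, and the separate bias term `b` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, theta, b=0.0):
    """Probability of class 1 for each row of X: sigmoid(X . theta + b)."""
    return sigmoid(np.dot(X, theta) + b)

# Hypothetical inputs: two samples with two features each.
X_demo = np.array([[1.0, 2.0],
                   [3.0, -1.0]])
theta_demo = np.array([0.5, -0.25])

probs = predict_proba(X_demo, theta_demo)
print(probs)  # one probability per sample
```

A score of exactly zero maps to a probability of 0.5, which is why 0.5 is the natural decision threshold.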

In [2]:
#importing the data


data = pd.read_excel("dataset.xlsx")
data.head()

Unnamed: 0,Marks_1,Marks_2,Result
0,34.62366,78.024693,0
1,30.286711,43.894998,0
2,35.847409,72.902198,0
3,60.182599,86.308552,1
4,79.032736,75.344376,1


<h4>Loss minimizing / J(w) </h4><br>
The weights (represented by theta in our notation) are a vital part of logistic regression and other machine learning algorithms, and we want to find the best values for them. To start, we pick initial values (random values, or zeros as in the code in this notebook), and we need a way to measure how well the algorithm performs with those weights. That measure is computed by the loss function. <br><br>
The loss function is defined as:

![](https://imgur.com/riDHhZS.jpg)

Where:
* m = the number of samples
* y = the target class
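
The loss can be sketched in a few lines of NumPy. This assumes the image shows the standard binary cross-entropy, J(w) = -(1/m) Σ [y log(h) + (1 - y) log(1 - h)], where h is the sigmoid output. The function name `log_loss`, the demo arrays, and the `eps` clipping (added to avoid taking log(0)) are assumptions of this example.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy between targets y and probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # keep log() away from 0 and 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical targets and two sets of predicted probabilities.
y_demo = np.array([0, 0, 1, 1])
good = np.array([0.1, 0.2, 0.8, 0.9])  # mostly agrees with y_demo
bad = np.array([0.9, 0.8, 0.2, 0.1])   # mostly disagrees with y_demo

print(log_loss(y_demo, good))  # small loss
print(log_loss(y_demo, bad))   # larger loss
```

Confident predictions on the wrong side are penalized heavily, which is what pushes the weights toward better values during fitting.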

In [3]:
#knowing my data

print("Columns and data types")
pd.DataFrame(data.dtypes).rename(columns = {0:'datatype'})

Columns and data types


Unnamed: 0,datatype
Marks_1,float64
Marks_2,float64
Result,int64


**1 denotes pass and 0 denotes fail**

In [4]:
#defining my x and y values

X = np.array(data.iloc[:,:2].values)
y = np.array(data.iloc[:,-1].values)

The goal is to **minimize the loss** by increasing or decreasing the weights, which is commonly called fitting. Which weights should be bigger and which should be smaller? This is decided by an algorithm called **gradient descent**. It relies on the gradient, which is just the derivative of the loss function with respect to the weights.

![](https://imgur.com/rBVzJbt.jpg)
The weights are updated by subtracting the gradient times the learning rate, as defined below:
![](https://imgur.com/TAIpnwI.jpg)
Where:
* α = learning rate (often 0.1). Since my question specifies that alpha should be 0.001, I'll keep α = 0.001.
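
A single update can be sketched as follows. It assumes the gradient in the image is the usual (1/m) Xᵀ(h - y) for the log loss; the function name `gradient_step`, the toy data, and the larger learning rate used in the loop are illustrative assumptions.

```python
import numpy as np

def gradient_step(X, y, weight, b, lr=0.001):
    """One gradient descent update for unregularized logistic regression."""
    m = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(np.dot(X, weight) + b)))  # predicted probabilities
    grad_w = np.dot(X.T, p - y) / m                     # dJ/dw
    grad_b = np.sum(p - y) / m                          # dJ/db
    return weight - lr * grad_w, b - lr * grad_b

# Hypothetical toy data: class 1 when both features are large.
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
y_demo = np.array([0, 0, 1, 1])

w, b = np.zeros(2), 0.0
for _ in range(1000):
    w, b = gradient_step(X_demo, y_demo, w, b, lr=0.1)
print(w, b)
```

Each step moves the weights a small distance against the gradient, so a smaller α means more iterations are needed to reach the same loss.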

**1. L1 is Lasso**

It adds an additional term to the cost function (loss function).

The term is (lambda/(2*n)) * Σ |weight|

* lambda = factor that controls the extent of regularization
* n = number of samples (`X.shape[0]` in the code)
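
Because the penalty belongs to the loss, one common way to apply it is through the gradient. The sketch below is one possible reading, not the notebook's method: it adds the subgradient of the L1 term, (lambda/(2*n)) * sign(weight), to the weight gradient and leaves the bias unpenalized. The function names and demo values are hypothetical.

```python
import numpy as np

def l1_penalty(weight, lamb, n):
    """L1 term added to the loss: (lamb / (2n)) * sum(|weight|)."""
    return lamb / (2 * n) * np.sum(np.abs(weight))

def l1_penalty_grad(weight, lamb, n):
    """Subgradient of the L1 term; np.sign gives 0 where weight == 0."""
    return lamb / (2 * n) * np.sign(weight)

def l1_gradient_step(X, y, weight, b, lr=0.001, lamb=100):
    """One gradient descent update with the L1 term included in the loss."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(np.dot(X, weight) + b)))
    grad_w = np.dot(X.T, p - y) / n + l1_penalty_grad(weight, lamb, n)
    grad_b = np.sum(p - y) / n  # assumption: the bias is not regularized
    return weight - lr * grad_w, b - lr * grad_b

# Hypothetical weights: the penalty grows with their absolute size.
w_demo = np.array([1.0, -2.0])
print(l1_penalty(w_demo, lamb=100, n=100))
print(l1_penalty_grad(w_demo, lamb=100, n=100))
```

The subgradient has constant magnitude, so L1 pulls every nonzero weight toward zero by the same amount per step regardless of its size.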

In [5]:
# my fit function

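def fit(X, y):
    lr = 0.001      # learning rate (alpha)
    gd = 100000     # number of gradient descent iterations
    n = X.shape[0]  # number of samples

    # initial values should be zero
    weight = np.zeros(X.shape[1])
    b = 0
    lamb = 100      # regularization strength (lambda)
    for _ in range(gd):
        # linear score plus the term (lamb / (2n)) * sum(weight);
        # note that the term is added to the score here, not to the loss
        func = (np.dot(X, weight) + b) + (lamb / (2 * n) * np.sum(weight))

        sigmoid = 1 / (1 + np.exp(-func))  # predicted probabilities

        # gradient terms X^T (sigmoid - y) / n and sum(sigmoid - y) / n
        wdash = np.dot(X.T, (sigmoid - y)) / n
        bdash = np.sum(sigmoid - y) / n
        # update for the next iteration
        weight = weight - (lr * wdash)
        b = b - (lr * bdash)
    return (weight, b)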

In [6]:
#my predict function

def predict():
    # read the two marks and shape them as a single-row input
    m1 = int(input("Enter your marks 1: "))
    m2 = int(input("Enter your marks 2: "))
    arr = np.array([[m1, m2]])

    # the model is refit on every call
    weight, b = fit(X, y)
    fun = np.dot(arr, weight) + b
    y_pred = 1 / (1 + np.exp(-fun))

    # classify with a 0.5 threshold on the predicted probability
    return "Passed" if y_pred[0] > 0.5 else "Failed"

In [7]:
predict()

Enter your marks 1: 90
Enter your marks 2: 99


'Passed'

In [8]:
predict()

Enter your marks 1: 89
Enter your marks 2: 10


'Failed'

**2. L2 is Ridge**

It adds an additional squared term to the cost function (loss function).

The term is (lambda/(2*n)) * Σ weight^2

* lambda = factor that controls the extent of regularization
* n = number of samples (`X.shape[0]` in the code)
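
As with any penalty on the loss, the L2 term can be applied through the gradient. The sketch below is one possible reading, not the notebook's method: the derivative of (lambda/(2*n)) * Σ weight^2 with respect to the weights is (lambda/n) * weight, which is added to the weight gradient while the bias is left unpenalized. The function names and demo values are hypothetical.

```python
import numpy as np

def l2_penalty(weight, lamb, n):
    """L2 term added to the loss: (lamb / (2n)) * sum(weight^2)."""
    return lamb / (2 * n) * np.sum(weight ** 2)

def l2_penalty_grad(weight, lamb, n):
    """Derivative of the L2 term with respect to the weights."""
    return lamb / n * weight

def l2_gradient_step(X, y, weight, b, lr=0.001, lamb=100):
    """One gradient descent update with the L2 term included in the loss."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(np.dot(X, weight) + b)))
    grad_w = np.dot(X.T, p - y) / n + l2_penalty_grad(weight, lamb, n)
    grad_b = np.sum(p - y) / n  # assumption: the bias is not regularized
    return weight - lr * grad_w, b - lr * grad_b

# Hypothetical weights: larger weights are penalized quadratically.
w_demo = np.array([1.0, -2.0])
print(l2_penalty(w_demo, lamb=100, n=100))
print(l2_penalty_grad(w_demo, lamb=100, n=100))
```

The pull toward zero is proportional to each weight, so L2 shrinks large weights strongly but rarely drives any weight to exactly zero.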

In [9]:
# my fit function

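def fit(X, y):
    lr = 0.001      # learning rate (alpha)
    gd = 100000     # number of gradient descent iterations
    n = X.shape[0]  # number of samples

    # initial values should be zero
    weight = np.zeros(X.shape[1])
    b = 0
    lamb = 100      # regularization strength (lambda)
    for _ in range(gd):
        # linear score plus the term (lamb / (2n)) * sum(weight^2);
        # note that the term is added to the score here, not to the loss
        func = (np.dot(X, weight) + b) + (lamb / (2 * n) * np.sum(weight ** 2))

        sigmoid = 1 / (1 + np.exp(-func))  # predicted probabilities

        # gradient terms X^T (sigmoid - y) / n and sum(sigmoid - y) / n
        wdash = np.dot(X.T, (sigmoid - y)) / n
        bdash = np.sum(sigmoid - y) / n
        # update for the next iteration
        weight = weight - (lr * wdash)
        b = b - (lr * bdash)
    return (weight, b)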

In [10]:
#my predict function

def predict():
    # read the two marks and shape them as a single-row input
    m1 = int(input("Enter your marks 1: "))
    m2 = int(input("Enter your marks 2: "))
    arr = np.array([[m1, m2]])

    # the model is refit on every call
    weight, b = fit(X, y)
    fun = np.dot(arr, weight) + b
    y_pred = 1 / (1 + np.exp(-fun))

    # classify with a 0.5 threshold on the predicted probability
    return "Passed" if y_pred[0] > 0.5 else "Failed"

In [11]:
predict()

Enter your marks 1: 33
Enter your marks 2: 99


'Passed'

In [12]:
predict()

Enter your marks 1: 90
Enter your marks 2: 22


'Passed'