# Logistic Regression from scratch

Our second model is logistic regression. In this first example we will perform binary classification.

We will train the algorithm on the [Titanic](https://www.kaggle.com/c/titanic) dataset from Kaggle.

I have already done the EDA and feature engineering, so the dataset is ready to use. More information here

In [1]:
import numpy as np
import random
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
#Load data
df = pd.read_csv('train_file.csv')

In [3]:
df

Unnamed: 0,0,1,2,3,4,5,6,11,12,13,...,21,22,23,24,25,26,27,28,29,Survived
0,3,0,7.2500,2,0,3.62500,22.0,0,1,0,...,0,0,0,0,0,0,1,0,1,0
1,1,0,71.2833,2,1,35.64165,38.0,1,0,1,...,0,0,0,0,0,0,0,1,1,1
2,3,0,7.9250,1,0,7.92500,26.0,1,0,0,...,0,0,0,0,0,1,0,0,0,1
3,1,0,53.1000,2,1,26.55000,35.0,1,0,0,...,0,0,0,0,0,0,0,1,1,1
4,3,0,8.0500,1,0,8.05000,35.0,0,1,0,...,0,0,0,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,2,0,13.0000,1,0,13.00000,27.0,0,1,0,...,0,0,0,0,0,0,1,0,0,0
887,1,0,30.0000,1,1,30.00000,19.0,1,0,0,...,0,0,0,0,0,1,0,0,0,1
888,3,2,23.4500,4,0,5.86250,29.0,1,0,0,...,0,0,0,0,0,1,0,0,1,0
889,1,0,30.0000,1,1,30.00000,26.0,0,1,1,...,0,0,0,0,0,0,1,0,0,1


In [4]:
y = df['Survived']
df.drop('Survived',inplace = True,axis = 1)

In [5]:
X = np.array(df)
y = np.array(y)

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test,y_train,y_test = train_test_split(X_scaled,y,random_state = 0)

### Now it's time to create the model

First, the notation and equations that we are going to use in the code:



**Notation:**

$\alpha = \text{Learning rate}$

$\nabla_{\theta} = \text{Gradient of the cost function with respect to } \theta$

$m = \text{Number of examples in the training set}$

$\theta = \text{Model parameters (bias and weights)}$


-----------------------------

**Equations:**


**Sigmoid function**

$\sigma(p) = \frac{1}{1 + e^{-p}}$
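As a quick sanity check, the sigmoid can be sketched in NumPy as follows. The standalone function name `sigmoid` and the sample inputs are illustrative assumptions, not part of the class defined later. The point is that 0 maps to exactly 0.5, while large negative and positive inputs are squashed towards 0 and 1.

```python
import numpy as np

def sigmoid(p):
    """Map any real number (or array) into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-p))

# Illustrative inputs: strongly negative, zero, strongly positive
values = sigmoid(np.array([-10.0, 0.0, 10.0]))
print(values)
```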

**Model prediction** 

$p = \sigma(\theta \cdot X)$




$y =
\begin{cases}
  0 & \text{if } p \lt 0.5\\
  1 & \text{if } p \geq 0.5
\end{cases}
$
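The prediction rule above can be sketched on a toy example. The data, the parameter values and the name `y_hat` are hypothetical and chosen only for illustration; the first column of `X` is a column of ones so that the first entry of `theta` acts as the bias.

```python
import numpy as np

def sigmoid(p):
    return 1 / (1 + np.exp(-p))

# Hypothetical toy data: 3 samples, a bias column of ones plus 2 features
X = np.array([[1.0,  2.0, -1.0],
              [1.0, -3.0,  0.5],
              [1.0,  0.0,  0.0]])
theta = np.array([0.5, 1.0, 2.0])   # illustrative parameters, bias first

p = sigmoid(X @ theta)              # predicted probabilities
y_hat = (p >= 0.5).astype(int)      # class labels from the 0.5 threshold
print(p, y_hat)
```

Because the sigmoid equals 0.5 exactly at 0, thresholding the probability at 0.5 gives the same labels as thresholding the linear score `X @ theta` at 0.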



**Cost Function**

$\text{Binary cross entropy loss} = -\frac{1}{m} \sum_{i=1}^m \left[ y_i \log(p_i) + (1-y_i) \log(1-p_i) \right]$
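A minimal sketch of this loss in NumPy is shown below. The function name `binary_cross_entropy`, the `eps` clipping and the sample values are assumptions added for illustration; the clipping is a common guard against `log(0)` and is not part of the formula itself.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross entropy between labels y and probabilities p."""
    # Clip probabilities away from exactly 0 and 1 so the logs stay finite
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
good_probs = np.array([0.9, 0.1, 0.8, 0.2])   # mostly agree with the labels
bad_probs = np.array([0.1, 0.9, 0.2, 0.8])    # mostly contradict the labels

loss_good = binary_cross_entropy(y, good_probs)
loss_bad = binary_cross_entropy(y, bad_probs)
print(loss_good, loss_bad)
```

Probabilities that agree with the labels give a small loss, while confident wrong predictions are penalised heavily.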

**Gradient**

$\nabla_{\theta} = \frac{1}{m}\sum_{i=1}^m (p_i - y_i)\,x_i$

**Vectorized Gradient**

$\nabla_{\theta} = \frac{1}{m} \cdot X^T (p - y)$
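To see that the summed and vectorized forms of the gradient agree, here is a small check on random data. The shapes, the seed and the variable names `grad_loop` and `grad_vec` are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                                # 5 samples, 3 features (arbitrary)
X = rng.normal(size=(m, n))
y = rng.integers(0, 2, size=m)             # random 0/1 labels
theta = rng.normal(size=n)

p = 1 / (1 + np.exp(-(X @ theta)))         # predicted probabilities

# Gradient as an explicit sum over the training examples
grad_loop = np.zeros(n)
for i in range(m):
    grad_loop += (p[i] - y[i]) * X[i]
grad_loop /= m

# The same gradient as a single matrix-vector product
grad_vec = X.T @ (p - y) / m

print(np.allclose(grad_loop, grad_vec))
```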

**Update theta**

$\theta = \theta - \alpha \cdot \nabla_{\theta}$
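Putting the equations together, a bare-bones gradient descent loop might look like the sketch below. The synthetic data, the learning rate `alpha = 0.1`, the 100 iterations and the zero initialization are all assumptions made for this illustration; they are not the settings used on the Titanic data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))                # synthetic features
y = rng.integers(0, 2, size=m)             # synthetic 0/1 labels

theta = np.zeros(n)                        # start from all-zero parameters
alpha = 0.1                                # learning rate (assumed value)
losses = []

for _ in range(100):
    p = 1 / (1 + np.exp(-(X @ theta)))     # model prediction
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)
    grad = X.T @ (p - y) / m               # vectorized gradient
    theta = theta - alpha * grad           # update theta

print(losses[0], losses[-1])
```

With a small enough learning rate the recorded loss should go down from the first iteration to the last, which is the same behaviour the cost plot in `fit` is meant to show.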


In [21]:
class logistic_regression:
    def __init__(self,iterations,learning_rate):
        self.iterations = iterations
        self.learning_rate = learning_rate

    def sigmoid(self,x):
        # Logistic function: maps any real value into (0, 1)
        z = 1/(1 + np.exp(-x))
        return z

    def fit(self,X,y):

        m,n = X.shape
        theta = np.random.randn(n) # Random initialization of theta

        # Add a column of ones to X and a bias term (initialized to 0) to theta
        X_with_bias = np.c_[np.ones(m),X]
        theta_with_bias = np.insert(theta,0,0)

        cost = []
        for i in range(self.iterations):
            h = np.dot(X_with_bias,theta_with_bias)  # linear scores
            z = self.sigmoid(h)                      # predicted probabilities
            J = (-1/m)*np.sum(y*np.log(z)+(1-y)*np.log(1-z))  # binary cross entropy loss

            error = z - y
            grad = (1/m)*np.dot(X_with_bias.T,error) # vectorized gradient
            theta_with_bias =  theta_with_bias - grad*self.learning_rate
            cost.append(J)
        plt.plot(cost)
        plt.xlabel('Iterations')
        plt.ylabel('Binary cross entropy loss')
        plt.show()
        return theta_with_bias

    def predict(self,X_test,theta):
        m_test = X_test.shape[0]

        # Add X0 = 1 to the test set
        X_test_bias = np.c_[np.ones(m_test),X_test]

        # Calculate the predicted probabilities (the sigmoid of the linear scores)
        pred = self.sigmoid(np.dot(X_test_bias,theta))

        # Threshold the probabilities at 0.5 to get class labels
        return (pred >= 0.5 )*1

In [23]:
# Instantiate the custom model
custom_model = logistic_regression(iterations = 2500,learning_rate = 0.05)

In [24]:
theta = custom_model.fit(X_train,y_train)

In [None]:
y_pred = custom_model.predict(X_test,theta)

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
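# Training-set predictions, stored separately so y_pred keeps the test-set predictions
y_pred_train = custom_model.predict(X_train,theta)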

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
accuracy_score(y_test,y_pred)

In [None]:
model = LogisticRegression(C=1)

In [None]:
model.fit(X_train,y_train)

In [None]:
model.score(X_train,y_train)

In [None]:
y_pred_model = model.predict(X_test)

In [None]:
accuracy_score(y_test,y_pred_model)