# Logistic Regression
+ Simple logistic regression with SGD optimization.
+ Predict a fail or pass class from the number of lectures attended and the hours spent on the final project.
+ Data: pass with 4 lectures attended and 10 hours on the final project; fail with 10 lectures and 3 hours.
+ Problem: Will I pass with 6 lectures attended and 4 hours on the final project?
+ Note that the derivative of the loss with respect to the weights has the same form as in linear regression.
+ The only difference from the linear regression code is the sigmoid function in forward processing.
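
As a minimal sketch of the pass/fail problem stated above, the two data points can be fitted with per-sample SGD and the trained weights used to score the new case. The division by 10, the zero initial weights, the fixed seed, and the names `X`, `y`, `query`, and `p_query` are assumptions made for illustration; no bias term is used, in line with the weights-only model in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two training samples: [lectures attended, project hours], scaled by 10 (assumption)
X = np.array([[4.0, 10.0], [10.0, 3.0]]) / 10.0
y = np.array([1.0, 0.0])            # 1 = pass, 0 = fail
query = np.array([6.0, 4.0]) / 10.0

eta = 0.6                           # learning rate
w = np.zeros(2)                     # assumed initial weights
rng = np.random.default_rng(0)      # fixed seed so the run is repeatable

for _ in range(2000):
    for j in rng.permutation(len(X)):       # visit the samples in random order
        output = sigmoid(np.dot(w, X[j]))   # forward processing
        w -= eta * (output - y[j]) * X[j]   # delta = output - target

p_query = sigmoid(np.dot(w, query))
print("P(pass | 6 lectures, 4 hours) =", p_query)
```

The printed probability depends on how long training runs and on the input scaling, so it is best read as an illustration rather than a definitive answer.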

In [49]:
import numpy as np
import random
import math

eta = 0.6  # learning rate
epoch = 8000  # number of training epochs (passes over the training set)

### Logistic Regression Model
+ In forward processing, it uses the sigmoid activation function.
+ The CE (cross-entropy) loss function is used for loss evaluation.
+ In backward processing, delta = output - target. It is identical to that of linear regression, although the loss and activation functions are different.
+ Refer to the course notes for the derivation of the delta equation.
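
The delta equation can also be checked numerically without the full derivation. The sketch below compares the analytic gradient `(output - target) * x` with a central finite-difference estimate of the cross-entropy loss. The sample values `x`, `w`, `t` and the helper names are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, x, t):
    # cross-entropy loss for a single sample
    out = sigmoid(np.dot(w, x))
    return -t * np.log(out) - (1.0 - t) * np.log(1.0 - out)

def analytic_grad(w, x, t):
    # delta = output - target, multiplied by the input
    return (sigmoid(np.dot(w, x)) - t) * x

def numeric_grad(w, x, t, h=1e-6):
    # central finite differences, one weight at a time
    grad = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = h
        grad[k] = (ce_loss(w + step, x, t) - ce_loss(w - step, x, t)) / (2.0 * h)
    return grad

x = np.array([1.0, 0.4, 0.5])    # hypothetical input
w = np.array([0.3, -0.8, 0.2])   # hypothetical weights
t = 1.0                          # hypothetical target

print("analytic:", analytic_grad(w, x, t))
print("numeric: ", numeric_grad(w, x, t))
```

If the two printed vectors agree to several decimal places, the delta formula is consistent with the cross-entropy loss for this sample.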

In [50]:
def sigmoid(x):
    return 1.0/(1+ np.exp(-x))

def sigmoid_derivative(x):
    # expects x to already be a sigmoid output, i.e. x = sigmoid(z); not used by the model below
    return x * (1.0 - x)

# Logistic Regression Model
class LogisticRegression:
    
    def __init__(self, x, w, y):
        self.inputs  = x
        self.weights = w               
        self.target  = y
        self.output  = np.zeros(self.target.shape)

    def forward_proc(self):
        # forward processing of inputs and weights using the sigmoid activation function
        self.output = sigmoid(np.dot(self.weights, self.inputs.T))

    def backprop(self):
        # backward processing: apply the chain rule to find the derivative of the
        # cross-entropy loss with respect to the weights
        dw = (self.output - self.target) * self.inputs  # same formula for both linear and logistic regression

        # update the weights with the derivative of the loss function
        self.weights -= eta * dw

    def predict(self, x):
        # predict the output for a given input x
        return (sigmoid(np.dot(self.weights, x.T)))
        
    def calculate_error(self):
        # cross-entropy error for the current sample
        # np.log is used because self.output is a NumPy array
        error = -self.target * np.log(self.output) - (1 - self.target) * np.log(1 - self.output)
        return abs(error)
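
The logarithm in the cross-entropy is undefined when the sigmoid output saturates at exactly 0 or 1 (`math.log(0)` raises `ValueError`, and `np.log(0)` returns `-inf`). A common guard, which is not part of the original code, is to clip the output before taking the logarithm. The function name `safe_cross_entropy` and the value of `eps` below are assumptions.

```python
import numpy as np

def safe_cross_entropy(output, target, eps=1e-12):
    # clip the prediction away from 0 and 1 so that log() stays finite
    output = np.clip(output, eps, 1.0 - eps)
    return -target * np.log(output) - (1.0 - target) * np.log(1.0 - output)

# a saturated prediction gives a large but finite loss
print(safe_cross_entropy(np.array([0.0]), np.array([1.0])))
# an ordinary prediction is unaffected by the clipping
print(safe_cross_entropy(np.array([0.9]), np.array([1.0])))
```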


### SGD (Stochastic Gradient Descent) Optimization
+ Train with SGD optimization.
+ In SGD, each training sample updates the weights separately from the other samples.
+ After training, the weights are adjusted so that the model generates the target for the given input data.
+ Check how the loss decreases as the number of iterations increases.
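
To see the loss decrease, it helps to average the cross-entropy over all samples once per epoch, because the loss of a single sample fluctuates from one print to the next. The sketch below does this on a small synthetic dataset. The data, the seed, the constant learning rate, and the names `mean_loss` and `history` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_loss(w, X, y):
    # mean cross-entropy over the whole dataset (output clipped to keep log() finite)
    out = np.clip(sigmoid(X @ w), 1e-12, 1.0 - 1e-12)
    return float(np.mean(-y * np.log(out) - (1.0 - y) * np.log(1.0 - out)))

rng = np.random.default_rng(0)
X = rng.random((100, 3))                  # synthetic features in [0, 1)
true_w = np.array([3.0, -2.0, -1.0])      # hypothetical generating weights
y = (X @ true_w > 0.0).astype(float)      # synthetic 0/1 labels

eta = 0.1                                 # constant learning rate (assumption)
w = np.zeros(3)
history = [mean_loss(w, X, y)]            # loss before training

for epoch in range(50):
    for j in rng.permutation(len(X)):     # one sample per weight update
        output = sigmoid(np.dot(w, X[j]))
        w -= eta * (output - y[j]) * X[j]
    history.append(mean_loss(w, X, y))    # record the loss after each epoch

print("first:", history[0], "last:", history[-1])
```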

In [51]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
# load dataset
df = pd.read_csv("titanic_data.csv")
df.dropna(inplace=True)  # drop every row that has a missing value in any column
# preprocess dataset: map the 'Sex' strings to integers and fill missing ages with the mean
df['Sex'] = df['Sex'].map({'female': 1, 'male': 0})
# after dropna() above no ages are missing, so this fill changes nothing
df['Age'] = df['Age'].fillna(df['Age'].mean())
# initially experiment with 100 samples; for the final run, use the full dataset
df = df.iloc[:, :]
# select proper features for prediction
passengers = df[["Sex", "Age", "Pclass", "Survived"]]
# split train and test set
train, test = train_test_split(passengers, test_size=0.2)

In [52]:
# Training 

if __name__ == "__main__":

    
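    # Assumption: train_features / train_labels were defined in a cell that is not shown;
    # they are rebuilt here the same way as the test arrays.
    train_features = train[["Sex", "Age", "Pclass"]].values
    train_labels = train[["Survived"]].values
    test_features = test[["Sex", "Age", "Pclass"]].values
    test_labels = test[["Survived"]].values
    scaler = MinMaxScaler()
    train_features = scaler.fit_transform(train_features)  # fit the scaler on the training set
    test_features = scaler.transform(test_features)        # reuse the training-set scaling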

    target = [[1.0], [0.0]]  
              
    #weights = np.random.rand(1, 3)
    weights = np.array([3.49045165, -0.80894269, -0.73375843 ])
    weights = weights.reshape(1,3)
    print("Initial Weights:", weights)

  
    # SGD Optimization
    for i in range(epoch):
   
        if i == 0: w = weights
        concat_data=np.concatenate((train_features, train_labels), axis = 1)
        np.random.shuffle(concat_data) # shuffle the training dataset 

        X = concat_data[:, 0:3]
        y = concat_data[:, 3:4]
  
        eta *= 0.95  # learning-rate decay; found not to help here (eta is close to 0 after a few hundred epochs)

        for j in range(len(X)): 
       
            model = LogisticRegression(X[j], w, y[j])
            model.forward_proc()   # forward processing
            model.backprop()       # backward processing
            w = model.weights 

        if (i % 1000) == 0:
            # loss of the last sample seen in this epoch, not the mean over the dataset
            print("Loss: ", model.calculate_error())
        
    print("Output:", model.output)
    print("Adjusted Weights:", model.weights)


Initial Weights: [[ 3.49045165 -0.80894269 -0.73375843]]
Loss:  [0.04373265]
Loss:  [0.04867705]
Loss:  [0.05974846]
Loss:  [0.81240031]
Loss:  [0.45813585]
Loss:  [0.03752288]
Loss:  [0.48504396]
Loss:  [0.48504396]
Output: [0.3891805]
Adjusted Weights: [[ 3.48977239 -0.80866525 -0.72782058]]


### Testing and Prediction
+ After training, you can verify that the required target is generated for given input data.
+ During the testing phase, new input data is fed to the model to check the output.
+ The output is then predicted for the new input data.
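
The per-sample prediction step can also be written as one matrix product followed by a 0.5 threshold. This is a minimal sketch with made-up weights and inputs; `predict_labels` is a hypothetical helper, not part of the notebook's class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_labels(weights, X, threshold=0.5):
    # weights: shape (1, n_features); X: shape (n_samples, n_features)
    probs = sigmoid(X @ weights.T).ravel()     # one probability per sample
    return (probs >= threshold).astype(int)    # 1 if probability >= threshold, else 0

weights = np.array([[2.0, -1.0, -1.0]])        # hypothetical trained weights
X_new = np.array([[1.0, 0.2, 0.0],             # z =  1.8, so class 1
                  [0.0, 0.5, 1.0]])            # z = -1.5, so class 0
print(predict_labels(weights, X_new))
```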

In [53]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report 

X = test_features
y = test_labels
w = model.weights # use the weights resulting from training
y_predic = []
for j in range(len(X)):
    model = LogisticRegression(X[j], w, y[j])
    # classify as 1 (survived) when the predicted probability is at least 0.5
    if model.predict(X[j]) >= 0.5:
        y_predic.append(1)
    else:
        y_predic.append(0)
results = confusion_matrix(y, y_predic)
print('Confusion Matrix :')
print(results)
print('Classification Report : ')
print(classification_report(y, y_predic))

Confusion Matrix :
[[13  0]
 [ 7 17]]
Classification Report : 
              precision    recall  f1-score   support

           0       0.65      1.00      0.79        13
           1       1.00      0.71      0.83        24

    accuracy                           0.81        37
   macro avg       0.82      0.85      0.81        37
weighted avg       0.88      0.81      0.81        37
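
The figures in the classification report follow directly from the confusion matrix, whose rows are true classes and whose columns are predicted classes. The sketch below recomputes them from the matrix printed above; the variable names are chosen for illustration.

```python
import numpy as np

cm = np.array([[13, 0],
               [7, 17]])                    # rows: true class, columns: predicted class

precision = np.diag(cm) / cm.sum(axis=0)    # correct / all predictions of that class
recall = np.diag(cm) / cm.sum(axis=1)       # correct / all true members of that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()

print(np.round(precision, 2))     # about 0.65 and 1.00, as in the report
print(np.round(recall, 2))        # about 1.00 and 0.71
print(np.round(f1, 2))            # about 0.79 and 0.83
print(round(float(accuracy), 2))  # about 0.81
```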

