**Logistic Regression**

**Input values (X) are combined linearly using weights or coefficient values to predict an output value (y).**

**The output value being modeled is a binary value (0 or 1) rather than a continuous numeric value.**

**Linear Regression Equation:**

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

Where:

**y** *is the dependent variable to be predicted.*

**β0** *is the intercept: the value of y when every X is 0, i.e. the point where the line crosses the y-axis.*

**β1 … βn** *are the coefficients (slopes) of the inputs. Each can be negative or positive, depending on the relationship between the dependent variable and that independent variable.*

**X1 … Xn** *are the independent variables used to predict the dependent variable.*
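
As a small illustration of the equation above, the linear combination can be computed with NumPy. The coefficient and input values below are made up for the example and do not come from the diabetes dataset.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
beta0 = 0.5                           # intercept (beta_0)
betas = np.array([2.0, -1.0, 0.25])   # beta_1 .. beta_3
x = np.array([1.0, 3.0, 4.0])         # X_1 .. X_3

# y = beta0 + beta1*X1 + beta2*X2 + beta3*X3
y_linear = beta0 + np.dot(betas, x)
print(y_linear)
```

A negative coefficient (here the second one) pulls y down as its input grows, while a positive one pushes it up.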

**Sigmoid function:**

$$z = \frac{1}{1 + e^{-y}}$$

**Apply the sigmoid function to the linear regression equation.**

**Logistic Regression equation:**

$$z = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$$
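
A minimal sketch of the two steps above: the sigmoid on its own, then the sigmoid applied to the linear combination. The function names `sigmoid` and `logistic_output` are hypothetical helpers for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

def sigmoid(y):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def logistic_output(x, beta0, betas):
    # Linear combination first, then the sigmoid on top of it
    return sigmoid(beta0 + np.dot(betas, x))

# A linear output of 0 maps to a probability of exactly 0.5
print(sigmoid(0.0))

# Large positive inputs approach 1, large negative inputs approach 0
print(sigmoid(10.0), sigmoid(-10.0))
```

Because the output always lies between 0 and 1, it can be read as the probability of class 1 and rounded to get a 0/1 prediction.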

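The coefficients are updated after each training row. In the update rules, y is the actual class label (0 or 1) of the row, z is the predicted probability, and X_i is the input value paired with β_i:

$$\beta_0 \leftarrow \beta_0 + \text{learning\_rate} \cdot (y - z) \cdot z \cdot (1 - z)$$

$$\beta_i \leftarrow \beta_i + \text{learning\_rate} \cdot (y - z) \cdot z \cdot (1 - z) \cdot X_i$$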
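
To make the update rules concrete, here is a sketch of a single update for one training row. The helper name `sgd_step` and the sample values are assumptions for illustration only; the from-scratch code later in the notebook applies the same rule with plain Python loops.

```python
import numpy as np

def sgd_step(x, y, beta0, betas, learning_rate):
    # Predicted probability for this row with the current coefficients
    z = 1.0 / (1.0 + np.exp(-(beta0 + np.dot(betas, x))))
    # Shared factor (y - z) * z * (1 - z) from the update rules
    factor = (y - z) * z * (1.0 - z)
    new_beta0 = beta0 + learning_rate * factor
    new_betas = betas + learning_rate * factor * x
    return new_beta0, new_betas

# Hypothetical row: two inputs, actual label 1, all coefficients starting at 0
x = np.array([1.0, 2.0])
b0, b = sgd_step(x, 1.0, 0.0, np.zeros(2), 0.3)
print(b0, b)
```

With all coefficients at 0 the prediction is 0.5, so the shared factor is 0.5 * 0.5 * 0.5 = 0.125 and the intercept moves by 0.3 * 0.125 = 0.0375 towards the positive class.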

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from random import randrange
import warnings
warnings.filterwarnings("ignore")

import os
print(os.listdir("../input"))


In [None]:
diabetes_df = pd.read_csv("../input/diabetes.csv")
diabetes_df.head()

In [None]:
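#isnull().sum().sum() counts every null cell; isnull().any().sum() would only count the columns that contain nulls
print("How many null values in the dataset?:",diabetes_df.isnull().sum().sum())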

**This is a clean dataset without any null values, so we can build our Logistic Regression model without further preprocessing.**

In [None]:
#Just take the values, ignoring the labels and index
diabetes_df = diabetes_df.values
diabetes_df

**Sklearn Logistic Regression**

In [None]:
X = diabetes_df[:,0:8] #Predictors
y = diabetes_df[:,8] #Target

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)

logistic_model = LogisticRegression(fit_intercept=True,C=1e15)
logistic_model.fit(X_train,y_train)
predicted = logistic_model.predict(X_test)

print("Confusion Matrix")
matrix = confusion_matrix(y_test,predicted)
print(matrix)

print("\nClassification Report")
report = classification_report(y_test,predicted)
print(report)

lr_accuracy = accuracy_score(y_test, predicted)
print('Logistic Regression Accuracy of Scikit Model: {:.2f}%'.format(lr_accuracy*100))

**Logistic Regression from Scratch**

**Setting up the data**

In [None]:
#find the minimum and maximum value of each column
def dataset_minmax(dataset):
    minmax = list()
    
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        
        value_min = min(col_values)
        value_max = max(col_values)

        minmax.append([value_min, value_max])
    
    return minmax

#rescale the value of each column to be within 0 and 1
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            row[i]= (row[i]-minmax[i][0]) / (minmax[i][1]-minmax[i][0])
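
For comparison, the same min-max rescaling can be sketched in vectorized NumPy. The function name `normalize_array` is hypothetical, and unlike `normalize_dataset` it returns a new array instead of modifying the data in place. Leaving constant columns at 0 is an assumption added here to avoid dividing by zero.

```python
import numpy as np

def normalize_array(data):
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)   # minimum of each column
    col_max = data.max(axis=0)   # maximum of each column
    col_range = col_max - col_min
    # Assumption: a constant column has range 0, so divide by 1 and leave it at 0
    col_range[col_range == 0] = 1.0
    return (data - col_min) / col_range

# Small made-up example: each column is rescaled to the range 0..1
sample = [[1, 10], [3, 20], [5, 30]]
print(normalize_array(sample))
```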

**Making Predictions**

In [None]:
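#Predicts an output value (a probability between 0 and 1) for a row given a set of coefficients.
#The last element of row is assumed to be the class label and is skipped.

def predict(row, coefficients):
    z = coefficients[0] #intercept (b0)
    for i in range(len(row)-1):
        z += coefficients[i + 1] * row[i]
    return 1.0 / (1.0 + np.exp(-z)) #sigmoid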

**Estimating the coefficients / weights**

**Learning rate (l_rate):** *The amount by which each coefficient is corrected each time it is updated.*

**n_steps:** *The number of passes through the training data while updating the coefficients.*

In [None]:
# Estimate logistic regression coefficients using stochastic gradient descent.
# Each row of train holds the predictors followed by the class label in the last position.

def get_coefficients(train, l_rate, n_steps):
    #one coefficient per predictor plus the intercept, all starting at 0
    coef = [0.0 for i in range(len(train[0]))]
    
    for step in range(n_steps): #n_steps passes over the training data

        for row in train: #all rows
        
            z = predict(row, coef)
            
            error = row[-1] - z #actual label minus predicted probability
            
            coef[0] = coef[0] + l_rate * error * z * (1.0 - z) #b0
            
            for i in range(len(row)-1): #each coefficient (b1,b2,b3....)
                coef[i+1] = coef[i+1]+l_rate*error*z*(1.0-z)*row[i]
                
    return coef

In [None]:
def evaluate_model(test,coef):
    
    predictions = []
    for r in test:
        z = round(predict(r,coef))    
        predictions.append(z)
        
    return(predictions)

**Our Own Logistic Model**

In [None]:
def logistic_regression(train,test,l_rate,n_steps):    
    
    #get the coefficients from the training set
    coef = get_coefficients(train,l_rate,n_steps)
    
    #use these to validate against the test set
    predictions = evaluate_model(test,coef)
    
    return(predictions)

**Calculating the accuracy of our model**

In [None]:
# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual))

In [None]:
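minmax = dataset_minmax(diabetes_df)
normalize_dataset(diabetes_df, minmax)

l_rate = 0.3
n_steps = 100

train_set, test_set = train_test_split(diabetes_df, test_size=0.3)

#Keep the label column in test_set: predict() skips the last element of each row,
#so dropping the label here would make it ignore the last predictor instead
actual = test_set[:,8]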

predicted = logistic_regression(train_set, test_set,l_rate,n_steps)

print("Confusion Matrix")
matrix = confusion_matrix(actual,predicted)
print(matrix)

print("\nClassification Report")
report = classification_report(actual,predicted)
print(report)

scores = accuracy_metric(actual, predicted)
print('Logistic Regression Accuracy Of Our Model: {:.2f}%'.format(scores*100))
