<h2 align="center"> Logistic Regression </h2>

### Task 2: Load the Data and Libraries
---

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
plt.style.use("ggplot")
%matplotlib inline

In [2]:
from pylab import rcParams
rcParams['figure.figsize'] = 12, 8

In [3]:
data = pd.read_csv('DMV_Written_Tests.csv')

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
DMV_Test_1    100 non-null float64
DMV_Test_2    100 non-null float64
Results       100 non-null int64
dtypes: float64(2), int64(1)
memory usage: 2.4 KB


In [5]:
scores = data[['DMV_Test_1', 'DMV_Test_2']].values
results = data['Results'].values

### Task 3: Visualize the Data
---

In [6]:
passed = results == 1
failed = results == 0

# Plot each class separately so it gets its own marker, color, and legend entry
ax = sns.scatterplot(x=scores[passed, 0],
                     y=scores[passed, 1],
                     marker='^',
                     color='green',
                     s=60,
                     label='Passed')
sns.scatterplot(x=scores[failed, 0],
                y=scores[failed, 1],
                marker='X',
                color='red',
                s=60,
                label='Failed',
                ax=ax)
ax.set(xlabel='DMV Written Test 1 score', ylabel='DMV Written Test 2 score')
ax.legend()
plt.show()

### Task 4: Define the Logistic Sigmoid Function $\sigma(z)$
---

$$ \sigma(z) = \frac{1}{1+e^{-z}}$$

In [7]:
def sigmoid(z):
    return 1/(1+np.exp(-z))

In [8]:
sigmoid(0)

0.5
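Because `np.exp` works element-wise, the same function should accept arrays as well as scalars, which the cost computation later relies on. The following is a small sanity check rather than part of the original notebook; the definition is repeated so the snippet runs on its own, and the input values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    # Same definition as in the cell above, repeated for self-containment.
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])   # arbitrary test inputs
s = sigmoid(z)                   # applied element-wise; same shape as z
print(s)

# The sigmoid satisfies sigma(-z) = 1 - sigma(z), so outputs mirror around 0.5.
print(np.allclose(sigmoid(-z), 1 - s))
```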

### Task 5: Compute the Cost Function $J(\theta)$ and Gradient
---

The objective of logistic regression is to minimize the cost function

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)}\log(h_{\theta}(x^{(i)})) + (1 - y^{(i)})\log(1 - h_{\theta}(x^{(i)})) \right]$$

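where $h_{\theta}(x) = \sigma(\theta^Tx)$ is the predicted probability of class "1" and $m$ is the number of training examples. The gradient of the cost function is given by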

$$ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})x_j^{(i)}$$

In [9]:
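def cost_compute(theta, x, y):
    m = len(y)                              # number of training examples
    h = sigmoid(np.dot(x, theta))           # predicted probabilities
    # Per-example log-likelihood terms; the cost is their negated mean
    log_likelihood = y * np.log(h) + (1 - y) * np.log(1 - h)
    cost = -1 / m * np.sum(log_likelihood)
    grad = 1 / m * np.dot(x.transpose(), h - y)   # same shape as theta
    return cost, grad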
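One way to gain confidence that the analytic gradient matches the cost formula is a finite-difference check. The sketch below is an addition, not part of the original notebook: `numerical_gradient` is a hypothetical helper, and the data is a small random stand-in rather than the DMV dataset. `sigmoid` and `cost_compute` are repeated so the snippet runs standalone.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_compute(theta, x, y):
    # Same computation as the notebook's cost_compute.
    m = len(y)
    h = sigmoid(np.dot(x, theta))
    cost = -1 / m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = 1 / m * np.dot(x.T, h - y)
    return cost, grad

def numerical_gradient(theta, x, y, eps=1e-6):
    # Hypothetical helper: central-difference estimate of dJ/dtheta_j for each j.
    num_grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j] = eps
        cost_plus, _ = cost_compute(theta + step, x, y)
        cost_minus, _ = cost_compute(theta - step, x, y)
        num_grad[j] = (cost_plus - cost_minus) / (2 * eps)
    return num_grad

# Synthetic stand-in data (NOT the DMV dataset), with a leading column of ones.
rng = np.random.default_rng(0)
x_demo = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y_demo = rng.integers(0, 2, size=(20, 1)).astype(float)
theta_demo = rng.normal(scale=0.1, size=(3, 1))

_, analytic = cost_compute(theta_demo, x_demo, y_demo)
numeric = numerical_gradient(theta_demo, x_demo, y_demo)
max_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_diff)
```

If the two gradients agree to several decimal places, the analytic expression is very likely implemented correctly.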

### Task 6: Cost and Gradient at Initialization
---

In [10]:
# Standardize each test score to zero mean and unit variance
score_mean = np.mean(scores, axis=0)
score_std = np.std(scores, axis=0)
scores = (scores - score_mean) / score_std

row = scores.shape[0]
col = scores.shape[1]

# Prepend a column of ones for the intercept term
X = np.append(np.ones((row, 1)), scores, axis=1)
y = results.reshape(row, 1)

# Cost and gradient before any optimization, with theta initialized to zeros
theta_init = np.zeros((col + 1, 1))
cost, grad = cost_compute(theta_init, X, y)

In [11]:
print('Cost at initialization', cost)
print('Gradient at initialization', grad)

Cost at initialization 0.6931471805599453
Gradient at initialization [[-0.1       ]
 [-0.28122914]
 [-0.25098615]]
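
Both printed values can be checked by hand from the formulas in Task 5. With $\theta = 0$, every prediction is $h_\theta(x^{(i)}) = \sigma(0) = 0.5$, so the cost does not depend on the labels:

$$J(0) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)}\log 0.5 + (1 - y^{(i)})\log 0.5 \right] = \log 2 \approx 0.6931$$

For the intercept component, whose feature is the constant 1, the gradient reduces to

$$\frac{1}{m} \sum_{i=1}^{m} \left(0.5 - y^{(i)}\right) = 0.5 - \bar{y}$$

where $\bar{y}$ is the fraction of examples labeled "1". The printed value of $-0.1$ therefore suggests $\bar{y} = 0.6$, that is, about 60 of the 100 applicants passed.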



### Task 7: Gradient Descent
---

Minimize the cost function $J(\theta)$ by repeating the following update until convergence, updating $\theta_j$ for all $j$ simultaneously:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$

Here $\alpha$ is the learning rate.

In [15]:
def gradient_descent(theta, x, y, alpha, iterations):
    theta = theta.copy()   # work on a copy so the caller's initial theta is not modified in place
    costs = []
    for _ in range(iterations):
        cost, grad = cost_compute(theta, x, y)
        theta -= alpha * grad
        costs.append(cost)
    return theta, costs

In [20]:
theta, costs = gradient_descent(theta_init, X, y, 1, 200)


In [21]:
print('Theta after running gradient descent', theta)
print('Resulting cost after optimization', costs[-1])

Theta after running gradient descent [[1.71671348]
 [3.98908079]
 [3.72154954]]
Resulting cost after optimization 0.20349778840675828
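
The notebook runs a fixed number of iterations. To follow "repeat until convergence" more literally, one option is to stop once the cost stops improving. The sketch below is an assumption-laden variant, not the notebook's method: `gradient_descent_tol`, `max_iter`, and `tol` are hypothetical names, and the data is synthetic rather than the DMV dataset.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_compute(theta, x, y):
    # Same computation as the notebook's cost_compute.
    m = len(y)
    h = sigmoid(np.dot(x, theta))
    cost = -1 / m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = 1 / m * np.dot(x.T, h - y)
    return cost, grad

def gradient_descent_tol(theta, x, y, alpha, max_iter=1000, tol=1e-6):
    # Hypothetical variant: stop when the cost changes by less than tol
    # between consecutive iterations, or after max_iter iterations.
    theta = theta.copy()
    costs = []
    for _ in range(max_iter):
        cost, grad = cost_compute(theta, x, y)
        costs.append(cost)
        if len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
        theta = theta - alpha * grad
    return theta, costs

# Synthetic stand-in data (NOT the DMV dataset); label noise keeps the classes overlapping.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
x_demo = np.hstack([np.ones((100, 1)), features])
noise = rng.normal(size=(100, 1))
y_demo = (features[:, [0]] + features[:, [1]] + noise > 0).astype(float)

theta_demo, costs_demo = gradient_descent_tol(np.zeros((3, 1)), x_demo, y_demo, alpha=1.0)
print("iterations run:", len(costs_demo))
print("final cost:", costs_demo[-1])
```

A tolerance on the change in cost is only one possible stopping rule; a threshold on the gradient norm would be a reasonable alternative.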


### Task 8: Plotting the Convergence of $J(\theta)$
---

Plot $J(\theta)$ against the number of iterations of gradient descent:
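
A minimal sketch of such a plot follows. `plot_convergence` is a hypothetical helper, and the curve used to exercise it is a placeholder so the snippet runs standalone; in the notebook, the `costs` list returned by `gradient_descent` would be passed instead.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_convergence(costs):
    # Hypothetical helper: draw the cost at each iteration and return the Axes.
    fig, ax = plt.subplots()
    ax.plot(range(1, len(costs) + 1), costs)
    ax.set_xlabel("Iterations")
    ax.set_ylabel(r"$J(\theta)$")
    ax.set_title("Cost function over iterations of gradient descent")
    return ax

# Placeholder curve, NOT results from the notebook.
# In the notebook, call plot_convergence(costs) instead.
placeholder_costs = 0.2 + 0.5 * np.exp(-0.05 * np.arange(200))
ax = plot_convergence(placeholder_costs)
plt.show()
```

A curve that decreases and then flattens indicates that the learning rate is workable; a curve that oscillates or grows would suggest lowering $\alpha$.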

### Task 9: Plotting the Decision Boundary
---

$h_\theta(x) = \sigma(z)$, where $\sigma$ is the logistic sigmoid function and $z = \theta^Tx$.

When $h_\theta(x) \geq 0.5$, the model predicts class "1":

$\implies \sigma(\theta^Tx) \geq 0.5$

$\implies \theta^Tx \geq 0$

Hence, with $x_1 = 1$ as the intercept term, $\theta_1 + \theta_2x_2 + \theta_3x_3 = 0$ is the equation of the decision boundary, giving us

$$x_3 = \frac{-(\theta_1+\theta_2x_2)}{\theta_3}$$
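
The boundary line can be drawn by evaluating this expression at two values of $x_2$. The sketch below assumes a hypothetical helper `plot_decision_boundary` and uses placeholder data and parameters so it runs on its own; in the notebook, the standardized `X`, the labels `y`, and the fitted `theta` would be passed instead. Note that the text's $\theta_1, \theta_2, \theta_3$ correspond to indices 0, 1, 2 in the code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(X, y, theta):
    """Hypothetical helper: scatter both classes and draw the line theta^T x = 0.

    X is the (m, 3) design matrix with a leading column of ones,
    y is the (m, 1) label vector, and theta is the (3, 1) parameter vector.
    """
    labels = y.ravel()
    fig, ax = plt.subplots()
    ax.scatter(X[labels == 1, 1], X[labels == 1, 2],
               marker="^", color="green", label="Passed")
    ax.scatter(X[labels == 0, 1], X[labels == 0, 2],
               marker="x", color="red", label="Failed")

    # A straight line needs only two points: evaluate x_3 at the extremes of x_2.
    x2 = np.array([X[:, 1].min(), X[:, 1].max()])
    x3 = -(theta[0, 0] + theta[1, 0] * x2) / theta[2, 0]
    ax.plot(x2, x3, color="blue", label="Decision boundary")

    ax.set_xlabel("DMV Written Test 1 (standardized)")
    ax.set_ylabel("DMV Written Test 2 (standardized)")
    ax.legend()
    return ax, x2, x3

# Placeholder data and parameters (NOT the DMV data or the fitted theta).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
X_demo = np.hstack([np.ones((50, 1)), features])
theta_demo = np.array([[0.5], [2.0], [1.0]])
y_demo = (X_demo @ theta_demo >= 0).astype(int)

ax, x2, x3 = plot_decision_boundary(X_demo, y_demo, theta_demo)
plt.show()
```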

### Task 10: Predictions Using the Optimized $\theta$ Values
---

$h_\theta(x) = \sigma(x\theta)$, so the model predicts class "1" whenever $x\theta \geq 0$.
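
A prediction function can therefore test the sign of the linear score without evaluating the sigmoid. The following is a minimal sketch under stated assumptions: `predict` is a hypothetical helper, and the parameter values, means, standard deviations, and applicant scores are placeholders chosen so the snippet runs standalone. In the notebook, the fitted `theta` and the `score_mean` and `score_std` computed before training would be used.

```python
import numpy as np

def predict(theta, x):
    # Class 1 exactly when the linear score x @ theta is non-negative,
    # which is equivalent to sigmoid(x @ theta) >= 0.5.
    return (np.dot(x, theta) >= 0).astype(int)

# Placeholder parameters and scaling statistics (NOT the notebook's fitted values).
theta_demo = np.array([[0.5], [2.0], [1.0]])
mean_demo = np.array([60.0, 60.0])
std_demo = np.array([20.0, 20.0])

# Hypothetical applicants: raw scores on the two written tests.
raw_scores = np.array([[80.0, 70.0],
                       [30.0, 40.0]])

# New inputs must be standardized with the training mean and standard deviation,
# then given the same leading column of ones as the training matrix.
scaled = (raw_scores - mean_demo) / std_demo
x_new = np.hstack([np.ones((scaled.shape[0], 1)), scaled])

predictions = predict(theta_demo, x_new)
print(predictions.ravel())
```

Training accuracy could be estimated the same way in the notebook, for example with `np.mean(predict(theta, X) == y)`.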