Given a training set $\{(x^{(1)}, y^{(1)}), ..., (x^{(n)}, y^{(n)})\}$ of $n$ examples, logistic regression is defined mathematically as follows.

For one example $x^{(i)}$ (define $\sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$):
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)}) \tag{2}$$
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)}) \tag{3}$$

The cost is then computed by averaging the loss over all $n$ training examples:
$$ J = \frac{1}{n} \sum_{i=1}^n \mathcal{L}(a^{(i)}, y^{(i)})\tag{4}$$

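Now we derive the derivatives for gradient descent. For a single example we drop the superscript $(i)$ and write $da$ for $\partial \mathcal{L}/\partial a$, and likewise $dz$, $dw$ and $db$. The second line uses $\sigma'(z) = a(1-a)$: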

$$da = -\frac{y}{a} +\frac{1-y}{1-a}$$
$$dz = da*a*(1-a) = a - y$$
$$dw = dz * x$$
$$db = dz$$
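
The closed-form derivatives above can be checked numerically. The following is a minimal sketch, not part of the original notebook: it compares $dw = (a - y)x$ and $db = a - y$ with central finite differences of the loss in equation (3). The example values of `w`, `b`, `x` and `y` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Cross-entropy loss of a single example, equation (3)
    a = sigmoid(np.dot(w, x) + b)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

# Hypothetical single example with two features (illustrative values)
w = np.array([0.5, -0.3])
b = 0.1
x = np.array([1.2, 0.7])
y = 1.0

# Analytic gradients from the derivation: dz = a - y, dw = dz * x, db = dz
a = sigmoid(np.dot(w, x) + b)
dz = a - y
dw = dz * x
db = dz

# Central finite differences as an independent check
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    dw_num[j] = (loss(w + step, b, x, y) - loss(w - step, b, x, y)) / (2 * eps)
db_num = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))
```

If the derivation is right, both comparisons should agree to within the finite-difference error.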

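In vectorized form, with the $n$ examples stacked as the columns of $X$ (shape $(m, n)$ for $m$ features) and $Y$ of shape $(1, n)$:

$$Z = w^TX + b$$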
$$A = \sigma(Z) $$

$$dA = -\frac{Y}{A} + \frac{1-Y}{1-A} $$
$$dZ = A - Y $$
$$ \frac{\partial J}{\partial w} = \frac{1}{n}X(A-Y)^T\tag{7}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^n (a^{(i)}-y^{(i)})\tag{8}$$
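
As a sanity check on equations (7) and (8), the sketch below compares the vectorized gradients with an explicit loop over examples. It follows the convention used by the code in this notebook ($m$ features, $n$ examples, $X$ of shape $(m, n)$). The toy sizes and random data are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5  # m features, n examples (toy sizes, assumed for illustration)
X = rng.normal(size=(m, n))
Y = rng.integers(0, 2, size=(1, n)).astype(float)
w = rng.normal(size=(m, 1))
b = 0.0

# Forward pass over all examples at once: A has shape (1, n)
A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))

# Vectorized gradients, equations (7) and (8)
dw_vec = X @ (A - Y).T / n
db_vec = np.sum(A - Y) / n

# The same gradients accumulated one example at a time
dw_loop = np.zeros((m, 1))
db_loop = 0.0
for i in range(n):
    dz_i = A[0, i] - Y[0, i]
    dw_loop += dz_i * X[:, [i]]  # column i of X, kept as shape (m, 1)
    db_loop += dz_i
dw_loop /= n
db_loop /= n

print(np.allclose(dw_vec, dw_loop), np.isclose(db_vec, db_loop))
```

The matrix product `X @ (A - Y).T` performs the sum over examples that the loop spells out, which is why the two results should coincide.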

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
%matplotlib inline

# Forward slashes avoid backslash escape sequences and work on every OS
path = './data/ex2data1.txt'
data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
data.head()

data = data.values

In [10]:
data_X = data[:, 0:-1]
data_Y = data[:, [-1]]

print(data_X.shape)
print(data_Y.shape)

(100, 2)
(100, 1)


In [11]:
X_train, X_test, Y_train, Y_test = train_test_split(data_X, data_Y, random_state = 0)

print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

(75, 2)
(25, 2)
(75, 1)
(25, 1)


In [12]:
def sigmoid(z):
    """
    Arguments:
    z -- numpy array (1, number of observations) or any size
    
    Return:
    sigmoid(z)
    """
    return 1 / (1 + np.exp(-z))

In [13]:
def initialize_parameters(m):
    """
    Initialize weight vector w and bias b
    Arguments:
    m: number of features
    
    Returns:
    w -- (m, 1)
    b -- scalar
    """
    
    w = np.random.randn(m, 1)  # one weight per feature, matching the (m, 1) shape documented above
    b = 0
    
    return w, b

In [14]:
def propagate(w, b, X, Y):
    """
    Arguments:
    w -- weights (m, 1)
    b -- scalar
    X -- data matrix (m, n)
    Y -- label (1, n)
    Return:
    grads -- dict with "dw" of shape (m, 1) and "db" (scalar)
    cost -- cross entropy cost
    """
    
    n = X.shape[1]
    
    # Forward propagation
    A = sigmoid(np.dot(w.T, X) + b)
    
    cost = -( Y*np.log(A) + (1-Y) * np.log(1 - A) ).sum() / n
    
    # Backward propagation
    dw = np.dot(X, (A - Y).T)/n
    db = (A - Y).sum()/n
    
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
    

In [23]:
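def gradient_decent(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    Run batch gradient descent for num_iterations steps.

    Returns:
    params -- dict with the final "w" and "b"
    grads -- dict with the last "dw" and "db"
    costs -- list of costs recorded every 100 iterations
    """
    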
    costs = []
    
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        
        dw = grads["dw"]
        db = grads["db"]

        w = w - learning_rate * dw
        b = b - learning_rate * db
        
        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 10 iterations
        if print_cost and i % 10 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs

In [48]:
w, b = initialize_parameters(X_train.shape[1])
params, grads, costs = gradient_decent(w, b, X_train.T, Y_train.T, num_iterations= 200, learning_rate = 0.009, print_cost = True)

Cost after iteration 0: nan
Cost after iteration 10: nan
Cost after iteration 20: 12.261729
Cost after iteration 30: nan
Cost after iteration 40: nan
Cost after iteration 50: nan
Cost after iteration 60: 10.671490
Cost after iteration 70: nan
Cost after iteration 80: 11.944546
Cost after iteration 90: 11.899539
Cost after iteration 100: nan
Cost after iteration 110: 13.596947
Cost after iteration 120: 4.299326
Cost after iteration 130: nan
Cost after iteration 140: 9.731182
Cost after iteration 150: 6.596018
Cost after iteration 160: nan
Cost after iteration 170: 10.912962
Cost after iteration 180: 10.115295
Cost after iteration 190: nan
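
The `nan` values and the erratic cost are consistent with the sigmoid saturating on unscaled inputs. With features the size of raw exam scores, $|z|$ can be large enough that $a$ rounds to exactly 0 or 1 in floating point, and the cost then evaluates $0 \cdot \log(0)$. The sketch below is not part of the original notebook. It reproduces the effect on made-up values and shows two common remedies, stated here as assumptions rather than as the notebook's method: clipping $A$ away from 0 and 1 before taking logs, and standardizing each feature. The helper `cross_entropy` and all numbers are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(A, Y, eps=None):
    """Mean cross-entropy; if eps is given, clip A into [eps, 1 - eps] first."""
    if eps is not None:
        A = np.clip(A, eps, 1.0 - eps)
    return float(-(Y * np.log(A) + (1 - Y) * np.log(1 - A)).mean())

# Hypothetical pre-activations of a magnitude that unscaled scores can produce
Z = np.array([[50.0, -50.0]])
Y = np.array([[1.0, 0.0]])
A = sigmoid(Z)  # A[0, 0] rounds to exactly 1.0 in float64

# Without clipping, (1 - Y) * log(1 - A) becomes 0 * log(0), which is nan
with np.errstate(divide="ignore", invalid="ignore"):
    naive_cost = cross_entropy(A, Y)
clipped_cost = cross_entropy(A, Y, eps=1e-12)

# Standardizing each feature (each row of X) keeps z in a moderate range
X = np.array([[34.6, 60.2, 79.0, 45.1],
              [78.0, 86.3, 75.3, 56.3]])  # made-up scores, shape (m, n)
X_std = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

print(np.isnan(naive_cost), np.isfinite(clipped_cost))
print(X_std.mean(axis=1), X_std.std(axis=1))
```

When standardizing, the mean and standard deviation would normally be computed on the training split and reused for the test split, so that the two are scaled identically.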


