In [1]:
import numpy as np
import pandas as pd
X_train = np.load('../datasets/processed/X_train.npy')
y_train = np.load('../datasets/processed/y_train.npy')
X_test = np.load('../datasets/processed/X_test.npy')
y_test = np.load('../datasets/processed/y_test.npy')

In [2]:
X_train.shape, y_train.shape, X_test.shape, X_test.shape

((2194, 15), (2194, 1), (1464, 15), (1464, 15))

| Index        | $X_{1}$       | $X_{2}$       | $X_{3}$       | .... | .... | $X_{n}$        | y        |
|--------------|---------------|---------------|---------------|------|------|----------------|----------|
| 1            | $x_{1}^{1} $  | $x_{2}^{1}$   | $x_{3}^{1}$   | ...  | ...  | $x_{n}^{1}$    | $y^{1}$  |
| 2            | $x_{1}^{2}$   | $x_{2}^{2}$   | $x_{3}^{2}$   | ...  | ...  | $x_{n}^{2}$    | $y^{2}$  |
| 3            | $x_{1}^{3}$   | $x_{2}^{3}$   | $x_{3}^{3}$   | ...  | ...  | $x_{n}^{3}$    | $y^{3}$  |
| .            | .             | .             | .             | ...  | ...  | .              |          |
| .            | .             | .             | .             | ...  | ...  | .              |          |
| .            | .             | .             | .             | ...  | ...  | .              |          |
| m            | $x_{1}^{m}$   | $x_{2}^{m}$   | $x_{3}^{m}$   | ...  | ...  | $x_{n}^{m}$    | $y^{m}$  |

Suppose the weights for the n features are denoted by a row vector of shape [1, n]:
<br/>
$$ \beta = \begin{bmatrix} \beta_{1} & \beta_{2} & \beta_{3} & .... &  \beta_{n} \end{bmatrix} $$

For each training example $i = 1, 2, ..., m$, the linear combination $z^{i}$ is:

\begin{equation}
z^{1} = \beta_{0} + \beta_{1}x_{1}^{1} + \beta_{2}x_{2}^{1} + \beta_{3}x_{3}^{1} + .... + \beta_{n}x_{n}^{1}
\end{equation}

\begin{equation}
z^{2} = \beta_{0} + \beta_{1}x_{1}^{2} + \beta_{2}x_{2}^{2} + \beta_{3}x_{3}^{2} + .... + \beta_{n}x_{n}^{2}
\end{equation}

\begin{equation}
z^{3} = \beta_{0} + \beta_{1}x_{1}^{3} + \beta_{2}x_{2}^{3} + \beta_{3}x_{3}^{3} + .... + \beta_{n}x_{n}^{3}
\end{equation}

\begin{equation}
\vdots
\end{equation}

\begin{equation}
z^{m} = \beta_{0} + \beta_{1}x_{1}^{m} + \beta_{2}x_{2}^{m} + \beta_{3}x_{3}^{m} + .... + \beta_{n}x_{n}^{m}
\end{equation}

\begin{equation}
\begin{bmatrix} z^{1} \\ z^{2} \\ z^{3} \\ \vdots \\ z^{m} \end{bmatrix} =
\begin{bmatrix} x_{1}^{1} & x_{2}^{1} & x_{3}^{1} & \cdots & x_{n}^{1}
           \\ x_{1}^{2} & x_{2}^{2} & x_{3}^{2} & \cdots & x_{n}^{2}
           \\ x_{1}^{3} & x_{2}^{3} & x_{3}^{3} & \cdots & x_{n}^{3}
           \\ \vdots & \vdots & \vdots & \ddots & \vdots
           \\ x_{1}^{m} & x_{2}^{m} & x_{3}^{m} & \cdots & x_{n}^{m}
\end{bmatrix}
\begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \vdots \\ \beta_{n} \end{bmatrix}
+ \beta_{0}
\end{equation}

In matrix form, with $z$ of shape `[m, 1]`:

\begin{equation} z = X\beta^{T} + \beta_{0} \end{equation}

Equivalent numpy implementation is: `z = np.dot(X, B.T) + b`, where
 $$ B = \beta $$
 $$ b = \beta_{0}$$
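
To make the shapes concrete, here is a minimal sketch on made-up values. The arrays `X_demo`, `B_demo` and `b_demo` are illustrative assumptions, not taken from the dataset. It checks that the vectorized expression gives the same result as the per-example sums written out above.

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, n = 3 features
X_demo = np.array([[1.0, 3.0, 50.0],
                   [3.0, 4.0, 40.0],
                   [2.0, 0.0, 70.0],
                   [2.0, 6.0, 20.0]])
B_demo = np.array([[0.5, -1.0, 0.01]])   # weights, shape [1, n]
b_demo = 0.25                            # intercept (scalar)

# Vectorized form: z = X . B^T + b, shape [m, 1]
z_demo = np.dot(X_demo, B_demo.T) + b_demo

# Same quantity computed one example and one feature at a time
m, n = X_demo.shape
z_loop = np.zeros((m, 1))
for i in range(m):
    z_loop[i, 0] = b_demo + sum(B_demo[0, j] * X_demo[i, j] for j in range(n))

print(z_demo.shape)                 # (4, 1)
print(np.allclose(z_demo, z_loop))  # True
```

The scalar `b_demo` is broadcast by numpy across all m rows, which is why $\beta_{0}$ can stay a single number.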

The prediction is obtained by applying the sigmoid function to $z$:
$$ \hat{y} = \sigma({z}) = \frac{1}{1 + e^{-(X\beta^{T} + \beta_{0})}} $$

Equivalent numpy implementation is:
    `y_hat = sigmoid(z)`
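
The `sigmoid` helper used here can be sketched as follows. The input `z_demo` is a made-up example; the checks only illustrate two standard properties of the function, $\sigma(0) = 0.5$ and $\sigma(-z) = 1 - \sigma(z)$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, shape [3, 1]
z_demo = np.array([[-2.0], [0.0], [2.0]])
y_hat_demo = sigmoid(z_demo)

print(y_hat_demo[1, 0])                                      # 0.5
print(np.isclose(y_hat_demo[0, 0], 1 - y_hat_demo[2, 0]))    # True
```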

### The cross entropy cost function is
$$ J(\beta, \beta_{0}) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\hat{y}^{(i)} - (1-y^{(i)})\log(1-\hat{y}^{(i)})\right] $$

Equivalent numpy implementation is: <br>
`cost = (-1/m) * np.sum(y*np.log(y_hat) + (1-y)*np.log(1-y_hat))`
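
As a sanity check, the cost can be evaluated for the case where every prediction is 0.5, which is what zero weights produce. The result should be $\log 2 \approx 0.693$ regardless of the labels. The sketch below is an illustration only: the function name `cross_entropy_cost` and the clipping constant `eps` are assumptions added here to guard against `log(0)`, and are not part of the notebook's own code.

```python
import numpy as np

def cross_entropy_cost(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy.

    eps is a hypothetical guard that keeps y_hat away from exactly 0 or 1,
    where np.log would return -inf.
    """
    m = y.shape[0]
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return (-1 / m) * np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Made-up labels and the predictions that zero weights would give
y_demo = np.array([[0], [0], [1], [0]])
y_hat_demo = np.full((4, 1), 0.5)

cost_demo = cross_entropy_cost(y_demo, y_hat_demo)
print(np.isclose(cost_demo, np.log(2)))  # True
```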

The gradients needed for gradient descent follow from the chain rule:

$$ \frac{\partial J(\beta, \beta_{0})}{\partial z} = \frac{\partial J}{\partial \hat{y}} * \frac{\partial \hat{y}}{\partial z} $$

$$ \frac{\partial \hat{y}}{\partial z} = \frac{\partial}{\partial z} \sigma({z}) $$

where $$ \hat{y} = \sigma({z}) = \frac{1}{1 + e^{-(X\beta^{T} + \beta_{0})}} $$

 $$ \frac{\partial \hat{y}}{\partial z} =  \sigma{(z)}*(1 - \sigma{(z)}) $$

Therefore, $$ \frac{\partial \hat{y}}{\partial z} = \hat{y}*(1 - \hat{y}) $$
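
The identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ can be checked numerically with a central difference. This is a small sketch; the grid `z_demo` and step size `h` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z_demo = np.linspace(-4.0, 4.0, 9)   # arbitrary test points
h = 1e-5                             # step size for the central difference

# Numerical derivative: (f(z + h) - f(z - h)) / (2h)
numeric = (sigmoid(z_demo + h) - sigmoid(z_demo - h)) / (2 * h)

# Closed form derived above: y_hat * (1 - y_hat)
analytic = sigmoid(z_demo) * (1 - sigmoid(z_demo))

max_gap = np.max(np.abs(numeric - analytic))
print(max_gap < 1e-8)  # True
```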

 $$ \frac{\partial J}{\partial \hat{y}} =  \frac{\partial \frac{1}{m}\sum_{i=1}^{m}-y^{(i)}log\hat{y}^{(i)} - (1-y^{(i)})log(1-\hat{y}^{(i)})}{\partial \hat{y}} $$

 $$ \frac{\partial J}{\partial \hat{y}} =  \frac{1}{m}(\frac{-y}{\hat{y}} + \frac{1-y}{1-\hat{y}}) $$
 
 $$ \frac{\partial J}{\partial \hat{y}} =  \frac{1}{m}(\frac{-y + y\hat{y} + \hat{y} -y\hat{y}}{\hat{y}(1-\hat{y})}) $$
  
 $$ \frac{\partial J}{\partial \hat{y}} =  \frac{1}{m}(\frac{-y + \hat{y} }{\hat{y}(1-\hat{y})}) $$

$$ \frac{\partial J(\beta, \beta_{0})}{\partial z} = \frac{\partial J}{\partial \hat{y}} * \frac{\partial \hat{y}}{\partial z} $$

$$ \frac{\partial J(\beta, \beta_{0})}{\partial z} =  \frac{1}{m}(\frac{-y + \hat{y} }{\hat{y}*(1-\hat{y})}) \hat{y}*(1 - \hat{y}) $$

Hence: $$ \frac{\partial J}{\partial z} =\frac{1}{m} (\hat{y} - y)  = \frac{1}{m}\partial z$$

where $\partial z$ is used as shorthand for $\hat{y} - y$.

By the chain rule: $$ \frac{\partial J}{\partial \beta} = \frac{\partial J}{\partial z} *  \frac{\partial z}{\partial \beta} $$

Hence: $$ \frac{\partial J}{\partial \beta} = \frac{1}{m}(\hat{y} - y) *  \frac{\partial (X\beta^{T} + \beta_{0})}{\partial \beta} $$

Hence: $$ \frac{\partial J}{\partial \beta} = \frac{1}{m}(\hat{y} - y) * X* \frac{\partial \beta^{T} }{\partial \beta} $$

Hence: $$ \frac{\partial J}{\partial \beta} = \frac{1}{m}(\hat{y} - y) * X $$

For more info on the calculation of $$ \frac{\partial \beta^{T} }{\partial \beta} = I_{n*n} $$ see multivariate regression.

Shape of $\partial z = \hat{y} - y$ is `[m, 1]` and shape of X is `[m, n]`.
Required shape of $\partial \beta$ is `[1, n]`, so $\partial z$ is transposed to `[1, m]` before multiplying by X.

So $\partial \beta$ = $\frac{1}{m}\partial z^{T} * X $

Equivalent numpy implementation is: `dB = (1/m) * np.dot((y_hat - y).T, X)`
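
The matrix product sums the per-example contributions $(\hat{y}^{(i)} - y^{(i)})\,x^{(i)}$ in one step. The sketch below compares it with an explicit loop on randomly generated stand-in data; all `*_demo` arrays are hypothetical and the predictions are random numbers rather than the output of a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3

# Hypothetical stand-in data
X_demo = rng.normal(size=(m, n))
y_demo = rng.integers(0, 2, size=(m, 1))
y_hat_demo = rng.uniform(0.1, 0.9, size=(m, 1))

dz = y_hat_demo - y_demo                 # shape [m, 1]
dB = (1 / m) * np.dot(dz.T, X_demo)      # [1, m] . [m, n] -> [1, n]

# Same gradient accumulated one example at a time
dB_loop = np.zeros((1, n))
for i in range(m):
    dB_loop += dz[i, 0] * X_demo[i, :]
dB_loop /= m

print(dB.shape)                   # (1, 3)
print(np.allclose(dB, dB_loop))   # True
```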

Similarly $$ \frac{\partial J}{\partial \beta_{0}} = \frac{\partial J}{\partial z} * \frac{\partial z}{\partial \beta_{0}} $$
 $$ \frac{\partial J}{\partial \beta_{0}} = \frac{\partial J}{\partial z} * \frac{\partial (X\beta^{T} + \beta_{0})}{\partial \beta_{0}} $$
 $$ \frac{\partial J}{\partial \beta_{0}} = \frac{1}{m}\sum_{i=1}^{m}\partial z^{(i)} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)}) $$
 
 Equivalent numpy implementation is: `db = (1/m)*np.sum(y_hat - y)`
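
One way to gain confidence in both gradient formulas is a finite-difference check: perturb each parameter slightly and compare the change in cost with the analytic gradient. The following is a sketch under stated assumptions; `cost_fn`, the random data and the step size `h` are all introduced here for illustration and do not appear elsewhere in the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fn(B, b, X, y):
    """Cross-entropy cost as a function of the parameters (hypothetical helper)."""
    y_hat = sigmoid(np.dot(X, B.T) + b)
    return (-1 / X.shape[0]) * np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Randomly generated stand-in data
rng = np.random.default_rng(1)
m, n = 20, 4
X_demo = rng.normal(size=(m, n))
y_demo = rng.integers(0, 2, size=(m, 1))
B_demo = rng.normal(scale=0.1, size=(1, n))
b_demo = 0.05

# Analytic gradients from the derivation
y_hat = sigmoid(np.dot(X_demo, B_demo.T) + b_demo)
dB = (1 / m) * np.dot((y_hat - y_demo).T, X_demo)
db = (1 / m) * np.sum(y_hat - y_demo)

# Central-difference approximations, one parameter at a time
h = 1e-6
dB_num = np.zeros_like(B_demo)
for j in range(n):
    step = np.zeros_like(B_demo)
    step[0, j] = h
    dB_num[0, j] = (cost_fn(B_demo + step, b_demo, X_demo, y_demo)
                    - cost_fn(B_demo - step, b_demo, X_demo, y_demo)) / (2 * h)
db_num = (cost_fn(B_demo, b_demo + h, X_demo, y_demo)
          - cost_fn(B_demo, b_demo - h, X_demo, y_demo)) / (2 * h)

print(np.allclose(dB, dB_num, atol=1e-6))  # True
print(abs(db - db_num) < 1e-6)             # True
```

If the sign of the gradient were flipped, or the $\frac{1}{m}$ factor dropped, both comparisons would fail.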

In [3]:
import numpy as np
def sigmoid(x):
    return 1/(1 + np.exp(-x))

def propagate(B, b, X, y):
    m = X.shape[0]
#     print(m)
#     print(y.shape[0])
    assert(m == y.shape[0])
    
    z = np.dot(X, B.T) + b
    assert(z.shape == y.shape)
#     eplison = 1e-6
    y_hat = sigmoid(z) 
    df = pd.DataFrame(data = {'y':list(y), 
                              'z': list(z),
                              'y_hat': list(y_hat),
                              'log(y_hat)': list(np.log(y_hat)), 
                              '1-y':list((1-y)), 
                              'log(1-y_hat)': list(np.log(1-y_hat))},
                                index = np.arange(m)  # one row per example, not a hard-coded 2194
                     )
#     print(df)
    cost = (-1/m) * np.sum(y*np.log(y_hat) + (1-y)*np.log(1-y_hat))
    
    dB = (1/m) * np.dot((y_hat - y).T, X)
#     print(f"dB is here: {dB}")
    assert(dB.shape[1] == B.shape[1])
    
    db = (1/m) * np.sum(y_hat - y)
    grads = {"dB": dB,
             "db": db}
    
    return grads, cost, df


def optimize(B, b, X, Y, num_iterations, learning_rate):
    costs = []
    dfs = []
    
    for i in range(num_iterations):
        grads, cost, df = propagate(B,b,X,Y)
        
        # Retrieve derivatives from grads
        dB = grads["dB"]
        db = grads["db"]
        
        # update parameters
        B = B - learning_rate * dB
        b = b - learning_rate * db
        
        costs.append(cost)
        dfs.append(df)
    
    params = {"B": B,
              "b": b}
    
    grads = {"dB": dB,
             "db": db}
    
    return params, grads, costs, dfs

def predict(B, b, X):
    m = X.shape[0]
    B = B.reshape(1, X.shape[1])
    y_hat = sigmoid(np.dot(X, B.T) + b)

    # Convert probabilities to class predictions using a 0.5 threshold
    Y_prediction = (y_hat >= 0.5).astype(float)
    
    assert(Y_prediction.shape == (m, 1))
    
    return Y_prediction

def model(X_train, Y_train, X_test, Y_test, num_iterations = 10000, learning_rate = 0.1):
    # initialize parameters with zeros 
    B = np.zeros(shape=(1, X_train.shape[1]))
#     B = np.random.randn(1, X_train.shape[1])
    b = 0
    
    # Gradient descent
    parameters, grads, costs, dfs = optimize(B, b, X_train, Y_train, num_iterations, learning_rate)
    
    # Retrieve parameters B and b from dictionary "parameters"
    B = parameters["B"]
    b = parameters["b"]
    
    # Predict test/train set examples
    Y_prediction_test = predict(B, b, X_test)
    Y_prediction_train = predict(B, b, X_train)
    
    Y_train = Y_train.reshape(Y_train.shape[0], 1)
    Y_test = Y_test.reshape(Y_test.shape[0], 1)

    # Print train/test accuracies
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

#     print(df)
    d = {"costs": costs,
         "dfs": dfs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "B" : B, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

In [4]:
# X = np.array(
# [
#     [1, 3, 50],
#     [3, 4, 40],
#     [2, 0, 70],
#     [2, 6, 20],
# ]
# )

# y = np.array(
# [
#     [1],
#     [0],
#     [0],
#     [1],
# ]
# )

In [5]:
d = model(X_train, y_train, X_test, y_test)

train accuracy: 85.0045578851413 %
test accuracy: 85.45081967213115 %


In [6]:
# df.head(10)

In [7]:
d['costs']

[0.6931471805599453,
 0.6813101257600014,
 0.6700512168708936,
 0.6593415784765921,
 0.6491535221786802,
 0.6394605466929694,
 0.6302373290447799,
 0.6214597082836344,
 0.6131046630117196,
 0.605150283891984,
 0.597575742174911,
 0.590361255160697,
 0.5834880493977814,
 0.5769383223107609,
 0.5706952028515018,
 0.5647427116770513,
 0.5590657212768262,
 0.553649916399243,
 0.5484817550640908,
 0.5435484303909839,
 0.5388378334255834,
 0.5343385171033087,
 0.5300396614543361,
 0.5259310401231604,
 0.5220029882502847,
 0.5182463717421268,
 0.5146525579374713,
 0.5112133876642475,
 0.5079211486686772,
 0.5047685503894719,
 0.501748700042463,
 0.49885507997547474,
 0.49608152624914736,
 0.4934222083965443,
 0.49087161031251875,
 0.48842451222280775,
 0.4860759736824934,
 0.4838213175537097,
 0.48165611491315147,
 0.4795761708409744,
 0.4775775110439772,
 0.4756563692674613,
 0.47380917545181994,
 0.4720325445916617,
 0.4703232662570922,
 0.4686782947386274,
 0.46709473977907123,
 0.46556985

In [8]:
d['dfs'][0]

Unnamed: 0,y,z,y_hat,log(y_hat),1-y,log(1-y_hat)
0,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]
1,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]
2,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]
3,[1],[0.0],[0.5],[-0.6931471805599453],[0],[-0.6931471805599453]
4,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]
...,...,...,...,...,...,...
2189,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]
2190,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]
2191,[1],[0.0],[0.5],[-0.6931471805599453],[0],[-0.6931471805599453]
2192,[0],[0.0],[0.5],[-0.6931471805599453],[1],[-0.6931471805599453]


In [9]:
d['dfs'][999]

Unnamed: 0,y,z,y_hat,log(y_hat),1-y,log(1-y_hat)
0,[0],[-1.783092923987826],[0.14392164064356],[-1.9384862897136181],[1],[-0.15539336572579207]
1,[0],[-2.8812465803637153],[0.05308843581304137],[-2.9357961557506904],[1],[-0.0545495753869752]
2,[0],[-1.2760570103582634],[0.21822215510508702],[-1.5222416749672394],[1],[-0.24618466460897603]
3,[1],[-1.0393533982545353],[0.26127477542574434],[-1.342182646034063],[0],[-0.3028292477795276]
4,[0],[-0.6635576999434676],[0.33994087701235115],[-1.078983567632891],[1],[-0.41542586768942336]
...,...,...,...,...,...,...
2189,[0],[-0.7455466615350539],[0.3217924342861978],[-1.133848555308692],[1],[-0.38830189377363833]
2190,[0],[-2.323453747956905],[0.08919906949547399],[-2.4168846711156733],[1],[-0.09343092315876822]
2191,[1],[-1.3118167428565954],[0.21218299609821306],[-1.5503061875011106],[0],[-0.23848944464451519]
2192,[0],[-2.0113985851543488],[0.11801132876758155],[-2.1369746526198736],[1],[-0.12557606746552472]


In [10]:
d['B']

array([[ 0.2483024 ,  2.57601796, -0.36588396,  0.2782662 ,  0.8285657 ,
         0.05020575,  0.12720942,  0.04821677,  0.35271368,  0.95616128,
         1.78315311,  0.9866257 ,  0.31808388, -0.13958718,  1.05171733]])

In [11]:
d['b']

-1.9259452002895203