# Jupyter notebook for the __[Coursera ML course](https://www.coursera.org/learn/machine-learning/)__

The course uses MATLAB, so the main task here is
translating the MATLAB code for the various functions
(in this case, neural network learning) into Python.

I'm adding a new version of the notebook every week,
so that I can see the progress at a glance rather than
having to go through the history on GitHub.

### Settings (put at the top for ease of changing)

In [1]:
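# network configuration
# each entry is [units in this layer, inputs from the previous layer + 1 bias unit]
layer1 = [25, 401]   # 400 input pixels (20x20) + bias -> 25 hidden units
layer2 = [10, 26]    # 25 hidden units + bias -> 10 output classes
layers = (layer1, layer2)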

In [2]:
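# data from binary files
import numpy as np
# 5000 images of 20x20 pixels, one uint8 per pixel, flattened to rows of 400
X = np.fromfile('mnist_5000_20_20.bin', dtype=np.uint8, count=5000*20*20).reshape((5000, 400)).astype(np.float64)
# labels, shifted down by one so that they can index columns 0..9
y = np.fromfile('mnist_5000_20_20_lab.bin', dtype=np.uint8, count=5000) - 1
# one-hot encode the labels: Y[i, c] is 1.0 if sample i has label c
Y = np.zeros([y.size, 10])
for c in range(10):
    Y[y==c, c] = 1.0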
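
The one-hot loop above can also be written as a single indexing step. A minimal sketch, using a small made-up label vector (`y_demo` is hypothetical, standing in for the `y` loaded from the file):

```python
import numpy as np

# hypothetical labels standing in for the `y` loaded above (values 0..9)
y_demo = np.array([0, 3, 9, 3])

# row c of the 10x10 identity matrix is the one-hot vector for class c,
# so indexing the identity with the labels gives the one-hot matrix
Y_demo = np.eye(10)[y_demo]

# the explicit loop over classes, for comparison
Y_loop = np.zeros([y_demo.size, 10])
for c in range(10):
    Y_loop[y_demo == c, c] = 1.0
```

Both variants should give the same matrix; the loop is arguably easier to read next to the MATLAB original, so which one to use is a matter of taste.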

In [3]:
# sigmoid function
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
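
For strongly negative `z`, `np.exp(-z)` can overflow and NumPy emits a `RuntimeWarning`, even though the result still ends up at 0. A possible variant that avoids the overflow is sketched below; `sigmoid_stable` is a name made up here, not part of the notebook, and it assumes array input.

```python
import numpy as np

def sigmoid_stable(z):
    """Logistic function that avoids overflow in np.exp (expects an array)."""
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    # for z >= 0, exp(-z) <= 1, so this form cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # for z < 0, use the algebraically equal form exp(z) / (1 + exp(z))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

# compare against the plain formula on moderate values
z_demo = np.linspace(-5.0, 5.0, 11)
plain = 1.0 / (1.0 + np.exp(-z_demo))
stable = sigmoid_stable(z_demo)

# extreme inputs saturate to 0 and 1 without overflow
extreme = sigmoid_stable(np.array([-1000.0, 1000.0]))
```

If SciPy is available, `scipy.special.expit` computes the same function and may be the simpler choice.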

In [4]:
# define a cost function for the network
def networkcost(theta, X, Y, l, shapes):
    
    # copy out thetas
    thetas = []
    ti = 0
    for s in shapes:
        thetas.append(theta[ti:ti+np.prod(s)].reshape(s))
        ti += np.prod(s)
    
    # transpose Y if necessary
    if thetas[-1].shape[0] == Y.shape[1]:
        Y = Y.T
    
    # number of samples
    m = Y.shape[1]
    
    # prepare arrays
    Xt = []
    d = []
    grad = []
    
    # transpose X if necessary
    if X.shape[0] == m:
        Xt.append(X.T)
    else:
        Xt.append(X)
    
    # sum over theta squared
    tss = 0.0
    
    # forward prop
    for tc in range(len(thetas)):
        
        # add bias units
        if Xt[tc].shape[0] < thetas[tc].shape[1]:
            Xt[tc] = np.concatenate((np.ones((1, Xt[tc].shape[1])), Xt[tc]))
        Xt.append(sigmoid(np.matmul(thetas[tc], Xt[tc])))
        
        # square theta (skipping the bias column, which is normally left unregularized)
        tsq = np.multiply(thetas[tc][:, 1:], thetas[tc][:, 1:])
        tss += l * np.sum(tsq)
    
    # output layer
    p = Xt[-1]
    
    # sanity replacements (keep p strictly inside (0, 1) so the logs stay finite)
    p[p==0] = 2.3e-16
    p[p==1] = 1.0 - 2.3e-16
    
    # first delta
    d = [p - Y]
    
    # cost
    J = (-1.0 / m) * np.sum(np.multiply(Y, np.log(p)) + np.multiply(1.0 - Y, np.log(1.0 - p))) + (0.5 / m) * tss
    
    # gradient is not computed yet; return zeros of the right size as a placeholder
    grad = np.zeros(theta.size)
    return J, grad, p
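
The function above returns a zero gradient for now. One way the backpropagation step could look is sketched below. This is not the notebook's implementation: `unpack` and `cost_and_grad` are hypothetical helpers, the sketch assumes `X` is (samples, features) and `Y` is (samples, classes), and it assumes the bias columns are left out of the regularization. A central-difference check on a tiny random network follows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, shapes):
    """Split a flat parameter vector into one weight matrix per layer."""
    thetas, ti = [], 0
    for s in shapes:
        n = int(np.prod(s))
        thetas.append(theta[ti:ti + n].reshape(s))
        ti += n
    return thetas

def cost_and_grad(theta, X, Y, lam, shapes):
    """Hypothetical helper: regularized cross-entropy cost and its gradient."""
    thetas = unpack(theta, shapes)
    m = X.shape[0]

    # forward pass; keep each layer's input (with bias row) for the backward pass
    acts = []
    a = X.T
    for t in thetas:
        a = np.concatenate((np.ones((1, m)), a))
        acts.append(a)
        a = sigmoid(t @ a)
    p = a

    # cost, with the bias columns excluded from the penalty (an assumption)
    reg = sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    J = (-1.0 / m) * np.sum(Y.T * np.log(p) + (1.0 - Y.T) * np.log(1.0 - p))
    J += (0.5 * lam / m) * reg

    # backward pass, starting from the output delta p - Y
    grads = [None] * len(thetas)
    d = p - Y.T
    for i in reversed(range(len(thetas))):
        g = (d @ acts[i].T) / m
        g[:, 1:] += (lam / m) * thetas[i][:, 1:]
        grads[i] = g
        if i > 0:
            a_prev = acts[i][1:, :]  # previous layer's activations, bias row removed
            d = (thetas[i][:, 1:].T @ d) * a_prev * (1.0 - a_prev)
    return J, np.concatenate([g.ravel() for g in grads])

# tiny random problem (all sizes and values are made up for the check)
rng = np.random.default_rng(0)
shapes = ((4, 6), (3, 5))
theta_demo = 0.5 - rng.random(sum(int(np.prod(s)) for s in shapes))
X_demo = rng.random((7, 5))
Y_demo = np.eye(3)[rng.integers(0, 3, size=7)]
J_demo, grad_demo = cost_and_grad(theta_demo, X_demo, Y_demo, 1.0e-2, shapes)

# central-difference approximation of the gradient, one parameter at a time
eps = 1.0e-6
num = np.zeros_like(theta_demo)
for k in range(theta_demo.size):
    step = np.zeros_like(theta_demo)
    step[k] = eps
    J_plus = cost_and_grad(theta_demo + step, X_demo, Y_demo, 1.0e-2, shapes)[0]
    J_minus = cost_and_grad(theta_demo - step, X_demo, Y_demo, 1.0e-2, shapes)[0]
    num[k] = (J_plus - J_minus) / (2.0 * eps)
max_diff = np.max(np.abs(num - grad_demo))
```

If the analytic and numerical gradients agree to within a small tolerance, the backward pass is probably right. The numerical check is far too slow for the full 10,285-parameter network, so it is only meant for small test cases like this one.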

In [5]:
# random theta
theta = 0.5 - np.random.rand(np.prod(layer1) + np.prod(layer2))
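
The draw above gives weights in a fixed range of about ±0.5. A common heuristic, sketched here as an alternative, scales the range per layer using ε = sqrt(6 / (inputs + outputs)). `init_layer` and the seed are hypothetical choices for illustration.

```python
import numpy as np

layer1 = [25, 401]
layer2 = [10, 26]

def init_layer(shape, rng):
    """Uniform weights in [-eps, eps], with eps = sqrt(6 / (fan_in + fan_out))."""
    n_out, n_in_plus_bias = shape
    eps = np.sqrt(6.0 / ((n_in_plus_bias - 1) + n_out))
    return rng.uniform(-eps, eps, size=tuple(shape))

# the seed is an arbitrary choice, only there to make runs repeatable
rng = np.random.default_rng(42)
theta_scaled = np.concatenate([init_layer(s, rng).ravel() for s in (layer1, layer2)])
```

For the first layer this gives ε ≈ 0.12, so the initial weights are noticeably smaller than with the ±0.5 draw.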

In [6]:
J, g, p = networkcost(theta, X, Y, 1.0e-4, layers)
J

7.4664772037234135
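
As a rough reference point for this value: a network that outputs 0.5 for every class has an unregularized cost of 10·ln 2 ≈ 6.93, whatever the labels are. A small sketch with made-up data (`Y_ref` and `p_ref` are hypothetical) confirms the arithmetic:

```python
import numpy as np

K = 10  # number of classes
m = 5   # number of made-up samples

# one-hot targets for labels 0..4 and a constant prediction of 0.5 everywhere
Y_ref = np.eye(K)[np.arange(m)]
p_ref = np.full((m, K), 0.5)

# same cross-entropy expression as in networkcost, without the regularization term
J_ref = (-1.0 / m) * np.sum(Y_ref * np.log(p_ref) + (1.0 - Y_ref) * np.log(1.0 - p_ref))
# J_ref equals K * ln(2), about 6.93
```

A cost a little above that for random weights seems plausible, since an untrained network will be confidently wrong on some outputs.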