### Training Artificial Neural Networks for image recognition

This is chapter 12 of the book Python Machine Learning by Sebastian Raschka. There is no added value here: I took the code, played around with it, and enriched it with text and explanations from the book so that code and explanation sit in one notebook. Sometimes I add code when I play around with new features, purely for my own practice.

- Getting a conceptual understanding of multi-layer neural networks and how they are trained
- Implementing the powerful backpropagation algorithm
- Debugging NNs

In [1]:
from IPython.display import Image
%matplotlib inline

**Artificial neurons** represent the building blocks of the multi-layer artificial neural network (NN). The backpropagation algorithm for training NNs rekindled interest in them.
- Recap of Adaline (**ADAptive LInear NEuron**), a single-layer NN that uses a **gradient descent** optimization algorithm to learn the weight coefficients of the model. In every **epoch** (pass over the training set), we updated the weight vector (W): we computed the gradient based on the whole training set and took a step in the opposite direction of the gradient. The optimal weights are found by minimizing an objective function that we defined as the sum of squared errors (SSE) cost function. Furthermore, we multiplied the gradient by a factor, the **learning rate**, which we chose to balance the speed of learning against the risk of overshooting the global minimum of the cost function.
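
The Adaline update recapped above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the function name `adaline_fit`, the toy data, and the values of `eta` and `n_epochs` are assumptions chosen for the example.

```python
import numpy as np

def adaline_fit(X, y, eta=0.001, n_epochs=50):
    """Batch gradient descent on the SSE cost J = 0.5 * sum((y - output)^2)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    costs = []
    for _ in range(n_epochs):
        output = X @ w + b              # linear activation on the whole training set
        errors = y - output
        costs.append(0.5 * (errors ** 2).sum())
        # The gradient of J w.r.t. w is -X.T @ errors, so stepping in the
        # opposite direction means adding eta * X.T @ errors.
        w += eta * (X.T @ errors)
        b += eta * errors.sum()
    return w, b, costs

# Toy, linearly separable data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

w, b, costs = adaline_fit(X, y)
print('first cost: %.2f, last cost: %.2f' % (costs[0], costs[-1]))
```

With a small enough learning rate the cost shrinks from epoch to epoch; a learning rate that is too large makes the updates overshoot the minimum and the cost grows instead.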

### Multi-layer NN architecture

This section shows how to connect multiple single neurons into a multi-layer feedforward NN, also called a **multi-layer perceptron** (MLP), consisting of an input layer, a hidden layer, and an output layer. We can think of the number of layers and units in an NN as additional hyperparameters that can be tuned using cross-validation. However, the error gradients calculated via **backpropagation** become increasingly small as more layers are added to the network. This **vanishing gradient problem** makes learning harder, which is why special algorithms have been developed to pretrain such deep NN structures; this is called **deep learning**.
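
One common intuition for the vanishing gradient problem, assuming sigmoid activations, is that the derivative of the sigmoid never exceeds 0.25 and the chain rule multiplies one such factor per layer. The sketch below only tracks that activation factor and ignores the weight matrices that also enter the chain rule, so it is a rough illustration, not a simulation of real gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative is largest at z = 0, where it equals 0.25.
max_derivative = sigmoid_derivative(0.0)

# Upper bound on the product of activation derivatives across n layers.
bounds = {n_layers: max_derivative ** n_layers for n_layers in (1, 2, 5, 10)}
for n_layers, bound in bounds.items():
    print('%2d layers: factor <= %.2e' % (n_layers, bound))
```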

**Activating an NN via forward propagation**
- Starting at the input layer, we forward propagate the patterns of the training data through the network to generate an output.
- Based on the network's output, we calculate the error that we want to minimize using a cost function.
- We **backpropagate the error**, find its derivative with respect to each weight in the network, and update the model.

Finally, after repeating these steps for multiple epochs and learning the weights, we use forward propagation to calculate the network output and apply a **threshold function** to obtain the predicted class labels in the one-hot representation.
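
The forward pass and the final prediction step can be sketched as follows. Everything here is an assumption made for illustration: the layer sizes, the random weights, the helper names `one_hot` and `forward`, the SSE cost (kept for continuity with the Adaline recap), and the use of `argmax` as the thresholding step. The backpropagation update itself is left out.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(y, n_classes):
    """Encode integer class labels as rows of a 0/1 matrix."""
    encoded = np.zeros((y.shape[0], n_classes))
    encoded[np.arange(y.shape[0]), y] = 1.0
    return encoded

def forward(X, W_h, b_h, W_out, b_out):
    """Propagate inputs through one hidden layer to the output layer."""
    a_h = sigmoid(X @ W_h + b_h)            # hidden-layer activations
    a_out = sigmoid(a_h @ W_out + b_out)    # output-layer activations
    return a_h, a_out

rng = np.random.default_rng(1)
n_features, n_hidden, n_classes = 784, 50, 10

# Five random "images" and labels stand in for real training data.
X = rng.random((5, n_features))
y = np.array([3, 0, 9, 1, 4])

W_h = rng.normal(scale=0.1, size=(n_features, n_hidden))
b_h = np.zeros(n_hidden)
W_out = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b_out = np.zeros(n_classes)

a_h, a_out = forward(X, W_h, b_h, W_out, b_out)

# Error between the one-hot targets and the network output.
cost = 0.5 * np.sum((one_hot(y, n_classes) - a_out) ** 2)

# Predicted label: index of the output unit with the largest activation.
y_pred = np.argmax(a_out, axis=1)
print('cost: %.3f, predictions: %s' % (cost, y_pred))
```

With untrained random weights the predictions are meaningless; the point is only the shape of the computation that training would repeat every epoch.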

To solve complex problems such as image classification, we need nonlinear activation functions in our MLP model, for example the **sigmoid (logistic) activation function**. With it, the artificial neurons in the MLP are sigmoid units (logistic regression units) that return values in the continuous range between 0 and 1.

For readability, it makes sense to write the activation in a more compact form, which also allows for vectorization in NumPy.
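
As a small sketch of what the compact form buys us, the block below computes the hidden-layer activations twice: once unit by unit with explicit loops, and once as a single matrix product followed by the sigmoid. The array names `A_in` and `W_h`, the tiny layer sizes, and the omission of the bias unit are assumptions made to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden = 4, 3, 2
A_in = rng.random((n_samples, n_features))      # input activations, one row per sample
W_h = rng.normal(size=(n_features, n_hidden))   # weights from input to hidden layer

# Element-wise form: net input and activation of each hidden unit for each sample.
A_loop = np.zeros((n_samples, n_hidden))
for i in range(n_samples):
    for j in range(n_hidden):
        z = sum(A_in[i, k] * W_h[k, j] for k in range(n_features))
        A_loop[i, j] = sigmoid(z)

# Compact, vectorized form: one matrix product, then the sigmoid applied element-wise.
A_vec = sigmoid(A_in @ W_h)

print(np.allclose(A_loop, A_vec))
```

Both forms give the same activations; the vectorized one hands the loops to NumPy's compiled routines, which matters once the input has 784 features and tens of thousands of samples.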

**Obtaining the MNIST dataset**

The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist/ and consists of the following four parts:
- Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, 60,000 samples)
- Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, 60,000 labels)
- Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, 10,000 samples)
- Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, 10,000 labels)
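
The loading code in this notebook reads the unzipped files, so the four `.gz` archives need to be decompressed first. A minimal sketch using only the standard library is shown below; the helper name `gunzip_all` is hypothetical, and it assumes the archives were downloaded into a single directory.

```python
import gzip
import os
import shutil

def gunzip_all(directory):
    """Decompress every *.gz file in `directory`, keeping the originals."""
    extracted = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.gz'):
            continue
        src = os.path.join(directory, name)
        dst = os.path.join(directory, name[:-3])    # strip the '.gz' suffix
        with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
        extracted.append(dst)
    return extracted
```

Calling `gunzip_all` on the download directory should leave plain files such as `train-images-idx3-ubyte` next to the archives.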

In [3]:
import os
os.chdir('C:\\Users\\Schiphol\\Documents\\data\\MINST')

In [4]:
os.getcwd()

'C:\\Users\\Schiphol\\Documents\\data\\MINST'

In [18]:
path = 'C:\\Users\\Schiphol\\Documents\\data\\MINST\\mnist'

In [19]:
import struct
import os
import numpy as np

In [20]:
kind = 'train'
os.path.join(path, '%s-labels-idx1-ubyte' % kind)
os.path.join(path, '%s-images-idx3-ubyte' % kind)

'C:\\Users\\Schiphol\\Documents\\data\\MINST\\mnist\\train-images-idx3-ubyte'

In [21]:
import os
import struct
import numpy as np
import matplotlib.pyplot as plt

def load_mnist(path, kind='train'):
    """ Load MNIST data from `path` (`train` is default, otherwise `t10k`) """

    labels_path = os.path.join(path, '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte' % kind)

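    # The label file starts with an 8-byte big-endian header: magic number and item count.
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)

    # The image file starts with a 16-byte big-endian header:
    # magic number, image count, rows, and columns.
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        # Flatten each rows x cols image (28 x 28 for MNIST) into one row vector.
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), rows * cols)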

    return images, labels

def plot_image(*imgs):
    if len(imgs) == 1:
        image = plt.imshow(imgs[0].reshape(28,28), cmap='Greys', interpolation='nearest')
        plt.tight_layout()
        plt.show()
    else:
        fig, ax = plt.subplots(nrows=1, ncols=len(imgs), sharex=True, sharey=True)
        ax = ax.flatten()
        for i in range(len(imgs)):
            ax[i].imshow(imgs[i].reshape(28,28), cmap='Greys', interpolation='nearest')
        ax[0].set_xticks([])
        ax[0].set_yticks([])
        plt.tight_layout()
        plt.show()

In [23]:
X_train, y_train = load_mnist(path, kind='train')
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Schiphol\\Documents\\data\\MINST\\mnist\\train-labels-idx1-ubyte'
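
A `PermissionError` from `open` does not always mean that access rights are missing. On Windows, one possible cause (an assumption here, since the error message alone does not confirm it) is that the name refers to a directory, for instance when an archive tool unpacks `train-labels-idx1-ubyte.gz` into a folder of the same name. The hypothetical helper `describe_path` below is a small sketch for checking what a path actually points to.

```python
import os

def describe_path(p):
    """Report whether `p` is missing, a directory, or a readable file."""
    if not os.path.exists(p):
        return 'missing'
    if os.path.isdir(p):
        return 'directory'
    if not os.access(p, os.R_OK):
        return 'file, not readable'
    return 'readable file'
```

Running `describe_path(os.path.join(path, 'train-labels-idx1-ubyte'))` before calling `load_mnist` would show which of these cases applies.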

In [15]:
import os, struct
from array import array as pyarray
import numpy as np  # needed for the np.arange default argument below
from numpy import append, array, int8, uint8, zeros

def load_mnist(dataset="training", digits=np.arange(10), path="."):
    """
    Loads MNIST files into 3D numpy arrays

    Adapted from: http://abel.ee.ucla.edu/cvxopt/_downloads/mnist.py
    """

    if dataset == "training":
        fname_img = os.path.join(path, 'train-images-idx3-ubyte')
        fname_lbl = os.path.join(path, 'train-labels-idx1-ubyte')
    elif dataset == "testing":
        fname_img = os.path.join(path, 't10k-images-idx3-ubyte')
        fname_lbl = os.path.join(path, 't10k-labels-idx1-ubyte')
    else:
        raise ValueError("dataset must be 'testing' or 'training'")

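    # Read the label file: 8-byte big-endian header, then one byte per label.
    with open(fname_lbl, 'rb') as flbl:
        magic_nr, size = struct.unpack(">II", flbl.read(8))
        lbl = pyarray("b", flbl.read())

    # Read the image file: 16-byte big-endian header, then one byte per pixel.
    with open(fname_img, 'rb') as fimg:
        magic_nr, size, rows, cols = struct.unpack(">IIII", fimg.read(16))
        img = pyarray("B", fimg.read())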

    ind = [ k for k in range(size) if lbl[k] in digits ]
    N = len(ind)

    images = zeros((N, rows, cols), dtype=uint8)
    labels = zeros((N, 1), dtype=int8)
    for i in range(len(ind)):
        images[i] = array(img[ ind[i]*rows*cols : (ind[i]+1)*rows*cols ]).reshape((rows, cols))
        labels[i] = lbl[ind[i]]

    return images, labels

In [12]:
import matplotlib.pyplot as plt

# Load only the digit 2 and show the pixel-wise mean image.
images, labels = load_mnist('training', digits=[2])
plt.imshow(images.mean(axis=0), cmap='gray')
plt.show()

PermissionError: [Errno 13] Permission denied: '.\\train-labels-idx1-ubyte'