<a href="https://colab.research.google.com/github/sebastianoscarlopez/learning-deep-learning/blob/master/digits_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Description

Digit recognition is often called the "Hello World" of neural networks.

This notebook is built with Python and NumPy and applied to the MNIST dataset.

The example is based on http://neuralnetworksanddeeplearning.com/chap1.html

# Preparation

## Base libraries

In [0]:
import random

import numpy as np
import matplotlib.pyplot as plt

## Load data

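**MNIST loader**

The book loads the MNIST image data through its own `mnist_loader` module. This notebook uses the Keras dataset loader instead.

Keras returns 60,000 training images and 10,000 test images. The book's loader yields training_data (50,000 images), validation_data (10,000) and test_data (10,000); no validation split is made here.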

In [28]:
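from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Reshape each image to a (784, 1) column scaled to [0, 1]; one-hot encode the training labels
training_data = [(x.reshape(784, 1) / 255.0, np.eye(10)[y].reshape(10, 1))
                 for x, y in zip(x_train, y_train)]
# Test labels stay plain integers because evaluate() compares them with argmax
test_data = [(x.reshape(784, 1) / 255.0, int(y)) for x, y in zip(x_test, y_test)]

Using TensorFlow backend.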
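The network code further down expects each input as a (784, 1) column vector and, for training, each label as a (10, 1) one-hot column. A minimal sketch of that conversion, run on random stand-in arrays rather than the real download; the helper names `to_column` and `one_hot` are made up for this illustration:

```python
import numpy as np

def to_column(image):
    """Flatten a 28x28 uint8 image into a (784, 1) float column in [0, 1]."""
    return image.reshape(784, 1).astype(np.float64) / 255.0

def one_hot(label, num_classes=10):
    """Return a (num_classes, 1) column with 1.0 at index `label`."""
    v = np.zeros((num_classes, 1))
    v[int(label)] = 1.0
    return v

# Stand-in data with the same shapes and dtypes as the MNIST arrays
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 1, 7], dtype=np.uint8)

pairs = [(to_column(x), one_hot(y)) for x, y in zip(fake_images, fake_labels)]
print(pairs[0][0].shape, pairs[0][1].ravel())
```

Scaling the pixels to [0, 1] is an assumption of this sketch; it keeps the sigmoid inputs in a range where the neurons are not saturated from the start.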

# Network design

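Class that builds the network. The **sizes** parameter of its constructor is a list with the number of neurons in each layer, for example `[784, 30, 10]`. Biases and weights are initialized randomly from a standard normal distribution; the input layer has no biases. The methods shown in the following cells belong to this class.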

In [0]:
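class Network(object):
  def __init__(self, sizes):
    self.num_layers = len(sizes)
    self.sizes = sizes
    # One (y, 1) bias column per layer after the input layer
    self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
    # One (y, x) weight matrix per pair of adjacent layers (x inputs, y neurons)
    self.weights = [np.random.randn(y, x)
                    for x, y in zip(sizes[:-1], sizes[1:])]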
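To make the constructor's bookkeeping concrete, here is a small sketch that repeats the two list comprehensions outside the class and prints the resulting shapes for `sizes = [784, 30, 10]`:

```python
import numpy as np

sizes = [784, 30, 10]
# One bias column per non-input layer; one weight matrix per pair of adjacent layers
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

bias_shapes = [b.shape for b in biases]
weight_shapes = [w.shape for w in weights]
print(bias_shapes)    # [(30, 1), (10, 1)]
print(weight_shapes)  # [(30, 784), (10, 30)]
```

Each weight matrix has shape (neurons in this layer, neurons in the previous layer), so `np.dot(w, a)` maps the previous layer's activations directly onto this layer.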

Training using mini-batch stochastic gradient descent.

`training_data` is a list of tuples `(x, y)` representing the training inputs and the desired outputs.

`epochs` is the number of full passes over the training data.

`mini_batch_size` is the number of training examples grouped into each gradient update.

`eta` is the learning rate.

If `test_data` is provided, the network is evaluated against the test data after each epoch and partial progress is printed. This is useful for tracking progress, but slows things down substantially.

In [0]:
  def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in range(epochs):
        # Reshuffle every epoch (standard-library random module) so the mini-batches differ
        random.shuffle(training_data)
        # Slice the shuffled list into consecutive mini-batches
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print("Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test))
        else:
            print("Epoch {0} complete".format(j))
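The shuffle-then-slice step can be tried in isolation. In this sketch a list of ten integers stands in for the list of `(x, y)` training pairs, and the seed is fixed only to make the run repeatable:

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable
data = list(range(10))  # stand-in for a list of (x, y) training pairs
mini_batch_size = 4

random.shuffle(data)
mini_batches = [data[k:k + mini_batch_size]
                for k in range(0, len(data), mini_batch_size)]
batch_lengths = [len(b) for b in mini_batches]
print(batch_lengths)  # [4, 4, 2]
```

Every example lands in exactly one mini-batch per epoch, and the last mini-batch is simply shorter when the data size is not a multiple of `mini_batch_size`.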

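Given the input $x$, return the network's output vector. Each layer applies the sigmoid to its weighted input $w \cdot x + b$, and the result becomes the input of the next layer.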

In [0]:
  def feedforward(self, x):
    for b, w in zip(self.biases, self.weights):
        x = sigmoid(np.dot(w, x)+b)
    return x
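A standalone sketch of the same loop on a hypothetical 2-3-1 network whose parameters are all zero, so every neuron outputs $\sigma(0) = 0.5$. It also shows why the input has to be a column vector: with a flat array, NumPy broadcasting silently produces the wrong shape.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, biases, weights):
    """Propagate column vector x through every layer in turn."""
    for b, w in zip(biases, weights):
        x = sigmoid(np.dot(w, x) + b)
    return x

# Hypothetical 2-3-1 network with all parameters set to zero
weights = [np.zeros((3, 2)), np.zeros((1, 3))]
biases = [np.zeros((3, 1)), np.zeros((1, 1))]

out = feedforward(np.array([[0.2], [0.7]]), biases, weights)
print(out.shape, out[0, 0])  # (1, 1) 0.5

# Pitfall: a flat input gives a (3,) product that broadcasts against the (3, 1) bias
bad = sigmoid(np.dot(weights[0], np.array([0.2, 0.7])) + biases[0])
print(bad.shape)  # (3, 3)
```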

Update the network's weights and biases by applying gradient descent using backpropagation to a single mini batch

In [0]:
  def update_mini_batch(self, mini_batch, eta):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw 
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb 
                    for b, nb in zip(self.biases, nabla_b)]
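Written out, the update performed by the code above appears to be the following, where $m$ is the mini-batch size (`len(mini_batch)`), $\eta$ is `eta`, and the sums run over the training examples $x$ in the mini-batch. The symbol $m$ is notation chosen for this note.

$$w \rightarrow w - \frac{\eta}{m} \sum_{x} \nabla_w C_x \qquad b \rightarrow b - \frac{\eta}{m} \sum_{x} \nabla_b C_x$$

The lists `nabla_w` and `nabla_b` accumulate the two sums, one `backprop` call per example, before the parameters are changed once per mini-batch.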

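Return a tuple $(\nabla b, \nabla w)$ representing the gradient of the cost function $C_x$. `nabla_b` and `nabla_w` are layer-by-layer lists of NumPy arrays with the same shapes as `self.biases` and `self.weights`.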

In [0]:
  def backprop(self, x, y):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # feedforward
    activation = x
    activations = [x] # list to store all the activations, layer by layer
    zs = [] # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    delta = self.cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Note that the variable l in the loop below is used a little
    # differently to the notation in Chapter 2 of the book.  Here,
    # l = 1 means the last layer of neurons, l = 2 is the
    # second-last layer, and so on.  It's a renumbering of the
    # scheme in the book, used here to take advantage of the fact
    # that Python can use negative indices in lists.
    for l in range(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)
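One common way to gain confidence in a backpropagation routine is a finite-difference gradient check. The sketch below is not part of the original notebook: it restates the same steps as free functions on a small hypothetical 4-3-2 network, assumes the quadratic cost $C_x = \frac{1}{2}\|a - y\|^2$ (whose derivative is $a - y$), and compares the backprop gradient of the first weight matrix with a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def forward(x, biases, weights):
    for b, w in zip(biases, weights):
        x = sigmoid(np.dot(w, x) + b)
    return x

def cost(x, y, biases, weights):
    """Quadratic cost C_x = 0.5 * ||a - y||^2 (assumed for this check)."""
    a = forward(x, biases, weights)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(x, y, biases, weights):
    """Same steps as the method above, written as a free function."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    activation, activations, zs = x, [x], []
    for b, w in zip(biases, weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w

rng = np.random.default_rng(0)
sizes = [4, 3, 2]  # hypothetical tiny network
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((4, 1))
y = np.array([[1.0], [0.0]])

nabla_b, nabla_w = backprop(x, y, biases, weights)

# Central finite difference for every entry of the first weight matrix
eps = 1e-6
numeric = np.zeros_like(weights[0])
for i in range(weights[0].shape[0]):
    for j in range(weights[0].shape[1]):
        original = weights[0][i, j]
        weights[0][i, j] = original + eps
        c_plus = cost(x, y, biases, weights)
        weights[0][i, j] = original - eps
        c_minus = cost(x, y, biases, weights)
        weights[0][i, j] = original  # restore the weight
        numeric[i, j] = (c_plus - c_minus) / (2 * eps)

max_error = np.max(np.abs(numeric - nabla_w[0]))
print(max_error)  # should be very close to zero
```

If the two gradients disagree by more than rounding noise, the usual suspects are an off-by-one in the negative layer indices or a missing transpose.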

Return the number of test inputs for which the neural network outputs the correct result.

The network's output is taken to be the index of the neuron in the final layer with the highest activation.

In [0]:
  def evaluate(self, test_data):
    test_results = [(np.argmax(self.feedforward(x)), y)
                    for (x, y) in test_data]
    return sum(int(x == y) for (x, y) in test_results)
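A small sketch of the two steps in `evaluate`, using made-up output activations for one image and a made-up list of (prediction, label) pairs:

```python
import numpy as np

# Hypothetical output activations of the final layer for one image
output = np.array([[0.02], [0.01], [0.10], [0.91], [0.03],
                   [0.05], [0.00], [0.12], [0.04], [0.01]])
predicted = int(np.argmax(output))
print(predicted)  # 3

# Counting correct predictions over (prediction, label) pairs
results = [(3, 3), (7, 1), (0, 0)]
correct = sum(int(p == y) for p, y in results)
print(correct)  # 2
```

Because the prediction is compared directly with `y`, the labels in `test_data` need to be plain digits rather than one-hot vectors.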

Return the vector of partial derivatives $\partial C_x / \partial a$ for the output activations. For the quadratic cost $C_x = \frac{1}{2} \|a - y\|^2$ this is simply $a - y$.

In [0]:
  def cost_derivative(self, output_activations, y):
    return (output_activations-y)

Sigmoid neuron: $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w \cdot x + b$ is the neuron's weighted input.

In [0]:
def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

Derivative of the sigmoid function: $\sigma'(z) = \sigma(z)(1 - \sigma(z))$

In [0]:
def sigmoid_prime(z):
    return sigmoid(z)*(1-sigmoid(z))
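As a quick sanity check, the closed-form derivative can be compared with a central finite difference. This sketch redefines both functions so it runs on its own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
# Central difference approximation of the derivative at each point
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_gap = np.max(np.abs(numeric - sigmoid_prime(z)))

print(sigmoid(0.0), sigmoid_prime(0.0))  # 0.5 0.25
print(max_gap)  # should be very close to zero
```

The derivative peaks at 0.25 for $z = 0$ and shrinks toward zero for large $|z|$, which is why saturated neurons learn slowly.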

# Network creation

Network with 3 layers:

Input layer: 784 neurons, one per pixel ($784 = 28 \times 28$)

Hidden layer: a single layer with 30 neurons

Output layer: 10 neurons representing the digits 0-9

In [0]:
net = Network([784, 30, 10])
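For a sense of scale, the number of trainable parameters in this architecture can be worked out from the layer sizes alone:

```python
sizes = [784, 30, 10]
# Each non-input layer has (inputs x neurons) weights plus one bias per neuron
per_layer = [m * n + n for m, n in zip(sizes[:-1], sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [23550, 310] 23860
```

Almost all of the parameters sit between the input and the hidden layer, so changing the hidden layer size is what mostly drives the model's size.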

# Training

In [27]:
# 30 epochs, mini-batches of 10 examples, learning rate eta = 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)