# Deep Learning

## Convolutional Neural Networks

They rely on three basic ideas: local receptive fields, shared weights, and pooling.

### Local receptive fields
The pixels are thought of as a grid, and sub-matrices of the input that are smaller than the image are connected to each hidden neuron (for example, for MNIST, with 28x28 images, 5x5 regions could be used).
The receptive field is slid across the image, and each position is connected to one neuron of the first hidden layer. The field can move one pixel at a time or with a different "stride".
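
As a quick check of the sizes involved, the number of receptive-field positions along one axis can be computed directly. This is a minimal sketch assuming no padding; the helper name `feature_map_size` is made up for illustration and is not part of `network3`.

```python
def feature_map_size(input_size, field_size, stride=1):
    """Number of receptive-field positions along one axis (no padding assumed)."""
    if field_size > input_size:
        raise ValueError("receptive field is larger than the input")
    return (input_size - field_size) // stride + 1

print(feature_map_size(28, 5))            # 24
print(feature_map_size(28, 5, stride=2))  # 12
```

With 28x28 images, 5x5 fields, and a stride of 1, this gives the 24x24 hidden layer used in the rest of these notes.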

### Shared weights and bias
The same weights and bias are used for all neurons in the layer (24x24 in this case). The output of the neuron at position $(j,k)$ of the first hidden layer is then:
$$\sigma \left( b + \sum_{l=0}^4 \sum_{m=0}^4 w_{l,m}a_{j+l,k+m} \right)$$
The $w_{l,m}$ vary within a receptive field but are the same for every neuron (which is why $l$ and $m$ run from 0 to 4; a 5x5 receptive field is assumed).
The shared weights and shared bias are sometimes called a "kernel" or "filter".
A group of 24x24 hidden neurons detects one "feature", each neuron in a different region of the image. Several such groups are then added to detect different features.
The operation is sometimes written as:
$$a^1 = \sigma(b + w * a^0)$$
(where $*$ denotes convolution)
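
The formula above can be written out in NumPy to make the weight sharing explicit. This is an illustrative sketch, not the `network3` implementation: it assumes a square input and a square kernel, uses random values in place of trained weights, and follows the double sum literally (the kernel is not flipped, as a strict mathematical convolution would require).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(a0, w, b):
    """sigma(b + sum_l sum_m w[l, m] * a0[j + l, k + m]) for every valid (j, k)."""
    n = w.shape[0]                 # side of the receptive field (5 in the text)
    size = a0.shape[0] - n + 1     # 28 - 5 + 1 = 24
    out = np.empty((size, size))
    for j in range(size):
        for k in range(size):
            # The same w and b are reused at every position (shared weights).
            out[j, k] = sigmoid(b + np.sum(w * a0[j:j + n, k:k + n]))
    return out

rng = np.random.default_rng(0)
a0 = rng.random((28, 28))          # stand-in for one MNIST image
w = rng.standard_normal((5, 5))    # stand-in for one learned kernel
a1 = feature_map(a0, w, b=0.0)
print(a1.shape)                    # (24, 24)
```

Detecting several features amounts to repeating this with a different `w` and `b` per feature map.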

### Pooling layers
The information from a previous layer is "condensed" into a new "feature map". For example, the outputs of the previous layer can be taken in 2x2 groups, their maximum computed ("max-pooling"), and that value used as a new feature.
$L_2$ pooling (taking the 2-norm of each group instead of the maximum) and other variants are also sometimes used.
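
Both pooling variants can be sketched in a few lines of NumPy. This assumes non-overlapping square regions and a feature map whose sides are multiples of the pool size; the function name `pool` and its `mode` argument are illustrative only.

```python
import numpy as np

def pool(feature_map, size=2, mode="max"):
    """Condense a 2-D feature map over non-overlapping size x size regions."""
    h, w = feature_map.shape
    if h % size or w % size:
        raise ValueError("feature map sides must be multiples of the pool size")
    # blocks[i, a, j, b] == feature_map[size * i + a, size * j + b]
    blocks = feature_map.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "l2":
        return np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    raise ValueError("mode must be 'max' or 'l2'")

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool(fm, mode="max"))  # [[ 5.  7.] [13. 15.]]
print(pool(fm, mode="l2"))
```

Applied to a 24x24 feature map with 2x2 regions, either variant yields a 12x12 map.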


Finally, a fully connected output layer is attached.

In [1]:
import network3
from network3 import Network
from network3 import ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer

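# Load MNIST through the network3 helper
training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10
# Baseline: one fully connected hidden layer (100 neurons) plus a softmax output
net = Network([FullyConnectedLayer(n_in=784, n_out=100),
               SoftmaxLayer(n_in=100, n_out=10)],
              mini_batch_size)
# Arguments: training data, epochs, mini-batch size, learning rate, validation data, test data
net.SGD(training_data, 60, mini_batch_size, 0.1, validation_data, test_data)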

  "downsample module has been moved to the theano.tensor.signal.pool module.")


Running with a CPU.  If this is not desired, then the modify network3.py to set
the GPU flag to True.
Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 92.61%
This is the best validation accuracy to date.
The corresponding test accuracy is 92.09%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 94.70%
This is the best validation accuracy to date.
The corresponding test accuracy is 94.27%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 95.77%
This is the best validation accuracy to date.
The corresponding test accuracy is 95.25%
Training mini-batch number 15000
Training mi

Now I make the network progressively deeper.
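
The `n_in` values passed to the fully connected layers in the cells that follow (`20*12*12` and `40*4*4`) come from tracking the image size through each convolution and pooling step. A small sketch of that arithmetic, assuming stride-1 convolutions without padding and non-overlapping pooling; `conv_pool_output` is a hypothetical helper, not a `network3` function.

```python
def conv_pool_output(size, field=5, pool=2):
    """Side length after a stride-1, unpadded convolution followed by pooling."""
    return (size - field + 1) // pool

side1 = conv_pool_output(28)      # 28 -> 24 -> 12
side2 = conv_pool_output(side1)   # 12 -> 8 -> 4
print(20 * side1 * side1)         # 2880, i.e. 20*12*12
print(40 * side2 * side2)         # 640, i.e. 40*4*4
```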

In [2]:
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size,1,28,28),
                     filter_shape=(20,1,5,5),
                     poolsize=(2,2)),
        FullyConnectedLayer(n_in=20*12*12, n_out=100),
        SoftmaxLayer(n_in=100, n_out=10)],
             mini_batch_size)
net.SGD(training_data,60,mini_batch_size,0.1,validation_data,test_data)

Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 93.74%
This is the best validation accuracy to date.
The corresponding test accuracy is 93.16%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 95.57%
This is the best validation accuracy to date.
The corresponding test accuracy is 95.34%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 96.67%
This is the best validation accuracy to date.
The corresponding test accuracy is 96.33%
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-

In [3]:
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size,1,28,28),
                     filter_shape=(20,1,5,5),
                     poolsize=(2,2)),
        ConvPoolLayer(image_shape=(mini_batch_size,20,12,12),
                     filter_shape=(40,20,5,5),
                     poolsize=(2,2)),
        FullyConnectedLayer(n_in=40*4*4, n_out=100),
        SoftmaxLayer(n_in=100, n_out=10)],
             mini_batch_size)
net.SGD(training_data,60,mini_batch_size,0.1,validation_data,test_data)

Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 92.39%
This is the best validation accuracy to date.
The corresponding test accuracy is 92.33%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 96.63%
This is the best validation accuracy to date.
The corresponding test accuracy is 96.37%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 97.62%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.34%
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-

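Now I switch to "Rectified Linear Units" (ReLU), lowering the number of epochs to 30 so that training doesn't take as long. This cell also lowers the learning rate to 0.03 and adds L2 regularization with `lmbda=0.1`.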
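
For reference, a rectified linear unit computes $\max(0, z)$ instead of the sigmoid. The NumPy sketch below compares the two on a few inputs; it assumes that `network3.ReLU` computes the same function on Theano tensors, which is not verified here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu(z))     # [0. 0. 0. 1. 5.]
print(sigmoid(z))  # saturates towards 0 and 1 at the extremes
```

Unlike the sigmoid, the ReLU does not saturate for large positive inputs.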

In [4]:
from network3 import ReLU
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size,1,28,28),
                     filter_shape=(20,1,5,5),
                     poolsize=(2,2),
                     activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size,20,12,12),
                     filter_shape=(40,20,5,5),
                     poolsize=(2,2),
                     activation_fn=ReLU),
        FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=ReLU),
        SoftmaxLayer(n_in=100, n_out=10)],
             mini_batch_size)
net.SGD(training_data,30,mini_batch_size,0.03,validation_data,test_data,lmbda=0.1)

Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 97.30%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.03%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 97.76%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.73%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 98.07%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.05%
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-

In [5]:
%run expand_mnist.py

Expanding the MNIST training set
Expanding image number 1000
Expanding image number 2000
Expanding image number 3000
Expanding image number 4000
Expanding image number 5000
Expanding image number 6000
Expanding image number 7000
Expanding image number 8000
Expanding image number 9000
Expanding image number 10000
Expanding image number 11000
Expanding image number 12000
Expanding image number 13000
Expanding image number 14000
Expanding image number 15000
Expanding image number 16000
Expanding image number 17000
Expanding image number 18000
Expanding image number 19000
Expanding image number 20000
Expanding image number 21000
Expanding image number 22000
Expanding image number 23000
Expanding image number 24000
Expanding image number 25000
Expanding image number 26000
Expanding image number 27000
Expanding image number 28000
Expanding image number 29000
Expanding image number 30000
Expanding image number 31000
Expanding image number 32000
Expanding image number 33000
Expanding image num

In this part I do the same as before, but with an "expanded" training set (built from translations of the original images). The network also gains a second fully connected layer of 100 neurons.
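
The idea behind the expansion can be sketched as follows: each image is kept and also shifted by one pixel up, down, left, and right, so the set grows fivefold. This is an assumption-laden illustration, not the code of `expand_mnist.py`; `shift_image` and `expand` are hypothetical names, and the choice of one-pixel shifts is assumed.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def expand(images):
    """Return each image followed by its four one-pixel translations."""
    shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return np.stack([shift_image(img, dy, dx) for img in images for dy, dx in shifts])

images = np.zeros((2, 28, 28))
images[:, 14, 14] = 1.0            # a single bright pixel per image
expanded = expand(images)
print(expanded.shape)              # (10, 28, 28)
```

The labels would be repeated alongside the images, since a small translation does not change the digit.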

In [8]:
expanded_training_data, _, _ = network3.load_data_shared("../data/mnist_expanded.pkl.gz")
training_data, validation_data, test_data = network3.load_data_shared()

net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size,1,28,28),
                     filter_shape=(20,1,5,5),
                     poolsize=(2,2),
                     activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size,20,12,12),
                     filter_shape=(40,20,5,5),
                     poolsize=(2,2),
                     activation_fn=ReLU),
        FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=ReLU),
        FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU),
        SoftmaxLayer(n_in=100, n_out=10)],
             mini_batch_size)
net.SGD(expanded_training_data,60,mini_batch_size,0.03,validation_data,test_data,lmbda=0.1)

Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-batch number 19000
Training mini-batch number 20000
Training mini-batch number 21000
Training mini-batch number 22000
Training mini-batch number 23000
Training mini-batch number 24000
Epoch 0: validation accuracy 98.81%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.93%
Training mini-batch number 25000
Training mini-batch number 2600

KeyboardInterrupt: 

Finally, I try the "dropout" technique:
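
As a reminder of what dropout does: during training, each activation is zeroed at random with probability `p_dropout`, and at evaluation time nothing is dropped. The sketch below uses the "inverted dropout" variant, which rescales the surviving activations during training; this is an assumption chosen for illustration, and `network3` may handle the scaling differently.

```python
import numpy as np

def dropout(activations, p_dropout, rng, training=True):
    """Inverted dropout: zero units with probability p_dropout, rescale the rest."""
    if not training or p_dropout == 0.0:
        return activations
    keep = rng.random(activations.shape) >= p_dropout
    # Dividing by (1 - p_dropout) keeps the expected activation unchanged.
    return activations * keep / (1.0 - p_dropout)

rng = np.random.default_rng(0)
a = np.ones((4, 100))
dropped = dropout(a, 0.5, rng)
print(np.unique(dropped))                       # only 0.0 and 2.0 can appear
print(dropout(a, 0.5, rng, training=False).mean())
```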

In [None]:
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size,1,28,28),
                     filter_shape=(20,1,5,5),
                     poolsize=(2,2),
                     activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size,20,12,12),
                     filter_shape=(40,20,5,5),
                     poolsize=(2,2),
                     activation_fn=ReLU),
        FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=ReLU, p_dropout=0.5),
        FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU, p_dropout=0.5),
        SoftmaxLayer(n_in=100, n_out=10,p_dropout=0.5)],
             mini_batch_size)
net.SGD(expanded_training_data,60,mini_batch_size,0.03,validation_data,test_data,lmbda=0.1)

### Other ideas:

Recurrent neural networks, Boltzmann machines, generative models, transfer learning, reinforcement learning, etc...
- Long short-term memory units
- Deep belief nets, generative models, and Boltzmann machines
www.scholarpedia.org/article/Deep_belief_networks
www.cs.toronto.edu/~hinton/absps/guideTR.pdf

Neural networks, learning to play videogames: www.cs.toronto.edu/~vminh/docs/dqn.pdf
www.nature.com/nature/journal/v518/n7540/abs/nature14236.html


### The future of neural networks

- Intention driven interfaces
- Machine Learning, data science, and the virtuous circle of innovation