Tutorial 1: Classifying tiny images with a Convolutional Neural Network
======================================

Outline
------------------------
This interactive notebook shows how to do image classification with a convolutional neural network (convnet). You can edit the code in the code cells and run it with `Shift+Return`. The notebook is read-only, so feel free to hack the code, and reload the page if something breaks. The tutorial covers how to:
* Build a small convnet in neon.
* Train it on the [CIFAR-10](https://www.kaggle.com/c/cifar-10) dataset.
* Upload a new image, and classify it into one of the 10 categories.


<img src="https://kaggle2.blob.core.windows.net/competitions/kaggle/3649/media/cifar-10.png">


Setting up a model
==================
The pieces we need to set up a model are described in the [neon user guide](http://neon.nervanasys.com/docs/latest/index.html):
* The CIFAR-10 dataset.
* A layer configuration and a [model](http://neon.nervanasys.com/docs/latest/models.html).
* A compute [backend](http://neon.nervanasys.com/docs/latest/backends.html).
* An [optimizer](http://neon.nervanasys.com/docs/latest/optimizers.html) to train the model.
* [Callbacks](http://neon.nervanasys.com/docs/latest/callbacks.html) to keep us updated about the progress of training.

In [None]:
# We start by generating the backend:
from neon.backends import gen_backend
be = gen_backend(backend='cpu',             
                 batch_size=128)

# there is not much we can do with the backend right now, but if we
# print it, it should tell us that we have a CPU backend object
print(be)

Loading a dataset
-----------------
More details about loading and generating datasets can be found in our [documentation](http://neon.nervanasys.com/docs/latest/datasets.html).

In [None]:
# The dataset is supplied in canned form, and will be downloaded 
# from the web the first time you run this. It just returns numpy
# arrays with the pixel values, and class labels. 
from neon.data import CIFAR10
cifar10 = CIFAR10()

# To put the dataset into a format neon can understand, we use the
# data iterators that the dataset object provides. An iterator moves
# the data onto the compute device (e.g. GPU) and returns training
# batches.
train_set = cifar10.train_iter
test_set = cifar10.valid_iter

Network Layers
--------------
Layer types are [documented here](http://neon.nervanasys.com/docs/latest/layers.html).
It helps to make use of IPython tab completion to see the available layers (e.g. `from neon.layers import TAB`) and to read the docstrings (e.g. by typing `Conv?` and pressing `Shift+Return`).

Layer types included in neon:
* Convolution
* Bias
* Activation
* Pooling
* Batch Normalization

And for commonly used combinations neon provides shortcuts:
* Conv = Convolution + Bias + Activation
* Affine = Linear + Bias + Activation

For this network, we are going to use two **Conv**, two **Pooling** and two **Affine** layers.
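
The spatial sizes that flow through the Conv and Pooling layers defined in the next cell can be worked out by hand. The sketch below does that arithmetic with a hypothetical helper, `conv_out`, which is not part of neon. It assumes unpadded convolutions with stride 1; we believe these are the defaults for `Conv`, but treat that as an assumption and check the layer docstrings.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window (hypothetical helper)."""
    return (size + 2 * pad - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)             # Conv 5x5, 16 feature maps
size = conv_out(size, kernel=2, stride=2)   # Pooling 2x2, stride 2
size = conv_out(size, kernel=5)             # Conv 5x5, 32 feature maps
size = conv_out(size, kernel=2, stride=2)   # Pooling 2x2, stride 2

# number of values handed to the first Affine layer (32 maps in the last Conv)
flat_features = size * size * 32
print("final map: %dx%d, flattened features: %d" % (size, size, flat_features))
```

Under these assumptions the feature maps shrink from 32 to 28, 14, 10 and finally 5 pixels per side, so the first Affine layer would see 5 x 5 x 32 = 800 inputs.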

In [None]:
# Now we create a model by assembling some layers
from neon.layers import Conv, Affine, Pooling
from neon.initializers import Uniform
from neon.transforms.activation import Rectlin, Softmax
init_uni = Uniform(low=-0.1, high=0.1)
layers = [Conv(fshape=(5,5,16), init=init_uni, activation=Rectlin()),
          Pooling(fshape=2, strides=2),
          Conv(fshape=(5,5,32), init=init_uni, activation=Rectlin()),
          Pooling(fshape=2, strides=2),
          Affine(nout=500, init=init_uni, activation=Rectlin()),
          Affine(nout=10, init=init_uni, activation=Softmax())]

# set up model
from neon.models import Model
model = Model(layers)



Cost function
--------------
Next we need a cost function to evaluate the output of the network. The cost function compares network outputs with ground truth labels, and produces an error that we can backpropagate through the layers of the network.

For our 10-class classification task, we use a multi-class cross entropy cost function.
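
As a rough illustration of what a cross entropy cost measures, here is a plain NumPy sketch of a softmax followed by the mean negative log-probability of the correct class. The function names `softmax` and `cross_entropy_multi` are hypothetical, and this is a generic formulation rather than neon's own implementation, whose scaling may differ.

```python
import numpy as np

def softmax(logits):
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_multi(probs, labels, eps=1e-12):
    # mean negative log-probability assigned to the correct class
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + eps))

# two toy examples with three classes each
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss = cross_entropy_multi(probs, labels)
print("loss = %.4f" % loss)
```

A useful sanity check: a network that spreads its probability evenly over 10 classes has a cost of ln(10), roughly 2.30, so a training cost well below that value means the model has started to learn.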

In [None]:
# setting up the cost function
from neon.layers import GeneralizedCost
from neon.transforms import CrossEntropyMulti
cost = GeneralizedCost(costfunc=CrossEntropyMulti())

Optimizer
---------
We now have a cost function that we want to minimize, typically by following 
the negative gradient of the cost. This is called gradient descent. We do this
iteratively over small batches of the data set, making it stochastic gradient 
descent (SGD). There are other [optimizers](http://neon.nervanasys.com/docs/latest/optimizers.html) such as
* RMSProp
* AdaDelta

that are supported in neon, but often simple gradient descent works well.
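
To make the update rule concrete, here is a minimal NumPy sketch of gradient descent with momentum on a toy quadratic. It uses one common formulation of the momentum update; the helper `momentum_step` is hypothetical, and neon's optimizer may differ in details such as weight decay or learning rate schedules.

```python
import numpy as np

def momentum_step(w, velocity, grad, learning_rate=0.005, momentum_coef=0.9):
    # keep a running velocity, then move the weights along it
    velocity = momentum_coef * velocity - learning_rate * grad
    return w + velocity, velocity

# toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
start_norm = np.linalg.norm(w)

for step in range(500):
    w, v = momentum_step(w, v, grad=w)

end_norm = np.linalg.norm(w)
print("norm before: %.3f, after: %.6f" % (start_norm, end_norm))
```

In real training the gradient comes from backpropagating the cost on one minibatch rather than from a closed-form expression, which is what makes the procedure stochastic.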

In [None]:
# set up optimizer
from neon.optimizers import GradientDescentMomentum, RMSProp
optimizer = GradientDescentMomentum(learning_rate=0.005, 
                                    momentum_coef=0.9)
#optimizer = RMSProp()

Callbacks
---------
To provide feedback while the model is training, neon lets the user specify a set of callbacks that get evaluated at the end of every iteration (minibatch) or pass through the dataset (epoch). Callbacks include evaluating the model on a validation set or computing the misclassification percentage. There are also callbacks for saving to disk and for generating visualizations. Here we will set up a progress bar to monitor training.

In [None]:
# set up callbacks. By default sets up a progress bar
from neon.callbacks.callbacks import Callbacks
callbacks = Callbacks(model, train_set)

Training the model
------------------
Now all the pieces are in place to run the network. We use the fit function and pass it a dataset, cost, optimizer, and the callbacks we set up.

In [None]:
# And run the model
model.fit(dataset=train_set,
          cost=cost,
          optimizer=optimizer,
          num_epochs=5,
          callbacks=callbacks)

Congrats! If you made it this far you have trained a convolutional network in neon.

Evaluating the model
--------------------
We can now compute the misclassification error on the test set to see how well we did.
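
For intuition, the misclassification error is the fraction of examples whose highest-scoring class differs from the true label. The NumPy sketch below computes it on made-up scores; `misclassification_rate` is a hypothetical helper, and the `(examples, classes)` array layout is an assumption for illustration, not a statement about neon's internals.

```python
import numpy as np

def misclassification_rate(outputs, labels):
    # outputs: (n_examples, n_classes) scores; labels: (n_examples,) class ids
    predictions = outputs.argmax(axis=1)
    return np.mean(predictions != labels)

# four made-up examples with three classes each
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.3, 0.4, 0.3],
                    [0.2, 0.2, 0.6]])
labels = np.array([0, 1, 2, 2])

error_pct = 100 * misclassification_rate(outputs, labels)
print("Misclassification error = %.1f%%" % error_pct)
```

Here one of the four predictions is wrong (the third example), so the sketch reports an error of 25.0%.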

In [None]:
# Check the performance on the supplied test set
from neon.transforms import Misclassification
error_pct = 100 * model.eval(test_set, metric=Misclassification())
print('Misclassification error = %.1f%%' % error_pct)

By tweaking some of the hyperparameters (number of layers, adding dropout...) we can improve the performance.

This was quite a lot of code! Generally, to set up a new model from scratch it is best to follow one of the examples from the neon/examples directory. It's easy to mix and match parts!

Inference
=========
Now we want to grab a new image from the internet and classify it through our network!

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# an image of a frog I found on Wikipedia
img_source = "https://upload.wikimedia.org/wikipedia/commons/thumb/5/55/Atelopus_zeteki1.jpg/440px-Atelopus_zeteki1.jpg"

# download the image
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2
urlretrieve(img_source, filename="image.jpg")

# crop and resize to 32x32
from PIL import Image
import numpy as np

img = Image.open('image.jpg')
crop = img.crop((0,0,min(img.size),min(img.size)))
crop.thumbnail((32, 32))
plt.imshow(crop, interpolation="nearest")
crop = np.asarray(crop, dtype=np.float32)
plt.axis('off')

Create a dataset with this image for inference
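
The iterator works on whole minibatches, so a single image has to be placed into an otherwise empty batch. The NumPy sketch below shows one way to do that with a hypothetical helper, `pack_image`. It assumes the model expects channel-major data, as the `lshape=(3, 32, 32)` argument in the next cell suggests, and pixel values scaled to [0, 1]; both are assumptions to verify against how the training data was prepared.

```python
import numpy as np

def pack_image(image_hwc, batch_size=128):
    # image_hwc: (32, 32, 3) array with pixel values in [0, 255]
    chw = image_hwc.transpose(2, 0, 1)        # reorder to (channel, height, width)
    batch = np.zeros((batch_size, chw.size), dtype=np.float32)
    batch[0] = chw.reshape(-1) / 255.0        # scale to [0, 1]; other rows stay zero
    return batch

# a random stand-in for the cropped 32x32 RGB image
fake_image = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float32)
x_batch = pack_image(fake_image)
print(x_batch.shape)
```

Only the first row of the batch carries real data, which is why only the first row of the model output is meaningful later on.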

In [None]:
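# create a minibatch with the new image
import numpy as np
from neon.data import ArrayIterator
x_new = np.zeros((128, 3072), dtype=np.float32)
# PIL returns (height, width, channel); lshape below expects (channel, height, width)
x_new[0] = crop.transpose(2, 0, 1).reshape(1, 3072) / 255

nclass = 10  # CIFAR-10 has 10 classes
inference_set = ArrayIterator(x_new, None, nclass=nclass,
                              lshape=(3, 32, 32))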

Get model outputs on the inference data

In [None]:
classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
out = model.get_outputs(inference_set)
classes[out[0].argmax()]
