# Single Neuron

In this tutorial we will implement a single neuron in [Keras](https://keras.io/) and use it to perform two-class classification on the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.

## From theory...

The output of a *single neuron*, given an input $x$, is the following: $$y = \sigma(W^Tx + b)$$
where $W$ and $b$ are the learned weight vector and the learned bias, respectively. The Greek letter $\sigma$ stands for the activation function: in this case we'll use the sigmoid function, that is $\sigma(z) = 1 / (1 + e^{-z})$.
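
To make the formula concrete, here is a minimal NumPy sketch of the same computation, independent of Keras. The names `sigmoid`, `neuron_output` and the toy values are illustrative assumptions, not part of the tutorial code.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_output(x, w, b):
    """Neuron output for a batch x of shape (n_samples, n_features)."""
    return sigmoid(np.dot(x, w) + b)


# Toy values (hypothetical): 2 samples with 3 features each
x_toy = np.array([[1.0, 2.0, 3.0],
                  [2.0, 0.0, 0.0]])
w_toy = np.array([0.5, -0.25, 0.0])
b_toy = 0.0

# One value in (0, 1) per sample
y_toy = neuron_output(x_toy, w_toy, b_toy)
```

The first sample has $W^Tx + b = 0$, so its output is exactly 0.5; the second has $W^Tx + b = 1$, so its output is $\sigma(1)$.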

Since the task is binary classification, we'll use *binary cross-entropy* as the loss, defined as:
$$L(t, p) = -t\log(p) - (1-t)\log(1-p)$$
where $t \in \left\{0, 1\right\}$ is the target and $p \in (0, 1)$ is our prediction.
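
As a rough illustration of how this loss behaves, the sketch below evaluates it in NumPy for a correct and a wrong confident prediction. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added here for numerical safety; they are not taken from the tutorial.

```python
import numpy as np


def binary_cross_entropy(t, p, eps=1e-7):
    """L(t, p) = -t*log(p) - (1-t)*log(1-p), element-wise.

    p is clipped to [eps, 1 - eps] (an assumption of this sketch)
    so that log(0) never occurs.
    """
    p = np.clip(p, eps, 1.0 - eps)
    return -t * np.log(p) - (1.0 - t) * np.log(1.0 - p)


# Target is 1 in both cases; only the prediction changes
loss_confident_right = binary_cross_entropy(1.0, 0.9)  # equals -log(0.9)
loss_confident_wrong = binary_cross_entropy(1.0, 0.1)  # equals -log(0.1)
```

The wrong confident prediction is penalized far more heavily than the correct one, which is the property that drives learning.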

## ...to practice

For this task we'll use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, which consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. The whole dataset is quite small (~160MB) and the [Keras](https://keras.io/) library provides a facility for downloading and unpacking the dataset (if not already in cache), as well as for splitting training and testing examples. This is as easy as:

In [1]:
from keras.datasets import cifar10

(X_train, Y_train), (X_val, Y_val) = cifar10.load_data()

print('X_train shape:', X_train.shape)
print('Train samples: {}\nTest samples: {}.'.format(X_train.shape[0], X_val.shape[0]))

Using Theano backend.
Using gpu device 0: Quadro K2200 (CNMeM is disabled, cuDNN 5005)


('X_train shape:', (50000L, 3L, 32L, 32L))
Train samples: 50000
Test samples: 10000.


### Loading the data

A single neuron can distinguish between only two classes, so we provide code that filters the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset down to a pair of classes. Here is the list of all ten classes:
> `airplane`, `automobile`, `bird`, `cat`, `deer`, `dog`, `frog`, `horse`, `ship`, `truck`

From these, we can select a pair of classes (say  *horse* and *truck*) as follows:

In [2]:
import keras.backend as K
import numpy as np
import matplotlib.pyplot as plt
from keras.optimizers import Adam
from keras.datasets import cifar10


# size of cifar10 images
height, width, channels = 32, 32, 3  


def select_classes_from_cifar10(dataset, classes):

    allowed = 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'
    if any(cl not in allowed for cl in classes) or len(classes) < 2:
        raise ValueError('Please check your class arguments')

    (X_train, Y_train), (X_val, Y_val) = dataset

    c_idx = [allowed.index(c) for c in classes]  # class indexes

    # find the ids of the right examples
    idx_train = np.array([True if y[0] in c_idx else False for y in Y_train])
    idx_val = np.array([True if y[0] in c_idx else False for y in Y_val])

    # filter and reshape X
    X_train = np.float32(X_train[idx_train])
    X_train = np.reshape(X_train, (X_train.shape[0], height * width * channels))
    X_val = np.float32(X_val[idx_val])
    X_val = np.reshape(X_val, (X_val.shape[0], height * width * channels))

    # adjust Y
    s = sorted(c_idx)
    Y_train = np.float32([s.index(y[0]) for y in Y_train[idx_train]])
    Y_val = np.float32([s.index(y[0]) for y in Y_val[idx_val]])

    return (X_train/X_train.max(), Y_train), (X_val/X_train.max(), Y_val)


# load data from cifar
cifar10_dataset = cifar10.load_data()
(X_train, Y_train), (X_val, Y_val) = select_classes_from_cifar10(cifar10_dataset, classes=['horse', 'truck'])
print('X_train shape:', X_train.shape)
print('Train samples: {}\nTest samples: {}.'.format(X_train.shape[0], X_val.shape[0]))

('X_train shape:', (10000L, 3072L))
Train samples: 10000
Test samples: 2000.
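
The filtering and label remapping done inside `select_classes_from_cifar10` can be sketched on a tiny label array. The array `Y_toy` below is hypothetical; it only mimics the `(n, 1)` layout of the CIFAR-10 labels.

```python
import numpy as np

allowed = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
classes = ['horse', 'truck']

# Sorted CIFAR-10 indexes of the chosen classes
c_idx = sorted(allowed.index(c) for c in classes)

# Hypothetical labels in the same (n, 1) layout as CIFAR-10
Y_toy = np.array([[7], [3], [9], [0], [7]])

# Boolean mask selecting only the examples of the chosen classes
mask = np.isin(Y_toy[:, 0], c_idx)

# Remap the surviving labels to 0 (horse) and 1 (truck)
Y_binary = np.float32([c_idx.index(y) for y in Y_toy[mask, 0]])
```

After remapping, the lower class index becomes label 0 and the higher one becomes label 1, regardless of the order in which the classes were passed.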


Now we have our two-class dataset, containing only *horses* and *trucks*! Let's move on to defining our model.

### Placeholders and Variables 

First we'll need some placeholders in order to feed our computational graph with external data, such as images and targets. These can be defined through `K.placeholder(shape=())`. In our case each row of *x* is a flattened input image, while the *target* holds one scalar $\in \left\{0,1\right\}$ per image.

In [3]:
# placeholders (None leaves the batch dimension unspecified)
x = K.placeholder(shape=(None, height * width * channels))
target = K.placeholder(shape=(None,))

Now we define the variables: `keras.backend` will automatically differentiate the loss with respect to them, which is what gradient descent needs.

We need a vector *w* with the same shape as a single flattened input, and a scalar bias *b*. Both variables are initialized with samples from a standard normal distribution.

In [4]:
# variables
w = K.variable(np.random.randn(height * width * channels))
b = K.variable(np.random.randn())

### Defining the model

We can define the neuron output $y = \sigma(W^Tx + b)$ as follows:

In [5]:
y = K.sigmoid(K.dot(x, w) + b)

Now we have to define our **loss**, which, as we saw, is the binary cross-entropy.

In [6]:
# loss function
loss = K.mean(K.binary_crossentropy(output=y, target=target))

We also define a couple of utility functions that will be useful to get a feeling of what's happening during training:

In [7]:
# utility functions
compute_loss = K.function(inputs=[x, target], outputs=[loss])
predict = K.function(inputs=[x], outputs=[y])

We still have to tell Theano to compute the **gradients** of our loss with respect to our variables *w* and *b*. Let's do it:

In [8]:
grads = K.gradients(loss=loss, variables=[w, b])

Once evaluated, `grads[0]` will contain the gradient of the loss w.r.t. the weights *w*, `grads[1]` will contain the gradient of the loss w.r.t. the bias *b*.
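
For a sigmoid neuron trained with mean binary cross-entropy, these gradients have a simple closed form: $\frac{\partial L}{\partial w} = \frac{1}{n}\sum_i (y_i - t_i)\,x_i$ and $\frac{\partial L}{\partial b} = \frac{1}{n}\sum_i (y_i - t_i)$. The NumPy sketch below checks that closed form against central finite differences on random toy data. All names and values here are assumptions made for illustration; this is not what `K.gradients` does internally.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mean_bce(X, t, w, b):
    """Mean binary cross-entropy of a sigmoid neuron."""
    y = sigmoid(X.dot(w) + b)
    return np.mean(-t * np.log(y) - (1.0 - t) * np.log(1.0 - y))


def analytic_grads(X, t, w, b):
    """Closed-form gradients of mean_bce with respect to w and b."""
    err = sigmoid(X.dot(w) + b) - t
    return X.T.dot(err) / X.shape[0], np.mean(err)


# Random toy problem (hypothetical sizes: 8 samples, 4 features)
rng = np.random.RandomState(0)
X_toy = rng.randn(8, 4)
t_toy = (rng.rand(8) > 0.5).astype(np.float64)
w_toy = 0.1 * rng.randn(4)
b_toy = 0.0

grad_w, grad_b = analytic_grads(X_toy, t_toy, w_toy, b_toy)

# Numerical gradients via central finite differences
h = 1e-6
num_grad_w = np.zeros_like(w_toy)
for i in range(w_toy.size):
    step = np.zeros_like(w_toy)
    step[i] = h
    num_grad_w[i] = (mean_bce(X_toy, t_toy, w_toy + step, b_toy)
                     - mean_bce(X_toy, t_toy, w_toy - step, b_toy)) / (2 * h)
num_grad_b = (mean_bce(X_toy, t_toy, w_toy, b_toy + h)
              - mean_bce(X_toy, t_toy, w_toy, b_toy - h)) / (2 * h)
```

If the closed form is right, `grad_w` and `num_grad_w` agree up to the finite-difference error, and likewise for the bias.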

We can now set the **learning rate** and define the **update rule**: $w_{t+1} = w_t - lr~\frac{\partial L}{\partial w}$ and $b_{t+1} = b_t - lr~\frac{\partial L}{\partial b}$

In [9]:
# update rule
lr = K.variable(0.1)
updates = [[w, w - lr * grads[0]], [b, b - lr * grads[1]]]
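
The same update rule can be written out by hand in NumPy, which may help to see what each update does. This is a sketch on a synthetic, linearly separable toy problem (an assumption made here), using zero initialization and closed-form gradients rather than the backend's automatic differentiation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic toy data: the label is 1 when the first feature is positive
rng = np.random.RandomState(0)
X_toy = rng.randn(200, 2)
t_toy = (X_toy[:, 0] > 0).astype(np.float64)

w_np = np.zeros(2)   # zero init, chosen here for reproducibility
b_np = 0.0
lr_np = 0.1          # same learning rate value as in the tutorial

losses = []
for step in range(100):
    y = sigmoid(X_toy.dot(w_np) + b_np)

    # mean binary cross-entropy (clipped to avoid log(0))
    y_safe = np.clip(y, 1e-7, 1.0 - 1e-7)
    losses.append(np.mean(-t_toy * np.log(y_safe)
                          - (1.0 - t_toy) * np.log(1.0 - y_safe)))

    # gradient descent step: parameter <- parameter - lr * gradient
    err = y - t_toy
    w_np = w_np - lr_np * X_toy.T.dot(err) / X_toy.shape[0]
    b_np = b_np - lr_np * np.mean(err)
```

With zero initial parameters every prediction starts at 0.5, so the first recorded loss is $\log 2$; the loss then decreases as the weight on the first feature grows.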

Finally, the actual training function is defined. We have to provide the inputs (which will feed the placeholders), the outputs and the parameter updates.

In [10]:
# train function
train = K.function(inputs=[x, target], outputs=[loss], updates=updates)

We are now ready to start the **training**. For `nb_epochs` iterations, the `train` function is called on the whole training set and the parameters are updated with the update rule previously defined.

In [None]:
# training
nb_epochs = 1000
train_loss_history = []
val_loss_history = []
plt.figure()
for epoch in range(nb_epochs):
    # calc loss and update parameters
    loss_train = train([X_train, Y_train])[0]
    train_loss_history.append(loss_train)

    # just calc loss for validation
    loss_val = compute_loss([X_val, Y_val])[0]
    val_loss_history.append(loss_val)

    # plot and print
    if epoch % 10 == 0:
        cur_w = np.reshape(np.array(K.eval(w)), (channels, height, width)).transpose(1, 2, 0)
        plt.subplot(1, 2, 1)
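        # rescale weights to [0, 1] so that negative values are not clipped by imshow
        plt.imshow((cur_w - cur_w.min()) / (cur_w.max() - cur_w.min()), interpolation='none')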
        plt.title('[Epoch: {}] Current weights'.format(epoch))

        plt.subplot(1, 2, 2)
        tr, = plt.plot(train_loss_history, c='r')
        vl, = plt.plot(val_loss_history, c='b')
        plt.legend([tr, vl], ['Train', 'Val'])
        plt.xlim(0, nb_epochs)
        plt.ylim(0, max(max(train_loss_history), max(val_loss_history)))
        plt.title('[Epoch: {}] Loss history'.format(epoch))

        plt.draw()
        plt.pause(0.001)

        print('Training loss: {}\tValidation loss: {}'.format(loss_train, loss_val))

        

Did the neuron learn anything? Let's find out by calling the **`predict`** function on some samples of the validation set `X_val`!

In [None]:
plt.figure()
while True:
    r = np.random.randint(0, X_val.shape[0])
    sample = X_val[r]
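    # predict returns a list holding an array of shape (1,): extract the scalar before rounding
    prediction = int(round(float(predict([np.expand_dims(sample, axis=0)])[0][0])))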
    sample = np.reshape(sample, (channels, height, width)).transpose(1, 2, 0)

    plt.imshow(sample, interpolation='none')
    plt.title('Prediction: {}, Label: {}'.format(prediction, int(Y_val[r])))
    plt.draw()
    plt.waitforbuttonpress()
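
Looking at single samples gives only an impression. A single number such as accuracy summarizes the whole validation set. The helper below is a hypothetical sketch (the name `accuracy` and the 0.5 threshold are assumptions), shown on toy probabilities.

```python
import numpy as np


def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    predictions = (np.asarray(probabilities) > threshold).astype(np.float32)
    return float(np.mean(predictions == np.asarray(labels)))


# Toy values (hypothetical): the third sample is misclassified
probs_toy = np.array([0.9, 0.2, 0.6, 0.4])
labels_toy = np.array([1.0, 0.0, 0.0, 0.0])
acc_toy = accuracy(probs_toy, labels_toy)
```

With the objects defined in this notebook, something like `accuracy(predict([X_val])[0], Y_val)` should give the validation accuracy, assuming `predict` returns one probability per row of `X_val`.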

### Extra

- Add a Keras optimizer of your choice to replace vanilla SGD. The training curve should become smoother and the loss should reach lower values.
- Try different initializations for the variables.
- (Spare time?) Try implementing a simple multilayer perceptron in Theano.