<h2>MultiLayer Perceptron<br />
<small>Using Theano</small></h2>

An artificial neural network with one or more hidden layers is historically referred to as a multilayer perceptron.  With a single hidden layer, let $X$ be the data, $h$ be the nonlinear activation function of the hidden layer, $f$ be the activation function of the output layer, and $W_i$ and $b_i$ be the weights and biases of the $i^{th}$ layer.  Then we can write the output of the network, $\hat{y}$, as

$$\hat{y} = f(W_2 \cdot h(W_1 \cdot X + b_1)  + b_2) $$
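
To make the formula concrete, here is a minimal NumPy sketch of the same forward pass. It assumes tanh for $h$ and a softmax for $f$, and it stores samples as rows, so the products are written `X.dot(W)` rather than $W \cdot X$. The name `mlp_forward`, the toy sizes and the random weights are illustrative only and are not part of the Theano model built below.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(X, W1, b1, W2, b2):
    # Hidden layer: affine map followed by tanh
    hidden = np.tanh(X.dot(W1) + b1)
    # Output layer: affine map followed by softmax
    return softmax(hidden.dot(W2) + b2)

demo_rng = np.random.RandomState(0)
X = demo_rng.rand(5, 4)                          # 5 samples, 4 features
W1 = demo_rng.uniform(-0.5, 0.5, size=(4, 16))   # input -> hidden
b1 = np.zeros(16)
W2 = demo_rng.uniform(-0.5, 0.5, size=(16, 3))   # hidden -> output
b2 = np.zeros(3)

probs = mlp_forward(X, W1, b1, W2, b2)
print(probs.shape)        # one row of class probabilities per sample
print(probs.sum(axis=1))  # each row sums to 1
```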


First, we will create a generic Layer object to simplify constructing our model.  Each layer needs a weight tensor and a bias tensor, plus an input and an output, which in turn require the input and output sizes and an activation function.

In [1]:
import numpy as np
import theano
import theano.tensor as T 
rng = np.random.RandomState(1234)

class Layer(object):
    def __init__(self, input, n_in, n_out, activation):
        
        # Stores input tensor
        self.input = input
        # Randomly initialize weights to small values
        W = np.asarray(rng.uniform(low=-np.sqrt(6. / (n_in + n_out)),
                                   high=np.sqrt(6. / (n_in + n_out)),
                                   size=(n_in, n_out)),
                       dtype=theano.config.floatX)
        # Bias should be 0 vector
        b = np.zeros((n_out,), dtype=theano.config.floatX)
        
        self.W = theano.shared(value=W, name='W', borrow=True)
        self.b = theano.shared(value=b, name='b', borrow=True)
        
        # Compute activation function
        linear_output = T.dot(input, self.W) + self.b
        if activation is None:
            self.output = linear_output
        else:
            self.output = activation(linear_output)

        # Collect parameters of the model
        self.params = [self.W, self.b]
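
The range used to initialize `W` in the layer is the uniform scheme commonly called Glorot (or Xavier) initialization. As a quick check of what it gives for the layer sizes used in this notebook, the sketch below computes the bound with a hypothetical helper, `glorot_bound`, and confirms that sampled weights stay inside it.

```python
import numpy as np

def glorot_bound(n_in, n_out):
    # Half-width of the uniform range used for the initial weights
    return np.sqrt(6.0 / (n_in + n_out))

demo_rng = np.random.RandomState(1234)
for n_in, n_out in [(4, 16), (16, 3)]:
    bound = glorot_bound(n_in, n_out)
    W = demo_rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
    # All sampled weights lie within [-bound, bound]
    print(n_in, n_out, round(float(bound), 4), bool(np.abs(W).max() <= bound))
```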

Now that we have a Layer class, linking layers together is as simple as creating them with the correct dimensions and activation functions.

In [2]:
index = T.lscalar() # Index for batches
x = T.matrix('x')   # holds our datapoints (dtype follows theano.config.floatX)
y = T.ivector('y')  # holds the class labels

hidden = Layer(input=x, n_in=4, n_out=16, activation=T.tanh)
softmax = Layer(input=hidden.output, n_in=16, n_out=3, activation=T.nnet.softmax)

Next, we'll need a complete list of our model parameters, a cost function, and then the gradients and updates used in backpropagation.  For a softmax classifier, the negative log likelihood can be used as a cost function.

In [4]:
cost = -T.mean(T.log(softmax.output)[T.arange(y.shape[0]), y])
# collect our parameters
params = hidden.params + softmax.params
# The gradient for each parameter
gparams = [T.grad(cost, p) for p in params]
# the general update rule
learning_rate = 0.01
updates = [(p, p - learning_rate * g) for p, g in zip(params, gparams)]
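
The indexing expression in the cost can be hard to read at first. As a small NumPy illustration (the probabilities are invented for the example, not produced by the model), `probs[np.arange(n), y]` picks out, for each row, the probability assigned to the true class, and the cost is the negative mean of the logs of those values.

```python
import numpy as np

# Toy softmax outputs for 3 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
y = np.array([0, 1, 2])  # true class of each sample

# Row i, column y[i]: probability given to the correct class
picked = probs[np.arange(y.shape[0]), y]

# Negative log likelihood, averaged over the batch
nll = -np.mean(np.log(picked))
print(picked, nll)
```

The cost shrinks toward 0 as the picked probabilities approach 1, which is what gradient descent on this cost pushes the model toward.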

Load a small dataset to train on - the iris data is a trivial example.

In [5]:
from sklearn.datasets import load_iris
raw = load_iris()
train_set_x = theano.shared(np.asarray(raw['data'], dtype=theano.config.floatX), borrow=True)
train_set_y = theano.shared(np.asarray(raw['target'], dtype=np.int32), borrow=True)

# hyperparameters
batch_size = 10

Compile the model, hope it works!

In [6]:
train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
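
The `givens` slices select one minibatch per call. Here is a plain-Python sketch of the same index arithmetic, assuming the 150 iris samples and the `batch_size` of 10 used above; the helper name `batch_bounds` is hypothetical.

```python
n_samples = 150
batch_size = 10
n_batches = n_samples // batch_size

def batch_bounds(index, batch_size):
    # Start (inclusive) and stop (exclusive) rows of minibatch `index`
    return index * batch_size, (index + 1) * batch_size

bounds = [batch_bounds(i, batch_size) for i in range(n_batches)]
print(n_batches, bounds[0], bounds[-1])  # 15 (0, 10) (140, 150)
```

Because the iris rows are ordered by class, each of these contiguous batches holds samples of a single class. Shuffling the rows before building the shared variables is a common remedy.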

In [7]:
n_train_batches = len(raw['data']) // batch_size  # integer division
epochs = 200

for e in range(epochs):
    for batch in range(n_train_batches):
        train_model(batch)

In [8]:
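# Predict on the full training set; `index` is unused here and is kept
# only so the function can be called as predict_model(0) below
predict_model = theano.function(
        inputs=[index],
        outputs=softmax.output,
        givens={
            x: train_set_x,
            y: train_set_y
        },
        on_unused_input='warn'
    )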



In [9]:
np.argmax(predict_model(0), axis=1)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
       1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [15]:
np.mean(raw['target'] == np.argmax(predict_model(0), axis=1))

0.93999999999999995

Of course, this is accuracy on the training set and not a validation or test set - consider this a proof of concept!
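
One way to get a more honest estimate is to hold out part of the data before training. The sketch below shows only the index bookkeeping, using a hypothetical helper, `holdout_indices`, and an assumed 80/20 split. The shuffle matters because the iris rows are ordered by class.

```python
import numpy as np

def holdout_indices(n_samples, test_fraction=0.2, seed=0):
    # Shuffle row indices, then cut off the first chunk as the test set
    split_rng = np.random.RandomState(seed)
    order = split_rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = holdout_indices(150)
print(len(train_idx), len(test_idx))  # 120 30
```

The training arrays would then be built from `raw['data'][train_idx]` and `raw['target'][train_idx]`, with accuracy reported on the `test_idx` rows only.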