# Conventional Chainer Import Statements
The following imports appear widely in Chainer code and examples:

In [4]:
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions

# The Variable Class
The Variable class is the basic unit of computation. It wraps a numpy.ndarray, which is accessible through the .data attribute.

In [7]:
x = Variable(np.asarray([[0,2],[1,-3]]).astype(np.float32))
x.data  # print x

array([[ 0.,  2.],
       [ 1., -3.]], dtype=float32)

Arithmetic operations and functions (instances of Function) can be applied directly to Variables.

In [8]:
y = x**2 - x + 1
y.data  # print y

array([[  1.,   3.],
       [  1.,  13.]], dtype=float32)

Because each Variable remembers the operation that produced it, Variable y holds a reference to the final addition as its parent (.creator). This mechanism makes backward computation possible: the entire path from the final loss back to the input can be traced without defining the computational graph in advance.

In [9]:
y.creator

<chainer.functions.math.basic_math.AddConstant at 0x1128c8f98>
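To make the role of .creator concrete, here is a minimal sketch of the same idea in plain Python. It is not Chainer's implementation: the Var class and its scalar-only arithmetic are hypothetical stand-ins, assumed here only to show that remembering the creator of each value is enough to compute gradients by walking backward.

```python
class Var:
    """Hypothetical scalar stand-in for a Variable (not Chainer's class)."""

    def __init__(self, data, creator=None):
        self.data = float(data)
        self.grad = 0.0
        # creator is (parents, local_grads): the inputs of the operation that
        # produced this value and d(output)/d(input) for each of them.
        self.creator = creator

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Var(self.data + other.data, ((self, other), (1.0, 1.0)))

    def __sub__(self, other):
        other = self._wrap(other)
        return Var(self.data - other.data, ((self, other), (1.0, -1.0)))

    def __pow__(self, k):
        return Var(self.data ** k, ((self,), (k * self.data ** (k - 1),)))

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then follow the creator back to
        # the inputs. This simple recursion visits every path separately,
        # which is fine for a sketch but inefficient for large graphs.
        self.grad += upstream
        if self.creator is not None:
            parents, local_grads = self.creator
            for parent, local in zip(parents, local_grads):
                parent.backward(upstream * local)


x = Var(-3.0)
y = x ** 2 - x + 1   # the graph is recorded while this line executes
y.backward()
print(y.data, x.grad)
```

For x = -3 this reproduces the value 13 from the output of y.data above, and the gradient agrees with the analytic derivative 2x - 1 = -7.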

# Functions and Links
Many numerical operations and activation functions are provided in chainer.functions. Standard neural network operations such as fully connected (linear) and convolutional layers are implemented in Chainer as instances of Link. A Link can be thought of as a function together with its learnable parameters (such as weights and biases). It is also possible to create a Link that itself contains several other links. Such a container of links is called a Chain. This allows Chainer to model a neural network as a hierarchy of links and chains.
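The hierarchy described above can be sketched in a few lines of plain Python. The TinyLink and TinyChain classes below are hypothetical illustrations, not Chainer's Link and Chain: they only assume that a link is a callable that owns parameters and that a chain collects the parameters of its children under path-like names.

```python
class TinyLink:
    """Hypothetical link: the function y = w * x + b plus its parameters."""

    def __init__(self, w, b):
        self.w = w
        self.b = b

    def params(self):
        return [("w", self.w), ("b", self.b)]

    def __call__(self, x):
        return self.w * x + self.b


class TinyChain:
    """Hypothetical chain: a container of links and other chains."""

    def __init__(self, **children):
        self.children = children

    def params(self):
        # Walk the hierarchy and prefix each parameter with its child's name.
        for name, child in self.children.items():
            for pname, value in child.params():
                yield name + "/" + pname, value


net = TinyChain(l1=TinyLink(2.0, 1.0), inner=TinyChain(l2=TinyLink(-1.0, 0.5)))
names = [name for name, _ in net.params()]
out = net.children["l1"](3.0)
print(names, out)
```

Collecting parameters through the hierarchy is what lets a single optimizer update every weight of a nested model without knowing its structure.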


# Implementing a Multilayer Perceptron
Chainer uses an approach called define-by-run, in which the computational graph is built on the fly while the forward computation executes. The forward code itself defines the graph that backpropagation later traverses, so the graph can be modified in the forward computation at each iteration (and even for each sample).
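As a rough illustration of define-by-run, the sketch below records operations on a list while ordinary Python control flow runs. The forward function and the tape are hypothetical and do not use Chainer. The point is only that the recorded graph can differ from one input to the next.

```python
def forward(x, tape):
    """Run a forward pass and record each executed operation on `tape`."""
    h = x
    # Plain Python control flow decides the graph: the number of doubling
    # steps depends on the input value itself.
    while abs(h) < 10:
        h = h * 2
        tape.append("double")
    h = h + 1
    tape.append("add_one")
    return h


tape_a, tape_b = [], []
out_a = forward(3, tape_a)     # two doublings, then add one
out_b = forward(0.5, tape_b)   # five doublings, then add one
print(out_a, tape_a)
print(out_b, tape_b)
```

A framework that fixes the graph before running would need a dedicated loop construct for this. With define-by-run, the loop is just Python.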

We'll run this example with the MNIST dataset for digit classification:

In [49]:
# chainer provides convenient access to the dataset
train, test = datasets.get_mnist()

train_iter = iterators.SerialIterator(train, batch_size=100, shuffle=True)
test_iter = iterators.SerialIterator(test, batch_size=100, repeat=False, shuffle=False)


In [50]:
class MLP(Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            # the size of the inputs to each layer will be inferred
            self.l1 = L.Linear(None, n_units)  # n_in -> n_units
            self.l2 = L.Linear(None, n_units)  # n_units -> n_units
            self.l3 = L.Linear(None, n_out)    # n_units -> n_out

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

In the constructor (__init__), we define three linear transformations: from the input to the hidden units, from hidden units to hidden units, and from the hidden units to the output units. Note that no connection between these transformations is defined at this point, which means that the computational graph is not even generated, let alone fixed.

Instead, their relationship is given later in the forward computation (__call__), which applies the activation function (F.relu) between the layers. The network definition is written in plain Python rather than a domain-specific language, so users can change the network in each iteration (forward computation).



In [None]:
class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__()
        with self.init_scope():
            self.predictor = predictor

    def __call__(self, x, t):
        y = self.predictor(x)  # run the wrapped network on the input batch
        loss = F.softmax_cross_entropy(y, t)
        accuracy = F.accuracy(y, t)
        report({"loss": loss, "accuracy": accuracy}, self)
        return loss
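To show what the two quantities reported by the Classifier mean, here is a small worked example in plain Python. The helper names mean_softmax_cross_entropy and top1_accuracy are hypothetical, and the sketch assumes the loss is averaged over the batch and that accuracy is the fraction of samples whose highest score matches the label.

```python
import math


def mean_softmax_cross_entropy(scores, labels):
    """Average of -log softmax(row)[label] over the batch."""
    total = 0.0
    for row, t in zip(scores, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[t]
    return total / len(scores)


def top1_accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    hits = sum(1 for row, t in zip(scores, labels) if row.index(max(row)) == t)
    return hits / len(scores)


scores = [[2.0, 0.0, 0.0],   # confident and correct (label 0)
          [0.0, 0.0, 0.0]]   # uniform scores, counted as wrong (label 2)
labels = [0, 2]
loss = mean_softmax_cross_entropy(scores, labels)
acc = top1_accuracy(scores, labels)
print(loss, acc)
```

The second sample contributes log(3) to the loss because its predicted distribution is uniform over three classes.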

In [None]:
# L.Classifier is Chainer's built-in counterpart of the Classifier defined above
model = L.Classifier(MLP(100, 10))  # the input size, 784, is inferred
optimizer = optimizers.SGD()
optimizer.setup(model)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (20, 'epoch'), out='result')

In [55]:
trainer.extend(extensions.Evaluator(test_iter, model))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/accuracy', 'validation/main/accuracy']))
trainer.extend(extensions.ProgressBar())
trainer.run()

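RuntimeError: cannot run training loop multiple times

This error is raised when trainer.run() is called a second time on the same Trainer, for example when the cell is executed again. A Trainer can be run only once, so re-create the iterators, updater, and trainer before running the training loop again.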