# Dive into Chainer
<img src="chainer_cover.png">

## Core concept: Chainer
Chainer is a flexible framework for neural networks. One major goal is flexibility, so it must enable us to write complex architectures simply and intuitively.

Most existing deep learning frameworks are based on the “Define-and-Run” scheme. That is, first a network is defined and fixed, and then the user periodically feeds it with mini-batches. Since the network is statically defined before any forward/backward computation, all the logic must be embedded into the network architecture as data. Consequently, defining a network architecture in such systems (e.g. Caffe) follows a declarative approach. Note that one can still produce such a static network definition using imperative languages (e.g. torch.nn, Theano-based frameworks, and TensorFlow).
In contrast, Chainer adopts a “Define-by-Run” scheme, i.e., the network is defined on-the-fly via the actual forward computation. More precisely, Chainer stores the history of computation instead of programming logic. This strategy makes it possible to fully leverage the power of programming logic in Python. For example, Chainer does not need any magic to introduce conditionals and loops into the network definitions. The Define-by-Run scheme is the core concept of Chainer. We will show in this tutorial how to define networks dynamically.
http://chainer.org

## 1. Preparation for the data set

We use the helper function that Chainer provides to retrieve the MNIST dataset. It downloads the data if necessary and lets you access the examples one by one with ease.

In [None]:
from chainer.datasets import mnist

# Download the MNIST data if you haven't downloaded it yet
train, test = mnist.get_mnist(withlabel=True, ndim=1)

# set matplotlib so that we can see our drawing inside this notebook
%matplotlib inline
import matplotlib.pyplot as plt

# Display an example from the MNIST dataset.
# `x` contains the input image array and `t` contains the target class
# label as an integer.
x, t = train[100]
plt.imshow(x.reshape(28, 28), cmap='gray')
plt.show()
print('label:', t)

## 2. Define the model

Here, let's define a simple 3-layer perceptron. This is a neural network composed of only fully connected layers. In this example, we set each hidden layer to have 100 units and set the output layer to have 10 units, corresponding to the class labels for the MNIST digits.

We first briefly explain 'Link', 'Function', and 'Chain', which are used for defining the model architecture in Chainer.

### Link and Function

- In Chainer, each layer of a neural network is decomposed into one of two broad types of functions: 'Link' and 'Function'.
- **Function is a function without learnable parameters**
- **Link is a function with (learnable) parameters** We can think of Link as wrapping a Function to give it parameters.
- We then describe a model by combining various links and functions.
- For examples of links, see the 'chainer.links' module.
- For examples of functions, see the 'chainer.functions' module.

Before we can start using them, we first need to import the modules as shown below.

    ```
    import chainer.links as L
    import chainer.functions as F
    ```
The Chainer convention is to use `L` for links and `F` for functions, like 'L.Convolution2D(...)' or 'F.relu(...)'.

### Chain

- Chain is a class that can hold multiple links and/or functions. It is a subclass of `Link` and so is also a `Link` itself. 
- This means that a chain can contain parameters and these are the parameters of any links that the chain contains.
- In this way, `Chain` allows us to construct models with a potentially deep hierarchy of functions and links.
- These parameters will need to be updated/optimized during the training procedure (there is an exception, of course).
- It turns out to be convenient to use a chain to hold all of the optimizable parameters in a single container, since this makes it possible to use a single `Optimizer` to update all of the parameters easily during training.
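
The idea that a chain gathers the parameters of everything it contains can be sketched in plain Python. The classes `ToyLink` and `ToyChain` below are hypothetical stand-ins written for this illustration; they are not Chainer's implementation and only mimic the way parameters are found by walking the hierarchy.

```python
# Toy stand-ins for illustration only; these are NOT Chainer's classes.
class ToyLink:
    """A function with learnable parameters, stored by name."""

    def __init__(self, **params):
        self._params = dict(params)
        self._children = {}

    def add_link(self, name, link):
        # Register a child link so its parameters become reachable.
        self._children[name] = link

    def namedparams(self, prefix=''):
        # Yield this link's own parameters first, then recurse into children.
        for name, value in sorted(self._params.items()):
            yield prefix + '/' + name, value
        for name, child in sorted(self._children.items()):
            yield from child.namedparams(prefix + '/' + name)


class ToyChain(ToyLink):
    """A link that holds other links (and is itself a link)."""

    def __init__(self, **links):
        super().__init__()
        for name, link in links.items():
            self.add_link(name, link)


inner = ToyChain(l1=ToyLink(W=[[0.1]], b=[0.0]))
outer = ToyChain(block=inner, out=ToyLink(W=[[0.2]], b=[0.0]))
param_names = [name for name, _ in outer.namedparams()]
print(param_names)
```

Because the outer chain can enumerate every nested parameter, a single optimizer that is given only `outer` would be able to reach all of them.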


### Define the Model class as a subclass of 'Chain' 

- Models are often defined as a subclass of `Chain`.
- Inside the constructor of a model, provide the names and corresponding layer objects as keyword arguments to the parent (super) class. Since the model class will hold all of the layer objects, the 'Optimizer' can easily find and manage all of the model's parameters.
- Alternatively, we can also use the 'add_link' method to add the layers to a `Chain`.
- Define a "\__call\__" method so that we can call the chain like a function. This method is used to implement the forward computation.


### How to run it on GPU

- The `Link` and 'Chain' classes have a 'to_gpu' method that takes a GPU_ID argument. This method sends all parameters in a model to the memory of the GPU with the ID that you assigned.
- During training, it is necessary to perform alternating forward and backward computations on the model.
- If not using GPU, all of the computations are run on the CPU.



In [None]:
import chainer
import chainer.links as L
import chainer.functions as F

class MLP(chainer.Chain):

    def __init__(self, n_mid_units=100, n_out=10):
        super(MLP, self).__init__(
            l1=L.Linear(None, n_mid_units),
            l2=L.Linear(None, n_mid_units),
            l3=L.Linear(None, n_out),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)

model = MLP()

### NOTE

The 'L.Linear' class is a fully connected layer. 
When 'None' is passed as the first argument, the number of input units (`n_input`), and therefore the size of the weight parameter, is determined automatically at runtime during the first forward pass of data. We call this feature `parameter shape placeholder`. It can be very helpful when defining deep CNN models, since it would often be tedious to determine the input sizes manually.
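
To make the placeholder idea concrete, here is a minimal NumPy sketch of a layer that delays allocating its weight until it sees the first input. `LazyLinear` is a hypothetical class written for this note, and its initialization scheme is an arbitrary choice; it is not how 'L.Linear' is implemented.

```python
import numpy as np


class LazyLinear:
    """Toy fully connected layer that allocates W on the first call.

    Illustrative only; this is not how L.Linear is implemented.
    """

    def __init__(self, in_size, out_size):
        self.out_size = out_size
        self.b = np.zeros(out_size, dtype=np.float32)
        self.W = None
        if in_size is not None:
            self._init_weight(in_size)

    def _init_weight(self, in_size):
        # Arbitrary small random initialization, chosen for this sketch.
        rng = np.random.default_rng(0)
        scale = 1.0 / np.sqrt(in_size)
        W = rng.standard_normal((self.out_size, in_size)) * scale
        self.W = W.astype(np.float32)

    def __call__(self, x):
        # Flatten everything except the batch axis.
        x = x.reshape(len(x), -1)
        if self.W is None:
            self._init_weight(x.shape[1])
        return x @ self.W.T + self.b


layer = LazyLinear(None, 100)
assert layer.W is None          # nothing allocated yet
y = layer(np.zeros((8, 784), dtype=np.float32))
print(layer.W.shape, y.shape)
```

The bias only depends on the number of output units, so it can be created immediately; only the weight has to wait for the first batch.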

As mentioned previously, a 'Link' can contain multiple parameter arrays. For a given link or chain, such as the `MLP` chain above, the links it contains can be accessed as attributes (or properties). The parameters of a link can also be accessed as attributes. For example, the following code shows how to access the bias parameter of layer l1:

In [None]:
print('The shape of the bias of the first layer, l1, in the model,', model.l1.b.shape)
print('The values of the bias of the first layer in the model after initialization,', model.l1.b.data)

In [None]:
# Wrap your model with Classifier so that the loss calculation is included in the model.
model = L.Classifier(model)

gpu_id = 0
# When you use a GPU, assign its ID here. A negative value means CPU.
model.to_gpu(gpu_id)

### NOTE

Here, the model defined above is passed to 'L.Classifier', which turns it into a new 'Chain' model. 'L.Classifier', which is itself a subclass of 'Chain', keeps the given 'Chain' model in an attribute called 'predictor'. When you give data and their labels to the model with the '()' operator, the model's `__call__` is invoked first. The data is then given to 'predictor' to obtain the output 'y'. Next, together with the label data, the output 'y' is passed to the loss function specified by the 'lossfun' argument of the constructor, and the result is returned as a 'Variable'. In 'L.Classifier', 'lossfun' is set to 'softmax_cross_entropy' by default.
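
As a rough illustration of what a softmax cross entropy loss and an accuracy measure compute from the predictor's output 'y' and the labels, here is a small NumPy sketch. The function names and the toy inputs are chosen for this note, and the code is a plain re-derivation of the formulas rather than Chainer's own implementation.

```python
import numpy as np


def softmax_cross_entropy(y, t):
    """Mean cross entropy between softmax(y) and integer labels t."""
    # Subtract the row-wise max for numerical stability.
    shifted = y - y.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(t)), t].mean()


def accuracy(y, t):
    """Fraction of rows whose largest score is at the labeled class."""
    return (y.argmax(axis=1) == t).mean()


# Two examples with three classes each (toy scores, not real outputs).
y = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
t = np.array([0, 1])
loss = softmax_cross_entropy(y, t)
acc = accuracy(y, t)
print(loss, acc)
```

The first example is scored confidently and correctly, so it contributes a small loss; the second has uniform scores, so it contributes log(3) and counts as a miss because the arg max falls on class 0.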

##  Select an optimization algorithm

Chainer provides a wide variety of optimization algorithms that can be used to optimize the model parameters during training. They are located in the 'chainer.optimizers' module. 

Here, we are going to use the basic stochastic gradient descent (SGD) method, which is implemented by 'optimizers.SGD'. The model we created (recall that it is a 'Chain' object) is passed to the optimizer object by providing it as an argument to the optimizer's 'setup' method. In this way, the 'Optimizer' can automatically find the model parameters to be optimized.

You can easily try out other optimizers as well. Please test and observe the results of various optimizers. For example, you could change 'SGD' in 'chainer.optimizers.SGD' to 'MomentumSGD', 'RMSprop', 'Adam', etc. and run your training loop.

In [None]:
from chainer import optimizers
# Choose an optimizer algorithm
optimizer = optimizers.SGD(lr=0.01)
# Give the optimizer a reference to the model so that it
# can locate the model's parameters.
optimizer.setup(model)

### NOTE

Observe that above, we set 'lr' to 0.01 in the SGD constructor. This value is known as the "learning rate", one of the most important **hyperparameters** that need to be adjusted in order to obtain the best performance. The various optimizers may each have different hyperparameters, so be sure to check the documentation for the details.
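
The effect of the learning rate can be seen with the plain SGD update rule, p ← p − lr · grad, applied to a toy one-parameter loss. The loss function and the helper names below are made up for this illustration and have nothing to do with the MNIST model.

```python
def sgd_step(params, grads, lr=0.01):
    """Return new parameters after one step: p <- p - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]


def grad_of_loss(params):
    # Gradient of the toy loss sum((p - 3)^2), whose minimum is at p = 3.
    return [2.0 * (p - 3.0) for p in params]


results = {}
for lr in (0.01, 0.1):
    params = [0.0]
    for _ in range(100):
        params = sgd_step(params, grad_of_loss(params), lr=lr)
    results[lr] = params[0]
    print(lr, params[0])
```

After the same 100 steps, the larger learning rate has essentially reached the minimum at 3, while the smaller one is still on its way. On a real network, a learning rate that is too large can also make training diverge, which is why this value usually needs tuning.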

# 3 Let's try to use Trainer

By using 'Trainer', you don't need to write the tedious training loop explicitly any more. 'Extension' in 'Trainer' has many useful tools to visualize your results and manage and store log files more easily. 

Trainer is a class that gathers and organizes everything needed for training. The main components are shown below.
<img src="trainer.png" width="500">

## 3.1 Create the dataset iterators

Let's make the 'Iterator's. An Iterator takes a certain number of examples from the dataset and returns them together as a minibatch. We will use these minibatches later in the trainer. An Iterator also has properties for managing the training, such as 'epoch', which counts how many times we have gone through the entire dataset.

In [None]:
from chainer import iterators

# Choose the minibatch size.
batchsize = 128

train_iter = iterators.SerialIterator(train, batchsize)
test_iter = iterators.SerialIterator(test, batchsize,
                                     repeat=False, shuffle=False)

### About Iterator

- 'SerialIterator', which is one of the simplest iterators in Chainer, retrieves data from your dataset in order.
- Iterator takes 2 arguments: the dataset object and a batch size.
- When the data needs to be used repeatedly for training, set the 'repeat' argument to 'True' (the default). When the data is needed only once and should not be retrieved again, set 'repeat' to 'False'.
- When you want to shuffle the training dataset for every epoch, set the 'shuffle' argument to 'True'.

In the example above, we set 'batchsize = 128'. 'train_iter' is the Iterator for the training dataset, and 'test_iter' is the Iterator for the test dataset. These iterators will therefore return 128 image examples as a bundle.
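
The roles of 'repeat', 'shuffle', and 'epoch' can be illustrated with a small pure-Python iterator. `ToySerialIterator` is a hypothetical class written for this note; it simplifies several details (for example, it simply yields a shorter last batch), so Chainer's own iterator may behave differently at epoch boundaries.

```python
import random


class ToySerialIterator:
    """Toy minibatch iterator; illustrative, not Chainer's SerialIterator."""

    def __init__(self, dataset, batch_size, repeat=True, shuffle=True, seed=0):
        self.dataset = dataset
        self.batch_size = batch_size
        self.repeat = repeat
        self.shuffle = shuffle
        self.epoch = 0
        self._rng = random.Random(seed)
        self._pos = 0
        self._order = self._new_order()

    def _new_order(self):
        order = list(range(len(self.dataset)))
        if self.shuffle:
            self._rng.shuffle(order)
        return order

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self.dataset):
            if not self.repeat:
                raise StopIteration
            # Start a new pass over the data with a fresh order.
            self._pos = 0
            self._order = self._new_order()
        idx = self._order[self._pos:self._pos + self.batch_size]
        self._pos += len(idx)
        if self._pos >= len(self.dataset):
            self.epoch += 1
        return [self.dataset[i] for i in idx]


data = list(range(10))

# Evaluation style: one ordered pass, then stop.
test_batches = list(ToySerialIterator(data, 4, repeat=False, shuffle=False))

# Training style: keeps producing batches across epochs.
train_it = ToySerialIterator(data, 4)
for _ in range(6):
    next(train_it)
print(test_batches, train_it.epoch)
```

With 'repeat=False' the iterator is exhausted after a single pass, which is what an evaluation loop wants; with the default 'repeat=True' it can be called indefinitely while 'epoch' keeps count of completed passes.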

## 3.2 Preparation for Updater

As you see in the Trainer architecture above, Updater is composed of:

- Updater
    - Iterator
        - Dataset
    - Optimizer
        - Model

Basically, all you need to pass to 'Trainer' is an 'Updater'. However, the 'Updater' holds an 'Iterator' and an 'Optimizer' within it.
Since the 'Iterator' can access the dataset and the 'Optimizer' has a reference to the model, the 'Updater' can update the parameters of the model.

So, the 'Updater' can perform the series of training procedures shown below:

1. Retrieve the data from dataset (Iterator)
2. Pass the data to the model and calculate the loss (Model = Optimizer.target)
3. Optimizer updates the parameters of the model (Optimizer) 

Let's create the Updater object!

In [None]:
from chainer import training
from chainer.cuda import to_cpu

# Send Iterator and Optimizer to Updater
updater = training.StandardUpdater(train_iter, optimizer, device=gpu_id)
#updater = training.StandardUpdater(train_iter, optimizer)

## 5. Setup Trainer

Lastly, we can set up the 'Trainer'. The only requirement for creating a 'Trainer' is to pass the 'Updater' created above. However, you can also pass 'stop_trigger' as the second argument, in the form of a tuple '(length, unit)', to tell the trainer when to stop automatically. The 'length' is given as an integer, and the 'unit' is given as a string, either 'epoch' or 'iteration'. Without 'stop_trigger', the training will not stop automatically.
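
The meaning of the '(length, unit)' tuple can be sketched with a small helper. `should_stop` is a hypothetical function written for this note, not part of Chainer, and the figure of 469 iterations per epoch assumes the 60,000 MNIST training examples are split into batches of 128 with the last partial batch counted as one iteration.

```python
def should_stop(stop_trigger, epoch, iteration):
    """Return True once training has reached the (length, unit) limit."""
    if stop_trigger is None:
        return False  # no limit: the loop keeps running
    length, unit = stop_trigger
    if unit == 'epoch':
        return epoch >= length
    if unit == 'iteration':
        return iteration >= length
    raise ValueError("unit must be 'epoch' or 'iteration'")


iters_per_epoch = 469  # assumed: ceil(60000 / 128)
iteration = 0
while not should_stop((10, 'epoch'), iteration // iters_per_epoch, iteration):
    iteration += 1  # one parameter update would happen here
print(iteration)
```

Under these assumptions, a trigger of '(10, 'epoch')' corresponds to the same amount of work as '(4690, 'iteration')'.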

In [None]:
# Send Updater to Trainer
max_epoch = 10

trainer = training.Trainer(updater, (max_epoch, 'epoch'),
                           out='mnist_result')

The 'out' argument in the 'Trainer' will set up an output directory to save the logfiles, the image files of graphs to show the time progress of loss, accuracy, etc. 
Next, we will explain how to display/save those outputs by using 'Extension'. 

## 6. Add Extension to Trainer

The advantages of using 'Trainer' are:

- Save log files automatically ('LogReport')
- Display the training information to the terminal periodically ('PrintReport')
- Visualize the loss progress by plotting a graph periodically and saving it as an image ('PlotReport')
- Automatically serialize the model or the state of Optimizer periodically ('snapshot'/'snapshot_object')
- Display a progress bar to show the progress of training ('ProgressBar')
- Save the model architecture in the dot format readable by Graphviz ('dump_graph')

Now you can use the wide variety of tools shown above right away! To make use of them, all you need to do is pass an 'Extension' object to the 'Trainer' object with the 'extend' method.

In [None]:
from chainer.training import extensions

trainer.extend(extensions.LogReport())
trainer.extend(extensions.snapshot(filename='snapshot_epoch-{.updater.epoch}'))
trainer.extend(extensions.snapshot_object(model.predictor, filename='model_epoch-{.updater.epoch}'))
trainer.extend(extensions.Evaluator(test_iter, model, device=gpu_id))
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy', 'validation/main/loss', 'validation/main/accuracy', 'elapsed_time']))
trainer.extend(extensions.PlotReport(['main/loss', 'validation/main/loss'], x_key='epoch', file_name='loss.png'))
trainer.extend(extensions.PlotReport(['main/accuracy', 'validation/main/accuracy'], x_key='epoch', file_name='accuracy.png'))
trainer.extend(extensions.dump_graph('main/loss'))

### `LogReport`

Collects 'loss' and 'accuracy' automatically every 'epoch' or 'iteration' and stores the information in the 'log' file in the directory assigned by the 'out' argument of 'Trainer'.

### `snapshot`

The 'snapshot' extension saves the 'Trainer' object at the designated timing (default: every epoch) in the directory assigned by the 'out' argument of 'Trainer'. The 'Trainer' object, as mentioned before, has an 'Updater', which holds the 'Optimizer' and the model. Therefore, as long as you keep the snapshot, you can resume training or make inferences with the previously trained model later.

### `snapshot_object`

However, when you keep the whole 'Trainer' object, it can be tedious to retrieve only the model from it. By using 'snapshot_object', you can save a particular object (in this case, the model wrapped by 'Classifier') in addition to saving the 'Trainer' object. 'Classifier' is a 'Chain' object that keeps the 'Chain' object given as its first argument in an attribute called 'predictor' and calculates the loss. Since 'Classifier' has no parameters other than those of the model inside it, we save only 'model.predictor', which is all we need to make inferences with the model later.

### `dump_graph`

This extension saves the computational graph, which is obtained by tracing back from a 'Variable' object. The graph is saved in Graphviz's dot format. The output location (directory) for the graph is assigned by the 'out' argument of 'Trainer'.

### `Evaluator`

The 'Iterator' for the evaluation dataset and the model object are passed to 'Evaluator'. The 'Evaluator' evaluates the model on the given dataset at the assigned timing.

### `PrintReport`

'Reporter' aggregates the results, and 'PrintReport' writes them to the standard output. The entries to display are given as a list.

### `PlotReport`

'PlotReport' plots the values specified by its arguments and saves the graph as an image file, named by the 'file_name' argument, in the directory assigned by the 'out' argument of 'Trainer'.

---

'Extension' has many options other than those mentioned here. For instance, by using the 'trigger' option, you can set the timing at which each 'Extension' is activated more flexibly. Please see the official documentation for more details: [Trainer extensions](http://docs.chainer.org/en/latest/reference/extensions.html)

## 7. Start Training

To start training, just call the 'run' method of the 'Trainer' object.

In [None]:
trainer.run()

Let's see the graph of loss saved in the 'mnist_result' directory. 

In [None]:
from IPython.display import Image
Image(filename='mnist_result/loss.png')

How about the accuracy? 

In [None]:
Image(filename='mnist_result/accuracy.png')

Furthermore, let's visualize the computational graph output by the 'dump_graph' extension using Graphviz. If you don't have Graphviz, install it first (for example, with Homebrew):
> brew install graphviz 

In [None]:
%%bash
dot -Tpng mnist_result/cg.dot -o mnist_result/cg.png 

In [None]:
Image(filename='mnist_result/cg.png')

From the top to the bottom, you can follow the calculation flow: how the data and parameters are passed to each type of 'Function', and how the calculated loss is finally output.

## 8. Make inferences by using the pre-trained model

In [None]:
import numpy as np
from chainer import serializers
from chainer.cuda import to_gpu
from chainer.cuda import to_cpu

model = MLP()
serializers.load_npz('mnist_result/model_epoch-10', model)
model.to_gpu(gpu_id)

%matplotlib inline
import matplotlib.pyplot as plt

x, t = test[0]
plt.imshow(x.reshape(28, 28), cmap='gray')
plt.show()
print('label:', t)

# Add the batch axis once, then send the input to the same GPU as the model
x = to_gpu(x[None, ...], gpu_id)
y = model(x)
y = to_cpu(y.data)

print('predicted_label:', y.argmax(axis=1)[0])

successfully executed !!

# Advanced:  Inside Chainer

In the sections above, we showed how to build and train neural networks in Chainer through image recognition and language modeling. Users can also apply Chainer to their own problems other than such pattern recognition tasks.

Though we only combined preset layers and functions to build neural networks in the experiments, users may need to create new kinds of networks by writing lower-level code from scratch.

Chainer is designed to encourage users to rapidly prototype such new models, test them, and improve them through trial and error. In the following, we explain the core components inside Chainer.

## 9.1 CuPy and NumPy
NumPy is a widely used Python library for numerical computations on the CPU. On the other hand, neural networks can benefit from GPUs for faster computations on multi-dimensional arrays. However, NumPy does not support GPUs, so Python users had to write GPU-specific code, as in the initial version of Chainer.
Therefore, CuPy has been created and added to Chainer as a NumPy-compatible library based on CUDA. It currently supports many of the APIs in NumPy so that users can write CPU/GPU-agnostic code in most cases.
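
One common way to write such CPU/GPU-agnostic code is to pass the array module itself as a parameter, conventionally named `xp`. The sketch below only runs with NumPy; the function name `l2_normalize` is made up for this note, and the CuPy call in the comment is shown as an assumption about how it would be used on a machine with a GPU.

```python
import numpy as np


def l2_normalize(x, xp=np):
    """Scale each row of x to unit length using the given array module."""
    norm = xp.sqrt(xp.sum(x * x, axis=1, keepdims=True))
    # Guard against division by zero for all-zero rows.
    return x / xp.maximum(norm, 1e-12)


x = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(x)  # runs on the CPU with NumPy
print(unit)

# With CuPy installed, the same function could be called as
#   l2_normalize(cupy.asarray(x), xp=cupy)
# to run on the GPU (not executed here).
```

Because the function body never names NumPy directly, the only thing that changes between the CPU and GPU versions is the module handed in.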


### Variable natively supports backpropagation

Backpropagation is the standard way to optimize neural networks. After forward computation, the loss is given at the output (as gradient), then the corresponding gradients are assigned to each intermediate layer by backtracking the computational graph. Then the parameters will be updated using the gradient information.

<img src="backward.png" width="250">

In Chainer, since all of the variables in the forward computation are stored and automatic differentiation is supported, backward() traces the computational graph backward from the terminal (output) to the roots (the inputs, whose .creator is None). Then the optimizer updates the model.

### Definition: quadratic equation as forward computation

As shown in previous sections, forward computation can be regarded as a chain of functions to generate the final Variable instance. During the computation, Chainer remembers all of the intermediate Variable instances.

In [None]:
# A mock of forward computation
def forward(x):
    z = 2 * x
    y = x ** 2 - z + 1
    return y, z

### Execution: backward computation to assign gradients

By setting y.grad and calling y.backward(), the gradient information is propagated back to x and z.

In [None]:
from chainer import Variable
x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
y, z = forward(x)
y.grad = np.ones((2, 3), dtype=np.float32)
y.backward(retain_grad=True)

In [None]:
# Gradient for x: 2*x - 2
print(x.grad)

In [None]:
# Gradient for z: -1
print(z.grad)
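
The gradient with respect to x noted in the comment above, 2*x - 2, can be sanity-checked without Chainer by numerical differentiation. The sketch below repeats the arithmetic of forward() on plain NumPy arrays (the name `forward_np` is introduced here) and compares a central difference against the analytic formula.

```python
import numpy as np


def forward_np(x):
    # Same arithmetic as forward() above, on plain NumPy arrays.
    z = 2 * x
    y = x ** 2 - z + 1
    return y, z


x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
eps = 1e-6
# Central difference approximates dy/dx element by element.
numeric_grad = (forward_np(x + eps)[0] - forward_np(x - eps)[0]) / (2 * eps)
analytic_grad = 2 * x - 2
print(numeric_grad)
print(analytic_grad)
```

The gradient for z is -1 because z enters y only through the term `- z`; its own dependence on x is accounted for separately when the gradient flows on from z to x.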

## Define-by-Run scheme

In most of the existing deep learning frameworks, model construction and training are two separate processes. In advance of training, a fixed computational graph for a model is built by parsing the model definition. Most of them use a text or symbolic style program to define a neural network. These definitions can be regarded as a kind of domain-specific language (DSL) for deep learning. Then, given a training dataset, the actual training runs to update the model. The following figure shows the two processes. We call this the Define-and-Run scheme.

<img src="define-and-run.png" width="400">

The Define-and-Run scheme is very straightforward and well suited to optimizing the computational graph before training. On the other hand, it has some drawbacks. For example, it requires special syntax to implement recurrent neural networks. Memory efficiency might also not be optimal, since the whole computational graph has to be kept in main memory from the beginning to the end of training.

Therefore, Chainer uses another approach named Define-by-Run. The model definition is combined with training, as the actual forward computation builds the computational graph on the fly. This enables users to easily implement complex networks with loops and branching by using the host language. Modifications to the computational graph during training, such as truncated BPTT, can also be done efficiently.

<img src="define-by-run.png" width="400">
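
The essence of Define-by-Run can be shown with a tiny scalar example in plain Python. The `Var` class below is a toy written for this note, far simpler than Chainer's 'Variable': each arithmetic operation records its inputs as it executes, so an ordinary Python loop determines the shape of the graph.

```python
class Var:
    """Toy scalar variable that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, grad=1.0):
        # Walk the recorded history backward, accumulating gradients.
        # (Simple but inefficient: every path is visited separately.)
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)


def power_chain(x, n_steps):
    # A normal Python loop decides how deep the graph is on each call.
    h = x
    for _ in range(n_steps):
        h = h * x
    return h


x = Var(2.0)
y = power_chain(x, n_steps=2)  # y = x ** 3
y.backward()
print(y.value, x.grad)
```

Calling `power_chain` with a different `n_steps` would record a different graph on the next run, with no separate graph definition step in between.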


# Section Let's write more complicated CNNs

Here, we try to write more complicated CNNs using CIFAR10, which consists of small 32x32 color images with their labels, instead of the MNIST dataset.

| airplane | automobile | bird | cat | deer | dog | frog | horse | ship | truck |
|:--------:|:----------:|:----:|:---:|:----:|:---:|:----:|:-----:|:----:|:-----:|
| ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship4.png) | ![](https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck4.png) |

## 1. Definition of Model

Define the model by inheriting the 'Chain' class. Here, let's build a more complicated convolutional neural net. This model has 3 convolution layers, followed by 2 fully connected layers.

We define the model mainly in two steps, as follows.

1. Define the layers that compose your model in the constructor.
    - In this case, we pass 'link' objects as keyword arguments to the constructor of the parent class ('Chain'), so that the 'Optimizer' can find the optimizable parameters of the layers in the model.
2. Define the '\__call\__' method, so that the '( )' operator can receive the data and carry out the forward calculation.
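
The model in the next cell passes 'None' as the input size of 'fc4'. To see what that saves us from, the sketch below works out the feature-map sizes by hand, reading the positional arguments of each 'L.Convolution2D(…, 3, 3, 1)' as kernel size 3, stride 3, and padding 1, and using the usual floor-mode output size formula. The helper name `conv_out_size` is introduced here for illustration.

```python
def conv_out_size(size, ksize, stride, pad):
    """Output length along one axis for a standard (floor-mode) convolution."""
    return (size + 2 * pad - ksize) // stride + 1


size = 32  # CIFAR10 images are 32x32
sizes = []
for name in ('conv1', 'conv2', 'conv3'):
    size = conv_out_size(size, ksize=3, stride=3, pad=1)
    sizes.append(size)
    print(name, size)

fc4_inputs = 128 * size * size  # channels x height x width after conv3
print(fc4_inputs)
```

Under these assumptions, the first fully connected layer receives 512 inputs. Changing any kernel size, stride, or padding would change that number, which is exactly the bookkeeping the placeholder avoids.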

In [None]:
import chainer
import chainer.functions as F
import chainer.links as L

class MyModel(chainer.Chain):
    
    def __init__(self, n_out):
        super(MyModel, self).__init__(
            conv1=L.Convolution2D(None, 32, 3, 3, 1),
            conv2=L.Convolution2D(32, 64, 3, 3, 1),
            conv3=L.Convolution2D(64, 128, 3, 3, 1),
            fc4=L.Linear(None, 1000),
            fc5=L.Linear(1000, n_out)
        )
        
    def __call__(self, x):
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        h = F.relu(self.fc4(h))
        h = self.fc5(h)
        return h

## 2. Training

Let's define a 'train' function, so that we can train other models easily later on. This function receives a model object, trains the model on the CIFAR10 data to classify 10 classes, and returns the trained model.

By using this 'train' function, we can train the 'MyModel' defined above.

In [None]:
from chainer.datasets import cifar
from chainer import iterators
from chainer import optimizers
from chainer import training
from chainer.training import extensions

def train(model_object, batchsize=64, gpu_id=0, max_epoch=20):

    # 1. Dataset
    train, test = cifar.get_cifar10()

    # 2. Iterator
    train_iter = iterators.SerialIterator(train, batchsize)
    test_iter = iterators.SerialIterator(test, batchsize, False, False)

    # 3. Model
    model = L.Classifier(model_object)
    model.to_gpu(gpu_id)  # send the model to the GPU; comment this line out when you use only the CPU

    # 4. Optimizer
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # 5. Updater
    updater = training.StandardUpdater(train_iter, optimizer, device=gpu_id)

    # 6. Trainer
    trainer = training.Trainer(updater, (max_epoch, 'epoch'), out='{}_cifar10_result'.format(model_object.__class__.__name__))

    # 7. Evaluator
    
    class TestModeEvaluator(extensions.Evaluator):

        def evaluate(self):
            model = self.get_target('main')
            model.train = False
            ret = super(TestModeEvaluator, self).evaluate()
            model.train = True
            return ret

    trainer.extend(extensions.LogReport())
    trainer.extend(TestModeEvaluator(test_iter, model, device=gpu_id))
    trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy', 'validation/main/loss', 'validation/main/accuracy', 'elapsed_time']))
    trainer.extend(extensions.PlotReport(['main/loss', 'validation/main/loss'], x_key='epoch', file_name='loss.png'))
    trainer.extend(extensions.PlotReport(['main/accuracy', 'validation/main/accuracy'], x_key='epoch', file_name='accuracy.png'))
    trainer.run()
    del trainer
    
    return model
    
model = train(MyModel(10))

The training is completed. Now take a look at the results.

In [None]:
from IPython.display import Image
Image(filename='MyModel_cifar10_result/loss.png')

In [None]:
Image(filename='MyModel_cifar10_result/accuracy.png')

Although the training accuracy reached 98%, the loss on the test data gets worse with every iteration after about 5 epochs, and the test accuracy seems to plateau around 60%. It seems that **the model is overfitting to the training data**.

## 3. Prediction by using pre-trained model 

Although the test accuracy is around 60%, let's classify some test images by using the pre-trained model.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

cls_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
             'dog', 'frog', 'horse', 'ship', 'truck']

def predict(model, image_id):
    _, test = cifar.get_cifar10()
    x, t = test[image_id]
    model.to_cpu()
    y = model.predictor(x[None, ...]).data.argmax(axis=1)[0]
    print('predicted_label:', cls_names[y])
    print('answer:', cls_names[t])

    plt.imshow(x.transpose(1, 2, 0))
    plt.show()

for i in range(5):
    predict(model, i)

Some are correctly classified, others are not. Even if the model classifies the training dataset with 100% accuracy, that is meaningless if it cannot classify unseen test data. The accuracy on the test data indicates the generalization ability more directly.
How can we design a model with high generalization ability and train it?

## 4. Let's define Deeper model
Now, let's define a model with more layers than before. Here, we define a small convolutional block, 'ConvBlock', and a small fully connected block, 'LinearBlock'. Then, we can build deeper networks by stacking those blocks.


### Define the block of layers

Let's define the network blocks, 'ConvBlock' and 'LinearBlock', which make up the deeper neural net.

In [None]:
class ConvBlock(chainer.Chain):
    
    def __init__(self, n_ch, pool_drop=False):
        w = chainer.initializers.HeNormal()
        super(ConvBlock, self).__init__(
            conv=L.Convolution2D(None, n_ch, 3, 1, 1,
                                 nobias=True, initialW=w),
            bn=L.BatchNormalization(n_ch)
        )
        
        self.train = True
        self.pool_drop = pool_drop
        
    def __call__(self, x):
        h = F.relu(self.bn(self.conv(x)))
        if self.pool_drop:
            h = F.max_pooling_2d(h, 2, 2)
            h = F.dropout(h, ratio=0.25, train=self.train)
        return h
    
class LinearBlock(chainer.Chain):
    
    def __init__(self):
        w = chainer.initializers.HeNormal()
        super(LinearBlock, self).__init__(
            fc=L.Linear(None, 1024, initialW=w))
        self.train = True
        
    def __call__(self, x):
        return F.dropout(F.relu(self.fc(x)), ratio=0.5, train=self.train)

'ConvBlock' is defined by inheriting 'Chain'. It contains a single convolution layer and a Batch Normalization layer, both registered in the constructor. The '\__call\__' method receives the data, passes it through these layers, and applies the activation function. If 'pool_drop' is set to 'True', max pooling and dropout are additionally applied.
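
The role of the 'train' flag in the blocks above can be illustrated with a NumPy sketch of dropout. This is a generic "inverted dropout" written for this note, with an explicit random generator argument; it is meant to show the training/evaluation difference, not to reproduce 'F.dropout' exactly.

```python
import numpy as np


def dropout(x, ratio, train, rng):
    """Inverted dropout: zero a fraction of units and rescale the rest."""
    if not train:
        return x  # evaluation mode: pass the input through unchanged
    keep = rng.random(x.shape) >= ratio
    return x * keep / (1.0 - ratio)


rng = np.random.default_rng(0)
x = np.ones((1000, 10))
train_out = dropout(x, ratio=0.25, train=True, rng=rng)
test_out = dropout(x, ratio=0.25, train=False, rng=rng)
print(train_out.mean(), test_out.mean())
```

During training, roughly a quarter of the units are zeroed and the survivors are scaled up so that the expected activation stays the same; during evaluation nothing is dropped. This is why the evaluator in the 'train' function switches 'model.train' off before measuring test accuracy.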

In Chainer, the Python code of the forward calculation itself represents the model. In other words, the path the data takes through the layers defines the network. In this way, Chainer lets you describe even complicated networks, such as branched networks, intuitively and with high flexibility. This is the unique feature of Chainer, called **Define-by-Run**.

### Define Deeper and larger Networks

Next, let's define bigger networks by stacking the component block of these small networks.

In [None]:
class DeepCNN(chainer.ChainList):

    def __init__(self, n_output):
        super(DeepCNN, self).__init__(
            ConvBlock(64),
            ConvBlock(64, True),
            ConvBlock(128),
            ConvBlock(128, True),
            ConvBlock(256),
            ConvBlock(256),
            ConvBlock(256),
            ConvBlock(256, True),
            LinearBlock(),
            LinearBlock(),
            L.Linear(None, n_output)
        )
        self._train = True
            
    @property
    def train(self):
        return self._train
            
    @train.setter
    def train(self, val):
        self._train = val
        for c in self.children():
            c.train = val
    
    def __call__(self, x):
        for f in self.children():
            x = f(x)
        return x

The class used here is 'ChainList'. Like 'Chain', this class inherits from 'Link', and it is very useful when you define networks that hold many 'Link' or 'Chain' objects in sequence.

For a model defined by inheriting 'ChainList', the 'Link' or 'Chain' objects are passed to the constructor of the parent class **as normal arguments, not as keyword arguments**. They can then be retrieved in the registered order with the **self.children()** method.

This feature makes the forward propagation very easy to describe. With the components returned by **self.children()**, we can write the entire forward computation as a for loop that picks up each component one after another. We first give the input 'x' to the first component, and its output is passed on to the next 'Link' or 'Chain'. This sequential application is extremely useful for describing deeper networks simply.
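
The two behaviours described here, ordered iteration over the children and propagation of the 'train' flag, can be seen in isolation with toy classes. `ToyBlock` and `ToyChainList` are hypothetical stand-ins written for this note; each block just appends its name to a list so that the order of application is visible.

```python
class ToyBlock:
    """Toy layer with a train flag; stands in for ConvBlock or LinearBlock."""

    def __init__(self, name):
        self.name = name
        self.train = True

    def __call__(self, trace):
        # Record that the data passed through this block.
        return trace + [self.name]


class ToyChainList:
    """Toy container that keeps children in registration order."""

    def __init__(self, *links):
        self._children = list(links)
        self._train = True

    def children(self):
        return iter(self._children)

    @property
    def train(self):
        return self._train

    @train.setter
    def train(self, val):
        # Propagate the mode switch to every child.
        self._train = val
        for c in self.children():
            c.train = val

    def __call__(self, x):
        # Feed the output of each child into the next one.
        for f in self.children():
            x = f(x)
        return x


net = ToyChainList(ToyBlock('conv1'), ToyBlock('conv2'), ToyBlock('fc'))
out = net([])
net.train = False
child_modes = [c.train for c in net.children()]
print(out, child_modes)
```

Because the forward pass is only a loop, adding or removing a block in the constructor call is all it takes to change the depth of the network.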

In [None]:
model = train(DeepCNN(10), max_epoch=10)

The training is completed. Let's take a look at the loss and accuracy.

In [None]:
Image(filename='DeepCNN_cifar10_result/loss.png')

In [None]:
Image(filename='DeepCNN_cifar10_result/accuracy.png')

Now the accuracy on the test data has improved a lot. Previously it was around 60%, and now it reaches around 87%. According to current research reports, the most advanced models can reach around 97%. To improve the accuracy further, it is necessary not only to improve the model but also to increase the training data (data augmentation) or to combine multiple models to get the best performance (ensemble methods). There is more room for improvement with your own new ideas!
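
As a starting point for the two ideas mentioned above, here is a minimal NumPy sketch of one simple augmentation (a random left-right flip) and one simple ensemble rule (averaging class probabilities). The function names and the toy probabilities are invented for this note; real pipelines typically combine several augmentations and more carefully chosen models.

```python
import numpy as np


def random_hflip(x, rng):
    """Flip a (channels, height, width) image left-right with probability 0.5."""
    return x[:, :, ::-1] if rng.random() < 0.5 else x


def ensemble_predict(prob_list):
    """Average class probabilities from several models, then pick the best class."""
    return np.mean(prob_list, axis=0).argmax(axis=1)


rng = np.random.default_rng(0)
image = np.arange(3 * 2 * 2, dtype=np.float32).reshape(3, 2, 2)
augmented = random_hflip(image, rng)

# Toy probabilities from two hypothetical models for two examples.
probs_a = np.array([[0.6, 0.4], [0.3, 0.7]])
probs_b = np.array([[0.2, 0.8], [0.4, 0.6]])
labels = ensemble_predict([probs_a, probs_b])
print(augmented.shape, labels)
```

In the first example the two models disagree, and the averaged probabilities side with the more confident one.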