# Pill 17 WIKI Side Quest: Advanced architectures in neural networks

#### by Toni Miranda

## Combining CNN and RNN

CNN and RNN have been described by some of the classmates in this subquest. I would like to add to what has been said by briefly exploring when it is a good idea to combine these two types of deep-learning models. This is based on a book chapter: "Deep Learning with Python" by François Chollet, 2017 (Chapter 6, "Deep learning for text and sequences").

According to Chollet (2017), two fundamental deep-learning algorithms for sequence processing are RNNs and 1D CNNs, the one-dimensional version of the 2D CNNs already explained by classmates and typically used for computer vision. Applications of these algorithms include:
- Document classification and timeseries classification, such as identifying the topic of an article or the author of a book.
- Timeseries comparisons, such as estimating how closely related two documents are.
- Sequence-to-sequence learning, such as decoding an English sentence into French.
- Sentiment analysis, such as classifying the sentiment of tweets or movie reviews as positive or negative (remember the Kaggle competition in this ML course?).
- Timeseries forecasting, such as predicting the future weather at a certain location, given recent weather data.

Note on 1D CNN for sequence data:

The convolution layers introduced previously were 2D convolutions, extracting 2D patches from image tensors and applying an identical transformation to every patch. In the same way, 1D convolutions can be used, extracting local 1D patches (subsequences) from sequences as shown in the figure below (*Fig 1*).

![title](./How_1D_Conv_works.png)  
*Fig 1.* How 1D convolution (1D CNN) works: each output timestep is obtained from a temporal patch in the input sequence.
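To make the patch extraction concrete, here is a minimal NumPy sketch of a 1D convolution. It is only an illustration, not the Keras implementation: the function name `conv1d`, the shapes, and the random values are assumptions chosen for the example.

```python
import numpy as np

def conv1d(sequence, kernel, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries).

    sequence: array of shape (timesteps, in_features)
    kernel:   array of shape (window, in_features, out_features)
    returns:  array of shape (timesteps - window + 1, out_features)
    """
    window = kernel.shape[0]
    out_steps = sequence.shape[0] - window + 1
    output = np.zeros((out_steps, kernel.shape[2]))
    for t in range(out_steps):
        patch = sequence[t:t + window]  # temporal patch: (window, in_features)
        # The same weights are applied to every patch.
        output[t] = np.tensordot(patch, kernel, axes=([0, 1], [0, 1])) + bias
    return output

rng = np.random.default_rng(0)
sequence = rng.normal(size=(10, 3))   # 10 timesteps, 3 input features
kernel = rng.normal(size=(5, 3, 4))   # window of 5, 4 output features
features = conv1d(sequence, kernel)
print(features.shape)  # (6, 4)
```

Each of the 6 output timesteps depends only on the 5 input timesteps of its patch, which is the situation drawn in the figure.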

### Combining CNNs and RNNs

#### Sequence processing with 1-Dimension CNN (1D CNN)

Because 1D CNNs process input patches independently, they are not sensitive to the order of the timesteps beyond a local scale (the size of the convolution windows), unlike RNNs. To recognize longer-term patterns, it is possible to stack many convolution and pooling layers, but according to Chollet that is still a fairly weak way to induce order sensitivity: the CNN looks for patterns anywhere in the input timeseries and has no knowledge of the temporal position of a pattern it sees.
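The following small sketch illustrates this position insensitivity under simplified assumptions: a single hand-picked filter (`kernel`), a ReLU, and global max pooling. The names and values are hypothetical and only serve the illustration.

```python
import numpy as np

def conv_global_max(sequence, kernel):
    """Slide `kernel` over a 1D sequence, apply ReLU, keep only the largest response."""
    window = len(kernel)
    responses = [max(0.0, float(np.dot(sequence[t:t + window], kernel)))
                 for t in range(len(sequence) - window + 1)]
    return max(responses)

kernel = np.array([1.0, -1.0, 1.0])   # hypothetical pattern detector
pattern = np.array([2.0, 0.0, 2.0])

early = np.zeros(20)
early[2:5] = pattern                  # pattern near the start
late = np.zeros(20)
late[15:18] = pattern                 # same pattern near the end

print(conv_global_max(early, kernel), conv_global_max(late, kernel))  # 4.0 4.0
```

Both sequences produce the same feature value, so a layer placed after the pooling cannot tell whether the pattern occurred early or late.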

One strategy to combine the speed and lightness of CNNs with the order-sensitivity of RNNs is to use a 1D convnet as a preprocessing step before an RNN. See figure below (*Fig 2*).

![title](./Combining_1D_CNN_and_RNN.png)  
*Fig 2.* Combining a 1D CNN and an RNN for processing long sequences.

#### Combining CNNs and RNNs to process long sequences

This is especially beneficial when dealing with sequences that are so long they can’t realistically be processed with RNNs, such as sequences with thousands of steps. The CNN will turn the long input sequence into much shorter (downsampled) sequences of higher-level features. This sequence of extracted features then becomes the input to the RNN part of the network.

Key takeaways on sequence processing with CNNs and on combining RNNs and CNNs:
- In the same way that 2D CNNs perform well for processing visual patterns in 2D space, 1D CNNs perform well for processing temporal patterns. They offer a faster alternative to RNNs on some problems, in particular natural-language processing tasks.
- Typically, 1D CNNs are structured much like their 2D equivalents from the world of computer vision.
- Because RNNs are extremely expensive for processing very long sequences, but 1D CNNs are cheap, it can be a good idea to use a 1D CNN as a preprocessing step before an RNN, shortening the sequence and extracting useful representations for the RNN to process.
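The last takeaway can be sketched in plain NumPy. This is a toy illustration with random, untrained weights, not the model from the book: the helper names (`conv1d_relu`, `max_pool1d`, `simple_rnn`) and all sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernel):
    """'Valid' 1D convolution + ReLU. x: (steps, in_f), kernel: (window, in_f, out_f)."""
    window = kernel.shape[0]
    steps = x.shape[0] - window + 1
    out = np.stack([np.tensordot(x[t:t + window], kernel, axes=([0, 1], [0, 1]))
                    for t in range(steps)])
    return np.maximum(out, 0.0)

def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis (drops the remainder)."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, x.shape[1]).max(axis=1)

def simple_rnn(x, w_in, w_rec):
    """Plain tanh RNN that returns its final hidden state."""
    state = np.zeros(w_rec.shape[0])
    for x_t in x:  # timesteps are consumed in order, so order matters here
        state = np.tanh(x_t @ w_in + state @ w_rec)
    return state

long_sequence = rng.normal(size=(3000, 14))   # thousands of timesteps
kernel = rng.normal(size=(5, 14, 32)) * 0.1

# CNN part: extract local features and shorten the sequence.
features = max_pool1d(conv1d_relu(long_sequence, kernel), 3)

# RNN part: process the much shorter feature sequence in order.
w_in = rng.normal(size=(32, 16)) * 0.1
w_rec = rng.normal(size=(16, 16)) * 0.1
final_state = simple_rnn(features, w_in, w_rec)

print(long_sequence.shape, "->", features.shape, "->", final_state.shape)
# (3000, 14) -> (998, 32) -> (16,)
```

The RNN loop runs 998 times instead of 3000. In Keras the same structure would be built from `Conv1D` and `MaxPooling1D` layers followed by a recurrent layer such as `GRU` or `LSTM`.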

#### Example of implementation

*Note:* It requires Keras and TensorFlow to be installed. https://keras.io/#installation

The data set can be downloaded from here: https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip

```python
# Inspecting the data of the Jena weather dataset
import os

import numpy as np

data_dir = './'
fname = os.path.join(data_dir, 'jena_climate_2009_2016.csv')

with open(fname) as f:
    data = f.read()

lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]
print(header)
print(len(lines))

# Parsing data: drop the "Date Time" column and convert the rest to floats
float_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    values = [float(x) for x in line.split(',')[1:]]
    float_data[i, :] = values

# Normalizing data with statistics from the first 200,000 timesteps (the training split)
mean = float_data[:200000].mean(axis=0)
float_data -= mean
std = float_data[:200000].std(axis=0)
float_data /= std
```

['"Date Time"', '"p (mbar)"', '"T (degC)"', '"Tpot (K)"', '"Tdew (degC)"', '"rh (%)"', '"VPmax (mbar)"', '"VPact (mbar)"', '"VPdef (mbar)"', '"sh (g/kg)"', '"H2OC (mmol/mol)"', '"rho (g/m**3)"', '"wv (m/s)"', '"max. wv (m/s)"', '"wd (deg)"']
420551


```python
# Generator yielding timeseries samples and their targets
def generator(data, lookback, delay, min_index, max_index,
              shuffle=False, batch_size=128, step=6):
    """Yield (samples, targets) batches forever.

    lookback: how many timesteps back the input data goes
    delay:    how many timesteps in the future the target is
    step:     sampling period, in timesteps, of the input data
    """
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while True:
        if shuffle:
            rows = np.random.randint(
                min_index + lookback, max_index, size=batch_size)
        else:
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
        samples = np.zeros((len(rows),
                            lookback // step,
                            data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(row - lookback, row, step)
            samples[j] = data[indices]
            # Column 1 of float_data is the temperature, "T (degC)"
            targets[j] = data[row + delay][1]
        yield samples, targets
```

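```python
# Preparing the training, validation, and test generators
lookback = 1440   # observations go back 10 days (one record every 10 minutes)
step = 6          # sample one data point per hour
delay = 144       # the target is 24 hours in the future
batch_size = 128

train_gen = generator(float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=200000,
                      shuffle=True,
                      step=step,
                      batch_size=batch_size)
val_gen = generator(float_data,
                    lookback=lookback,
                    delay=delay,
                    min_index=200001,
                    max_index=300000,
                    step=step,
                    batch_size=batch_size)
test_gen = generator(float_data,
                     lookback=lookback,
                     delay=delay,
                     min_index=300001,
                     max_index=None,
                     step=step,
                     batch_size=batch_size)

# Number of batches to draw from each generator to see the whole split once.
# The generators yield batches, so the sample count is divided by batch_size.
val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (len(float_data) - 300001 - lookback) // batch_size
```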



```python
# Training and evaluating a simple 1D CNN on the Jena data
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.Conv1D(32, 5, activation='relu',
                        input_shape=(None, float_data.shape[-1])))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')

# Note: recent Keras versions deprecate fit_generator; model.fit accepts generators there.
history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)
```

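As a quick check of how much the convolution and pooling layers of the model above shorten the sequence, the arithmetic can be written out in plain Python. This sketch assumes the Keras defaults for these layers ('valid' padding and stride 1 for `Conv1D`, pooling stride equal to the pool size for `MaxPooling1D`); the helper names are made up for the illustration.

```python
def conv_output_length(length, window):
    """Timesteps left after a 'valid' convolution with stride 1."""
    return length - window + 1

def pool_output_length(length, pool_size):
    """Timesteps left after non-overlapping max pooling."""
    return length // pool_size

lookback, step = 1440, 6
# Layer order of the model above, up to the global max pooling layer.
layers_spec = [("conv", 5), ("pool", 3), ("conv", 5), ("pool", 3), ("conv", 5)]

length = lookback // step   # timesteps per sample produced by the generator
trace = [length]
for kind, size in layers_spec:
    if kind == "conv":
        length = conv_output_length(length, size)
    else:
        length = pool_output_length(length, size)
    trace.append(length)

print(trace)  # [240, 236, 78, 74, 24, 20]
```

The 240 input timesteps shrink to 20 before `GlobalMaxPooling1D` collapses the time axis entirely, which is why this model is cheap but keeps little information about temporal order.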

## Combining CNN and RNN through Mixed CNN and RNN

#### Giuseppe Onesto

Another interesting and increasingly used approach is to apply the idea of recurrence, typical of RNNs, to improve the architecture and performance of standard CNNs.
In the literature, these approaches are known as RCNNs [1] (Recurrent CNN or Recursive CNN), and their main idea is: *"A prominent difference is that CNN is typically a feed-forward architecture while in the visual system recurrent connections are abundant. Inspired by this fact, we propose a recurrent CNN (RCNN) for object recognition by incorporating recurrent connections into each convolutional layer."* [2]

The key modules of this RCNN are the recurrent convolutional layers (RCL), which introduce recurrent connections into a convolutional layer. With these connections the network can evolve over time even though the input is static, and each unit is influenced by its neighboring units. This property integrates the context information of an image, which is important for object detection and which a purely feed-forward CNN captures less well. The following image shows the basic architecture of an RCNN:
![alt text](RCNN architecture.png)
<p style="text-align: center;"><b>Figure 1:</b> Basic architecture of an RCNN</p>
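The recurrence inside an RCL can be sketched as follows. This is a simplified illustration loosely following the description in [2], not the authors' implementation: it works on a 1D signal instead of an image, omits the normalization used in the paper, and all names (`conv_same`, `recurrent_conv_layer`), shapes, and the number of iterations are assumptions.

```python
import numpy as np

def conv_same(x, kernel):
    """1D convolution with zero padding, so the output keeps the input length.

    x: (length, channels_in), kernel: (window, channels_in, channels_out), odd window.
    """
    window = kernel.shape[0]
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(padded[i:i + window], kernel, axes=([0, 1], [0, 1]))
                     for i in range(x.shape[0])])

def recurrent_conv_layer(u, w_forward, w_recurrent, bias, iterations=3):
    """Unfold a recurrent convolutional layer over a static input `u`."""
    feed_forward = conv_same(u, w_forward) + bias   # identical at every iteration
    state = np.maximum(feed_forward, 0.0)           # t = 0: feed-forward input only
    for _ in range(iterations):
        # Each unit also receives the previous state of its neighbors.
        state = np.maximum(feed_forward + conv_same(state, w_recurrent), 0.0)
    return state

rng = np.random.default_rng(0)
u = rng.normal(size=(12, 3))                    # static input: 12 positions, 3 channels
w_forward = rng.normal(size=(3, 3, 8)) * 0.3
w_recurrent = rng.normal(size=(3, 8, 8)) * 0.3
bias = np.zeros(8)

state = recurrent_conv_layer(u, w_forward, w_recurrent, bias)
print(state.shape)  # (12, 8)
```

Although `u` never changes, every iteration lets each unit see a wider neighborhood through the recurrent kernel, which is how context information gets integrated.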

The figure shows that the inference network consists of a set of recurrent layers. This set can be large: even every layer of the CNN can be recurrent, as in the object recognition RCNN of Liang and Hu [2].

#### RCNN For Object Recognition

As said, a huge advantage of an RCNN is that it can handle the context information of an object (e.g. in an image) by evolving the network units while the input remains static.
This explains why RCNNs are well suited to problems such as image recognition. At this point, it is interesting to look in more depth at the architecture of the RCNN used in [2] for object recognition. The network is trained with the backpropagation through time algorithm for recurrent networks [3], and its architecture is shown in the image below.
![alt text](CNNandRNNmixture.JPG)
<p style="text-align: center;"><b>Figure 2:</b> Architecture of CNN and RNN Mixture for RCNN</p>

In this example, the network starts with a standard convolutional layer (without recurrent connections, to save computation), followed by a max pooling layer. On top of that come two RCLs, one max pooling layer, and then two more RCLs. Finally, there is a global max pooling layer and a softmax layer. The global max pooling layer outputs the maximum over every feature map, yielding a feature vector that represents the image.
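The last two layers can be sketched in NumPy as follows. The feature maps and classifier weights are random stand-ins, and the sizes (8x8 maps, 96 feature maps, 10 classes) are assumptions picked only for the illustration.

```python
import numpy as np

def global_max_pool(feature_maps):
    """feature_maps: (height, width, channels) -> one maximum per feature map."""
    return feature_maps.max(axis=(0, 1))

def softmax(logits):
    """Numerically stable softmax over a 1D vector of class scores."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(8, 8, 96))   # stand-in for the output of the last RCL
weights = rng.normal(size=(96, 10)) * 0.1    # hypothetical classifier weights

feature_vector = global_max_pool(feature_maps)     # shape (96,)
probabilities = softmax(feature_vector @ weights)  # shape (10,)
print(feature_vector.shape, probabilities.shape)
```

Whatever the spatial size of the feature maps, the feature vector has one entry per feature map, so the softmax layer always receives a fixed-length input.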

##### The results

In their paper, the authors compared their RCNN with state-of-the-art models. RCNN-K denotes a network with K feature maps in layers one to five (e.g. RCNN-96 has 96 feature maps). Table 1 compares some results on the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). This dataset consists of 60000 color images of 32x32 pixels in ten classes. For the RCNN, 50000 images were used for training and 10000 images for testing. The last 10000 images of the training set were used for validation. The RCNN achieves very good results compared to many other networks, and it does so with a very low number of parameters.

![alt text](table comparison.png)
<p style="text-align: center;"><b>Table 1:</b> Comparison of RCNN with respect to state-of-the-art models on the CIFAR dataset</p>




#### Deeply-Recursive Convolutional Network for Image Super-Resolution

The so-called DRCN [4] uses the same idea as [2], but pushes the number of recursions much deeper. It is interesting because, as the authors explain, increasing the recursion depth avoids the overfitting that would emerge from adding more convolutional layers to the network. Moreover, the model is simpler to store and load, because it has fewer layers.

##### The model
*It consists of three sub-networks:  embedding, inference and reconstruction networks.
The embedding net takes the input image (grayscale or RGB) and represents it as a set of feature maps. 
Inference net is the main component that solves the task of super-resolution. Analyzing a large image region is done by a single recursive layer. Each recursion applies the same convolution followed by a rectified linear unit.
While feature maps from the final application of the recursive layer represent the high-resolution image, transforming them (multi-channel) back into the original image space (1 or 3-channel) is necessary. This is done by the reconstruction net.* [4]
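The weight sharing in the inference net can be sketched as follows. This is a simplified 1D illustration of the idea quoted above, not the DRCN implementation: the helper names, the sizes, and the recursion depth are hypothetical.

```python
import numpy as np

def conv_same_relu(x, kernel):
    """1D 'same' convolution followed by ReLU.

    x: (length, channels), kernel: (window, channels, channels), odd window.
    """
    window = kernel.shape[0]
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.tensordot(padded[i:i + window], kernel, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0])])
    return np.maximum(out, 0.0)

def recursive_inference(features, shared_kernel, recursions):
    """Apply the same convolution + ReLU `recursions` times."""
    for _ in range(recursions):
        features = conv_same_relu(features, shared_kernel)
    return features

rng = np.random.default_rng(0)
recursions = 8                                   # hypothetical depth
features = rng.normal(size=(32, 4))              # stand-in for the embedding net output
shared_kernel = rng.normal(size=(3, 4, 4)) * 0.3

refined = recursive_inference(features, shared_kernel, recursions)

shared_parameters = shared_kernel.size                 # 48, whatever the depth
stacked_parameters = recursions * shared_kernel.size   # 384 if every layer had its own weights
receptive_field = 1 + recursions * (shared_kernel.shape[0] - 1)   # 17 positions
print(refined.shape, shared_parameters, stacked_parameters, receptive_field)
# (32, 4) 48 384 17
```

Deeper recursion widens the region each output position can see without adding parameters, whereas stacking separate layers would multiply the parameter count by the depth.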

##### The results

The figures below show some results on extreme image zooming, compared with other state-of-the-art methods in image super-resolution. As you can see, DRCN produces results that look very impressive even to the human eye.

![alt text](drcn perf1.png)
<p style="text-align: center;"><b>Figure 3:</b> Comparison of DRCN with respect to state-of-the-art models on the super-resolution problem</p>

![alt text](drcn perf2.png)
<p style="text-align: center;"><b>Figure 4:</b> Comparison of DRCN with respect to state-of-the-art models on the super-resolution problem [4]</p>

**References**

[1] Wiest, L. (2017, February). Neural Networks - Combination of RNN and CNN. Retrieved from https://wiki.tum.de/display/lfdv/Recurrent Neural Networks - Combination of RNN and CNN <br>
[2] Liang, Ming, and Xiaolin Hu. "Recurrent convolutional neural network for object recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. <br>
[3] Werbos, Paul J. "Backpropagation through time: what it does and how to do it." Proceedings of the IEEE 78.10 (1990): 1550-1560. <br>
[4] Kim, Jiwon, Jung Kwon Lee, and Kyoung Mu Lee. "Deeply-recursive convolutional network for image super-resolution." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.