<a href="https://colab.research.google.com/github/kocurvik/edu/blob/master/PNNPPV/notebooky/cv09-en.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 9th lab - RNNs

Today's lab deals with recurrent neural networks. We will create an artificial toy problem in which we construct a large integer from a sequence of MNIST images, and our goal will be to determine its remainder modulo some number (ideally a prime number above 10).
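
To make the target concrete, here is a small sketch of the arithmetic. The helper name `digits_to_mod` is ours and is not part of the lab. It computes the remainder both by building the whole integer and by updating a running remainder digit by digit. The second form only needs the previous remainder and the current digit, which is the kind of state a recurrent network can carry.

```python
def digits_to_mod(digits, modulo=13):
    """Return (full integer, list of running remainders) for a digit sequence."""
    number = 0
    remainder = 0
    running = []
    for d in digits:
        number = number * 10 + d                    # append the digit on the right
        remainder = (remainder * 10 + d) % modulo   # same update, but kept small
        running.append(remainder)
    return number, running

number, running = digits_to_mod([4, 0, 7], modulo=13)
print(number, running)  # 407 [4, 1, 4]
```

The last running remainder always equals `number % modulo`, so the full integer never has to be stored.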

In [None]:
import keras
from keras.datasets import mnist
from keras.layers import TimeDistributed, Conv2D, Dense, Flatten, MaxPool2D, ConvLSTM2D, GlobalAveragePooling2D, LSTM, GRU
from keras.models import Sequential
from keras.optimizers import Adam
import numpy as np
import matplotlib.pyplot as plt

In [None]:
(x, y), (x_test, y_test) = mnist.load_data()
x = x / 255
x_test = x_test / 255

## Fixed sequence - continual loss

Let us design a model with a few convolutional layers followed by global pooling and a GRU (gated recurrent unit) layer. At each step the GRU layer receives its own state from the previous element of the sequence. After the GRU layer we apply a Dense layer with softmax. We use a GRU instead of the better-known LSTM because it is easier to train. Theoretically an LSTM should have better long-term memory, but this is not as important for our toy problem.

![GRU](https://raw.githubusercontent.com/kocurvik/edu/master/PNNPPV/supplementary/ntb_images/GRU_model.png)

We can choose one of multiple ways in which the network can be trained. As input we will use a sequence of images of shape $batch \times seq \times 28 \times 28 \times 1$, where $seq$ is the length of the sequence. On the output we have two options: we either use just the output from the last element of the sequence for the loss, or we use all of the intermediate outputs for the loss. We will first try the second approach, in which every intermediate output contributes to the loss.
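
The two target layouts can be sketched with empty NumPy arrays. The sizes below (`batch = 32`, `seq = 10`, `modulo = 13`) are example values chosen for illustration only.

```python
import numpy as np

batch, seq, modulo = 32, 10, 13

# Input: one grayscale 28x28 image per sequence element, with a channel axis.
X = np.zeros((batch, seq, 28, 28, 1))

# Loss only on the last element: one one-hot target per sequence.
y_last = np.zeros((batch, modulo))

# Loss on every element: one one-hot target per time step.
y_all = np.zeros((batch, seq, modulo))

print(X.shape, y_last.shape, y_all.shape)
```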

![GRU Multiloss](https://raw.githubusercontent.com/kocurvik/edu/master/PNNPPV/supplementary/ntb_images/GRU_multi_loss.png)

The model is defined below. The `TimeDistributed` wrapper allows us to apply a layer to each element of a sequence. Recurrent layers process sequences natively, so they do not need the wrapper.
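
Conceptually, applying a layer to every element of a sequence amounts to merging the batch and time axes, running the layer once, and splitting the axes again. The following NumPy sketch illustrates this idea only; it is not how Keras implements the wrapper, and the names `time_distributed` and `global_avg_pool` are ours.

```python
import numpy as np

def time_distributed(fn, x):
    """Apply fn, which expects (N, ...) input, to every step of (batch, seq, ...)."""
    batch, seq = x.shape[:2]
    flat = x.reshape((batch * seq,) + x.shape[2:])    # merge batch and time axes
    out = fn(flat)                                    # process all steps at once
    return out.reshape((batch, seq) + out.shape[1:])  # split the axes again

# Stand-in for a per-image layer: global average pooling over height and width.
def global_avg_pool(images):
    return images.mean(axis=(1, 2))

x_seq = np.random.rand(4, 10, 28, 28, 1)
pooled = time_distributed(global_avg_pool, x_seq)
print(pooled.shape)  # (4, 10, 1)
```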

In [None]:
def build_model(modulo, seq_size):
  model = Sequential()
  model.add(TimeDistributed(Conv2D(16, (7,7), activation='relu'), input_shape=(seq_size, 28, 28, 1)))
  model.add(TimeDistributed(Conv2D(32, (7,7), activation='relu')))
  model.add(TimeDistributed(Conv2D(64, (7,7), activation='relu')))
  model.add(TimeDistributed(Conv2D(64, (7,7), activation='relu')))
  model.add(TimeDistributed(GlobalAveragePooling2D()))
  model.add(GRU(64, activation='relu', recurrent_activation='hard_sigmoid', return_sequences=True))
  model.add(TimeDistributed(Dense(modulo, activation='softmax')))

  return model

We will need to program a generator which gives us the input data as well as the target outputs used to calculate the loss. Implement the `__getitem__` method. For the outputs we want to use classification into `self.modulo` classes. The target at each step is determined by reading the digits seen so far as a decimal number, written one by one from the left, and taking that number modulo our selected number. Let us use 13 as the default value.
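
One possible way to build such a batch is sketched below as a standalone function, so that it can be tried without Keras. The function name `make_sequence_batch`, the `rng` parameter and the random stand-in data are our assumptions; treat it as a hint for the exercise rather than the reference solution.

```python
import numpy as np

def make_sequence_batch(x, y, modulo=13, batch_size=32, seq_size=20, rng=None):
    """Sample a batch of digit-image sequences with a one-hot target per step."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty([batch_size, seq_size, 28, 28, 1])
    targets = np.zeros([batch_size, seq_size, modulo])

    for b in range(batch_size):
        remainder = 0
        idxs = rng.integers(0, len(x), size=seq_size)  # random images, with repetition
        for s, idx in enumerate(idxs):
            X[b, s, :, :, 0] = x[idx]
            # Running remainder of the number read so far (leftmost digit first).
            remainder = (remainder * 10 + int(y[idx])) % modulo
            targets[b, s, remainder] = 1.0
    return X, targets

# Tiny stand-in data so the sketch runs without downloading MNIST.
fake_x = np.random.rand(50, 28, 28)
fake_y = np.random.randint(0, 10, size=50)
X, t = make_sequence_batch(fake_x, fake_y, modulo=13, batch_size=4, seq_size=6)
print(X.shape, t.shape)
```

Keeping only the running remainder avoids building very large integers for long sequences.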

In [None]:
class SeqDataGenerator(keras.utils.Sequence):
  def __init__(self, x, y, modulo=13, batch_size=32, seq_size=20, steps=1000):
    self.x = x
    self.y = y
    self.modulo = modulo
    self.batch_size = batch_size
    self.seq_size = seq_size
    self.steps = steps

  def __len__(self):
    return self.steps

  def __getitem__(self, index):
    X = np.empty([self.batch_size, self.seq_size, 28, 28, 1])
    y = np.zeros([self.batch_size, self.seq_size, self.modulo])

    # TODO: implement

    return X, y

This code should then train our network on the data. You can try different sequence lengths.

In [None]:
seq_size = 10
modulo = 13

train_gen = SeqDataGenerator(x[10000:], y[10000:], modulo=modulo, batch_size=32, seq_size=seq_size)
val_gen = SeqDataGenerator(x[:10000], y[:10000], modulo=modulo, batch_size=32, seq_size=seq_size, steps=100)
model = build_model(modulo, seq_size)

opt = Adam(learning_rate=0.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

# `fit` accepts `keras.utils.Sequence` objects directly; `fit_generator` is deprecated.
model.fit(train_gen, validation_data=val_gen, epochs=5)

## Fixed sequence - Loss only at the end

We will now try to train the network with the loss being calculated only at the end of the sequence.

![GRU single loss](https://raw.githubusercontent.com/kocurvik/edu/master/PNNPPV/supplementary/ntb_images/GRU_single_loss.png)

Implement the relevant methods. Figure out what the `return_sequences` argument of the GRU layer does and modify the network so that there is only one vector on the output. Verify whether the training works and test it out for various sequence lengths.
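
Keras recurrent layers take a `return_sequences` argument. Its effect on the output shape can be illustrated with a toy recurrence written in NumPy. This is a simplified tanh recurrence, not a GRU, and the function `simple_rnn` and its weight names are ours.

```python
import numpy as np

def simple_rnn(x, W_in, W_rec, return_sequences=False):
    """Toy tanh recurrence over x of shape (batch, seq, features)."""
    batch, seq, _ = x.shape
    h = np.zeros((batch, W_rec.shape[0]))
    outputs = []
    for t in range(seq):
        h = np.tanh(x[:, t] @ W_in + h @ W_rec)  # update the hidden state
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)         # (batch, seq, units)
    return h                                     # (batch, units): last step only

rng = np.random.default_rng(0)
x_feat = rng.normal(size=(2, 5, 3))
W_in, W_rec = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
all_steps = simple_rnn(x_feat, W_in, W_rec, return_sequences=True)
last_step = simple_rnn(x_feat, W_in, W_rec, return_sequences=False)
print(all_steps.shape, last_step.shape)  # (2, 5, 4) (2, 4)
```

When only the last step is returned, the output has no time axis, so a following `Dense` layer no longer needs the `TimeDistributed` wrapper.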

In [None]:
def build_single_model(modulo, seq_size):
  # TODO: implement
  ...

In [None]:
class SingleSeqDataGenerator(keras.utils.Sequence):
  def __init__(self, x, y, modulo=11, batch_size=32, seq_size=20, steps=1000):
    self.x = x
    self.y = y
    self.modulo = modulo
    self.batch_size = batch_size
    self.seq_size = seq_size
    self.steps = steps

  def __len__(self):
    return self.steps

  def __getitem__(self, index):
    X = np.empty([self.batch_size, self.seq_size, 28, 28, 1])
    y = np.zeros([self.batch_size, self.modulo])

    for b in range(self.batch_size):
      # TODO: implement
      pass

    return X, y

In [None]:
seq_size = 6
modulo = 13

train_gen = SingleSeqDataGenerator(x[10000:], y[10000:], modulo=modulo, batch_size=32, seq_size=seq_size)
val_gen = SingleSeqDataGenerator(x[:10000], y[:10000], modulo=modulo, batch_size=32, seq_size=seq_size, steps=100)
single_model = build_single_model(modulo, seq_size)

opt = Adam(learning_rate=0.001)
single_model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

single_model.summary()

# `fit` accepts `keras.utils.Sequence` objects directly; `fit_generator` is deprecated.
single_model.fit(train_gen, validation_data=val_gen, epochs=5)

## Variable sequence lengths

Now we will try to train a model with a different sequence length for each batch. The batch size will remain the same, but if we ran into memory issues we could reduce it. The only difference in the model definition is the `input_shape` argument, whose first dimension (the sequence length) is now `None`.

In [None]:
def build_var_model(modulo):
  model = Sequential()
  model.add(TimeDistributed(Conv2D(16, (7,7), activation='relu'), input_shape=(None, 28, 28, 1)))
  model.add(TimeDistributed(Conv2D(32, (7,7), activation='relu')))
  model.add(TimeDistributed(Conv2D(64, (7,7), activation='relu')))
  model.add(TimeDistributed(Conv2D(64, (7,7), activation='relu')))
  model.add(TimeDistributed(GlobalAveragePooling2D()))
  model.add(GRU(64, activation='relu', recurrent_activation='hard_sigmoid', return_sequences=True))
  model.add(TimeDistributed(Dense(modulo, activation='softmax')))

  return model

We have to modify the generator so that it chooses a random sequence length for each batch.

*Note:* The sequence has to have a length of at least 2 for TensorFlow to be able to manage it.
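
The length is drawn once per batch because all sequences inside one array must share it. A minimal sketch of the shape handling follows, assuming that `max_seq_size` is meant to be an inclusive upper limit; the values used are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
max_seq_size, batch_size, modulo = 20, 32, 13

for step in range(3):
    # One length per batch: a NumPy array cannot hold sequences of mixed length.
    seq_size = int(rng.integers(2, max_seq_size + 1))  # upper bound is exclusive
    X = np.empty([batch_size, seq_size, 28, 28, 1])
    y = np.zeros([batch_size, seq_size, modulo])
    print(step, X.shape, y.shape)
```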

In [None]:
class VarSeqDataGenerator(keras.utils.Sequence):
  def __init__(self, x, y, modulo=11, batch_size=32, max_seq_size=20, steps=1000):
    self.x = x
    self.y = y
    self.modulo = modulo
    self.batch_size = batch_size
    self.max_seq_size = max_seq_size
    self.steps = steps

  def __len__(self):
    return self.steps

  def __getitem__(self, index):
    # One random length per batch; randint's upper bound is exclusive, hence the + 1.
    seq_size = np.random.randint(2, self.max_seq_size + 1)
    X = np.empty([self.batch_size, seq_size, 28, 28, 1])
    y = np.zeros([self.batch_size, seq_size, self.modulo])

    # TODO: implement

    return X, y

With this approach we can validate on sequences of a fixed length (for example, length 10).

In [None]:
max_seq_size = 20
modulo = 13

train_gen = VarSeqDataGenerator(x[10000:], y[10000:], modulo=modulo, batch_size=32, max_seq_size=max_seq_size)
val_gen = SeqDataGenerator(x[:10000], y[:10000], modulo=modulo, batch_size=32, seq_size=10, steps=100)
var_model = build_var_model(modulo)

opt = Adam(learning_rate=0.001)
var_model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

var_model.summary()

# `fit` accepts `keras.utils.Sequence` objects directly; `fit_generator` is deprecated.
var_model.fit(train_gen, validation_data=val_gen, epochs=5)

## Inference with the whole sequence

The last model, `var_model`, accepts a sequence of any length as input. We can therefore verify it on a whole sequence at once.

In [None]:
seq_size = 10
X = np.empty([1, seq_size, 28, 28, 1])

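total = 0  # avoid shadowing the built-in `sum`

for s in range(seq_size):
  idx = np.random.randint(10000)
  X[0, s, :, :, 0] = x_test[idx]
  # Cast to a Python int so the running number cannot overflow a NumPy integer type.
  total = total*10 + int(y_test[idx])
  plt.imshow(x_test[idx], cmap='gray')
  plt.show()
  print("Current: {}, total: {}, mod: {}".format(y_test[idx], total, total % modulo))

pred = var_model.predict(X)
# Predicted class for every step of the sequence; the last entry covers the whole number.
pred_steps = np.argmax(pred[0], axis=-1)
print(pred_steps)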


## One-by-one inference

If we want to feed the network its input one element at a time, we have to convert the model into the so-called stateful mode. This means that the GRU layer keeps its internal state from the previous call instead of starting from scratch. If we have a stateful network, we have to reset the model every time we want to start a new sequence.
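
The idea of a state that persists between calls, and of resetting it, can be illustrated with a toy recurrence in NumPy. The class `StatefulToyRNN` is ours and is a simplified tanh cell, not the Keras GRU.

```python
import numpy as np

class StatefulToyRNN:
    """Toy tanh recurrence that keeps its hidden state between calls."""

    def __init__(self, W_in, W_rec):
        self.W_in, self.W_rec = W_in, W_rec
        self.reset_states()

    def reset_states(self):
        self.h = np.zeros(self.W_rec.shape[0])

    def step(self, x_t):
        # The new state depends on the stored state, not only on x_t.
        self.h = np.tanh(x_t @ self.W_in + self.h @ self.W_rec)
        return self.h

rng = np.random.default_rng(0)
cell = StatefulToyRNN(rng.normal(size=(3, 4)), rng.normal(size=(4, 4)))
seq = rng.normal(size=(5, 3))

first_pass = [cell.step(x_t) for x_t in seq]
cell.reset_states()  # start a new sequence from a clean state
second_pass = [cell.step(x_t) for x_t in seq]
print(np.allclose(first_pass, second_pass))  # True, thanks to the reset
```

Without the call to `reset_states`, the second pass would start from the final state of the first pass and produce different outputs.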

To convert the model we can use this (somewhat hacky) function:

In [None]:
import json
from keras.models import model_from_json

def convert_to_inference_model(original_model):
    original_model_json = original_model.to_json()
    inference_model_dict = json.loads(original_model_json)
    print(inference_model_dict)

    layers = inference_model_dict['config']['layers']
    for layer in layers:
        if 'stateful' in layer['config']:
            layer['config']['stateful'] = True

        if 'batch_input_shape' in layer['config']:
            layer['config']['batch_input_shape'][0] = 1
            layer['config']['batch_input_shape'][1] = 1

    inference_model = model_from_json(json.dumps(inference_model_dict))
    inference_model.set_weights(original_model.get_weights())

    return inference_model

In [None]:
inference_model = convert_to_inference_model(var_model)

total = 0  # avoid shadowing the built-in `sum`

for s in range(20):
  idx = np.random.randint(10000)
  img = x_test[idx]
  # TODO: implement

As mentioned above, if we want to start a new sequence we have to reset the network.

In [None]:
total = 0  # avoid shadowing the built-in `sum`

inference_model.reset_states()

for s in range(20):
  idx = np.random.randint(10000)
  img = x_test[idx]
  # TODO: implement