# Copying Task

Inspired on the task described in the following paper: [https://arxiv.org/pdf/1511.06464.pdf](https://arxiv.org/pdf/1511.06464.pdf)

## Introduction

The copying task is one of the simplest benchmark tasks for recurrent neural networks.
The general idea of the task is to reproduce a random sequence of symbols with length
`len_sequence` chosen from an alphabet of size `num_symbols` after a certain waiting
period `len_wait`.

Assuming the waiting symbol is `0`, the symbols chosen for the sequence are chosen from
the alphabet `{1,2,3}` and the stop waiting symbol is `4`; an example input and target for a
waiting time of 20 symbols and a sequence length of 5 can be given by:
```
    213310000000000000000000400000
    000000000000000000000000021331
```

As discussed in the [paper](https://arxiv.org/pdf/1511.06464.pdf), it is always useful
to compare the loss of a certain implementation to the baseline loss of guessing.
Assuming one uses the categorical cross-entropy loss, one can describe a baseline by
predicting the waiting symbol for the first `len_wait + len_sequence` timesteps, followed by a random sampling for the remaining `len_sequence` positions out of
the alphabet of symbols `{a1,...,an}` with `num_symbols` elements. This baseline cross entropy loss boils down to
```
    len_sequence*log(n_symbols)/(len_wait + 2*len_sequence)
```

## Imports

In [1]:
%matplotlib inline
import torch
import numpy as np
import matplotlib.pyplot as plt
import sys; sys.path.append('..')
from torch_eunn import EURNN

torch.manual_seed(24)
np.random.seed(42)

## Constants

In [2]:
# Training parameters
num_steps = 500
batch_size = 128
test_size = 100
valid_size = 100

# Data Parameters
len_wait = 100#0 # very slow if len_wait=1000
num_symbols = 8
len_sequence = 10

# RNN Parameters
capacity = 2
num_layers_rnn = 1
num_hidden_rnn = 128

# Cuda
cuda = True
device = torch.device('cuda' if cuda else 'cpu')

# Baseline Error
baseline = len_sequence*np.log(num_symbols)/(len_wait+2*len_sequence)
print(f'baseline = {baseline}')

baseline = 0.17328679513998632


## Data

In [3]:
def data(len_wait, n_data, len_sequence, num_symbols):
    seq = np.random.randint(1, high=(num_symbols+1), size=(n_data, len_sequence))
    zeros1 = np.zeros((n_data, len_wait-1))
    zeros2 = np.zeros((n_data, len_wait))
    marker = (num_symbols+1) * np.ones((n_data, 1))
    zeros3 = np.zeros((n_data, len_sequence))
    x = torch.tensor(np.concatenate((seq, zeros1, marker, zeros3), axis=1), dtype=torch.int64, device=device)
    y = torch.tensor(np.concatenate((zeros3, zeros2, seq), axis=1), dtype=torch.int64, device=device)
    return x, y

In [4]:
x,y = data(len_wait, 1, len_sequence, num_symbols)
print(x)
print(y)

tensor([[7, 4, 5, 7, 3, 8, 5, 5, 7, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
       device='cuda:0')
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 5, 7, 3, 8, 5, 5, 7, 2]],
       device='cuda:0')


## Model

In [5]:
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.embedding = torch.nn.Embedding(len_wait+2*len_sequence, num_symbols+2)
        self.rnn = EURNN(num_symbols+2, num_hidden_rnn, capacity, batch_first=True)
        self.fc = torch.nn.Linear(num_hidden_rnn, num_symbols+1)
        
        # optimizers and criterion
        self.lossfunc = torch.nn.CrossEntropyLoss()
        self.optimizer = torch.optim.Adam(self.parameters(), lr=0.03)
        
        # move to device
        self.to(device)
        
    def forward(self, data):
        data = self.embedding(data)
        rnn_out, _ = self.rnn(data)
        out = self.fc(rnn_out)
        return out
    
    def loss(self, data, labels):
        return self.lossfunc(self(data).view(-1, num_symbols+1), labels.view(-1))
    
    def accuracy(self, data, labels):
        return torch.mean((torch.argmax(self(data), -1).view(-1) == labels.view(-1)).float())
    
    def prediction(self, data):
        return torch.argmax(self(data), -1)

## Train

Create the model

In [6]:
model = Model()

Start Training

In [7]:
%%time

for step in range(num_steps):
    # reset gradients
    model.optimizer.zero_grad()
    
    # calculate validation accuracy and loss
    if step %100 == 0 or step == num_steps -1:
        valid_data, valid_labels = data(len_wait, valid_size, len_sequence, num_symbols)
        loss = model.loss(valid_data, valid_labels).item()
        print(f'Step {step:5.0f}\t Valid. Loss. = {loss:5.4f}')
        
    # train
    batch_data, batch_labels = data(len_wait, batch_size, len_sequence, num_symbols)
    loss = model.loss(batch_data, batch_labels)
    loss.backward()
    model.optimizer.step()

Step     0	 Valid. Loss. = 23.4495
Step   100	 Valid. Loss. = 0.0160
Step   200	 Valid. Loss. = 0.0032
Step   300	 Valid. Loss. = 0.0014
Step   400	 Valid. Loss. = 0.0007
Step   499	 Valid. Loss. = 0.0005
CPU times: user 4min 56s, sys: 155 ms, total: 4min 56s
Wall time: 4min 56s


## Test

In [8]:
test_data, test_labels = data(len_wait, test_size, len_sequence, num_symbols)
test_loss = model.loss(test_data, test_labels).item()
test_acc = model.accuracy(test_data, test_labels).item()
print("Test result: Loss= " + "{:.6f}".format(test_loss) + ", Accuracy= " + "{:.5f}".format(test_acc))
print('baseline = %f'%baseline)

Test result: Loss= 0.000749, Accuracy= 0.99983
baseline = 0.173287


The baseline is clearly broken. Note that this would not be the case if we switch out the EUNN for an LSTM.