## **Introduction** 
In this project I implement a novel recurrent network model called **Shuffling Recurrent Neural Network (SRNN)**.
In this model the hidden state is defined as:
$$
h_t = \sigma(W_p h_{t-1}+b(x_t))
$$
where $\sigma$ is the activation function and $W_p$ is a fixed permutation matrix, for example the cyclic shift:
$$
W_p = \begin{pmatrix}
0 & 1 & \dots & 0 & 0 \\
0 & \ddots & \ddots & \ddots & 0\\
\vdots & \ddots & \ddots & \ddots & \vdots\\
0 & \ddots & \ddots & \ddots & 1\\
1 & 0 & \dots & 0 & 0\\
\end{pmatrix}
$$
and $b(x_t)$ is a gated function of the input:
$$
b(x_t) = f_r(x_t)\odot \mathrm{sigmoid}(W_s x_t + b_s)
$$
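To make the role of $W_p$ concrete, the sketch below builds the matrix shown above for an arbitrary $d_h = 5$, using the same construction as the model code further down, and checks that multiplying by it only reorders the entries of the hidden state (a cyclic shift). Note that the model code stores $h$ as a row vector and computes `h @ W_p`, which shifts in the opposite direction from the column-vector form $W_p h$ of the equation; both are fixed permutations.

```python
import torch

d_h = 5  # illustrative hidden size
eye = torch.eye(d_h)
# move the first row of the identity to the bottom: W_p[i, i+1] = 1 and W_p[d_h-1, 0] = 1
W_p = torch.cat((eye[1:], eye[0].reshape(1, d_h)))

h = torch.arange(d_h, dtype=torch.float)  # h = [0, 1, 2, 3, 4]

col_form = W_p @ h  # column-vector convention of the equation: W_p h
row_form = h @ W_p  # row-vector convention used by the model code

print(col_form)  # tensor([1., 2., 3., 4., 0.])
print(row_form)  # tensor([4., 0., 1., 2., 3.])
```

Because $W_p$ is a permutation, it is orthogonal ($W_p W_p^\top = I$), so it neither amplifies nor shrinks the hidden state.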

## **Import Library**

In [52]:
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import numpy as np
import tqdm

## **Model**

 $$h_t = \sigma(W_p h_{t-1}+b(x_t))$$

where:

$h_{t-1} \in \mathbb{R}^{d_h}$

$x_t \in \mathbb{R}^{d_i}$

$d_h$ is the dimension of the hidden state

$d_i$ is the dimension of the input

$$b(x_t) = f_r(x_t)\odot \mathrm{sigmoid}(W_s x_t + b_s)$$

where:

$f_r$ is an MLP, $f_r:\mathbb{R}^{d_i} \rightarrow \mathbb{R}^{d_h}$

$W_s \in \mathbb{R}^{d_h \times d_i}$

$b_s \in \mathbb{R}^{d_h}$
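The gated term $b(x_t)$ can be sketched in isolation. The sizes ($d_i = 2$, $d_h = 4$) and the two-layer $f_r$ below are illustrative assumptions, not the values used later. Since the sigmoid lies in $(0, 1)$, each gate can only scale down the corresponding entry of $f_r(x_t)$.

```python
import torch
from torch import nn

torch.manual_seed(0)
d_i, d_h = 2, 4  # illustrative input / hidden sizes

# f_r: a small MLP R^{d_i} -> R^{d_h} (two layers chosen only for illustration)
f_r = nn.Sequential(nn.Linear(d_i, d_h), nn.ReLU(), nn.Linear(d_h, d_h), nn.ReLU())
# W_s x_t + b_s as a single linear layer
gate = nn.Linear(d_i, d_h)

x_t = torch.randn(3, d_i)      # a batch of 3 inputs
r = f_r(x_t)                   # f_r(x_t), shape (3, d_h)
s = torch.sigmoid(gate(x_t))   # sigmoid(W_s x_t + b_s), entries in (0, 1)
b = r * s                      # element-wise (Hadamard) product

print(b.shape)  # torch.Size([3, 4])
```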

In [53]:
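class SRNNHidden(nn.Module):
  def __init__(self, inputSize, hiddenSize, numLayer, outputSize):
    super().__init__()
    # gating branch: W_s x_t + b_s
    self.gatBranch = nn.Linear(inputSize,hiddenSize)
    # f_r: MLP with numLayer Linear+ReLU blocks, R^{d_i} -> R^{d_h}
    listaLayer = [nn.Linear(inputSize,hiddenSize),nn.ReLU()]
    for i in range(numLayer-1):
      listaLayer.append(nn.Linear(hiddenSize,hiddenSize))
      listaLayer.append(nn.ReLU())
    self.fr = nn.Sequential(*listaLayer)
    # fixed cyclic permutation matrix W_p; registered as a buffer (not trained)
    # so that .to(device) moves it together with the parameters
    wp = torch.cat((torch.eye(hiddenSize)[1:],torch.eye(hiddenSize)[0].reshape(1,hiddenSize)))
    self.register_buffer('wp', wp)
    # linear read-out from the hidden state to the output
    self.lastLayer = nn.Linear(hiddenSize,outputSize)
    self.inputSize = inputSize

  def forward(self, x, h = None):
    batchSize = x.shape[0]
    if self.inputSize == 1:
      x = x.reshape(batchSize, 1).float()
    # b(x_t) = f_r(x_t) * sigmoid(W_s x_t + b_s)
    b = self.fr(x)*torch.sigmoid(self.gatBranch(x))
    # no activation sigma is applied here, i.e. sigma is the identity;
    # h is a row vector, so h @ W_p permutes its entries cyclically
    if h is None:
      h = b  # first step: h_0 = 0, so h_1 = b(x_1)
    else:
      h = torch.matmul(h,self.wp)+b

    return self.lastLayer(h),h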


class SRNN(nn.Module):
  def __init__(self, inputSize, hiddenSize, numLayer, outputSize):
    super().__init__()
    self.srnnHidden = SRNNHidden(inputSize, hiddenSize, numLayer, outputSize)

  def forward(self, x):
    # x: (batch, seq_len, inputSize); unroll the recurrence over time
    lenRNN = x.shape[1]
    h = None
    for i in range(lenRNN):
      out, h = self.srnnHidden(x[:,i],h)
    # only the output of the last time step is returned
    return out, h

To initialize the network we need four parameters: the size of the input (inputSize), the size of the hidden state (hiddenSize), the number of layers of the MLP $f_r$ (numLayer) and the size of the output (outputSize).


In [54]:
net = SRNN(2,128,8,1)

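These are the same parameter values used in the paper. inputSize is 2 because each time step of the Adding Problem carries a value and a marker, and outputSize is 1 because the target is a single number.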
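For a sense of scale, the sketch below rebuilds the layers that `SRNNHidden` creates and counts their trainable parameters. The helper name `count_srnn_params` is hypothetical; the ReLU layers have no parameters and $W_p$ is fixed, so neither contributes.

```python
from torch import nn

def count_srnn_params(inputSize, hiddenSize, numLayer, outputSize):
    """Hypothetical helper: trainable parameters of the layers used by SRNNHidden."""
    layers = [nn.Linear(inputSize, hiddenSize)]                                 # first layer of f_r
    layers += [nn.Linear(hiddenSize, hiddenSize) for _ in range(numLayer - 1)]  # remaining layers of f_r
    layers.append(nn.Linear(inputSize, hiddenSize))                             # gating branch (W_s, b_s)
    layers.append(nn.Linear(hiddenSize, outputSize))                            # output layer
    return sum(p.numel() for layer in layers for p in layer.parameters())

print(count_srnn_params(2, 128, 8, 1))  # 116481
```

In closed form this is $2(d_i d_h + d_h) + (L-1)(d_h^2 + d_h) + d_h d_o + d_o$ with $L$ = numLayer and $d_o$ = outputSize; almost all parameters sit in the hidden layers of $f_r$.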

## **Dataset**

We test the network on the Adding Problem, one of the datasets used in the paper.
The dataset is built with the code from the paper.

For more information about the Adding Problem, see the file in the repository.
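One Adding Problem sample can be sketched with NumPy for a short sequence (length 10 here instead of 200 or 750). Each time step has two features, a value drawn uniformly from $[0, 1)$ and a marker; exactly one marker is set in each half of the sequence, and the target is the sum of the two marked values. The function name `adding_sample` and the fixed seed are illustrative.

```python
import numpy as np

def adding_sample(sample_len, rng):
    """Illustrative re-statement of one Adding Problem sample."""
    values = rng.uniform(0.0, 1.0, size=sample_len)
    mask = np.zeros(sample_len)
    half = sample_len // 2
    first_i = rng.integers(half)            # marked position in the first half
    second_i = rng.integers(half) + half    # marked position in the second half
    mask[[first_i, second_i]] = 1.0
    x = np.stack((values, mask), axis=1)    # shape (sample_len, 2): [value, marker]
    y = values[first_i] + values[second_i]  # target: sum of the two marked values
    return x, y

rng = np.random.default_rng(0)
x, y = adding_sample(10, rng)
print(x.shape)  # (10, 2)
```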

In [55]:
class AddingProblemDataset(Dataset):
    def __init__(self, ds_size=1000, sample_len=50):
        super().__init__()
        self.sample_len = sample_len
        self.ds_size = ds_size

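    def generate_sample(self, num_samples):
        # num_samples is unused: one sequence is generated per call
        # first feature: values drawn uniformly from [0, 1)
        X_value = np.random.uniform(low=0, high=1, size=(self.sample_len, 1))
        # second feature: marker, set to 1 at exactly two positions
        X_mask = np.zeros((self.sample_len, 1))
        half = int(self.sample_len / 2)
        first_i = np.random.randint(half)           # marked position in the first half
        second_i = np.random.randint(half) + half   # marked position in the second half
        X_mask[(first_i, second_i), 0] = 1
        # target: sum of the two marked values
        Y = np.sum(X_value[(first_i, second_i), 0])
        X = np.concatenate((X_value, X_mask), 1)    # shape (sample_len, 2)
        return X, Y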

    def __getitem__(self, item):
        return [torch.tensor(x, dtype=torch.float) for x in self.generate_sample(1)]

    def __len__(self):
        return self.ds_size

In [56]:
sampleLen = 200
batchSize = 50
dataset = DataLoader(AddingProblemDataset(ds_size=100*batchSize, sample_len=sampleLen),batch_size=batchSize)

We use the Mean Squared Error loss, the same loss function used in the paper.
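A useful reference point for the MSE values below: a model that ignores its input and always predicts 1, the mean target, has an expected MSE of $\mathrm{Var}(U_1 + U_2) = 1/12 + 1/12 = 1/6 \approx 0.167$, where $U_1, U_2$ are the two marked values, independent and uniform on $[0, 1)$. A quick Monte Carlo check with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Adding Problem targets: sum of two independent U[0, 1) values
y = rng.uniform(0.0, 1.0, size=n) + rng.uniform(0.0, 1.0, size=n)
constant_pred = 1.0  # always predict the mean target
baseline_mse = np.mean((y - constant_pred) ** 2)
print(baseline_mse)
```

A model that actually solves the task should reach an MSE well below this value.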

In [57]:
loss = nn.MSELoss()
opt = torch.optim.Adam(net.parameters())

In [58]:
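device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = net.to(device)  # the model must live on the same device as the data

In [59]:
xb, yb = next(iter(dataset))
xb, yb = xb.to(device), yb.to(device)
ypred, h = net(xb)
print(loss(ypred.squeeze(), yb))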

tensor(2.6808, grad_fn=<MseLossBackward0>)


In [60]:
for epoch in range(10):
  net.train()
  for xb, yb in tqdm.tqdm(dataset):
    xb, yb = xb.to(device), yb.to(device)
    opt.zero_grad()
    ypred, h = net(xb)
    l = loss(ypred.squeeze(), yb)
    l.backward()
    opt.step()

  net.eval()
  # despite the label, this prints the MSE of the last mini-batch, not an accuracy
  print(f'Accuracy at epoch {epoch+1}: {l}')

100%|██████████| 100/100 [00:50<00:00,  1.97it/s]


Accuracy at epoch 1: 0.17953723669052124


100%|██████████| 100/100 [00:45<00:00,  2.20it/s]


Accuracy at epoch 2: 0.14739765226840973


100%|██████████| 100/100 [00:44<00:00,  2.25it/s]


Accuracy at epoch 3: 0.1668688952922821


100%|██████████| 100/100 [00:46<00:00,  2.17it/s]


Accuracy at epoch 4: 0.12376651167869568


100%|██████████| 100/100 [00:44<00:00,  2.24it/s]


Accuracy at epoch 5: 0.022722119465470314


100%|██████████| 100/100 [00:45<00:00,  2.19it/s]


Accuracy at epoch 6: 0.019684581086039543


100%|██████████| 100/100 [00:44<00:00,  2.23it/s]


Accuracy at epoch 7: 0.011949127539992332


100%|██████████| 100/100 [00:46<00:00,  2.16it/s]


Accuracy at epoch 8: 0.006745799910277128


100%|██████████| 100/100 [00:46<00:00,  2.15it/s]


Accuracy at epoch 9: 0.014725782908499241


100%|██████████| 100/100 [00:44<00:00,  2.24it/s]

Accuracy at epoch 10: 0.003851079847663641
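The value printed at each epoch is the MSE of the last mini-batch only (50 samples), so it fluctuates from epoch to epoch. A sketch of a helper that averages the MSE over a whole `DataLoader` without tracking gradients is shown below. The name `evaluate_mse` is hypothetical, and it assumes the model returns `(prediction, hidden_state)` as `SRNN` does; the toy constant model only demonstrates the call.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_mse(model, loader, device="cpu"):
    """Hypothetical helper: mean squared error over every sample in `loader`."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            ypred, _ = model(xb)  # assumes the model returns (prediction, hidden state)
            # sum (not mean) of squared errors, so a smaller last batch is weighted correctly
            total += nn.functional.mse_loss(ypred.squeeze(-1), yb, reduction="sum").item()
            count += yb.numel()
    return total / count

# Toy usage: a model that ignores its input and always predicts `value`
class ConstantModel(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return torch.full((x.shape[0], 1), self.value), None

X = torch.zeros(10, 5, 2)   # 10 sequences of length 5 with 2 features
Y = torch.full((10,), 3.0)  # every target is 3
loader = DataLoader(TensorDataset(X, Y), batch_size=4)  # batches of 4, 4 and 2
print(evaluate_mse(ConstantModel(1.0), loader))  # 4.0
```

In this notebook it could be called as `evaluate_mse(net, loader, device)` on a loader built from a freshly generated `AddingProblemDataset`.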





In [61]:
print(ypred.squeeze())
print(yb)

tensor([0.8368, 0.6073, 1.4439, 1.0808, 0.8531, 0.9881, 1.2640, 1.1317, 1.2514,
        0.8691, 0.8017, 0.5457, 0.9808, 0.7162, 1.1850, 0.5861, 0.7227, 0.7418,
        1.2890, 0.8356, 1.0278, 1.5093, 0.5997, 1.0406, 1.1698, 1.6333, 1.7002,
        0.4166, 1.3711, 1.3717, 0.6364, 1.1634, 1.1341, 1.1479, 1.5310, 0.7045,
        0.5154, 1.1588, 0.9827, 0.3649, 1.2749, 0.6845, 0.7858, 1.2648, 0.9973,
        1.4959, 0.7878, 1.1233, 1.2502, 1.6640], grad_fn=<SqueezeBackward0>)
tensor([0.9025, 0.6352, 1.5622, 1.1236, 0.9297, 1.0274, 1.3528, 1.1077, 1.3372,
        0.9473, 0.8047, 0.5625, 1.0419, 0.7286, 1.2842, 0.5888, 0.6989, 0.7647,
        1.2832, 0.8068, 1.0789, 1.4191, 0.5235, 1.0582, 1.1645, 1.6303, 1.7693,
        0.4732, 1.4376, 1.2985, 0.6547, 1.2652, 1.1723, 1.0339, 1.5294, 0.7715,
        0.5086, 1.1800, 0.9978, 0.3671, 1.3066, 0.7328, 0.8876, 1.1349, 0.9948,
        1.5535, 0.8669, 1.1721, 1.1682, 1.5544])


Now we test the SRNN on the same problem, this time with a sequence length of 750.

In [62]:
batchSize = 50
dataset750 = DataLoader(AddingProblemDataset(ds_size=100*batchSize, sample_len=750),batch_size=batchSize)

In [63]:
net750 = SRNN(2,128,8,1).to(device)

In [64]:
loss750 = nn.MSELoss()
opt750 = torch.optim.Adam(net750.parameters())

In [65]:
xb750, yb750 = next(iter(dataset750))
xb750, yb750 = xb750.to(device), yb750.to(device)
ypred750, h750 = net750(xb750)
print(loss750(ypred750.squeeze(), yb750))

tensor(39.7789, grad_fn=<MseLossBackward0>)


In [67]:
for epoch in range(10):
  net750.train()
  for xb750, yb750 in tqdm.tqdm(dataset750):
    xb750, yb750 = xb750.to(device), yb750.to(device)
    opt750.zero_grad()
    ypred750, h750 = net750(xb750)
    l = loss750(ypred750.squeeze(), yb750)
    l.backward()
    opt750.step()

  net750.eval()
  # despite the label, this prints the MSE of the last mini-batch, not an accuracy
  print(f'Accuracy at epoch {epoch+1}: {l}')

100%|██████████| 100/100 [02:43<00:00,  1.64s/it]


Accuracy at epoch 1: 0.19520387053489685


100%|██████████| 100/100 [02:43<00:00,  1.63s/it]


Accuracy at epoch 2: 0.19216546416282654


100%|██████████| 100/100 [02:42<00:00,  1.63s/it]


Accuracy at epoch 3: 0.19294202327728271


100%|██████████| 100/100 [02:42<00:00,  1.62s/it]


Accuracy at epoch 4: 0.19828654825687408


100%|██████████| 100/100 [02:43<00:00,  1.63s/it]


Accuracy at epoch 5: 0.09964174032211304


100%|██████████| 100/100 [02:42<00:00,  1.63s/it]


Accuracy at epoch 6: 0.20470498502254486


100%|██████████| 100/100 [02:42<00:00,  1.63s/it]


Accuracy at epoch 7: 0.14254242181777954


100%|██████████| 100/100 [02:42<00:00,  1.62s/it]


Accuracy at epoch 8: 0.21697691082954407


100%|██████████| 100/100 [02:42<00:00,  1.63s/it]


Accuracy at epoch 9: 0.12660609185695648


100%|██████████| 100/100 [02:42<00:00,  1.63s/it]

Accuracy at epoch 10: 0.12603643536567688
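One rough way to read the length-750 log is to compare the printed last-batch losses with the constant-prediction baseline of $1/6$ (the expected MSE of always predicting 1, the mean target). The values below are copied from the output above.

```python
# last-batch MSE printed at each epoch of the length-750 run (copied from the log above)
losses_750 = [0.19520387053489685, 0.19216546416282654, 0.19294202327728271,
              0.19828654825687408, 0.09964174032211304, 0.20470498502254486,
              0.14254242181777954, 0.21697691082954407, 0.12660609185695648,
              0.12603643536567688]
baseline = 1 / 6  # expected MSE of always predicting 1: Var(U1 + U2) = 1/12 + 1/12

mean_loss = sum(losses_750) / len(losses_750)
below = [epoch for epoch, l in enumerate(losses_750, start=1) if l < baseline]
print(f"mean printed loss: {mean_loss:.4f}, baseline: {baseline:.4f}")
# mean printed loss: 0.1695, baseline: 0.1667
print("epochs below baseline:", below)
# epochs below baseline: [5, 7, 9, 10]
```

On average these values sit close to the baseline, whereas the length-200 run ends far below it. This suggests that after 10 epochs the length-750 model has not yet clearly learned the task, although single 50-sample batches make this only an indication.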



