### Recurrent Neural Networks (RNN)
Traditional feed-forward neural networks take a fixed-size input all at once and produce a fixed-size output. RNNs, in contrast, consume their input one element at a time, in sequence. At each step, the RNN performs a series of calculations and produces an output known as the hidden state. The hidden state is then combined with the next input in the sequence to produce the next hidden state. This process continues until the input sequence ends or the model is programmed to stop.

The calculations at each time step take into account the context of the previous time steps, carried in the hidden state. This ability to use contextual information from earlier inputs is the key to RNNs’ success on sequential problems.

While the graphics may suggest that a different RNN cell is used at each time step, the underlying principle of recurrent neural networks is that the very same cell, with the same weights, is reused at every step.
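
The recurrence described above can be sketched without any deep learning library. This is a minimal illustration, assuming the common formulation `h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)`; the names `rnn_step`, `run_rnn`, `W_xh`, `W_hh`, and `b` are made up for this example and the weights are random rather than trained.

```python
import math
import random

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a plain RNN cell: h = tanh(W_xh @ x + W_hh @ h_prev + b)."""
    h_new = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(sequence, W_xh, W_hh, b):
    """Feed a sequence through the same cell, one input at a time."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x in sequence:
        # The same weights are reused at every time step
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

random.seed(0)
n_inputs, n_hidden = 3, 4
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
b = [0.0] * n_hidden

sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = run_rnn(sequence, W_xh, W_hh, b)
print(len(states), len(states[-1]))  # one hidden state per time step
```

Because each hidden state depends on the one before it, feeding the same inputs in a different order generally gives a different final state, which is what lets the network use context.
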
#### Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory (LSTM) networks are a variant of recurrent neural networks whose units add gates to the feedback loop. The gates regulate how the input data flows through the network over time. This allows the network to distinguish **important** from **unimportant** information and to retain the former across many time steps.
One LSTM unit is composed of a:
- **Memory cell**: where the stored information resides. It’s a container where all the action happens. The gates on its perimeter control what information flows through it and how the input is handled.
- **Input Gate**: This is where new information enters the cell. A sigmoid activation decides how much of the input to let in, and a tanh activation produces the candidate values that are added to the cell state. You can see this in the diagram below, where it is represented by the middle two activation functions.
- **Output Gate**: This is shown in the diagram by the activation function on the right side. It regulates and filters how much of the cell state is exposed as the unit’s output (the hidden state).
- **Forget Gate**: You don’t always need the information from previous inputs to process the next one. This gate lets the cell get rid of information that was previously stored. For example, if you input “Angelica is my friend. Logan is Sam’s cousin.”, the network can ‘forget’ the first sentence by the time it reaches “Logan”. It is the activation function furthest to the left in the diagram.
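
The four components listed above can be sketched as a single LSTM step in plain Python. This is a minimal illustration, assuming the commonly used gate equations (sigmoid gates `f`, `i`, `o` and a tanh candidate `g`); the helper names `lstm_step`, `affine`, and `init_params` are hypothetical and the weights are random, not trained. The demo at the end forces the forget gate toward 0 or 1 through its bias to show its effect on the memory cell.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-lists matrix W."""
    return [b_i + sum(w * v_j for w, v_j in zip(row, v)) for row, b_i in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; returns the new hidden state h and cell state c."""
    v = h_prev + x  # concatenate previous hidden state and current input
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]    # forget gate
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]    # input gate
    g = [math.tanh(z) for z in affine(p["W_g"], p["b_g"], v)]  # candidate values
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]    # output gate
    # Memory cell: keep part of the old state, add part of the candidate
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Hidden state: filtered view of the memory cell
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

def init_params(n_inputs, n_hidden, seed=0):
    rng = random.Random(seed)
    p = {}
    for gate in "figo":
        p["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + n_inputs)]
                          for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p

p = init_params(n_inputs=2, n_hidden=2)
x, h0 = [1.0, -1.0], [0.0, 0.0]

p["b_f"] = [-50.0, -50.0]  # forget gate close to 0: the old cell state is discarded
_, c_forget_a = lstm_step(x, h0, [5.0, 5.0], p)
_, c_forget_b = lstm_step(x, h0, [-5.0, -5.0], p)

p["b_f"] = [50.0, 50.0]  # forget gate close to 1: the old cell state is kept
_, c_keep_a = lstm_step(x, h0, [5.0, 5.0], p)
_, c_keep_b = lstm_step(x, h0, [-5.0, -5.0], p)

print(c_forget_a, c_forget_b)  # nearly identical despite different old states
print(c_keep_a, c_keep_b)      # still differ by the old states
```

With the forget gate closed, two very different previous cell states lead to practically the same new cell state; with it open, the difference between them carries through unchanged.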

In general, LSTMs are used to classify, identify, or predict outputs from a series of discrete-time input data. They are trained with gradient descent and backpropagation to minimize error.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim

class RNN_Classification_Net(nn.Module):
    def __init__(self, device, batch_size, n_steps, n_inputs, n_neurons, n_outputs):
        super().__init__()
        
        self.device = device
        self.n_neurons = n_neurons
        self.batch_size = batch_size
        self.n_steps = n_steps
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        
        self.basic_rnn = nn.RNN(n_inputs, n_neurons)
        
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)
        
    def init_hidden(self):
        # Initial hidden state of shape (num_layers, batch_size, n_neurons)
        return torch.zeros(1, self.batch_size, self.n_neurons).to(self.device)

    def forward(self, X):
        # Treat each image as a sequence of rows: (batch_size, n_steps, n_inputs)
        X = X.view(-1, self.n_steps, self.n_inputs)
        # nn.RNN expects (n_steps, batch_size, n_inputs) by default
        X = X.permute(1, 0, 2)

        self.batch_size = X.size(1)
        self.hidden = self.init_hidden()

        # rnn_out holds the hidden state at every step; self.hidden is the final one
        rnn_out, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.FC(self.hidden)

        return out.view(-1, self.n_outputs)  # batch_size x n_outputs


DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE = 64
N_STEPS = 28
N_INPUTS = 28
N_NEURONS = 150
N_OUTPUTS = 10

RNN_Model = RNN_Classification_Net(DEVICE,BATCH_SIZE, N_STEPS, N_INPUTS, N_NEURONS, N_OUTPUTS).to(DEVICE)
print(RNN_Model)

RNN_Classification_Net(
  (basic_rnn): RNN(28, 150)
  (FC): Linear(in_features=150, out_features=10, bias=True)
)


In [14]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor,RandomPerspective,RandomRotation

# Loading the data
training_data = FashionMNIST(
    root="data",  # Path to the data
    train=True,  # Use the training split
    download=True,  # Download the data if it does not exist locally
    transform=ToTensor()  # Convert each image to a tensor
)

test_data = FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)


#Loading the data in a dataloader
train_dataloader = DataLoader(training_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True)

# Implementing the training and testing loops
def train_loop(dataloader, model, loss_fn, optimizer, scheduler):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # reset hidden states
        model.hidden = model.init_hidden()
        
        # Setting up the inputs
        X = X.to(DEVICE)
        y = y.to(DEVICE)

        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()  # Reset gradients; they accumulate by default, so zero them each iteration
        loss.backward()  # Backpropagate the prediction loss
        optimizer.step()  # Adjust the parameters using the gradients from the backward pass

        if batch % 500 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
    scheduler.step()


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():  # Disable gradient tracking during evaluation
        for X, y in dataloader:
            # Setting up the inputs
            X = X.to(DEVICE)
            y = y.to(DEVICE)

            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

#Defining the Training Parameters
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.AdamW(RNN_Model.parameters(),lr = 0.001,weight_decay = 0.001)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
epochs = 10

#Starting the training and testing
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, RNN_Model, loss_fn, optimizer, scheduler)
    test_loop(test_dataloader, RNN_Model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.300172  [    0/60000]
loss: 0.592000  [32000/60000]
Test Error: 
 Accuracy: 76.4%, Avg loss: 0.654331 

Epoch 2
-------------------------------
loss: 0.431172  [    0/60000]
loss: 0.638750  [32000/60000]
Test Error: 
 Accuracy: 78.8%, Avg loss: 0.580906 

Epoch 3
-------------------------------
loss: 0.608246  [    0/60000]
loss: 0.437673  [32000/60000]
Test Error: 
 Accuracy: 81.2%, Avg loss: 0.518366 

Epoch 4
-------------------------------
loss: 0.360196  [    0/60000]
loss: 0.355520  [32000/60000]
Test Error: 
 Accuracy: 81.1%, Avg loss: 0.537505 

Epoch 5
-------------------------------
loss: 0.526343  [    0/60000]
loss: 0.468028  [32000/60000]
Test Error: 
 Accuracy: 81.6%, Avg loss: 0.550994 

Epoch 6
-------------------------------
loss: 0.343049  [    0/60000]
loss: 0.423382  [32000/60000]
Test Error: 
 Accuracy: 84.0%, Avg loss: 0.454628 

Epoch 7
-------------------------------
loss: 0.445706  [    0/60000]
loss: 0.483376  [3
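
The training cell above pairs the optimizer with `ExponentialLR(gamma=0.9)` and calls `scheduler.step()` once per epoch, so the learning rate is multiplied by 0.9 after every epoch. A small library-free sketch of the resulting schedule follows; the function name `exponential_lr` is made up for illustration and only mirrors the formula `lr_t = lr_0 * gamma^t`.

```python
def exponential_lr(initial_lr, gamma, epochs):
    """Learning rate used during each epoch under exponential decay."""
    return [initial_lr * gamma ** epoch for epoch in range(epochs)]

# Same settings as the training cell: lr = 0.001, gamma = 0.9, 10 epochs
schedule = exponential_lr(0.001, 0.9, 10)
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Under these settings the learning rate in the tenth epoch is roughly 39% of its starting value.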

In [18]:

from sklearn.metrics import confusion_matrix
import seaborn as sn
import pandas as pd
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

y_pred = []
y_true = []
wrong_preds = []

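RNN_Model.hidden.to("cpu")  # Tensor.to() returns a copy, so this line has no lasting effect
RNN_Model.to("cpu")  # moves the parameters only; RNN_Model.device is unchanged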

# Iterate over the test data one image at a time
with torch.no_grad():  # no gradients are needed for inference
    for img, label in tqdm(test_data):
        output = RNN_Model(img[None, ...]).argmax(1)
        y_pred.append(output.item())  # Save prediction
        y_true.append(label)  # Save truth
        if label != output.item():
            wrong_preds.append((img, label))

# images that the model predicted wrong
print(f"Out of {len(test_data)} images the model has predicted {len(wrong_preds)} wrong images")

# Build the confusion matrix
classes = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

cf_matrix = confusion_matrix(y_true, y_pred)
# Normalize each row so entries are the fraction of each true class, not raw counts
df_cm = pd.DataFrame(cf_matrix / cf_matrix.sum(axis=1, keepdims=True),
                     index=classes,
                     columns=classes)
plt.figure(figsize=(12, 7))
sn.heatmap(df_cm, annot=True)

# Visualizing random images that the model predicted wrong
figure = plt.figure(figsize=(12, 12))
cols, rows = 3, 4

for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(wrong_preds), size=(1,)).item() 
    img, label = wrong_preds[sample_idx]

    pred = RNN_Model(img[None, ...]).argmax(1)

    figure.add_subplot(rows, cols, i)
    plt.title(f"Pred: {classes[pred.item()]}, Label: {classes[label]}", fontdict={"fontsize": 14, "color": ("green" if pred.item() == label else "red")})
    plt.axis("off")
    plt.imshow(img.cpu().squeeze()) 

plt.show()

  0%|          | 0/10000 [00:00<?, ?it/s]


RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cpu and hidden tensor at cuda:0
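
The traceback above is consistent with how `init_hidden` builds the hidden state: it places the tensor on `self.device`, which is a plain attribute and stays `"cuda"` after `RNN_Model.to("cpu")`, since `nn.Module.to` only moves parameters and buffers. One possible remedy is to set `RNN_Model.device = "cpu"` before the loop. Another, sketched below, is to create the hidden state on whatever device the input is on. The class name `DeviceAgnosticRNN` is hypothetical, and the random tensor merely stands in for a FashionMNIST image.

```python
import torch
import torch.nn as nn

class DeviceAgnosticRNN(nn.Module):
    """Hypothetical variant of the classifier that takes the device from its input."""

    def __init__(self, n_inputs=28, n_neurons=150, n_outputs=10):
        super().__init__()
        self.n_inputs = n_inputs
        self.n_neurons = n_neurons
        self.n_outputs = n_outputs
        self.basic_rnn = nn.RNN(n_inputs, n_neurons)
        self.FC = nn.Linear(n_neurons, n_outputs)

    def forward(self, X):
        # (batch, 1, n_steps, n_inputs) -> (n_steps, batch, n_inputs)
        X = X.view(-1, X.size(-2), self.n_inputs).permute(1, 0, 2)
        # Allocate the hidden state on the same device (and dtype) as the input
        hidden = torch.zeros(1, X.size(1), self.n_neurons,
                             device=X.device, dtype=X.dtype)
        _, hidden = self.basic_rnn(X, hidden)
        return self.FC(hidden).view(-1, self.n_outputs)

model = DeviceAgnosticRNN()
model.to("cpu")  # parameters and the hidden state now agree on the device

img = torch.rand(1, 28, 28)  # stand-in for one image tensor from the dataset
with torch.no_grad():
    pred = model(img[None, ...]).argmax(1)
print(pred.shape)
```

With this design the model no longer needs a `device` argument at all, as long as the caller moves the inputs to the same device as the parameters.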