# Introduction to Neural Networks and Machine Learning - Recurrent Networks


---

**Instructors:** Piotr Baryczkowski, Jakub Bednarek<br>
**Contact:** piotr.baryczkowski@put.poznan.pl<br>

---

## Lab objectives:
- getting familiar with recurrent neural networks,
- building a network with recurrent layers for the MNIST dataset,
- writing custom implementations of neural network layers.

In [1]:
import numpy as np
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch version: 2.9.0+cpu
CUDA available: False


In [2]:
import torch.nn as nn
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [3]:
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=32, shuffle=True)

100%|██████████| 9.91M/9.91M [00:00<00:00, 40.8MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 1.10MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 10.0MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 3.37MB/s]


## Recurrent networks
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://pytorch.org/docs/stable/generated/torch.nn.RNN.html

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

An example model with a recurrent layer for the MNIST data; each 28x28 image is read as a sequence of 28 rows with 28 pixel values each:

In [4]:
class RecurrentModel(nn.Module):
    def __init__(self, num_classes=10):
        super(RecurrentModel, self).__init__()
        self.num_classes = num_classes
        # Define your layers here.
        self.lstm_1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(128, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        lstm_out, _ = self.lstm_1(inputs)
        # Take the last output from the sequence (assume inputs are padded appropriately or have consistent lengths)
        x = lstm_out[:, -1, :]  # Get the output of the last time step
        x = self.relu_1(x)
        x = self.dense_1(x)
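        # Note: nn.CrossEntropyLoss applies log-softmax itself, so with this
        # Softmax the loss receives probabilities rather than raw logits.
        return self.softmax(x)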

model = RecurrentModel(num_classes=10)
model

RecurrentModel(
  (lstm_1): LSTM(28, 128, batch_first=True)
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [5]:
learning_rate = 1e-3
batch_size = 32
epochs = 5

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [6]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302981  [   32/60000]
loss: 1.883636  [ 3232/60000]
loss: 1.828843  [ 6432/60000]
loss: 1.951353  [ 9632/60000]
loss: 1.706200  [12832/60000]
loss: 1.715351  [16032/60000]
loss: 1.831772  [19232/60000]
loss: 1.771518  [22432/60000]
loss: 1.741484  [25632/60000]
loss: 1.672478  [28832/60000]
loss: 1.650613  [32032/60000]
loss: 1.790274  [35232/60000]
loss: 1.618494  [38432/60000]
loss: 1.680868  [41632/60000]
loss: 1.610301  [44832/60000]
loss: 1.614900  [48032/60000]
loss: 1.546688  [51232/60000]
loss: 1.582004  [54432/60000]
loss: 1.501046  [57632/60000]
Test Error: 
 Accuracy: 92.7%, Avg loss: 1.538454 

Epoch 2
-------------------------------
loss: 1.593899  [   32/60000]
loss: 1.523293  [ 3232/60000]
loss: 1.530599  [ 6432/60000]
loss: 1.515375  [ 9632/60000]
loss: 1.543829  [12832/60000]
loss: 1.695800  [16032/60000]
loss: 1.555089  [19232/60000]
loss: 1.495103  [22432/60000]
loss: 1.618255  [25632/60000]
loss: 1.496590  [28832/60000
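
A note on the loss values logged above: they level off near 1.5 instead of approaching 0. One likely reason, offered here as a reading of the logs rather than something the notebook states, is that the model ends in `nn.Softmax` while `nn.CrossEntropyLoss` applies log-softmax again internally. With inputs confined to [0, 1], the loss for 10 classes cannot fall below ln(1 + 9/e) ≈ 1.461. A minimal sketch that checks this bound (the tensors are made up for illustration):

```python
import math

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
target = torch.tensor([3])

# A perfectly confident softmax output: probability 1 for the true class.
probs = torch.zeros(1, 10)
probs[0, 3] = 1.0

# CrossEntropyLoss treats its input as logits and applies log-softmax again,
# so even this "perfect" prediction is penalised.
loss_on_probs = loss_fn(probs, target).item()
floor = math.log(1 + 9 / math.e)

# Raw logits are unbounded, so the loss can get arbitrarily close to zero.
logits = torch.full((1, 10), -20.0)
logits[0, 3] = 20.0
loss_on_logits = loss_fn(logits, target).item()

print(f"loss on probabilities: {loss_on_probs:.4f} (floor {floor:.4f})")
print(f"loss on logits:        {loss_on_logits:.6f}")
```

Returning the output of the last `nn.Linear` layer directly and leaving the softmax to the loss function would remove this floor; `argmax` in the test loop gives the same predictions either way.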

### Exercise 1
Extend the model from the example above with another recurrent layer placed before the dense output layer.

Standard feed-forward networks produce one output from one input.
Recurrent networks, by contrast, process data sequentially, combining at each step the result of the previous step with the current input. A recurrent layer therefore takes a 3-dimensional tensor as input; with `batch_first=True`, as in the example, its shape is `[batch_size, sequence_size, sample_size]`.
By default, recurrent layers in PyTorch return the sequence of outputs from every processing step. If you only want the output of the last step, you have to select it yourself, e.g. `x = lstm_out[:, -1, :]`.
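
The shapes involved can be checked directly. The sketch below assumes a random batch shaped like MNIST images (32 sequences of 28 rows with 28 values each) and shows that, for a single-layer unidirectional LSTM, the last time step of the output sequence is the same tensor as the final hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)

# [batch_size, sequence_size, sample_size]
x = torch.rand(32, 28, 28)
out, (h_n, c_n) = lstm(x)

print(out.shape)  # one 128-dim output per time step: [32, 28, 128]
print(h_n.shape)  # final hidden state: [num_layers, batch, hidden] = [1, 32, 128]

# Selecting the last time step by hand matches the final hidden state.
last = out[:, -1, :]
print(torch.allclose(last, h_n[-1]))
```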


In [7]:
class RecurrentModel2(nn.Module):
    def __init__(self, num_classes=10):
        super(RecurrentModel2, self).__init__()
        self.num_classes = num_classes
        self.lstm_1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
        self.lstm_2 = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(128, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            inputs = inputs.squeeze(1)
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        lstm_out_1, _ = self.lstm_1(inputs)
        lstm_out_2, _ = self.lstm_2(lstm_out_1)
        x = lstm_out_2[:, -1, :]
        x = self.relu_1(x)
        x = self.dense_1(x)
        return self.softmax(x)

model = RecurrentModel2(num_classes=10)
model

RecurrentModel2(
  (lstm_1): LSTM(28, 128, batch_first=True)
  (lstm_2): LSTM(128, 128, batch_first=True)
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)
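
PyTorch can also stack recurrent layers inside a single module through the `num_layers` argument of `nn.LSTM`. The sketch below is an alternative to the two separate layers used in the solution, not part of the original notebook; it only checks that both variants hold the same number of trainable parameters. The helper `count_params` is a hypothetical name introduced for this illustration.

```python
import torch.nn as nn


def count_params(*modules):
    """Total number of parameters across the given modules."""
    return sum(p.numel() for m in modules for p in m.parameters())


# Two separate layers, as in RecurrentModel2.
lstm_1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
lstm_2 = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)

# One module holding two stacked layers.
stacked = nn.LSTM(input_size=28, hidden_size=128, num_layers=2, batch_first=True)

separate_count = count_params(lstm_1, lstm_2)
stacked_count = count_params(stacked)
print(separate_count, stacked_count)
```

The stacked module returns only the output sequence of its top layer, which is what `RecurrentModel2` uses anyway.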

In [8]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302454  [   32/60000]
loss: 2.065291  [ 3232/60000]
loss: 1.837247  [ 6432/60000]
loss: 1.847266  [ 9632/60000]
loss: 1.687922  [12832/60000]
loss: 1.734693  [16032/60000]
loss: 1.741557  [19232/60000]
loss: 1.662582  [22432/60000]
loss: 1.565365  [25632/60000]
loss: 1.630697  [28832/60000]
loss: 1.559054  [32032/60000]
loss: 1.500766  [35232/60000]
loss: 1.567310  [38432/60000]
loss: 1.527837  [41632/60000]
loss: 1.536370  [44832/60000]
loss: 1.552933  [48032/60000]
loss: 1.546464  [51232/60000]
loss: 1.586277  [54432/60000]
loss: 1.494086  [57632/60000]
Test Error: 
 Accuracy: 93.1%, Avg loss: 1.531878 

Epoch 2
-------------------------------
loss: 1.573966  [   32/60000]
loss: 1.525413  [ 3232/60000]
loss: 1.500720  [ 6432/60000]
loss: 1.495611  [ 9632/60000]
loss: 1.492251  [12832/60000]
loss: 1.574720  [16032/60000]
loss: 1.496001  [19232/60000]
loss: 1.599877  [22432/60000]
loss: 1.534971  [25632/60000]
loss: 1.561566  [28832/60000

### Exercise 2
Using the model from the example, write a recurrent network with RNNCell.

RNNCell implements only the operations that a recurrent layer performs for a single step. At each step, a recurrent layer combines the result of the previous step with the current input.
Use a for loop to call the RNNCell repeatedly (the number of steps equals the number of elements in the sequence).

Calling an initialized recurrent cell requires the current input and the hidden state from the previous step (RNNCell has a single state).

The layer's hidden state has to be initialized with starting values (random values from torch.rand are one option; the solution below uses zeros).
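
To see that looping over a cell really reproduces a full recurrent layer, the sketch below copies the weights of an `nn.RNN` layer into an `nn.RNNCell` and compares the two on a random batch. The sizes (hidden size 16, batch of 4) are arbitrary choices for the illustration, and the hidden state starts from zeros, which is also what `nn.RNN` assumes when no initial state is passed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=28, hidden_size=16, batch_first=True)
cell = nn.RNNCell(input_size=28, hidden_size=16)

# Copy the layer's weights into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.rand(4, 28, 28)  # [batch, sequence, features]
h = torch.zeros(4, 16)     # initial hidden state

# Unroll the cell over the sequence dimension, keeping every step's output.
steps = []
for t in range(x.size(1)):
    h = cell(x[:, t, :], h)
    steps.append(h)
unrolled = torch.stack(steps, dim=1)

out, h_n = rnn(x)
matches = torch.allclose(unrolled, out, atol=1e-5)
print(matches)
```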

In [9]:
import torch
import torch.nn as nn

class RecurrentModel3(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel3, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        # Define the RNN cell
        self.rnn_cell = nn.RNNCell(input_size, hidden_size)
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        # Initialize hidden state
        batch_size = inputs.size(0)
        h = torch.zeros(batch_size, self.hidden_size, device=inputs.device)

        # Process sequence step by step
        for t in range(inputs.size(1)): # inputs.size(1) is the sequence_length (28 for MNIST images)
            h = self.rnn_cell(inputs[:, t, :], h)

        # Take the final hidden state and pass it through the dense layer
        x = self.relu_1(h)
        x = self.dense_1(x)
        return self.softmax(x)

model = RecurrentModel3(input_size=28, hidden_size=128, num_classes=10)

In [10]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302114  [   32/60000]
loss: 2.295591  [ 3232/60000]
loss: 2.246437  [ 6432/60000]
loss: 2.224259  [ 9632/60000]
loss: 2.081541  [12832/60000]
loss: 2.100355  [16032/60000]
loss: 2.110106  [19232/60000]
loss: 1.948715  [22432/60000]
loss: 1.833246  [25632/60000]
loss: 2.134346  [28832/60000]
loss: 1.883807  [32032/60000]
loss: 2.270484  [35232/60000]
loss: 2.054603  [38432/60000]
loss: 1.976045  [41632/60000]
loss: 2.001590  [44832/60000]
loss: 1.839967  [48032/60000]
loss: 1.868906  [51232/60000]
loss: 1.782172  [54432/60000]
loss: 1.806194  [57632/60000]
Test Error: 
 Accuracy: 68.3%, Avg loss: 1.792229 

Epoch 2
-------------------------------
loss: 1.859068  [   32/60000]
loss: 1.782731  [ 3232/60000]
loss: 1.802533  [ 6432/60000]
loss: 1.722198  [ 9632/60000]
loss: 1.681407  [12832/60000]
loss: 1.770567  [16032/60000]
loss: 1.688006  [19232/60000]
loss: 1.826825  [22432/60000]
loss: 1.753467  [25632/60000]
loss: 1.623357  [28832/60000

### Exercise 3
Replace the recurrent cell from the previous exercise with LSTMCell.
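
Unlike RNNCell, an LSTMCell carries two state tensors, the hidden state `h` and the cell state `c`, which are passed in and returned as a tuple. A small sketch of a single step, with batch and layer sizes assumed to match the MNIST setup; it also shows that omitting the state is equivalent to passing zeros:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=28, hidden_size=128)

x_t = torch.rand(32, 28)  # one time step for a batch of 32
h0 = torch.zeros(32, 128)
c0 = torch.zeros(32, 128)

# The state goes in as a tuple (h, c) and comes back as a tuple.
h1, c1 = cell(x_t, (h0, c0))

# Without an explicit state, the cell starts from zeros.
h1_default, c1_default = cell(x_t)

print(h1.shape, c1.shape)
print(torch.allclose(h1, h1_default), torch.allclose(c1, c1_default))
```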

In [11]:
class RecurrentModel4(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel4, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        # Define the LSTM cell
        self.lstm_cell = nn.LSTMCell(input_size, hidden_size)
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        # Initialize hidden state and cell state
        batch_size = inputs.size(0)
        h = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        c = torch.zeros(batch_size, self.hidden_size, device=inputs.device)

        # Process sequence step by step
        for t in range(inputs.size(1)): # inputs.size(1) is the sequence_length (28 for MNIST images)
            h, c = self.lstm_cell(inputs[:, t, :], (h, c))

        # Take the final hidden state and pass it through the dense layer
        x = self.relu_1(h)
        x = self.dense_1(x)
        return self.softmax(x)

model = RecurrentModel4(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel4(
  (lstm_cell): LSTMCell(28, 128)
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [12]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.304571  [   32/60000]
loss: 2.056720  [ 3232/60000]
loss: 1.933532  [ 6432/60000]
loss: 1.788264  [ 9632/60000]
loss: 1.676427  [12832/60000]
loss: 1.695085  [16032/60000]
loss: 1.613389  [19232/60000]
loss: 1.759412  [22432/60000]
loss: 1.732003  [25632/60000]
loss: 1.615311  [28832/60000]
loss: 1.718195  [32032/60000]
loss: 1.611797  [35232/60000]
loss: 1.585010  [38432/60000]
loss: 1.603897  [41632/60000]
loss: 1.497465  [44832/60000]
loss: 1.526759  [48032/60000]
loss: 1.527915  [51232/60000]
loss: 1.568891  [54432/60000]
loss: 1.504753  [57632/60000]
Test Error: 
 Accuracy: 90.9%, Avg loss: 1.555202 

Epoch 2
-------------------------------
loss: 1.580806  [   32/60000]
loss: 1.519574  [ 3232/60000]
loss: 1.529403  [ 6432/60000]
loss: 1.517452  [ 9632/60000]
loss: 1.650924  [12832/60000]
loss: 1.537828  [16032/60000]
loss: 1.577263  [19232/60000]
loss: 1.610891  [22432/60000]
loss: 1.558320  [25632/60000]
loss: 1.495877  [28832/60000

### Exercise 4
Using the model from the previous exercise, build a neural network model with your own implementation of a simple recurrent layer.
- in `forward`, replace the `self.lstm_cell(...)` call with a call to your own method, e.g. `self.cell(...)`,
- in the model constructor, remove the LSTM cell initialization and replace it with the layers needed to build your own recurrent cell,
- write a `cell()` method that performs the recurrent layer's operations,
- a simple recurrent layer concatenates the previous result with the current input and then passes the combined tensor through a dense layer (`nn.Linear`).
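
The concatenate-then-dense recipe is the same computation that `nn.RNNCell` performs with two separate weight matrices. The sketch below illustrates this under the assumption of a tanh activation (the default for `nn.RNNCell`): the input-to-hidden and hidden-to-hidden weights are placed side by side in one `nn.Linear` layer, and the result is compared with the built-in cell. Sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 28, 16
ref = nn.RNNCell(input_size, hidden_size)
linear = nn.Linear(input_size + hidden_size, hidden_size)

# One dense layer over [x, h] is equivalent to W_ih x + W_hh h + biases.
with torch.no_grad():
    linear.weight.copy_(torch.cat((ref.weight_ih, ref.weight_hh), dim=1))
    linear.bias.copy_(ref.bias_ih + ref.bias_hh)

x = torch.rand(4, input_size)
h = torch.rand(4, hidden_size)

# Custom step: concatenate input and previous state, dense layer, tanh.
custom = torch.tanh(linear(torch.cat((x, h), dim=1)))
same = torch.allclose(custom, ref(x, h), atol=1e-5)
print(same)
```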

In [13]:
class RecurrentModel5(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel5, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        # Define layers for custom recurrent cell
        self.linear_cell = nn.Linear(input_size + hidden_size, hidden_size)
        self.tanh = nn.Tanh()

        # Output layers
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def cell(self, x, h):
        combined_input = torch.cat((x, h), dim=1)
        h_new = self.tanh(self.linear_cell(combined_input))
        return h_new

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        # Initialize hidden state
        batch_size = inputs.size(0)
        h = torch.zeros(batch_size, self.hidden_size, device=inputs.device)

        # Process sequence step by step using the custom cell
        for t in range(inputs.size(1)): # inputs.size(1) is the sequence_length (28 for MNIST images)
            h = self.cell(inputs[:, t, :], h)

        # Take the final hidden state and pass it through the dense layer
        x = self.relu_1(h)
        x = self.dense_1(x)
        return self.softmax(x)

model = RecurrentModel5(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel5(
  (linear_cell): Linear(in_features=156, out_features=128, bias=True)
  (tanh): Tanh()
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [14]:
epochs = 5

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.300291  [   32/60000]
loss: 2.293908  [ 3232/60000]
loss: 2.165182  [ 6432/60000]
loss: 2.146461  [ 9632/60000]
loss: 2.102144  [12832/60000]
loss: 2.053736  [16032/60000]
loss: 2.040306  [19232/60000]
loss: 2.043575  [22432/60000]
loss: 2.037921  [25632/60000]
loss: 1.978972  [28832/60000]
loss: 1.907864  [32032/60000]
loss: 1.825292  [35232/60000]
loss: 1.703875  [38432/60000]
loss: 2.061726  [41632/60000]
loss: 1.848050  [44832/60000]
loss: 1.850106  [48032/60000]
loss: 1.948858  [51232/60000]
loss: 1.984806  [54432/60000]
loss: 1.978574  [57632/60000]
Test Error: 
 Accuracy: 62.5%, Avg loss: 1.841141 

Epoch 2
-------------------------------
loss: 1.850295  [   32/60000]
loss: 1.972953  [ 3232/60000]
loss: 2.249990  [ 6432/60000]
loss: 1.839846  [ 9632/60000]
loss: 2.079344  [12832/60000]
loss: 1.841746  [16032/60000]
loss: 1.636889  [19232/60000]
loss: 1.846280  [22432/60000]
loss: 1.869632  [25632/60000]
loss: 2.095574  [28832/60000

### Exercise 5

Based on the model from the previous exercise, build a model with your own implementation of an LSTM layer. A detailed and accessible description of how an LSTM layer works can be found on [this page](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
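
For reference, one step of a standard LSTM can be written as below. The notation is an assumption made for this summary: $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and each $W$, $b$ pair is one dense layer. The order in which the two tensors are concatenated does not matter as long as it is used consistently.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(new cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
$$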

In [15]:
class RecurrentModel6(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel6, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        # Define LSTM layers (gates and candidate cell state)
        self.linear_f = nn.Linear(input_size + hidden_size, hidden_size) # Forget gate
        self.linear_i = nn.Linear(input_size + hidden_size, hidden_size) # Input gate
        self.linear_c = nn.Linear(input_size + hidden_size, hidden_size) # Cell state candidate
        self.linear_o = nn.Linear(input_size + hidden_size, hidden_size) # Output gate

        self.sigmoid = nn.Sigmoid()
        self.tanh = nn.Tanh()

        # Output layers
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def cell(self, x, h_c_prev):
        h_prev, c_prev = h_c_prev # Unpack previous hidden and cell states

        combined = torch.cat((x, h_prev), dim=1)

        # Forget Gate
        f_t = self.sigmoid(self.linear_f(combined))
        # Input Gate
        i_t = self.sigmoid(self.linear_i(combined))
        # Cell State Candidate
        c_tilde_t = self.tanh(self.linear_c(combined))
        # Output Gate
        o_t = self.sigmoid(self.linear_o(combined))

        # New Cell State
        c_t = f_t * c_prev + i_t * c_tilde_t
        # New Hidden State
        h_t = o_t * self.tanh(c_t)

        return h_t, c_t

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        # Initialize hidden state and cell state
        batch_size = inputs.size(0)
        h = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        c = torch.zeros(batch_size, self.hidden_size, device=inputs.device)

        # Process sequence step by step using the custom cell
        for t in range(inputs.size(1)): # inputs.size(1) is the sequence_length (28 for MNIST images)
            h, c = self.cell(inputs[:, t, :], (h, c))

        # Take the final hidden state and pass it through the dense layer
        x = self.relu_1(h)
        x = self.dense_1(x)
        return self.softmax(x)

model = RecurrentModel6(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel6(
  (linear_f): Linear(in_features=156, out_features=128, bias=True)
  (linear_i): Linear(in_features=156, out_features=128, bias=True)
  (linear_c): Linear(in_features=156, out_features=128, bias=True)
  (linear_o): Linear(in_features=156, out_features=128, bias=True)
  (sigmoid): Sigmoid()
  (tanh): Tanh()
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)
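
One way to gain confidence in a hand-written LSTM step is to compare it with `nn.LSTMCell` using identical weights. The sketch below is not the notebook's model; it reuses the built-in cell's own parameters and relies on PyTorch storing the four gates stacked in the order input, forget, candidate, output. The function `manual_lstm_step` is a hypothetical helper written for this check.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch = 28, 128, 4
ref = nn.LSTMCell(input_size, hidden_size)


def manual_lstm_step(x, h, c):
    """One LSTM step computed by hand from the reference cell's weights."""
    gates = x @ ref.weight_ih.T + ref.bias_ih + h @ ref.weight_hh.T + ref.bias_hh
    # PyTorch stacks the gates as: input, forget, candidate (g), output.
    i, f, g, o = gates.chunk(4, dim=1)
    c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h_new = torch.sigmoid(o) * torch.tanh(c_new)
    return h_new, c_new


x = torch.rand(batch, input_size)
h = torch.rand(batch, hidden_size)
c = torch.rand(batch, hidden_size)

h_ref, c_ref = ref(x, (h, c))
h_man, c_man = manual_lstm_step(x, h, c)
print(torch.allclose(h_man, h_ref, atol=1e-5), torch.allclose(c_man, c_ref, atol=1e-5))
```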

In [16]:
epochs = 2
learning_rate = 0.001
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303797  [   32/60000]
loss: 1.926665  [ 3232/60000]
loss: 1.849037  [ 6432/60000]
loss: 1.737481  [ 9632/60000]
loss: 1.742795  [12832/60000]
loss: 1.666287  [16032/60000]
loss: 1.788786  [19232/60000]
loss: 1.739735  [22432/60000]
loss: 1.662443  [25632/60000]
loss: 1.492035  [28832/60000]
loss: 1.679692  [32032/60000]
loss: 1.599578  [35232/60000]
loss: 1.622489  [38432/60000]
loss: 1.587174  [41632/60000]
loss: 1.583508  [44832/60000]
loss: 1.574126  [48032/60000]
loss: 1.599679  [51232/60000]
loss: 1.508448  [54432/60000]
loss: 1.617748  [57632/60000]
Test Error: 
 Accuracy: 91.3%, Avg loss: 1.550356 

Epoch 2
-------------------------------
loss: 1.578380  [   32/60000]
loss: 1.557972  [ 3232/60000]
loss: 1.526417  [ 6432/60000]
loss: 1.473073  [ 9632/60000]
loss: 1.587164  [12832/60000]
loss: 1.550855  [16032/60000]
loss: 1.634084  [19232/60000]
loss: 1.617076  [22432/60000]
loss: 1.557010  [25632/60000]
loss: 1.504239  [28832/60000


