## Recurrent Neural Network
- The main difference between an MLP and an RNN is that an RNN has a feedback loop: the hidden state computed at one time step is fed back in at the next step.
- This feedback loop lets the model capture relationships between the values in a sequence.
- We can unroll the feedback loop by making a copy of the neural network for each input value. All copies share the same weights.
- The two biggest problems in RNNs are the vanishing gradient problem and the exploding gradient problem. LSTMs and GRUs mitigate these problems with gating, and Transformers avoid them by replacing recurrence with attention.
- The order in which we input the data matters a lot. RNNs are typically used as autoregressive models, where each step depends on the steps before it.
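
To make the unrolling and the gradient problems concrete, here is a minimal sketch in plain Python. It assumes a toy RNN with a single scalar hidden unit and a `tanh` activation; the names `rnn_step`, `run_rnn`, and `grad_wrt_initial_state` and the weight values are made up for illustration and are not part of the model defined later in this notebook.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One time step: the new state depends on the input and the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, h0=0.0, **weights):
    """Unrolled feedback loop: the same step (same weights) runs once per input."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, **weights)
        states.append(h)
    return states

def grad_wrt_initial_state(inputs, h0=0.0, w_x=0.5, w_h=0.8, b=0.0):
    """d h_T / d h_0, built up as a product of per-step factors w_h * (1 - h_t**2)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        grad *= w_h * (1.0 - h * h)  # chain rule through one tanh step
    return grad

# Order matters: the same values in reverse order give a different final state.
sequence = [1.0, -0.5, 0.25]
forward_state = run_rnn(sequence)[-1]
reversed_state = run_rnn(list(reversed(sequence)))[-1]
print(forward_state, reversed_state)

# Over 50 steps the product of per-step factors either shrinks or blows up.
long_sequence = [0.0] * 50
vanishing = grad_wrt_initial_state(long_sequence, w_h=0.5)
exploding = grad_wrt_initial_state(long_sequence, w_h=1.5)
print(vanishing, exploding)
```

With a recurrent weight below 1 the gradient shrinks toward zero over 50 steps, and with a weight above 1 it grows very large. The same effect appears, in matrix form, when backpropagating through a real RNN.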

## Input and Hidden State
- As stated above, RNNs are autoregressive models. The summary of all previous data is called the hidden state.
- The input is the new value that is fed into the RNN at each time step.
- The hidden state is the RNN's internal representation that summarizes all the information it has seen up to that point.
- The hidden size is the number of hidden units, i.e. the length of the hidden state vector. It is not the look back period: the look back period is the sequence length, the number of time steps fed in during the forward calculation. If you have a look back period of 100, the forward pass runs for 100 steps, whatever the hidden size is.
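
The difference between hidden size and sequence length can be checked with a short sketch. It uses PyTorch's built-in `nn.RNN` rather than the hand-written class in this notebook, and the sizes (8 hidden units, sequences of 5 and 100 steps) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

input_size = 1   # features per time step
hidden_size = 8  # length of the hidden state vector

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)

# Two sequences with different look back periods: (batch, seq_len, input_size)
short_seq = torch.randn(1, 5, input_size)
long_seq = torch.randn(1, 100, input_size)

out_short, h_short = rnn(short_seq)
out_long, h_long = rnn(long_seq)

# One output per time step, so this dimension follows the sequence length
print(out_short.shape, out_long.shape)

# The final hidden state has the same size for both sequences
print(h_short.shape, h_long.shape)
```

The per-step outputs grow with the sequence length, while the final hidden state is always `(num_layers, batch, hidden_size)`, regardless of how many steps were processed.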

In [None]:
import torch.optim as optim
import torch.nn as nn
import torch
import torch.nn.functional as F

In [None]:
# Get Data

In [None]:
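class RNN(nn.Module):
    """A single recurrent cell that is applied once per time step."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Both layers see the current input concatenated with the previous hidden state
        self.hidden = nn.Linear(input_size + hidden_size, hidden_size)
        self.output = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, hidden_state):
        # x: (batch, input_size), hidden_state: (batch, hidden_size)
        combined = torch.cat((x, hidden_state), 1)
        hidden = torch.relu(self.hidden(combined))  # new hidden state
        output = self.output(combined)  # raw scores (logits) for this step

        return output, hidden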

In [None]:
hidden_size = 256
learning_rate = 0.001

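# Must get these values from the data
input_size = 1   # number of features per time step
output_size = 1  # number of classes; CrossEntropyLoss needs at least 2 to be meaningful

model = RNN(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels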
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

In [None]:
# Training

# Test dataset, must actually get the data
train_dataset = []

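for epoch in range(5):
    for i, (name, label) in enumerate(train_dataset):
        # Start every sequence from a fresh (all-zero) hidden state
        hidden_state = torch.zeros(1, model.hidden_size)
        # Each char is assumed to be a tensor of shape (1, input_size)
        for char in name:
            output, hidden_state = model(char, hidden_state)
        # Score only the last step's output; label is assumed to be a tensor of shape (1,)
        loss = criterion(output, label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()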


In [None]:
# Testing

# Must make a test dataset
test_dataset = []

num_correct = 0
num_samples = len(test_dataset)

model.eval()

with torch.no_grad():
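    for name, label in test_dataset:
        # Start every sequence from a fresh (all-zero) hidden state
        hidden_state = torch.zeros(1, model.hidden_size)
        for char in name:
            output, hidden_state = model(char, hidden_state)
        # The predicted class is the index of the largest logit
        pred = output.argmax(dim=1)
        num_correct += int((pred == label).item())

# Guard against an empty test set
accuracy = num_correct / num_samples * 100 if num_samples else 0.0
print(f"Accuracy: {accuracy:.4f}%")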