## Simple LSTM

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim

```nn.LSTM``` is a class in the PyTorch framework, specifically part of the ```torch.nn``` module. It is used to create an instance of a ```Long Short-Term Memory (LSTM)``` layer.

### Key Parameters of ```nn.LSTM```

* ```input_size:``` The number of expected features in the input ```x```. 
* ```hidden_size:``` The number of features in the hidden state ```h```.
* ```num_layers(optional):``` Number of recurrent layers. E.g., setting ```num_layers=2``` would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in the outputs of the first LSTM and computing the final result. Default is ```1```.
* ```bias(optional):``` If ```False```, then the layer does not use bias weights ```b_ih``` and ```b_hh```. Default is ```True```.
* ```batch_first(optional):``` If ```True```, then the input and output tensors are provided as (batch, seq, feature). Default is ```False```, which expects (seq, batch, feature). This setting does not apply to the hidden and cell states.
* ```dropout(optional):``` If ```non-zero```, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to ```dropout```. Default is ```0```.
* ```bidirectional(optional):``` If ```True```, becomes a bidirectional LSTM. Default is ```False```.
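
As a rough illustration of how the parameters above interact, the sketch below builds a two-layer bidirectional LSTM and prints the resulting tensor shapes. The sizes used here (```input_size=4```, ```hidden_size=10```, a batch of 2 sequences of length 3) and the dropout value are arbitrary choices made for this example.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumed for this sketch)
input_size, hidden_size, num_layers = 4, 10, 2

lstm = nn.LSTM(
    input_size,
    hidden_size,
    num_layers=num_layers,
    batch_first=True,    # input and output are (batch, seq, feature)
    dropout=0.1,         # applied between stacked layers, not after the last one
    bidirectional=True,  # doubles the feature size of the output
)

x = torch.randn(2, 3, input_size)  # (batch=2, seq=3, feature=4)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch, seq, 2 * hidden_size)
print(h_n.shape)     # (num_layers * 2, batch, hidden_size)
print(c_n.shape)     # same shape as h_n
```

Note that ```batch_first=True``` changes the layout of the input and output only; the state tensors keep the batch in the second dimension.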

In [8]:
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embadding_dim, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.embadding = nn.Embedding(vocab_size, embadding_dim) # embedding layer
        self.lstm = nn.LSTM(embadding_dim, hidden_size, batch_first=True) # LSTM Layer
        self.fc = nn.Linear(hidden_size, output_size) # fully connected layer to produce the output
    
    def forward(self, x):
        # embed the input token indices: (batch, seq) -> (batch, seq, embedding dim)
        x = self.embadding(x)

        # initialize the hidden state and cell state with zeros, on the same device as the input
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # forward propagate the LSTM
        out, (hn, cn) = self.lstm(x, (h0, c0))

        # pass the output of the last time step to the fully connected layer
        out = self.fc(out[:,-1,:])
        return out
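
The explicit zero tensors ```h0``` and ```c0``` in ```forward``` mirror what ```nn.LSTM``` does when no initial state is passed: both states default to zeros. A small check of that behaviour, using arbitrary sizes chosen for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=10, batch_first=True)
x = torch.randn(2, 3, 4)  # (batch, seq, feature)

# Explicit zero initial states, shaped (num_layers, batch, hidden_size)
h0 = torch.zeros(1, x.size(0), 10)
c0 = torch.zeros(1, x.size(0), 10)

out_explicit, _ = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # no initial state given, so zeros are used

same = torch.allclose(out_explicit, out_default)
print(same)
```

Passing the states explicitly is still useful when they should start from something other than zeros, for example when carrying state across consecutive chunks of a long sequence.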

In [9]:
# Parameters
vocab_size = 10 # Size of the vocabulary
embadding_dim = 4 # dimension of the embedding vectors
hidden_size = 10 # number of features in the hidden state
output_size = 1 # number of output values per prediction

In [10]:
# Create model 
model = LSTMModel(vocab_size, embadding_dim, hidden_size, output_size)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Sample data (batch size, sequence length)
input = torch.tensor([[1,2,3],[2,3,4]])
target = torch.tensor([[4.0],[5.0]])

# Training loop 
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    outputs = model(input)
    loss = criterion(outputs, target)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/100], Loss: {loss.item():.4f}')

Epoch [10/100], Loss: 17.2338
Epoch [20/100], Loss: 11.9545
Epoch [30/100], Loss: 4.8258
Epoch [40/100], Loss: 0.3735
Epoch [50/100], Loss: 0.2054
Epoch [60/100], Loss: 0.1379
Epoch [70/100], Loss: 0.0023
Epoch [80/100], Loss: 0.0187
Epoch [90/100], Loss: 0.0007
Epoch [100/100], Loss: 0.0016


In [5]:
# Variant that returns an output at every time step
class LSTM2(nn.Module):
    def __init__(self, vocab_size, embadding_dim, hidden_size, output_size):
        super(LSTM2, self).__init__()
        self.hidden_size = hidden_size
        self.embdadding = nn.Embedding(vocab_size, embadding_dim) # embedding layer
        self.rnn = nn.LSTM(embadding_dim, hidden_size, batch_first=True) # LSTM layer
        self.fc = nn.Linear(hidden_size, output_size) # fully connected layer
    
    def forward(self, x):
        x = self.embdadding(x)
        # initialize the hidden state and cell state with zeros
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        out, (hn, cn) = self.rnn(x, (h0, c0))
        # apply the fully connected layer at every time step (no slicing)
        out = self.fc(out)
        return out

# Parameters
vocab_size = 10 # size of the vocabulary
embadding_dim = 4 # dimension of the embedding vectors
hidden_size = 10 # number of features in the hidden state
output_size = 1 # number of output values per time step

# Create the model 
model = LSTM2(vocab_size, embadding_dim, hidden_size, output_size)

# loss and optimizer 
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr= 0.01)


# Sample data: input is (batch size, sequence length)
input = torch.tensor([[1,2,3],[2,3,4]])
# target is (batch size, sequence length, output size): one value per time step
target = torch.tensor([[[4.0],[5.0],[6.0]],[[5.0],[6.0],[7.0]]])

# Training loop 
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    outputs = model(input)
    loss = criterion(outputs, target)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/100], Loss: {loss.item():.4f}')

Epoch [10/100], Loss: 26.4189
Epoch [20/100], Loss: 18.6157
Epoch [30/100], Loss: 8.9903
Epoch [40/100], Loss: 2.8900
Epoch [50/100], Loss: 0.6973
Epoch [60/100], Loss: 0.3916
Epoch [70/100], Loss: 0.4069
Epoch [80/100], Loss: 0.3324
Epoch [90/100], Loss: 0.2670
Epoch [100/100], Loss: 0.2398


In [6]:
# Test the model 
model.eval()
test_input = torch.tensor([[3,4,5]])
predicted = model(test_input)
print(f"Predicted values: {predicted.detach().numpy()}")

Predicted values: [[[1.6227318]
  [4.0620785]
  [5.894954 ]]]


The ```nn.Embedding``` layer maps each integer in the input sequence to a dense vector. This layer is particularly useful when dealing with words, where each word is represented as a unique integer.

The ```nn.Embedding``` layer transforms each integer in the input tensor into an embedding vector, so an input of shape ```(batch_size, sequence_length)``` becomes an output of shape ```(batch_size, sequence_length, embedding_size)```.

If ```embadding_dim``` is 4, as in the example, the input of shape (2,3) has the shape (2,3,4) after the embedding layer.
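
A minimal sketch of that shape change, using the same vocabulary size, embedding dimension, and sample input as the example (the variable names here are chosen for illustration):

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 10, 4  # same values as in the example
embedding = nn.Embedding(vocab_size, embedding_dim)

tokens = torch.tensor([[1, 2, 3], [2, 3, 4]])  # (batch_size=2, sequence_length=3)
embedded = embedding(tokens)

print(tokens.shape)    # torch.Size([2, 3])
print(embedded.shape)  # torch.Size([2, 3, 4])

# The layer is a lookup table: token 2 maps to row 2 of the weight matrix,
# wherever it appears in the batch.
print(torch.equal(embedded[0, 1], embedding.weight[2]))  # True
print(torch.equal(embedded[0, 1], embedded[1, 0]))       # True
```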

```LSTM Layer``` When this tensor is passed through the ```nn.LSTM``` layer, the LSTM processes each sequence of embedded vectors. The ```nn.LSTM``` layer returns ```the output tensor``` together with a tuple holding ```the hidden state``` and ```the cell state```. With ```batch_first=True```, the ```output tensor``` from the LSTM has the general shape (```batch_size, sequence_length, num_directions * hidden_size```), and both state tensors, the hidden and the cell, have the shape (```num_layers * num_directions, batch_size, hidden_size```). In the example these are (2,3,10) and (1,2,10).
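
The shapes can be checked with a short sketch. It assumes the single-layer, unidirectional configuration used in the example, and a random tensor stands in for the embedding output:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=10, batch_first=True)
embedded = torch.randn(2, 3, 4)  # stand-in for the embedding output

output, (h_n, c_n) = lstm(embedded)
print(output.shape)  # torch.Size([2, 3, 10])
print(h_n.shape)     # torch.Size([1, 2, 10])
print(c_n.shape)     # torch.Size([1, 2, 10])

# For one unidirectional layer, the last time step of `output`
# is the final hidden state.
print(torch.allclose(output[:, -1, :], h_n[-1]))  # True
```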

```Fully Connected Layer``` The model can use the output at the last time step or at every time step of the sequence to make a prediction.

If we want a prediction only at the last time step, the output is sliced from the LSTM output tensor with ```out[:,-1,:]```, which reduces its shape to (```batch_size, hidden_size```), or (2,10). The sliced output is then passed through a fully connected layer (```nn.Linear```), which is designed to map the LSTM's hidden state to the desired output size.

If we want a prediction at every time step, there is no need to slice the LSTM output tensor; simply pass it to ```nn.Linear```, which acts on the last dimension and maps the LSTM's hidden state at each time step to the desired output size, giving the shape (```batch_size, sequence_length, output_size```).
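
The two options can be compared side by side in a small sketch. The random tensor below is a stand-in for the LSTM output, with the sizes from the example (```hidden_size=10```, ```output_size=1```):

```python
import torch
import torch.nn as nn

fc = nn.Linear(10, 1)             # hidden_size=10 -> output_size=1
lstm_out = torch.randn(2, 3, 10)  # stand-in for the LSTM output

last_step = fc(lstm_out[:, -1, :])  # slice first: (2, 10) -> (2, 1)
every_step = fc(lstm_out)           # no slice: (2, 3, 10) -> (2, 3, 1)

print(last_step.shape)   # torch.Size([2, 1])
print(every_step.shape)  # torch.Size([2, 3, 1])

# nn.Linear acts on the last dimension only, so both paths agree
# at the final time step.
print(torch.allclose(every_step[:, -1, :], last_step, atol=1e-6))
```

This is why the two models above differ only in whether ```forward``` slices the LSTM output before calling the linear layer.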