## Simple LSTM

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

```nn.LSTM``` is a class in the PyTorch framework, specifically part of the ```torch.nn``` module. It is used to create an instance of a Long Short-Term Memory (LSTM) layer.
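
As a minimal sketch of what creating and calling such a layer looks like (the sizes below are illustrative assumptions, not values from this notebook), the layer returns the per-step outputs together with the final hidden and cell states:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes chosen for this sketch
lstm = nn.LSTM(input_size=4, hidden_size=10)

# With the default batch_first=False the input layout is (seq_len, batch, input_size)
x = torch.randn(3, 2, 4)

# When the initial states are omitted they default to zeros
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([3, 2, 10])
print(h_n.shape)     # torch.Size([1, 2, 10])
print(c_n.shape)     # torch.Size([1, 2, 10])
```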

### Key Parameters of ```nn.LSTM```

* ```input_size:``` The number of expected features in the input ```x```. 
* ```hidden_size:``` The number of features in the hidden state ```h```.
* ```num_layers(optional):``` Number of recurrent layers. E.g., setting ```num_layers=2``` would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in the outputs of the first LSTM and computing the final result. Default is ```1```.
* ```bias(optional):``` If ```False```, then the layer does not use bias weights ```b_ih``` and ```b_hh```. Default is ```True```.
* ```batch_first(optional):``` If ```True```, then the input and output tensors are provided as (batch, seq, feature). Default is ```False```, which expects (seq, batch, feature).
* ```dropout(optional):``` If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to ```dropout```. Default is ```0```.
* ```bidirectional(optional):``` If ```True```, becomes a bidirectional LSTM. Default is ```False```.
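
The effect of these parameters on tensor shapes can be sketched as follows. The sizes are illustrative assumptions chosen for this example only:

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the notebook
batch, seq_len, input_size, hidden_size = 2, 5, 4, 10

lstm = nn.LSTM(
    input_size=input_size,
    hidden_size=hidden_size,
    num_layers=2,
    batch_first=True,
    dropout=0.1,        # applied between the two layers, in training mode only
    bidirectional=True,
)

x = torch.randn(batch, seq_len, input_size)  # (batch, seq, feature)
output, (h_n, c_n) = lstm(x)

# output holds the last layer's features for both directions
print(output.shape)  # torch.Size([2, 5, 20])

# h_n and c_n are (num_layers * num_directions, batch, hidden_size);
# batch_first does not change their layout
print(h_n.shape)     # torch.Size([4, 2, 10])
print(c_n.shape)     # torch.Size([4, 2, 10])

# With bias=False the bias parameters are not created at all
no_bias = nn.LSTM(input_size, hidden_size, bias=False)
param_names = [name for name, _ in no_bias.named_parameters()]
print(param_names)   # ['weight_ih_l0', 'weight_hh_l0']
```

Note that the last dimension of ```output``` doubles for a bidirectional LSTM, so any linear layer placed after it would need ```2 * hidden_size``` input features.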

In [None]:
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embadding_dim, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embadding = nn.Embedding(vocab_size, embadding_dim) # embedding layer
        self.lstm = nn.LSTM(embadding_dim, hidden_size, batch_first=True) # LSTM layer
        self.fc = nn.Linear(hidden_size, output_size) # fully connected layer to produce the output

    def forward(self, x):
        # embed the input token indices: (batch, seq) -> (batch, seq, embadding_dim)
        x = self.embadding(x)

        # initialize the hidden state and cell state with zeros, on the same device as the input
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # forward propagate the LSTM
        out,(hn,cn) = self.lstm(x,(h0, c0))

        # pass the output of the last time step to the fully connected layer
        out = self.fc(out[:,-1,:])
        return out
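
Two details of the forward pass above can be checked in isolation. The sketch below uses a standalone ```nn.LSTM``` with illustrative sizes (an assumption for this example, not the notebook's model) to show that explicit zero states match the default, and that for a single-layer, unidirectional LSTM the last time step of ```out``` equals the final hidden state:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=4, hidden_size=10, batch_first=True)
x = torch.randn(2, 3, 4)  # (batch, seq, feature)

# Explicit zero initial states, shaped (num_layers, batch, hidden_size)
h0 = torch.zeros(1, x.size(0), 10)
c0 = torch.zeros(1, x.size(0), 10)

out_explicit, (hn, cn) = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # initial states omitted

# Passing zero states explicitly matches the default behaviour
same_as_default = torch.allclose(out_explicit, out_default)
print(same_as_default)  # True

# The last time step of the output is the final hidden state of the last layer
last_step_is_hn = torch.allclose(out_explicit[:, -1, :], hn[-1])
print(last_step_is_hn)  # True
```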

In [2]:
# Parameters
vocab_size = 10 # Size of the vocabulary
embadding_dim = 4 # dimension of the embedding vectors
hidden_size = 10 # number of features in the hidden state
output_size = 1 # size of the model output (one value per sequence)
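
To make the roles of the vocabulary size and the embedding dimension concrete, here is a small sketch using a standalone ```nn.Embedding``` with the same sizes (10 and 4). The variable names are illustrative:

```python
import torch
import torch.nn as nn

# 10 possible token indices, each mapped to a 4-dimensional vector
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Two sequences of three token indices each; every index must be < 10
tokens = torch.tensor([[1, 2, 3], [2, 3, 4]])
embedded = embedding(tokens)

# (batch, seq) -> (batch, seq, embedding_dim)
print(embedded.shape)  # torch.Size([2, 3, 4])

# The same index always looks up the same vector (index 2 appears in both rows)
same_vector = torch.equal(embedded[0, 1], embedded[1, 0])
print(same_vector)  # True
```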

In [None]:
# Create model 
model = LSTMModel(vocab_size, embadding_dim, hidden_size, output_size)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Sample data (batch size, sequence length)
input = torch.tensor([[1,2,3],[2,3,4]])
target = torch.tensor([[4.0],[5.0]])

# training loop 
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    outputs = model(input)
    loss = 