# **Lecture 9 - Step-by-Step implementation and usage of LSTM**

In this short tutorial, I will show you how to use PyTorch's implementation of LSTM. I will use the same data as in Lab 8.

This tutorial is adapted from [Chapter 9](https://d2l.ai/chapter_recurrent-neural-networks/index.html) and [Chapter 10](https://classic.d2l.ai/chapter_convolutional-neural-networks/index.html) of the textbook, and from [this PyTorch tutorial on classifying names](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html).

Section 1 is the same as Section 1.1 of Topic 8's tutorial. Section 2 is new, while the remaining sections are the same as in Topic 8's tutorial. In this notebook, I included all the sections for completeness.

## **1. Using LSTM**



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

<torch._C.Generator at 0x7ff7dc9954d0>

In [None]:
## Hello LSTM

# Create an LSTM whose input is of dim 3 and has an output (i.e., no. of features/dim in the hidden state) of dim 3
lstm = nn.LSTM(3, 3)

# Make a random input sequence of length 5; each element is a 1x3 tensor
inputs = [torch.randn(1, 3) for _ in range(5)]
for x in inputs:
    print(x)


tensor([[-0.1473,  0.3482,  1.1371]])
tensor([[-0.3339, -1.4724,  0.7296]])
tensor([[-0.1312, -0.6368,  1.0429]])
tensor([[ 0.4903,  1.0318, -0.5989]])
tensor([[ 1.6015, -1.0735, -1.2173]])
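The layer created above is a single LSTM layer. One way to see what it contains is to list its parameters. The sketch below is an illustrative addition that assumes the same `nn.LSTM(3, 3)` configuration; it shows that PyTorch stacks the weights of the four gates (input, forget, cell, output) into one matrix per layer, which is why their leading dimension is 4 x 3 = 12.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
# Same configuration as the cell above: input_size=3, hidden_size=3, one layer by default
lstm = nn.LSTM(3, 3)

# The four gates share one weight matrix per input source, so the first
# dimension is 4 * hidden_size: weight_ih_l0 and weight_hh_l0 are (12, 3),
# and the two bias vectors are (12,)
shapes = {name: tuple(param.shape) for name, param in lstm.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```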


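As a quick test, let's feed this sequence into the LSTM and print its outputs (note that the network has not been trained yet). There are two ways of doing this:

**Approach 1: Step through the sequence one element at a time**

Each call to the LSTM consumes one element, reshaped to sequence_length x minibatch_size x element_length = 1 x 1 x 3, and returns the output for that step together with the updated (hidden state, cell state) tuple, which is fed back in at the next step.
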
In [None]:
# Initialize the hidden state and the cell state (the memory of the LSTM unit),
# each of shape num_layers x minibatch_size x hidden_size = 1 x 1 x 3
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))

# Step through the sequence one element at a time, feeding the updated state back in
for x in inputs:
    print(x)
    out, hidden = lstm(x.view(1, 1, -1), hidden)

# Display the results
print(out)      # the output at the last time step
print(hidden)   # the final (hidden state, cell state) tuple
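Each call inside the loop performs one LSTM time step. Following the equations given in the PyTorch `nn.LSTM` documentation, one step computes

$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The sketch below is an illustrative addition, not part of the original lab. It recomputes a single step by hand from a freshly created layer's weights (`lstm_step` is a hypothetical helper name) and compares the result with `nn.LSTM`. It assumes the gate blocks are stacked in the order input, forget, cell, output, as documented for `weight_ih_l0`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(3, 3)          # one layer, input_size=3, hidden_size=3

x = torch.randn(1, 1, 3)      # one sequence element: (seq_len=1, minibatch=1, input_size=3)
h0 = torch.randn(1, 1, 3)     # initial hidden state
c0 = torch.randn(1, 1, 3)     # initial cell state


def lstm_step(x_t, h_prev, c_prev, layer):
    """One LSTM time step computed by hand from the layer-0 weights of `layer`."""
    gates = (x_t @ layer.weight_ih_l0.T + layer.bias_ih_l0
             + h_prev @ layer.weight_hh_l0.T + layer.bias_hh_l0)
    # Split the stacked pre-activations into the input, forget, cell and output blocks
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, g, o = torch.sigmoid(i), torch.sigmoid(f), torch.tanh(g), torch.sigmoid(o)
    c_t = f * c_prev + i * g        # update the cell state (the LSTM's memory)
    h_t = o * torch.tanh(c_t)       # new hidden state, also the output of this step
    return h_t, c_t


with torch.no_grad():
    out, (h1, c1) = lstm(x, (h0, c0))
    # Drop the seq_len dimension so the hand-written step works on (minibatch, features)
    h_manual, c_manual = lstm_step(x[0], h0[0], c0[0], lstm)

print(torch.allclose(h1[0], h_manual, atol=1e-5), torch.allclose(c1[0], c_manual, atol=1e-5))
```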


**Approach 2: Process the entire sequence at once**

In this case, the LSTM returns:
- out: all of the hidden states throughout the sequence (one per step),
- hidden: a tuple (h_n, c_n) holding the final hidden state and the final cell state.

Here, the input to the LSTM is a single tensor whose dimensions are as follows (PyTorch's default layout, `batch_first=False`):
- Dim 1 (first dimension): the length of the sequence,
- Dim 2: the minibatch size; let's consider it as 1 for now,
- Dim 3: the length of each element in the sequence.
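To make this layout concrete, here is a small sketch (an illustrative addition that uses an arbitrary minibatch of 2 sequences) checking the shapes `nn.LSTM` works with. It also shows the `batch_first=True` option, which swaps the first two dimensions of the input and of `out`; the shapes of `h_n` and `c_n` are not affected by that flag.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
seq_len, batch_size, input_size, hidden_size = 5, 2, 3, 3

# Default layout: input is (seq_len, minibatch, input_size)
lstm = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch_size, input_size)
out, (h_n, c_n) = lstm(x)
print(out.shape)   # torch.Size([5, 2, 3]): one hidden state per step and per sequence
print(h_n.shape)   # torch.Size([1, 2, 3]): (num_layers, minibatch, hidden_size)

# With batch_first=True the minibatch dimension comes first instead
lstm_bf = nn.LSTM(input_size, hidden_size, batch_first=True)
out_bf, _ = lstm_bf(x.transpose(0, 1))   # input is now (minibatch, seq_len, input_size)
print(out_bf.shape)  # torch.Size([2, 5, 3])
```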

In [None]:
# Concatenate all the input to form one tensor of size sequence_length x minibatch_size x element_length
inputs = torch.cat(inputs).view(len(inputs), 1, -1)

# Initialize the hidden state and the cell state
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))

# Run the LSTM through the entire sequence (inputs) at once
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

tensor([[[ 0.1149,  0.0220,  0.2672]],

        [[ 0.0150,  0.0030,  0.0964]],

        [[-0.1190,  0.0397,  0.0507]],

        [[-0.0874,  0.0521,  0.0937]],

        [[ 0.1593, -0.0071, -0.0020]]], grad_fn=<StackBackward0>)
(tensor([[[ 0.1593, -0.0071, -0.0020]]], grad_fn=<StackBackward0>), tensor([[[ 0.2186, -0.0347, -0.0096]]], grad_fn=<StackBackward0>))
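The two cells above draw different random initial states, so their printed outputs cannot be compared directly. The sketch below (an illustrative addition on a fresh layer and sequence) runs both approaches from the same initial state and checks that they agree. It also checks that the last row of `out` equals the final hidden state `h_n`, which can be seen in the printed output above as well.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)
seq = [torch.randn(1, 3) for _ in range(5)]
init = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))   # shared initial (hidden, cell) state

with torch.no_grad():
    # Approach 1: one element at a time, carrying the state forward
    hidden = init
    step_outputs = []
    for x in seq:
        out_t, hidden = lstm(x.view(1, 1, -1), hidden)
        step_outputs.append(out_t)
    step_out = torch.cat(step_outputs)                 # (5, 1, 3)

    # Approach 2: the whole sequence in a single call
    full_out, (h_n, c_n) = lstm(torch.cat(seq).view(len(seq), 1, -1), init)

print(torch.allclose(step_out, full_out, atol=1e-5))   # expected: True (same per-step outputs)
print(torch.allclose(hidden[0], h_n, atol=1e-5))       # expected: True (same final hidden state)
print(torch.allclose(full_out[-1], h_n[0], atol=1e-6)) # expected: True (last output == h_n)
```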


You can cascade (stack) multiple LSTM blocks, which PyTorch calls layers: the hidden states produced by one layer form the input sequence of the next. The number of layers is specified in the third argument of `nn.LSTM` (`num_layers`).

In the example below, we cascade two LSTM blocks:

In [None]:
# Create a stacked LSTM: input of dim 3, hidden state of dim 3, and 2 layers (blocks)
n_input = 3
n_hidden = 3
n_blocks = 2
lstm = nn.LSTM(n_input, n_hidden, n_blocks)

# Make a random input sequence of length 5; each element is a 1x3 tensor
inputs = [torch.randn(1, n_input) for _ in range(5)]

# Concatenate all the input to form one tensor of size sequence_length x minibatch_size x element_length
inputs = torch.cat(inputs).view(len(inputs), 1, -1)

# Run the LSTM through the entire sequence (inputs) at once.
# No initial state is passed, so PyTorch initializes the hidden and cell states to zeros.
out, hidden = lstm(inputs)
print(out)
print(hidden)

tensor([[[-0.0196,  0.0525,  0.2091]],

        [[-0.0228,  0.0760,  0.3162]],

        [[-0.0192,  0.1160,  0.3595]],

        [[ 0.0344,  0.1436,  0.3846]],

        [[-0.0107,  0.1668,  0.3867]]], grad_fn=<StackBackward0>)
(tensor([[[-0.2989, -0.0129,  0.0624]],

        [[-0.0107,  0.1668,  0.3867]]], grad_fn=<StackBackward0>), tensor([[[-0.5663, -0.0504,  0.1610]],

        [[-0.0172,  0.4029,  1.4973]]], grad_fn=<StackBackward0>))
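In the printed result, each tensor in `hidden` now has two rows, one per layer, while `out` holds only the hidden states of the top (second) layer; this is why the last row of `out` matches the second row of the final hidden state. The sketch below (an illustrative addition using a fresh two-layer LSTM) checks these shapes and confirms that omitting the initial state gives the same result as passing zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
n_input, n_hidden, n_blocks = 3, 3, 2
lstm = nn.LSTM(n_input, n_hidden, n_blocks)        # two stacked layers
x = torch.randn(5, 1, n_input)                     # (seq_len, minibatch, input_size)

with torch.no_grad():
    out, (h_n, c_n) = lstm(x)                      # no initial state given
    zeros = torch.zeros(n_blocks, 1, n_hidden)     # (num_layers, minibatch, hidden_size)
    out_z, (h_z, c_z) = lstm(x, (zeros, zeros.clone()))  # explicit all-zero state

print(out.shape)                          # torch.Size([5, 1, 3]): top layer, every step
print(h_n.shape, c_n.shape)               # torch.Size([2, 1, 3]) each: one row per layer
print(torch.allclose(out[-1], h_n[-1]))   # True: last output is the top layer's final state
print(torch.allclose(out, out_z))         # True: omitting the state is the same as zeros
```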


## **2. Your Task**

Integrate this code into the classes you created last week and use it to recognize the language of surnames (please refer to last week's lab).
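As a possible starting point, here is a minimal sketch of how the LSTM could be wrapped in a classifier. Everything in it is an assumption rather than last week's code: the class name `LSTMClassifier`, the helper `name_to_tensor`, the one-hot alphabet, the hidden size of 128, the number of languages (18 is only a placeholder) and the target index are all hypothetical, so adapt them to the data loading and classes from your Lab 8 code. The sketch classifies from the top layer's final hidden state `h_n[-1]`, which summarizes the whole name.

```python
import string

import torch
import torch.nn as nn

# Hypothetical one-hot alphabet; replace it with the encoding used in your Lab 8 code
ALL_LETTERS = string.ascii_letters + " .,;'"
N_LETTERS = len(ALL_LETTERS)


def name_to_tensor(name):
    """One-hot encode a name as a (seq_len, 1, N_LETTERS) tensor (hypothetical helper)."""
    tensor = torch.zeros(len(name), 1, N_LETTERS)
    for i, ch in enumerate(name):
        tensor[i, 0, ALL_LETTERS.index(ch)] = 1.0
    return tensor


class LSTMClassifier(nn.Module):
    """Hypothetical classifier: an LSTM reads the name, a linear layer scores each language."""

    def __init__(self, n_input, n_hidden, n_categories, n_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(n_input, n_hidden, n_layers)
        self.fc = nn.Linear(n_hidden, n_categories)

    def forward(self, x):
        # x: (seq_len, minibatch, n_input); the initial states default to zeros
        _, (h_n, _) = self.lstm(x)
        # h_n: (n_layers, minibatch, n_hidden); score the top layer's final hidden state
        return torch.log_softmax(self.fc(h_n[-1]), dim=1)


torch.manual_seed(1)
model = LSTMClassifier(N_LETTERS, n_hidden=128, n_categories=18)  # 18 is a placeholder
criterion = nn.NLLLoss()  # expects log-probabilities, matching log_softmax above
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

# One training step on a single hypothetical example
x = name_to_tensor("Satoshi")
target = torch.tensor([3])  # hypothetical index of the correct language
log_probs = model(x)
loss = criterion(log_probs, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

Using only the final hidden state, rather than every row of `out`, keeps the classifier head independent of the length of the name.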