In [53]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)

import numpy as np
import matplotlib.pyplot as plt; plt.style.use("fivethirtyeight")

In this notebook I will explore the usage of RNNs in PyTorch.
https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

In [54]:
input_size = 3
hidden_size = 4
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)  # nn.LSTM(input_size, hidden_size)
# make a sequence of length 5
input_sequence = [torch.randn(1, 1, input_size) for _ in range(5)]

# initialize the hidden state and the cell state
hidden = (torch.randn(1, 1, hidden_size),
          torch.randn(1, 1, hidden_size))
# feed the sequence one element at a time, carrying the state forward
for i in input_sequence:
    out, hidden = lstm(i.view(1, 1, -1), hidden)

In [55]:
input_sequence

[tensor([[[ 0.0939,  1.2381, -1.3459]]]),
 tensor([[[ 0.5119, -0.6933, -0.1668]]]),
 tensor([[[-0.9999, -1.6476,  0.8098]]]),
 tensor([[[ 0.0554,  1.1340, -0.5326]]]),
 tensor([[[ 0.6592, -1.5964, -0.3769]]])]

## LSTM
(All the information about LSTM can be found here: https://pytorch.org/docs/stable/nn.html#lstm)

A little bit of explanation of what was just done.

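First of all, for an explanation of the difference between `reshape` and `view`, see here: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view. In short, `view` never copies data, so it only works when the requested shape is compatible with the tensor's existing memory layout (otherwise it raises an error), whereas `reshape` returns a view when that is possible and falls back to a copy when it is not.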
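A small sketch of that difference, using a transposed tensor as an example of a layout that `view` cannot handle (the variable names are only illustrative):

```python
import torch

x = torch.arange(6).reshape(2, 3)   # contiguous tensor
v = x.view(3, 2)                    # a view: shares memory with x
shares_memory = v.data_ptr() == x.data_ptr()

t = x.t()                           # transposing makes the tensor non-contiguous
try:
    t.view(6)                       # view cannot flatten this layout
    view_failed = False
except RuntimeError:
    view_failed = True

r = t.reshape(6)                    # reshape silently falls back to a copy
print(shares_memory, view_failed, r)
```

In the LSTM example above the inputs are contiguous, so `view` and `reshape` would behave the same there.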

Input data dimensionality (with the default `batch_first=False`):
* first dimension: position in the sequence (time step)
* second dimension: samples from a mini-batch
* third dimension: elements of the input vector
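To make the layout concrete, here is a minimal sketch that builds an input with these three dimensions and checks the shape of what comes back. The names `seq_len` and `batch_size` are introduced here for illustration only; the sizes match the ones used in this notebook.

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 1, 3, 4
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)

# (sequence position, sample in the mini-batch, element of the input vector)
x = torch.randn(seq_len, batch_size, input_size)

# when no initial state is passed, nn.LSTM starts from zeros
out, _ = lstm(x)
print(x.shape, out.shape)
```

The output keeps the first two dimensions of the input and replaces the last one with `hidden_size`.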

Why is `hidden` a tuple of two tensors? Don't we need just a single one? It is because, while a simple RNN carries only a hidden state (often denoted as $h$), an LSTM also has a cell memory (often denoted as $c$). Because of that, we have to initialize that as well.

<img src="img/lstm_equations.png">

In the for loop we iterated over all elements of the sequence and performed a forward pass for each of them, passing the state along. We can also do it without a for loop, treating the input as one tensor (which actually makes more sense).

In [56]:
# stack the 5 elements into one tensor of shape (seq_len, batch, input_size)
inputs = torch.cat(input_sequence).view(len(input_sequence), 1, -1)
print("Inputs shape: {}".format(inputs.shape))
hidden = (torch.randn(1, 1, hidden_size), torch.randn(1, 1, hidden_size))
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

Inputs shape: torch.Size([5, 1, 3])
tensor([[[-0.2566,  0.1294,  0.0441, -0.5235]],

        [[-0.4444,  0.0762,  0.0304, -0.3889]],

        [[-0.1741,  0.1061, -0.0179, -0.1505]],

        [[-0.1409,  0.1110,  0.0773, -0.2373]],

        [[-0.2308,  0.1164, -0.0115, -0.2423]]], grad_fn=<StackBackward>)
(tensor([[[-0.2308,  0.1164, -0.0115, -0.2423]]], grad_fn=<StackBackward>), tensor([[[-0.4093,  0.6065, -0.0288, -0.4107]]], grad_fn=<StackBackward>))


The output of the LSTM is a tuple of
* all the hidden states (one for each element of the input sequence)
* a tuple holding the last hidden state and the last cell state

It is kind of redundant, because we could have taken the last hidden state from the first element of the tuple as well (check that the last element of `out` is the same as the first element of `hidden`), but that's not the case for $c$: it is returned only for the last step, and we have to pass it back to the LSTM if we want to continue the forward pass later on.
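The check mentioned above can be sketched as follows. This is a self-contained version with a freshly created single-layer, unidirectional LSTM, so the numbers differ from the outputs printed earlier; `h_n` and `c_n` are just local names for the two elements of the returned state.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(input_size=3, hidden_size=4)
inputs = torch.randn(5, 1, 3)

out, (h_n, c_n) = lstm(inputs)

# last time step of `out` vs. the returned last hidden state
same_hidden = torch.allclose(out[-1], h_n[0])
print(same_hidden)
```

Note that `c_n` has no counterpart inside `out`, which is why it has to be returned separately.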

## Summary
And that's it. As we can see, most of the details of the LSTM were abstracted away for us; we just need a basic understanding of how an LSTM works.

Note, however, that our approach was to perform a forward pass on a single observation (sequence). That's perfectly fine, and we can optimize (do backpropagation) one sample at a time, but what if we want to process more elements at once? Well, we can create mini-batches. But we have to keep in mind that the sequences can be of different lengths (that's the whole point of sequence models like RNNs), so it's not clear so far how to deal with that.
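As a pointer for later, one common way to handle this in PyTorch is to pad the sequences to a common length and then pack them, using the helpers in `torch.nn.utils.rnn`. The following is only a rough sketch under that assumption, not something worked out in this notebook; the three example sequences and their lengths are made up.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (pad_sequence, pack_padded_sequence,
                                pad_packed_sequence)

torch.manual_seed(1)
lstm = nn.LSTM(input_size=3, hidden_size=4)

# three made-up sequences of different lengths, each (seq_len, input_size)
sequences = [torch.randn(5, 3), torch.randn(3, 3), torch.randn(2, 3)]
lengths = torch.tensor([len(s) for s in sequences])

# pad with zeros to shape (max_len, batch, input_size)
padded = pad_sequence(sequences)
# pack so that the LSTM skips the padded positions
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

packed_out, (h_n, c_n) = lstm(packed)

# unpack back to a padded tensor plus the original lengths
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape, out_lengths)
```

With packing, `h_n` holds the hidden state at the last real element of each sequence rather than at the padded end.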