---
### RNN Block Basics
##### (ref. https://github.com/buomsoo-kim/PyTorch-learners-tutorial)
##### (ref. https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912)
---

In [1]:
import numpy as np
import pandas as pd
import torch, torchvision
import torch.nn as nn
torch.__version__

'1.1.0'

## Basics of RNN


Humans do not start thinking from scratch; they build on previous experience and memory.  
A Recurrent Neural Network (RNN) imitates this by reusing information from earlier inputs.  
At each step, the network passes a message (its hidden state) to the next step.  
This makes RNNs well suited to sequence data.  
<br>

![alt text](https://cdn-images-1.medium.com/max/1600/1*xLcQd_xeBWHeC6CeYSJ9bA.png)
<br>
![alt text](https://cdn-images-1.medium.com/max/1600/1*XosBFfduA1cZB340SSL1hg.png)
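
The "message passing" idea can be sketched as a plain loop. This is a minimal illustration, not how `nn.RNN` is implemented internally: the names `W_xh`, `W_hh`, `b_h` and the sizes are hypothetical, and the weights are random rather than learned.

```python
import torch

torch.manual_seed(0)

seq_len, input_size, hidden_size = 4, 3, 2

# Hypothetical, randomly initialised weights (for illustration only)
W_xh = torch.randn(hidden_size, input_size)   # input  -> hidden
W_hh = torch.randn(hidden_size, hidden_size)  # hidden -> hidden
b_h = torch.zeros(hidden_size)

x = torch.randn(seq_len, input_size)  # one sequence, no batch dimension
h = torch.zeros(hidden_size)          # initial hidden state (the "memory")

hidden_states = []
for x_t in x:
    # the new state depends on the current input AND the previous state
    h = torch.tanh(W_xh @ x_t + W_hh @ h + b_h)
    hidden_states.append(h)

hidden_states = torch.stack(hidden_states)
print(hidden_states.shape)  # torch.Size([4, 2])
```

Each iteration reuses `h` from the previous step, which is what lets the network carry information along the sequence.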

In [None]:
nn.RNN  # the class itself; the constructor requires input_size and hidden_size

### 1) Vanilla RNN

![alt_text](https://cdn-images-1.medium.com/max/1600/1*ccHxugJhQo7VH4GAAZt3Sg.png)
<br>
![alt_text](https://cdn-images-1.medium.com/max/1200/1*jLWB_Dute-qB43DXqe8G3Q.png)

- ```torch.nn.RNN()```: multi-layer Elman RNN
  - Parameters
      - ```input_size``` : The number of expected features in the input `x` (the size of each input vector at a time step)
      - ```hidden_size``` : The number of features in the hidden state `h` (the number of hidden units per layer)
      - ```num_layers``` : Number of recurrent layers. If 2, two RNN layers are stacked, with the second layer taking the outputs of the first layer as its input
      - ```nonlinearity``` : Activation function, either `'tanh'` (default) or `'relu'`
      - ```batch_first``` : If True, the input and output tensors are provided as `(batch, seq, feature)` instead of `(seq, batch, feature)`
      - ```bidirectional``` : If True, becomes a bidirectional RNN
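
One way to see what `input_size`, `hidden_size` and `num_layers` control is to list the learnable tensors that `nn.RNN` creates. The sizes below are arbitrary choices for this sketch.

```python
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=5, num_layers=2, nonlinearity='tanh')

# Collect the shape of every learnable parameter, keyed by its name
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# weight_ih_l0 is (hidden_size, input_size)  = (5, 10)
# weight_hh_l0 is (hidden_size, hidden_size) = (5, 5)
# weight_ih_l1 is (hidden_size, hidden_size) = (5, 5), because the second
# layer reads the first layer's hidden states rather than the raw input
```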
   
stacked RNN
![alt_text](https://lh6.googleusercontent.com/rC1DSgjlmobtRxMPFi14hkMdDqSkEkuOX7EW_QrLFSymjasIM95Za2Wf-VwSC1Tq1sjJlOPLJ92q7PTKJh2hjBoXQawM6MQC27east67GFDklTalljlt0cFLZnPMdhp8erzO)

In [55]:
rnn = nn.RNN(input_size = 10, 
             hidden_size = 10, 
             num_layers = 1)

In [56]:
## inputs to RNN
# input data (seq_len, batch_size, input_size)
x0 = torch.from_numpy(np.random.randn(12, 64, 10)).float()     
# hidden state (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.from_numpy(np.zeros((1, 64, 10))).float()            

print(x0.shape, h0.shape)

torch.Size([12, 64, 10]) torch.Size([1, 64, 10])


In [57]:
out, h1 = rnn(x0, h0) # the last dim of x0 must equal input_size; the last dim of h0 must equal hidden_size
print(out.shape, h1.shape)

torch.Size([12, 64, 10]) torch.Size([1, 64, 10])
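
To make the relationship between the two return values concrete, the sketch below rebuilds the same single-layer setup (using `torch.randn` and `torch.zeros` directly instead of NumPy) and checks two things: `out` holds the hidden state at every time step, so its last step matches the final hidden state, and omitting `h0` is equivalent to passing zeros.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=10, num_layers=1)

x0 = torch.randn(12, 64, 10)   # (seq_len, batch_size, input_size)
h0 = torch.zeros(1, 64, 10)    # (num_layers * num_directions, batch_size, hidden_size)

out, h1 = rnn(x0, h0)

# out[-1] is the hidden state at the last time step, which is what h1 stores
last_step_matches = torch.allclose(out[-1], h1[0])

# when h0 is omitted, the initial hidden state defaults to zeros
out_default, _ = rnn(x0)
default_matches = torch.allclose(out, out_default)

print(last_step_matches, default_matches)  # True True
```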


In [77]:
# when batch_first = True
rnn = nn.RNN(input_size = 10, 
             hidden_size = 5, 
             num_layers = 2,     # stacked RNN (2 layers)
             batch_first = True)

In [78]:
## inputs to RNN
x0 = torch.from_numpy(np.random.randn(64, 12, 10)).float()     
# note that even when batch_first=True, the hidden state shape order does not change
h0 = torch.from_numpy(np.zeros((2, 64, 5))).float() # the RNN has 2 layers, so the first dimension of h0 is 2

print(x0.shape, h0.shape)

torch.Size([64, 12, 10]) torch.Size([2, 64, 5])


In [79]:
out, h1 = rnn(x0, h0) # the last dim of x0 must equal input_size; the last dim of h0 must equal hidden_size
print(out.shape, h1.shape)

torch.Size([64, 12, 5]) torch.Size([2, 64, 5])
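
With stacked layers, `out` contains only the top layer's hidden states, while `h1` keeps the final hidden state of every layer. A small check under the same configuration as above (the input here is freshly sampled, so values differ from the earlier cell):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=5, num_layers=2, batch_first=True)

x0 = torch.randn(64, 12, 10)   # (batch_size, seq_len, input_size)
out, h1 = rnn(x0)              # h0 omitted, so it defaults to zeros

# With batch_first=True the time axis of `out` is dimension 1.
# The last time step of `out` equals the final state of the LAST layer.
top_layer_matches = torch.allclose(out[:, -1, :], h1[-1])

print(out.shape, h1.shape)     # torch.Size([64, 12, 5]) torch.Size([2, 64, 5])
print(top_layer_matches)       # True
```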


In [85]:
# bidirectional, stacked RNN
rnn = nn.RNN(input_size = 10, 
             hidden_size = 5, 
             num_layers = 4,     
             bidirectional = True)

x0 = torch.from_numpy(np.random.randn(5, 64, 10)).float()
h0 = torch.from_numpy(np.zeros((4 * 2, 64, 5))).float()  # first dim is num_layers * num_directions = 4 * 2
out, h1 = rnn(x0, h0)

print(out.shape, h1.shape)

torch.Size([5, 64, 10]) torch.Size([8, 64, 5])
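
The last dimension of `out` is 10 because the forward and backward hidden states (5 features each) are concatenated. The sketch below splits them apart again and relates them to `h1`. It assumes the layout in which the first dimension of the final hidden state can be reshaped to `(num_layers, num_directions)`.

```python
import torch
import torch.nn as nn

num_layers, hidden_size = 4, 5
rnn = nn.RNN(input_size=10, hidden_size=hidden_size,
             num_layers=num_layers, bidirectional=True)

x0 = torch.randn(5, 64, 10)    # (seq_len, batch_size, input_size)
out, h1 = rnn(x0)

# Split the concatenated features into the two directions
fwd = out[..., :hidden_size]   # forward direction
bwd = out[..., hidden_size:]   # backward direction

# Separate the layer axis from the direction axis
h = h1.reshape(num_layers, 2, 64, hidden_size)

# Forward pass finishes at the LAST time step, backward pass at the FIRST
fwd_matches = torch.allclose(fwd[-1], h[-1, 0])
bwd_matches = torch.allclose(bwd[0], h[-1, 1])

print(fwd_matches, bwd_matches)  # True True
```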



- Key concepts for a convolution layer
    - **Kernel Size** – the size of the filter.
    - **Kernel Type** – the values of the actual filter. Some examples include identity, edge detection, and sharpen.
    - **Stride** – the rate at which the kernel passes over the input image. A stride of 2 moves the kernel in 2-pixel increments.
    - **Padding** – layers of 0s added around the image so that the kernel also passes properly over the edges of the image.
    - **Output Layers** – how many different kernels are applied to the image (the number of output channels).


- How to calculate output size of convolution operation
  <br> 
*(W - F + 2P)/S + 1* <br>
  - *W*: input size
  - *F*: kernel size
  - *P*: padding 
  - *S*: stride
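
The formula above can be written as a small helper. `conv_output_size` is a hypothetical name used only for this sketch, and integer (floor) division is an assumption about how to handle sizes that do not divide evenly by the stride.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output size along one dimension: (W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1

# 28-pixel input, 3x3 kernel, padding 1, stride 1 keeps the size unchanged
print(conv_output_size(28, 3, p=1, s=1))  # 28

# 32-pixel input, 5x5 kernel, no padding, stride 2
print(conv_output_size(32, 5, p=0, s=2))  # 14
```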
  
![alt text](http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif)
<br>
<br>
![alt_text](https://qph.fs.quoracdn.net/main-qimg-af9899617c2beedbc89c036e3b8a9e78)