# Thinking in tensors in PyTorch

Hands-on training by [Piotr Migdał](https://p.migdal.pl) (2019).

Version for [AI & NLP Workshop Day](https://nlpday.pl/), 31 May 2019, Warsaw, Poland: **Understanding LSTM and GRU networks in PyTorch**.

> Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are popular architectures for text processing. During this workshop (held in PyTorch) we will work with them in a low-level way, getting access to memory cells and intermediate states. It is targeted at people who use LSTMs/GRUs as black boxes OR who have a background in other network architectures and would like to understand natural language processing with deep learning.


## NLP & AI: 1. RNN architecture overview

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/stared/thinking-in-tensors-writing-in-pytorch/blob/master/extra/1%20RNN%20architecture%20overview.ipynb)

We use recurrent networks. For wonderful introductions, see:

* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) by Chris Olah
* [Exploring LSTMs](http://blog.echen.me/2017/05/30/exploring-lstms/) by Edwin Chen

See also:

* [Simple diagrams of convoluted neural networks](https://medium.com/inbrowserai/simple-diagrams-of-convoluted-neural-networks-39c097d2925b) by Piotr Migdał
* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy
* [Repository to track the progress in Natural Language Processing](https://github.com/sebastianruder/NLP-progress) by Sebastian Ruder
* [Memorization in RNNs](https://distill.pub/2019/memorization-in-rnns/)

And a few technical remarks:

* [Inconsistent dimension ordering for 1D networks - NCL vs NLC vs LNC](https://discuss.pytorch.org/t/inconsistent-dimension-ordering-for-1d-networks-ncl-vs-nlc-vs-lnc/14807)
* [Contiguous() and permute()](https://discuss.pytorch.org/t/contiguous-and-permute/20673)
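The ordering issue in the threads above, made concrete: Keras recurrent layers take batch-first input `(batch, timesteps, features)` (the `input_shape=(10, 26)` used below omits the batch axis), while PyTorch's `nn.LSTM` and `nn.GRU` expect `(seq_len, batch, features)` unless created with `batch_first=True`. In PyTorch the swap is `x.transpose(0, 1)` or `x.permute(1, 0, 2)`, which return views that are generally not contiguous in memory. A stdlib-only sketch of the same axis swap, with nested lists standing in for tensors (the helper name is hypothetical):

```python
def swap_batch_and_time(x):
    """Convert a nested list from (batch, time, features) to (time, batch, features).

    Pure-Python stand-in for x.transpose(0, 1) / x.permute(1, 0, 2) in PyTorch
    (unlike those, this builds a new copy instead of a view).
    """
    batch, time = len(x), len(x[0])
    return [[x[b][t] for b in range(batch)] for t in range(time)]

# 2 sequences (batch), 3 time steps, 4 features; each value encodes its indices.
x = [[[100 * b + 10 * t + f for f in range(4)] for t in range(3)] for b in range(2)]
y = swap_batch_and_time(x)

print(len(y), len(y[0]), len(y[0][0]))  # 3 2 4
print(y[2][1] == x[1][2])               # True
```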

How to think about tensors:

* [Named tensors](http://nlp.seas.harvard.edu/NamedTensor) and [Named tensors (part 2)](http://nlp.seas.harvard.edu/NamedTensor2) by Alexander Rush 
* [Matrices as Tensor Network Diagrams](https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams) by Tai-Danae Bradley
* Named tensors arrived as an experimental feature in the [PyTorch 1.3.0 release](https://github.com/pytorch/pytorch/releases/tag/v1.3.0)

### Let's do a few examples in... Keras

* [Keras or PyTorch as your first deep learning framework](https://deepsense.ai/keras-or-pytorch/) by Rafał Jakubanis and Piotr Migdał

At least for a first approach, Keras may offer an easier start:

* [Recurrent Layers - Keras](https://keras.io/layers/recurrent/)
* [François Chollet, "Deep Learning with Python"](https://www.manning.com/books/deep-learning-with-python), Chapter 6. Deep learning for text and sequences

Here we use the [keras-sequential-ascii](https://github.com/stared/keras-sequential-ascii) package to print model summaries as ASCII diagrams.



```python
# install the package, if needed
!pip install -q keras_sequential_ascii
```



```python
from keras.models import Sequential
from keras.layers import Flatten, SimpleRNN, Dense, Dropout, LSTM, GRU, Bidirectional

from keras_sequential_ascii import sequential_model_to_ascii_printout
```



### Simple Recurrent Neural Networks

![](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png)

from [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
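In the diagram each step applies a single tanh layer to the previous hidden state and the current input: h_t = tanh(W x_t + U h_{t-1} + b). A minimal pure-Python sketch of that recurrence, using nested lists instead of tensors and hypothetical toy weights `W`, `U`, `b` (not taken from any trained model):

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simple_rnn_step(x, h_prev, W, U, b):
    """One step of a simple RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""
    wx = matvec(W, x)
    uh = matvec(U, h_prev)
    return [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]

def simple_rnn(xs, W, U, b):
    """Run the recurrence over a sequence and return every hidden state."""
    h = [0.0] * len(b)  # initial hidden state: zeros
    states = []
    for x in xs:
        h = simple_rnn_step(x, h, W, U, b)  # the same W, U, b at every step
        states.append(h)
    return states

# Hypothetical toy sizes: 2 input features, 3 hidden units, 4 time steps.
W = [[0.5, -0.1], [0.2, 0.3], [-0.4, 0.1]]               # hidden x input
U = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]  # hidden x hidden
b = [0.0, 0.1, -0.1]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

states = simple_rnn(xs, W, U, b)
print(len(states), len(states[-1]))  # 4 3
```

With `return_sequences=False`, as in the Keras cell below, the layer returns only the last state (`states[-1]`); with `return_sequences=True` it returns the whole list.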

```python
model = Sequential([
    SimpleRNN(32, return_sequences=False, input_shape=(10, 26)),
    Dense(5, activation='softmax')
])

sequential_model_to_ascii_printout(model)
```

```
           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     10   26
           SimpleRNN   ????? -------------------      1888    92.0%
                tanh   #####          32
               Dense   XXXXX -------------------       165     8.0%
             softmax   #####           5
```
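The weight counts in the printout can be reproduced by hand. A SimpleRNN layer with `units` hidden units reading `input_dim` features has an input kernel (input_dim × units), a recurrent kernel (units × units) and a bias (units); the same weights are reused at every time step, so the sequence length of 10 does not appear anywhere. A stdlib-only sketch (the helper names are hypothetical):

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weight matrix + bias
    return units * (input_dim + 1)

rnn = simple_rnn_params(26, 32)
dense = dense_params(32, 5)
total = rnn + dense

print(rnn, dense)  # 1888 165
print(f"{100 * rnn / total:.1f}% {100 * dense / total:.1f}%")  # 92.0% 8.0%
```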


### Long short-term memory (LSTM)

![](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)

from [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
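Following the chain diagram, each LSTM step computes a forget gate f, an input gate i, a candidate value c̃ and an output gate o from the previous hidden state and the current input, then updates the memory cell as c_t = f·c_{t-1} + i·c̃ and emits h_t = o·tanh(c_t). Below is a scalar sketch (one input feature, one hidden unit) with hypothetical hand-picked weights, chosen so that the forget gate stays open and the input gate stays shut, which lets the cell hold a value across steps:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar input, hidden state and cell state.

    p maps each gate name to a hypothetical (w_x, w_h, bias) triple.
    """
    def pre(gate):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    c_tilde = math.tanh(pre("cell"))  # candidate value
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c = f * c_prev + i * c_tilde      # new cell state (the memory)
    h = o * math.tanh(c)              # new hidden state (the output)
    return h, c

# Hypothetical parameters: forget gate biased open, input gate biased shut.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "cell": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # start with a "remembered" value in the cell
for x in [0.3, -0.7, 0.9, 0.1]:
    h, c = lstm_step(x, h, c, params)
print(round(c, 3))  # 1.0
```

Flipping the two biases (forget shut, input open) makes the cell forget its initial value and track the recent inputs instead.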

```python
model = Sequential([
    LSTM(32, input_shape=(10, 26)),
    Dense(5, activation='softmax')
])

sequential_model_to_ascii_printout(model)
```

```
           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     10   26
                LSTM   LLLLL -------------------      7552    97.9%
                tanh   #####          32
               Dense   XXXXX -------------------       165     2.1%
             softmax   #####           5
```
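An LSTM layer contains four gate blocks (forget gate, input gate, candidate, output gate), each shaped like the whole SimpleRNN layer above, hence 7552 = 4 × 1888. PyTorch's `nn.LSTM` keeps two bias vectors per layer (`bias_ih_l0` and `bias_hh_l0`), so the same layer has a few more parameters there. A stdlib-only sketch of both counts (the helper name is hypothetical):

```python
def lstm_params(input_dim, units, n_biases=1):
    """Parameter count of one LSTM layer (4 gate blocks).

    n_biases=1 matches the Keras printout above;
    n_biases=2 matches PyTorch nn.LSTM (bias_ih and bias_hh).
    """
    gates = 4
    return gates * units * (input_dim + units + n_biases)

print(lstm_params(26, 32))              # 7552
print(lstm_params(26, 32, n_biases=2))  # 7680
```

In PyTorch the second figure can be checked with `sum(p.numel() for p in torch.nn.LSTM(26, 32).parameters())`.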


```python
model = Sequential([
    # input_shape goes on the Bidirectional wrapper, not on the wrapped LSTM
    Bidirectional(LSTM(32, return_sequences=True), input_shape=(10, 26)),
    Bidirectional(LSTM(32)),
    Dense(5, activation='softmax')
])

# sequential_model_to_ascii_printout(model)
```
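`Bidirectional` runs two independent copies of the wrapped layer, one forward and one backward in time, and by default concatenates their outputs (`merge_mode='concat'`), so each step yields 64 features instead of 32. Since the printout is not shown for this model, here is a hand count under that default, as a stdlib-only sketch (helper names are hypothetical):

```python
def lstm_params(input_dim, units):
    # 4 gate blocks, each: input kernel + recurrent kernel + one bias (Keras)
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    return units * (input_dim + 1)

units = 32
# Layer 1: two LSTMs on the 26-dim input; concatenated output has 2 * 32 = 64 features.
bi1 = 2 * lstm_params(26, units)
# Layer 2: two LSTMs reading the 64-dim concatenated sequence.
bi2 = 2 * lstm_params(2 * units, units)
# Dense layer reading the 64-dim concatenation of both final states.
out = dense_params(2 * units, 5)

print(bi1, bi2, out)  # 15104 24832 325
```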

```python
model = Sequential([
    LSTM(32, input_shape=(10, 26), return_sequences=True),
    LSTM(32),
    Dense(5, activation='softmax')
])

sequential_model_to_ascii_printout(model)
```

```
           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     10   26
                LSTM   LLLLL -------------------      7552    47.1%
                tanh   #####     10   32
                LSTM   LLLLL -------------------      8320    51.9%
                tanh   #####          32
               Dense   XXXXX -------------------       165     1.0%
             softmax   #####           5
```
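With `return_sequences=True` the first LSTM emits its hidden state at every step, giving an output of shape (10, 32) that becomes the input sequence of the second LSTM; the second one returns only its final state, shape (32,). A stdlib-only sketch that tracks these shapes and weight counts layer by layer (the `(kind, units, return_sequences)` layer description is a hypothetical format for this sketch, not a Keras API; recurrent layers use one bias per gate block, as in the printouts):

```python
def stack_summary(input_shape, layers):
    """Track output shapes and weight counts through a stack of layers.

    input_shape: (timesteps, features)
    layers: list of (kind, units, return_sequences); kind is "lstm", "gru" or "dense".
    """
    gates = {"lstm": 4, "gru": 3}
    shape = input_shape
    rows = []
    for kind, units, return_sequences in layers:
        features = shape[-1]
        if kind in gates:
            # gate blocks x (input kernel + recurrent kernel + one bias)
            n = gates[kind] * units * (features + units + 1)
            shape = (shape[0], units) if return_sequences else (units,)
        else:  # dense layer on a flat vector
            n = units * (features + 1)
            shape = (units,)
        rows.append((kind, shape, n))
    return rows

rows = stack_summary((10, 26), [("lstm", 32, True), ("lstm", 32, False), ("dense", 5, None)])
total = sum(n for _, _, n in rows)
for kind, shape, n in rows:
    print(f"{kind:>6} {str(shape):>8} {n:>6} {100 * n / total:5.1f}%")
```

Replacing `"lstm"` with `"gru"` in the layer list gives 5664 and 6240, the counts in the GRU printout further down.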


### Gated Recurrent Unit (GRU)

To some extent- 

![](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-var-GRU.png)

from [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
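In the GRU diagram there is no separate cell state: an update gate z and a reset gate r are computed from [h_{t-1}, x_t], a candidate h̃ is computed from the reset-scaled previous state, and the new state is the blend h_t = (1 - z)·h_{t-1} + z·h̃. A scalar sketch with hypothetical hand-picked weights, where the update gate is biased shut so the state is mostly carried over:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU step with scalar input and state, following the diagram above.

    p maps "update", "reset" and "candidate" to hypothetical (w_x, w_h, bias) triples.
    """
    def pre(gate, h):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h + bias

    z = sigmoid(pre("update", h_prev))                 # update gate
    r = sigmoid(pre("reset", h_prev))                  # reset gate
    h_tilde = math.tanh(pre("candidate", r * h_prev))  # candidate sees the reset state
    return (1 - z) * h_prev + z * h_tilde              # blend old state and candidate

# Hypothetical parameters: update gate biased shut, so the old state is kept.
params = {"update": (0.0, 0.0, -10.0), "reset": (0.0, 0.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
h = 0.8
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h, params)
print(round(h, 3))  # 0.8
```

Libraries differ in which side of the blend z multiplies: PyTorch's `nn.GRU` documentation writes the update as (1 - z)·n + z·h, with z on the previous state, which is equivalent up to relabeling z as 1 - z.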

```python
# GRU is a drop-in replacement for LSTM
model = Sequential([
    GRU(32, input_shape=(10, 26), return_sequences=True),
    GRU(32),
    Dense(5, activation='softmax')
])

sequential_model_to_ascii_printout(model)
```

```
           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     10   26
                 GRU   LLLLL -------------------      5664    46.9%
                tanh   #####     10   32
                 GRU   LLLLL -------------------      6240    51.7%
                tanh   #####          32
               Dense   XXXXX -------------------       165     1.4%
             softmax   #####           5
```
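A GRU has three gate blocks (update, reset, candidate) instead of the LSTM's four, so each GRU layer here has exactly 3/4 of the weights of its LSTM counterpart (5664 vs 7552, 6240 vs 8320). The printed counts correspond to one bias vector per gate block. Implementations that keep two, such as PyTorch's `nn.GRU` (`bias_ih_l0` and `bias_hh_l0`) or tf.keras `GRU` with `reset_after=True` (its default in TensorFlow 2), report slightly more. A stdlib-only sketch (helper names are hypothetical):

```python
def gru_params(input_dim, units, n_biases=1):
    """Parameter count of one GRU layer (3 gate blocks).

    n_biases=1: one bias per gate block, as in the printout above.
    n_biases=2: separate input and recurrent biases
                (PyTorch nn.GRU, or tf.keras GRU with reset_after=True).
    """
    return 3 * units * (input_dim + units + n_biases)

def lstm_params(input_dim, units):
    # 4 gate blocks with one bias each, as in the LSTM printouts
    return 4 * units * (input_dim + units + 1)

for d in (26, 32):
    print(d, gru_params(d, 32), lstm_params(d, 32), gru_params(d, 32) / lstm_params(d, 32))
# 26 5664 7552 0.75
# 32 6240 8320 0.75

print(gru_params(26, 32, n_biases=2))  # 5760
```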


If it is simpler in Keras, why bother?