# Seq2Seq Github Tutorial (README)

Compilation of relevant links:
* [Seq2Seq Github](https://github.com/farizrahman4u/seq2seq)
    * [Simple Seq2Seq](https://github.com/farizrahman4u/seq2seq/blob/master/seq2seq/models.py#L16)
* [RecurrentShop Github](https://github.com/datalogai/recurrentshop)


![](https://camo.githubusercontent.com/242210d7d0151cae91107ee63bff364a860db5dd/687474703a2f2f6936342e74696e797069632e636f6d2f333031333674652e706e67)

## Simple Seq2Seq

The model (I've replaced some code with pseudocode in [brackets] to make it readable) [[link]](https://github.com/farizrahman4u/seq2seq/blob/master/seq2seq/models.py#L64):

```python
# NOTE: [bracketed] parts are pseudocode, so this block is not runnable as-is.
def SimpleSeq2Seq(output_dim, output_length, hidden_dim=None, depth=1, dropout=0., **kwargs):
    from recurrentshop import LSTMCell, RecurrentContainer

    # Encoder: one LSTM first, then any further LSTMs, each preceded by dropout.
    encoder = RecurrentContainer([...])
    encoder.add(LSTMCell(hidden_dim, [...]))
    for _ in range([number of additional encoder LSTMs]):
        encoder.add(Dropout(dropout))
        encoder.add(LSTMCell(hidden_dim, **kwargs))

    # Decoder: dropout, optional hidden LSTMs, then a final LSTM of size output_dim.
    decoder = RecurrentContainer(decode=True, [...])
    decoder.add(Dropout([...]))

    if [want more than 1 decoding LSTM]: [add them to model with hidden_dim]
    decoder.add(LSTMCell(output_dim, **kwargs))

    return Sequential([encoder, decoder])
```

Notes:
* One allowed kwarg is `input_length`, which specifies the sequence length of the input. If it is not provided, the model appears to accept input sequences of arbitrary length.

```python
import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')
```

To make a 'deep' seq2seq model, pass the `depth` argument shown in the `SimpleSeq2Seq` signature above. It accepts either an integer or a tuple. Examples:
* `depth=3` is expanded to `depth=(3, 3)`, meaning 3 encoder LSTMs and 3 decoder LSTMs (total depth 6).
* `depth=(4, 5)` means 4 encoder LSTMs and 5 decoder LSTMs (total depth 9).
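
The int-or-tuple rule above can be sketched in a few lines. `normalize_depth` is a hypothetical helper written for illustration, not a function from the seq2seq library; it only mirrors the behaviour described in the two examples.

```python
def normalize_depth(depth):
    """Hypothetical helper: expand `depth` into an (encoder, decoder) pair."""
    if isinstance(depth, int):
        # A single integer is used for both sides.
        return (depth, depth)
    encoder_depth, decoder_depth = depth
    return (encoder_depth, decoder_depth)


for depth in (3, (4, 5)):
    enc, dec = normalize_depth(depth)
    print(f"depth={depth!r}: {enc} encoder LSTMs, {dec} decoder LSTMs, total {enc + dec}")
```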

## "Advanced" Seq2Seq Models

According to the GitHub author, the code below differs from the simple case because:
1. The hidden state of the encoder is "transferred" to the decoder.
    * No it isn't.
2. The output of the decoder at each timestep becomes the input to the decoder at the next timestep (isn't this like the definition of LSTMs and RNNs in general?)
3. The hidden state is "propagated throughout the LSTM stack".
    * Again, isn't this like the definition of an LSTM??
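
On the question raised in point 2: in a standard recurrent layer, what is carried from one timestep to the next is the hidden state, while each step's input comes from the data. Feeding the previous output back in as the next input is a separate piece of wiring. Below is a minimal sketch of the difference, assuming a plain tanh RNN instead of an LSTM and random weights. The names `step`, `run_plain`, and `run_with_feedback` are hypothetical, and this is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # input, hidden, and output sizes are kept equal so outputs can be fed back
W_x = rng.normal(scale=0.5, size=(dim, dim))  # input  -> hidden
W_h = rng.normal(scale=0.5, size=(dim, dim))  # hidden -> hidden
W_y = rng.normal(scale=0.5, size=(dim, dim))  # hidden -> output


def step(x, h):
    """One plain RNN step: returns (output, new hidden state)."""
    h_new = np.tanh(x @ W_x + h @ W_h)
    return h_new @ W_y, h_new


def run_plain(inputs, h0):
    """Plain recurrence: only the hidden state is carried between steps."""
    h, outputs = h0, []
    for x in inputs:  # every input comes from the data
        y, h = step(x, h)
        outputs.append(y)
    return np.stack(outputs)


def run_with_feedback(first_input, h0, num_steps):
    """Output feedback: the previous output also becomes the next input."""
    x, h, outputs = first_input, h0, []
    for _ in range(num_steps):
        y, h = step(x, h)
        outputs.append(y)
        x = y  # the extra wiring described in point 2
    return np.stack(outputs)


h0 = np.zeros(dim)
data = rng.normal(size=(8, dim))
plain = run_plain(data, h0)
fed_back = run_with_feedback(data[0], h0, num_steps=8)
print(plain.shape, fed_back.shape)
```

Both runs share the same first step, so their first outputs match; they diverge from the second step on, because only the feedback version replaces the data input with its own previous output.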
   

```python
import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')
```
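
To make the shapes in the `Seq2Seq(...)` call concrete, here is a small sketch that builds dummy arrays matching them. It assumes the usual Keras layout of `(batch, timesteps, features)` for both input and target, which is an assumption about this model rather than something checked against the library. The variable names are illustrative.

```python
import numpy as np

# Taken from batch_input_shape=(16, 7, 5) in the call above.
batch_size, input_length, input_dim = 16, 7, 5
# Taken from output_length=8 and output_dim=20 in the call above.
output_length, output_dim = 8, 20

# Dummy input and target batches with the assumed layout.
x = np.random.random((batch_size, input_length, input_dim))
y = np.random.random((batch_size, output_length, output_dim))
print(x.shape, y.shape)

# With the compiled model, training would then look like this (not run here):
# model.fit(x, y, batch_size=batch_size)
```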