# Convolutional Sequence to Sequence Learning

This notebook implements the model in:  
Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, 1243-1252. [arXiv:1705.03122](https://arxiv.org/abs/1705.03122).  

This model follows the *encoder-decoder* framework, with both the encoder and the decoder implemented as CNNs.  
The encoder maps the source sequence to hidden states $\{h_1, h_2, \dots, h_{T_x}\}$, one per source position. Through an *attention mechanism*, the decoder can look at all of these hidden states at *every step* of generating the target sequence, rather than relying on a single fixed-length context vector. This design further relieves the *information compression* bottleneck of fixed-length encodings.  

![Conv Seq2Seq Learning](fig/conv-seq2seq-learning.png)
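
To make the attention idea above concrete, here is a minimal sketch of dot-product attention over encoder hidden states. It is a simplified illustration, not the exact formulation from the paper or the implementation used later in this notebook. The function name `dot_product_attention` and the tensor sizes are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F


def dot_product_attention(dec_states, enc_states):
    """Attend from every decoder step to every encoder position.

    dec_states: (batch, T_y, hid_dim) decoder states, one per target step
    enc_states: (batch, T_x, hid_dim) encoder hidden states h_1 ... h_{T_x}
    """
    # scores[b, i, j] is the similarity between decoder step i and encoder position j
    scores = torch.bmm(dec_states, enc_states.transpose(1, 2))  # (batch, T_y, T_x)
    # Normalize over the source positions so that each row sums to 1
    weights = F.softmax(scores, dim=-1)
    # Each context vector is a weighted sum of all encoder hidden states
    context = torch.bmm(weights, enc_states)  # (batch, T_y, hid_dim)
    return context, weights


# Hypothetical sizes: batch of 2, source length 7, target length 5, hidden size 16
enc_states = torch.randn(2, 7, 16)
dec_states = torch.randn(2, 5, 16)
context, weights = dot_product_attention(dec_states, enc_states)
```

Because the context vector is recomputed at every decoder step from all $T_x$ hidden states, no single fixed-length vector has to summarize the whole source sequence.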

In [1]:
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

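# Fix all random seeds so that runs are as reproducible as possible
SEED = 515
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
# Make cuDNN pick deterministic algorithms and disable its autotuner
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False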

## Preparing Data