## Model Architecture
Most competitive neural sequence transduction models have an __encoder-decoder__ structure ([cite](https://arxiv.org/abs/1409.0473)). Here, the encoder maps an input sequence of symbol representations $$(x_1, \dots, x_n)$$ to a sequence of continuous representations $$z = (z_1, \dots, z_n)$$. Given $$z$$, the decoder then generates an output sequence $$(y_1, \dots, y_m)$$ of symbols one element at a time. At each step the model is auto-regressive ([cite](https://arxiv.org/abs/1308.0850)), consuming the previously generated symbols as additional input when generating the next.
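
To make the auto-regressive loop concrete, here is a minimal pure-Python sketch, assuming hypothetical callables `encode` (maps $$x$$ to $$z$$) and `next_symbol` (a decoder that picks the next output from $$z$$ and the outputs so far), plus assumed `<s>`/`</s>` start and end symbols. It is not the PyTorch `EncoderDecoder` below. It only shows the control flow: the encoder runs once, and the decoder runs once per output symbol, each time consuming everything generated so far.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not part of the original model):
#   encode(x)          -> z, one vector per input symbol
#   next_symbol(z, ys) -> the next output symbol given z and previous outputs
Encoder = Callable[[Sequence[str]], List[List[float]]]
NextSymbol = Callable[[List[List[float]], List[str]], str]


def autoregressive_generate(x: Sequence[str], encode: Encoder,
                            next_symbol: NextSymbol, start: str = "<s>",
                            end: str = "</s>", max_len: int = 20) -> List[str]:
    z = encode(x)                    # encoder runs once over the whole input
    ys = [start]                     # decoder input begins with a start symbol
    for _ in range(max_len):
        y_next = next_symbol(z, ys)  # condition on z and all previous outputs
        if y_next == end:
            break
        ys.append(y_next)            # feed the new symbol back in as input
    return ys[1:]                    # drop the start symbol


# Toy (hypothetical) components: "translate" lowercase letters to uppercase.
def toy_encode(x: Sequence[str]) -> List[List[float]]:
    return [[float(ord(c))] for c in x]   # z_i is a 1-d vector per symbol


def toy_next_symbol(z: List[List[float]], ys: List[str]) -> str:
    t = len(ys) - 1                       # number of symbols generated so far
    if t < len(z):
        return chr(int(z[t][0])).upper()
    return "</s>"


print(autoregressive_generate(["a", "b", "c"], toy_encode, toy_next_symbol))
```

Running the toy example prints `['A', 'B', 'C']`. In a real model, `next_symbol` would score every vocabulary entry and pick one (for example by argmax), and the output length $$m$$ need not equal the input length $$n$$.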

```python
import torch.nn as nn


class EncoderDecoder(nn.Module):
    """
    A standard Encoder-Decoder architecture. Base for this and many
    other models.
    """

    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder      # maps the embedded source to the memory z
        self.decoder = decoder      # processes the target while attending to z
        self.src_embed = src_embed  # source symbols -> vectors
        self.tgt_embed = tgt_embed  # target symbols -> vectors
        self.generator = generator  # decoder output -> vocabulary scores (applied outside forward)

    def forward(self, src, tgt, src_mask, tgt_mask):
        """Take in and process masked src and target sequences."""
        return self.decode(self.encode(src, src_mask), src_mask,
                           tgt, tgt_mask)

    def encode(self, src, src_mask):
        """Embed the source and run the encoder to produce the memory z."""
        return self.encoder(self.src_embed(src), src_mask)

    def decode(self, memory, src_mask, tgt, tgt_mask):
        """Embed the target and run the decoder against the memory."""
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
```
