The following additional libraries are needed to run this
notebook. Note that running on Colab is experimental; please report a GitHub
issue if you have any problems.

In [1]:
!pip install d2l==1.0.0-alpha1.post0



# Encoder-Decoder Architecture
:label:`sec_encoder-decoder`

In general seq2seq problems 
like machine translation 
(:numref:`sec_machine_translation`),
inputs and outputs are of varying lengths
that are unaligned. 
The standard approach to handling this sort of data
is to design an *encoder-decoder* architecture (:numref:`fig_encoder_decoder`)
consisting of two major components:
an *encoder* that takes a variable-length sequence as input,
and a *decoder* that acts as a conditional language model,
taking in the encoded input 
and the leftward context of the target sequence (the tokens generated so far)
and predicting the subsequent token in the target sequence. 


![The encoder-decoder architecture.](https://github.com/d2l-ai/d2l-pytorch-colab/blob/master/img/encoder-decoder.svg?raw=1)
:label:`fig_encoder_decoder`

Let's take machine translation from English to French as an example.
Given an input sequence in English:
"They", "are", "watching", ".",
this encoder-decoder architecture
first encodes the variable-length input into a state,
then decodes the state 
to generate the translated sequence,
token by token, as output:
"Ils", "regardent", ".".
Since the encoder-decoder architecture
forms the basis of different seq2seq models
in subsequent sections,
this section will convert this architecture
into an interface that will be implemented later.
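
To make the control flow concrete, here is a minimal sketch in plain Python.
A hard-coded lookup table (`TOY_TABLE`) stands in for the learned networks,
and the names `encode`, `decode_step`, `translate`, and the `<eos>` marker are
hypothetical; only the encode-then-decode-token-by-token structure
mirrors the description above.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical stand-in for a trained model: maps an encoded state to its
# full translation. A real decoder would compute each token with a network.
TOY_TABLE = {
    ("They", "are", "watching", "."): ["Ils", "regardent", "."],
}


def encode(src_tokens):
    """Turn the variable-length input into a single state (here, a tuple)."""
    return tuple(src_tokens)


def decode_step(state, generated):
    """Predict the next token from the state and the tokens generated so far."""
    target = TOY_TABLE[state]
    if len(generated) < len(target):
        return target[len(generated)]
    return EOS


def translate(src_tokens, max_len=10):
    state = encode(src_tokens)
    generated = []
    for _ in range(max_len):
        token = decode_step(state, generated)
        if token == EOS:  # the decoder decides where the output ends
            break
        generated.append(token)
    return generated


translation = translate(["They", "are", "watching", "."])
print(translation)  # ['Ils', 'regardent', '.']
```

Note that the output has three tokens while the input has four:
the loop stops when the decoder emits the end marker (or after `max_len` steps),
not when the input is exhausted.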

## Encoder

In the encoder interface,
we just specify that
the encoder takes variable-length sequences as input `X`.
The implementation will be provided 
by any model that inherits from this base `Encoder` class.


In [2]:
from torch import nn
from d2l import torch as d2l


#@save
class Encoder(nn.Module):
    """The base encoder interface for the encoder-decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def forward(self, X, *args):
        raise NotImplementedError

## Decoder

In the following decoder interface,
we add an additional `init_state` method
to convert the encoder output (`enc_outputs`)
into the encoded state.
Note that this step
may require extra inputs,
such as the valid length of the input,
which was explained
in :numref:`sec_machine_translation`.
To generate a variable-length sequence token by token,
at each time step the decoder maps an input 
(e.g., the token generated at the previous time step)
and the encoded state 
into an output token at the current time step.


In [3]:
#@save
class Decoder(nn.Module):
    """The base decoder interface for the encoder-decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def init_state(self, enc_outputs, *args):
        raise NotImplementedError

    def forward(self, X, state):
        raise NotImplementedError
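
As an illustration of how `init_state` and `forward` might be used
during generation, the sketch below defines a hypothetical `ToyDecoder`
that follows the same interface and a hypothetical `greedy_decode` helper
that feeds each predicted token back in as the next input.
The hand-written state update, the all-zero encoder output, and the choice
of index 0 as the start token are assumptions made for this example only;
the weights are untrained, so only the shapes are meaningful.

```python
import torch
from torch import nn


class ToyDecoder(nn.Module):
    """Hypothetical decoder with the same interface as `Decoder`."""
    def __init__(self, vocab_size, num_hiddens):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, num_hiddens)
        self.dense = nn.Linear(num_hiddens, vocab_size)

    def init_state(self, enc_outputs, *args):
        # Assumption: the encoder output is already a (batch, hidden) state
        return enc_outputs

    def forward(self, X, state):
        # X: (batch_size, num_steps) token indices
        outputs = []
        for t in range(X.shape[1]):
            # Mix the current input token into the running state
            state = torch.tanh(self.embedding(X[:, t]) + state)
            outputs.append(self.dense(state))
        # Scores: (batch_size, num_steps, vocab_size), plus the updated state
        return torch.stack(outputs, dim=1), state


def greedy_decode(decoder, enc_outputs, bos_idx, num_steps):
    """Generate `num_steps` tokens, one per step, picking the top score."""
    state = decoder.init_state(enc_outputs)
    batch_size = enc_outputs.shape[0]
    token = torch.full((batch_size, 1), bos_idx, dtype=torch.long)
    generated = []
    with torch.no_grad():
        for _ in range(num_steps):
            logits, state = decoder(token, state)
            token = logits.argmax(dim=2)  # (batch_size, 1)
            generated.append(token)
    return torch.cat(generated, dim=1)


decoder = ToyDecoder(vocab_size=12, num_hiddens=8)
enc_outputs = torch.zeros(2, 8)  # placeholder for a real encoder output
tokens = greedy_decode(decoder, enc_outputs, bos_idx=0, num_steps=4)
print(tokens.shape)  # torch.Size([2, 4])
```

A practical decoder would also stop early once an end-of-sequence token is produced;
the fixed `num_steps` here just keeps the sketch short.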

## Putting the Encoder and Decoder Together

In the forward propagation,
the output of the encoder
is used to produce the encoded state,
and this state will be further used
by the decoder as one of its inputs.


In [4]:
#@save
class EncoderDecoder(d2l.Classifier):
    """The base class for the encoder-decoder architecture."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, enc_X, dec_X, *args):
        enc_outputs = self.encoder(enc_X, *args)
        dec_state = self.decoder.init_state(enc_outputs, *args)
        # Return decoder output only
        return self.decoder(dec_X, dec_state)[0]
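
The following sketch exercises the same forward logic end to end with
deliberately simple components. `MeanPoolEncoder`, `StateAddDecoder`, and
`ToyEncoderDecoder` are hypothetical names introduced here, and the wrapper
subclasses plain `nn.Module` rather than `d2l.Classifier` so that the example
depends only on PyTorch. It is meant to show the shapes involved, not a model
that would translate well.

```python
import torch
from torch import nn


class MeanPoolEncoder(nn.Module):
    """Hypothetical encoder: average the token embeddings over time."""
    def __init__(self, vocab_size, num_hiddens):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, num_hiddens)

    def forward(self, X, *args):
        # X: (batch_size, num_steps) -> (batch_size, num_hiddens)
        return self.embedding(X).mean(dim=1)


class StateAddDecoder(nn.Module):
    """Hypothetical decoder: add the state to every target embedding."""
    def __init__(self, vocab_size, num_hiddens):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, num_hiddens)
        self.dense = nn.Linear(num_hiddens, vocab_size)

    def init_state(self, enc_outputs, *args):
        return enc_outputs  # the encoder output already serves as the state

    def forward(self, X, state):
        H = self.embedding(X) + state.unsqueeze(1)
        return self.dense(H), state


class ToyEncoderDecoder(nn.Module):
    """Same forward logic as `EncoderDecoder`, without the d2l base class."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, enc_X, dec_X, *args):
        enc_outputs = self.encoder(enc_X, *args)
        dec_state = self.decoder.init_state(enc_outputs, *args)
        return self.decoder(dec_X, dec_state)[0]


model = ToyEncoderDecoder(MeanPoolEncoder(10, 8), StateAddDecoder(12, 8))
short_src = torch.zeros((2, 5), dtype=torch.long)  # source length 5
long_src = torch.zeros((2, 9), dtype=torch.long)   # source length 9
dec_X = torch.zeros((2, 3), dtype=torch.long)      # target length 3

# The state has the same shape regardless of the source length
state_shapes = (model.encoder(short_src).shape, model.encoder(long_src).shape)
Y_hat = model(long_src, dec_X)
print(state_shapes, Y_hat.shape)
```

Both source lengths yield a state of shape `(2, 8)`, and the decoder output
has shape `(2, 3, 12)`: one score per target-vocabulary entry at each of the
three target positions.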

In the next section, 
we will see how to apply RNNs to design 
seq2seq models based on 
this encoder-decoder architecture.


## Summary

Encoder-decoder architectures
can handle inputs and outputs 
that both consist of variable-length sequences
and thus are suitable for seq2seq problems 
such as machine translation.
The encoder takes a variable-length sequence as input 
and transforms it into a state with a fixed shape.
The decoder maps the encoded state of a fixed shape
to a variable-length sequence.


## Exercises

1. Suppose that we use neural networks to implement the encoder-decoder architecture. Do the encoder and the decoder have to be the same type of neural network?  
1. Besides machine translation, can you think of another application where the encoder-decoder architecture can be applied?


[Discussions](https://discuss.d2l.ai/t/1061)
