
Sequence to sequence (Seq2seq)


How it Works

(Diagram: Seq2Seq model with attention)

Steps:

  1. The Encoder reads the source text and encodes it into a sequence of hidden states.
  2. The Attention Distribution tells the network which parts of the source to focus on when generating the next word.
  3. A Vocabulary Distribution is then produced: every word in the vocabulary is assigned a probability of being generated.
  4. The Decoder emits the word with the highest probability in the Vocabulary Distribution (see the sketch after this list).
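
A minimal sketch of a single decoding step, assuming NumPy, dot-product attention, and toy dimensions. The names (`encoder_states`, `decoder_state`, the tiny vocabulary) are illustrative and not part of this repository's actual code:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy setup: 4 source positions, hidden size 8, vocabulary of 5 words.
rng = np.random.default_rng(0)
vocab = ["germany", "beat", "argentina", "0-2", "<unk>"]
encoder_states = rng.normal(size=(4, 8))   # step 1: one hidden state per source token
decoder_state = rng.normal(size=(8,))      # current decoder hidden state

# Step 2: attention distribution -- where to look in the source.
scores = encoder_states @ decoder_state    # dot-product attention scores
attention_dist = softmax(scores)           # sums to 1 over source positions
context = attention_dist @ encoder_states  # weighted sum of encoder states

# Step 3: vocabulary distribution -- probability of generating each word.
W = rng.normal(size=(len(vocab), 2 * 8))
logits = W @ np.concatenate([decoder_state, context])
vocab_dist = softmax(logits)

# Step 4: greedy decoding -- pick the most probable word.
next_word = vocab[int(np.argmax(vocab_dist))]
print(attention_dist, next_word)
```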

Issues with Seq2Seq

  1. Reproduces factually incorrect details
    • In Example 1, the cause is poor word embeddings for rare words. OOV (out-of-vocabulary) words never appear in the vocabulary distribution, so the network substitutes the most probable in-vocabulary replacement. Here the true score 0-2 is OOV, and the in-vocabulary 3-2 looks like a plausible substitute (see the sketch after the examples below).
    • In Example 2, similar words are clustered together in embedding space, which leads the network to treat them as interchangeable (John for Timothy).
Example 1
  Incorrect generated summary: Germany beat Argentina 3-2
  Original summary:            Germany beat Argentina 0-2
Example 2
  Incorrect generated summary: John wrote this essay
  Original summary:            Timothy wrote this essay
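
To make the OOV substitution concrete, here is a minimal sketch. The fixed vocabulary and the `<unk>` token below are hypothetical and only for illustration:

```python
# Hypothetical fixed vocabulary; rare tokens such as "0-2" are not in it.
vocab = ["<unk>", "germany", "beat", "argentina", "3-2", "wrote", "this", "essay"]
word_to_id = {w: i for i, w in enumerate(vocab)}
UNK_ID = word_to_id["<unk>"]

def encode(words):
    """Map source words to ids; anything out of vocabulary collapses to <unk>."""
    return [word_to_id.get(w.lower(), UNK_ID) for w in words]

source = "Germany beat Argentina 0-2".split()
print(encode(source))  # [1, 2, 3, 0] -- the true score "0-2" is lost before decoding

# Because the vocabulary distribution only covers in-vocabulary words, the decoder
# can never output "0-2"; a similar-looking in-vocabulary token such as "3-2"
# may receive the highest probability instead, yielding a factually wrong summary.
```
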
  2. Tendency to repeat itself
    • Caused by the decoder’s over-reliance on its immediate input (the previously generated summary word) rather than on the longer-term information held in the decoder state (see the toy sketch below the example).
    • The problem is common in RNN-based decoders.
    • The true cause is difficult to pin down because of the black-box nature of neural networks.
Example 1
  Incorrect generated summary: Germany beat Germany beat Germany beat ...
  Original summary:            Germany beat ...
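
A toy illustration of why conditioning mostly on the previous word can lock the decoder into a loop. The transition table is made up purely for demonstration and is not how the real decoder works:

```python
# Hypothetical next-word preferences that depend only on the previous word,
# mimicking a decoder that ignores the longer-term decoder state.
next_word = {
    "<start>": "germany",
    "germany": "beat",
    "beat": "germany",   # with no memory of what was already said, a cycle begins
}

word, summary = "<start>", []
for _ in range(8):
    word = next_word[word]
    summary.append(word)

print(" ".join(summary))  # germany beat germany beat germany beat germany beat
```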

Fortunately, the Pointer-Generator model attempts to solve both of these issues.

Running the Model

Please refer to the Pointer-Generator's section on 'Running the Model'.

Other Resources & Dependencies