
Sequence to sequence (Seq2seq)


How it Works

(Diagram: Seq2Seq model with attention)

Steps:

  1. The Encoder reads the source text and encodes it into a sequence of hidden states.
  2. The Attention Distribution tells the network which parts of the source to focus on when generating the next word.
  3. A Vocabulary Distribution is then produced: every word in the vocabulary is assigned a probability of being generated.
  4. The Decoder emits the word with the highest probability in the Vocabulary Distribution (see the sketch after this list).
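
A minimal sketch of a single decoding step, assuming NumPy, dot-product attention, and toy dimensions. The names (`encoder_states`, `decoder_state`, the tiny vocabulary) are illustrative and not part of this repository's actual code:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy setup: 4 source positions, hidden size 8, vocabulary of 5 words.
rng = np.random.default_rng(0)
vocab = ["germany", "beat", "argentina", "0-2", "<unk>"]
encoder_states = rng.normal(size=(4, 8))   # step 1: one hidden state per source token
decoder_state = rng.normal(size=(8,))      # current decoder hidden state

# Step 2: attention distribution -- where to look in the source.
scores = encoder_states @ decoder_state    # dot-product attention scores
attention_dist = softmax(scores)           # sums to 1 over source positions
context = attention_dist @ encoder_states  # weighted sum of encoder states

# Step 3: vocabulary distribution -- probability of generating each word.
W = rng.normal(size=(len(vocab), 2 * 8))
logits = W @ np.concatenate([decoder_state, context])
vocab_dist = softmax(logits)

# Step 4: greedy decoding -- pick the most probable word.
next_word = vocab[int(np.argmax(vocab_dist))]
print(attention_dist, next_word)
```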

Issues with Seq2Seq

  1. Reproduces factually incorrect details
    • In Example 1, the cause is poor word embeddings for rare words. OOV (out-of-vocabulary) words never appear in the vocabulary distribution, so the network substitutes the most probable in-vocabulary replacement. Here the true score 0-2 is OOV, and the in-vocabulary 3-2 looks like a plausible substitute (see the sketch after the examples below).
    • In Example 2, similar words are clustered together in embedding space, which leads the network to treat them as interchangeable (John for Timothy).
Example 1
  Incorrect generated summary: Germany beat Argentina 3-2
  Original summary:            Germany beat Argentina 0-2
Example 2
  Incorrect generated summary: John wrote this essay
  Original summary:            Timothy wrote this essay
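
To make the OOV substitution concrete, here is a minimal sketch. The fixed vocabulary and the `<unk>` token below are hypothetical and only for illustration:

```python
# Hypothetical fixed vocabulary; rare tokens such as "0-2" are not in it.
vocab = ["<unk>", "germany", "beat", "argentina", "3-2", "wrote", "this", "essay"]
word_to_id = {w: i for i, w in enumerate(vocab)}
UNK_ID = word_to_id["<unk>"]

def encode(words):
    """Map source words to ids; anything out of vocabulary collapses to <unk>."""
    return [word_to_id.get(w.lower(), UNK_ID) for w in words]

source = "Germany beat Argentina 0-2".split()
print(encode(source))  # [1, 2, 3, 0] -- the true score "0-2" is lost before decoding

# Because the vocabulary distribution only covers in-vocabulary words, the decoder
# can never output "0-2"; a similar-looking in-vocabulary token such as "3-2"
# may receive the highest probability instead, yielding a factually wrong summary.
```
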
  2. Tendency to repeat itself
    • Caused by the decoder’s over-reliance on its immediate input (the previously generated summary word) rather than on the longer-term information held in the decoder state (see the toy sketch below the example).
    • The problem is common in RNN-based decoders.
    • The true cause is difficult to pin down because of the black-box nature of neural networks.
Example 1
  Incorrect generated summary: Germany beat Germany beat Germany beat ...
  Original summary:            Germany beat ...
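
A toy illustration of why conditioning mostly on the previous word can lock the decoder into a loop. The transition table is made up purely for demonstration and is not how the real decoder works:

```python
# Hypothetical next-word preferences that depend only on the previous word,
# mimicking a decoder that ignores the longer-term decoder state.
next_word = {
    "<start>": "germany",
    "germany": "beat",
    "beat": "germany",   # with no memory of what was already said, a cycle begins
}

word, summary = "<start>", []
for _ in range(8):
    word = next_word[word]
    summary.append(word)

print(" ".join(summary))  # germany beat germany beat germany beat germany beat
```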

Fortunately, the Pointer-Generator model attempts to solve both of these issues.

Running the Model

Please refer to the Pointer-Generator's section on 'Running the Model'.

Other Resources & Dependencies