TLDR; The authors propose a novel encoder-decoder neural network architecture. The encoder RNN encodes a sequence into a fixed-length vector representation, and the decoder generates a new variable-length sequence conditioned on this representation. The authors also introduce a new cell type (now called the GRU) to be used with this architecture. The model is evaluated on a statistical machine translation task, where its phrase-pair scores are used as an additional feature in a log-linear translation model and lead to improved BLEU scores. The authors also find that the model learns syntactically and semantically meaningful representations of both words and phrases.
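
A minimal sketch of the encoder-decoder idea (illustrative only: it uses a plain tanh recurrence instead of the paper's gated unit, and the parameter names W, U, C and shapes are assumptions, not the paper's exact parameterization):

```python
import numpy as np

def rnn_encoder(xs, W, U, hidden_dim):
    """Compress a variable-length list of input embeddings xs into one fixed-length vector c."""
    h = np.zeros(hidden_dim)
    for x in xs:
        h = np.tanh(W @ x + U @ h)   # plain tanh recurrence for illustration
    return h                         # the fixed-length summary ("thought vector") c

def rnn_decoder_step(y_prev, h_prev, c, W, U, C):
    """One decoder step: conditioned on the previous output, its own state, and c at every step."""
    return np.tanh(W @ y_prev + U @ h_prev + C @ c)
```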

Key Points:

  • New encoder-decoder architecture (seq2seq). Decoder conditioned on the fixed-length thought vector produced by the encoder.
  • Architecture can be used for both scoring and generation.
  • New hidden unit type, now called the GRU: a simplified LSTM (see the sketch after this list).
  • The whole SMT pipeline could in principle be replaced by this architecture, but this paper doesn't do that.
  • 15k vocabulary (covering 93% of the dataset), 100d embeddings, 500 maxout units in the final affine layer, batch size of 64, Adagrad, 384M words, ~3 days of training.
  • The architecture is trained without frequency information, so it is expected to capture linguistic rather than purely statistical regularities.
  • Visualizations of both word embeddings and thought vectors.
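
A minimal NumPy sketch of the GRU-style unit (reset gate r, update gate z, candidate state, then a linear interpolation between the old state and the candidate). The weight names (Wz, Uz, ...) and the omission of bias terms are simplifications for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One step of the gated unit: two gates instead of the LSTM's three, and no separate cell state."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate: how much of the old state to keep
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate: how much of the old state feeds the candidate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate hidden state
    return z * h_prev + (1.0 - z) * h_tilde        # interpolate between old state and candidate
```

Note that later libraries sometimes swap the roles of z and (1 - z) in the final interpolation; the convention above follows the original presentation.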

Questions/Notes

  • Why not just use LSTM units?