Non-Monotonic Sequential Text Generation #121

kweonwooj opened this issue Feb 10, 2019 · 0 comments

Abstract

  • proposes a framework for training text-generation models in non-monotonic orders
    • tokens are generated into a binary-tree structure
  • learning is framed as imitation learning
  • achieves performance competitive with conventional left-to-right generation
    • tasks: language modeling, sentence completion, word reordering and machine translation

Details

Non-Monotonic Generation as Binary-Tree

  • an example generation from the proposed approach
    • generation can start from any token
    • the number in the green box is the generation order
    • the number in the blue box is the reconstruction order
  • conventional left-to-right generation is a special case: a binary tree that only branches to the right
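The binary-tree view above can be sketched in a few lines; the tree shape and tokens below are invented for illustration (they are not from the paper), and an in-order traversal recovers the final sentence:

```python
# Sketch (not the authors' code): a sentence generated as a binary tree
# is read off with an in-order traversal.

class Node:
    def __init__(self, token, left=None, right=None):
        self.token = token
        self.left = left
        self.right = right

def in_order(node):
    """Recover the final token sequence from the binary tree."""
    if node is None:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)

# Generation could have started anywhere, e.g. from the verb "are":
#        are
#       /    \
#    how      you
#                \
#                 ?
tree = Node("are", Node("how"), Node("you", None, Node("?")))
print(in_order(tree))  # ['how', 'are', 'you', '?']
```

Left-to-right generation corresponds to the degenerate tree in which every node has only a right child.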

[Figure: example generation as a binary tree, with generation order (green) and reconstruction order (blue)]

Learning for Non-Monotonic Generation

  • an imitation-learning framework: an oracle policy provides a valid distribution over token choices, and the model is trained to match it via a KL-divergence loss

L(θ) = E_s [ KL( π*(·|s) ‖ π_θ(·|s) ) ]   (expectation over states visited during roll-in)

  • Oracle policy is defined by

π*(a|s) ∝ p_a · 1[a is a valid token in state s]

  • where we have a choice for p_a:
    • uniform oracle: a uniform distribution over valid tokens (does not lead to optimal quality)
    • coaching oracle: the product of the uniform oracle and the current policy, renormalized

π*_coaching(a|s) ∝ π*_uniform(a|s) · π_θ(a|s)

  • annealed coaching oracle: a linear interpolation of the uniform and coaching oracles, providing variety during learning

π*_annealed(a|s) = β · π*_uniform(a|s) + (1 − β) · π*_coaching(a|s)

  • in imitation learning, the roll-in policy is usually a stochastic mixture of the learned model and the oracle policy, but here simply rolling in with the oracle policy throughout performs better
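The three oracles described above can be sketched with numpy as follows. The vocabulary size, the valid-token mask, the policy probabilities, and the mixing weight `beta` are hypothetical placeholders; in the paper the valid set comes from the remaining subsequence of the ground-truth sentence.

```python
# Sketch of the oracle distributions and the per-step KL imitation loss.
# All concrete numbers here are invented for illustration.
import numpy as np

def uniform_oracle(valid_mask):
    """Uniform distribution over currently valid tokens."""
    p = valid_mask.astype(float)
    return p / p.sum()

def coaching_oracle(valid_mask, policy_probs):
    """Uniform oracle multiplied by the current policy, renormalized."""
    p = uniform_oracle(valid_mask) * policy_probs
    return p / p.sum()

def annealed_oracle(valid_mask, policy_probs, beta):
    """Linear interpolation of the uniform and coaching oracles."""
    return (beta * uniform_oracle(valid_mask)
            + (1.0 - beta) * coaching_oracle(valid_mask, policy_probs))

def kl_loss(oracle_probs, policy_probs, eps=1e-12):
    """KL(oracle || policy): the per-step imitation-learning loss."""
    return float(np.sum(oracle_probs * (np.log(oracle_probs + eps)
                                        - np.log(policy_probs + eps))))

valid = np.array([1, 0, 1, 1, 0])               # 3 of 5 tokens are valid
policy = np.array([0.5, 0.1, 0.2, 0.1, 0.1])    # current model distribution
oracle = annealed_oracle(valid, policy, beta=0.5)
loss = kl_loss(oracle, policy)
```

Note that every oracle places zero mass on invalid tokens, so the KL loss only pushes the policy toward valid choices.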

Experiments

Language Model

  • Dataset: Persona-Chat with 133k / 16k / 15k utterances (train / valid / test)
  • Model: 2-layer unidirectional LSTM
  • the non-monotonic (annealed) LM produced more diverse (unique and novel) sentences, with an average span of 1.3–1.4 (span = average number of child nodes)

[Table: language-modeling results, including diversity and span statistics]

  • POS-tag analysis yields interesting insights
    • non-monotonic (annealed) tends to generate tokens in the order PUNCT > PNOUN > VERB > NOUN
    • left-to-right generates in the order PNOUN > VERB > NOUN > PUNCT

[Figure: average generation order by POS tag]

Sentence Completion

  • non-monotonic generation opens up a new spectrum of sentence completion, since generation can take place anywhere in the sentence
    • left-to-right models can only complete a sentence to the right of a given prefix

[Figure: sentence-completion examples]

Machine Translation

  • Dataset: IWSLT16 De→En with 196k training pairs; TED tst2013 for validation, TED tst2014 for test
  • Model: 1-layer bidirectional LSTM
  • end-tuning: since the end token is frequent in training, the model over-produces it at inference; its p_a value is tuned down on the validation set
  • BLEU is 7–8 points lower than left-to-right, driven by a drop in 4-gram precision (1- and 2-gram precisions are higher; 3-gram is comparable)
  • the gap is smaller on other metrics, but results are still below left-to-right
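The end-tuning trick amounts to scaling down the probability mass on the end token and renormalizing. A minimal sketch, where the scaling factor and the example distribution are hypothetical (the real factor is chosen on the validation set):

```python
# Sketch of end-token tuning: damp the over-produced <end> token.
import numpy as np

def tune_end(probs, end_idx, factor):
    """Scale the <end> probability by `factor`, then renormalize."""
    p = probs.copy()
    p[end_idx] *= factor
    return p / p.sum()

probs = np.array([0.6, 0.25, 0.15])   # index 0 = <end>, over-produced
tuned = tune_end(probs, end_idx=0, factor=0.1)
# mass on <end> drops from 0.60 to 0.06/0.46 ≈ 0.13
```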

[Table: machine-translation results (BLEU and other metrics)]

Personal Thoughts

  • left-to-right ordering seems to be a good inductive bias for generation, which is why there is a large gap in the quantitative machine-translation results
  • generating tokens in non-monotonic order is far from human intuition, but a VERY interesting idea
  • what is the potential gain of generating machine-translation outputs in non-monotonic order?
    • the idea is interesting, but it seems to make the problem harder to learn: the model now has to handle all combinatorial orderings of sentence generation

Link: https://arxiv.org/pdf/1902.02192.pdf
Authors: Welleck et al., 2019
