Non-Monotonic Sequential Text Generation #121

kweonwooj opened this issue Feb 10, 2019 · 0 comments

Abstract

  • proposes a framework for training text-generation models in non-monotonic orders
    • tokens are generated into a binary-tree structure
  • learning is framed as imitation learning
  • achieves performance competitive with conventional left-to-right generation
    • tasks: language modeling, sentence completion, word reordering and machine translation

Details

Non-Monotonic Generation as Binary-Tree

  • an example generation from the proposed approach
    • generation can start from any token
    • the number in the green box is the generation order
    • the number in the blue box is the reconstruction order
  • conventional left-to-right generation is a special case: a binary tree that only branches to the right
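The binary-tree view above can be sketched in a few lines; the tree shape and tokens below are invented for illustration (they are not from the paper), and an in-order traversal recovers the final sentence:

```python
# Sketch (not the authors' code): a sentence generated as a binary tree
# is read off with an in-order traversal.

class Node:
    def __init__(self, token, left=None, right=None):
        self.token = token
        self.left = left
        self.right = right

def in_order(node):
    """Recover the final token sequence from the binary tree."""
    if node is None:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)

# Generation could have started anywhere, e.g. from the verb "are":
#        are
#       /    \
#    how      you
#                \
#                 ?
tree = Node("are", Node("how"), Node("you", None, Node("?")))
print(in_order(tree))  # ['how', 'are', 'you', '?']
```

Left-to-right generation corresponds to the degenerate tree in which every node has only a right child.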

[Figure: example generation as a binary tree, with generation order (green) and reconstruction order (blue)]

Learning for Non-Monotonic Generation

  • an imitation-learning framework: an oracle policy provides a valid distribution over token choices, and the model is trained to match it via a KL-divergence loss

L(θ) = E_s [ KL( π*(·|s) ‖ π_θ(·|s) ) ]   (expectation over states visited during roll-in)

  • Oracle policy is defined by

π*(a|s) ∝ p_a · 1[a is a valid token in state s]

  • where we have a choice for p_a:
    • uniform oracle: a uniform distribution over valid tokens (does not lead to optimal quality)
    • coaching oracle: the product of the uniform oracle and the current policy, renormalized

π*_coaching(a|s) ∝ π*_uniform(a|s) · π_θ(a|s)

  • annealed coaching oracle: a linear interpolation of the uniform and coaching oracles, providing variety during learning

π*_annealed(a|s) = β · π*_uniform(a|s) + (1 − β) · π*_coaching(a|s)

  • in imitation learning, the roll-in policy is usually a stochastic mixture of the learned model and the oracle policy, but here simply rolling in with the oracle policy throughout performs better
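The three oracles described above can be sketched with numpy as follows. The vocabulary size, the valid-token mask, the policy probabilities, and the mixing weight `beta` are hypothetical placeholders; in the paper the valid set comes from the remaining subsequence of the ground-truth sentence.

```python
# Sketch of the oracle distributions and the per-step KL imitation loss.
# All concrete numbers here are invented for illustration.
import numpy as np

def uniform_oracle(valid_mask):
    """Uniform distribution over currently valid tokens."""
    p = valid_mask.astype(float)
    return p / p.sum()

def coaching_oracle(valid_mask, policy_probs):
    """Uniform oracle multiplied by the current policy, renormalized."""
    p = uniform_oracle(valid_mask) * policy_probs
    return p / p.sum()

def annealed_oracle(valid_mask, policy_probs, beta):
    """Linear interpolation of the uniform and coaching oracles."""
    return (beta * uniform_oracle(valid_mask)
            + (1.0 - beta) * coaching_oracle(valid_mask, policy_probs))

def kl_loss(oracle_probs, policy_probs, eps=1e-12):
    """KL(oracle || policy): the per-step imitation-learning loss."""
    return float(np.sum(oracle_probs * (np.log(oracle_probs + eps)
                                        - np.log(policy_probs + eps))))

valid = np.array([1, 0, 1, 1, 0])               # 3 of 5 tokens are valid
policy = np.array([0.5, 0.1, 0.2, 0.1, 0.1])    # current model distribution
oracle = annealed_oracle(valid, policy, beta=0.5)
loss = kl_loss(oracle, policy)
```

Note that every oracle places zero mass on invalid tokens, so the KL loss only pushes the policy toward valid choices.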

Experiments

Language Model

  • Dataset: Persona-Chat with 133k / 16k / 15k utterances (train / valid / test)
  • Model: 2-layer unidirectional LSTM
  • the non-monotonic (annealed) LM produced more diverse (unique and novel) sentences, with an average span of 1.3–1.4 (span = average number of child nodes)

[Table: language-modeling results, including diversity and span statistics]

  • POS-tag analysis yields interesting insights
    • non-monotonic (annealed) tends to generate tokens in the order PUNCT > PNOUN > VERB > NOUN
    • left-to-right generates in the order PNOUN > VERB > NOUN > PUNCT

[Figure: average generation order by POS tag]

Sentence Completion

  • non-monotonic generation opens up a new spectrum of sentence completion, since generation can take place anywhere in the sentence
    • left-to-right models can only complete a sentence to the right of a given prefix

[Figure: sentence-completion examples]

Machine Translation

  • Dataset: IWSLT16 De→En with 196k training pairs; TED tst2013 for validation, TED tst2014 for test
  • Model: 1-layer bidirectional LSTM
  • end-tuning: since the end token is frequent in training, the model over-produces it at inference; its p_a value is tuned down on the validation set
  • BLEU is 7–8 points lower than left-to-right, driven by a drop in 4-gram precision (1- and 2-gram precisions are higher; 3-gram is comparable)
  • the gap is smaller on other metrics, but results are still below left-to-right
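The end-tuning trick amounts to scaling down the probability mass on the end token and renormalizing. A minimal sketch, where the scaling factor and the example distribution are hypothetical (the real factor is chosen on the validation set):

```python
# Sketch of end-token tuning: damp the over-produced <end> token.
import numpy as np

def tune_end(probs, end_idx, factor):
    """Scale the <end> probability by `factor`, then renormalize."""
    p = probs.copy()
    p[end_idx] *= factor
    return p / p.sum()

probs = np.array([0.6, 0.25, 0.15])   # index 0 = <end>, over-produced
tuned = tune_end(probs, end_idx=0, factor=0.1)
# mass on <end> drops from 0.60 to 0.06/0.46 ≈ 0.13
```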

[Table: machine-translation results (BLEU and other metrics)]

Personal Thoughts

  • left-to-right ordering seems to be a good inductive bias for generation, which is why there is a large gap in the quantitative machine-translation results
  • generating tokens in non-monotonic order is far from human intuition, but a VERY interesting idea
  • what is the potential gain of generating machine-translation outputs in non-monotonic order?
    • the idea is interesting, but it seems to make the problem harder to learn: the model now has to handle all combinatorial orderings of sentence generation

Link: https://arxiv.org/pdf/1902.02192.pdf
Authors: Welleck et al., 2019
