A Hybrid Convolutional Variational Autoencoder for Text Generation #15

flrngel commented Apr 1, 2018

https://arxiv.org/abs/1702.02390

Abstract

  • the model combines an RNN with a feed-forward convolutional/deconvolutional architecture
  • the authors claim the model
    • converges faster during training
    • handles long sequences well
    • avoids the major difficulties VAEs face on textual data

1. Introduction

  • the authors claim this is
    • the first work to apply deconvolutions in a latent-variable generative model of natural text
  • the paper discusses
    • the optimization difficulties of VAEs for text
      • and proposes effective ways to address them

2. Related Work

  • techniques to improve VAE training
    • KL-term annealing and input dropout (both sketched after this list)
    • imposing structured sparsity on latent variables
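
Both tricks are straightforward to implement; a minimal PyTorch sketch (my own illustration, not the paper's code; the names `kl_weight`, `word_dropout` and the default values are assumptions):

```python
import torch

def kl_weight(step: int, anneal_steps: int = 10000) -> float:
    """Linearly anneal the KL-term weight from 0 to 1 over `anneal_steps` steps."""
    return min(1.0, step / anneal_steps)

def word_dropout(tokens: torch.Tensor, unk_id: int, p: float = 0.3) -> torch.Tensor:
    """Randomly replace decoder-input tokens with <unk> so the decoder cannot
    rely solely on its autoregressive history (the input-dropout trick)."""
    mask = torch.rand(tokens.shape, device=tokens.device) < p
    return tokens.masked_fill(mask, unk_id)
```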

3. Model

  • the model is designed so that the latent variables capture enough information to sample realistic sentences from the latent space

3.1. Variational Autoencoder

  • the VAE forces the model to map an input to a region of the latent space rather than to a single point (a minimal sketch of the objective follows)
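
To make "map to a region" concrete, here is a minimal PyTorch sketch of the VAE objective with the reparameterization trick (illustrative only; `decode` stands for any decoder returning per-token logits):

```python
import torch
import torch.nn.functional as F

def vae_objective(mu, logvar, decode, targets):
    """ELBO-style loss: reconstruction + KL(q(z|x) || N(0, I)).
    mu, logvar: encoder outputs of shape (batch, latent_dim)
    decode(z): returns logits of shape (batch, seq_len, vocab_size)
    targets:   token ids of shape (batch, seq_len)"""
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)                        # reparameterization trick
    logits = decode(z)
    rec = F.cross_entropy(logits.transpose(1, 2), targets)      # reconstruction term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()  # KL term
    return rec + kl
```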

3.2. Deconvolutional Networks

  • the deconvolutional layer's goal
    • perform an inverse convolution operation, increasing the spatial size of the input while decreasing the number of feature maps
  • the deconvolutional layer's benefits (rough sketch after this list)
    • fully parallel, efficient GPU implementation
    • feed-forward networks are typically easier to optimize than their recurrent counterparts
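
A rough sketch of such an expanding, fully feed-forward decoder built from transposed 1-D convolutions (layer sizes and kernel settings are made up for illustration, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DeconvDecoder(nn.Module):
    """Stack of transposed convolutions: each layer increases the spatial
    (sequence) length while decreasing the number of feature maps."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, 512, kernel_size=4, stride=2), nn.ReLU(),
            nn.ConvTranspose1d(512, 256, kernel_size=4, stride=2), nn.ReLU(),
            nn.ConvTranspose1d(256, 128, kernel_size=4, stride=2), nn.ReLU(),
        )

    def forward(self, z):          # z: (batch, latent_dim)
        h = z.unsqueeze(-1)        # treat the latent vector as a length-1 sequence
        return self.net(h)         # (batch, 128, seq_len), computed fully in parallel
```

Projecting these features onto the vocabulary would give a historyless decoder; in the hybrid model below they are instead fed to an RNN.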

3.3. Hybrid Convolutional-Recurrent VAE (paper model)

  • VAE + RNN architecture
  • the RNN consumes the deconvolutional decoder's output so that generation depends on previous outputs, i.e. p(x|z) = ∏_t p(x_t | x_<t, z) instead of p(x|z) = ∏_t p(x_t | z) (a minimal sketch follows this list)
  • the aim is to encode every detail of a text fragment, not only high-level features (like semantics)
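
A minimal sketch of how the recurrent layer could combine the deconvolutional features with the previously generated words (my reading of the architecture, with made-up layer sizes):

```python
import torch
import torch.nn as nn

class HybridDecoder(nn.Module):
    """At each step the LSTM reads the deconvolutional feature for that
    position concatenated with the embedding of the previous word, so the
    output distribution is p(x_t | x_<t, z) rather than p(x_t | z)."""
    def __init__(self, vocab_size=10000, emb_dim=256, deconv_dim=128, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + deconv_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, deconv_feats, prev_tokens):
        # deconv_feats: (batch, seq_len, deconv_dim), e.g. DeconvDecoder output transposed
        # prev_tokens:  (batch, seq_len), ground-truth words shifted right (teacher forcing)
        inp = torch.cat([deconv_feats, self.embed(prev_tokens)], dim=-1)
        h, _ = self.rnn(inp)
        return self.out(h)         # (batch, seq_len, vocab_size) logits
```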

3.4. Optimization Difficulties

  • input dropout helped
  • add an auxiliary reconstruction term computed from the activations of the last deconvolutional layer, J_aux = E_{q(z|x)}[log p_aux(x|z)]
  • the final cost is then J = J_vae + α · J_aux (sketch below)
  • the autoregressive part reuses these deconvolutional features
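
Putting the pieces together, the combined objective could be computed roughly as below (a sketch under my assumptions: `rnn_logits` come from the full hybrid decoder, `aux_logits` from a vocabulary projection of the last deconvolutional layer; `alpha` and the KL weight are placeholders, not the paper's values):

```python
import torch.nn.functional as F

def hybrid_loss(rnn_logits, aux_logits, targets, mu, logvar, alpha=0.2, kl_w=1.0):
    """J = J_vae + alpha * J_aux, where J_aux is the reconstruction loss of the
    purely deconvolutional (historyless) part of the decoder."""
    rec = F.cross_entropy(rnn_logits.transpose(1, 2), targets)      # main reconstruction
    aux = F.cross_entropy(aux_logits.transpose(1, 2), targets)      # auxiliary reconstruction
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()      # KL(q(z|x) || N(0, I))
    return rec + kl_w * kl + alpha * aux
```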

4. Experiments

4.1. Comparison with LSTM VAE

Historyless decoding

  • the paper's model with historyless decoding performed better
  • and was computationally faster (by a factor of 2)

Decoding with history

  • the paper checks
    • whether historyless decoding generalizes well
    • how well the model makes use of the latent variable
  • the paper claims their model does not fail on long texts

4.2. Controlling the KL term

Aux cost weight

  • using input dropout increases the final loss, but this is a trade-off
  • note that the model finds non-trivial latent vectors when α is large enough

Receptive field

  • the goal is to study the relationship between KL-term values and the expressiveness of the decoder
    • the RNN decoder in the LSTM VAE completely ignores the information in the latent vector
  • the auxiliary reconstruction term helps, as shown in Figure 6 of the paper