questions about LSTM word language model example #103

Closed
cloudygoose opened this issue Mar 9, 2017 · 3 comments

@cloudygoose

Hello,
Question 1: In my understanding of the example, when we predict the first word of a sentence, the hidden state still contains information from the previous sentence. This leads to an unfair comparison (in fact, many LM examples in well-known toolkits ignore this as well, perhaps because the developers are not LM people); for instance, standard N-gram evaluation does not use words from the previous sentence.
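For context, here is a minimal sketch of why this happens, with illustrative names and sizes only (loosely following the word_language_model training loop, not copied from it): the final hidden state of one chunk is reused, after being detached from the graph, as the initial hidden state of the next chunk, so the first word of a sentence is conditioned on the tail of whatever came before it.

```python
import torch
import torch.nn as nn

# Illustrative sizes only, not the example's defaults.
emb_size, hidden_size, bptt, batch_size = 200, 200, 35, 20

model = nn.LSTM(emb_size, hidden_size)   # stand-in for the example's full RNNModel

def repackage_hidden(h):
    """Detach the hidden state from the previous chunk's graph, keeping its values."""
    return tuple(v.detach() for v in h)

hidden = (torch.zeros(1, batch_size, hidden_size),
          torch.zeros(1, batch_size, hidden_size))

# Fake pre-embedded bptt-length chunks of a contiguous corpus, just to make the loop run.
corpus_chunks = [torch.randn(bptt, batch_size, emb_size) for _ in range(3)]

for chunk in corpus_chunks:
    hidden = repackage_hidden(hidden)       # gradient flow stops here between chunks...
    output, hidden = model(chunk, hidden)   # ...but the hidden *values* carry over, so a
                                            # sentence's first word still sees the previous sentence
```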

Question 2: In my understanding of the example, suppose sentence1 is [x1 x2 x3 x4 x5] and sentence2 is [y1 y2 y3 y4 y5]; then the first minibatch could be [x1 x2 x3], the second [x4 x5 y1], and so on.
However, according to the paper "Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch", a better way is to arrange the minibatch as [(x1 y1) (x2 y2) (x3 y3)], so that more parallelism is possible.
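To make the difference concrete, here is a rough sketch of the two layouts. The contiguous slicing only loosely mirrors the example's behaviour with batch size 1, and the spliced layout is my reading of the paper's scheme; none of this code comes from the example itself.

```python
sentence1 = ["x1", "x2", "x3", "x4", "x5"]
sentence2 = ["y1", "y2", "y3", "y4", "y5"]
stream = sentence1 + sentence2     # the example treats the corpus as one flat token stream

bptt = 3

# Contiguous slicing (example-style, batch size 1): consecutive bptt-sized pieces,
# so a minibatch can straddle a sentence boundary, e.g. ['x4', 'x5', 'y1'].
contiguous_batches = [stream[i:i + bptt] for i in range(0, len(stream) - 1, bptt)]

# Spliced-sentence-bunch layout: one sentence per row with time steps aligned,
# so step t processes the t-th word of every sentence in parallel.
spliced_batches = list(zip(sentence1, sentence2))

print(contiguous_batches)   # [['x1', 'x2', 'x3'], ['x4', 'x5', 'y1'], ['y2', 'y3', 'y4']]
print(spliced_batches)      # [('x1', 'y1'), ('x2', 'y2'), ..., ('x5', 'y5')]
```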

Please correct me if my understanding is wrong. Thanks!

@apaszke
Contributor

apaszke commented Mar 9, 2017

  1. Yes, that's true, but if you think it's unfair to compare that with other methods, you can modify it. It's meant to be an example and I think it makes perfect sense to keep the hidden state from the previous sentence, as it will likely increase the quality of generated text.

  2. I don't understand the suggestion; it seems you've just doubled the batch size, so it's not exactly equivalent. Of course, larger batches lead to increased parallelism.

Please use issues for problem reports and feature requests. If you want to discuss anything or have a question, post on our forums.

@apaszke apaszke closed this as completed Mar 9, 2017
@cloudygoose
Author

  1. I don't know how to modify it; is there a way to block the gradient flow at the beginning of a sentence? (One option is sketched below.)
  2. I recommend reading the paper "Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch".
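One way to do what item 1 asks, blocking the gradient and optionally forgetting the previous sentence, is sketched below. This is not part of the example; `reset_at_sentence_start` and the `sentence_start` mask are hypothetical names used only for illustration.

```python
import torch

def reset_at_sentence_start(hidden, sentence_start):
    """hidden: (h, c) from nn.LSTM, each of shape (num_layers, batch, hidden_size).
    sentence_start: bool tensor of shape (batch,), True where a column begins a new sentence.

    detach() blocks gradient flow into earlier chunks; multiplying by the mask
    additionally zeroes the state so a new sentence starts from a blank context.
    """
    keep = (~sentence_start).float().view(1, -1, 1)   # broadcast over layers and hidden units
    return tuple(v.detach() * keep for v in hidden)

# Tiny usage example with made-up shapes.
h = torch.randn(1, 4, 8)
c = torch.randn(1, 4, 8)
starts = torch.tensor([True, False, False, True])     # columns 0 and 3 begin new sentences
h, c = reset_at_sentence_start((h, c), starts)
```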

@apaszke
Contributor

apaszke commented Mar 10, 2017

If you have any questions, please ask on the forums.
