Hello,
Question 1: As I understand the example, when we predict the first word of a sentence, the hidden state still carries information from the previous sentence. This leads to an unfair comparison (in fact, many LM examples in well-known toolkits ignore this issue as well, perhaps because the developers are not LM people). In standard N-gram evaluation, for example, words from the previous sentence are not used as context.
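To make the concern concrete, here is a minimal sketch (my own illustration, not code from the example) of sentence-level perplexity evaluation where the hidden state is reset at every sentence boundary, matching the N-gram convention; `initial_hidden` and `step` are hypothetical method names standing in for whatever the toolkit exposes:

```python
import math

def sentence_level_perplexity(model, sentences):
    """sentences: list of lists of word ids, each ending with an end-of-sentence token."""
    total_log_prob = 0.0
    total_words = 0
    for sentence in sentences:
        # Reset the hidden state so the first word gets no cross-sentence context.
        hidden = model.initial_hidden()
        for word in sentence:
            # log p(word | history within the current sentence only)
            log_prob, hidden = model.step(word, hidden)
            total_log_prob += log_prob
            total_words += 1
    return math.exp(-total_log_prob / total_words)
```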
Question 2: As I understand the example, suppose sentence1 is [x1 x2 x3 x4 x5] and sentence2 is [y1 y2 y3 y4 y5]; then the first minibatch could be [x1 x2 x3], the second [x4 x5 y1], and so on.
However, according to the paper "Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch", a better way is to build the minibatch as [(x1 y1) (x2 y2) (x3 y3)], so that more parallelism is possible.
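For clarity, a small sketch (again my own illustration, not code from the example) of the two layouts I mean, using string tokens and a hypothetical truncation length `bptt = 3`:

```python
s1 = ["x1", "x2", "x3", "x4", "x5"]
s2 = ["y1", "y2", "y3", "y4", "y5"]

# Layout A: one continuous stream split into fixed-length chunks.
stream = s1 + s2
bptt = 3
chunks = [stream[i:i + bptt] for i in range(0, len(stream), bptt)]
# -> [['x1','x2','x3'], ['x4','x5','y1'], ['y2','y3','y4'], ['y5']]

# Layout B: spliced sentence bunch, one sentence per batch row, so each
# time step processes one token from every sentence in parallel.
bunch = list(zip(s1, s2))
# -> [('x1','y1'), ('x2','y2'), ('x3','y3'), ('x4','y4'), ('x5','y5')]
```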
Please correct me if my understanding is wrong. Thanks!
Yes, that's true, but if you think it's unfair to compare that with other methods, you can modify it. It's meant to be an example and I think it makes perfect sense to keep the hidden state from the previous sentence, as it will likely increase the quality of generated text.
I don't understand the suggestion; it seems you've just doubled the batch size, so it's not exactly equivalent. Of course larger batches lead to more parallelism.
Please use issues for problem reports and feature requests. If you want to discuss anything or have a question, post on our forums.