Hello,
Question 1: As I understand the example, when we predict the first word of a sentence, the hidden state still carries information from the previous sentence. This leads to an unfair comparison (in fact, many LM examples in well-known toolkits ignore this issue as well, perhaps because the developers are not LM people). In standard N-gram evaluation, for example, words from the previous sentence are not used as context.
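To make the concern concrete, here is a minimal sketch (my own illustration, not code from the example) of sentence-level perplexity evaluation where the hidden state is reset at every sentence boundary, matching the N-gram convention; `initial_hidden` and `step` are hypothetical method names standing in for whatever the toolkit exposes:

```python
import math

def sentence_level_perplexity(model, sentences):
    """sentences: list of lists of word ids, each ending with an end-of-sentence token."""
    total_log_prob = 0.0
    total_words = 0
    for sentence in sentences:
        # Reset the hidden state so the first word gets no cross-sentence context.
        hidden = model.initial_hidden()
        for word in sentence:
            # log p(word | history within the current sentence only)
            log_prob, hidden = model.step(word, hidden)
            total_log_prob += log_prob
            total_words += 1
    return math.exp(-total_log_prob / total_words)
```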
Question 2: As I understand the example, suppose sentence1 is [x1 x2 x3 x4 x5] and sentence2 is [y1 y2 y3 y4 y5]; then the first minibatch could be [x1 x2 x3], the second [x4 x5 y1], and so on.
However, according to the paper "Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch", a better way is to build the minibatch as [(x1 y1) (x2 y2) (x3 y3)], so that more parallelism is possible.
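For clarity, a small sketch (again my own illustration, not code from the example) of the two layouts I mean, using string tokens and a hypothetical truncation length `bptt = 3`:

```python
s1 = ["x1", "x2", "x3", "x4", "x5"]
s2 = ["y1", "y2", "y3", "y4", "y5"]

# Layout A: one continuous stream split into fixed-length chunks.
stream = s1 + s2
bptt = 3
chunks = [stream[i:i + bptt] for i in range(0, len(stream), bptt)]
# -> [['x1','x2','x3'], ['x4','x5','y1'], ['y2','y3','y4'], ['y5']]

# Layout B: spliced sentence bunch, one sentence per batch row, so each
# time step processes one token from every sentence in parallel.
bunch = list(zip(s1, s2))
# -> [('x1','y1'), ('x2','y2'), ('x3','y3'), ('x4','y4'), ('x5','y5')]
```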
Please correct me if my understanding is wrong. Thanks!
Yes, that's true, but if you think it's unfair to compare that with other methods, you can modify it. It's meant to be an example and I think it makes perfect sense to keep the hidden state from the previous sentence, as it will likely increase the quality of generated text.
I don't understand the suggestion; it seems you've just doubled the batch size, so it's not exactly equivalent. Of course larger batches lead to more parallelism.
Please use issues for problem reports and feature requests. If you want to discuss anything or have a question, post on our forums.