Unable to load wt103 checkpoint, size mismatch #66
Can you be more specific about the commands you ran? I followed the instructions in the README with
and this worked fine.
Thanks for your reply! I want to evaluate the checkpoint's perplexity rather than use it for generation, so I used the train script with training skipped (evaluation only) and changed ckpt_path to point at the checkpoint:
I'm not running into the same error. My evaluation script loads the model with
If this doesn't work, maybe there's something wrong with the dataset? The final test ppl was 20.95 (updated in the arXiv paper). Looking at the logs, the final val loss is
which corresponds to 19.68 ppl.
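For reference, the perplexity quoted here is just the exponential of the average per-token cross-entropy loss. A minimal sketch of the conversion (the 2.98 loss value below is inferred from the quoted 19.68 ppl, not taken from the actual logs):

```python
import math

def ppl(avg_loss: float) -> float:
    """Perplexity is exp of the average per-token cross-entropy loss."""
    return math.exp(avg_loss)

# A val loss of about 2.98 corresponds to the ~19.68 ppl quoted above,
# and a test loss of about 3.04 to the reported 20.95 test ppl.
print(round(ppl(2.98), 2))
print(round(ppl(3.04), 2))
```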
Oh, that might be the reason.
That's the right version. But the dataset loader caches the processed vocab, and it's possible the processing logic changed and you're using an outdated cache (your error message suggests an off-by-one difference in vocab size, 67737 vs. 67738). It could be worth trying to remove the cache folders inside. But if all you want are the ppl numbers, those have already been reported.
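A minimal, framework-free sketch of the suspected failure mode (all names here are hypothetical, not from the repo): if the vocab-building logic changes, e.g. whether one extra special token is appended, a cache written by the old code will disagree with the current code by exactly one entry, which then shows up as a one-off embedding size mismatch at checkpoint load time.

```python
def build_vocab(tokens, add_unk=True):
    """Toy vocab builder; add_unk stands in for any change in the
    processing logic that adds or drops a single special token."""
    vocab = sorted(set(tokens))
    if add_unk:
        vocab.append("<unk>")
    return vocab

tokens = ["the", "cat", "sat", "the"]
cached = build_vocab(tokens, add_unk=False)   # built before the change
current = build_vocab(tokens, add_unk=True)   # what the loader expects now
print(len(cached), len(current))  # off by one, like 67737 vs. 67738
```

Deleting the cache forces the vocab to be rebuilt with the current logic, which is why clearing the cache folders is the suggested fix.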
Thanks, Albert! I'd like to run the evaluation myself and test the speed. I'll close this issue now.
Hi Albert,
I tried to load your recently uploaded wikitext-103 checkpoint, but encountered the following error:
Do you know why this happens? I used the wt103 data downloaded via the script in the transformer-xl repo: https://github.com/kimiyoung/transformer-xl/blob/master/getdata.sh.
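Since the error message itself didn't come through above, here is a plain-Python stand-in (names and the hidden size are hypothetical; the 67737/67738 vocab sizes come from the discussion earlier in the thread) for the strict shape check that produces this kind of "size mismatch" error when a checkpoint's vocab dimension differs from the model's:

```python
def check_state_dict(model_shapes, ckpt_shapes):
    """Mimics a strict state-dict load: every tensor shape must match."""
    for name, shape in ckpt_shapes.items():
        expected = model_shapes.get(name)
        if expected != shape:
            raise RuntimeError(
                f"size mismatch for {name}: copying a param with shape "
                f"{shape}, the shape in current model is {expected}"
            )

model = {"embedding.weight": (67737, 512)}  # vocab built from local data
ckpt = {"embedding.weight": (67738, 512)}   # vocab baked into checkpoint
try:
    check_state_dict(model, ckpt)
except RuntimeError as err:
    print(err)  # reports the one-off vocab dimension
```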
Thanks!