WikiText-103 #1
Hi, the WikiText-103 LM config has been added to the README. It can be run with
Note that these experiments are quite expensive. We used 8 A100 GPUs and trained for around 5 days (according to the original paper, the baseline Transformer took 3 days). This is because the S4 model overfits harder on this small dataset, so we turned the regularization up very high and trained for longer.
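To make the "very high regularization" remark concrete, here is an illustrative sketch of decoupled weight decay (AdamW-style), one of the regularizers typically increased to combat overfitting. This is a plain-Python SGD variant for illustration only, not the actual S4 training code; the function name and values are made up.

```python
def sgd_step_with_decay(w, grad, lr=0.1, weight_decay=0.5):
    # Decoupled decay: shrink the weight directly, independent of the
    # gradient, then apply the usual gradient step.
    w = w * (1 - lr * weight_decay)
    return w - lr * grad

w = 1.0
# With a zero gradient, only the decay term moves the weight:
w = sgd_step_with_decay(w, grad=0.0)
print(w)  # 0.95
```

Turning `weight_decay` up pulls every weight toward zero more aggressively on each step, which is one common way to slow overfitting on a small corpus like WikiText-103.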
Could you provide your pretrained S4 LM on the WikiText-103 corpus so we can experiment with the power of this architecture on other downstream tasks? Thanks!
@albertfgu Thanks for the config update. Can you please upload the logs from the WikiText-103 experiment? It will help a lot in reproducing the results and provide an early signal if something is wrong. Thank you very much!
Hi @yuvalkirstain, I am working on exporting the logs. It is a bit complicated because this experiment was split into multiple runs with checkpointing/resuming, due to resource management on our cluster. That said, your perplexity after 23000 steps actually tracks ours very closely. As noted in the paper, S4 had a tendency to overfit on this dataset, so we used very high regularization that slowed down training.
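The split-run workflow described above can be sketched as a generic checkpoint/resume loop. This is a minimal stand-in using JSON state files, not the S4 codebase's actual mechanism; the function names and step counts are hypothetical.

```python
import json
import os
import tempfile

def train(total_steps, ckpt_path, steps_this_run):
    """Run up to steps_this_run training steps, resuming from ckpt_path."""
    # Resume from an existing checkpoint, else start from step 0.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0}

    end = min(state["step"] + steps_this_run, total_steps)
    while state["step"] < end:
        state["step"] += 1  # one optimizer update would happen here

    # Persist progress so the next cluster job can pick up where we left off.
    with open(ckpt_path, "w") as f:
        json.dump(state, f)
    return state["step"]

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
train(100, ckpt, 40)          # first job covers steps 0-40
train(100, ckpt, 40)          # second job resumes at 40, runs to 80
final = train(100, ckpt, 40)  # third job finishes the remaining 20
print(final)  # 100
```

Each cluster job writes its own log segment, which is why stitching a single continuous log back together takes extra work.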
Hello @albertfgu, are you planning to release your pre-trained models for text? I am very interested in them. Also, are you planning to integrate your models into Hugging Face? huggingface/transformers#14837
I think we are leaning toward not releasing the one trained for the paper, for a few reasons; for one, the model implementation is still undergoing changes and improvements. We are working with HuggingFace to release a version of the model, though.
A WikiText-103 model has been re-trained and released. Instructions for using it are located throughout the READMEs, for example here. |
Hi, I'm interested in recreating your WikiText-103 LM experiment. Is it possible you could make that easier for me? Thanks! CJ