
WikiText-103 #1

Closed · Cortexelus opened this issue Nov 5, 2021 · 7 comments

@Cortexelus

Hi, I'm interested in recreating your WikiText-103 LM experiment. Is it possible you could make that easier for me? Thanks! CJ

@albertfgu
Contributor

Hi, the WikiText-103 LM config has been added to the README. It can be run with

python -m train experiment=s4-wt103 wandb=null

Note that these experiments are quite expensive: we used 8 A100 GPUs and trained for around 5 days (according to the original paper, the baseline Transformer took 3 days). This is because the S4 model overfits more heavily on this relatively small dataset, so we turned the regularization up very high and trained for longer.
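
For a rough sense of scale, the numbers above work out to about 960 A100 GPU-hours for the S4 run. This is only a back-of-the-envelope figure based on this comment (the baseline's hardware is not specified here):

```python
# Back-of-the-envelope compute for the S4 WikiText-103 run described above.
num_gpus = 8        # A100s, as stated in this comment
train_days = 5      # approximate wall-clock training time
gpu_hours = num_gpus * train_days * 24
print(gpu_hours)    # 960 A100 GPU-hours
```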

@thanhlt998


Could you provide your pretrained S4 LM on the WikiText-103 corpus so that I can experiment with the power of this architecture on other downstream tasks? Thanks!

@yuvalkirstain

@albertfgu Thanks for the config update.

Can you please upload the logs from the Wikitext-103 experiment? It will help a lot in reproducing the results and provide an early signal if something is wrong.
I am trying to reproduce the results and after 23,000 steps the validation perplexity is ~29 (I expected a lower perplexity at this stage).

Thank you very much!

@albertfgu
Contributor

Hi @yuvalkirstain, I am working on exporting the logs. It is a bit complicated because this experiment was split into multiple runs with checkpointing/resuming, due to resource management on our cluster.
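
For readers unfamiliar with the pattern being described, the sketch below shows, in generic PyTorch, what splitting one long run into several jobs via checkpointing/resuming looks like. The file name, model, and dictionary keys are placeholders and do not reflect how this repo's training script actually manages checkpoints:

```python
import os
import torch

# Placeholder model/optimizer; the real run is driven by the repo's training script.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
ckpt_path = "run_checkpoint.pt"  # hypothetical path

# Resume if a previous cluster job left a checkpoint behind.
start_step = 0
if os.path.exists(ckpt_path):
    state = torch.load(ckpt_path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"]

# ... run as many training steps as the job's time limit allows ...
last_step = start_step + 1000

# Save state so the next job can pick up where this one stopped.
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": last_step},
    ckpt_path,
)
```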

That said, your perplexity after 23,000 steps actually tracks ours very closely. As noted in the paper, S4 had a tendency to overfit on this dataset, so we used very heavy regularization, which slowed down training.
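
For anyone tracking these numbers, word-level perplexity is the exponential of the average per-token cross-entropy (in nats) on the validation set, so a perplexity of ~29 corresponds to a validation loss of about 3.37. A minimal illustration (the loss value here is back-computed from the perplexity mentioned above, not taken from an actual log):

```python
import math

# Perplexity = exp(average per-token cross-entropy, measured in nats).
val_loss = 3.37                        # example value, back-computed from ppl ~29
print(round(math.exp(val_loss), 1))    # ~29.1

# Going the other way: a target perplexity implies a target loss.
target_ppl = 21.0                      # illustrative value only
print(round(math.log(target_ppl), 3))  # ~3.045
```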

@gaceladri

Hello @albertfgu, are you planning to release your pre-trained models for text? I am very interested in them. Also, are you planning to integrate your models into Hugging Face? huggingface/transformers#14837

@albertfgu
Contributor

I think we are leaning toward not releasing the model trained for the paper, for a few reasons, one being that the model implementation is still undergoing changes and improvements. We are working with Hugging Face to release a version of the model, though.

@albertfgu
Contributor

A WikiText-103 model has been re-trained and released. Instructions for using it are located throughout the READMEs, for example here.
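
For anyone landing here later, loading a released PyTorch checkpoint generically looks roughly like the sketch below; the file name and dictionary keys are placeholders, so follow the linked README for the actual download location and loading instructions.

```python
import torch

# Placeholder file name; see the repo README for the real checkpoint location.
ckpt = torch.load("s4-wt103.pt", map_location="cpu")

# Released checkpoints are typically a dict containing (at least) a state_dict;
# the exact keys depend on how the checkpoint was saved.
state_dict = ckpt.get("state_dict", ckpt)
print(f"{len(state_dict)} tensors in checkpoint")
```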
