Albert Hyperparameters for Fine-tuning SQuAD 2.0 #1974
Comments
Wondering this as well, but for GLUE tasks. There doesn't seem to be a good consensus on hyperparameters such as weight decay.
Results using hyperparameters from my first post above, varying only batch size: [results table not preserved]
@ahotrod There is a table in the appendix of the ALBERT paper which shows the hyperparameters used for ALBERT on downstream tasks.
❓ Questions & Help
I want to fine-tune `albert-xxlarge-v1` on SQuAD 2.0 and am in need of optimal hyperparameters. I did not find any discussion of suggested fine-tuning hyperparameters in the original ALBERT paper, as is provided in the original XLNet paper. I did find some hard-coded parameters in the Google-research ALBERT `run_squad_sp.py` code. With fine-tuning on my 2x GPUs taking ~69 hours, I'd like to shrink the number of fine-tuning iterations necessary to attain optimal model performance. Anyone have a bead on the optimal hyperparameters?
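For reference, a minimal sketch of a Transformers `run_squad.py` invocation for this setup; every hyperparameter value below is an assumption (BERT-style defaults plus the 3-epoch / batch-32 schedule discussed in this thread), not a confirmed optimal setting for `albert-xxlarge-v1`:

```bash
# Sketch only: hyperparameter values are assumptions, not verified
# optimal settings for albert-xxlarge-v1 on SQuAD 2.0.
python run_squad.py \
  --model_type albert \
  --model_name_or_path albert-xxlarge-v1 \
  --do_train \
  --do_eval \
  --train_file train-v2.0.json \
  --predict_file dev-v2.0.json \
  --version_2_with_negative \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --per_gpu_train_batch_size 8 \
  --gradient_accumulation_steps 2 \
  --warmup_steps 1250 \
  --save_steps 2000 \
  --output_dir ./albert-xxlarge-v1-squad2
```

With 2 GPUs, `--per_gpu_train_batch_size 8` and `--gradient_accumulation_steps 2` give an effective batch size of 32, matching the schedule above; shrink the per-GPU batch size and raise the accumulation steps if `albert-xxlarge-v1` does not fit in memory.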
Also, Google-research comments in `run_squad_sp.py` state that `warmup_proportion` is the "Proportion of training to perform linear learning rate warmup for. E.g., 0.1 = 10% of training." Since 3 epochs at batch size 32 while fine-tuning SQuAD 2.0 results in approximately 12.5K total optimization steps, would I set `--warmup_steps 1250` when calling Transformers' `run_squad.py`?
Thanks in advance for any input.