
Albert Hyperparameters for Fine-tuning SQuAD 2.0 #1974

Closed
ahotrod opened this issue Nov 27, 2019 · 4 comments
@ahotrod
Contributor

ahotrod commented Nov 27, 2019

❓ Questions & Help

I want to fine-tune albert-xxlarge-v1 on SQuAD 2.0 and need good hyperparameters. The original Albert paper offers no discussion of suggested fine-tuning hyperparameters, unlike the original XLNet paper. I did find the following hard-coded defaults in the Google-research Albert run_squad_sp.py code:

'do_lower_case' = True
'train_batch_size' = 32
'predict_batch_size' = 8
'learning_rate' = 5e-5
'num_train_epochs' = 3.0
'warmup_proportion' = 0.1
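For context, here is a minimal sketch of how those defaults might map onto the flags that Transformers' run_squad.py exposes. The 2-GPU split, per-GPU batch size, and gradient-accumulation factor are assumptions chosen only to reach the same effective batch size of 32; they are not values taken from either script.

```python
# Sketch only: mapping the Google-research defaults onto Transformers' run_squad.py
# arguments for a 2-GPU machine (per-GPU batch size and accumulation are assumptions).
google_defaults = {
    "do_lower_case": True,
    "train_batch_size": 32,
    "predict_batch_size": 8,
    "learning_rate": 5e-5,
    "num_train_epochs": 3.0,
    "warmup_proportion": 0.1,
}

n_gpus = 2          # assumed hardware
per_gpu_batch = 2   # assumed per-GPU capacity for albert-xxlarge-v1 at max_seq_length 512
accum = google_defaults["train_batch_size"] // (n_gpus * per_gpu_batch)  # -> 8

args = [
    "--model_type", "albert",
    "--model_name_or_path", "albert-xxlarge-v1",
    "--do_lower_case",
    "--version_2_with_negative",
    "--max_seq_length", "512",
    "--per_gpu_train_batch_size", str(per_gpu_batch),
    "--gradient_accumulation_steps", str(accum),   # 2 GPUs * 2 * 8 = effective batch of 32
    "--learning_rate", str(google_defaults["learning_rate"]),
    "--num_train_epochs", str(google_defaults["num_train_epochs"]),
    # warmup_proportion has no direct flag; see the --warmup_steps arithmetic below
]
print("python run_squad.py " + " ".join(args))
```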

With fine-tuning on my 2x GPUs taking ~69 hours, I'd like to shrink the number of fine-tuning iterations necessary to attain optimal model performance. Anyone have a bead on the optimal hyperparameters?

Also, the Google-research comments in run_squad_sp.py describe warmup_proportion as the "Proportion of training to perform linear learning rate warmup for. E.g., 0.1 = 10% of training." Since 3 epochs at batch size 32 on SQuAD 2.0 works out to approximately 12.5K total optimization steps, would I set --warmup_steps = 1250 when calling Transformers' run_squad.py?
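For what it's worth, the back-of-the-envelope arithmetic behind that 12.5K figure looks like this. This is only a sketch: run_squad.py actually steps over tokenized features rather than raw questions, so the true step count from your max_seq_length/doc_stride settings is usually somewhat higher.

```python
import math

train_examples = 130_319   # SQuAD 2.0 training questions (published dataset statistics)
batch_size = 32
epochs = 3
warmup_proportion = 0.1

steps_per_epoch = math.ceil(train_examples / batch_size)   # ~4,073
total_steps = steps_per_epoch * epochs                     # ~12,219, i.e. the ~12.5K above
warmup_steps = int(warmup_proportion * total_steps)        # ~1,222

print(total_steps, warmup_steps)   # so --warmup_steps 1250 is in the right ballpark
```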

Thanks in advance for any input.

@frankfka

Wondering this as well, but for GLUE tasks. There doesn't seem to be a consensus on hyperparameters such as weight decay.
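On the weight-decay point specifically, the convention the Transformers example scripts follow is to exempt biases and LayerNorm weights from decay. A minimal sketch with PyTorch's AdamW; the 0.01 decay value is an assumption, which is exactly the knob without a clear consensus:

```python
import torch
from transformers import AlbertForSequenceClassification

# Small model purely for illustration; the same grouping applies to albert-xxlarge-v1.
model = AlbertForSequenceClassification.from_pretrained("albert-base-v2")

# Convention: no weight decay on biases or LayerNorm weights.
no_decay = ("bias", "LayerNorm.weight")
grouped_parameters = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},   # assumed value
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(grouped_parameters, lr=5e-5)
```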

@ahotrod
Contributor Author

ahotrod commented Dec 7, 2019

Results using hyperparameters from my first post above, varying only batch size:

albert_xxlargev1_squad2_512_bs32:
{
  "exact": 83.67725090541565,
  "f1": 87.51235434089064,
  "total": 11873,
  "HasAns_exact": 81.86572199730094,
  "HasAns_f1": 89.54692697189559,
  "HasAns_total": 5928,
  "NoAns_exact": 85.48359966358284,
  "NoAns_f1": 85.48359966358284,
  "NoAns_total": 5945
}

albert_xxlargev1_squad2_512_bs48:
{
  "exact": 83.65198349195654,
  "f1": 87.4736247587816,
  "total": 11873,
  "HasAns_exact": 81.73076923076923,
  "HasAns_f1": 89.38501126197984,
  "HasAns_total": 5928,
  "NoAns_exact": 85.5677039529016,
  "NoAns_f1": 85.5677039529016,
  "NoAns_total": 5945
}

[Training curves: learning rate schedule and loss]
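Since the two runs differ by only a few hundredths of a point on most metrics, a quick delta print is easier to scan than the raw dictionaries. A throwaway sketch; the compare_runs helper and file names are hypothetical, not part of any script in the repo:

```python
import json

def compare_runs(path_a, path_b, keys=("exact", "f1", "HasAns_f1", "NoAns_f1")):
    """Print metric deltas (run B minus run A) between two SQuAD 2.0 eval JSON files."""
    with open(path_a) as fa, open(path_b) as fb:
        a, b = json.load(fa), json.load(fb)
    for k in keys:
        print(f"{k:>12}: {a[k]:7.3f} -> {b[k]:7.3f}   (delta {b[k] - a[k]:+.3f})")

# Hypothetical file names for the two runs above:
compare_runs("albert_xxlargev1_squad2_512_bs32.json",
             "albert_xxlargev1_squad2_512_bs48.json")
```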

@fgksgf

fgksgf commented Dec 20, 2019

@ahotrod There is a table in the appendix section of the ALBERT paper, which shows hyperparameters for ALBERT in downstream tasks:
[Image: hyperparameter table from the ALBERT paper appendix]

@stale

stale bot commented Feb 18, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 18, 2020
@stale stale bot closed this as completed Feb 25, 2020