# Hyperparameters

back to main README

The default hyperparameters are the result of tuning on a wide variety of tasks, and should thus provide a good starting point in most situations. However, you can tune them for your target task to try to improve performance. The most likely starting points are the embeddings and/or the learning rate (`lr`). For other tuning options, see `configs/params.json`.

To change any hyperparameter, you can either change its value in `configs/params.json` or create your own hyperparameters file. If you create your own, we suggest starting from a copy of `configs/params.json`:

```bash
cp configs/params.json configs/params-champ.json
```

After you have changed the parameters, you can train with your file via `--parameters_config`:

```bash
python3 train.py --dataset_configs configs/ewt.json --device 0 --name ewt --parameters_config configs/params-champ.json
```
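
For example, to lower the learning rate in your copy, you would edit the nested entry that corresponds to the `trainer/optimizer/lr` path mentioned below. A minimal sketch of such an edit (the surrounding keys and values are illustrative assumptions; see `configs/params.json` for the actual structure and defaults):

```json
{
    "trainer": {
        "optimizer": {
            "lr": 0.00005
        }
    }
}
```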

Parameters that will likely have an effect on performance include (illustrated in the sketch after this list):

- the embeddings which are used
- `batching/batch_size`: you should probably also change `batching/max_tokens` accordingly. This parameter exists because it saves a lot of GPU RAM.
- `trainer/optimizer/lr`, in combination with:
- `trainer/num_epochs`
- `encoder/dropout`
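
For instance, if you halve `batching/batch_size`, you would typically halve `batching/max_tokens` to match. A minimal sketch of such an edit, assuming the nested JSON layout mirrors the slash-separated paths above (the values are illustrative, not recommendations):

```json
{
    "batching": {
        "batch_size": 16,
        "max_tokens": 512
    },
    "encoder": {
        "dropout": 0.2
    }
}
```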

## Decoder dropout

Decoder dropout is defined at the task-type level, and can be set in `decoders/[task_type]`. Decoder dropout is currently not supported for language modeling, because that task uses the native language-modeling head from Hugging Face. Thanks to Lguyogiro for the implementation.
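
As a sketch of where this setting lives, assuming the same nested layout as the paths above (`"dependency"` is a hypothetical stand-in for whichever task type you use, and the value is illustrative):

```json
{
    "decoders": {
        "dependency": {
            "dropout": 0.3
        }
    }
}
```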