# Hyperparameters

back to main README

The default hyperparameters are the result of tuning on a wide variety of tasks, and should thus provide a good starting point in most situations. However, you can tune them for your target task to try to improve performance. The most likely starting points are the embeddings and/or the learning rate (`lr`). For other tuning options, see `configs/params.json`.

To change any hyperparameter, you can either change its value in `configs/params.json` or create your own hyperparameters file. If you create your own, we suggest starting from a copy of `configs/params.json`:

```bash
cp configs/params.json configs/params-champ.json
```

After you have changed the parameters, you can train with your file via `--parameters_config`:

```bash
python3 train.py --dataset_configs configs/ewt.json --device 0 --name ewt --parameters_config configs/params-champ.json
```
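
For example, to lower the learning rate in your copy, you would edit the nested entry that corresponds to the `trainer/optimizer/lr` path mentioned below. A minimal sketch of such an edit (the surrounding keys and values are illustrative assumptions; see `configs/params.json` for the actual structure and defaults):

```json
{
    "trainer": {
        "optimizer": {
            "lr": 0.00005
        }
    }
}
```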

Parameters that will likely have an effect on performance include (illustrated in the sketch after this list):

- the embeddings which are used
- `batching/batch_size`: you should probably also change `batching/max_tokens` accordingly. This parameter exists because it saves a lot of GPU RAM.
- `trainer/optimizer/lr`, in combination with:
- `trainer/num_epochs`
- `encoder/dropout`
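
For instance, if you halve `batching/batch_size`, you would typically halve `batching/max_tokens` to match. A minimal sketch of such an edit, assuming the nested JSON layout mirrors the slash-separated paths above (the values are illustrative, not recommendations):

```json
{
    "batching": {
        "batch_size": 16,
        "max_tokens": 512
    },
    "encoder": {
        "dropout": 0.2
    }
}
```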

## Decoder dropout

Decoder dropout is defined at the task-type level, and can be set in `decoders/[task_type]`. Decoder dropout is currently not supported for language modeling, because that task uses the native language-modeling head from Hugging Face. Thanks to Lguyogiro for the implementation.
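
As a sketch of where this setting lives, assuming the same nested layout as the paths above (`"dependency"` is a hypothetical stand-in for whichever task type you use, and the value is illustrative):

```json
{
    "decoders": {
        "dependency": {
            "dropout": 0.3
        }
    }
}
```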