Hi @gizacard ,
Thanks for your awesome project. I would like to know the hyperparameters you used for finetuning T5-base.
You have only shared the T5-large hyperparameters in the tutorial, as shown below; could you also share the T5-base ones?
python train_reader.py \
        --use_checkpoint \
        --lr 0.00005 \
        --optim adamw \
        --scheduler linear \
        --weight_decay 0.01 \
        --text_maxlength 250 \
        --per_gpu_batch_size 1 \
        --n_context 100 \
        --total_step 15000 \
        --warmup_step 1000 \
Thanks, looking forward to your reply.
Hi, we used a learning rate of 1e-4 for the base model; the rest should be similar.
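For reference, a minimal sketch of the corresponding T5-base command, assuming only the learning rate changes from the T5-large settings above (all other flags are copied verbatim from that example and were not separately confirmed by the maintainers):

# Assumed T5-base configuration: lr taken from the reply above, every other flag
# copied unchanged from the T5-large command earlier in this thread.
python train_reader.py \
        --use_checkpoint \
        --lr 0.0001 \
        --optim adamw \
        --scheduler linear \
        --weight_decay 0.01 \
        --text_maxlength 250 \
        --per_gpu_batch_size 1 \
        --n_context 100 \
        --total_step 15000 \
        --warmup_step 1000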