
What learning rate was used during the fine-tuning of XLM (MLM+TLM) for the XNLI task (zero-shot setting)? #56

Closed
gourango01 opened this issue Apr 15, 2019 · 1 comment

Comments

@gourango01

In the paper, Section 5.1, it says "We sample the learning rate of the Adam optimizer with values from 5·10⁻⁴ to 2·10⁻⁴", but in the GitHub code it is 5·10⁻⁶.

@aconneau

We used the following learning rates: `adam,lr=0.00001`, `adam,lr=0.000005`, `adam,lr=0.000002`.
It's a typo in the paper.
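
For reference, here is a minimal sketch of sampling one of these learning rates for an Adam optimizer in PyTorch, as one would in a random hyperparameter search. The model below is a hypothetical placeholder, not the actual XLM fine-tuning code; the `adam,lr=...` strings above are the optimizer-flag format used by the XLM repo.

```python
import random
import torch

# Learning rates reported above for XNLI fine-tuning: 1e-5, 5e-6, 2e-6.
CANDIDATE_LRS = [1e-5, 5e-6, 2e-6]

# Placeholder for the XLM encoder + classification head.
model = torch.nn.Linear(1024, 3)

# Sample one learning rate per fine-tuning run.
lr = random.choice(CANDIDATE_LRS)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
print(f"fine-tuning with adam,lr={lr}")
```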

@glample glample closed this as completed Jun 22, 2019