
What learning rate was used during the fine-tuning of XLM (MLM+TLM) for the XNLI task (zero-shot setting)? #56

Closed
gourango01 opened this issue Apr 15, 2019 · 1 comment

Comments

@gourango01

In the paper, Section 5.1, it says "We sample the learning rate of the Adam optimizer with values from 5·10⁻⁴ to 2·10⁻⁴", but in the GitHub code it is 5·10⁻⁶.

@aconneau

We used the following learning rates: `adam,lr=0.00001`, `adam,lr=0.000005`, `adam,lr=0.000002`.
It's a typo in the paper.
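
For reference, here is a minimal sketch of sampling one of these learning rates for an Adam optimizer in PyTorch, as one would in a random hyperparameter search. The model below is a hypothetical placeholder, not the actual XLM fine-tuning code; the `adam,lr=...` strings above are the optimizer-flag format used by the XLM repo.

```python
import random
import torch

# Learning rates reported above for XNLI fine-tuning: 1e-5, 5e-6, 2e-6.
CANDIDATE_LRS = [1e-5, 5e-6, 2e-6]

# Placeholder for the XLM encoder + classification head.
model = torch.nn.Linear(1024, 3)

# Sample one learning rate per fine-tuning run.
lr = random.choice(CANDIDATE_LRS)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
print(f"fine-tuning with adam,lr={lr}")
```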

@glample glample closed this as completed Jun 22, 2019