- Megatron generally follows the ALBERT training procedure and should therefore use a sequence length of 512 (it currently uses 128); see section 4.1 of the ALBERT paper: https://arxiv.org/pdf/1909.11942.pdf
- The RoBERTa paper specifies a sequence length of 512 (currently 128); see section 3.1 of the paper: https://arxiv.org/pdf/1907.11692.pdf
- The original BERT paper specifies a sequence length of 512 tokens (currently 128); see section A.2 of the paper: https://arxiv.org/pdf/1810.04805.pdf
I didn't check all the models, so this list is not exhaustive.
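For context, here is a minimal sketch of the input shapes the papers call for. The names `make_inputs`, `SEQ_LEN`, and `VOCAB_SIZE` are illustrative assumptions, not taken from this repository's benchmark harness:

```python
# Illustrative sketch only: shows the paper-specified input shapes,
# assuming the benchmark feeds random token IDs to the model.
import torch

SEQ_LEN = 512       # paper-specified value; the benchmarks currently use 128
BATCH_SIZE = 8      # hypothetical batch size for illustration
VOCAB_SIZE = 30522  # BERT's WordPiece vocabulary size

def make_inputs(batch_size: int = BATCH_SIZE, seq_len: int = SEQ_LEN):
    """Build random token IDs and an attention mask with the paper's shape."""
    input_ids = torch.randint(0, VOCAB_SIZE, (batch_size, seq_len))
    attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
    return input_ids, attention_mask

input_ids, attention_mask = make_inputs()
print(input_ids.shape)  # torch.Size([8, 512]) instead of the current [8, 128]
```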
cc @anijain2305