This repository was archived by the owner on Aug 1, 2025. It is now read-only.

[Dashboard] Wrong sequence lengths in HF suite #1842

@ngimel

Megatron generally follows the training procedure for Albert and should therefore use a sequence length of 512 (currently 128); see section 4.1 of the Albert paper, https://arxiv.org/pdf/1909.11942.pdf
The Roberta paper states a sequence length of 512 (currently 128); see section 3.1 of the paper, https://arxiv.org/pdf/1907.11692.pdf
The original Bert paper states a sequence length of 512 tokens (currently 128); see section A.2, https://arxiv.org/pdf/1810.04805.pdf
I didn't check all the models, so this list is not exhaustive.
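A rough sketch of how the remaining models could be audited, assuming the per-model sequence lengths can be collected into a plain dictionary. The names `PAPER_SEQ_LEN`, `CURRENT_SEQ_LEN`, and `find_mismatches` are hypothetical and not part of the benchmark suite; the numbers are only the ones quoted in this issue.

```python
# Hypothetical audit helper; none of these names come from the benchmark suite.

# Sequence lengths (in tokens) stated in the papers cited in this issue.
PAPER_SEQ_LEN = {
    "Megatron": 512,  # assumed to follow the Albert setup, per this issue
    "Roberta": 512,
    "Bert": 512,
}

# Sequence lengths the HF suite currently uses, as reported in this issue.
CURRENT_SEQ_LEN = {
    "Megatron": 128,
    "Roberta": 128,
    "Bert": 128,
}


def find_mismatches(current, expected):
    """Return {model: (current_len, expected_len)} for models whose lengths differ."""
    return {
        name: (current[name], want)
        for name, want in expected.items()
        if name in current and current[name] != want
    }


if __name__ == "__main__":
    mismatches = find_mismatches(CURRENT_SEQ_LEN, PAPER_SEQ_LEN)
    for name, (have, want) in mismatches.items():
        print(f"{name}: suite uses {have}, paper states {want}")
```

Extending the check to other models would mean adding their paper-stated lengths to the first dictionary; those values are not listed here because they were not verified.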
cc @anijain2305

Labels: bug, triaged
