- Megatron generally follows the ALBERT training procedure and should therefore use a sequence length of 512 (it currently uses 128); see section 4.1 of the ALBERT paper: https://arxiv.org/pdf/1909.11942.pdf
- The RoBERTa paper specifies a sequence length of 512 (currently 128); see section 3.1 of the paper: https://arxiv.org/pdf/1907.11692.pdf
- The original BERT paper specifies a sequence length of 512 tokens (currently 128); see section A.2 of the paper: https://arxiv.org/pdf/1810.04805.pdf
I didn't check all the models, so this list is not exhaustive.
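For context, here is a minimal sketch of the input shapes the papers call for. The names `make_inputs`, `SEQ_LEN`, and `VOCAB_SIZE` are illustrative assumptions, not taken from this repository's benchmark harness:

```python
# Illustrative sketch only: shows the paper-specified input shapes,
# assuming the benchmark feeds random token IDs to the model.
import torch

SEQ_LEN = 512       # paper-specified value; the benchmarks currently use 128
BATCH_SIZE = 8      # hypothetical batch size for illustration
VOCAB_SIZE = 30522  # BERT's WordPiece vocabulary size

def make_inputs(batch_size: int = BATCH_SIZE, seq_len: int = SEQ_LEN):
    """Build random token IDs and an attention mask with the paper's shape."""
    input_ids = torch.randint(0, VOCAB_SIZE, (batch_size, seq_len))
    attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)
    return input_ids, attention_mask

input_ids, attention_mask = make_inputs()
print(input_ids.shape)  # torch.Size([8, 512]) instead of the current [8, 128]
```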
cc @anijain2305