
Should I manually change max_position_embeddings for long-context training? #5146

@mungg

Description


Hi, I'm working on training long-context models using GRPO or SFT.
I set the model_len to my desired context length, but I have a question regarding max_position_embeddings in the model's config.json.
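
For context, these are the two config fields I'm looking at; a minimal sketch (not ms-swift specific, placeholder path) of how I inspect them:

```python
from transformers import AutoConfig

# Placeholder path to a Llama-style base checkpoint.
cfg = AutoConfig.from_pretrained("path/to/base-model")
print(cfg.max_position_embeddings)         # pretraining context length, e.g. 32768
print(getattr(cfg, "rope_scaling", None))  # RoPE scaling dict, or None if unset
```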

In 360-LLaMA-Factory, max_position_embeddings seems to be updated automatically in the newly trained model as long as I configure the max length and RoPE settings, even without changing it by hand.
In ms-swift, however, the trained models (with RoPE scaling applied) keep the original value unless I explicitly modify max_position_embeddings myself.
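
To be concrete, this is the kind of manual edit I mean; a rough sketch assuming a Llama-style config and YaRN-style rope_scaling (field names and values are illustrative, not ms-swift defaults):

```python
from transformers import AutoConfig

# Placeholder path to the checkpoint produced by training.
cfg = AutoConfig.from_pretrained("path/to/trained-model")
cfg.max_position_embeddings = 131072            # desired long-context length (illustrative)
cfg.rope_scaling = {                            # YaRN-style scaling, purely as an example
    "rope_type": "yarn",                        # older transformers versions use "type"
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
cfg.save_pretrained("path/to/trained-model")    # rewrites config.json in place
```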

So, do I need to change max_position_embeddings manually for long-context training to work properly?
Is that the correct approach?
