Hi, I'm working on training long-context models using GRPO or SFT.
I set model_len to my desired context length, but I have a question about max_position_embeddings in the model's config.json.
In 360-LLaMA-Factory, the value seems to be updated automatically in the newly trained model's config, even if I never touch max_position_embeddings myself, as long as I configure the max length and RoPE settings.
In ms-swift, however, unless I explicitly modify max_position_embeddings myself, the trained models (even with RoPE scaling applied) keep the original value.
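To make sure I'm not misreading the saved checkpoints, this is roughly how I'm comparing the two configs (the model name and output path below are just placeholders for my setup):

```python
from transformers import AutoConfig

# Placeholder paths: base model vs. the checkpoint produced by ms-swift
base = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
trained = AutoConfig.from_pretrained("./output/checkpoint-final")

print("base    max_position_embeddings:", base.max_position_embeddings)
print("trained max_position_embeddings:", trained.max_position_embeddings)
print("trained rope_scaling:", getattr(trained, "rope_scaling", None))
```

With ms-swift the two max_position_embeddings values come out identical, whereas the 360-LLaMA-Factory output reflects the longer length.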
So, do I need to change max_position_embeddings manually to train properly on long-context tasks? Is that the correct approach?
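For context, this is the manual workaround I'm considering: patching the checkpoint's config.json after training. The path, target length, and RoPE values here are only examples of what I'd fill in, not something either framework generates:

```python
from transformers import AutoConfig

# Load the trained checkpoint's config (placeholder path)
cfg = AutoConfig.from_pretrained("./output/checkpoint-final")

# Example values: extend to 128k with YaRN-style scaling over a 32k base
cfg.max_position_embeddings = 131072
cfg.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# Overwrite config.json in place
cfg.save_pretrained("./output/checkpoint-final")
```

If ms-swift is expected to write these fields itself given the right training arguments, I'd rather rely on that than edit the config by hand.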