Skip to content

fix(gpt-neox): preserve rotary_pct across save/load cycle#44917

Closed
s-zx wants to merge 1 commit intohuggingface:mainfrom
s-zx:fix/44913-gpt-neox-rotary-pct
Closed

fix(gpt-neox): preserve rotary_pct across save/load cycle#44917
s-zx wants to merge 1 commit intohuggingface:mainfrom
s-zx:fix/44913-gpt-neox-rotary-pct

Conversation

@s-zx
Copy link
Copy Markdown

@s-zx s-zx commented Mar 21, 2026

Summary

GPTNeoXConfig.convert_rope_params_to_dict unconditionally overwrote
rope_parameters["partial_rotary_factor"] with the default 0.25 when
rotary_pct was absent from kwargs.

On every from_pretrained call, rotary_pct is not present as a top-level
key in config.json (it is stored inside rope_parameters), so the value was
silently reset to 0.25.

Fix

Replace the unconditional assignment with a conditional + setdefault:

-self.rope_parameters["partial_rotary_factor"] = kwargs.pop("rotary_pct", 0.25)
+rotary_pct = kwargs.pop("rotary_pct", None)
+if rotary_pct is not None:
+    self.rope_parameters["partial_rotary_factor"] = rotary_pct
+else:
+    self.rope_parameters.setdefault("partial_rotary_factor", 0.25)

When rotary_pct is absent (the reload path), setdefault only fills in 0.25
if partial_rotary_factor is not already present, so an existing value is preserved.

Fixes #44913

`convert_rope_params_to_dict` unconditionally wrote the default value
0.25 to `partial_rotary_factor` whenever `rotary_pct` was absent from
kwargs, which happens on every `from_pretrained` call (the key is
serialised inside `rope_parameters`, not as a top-level key).

Use `kwargs.pop("rotary_pct", None)` and guard with `setdefault` so
the existing value inside `rope_parameters` is preserved when
`rotary_pct` is not present in the incoming kwargs.

Fixes huggingface#44913
@github-actions
Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_neox

@Rocketknight1
Copy link
Copy Markdown
Member

Hi @s-zx, no more code agent PRs like this please! As we state in CONTRIBUTING, the reviewers are getting flooded with these, and usually they have issues that require extra maintainer attention. For example, your other PR at #44916 broke the copying mechanism, which was being discussed in the issue at #44855 when you opened the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

In GPTNeoXConfig, rotary_pct silently reverts to default on reload

3 participants