
fix(models): Preserve custom token IDs through DiaConfig save and load#43928

Merged
zucchini-nlp merged 2 commits into huggingface:main from harshaljanjani:fix/dia-config-token-ids
Feb 13, 2026

Conversation

@harshaljanjani
Contributor

@harshaljanjani harshaljanjani commented Feb 11, 2026

What does this PR do?

The following failing Dia use case was identified and fixed in this PR:

→ Tests that created DiaConfig with custom token IDs (e.g. eos_token_id=97 for vocab_size=100) failed because saving and then reloading the config reset these values to their defaults (eos_token_id=1024). The reason is that DiaConfig only set the attributes on the decoder_config sub-config, not on the main parent config, which also led to an IndexError during generation.
→ For more details on reproducing the bug and the output screenshots, please visit the linked issue!

Fixes #43927.
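The failure mode can be sketched with toy stand-in classes rather than the real transformers code: the parent config forwards the custom token ID only to the decoder sub-config and never stores or serializes it itself, so on reload the constructor's default (1024) clobbers the saved value.

```python
class ToyDecoderConfig:
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

class ToyDiaConfig:
    def __init__(self, eos_token_id=1024, decoder_config=None):
        self.decoder_config = ToyDecoderConfig(**(decoder_config or {}))
        # Anti-pattern: set only on the sub-config, never on `self`.
        self.decoder_config.eos_token_id = eos_token_id

    def to_dict(self):
        # `eos_token_id` is not written at the top level, only inside the
        # sub-config dict, where the reload path cannot protect it.
        return {"decoder_config": {"eos_token_id": self.decoder_config.eos_token_id}}

cfg = ToyDiaConfig(eos_token_id=97)   # custom ID for a small vocab
saved = cfg.to_dict()                 # stand-in for save_pretrained
reloaded = ToyDiaConfig(**saved)      # stand-in for from_pretrained
# reloaded.decoder_config.eos_token_id is back to the default 1024, not 97
```

The reload is what triggers the overwrite: the constructor runs again with its default `eos_token_id=1024` and force-sets it onto the freshly restored sub-config.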

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you fix any necessary existing tests?

@harshaljanjani harshaljanjani marked this pull request as ready for review February 11, 2026 20:05
Comment on lines 273 to 275
```python
self.pad_token_id = pad_token_id
self.eos_token_id = eos_token_id
self.bos_token_id = bos_token_id
```
Member

IMO we need to remove `pad_token_id: int = 1025, eos_token_id: int = 1024, bos_token_id: int = 1026` in the longer term. They have to live in the config where the attribute is actually set.

For BC we can keep them and set the defaults to None; that should work, since save_pretrained wasn't saving them in the main config anyway. So something like:

```python
def __init__(
    self,
    pad_token_id: int = None,
    eos_token_id: int = None,
    bos_token_id: int = None,
    ...
):
    ...
    # We could raise a deprecation warning here, but first we need to update
    # the official ckpt in the `nari-labs` org
    if pad_token_id is not None:
        logger.warning_once("Please pass your pad token to the config where it belongs!")
        self.decoder_config.pad_token_id = pad_token_id
```
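A runnable toy version of this suggestion (again with stand-in classes, not the real DiaConfig): defaulting the token ID to None and forwarding it to the sub-config only when the caller actually passed one means a reload no longer overwrites the saved sub-config value with a constructor default.

```python
class ToyDecoderConfig:
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

class FixedDiaConfig:
    def __init__(self, eos_token_id=None, decoder_config=None):
        self.decoder_config = ToyDecoderConfig(**(decoder_config or {}))
        # BC path: explicitly passed values still reach the sub-config;
        # an absent value (the reload case) leaves the saved one untouched.
        if eos_token_id is not None:
            self.decoder_config.eos_token_id = eos_token_id

    def to_dict(self):
        return {"decoder_config": {"eos_token_id": self.decoder_config.eos_token_id}}

cfg = FixedDiaConfig(eos_token_id=97)
reloaded = FixedDiaConfig(**cfg.to_dict())  # the custom ID now survives
```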

Contributor Author

Resolved; I've updated the tests and docs, and added a TODO so we don't miss the nari-labs org config update (happy to remove it if unnecessary). The tests pass even without the change to the test file, but perhaps it makes the pattern clearer for readers (? :))

Member

@zucchini-nlp zucchini-nlp left a comment


Thanks for catching this! I've also been thinking about getting rid of this pattern across the repo, haha. I'm not 100% sure, but there were a few more similar models, t5gemma and maybe some others. It would be great if you could check when you have bandwidth, or I'll let you know later in the comments :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@harshaljanjani
Contributor Author

@zucchini-nlp That makes sense! Within the time I had, I've collected my findings below for your perusal.
For now, I'll resolve this according to your review, and I've ensured it's BC. If the points below are indeed unwanted anti-patterns, I could open a separate issue for them on an as-needed basis, lmk. Happy to help fix them down the line as well.
Just a nit: Dia is the only model that actually "broke" because of the anti-pattern. I've listed what I found in T5Gemma and others as well, but the rest seem to preserve token IDs correctly; I reproduced this for the listed models too :)

→ I've listed the hyperlinks with the correct line numbers for each reference; feel free to cross-check my findings :)

What I was searching for → configs that accept pad_token_id/eos_token_id/bos_token_id in __init__
→ forward them to a sub-config (self.decoder_config.eos_token_id = ..)
→ and fail to persist them on the main config object or pass them to super().__init__()

What I found → I wasn't able to find any model other than Dia with the actual save/load bug. Other composite configs set token IDs on a sub-config but make sure the value also persists on the main config. The patterns are:

| Pattern | Models | Buggy? |
| --- | --- | --- |
| A. Force-set on sub-config only: takes token IDs and sets them on `self.decoder_config.*`, but never on `self`, so they are lost on save/load and defaults overwrite the sub-config values. | `DiaConfig` (before this fix); happy to know if I missed any others | Yes |
| B. Extract from sub-config, inject into kwargs: does not accept token IDs; instead pulls them from the decoder sub-config and injects them into `**kwargs` before calling `super().__init__()`. | `T5GemmaConfig`, `T5Gemma2Config` | No (works, but quite like the anti-pattern you rightly mentioned) |
| C. Set on `self` directly: accepts token IDs and sets `self.pad_token_id = pad_token_id` before/after `super().__init__()`; `PreTrainedConfig.__init__` does not explicitly handle these attributes, so they persist. | `SeamlessM4TConfig`, `SeamlessM4Tv2Config`, `MoshiConfig`, `CsmConfig`, `KyutaiSpeechToTextConfig` | No |

Additional: Pix2StructConfig reads token IDs from text_config (self.eos_token_id = self.text_config.eos_token_id), which also works correctly, though I'm not sure that approach is warranted.
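Patterns B and C above can be contrasted with hypothetical toy classes (not the real T5Gemma or SeamlessM4T code): both survive a save/load round trip because the token ID always ends up somewhere that gets serialized and read back.

```python
class SubConfig:
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

class PatternBConfig:
    """B: don't accept the ID; pull it out of the sub-config instead."""
    def __init__(self, sub_config=None):
        self.sub_config = SubConfig(**(sub_config or {}))
        self.eos_token_id = self.sub_config.eos_token_id

    def to_dict(self):
        return {"sub_config": {"eos_token_id": self.sub_config.eos_token_id}}

class PatternCConfig:
    """C: accept the ID and set it on `self`, which is what gets saved."""
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

    def to_dict(self):
        return {"eos_token_id": self.eos_token_id}

b = PatternBConfig(sub_config={"eos_token_id": 97})
b2 = PatternBConfig(**b.to_dict())  # 97 rides along inside the sub-config
c = PatternCConfig(eos_token_id=97)
c2 = PatternCConfig(**c.to_dict())  # 97 is a top-level attribute
```

Pattern A breaks precisely because it has neither property: the custom ID lives only on the sub-config, and the parent constructor re-stamps it with a default on reload.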

Would love to hear which ones you're looking to fix ASAP; we could then move to writing issues for them. Happy to help :)

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: dia

@zucchini-nlp
Member

@harshaljanjani thanks a lot for your research. T5Gemma seems to work because it assigns the attr both on the sub-config and on the main config, so it is saved in both places. Let's fix Dia for now and I will think about unifying these later; I'm not 100% sure whether simply deleting the attr would disrupt downstream usage for other models.

Member

@zucchini-nlp zucchini-nlp left a comment


Nice, thanks a lot!

Comment on lines +289 to +290
```python
# TODO: Remove token ID forwarding once the `nari-labs/Dia-1.6B`
# checkpoint is updated
```
Member

Can you open a PR in the nari-labs repo and link it to this issue?

@harshaljanjani
Contributor Author

harshaljanjani commented Feb 12, 2026

Opened a PR at https://github.com/nari-labs/dia with the change; it links to the issue and this PR.

@zucchini-nlp
Member

I meant more like the hub one 😅 Opened https://huggingface.co/nari-labs/Dia-1.6B-0626/discussions/7 as well :)

@zucchini-nlp zucchini-nlp merged commit 403990c into huggingface:main Feb 13, 2026
19 checks passed
@harshaljanjani
Contributor Author

> I meant more like the hub one 😅 Opened https://huggingface.co/nari-labs/Dia-1.6B-0626/discussions/7 as well :)

Oops, my bad! Should I close the other PR then? 😅

@harshaljanjani harshaljanjani deleted the fix/dia-config-token-ids branch February 13, 2026 09:33
@zucchini-nlp
Member

No, you can leave it. They would need to update their GH repo if hub config changes are merged!

@harshaljanjani
Contributor Author

> No, you can leave it. They would need to update their GH repo if hub config changes are merged!

Perfect, sounds good!


Development

Successfully merging this pull request may close these issues.

[BUG] DiaConfig loses custom token IDs after save / load and causes IndexError during generation

3 participants