
fix(models): Preserve custom token IDs through DiaConfig save and load#43928

Merged
zucchini-nlp merged 2 commits into huggingface:main from harshaljanjani:fix/dia-config-token-ids
Feb 13, 2026

Conversation

@harshaljanjani
Contributor

@harshaljanjani harshaljanjani commented Feb 11, 2026

What does this PR do?

The following failing Dia use case was identified and fixed in this PR:

→ Tests that created DiaConfig with custom token IDs (e.g. eos_token_id=97 for vocab_size=100) failed because saving and then reloading the config reset these values to their defaults (eos_token_id=1024). The reason is that DiaConfig only set the attributes on the decoder_config sub-config, not on the main parent config, which also led to an IndexError during generation.
→ For more details on reproducing the bug and the output screenshots, please visit the linked issue!

Fixes #43927.
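The failure mode can be sketched with toy stand-in classes rather than the real transformers code: the parent config forwards the custom token ID only to the decoder sub-config and never stores or serializes it itself, so on reload the constructor's default (1024) clobbers the saved value.

```python
class ToyDecoderConfig:
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

class ToyDiaConfig:
    def __init__(self, eos_token_id=1024, decoder_config=None):
        self.decoder_config = ToyDecoderConfig(**(decoder_config or {}))
        # Anti-pattern: set only on the sub-config, never on `self`.
        self.decoder_config.eos_token_id = eos_token_id

    def to_dict(self):
        # `eos_token_id` is not written at the top level, only inside the
        # sub-config dict, where the reload path cannot protect it.
        return {"decoder_config": {"eos_token_id": self.decoder_config.eos_token_id}}

cfg = ToyDiaConfig(eos_token_id=97)   # custom ID for a small vocab
saved = cfg.to_dict()                 # stand-in for save_pretrained
reloaded = ToyDiaConfig(**saved)      # stand-in for from_pretrained
# reloaded.decoder_config.eos_token_id is back to the default 1024, not 97
```

The reload is what triggers the overwrite: the constructor runs again with its default `eos_token_id=1024` and force-sets it onto the freshly restored sub-config.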

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you fix any necessary existing tests?

@harshaljanjani harshaljanjani marked this pull request as ready for review February 11, 2026 20:05
Comment on lines 273 to 275
```python
self.pad_token_id = pad_token_id
self.eos_token_id = eos_token_id
self.bos_token_id = bos_token_id
```
Member

IMO we need to remove `pad_token_id: int = 1025, eos_token_id: int = 1024, bos_token_id: int = 1026` in the longer term. They have to live in the config where the attribute is actually set.

For BC we can keep them and set the defaults to None; that should work, since save_pretrained wasn't saving them in the main config anyway. So something like:

```python
def __init__(
    self,
    pad_token_id: int = None,
    eos_token_id: int = None,
    bos_token_id: int = None,
    ...
):
    ...
    # We could raise a deprecation warning here, but first we need to update
    # the official ckpt in the `nari-labs` org
    if pad_token_id is not None:
        logger.warning_once("Please pass your pad token to the config where it belongs!")
        self.decoder_config.pad_token_id = pad_token_id
```
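A runnable toy version of this suggestion (again with stand-in classes, not the real DiaConfig): defaulting the token ID to None and forwarding it to the sub-config only when the caller actually passed one means a reload no longer overwrites the saved sub-config value with a constructor default.

```python
class ToyDecoderConfig:
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

class FixedDiaConfig:
    def __init__(self, eos_token_id=None, decoder_config=None):
        self.decoder_config = ToyDecoderConfig(**(decoder_config or {}))
        # BC path: explicitly passed values still reach the sub-config;
        # an absent value (the reload case) leaves the saved one untouched.
        if eos_token_id is not None:
            self.decoder_config.eos_token_id = eos_token_id

    def to_dict(self):
        return {"decoder_config": {"eos_token_id": self.decoder_config.eos_token_id}}

cfg = FixedDiaConfig(eos_token_id=97)
reloaded = FixedDiaConfig(**cfg.to_dict())  # the custom ID now survives
```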

Contributor Author

Resolved; I've updated the tests and docs, and added a TODO so we don't miss the nari-labs org config update (happy to remove it if unnecessary). The tests pass even without the change to the test file, but perhaps it makes the pattern clearer for readers (? :))

Member

@zucchini-nlp zucchini-nlp left a comment


Thanks for catching this! I've also been thinking about getting rid of this pattern across the repo, haha. I'm not 100% sure, but there were a few more similar models, t5gemma and maybe some others. It would be great if you could check when you have bandwidth, or I'll let you know later in the comments :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@harshaljanjani
Contributor Author

@zucchini-nlp That makes sense! Within the time I had, I've collected my findings below for your perusal.
For now, I'll resolve this according to your review, and I've ensured it's BC. If the points below are indeed unwanted anti-patterns, I could open a separate issue for them on an as-needed basis, lmk. Happy to help fix them down the line as well.
Just a nit: Dia is the only model that actually "broke" because of the anti-pattern. I've listed what I found in T5Gemma and others as well, but the rest seem to preserve token IDs correctly; I reproduced this for the listed models too :)

→ I've listed the hyperlinks with the correct line numbers for each reference; feel free to cross-check my findings :)

What I was searching for → configs that accept pad_token_id/eos_token_id/bos_token_id in __init__
→ forward them to a sub-config (self.decoder_config.eos_token_id = ..)
→ and fail to persist them on the main config object or pass them to super().__init__()

What I found → I wasn't able to find any model other than Dia with the actual save/load bug. Other composite configs set token IDs on a sub-config but make sure the value also persists on the main config. The patterns are:

| Pattern | Models | Buggy? |
| --- | --- | --- |
| A. Force-set on sub-config only: takes token IDs and sets them on `self.decoder_config.*`, but never on `self`, so they are lost on save/load and defaults overwrite the sub-config values. | `DiaConfig` (before this fix); happy to know if I missed any others | Yes |
| B. Extract from sub-config, inject into kwargs: does not accept token IDs; instead pulls them from the decoder sub-config and injects them into `**kwargs` before calling `super().__init__()`. | `T5GemmaConfig`, `T5Gemma2Config` | No (works, but quite like the anti-pattern you rightly mentioned) |
| C. Set on `self` directly: accepts token IDs and sets `self.pad_token_id = pad_token_id` before/after `super().__init__()`; `PreTrainedConfig.__init__` does not explicitly handle these attributes, so they persist. | `SeamlessM4TConfig`, `SeamlessM4Tv2Config`, `MoshiConfig`, `CsmConfig`, `KyutaiSpeechToTextConfig` | No |

Additional: Pix2StructConfig reads token IDs from text_config (self.eos_token_id = self.text_config.eos_token_id), which also works correctly, though I'm not sure that approach is warranted.
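Patterns B and C above can be contrasted with hypothetical toy classes (not the real T5Gemma or SeamlessM4T code): both survive a save/load round trip because the token ID always ends up somewhere that gets serialized and read back.

```python
class SubConfig:
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

class PatternBConfig:
    """B: don't accept the ID; pull it out of the sub-config instead."""
    def __init__(self, sub_config=None):
        self.sub_config = SubConfig(**(sub_config or {}))
        self.eos_token_id = self.sub_config.eos_token_id

    def to_dict(self):
        return {"sub_config": {"eos_token_id": self.sub_config.eos_token_id}}

class PatternCConfig:
    """C: accept the ID and set it on `self`, which is what gets saved."""
    def __init__(self, eos_token_id=1024):
        self.eos_token_id = eos_token_id

    def to_dict(self):
        return {"eos_token_id": self.eos_token_id}

b = PatternBConfig(sub_config={"eos_token_id": 97})
b2 = PatternBConfig(**b.to_dict())  # 97 rides along inside the sub-config
c = PatternCConfig(eos_token_id=97)
c2 = PatternCConfig(**c.to_dict())  # 97 is a top-level attribute
```

Pattern A breaks precisely because it has neither property: the custom ID lives only on the sub-config, and the parent constructor re-stamps it with a default on reload.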

Would love to hear which ones you're looking to fix ASAP; we could then move to writing issues for them. Happy to help :)

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: dia

@zucchini-nlp
Member

@harshaljanjani thanks a lot for your research. T5Gemma seems to work because it assigns the attr both on the sub-config and on the main config, so it is saved in both places. Let's fix Dia for now and I will think about unifying these later; I'm not 100% sure whether simply deleting the attr would disrupt downstream usage for other models.

Member

@zucchini-nlp zucchini-nlp left a comment


Nice, thanks a lot!

Comment on lines +289 to +290
```python
# TODO: Remove token ID forwarding once the `nari-labs/Dia-1.6B`
# checkpoint is updated
```
Member

Can you open a PR in the nari-labs repo and link it to this issue?

@harshaljanjani
Contributor Author

harshaljanjani commented Feb 12, 2026

Opened a PR at https://github.com/nari-labs/dia with the change; it links to the issue and this PR.

@zucchini-nlp
Member

I meant more like the hub one 😅 Opened https://huggingface.co/nari-labs/Dia-1.6B-0626/discussions/7 as well :)

@zucchini-nlp zucchini-nlp merged commit 403990c into huggingface:main Feb 13, 2026
19 checks passed
@harshaljanjani
Contributor Author

> I meant more like the hub one 😅 Opened https://huggingface.co/nari-labs/Dia-1.6B-0626/discussions/7 as well :)

Oops, my bad! Should I close the other PR then? 😅

@harshaljanjani harshaljanjani deleted the fix/dia-config-token-ids branch February 13, 2026 09:33
@zucchini-nlp
Member

No, you can leave it. They would need to update their GH repo if hub config changes are merged!

@harshaljanjani
Contributor Author

> No, you can leave it. They would need to update their GH repo if hub config changes are merged!

Perfect, sounds good!


Development

Successfully merging this pull request may close these issues.

[BUG] DiaConfig loses custom token IDs after save / load and causes IndexError during generation

3 participants