
Fix init weights in remote code #43768

Open
zucchini-nlp wants to merge 10 commits into huggingface:main from zucchini-nlp:vllm-v5-bump


zucchini-nlp (Member) commented Feb 5, 2026

What does this PR do?

Helps vLLM bump to Transformers v5.

Comment on lines 2312 to 2317
if getattr(module, "_is_hf_initialized", False):
    return

if (weight := getattr(module, "weight", None)) is not None and getattr(weight, "_is_hf_initialized", False):
    return

zucchini-nlp (Member, Author) commented Feb 5, 2026

Modules never have an _is_hf_initialized attribute, so I guess this check is a typo? Otherwise the whole model gets randomly initialized whenever remote code defines an old-format _init_weights, which takes ages for big models.
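A minimal sketch of why the module-level check never short-circuits while the weight-level check does. It assumes, as the comment states, that the flag is never set on modules, and further assumes that weights loaded from the checkpoint carry _is_hf_initialized=True. SimpleNamespace objects stand in for nn.Module and its parameters; skip_by_module_flag, skip_by_weight_flag, init_weights, make_modules, and random_init are hypothetical names, not Transformers APIs.

```python
from types import SimpleNamespace


def skip_by_module_flag(module) -> bool:
    # First check in the snippet: looks for the flag on the module itself.
    return bool(getattr(module, "_is_hf_initialized", False))


def skip_by_weight_flag(module) -> bool:
    # Second check in the snippet: looks for the flag on the module's weight, if it has one.
    weight = getattr(module, "weight", None)
    return weight is not None and bool(getattr(weight, "_is_hf_initialized", False))


def init_weights(modules, skip_check, init_fn):
    """Run init_fn on every module that skip_check does not exclude; return the names touched."""
    touched = []
    for name, module in modules.items():
        if skip_check(module):
            continue
        init_fn(module)
        touched.append(name)
    return touched


def make_modules():
    # Hypothetical toy model: one module whose weight was loaded (flag set on the weight),
    # and one newly created head whose weight has no flag.
    return {
        "encoder.layer0": SimpleNamespace(weight=SimpleNamespace(_is_hf_initialized=True)),
        "lm_head": SimpleNamespace(weight=SimpleNamespace()),
    }


def random_init(module):
    # Stand-in for an old-format remote-code _init_weights that overwrites the weight.
    module.weight.data = "random"


print(init_weights(make_modules(), skip_by_module_flag, random_init))  # ['encoder.layer0', 'lm_head']
print(init_weights(make_modules(), skip_by_weight_flag, random_init))  # ['lm_head']
```

With the module-level check every module is re-initialized, including the loaded one; checking the weight's flag limits re-initialization to the new head.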


Comment on lines 434 to 436
 # 5. Special tokens mask configuration
 # Patterns: "none", "cls_sep", "eos", "bos", "bos_eos", "cls_double_sep", "prefix_suffix"
-self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "cls_sep")
+self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "bos_eos")
zucchini-nlp (Member, Author) commented

cc @itazap @ArthurZucker, I want to clarify this part. Should we default to None, since CLS/SEP token ids aren't available for all tokenizers? Right now we get [None, 1, 18001, 468, None] as token ids for those models.

Please ignore the current change to bos_eos.
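A minimal sketch of the failure mode described above. PATTERNS, add_special_tokens, and missing_special_ids are hypothetical helpers covering only a single-sequence subset of the listed patterns (not cls_double_sep or prefix_suffix); they are not the tokenizer's actual implementation. The input ids are chosen to reproduce the output quoted in the comment.

```python
from typing import Dict, List, Optional

# Hypothetical subset of patterns for a single sequence: (prefix roles, suffix roles).
PATTERNS = {
    "none": ([], []),
    "cls_sep": (["cls"], ["sep"]),
    "bos": (["bos"], []),
    "eos": ([], ["eos"]),
    "bos_eos": (["bos"], ["eos"]),
}


def missing_special_ids(pattern: Optional[str], special_ids: Dict[str, Optional[int]]) -> List[str]:
    """Return the roles a pattern needs whose ids are unavailable (None or absent)."""
    if pattern is None:
        return []
    prefix, suffix = PATTERNS[pattern]
    return [role for role in prefix + suffix if special_ids.get(role) is None]


def add_special_tokens(
    token_ids: List[int], pattern: Optional[str], special_ids: Dict[str, Optional[int]]
) -> List[Optional[int]]:
    """Wrap token_ids according to pattern; a None pattern adds nothing."""
    if pattern is None:
        return list(token_ids)
    prefix, suffix = PATTERNS[pattern]
    # Missing ids are looked up as None and end up in the output unchanged.
    return (
        [special_ids.get(role) for role in prefix]
        + list(token_ids)
        + [special_ids.get(role) for role in suffix]
    )


ids = [1, 18001, 468]
no_cls_sep = {"cls": None, "sep": None}  # tokenizer without CLS/SEP tokens

print(add_special_tokens(ids, "cls_sep", no_cls_sep))  # [None, 1, 18001, 468, None]
print(add_special_tokens(ids, None, no_cls_sep))       # [1, 18001, 468]
print(missing_special_ids("cls_sep", no_cls_sep))      # ['cls', 'sep']
```

A None default avoids emitting None ids for such tokenizers; missing_special_ids illustrates the alternative of detecting the gap up front and failing early rather than silently producing None.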

github-actions bot (Contributor) commented Feb 6, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_5_omni



2 participants