Fix init weights in remote code #43768
zucchini-nlp wants to merge 10 commits into huggingface:main
Conversation
```python
if getattr(module, "_is_hf_initialized", False):
    return

if (weight := getattr(module, "weight", None)) is not None and getattr(weight, "_is_hf_initialized", False):
    return
```
Modules never have an `_is_hf_initialized` attr, I guess this is a typo? Otherwise it causes the whole model to be randomly initialized when the remote code defines an old-format `_init_weights`, and that takes ages for big models.
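For context, here is a minimal sketch of the guard being discussed: skip a module if the flag is set on the module itself, or fall back to the flag on its `weight` parameter. The `should_skip_init` and `init_weights` helpers below are hypothetical; only the two `getattr` checks mirror the snippet above, this is not the PR's actual code.

```python
import torch.nn as nn

def should_skip_init(module: nn.Module) -> bool:
    # Skip if the module itself carries the flag.
    if getattr(module, "_is_hf_initialized", False):
        return True
    # Also check the module's weight parameter, which old-format remote code
    # may flag instead of the module; otherwise every module would be
    # re-initialized and the model ends up randomly initialized.
    weight = getattr(module, "weight", None)
    return weight is not None and getattr(weight, "_is_hf_initialized", False)

def init_weights(model: nn.Module, init_fn) -> None:
    # Hypothetical driver: initialize only modules that were not flagged yet.
    for module in model.modules():
        if should_skip_init(module):
            continue
        init_fn(module)
        module._is_hf_initialized = True
```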
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
  # 5. Special tokens mask configuration
  # Patterns: "none", "cls_sep", "eos", "bos", "bos_eos", "cls_double_sep", "prefix_suffix"
- self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "cls_sep")
+ self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "bos_eos")
```
cc @itazap @ArthurZucker, I want to clarify this part. Should we default to None, since cls/sep ids aren't always available for all tokenizers? As it is, we are getting `[None, 1, 18001, 468, None]` as token ids for those models.
Ignore the current change to `bos_eos`.
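To illustrate the concern, a rough sketch of how a `special_tokens_pattern` default of `"cls_sep"` can inject `None` ids when a tokenizer defines no CLS/SEP tokens. The `build_input_ids` helper below is hypothetical, not the tokenizer's real API; only the pattern names come from the diff above.

```python
from typing import Optional

def build_input_ids(
    token_ids: list[int],
    pattern: str = "cls_sep",
    cls_id: Optional[int] = None,
    sep_id: Optional[int] = None,
    bos_id: Optional[int] = None,
    eos_id: Optional[int] = None,
) -> list[Optional[int]]:
    # Wrap the sequence according to the configured special-tokens pattern.
    if pattern == "none":
        return list(token_ids)
    if pattern == "cls_sep":
        # With no CLS/SEP ids defined, this yields None placeholders.
        return [cls_id, *token_ids, sep_id]
    if pattern == "bos_eos":
        return [bos_id, *token_ids, eos_id]
    raise ValueError(f"Unsupported special_tokens_pattern: {pattern!r}")

# A tokenizer without CLS/SEP tokens but with the "cls_sep" default:
print(build_input_ids([1, 18001, 468]))  # -> [None, 1, 18001, 468, None]
```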
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen2_5_omni
What does this PR do?
Helps vLLM bump to v5.