Fix shard finalization issue to prevent skipping non-LLM tail layers #1548
Merged
lvliang-intel merged 5 commits into main on Mar 17, 2026
Conversation
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Pull request overview
Fixes a sharded-weight finalization bug where `ShardWriter.finalize()` could mistakenly treat the last leaf module (e.g., diffusion `proj_out`) as a tied LM head and silently skip writing its tensors when `model.config.tie_word_embeddings` is missing.
Changes:
- Change the default `tie_word_embeddings` assumption in `ShardWriter.finalize()` from `True` to `False`.
- Read `model.config.tie_word_embeddings` only when the attribute is explicitly present.
Description
Root Cause
`ShardWriter.finalize()` uses `get_lm_head_name()` to identify the tied embedding head and skip it. For diffusion models like FLUX, this returns `"proj_out"` (the last leaf module). Combined with `tie_word_embeddings` defaulting to `True` when the attribute is absent from the model config, `proj_out.weight` and `proj_out.bias` were silently skipped and never written to disk.
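A minimal, self-contained sketch of the failure mode (the function and config names below are illustrative stand-ins, not the actual `ShardWriter` internals):

```python
# Hypothetical sketch of the buggy skip logic in a finalize() step.

def finalize(state_dict, lm_head_name, config):
    # BUG: defaults to True when the config lacks the attribute, so a model
    # with no tie_word_embeddings field (e.g. FLUX) skips its last module.
    tie_word_embeddings = getattr(config, "tie_word_embeddings", True)
    written = {}
    for name, tensor in state_dict.items():
        if tie_word_embeddings and name.startswith(lm_head_name + "."):
            continue  # treated as tied to the embeddings, so never written
        written[name] = tensor
    return written

class FluxConfig:
    # Diffusion configs carry no tie_word_embeddings attribute at all.
    pass

weights = {
    "blocks.0.weight": "...",
    "proj_out.weight": "...",
    "proj_out.bias": "...",
}
# get_lm_head_name() would return "proj_out" (the last leaf module):
out = finalize(weights, "proj_out", FluxConfig())
print(sorted(out))  # proj_out.* silently dropped: ['blocks.0.weight']
```

Because `getattr` falls back to `True`, the skip branch fires for `proj_out` even though nothing in the FLUX config asked for tied weights.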
This bug was latent in the old commit but never triggered, because FLUX RTN quantization disabled `low_cpu_mem_usage` on that code path, so `finalize()` was never called. PR #1386 changed that branch to keep `low_cpu_mem_usage=True`, which activated the `is_immediate_saving` path and exposed the bug.

Fix
Change the default value of `tie_word_embeddings` from `True` to `False`, consistent with the same logic in utils.py:364.
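Sketched with the same illustrative names as above (not the real implementation), the fix flips the default so the skip only fires when the config explicitly opts into tied weights:

```python
# Hypothetical sketch of the fix: default tie_word_embeddings to False so
# the tail module is only skipped when the config explicitly ties weights.

def finalize(state_dict, lm_head_name, config):
    # Read the attribute only if it is present; otherwise assume untied
    # weights, mirroring the equivalent check in utils.py.
    tie_word_embeddings = getattr(config, "tie_word_embeddings", False)
    written = {}
    for name, tensor in state_dict.items():
        if tie_word_embeddings and name.startswith(lm_head_name + "."):
            continue  # genuinely tied LM head: safe to skip writing
        written[name] = tensor
    return written

class FluxConfig:
    pass  # still no tie_word_embeddings attribute

weights = {
    "blocks.0.weight": "...",
    "proj_out.weight": "...",
    "proj_out.bias": "...",
}
out = finalize(weights, "proj_out", FluxConfig())
print(sorted(out))  # all tensors written, including proj_out.*
```

LLM checkpoints whose configs explicitly set `tie_word_embeddings = True` keep the old skip behavior; only configs without the attribute change behavior.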
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting