Fix Nebula checkpoint engine commit() API mismatch #7740
Merged
sfc-gh-truwase merged 2 commits into deepspeedai:master on Dec 22, 2025
Update checkpoint_engine.commit() calls to pass CheckpointCommitInfo object instead of just the tag string, allowing more information to be passed to the checkpoint engine.

Signed-off-by: Rakshit-gen <sisodiarakshit456@gmail.com>
Author (Contributor): @sfc-gh-truwase can we review this change?
sfc-gh-truwase approved these changes on Dec 22, 2025
Summary
- Fix `AttributeError: 'str' object has no attribute 'tag'` when using Nebula checkpoint engine
- Pass `CheckpointCommitInfo` object instead of raw `tag` string to `checkpoint_engine.commit()`

Description
The `CheckpointEngine.commit()` interface expects a `CheckpointCommitInfo` object, but two call sites in `engine.py` were passing a raw `tag` string instead:
- `save_checkpoint()` at line 3695
- `save_16bit_model()` at line 4230

This worked with `TorchCheckpointEngine` because it ignores the parameter, but `NebulaCheckpointEngine` accesses `info.tag`, causing the crash.

Changes
- Create a `CheckpointCommitInfo` object before calling `commit()`
- Pass the `commit_info` variable instead of `tag`

Test plan
- Confirm the Nebula checkpoint engine no longer raises `AttributeError`

Fixes #7678
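
For readers who want to see the failure mode in isolation, here is a minimal sketch that reproduces it outside DeepSpeed. Everything in it is a hypothetical stand-in: `IgnoringEngine` and `TagReadingEngine` only imitate the behaviour the description attributes to `TorchCheckpointEngine` and `NebulaCheckpointEngine`, and the `CheckpointCommitInfo` dataclass here carries only the `tag` attribute mentioned above (the real class may have more fields). It is not the code from `engine.py`.

```python
from dataclasses import dataclass


@dataclass
class CheckpointCommitInfo:
    """Stand-in for the real class; only `tag` is confirmed by the PR text."""
    tag: str


class IgnoringEngine:
    """Imitates an engine whose commit() never reads its argument."""

    def commit(self, info):
        return True


class TagReadingEngine:
    """Imitates an engine whose commit() reads info.tag."""

    def commit(self, info):
        return f"committed {info.tag}"


def demo():
    tag = "global_step100"  # hypothetical checkpoint tag
    results = {}

    # Passing a raw string goes unnoticed when the engine ignores the argument.
    results["ignoring_with_str"] = IgnoringEngine().commit(tag)

    # The same call fails as soon as the engine dereferences .tag on a str.
    try:
        TagReadingEngine().commit(tag)
    except AttributeError as exc:
        results["error"] = str(exc)

    # The fix: wrap the tag in a commit-info object before calling commit().
    commit_info = CheckpointCommitInfo(tag=tag)
    results["fixed"] = TagReadingEngine().commit(commit_info)
    return results


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, "->", value)
```

The sketch also suggests why the bug stayed hidden: an engine that ignores its argument accepts any type, so only an engine that reads the object's attributes exposes the mismatch.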