fix: engine initializes optimizer attributes at the beginning #7410
Merged
`destroy` accesses `self.optimizer`, but `destroy` can be called
after an error in `__init__`, even before the optimizer and
scheduler are configured. So we need to move the initialization of
`self.optimizer` to the top of `__init__` to avoid triggering
another exception.
e.g.:
```logs
File "deepspeed/runtime/engine.py", line 453, in _configure_tensor_parallel_states
assert self.zero_optimization_stage(
AssertionError: Currently, the compatibility between 'autotp' and 'zero_stage = 3' has not been validated
Exception ignored in: <function DeepSpeedEngine.__del__ at 0x1516c0610820>
Traceback (most recent call last):
File "deepspeed/runtime/engine.py", line 509, in __del__
self.destroy()
File "deepspeed/runtime/engine.py", line 512, in destroy
if self.optimizer is not None and hasattr(self.optimizer, 'destroy'):
File "deepspeed/runtime/engine.py", line 621, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DeepSpeedEngine' object has no attribute 'optimizer'
```
Signed-off-by: Hollow Man <hollowman@opensuse.org>
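The failure mode in the traceback above can be reproduced in isolation. The following is a minimal sketch, not the actual DeepSpeed code: `BrokenEngine`, `FixedEngine`, the `fail` flag, and `destroy_after_failed_init` are hypothetical names used only for illustration. It assumes, as the traceback suggests, that the engine defines a `__getattr__` that raises `AttributeError` for unknown names and that cleanup reads `self.optimizer`.

```python
class BrokenEngine:
    """Toy stand-in for an engine whose __init__ can fail early (hypothetical)."""

    def __init__(self, fail=True):
        if fail:
            # Simulates a configuration check failing before
            # self.optimizer has been assigned.
            raise AssertionError("configuration not validated")
        self.optimizer = None

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

    def destroy(self):
        # Reads self.optimizer, so it needs the attribute to exist.
        if self.optimizer is not None and hasattr(self.optimizer, "destroy"):
            self.optimizer.destroy()


class FixedEngine(BrokenEngine):
    """Same toy engine, but the attribute is assigned before anything can fail."""

    def __init__(self, fail=True):
        # Assign cleanup-relevant attributes first so destroy() is always safe.
        self.optimizer = None
        super().__init__(fail)


def destroy_after_failed_init(cls):
    """Run cleanup on a partially constructed object and report the outcome."""
    obj = cls.__new__(cls)
    try:
        obj.__init__(fail=True)
    except AssertionError:
        pass  # the original initialization error
    try:
        obj.destroy()
        return "ok"
    except AttributeError as exc:
        return str(exc)


print(destroy_after_failed_init(BrokenEngine))
print(destroy_after_failed_init(FixedEngine))
```

The sketch calls `destroy()` explicitly rather than through `__del__`, because exceptions raised inside `__del__` are only reported as "Exception ignored in" and cannot be caught by the caller.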
Collaborator
@HollowMan6, thanks for the PR. It seems this is to help with graceful exit in the event of failure during initialization. Is that correct?
Contributor
Author
Yes, especially when the optimizer is not initialized.
Collaborator
@HollowMan6, got it. Is it possible to add a unit test?
Contributor
Author
I'm not quite sure if it's feasible to test, as we just need to make sure
sfc-gh-truwase
approved these changes
Jul 7, 2025
lpnpcs pushed a commit to lpnpcs/DeepSpeed that referenced this pull request on Jul 30, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request on Oct 4, 2025