Is your feature request related to a problem? Please describe.
When a DeepSpeed model is initialised with an optimiser, the torch.nn.Module.to() functionality for moving the model between devices breaks: the optimiser holds references to the model parameters, so GPU memory is not freed when, for example, trying to move the model to CPU.
Describe the solution you'd like
Functionality similar to torch.nn.Module.to() that moves both the model and the optimiser between devices and de-allocates the previously occupied memory.
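In plain PyTorch (outside the DeepSpeed engine) the requested behaviour can be sketched by moving the optimiser's per-parameter state tensors alongside the module; the helper name model_and_optimizer_to below is hypothetical, not an existing DeepSpeed or PyTorch API:

```python
import torch

def model_and_optimizer_to(model, optimizer, device):
    # Hypothetical helper: move the module's parameters/buffers and the
    # optimiser's per-parameter state (e.g. momentum buffers) to `device`,
    # so no tensor on the old device stays referenced by the optimiser.
    model.to(device)
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device)
    return model, optimizer
```

After such a call, torch.cuda.empty_cache() can return the now-unreferenced cached blocks to the driver; with the optimiser state left behind on the GPU, that memory would remain held.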
Describe alternatives you've considered
The alternative is to destroy the model instance and recreate it from a checkpoint, but this has a much higher time cost.