
[REQUEST] Moving a trainable model with an optimiser between GPU and CPU #5620

Closed
kfertakis opened this issue Jun 5, 2024 · 3 comments
Labels: enhancement (New feature or request)

@kfertakis
Is your feature request related to a problem? Please describe.
When a DeepSpeed model is initialised with an optimiser, the torch.nn.Module.to() functionality for moving the model between devices breaks: the optimiser holds references to the model parameters, so GPU memory is not released when, for example, moving the model to the CPU.
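The underlying issue can be illustrated in plain PyTorch (this sketch does not reproduce DeepSpeed's engine internals, only the reference-holding behaviour the request describes): nn.Module.to() moves the parameters and buffers, but the optimiser's state tensors (e.g. Adam's exp_avg / exp_avg_sq) stay on their original device.

```python
import torch
import torch.nn as nn

# Pick a GPU if one is available; on a CPU-only machine the point still
# holds trivially, since the state simply stays where it was allocated.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)
opt = torch.optim.Adam(model.parameters())
model(torch.randn(8, 4, device=device)).sum().backward()
opt.step()  # allocates exp_avg / exp_avg_sq state tensors on `device`

model.to("cpu")  # moves the parameters and buffers only

# The optimiser state tensors are untouched by model.to(), so on a GPU
# they keep holding GPU memory. (Adam's scalar "step" counter lives on
# the CPU by default, so it is excluded here.)
state_devices = {v.device.type
                 for s in opt.state.values()
                 for k, v in s.items()
                 if torch.is_tensor(v) and k != "step"}
print(state_devices)  # the original device, not "cpu"
```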

Describe the solution you'd like
Functionality similar to torch.nn.Module.to() that moves both the model and the optimiser between devices and de-allocates the previously occupied memory.
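A minimal sketch of what such a helper could look like in plain PyTorch (to_device is a hypothetical name, not a DeepSpeed or PyTorch API, and it ignores the extra bookkeeping a real DeepSpeed engine, especially ZeRO-partitioned state, would need):

```python
import torch
import torch.nn as nn

def to_device(model, optimizer, device):
    """Hypothetical helper: move a module and its optimiser state together."""
    model.to(device)
    for state in optimizer.state.values():
        for k, v in state.items():
            # Adam's scalar "step" counter is expected on the CPU by
            # default, so leave it in place.
            if torch.is_tensor(v) and k != "step":
                state[k] = v.to(device)
    return model, optimizer

model = nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters())
model(torch.randn(8, 4)).sum().backward()
opt.step()  # populate the optimiser state

to_device(model, opt, torch.device("cpu"))
```

After a real GPU-to-CPU move one would typically also call torch.cuda.empty_cache() so the freed blocks are returned to the driver rather than kept in PyTorch's caching allocator.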

Describe alternatives you've considered
The alternative is to destroy the model instance and recreate it from a checkpoint, but this has a much higher time cost.

@kfertakis kfertakis added the enhancement New feature or request label Jun 5, 2024
@tohtana tohtana self-assigned this Sep 9, 2024
@tohtana
Contributor

tohtana commented Sep 9, 2024

Hi @kfertakis,
Is #6011 useful for your purpose? It is not merged yet, but we have validated that it reduces the memory footprint.

@kfertakis
Author

Hey @tohtana,

Thank you very much for bringing it to my attention. I will test it and report back, but it does seem promising for my use case. Thanks a lot.

@kfertakis
Author

#6011 addresses this request.
