
Empty ZeRO3 partition cache #3060

Merged
merged 11 commits into master from olruwase/zero_partition_cache on Mar 24, 2023

Conversation

tjruwase
Contributor

API to free GPU memory consumed by the ZeRO-3 partition cache.
Fixes #3025

@stas00
Contributor

stas00 commented Mar 22, 2023

oh, sorry, could this new method be added to the API docs please? Thank you, Tunji!

Perhaps something like:

By default, at the end of training some parameters will remain unpartitioned and use up some GPU memory. This is done on purpose as an optimization in case you resume training. If you'd like to clear out the cached parameters that use up GPU memory, you can call:

deepspeed_engine.empty_partition_cache()

as soon as the training has finished.
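To make the call pattern concrete without requiring DeepSpeed or a GPU, here is a runnable toy sketch. The `ToyZero3Engine` class is a hypothetical stand-in for a real DeepSpeed engine: its dict-backed cache only mimics the lifecycle of the real `empty_partition_cache()`, which releases gathered full parameters back to their partitions and frees the GPU memory they occupied.

```python
# Toy stand-in for a DeepSpeed ZeRO-3 engine. The real API is
# deepspeed_engine.empty_partition_cache(); this stub mimics only the
# call pattern, using a plain dict instead of GPU-resident parameters,
# so the flow runs anywhere.
class ToyZero3Engine:
    def __init__(self):
        # Simulates parameters left gathered (unpartitioned) after training.
        self.partition_cache = {"layer0.weight": [0.0] * 4}

    def train_step(self):
        # A real engine would run forward/backward/step here.
        pass

    def empty_partition_cache(self):
        # The real method releases cached full parameters back to their
        # partitions; here we just drop the simulated cache entries.
        self.partition_cache.clear()


engine = ToyZero3Engine()
for _ in range(3):
    engine.train_step()

# Per the doc suggestion above: call this as soon as training finishes.
engine.empty_partition_cache()
print(len(engine.partition_cache))  # 0: cache is empty
```

The key point of the suggested docs is the timing: the cache is kept deliberately so a resumed training run avoids re-gathering, so only call `empty_partition_cache()` once you know training is truly done.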

@stas00
Contributor

stas00 commented Mar 22, 2023

Thank you for adding the doc - looks great, Tunji!

@jeffra jeffra merged commit e80ae08 into master Mar 24, 2023
1 check failed
@jeffra jeffra deleted the olruwase/zero_partition_cache branch March 24, 2023 00:15
Successfully merging this pull request may close these issues.

[BUG] zero3 memory leak on return from training loop