
Add and Remove ZeRO 3 Hooks #5658

Open · wants to merge 11 commits into master

Conversation

jomayeri (Contributor)

Adds the ability to remove and re-add the ZeRO 3 forward hooks via a context manager. The code changes were adapted from a Hugging Face PR and integrated for direct support in DeepSpeed.

This is useful in the inference case, and the speedup can be observed here.
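The add/remove pattern the PR describes can be sketched in plain Python. The names `unwrap_model_for_generation` and `forward_hooks` follow the PR; the toy `ToyOffload` class below is a stand-in for DeepSpeed's parameter-offload object, not its real API:

```python
from contextlib import contextmanager


class ToyOffload:
    """Stand-in for the ZeRO-3 parameter-offload object: it tracks
    the forward hooks it has registered so they can be removed."""

    def __init__(self):
        self.forward_hooks = []

    def setup_hooks(self):
        # In DeepSpeed these would be real module forward hooks that
        # gather/partition parameters around each forward pass.
        self.forward_hooks = [lambda m, i, o: o, lambda m, i, o: o]


@contextmanager
def unwrap_model_for_generation(offload):
    # Remove the ZeRO-3 forward hooks so generation runs on the
    # gathered (unsharded) parameters without per-forward overhead.
    saved = list(offload.forward_hooks)
    offload.forward_hooks.clear()
    try:
        yield offload
    finally:
        # Restore the hooks so training resumes with sharding intact.
        # (The real change re-registers the hooks on the modules here.)
        offload.forward_hooks.extend(saved)


offload = ToyOffload()
offload.setup_hooks()
with unwrap_model_for_generation(offload):
    assert not offload.forward_hooks    # hooks gone during generation
assert len(offload.forward_hooks) == 2  # hooks restored afterwards
```

The `try/finally` in the context manager matters: the hooks are restored even if generation raises, so the engine is never left permanently unsharded.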

tjruwase (Contributor)

@jomayeri, please add some unit tests

@tjruwase tjruwase requested review from tohtana and removed request for mrwyattii June 18, 2024 00:25


@contextmanager
def unwrap_model_for_generation(model):

It is better to use general naming that describes what the utility does rather than specific usage like generation.

Suggested change:
- def unwrap_model_for_generation(model):
+ def unshard_and_remove_hooks(model):

if model.optimizer is not None and hasattr(model.optimizer, "parameter_offload"):
optimizer_offload = model.optimizer.parameter_offload
elif model.optimizer is not None:
optimizer_offload = model.optimizer

Can you explain this ZeRO-3 case where we have hooks attached to the optimizer?


for hook in optimizer_offload.forward_hooks:

The hooks are associated with the parameters, not the optimizer, so this naming is a bit confusing. Let's clarify it.

Suggested change:
- for hook in optimizer_offload.forward_hooks:
+ for hook in parameter_offload.forward_hooks:

@jomayeri jomayeri requested a review from loadams as a code owner July 16, 2024 19:47
@@ -10,7 +10,7 @@ Functionality for swapping optimizer tensors to/from (NVMe) storage devices.
#include "deepspeed_py_copy.h"
#include <omp.h>

- #define ROUND_DOWN(size, step) ((size) & ~((step)-1))
+ #define ROUND_DOWN(size, step) ((size) & ~((step) - 1))

Is this related?
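For reference, the macro itself rounds a size down to a multiple of a power-of-two step by clearing the low bits; the diff above only changes whitespace, not behavior. A quick check of the arithmetic, transliterated into Python:

```python
def round_down(size, step):
    # Equivalent to the C macro: (size) & ~((step) - 1).
    # Valid only when step is a power of two: step - 1 is a mask of
    # the low bits, and clearing them rounds down to a multiple of step.
    return size & ~(step - 1)


assert round_down(1000, 256) == 768   # 768 = 3 * 256
assert round_down(1024, 256) == 1024  # already aligned, unchanged
assert round_down(255, 256) == 0      # below one step rounds to zero
```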

@@ -488,7 +488,7 @@ class PredicatedTileAccessIteratorResidualLast<Shape_,

/// Construct the Params object given a pitch-linear tensor's layout
CUTLASS_HOST_DEVICE
- Params(Layout const& layout) : params_(layout::PitchLinear(layout.stride(0))){};
+ Params(Layout const& layout) : params_(layout::PitchLinear(layout.stride(0))) {};

Is this related?

@@ -299,3 +300,6 @@ def test(self):
with deepspeed.zero.GatheredParameters(l.weight):
# all ranks compare
assert torch.equal(l.weight, torch.zeros_like(l.weight))


Is this needed?
