[FSDP][Easy] Rename streams; add back stream sharing test #104966
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104966
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit b57126f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -283,17 +283,10 @@ def _share_state_and_init_handle_attrs(
        "set yet or should have been set to `False`",
    )
    fsdp_state._is_root = False
    # Stream for unshard logic, including allocating the all-gather destination
We do not need to duplicate these comments from the initial construction in _init_streams(), since the comments may become stale and here we are only sharing the state.
pytorch/torch/distributed/fsdp/_runtime_utils.py
Lines 305 to 324 in b57126f
def _init_streams(
    state: _FSDPState,
) -> _FSDPState:
    """
    Initializes CUDA streams for overlapping communication, computation, and
    data transfers. The streams should be shared across FSDP instances.
    """
    assert state._is_root
    assert state._device_handle.is_available()
    # Stream for unshard logic, including allocating the all-gather destination
    # tensors and the all-gathers themselves.
    state._unshard_stream = state._device_handle.Stream()
    # Stream for overlapping gradient reduction with the backward pass gradient
    # computation.
    state._post_backward_stream = state._device_handle.Stream()
    # Stream for pre-unshard logic, namely allocations and writes for CPU
    # offloading (H2D copy) and mixed precision (low precision cast).
    state._pre_unshard_stream = state._device_handle.Stream()
    # Default stream for computation
    state._default_stream = state._device_handle.current_stream()
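For context, a minimal sketch of what the sharing amounts to (the helper name and loop below are illustrative, not the exact code in _share_state_and_init_handle_attrs): each non-root state simply aliases the root state's stream objects, so the per-stream comments only need to live in _init_streams().

def _share_streams_with_root(root_state, fsdp_states):
    # Illustrative only: point every non-root FSDP state at the root's
    # streams so all instances enqueue work on the same CUDA streams.
    for fsdp_state in fsdp_states:
        if fsdp_state is root_state:
            continue
        fsdp_state._unshard_stream = root_state._unshard_stream
        fsdp_state._post_backward_stream = root_state._post_backward_stream
        fsdp_state._pre_unshard_stream = root_state._pre_unshard_stream
        fsdp_state._default_stream = root_state._default_stream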
@@ -276,7 +276,14 @@ def _test_nested_fully_shard_shared_state(self, use_policy: bool):
    # NOTE: This check only requires that the data structure state is
    # shared. Namely, sharing the FSDP state object itself is sufficient
    # but not necessary.
We used to test that state._streams was shared. This adds back testing that each stream is shared.
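A rough sketch of what such a check can look like (attribute names follow this PR's renaming; the helper below is hypothetical, not the actual test code):

def assert_streams_shared(fsdp_states):
    # Hypothetical check: every stream attribute must be the same object
    # across all FSDP states, i.e. shared rather than merely equal.
    root = fsdp_states[0]
    stream_names = (
        "_unshard_stream",
        "_post_backward_stream",
        "_pre_unshard_stream",
        "_default_stream",
    )
    for other in fsdp_states[1:]:
        for name in stream_names:
            assert getattr(other, name) is getattr(root, name)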
ghstack-source-id: 623474b088ab2d5ade75a1f952ae666d7e8e1f04
Pull Request resolved: pytorch#104966
LGTM
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):

- CustomPolicy #104986
- _FSDPPolicy.policy with _Policy._run_policy #104969
- ModuleWrapPolicy to take Iterable #104999
- ModuleWrapPolicy #104427

Purely out of preference, this PR renames the streams to _unshard_stream instead of _streams_unshard, etc., since the former reads more naturally. The PR also removes some duplicated comments and adds back a unit test that the streams are shared.
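As a small illustration of the new naming (only the _streams_unshard to _unshard_stream pair is spelled out above; the other old names are assumed to follow the same _streams_<x> pattern):

import torch

def run_unshard_on_stream(state, unshard_fn):
    # Hypothetical usage with the renamed attribute; previously this stream
    # was reached as `state._streams_unshard` (other old names assumed to
    # follow the `_streams_<x>` pattern).
    with torch.cuda.stream(state._unshard_stream):
        unshard_fn()  # e.g., allocate all-gather buffers and launch all-gathers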