Skip to content

Conversation

voznesenskym
Copy link
Collaborator

@voznesenskym voznesenskym commented Aug 1, 2023

Stack from ghstack (oldest at bottom):

This reverts commit a9a3c45.

This PR changes the following:

  • _ExecOrderData.handle_to_handle_index -> FlatParamHandle._handle_index
  • _ExecOrderData.handles_to_pre_forward_order_index -> FlatParamHandle._pre_forward_order_index
  • _ExecOrderData.handles_to_post_forward_order_index -> FlatParamHandle._post_forward_index
  • _FSDPState._needs_pre_forward_unshard -> FlatParamHandle._needs_pre_forward_unshard
  • _FSDPState._needs_pre_backward_unshard -> FlatParamHandle._needs_pre_backward_unshard
  • _FSDPState._handles_prefetched -> FlatParamHandle._prefetched

This reverts commit a9a3c45.

fixes

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 1, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106357

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit be4c9a8:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added release notes: distributed (fsdp) release notes category labels Aug 1, 2023
voznesenskym added a commit that referenced this pull request Aug 1, 2023
This reverts commit a9a3c45.

fixes

ghstack-source-id: a025b0c
Pull Request resolved: #106357
@voznesenskym voznesenskym changed the title Revert "Revert "Simplify handle indexing (#105006)" (#105984)" WIP - reland Simplify handle indexing (#105006) Aug 1, 2023
@albanD albanD removed their request for review August 1, 2023 16:02
This reverts commit a9a3c45.

fixes

[ghstack-poisoned]
@voznesenskym
Copy link
Collaborator Author

Fixed pre backward fetch allgather

image

if not handle:
return
# Only record the first usage of a handles key
if handle in self.handles_to_post_forward_order_index:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

How it looked before, in the reverted PR. That would cause a None to show up.

@voznesenskym voznesenskym added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Aug 3, 2023
@awgu
Copy link
Collaborator

awgu commented Aug 3, 2023

It looks like the test failures are real :/

@voznesenskym
Copy link
Collaborator Author

It looks like the test failures are real :/

yeah they are

This reverts commit a9a3c45.

fixes

[ghstack-poisoned]
voznesenskym added a commit that referenced this pull request Aug 3, 2023
This reverts commit a9a3c45.

fixes

ghstack-source-id: 0342a7c
Pull Request resolved: #106357

fix

fix
@awgu awgu self-assigned this Aug 3, 2023
self.handles_post_forward_order: List[FlatParamHandle] = []
# Maps each handles key to its index in `handles_post_forward_order`
self.handles_to_post_forward_order_index: Dict[FlatParamHandle, int] = {}
self.handles_post_forward_order: List[Optional[FlatParamHandle]] = []
Copy link
Collaborator

@awgu awgu Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IIUC, this is actually a List[FlatParamHandle] now because we never append None into the list?

Copy link
Collaborator

@awgu awgu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Returns the handle indices (i.e. indices into ``self.all_handles``)
corresponding to the handles in ``handle``. An entry in the
returned tuple is ``None`` if the handle is invalid.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR (making note to myself):

if self.is_first_iter:
msg_prefix = "Forward order differs across ranks:"
optional_local_indices: Tuple[
Optional[int], ...
] = self._get_handle_indices(handle)
device = handle.device # guaranteed to be non-CPU
num_valid_indices = sum(
(index is not None) for index in optional_local_indices
)

local_indices = torch.tensor(optional_local_indices, **tensor_kwargs) # type: ignore[arg-type]

The above looks buggy. If optional_local_indices has a None in it, we should not be able to pass it to torch.tensor().

@voznesenskym voznesenskym changed the title WIP - reland Simplify handle indexing (#105006) Reland Simplify handle indexing (#105006) Aug 3, 2023
@voznesenskym
Copy link
Collaborator Author

@pytorchbot merge - Once more unto the breach

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 3, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: - Once more unto the breach

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci} ...

Try @pytorchbot --help for more info.

@voznesenskym
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (fsdp) release notes category topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants