[BE] Enabled mypy in `common_fsdp.py` #118755

awgu · 2024-01-31T17:05:28Z

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

pytorch-bot · 2024-01-31T17:05:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118755

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fe502d8 with merge base 278a0e1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


awgu · 2024-01-31T18:40:04Z

torch/testing/_internal/common_fsdp.py


```
        def get_future():
-            future = torch.futures.Future()
+            future: torch.futures.Future = torch.futures.Future()
```


For some reason mypy wanted this type annotation.
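A likely explanation: `torch.futures.Future` is a generic class (`Future[T]`), and when a generic class is constructed with no arguments, mypy cannot infer the type parameter and asks for an annotation on the variable. The sketch below reproduces the pattern with the standard library's `concurrent.futures.Future`, used only as a stand-in (an assumption: it is also generic in the typeshed stubs) so that the example runs without PyTorch.

```python
from concurrent.futures import Future


def get_future() -> "Future[int]":
    # Without the annotation, mypy typically reports
    # 'Need type annotation for "future"' because it cannot infer the
    # type parameter of the generic Future from a bare constructor call.
    future: "Future[int]" = Future()
    future.set_result(42)  # resolve the future immediately
    return future


print(get_future().result())  # 42
```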

awgu · 2024-01-31T18:41:04Z

torch/testing/_internal/common_fsdp.py

```
    ]
    world_size = dist.get_world_size(process_group)
    olist = [None for _ in range(world_size)]
    dist.all_gather_object(olist, named_module_states, group=process_group)
```


`all_gather_object` fills in `olist` destructively (in place), so its initial `None` entries are only placeholders. Another approach could be to initialize `olist` with some dummy object of the expected type.
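A minimal sketch of the dummy-object approach that runs without a process group. `fake_all_gather_object` is a hypothetical single-process stand-in for `dist.all_gather_object` (it overwrites each pre-allocated slot in place), and `NamedModuleStates` is an assumed stand-in for the real element type gathered in `common_fsdp.py`.

```python
from typing import Any, List, Tuple

# Assumed stand-in for the element type gathered in common_fsdp.py.
NamedModuleStates = List[Tuple[str, float]]


def fake_all_gather_object(object_list: List[Any], obj: Any) -> None:
    """Hypothetical stand-in for dist.all_gather_object: overwrites each
    pre-allocated slot in place (here every "rank" contributes ``obj``)."""
    for rank in range(len(object_list)):
        object_list[rank] = obj


world_size = 4
named_module_states: NamedModuleStates = [("weight", 1.0), ("bias", 0.0)]

# Pre-size the output with dummy objects of the expected type (empty lists).
# Unlike [None for ...], this is consistent with the annotation below.
olist: List[NamedModuleStates] = [[] for _ in range(world_size)]
fake_all_gather_object(olist, named_module_states)
assert all(states == named_module_states for states in olist)
```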

Would it be a good BE item (maybe for myself) to allow `[]`?

```
olist = []
dist.all_gather_object(olist, ...)
```

This seems reasonable to me. `all_gather_object` could modify `olist` in place as it already does today, appending `world_size` elements.
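The proposed behavior (later opened as #119417) could look roughly like the following. This is a hypothetical single-process illustration of the semantics only, not the `torch.distributed` implementation; `gather_into` is an invented name and `gathered` stands in for the objects received from each rank.

```python
from typing import Any, List


def gather_into(object_list: List[Any], gathered: List[Any]) -> None:
    """Hypothetical sketch: write one gathered object per rank into
    ``object_list``, accepting either an empty list or one pre-sized
    to the world size."""
    world_size = len(gathered)
    if not object_list:
        # Proposed behavior: an empty list is extended in place.
        object_list.extend(gathered)
    elif len(object_list) == world_size:
        # Existing behavior: pre-sized slots are overwritten in place.
        object_list[:] = gathered
    else:
        raise ValueError(
            f"expected an empty list or {world_size} elements, "
            f"got {len(object_list)}"
        )


olist: List[Any] = []
gather_into(olist, ["rank0", "rank1"])
print(olist)  # ['rank0', 'rank1']
```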

This sounds like a good BE task!

awgu · 2024-01-31T18:42:06Z

torch/testing/_internal/common_fsdp.py

```
        dist.reduce_scatter_tensor = orig_reduce_scatter


+@no_type_check
```


In these patching methods, mypy complains about assigning to a method, e.g. `FSDPParamGroup.unshard = new_unshard`. Since each patch context has two such assignments (one to install the new method and one to restore the original), I preferred to just skip type checking with `@no_type_check`, as it is not very valuable here.
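A minimal sketch of the pattern, assuming a hypothetical `ParamGroup` class in place of `FSDPParamGroup` and omitting the distributed barriers the real test helpers may use. The decorator order (`@no_type_check` above `@contextlib.contextmanager`) is a choice made for this sketch.

```python
import contextlib
from typing import Callable, Iterator, no_type_check


class ParamGroup:
    """Hypothetical stand-in for FSDPParamGroup."""

    def unshard(self) -> str:
        return "original"


@no_type_check
@contextlib.contextmanager
def patch_unshard(new_unshard: Callable) -> Iterator[None]:
    # Both assignments below are method assignments that mypy would flag
    # (typically "Cannot assign to a method"); @no_type_check asks mypy to
    # skip this whole function instead of silencing each line separately.
    orig_unshard = ParamGroup.unshard
    ParamGroup.unshard = new_unshard
    try:
        yield
    finally:
        ParamGroup.unshard = orig_unshard  # always restore the original


with patch_unshard(lambda self: "patched"):
    assert ParamGroup().unshard() == "patched"
assert ParamGroup().unshard() == "original"
```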


awgu · 2024-02-13T19:02:43Z

@pytorchbot merge

pytorchmergebot · 2024-02-13T19:04:36Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

The `groupby` logic to check if all all-gather inputs have the same dtype is not so readable. Let us use `all` instead.

Pull Request resolved: #119825
Approved by: https://github.com/Skylion007
ghstack dependencies: #119550, #118136, #118223, #118755
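To illustrate the readability difference, here are two equivalent same-dtype checks. This is a sketch, not the FSDP2 code: dtype strings stand in for `torch.dtype` values, and the `groupby` version uses a common `itertools` recipe.

```python
from itertools import groupby
from typing import Sequence


def same_dtype_groupby(dtypes: Sequence[str]) -> bool:
    # groupby yields one group per run of equal consecutive items, so all
    # items are equal iff there is at most one group.
    groups = groupby(dtypes)
    return bool(next(groups, True)) and not next(groups, False)


def same_dtype_all(dtypes: Sequence[str]) -> bool:
    # Compare every entry against the first; vacuously True when empty.
    return all(dtype == dtypes[0] for dtype in dtypes)


for dtypes in ([], ["bf16"], ["bf16", "bf16"], ["bf16", "fp32"]):
    assert same_dtype_groupby(dtypes) == same_dtype_all(dtypes)
```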

This PR adds a way to do gradient accumulation without collectives (i.e. reduce-scatter for FSDP and reduce-scatter/all-reduce for HSDP, though HSDP is not yet implemented). Since the `no_sync()` context manager has received some feedback, we simply define a method on the module to set whether the module requires gradient synchronization or not, where this method can recurse or not.

```
# Before with `no_sync()`:
with fsdp_model.no_sync() if not is_last_microbatch else contextlib.nullcontext():
    # Forward/backward
    ...

# After with a setter:
fsdp_model.set_requires_gradient_sync(not is_last_microbatch)
# Forward/backward
```

Having the method be able to recurse or not also gives some flexibility. For example, some large modules can still reduce-scatter, while some smaller modules can avoid it to save communication bandwidth:

```
fsdp_modules_to_reduce_scatter: Set[nn.Module] = ...
for module in fsdp_model.modules():
    if isinstance(module, FSDP) and module not in fsdp_modules_to_reduce_scatter:
        # Do not recurse, or a parent would overwrite its children's setting
        module.set_requires_gradient_sync(not is_last_microbatch, recurse=False)
# Forward/backward
```

(Separately, we may expose a helper for `return [module for module in model.modules() if isinstance(module, FSDP)]`.)

---

To show the spirit of this API choice, I also included `set_requires_all_reduce` that would give us the ability to only reduce-scatter but not all-reduce for HSDP (originally from the MiCS paper). If we want to flexibly support heterogeneous sharding where FSDP is applied to some modules and HSDP to others in the same model, then having a module-level method that has the option to not recurse makes sense to me.

Pull Request resolved: #118298
Approved by: https://github.com/wconstab, https://github.com/wanchaol
ghstack dependencies: #119550, #118136, #118223, #118755, #119825

[BE] Enabled mypy in common_fsdp.py

738dbf4

[ghstack-poisoned]

awgu mentioned this pull request Jan 31, 2024

[FSDP2] Added all-gather and unsharded parameter #117950

Closed

awgu added the topic: not user facing topic category label Jan 31, 2024

Update on "[BE] Enabled mypy in common_fsdp.py"

86df81c

[ghstack-poisoned]

Skylion007 approved these changes Jan 31, 2024


awgu commented Jan 31, 2024


Andrew Gu added 13 commits January 31, 2024 11:36

Update on "[BE] Enabled mypy in common_fsdp.py"

2809517

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

9c11fa0

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

b559540

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

9aa732d

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

d82f962

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

43543ad

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

14260d8

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

993a0f2

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

f962e4c

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

f14905c

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

40b4742

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

c2397c9

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

fd73368

[ghstack-poisoned]

awgu mentioned this pull request Feb 7, 2024

[FSDP2] Added pre/post-all-gather extensions #119378

Closed

weifengpy approved these changes Feb 7, 2024


weifengpy mentioned this pull request Feb 7, 2024

[c10d] accept [] in torch.distributed.all_gather_object #119417

Open

Update on "[BE] Enabled mypy in common_fsdp.py"

319a768

[ghstack-poisoned]

awgu pushed a commit to awgu/pytorch that referenced this pull request Feb 8, 2024

[BE] Enabled mypy in common_fsdp.py

7ac7851

ghstack-source-id: 5430730 Pull Request resolved: pytorch#118755

awgu mentioned this pull request Feb 8, 2024

[FSDP2] Used split_with_sizes_copy for all-gather copy-out #119451

Closed

Andrew Gu added 3 commits February 8, 2024 07:42

Update on "[BE] Enabled mypy in common_fsdp.py"

ccebf28

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

a7b93f3

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

d9edb9f

[ghstack-poisoned]

This was referenced Feb 9, 2024

[FSDP2] Replaced version-ctx with no_grad; removed no_grad #119550

Closed

[no ci][FSD2] Added annotations for compute compile #119551

Closed

wanchaol approved these changes Feb 13, 2024


Andrew Gu added 3 commits February 13, 2024 07:23

Update on "[BE] Enabled mypy in common_fsdp.py"

c40cff6

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

d511503

[ghstack-poisoned]

Update on "[BE] Enabled mypy in common_fsdp.py"

fe502d8

[ghstack-poisoned]

awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 13, 2024

pytorchmergebot added the merging label Feb 13, 2024

pytorchmergebot added the Merged label Feb 13, 2024

pytorchmergebot closed this in 0a2e000 Feb 13, 2024

pytorchmergebot removed the merging label Feb 13, 2024

awgu mentioned this pull request Feb 13, 2024

[FSDP2][ez] Replaced groupby with all for same-dtype check #119825

Closed

github-actions bot deleted the gh/awgu/504/head branch March 15, 2024 01:53


5 participants
