
Conversation

wconstab
Contributor

@wconstab wconstab commented Jan 26, 2024

Stack from ghstack (oldest at bottom):

Addresses #118337 somewhat; we probably need to update docs. Let's first
confirm what behavior we want.

Identifies a few confusing things:

  1. The 'dst' arg for many collectives is always a 'global' rank, regardless
     of whether a subgroup is passed in. This needs a doc update (see the
     sketch after this list).
  2. gather_object has a strong dependency on setting the cuda device;
     could we make that smoother?
  3. gather_object should also be happy with an empty list on the dst
     side, imo.
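
As an illustration of point 1, here is a minimal sketch (not taken from this PR's tests; the ranks, tensor shapes, and helper name are illustrative), assuming a 4-rank NCCL setup where each process has already selected its CUDA device:

import torch
import torch.distributed as dist

def gather_into_subgroup():
    # Assumes init_process_group was already called with world_size == 4.
    subgroup = dist.new_group(ranks=[2, 3])  # global ranks 2 and 3
    if dist.get_rank() not in (2, 3):
        return None
    tensor = torch.full((2,), float(dist.get_rank()), device="cuda")
    # The receiver is global rank 3 (group rank 1). Under the current behavior,
    # dst=3 (the global rank) is what works; dst=1 (the group rank) would not.
    gather_list = (
        [torch.empty(2, device="cuda") for _ in range(2)]
        if dist.get_rank() == 3
        else None
    )
    dist.gather(tensor, gather_list=gather_list, dst=3, group=subgroup)
    return gather_list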

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @yf225

@pytorch-bot pytorch-bot bot added the "topic: not user facing" label Jan 26, 2024

pytorch-bot bot commented Jan 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118359

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9a9057c with merge base d6b556b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the "oncall: distributed" label Jan 26, 2024
wconstab added a commit that referenced this pull request Jan 26, 2024
Fixes #118337 by updating docs.
Let's discuss further whether we want any behavior changes.

Identifies a few confusing things:
1) The 'dst' arg for many collectives is always a 'global' rank, regardless
   of whether a subgroup is passed in.
2) gather_object has a strong dependency on setting the cuda device;
   could we make that smoother?
3) gather_object should also be happy with an empty list on the dst
   side, imo.

ghstack-source-id: fb74977
Pull Request resolved: #118359
@wconstab wconstab requested review from kwen2501 and wanchaol January 26, 2024 04:21
@wconstab
Contributor Author

@kwen2501 wonder if you know off the top of your head whether the same doc change I made for the gather variants applies to the src rank for the scatter variants, and any other ops (send/recv, perhaps)? We need to update them all, but I ran out of steam tonight.

     collective and will contain the output. Must be ``None`` on non-dst
     ranks. (default is ``None``)
-    dst (int, optional): Destination rank. (default is 0)
+    dst (int, optional): Destination rank on global process group (regardless of 'group' argument). (default is 0)
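
(Editorial aside, not part of the diff: given this documented behavior, a caller that thinks in group-local ranks can translate with ``dist.get_global_rank`` before calling the collective. A minimal sketch; the helper name is illustrative.)

import torch.distributed as dist

def gather_to_group_rank(tensor, gather_list, group, group_dst):
    # ``dst`` expects a global rank, so convert the rank-within-group first.
    global_dst = dist.get_global_rank(group, group_dst)
    dist.gather(tensor, gather_list=gather_list, dst=global_dst, group=group)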
Contributor

@fegin fegin Jan 26, 2024

Is this expected to be the case for all our APIs, where global ranks should always be used? Or should we change the behavior?

Contributor Author

I prefer the changed behavior, but I do not know if it is worth the disruption of making a breaking change. Worth a discussion.

Contributor Author

If we want to change it, what we might have to do is first add a 'group_rank' arg and add logic to ensure only one of group_rank or rank is passed, then deprecate rank, and finally remove rank. Or how else?
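
A hypothetical sketch of that migration path; the ``group_dst`` name, the warning text, and the helper itself are illustrative, not an existing API:

import warnings
import torch.distributed as dist

def _resolve_dst(dst=None, group_dst=None, group=None):
    # Accept exactly one of the two; translate group_dst into the global rank
    # that the existing collectives understand.
    if dst is not None and group_dst is not None:
        raise ValueError("pass only one of 'dst' (global rank) or 'group_dst' (group rank)")
    if group_dst is not None:
        return dist.get_global_rank(group or dist.group.WORLD, group_dst)
    if dst is None:
        return 0  # current default
    warnings.warn("'dst' (global rank) may later be deprecated in favor of 'group_dst'", FutureWarning)
    return dst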

Contributor

@weifengpy weifengpy left a comment

Approving since this is more about updating docs and verifying current behavior.
Will update docs / unit tests similarly for send/reduce with the dst param.

return

store = c10d.FileStore(self.file_name, self.world_size)
torch.distributed.init_process_group(backend="nccl", store=store, rank=self.rank, world_size=self.world_size)
Contributor

On devgpu with 8 GPUs (world_size=8), it seems only ranks 0...3 call torch.distributed.init_process_group? PG does not complain about no init from ranks 4...7?

Contributor Author

Good point. Since I chose to make my test run with exactly 4 GPUs for simplicity, I should not rely on self.world_size to report 4 in case someone changes it. I will update this to pass 4 into init_process_group.
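
A sketch of that adjustment as a self-contained helper (the helper name is illustrative and the actual test code may differ):

import torch.distributed as dist

def init_four_rank_pg(file_name, rank):
    # Ranks beyond the first 4 simply skip; the test only exercises 4 GPUs.
    if rank > 3:
        return False
    store = dist.FileStore(file_name, 4)  # world size pinned to 4 explicitly
    dist.init_process_group(backend="nccl", store=store, rank=rank, world_size=4)
    return True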

@requires_nccl()
@skip_if_lt_x_gpu(4)
def test_gather_subgroup(self):
    if self.rank > 3:
Contributor

@weifengpy weifengpy Jan 26, 2024

Do we want to set world_size explicitly? Right now it's hard-coded to 4 from test_c10d_common.AbstractLargeCommTest anyway:

@property
def world_size(self):
    return 4

Contributor Author

I wrote the test in as simple a way as possible, hardcoding lists of ranks, e.g. [0, 1], instead of doing things like ranks = list(range(self.world_size))[:self.world_size // 2]. It just felt easier to read and even fewer LOC this way. It is not that important to have the test run on 8 GPUs if they are available.

Contributor

Thanks. It's clear now with world_size=4 in init_process_group.

wconstab added a commit that referenced this pull request Jan 27, 2024
ghstack-source-id: 3869bec
Pull Request resolved: #118359
@wconstab
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the "ciflow/trunk" label Jan 27, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@facebook-github-bot facebook-github-bot deleted the gh/wconstab/266/head branch January 30, 2024 15:23
pytorchmergebot pushed a commit that referenced this pull request Jan 30, 2024
Follow up to #118359: whether ``src`` and ``dst`` are based on the global pg or the sub pg
* update c10d docstrings: ``src`` / ``dst`` are based on the global pg regardless of the ``group`` argument
* communication ops with a ``dst`` argument: ``reduce``, ``gather_object``, ``gather``, ``send``, ``isend``
* communication ops with a ``src`` argument: ``irecv``, ``recv``, ``broadcast``, ``broadcast_object_list``, ``scatter``, ``scatter_object_list``
* ``pytest test/distributed/test_c10d_nccl.py -k subgroup``

3 collectives are for picklable objects (``gather_object``, ``broadcast_object_list``, ``scatter_object_list``). There are 2 ways to set the device (see the sketch below):
* use a device argument: it's implemented in ``broadcast_object_list``; maybe worth implementing in the other 2
* ``torch.cuda.set_device(global_rank)``

Pull Request resolved: #118593
Approved by: https://github.com/wconstab
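
A minimal sketch (not part of the commit message above) of the two device-selection options, assuming one GPU per process and an already-initialized NCCL group where each global rank maps to the CUDA device with the same index:

import torch
import torch.distributed as dist

def demo_object_collectives(payload, rank, world_size):
    # Option 1: pass the device explicitly; broadcast_object_list accepts a
    # ``device`` argument for staging the serialized objects.
    obj_list = [payload if rank == 0 else None]
    dist.broadcast_object_list(obj_list, src=0, device=torch.device("cuda", rank))

    # Option 2: set the current CUDA device up front; gather_object then picks
    # up the device implicitly (the dependency the PR description calls out).
    torch.cuda.set_device(rank)
    gathered = [None] * world_size if rank == 0 else None
    dist.gather_object(obj_list[0], gathered, dst=0)
    return gathered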
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Feb 8, 2024
…118359)

Pull Request resolved: pytorch#118359
Approved by: https://github.com/weifengpy