Conversation

@IvanKobzarev (Contributor) commented Jul 8, 2025

pytorch-bot bot commented Jul 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157780

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 574b069 with merge base 02a9d90:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor module: inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Jul 8, 2025
@IvanKobzarev IvanKobzarev added the topic: not user facing topic category label Jul 8, 2025
orig_wait_nodes,
orig_wait_node_recursive_users,
) = bucket_id_to_bucketed_op_info[bucket_id]
# unsharded_grads = [

remove this? (or, why is it commented..)

device = unsharded_grads[0].meta["val"].device
rank = device.index
# TODO: need more work if we want to support non-dim-0 sharding (e.g. search for `shard_dim` in FSDP2 codebase)
shard_dim = 0
@wconstab (Contributor) commented Jul 8, 2025


can this be asserted actually? (just for safety sake) hmm, i guess the user does not specify the dim in any way in simple fsdp, so we are literally just choosing it here. thats fine i guess.
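The snippet above hard-codes `shard_dim = 0`, matching FSDP2's default dim-0 sharding. A minimal, self-contained sketch of what that choice means (names like `dim0_shard` are illustrative, not from the PR; plain Python lists stand in for tensors):

```python
# Hedged sketch of dim-0 sharding: each rank owns one contiguous slab
# along dim 0 of the gradient. Real FSDP2 pads when dim 0 does not
# divide evenly across ranks; this sketch simply asserts even division.
def dim0_shard(grad, world_size, rank):
    assert len(grad) % world_size == 0, "dim 0 must divide evenly across ranks"
    shard_len = len(grad) // world_size
    return grad[rank * shard_len:(rank + 1) * shard_len]

grad = list(range(8))  # stand-in for a flattened gradient of length 8
print(dim0_shard(grad, world_size=4, rank=1))  # -> [2, 3]
```

Since simple FSDP gives the user no way to choose a shard dim, fixing `shard_dim = 0` in the pass (rather than asserting a user-supplied value) is consistent with the discussion above.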

Porting fx passes for reduce_scatters bucketing (similar to all_gather bucketing) for simple_fsdp and autoparallel testing.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
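The bucketing idea behind the ported pass can be pictured with a simplified plain-Python model (illustrative only; the real fx pass rewrites collective ops in the graph): instead of issuing one reduce_scatter per gradient, each rank concatenates its gradients into one bucket, so a single collective covers all of them.

```python
# Hedged sketch of reduce-scatter bucketing. Gradients are modeled as
# flat Python lists; "reduce" is an elementwise sum across ranks and
# "scatter" keeps this rank's dim-0 shard of the reduced bucket.
def bucketed_reduce_scatter(per_rank_grads, rank):
    # per_rank_grads[r] is the list of gradient tensors held by rank r.
    world_size = len(per_rank_grads)
    # 1. each rank flattens its grads into one contiguous bucket
    buckets = [sum(grads, []) for grads in per_rank_grads]
    # 2. reduce: elementwise sum of all ranks' buckets
    reduced = [sum(vals) for vals in zip(*buckets)]
    # 3. scatter: this rank keeps its contiguous dim-0 shard
    shard = len(reduced) // world_size
    return reduced[rank * shard:(rank + 1) * shard]

grads = [[[1, 1], [2, 2]], [[3, 3], [4, 4]]]  # 2 ranks, 2 grads each
print(bucketed_reduce_scatter(grads, rank=0))  # -> [4, 4]
```

The payoff in the real pass is fewer, larger collectives in the traced graph, mirroring what the earlier all_gather bucketing pass did for the forward path.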
IvanKobzarev added a commit that referenced this pull request Jul 8, 2025
ghstack-source-id: 847e789
Pull Request resolved: #157780
(FileCheck().check("all_gather_into_tensor_out").run(code))
(
FileCheck()
.check_count(

just out of curiosity, any reason you used 3 separate FileCheck instances instead of just doing it all in one?

@IvanKobzarev (Author) replied:

Oh, thanks, it can be done as a one check :)
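For illustration, the three separate checks can be folded into one chained instance: `torch.testing.FileCheck` chains because each `check*` call returns the checker itself. The sketch below uses a tiny stand-in class so it runs without torch (the `MiniFileCheck` name and implementation are hypothetical, not the real FileCheck):

```python
# Hedged sketch of chained FileCheck-style assertions. Patterns must
# appear in order in the checked text, the stated number of times.
class MiniFileCheck:
    def __init__(self):
        self.patterns = []  # (pattern, required count), in order
    def check(self, pat):
        self.patterns.append((pat, 1))
        return self  # returning self is what enables chaining
    def check_count(self, pat, n):
        self.patterns.append((pat, n))
        return self
    def run(self, text):
        pos = 0
        for pat, n in self.patterns:
            for _ in range(n):
                pos = text.index(pat, pos) + len(pat)  # ValueError if missing

code = "all_gather_into_tensor_out ... reduce_scatter_tensor reduce_scatter_tensor"
# one chained instance replaces three separate FileCheck() passes:
MiniFileCheck().check("all_gather_into_tensor_out").check_count("reduce_scatter_tensor", 2).run(code)
print("all checks passed")
```

One chained instance also scans the generated code once, top to bottom, which additionally asserts the relative order of the matched ops.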

IvanKobzarev added a commit that referenced this pull request Jul 10, 2025
ghstack-source-id: 1c6c9b7
Pull Request resolved: #157780
@IvanKobzarev

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 10, 2025
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.
