
Conversation

Contributor

@kwen2501 commented Jul 14, 2025

Stack from ghstack (oldest at bottom):

Their values are actually the same; this just stays in line with the other INSTALL commands.

[ghstack-poisoned]

pytorch-bot bot commented Jul 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158235

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit 1c61385 with merge base 826f12b:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the topic: not user facing label Jul 14, 2025
kwen2501 added a commit that referenced this pull request Jul 14, 2025
ghstack-source-id: e4d0faa
Pull-Request-resolved: #158235
@kwen2501 requested review from atalman and fduwjj July 14, 2025 15:06
@kwen2501 added the release notes: distributed (symm_mem) label Jul 14, 2025
@kwen2501
Contributor Author

@pytorchbot merge -f "failures are unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
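For reference, the gentler path the bot suggests would be a comment like the following (a sketch using the -i/--ignore-current flag described above):

```
@pytorchbot merge -i
```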

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 23, 2025
…6743)

This makes it easier for a user to feed `out_splits_offsets` as input to a combining a2av (coming next).
An example is in #157029.

Pull Request resolved: #156743
Approved by: https://github.com/ngimel
ghstack dependencies: #158234, #158235
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2025
Added `all_to_all_vdev_2d_offset`, which:

Performs a 2D AllToAllv operation, with input split and offset
information provided on device. The input offsets need not be an
exact prefix sum of the input splits, i.e. padding is allowed between the
split chunks. The padding, however, will not be transferred to peer
ranks.
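
A minimal sketch of what "the offsets need not be an exact prefix sum" means for the split/offset metadata (values are illustrative only; the real buffers and layout come from the kernel's symmetric-memory arguments):

```
# Hypothetical per-destination metadata for one rank.
in_splits  = [3, 2, 4]   # number of valid elements in each chunk
# An exact prefix sum would give offsets [0, 3, 5]; here each chunk is
# padded out to 4 elements, so the offsets skip over the padding.
in_offsets = [0, 4, 8]   # the padding between chunks is never sent to peers
```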

In Mixture of Experts models, this operation can be used to combine tokens
processed by experts on remote ranks. It can be viewed as the
"reverse" of the `all_to_all_vdev_2d` operation (which shuffles
tokens to experts).

The change may seem a bit dense, sorry. But it is mainly two changes:
1. templating existing device functions (to either use the provided input offsets or calculate them)
2. generalizing variable names, e.g. npes, ne --> minor_size, major_size,
so that the same all-to-all function works for a matrix of (nranks, ne) as well as a matrix of (ne, nranks).

Pull Request resolved: #156881
Approved by: https://github.com/ngimel
ghstack dependencies: #158234, #158235, #156743
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Putting both the dispatch API and the combine API to the test, one following the other, i.e.
```
all_to_all_vdev_2d(inp, out, inp_splits, out_splits_offsets, ...)

all_to_all_vdev_2d_offset(
    input=out,
    out=combine_out,
    in_splits_offsets=out_splits_offsets,
    out_splits_offsets=combine_out_splits_offsets
)
```
Here the `out_splits_offsets` from dispatch perfectly serves as the `in_splits_offsets` argument for combine.

Then we assert that the output of combine is exactly the same as the original input to the shuffle, and that combine's output splits are exactly the same as the original input splits.

It works!
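
A hedged sketch of that round-trip check, reusing the names from the snippet above (the exact slicing and layout in the real test may differ):

```
# Illustrative only; `inp`, `inp_splits`, `combine_out`, and
# `combine_out_splits_offsets` are the tensors from the snippet above.
# Assumes the splits are the first row of the stacked splits/offsets tensor.
torch.testing.assert_close(combine_out[: inp.shape[0]], inp)
torch.testing.assert_close(combine_out_splits_offsets[0], inp_splits)
```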

Pull Request resolved: #157026
Approved by: https://github.com/Skylion007, https://github.com/ngimel
ghstack dependencies: #158234, #158235, #156743, #156881
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Use torch.randn to fill input buffer.
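
A minimal sketch of that change (hypothetical buffer name; the actual test allocates its input in symmetric memory):

```
# Fill the input buffer with random values instead of a constant, so the
# dispatch/combine round trip exercises real data.
inp.copy_(torch.randn(inp.shape, dtype=inp.dtype, device=inp.device))
```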

Pull Request resolved: #157029
Approved by: https://github.com/fegin, https://github.com/ngimel
ghstack dependencies: #158234, #158235, #156743, #156881, #157026
yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #156743

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #156881

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #157026

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #157029
@github-actions github-actions bot deleted the gh/kwen2501/196/head branch August 16, 2025 02:17