
Conversation

Contributor

@kwen2501 commented Jul 14, 2025

Stack from ghstack (oldest at bottom):

Their values are actually the same; this just stays in line with the other INSTALL commands.

[ghstack-poisoned]

pytorch-bot bot commented Jul 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158235

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit 1c61385 with merge base 826f12b:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the topic: not user facing label Jul 14, 2025
kwen2501 added a commit that referenced this pull request Jul 14, 2025
ghstack-source-id: e4d0faa
Pull-Request-resolved: #158235
@kwen2501 requested review from atalman and fduwjj July 14, 2025 15:06
@kwen2501 added the release notes: distributed (symm_mem) label Jul 14, 2025
@kwen2501
Contributor Author

@pytorchbot merge -f "failures are unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
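For reference, the gentler path the bot suggests would be a comment like the following (a sketch using the -i/--ignore-current flag described above):

```
@pytorchbot merge -i
```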

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 23, 2025
…6743)

This makes it easier for a user to feed `out_splits_offsets` as input to a combining a2av (coming next).
An example is in #157029.

Pull Request resolved: #156743
Approved by: https://github.com/ngimel
ghstack dependencies: #158234, #158235
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2025
Added `all_to_all_vdev_2d_offset`, which:

Performs a 2D AllToAllv operation, with input split and offset
information provided on device. The input offsets need not be an
exact prefix sum of the input splits, i.e. padding is allowed between the
split chunks. The padding, however, will not be transferred to peer
ranks.
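
A minimal sketch of what "the offsets need not be an exact prefix sum" means for the split/offset metadata (values are illustrative only; the real buffers and layout come from the kernel's symmetric-memory arguments):

```
# Hypothetical per-destination metadata for one rank.
in_splits  = [3, 2, 4]   # number of valid elements in each chunk
# An exact prefix sum would give offsets [0, 3, 5]; here each chunk is
# padded out to 4 elements, so the offsets skip over the padding.
in_offsets = [0, 4, 8]   # the padding between chunks is never sent to peers
```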

In Mixture of Experts models, this operation can be used to combine tokens
processed by experts on remote ranks. It can be viewed as the
"reverse" of the `all_to_all_vdev_2d` operation (which shuffles
tokens to experts).

The change may seem a bit dense, sorry. But it is mainly two changes:
1. templating existing device functions (to either use the provided input offsets or calculate them)
2. generalizing variable names, e.g. npes, ne --> minor_size, major_size,
so that the same all-to-all function works for a matrix of (nranks, ne) as well as a matrix of (ne, nranks).

Pull Request resolved: #156881
Approved by: https://github.com/ngimel
ghstack dependencies: #158234, #158235, #156743
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Putting both the dispatch API and the combine API to the test, one following the other, i.e.
```
all_to_all_vdev_2d(inp, out, inp_splits, out_splits_offsets, ...)

all_to_all_vdev_2d_offset(
    input=out,
    out=combine_out,
    in_splits_offsets=out_splits_offsets,
    out_splits_offsets=combine_out_splits_offsets
)
```
Here the `out_splits_offsets` from dispatch perfectly serves as the `in_splits_offsets` argument for combine.

Then we assert that the output of combine is exactly the same as the original input to the shuffle, and that combine's output splits are exactly the same as the original input splits.

It works!
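
A hedged sketch of that round-trip check, reusing the names from the snippet above (the exact slicing and layout in the real test may differ):

```
# Illustrative only; `inp`, `inp_splits`, `combine_out`, and
# `combine_out_splits_offsets` are the tensors from the snippet above.
# Assumes the splits are the first row of the stacked splits/offsets tensor.
torch.testing.assert_close(combine_out[: inp.shape[0]], inp)
torch.testing.assert_close(combine_out_splits_offsets[0], inp_splits)
```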

Pull Request resolved: #157026
Approved by: https://github.com/Skylion007, https://github.com/ngimel
ghstack dependencies: #158234, #158235, #156743, #156881
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Use torch.randn to fill input buffer.
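
A minimal sketch of that change (hypothetical buffer name; the actual test allocates its input in symmetric memory):

```
# Fill the input buffer with random values instead of a constant, so the
# dispatch/combine round trip exercises real data.
inp.copy_(torch.randn(inp.shape, dtype=inp.dtype, device=inp.device))
```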

Pull Request resolved: #157029
Approved by: https://github.com/fegin, https://github.com/ngimel
ghstack dependencies: #158234, #158235, #156743, #156881, #157026
yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #156743

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #156881

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #157026

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025

Pull Request resolved: #157029
@github-actions github-actions bot deleted the gh/kwen2501/196/head branch August 16, 2025 02:17