
Conversation

@yushangdi
Contributor

  • When a tensor's numel is 0, we set the hash to 0 instead of hashing it, because torch.hash_tensor does not work for 0-numel tensors (see the repro sketch after this list)
  • Add some tests for distributed
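For context, here is a minimal repro of the limitation described above; it is illustrative only and assumes a PyTorch build that exposes `torch.hash_tensor`:

```python
import torch

# Hashing a non-empty tensor works as expected.
print(torch.hash_tensor(torch.ones(3)))

# Per the description above, hash_tensor does not work for 0-numel tensors,
# which is why this PR falls back to a constant zero hash in that case.
try:
    torch.hash_tensor(torch.empty(0))
except Exception as e:
    print("hash_tensor failed on a 0-numel tensor:", e)
```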

@pytorch-bot

pytorch-bot bot commented Nov 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169027

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit bf2047a with merge base 481e5ab:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yushangdi yushangdi force-pushed the sy_debug_mode_test branch 2 times, most recently from 8b10b27 to a8651a1 on November 25, 2025 01:07
@yushangdi yushangdi requested review from ngimel and pianpwk November 25, 2025 01:07
@yushangdi yushangdi changed the title from "Add debug mode tests" to "[DebugMode] Fix hash for 0 ele tensor; Add more tests" on Nov 25, 2025
@yushangdi yushangdi force-pushed the sy_debug_mode_test branch 2 times, most recently from 541de94 to 094a7a4 on November 25, 2025 01:16
@yushangdi yushangdi added the ciflow/trunk and topic: not user facing labels on Nov 25, 2025
if t.numel() > 0:
    out = torch.hash_tensor(t_clean)
else:
    out = torch.tensor(0)
Collaborator

you probably still want to avoid a sync here: `out = torch.zeros((), device=t_clean.device)`
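A minimal sketch of the branch with that suggestion applied (the `_tensor_hash` helper name is hypothetical, not the actual DebugMode code):

```python
import torch

def _tensor_hash(t: torch.Tensor, t_clean: torch.Tensor) -> torch.Tensor:
    # hash_tensor is only valid for non-empty tensors (see the PR description).
    if t.numel() > 0:
        return torch.hash_tensor(t_clean)
    # Per the review comment above, creating the zero scalar directly on
    # t_clean's device avoids the sync that the torch.tensor(0) fallback
    # (a CPU tensor) could otherwise introduce.
    return torch.zeros((), device=t_clean.device)
```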

@yushangdi yushangdi force-pushed the sy_debug_mode_test branch 2 times, most recently from 0a262b2 to 497d194 on November 26, 2025 00:21
@yushangdi
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #169027, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@yushangdi
Contributor Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased sy_debug_mode_test onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout sy_debug_mode_test && git pull --rebase)

bdhirsh added a commit that referenced this pull request Dec 1, 2025
…hashing outputs"

this is an attempt to re-land #168119 with a few tweaks:

(1) for non-functional collectives, only wait on the work item with `async=True`. [See comment](#168119 (comment))

(2) For functional collectives, we can always call `wait_tensor` on the output.

The test in this PR will probably conflict with the test in #169027, so I'll wait for that PR to land first and rebase.




[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Dec 1, 2025
this is an attempt to re-land #168119 with a few tweaks:

(1) for non-functional collectives, only wait on the work item with `async=True`. [See comment](#168119 (comment))

(2) For functional collectives, we can always call `wait_tensor` on the output.

The test in this PR will probably conflict with the test in #169027, so I'll wait for that PR to land first and rebase.




[ghstack-poisoned]
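A rough, self-contained sketch of the two cases described in that commit message (single-process gloo setup, illustrative only; this is not the actual #168119 change):

```python
import os
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

# Minimal single-process setup so the sketch can run standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

buf = torch.ones(4)
x = torch.ones(4)

# (1) A non-functional collective launched with async_op=True returns a Work
#     handle; waiting on that handle is enough before hashing buf.
work = dist.all_reduce(buf, async_op=True)
work.wait()

# (2) Functional collectives return a tensor that can always be passed
#     through wait_tensor before hashing.
out = funcol.all_reduce(x, "sum", group=dist.group.WORLD)
out = funcol.wait_tensor(out)

dist.destroy_process_group()
```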
@yushangdi
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following check: trunk / linux-jammy-rocm-py3.10 / test (default, 1, 6, linux.rocm.gpu.gfx942.1)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
- When a tensor's numel is 0, we set the hash to 0 instead of hashing it, because torch.hash_tensor does not work for 0-numel tensors
- Add some tests for distributed
Pull Request resolved: #169027
Approved by: https://github.com/xmfan, https://github.com/ngimel
