Fix unbacked replacement where LHS is purely backed expr and RHS is unbacked expr #164013
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164013. Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 1d1f148 with merge base e21b037. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
torch/_inductor/sizevars.py (Outdated)

```python
    return all(self.shape_env.is_unbacked_symint(s) for s in expr.free_symbols)

def all_symbols_are_backed(expr: Expr) -> bool:
    return all(symbol_is_type(s, SymT.SIZE) for s in expr.free_symbols)
```
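As a rough illustration of what the `all_symbols_are_backed` predicate checks — using a hypothetical name-prefix check as a stand-in for the internal `symbol_is_type(s, SymT.SIZE)` helper, since backed sizes are conventionally named `s0, s1, …` and unbacked ones `u0, u1, …`:

```python
import sympy

def all_symbols_are_backed(expr: sympy.Expr) -> bool:
    # stand-in predicate: treat "s"-prefixed symbols as backed sizes and
    # "u"-prefixed ones as unbacked; the real code uses symbol_is_type(s, SymT.SIZE)
    return all(str(s).startswith("s") for s in expr.free_symbols)

s0, s1, u0 = sympy.symbols("s0 s1 u0")
assert all_symbols_are_backed(s0 + s1)        # purely backed expression
assert not all_symbols_are_backed(s0 + u0)    # mixed expression fails the check
```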
Is `expr.free_symbols` iterable, or does it materialize all symbols? Make sure we don't repeat this mistake:
https://github.com/pytorch/pytorch/pull/140027/files
I'm using the `free_symbols` attribute stored on the `sympy.Expr`, which stores the free symbols in a `Set`.
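For reference, `sympy.Expr.free_symbols` does return a materialized `set` of `Symbol` objects rather than a lazy iterator, so it can be traversed repeatedly without re-walking the expression tree:

```python
import sympy

s0, u0 = sympy.symbols("s0 u0")
expr = s0 * 2 + u0

syms = expr.free_symbols
assert isinstance(syms, set)  # a materialized set, not a generator
assert syms == {s0, u0}       # contains exactly the free symbols of the expression
```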
Shall we use an efficient API, like the one above?
torch/_inductor/sizevars.py (Outdated)

```python
    return self.unbacked_replacements

def all_symbols_are_unbacked(expr: Expr) -> bool:
    return all(self.shape_env.is_unbacked_symint(s) for s in expr.free_symbols)
```
Ditto, same as the other place.
torch/_inductor/sizevars.py (Outdated)

```python
# assuming lhs is the expr to be replaced (src), rhs is the replacement (dst)
# checking if we should keep them for the replacement rule or swap
if all_symbols_are_unbacked(lhs) and all_symbols_are_backed(rhs):
```
n00b q: is `all_symbols_are_backed(rhs)` good enough on its own? Why do we need to check `all_symbols_are_unbacked(lhs)` here and `all_symbols_are_unbacked(rhs)` in line 744?
That's a good point. Likely `all_symbols_are_backed(rhs)` is sufficient, given that deferred runtime assertions should always include an unbacked symbol today.
Maybe rename `all_symbols_are_unbacked` -> `some_symbols_are_backed`?
```python
else:
    # fallback to sympy.Basic.compare for a deterministic ordering
    return lhs.compare(rhs) == 1
```
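For context on why this fallback is a stable tie-breaker: `sympy.Basic.compare` returns -1, 0, or 1 and defines a deterministic total order over expressions, so the same pair of expressions always resolves in the same direction:

```python
import sympy

u0 = sympy.Symbol("u0")
s0 = sympy.Symbol("s0")

assert u0.compare(u0) == 0                 # equal expressions compare as 0
assert u0.compare(s0) == -s0.compare(u0)   # antisymmetric, so the order is total
# a plain symbol and a compound expression also get a definite ordering
assert (s0 + u0).compare(u0) in (-1, 1)
```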
I guess the existing issue comes from `sympy.Basic.compare` not preferring a backed symint over an unbacked one?
It does prefer a backed symbol over an unbacked symbol, but not a backed expression over an unbacked symbol. A symbol (backed or unbacked) is always preferred over an expression (e.g. `sym1 + sym2`).
## Scenario

- If there's a `torch._check(backed_expr == unbacked_symbol)`
- then we should replace unbacked_symbol with backed_expr
- currently, we don't do that when generating inputs for autotune_at_compile_time

## Error traceback

```
$ python test/inductor/test_aot_inductor.py -k test_size_with_unbacked_add_expr_transitive
...
  File "/data/users/colinpeppler/pytorch/torch/_inductor/compile_fx.py", line 1696, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/data/users/colinpeppler/pytorch/torch/_inductor/compile_fx.py", line 1187, in codegen_and_compile
    dynamo_utils.preserve_rng_state(),
  File "/home/colinpeppler/.conda/envs/pytorch/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/data/users/colinpeppler/pytorch/torch/_dynamo/utils.py", line 2236, in preserve_rng_state
    torch.cuda.set_rng_state(cuda_rng_state)  # type: ignore[possibly-undefined]
  File "/data/users/colinpeppler/pytorch/torch/cuda/random.py", line 79, in set_rng_state
    _lazy_call(cb)
  File "/data/users/colinpeppler/pytorch/torch/cuda/__init__.py", line 341, in _lazy_call
    callable()
  File "/data/users/colinpeppler/pytorch/torch/cuda/random.py", line 77, in cb
    default_generator.set_state(new_state)
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
```

## Bad autotuning input generation

```
# assume unbacked_symint_fallback = 16
# we generate too small of an input (16)
buf11 = generate_example_value((16, 256), (256, 1), 'cuda:0', torch.float32, 0, (16, 256))
triton_poi_fused_ones_1.run(buf11, 4096, stream=stream0)
stream0 = get_raw_stream(0)
buf12 = generate_example_value((16, 256), (256, 1), 'cuda:0', torch.float32, 0, (16, 256))
buf13 = generate_example_value((16, 256), (256, 1), 'cuda:0', torch.float32, 0, (16, 256))
add_kernel_1.run(buf11, buf12, buf13, 4096, 16, 1, 1, stream=stream0)
del buf11, buf12
stream0 = get_raw_stream(0)
buf15 = generate_example_value((10500, 256), (256, 1), 'cuda:0', torch.float32, 0, (10500, 256))
triton_poi_fused_add_mul_2.run(buf2, buf13, buf15, 2688000, stream=stream0)
```

## Good autotuning input generation

```
# notice we generate with the proper size now (10500)
buf11 = generate_example_value((10500, 256), (256, 1), 'cuda:0', torch.float32, 0, (10500, 256))
triton_poi_fused_ones_1.run(buf11, 2688000, stream=stream0)
stream0 = get_raw_stream(0)
buf12 = generate_example_value((10500, 256), (256, 1), 'cuda:0', torch.float32, 0, (10500, 256))
buf13 = generate_example_value((10500, 256), (256, 1), 'cuda:0', torch.float32, 0, (10500, 256))
add_kernel_1.run(buf11, buf12, buf13, 2688000, 10500, 1, 1, stream=stream0)
del buf11, buf12
stream0 = get_raw_stream(0)
buf15 = generate_example_value((10500, 256), (256, 1), 'cuda:0', torch.float32, 0, (10500, 256))
triton_poi_fused_add_mul_2.run(buf2, buf13, buf15, 2688000, stream=stream0)
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben
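The replacement described above can be sketched with plain sympy: once `torch._check(backed_expr == unbacked_symbol)` justifies the substitution, the unbacked size expression resolves to its real hint instead of the fallback. The symbol names and the concrete values `10000 + 500 = 10500` are illustrative, chosen to match the sizes in the traces above:

```python
import sympy

u0 = sympy.Symbol("u0")          # unbacked symbol
s0, s1 = sympy.symbols("s0 s1")  # backed symbols with known hints

# torch._check(s0 + s1 == u0) yields the replacement rule u0 -> s0 + s1
replacements = {u0: s0 + s1}

# the numel expression u0 * 256 then resolves to a real size hint
size_expr = u0 * 256
hinted = size_expr.xreplace(replacements).subs({s0: 10000, s1: 500})
assert hinted == 2688000  # 10500 * 256, instead of the 16 * 256 = 4096 fallback
```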
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few are: trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m2-15). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben