
Conversation


@yushangdi commented May 8, 2025

Summary:
Solves #151925

A reland of #152125.

Added a try-except around the justknob check internally, and added more documentation.

Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph.

Also factored out the runtime assertion logic into a separate function.
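
A minimal sketch of such a pass (an illustration of the shape of the logic, not the actual Inductor function): walk the input graph and collect every `_assert_scalar` call so codegen can emit a check for each one.

```
import torch

def collect_scalar_asserts(gm: torch.fx.GraphModule):
    # Every _assert_scalar node in the input graph should end up with a
    # corresponding runtime check in the generated code.
    return [
        node
        for node in gm.graph.nodes
        if node.op == "call_function"
        and node.target is torch.ops.aten._assert_scalar.default
    ]
```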

We need to generate runtime asserts directly in Inductor instead of just re-using the asserts from the input graph, because we reuse the same ShapeEnv as before. In particular, on subsequent graph passes we would immediately turn all of these assertions into no-ops: when we evaluated their expressions, the ShapeEnv would already hold a deferred runtime assert for them and conclude "of course this expression is True".
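
A toy illustration of that folding (a simplified stand-in, not the real ShapeEnv): once an expression is recorded as a deferred runtime assert, re-evaluating it short-circuits to True, so an assert node copied from the input graph simplifies away.

```
class ToyShapeEnv:
    def __init__(self):
        self.deferred = set()

    def defer_runtime_assert(self, expr: str):
        self.deferred.add(expr)

    def evaluate(self, expr: str):
        # The env already "knows" the assert holds, so the expression
        # statically evaluates to True and the assert becomes a no-op.
        return True if expr in self.deferred else None

env = ToyShapeEnv()
env.defer_runtime_assert("Eq(Mod(s35, 100), 0)")
assert env.evaluate("Eq(Mod(s35, 100), 0)") is True
```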

One example is below:

```
class Model(torch.nn.Module):
    def forward(self, a, b, c):
        nz = torch.nonzero(a)
        ones = a.new_ones([nz.size(0), b.size(0)])
        torch._check(ones.size(0) >= 1)
        equals = torch.add(ones, c)
        return equals

torch._dynamo.mark_dynamic(c, 0)
```

When we re-use the ShapeEnv in Inductor lowering, the check that the unbacked size coming from `nonzero(a)` equals the dynamic input size of `c` would be evaluated to True after we resolve unbacked bindings using the ShapeEnv.
See `test_unbacked_equals_input_size_runtime_assertion` in test_aot_inductor.
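
Hypothetical inputs that exercise this (an assumption about the test setup: shapes are chosen so broadcasting forces `nonzero(a).size(0)` to equal the dynamic `c.size(0)` at runtime):

```
a = torch.ones(4)      # nonzero(a) has 4 rows, an unbacked size u0
b = torch.ones(3)
c = torch.ones(4, 3)   # dim 0 must equal u0 for the broadcast in add to work
torch._dynamo.mark_dynamic(c, 0)

out = torch.compile(Model())(a, b, c)
```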

In addition to the Inductor-generated runtime asserts, we also need the runtime asserts from the input graph, because some derived runtime asserts are not generated in Inductor. One example is below:

```
class Model(torch.nn.Module):
    def forward(self, x):
        y = x.reshape(100, -1).clone()
        y = y + 1
        return y

dynamic_shapes = {
    "x": {0: torch.export.Dim.DYNAMIC},
}
```

Here, `x.shape[0]` needs to be a multiple of 100.

See `test_aoti_runtime_asserts_backed_symint` in test_aot_inductor.
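
For instance, with a hypothetical driver (the input is 1-D, matching the `f32[s35]` placeholder in the graph below):

```
x = torch.randn(300)  # 300 is a multiple of 100, so tracing succeeds
ep = torch.export.export(
    Model(), (x,), dynamic_shapes={"x": {0: torch.export.Dim.DYNAMIC}}
)
# The exported graph carries an _assert_scalar for Eq(Mod(s, 100), 0).
```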

Example:

```
    def forward(self):
        arg0_1: "f32[s35]";

        arg0_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        sym_size_int: "Sym(s35)" = torch.ops.aten.sym_size.int(arg0_1, 0)

         #
        mod: "Sym(Mod(s35, 100))" = sym_size_int % 100;  sym_size_int = None
        eq_2: "Sym(Eq(Mod(s35, 100), 0))" = mod == 0;  mod = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq_2, "Runtime assertion failed for expression Eq(Mod(s35, 100), 0) on node 'eq'");  eq_2 = _assert_scalar = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        view: "f32[100, (s35//100)]" = torch.ops.aten.reshape.default(arg0_1, [100, -1]);  arg0_1 = None
        clone: "f32[100, (s35//100)]" = torch.ops.aten.clone.default(view);  view = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:12 in forward, code: y = y + 1
        add_6: "f32[100, 1]" = torch.ops.aten.add.Tensor(clone, 1);  clone = None
        return (add_6,)
```

Generated C++ code:

```
    auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 1);
    auto arg0_1 = std::move(inputs[0]);
    auto arg0_1_size = arg0_1.sizes();
    int64_t s35 = arg0_1_size[0];
    inputs.clear();
    auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
    if (!((s35 % 100L) == 0L)) { throw std::runtime_error("Expected Eq(Mod(s35, 100), 0) to be True but received " + std::to_string(s35)); }
```
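
To see the generated check fire end to end, a hedged sketch (assuming the `ep` from the export sketch above; `aoti_compile_and_package` and `aoti_load_package` are the AOTI packaging helpers in recent PyTorch):

```
from torch._inductor import aoti_compile_and_package, aoti_load_package

pkg_path = aoti_compile_and_package(ep)
runner = aoti_load_package(pkg_path)

runner(torch.randn(200))  # ok: 200 % 100 == 0
runner(torch.randn(150))  # should raise: Eq(Mod(s35, 100), 0) fails
```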

Test Plan:

```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_sym_i64_input_codegen_cuda
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_unbacked_equals_input_size
```

Differential Revision: D74361799

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov


pytorch-bot bot commented May 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153182

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit e5f07c3 with merge base 192f714:

NEW FAILURE - The following job has failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74361799

@henrylhtsang

LGTM cc @hl475

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on May 8, 2025
@yushangdi force-pushed the export-D74361799 branch from abbfc76 to d8b97db on May 9, 2025 16:10
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74361799

@yushangdi force-pushed the export-D74361799 branch 2 times, most recently from 9c61038 to 1743136 on May 9, 2025 17:53
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74361799

@yushangdi force-pushed the export-D74361799 branch from 1743136 to e5f07c3 on May 9, 2025 18:00
@facebook-github-bot

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

@pytorchmergebot

Merge started

Your change will be merged while ignoring the following 3 checks: pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, ephemeral.linux.2xlarge), inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.
