
Conversation


@yushangdi commented May 8, 2025

Summary:
Solves #151925

A reland of #152125.

Added a try-except around the justknob check internally, and added more documentation.

Currently, AOTI only generates runtime asserts for unbacked symints. We should generate asserts for all `_assert_scalar` calls in the input graph.

Also factored out the runtime assertion logic into a separate function.
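
A minimal sketch of such a pass (an illustration of the shape of the logic, not the actual Inductor function): walk the input graph and collect every `_assert_scalar` call so codegen can emit a check for each one.

```
import torch

def collect_scalar_asserts(gm: torch.fx.GraphModule):
    # Every _assert_scalar node in the input graph should end up with a
    # corresponding runtime check in the generated code.
    return [
        node
        for node in gm.graph.nodes
        if node.op == "call_function"
        and node.target is torch.ops.aten._assert_scalar.default
    ]
```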

We need to generate runtime asserts directly in Inductor instead of just re-using the asserts from the input graph, because we reuse the same ShapeEnv as before. In particular, on subsequent graph passes we would immediately turn all of these assertions into no-ops: when we evaluated their expressions, the ShapeEnv would already hold a deferred runtime assert for them and conclude "of course this expression is True".
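
A toy illustration of that folding (a simplified stand-in, not the real ShapeEnv): once an expression is recorded as a deferred runtime assert, re-evaluating it short-circuits to True, so an assert node copied from the input graph simplifies away.

```
class ToyShapeEnv:
    def __init__(self):
        self.deferred = set()

    def defer_runtime_assert(self, expr: str):
        self.deferred.add(expr)

    def evaluate(self, expr: str):
        # The env already "knows" the assert holds, so the expression
        # statically evaluates to True and the assert becomes a no-op.
        return True if expr in self.deferred else None

env = ToyShapeEnv()
env.defer_runtime_assert("Eq(Mod(s35, 100), 0)")
assert env.evaluate("Eq(Mod(s35, 100), 0)") is True
```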

One example is below:

```
class Model(torch.nn.Module):
    def forward(self, a, b, c):
        nz = torch.nonzero(a)
        ones = a.new_ones([nz.size(0), b.size(0)])
        torch._check(ones.size(0) >= 1)
        equals = torch.add(ones, c)
        return equals

torch._dynamo.mark_dynamic(c, 0)
```

When we re-use the ShapeEnv in Inductor lowering, the check that the unbacked size coming from `nonzero(a)` equals the dynamic input size of `c` would be evaluated to True after we resolve unbacked bindings using the ShapeEnv.
See `test_unbacked_equals_input_size_runtime_assertion` in test_aot_inductor.
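
Hypothetical inputs that exercise this (an assumption about the test setup: shapes are chosen so broadcasting forces `nonzero(a).size(0)` to equal the dynamic `c.size(0)` at runtime):

```
a = torch.ones(4)      # nonzero(a) has 4 rows, an unbacked size u0
b = torch.ones(3)
c = torch.ones(4, 3)   # dim 0 must equal u0 for the broadcast in add to work
torch._dynamo.mark_dynamic(c, 0)

out = torch.compile(Model())(a, b, c)
```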

In addition to the Inductor-generated runtime asserts, we also need the runtime asserts from the input graph, because some derived runtime asserts are not generated in Inductor. One example is below:

```
class Model(torch.nn.Module):
    def forward(self, x):
        y = x.reshape(100, -1).clone()
        y = y + 1
        return y

dynamic_shapes = {
    "x": {0: torch.export.Dim.DYNAMIC},
}
```

Here, `x.shape[0]` needs to be a multiple of 100.

See `test_aoti_runtime_asserts_backed_symint` in test_aot_inductor.
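
For instance, with a hypothetical driver (the input is 1-D, matching the `f32[s35]` placeholder in the graph below):

```
x = torch.randn(300)  # 300 is a multiple of 100, so tracing succeeds
ep = torch.export.export(
    Model(), (x,), dynamic_shapes={"x": {0: torch.export.Dim.DYNAMIC}}
)
# The exported graph carries an _assert_scalar for Eq(Mod(s, 100), 0).
```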

Example:

```
    def forward(self):
        arg0_1: "f32[s35]";

        arg0_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        sym_size_int: "Sym(s35)" = torch.ops.aten.sym_size.int(arg0_1, 0)

         #
        mod: "Sym(Mod(s35, 100))" = sym_size_int % 100;  sym_size_int = None
        eq_2: "Sym(Eq(Mod(s35, 100), 0))" = mod == 0;  mod = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq_2, "Runtime assertion failed for expression Eq(Mod(s35, 100), 0) on node 'eq'");  eq_2 = _assert_scalar = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:11 in forward, code: y = x.reshape(100, -1).clone()
        view: "f32[100, (s35//100)]" = torch.ops.aten.reshape.default(arg0_1, [100, -1]);  arg0_1 = None
        clone: "f32[100, (s35//100)]" = torch.ops.aten.clone.default(view);  view = None

         # File: /data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/73a672eb896e7996/scripts/shangdiy/__pt__/pt#link-tree/scripts/shangdiy/pt.py:12 in forward, code: y = y + 1
        add_6: "f32[100, 1]" = torch.ops.aten.add.Tensor(clone, 1);  clone = None
        return (add_6,)
```

Generated C++ code:

```
    auto inputs = steal_from_raw_handles_to_raii_handles(input_handles, 1);
    auto arg0_1 = std::move(inputs[0]);
    auto arg0_1_size = arg0_1.sizes();
    int64_t s35 = arg0_1_size[0];
    inputs.clear();
    auto& kernels = static_cast<AOTInductorModelKernels&>(*this->kernels_.get());
    if (!((s35 % 100L) == 0L)) { throw std::runtime_error("Expected Eq(Mod(s35, 100), 0) to be True but received " + std::to_string(s35)); }
```
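
To see the generated check fire end to end, a hedged sketch (assuming the `ep` from the export sketch above; `aoti_compile_and_package` and `aoti_load_package` are the AOTI packaging helpers in recent PyTorch):

```
from torch._inductor import aoti_compile_and_package, aoti_load_package

pkg_path = aoti_compile_and_package(ep)
runner = aoti_load_package(pkg_path)

runner(torch.randn(200))  # ok: 200 % 100 == 0
runner(torch.randn(150))  # should raise: Eq(Mod(s35, 100), 0) fails
```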

Test Plan:

```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_runtime_asserts_backed_symint
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchinductor_dynamic_shapes -- -r test_unbacked_floordiv_simplify
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_sym_i64_input_codegen_cuda
TORCHINDUCTOR_SCALAR_ASSERTS_FULL=1 buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_unbacked_equals_input_size
```

Differential Revision: D74361799

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov


pytorch-bot bot commented May 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153182

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit e5f07c3 with merge base 192f714:

NEW FAILURE - The following job has failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74361799

@henrylhtsang

LGTM cc @hl475

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on May 8, 2025
@yushangdi force-pushed the export-D74361799 branch from abbfc76 to d8b97db on May 9, 2025 16:10
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74361799

@yushangdi force-pushed the export-D74361799 branch 2 times, most recently from 9c61038 to 1743136 on May 9, 2025 17:53
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D74361799

@yushangdi force-pushed the export-D74361799 branch from 1743136 to e5f07c3 on May 9, 2025 18:00
@facebook-github-bot

@pytorchbot merge -i

(Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)

@pytorchmergebot

Merge started

Your change will be merged while ignoring the following 3 checks: pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, ephemeral.linux.2xlarge), inductor / unit-test / cuda12.6-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 2, ephemeral.linux.g5.4xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.
