
Conversation

@dmpots (Contributor) commented Feb 3, 2025

Summary:
This commit fixes a crash in the gemm template lowering caused by hitting an assert that checks whether a buffer was previously removed.

The assert triggers because the first gemm lowering uses a local accumulation buffer, which causes the original buffer name to be added to the `removed_buffers` set. The next gemm lowering then uses the global buffer for accumulation, but that buffer name is already in the `removed_buffers` set.

The fix is to add a unique suffix to the buffer name so that different gemm lowerings no longer trigger the assert.
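
To make the failure mode concrete, here is a minimal self-contained sketch. It is not the actual Inductor code: `removed_buffers`, `lower_gemm`, and the buffer names are simplified stand-ins for the codegen state described above.

```
import itertools

# Shared codegen state across lowerings; a stand-in for Inductor's real
# bookkeeping, not the actual class or attribute.
removed_buffers: set[str] = set()
_unique_id = itertools.count()


def lower_gemm(acc_buf_name: str, use_local_acc: bool) -> str:
    """Toy gemm lowering that tracks which buffers were removed."""
    if use_local_acc:
        # Accumulate into a kernel-local buffer; the original output buffer
        # is no longer written directly, so its name is recorded as removed.
        removed_buffers.add(acc_buf_name)
        return f"{acc_buf_name}_local"
    # Accumulate into the global buffer; it must not have been removed
    # by an earlier lowering (this models the assert that fired).
    assert acc_buf_name not in removed_buffers, (
        f"buffer {acc_buf_name} was already removed"
    )
    return acc_buf_name


# Before the fix: both lowerings used the same name, so the second asserted.
lower_gemm("acc_buf", use_local_acc=True)
# lower_gemm("acc_buf", use_local_acc=False)  # AssertionError

# After the fix: a unique suffix keeps names from different lowerings apart,
# so state left behind by one lowering cannot collide with the next.
lower_gemm(f"acc_buf_{next(_unique_id)}", use_local_acc=True)
lower_gemm(f"acc_buf_{next(_unique_id)}", use_local_acc=False)
```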

Differential Revision: D68814625

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @desertfire @chauhang @aakhundov

@pytorch-bot bot commented Feb 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146353

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 5 Pending, 1 Unrelated Failure

As of commit b212a54 with merge base bc01918:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

@dmpots (Contributor, Author) commented Feb 3, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot bot added the "topic: not user facing" label on Feb 3, 2025
dmpots added a commit to dmpots/pytorch that referenced this pull request Feb 3, 2025
Summary:

This commit fixes a crash in the gemm template lowering caused by hitting an [assert](https://github.com/pytorch/pytorch/blob/fd515e4f59bfa0ac9faa5185b7a02f3222c4cd08/torch/_inductor/codegen/common.py#L1181) that a buffer was previously removed.

The assert triggers because in the first gemm lowering we use a local accumulation buffer, which causes the original buffer name to be added to the `removed_buffers` set. Then in the next gemm lowering we use the global buffer for accumulation, but that buffer name is already in the `removed_buffers` set.

The fix is to add a unique suffix to the buffer name to avoid triggering the assert from different gemm lowerings.

Test Plan:
# Reduced test case
```
TRITON_LOCAL_BUILD=1 buck2 run 'fbcode//mode/opt' fbcode//caffe2/test/inductor:cpu_select_algorithm_cpu  -- caffe2.test.inductor.test_cpu_select_algorithm.TestSelectAlgorithmCPU.test_local_and_global_accumulator_cpu_float32
```
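
For context, a rough sketch of the kind of module such a test exercises follows. This is hypothetical; the actual `test_local_and_global_accumulator` test added with this fix may construct its model and shapes differently.

```
import torch


class M(torch.nn.Module):
    # Two back-to-back linears, sized so that autotuned gemm template
    # lowerings can end up using a local accumulation buffer for one matmul
    # and the global output buffer for the other within the same graph.
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(64, 128, bias=False)
        self.linear2 = torch.nn.Linear(128, 64, bias=False)

    def forward(self, x):
        return self.linear2(self.linear1(x))


if __name__ == "__main__":
    x = torch.randn(32, 64)
    with torch.no_grad():
        # max-autotune on CPU is what routes the matmuls through the
        # cpp gemm template lowering in the first place.
        compiled = torch.compile(M(), options={"max_autotune": True})
        compiled(x)  # crashed on the removed-buffer assert before this fix
```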

# Original failure
```
$ manifold get aitemplate/tree/aotinductor_cpu/915857944_1.input.predictor.disagg.remote_other /tmp/915857944_1.input.predictor.disagg.remote_other

$ buck2 run @//mode/opt //deeplearning/aot_inductor/cpu:cli -- --local-model-path /tmp/915857944_1.input.predictor.disagg.remote_other --submodule remote_other --preset ads_second_stage_ranking_type_1
```

Differential Revision: D68814625
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

@hl475 (Contributor) commented Feb 4, 2025

@leslie-fang-intel and @frost-intel - could you please help take a look? We found another issue when trying autotune on Meta internal models, and David came up with this fix.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

dmpots added a commit to dmpots/pytorch that referenced this pull request Feb 6, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

dmpots added a commit to dmpots/pytorch that referenced this pull request Feb 7, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

@pytorch-bot bot added the "ciflow/trunk" label (Trigger trunk jobs on your pull request) on Feb 7, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68814625

@hl475 (Contributor) commented Feb 7, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral)

Details for Dev Infra team (raised by workflow job).

@hl475 (Contributor) commented Feb 8, 2025

@pytorchbot merge -f "the failed test is irrelevant with this PR"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
