@coconutruben coconutruben commented Aug 6, 2025

Summary:

This reverts the scaled_mm part of #159383: as before that change,
we pass the normal input_nodes (not the triton_input_nodes)
to select_algorithm.

After #159383, select_algorithm received the Triton input nodes,
whose bias is unsqueezed. This fixes that by passing the original
input nodes rather than the Triton input nodes.

Test Plan:

buck test '@fbcode//mode/opt' fbcode//caffe2/test/inductor:fp8 -- --exact 'caffe2/test/inductor:fp8 - test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False (caffe2.test.inductor.test_fp8.TestFP8Lowering)'
buck test '@fbcode//mode/opt' fbcode//caffe2/test/inductor:fp8 -- --exact 'caffe2/test/inductor:fp8 - test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_True (caffe2.test.inductor.test_fp8.TestFP8Lowering)'

These tests were failing and pass now.

Side note: I believe these tests were failing because the unsqueezed
bias made the ATEN choice ineligible, and there is a minor numerical
discrepancy between ATEN and Triton for this case. I'm not sure the
test should be written this way, as it implicitly relies on ATEN
being the chosen backend.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D79717654

Summary:

This reverts the scaled_mm part of #159383: as before that change,
we pass the normal input_nodes (not the triton_input_nodes)
to select_algorithm.

- #159383 refactored how kwargs are retrieved
- it introduced the notion of KernelInputs, which wrap input_nodes
- scaled_mm uses unsqueezed input nodes for Triton to retrieve params
- the issue: select_algorithm is meant to get the squeezed (regular)
  bias instead, but after the refactor it received the unsqueezed one

This fixes that by passing the original input nodes rather
than the triton input nodes.
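
To make the mismatch concrete, here is a minimal sketch in plain Python; it is not the Inductor code. `FakeNode`, `unsqueeze_front`, and `toy_select_algorithm` are hypothetical stand-ins, the shapes are only illustrative, and the eligibility rule (the ATEN choice needs a 1-D bias) is an assumption modeled on the side note in this PR rather than a confirmed check.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for an input node: just a name and a shape."""
    name: str
    shape: Tuple[int, ...]


def unsqueeze_front(node: FakeNode) -> FakeNode:
    """Return a copy of the node with a leading dimension of size 1."""
    return FakeNode(node.name, (1,) + node.shape)


def toy_select_algorithm(input_nodes: Sequence[FakeNode]) -> List[str]:
    """Toy backend selection over (mat_a, mat_b, bias).

    Assumption for illustration: the ATEN choice is only eligible
    when the bias is 1-D; the Triton choice is always eligible.
    """
    bias = input_nodes[2]
    choices = ["triton"]
    if len(bias.shape) == 1:
        choices.insert(0, "aten")
    return choices


mat_a = FakeNode("mat_a", (1024, 1024))
mat_b = FakeNode("mat_b", (1024, 512))
bias = FakeNode("bias", (512,))

# Original nodes: bias stays squeezed, shape (512,).
input_nodes = (mat_a, mat_b, bias)
# Triton-side nodes: bias unsqueezed to shape (1, 512).
triton_input_nodes = (mat_a, mat_b, unsqueeze_front(bias))

# Passing the Triton-side nodes drops the ATEN choice in this toy model.
print(toy_select_algorithm(triton_input_nodes))
# Passing the original nodes keeps both choices available.
print(toy_select_algorithm(input_nodes))
```

In this toy model the unsqueezed tuple is still useful for deriving Triton parameters; the point is only that the selection step should see the original tuple.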

Test Plan:

```
buck test '@fbcode//mode/opt' fbcode//caffe2/test/inductor:fp8 -- --exact 'caffe2/test/inductor:fp8 - test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False (caffe2.test.inductor.test_fp8.TestFP8Lowering)'
buck test '@fbcode//mode/opt' fbcode//caffe2/test/inductor:fp8 -- --exact 'caffe2/test/inductor:fp8 - test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_True (caffe2.test.inductor.test_fp8.TestFP8Lowering)'
```

These tests were failing and pass now.

Side note: I believe these tests were failing because the unsqueezed
bias made the ATEN choice ineligible, and there is a minor numerical
discrepancy between ATEN and Triton for this case. I'm not sure the
test should be written this way, as it implicitly relies on ATEN
being the chosen backend.
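
The concern in the side note can be illustrated without either backend. The sketch below is a stand-in under stated assumptions: two accumulation strategies over the same floats play the role of two kernels, and nothing here reflects how ATEN or Triton actually compute scaled_mm or why their results differ. It only shows why an exact-equality assertion ties a test to one implementation, while a tolerance-based comparison does not.

```python
import math


def accumulate_sequential(values):
    """Left-to-right float accumulation (hypothetical 'kernel A')."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def accumulate_exact(values):
    """Correctly rounded summation via math.fsum (hypothetical 'kernel B')."""
    return math.fsum(values)


values = [0.1] * 10
result_a = accumulate_sequential(values)
result_b = accumulate_exact(values)

# Exact equality depends on which implementation produced the result.
exact_match = result_a == result_b
# A tolerance-based check accepts both; rel_tol here is an arbitrary choice.
close_match = math.isclose(result_a, result_b, rel_tol=1e-9)

print(exact_match, close_match)  # False True
```

The tolerance is arbitrary in this sketch; a real FP8 test would need tolerances chosen for that dtype and workload.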

[ghstack-poisoned]

pytorch-bot bot commented Aug 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159948

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 94484c6 with merge base 74a754a:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

  • pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

coconutruben added a commit that referenced this pull request Aug 6, 2025
ghstack-source-id: b7b8caa
Pull Request resolved: #159948

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot added the ciflow/trunk label Aug 6, 2025
@coconutruben requested a review from eellison August 6, 2025 09:48
@coconutruben added the topic: not user facing label Aug 6, 2025
@eellison eellison left a comment

test ?

@facebook-github-bot

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-actions github-actions bot deleted the gh/coconutruben/27/head branch September 6, 2025 02:07
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Differential Revision: [D79717654](https://our.internmc.facebook.com/intern/diff/D79717654)
Pull Request resolved: pytorch#159948
Approved by: https://github.com/izaitsevfb, https://github.com/eellison