Merge OpenAI commit `dbc85fc` #5210

anmyachev · 2025-09-29T16:35:14Z

This PR changes the Triton base from 1b27b93 to dbc85fc (Sep 23).
Pass rate: 96.32% -> 96.23%

Please do not squash and merge this PR.



Constructing an offsets tensor in transposed order and then applying a `trans()` operation results in overly conservative contiguity analysis in `AxisInfo`. The result should match that of constructing a contiguous offsets tensor directly (without the transpose op).

```python
import torch
import triton
import triton.language as tl


@triton.jit
def transpose_read_kernel(
    X_ptr,
    Y_ptr,
    stride_xa,
    stride_xb,
):
    offsets = (tl.arange(0, 64)[:, None] * stride_xb
               + tl.arange(0, 64)[None, :] * stride_xa)
    offsets = tl.trans(offsets, (1, 0))
    # remark: %11 = tt.trans %10 {order = array<i32: 1, 0>} : tensor<64x64xi32> -> tensor<64x64xi32> =>
    # contiguity = [1, 1], divisibility = [1, 1], constancy = [1, 1], constant_value = <none>
    # ideal remark:
    # contiguity = [1, 64], divisibility = [2, 16], constancy = [1, 1], constant_value = <none>
    # Load through the transposed offsets and write the tile back out.
    vals = tl.load(X_ptr + offsets)
    tl.store(Y_ptr + offsets, vals)


if __name__ == "__main__":
    x = torch.randn(
        (128, 128),
        device="cuda",
        dtype=torch.float16,
    )
    y = torch.empty_like(x)
    transpose_read_kernel[(1,)](
        x,
        y,
        x.stride(0),
        x.stride(1),
    )
```

`TransOp` did not have an `AxisInfo` visitor, so it fell back to pessimistic defaults that do not propagate contiguity information. This PR adds a new visitor that handles transpose operations in the `AxisInfo` lattice.

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [x] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [ ] This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - [ ] I have not added any `lit` tests.
  - [x] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

Co-authored-by: pka <pka@devgpu009.pnb3.facebook.com>
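Conceptually, a transpose visitor only has to permute the per-dimension facts. Assuming output dimension `i` is input dimension `order[i]` (the `torch.permute` convention), contiguity, divisibility and constancy are each reordered the same way, while `constant_value` is unchanged. A minimal Python sketch of that rule follows; `AxisInfoSketch` and `visit_trans` are hypothetical names, not Triton's C++ `AxisInfo` classes, and the input values are chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class AxisInfoSketch:
    """Hypothetical stand-in for a per-dimension AxisInfo lattice value."""
    contiguity: List[int]
    divisibility: List[int]
    constancy: List[int]
    constant_value: Optional[int] = None


def visit_trans(info: AxisInfoSketch, order: Sequence[int]) -> AxisInfoSketch:
    """Propagate AxisInfo through a transpose.

    Output dimension i is input dimension order[i], so every per-dimension
    property is permuted the same way; a constant value is unaffected.
    """
    if sorted(order) != list(range(len(info.contiguity))):
        raise ValueError(f"order {list(order)} is not a permutation")

    def permute(values: List[int]) -> List[int]:
        return [values[d] for d in order]

    return AxisInfoSketch(
        contiguity=permute(info.contiguity),
        divisibility=permute(info.divisibility),
        constancy=permute(info.constancy),
        constant_value=info.constant_value,
    )


# Hypothetical pre-transpose facts: offsets contiguous along dimension 0.
before = AxisInfoSketch(contiguity=[64, 1], divisibility=[16, 2], constancy=[1, 1])
after = visit_trans(before, (1, 0))
print(after.contiguity, after.divisibility)  # [1, 64] [2, 16]
```

With contiguity `[64, 1]` before the transpose (the offsets are contiguous along dimension 0 when `stride_xb` is 1), the permuted result `[1, 64]` matches the ideal remark in the kernel example.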

Adds a condition on `TRITON_BUILD_UT` before including the Proton tests. Without this condition, the build fails when `TRITON_BUILD_UT` is `OFF`, because the Proton tests' CMake calls the `add_triton_ut` function, which is not defined when `TRITON_BUILD_UT` is `OFF`.

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because it's a build bug fix.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)





lezcano and others added 9 commits September 22, 2025 09:54

[LAYOUTS] Have MemDesc{Trans,Reshape} accept equivalent layouts (#8251)

f22c53a

As per title

[Gluon] Verify tensor rank and layout rank match (#8242)

b4cd36e

Failing to verify this can cause all sorts of weird crashes in the frontend due to violating invariants when calling into dialect (C++) code.
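Such a check is cheap to perform in Python before calling into C++. A minimal sketch, assuming a blocked-style layout whose per-dimension lists each have one entry per tensor dimension (`BlockedLayoutSketch` and `verify_rank` are hypothetical, not the Gluon API):

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BlockedLayoutSketch:
    """Hypothetical layout: each per-dimension list has one entry per dim."""
    size_per_thread: List[int]
    threads_per_warp: List[int]
    warps_per_cta: List[int]
    order: List[int]

    @property
    def rank(self) -> int:
        return len(self.order)


def verify_rank(shape: Sequence[int], layout: BlockedLayoutSketch) -> None:
    """Raise a Python error instead of letting dialect code see a mismatch."""
    fields = (layout.size_per_thread, layout.threads_per_warp,
              layout.warps_per_cta, layout.order)
    if any(len(field) != layout.rank for field in fields):
        raise ValueError("layout fields disagree on rank")
    if len(shape) != layout.rank:
        raise ValueError(
            f"tensor rank {len(shape)} does not match layout rank {layout.rank}")
```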

[KERNELS] two small split-k fixes. (#8252)

15cefc9

* split_k was not working with batched matmul; I'll just disable it for now.
* Epilogue handling was missing in `_reduce_grouped`.

[PROTON] Unify python frame representation (#8241)

e90d5a3

Previously, PC sampling and CUPTI used different formats for Python frames; now both use the same format: `file_name:line_number@function_name`.
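For illustration, a hypothetical pair of Python helpers (not Proton's actual implementation) that render and parse frames in the unified `file_name:line_number@function_name` format:

```python
from typing import NamedTuple


class Frame(NamedTuple):
    file_name: str
    line_number: int
    function_name: str


def format_frame(frame: Frame) -> str:
    """Render a frame as file_name:line_number@function_name."""
    return f"{frame.file_name}:{frame.line_number}@{frame.function_name}"


def parse_frame(text: str) -> Frame:
    """Inverse of format_frame.

    Splitting on the last '@' and then the last ':' lets the file path itself
    contain either character; a function name containing '@' would be
    mis-parsed by this sketch.
    """
    location, sep, function_name = text.rpartition("@")
    if not sep:
        raise ValueError(f"missing '@' in {text!r}")
    file_name, sep, line = location.rpartition(":")
    if not sep:
        raise ValueError(f"missing ':' in {text!r}")
    return Frame(file_name, int(line), function_name)
```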

[AMD] Remove specific scale preshuffle pattern match (#8247)

83683fc

This commit switches to a basic heuristic for better supporting preshuffled scale tensors: we try a few common scale tensor schemes and pick the one that gives the largest vectorization for global loads.
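The selection step can be sketched in Python. This is a hypothetical model, not the AMD backend's code: each candidate scheme is a function from a thread's logical element index to a memory offset, the achievable vector width is the length of the contiguous run starting at the first element (capped and rounded down to a power of two), and the widest candidate wins.

```python
from typing import Callable, Dict

# Hypothetical: a scheme maps a thread's logical element index to an offset.
Scheme = Callable[[int], int]


def vector_width(scheme: Scheme, elems_per_thread: int, max_vec: int) -> int:
    """Contiguous run length from element 0, capped at max_vec (in elements)
    and rounded down to a power of two, since vector loads need one."""
    limit = min(elems_per_thread, max_vec)
    base = scheme(0)
    run = 1
    while run < limit and scheme(run) == base + run:
        run += 1
    width = 1
    while width * 2 <= run:
        width *= 2
    return width


def pick_scheme(schemes: Dict[str, Scheme], elems_per_thread: int,
                max_vec: int) -> str:
    """Try every candidate scheme and keep the one with the widest load."""
    return max(schemes,
               key=lambda name: vector_width(schemes[name], elems_per_thread, max_vec))


candidates = {
    "strided": lambda i: i * 32,  # each element in a different row: width 1
    "preshuffled": lambda i: i,   # elements adjacent in memory
}
best = pick_scheme(candidates, elems_per_thread=8, max_vec=16)
```

Ties keep the first candidate in dictionary order, so listing the preferred default first makes the choice deterministic.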

[PROTON] Simplify backend lib settings (#8246)

dbc85fc

Merge commit 'dbc85fc3f285394f24273e200d95b4142541e809'

79e91eb

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev changed the title ~~Merge OpenAI commit dbc85fc3f285394f24273e200d95b4142541e809~~ Merge OpenAI commit dbc85fc Sep 29, 2025

anmyachev requested a review from whitneywhtsang September 29, 2025 16:36

whitneywhtsang approved these changes Sep 29, 2025


[intel] update triton_kernels skiplist after '15cefc9'

67a7052

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev marked this pull request as ready for review September 29, 2025 19:13

whitneywhtsang merged commit b7a6ffb into main Sep 29, 2025
21 checks passed

whitneywhtsang deleted the amyachev/merge89 branch September 29, 2025 20:45

anmyachev commented Sep 29, 2025 (edited by whitneywhtsang) · 10 participants
