
Conversation

oulgen (Contributor) commented on Nov 2, 2025:

Stack from ghstack (oldest at bottom):

Very simple Pallas TorchInductor backend
Given:

import torch

def f(x, y):
    return x.sin() + y

torch._inductor.config.cuda_backend = "pallas"

x = torch.randn(4).cuda()
y = torch.randn(4).cuda()

compiled = torch.compile(f, backend="inductor", fullgraph=True)
torch.testing.assert_close(compiled(x, y), f(x, y))

it outputs:

import torch
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl
from torch.utils import dlpack as torch_dlpack
def pallas_fused_add_sin_56b646d2_kernel(in_ptr0, in_ptr1, out_ptr0):
    tmp0 = in_ptr0[...]
    tmp1 = jnp.sin(tmp0)
    tmp2 = in_ptr1[...]
    tmp3 = tmp1 + tmp2
    out_ptr0[...] = tmp3
def pallas_fused_add_sin_56b646d2_main(in_ptr0, in_ptr1, out_ptr0, stream=None):
    # Convert Torch -> JAX for inputs
    in_ptr0_jax = jax.dlpack.from_dlpack(torch_dlpack.to_dlpack(in_ptr0))
    in_ptr1_jax = jax.dlpack.from_dlpack(torch_dlpack.to_dlpack(in_ptr1))
    # Prepare output spec from PyTorch tensor
    # Map PyTorch dtype to JAX dtype string
    _torch_dtype_to_jax = {
        torch.float32: jnp.float32, torch.float64: jnp.float64, torch.float16: jnp.float16,
        torch.int32: jnp.int32, torch.int64: jnp.int64, torch.int16: jnp.int16, torch.int8: jnp.int8,
        torch.uint8: jnp.uint8, torch.bool: jnp.bool_,
    }
    out_spec = jax.ShapeDtypeStruct(out_ptr0.shape, _torch_dtype_to_jax[out_ptr0.dtype])
    compiled = pl.pallas_call(
        lambda *refs: pallas_fused_add_sin_56b646d2_kernel(*refs),
        out_shape=out_spec,
        grid=(1,),
    )
    res = compiled(in_ptr0_jax, in_ptr1_jax)
    # Copy result back into the provided torch output tensor
    res_t = torch_dlpack.from_dlpack(jax.dlpack.to_dlpack(res))
    out_ptr0.copy_(res_t)
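
For illustration, a minimal sketch (not part of the generated output) of how the emitted entrypoint could be invoked by hand, given the signature above; the caller pre-allocates the output tensor and the result is copied into it:

x = torch.randn(4, device="cuda")
y = torch.randn(4, device="cuda")
out = torch.empty_like(x)  # pre-allocated output buffer, filled by the copy_ at the end
pallas_fused_add_sin_56b646d2_main(x, y, out)
torch.testing.assert_close(out, x.sin() + y)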

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

pytorch-bot bot commented Nov 2, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166822

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fac62ad with merge base d980d8d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

oulgen added a commit that referenced this pull request Nov 2, 2025
EikanWang (Collaborator) commented:

@oulgen, can I expect Pallas to deliver a more competitive performance advantage than Gluon?

oulgen (Contributor, Author) commented on Nov 3, 2025:

@oulgen, can I expect Pallas to deliver a more competitive performance advantage than Gluon?

No clue; probably not, though, considering Gluon is a much lower-level language that can express hardware semantics better.

oulgen added commits that referenced this pull request Nov 3, 2025
@oulgen oulgen requested a review from jansel November 3, 2025 20:22
oulgen added a commit that referenced this pull request Nov 3, 2025
@oulgen oulgen marked this pull request as ready for review November 3, 2025 20:33
@miladm miladm requested a review from zou3519 November 3, 2025 21:42
oulgen added a commit that referenced this pull request Nov 3, 2025
- Compute expression with Python operators (compatible with jax.numpy broadcasting)
- Store as full-array ref assignment: "out_ptrY[...] = <expr>"
- Generate Python code that defines a Pallas kernel and a host entrypoint.
- Use async_compile.cutedsl path to compile and load Python code (generic wrapper).

A reviewer (Contributor) commented:

cutedsl?

# Pallas refs must be unpacked with [...] to load the array
return self.cse.generate(
    self.compute,
    f"{buf}[...]",

A reviewer (Contributor) commented:

Add an assert based on index so this errors if the load order is not contiguous.

out = self.args.output(name)
self.store_buffer_names.add(name)
# Pallas refs must use [...] assignment to store back to the ref
self.stores.writeline(f"{out}[...] = {value}")

A reviewer (Contributor) commented:

Add an assert based on index so this errors if the load order is not contiguous. Use a shared indexing helper to compute the "..." subscript.
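
A hypothetical sketch of the kind of shared indexing helper both of these suggestions point at (names are illustrative, not the PR's actual implementation): validate that the index is the trivial contiguous one before emitting the "..." subscript, so non-contiguous loads/stores fail loudly instead of silently touching the whole buffer.

import sympy

def contiguous_subscript(index: sympy.Expr, flat_var: sympy.Symbol) -> str:
    # Hypothetical helper (illustrative only): the minimal Pallas backend can only
    # load/store an entire ref via `ref[...]`, so the only index it can honor is
    # the flat iteration variable itself. Anything strided or indirect should error.
    assert sympy.simplify(index - flat_var) == 0, (
        f"Pallas backend requires contiguous access, got index {index}"
    )
    return "..."

# contiguous_subscript(sympy.Symbol("x0"), sympy.Symbol("x0")) -> "..."
# contiguous_subscript(2 * sympy.Symbol("x0"), sympy.Symbol("x0")) raises AssertionError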

@classmethod
def get_backend_features(cls, device: torch.device) -> OrderedSet[BackendFeature]:
    # Start minimal: no special features advertised
    return OrderedSet()

A reviewer (Contributor) commented:

When you do reductions, consider reducing to a single element here if that is something Pallas can do fast. Basically, should we break single-element-output reductions into multiple kernels?
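
For context, a standalone Pallas sketch (independent of this PR) of the single-element-output reduction pattern the question refers to; interpret mode is used so the sketch runs without a GPU/TPU:

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def sum_kernel(x_ref, out_ref):
    # Reduce the whole input block down to a single element inside one kernel.
    out_ref[...] = jnp.sum(x_ref[...], keepdims=True)

x = jnp.arange(1024, dtype=jnp.float32)
total = pl.pallas_call(
    sum_kernel,
    out_shape=jax.ShapeDtypeStruct((1,), jnp.float32),
    interpret=True,  # interpreter mode so the example runs anywhere
)(x)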

if not has_pallas_package():
    return False

import torch

A reviewer (Contributor) commented:

You can import torch in global scope

cuda_backends = {
    "triton": CUDACombinedScheduling,
    "halide": HalideScheduling,
    "pallas": PallasScheduling,

A reviewer commented:

Why is Pallas registered as a CUDA backend? Asking from a technical perspective; for example, is this a placeholder, or does the concrete backend/HW difference not matter at this layer?

oulgen (Contributor, Author) replied on Nov 4, 2025:

We can add Pallas to other backends too (see Halide, which is registered for both CPU and GPU); I only added it to CUDA here because I was testing on CUDA for now. Once we have a TPU backend we can test on, we would register Pallas for the TPU device as well.
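
A purely hypothetical illustration of that point (not code from this PR): registering Pallas for another device type would mirror the cuda_backends entry quoted above, e.g. for a future TPU device backend:

# Hypothetical sketch only: once an Inductor TPU device backend exists and is
# testable, Pallas scheduling could be registered for it the same way it is
# registered for CUDA above.
tpu_backends = {
    "pallas": PallasScheduling,
}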

oulgen added commits that referenced this pull request Nov 4, 2025
oulgen (Contributor, Author) commented on Nov 4, 2025:

@pytorchbot merge

pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Nov 4, 2025
pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

