[MPSInductor] Fix remainder implementation for int types #155891

malfet · 2025-06-13T06:19:37Z

Stack from ghstack (oldest at bottom):

-> [MPSInductor] Fix remainder implementation for int types #155891

Introduce c10::metal::remainder and call it from both the Inductor and eager implementations. It has an integer specialization, which should make it much faster than before while still matching Python's remainder semantics for negative numbers (the result takes the sign of the divisor).
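To illustrate the semantics an integer specialization has to preserve, here is a minimal sketch in plain Python. It is not the code added to c10/metal; the helper names `c_style_rem` and `python_style_rem` are hypothetical. It assumes the integer path starts from a truncating remainder (what C++ and Metal `%` compute for integers) and corrects the sign when it disagrees with the divisor.

```python
def c_style_rem(a: int, b: int) -> int:
    """Truncating remainder: the result takes the sign of the dividend,
    as integer `%` does in C++/Metal. Hypothetical helper for illustration."""
    q = abs(a) // abs(b)          # magnitude of the truncated quotient
    if (a < 0) != (b < 0):        # operands with opposite signs give a negative quotient
        q = -q
    return a - q * b


def python_style_rem(a: int, b: int) -> int:
    """Python-style remainder built on the truncating one:
    a nonzero result is shifted by b when its sign differs from the divisor's."""
    r = c_style_rem(a, b)
    if r != 0 and (r < 0) != (b < 0):
        r += b
    return r


if __name__ == "__main__":
    for a, b in [(-7, 5), (7, -5), (-7, -5), (7, 5)]:
        print(a, b, c_style_rem(a, b), python_style_rem(a, b), a % b)
```

The whole computation stays in integer arithmetic, so the result can be used directly as an index and no precision is lost in a round trip through floating point.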

This makes it possible to remove the complex type-detection logic from the MPS codegen and rely on the Metal (C++) type system to work out the input and output types.
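As a rough analogy for letting the type system choose the implementation, the sketch below uses Python's `functools.singledispatch` in place of Metal/C++ overload resolution. The function name `remainder` and its registrations are illustrative assumptions, not the c10/metal API: the point is only that the caller emits one call and the argument type selects both the code path and the result type.

```python
import math
from functools import singledispatch


@singledispatch
def remainder(a, b):
    """Fallback for unsupported operand types (hypothetical sketch)."""
    raise TypeError(f"unsupported operand type: {type(a).__name__}")


@remainder.register
def _(a: int, b):
    # Integer path: stays in integer arithmetic, so the result is an int.
    return a % b


@remainder.register
def _(a: float, b):
    # Floating-point path: the floor-based formula, result is a float.
    return a - b * math.floor(a / b)


if __name__ == "__main__":
    print(type(remainder(-7, 5)).__name__, remainder(-7, 5))
    print(type(remainder(-7.5, 2.0)).__name__, remainder(-7.5, 2.0))
```

With this shape, the code generator does not need to inspect operand dtypes itself before choosing an expression to emit.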

This fixes compilation of code such as:

@torch.compile
def f(x, y):
    return x[y % 5]

which previously failed to compile with:

torch._inductor.exc.InductorError: SyntaxError: failed to compile
    #include <c10/metal/utils.h>
    kernel void generated_kernel(
        device float* out_ptr0,
        constant long* in_ptr0,
        constant float* in_ptr1,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = xindex;
        auto tmp0 = in_ptr0[x0];
        auto tmp1 = 12;
        auto tmp2 = static_cast<float>(tmp0) - static_cast<float>(tmp1) * metal::floor(static_cast<float>(tmp0) / static_cast<float>(tmp1));
        auto tmp3 = 1024;
        auto tmp4 = static_cast<long>(tmp3);
        auto tmp5 = tmp2 + tmp4;
        auto tmp6 = tmp2 < 0;
        auto tmp7 = tmp6 ? tmp5 : tmp2;
        if ((tmp7 < 0) && (tmp7 > 1024)) return;
        auto tmp9 = in_ptr1[tmp7];
        out_ptr0[x0] = static_cast<float>(tmp9);
    }
 with program_source:372:28: error: array subscript is not an integer
        auto tmp9 = in_ptr1[tmp7];
                           ^~~~~
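The failure above comes from `tmp2` being computed entirely in `float`, so the value later used as a subscript is not an integer. The following plain-Python sketch reproduces the same shape of problem under that reading; `float_remainder` and `try_index` are hypothetical helpers written for this illustration, and a Python list stands in for the Metal buffer.

```python
import math


def float_remainder(a: int, b: int) -> float:
    """Mirror of the float expression used for tmp2 in the failing kernel:
    cast both operands to float, then apply a - b * floor(a / b)."""
    return float(a) - float(b) * math.floor(float(a) / float(b))


def try_index(data, idx):
    """Index `data` with `idx`, returning the error text instead of raising."""
    try:
        return data[idx]
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    table = list(range(12))
    idx = float_remainder(-7, 12)      # numerically correct, but a float
    print(idx, try_index(table, idx))  # float index is rejected
    print(try_index(table, -7 % 12))   # integer remainder indexes fine
```

The numeric value is right in this small case; the result type is what makes it unusable as an index, which is the situation an integer remainder avoids.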

This fixes the fail_to_compile result for the GPT2ForSequenceClassification Hugging Face model with transformers==4.44.2.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]

pytorch-bot · 2025-06-13T06:19:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155891

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 13 Pending

As of commit 031425d with merge base 6020440:

NEW FAILURE - The following job has failed:

inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf, 1, 1, linux.24xl.spr-metal) (gh)
Process completed with exit code 1.

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Introduce `c10::metal::remainder` and call it from both inductor and eager implementation

This fixes compilation of something like
```python
@torch.compile
def f(x, y):
    return x[y % 5]
```
which beforehand failed to compile with
```
torch._inductor.exc.InductorError: SyntaxError: failed to compile
    #include <c10/metal/utils.h>
    kernel void generated_kernel(
        device float* out_ptr0,
        constant long* in_ptr0,
        constant float* in_ptr1,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = xindex;
        auto tmp0 = in_ptr0[x0];
        auto tmp1 = 12;
        auto tmp2 = static_cast<float>(tmp0) - static_cast<float>(tmp1) * metal::floor(static_cast<float>(tmp0) / static_cast<float>(tmp1));
        auto tmp3 = 1024;
        auto tmp4 = static_cast<long>(tmp3);
        auto tmp5 = tmp2 + tmp4;
        auto tmp6 = tmp2 < 0;
        auto tmp7 = tmp6 ? tmp5 : tmp2;
        if ((tmp7 < 0) && (tmp7 > 1024)) return;
        auto tmp9 = in_ptr1[tmp7];
        out_ptr0[x0] = static_cast<float>(tmp9);
    }
 with program_source:372:28: error: array subscript is not an integer
        auto tmp9 = in_ptr1[tmp7];
                           ^~~~~
```

ghstack-source-id: f27f2cb
Pull Request resolved: #155891

[ghstack-poisoned]

Introduce `c10::metal::remainder` and call it from both inductor and eager implementation

This fixes compilation of something like
```python
@torch.compile
def f(x, y):
    return x[y % 5]
```
which beforehand failed to compile with
```
torch._inductor.exc.InductorError: SyntaxError: failed to compile
    #include <c10/metal/utils.h>
    kernel void generated_kernel(
        device float* out_ptr0,
        constant long* in_ptr0,
        constant float* in_ptr1,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = xindex;
        auto tmp0 = in_ptr0[x0];
        auto tmp1 = 12;
        auto tmp2 = static_cast<float>(tmp0) - static_cast<float>(tmp1) * metal::floor(static_cast<float>(tmp0) / static_cast<float>(tmp1));
        auto tmp3 = 1024;
        auto tmp4 = static_cast<long>(tmp3);
        auto tmp5 = tmp2 + tmp4;
        auto tmp6 = tmp2 < 0;
        auto tmp7 = tmp6 ? tmp5 : tmp2;
        if ((tmp7 < 0) && (tmp7 > 1024)) return;
        auto tmp9 = in_ptr1[tmp7];
        out_ptr0[x0] = static_cast<float>(tmp9);
    }
 with program_source:372:28: error: array subscript is not an integer
        auto tmp9 = in_ptr1[tmp7];
                           ^~~~~
```

ghstack-source-id: cd4c8a4
Pull Request resolved: #155891

malfet · 2025-06-13T16:40:54Z

@pytorchbot merge -f "Lint + MPS are green"

pytorchmergebot · 2025-06-13T16:42:35Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


Update

c19cea0

[ghstack-poisoned]

malfet requested a review from kulinseth as a code owner June 13, 2025 06:19

pytorch-bot bot added ciflow/inductor ciflow/mps Run MPS tests (subset of trunk) module: inductor release notes: mps Release notes category labels Jun 13, 2025

malfet added the topic: bug fixes topic category label Jun 13, 2025

malfet requested review from jansel, dcci and manuelcandales June 13, 2025 06:20

Update

9a62cbc

[ghstack-poisoned]

Update

031425d

[ghstack-poisoned]

malfet requested a review from Skylion007 June 13, 2025 15:48

manuelcandales approved these changes Jun 13, 2025


pytorchmergebot added the merging label Jun 13, 2025

pytorchmergebot added the Merged label Jun 13, 2025

pytorchmergebot closed this in b6add8c Jun 13, 2025

pytorchmergebot removed the merging label Jun 13, 2025

github-actions bot deleted the gh/malfet/400/head branch July 14, 2025 02:23
