Conversation

@malfet (Contributor) commented May 20, 2025

Stack from ghstack (oldest at bottom):

By decorating emitted kernels with `'''` rather than `"""`, to match the regex in `torch._inductor.utils.run_and_get_kernels`.

This fixes `test_deterministic_codegen_mps`, `test_deterministic_codegen_on_graph_break_mps`, and `test_deterministic_codegen_with_suffix_mps`.
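For context, a minimal sketch of the extraction this quoting change satisfies, assuming `run_and_get_kernels` pulls kernel sources out of the generated wrapper by matching triple-single-quoted blocks (the pattern below is a hypothetical stand-in for the real regex in `torch._inductor.utils`):

```python
import re

# Hypothetical stand-in for the extraction pattern: it only matches kernel
# bodies delimited with ''', so kernels emitted with """ would not be found.
KERNEL_RE = re.compile(r"'''\n(.*?)'''", re.DOTALL)

generated_wrapper = "lib = compile_shader('''\n    kernel void generated_kernel() {}\n''')"
print(KERNEL_RE.findall(generated_wrapper))
# -> ['    kernel void generated_kernel() {}\n']
```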
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]

pytorch-bot bot commented May 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153970

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 64 Pending

As of commit 6707d65 with merge base 2e56ce0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/inductor, ciflow/mps (Run MPS tests, subset of trunk), and module: inductor labels May 20, 2025
@malfet requested review from dcci and jansel May 20, 2025 19:12
@malfet added the topic: not user facing label May 20, 2025
[ghstack-poisoned]
@malfet changed the title from "[Testing] Fix test_deterministic_codegen_on_graph_break_mps" to "[Testing] Fix test_deterministic_... on MPS" May 20, 2025
@pytorchmergebot (Collaborator) commented

Starting merge as part of PR stack under #153971

pytorchmergebot pushed a commit that referenced this pull request May 20, 2025
`cummin` is not implemented

Pull Request resolved: #153971
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: #153970
pytorchmergebot pushed a commit that referenced this pull request May 21, 2025
By using the `c10::metal::floor_divide` primitive.

This fixes the `test_flip_cat_mps` test, and makes `doctr_reco_predictor` and `doctr_det_predictor` pass accuracy checks (at least locally; a workflow dispatch has been scheduled to validate it in CI).

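For intuition, here is a small Python illustration of why truncating and floor division disagree for the negative numerators this kernel produces (Python's `//` stands in for `c10::metal::floor_divide`; `int()` truncation stands in for a plain C-style `/`):

```python
# Truncating division rounds toward zero; floor division rounds toward
# negative infinity. They differ only when the result is negative and
# inexact, which is exactly the case for the flip indexing below.
for x0 in range(3):
    trunc = int(-x0 / 2)   # what a plain truncating `/` would compute
    floor = -x0 // 2       # what c10::metal::floor_divide computes
    print(f"x0={x0}: trunc={trunc}, floor={floor}")
# x0=0: trunc=0,  floor=0
# x0=1: trunc=0,  floor=-1   <- the indices diverge here
# x0=2: trunc=-1, floor=-1
```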
Before this change, the following script generated different compile and eager results:
```python
import torch

def foo(unsqueeze, unsqueeze_1):
    cat_1 = torch.ops.aten.cat.default([unsqueeze, unsqueeze_1], 1)
    view = torch.ops.aten.view.default(cat_1, [4])
    slice_5 = torch.ops.aten.slice.Tensor(view, 0, 0, 3)
    rev_1 = torch.ops.aten.flip.default(slice_5, [0])
    return rev_1

if __name__ == "__main__":
    x = torch.arange(1.0, 3.0, device='mps').reshape(2, 1)
    y = torch.arange(5.0, 7.0, device='mps').reshape(2, 1)

    rc, (kernel,) = torch._inductor.utils.run_and_get_kernels(torch.compile(foo), x, y)
    print(kernel)
    print("Compile: ", rc)
    print("Eager: ", foo(x, y))
```
After this change, the script prints:
```
'''
    #include <c10/metal/utils.h>
    kernel void generated_kernel(
        device float* out_ptr0,
        constant float* in_ptr0,
        constant float* in_ptr1,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = xindex;
        auto tmp6 = in_ptr0[1 + (c10::metal::floor_divide((-1)*x0, 2))];
        auto tmp11 = in_ptr1[1 + (c10::metal::floor_divide((-1)*x0, 2))];
        auto tmp0 = (2 + ((-1)*x0)) % (2);
        auto tmp1 = static_cast<long>(tmp0);
        auto tmp2 = 0;
        auto tmp3 = tmp1 >= tmp2;
        auto tmp4 = 1;
        auto tmp5 = tmp1 < tmp4;
        auto tmp7 = tmp5 ? tmp6 : 0.0;
        auto tmp8 = tmp1 >= tmp4;
        auto tmp9 = 2;
        auto tmp10 = tmp1 < tmp9;
        auto tmp12 = tmp8 ? tmp11 : 0.0;
        auto tmp13 = tmp5 ? tmp7 : tmp12;
        out_ptr0[x0] = static_cast<float>(tmp13);
    }
'''
Compile:  tensor([2., 5., 1.], device='mps:0')
Eager:  tensor([2., 5., 1.], device='mps:0')
```

Pull Request resolved: #153997
Approved by: https://github.com/dcci
ghstack dependencies: #153970, #153971
pytorchmergebot pushed a commit that referenced this pull request May 22, 2025
`isin_Tensor_Scalar_out` is just a redispatch to `eq`/`ne`.
`isin_Scalar_Tensor_out` redispatches back to the generic `isin` op, but needs a small tweak to handle float scalars.
Make sure that `out` is resized to the expected size in `isin_Tensor_Tensor_out_mps`.

Add unit tests to validate that, but skip them on macOS-13, where the MPS op just returns garbage.
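For illustration, a minimal sketch of the redispatch idea (not the actual ATen implementation; the function names below are hypothetical):

```python
import torch

def isin_tensor_scalar(elements: torch.Tensor, test_element, invert: bool = False) -> torch.Tensor:
    # Membership against a single scalar is just an elementwise eq (or ne when inverted).
    return torch.ne(elements, test_element) if invert else torch.eq(elements, test_element)

def isin_scalar_tensor(element, test_elements: torch.Tensor, invert: bool = False) -> torch.Tensor:
    # Wrap the scalar in a 0-d tensor and fall back to the generic isin.
    # Promote dtypes first so a float scalar compared against an integer
    # tensor is not truncated (the "small tweak" mentioned above).
    dtype = torch.result_type(test_elements, element)
    elem = torch.tensor(element, dtype=dtype, device=test_elements.device)
    return torch.isin(elem, test_elements.to(dtype), invert=invert)

t = torch.tensor([0, 1, 2])
assert torch.equal(isin_tensor_scalar(t, 1), torch.isin(t, 1))
assert torch.equal(isin_scalar_tensor(1, t), torch.isin(1, t))
assert torch.equal(isin_scalar_tensor(0.5, t), torch.isin(0.5, t))  # float scalar case
```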

Before this change, both of these calls failed:
```python
>>> import torch
>>> t = torch.tensor([0, 1, 2], device='mps')
>>> torch.isin(t, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: The operator 'aten::isin.Tensor_Scalar_out' is not currently implemented for the MPS device. If you want this op to be considered for addition please comment on #141287 and mention use-case, that resulted in missing op as well as commit hash 3b875c2. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
>>> torch.isin(1, t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: The operator 'aten::isin.Scalar_Tensor_out' is not currently implemented for the MPS device. If you want this op to be considered for addition please comment on #141287 and mention use-case, that resulted in missing op as well as commit hash 3b875c2. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```
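After the change, both calls should instead return boolean results matching CPU semantics, for example (expected output; the exact device string is assumed):

```python
>>> torch.isin(t, 1)
tensor([False,  True, False], device='mps:0')
>>> torch.isin(1, t)
tensor(True, device='mps:0')
```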

Pull Request resolved: #154010
Approved by: https://github.com/Skylion007, https://github.com/dcci, https://github.com/manuelcandales
ghstack dependencies: #153970, #153971, #153997
@github-actions bot deleted the gh/malfet/347/head branch June 21, 2025 02:19