[MPS] Add support for Custom Kernels #100661
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100661
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit f12a876.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The CI trunk failures seem unrelated to the PR:
btw if the answer to a lot of my questions is "there is a single stream so this is not relevant", it should be made very explicit.
Force-pushed from 8f0f34f to 5dcef82 (compare)
Thanks @albanD for the review, and @razarmehr for the PR. The change looks good.
Looks good, but please undef the define in the header. (And again, it would be nice to submit this as a separate PR, as it has nothing to do with adding support for custom kernels, does it?)
Also, not sure why `_mps_synchronize` was renamed to `_mps_DeviceSynchronize`.
The change in itself sounds OK, but I think we need to be a lot stricter on the documentation if we expect non-MPS maintainers to be able to use this or to review PRs touching MPS code.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: .github/workflows/lint.yml / lintrunner / linux-job. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "all checks except the pre-existing lint-linux failure are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Co-authored-by: albanD <desmaison.alban@gmail.com>
Force-pushed from f9b0d78 to becdec3 (compare)
Thanks @TaiPhamD. For a raw Metal kernel you have the flexibility to cache the PSO (pipeline state object) as you see fit. We end up caching the Graph because there is CPU overhead in compiling it, and we can do this since the shapes are known. For Metal, shader caching is already built into the OS for you, so you get that automatically.
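For illustration only, here is a minimal sketch of one way PSO caching could look in an Objective-C++ extension. The kernel source string, the function name, and the process-lifetime `unordered_map` caching policy are assumptions for the example and are not part of this PR; only the standard Metal calls (`newLibraryWithSource:options:error:`, `newFunctionWithName:`, `newComputePipelineStateWithFunction:error:`) are real API.

```objective-c++
// Hypothetical sketch (not from this PR): cache one compute pipeline state
// (PSO) per kernel name so the Metal source is only compiled once per process.
#import <Metal/Metal.h>
#include <stdexcept>
#include <string>
#include <unordered_map>

static id<MTLComputePipelineState> getCachedPipeline(id<MTLDevice> device,
                                                     const std::string& source,
                                                     const std::string& kernelName) {
  // One entry per kernel name, kept alive for the lifetime of the process.
  static std::unordered_map<std::string, id<MTLComputePipelineState>> cache;
  auto it = cache.find(kernelName);
  if (it != cache.end()) {
    return it->second;
  }

  NSError* error = nil;
  // Compile the Metal source into a library.
  id<MTLLibrary> library =
      [device newLibraryWithSource:[NSString stringWithUTF8String:source.c_str()]
                           options:nil
                             error:&error];
  if (!library) {
    throw std::runtime_error("Failed to compile Metal library");
  }
  // Look up the kernel function and build its pipeline state.
  id<MTLFunction> kernel =
      [library newFunctionWithName:[NSString stringWithUTF8String:kernelName.c_str()]];
  if (!kernel) {
    throw std::runtime_error("Kernel function not found: " + kernelName);
  }
  id<MTLComputePipelineState> pso =
      [device newComputePipelineStateWithFunction:kernel error:&error];
  if (!pso) {
    throw std::runtime_error("Failed to create compute pipeline state");
  }
  cache[kernelName] = pso;
  return pso;
}
```

A cache like this mainly avoids repeated source-to-library compilation on the CPU; as noted above, the OS-level shader cache already helps with the GPU-side compilation cost.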
@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: Meta Internal-Only Changes Check. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "Internal builds are fine"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
- This change introduces these APIs to enable developing custom kernels on the MPS Stream:
  - `torch::mps::get_command_buffer()`
  - `torch::mps::get_dispatch_queue()`
  - `torch::mps::commit()`
- Add ObjC test case

Pull Request resolved: #100661
Approved by: https://github.com/kulinseth, https://github.com/malfet
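For context, below is a minimal Objective-C++ sketch of how these three calls could be used to run a custom Metal kernel on the MPS stream. Only the `torch::mps::*` functions come from this PR; the prebuilt pipeline state `pso`, the `mtlBuffer()` helper, the buffer indices, the dispatch geometry, and the `<torch/torch.h>` include are illustrative assumptions, not part of the PyTorch API.

```objective-c++
// Sketch only: encode a custom Metal kernel onto PyTorch's MPS stream.
// Assumes a pre-built id<MTLComputePipelineState> `pso` (e.g. from a cache
// such as the one sketched earlier in this thread).
#include <torch/torch.h>
#import <Metal/Metal.h>
#include <algorithm>

// Hypothetical helper: view an MPS tensor's storage as an MTLBuffer.
static inline id<MTLBuffer> mtlBuffer(const at::Tensor& t) {
  return __builtin_bit_cast(id<MTLBuffer>, t.storage().data());
}

void runCustomKernel(id<MTLComputePipelineState> pso,
                     const at::Tensor& input, at::Tensor& output) {
  // Command buffer shared with the MPS stream (added by this PR).
  id<MTLCommandBuffer> commandBuffer = torch::mps::get_command_buffer();
  // Serial queue used by the MPS stream; encoding on it avoids racing
  // with other PyTorch ops (added by this PR).
  dispatch_queue_t serialQueue = torch::mps::get_dispatch_queue();

  dispatch_sync(serialQueue, ^{
    id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
    [encoder setComputePipelineState:pso];
    [encoder setBuffer:mtlBuffer(input)
                offset:input.storage_offset() * input.element_size()
               atIndex:0];
    [encoder setBuffer:mtlBuffer(output)
                offset:output.storage_offset() * output.element_size()
               atIndex:1];

    // One thread per element; clamp the threadgroup size to the PSO limit.
    const NSUInteger numel = input.numel();
    NSUInteger threadsPerGroup =
        std::min<NSUInteger>(pso.maxTotalThreadsPerThreadgroup, numel);
    [encoder dispatchThreads:MTLSizeMake(numel, 1, 1)
       threadsPerThreadgroup:MTLSizeMake(threadsPerGroup, 1, 1)];
    [encoder endEncoding];

    // Hand the command buffer back to the MPS stream for execution (this PR).
    torch::mps::commit();
  });
}
```

Encoding inside `dispatch_sync` on the stream's queue keeps the custom work ordered with respect to regular PyTorch MPS ops, which matters because there is a single MPS stream.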
I've added an implementation of erfinv using the algorithm from https://github.com/pytorch/pytorch/blob/4154c8ea159fdaecc71ee9af820ac956193c875b/aten/src/ATen/native/Math.h#L152, so that the MPS-based algorithm matches the CPU results in the automatic tests. This PR uses the new Metal API calls from #100661.

Testing shows MPS has a decent speed-up (about 270x) compared to CPU on a tensor of ~200 million elements:

```python
import torch
x = torch.arange(-1, 1, 1e-8)  # default CPU tensor

# measure CPU compute time by calling torch.erfinv
time = %timeit -o -q -r 5 torch.erfinv(x)
cpu_time = time.average
print("CPU torch.erfinv time: ", cpu_time)

x = x.to("mps")
# measure MPS compute time
time = %timeit -o -q -r 5 torch.erfinv(x)
mps_time = time.average
print("MPS torch.erfinv time: ", mps_time)
print(f"MPS torch.erfinv is {cpu_time/mps_time*100} percent faster than CPU torch.erfinv")

# compute MSE between MPS and CPU torch.erfinv
x = x.to("cpu")
y_cpu = torch.erfinv(x)
x = x.to("mps")
y_mps = torch.erfinv(x)
y_mps = y_mps.to("cpu")
mask = torch.isfinite(y_cpu) & torch.isfinite(y_mps.to("cpu"))
y_mps = y_mps[mask]
y_cpu = y_cpu[mask]
x = x[mask]
print(f"length of y_mps: {len(y_mps)}, length of y_cpu: {len(y_cpu)}, length of x: {len(x)}")
mse = torch.square(y_cpu - y_mps).mean()
print("MSE between MPS and CPU torch.erfinv: ", mse)
diff = torch.abs(y_cpu - y_mps)
print("Largest difference")
print(f"x: {x[torch.argmax(diff)]}, y_cpu: {y_cpu[torch.argmax(diff)]}, y_mps: {y_mps[torch.argmax(diff)]} , diff = {y_cpu[torch.argmax(diff)] - y_mps[torch.argmax(diff)]}")
```

Output:

```
CPU torch.erfinv time:  2.654937833400254
MPS torch.erfinv time:  0.009831255332002912
MPS torch.erfinv is 27005.07456822776 percent faster than CPU torch.erfinv
length of y_mps: 199999992, length of y_cpu: 199999992, length of x: 199999992
MSE between MPS and CPU torch.erfinv:  tensor(4.2339e-14)
Largest difference
x: -0.9999980330467224, y_cpu: -3.363569736480713, y_mps: -3.3635685443878174 , diff = -1.1920928955078125e-06
```

Fixes #86808

Pull Request resolved: #101507
Approved by: https://github.com/kulinseth