[MPS] Add PSO caching for advanced indexing kernels #99855

DenisVieriu97 · 2023-04-24T02:47:50Z

Use bindless Argument Buffers (unbounded arrays) for advanced indexing kernels - this allows caching of the PSOs since we don't have to query anymore the main metal function for the AB size (this is filled directly now on the CPU).

* Add pso caching for advanced indexing kernels * Fix build on macos 12 * Fix macOS Monterey build * Fix build failure * Use MPSAllocator buffer pool to allocate memory for indexing kernels * Revert changes

pytorch-bot · 2023-04-24T02:47:54Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99855

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

ROCm jobs are failing with No space left on device

✅ No Failures

As of commit eeac863:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kulinseth · 2023-04-24T15:39:47Z

@pytorchbot merge -f "All tests are green"

pytorchmergebot · 2023-04-24T15:41:43Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

qqaatw · 2023-05-10T06:59:30Z

aten/src/ATen/mps/MPSDevice.h

@@ -66,7 +66,8 @@ class TORCH_API MPSDevice {
   */
  bool isMacOS13Plus(MacOSVersion version) const;

-  MTLFunction_t metalIndexingFunction(const std::string &kernel, MTLFunctionConstantValues_t constantValues);
+  MTLComputePipelineState_t metalIndexingFunction(const std::string &kernel);


Hi @DenisVieriu97, do you think it would be better to rename it as metalIndexingPSO? If so, I can propose a PR for it.

@qqaatw this sounds good! Please go ahead with the change

@DenisVieriu97 here is the PR :) #101156

As [`newFunctionWithName:`](https://developer.apple.com/documentation/metal/mtllibrary/1515524-newfunctionwithname) does not accept error argument, do not attempt to print it as it'll be guaranteed `nil` at that point, that results in a classic null pointer dereference, when `TORCH_CHECK` will attempt to construct `std::string` from it. See below backtrace for example: ``` thread #1, queue = 'metal gpu stream', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x000000018a316dc4 libsystem_platform.dylib`_platform_strlen + 4 frame #1: 0x00000001471011bc libtorch_cpu.dylib`std::__1::__constexpr_strlen[abi:v160006](__str=0x0000000000000000) at cstring:114:10 frame #2: 0x0000000147100c24 libtorch_cpu.dylib`std::__1::char_traits<char>::length(__s=0x0000000000000000) at char_traits.h:220:12 * frame #3: 0x0000000147100bf0 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:v160006]<std::__1::char_traits<char>>(__os=0x000000016fdfb3a0, __str=0x0000000000000000) at ostream:901:57 frame #4: 0x0000000147100bb4 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb5d0) at StringUtil.h:55:6 frame #5: 0x00000001471007ac libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10 frame #6: 0x0000000147101444 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10 frame #7: 0x0000000147101404 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10 frame #8: 0x000000014710137c libtorch_cpu.dylib`c10::detail::_str_wrapper<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const* const&>::call(args=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:75:5 frame #9: 0x0000000147101310 libtorch_cpu.dylib`decltype(auto) c10::str<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>(args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at StringUtil.h:111:10 frame #10: 0x0000000147100210 libtorch_cpu.dylib`decltype(auto) c10::detail::torchCheckMsgImpl<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>((null)="Expected indexFunction to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)", args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at Exception.h:453:10 frame #11: 0x00000001470fffe8 libtorch_cpu.dylib`at::mps::MPSDevice::metalIndexingPSO(this=0x0000600000381670, kernel="index_select_32bit_idx32") at MPSDevice.mm:62:3 ``` This was introduced by #99855 that replaced `newFunctionWithName:constantValues:error:` with `newFunctionWithName:` Pull Request resolved: #116938 Approved by: https://github.com/Skylion007

Add PSO caching for advanced indexing kernels (#383)

ce5e38e

* Add pso caching for advanced indexing kernels * Fix build on macos 12 * Fix macOS Monterey build * Fix build failure * Use MPSAllocator buffer pool to allocate memory for indexing kernels * Revert changes

DenisVieriu97 added the ciflow/mps Run MPS tests (subset of trunk) label Apr 24, 2023

DenisVieriu97 requested a review from razarmehr April 24, 2023 02:47

DenisVieriu97 requested a review from kulinseth as a code owner April 24, 2023 02:47

pytorch-bot bot added the release notes: mps Release notes category label Apr 24, 2023

pytorchbot added the open source label Apr 24, 2023

Fix lintrunner

eeac863

kulinseth approved these changes Apr 24, 2023

View reviewed changes

pytorchmergebot added the merging label Apr 24, 2023

pytorchmergebot added the Merged label Apr 24, 2023

pytorchmergebot closed this in dcd686f Apr 24, 2023

qqaatw reviewed May 10, 2023

View reviewed changes

malfet mentioned this pull request Jan 7, 2024

[MPS][BE] Do not crash if Metal function can not be found #116938

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MPS] Add PSO caching for advanced indexing kernels #99855

[MPS] Add PSO caching for advanced indexing kernels #99855

DenisVieriu97 commented Apr 24, 2023

pytorch-bot bot commented Apr 24, 2023 •

edited

kulinseth commented Apr 24, 2023

pytorchmergebot commented Apr 24, 2023

qqaatw May 10, 2023

DenisVieriu97 May 10, 2023

qqaatw May 11, 2023

[MPS] Add PSO caching for advanced indexing kernels #99855

[MPS] Add PSO caching for advanced indexing kernels #99855

Conversation

DenisVieriu97 commented Apr 24, 2023

pytorch-bot bot commented Apr 24, 2023 • edited

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99855

❗ 1 Active SEVs

✅ No Failures

kulinseth commented Apr 24, 2023

pytorchmergebot commented Apr 24, 2023

Merge started

qqaatw May 10, 2023

Choose a reason for hiding this comment

DenisVieriu97 May 10, 2023

Choose a reason for hiding this comment

qqaatw May 11, 2023

Choose a reason for hiding this comment

pytorch-bot bot commented Apr 24, 2023 •

edited