[aotinductor] add versions for the sdpa shim api #113487
Conversation
In our first implementation of the sdpa shim API, we didn't consider the case where the optional scale argument could be None. This went unnoticed because we always got a default scale for the CUDA backend; the issue was detected with the CPU backend. This PR implements versioning for shim kernels. Currently, only the sdpa API has more than one version, and we expect to maintain different versions for only a very small number of ABI-compatible shim APIs.
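To make the versioning concrete, here is a minimal sketch of the two C-ABI entry points. The parameter lists are abbreviated, the trailing output-handle arguments are elided, and the `return_debug_mask` parameter plus the stand-in typedefs are assumptions for illustration, not the exact shim header. The key difference: v1 takes `scale` by value and so cannot express a missing scale, while v2 takes it as a nullable pointer.

```cpp
#include <cstdint>

// Simplified stand-ins for the real shim types; for illustration only.
using AOTITorchError = int32_t;
using AtenTensorHandle = void*;

extern "C" {

// v1: `scale` is passed by value, so an absent (None) scale cannot be represented.
AOTITorchError aoti_torch__scaled_dot_product_flash_attention(
    AtenTensorHandle query,
    AtenTensorHandle key,
    AtenTensorHandle value,
    double dropout_p,
    bool is_causal,
    bool return_debug_mask,
    double scale,
    AtenTensorHandle* ret0 /* ...remaining output handles elided... */);

// v2: `scale` is a nullable pointer; nullptr stands for a None (c10::nullopt) scale.
AOTITorchError aoti_torch__scaled_dot_product_flash_attention_v2(
    AtenTensorHandle query,
    AtenTensorHandle key,
    AtenTensorHandle value,
    double dropout_p,
    int is_causal,
    int return_debug_mask,
    double* scale,
    AtenTensorHandle* ret0 /* ...remaining output handles elided... */);

}  // extern "C"
```

The generated wrapper code can then keep emitting v1 calls when a concrete scale value is available and fall back to v2 only when the optional scale is None.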
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/113487. ✅ No failures as of commit a3b5d67 with merge base b910d9e. (Generated by Dr. CI.)
@@ -252,14 +252,14 @@ AOTITorchError aoti_torch_create_tensor_from_blob(
   });
 }

-AOTITorchError aoti_torch__scaled_dot_product_flash_attention(
+static AOTITorchError _aoti_torch__scaled_dot_product_flash_attention_internal(
I don't think we need a new `_internal` function -- we could just have `aoti_torch__scaled_dot_product_flash_attention` call `aoti_torch__scaled_dot_product_flash_attention_v2` while passing the scale param as `&scale`. The v2 function would then call `at::_scaled_dot_product_flash_attention` directly.
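Given declarations along the lines sketched earlier, the suggestion would look roughly like this (a sketch only; the trailing output handles are elided):

```cpp
// Sketch of the suggested forwarding: v1 keeps its old ABI and simply calls v2,
// passing &scale, so only the v2 implementation needs to call
// at::_scaled_dot_product_flash_attention.
extern "C" AOTITorchError aoti_torch__scaled_dot_product_flash_attention(
    AtenTensorHandle query,
    AtenTensorHandle key,
    AtenTensorHandle value,
    double dropout_p,
    bool is_causal,
    bool return_debug_mask,
    double scale,
    AtenTensorHandle* ret0 /* ...remaining output handles elided... */) {
  // In the v1 ABI a scale value is always present, so its address is a valid
  // "present" optional for v2.
  return aoti_torch__scaled_dot_product_flash_attention_v2(
      query, key, value, dropout_p, is_causal, return_debug_mask, &scale, ret0);
}
```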
# For sdpa, we need the v2 version only if any optional
# kwarg is missing.
I thought the idea was to make all new models call the v2 version, so that we could delete the old one entirely? Doesn't using the v2 version only if an optional arg is missing make migration harder?
> I thought the idea was to make all new models call the v2 version, so that we could delete the old one entirely? Doesn't using the v2 version only if an optional arg is missing make migration harder?
It's not about new models vs. old models. Given the same model, the optional scale can either be None or have a default value, depending on the backend (e.g. CPU vs. CUDA in our case). Regarding migrating to the v2 version, my understanding is that once the new v2 version becomes available on the serving side, we can always generate the v2 version; at that point, we can safely remove the old one.
Ohh right. We want to keep generating calls to the old (v1) version where possible for now so the CUDA models keep working with old libtorch deployments. Got it
torch/_inductor/ir.py (outdated)
@@ -4278,6 +4302,7 @@ def __init__(
         self.kernel = (
             f"{kernel.__module__.replace('._ops.', '.ops.')}.{kernel.__name__}"
         )
+        self.abi_compatible_kernel = None
Not sure why the typechecker didn't complain, but shouldn't this line be in the constructor of `ExternKernelAlloc`, since `ExternKernelAlloc.codegen` references it?
Good catch! Fixed. Thanks.
Left a comment about the function signature. The rest LGTM.
    AtenTensorHandle key,
    AtenTensorHandle value,
    double dropout_p,
    bool is_causal,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't paid close enough attention to the v1 function signature, but let's use int instead of bool in the v2 function signature to avoid any potential problems from mixing C++ compilers.
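A minimal sketch of what that boundary looks like (types as in the earlier sketch; the trailing output handles are elided and the return code is assumed to follow the usual zero-for-success convention):

```cpp
// Sketch: take flags as plain int in the extern "C" v2 signature and convert
// to bool inside the shim, so the exported ABI never depends on how any
// particular C++ compiler represents bool.
extern "C" AOTITorchError aoti_torch__scaled_dot_product_flash_attention_v2(
    AtenTensorHandle query,
    AtenTensorHandle key,
    AtenTensorHandle value,
    double dropout_p,
    int is_causal,          // 0 or 1 across the ABI boundary
    int return_debug_mask,  // 0 or 1 across the ABI boundary
    double* scale,
    AtenTensorHandle* ret0 /* ...remaining output handles elided... */) {
  const bool is_causal_arg = is_causal != 0;
  const bool return_debug_mask_arg = return_debug_mask != 0;
  // ... convert the tensor handles, build a c10::optional<double> from `scale`
  //     (nullptr -> c10::nullopt), call at::_scaled_dot_product_flash_attention
  //     with the converted bools, and write the results into ret0 ...
  (void)is_causal_arg;
  (void)return_debug_mask_arg;
  return 0;  // assumed success code
}
```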
Good point. Fixed. Thanks.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Pull Request resolved: pytorch#113487. Approved by: https://github.com/int3, https://github.com/desertfire
… support) (#114974): This is a backout of #113747, which reverted the above two commits. Now that #113997 has landed, this diff can be landed safely without breaking ABI compatibility. Pull Request resolved: #114974. Approved by: https://github.com/chenyang78
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @kadeng @muchulee8 @aakhundov @ColinPeppler