[nativert] Downcast triton double arguments to floats #166620
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166620
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 578ce43 with merge base 3206677. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
LGTM!
3c5e332 to 91a84b5 (Compare)
91a84b5 to 216f9b0 (Compare)
216f9b0 to 8befb05 (Compare)
8befb05 to 578ce43 (Compare)
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR! Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This diff fixes a limitation in the Sigmoid + Triton interaction where float arguments are not passed correctly: NativeRT passes float arguments as doubles, while Triton kernels read them as floats, resulting in wrong values.

## Limitations in (de)serialization

In Triton, float arguments to a kernel are encoded as "fp32" ([code](https://github.com/triton-lang/triton-cpu/blob/main-merged/python/triton/runtime/jit.py#L310-L326)):

```
elif isinstance(arg, float):
    return ("fp32", None)
```

But torch export serde uses double ([code](https://github.com/pytorch/pytorch/blob/d2eff5d454ab2cb0a5ccdfb5eb6e7d6dcc75e097/torch/_export/serde/export_schema.thrift#L149)) because Thrift only has a double type:

```
union Argument {
  10: bool as_none;
  20: TensorArgument as_tensor;
  30: list<TensorArgument> as_tensors;
  50: i64 as_int;
  70: list<i64> as_ints;
  80: double as_float;  ===> actually double
  ...
```

The `TritonKernel` constructor loads attributes from a node, where `Constant` represents the variant type, and it only has `double` ([code](https://github.com/pytorch/pytorch/blob/d2eff5d454ab2cb0a5ccdfb5eb6e7d6dcc75e097/torch/nativert/graph/Graph.h#L86)):

```
using Constant = std::variant<
    None,
    int64_t,
    std::vector<int64_t>,
    double,  ===> triton float is loaded as double
```

So NativeRT passes arguments that were originally floats in Triton to the kernels as doubles, but all of the Triton backends (NVIDIA, AMD, and CPU) read them as floats because the signature still says `fp32`. D84423898 was the previous workaround: wrapping float arguments in tensors.

## The Fix

Fixing the Thrift definition isn't viable because Thrift only supports the double type. It would also be possible to fix this on the Triton side by downcasting from double to float, but that would require changing every backend. Instead, I think this diff is the most effective way: when building `TritonKernel`, downcast float values right after loading the double arguments.

Test Plan:

```
buck test fbcode//mode/opt-amd-gpu fbcode//sigmoid/inference/test:test_passes
buck test fbcode//mode/opt-amd-gpu fbcode//caffe2/test:test_export --
```

Reviewed By: XueningXu

Differential Revision: D85747160

Pull Request resolved: #166620
Approved by: https://github.com/XueningXu
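For intuition about the failure mode, here is a minimal standalone illustration (not the actual backend marshalling code) of what goes wrong when the argument buffer holds an 8-byte double but an `fp32` signature makes the backend read only 4 bytes as a float:

```
// Illustration only: reinterpret the first 4 bytes of a double as a
// float, the way a backend trusting an "fp32" signature entry would.
#include <cstdio>
#include <cstring>

int main() {
  double passed = 0.5;  // what NativeRT actually wrote into the argument buffer
  float seen;           // what a backend expecting fp32 reads back
  std::memcpy(&seen, &passed, sizeof(float));  // reads only 4 of the 8 bytes
  // On a little-endian machine the low 4 bytes of 0.5 are all zero,
  // so the kernel would see 0.0 instead of 0.5.
  std::printf("passed %f, kernel saw %f\n", passed, seen);
  return 0;
}
```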
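And a rough sketch of the shape of the fix, with hypothetical type and function names standing in for the real `TritonKernel` constructor code: after pulling a `double` out of the `Constant` variant, the value is stored narrowed to `float`, so the bytes later handed to the launcher match the `fp32` signature.

```
// Hypothetical sketch, not the actual nativert implementation.
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

struct None {};
// Mirrors the shape of nativert's Constant: deserialized floats arrive as double.
using Constant = std::variant<None, int64_t, std::vector<int64_t>, double>;

// Assumed post-load representation of a scalar kernel argument.
using ScalarArg = std::variant<int64_t, float>;

ScalarArg loadScalarAttribute(const Constant& c) {
  if (const auto* d = std::get_if<double>(&c)) {
    // The Triton signature declares this argument as fp32, so downcast
    // right after loading instead of handing 8 bytes to the launcher.
    return static_cast<float>(*d);
  }
  if (const auto* i = std::get_if<int64_t>(&c)) {
    return *i;
  }
  throw std::runtime_error("unsupported scalar attribute type");
}
```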