Conversation

@Nicoshev (Contributor) commented on Oct 27, 2025

Summary:
We are adding autovec (auto-vectorized) routines for converting to and from boolean values.

We observed the following performance improvements when compiling for armv9-a+sve2+fp16+bf16:

Before:

bool->uint8->bool ===> 447.854us
bool->int8->bool ===> 445.609us
bool->int16->bool ===> 312.425us
bool->int32->bool ===> 324.368us
bool->float->bool ===> 320.929us
bool->float16->bool ===> 290.825us
bool->bfloat16->bool ===> 437.250us

After:

bool->uint8->bool ===> 78.988us (467% higher throughput)
bool->int8->bool ===> 78.494us (468% higher throughput)
bool->int16->bool ===> 107.993us (189% higher throughput)
bool->int32->bool ===> 186.887us (74% higher throughput)
bool->float->bool ===> 188.048us (71% higher throughput)
bool->float16->bool ===> 102.789us (183% higher throughput)
bool->bfloat16->bool ===> 105.809us (313% higher throughput)
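
The diff itself is not reproduced in this conversation; as an illustration of the kind of loop the compiler auto-vectorizes here, a minimal sketch of branch-free bool conversions might look like the following (the function names and loop shapes are assumptions, not the actual ATen code):

```cpp
// Illustrative sketch only: convert_to_bool / convert_from_bool are
// hypothetical names, not the ATen vec routines added by this PR.
// The point is the branch-free element-wise form that compilers can
// auto-vectorize (e.g. with SVE2 compare and convert instructions).
#include <cstddef>

// T -> bool: any nonzero source value becomes true.
template <typename T>
void convert_to_bool(const T* __restrict src, bool* __restrict dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = src[i] != static_cast<T>(0);
  }
}

// bool -> T: true becomes 1, false becomes 0.
template <typename T>
void convert_from_bool(const bool* __restrict src, T* __restrict dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = static_cast<T>(src[i]);
  }
}
```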

Test Plan:
Correctness:

buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch

Performance:

buck2 run mode/opt //caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test
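
The benchmark target above is internal to Meta; as a rough standalone analogue of the round-trip measurement, the bool->uint8->bool pass could be timed with something like the sketch below (buffer size and iteration count are assumptions, and it uses plain scalar loops rather than the ATen vec routines):

```cpp
// Rough standalone timing sketch; not the operator_benchmark harness.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  constexpr std::size_t n = 1 << 20;        // element count (assumption)
  constexpr int iters = 100;                // averaging passes (assumption)
  std::vector<std::uint8_t> src(n, 1);      // stands in for a bool tensor
  std::vector<std::uint8_t> mid(n), out(n);

  auto t0 = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it) {
    for (std::size_t i = 0; i < n; ++i) mid[i] = src[i] != 0;  // bool -> uint8
    for (std::size_t i = 0; i < n; ++i) out[i] = mid[i] != 0;  // uint8 -> bool
  }
  auto t1 = std::chrono::steady_clock::now();

  double us = std::chrono::duration<double, std::micro>(t1 - t0).count() / iters;
  std::printf("bool->uint8->bool round trip: %.3f us per pass (out[0]=%u)\n",
              us, static_cast<unsigned>(out[0]));  // print out[0] so the loops are not elided
  return 0;
}
```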

Reviewed By: mcfi

Differential Revision: D85533284

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168

@pytorch-bot pytorch-bot bot added the module: cpu label on Oct 27, 2025
@pytorch-bot bot commented on Oct 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166330

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bb6f1bf with merge base 2dc5645:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync bot commented on Oct 27, 2025

@Nicoshev has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85533284.

@Nicoshev Nicoshev added the ciflow/trunk and ciflow/linux-aarch64 labels on Oct 27, 2025
@Nicoshev (Contributor, Author) commented on Oct 27, 2025
@pytorchbot label "topic: not user facing" "release notes: cpu (aarch64)"

@pytorch-bot pytorch-bot bot added the release notes: cpu (aarch64) and topic: not user facing labels on Oct 27, 2025
@Nicoshev Nicoshev requested a review from mcfi October 27, 2025 19:36
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Oct 27, 2025
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Oct 27, 2025
@Nicoshev Nicoshev force-pushed the export-D85533284 branch 2 times, most recently from b5b988c to 1686b10, on October 27, 2025 20:37
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Oct 27, 2025
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Oct 28, 2025
pytorch-bot bot pushed a commit that referenced this pull request Oct 28, 2025
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Oct 28, 2025
Nicoshev added a commit to Nicoshev/pytorch that referenced this pull request Oct 28, 2025
@facebook-github-bot (Contributor) commented:
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.
