[Microbenchmarks] Add benchmark for conditional scalar assignment autovec #295


Merged

huntergr-arm merged 3 commits into llvm:main from huntergr-arm:conditional-scalar-assignment-microbenchmark

Nov 20, 2025

+223 −0

Contributor

huntergr-arm commented Nov 13, 2025

Benchmarks with vs. without autovec for a loop containing conditional
scalar assignment (plus a little extra arithmetic as a 'work payload').


          [Microbenchmarks] Add benchmark for conditional scalar assignment autovec

489b92e

huntergr-arm requested review from MacDue, fhahn and sdesmalen-arm

November 13, 2025 13:51

Contributor Author

huntergr-arm commented Nov 13, 2025

Microbenchmark for FindLast/CSA autovec, as requested on llvm/llvm-project#158088

With just the conditional assignment in the loop, there was no noticeable performance difference. However, when I added a small arithmetic payload I saw a noticeable difference, especially for uint8_t.
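For readers unfamiliar with the pattern, here is a minimal sketch of a loop that combines a conditional scalar assignment with a small arithmetic payload. It is not the code from this PR: the function name `csaWithPayload`, the choice of payload, and the explicit length parameter `N` are assumptions made for illustration. Only the initial value 101 and the `A`/`B`/`C`/`Threshold` argument names are borrowed from the snippets quoted in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel (not the PR's code): returns the last element of A that
// exceeds Threshold, or the initial value 101 if no element does.
template <typename T>
T csaWithPayload(const T *A, const T *B, T *C, T Threshold, std::size_t N) {
  T Result = 101;
  for (std::size_t I = 0; I < N; ++I) {
    // Arithmetic payload: extra work so that any speedup is measurable.
    C[I] = static_cast<T>(A[I] + B[I]);
    // Conditional scalar assignment: Result is only overwritten on some
    // iterations, so its final value comes from the last matching iteration.
    if (A[I] > Threshold)
      Result = A[I];
  }
  return Result;
}
```

With `A = {1, 9, 3, 7, 2}` and a threshold of 5, the function returns 7, the last element above the threshold.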

MacDue approved these changes


Member

MacDue left a comment

Generally seems reasonable to me (bar a few nits), but I've not added a benchmark before, so wait and see if there are any more comments.


MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp

		@@ -0,0 +1,118 @@
		#include <iostream>

Member

MacDue Nov 13, 2025 (edited)

Was going to comment about the license header, but it seems that's not done here (looking at other files).

Contributor Author

huntergr-arm Nov 13, 2025

Yeah, I wondered about that too.



          Remove unnecessary headers, improve comments

03848a1

fhahn reviewed


MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp

+                // for 'A' in init_data below.
+                T Result = 101;
+                for (unsigned i = 0; i < ITERATIONS; i++) {
+                  // Do some work to make the difference noticeable

Contributor

fhahn Nov 14, 2025

Could you add a few more variations, like the minimal case with just a single CSA, and one with multiple independent CSAs?

Contributor Author

huntergr-arm Nov 14, 2025

done.
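As an illustration of the two requested variants, the sketch below shows a loop containing only a single conditional scalar assignment, and a loop containing two independent ones. The names `singleCsa` and `multiCsa`, the initial values, and the explicit length parameter are hypothetical and are not taken from the merged benchmark.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical minimal variant: the loop body is just one conditional
// scalar assignment, with no stores and no other work.
template <typename T>
T singleCsa(const T *A, T Threshold, std::size_t N) {
  T Result = 101;
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] > Threshold)
      Result = A[I];
  return Result;
}

// Hypothetical multi-CSA variant: two results updated under independent
// conditions, so neither assignment chain depends on the other.
template <typename T>
std::pair<T, T> multiCsa(const T *A, const T *B, T Threshold, std::size_t N) {
  T ResA = 101;
  T ResB = 102;
  for (std::size_t I = 0; I < N; ++I) {
    if (A[I] > Threshold)
      ResA = A[I];
    if (B[I] > Threshold)
      ResB = B[I];
  }
  return {ResA, ResB};
}
```

Neither variant stores to memory inside the loop, so the returned values are the only observable effect of the work.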

MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp Outdated

+                }
+              }
+              // Add add auto-vectorized and disabled vectorization benchmarks for math

Contributor

fhahn Nov 14, 2025

The comment needs updating: currently only ty and Threshold are passed. It might also be helpful to pass a function, if that reduces the duplication for additional patterns.

Contributor Author

huntergr-arm Nov 14, 2025

done.


          Add single-csa-only and multi-csa-only variants, tidy up

bc3492f

huntergr-arm mentioned this pull request

[LV] Vectorize conditional scalar assignments llvm/llvm-project#158088

Open

huntergr-arm merged commit e810d81 into llvm:main

1 check passed

fhahn reviewed


MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp

    
              #endif
                for (auto _ : state) {
                  VecFn(&A[0], &B[0], &C[0], Threshold);

Contributor

fhahn Nov 21, 2025

I am not sure this is working as expected. I think we need something like below to make sure the CSA result is used:

-    NoVecFn(&A[0], &B[0], &C[0], Threshold);
+    auto Res = NoVecFn(&A[0], &B[0], &C[0], Threshold);
+    benchmark::DoNotOptimize(Res);

Without a use of the result, the compiler is probably able to completely remove the variants that don't have stores in the loop, and also to remove the unused CSA chain after inlining?
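The concern can be sketched without Google Benchmark as follows. Here `keepAlive` is a hypothetical stand-in for `benchmark::DoNotOptimize`, assumed only so that the example compiles on its own. It writes the value to a `volatile` object, which the compiler must treat as an observable side effect. `lastAbove` and `timedLoop` are likewise illustrative names, not functions from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for benchmark::DoNotOptimize: a volatile write is an
// observable side effect, so the computation feeding it cannot be discarded.
template <typename T>
inline void keepAlive(const T &Value) {
  volatile T Sink = Value;
  (void)Sink;
}

// Store-free kernel: its return value is its only observable effect.
template <typename T>
T lastAbove(const T *A, T Threshold, std::size_t N) {
  T Result = 101;
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] > Threshold)
      Result = A[I];
  return Result;
}

// Stand-in for the benchmark's timing loop. If the call to keepAlive were
// dropped and the result ignored, the compiler could inline lastAbove and
// delete the whole loop as dead code.
template <typename T>
T timedLoop(const T *A, T Threshold, std::size_t N, unsigned Reps) {
  T Last = 0;
  for (unsigned R = 0; R < Reps; ++R) {
    T Res = lastAbove(A, Threshold, N);
    keepAlive(Res); // forces the result to be materialized every repetition
    Last = Res;
  }
  return Last;
}
```

In a real Google Benchmark harness, the call to `keepAlive(Res)` would be `benchmark::DoNotOptimize(Res)`, as in the suggestion above.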
