-
Notifications
You must be signed in to change notification settings - Fork 371
[Microbenchmarks] Add benchmark for conditional scalar assignment autovec #295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Microbenchmarks] Add benchmark for conditional scalar assignment autovec #295
Conversation
…ovec Benchmarks with vs. without autovec for a loop containing conditional scalar assignment (plus a little extra arithmetic as a 'work payload').
|
Microbenchmark for FindLast/CSA autovec, as requested on llvm/llvm-project#158088 With just the conditional assignment in the loop, there was no noticeable performance difference. However, when I added a small arithmetic payload I saw a noticeable difference, especially for uint8t. |
MacDue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally seems reasonable to me (bar a few nits), but I've not added a benchmark before, so wait and see if there's any more comments.
MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,118 @@ | |||
| #include <iostream> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to comment about the license header, but it seems that's not done here (looking at other files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wondered about that too.
MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp
Outdated
Show resolved
Hide resolved
| // for 'A' in init_data below. | ||
| T Result = 101; | ||
| for (unsigned i = 0; i < ITERATIONS; i++) { | ||
| // Do some work to make the difference noticeable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a few more variations, like the minimal case with just a CAS and multiple independent CAS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| } | ||
| } | ||
|
|
||
| // Add add auto-vectorized and disabled vectorization benchmarks for math |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment needs updating, currently passes only ty and Threshold, but it might be helpful to also pass a function if it helps to reduce the duplication for additional patterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| #endif | ||
|
|
||
| for (auto _ : state) { | ||
| VecFn(&A[0], &B[0], &C[0], Threshold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is working as expected. I think we need something like below to make sure the CAS result is used:
- NoVecFn(&A[0], &B[0], &C[0], Threshold);
+ auto Res = NoVecFn(&A[0], &B[0], &C[0], Threshold);
+ benchmark::DoNotOptimize(Res);
Without a use of the result, compiler is probably able completely remove the variantst that don't have stores in the loop and also remove the unused CAS chain after inlining?
Benchmarks with vs. without autovec for a loop containing conditional
scalar assignment (plus a little extra arithmetic as a 'work payload').