[MicroBenchmark,LoopInterleaving] Re-land - Check performance impact of Loop Interleaving Count with varying loop iterations #56

Merged

nilanjana87

This PR re-lands the microbenchmark that tries to find the right loop trip count threshold for deciding whether to interleave a loop, across different types of loops: loops with or without a reduction, and loops with or without vectorization. For context, the loop vectorizer currently uses the threshold TinyTripCountInterleaveThreshold, which is set to 128.

The timeout issue from the previous PR #26 is solved in this PR with a preprocessor configuration flag, ALL_LOOP_IC_TESTS. The flag must be passed at compile time to run the whole set of tests; by default, a reduced test set runs.
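
As a rough illustration of how a compile-time flag like this can gate the test set, here is a minimal sketch. Only the macro name ALL_LOOP_IC_TESTS comes from this PR; the helper name `selectedTripCounts` and every trip-count value below are hypothetical placeholders, not the benchmark's actual lists.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the trip counts a benchmark run would cover.
// The values are placeholders chosen for illustration only.
inline std::vector<std::size_t> selectedTripCounts() {
#ifdef ALL_LOOP_IC_TESTS
  // Full sweep: only compiled in when -DALL_LOOP_IC_TESTS is passed.
  return {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024};
#else
  // Reduced default set, so that an ordinary run stays short.
  return {1, 8, 64, 512};
#endif
}
```

With this shape, building with `-DALL_LOOP_IC_TESTS` selects the full list, and a plain build falls back to the reduced one.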

nilanjana87 and others added 12 commits September 27, 2023 10:38
…nterleaving Count with varying loop iterations.

This microbenchmark attempts to find the right loop trip count threshold for deciding whether to interleave a loop, for different types of loops: loops with or without a reduction, and loops with or without vectorization.
Note: An interleaving count of 1 means interleaving is disabled.
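
For readers unfamiliar with how an interleaving count is requested per loop, the sketch below uses Clang's `#pragma clang loop` hints on a simple reduction. The function names, the element type, and the width of 4 are assumptions made for illustration; the benchmark's own kernels are not reproduced here. The pragmas are hints, so the compiler may still choose differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum reduction with interleaving disabled: interleave_count(1).
inline std::uint64_t sumNoInterleave(const std::vector<std::uint32_t> &A) {
  std::uint64_t Sum = 0;
#if defined(__clang__)
#pragma clang loop vectorize_width(4) interleave_count(1)
#endif
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}

// The same loop, asking for four interleaved copies of the vector body.
inline std::uint64_t sumInterleave4(const std::vector<std::uint32_t> &A) {
  std::uint64_t Sum = 0;
#if defined(__clang__)
#pragma clang loop vectorize_width(4) interleave_count(4)
#endif
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}
```

Both functions compute the same result; only the requested code shape differs, which is what a timing comparison across trip counts would measure.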

Differential Revision: https://reviews.llvm.org/D159475
…ed a redundant function, added test cases where the compiler selects the vectorization configuration
…d vectorization hints with preprocessor-macro-based template functions, as per reviewer comments

@antmox antmox left a comment

LGTM. This one should be fine IMO.
~250 secs instead of 2500 secs for the 1st version.

@nilanjana87 nilanjana87 merged commit 5482cc3 into llvm:main Dec 8, 2023
nilanjana87 added a commit to nilanjana87/llvm-project that referenced this pull request Dec 12, 2023
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial in two cases:
1) when TC > 2 * VW * IC, so that the interleaved vectorized portion of the loop runs at least twice
2) when TC is an exact multiple of VW * IC, so that there is no epilogue loop to run
where TC = trip count, VW = vectorization width, IC = interleaving count

We change the interleave count computation based on this information, but we leave it unchanged when the flag InterleaveSmallLoopScalarReductionTrue is set to true, since that flag handles a special case (https://reviews.llvm.org/D81416).
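
The two cases listed in this commit message can be written as a small predicate. This is a sketch of the stated conditions only, not the code in the patch; the function name `interleavingLooksBeneficial` is hypothetical, and treating a trip count of zero as "not beneficial" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// TC = trip count, VW = vectorization width, IC = interleaving count.
inline bool interleavingLooksBeneficial(std::uint64_t TC, std::uint64_t VW,
                                        std::uint64_t IC) {
  assert(VW > 0 && IC > 0 && "width and interleave count must be positive");
  // Scalar iterations consumed by one pass of the interleaved vector body.
  const std::uint64_t Step = VW * IC;
  // Case 1: TC > 2 * VW * IC.
  const bool RunsAtLeastTwice = TC > 2 * Step;
  // Case 2: TC is an exact multiple of VW * IC, so no epilogue is needed.
  const bool NoEpilogue = TC != 0 && TC % Step == 0;
  return RunsAtLeastTwice || NoEpilogue;
}
```

For example, with VW = 4 and IC = 2, a trip count of 64 satisfies case 1, a trip count of 16 satisfies only case 2, and a trip count of 12 satisfies neither.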
nilanjana87 added a commit to nilanjana87/llvm-project that referenced this pull request Jan 3, 2024
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the post-vectorization remainder tail of the loop is minimal in cases where the vector loop gets to run only a few times. This patch attempts to compute the interleaving count (IC) from the trip count so as to minimize the remainder tail while maximizing the IC.
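
To make the "remainder tail" concrete, the sketch below splits a trip count into iterations of the interleaved vector loop and leftover scalar iterations. The struct and function names are hypothetical, and VF here stands for the vectorization width; this is arithmetic for illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

struct LoopSplit {
  std::uint64_t VectorIterations; // passes through the interleaved vector loop
  std::uint64_t TailIterations;   // scalar iterations left for the remainder
};

// Splits trip count TC for a given width VF and interleaving count IC.
inline LoopSplit splitTripCount(std::uint64_t TC, std::uint64_t VF,
                                std::uint64_t IC) {
  assert(VF > 0 && IC > 0 && "width and interleave count must be positive");
  const std::uint64_t Step = VF * IC; // scalar iterations per vector pass
  return {TC / Step, TC % Step};
}
```

With TC = 24 and VF = 4, an IC of 4 gives one vector pass and a tail of 8, while an IC of 2 gives three vector passes and no tail, which is the kind of trade-off the commit message describes.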
nilanjana87 added a commit to llvm/llvm-project that referenced this pull request Jan 4, 2024
[LV] Change loops' interleave count computation

A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose the interleaving count (IC) between TC/VF and TC/(2*VF) (VF = vectorization factor) such that the remainder TC for the epilogue loop is minimal; if the remainder TC is the same for both, we choose the larger IC.

The initial tests for this change were submitted in PRs:
#70272 and #74689.
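
The selection rule described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the names `floorPow2` and `chooseInterleaveCount` are hypothetical, the `MaxIC` cap stands in for whatever limit the target imposes, and rounding each candidate down to a power of two is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two that is <= X, for X >= 1.
inline unsigned floorPow2(unsigned X) {
  assert(X >= 1);
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Picks an interleaving count between TC/(2*VF) and TC/VF, preferring the
// smaller epilogue remainder and, on a tie, the larger count.
inline unsigned chooseInterleaveCount(unsigned TC, unsigned VF,
                                      unsigned MaxIC) {
  assert(VF > 0 && MaxIC > 0);
  // Upper candidate: the vector loop may run only once.
  const unsigned UB = floorPow2(std::max(1u, std::min(TC / VF, MaxIC)));
  // Lower candidate: the vector loop runs at least twice.
  const unsigned LB = floorPow2(std::max(1u, std::min(TC / (2 * VF), MaxIC)));
  const unsigned TailUB = TC % (VF * UB);
  const unsigned TailLB = TC % (VF * LB);
  // Equal (or smaller) tail for the upper candidate: take the larger IC.
  return TailUB <= TailLB ? UB : LB;
}
```

For instance, with VF = 4 and a cap of 8, a trip count of 32 yields an IC of 8 (both candidates leave no tail), while a trip count of 24 yields an IC of 2, since an IC of 4 would leave a tail of 8.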
nilanjana87 added a commit to apple/llvm-project that referenced this pull request Mar 5, 2024