[MicroBenchmark,LoopInterleaving] Re-land - Check performance impact of Loop Interleaving Count with varying loop iterations #56

nilanjana87 · 2023-11-28T05:00:19Z

This PR re-commits the microbenchmark that attempts to find the right loop trip-count threshold for deciding whether to interleave a loop, across different types of loops: loops with or without a reduction, and loops with or without vectorization. For context, the loop vectorizer currently uses a threshold, TinyTripCountInterleaveThreshold, which is set to 128.
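To make the loop categories concrete, here is a minimal sketch of one loop with a reduction and one without, each with its interleave count (IC) pinned per instantiation through Clang's `#pragma clang loop` hints. The function names and the use of a template parameter for IC are assumptions for illustration, not the benchmark's actual source. An IC of 1 means interleaving is disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel shapes; not the benchmark's actual source.
// IC is a compile-time template parameter, so the same loop body can be
// instantiated for IC = 1 (no interleaving), 2, 4, ...
// Clang honors the `#pragma clang loop` hints; other compilers ignore them
// (possibly with an unknown-pragma warning).

// Loop with a reduction: the accumulator carries a dependence across
// iterations, which interleaving can split into independent partial sums.
template <int IC>
int64_t sumWithReduction(const int32_t *a, size_t n) {
  assert(a != nullptr || n == 0);
  int64_t sum = 0;
#pragma clang loop vectorize(enable) interleave_count(IC)
  for (size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

// Loop without a reduction: each iteration writes an independent element.
template <int IC>
void addWithoutReduction(int32_t *out, const int32_t *a, const int32_t *b,
                         size_t n) {
  assert((out != nullptr && a != nullptr && b != nullptr) || n == 0);
#pragma clang loop vectorize(enable) interleave_count(IC)
  for (size_t i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
}
```

Making IC a template parameter keeps one copy of each loop body while still giving the pragma the compile-time constant it requires.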

This PR fixes the timeout issue from the previous PR #26 with a preprocessor configuration flag, ALL_LOOP_IC_TESTS. By default the benchmark runs a reduced test set; the flag must be passed at compile time to run the whole set of tests.
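A minimal sketch of how such a compile-time switch can select between a full and a reduced sweep. Only the macro name ALL_LOOP_IC_TESTS comes from this PR; the function name and the trip-count values are hypothetical placeholders chosen so the range covers the threshold of 128.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trip-count lists; the real benchmark's values are not given
// in this PR description. Build with -DALL_LOOP_IC_TESTS for the full sweep.
std::vector<size_t> tripCountsToTest() {
  std::vector<size_t> counts;
#ifdef ALL_LOOP_IC_TESTS
  // Full set: every trip count from 1 to 256 (illustrative range).
  for (size_t tc = 1; tc <= 256; ++tc)
    counts.push_back(tc);
#else
  // Reduced default set: powers of two only, to keep run time short.
  for (size_t tc = 1; tc <= 256; tc *= 2)
    counts.push_back(tc);
#endif
  assert(!counts.empty());
  return counts;
}
```

Because the choice is made by the preprocessor, a default build never compiles the long sweep, which is what keeps the default run time down.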

…nterleaving Count with varying loop iterations. This microbenchmark attempts to find the right loop trip-count threshold for deciding whether to interleave a loop, across different types of loops: loops with or without a reduction, and loops with or without vectorization. Note: an interleaving count of 1 means interleaving is disabled. Differential Revision: https://reviews.llvm.org/D159475

antmox

LGTM. This one should be fine IMO.
~250 secs instead of 2500 secs for the 1st version.

A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial in two cases: (1) when TC > 2 * VW * IC, so that the interleaved vectorized portion of the loop runs at least twice, and (2) when TC is an exact multiple of VW * IC, so that there is no epilogue loop to run. Here TC = trip count, VW = vectorization width, and IC = interleaving count. We change the interleave count computation based on this information, but leave it unchanged when the flag InterleaveSmallLoopScalarReduction is set to true, since it handles a special case (https://reviews.llvm.org/D81416).
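The two conditions can be written out as simple predicates. This is a hypothetical illustration of the arithmetic in the commit message, not code from LoopVectorize; the helper names are assumptions, and case 1 uses the strict `>` exactly as stated.

```cpp
#include <cassert>
#include <cstdint>

// TC = trip count, VW = vectorization width, IC = interleaving count.

// Case 1 (as stated: TC > 2 * VW * IC): the interleaved vector loop body,
// which covers VW * IC elements per iteration, runs at least twice.
bool vectorLoopRunsTwice(uint64_t tc, uint64_t vw, uint64_t ic) {
  assert(vw > 0 && ic > 0);
  return tc > 2 * vw * ic;
}

// Case 2: TC is a nonzero exact multiple of VW * IC, so no epilogue loop runs.
bool noEpilogueNeeded(uint64_t tc, uint64_t vw, uint64_t ic) {
  assert(vw > 0 && ic > 0);
  return tc != 0 && tc % (vw * ic) == 0;
}

// Hypothetical helper: interleaving is expected to pay off if either case holds.
bool interleavingLooksBeneficial(uint64_t tc, uint64_t vw, uint64_t ic) {
  return vectorLoopRunsTwice(tc, vw, ic) || noEpilogueNeeded(tc, vw, ic);
}
```

For example, with VW = 4 and IC = 2 (8 elements per vector-loop iteration), TC = 12 satisfies neither case, TC = 16 satisfies only the second, and TC = 17 satisfies the first.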

A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the post-vectorization remainder tail of the loop is minimal in cases where the vector loop runs only a few times. This patch attempts to compute the interleaving count (IC) from the trip count so as to minimize the remainder tail while maximizing the IC.
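The quantity being traded off is how a known trip count splits between the interleaved vector loop and the scalar remainder tail. A minimal sketch, assuming a compile-time-known trip count and ignoring epilogue vectorization; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// How a trip count splits between the interleaved vector loop and the
// scalar remainder tail for a given vectorization factor (VF) and
// interleaving count (IC).
struct TripCountSplit {
  uint64_t vectorIterations; // iterations of the interleaved vector loop
  uint64_t remainderTail;    // scalar iterations left for the epilogue
};

TripCountSplit splitTripCount(uint64_t tc, uint64_t vf, uint64_t ic) {
  assert(vf > 0 && ic > 0);
  const uint64_t step = vf * ic; // elements handled per vector-loop iteration
  return {tc / step, tc % step};
}
```

With TC = 40 and VF = 4, IC = 8 gives 1 vector iteration and a tail of 8, IC = 4 gives 2 iterations and a tail of 8, and IC = 2 gives 5 iterations and no tail.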

[LV] Change loops' interleave count computation

A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the trip count of the epilogue loop is minimal. Therefore, we choose the interleaving count (IC) between TC/VF and TC/(2*VF), where TC = trip count and VF = vectorization factor, such that the remainder TC for the epilogue loop is minimized; when both choices leave the same remainder TC, the larger IC is chosen. The initial tests for this change were submitted in PRs #70272 and #74689.
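A minimal sketch of the selection rule in this commit message, under stated assumptions: both candidates are rounded down to powers of two and capped by a hypothetical target limit `maxIC`. This is an illustration of the described rule, not the actual LoopVectorize implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest power of two <= x, with a floor of 1 (assumption: interleave
// counts are restricted to powers of two).
uint64_t floorPowerOfTwo(uint64_t x) {
  uint64_t p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Picks IC between TC/(2*VF) and TC/VF as described in the commit message.
// maxIC is a hypothetical target-imposed upper bound.
uint64_t chooseInterleaveCount(uint64_t tc, uint64_t vf, uint64_t maxIC) {
  assert(vf > 0 && maxIC > 0);
  // Upper candidate: the vector loop runs at least once (when TC >= VF).
  const uint64_t icUpper = floorPowerOfTwo(std::min(tc / vf, maxIC));
  // Lower candidate: the vector loop runs at least twice (when TC >= 2*VF).
  const uint64_t icLower = floorPowerOfTwo(std::min(tc / (2 * vf), maxIC));
  const uint64_t tailUpper = tc % (vf * icUpper);
  const uint64_t tailLower = tc % (vf * icLower);
  // Minimize the epilogue remainder; on a tie, prefer the larger IC.
  return tailUpper <= tailLower ? icUpper : icLower;
}
```

With VF = 4 and maxIC = 8, TC = 48 yields IC = 4 (no tail) rather than IC = 8 (tail of 16), while TC = 40 yields IC = 8 because both candidates leave a tail of 8. Since both candidates are powers of two, the lower one divides the upper one, so the upper candidate's tail is never smaller; the comparison therefore picks the larger IC only on a tie.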

nilanjana87 and others added 12 commits September 27, 2023 10:38

As per reviewer comments, added comments explaining test cases, removed a redundant function, added test cases where the compiler selects the vectorization configuration

3a22381

Merge branch 'llvm:main' into loop_interleaving_microbenchmark

418ec55

Replacing functions with same functionality but different pragma-based vectorization hints with preprocessor-macro-based template functions, as per reviewer comments

66ec798

Ran clang-format

cb3a077

Renaming functions to be more meaningful

379381b

Added benchmarks for testing impact of loop interleaving for loops with bigger bodies

eb33e98

Fixed some of the code comments & ran clang-format

9f5bda4

Made a separate executable for Loop Interleaving microbenchmark

107bc87

Force disable loop unrolling for the auto-vectorization cases

802ca40

Ran clang-format

398bc8c

Added full set of tests for a preprocessor option, while it runs a reduced test set by default

27a4648

nilanjana87 requested review from LebedevRI, fhahn, Meinersbur and antmox November 28, 2023 05:00

This was referenced Nov 28, 2023

[MicroBenchmark,LoopInterleaving] Check performance impact of Loop Interleaving Count with varying loop iterations #26

Merged

[LV] Change loops' interleave count computation llvm/llvm-project#73766

Merged

antmox approved these changes Dec 5, 2023

nilanjana87 merged commit 5482cc3 into llvm:main Dec 8, 2023