[MicroBenchmark,LoopInterleaving] Check performance impact of Loop Interleaving Count with varying loop iterations #26

nilanjana87 · 2023-09-27T18:07:55Z

This microbenchmark attempts to find the right loop trip count threshold for deciding whether to interleave a loop, across different types of loops: loops with or without a reduction, and loops with or without vectorization. For context, the loop vectorizer currently uses a threshold, TinyTripCountInterleaveThreshold, which is set to 128.

…nterleaving Count with varying loop iterations. This microbenchmark attempts to find the right loop trip count threshold for deciding whether to interleave a loop, across different types of loops: loops with or without a reduction, and loops with or without vectorization. Note: An interleaving count of 1 means interleaving is disabled. Differential Revision: https://reviews.llvm.org/D159475
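For readers unfamiliar with how an interleaving count can be pinned per loop, the following is a minimal sketch using Clang's `#pragma clang loop` hints. The function names and the chosen widths are hypothetical and are not taken from LoopInterleaving.cpp; the actual benchmark may structure its loops differently. Compilers other than Clang will typically ignore the pragma.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reduction loop with interleaving disabled
// (an interleave count of 1 means no interleaving).
uint32_t sumNoInterleave(const uint32_t *A, std::size_t N) {
  uint32_t Sum = 0;
#pragma clang loop vectorize_width(4) interleave_count(1)
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Same loop, but asking the vectorizer to interleave 4 vector iterations.
uint32_t sumInterleave4(const uint32_t *A, std::size_t N) {
  uint32_t Sum = 0;
#pragma clang loop vectorize_width(4) interleave_count(4)
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// The hints only affect code generation; both variants must agree on the result.
inline void checkSameResult(const uint32_t *A, std::size_t N) {
  assert(sumNoInterleave(A, N) == sumInterleave4(A, N));
}
```

Timing the two variants over a range of values of N is the kind of comparison the description refers to; the sketch itself only shows how the hints are attached.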

MicroBenchmarks/LoopVectorization/LoopInterleaving.cpp

fhahn · 2023-10-10T00:17:26Z

IIUC this adds benchmarks for the best case for interleaving. Would it be possible to also add variants where interleaving is less beneficial?

…ed a redundant function, added test cases where the compiler selects the vectorization configuration

nilanjana87 · 2023-10-23T03:34:50Z

> IIUC this adds benchmarks for the best case for interleaving. Would it be possible to also add variants where interleaving is less beneficial?

This file adds benchmarks for testing cases that may or may not be beneficial for interleaving. For example, cases with low trip counts are better off without interleaving, whereas if the interleaved loop runs at least twice it starts showing performance benefit. If we need to add more cases where it is less beneficial, can you suggest some?

fhahn · 2023-10-23T09:06:58Z

> > IIUC this adds benchmarks for the best case for interleaving. Would it be possible to also add variants where interleaving is less beneficial?
>
> This file adds benchmarks for testing cases that may or may not be beneficial for interleaving. For example, cases with low trip counts are better off without interleaving, whereas if the interleaved loop runs at least twice it starts showing performance benefit. If we need to add more cases where it is less beneficial, can you suggest some?

Some additional cases could be loops with a reduction, but also a number of additional independent memory and compute operation chains (i.e. larger loop body with multiple independent operation chains) and one with a larger loop body without reduction (in that case, LLVM likely will decide not to interleave).

They might provide additional insight into current issues of the cost modeling for interleaving.

A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#26), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial in two cases:
1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice
2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run
where TC = trip count, VW = vectorization width, IC = interleaving count.
We change the interleave count computation based on this information, but we leave it the same when the flag InterleaveSmallLoopScalarReductionTrue is set to true, since it handles a special case (https://reviews.llvm.org/D81416).
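The two conditions stated above can be written down as simple predicates. This is only an illustrative sketch: the function names are hypothetical, and it is not the code in the loop vectorizer, which weighs other factors as well.

```cpp
#include <cassert>
#include <cstdint>

// Case 1: TC > 2 * VW * IC, as stated in the commit message.
constexpr bool interleavedBodyRunsTwice(uint64_t TC, uint64_t VW, uint64_t IC) {
  return TC > 2 * VW * IC;
}

// Case 2: TC is an exact multiple of VW * IC, so no epilogue loop is needed.
constexpr bool hasNoEpilogue(uint64_t TC, uint64_t VW, uint64_t IC) {
  return TC % (VW * IC) == 0;
}

// Either case is described as beneficial in the message above.
constexpr bool interleavingLooksBeneficial(uint64_t TC, uint64_t VW, uint64_t IC) {
  return interleavedBodyRunsTwice(TC, VW, IC) || hasNoEpilogue(TC, VW, IC);
}

// Example with VW = 4 and IC = 2, i.e. VW * IC = 8.
inline void exampleUsage() {
  assert(interleavingLooksBeneficial(24, 4, 2));  // 24 > 16, and 24 % 8 == 0
  assert(interleavingLooksBeneficial(16, 4, 2));  // exact multiple of 8
  assert(!interleavingLooksBeneficial(12, 4, 2)); // neither case holds
}
```

With VW = 4 and IC = 2, a trip count of 12 satisfies neither condition: the interleaved body would run once and leave 4 iterations for the epilogue.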

…d vectorization hints with preprocessor-macro-based template functions, as per reviewer comments

…th bigger bodies

nilanjana87 · 2023-11-04T02:02:48Z

> > > IIUC this adds benchmarks for the best case for interleaving. Would it be possible to also add variants where interleaving is less beneficial?
>
> > This file adds benchmarks for testing cases that may or may not be beneficial for interleaving. For example, cases with low trip counts are better off without interleaving, whereas if the interleaved loop runs at least twice it starts showing performance benefit. If we need to add more cases where it is less beneficial, can you suggest some?
>
> Some additional cases could be loops with a reduction, but also a number of additional independent memory and compute operation chains (i.e. larger loop body with multiple independent operation chains) and one with a larger loop body without reduction (in that case, LLVM likely will decide not to interleave).
>
> They might provide additional insight into current issues of the cost modeling for interleaving.

I replicated the existing tests for loops with bigger bodies and additional independent memory operations. These can be found in the latest patch by doing a case-insensitive search of the function names for "BigLoop".

fhahn

Thanks for adding the extra loops, a few more comments inline

MicroBenchmarks/LoopVectorization/LoopInterleaving.cpp

MicroBenchmarks/LoopVectorization/CMakeLists.txt

MicroBenchmarks/LoopVectorization/LoopInterleaving.cpp

fhahn

LGTM, thanks!

antmox · 2023-11-17T10:07:51Z

Hello. I think this patch broke several bots:

clang-aarch64-lld-2stage https://lab.llvm.org/buildbot/#/builders/185/builds/5395
clang-armv8-lld-2stage https://lab.llvm.org/buildbot/#/builders/178/builds/6117
clang-aarch64-full-2stage https://lab.llvm.org/buildbot/#/builders/179/builds/8485
clang-aarch64-sve-vla-2stage https://lab.llvm.org/buildbot/#/builders/198/builds/6328
clang-aarch64-sve-vls-2stage https://lab.llvm.org/buildbot/#/builders/176/builds/7027
clang-armv7-lnt https://lab.llvm.org/buildbot/#/builders/174/builds/27416
clang-armv7-global-isel https://lab.llvm.org/buildbot/#/builders/186/builds/13219
clang-aarch64-global-isel https://lab.llvm.org/buildbot/#/builders/183/builds/17403
clang-aarch64-sve-vla https://lab.llvm.org/buildbot/#/builders/197/builds/10921

Could you please look at this?

… Loop Interleaving Count with varying loop iterations (#26)" This reverts commit eda2d6c.

… Loop Interleaving Count with varying loop iterations (#26)" (#54) This reverts commit eda2d6c.

antmox · 2023-11-17T10:33:57Z

Reverted it. Hope that it's OK and that it was indeed the culprit commit. (edit: yes it was)

nilanjana87 · 2023-11-17T20:39:48Z

> Reverted it. Hope that it's OK and that it was indeed the culprit commit. (edit: yes it was)

Thanks for figuring this out & reverting it. Seems like adding too many tests ended up hitting the timeout. I'll try to reduce the test points to keep it within the time limit & re-land it.

The tests passed for me locally though, even with the ninja check command that timed out in the build-bot. So, what is a good test to find this issue locally, other than reducing the test time below 1200 seconds as suggested by the failure log?

On a separate matter, I was checking the bot links you posted above, but I don't see myself or this patch in the Responsible Users or Changes tab, for example in https://lab.llvm.org/buildbot/#/builders/185/builds/5395. Is this because my patch has been removed from this list after it was reverted?

antmox · 2023-11-20T09:54:48Z

Yes, the test does complete successfully here as well. I don't know the rules for adding new tests, but this one seems really long.
2500sec here on a huge aarch64 machine, while the testsuite can run on smaller machines, sometimes heavily loaded.
I don't think there is a 1200sec limit, but jobs are killed after 1200sec without any output.

For the other thing, the responsible users and changes tabs should be read with care.
Changes from the llvm-test-suite and lnt repos are not listed here. Some bots may not even list all changes from the llvm-project here.
I think this is defined in the llvm-zorg builders classes, depends-on-projects attribute.

antmox · 2023-11-20T10:10:29Z

Also, I wonder why only the arm/aarch64 bots failed with this test. Maybe there's something to analyze here.
On what type of machine did you run your test, and what was the execution time?

nilanjana87 · 2023-11-28T05:13:59Z

> 2500sec here on a huge aarch64 machine, while the testsuite can run on smaller machines, sometimes heavily loaded.
> I don't think there is a 1200sec limit, but jobs are killed after 1200sec without any output.

It runs for a little more than 40 minutes on my machine, but locally it doesn't hit the timeout. I added a new PR, #56, which runs a reduced test set by default and runs the whole set when compiled with a flag.

> Also, I wonder why only the arm/aarch64 bots failed with this test.

There's nothing Arm/AArch64 specific in the code, so I don't have a clue why this is so.
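One way to get a reduced test set by default and the whole set behind a compile-time flag is a preprocessor switch, roughly as sketched below. The macro name `FULL_LOOP_INTERLEAVING_BENCH` and the trip-count limits are assumptions made for illustration; PR #56 may use a different flag and different ranges.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flag: define FULL_LOOP_INTERLEAVING_BENCH at compile time
// (e.g. -DFULL_LOOP_INTERLEAVING_BENCH) to sweep the larger range.
#ifdef FULL_LOOP_INTERLEAVING_BENCH
constexpr std::size_t MaxTripCount = 1024; // illustrative value
#else
constexpr std::size_t MaxTripCount = 64;   // illustrative value
#endif

// Returns the power-of-two trip counts 1, 2, 4, ... up to MaxTripCount.
std::vector<std::size_t> tripCountsToBenchmark() {
  std::vector<std::size_t> TCs;
  for (std::size_t TC = 1; TC <= MaxTripCount; TC *= 2)
    TCs.push_back(TC);
  return TCs;
}
```

Keeping the default sweep small bounds the run time on slower or heavily loaded bots, while the full sweep stays available for local investigation.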

…terleaving Count with varying loop iterations (llvm#26) * [MicroBenchmarks,LoopInterleaving] Check performance impact of Loop Interleaving Count with varying loop iterations. This microbenchmark attempts to find the impact of loop interleaving count for different types of loops (big or small, with or without reductions inside them) over different vectorization factors for varying loop trip counts. Note: Interleaving count of 1 means interleaving is disabled. These microbenchmarks are to help guide changes in loop interleaving count computation and removal of trip count threshold for interleaving loops in llvm/llvm-project#67725 & related patches.

… Loop Interleaving Count with varying loop iterations (llvm#26)" (llvm#54) This reverts commit eda2d6c.

…the loop The current loop trip count threshold to allow loop interleaving is 128, which seems arbitrarily high and uncorrelated with factors like VW, IC, register pressure, etc. A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#26), when tested on an AArch64 platform, shows that loop interleaving is beneficial even for loops with low trip counts. We have also found similar evidence in an application benchmark that, when compiled with PGO, shows a 40% regression when its hot loop with a profile-guided trip count of 24 doesn't get interleaved because of this threshold. Therefore, it seems reasonable to eliminate this threshold and use the trip count for computing the interleaving count instead (llvm#73766).

…ave a loop (#67725) A set of microbenchmarks (llvm/llvm-test-suite#26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.

…ave a loop (llvm#67725) A set of microbenchmarks (llvm/llvm-test-suite#26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.

nilanjana87 requested review from LebedevRI, fhahn and Meinersbur September 27, 2023 18:07

nilanjana87 mentioned this pull request Sep 28, 2023

[LV] Relax high loop trip count threshold for deciding to interleave a loop llvm/llvm-project#67725

Merged

fhahn reviewed Oct 10, 2023

nilanjana87 and others added 2 commits October 22, 2023 20:22

As per reviewer comments, added comments explaining test cases, removed a redundant function, added test cases where the compiler selects the vectorization configuration

3a22381

Merge branch 'llvm:main' into loop_interleaving_microbenchmark

418ec55

nilanjana87 mentioned this pull request Oct 24, 2023

[LV] Change loops' interleave count computation llvm/llvm-project#70141

Closed

nilanjana87 added 4 commits November 2, 2023 19:08

Replacing functions with same functionality but different pragma-based vectorization hints with preprocessor-macro-based template functions, as per reviewer comments

66ec798

Ran clang-format

cb3a077

Renaming functions to be more meaningful

379381b

Added benchmarks for testing impact of loop interleaving for loops with bigger bodies

eb33e98

fhahn reviewed Nov 7, 2023

MicroBenchmarks/LoopVectorization/LoopInterleaving.cpp (outdated, resolved)

MicroBenchmarks/LoopVectorization/LoopInterleaving.cpp (outdated, resolved)

Fixed some of the code comments & ran clang-format

9f5bda4

fhahn reviewed Nov 9, 2023

MicroBenchmarks/LoopVectorization/CMakeLists.txt (outdated, resolved)

Made a separate executable for Loop Interleaving microbenchmark

107bc87

fhahn reviewed Nov 14, 2023

MicroBenchmarks/LoopVectorization/LoopInterleaving.cpp (resolved)

Force disable loop unrolling for the auto-vectorization cases

802ca40

fhahn approved these changes Nov 16, 2023

Ran clang-format

398bc8c

nilanjana87 merged commit eda2d6c into llvm:main Nov 17, 2023

antmox added a commit that referenced this pull request Nov 17, 2023

Revert "[MicroBenchmark,LoopInterleaving] Check performance impact of Loop Interleaving Count with varying loop iterations (#26)" This reverts commit eda2d6c.

0cdf078

antmox mentioned this pull request Nov 17, 2023

Revert "[MicroBenchmark,LoopInterleaving] Check performance impact of Loop Interleaving Count with varying loop iterations" #54

Merged

antmox added a commit that referenced this pull request Nov 17, 2023

Revert "[MicroBenchmark,LoopInterleaving] Check performance impact of Loop Interleaving Count with varying loop iterations (#26)" (#54) This reverts commit eda2d6c.

7e54b7b

nilanjana87 mentioned this pull request Nov 28, 2023

[MicroBenchmark,LoopInterleaving] Re-land - Check performance impact of Loop Interleaving Count with varying loop iterations #56

Merged
