[CMake][LIT] Add option to run lit testsuites in parallel #82899

patrickdoc · 2024-02-24T23:37:14Z

Currently add_lit_target sets the USES_TERMINAL CMake option. When using Ninja, this forces all lit testsuite targets into the single-threaded console pool.

This PR adds a new option LLVM_PARALLEL_LIT which drops the USES_TERMINAL flag, allowing Ninja to run them in parallel.

The default setting (LLVM_PARALLEL_LIT=OFF) retains the existing behavior of serial testsuite execution.

github-actions · 2024-02-24T23:37:30Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

JoeLoser

Thanks for pushing this through upstream! Let me know if you need me to commit this on your behalf.

patrickdoc · 2024-02-27T14:32:03Z

@JoeLoser yes please merge, I do not have write access to this repo.

github-actions · 2024-02-28T18:53:31Z

@patrickdoc Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may recieve a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

joker-eph · 2024-03-08T07:50:47Z

Can you clarify why this is useful/desirable?

(it's good practice in general for the PR description to describe the context and motivation for the change, that is the "why" instead of "what" is done)

Same for the documentation change: it does not talk about when is it useful/desirable for a user to do.

patrickdoc · 2024-03-08T17:01:19Z

Running the test targets in parallel can result in faster build times depending on your build graph + available resources on your machine. For example, individual targets can be started earlier or you can get increased cpu usage if threads are paused on I/O.

Here's a simple example of a handful of test targets being:

ninja -C build check-llvm-analysis-aliasset check-llvm-analysis-assumptioncache check-llvm-analysis-basicaa check-llvm-analysis-blockfrequencyinfo check-llvm-analysis-branchprobabilityinfo check-llvm-analysis-callgraph check-llvm-analysis-costmodel

Ninja trace for parallelism disabled (pre-existing behavior and current default):

Ninja trace for parallelism enabled (by enabling the new flag):

I would summarize it as: if you have multiple lit targets and idle cpu during your builds, try enabling this.

It greatly depends on your setup, but in my case it cut build times in half.

joker-eph · 2024-03-08T21:50:00Z

Oh you're running very tiny lit target!

The problem of this is that most of the lit suite are massively parallel themselves.

The way to achieve good parallelism with lit isn't to call the targets the way you're doing here, instead you can do:

ninja && bin/llvm-lit ../llvm/test/Analysis/{AliasSet,AssumptionCache,BasicAA,BlockFrequencyInfo,BranchProbabilityInfo,CallGraph,CostModel}

Or:

LIT_OPTS=" --filter='Analysis/AliasSet|Analysis/AssumptionCache|Analysis/BasicAA|Analysis/BlockFrequencyInfo|Analysis/BranchProbabilityInfo|Analysis/CallGraph|Analysis/CostModel' " \
ninja check

I believe these are giving you "perfect" parallelism without over-subscription problem associated with the cmake option you introduced.

Your screenshots are actually misleading in that they show multiple threads apparently "balanced", which looks all nice, but it's actually a "lie": this is just the ninja view of the world. Each of the lit invocation will spawn as many threads as cores on the machine (the over-subscription I'm referring to).

Mogball · 2024-03-09T08:08:46Z

@patrickdoc @JoeLoser Can we try what Mehdi is suggesting instead of what is enabled by this PR? If that works, I don't see a need for this PR and we can revert it, since there is no clear use case.

patrickdoc · 2024-03-11T15:49:38Z

Hey @joker-eph, the pictures in my example weren't my actual builds, just a demonstration of the intended change. I'm aware that lit also tries to parallelize on top of ninja's parallelism so have reduced lit's parallelism with -j X to compensate.

My use-case is CI builds, where we are running the whole build + all the tests, with the goal of being as fast as possible.

Using a single mega lit target has some downsides:

It artificially splits the build into first a compile phase, then a test phase. This means that tests which could be running against early compilation targets get delayed until the end of the build
It doesn't allow flexibility for different sizes of tests. i.e. N small unit tests would have poor utilization of available cpu/mem/etc. while N larger end-to-end integration tests could run into the oversubscribing issue you've described
It doesn't play well with other testing targets in the build. As you mentioned, lit assumes it owns the machine and so runs one test per core. But Ninja doesn't know that so can schedule other targets alongside lit.

The wins I am seeing come from moving parallelism logic out of lit and into Ninja. So we end up with a much smoother build graph because Ninja knows more about the build than lit does.

But to do that, we need to allow Ninja to flexibly schedule the lit targets. The upside of the USES_TERMINAL setting is that you can get live progress bar output, but the downside is that Ninja can only run one target as a time.

Truthfully even just removing this flag altogether wouldn't affect the "single lit target" use-case (aside from losing the progress bar) because it is already delayed until after every other target anyway. But I've added it as an option here to allow folks to opt-in.

For clarification in the docs, would something like this be better:

**LLVM_PARALLEL_LIT**:BOOL
  Defaults to ``OFF``. If using more than one lit CMake target and the Ninja
  backend, setting this to ``ON`` allows Ninja to run the targets in parallel. This
  could lead to too many tests running at once, so it is recommended to also
  reduce lit's internal parallelism with something like`-DLLVM_LIT_ARGS="-j 1"`.

joker-eph · 2024-03-11T18:32:39Z

My use-case is CI builds, where we are running the whole build + all the tests, with the goal of being as fast as possible.

This is weird, your motivation stated earlier was "I want to run a very small set of the tests", I'm having a hard time following now what is the motivating example really!

It artificially splits the build into first a compile phase, then a test phase. This means that tests which could be running against early compilation targets get delayed until the end of the build

This is true in theory, however in practice in LLVM the check targets depends on all the binaries needed for all the tests (unless you have many lit test suites in the first place, but I'd like to see an example). Another problem is that running the tests in parallel concurrently with the build is again oversubscribing the machine.

It doesn't allow flexibility for different sizes of tests. i.e. N small unit tests would have poor utilization of available cpu/mem/etc. while N larger end-to-end integration tests could run into the oversubscribing issue you've described

I don't understand your point about small/large? As long as each test is single threader, it does not matter if they are small or large.

The wins I am seeing come from moving parallelism logic out of lit and into Ninja. So we end up with a much smoother build graph because Ninja knows more about the build than lit does.

If you want to do this in a principle way, then we should

Discover the lit tests targets using lit
Feed these individual tests to ninja for scheduling and let ninja do the scheduling.

If you care about solving this properly, I'd be happy to see something in this direction, however this isn't the granularity you operate at right now.

I rather see this reverted and a more principled approach discussed first.

…2899)" This reverts commit 782147e.

…84813) Reverts #82899 Per the discussion on the PR, this needs more design and justification.

JoeLoser · 2024-03-11T18:57:15Z

@joker-eph sorry about approving/merging this early before engaging more with the community here on the feedback and approach. As I talked with Jeff this morning, he reverted this in 8846b91 and we'll re-evaluate how we want to proceed. Sorry again, and we appreciate your feedback!

joker-eph · 2024-03-11T20:14:16Z

No need to apologize, I assumed all good intents here!
I'd love to find a good solution here though!

joker-eph · 2024-03-11T20:14:57Z

One more subtlety: when lit runs the tests, it'll run them all and reports the failures.
When we split the lit invocation across ninja targets, ninja stops at the first failure by default...

[CMake][LIT] Add option to run lit testsuites in parallel

209682d

llvmbot added the cmake Build system in general and CMake in particular label Feb 24, 2024

JoeLoser self-requested a review February 24, 2024 23:45

JoeLoser approved these changes Feb 27, 2024

View reviewed changes

JoeLoser merged commit 782147e into llvm:main Feb 28, 2024
6 of 7 checks passed

patrickdoc deleted the lit-parallelism branch March 3, 2024 16:19

Mogball added a commit that referenced this pull request Mar 11, 2024

Revert "[CMake][LIT] Add option to run lit testsuites in parallel (#8…

7cc43fe

…2899)" This reverts commit 782147e.

Mogball mentioned this pull request Mar 11, 2024

Revert "[CMake][LIT] Add option to run lit testsuites in parallel" #84813

Merged

Mogball added a commit that referenced this pull request Mar 11, 2024

Revert "[CMake][LIT] Add option to run lit testsuites in parallel" (#…

8846b91

…84813) Reverts #82899 Per the discussion on the PR, this needs more design and justification.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CMake][LIT] Add option to run lit testsuites in parallel #82899

[CMake][LIT] Add option to run lit testsuites in parallel #82899

patrickdoc commented Feb 24, 2024

github-actions bot commented Feb 24, 2024

JoeLoser left a comment •

edited

patrickdoc commented Feb 27, 2024

github-actions bot commented Feb 28, 2024

joker-eph commented Mar 8, 2024 •

edited

patrickdoc commented Mar 8, 2024

joker-eph commented Mar 8, 2024 •

edited

Mogball commented Mar 9, 2024

patrickdoc commented Mar 11, 2024

joker-eph commented Mar 11, 2024

JoeLoser commented Mar 11, 2024 •

edited

joker-eph commented Mar 11, 2024

joker-eph commented Mar 11, 2024

[CMake][LIT] Add option to run lit testsuites in parallel #82899

[CMake][LIT] Add option to run lit testsuites in parallel #82899

Conversation

patrickdoc commented Feb 24, 2024

github-actions bot commented Feb 24, 2024

JoeLoser left a comment • edited

Choose a reason for hiding this comment

patrickdoc commented Feb 27, 2024

github-actions bot commented Feb 28, 2024

joker-eph commented Mar 8, 2024 • edited

patrickdoc commented Mar 8, 2024

joker-eph commented Mar 8, 2024 • edited

Mogball commented Mar 9, 2024

patrickdoc commented Mar 11, 2024

joker-eph commented Mar 11, 2024

JoeLoser commented Mar 11, 2024 • edited

joker-eph commented Mar 11, 2024

joker-eph commented Mar 11, 2024

JoeLoser left a comment •

edited

joker-eph commented Mar 8, 2024 •

edited

joker-eph commented Mar 8, 2024 •

edited

JoeLoser commented Mar 11, 2024 •

edited