Run tests in parallel, whenever possible #7283
base: master
Conversation
tests/run-tests-parallel.sh
Outdated
# Run the excluded tests first, then everything but excluded, and if any of these fail,
# then fall back to the original single-process version
time ( \
ctest -j2 -R "${TESTS_EXCLUDED_REGEX}" && \
A fixed number of jobs here seems somewhat undesirable; it would be nice if the script picked a suitable default (based on nproc) and allowed a user to override it.
I was thinking about it. For the first two tests, more than 2 cores obviously doesn't make sense.
For the second part, adding more cores will actually only prolong the test time. It can't currently get any lower, since core_tests take much more time than all the other tests combined, so already with 2 cores you experience thread starvation because of the long runtime of the core tests. Adding more cores at this point (say nproc, as an extreme) will only make the longest test, core_tests, run longer: the CPU cache is used up quicker at the beginning, and once the other tests finish sooner, the cache will not be used any better by core_tests.
So adding more cores will only help once the core tests are split into many more independent pieces, which I will be doing later; it's been contracted by my CCS proposal. Only after I do that will it be reasonable to increase the number of cores, or let the user select it.
A general remark to everybody: please don't treat my changes as the final state, but just as an iteration, in order to keep the PRs small (but still mergeable). A 20% gain for such a small change is in my opinion worth it. Plus, testing time is currently the largest bottleneck of the CI.
If certain tests should never run in parallel, and others have a maximum parallelism factor, then you can (and should) limit the maximum number of runners based on the tests (i.e. nproc is an upper bound). The point is that using two runners on a single-core setup actively makes things worse, which means some kind of variability is important. Once you accept that premise, it's little extra work (a single branch) to permit users to decide the upper limit, rather than defaulting to nproc (and this feature seems important, otherwise you're potentially making things worse for people who have multiple cores but don't want to dedicate more than one to the build).
A general remark to everybody: please don't treat my changes as the final state, but just an iteration, in order to keep the PRs small (but still mergeable)
No PR is ever a "final state", the question is whether the change makes sense enough on its own. Here, for example, everyone gets stuck with -j2, even if they only have a single core to run the tests. One could argue that is a real step backwards for many users.
To this I fully agree. I should take the minimum of 2 and nproc, or even of 2 and (nproc - 1), since a single-core setup with a virtual core won't make it any faster either, except maybe when core_tests are blocked by IO or network.
So would this logic be OK?
num_cores = min(2, nproc - 1)
if (num_cores < 1)
{ // in case we're dealing with a single-core CPU without a virtual core
    num_cores = 1
}
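The same clamping logic, as a runnable Python sketch (assuming `multiprocessing.cpu_count()` stands in for nproc; the ceiling of 2 comes from the discussion above, and the function name is mine):

```python
import multiprocessing

def pick_num_jobs(nproc=None, ceiling=2):
    """Pick a ctest job count: at most `ceiling`, leaving one core free,
    but never fewer than one job (single-core CPU without a virtual core)."""
    if nproc is None:
        nproc = multiprocessing.cpu_count()
    return max(1, min(ceiling, nproc - 1))
```

For example, `pick_num_jobs(nproc=1)` and `pick_num_jobs(nproc=2)` both yield 1, while `pick_num_jobs(nproc=8)` yields 2.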
Seems OK.
In the end, I've decided not to pretend that a single-core machine has 2 logical cores by default (re: nproc - 1). If it has, then as said before, the core tests have some IO/network blocking moments when other tests may run in the virtual thread. If this is not wanted for a single physical core with 1 virtual core, then I let the user preset this with the environment variable.
tests/run-tests-parallel.sh
Outdated
ctest -j2 -R "${TESTS_EXCLUDED_REGEX}" && \
ctest -j2 -E "${TESTS_EXCLUDED_REGEX}" \
) \
|| ctest
As best as I can tell, ctest returns a non-zero exit status if a test fails, meaning this || ends up double-running the test suite on failure. Even if it worked right, I'm not sure it's a good idea, as it seems prudent to expose test runner issues to users so they can be fixed (e.g. if users can specify a parallelism override, then a failure can be manually overridden as needed and hopefully addressed at some juncture, rather than papered over by this wrapper).
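The `&& ... || ctest` shell control flow can be mirrored in Python to make the double run explicit; `run_with_fallback` and the command lists are hypothetical names for illustration, not part of the PR:

```python
import subprocess

def run_with_fallback(parallel_cmds, fallback_cmd):
    """Run the parallel stages in order; if any stage fails, rerun the
    whole suite with the serial fallback command (i.e. a double run)."""
    for cmd in parallel_cmds:
        if subprocess.call(cmd) != 0:
            # One parallel stage failed: fall back to a single serial run,
            # which re-executes every test, including those that passed.
            return subprocess.call(fallback_cmd)
    return 0
```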
You understand it correctly.
I'm OK with removing the "|| ctest", but I'd like to hear a third opinion. My rationale is that it would prevent failures of the parallelism itself. Imagine that somebody adds a new test which, like the 2 excluded ones, is incompatible with the core tests. To fix the CI, you'd need to call the maintainer. Otherwise, with "|| ctest", you could just get on with your high-priority task and wait until somebody fixes the parallelism, independently of your task.
And one more remark.
To specify a parallelism override, it's enough for the user just to call ctest in the build directory.
So maybe it would make sense to do the following two moves at the same time:
- remove the || ctest from the bash script and make it parallel-only, with an adjustable number of cores in the future, once it makes sense.
- move the || ctest into the Makefile, where the runner script is now being called, so that the CI doesn't suffer from sudden breaks of the script.
move the || ctest into the Makefile, after the runner script is now being called, so that the CI doesn't suffer from sudden breaks of the script.
I had thought to suggest this in my earlier comment. But, I see some other options:
- Add a Makefile target specifically for CI.
- Have CI run || ctest, rather than including this in the Makefile, as it's really something an ordinary user of the Makefile doesn't typically want or need.
That said, I don't agree with the premise of CI parallelism blocking progress. If a commit causes CI to start failing, that commit should fix the problem (e.g. by not running the new tests in parallel). Otherwise, you'll end up simply never running anything in parallel, because someone will break build parallelism, hit the || and not notice the breakage, and master will continue with serial builds. Either the parallelism should be useful and minimally invasive to contributors, such that it's worth their time to fix parallelism-induced issues, or, it shouldn't exist at all.
Alright.
It would be nice if this were not the default. It's really annoying when people screw with parallel settings inside makefiles.
Seems like we could enable this for CI, but yes, probably best if it isn't on by default.
So I'll make a separate target for CI and "make" it from the Workflows. Should the new target go into a separate Makefile as well? I'd rather keep it in the original Makefile, as this will spare a lot of duplication.
Per my earlier comments, you could also just use an env var to enable parallel builds, which is how cmake itself does things (default to off, have CI set the var and pick it up in the test build script). Edit: I should add that you don't even need to introduce an extra boolean env var, you could just make use of one which controls the parallelism ceiling (e.g. the CI setup can do
Force-pushed 020f9b7 to 739facf (compare)
Makefile
Outdated
@@ -98,6 +98,10 @@ release-test:
	mkdir -p $(builddir)/release
	cd $(builddir)/release && cmake -D BUILD_TESTS=ON -D CMAKE_BUILD_TYPE=release $(topdir) && $(MAKE) && $(MAKE) test

release-test-ci:
	mkdir -p $(builddir)/release
	cd $(builddir)/release && cmake -D BUILD_TESTS=ON -D CMAKE_BUILD_TYPE=release -D BUILD_SHARED_LIBS=ON $(topdir) && $(MAKE) && MONERO_PARALLEL_TEST_JOBS=nproc $(topdir)/tests/run-tests-parallel.sh
Note that I smuggled in -D BUILD_SHARED_LIBS=ON, in order to leverage faster linkage and lower disk space usage for the release version, which is static by default.
The new CI target has a default of nproc preset as an env var, and the runner script also defaults to nproc if the env var isn't set. It then takes the minimum of nproc (or the forced setting) and the currently reasonable value of 2 jobs.
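A sketch of that resolution order (the env var name MONERO_PARALLEL_TEST_JOBS, the 'nproc' sentinel, and the ceiling of 2 are from the PR; the function name is illustrative):

```python
import multiprocessing
import os

def resolve_jobs(reasonable_max=2, env_key='MONERO_PARALLEL_TEST_JOBS'):
    """Resolve the final job count: the env var (or its 'nproc' sentinel)
    forces a ceiling, otherwise nproc is used; the result is then capped
    at the currently reasonable maximum of parallel jobs."""
    raw = os.environ.get(env_key)
    if raw is None or raw == 'nproc':
        ceiling = multiprocessing.cpu_count()
    else:
        ceiling = int(raw)
    return max(1, min(reasonable_max, ceiling))
```

So a CI box exporting MONERO_PARALLEL_TEST_JOBS=nproc still ends up with at most 2 jobs until the reasonable maximum is raised.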
Force-pushed 91f162f to 739facf (compare)
The RPC tests are repeatedly failing on the mining part, as usual, but this proves the usefulness of the separation: the CI delivers a failure within the first 40 minutes, as opposed to having to wait 1.5 hours for a possibly negative result.
Thanks. I've made something in-between.
Force-pushed 6a85e23 to d1b6fe3 (compare)
Thanks for the changes, overall this looks pretty good to me.
tests/run-tests-parallel.sh
Outdated
NUM_PROC_MAX_REASONABLE=2 # This might be increased, once the core_tests are divided into many more independent pieces or simply sped up
TESTS_EXCLUDED_REGEX="unit_tests|functional_tests_rpc" # These tests collide with core_tests

if [ -z $MONERO_PARALLEL_TEST_JOBS ]; then
If you're using bash here, might as well switch to [[; that way you skip an invocation of the test program.
I didn't know this. I've had formal training in OO languages, but Bash was always just a hobby.
I don't think it's a good idea to assume that bash is being used. Should always target vanilla sh.
The logic got much more complex than in the initial state, so if I have to switch to sh, I will prefer to do it in Python 3 after all. The majority of the time was consumed by gathering requirements anyway.
Thanks for the reasonable comments to all 3 of you.
I did not follow the whole discussion but currently it only seems to run
Is this intentional? Ideally you would want to run all tests to see which ones fail.
This is intentional, but can be changed to run all, even though the first two typically fail. My philosophy is that if the tests most likely to fail do fail, you'll have to fix them first and rerun the whole thing anyway, just as you'd fail a compilation on the first error rather than wait the remaining time for the feedback. In this case, you'd have to wait an additional hour to learn that your build has failed. If the other, high-level tests do fail afterwards, the proper way to handle this is to write a unit test that simulates exactly this failure, without having to allocate all the resources; as a unit test, it will expose the weakness in a much shorter time than the smoke test does. If this doesn't convince you, I can run them all. Any other opinions on this one? @moneromooo-monero @iamamyth ?
The problem is the false positives we have (mining, p2p functional tests). As long as we don't fix these, it does not make sense to fail early.
If this is how you see it, then I'll run them all, alright!
Reimplemented in Python without much trouble, gaining better readability.
tests/run-tests-parallel.py
Outdated
def run_all():
    # Run all of the tests, no matter if the first part fails
    ctest1 = get_ctest_command(num_proc_final, '-R')
This function looks very much like run_all; they could easily be one function which accepts a fail_fast param; e.g.:
def run(nproc, fail_fast=False):
    cmds = [get_test_command(...), ...]
    error = False
    for cmd in cmds:
        status = run_cmd(cmd)
        if status != 0:
            if fail_fast:
                return status
            error = True
    return ERR_CODE if error else 0
Done, though this principle kept me from doing it initially. In Python, at least, you can name the parameter in a call.
Same thoughts as in the below case, but I will merely add, the function here does one thing: run the test commands and return a status code; it just may not run all the commands, depending on the value of fail_fast. Anyways, this is a great example of principle-vs-principle being unhelpful: "DRY" vs. "boolean arguments are bad" gets you nowhere. The better questions to ask might be: "does this function make sense as a concept?"; "can I easily describe, e.g. in form of a comment, what this function does?"; "is the internal complexity of the function incredibly high, such that it needs an alternate composition?"; etc. The fact that the shell execution environment, which you ended up replacing, provides an option to bail on command failure, strongly suggests that the complexity incurred by such a strategy is in no way unreasonable.
The point of Uncle Bob here (as well as in his book) is that whenever there is a boolean parameter, it suggests there should be 2 functions.
But I'm not here to argue about principles, really. If the reviewers prefer it this way, I'm OK with this. Both approaches have pros and cons, but again, in Python at least you can name the boolean parameter when calling the function, so I would still avoid doing the same in C++.
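For contrast, the two-function variant that rule leads to can still share a private helper, so DRY survives; this is a hypothetical sketch, with plain callables standing in for the ctest commands:

```python
def _run(cmds, fail_fast):
    """Run each command callable; non-zero status means failure."""
    error = False
    for cmd in cmds:
        status = cmd()
        if status != 0:
            if fail_fast:
                return status
            error = True
    return 1 if error else 0

def run_fail_fast(cmds):
    """Stop at the first failing command."""
    return _run(cmds, fail_fast=True)

def run_all(cmds):
    """Run every command, reporting any failure at the end."""
    return _run(cmds, fail_fast=False)
```

Callers then pick a name instead of a boolean, while the looping logic lives in one place.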
tests/run-tests-parallel.py
Outdated
def get_forced_ceiling():
    key = 'MONERO_PARALLEL_TEST_JOBS'
    if key not in os.environ:
A bit easier on the eyes, and it saves a dict lookup, to just fetch the key and catch the exception:
try:
    ceiling = os.environ['MONERO_PARALLEL_TEST_JOBS']
except KeyError:
    ceiling = multiprocessing.cpu_count()
else:
    if ceiling == 'nproc':
        ceiling = multiprocessing.cpu_count()
    else:
        ceiling = int(ceiling)
Makes sense, although I always have this in mind.
Two things:
- I find it best not to require the reviewer to read forum links and similar (I'm not much inclined to visit stackexchange), and instead briefly state whatever may be contained therein that you find relevant.
- As I can glean the contents of the argument from the URL, I will simply state that most arguments of the form "x pattern is bad" tend to be irrelevant on their own: In general, the pattern exists because it can be useful; like all tools, when improperly used, it can be harmful. So, the question is, whether the pattern proves reasonable in a particular context. Here, the exception-based version is quite readable, as it's a standard python idiom, and the overhead isn't a concern, but, if you don't like exceptions, then you could instead use .get.
The discussion under the link suggests that:
- In general, it's not a good approach to use exceptions for controlling logic.
- In Python, it's actually expected to use them this way, most probably because the runtime overhead of exceptions in Python is much lower than in other languages.
So my thinking is that we shouldn't be doing this for the core implementation, but I will leave it as it is for Python. Providing a link brings more context and more independent voices (and votes for the voices, in the case of StackExchange). I hope that my 2 points above describe the content under the link well enough.
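For completeness, the .get-based variant mentioned above would look roughly like this (same env var and 'nproc' sentinel as in the PR; the exact body is my sketch):

```python
import multiprocessing
import os

def get_forced_ceiling():
    """Read the job ceiling without exception-based control flow:
    a missing variable and the 'nproc' sentinel both map to the CPU count."""
    raw = os.environ.get('MONERO_PARALLEL_TEST_JOBS', 'nproc')
    if raw == 'nproc':
        return multiprocessing.cpu_count()
    return int(raw)
```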
Anyway, for me the PR is ready, and if you have no more comments, I'd ask you to approve.
Cheers.
I have run the tests several times, see here from test number 470. There were two failures on
start = timer()
#TODO: set fail_fast=True after mining / RPC random test failures are resolved
ret = run(num_proc_final, fail_fast=False)
@selsta: Since the mining test ("Functional") is now stable, do you think I could set fail_fast to True?
Because of the currently unstable test node_server.bind_same_p2p_port, I think I would leave it at False, for the same reason.
Otherwise, the PR is ready to be merged from my POV.
The above test will be fixed with: #7646
I don't like the concept of fail fast personally. I want to see everything that fails.
Then I will remove that TODO from line 104.
Force-pushed e6ad243 to 53a3cb1 (compare)
using tests/run-tests-parallel.py script like: MONERO_PARALLEL_TEST_JOBS=nproc tests/run-tests-parallel.py
Rebased.
using the tests/run-tests-parallel.sh script. With the current simplistic approach I was able to achieve a 20% speedup. This new, small test execution framework will allow for further improvements in the future, by separating conflicting portions of code (or simply by selecting non-conflicting ports for monerod).
The speedup was calculated by summing up the time taken by tests that ran while other, longer tests were running (dt), and dividing it by the sum (total run time + dt).
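That calculation can be written down as a tiny helper (the numbers in the usage note are illustrative, not the PR's actual measurements):

```python
def parallel_speedup(parallel_run_time, dt):
    """Fraction of time saved: dt is the summed runtime of tests that
    overlapped with longer tests; the equivalent serial time would be
    the parallel run time plus dt."""
    return dt / (parallel_run_time + dt)
```

For example, 20 minutes of overlapped tests on top of an 80-minute parallel run gives parallel_speedup(80, 20) == 0.2, i.e. a 20% saving like the one quoted above.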
The test run script has a fuse: if the tests fail for any reason, they are rerun using a single core. Note that it's the tests that fail most often (RPC tests failing randomly) and quickest (unit tests) that are run at the beginning, giving you a hint early enough.
Work time: ~6.7h plus a day of passive tests while doing something else.