Run tests in parallel, whenever possible #7283

Open
wants to merge 1 commit into
base: master

Conversation

mj-xmr
Contributor

@mj-xmr mj-xmr commented Jan 5, 2021

using tests/run-tests-parallel.sh script.

With the current simplistic approach I was able to achieve a 20% speedup. This new, small test execution framework will allow for further improvements in the future, by separating conflicting portions of code (or simply by selecting non-conflicting ports of monerod).

The speedup was calculated by summing up the time taken by tests that ran while other, longer tests were running (dt), and dividing it by (total run time + dt).
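
As a purely illustrative calculation (hypothetical numbers, not the measured ones): if the tests that overlapped with longer tests account for dt = 15 minutes and the parallel run itself takes 60 minutes, the speedup comes out to 15 / (60 + 15) = 20%.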

The test run script has a fuse: if the parallel tests fail for any reason, they are restarted on a single core. Note that the tests that fail most often (RPC tests failing randomly) and finish quickest (unit tests) are run at the beginning, giving you a hint early on.

Work time: ~6.7h plus a day of passive tests while doing something else.

@mj-xmr mj-xmr marked this pull request as ready for review January 5, 2021 19:03
# Run the excluded tests first, then everything but excluded, and if any of these fail,
# then fallback to the original single processing version
time ( \
ctest -j2 -R "${TESTS_EXCLUDED_REGEX}" && \

A fixed number of jobs here seems somewhat undesirable; it would be nice if the script picked a suitable default (based on nproc) and allowed a user to override it.

Contributor Author
@mj-xmr mj-xmr Jan 6, 2021

I was thinking about it. For the first two tests, more than 2 cores obviously doesn't make sense.

For the second part, adding more cores will actually only prolong the test time. It can't currently get any lower, since core_tests take much more time than all the other tests combined, so already with 2 cores you experience thread starvation caused by the long core_tests runtime. Adding more cores at this point (say nproc, as an extreme) will only make the longest test, core_tests, run longer: the CPU cache gets used up more quickly at the beginning, and once the other tests finish, core_tests doesn't make any better use of it.

So adding more cores will only help once core_tests is split into many more independent pieces, which I will be doing later; that work is covered by my CCS proposal. Only after I do it will it be reasonable to increase the number of cores, or to let the user select it.

A general remark to everybody: please don't treat my changes as the final state, but just as an iteration, in order to keep the PRs small (but still mergeable). A 20% gain for such a small change is, in my opinion, worth it. Plus, testing time is currently the largest bottleneck of the CI.


If certain tests should never run in parallel, and others have a maximum parallelism factor, then you can (and should) limit the maximum number of runners based on the tests (i.e. nproc is an upper bound). The point is that using two runners on a single-core setup actively makes things worse, which means some kind of variability is important. Once you accept that premise, it's little extra work (a single branch) to permit users to decide the upper limit, rather than defaulting to nproc (and, this feature seems important, otherwise you're potentially making things worse for people who have multiple cores but don't want to dedicate more than one to the build).

A general remark to everybody: please don't treat my changes as the final state, but just as an iteration, in order to keep the PRs small (but still mergeable)

No PR is ever a "final state"; the question is whether the change makes sense enough on its own. Here, for example, everyone gets stuck with -j2, even if they only have a single core to run the tests. One could argue that is a real step backwards for many users.

Contributor Author
@mj-xmr mj-xmr Jan 6, 2021

I fully agree with this. I should take the minimum of 2 and nproc, or even of 2 and (nproc - 1), since a single-core setup with a virtual core won't make it any faster either, except maybe when core_tests are blocked by IO or network.

So would this logic be OK?

num_cores = min(2, nproc - 1)
if (num_cores < 1)
{ // in case we're dealing with a single-core CPU without a virtual core
   num_cores = 1
}


Seems OK.

Contributor Author

In the end, I've decided not to pretend by default that a single-core machine has 2 logical cores (re: nproc - 1). If it does, then, as said before, core_tests have some IO/network blocking moments during which other tests may run in the virtual thread. If this is not wanted for a single physical core with 1 virtual core, the user can override it with the environment variable.

ctest -j2 -R "${TESTS_EXCLUDED_REGEX}" && \
ctest -j2 -E "${TESTS_EXCLUDED_REGEX}" \
) \
|| ctest

As best as I can tell, ctest returns a non-zero exit status if a test fails, meaning this || ends up double-running the test suite on failure. Even if it worked right, I'm not sure it's a good idea, as it seems prudent to expose test runner issues to users so they can be fixed (e.g. if users can specify a parallelism override, then a failure can be manually overridden as needed and hopefully addressed at some juncture, rather than papered over by this wrapper).

Contributor Author
@mj-xmr mj-xmr Jan 6, 2021

You understand it correctly.

I'm OK with removing the "|| ctest", but I'd like to hear a third opinion. My rationale is that it would prevent failures of the parallelism itself from blocking people. Imagine that somebody adds a new test which, like the 2 excluded ones, is incompatible with core_tests. To fix the CI, you'd need to call the maintainer. Otherwise, with "|| ctest", you could just get on with your high-priority task and wait until somebody fixes the parallelism, independently of your task.

Contributor Author

And one more remark.
To specify a parallelism override, it's enough for the user just to call ctest in the build directory.

Contributor Author
@mj-xmr mj-xmr Jan 6, 2021

So maybe it would make sense to do the following two moves at the same time:

  1. remove the || ctest from the bash script and make it parallel-only, with an adjustable number of cores in the future, once that makes sense.
  2. move the || ctest into the Makefile, after the call to the runner script, so that the CI doesn't suffer from sudden breaks of the script.

@iamamyth iamamyth Jan 6, 2021

move the || ctest into the Makefile, after the call to the runner script, so that the CI doesn't suffer from sudden breaks of the script.

I had thought to suggest this in my earlier comment. But, I see some other options:

  1. Add a Makefile target specifically for CI.
  2. Have CI run || ctest, rather than including this in the Makefile, as it's really something an ordinary user of the Makefile doesn't typically want or need.

That said, I don't agree with the premise of CI parallelism blocking progress. If a commit causes CI to start failing, that commit should fix the problem (e.g. by not running the new tests in parallel). Otherwise, you'll end up simply never running anything in parallel, because someone will break build parallelism, hit the || and not notice the breakage, and master will continue with serial builds. Either the parallelism should be useful and minimally invasive to contributors, such that it's worth their time to fix parallelism-induced issues, or, it shouldn't exist at all.

Contributor Author

Alright.

@moneromooo-monero
Collaborator

It would be nice if this were not the default. It's really annoying when people screw with parallel settings inside makefiles.

@selsta
Collaborator

selsta commented Jan 6, 2021

Seems like we could enable this for CI but yes, probably best if it isn't on by default.

@mj-xmr
Contributor Author

mj-xmr commented Jan 6, 2021

Seems like we could enable this for CI but yes, probably best if it isn't on by default.

So I'll make a separate target for CI and "make" it from the Workflows.

Should the new target go to a separate Makefile as well? I'd rather keep it in the original Makefile, as this will spare a lot of duplication.

@iamamyth

iamamyth commented Jan 6, 2021

Per my earlier comments, you could also just use an env var to enable parallel builds, which is how cmake itself does things (default to off, have CI set the var and pick it up in the test build script).

Edit: I should add that you don't even need to introduce an extra boolean env var, you could just make use of one which enables controlling the parallelism ceiling (e.g. the CI setup can do MONERO_TEST_PARALLELISM=nproc run-tests-parallel.sh).

@mj-xmr mj-xmr force-pushed the parallel-tests branch 2 times, most recently from 020f9b7 to 739facf Compare January 7, 2021 07:58
Makefile Outdated
@@ -98,6 +98,10 @@ release-test:
mkdir -p $(builddir)/release
cd $(builddir)/release && cmake -D BUILD_TESTS=ON -D CMAKE_BUILD_TYPE=release $(topdir) && $(MAKE) && $(MAKE) test

release-test-ci:
mkdir -p $(builddir)/release
cd $(builddir)/release && cmake -D BUILD_TESTS=ON -D CMAKE_BUILD_TYPE=release -D BUILD_SHARED_LIBS=ON $(topdir) && $(MAKE) && MONERO_PARALLEL_TEST_JOBS=nproc $(topdir)/tests/run-tests-parallel.sh
Contributor Author
@mj-xmr mj-xmr Jan 7, 2021

Notice that I smuggled in -D BUILD_SHARED_LIBS=ON, in order to get faster linking and lower disk space usage for the release version, which is static by default.

Contributor Author
@mj-xmr mj-xmr Jan 7, 2021

The new CI target presets nproc as the default via the env var, and the runner script also defaults to nproc if the env var isn't set. It then takes the minimum of nproc (or the forced setting) and the currently reasonable value of 2 jobs.
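
A minimal sketch of that selection logic (the env var name, the 'nproc' default and the NUM_PROC_MAX_REASONABLE ceiling all appear in this PR; the function name pick_num_jobs is just for illustration, not the script's actual code):

import multiprocessing
import os

NUM_PROC_MAX_REASONABLE = 2  # current ceiling until core_tests is split into smaller pieces

def pick_num_jobs():
    # Honor a forced setting from the env var; otherwise default to nproc.
    key = 'MONERO_PARALLEL_TEST_JOBS'
    if key not in os.environ or os.environ[key] == 'nproc':
        ceiling = multiprocessing.cpu_count()
    else:
        ceiling = int(os.environ[key])
    # Never exceed the currently reasonable job count, and never go below 1.
    return max(1, min(ceiling, NUM_PROC_MAX_REASONABLE))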

@mj-xmr mj-xmr force-pushed the parallel-tests branch 2 times, most recently from 91f162f to 739facf Compare January 7, 2021 10:06
@mj-xmr
Contributor Author

mj-xmr commented Jan 7, 2021

The RPC tests are repeatedly failing on the mining part, as usual, but this proves the usefulness of the separation: the CI delivers a failure within the first 40 minutes, as opposed to having to wait 1.5 hours for a possibly negative result.

@mj-xmr
Contributor Author

mj-xmr commented Jan 7, 2021

Per my earlier comments, you could also just use an env var to enable parallel builds, which is how cmake itself does things (default to off, have CI set the var and pick it up in the test build script).

Edit: I should add that you don't even need to introduce an extra boolean env var, you could just make use of one which enables controlling the parallelism ceiling (e.g. the CI setup can do MONERO_TEST_PARALLELISM=nproc run-tests-parallel.sh).

Thanks. I've made something in-between.

@mj-xmr mj-xmr force-pushed the parallel-tests branch 5 times, most recently from 6a85e23 to d1b6fe3 Compare January 7, 2021 16:55
@iamamyth

iamamyth commented Jan 7, 2021

Thanks for the changes, overall this looks pretty good to me.

NUM_PROC_MAX_REASONABLE=2 # This might be increased, once the core_tests are divided into many more independent pieces or simply sped up
TESTS_EXCLUDED_REGEX="unit_tests|functional_tests_rpc" # These tests collide with core_tests

if [ -z $MONERO_PARALLEL_TEST_JOBS ]; then

If you're using bash here, might as well switch to [[, that way you skip an invocation of the test program.

Contributor Author
@mj-xmr mj-xmr Jan 7, 2021

I didn't know this. I've had formal training in OO languages, but Bash has always been just a hobby.

Collaborator

I don't think it's a good idea to assume that bash is being used. Should always target vanilla sh.

Contributor Author

The logic got much more complex than in the initial state, so if I have to switch to sh, I'd prefer to redo it in Python 3 after all. The majority of the time was spent gathering requirements anyway.

@mj-xmr
Contributor Author

mj-xmr commented Jan 7, 2021

Thanks for the changes, overall this looks pretty good to me.

Thanks to all 3 of you for the reasonable comments.

@selsta
Collaborator

selsta commented Jan 8, 2021

I did not follow the whole discussion but currently it only seems to run

    Start 6: functional_tests_rpc
    Start 8: unit_tests

Is this intentional? Ideally you would want to run all tests to see which ones fail.

@mj-xmr
Contributor Author

mj-xmr commented Jan 8, 2021

I did not follow the whole discussion but currently it only seems to run

    Start 6: functional_tests_rpc
    Start 8: unit_tests

Is this intentional? Ideally you would want to run all tests to see which ones fail.

This is intentional, but it can be changed to run all of them, even though the first two typically fail. My philosophy is that if the tests most likely to fail do fail, you'll have to fix them and rerun the whole thing anyway, just like a compilation fails on the first error rather than making you wait the remaining time for feedback. In this case, you'd have to wait an additional hour to learn that your build has failed.

If the other, high-level tests do fail afterwards, the proper way to handle it is to write a unit test that simulates exactly that failure, without having to allocate all the resources. As a unit test, it will expose the weakness in a much shorter time than the smoke test does.

If this doesn't convince you, I can run them all. Any other opinions on this one? @moneromooo-monero @iamamyth ?

@selsta
Collaborator

selsta commented Jan 8, 2021

The problem is the false positives we have (mining, p2p functional tests). As long as we don't fix these, it does not make sense to fail early.

@mj-xmr
Contributor Author

mj-xmr commented Jan 8, 2021

The problem is the false positives we have (mining, p2p functional tests). As long as we don't fix these, it does not make sense to fail early.

If this is how you see it, then I'll run them all, alright!

@mj-xmr mj-xmr marked this pull request as draft January 8, 2021 10:53
@mj-xmr mj-xmr marked this pull request as ready for review January 8, 2021 14:54
@mj-xmr
Contributor Author

mj-xmr commented Jan 8, 2021

Reimplemented in Python without much trouble, gaining better readability.


def run_all():
    # Run all of the tests, no matter if the first part fails
    ctest1 = get_ctest_command(num_proc_final, '-R')

These functions look very much alike; they could easily be one function which accepts a fail_fast param; e.g.:

def run(nproc, fail_fast=False):
    cmds = [get_test_command(...), ...]
    error = False
    for cmd in cmds:
        status = run_cmd(cmd)
        if status != 0:
            if fail_fast:
                return status
            error = True
    return ERR_CODE if error else 0
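
For readers following along, a self-contained version of this pattern could look roughly like the following (a sketch only: the get_test_command helper, the ERR_CODE constant and the use of subprocess are illustrative, not the PR's actual code; the excluded-tests regex comes from the script above):

import subprocess

TESTS_EXCLUDED_REGEX = "unit_tests|functional_tests_rpc"  # tests that collide with core_tests
ERR_CODE = 1  # any non-zero status will do

def get_test_command(num_jobs, selector):
    # selector is '-R' (only the excluded tests) or '-E' (everything else)
    return ['ctest', '-j{}'.format(num_jobs), selector, TESTS_EXCLUDED_REGEX]

def run(num_jobs, fail_fast=False):
    cmds = [get_test_command(num_jobs, '-R'), get_test_command(num_jobs, '-E')]
    error = False
    for cmd in cmds:
        status = subprocess.call(cmd)
        if status != 0:
            if fail_fast:
                return status
            error = True
    return ERR_CODE if error else 0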

Contributor Author
@mj-xmr mj-xmr Jan 9, 2021

Done, but this principle is what kept me from doing it this way initially. In Python, at least, you can name the parameter in a call.

@iamamyth iamamyth Jan 9, 2021

Same thoughts as in the below case, but I will merely add, the function here does one thing: Run the test commands and return a status code; it just may not run all the commands, depending on the value of fail_fast. Anyways, this is a great example of principle-vs-principle being unhelpful: "DRY" vs. "boolean arguments are bad" gets you nowhere. The better questions to ask might be: "does this function make sense as a concept?"; "can I easily describe, e.g. in form of a comment, what this function does?"; "is the internal complexity of the function incredibly high, such that it needs an alternate composition?"; etc. The fact that the shell execution environment, which you ended up replacing, provides an option to bail on command failure, strongly suggests that the complexity incurred by such a strategy is in no way unreasonable.

Contributor Author
@mj-xmr mj-xmr Feb 8, 2021

The point of Uncle Bob here (as well as in his book) is that whenever there is a boolean parameter, it suggests there should be 2 functions.
But I'm not here to argue about principles, really. If the reviewers prefer it this way, I'm OK with this. Both approaches have pros and cons, but again, in Python you can at least name the boolean parameter when calling the function, so I would still avoid doing the same in C++.


def get_forced_ceiling():
    key = 'MONERO_PARALLEL_TEST_JOBS'
    if key not in os.environ:

A bit easier on the eyes, and saves a dict lookup, to just fetch the key and catch the exception:

try:
    ceiling = os.environ['MONERO_PARALLEL_TEST_JOBS']
except KeyError:
    ceiling = multiprocessing.cpu_count()
else:
    if ceiling == 'nproc':
        ceiling = multiprocessing.cpu_count()
    else:
        ceiling = int(ceiling)

Contributor Author
@mj-xmr mj-xmr Jan 9, 2021

Makes sense. Although I always have this in mind.


Two things:

  1. I find it best not to require the reviewer to read forum links and similar (I'm not much inclined to visit stackexchange), and instead briefly state whatever may be contained therein that you find relevant.
  2. As I can glean the contents of the argument from the URL, I will simply state that most arguments of the form "x pattern is bad" tend to be irrelevant on their own: in general, the pattern exists because it can be useful; like all tools, when improperly used, it can be harmful. So, the question is whether the pattern proves reasonable in a particular context. Here, the exception-based version is quite readable, as it's a standard Python idiom, and the overhead isn't a concern, but, if you don't like exceptions, then you could instead use .get.
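
For reference, the .get-based alternative mentioned above could look roughly like this (a sketch; the env var name and the cpu_count fallback come from the snippets earlier in this thread):

import multiprocessing
import os

# Fall back to the machine's core count when the variable is unset
ceiling = os.environ.get('MONERO_PARALLEL_TEST_JOBS', 'nproc')
if ceiling == 'nproc':
    ceiling = multiprocessing.cpu_count()
else:
    ceiling = int(ceiling)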

Contributor Author
@mj-xmr mj-xmr Feb 8, 2021

The discussion under the link suggests that:

  1. In general, it's not a good approach to use exceptions to control the logic.
  2. In Python, it's actually expected to use them this way, most probably because the runtime overhead of exceptions in Python is much lower than in other languages.

So my thinking is that we shouldn't be doing this in the core implementation, but I will leave it as it is for Python. Providing a link brings more context and more independent voices (and votes for those voices, in the case of StackExchange). I hope that my 2 points above describe the content under the link well enough.

Anyway, for me the PR is ready, and if you have no more comments, I'd ask you to approve.
Cheers.

@mj-xmr
Contributor Author

mj-xmr commented Feb 26, 2021

Important!
Please merge #7387 first.

The reason is that, since the mining test in the functional tests requires some more processing power during operation, it could easily overshoot the timeout given for mining.

[EDIT] #7387 already merged!

@mj-xmr mj-xmr marked this pull request as draft March 30, 2021 19:36
@mj-xmr mj-xmr marked this pull request as ready for review April 4, 2021 16:56
@mj-xmr
Contributor Author

mj-xmr commented Apr 4, 2021

I have run the tests several times; see here, from test number 470. There were two failures on node_server.bind_same_p2p_port, initially during a parallel execution of the unit tests with the functional tests. Even after executing both in single-threaded mode, I got the same error on the same test. The total probability of failure is 10%.


start = timer()
#TODO: set fail_fast=True after mining / RPC random test failures are resolved
ret = run(num_proc_final, fail_fast=False)
Contributor Author
@mj-xmr mj-xmr Apr 4, 2021

@selsta: Since the mining test ("Functional") is now stable, do you think I could set fail_fast to True?

Because of the currently unstable test node_server.bind_same_p2p_port, I think I would leave it at False for the same reason.

Otherwise, the PR is ready to be merged from my POV.

Contributor Author

The above test will be fixed with: #7646

Collaborator

I don't like the concept of fail fast personally. I want to see everything that fails.

Contributor Author

Then I will remove that TODO from line 104.

@mj-xmr mj-xmr force-pushed the parallel-tests branch 2 times, most recently from e6ad243 to 53a3cb1 Compare April 6, 2021 19:16
using tests/run-tests-parallel.py script like:
MONERO_PARALLEL_TEST_JOBS=nproc tests/run-tests-parallel.py
@mj-xmr
Contributor Author

mj-xmr commented Aug 4, 2021

Rebased.
