Run tests in parallel, whenever possible #7283
base: master
Conversation
tests/run-tests-parallel.sh
Outdated
# Run the excluded tests first, then everything but excluded, and if any of these fail,
# then fall back to the original single-process version
time ( \
ctest -j2 -R "${TESTS_EXCLUDED_REGEX}" && \
A fixed number of jobs here seems somewhat undesirable; it would be nice if the script picked a suitable default (based on nproc) and allowed a user to override it.
I was thinking about it. For the first two tests, more than 2 cores obviously doesn't make sense.
For the second part, adding more cores will actually only prolong the test time. It can't currently get any lower, since core_tests take much more time than all the other tests combined, so already with 2 cores you experience thread starvation because of the long runtime of the core tests. Adding more cores at this point (say nproc, as an extreme) will only make the longest test, core_tests, run longer: the CPU cache is used up quicker at the beginning, and once the other tests finish sooner, the cache will not be used any better by core_tests.
So adding more cores will only help once the core tests are split into many more independent pieces, which I will be doing later; it's been contracted by my CCS proposal. Only after I do that will it be reasonable to increase the number of cores, or let the user select it.
A general remark to everybody: please don't treat my changes as the final state, but just as an iteration, in order to keep the PRs small (but still mergeable). A 20% gain for such a small change is in my opinion worth it. Plus, testing time is currently the largest bottleneck of the CI.
If certain tests should never run in parallel, and others have a maximum parallelism factor, then you can (and should) limit the maximum number of runners based on the tests (i.e. nproc is an upper bound). The point is that using two runners on a single-core setup actively makes things worse, which means some kind of variability is important. Once you accept that premise, it's little extra work (a single branch) to permit users to decide the upper limit, rather than defaulting to nproc (and this feature seems important, otherwise you're potentially making things worse for people who have multiple cores but don't want to dedicate more than one to the build).
A general remark to everybody: please don't treat my changes as the final state, but just an iteration, in order to keep the PRs small (but still mergeable)
No PR is ever a "final state", the question is whether the change makes sense enough on its own. Here, for example, everyone gets stuck with -j2, even if they only have a single core to run the tests. One could argue that is a real step backwards for many users.
To this I fully agree. I should take the minimum of 2 and nproc, or even of 2 and (nproc - 1), since a single-core setup with a virtual core won't make it any faster either, except maybe when core_tests are blocked by IO or network.
So would this logic be OK?
num_cores = min(2, nproc - 1)
if (num_cores < 1)
{ // in case we're dealing with a single-core CPU without a virtual core
    num_cores = 1
}
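The same clamping logic, as a runnable Python sketch (assuming `multiprocessing.cpu_count()` stands in for nproc; the ceiling of 2 comes from the discussion above, and the function name is mine):

```python
import multiprocessing

def pick_num_jobs(nproc=None, ceiling=2):
    """Pick a ctest job count: at most `ceiling`, leaving one core free,
    but never fewer than one job (single-core CPU without a virtual core)."""
    if nproc is None:
        nproc = multiprocessing.cpu_count()
    return max(1, min(ceiling, nproc - 1))
```

For example, `pick_num_jobs(nproc=1)` and `pick_num_jobs(nproc=2)` both yield 1, while `pick_num_jobs(nproc=8)` yields 2.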
Seems OK.
In the end, I've decided not to pretend that a single-core machine has 2 logical cores by default (re: nproc - 1). If it has, then as said before, the core tests have some IO/network blocking moments when other tests may run in the virtual thread. If this is not wanted for a single physical core with 1 virtual core, then I let the user preset this with the environment variable.
tests/run-tests-parallel.sh
Outdated
ctest -j2 -R "${TESTS_EXCLUDED_REGEX}" && \
ctest -j2 -E "${TESTS_EXCLUDED_REGEX}" \
) \
|| ctest
As best as I can tell, ctest returns a non-zero exit status if a test fails, meaning this || ends up double-running the test suite on failure. Even if it worked right, I'm not sure it's a good idea, as it seems prudent to expose test runner issues to users so they can be fixed (e.g. if users can specify a parallelism override, then a failure can be manually overridden as needed and hopefully addressed at some juncture, rather than papered over by this wrapper).
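The `&& ... || ctest` shell control flow can be mirrored in Python to make the double run explicit; `run_with_fallback` and the command lists are hypothetical names for illustration, not part of the PR:

```python
import subprocess

def run_with_fallback(parallel_cmds, fallback_cmd):
    """Run the parallel stages in order; if any stage fails, rerun the
    whole suite with the serial fallback command (i.e. a double run)."""
    for cmd in parallel_cmds:
        if subprocess.call(cmd) != 0:
            # One parallel stage failed: fall back to a single serial run,
            # which re-executes every test, including those that passed.
            return subprocess.call(fallback_cmd)
    return 0
```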
You understand it correctly.
I'm OK with removing the "|| ctest", but I'd like to hear a third opinion. My rationale is that it would prevent failures of the parallelism itself. Imagine that somebody adds a new test which, like the 2 excluded ones, is incompatible with the core tests. To fix the CI, you'd need to call the maintainer. Otherwise, with "|| ctest", you could just get on with your high-priority task and wait until somebody fixes the parallelism, independently of your task.
And one more remark.
To specify a parallelism override, it's enough for the user just to call ctest in the build directory.
So maybe it would make sense to do the following two moves at the same time:
- remove the || ctest from the bash script and make it parallel-only, with an adjustable number of cores in the future, once it makes sense.
- move the || ctest into the Makefile, where the runner script is now being called, so that the CI doesn't suffer from sudden breaks of the script.
move the || ctest into the Makefile, after the runner script is now being called, so that the CI doesn't suffer from sudden breaks of the script.
I had thought to suggest this in my earlier comment. But, I see some other options:
- Add a Makefile target specifically for CI.
- Have CI run || ctest, rather than including this in the Makefile, as it's really something an ordinary user of the Makefile doesn't typically want or need.
That said, I don't agree with the premise of CI parallelism blocking progress. If a commit causes CI to start failing, that commit should fix the problem (e.g. by not running the new tests in parallel). Otherwise, you'll end up simply never running anything in parallel, because someone will break build parallelism, hit the || and not notice the breakage, and master will continue with serial builds. Either the parallelism should be useful and minimally invasive to contributors, such that it's worth their time to fix parallelism-induced issues, or, it shouldn't exist at all.
Alright.
It would be nice if this were not the default. It's really annoying when people screw with parallel settings inside makefiles.
Seems like we could enable this for CI, but yes, probably best if it isn't on by default.
So I'll make a separate target for CI and "make" it from the Workflows. Should the new target go into a separate Makefile as well? I'd rather keep it in the original Makefile, as this will spare a lot of duplication.
Per my earlier comments, you could also just use an env var to enable parallel builds, which is how cmake itself does things (default to off, have CI set the var and pick it up in the test build script). Edit: I should add that you don't even need to introduce an extra boolean env var, you could just make use of one which controls the parallelism ceiling (e.g. the CI setup can do
Force-pushed 020f9b7 to 739facf (compare)
Makefile
Outdated
@@ -98,6 +98,10 @@ release-test:
	mkdir -p $(builddir)/release
	cd $(builddir)/release && cmake -D BUILD_TESTS=ON -D CMAKE_BUILD_TYPE=release $(topdir) && $(MAKE) && $(MAKE) test

release-test-ci:
	mkdir -p $(builddir)/release
	cd $(builddir)/release && cmake -D BUILD_TESTS=ON -D CMAKE_BUILD_TYPE=release -D BUILD_SHARED_LIBS=ON $(topdir) && $(MAKE) && MONERO_PARALLEL_TEST_JOBS=nproc $(topdir)/tests/run-tests-parallel.sh
Note that I smuggled in -D BUILD_SHARED_LIBS=ON, in order to leverage faster linkage and lower disk space usage for the release version, which is static by default.
The new CI target has a default of nproc preset as an env var, and the runner script also defaults to nproc if the env var isn't set. It then takes the minimum of nproc (or the forced setting) and the currently reasonable value of 2 jobs.
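A sketch of that resolution order (the env var name MONERO_PARALLEL_TEST_JOBS, the 'nproc' sentinel, and the ceiling of 2 are from the PR; the function name is illustrative):

```python
import multiprocessing
import os

def resolve_jobs(reasonable_max=2, env_key='MONERO_PARALLEL_TEST_JOBS'):
    """Resolve the final job count: the env var (or its 'nproc' sentinel)
    forces a ceiling, otherwise nproc is used; the result is then capped
    at the currently reasonable maximum of parallel jobs."""
    raw = os.environ.get(env_key)
    if raw is None or raw == 'nproc':
        ceiling = multiprocessing.cpu_count()
    else:
        ceiling = int(raw)
    return max(1, min(reasonable_max, ceiling))
```

So a CI box exporting MONERO_PARALLEL_TEST_JOBS=nproc still ends up with at most 2 jobs until the reasonable maximum is raised.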
Force-pushed 91f162f to 739facf (compare)
The RPC tests are repeatedly failing on the mining part, as usual, but this proves the usefulness of the separation: the CI delivers a failure within the first 40 minutes, as opposed to having to wait 1.5 hours for a possibly negative result.
Thanks. I've made something in-between.
Force-pushed 6a85e23 to d1b6fe3 (compare)
Thanks for the changes, overall this looks pretty good to me.
tests/run-tests-parallel.sh
Outdated
NUM_PROC_MAX_REASONABLE=2 # This might be increased, once the core_tests are divided into many more independent pieces or simply sped up
TESTS_EXCLUDED_REGEX="unit_tests|functional_tests_rpc" # These tests collide with core_tests

if [ -z $MONERO_PARALLEL_TEST_JOBS ]; then
If you're using bash here, might as well switch to [[; that way you skip an invocation of the test program.
I didn't know this. I've had formal training in OO languages, but Bash was always just a hobby.
I don't think it's a good idea to assume that bash is being used. Should always target vanilla sh.
The logic got much more complex than in the initial state, so if I have to switch to sh, I will prefer to do it in Python 3 after all. The majority of the time was consumed by gathering requirements anyway.
Thanks for the reasonable comments to all 3 of you.
I did not follow the whole discussion but currently it only seems to run
Is this intentional? Ideally you would want to run all tests to see which ones fail.
This is intentional, but can be changed to run all, even though the first two typically fail. My philosophy is that if the tests most likely to fail do fail, you'll have to fix them first and rerun the whole thing anyway, just as you'd fail a compilation on the first error rather than wait the remaining time for the feedback. In this case, you'd have to wait an additional hour to learn that your build has failed. If the other, high-level tests do fail afterwards, the proper way to handle this is to write a unit test that simulates exactly this failure, without having to allocate all the resources; as a unit test, it will expose the weakness in a much shorter time than the smoke test does. If this doesn't convince you, I can run them all. Any other opinions on this one? @moneromooo-monero @iamamyth ?
The problem is the false positives we have (mining, p2p functional tests). As long as we don't fix these, it does not make sense to fail early.
If this is how you see it, then I'll run them all, alright!
Reimplemented in Python without much trouble, gaining better readability.
tests/run-tests-parallel.py
Outdated
def run_all():
    # Run all of the tests, no matter if the first part fails
    ctest1 = get_ctest_command(num_proc_final, '-R')
This function looks very much like run_all; they could easily be one function which accepts a fail_fast param; e.g.:
def run(nproc, fail_fast=False):
    cmds = [get_test_command(...), ...]
    error = False
    for cmd in cmds:
        status = run_cmd(cmd)
        if status != 0:
            if fail_fast:
                return status
            error = True
    return ERR_CODE if error else 0
Done, though this principle kept me from doing it initially. In Python, at least, you can name the parameter in a call.
Same thoughts as in the below case, but I will merely add, the function here does one thing: run the test commands and return a status code; it just may not run all the commands, depending on the value of fail_fast. Anyways, this is a great example of principle-vs-principle being unhelpful: "DRY" vs. "boolean arguments are bad" gets you nowhere. The better questions to ask might be: "does this function make sense as a concept?"; "can I easily describe, e.g. in form of a comment, what this function does?"; "is the internal complexity of the function incredibly high, such that it needs an alternate composition?"; etc. The fact that the shell execution environment, which you ended up replacing, provides an option to bail on command failure, strongly suggests that the complexity incurred by such a strategy is in no way unreasonable.
The point of Uncle Bob here (as well as in his book) is that whenever there is a boolean parameter, it suggests there should be 2 functions.
But I'm not here to argue about principles, really. If the reviewers prefer it this way, I'm OK with this. Both approaches have pros and cons, but again, in Python at least you can name the boolean parameter when calling the function, so I would still avoid doing the same in C++.
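For contrast, the two-function variant that rule leads to can still share a private helper, so DRY survives; this is a hypothetical sketch, with plain callables standing in for the ctest commands:

```python
def _run(cmds, fail_fast):
    """Run each command callable; non-zero status means failure."""
    error = False
    for cmd in cmds:
        status = cmd()
        if status != 0:
            if fail_fast:
                return status
            error = True
    return 1 if error else 0

def run_fail_fast(cmds):
    """Stop at the first failing command."""
    return _run(cmds, fail_fast=True)

def run_all(cmds):
    """Run every command, reporting any failure at the end."""
    return _run(cmds, fail_fast=False)
```

Callers then pick a name instead of a boolean, while the looping logic lives in one place.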
tests/run-tests-parallel.py
Outdated
def get_forced_ceiling():
    key = 'MONERO_PARALLEL_TEST_JOBS'
    if key not in os.environ:
A bit easier on the eyes, and it saves a dict lookup, to just fetch the key and catch the exception:
try:
    ceiling = os.environ['MONERO_PARALLEL_TEST_JOBS']
except KeyError:
    ceiling = multiprocessing.cpu_count()
else:
    if ceiling == 'nproc':
        ceiling = multiprocessing.cpu_count()
    else:
        ceiling = int(ceiling)
Makes sense, although I always have this in mind.
Two things:
- I find it best not to require the reviewer to read forum links and similar (I'm not much inclined to visit stackexchange), and instead briefly state whatever may be contained therein that you find relevant.
- As I can glean the contents of the argument from the URL, I will simply state that most arguments of the form "x pattern is bad" tend to be irrelevant on their own: In general, the pattern exists because it can be useful; like all tools, when improperly used, it can be harmful. So, the question is, whether the pattern proves reasonable in a particular context. Here, the exception-based version is quite readable, as it's a standard python idiom, and the overhead isn't a concern, but, if you don't like exceptions, then you could instead use .get.
The discussion under the link suggests that:
- In general, it's not a good approach to use exceptions for controlling logic.
- In Python, it's actually expected to use them this way, most probably because the runtime overhead of exceptions in Python is much lower than in other languages.
So my thinking is that we shouldn't be doing this for the core implementation, but I will leave it as it is for Python. Providing a link brings more context and more independent voices (and votes for the voices, in the case of StackExchange). I hope that my 2 points above describe the content under the link well enough.
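For completeness, the .get-based variant mentioned above would look roughly like this (same env var and 'nproc' sentinel as in the PR; the exact body is my sketch):

```python
import multiprocessing
import os

def get_forced_ceiling():
    """Read the job ceiling without exception-based control flow:
    a missing variable and the 'nproc' sentinel both map to the CPU count."""
    raw = os.environ.get('MONERO_PARALLEL_TEST_JOBS', 'nproc')
    if raw == 'nproc':
        return multiprocessing.cpu_count()
    return int(raw)
```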
Anyway, for me the PR is ready, and if you have no more comments, I'd ask you to approve.
Cheers.
I have run the tests several times, see here from test number 470. There were two failures on
start = timer()
#TODO: set fail_fast=True after mining / RPC random test failures are resolved
ret = run(num_proc_final, fail_fast=False)
@selsta: Since the mining test ("Functional") is now stable, do you think I could set fail_fast to True?
Because of the currently unstable test node_server.bind_same_p2p_port, I think I would leave it at False, for the same reason.
Otherwise, the PR is ready to be merged from my POV.
The above test will be fixed with: #7646
I don't like the concept of fail fast personally. I want to see everything that fails.
Then I will remove that TODO from line 104.
Force-pushed e6ad243 to 53a3cb1 (compare)
using tests/run-tests-parallel.py script like: MONERO_PARALLEL_TEST_JOBS=nproc tests/run-tests-parallel.py
Rebased.
using the tests/run-tests-parallel.sh script. With the current simplistic approach I was able to achieve a 20% speedup. This new, small test execution framework will allow for further improvements in the future, by separating conflicting portions of code (or simply by selecting non-conflicting ports for monerod).
The speedup was calculated by summing up the time taken by tests that ran while other, longer tests were running (dt), and dividing it by the sum (total run time + dt).
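That calculation can be written down as a tiny helper (the numbers in the usage note are illustrative, not the PR's actual measurements):

```python
def parallel_speedup(parallel_run_time, dt):
    """Fraction of time saved: dt is the summed runtime of tests that
    overlapped with longer tests; the equivalent serial time would be
    the parallel run time plus dt."""
    return dt / (parallel_run_time + dt)
```

For example, 20 minutes of overlapped tests on top of an 80-minute parallel run gives parallel_speedup(80, 20) == 0.2, i.e. a 20% saving like the one quoted above.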
The test run script has a fuse: if the tests fail for any reason, they are rerun using a single core. Note that it's the tests that fail most often (RPC tests failing randomly) and quickest (unit tests) that are run at the beginning, giving you a hint early enough.
Work time: ~6.7h plus a day of passive tests while doing something else.