Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Adaptive mining timeout, depending on available CPU power #7387

Merged

Conversation

mj-xmr
Copy link
Contributor

@mj-xmr mj-xmr commented Feb 19, 2021

Intro

A more serious approach to determining the appropriate timeout for the mining test in functional_tests_rpc. The timeouts are calculated separately for RX init and for the mining of subsequent blocks themselves. Printing also available RAM, in case the RX init went wrong, which needs above 3 GB of RAM. Added comprehensive description.

More details:

As per the discussion with @iamamyth in #7371 , the adjustment is done by testing the time of generation of N next digits of Pi - once for a single core (mining use case) and once for as many cores as there are available (RX init). This test helps orientating in the currently available CPU processing power, even in the event of:

  • other jobs taking up the virtual resources at the same time
  • ctest being ran in parallel (with -j$(nproc))

The obtained measurements serve as multipliers for the assumed timeout constants. For the case of RX init, the number of cores serves additionally as a divisor of the timeout constant, since availability of more cores decreases the init time (ideally) linearly.

Example runs

(see the bottom of the tests log):
-1- -2- -3- -4-

Example log:

6: [TEST STARTED] mining
6: Resetting blockchain
6: Creating wallet

6: Test mining via daemon
6: Available RAM = 17.0 GB
6: Measuring the currently available CPU power...
6: Time taken to calculate 10000 Pi digits on 4 core(s) was 2.03 s.
6: Measuring the currently available CPU power...
6: Time taken to calculate 10000 Pi digits on 1 core(s) was 1.31 s.
6: Timeout for init, adjusted for the currently available CPU power, is 31 s
6: Timeout for mining, adjusted for the currently available CPU power, is 27 s
6: Time taken for RX init + mining 1st block = 12 s.
6: Time taken for total mining = 12 s.

6: Test mining via wallet
6: Available RAM = 16.8 GB
6: Measuring the currently available CPU power...
6: Time taken to calculate 10000 Pi digits on 4 core(s) was 2.13 s.
6: Measuring the currently available CPU power...
6: Time taken to calculate 10000 Pi digits on 1 core(s) was 1.3 s.
6: Timeout for init, adjusted for the currently available CPU power, is 33 s
6: Timeout for mining, adjusted for the currently available CPU power, is 27 s
6: Time taken for RX init + mining 1st block = 11 s.
6: Time taken for total mining = 23 s.

6: Test submitblock
6: Resetting blockchain
6: Test RandomX
6: Recreating the chain
6: Mining from genesis block again
6: Adding one to reorg
6: [TEST PASSED] mining

Work time: ~15 h

@@ -131,7 +131,7 @@ jobs:
- name: install monero dependencies
run: sudo apt -y install build-essential cmake libboost-all-dev miniupnpc libunbound-dev graphviz doxygen libunwind8-dev pkg-config libssl-dev libzmq3-dev libsodium-dev libhidapi-dev libnorm-dev libusb-1.0-0-dev libpgm-dev libprotobuf-dev protobuf-compiler
- name: install requests
run: pip install requests
run: pip install requests psutil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for RAM readouts

@mj-xmr mj-xmr marked this pull request as ready for review February 19, 2021 18:06

def pi_digits_numeric_yield(digits):
"""Generate x digits of Pi.
https://stackoverflow.com/a/54967370

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using code from stackoverflow is a serious licensing issue. Also, as a practical matter, you can perhaps do something much simpler here, such as a calculation with the decimal or math libraries.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper copyrights actually belong to these gents:

This is implementing a spigot algorithm by A. Sale, D. Saada, S. Rabinowitz, mentioned on http://mail.python.org/pipermail/edu-sig/2012-December/010721.html

https://gist.github.com/deeplook/4947835

If this is not acceptable, I will look for alternatives, but really, Stackoverflow was just the messenger.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course algorithms to compute pi are plentiful in the public domain. The issue here is you've now attributed work to stackoverflow and taking that attribution back would be doubly bad. Plus, using an optimized C-based mathematical function, as found in decimal or math, seems maybe a better strategy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that as long as the PR isn't merged, I don't give nor take anything. That's what a PR is for. Otherwise you create a destructive mental pressure on contributors, making them refrain from doing any contributions.
Besides, a link is not a copyright clause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying that you've broken the project by introducing a PR with an SO link, I'm saying it could be problematic to merge said code, given its history.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'm very happy, that I misunderstood you.

The Decimal solution works equally fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decimal solution isn't right. The same code runs for 2s on my machine and 2000s on the CI machine. I'll try to avoid Decimal.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying the Leibnitz' formula now -> OK!

while height < target_height:
seen_height = height
for _ in range(timeout):
time.sleep(1)
delta = datetime.datetime.now() - start

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct methodology (depends on system clock and timezone), use time.monotonic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

timeout_base_mine = 20
cores_init = multiprocessing.cpu_count() # RX init uses all cores
cores_mine = res_status.threads_count # Mining uses parametric number of cores
timeout_init, timeout_float_init = get_timeout_int_float(time_pi_all_cores, timeout_base_init, cores_init)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would anyone need the value as an int and a float? All the relevant python timing functions use floats.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need int for printing it out to the user and for it to be a valid argument for the for loop a few lines below. I wanted to avoid multiple conversions to int along the way.

Float is obviously used for calculations, in order not to loose on the precision.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop issue can be trivially solved, just use math.ceil (i.e. range(math.ceil(timeout))), or switch to a while time.monotonic() < timeout_time format. As for printing, I don't know what you mean, floats can be printed and formatted arbitrarily, and the precise value seems more useful than a rounded one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Inputs are the number of digits to calculate for each CPU
inp = [num_digits] * cores
# Start measuring the time
start = datetime.datetime.now()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methodology here ends up counting process startup time, which could lead to wildly inaccurate results. A simple workaround would be to have each task time itself and push results onto a Queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does count the process start up time, but if it increases by a large margin, so will an equivalent functionality of the RX init just in the next 2 seconds after this calculation.

Copy link

@iamamyth iamamyth Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, first off, it's not accurate to assume process startup time will be relatively constant over a short timeframe, but second, even if one makes such an assumption, I fail to see how it resolves my concern.

Edit: If you want to also gauge process startup time in order to adjust the init timeout, that would be reasonable, but, in that case, you should do it separately (though I think assuming some fixed value, e.g. 1s, would be fine). Combining two signals into a single benchmark only works when you're using it as a baseline for the superposition, but, here, you're using it as a baseline for both the superposition and one of its constituent factors, which logically doesn't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right from the logical PoV. I would just like to cover the startup up with a calculation long enough, that the process init becomes negligible. I have a small problem with your approach, that it makes a simple thing slightly more complex. I wanted to solve this with averaging the returned processing time of the threads. I only fear, that the time calculation within the threads will be not accurate. Do you think the monotonic timer will help?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think running a benchmark with a broken methodology makes everything more complex, as it's very difficult to reason about a calculation with many different noise sources (are they acceptable? did the code intentionally leave them in play, and if so, why? etc). Using a Queue maybe adds 5-10 LoC and much clarity. As for timing process startup, I proposed just not even bothering and assuming some fixed value, so that's not really a concern, I think (you could easily time it, too, but I don't have a strong opinion on the matter).

Makefile Outdated Show resolved Hide resolved
@@ -131,7 +131,7 @@ jobs:
- name: install monero dependencies
run: sudo apt -y install build-essential cmake libboost-all-dev miniupnpc libunbound-dev graphviz doxygen libunwind8-dev pkg-config libssl-dev libzmq3-dev libsodium-dev libhidapi-dev libnorm-dev libusb-1.0-0-dev libpgm-dev libprotobuf-dev protobuf-compiler
- name: install requests
run: pip install requests
run: pip install requests psutil monotonic

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this library? time.monotonic is a python built-in.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wanted to use time.monotinic(), but the CI crashed. It turned out, that we were using Python2 all the time, hence:
#7396

Either we leave it like it is (Python2 code for now), or we quickly merge the #7396

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the runner in detail, but these files have a shebang line which claims python3 is to be expected, so it's problematic to be using python2. I also note that the files import print_function from __future__, which would imply python2 is in use. Overall, I would try to clean things up to use python3, as python2 is EOL and dying rather quickly (the only reason it's still available, and sometimes the default in certain distros, is because core system components have assumed it, and it takes a while to make all the requisite changes, though I think most distros are quite close and it'll be gone in a year).

@iamamyth
Copy link

I should probably contextualize my comments up to this point, as I think there's a danger they could be misconstrued: Normally system benchmarks would be written in C, so as to minimize overhead and maximize accuracy. However, creating an accurate measure of "system performance" is itself a fraught and difficult exercise, and one that I don't think proves necessary here. Instead, I imagine this PR as introducing a "smoke test" for something having gone very wrong with the mining code (say, a 5x or 10x regression). A more focused, narrowly bounded mining benchmark might be a good idea, but you can't reasonably achieve that result in a noisy neighbor environment. To that end, your bounds in this PR can be quite loose (1.5 or even 2x baseline), so you shouldn't need to worry about python version (if you're seeing > 50% delta based on python version, probably you should use a different test). I still think proper benchmark hygiene (using a sane timing methodology) should be observed, to the extent that it's not terribly difficult and in some ways makes the code both clearer and more useful (a poorly constructed benchmark begs the question, "Was this an oversight, or just intentionally lax in its methodology?" While code comments can answer that question, it's often simpler to avoid asking it in the first place).

@mj-xmr
Copy link
Contributor Author

mj-xmr commented Feb 23, 2021

Still to do for the new C++ timing app: Copyright and documentation.

Done.

decoded = time_calc.decode('utf-8')
try:
miliseconds = int(decoded)
except ValueError as ex:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/exception seems unproductive, if conversion fails it'll throw a ValueError which includes the value and rationale for the failure. Plus, utility functions should generally avoid print, unless their primary purpose is producing output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

by running numerical calculations
"""

from __future__ import print_function

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes no sense in python3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until #7396 is merged, we're stuck with Python2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but I think, perhaps, you've ended up making your life more difficult by splitting commits in this way: Though each commit and PR should achieve a single logical goal, there are cases where a clear dependency exists, and ignoring the dependency therefore introduces extra work and complexity, the very reasons to ordinarily avoid PRs with multiple aims. For example, this PR adds multiple python2-specific pieces of code, which will eventually have to be removed. On top of that, the files all still mark themselves as targeting python3 via the shebang line, which makes life very confusing for the reader. You could have, instead, either branched this PR from the "use python3" PR, or introduced a more limited change which just runs the mining tests with python3, as a single commit which forms part of this multi-commit PR. I don't think it's an issue worthy of further discussion, but certainly something to consider in future work.

def available_ram_gb():
ram_bytes = psutil.virtual_memory().available
kilo = 1024.0
ram_gb = ram_bytes / kilo / kilo / kilo
Copy link

@iamamyth iamamyth Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super important, but this seems rather verbose an unnecessarily slow (three divisions), when you could just do:

return psutil.virtual_memory().available / (1<<30)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used your previous suggestion of kilo ** 3 for better readability.

while height < target_height:
seen_height = height
for _ in range(timeout):
for _ in range(int(math.ceil(timeout))):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int is superfluous, ceil returns an int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only since Python3

timeout_init = calc_timeout(timeout_base_init, time_pi_all_cores, cores_init)
timeout_mine = calc_timeout(timeout_base_mine, time_pi_single_cpu, cores_mine)

msg = "Timeout for {} adjusted for the currently available CPU power, is {} s"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thought here regarding using number formatting rather than rounding the injected value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -71,6 +75,12 @@ def create(self):
def mine(self, via_daemon):
print("Test mining via " + ("daemon" if via_daemon else "wallet"))

time_pi_single_cpu = self.measure_cpu_power_get_time(1)
time_pi_all_cores = self.measure_cpu_power_get_time(multiprocessing.cpu_count())
Copy link

@iamamyth iamamyth Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest putting the core count in a variable, as you use it repeatedly in calculations (technically there is such a variable now, cores_init, used later in this function and not here; to my mind, it'd be clearer to have a single variable, e.g. nproc, used everywhere).

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Regarding nproc, such variable name would suggest the source and quantity of the variable, but not the intention how it's meant to be used.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I will say here is that if you find nproc insufficient to document intent, you can always alias it while still avoiding repeatedly querying the machine's core count.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion was correct, and I query the cpu count only once now (well twice, since the method itself is being called twice)

assert False, 'Failed to mine successor to block %d (initial block = %d)' % (seen_height, prev_height)
timeout = 10
assert False, 'Failed to mine successor to block %d (initial block = %d) after %d s. RX initialized = %r' % (seen_height, prev_height, round(seconds_passed), init_rx)
if not init_rx:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't really need this boolean, can just use seen_height == prev_height.

Copy link
Contributor Author

@mj-xmr mj-xmr Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need it for readability. Same as with documentation and self-documenting code: the goal is not only to write the most minimal code, but code that can be understood by others.

It was hard enough to understand the flow of this test at the first glance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the variable terribly offensive or anything, but it seems you're perhaps papering over the real readability issue, which is, I think, that prev_height is a rather poor name, and should be something along the lines or initial_height. With that change, I think the conditional proves quite clear without an extra variable, but I won't belabor the point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial_height is indeed a much better name for the variable, especially since it's a constant.

Still the rx_inited var is used twice - once in the print function and secondly below the for loop. Using the comparison twice is error-prone IMHO.

BTW. I added an additional printout, which will be executed if one while loop's iteration is not enough.

@mj-xmr
Copy link
Contributor Author

mj-xmr commented Feb 25, 2021

It sounds to me, like we're nearing the end of the review. I'm rebasing and reverting the turnaround time simplifications.

Copy link

@iamamyth iamamyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to open a PR which removes the python2 compat code (montonic module, __future__ imports, etc) once the python3 test runner PR gets merged.

@mj-xmr
Copy link
Contributor Author

mj-xmr commented Feb 27, 2021

Thanks for the review.

Please make sure to open a PR which removes the python2 compat code (montonic module, __future__ imports, etc) once the python3 test runner PR gets merged.

For the record, we're talking about #7396 , which needs to be approved and merged first.

@mj-xmr
Copy link
Contributor Author

mj-xmr commented Mar 14, 2021

A successful ad-hoc run under MacOSX requested by @selsta .

@mj-xmr mj-xmr force-pushed the ci-functional-tests-mining-adaptive branch from 0fcb9d3 to b795a94 Compare March 15, 2021 06:29
@mj-xmr
Copy link
Contributor Author

mj-xmr commented Mar 15, 2021

Just added the Python dependency list to the documentation.

@mj-xmr
Copy link
Contributor Author

mj-xmr commented Mar 18, 2021

As requested by @selsta , I ran the mining test multiple times (19) and each of the runs was successful. This should serve as the ultimate proof, that the concept works fine.

Within the logs of the tests, you're looking for the string:
Test mining via daemon
You will find, that at times, when my CPU power probe needs more time, increasing the timeout, so does the mining process.

@mj-xmr mj-xmr force-pushed the ci-functional-tests-mining-adaptive branch 3 times, most recently from 7e1fb66 to 553ba67 Compare March 23, 2021 07:17
@mj-xmr
Copy link
Contributor Author

mj-xmr commented Mar 23, 2021

Rebased and integrated latest test fixes.

@luigi1111
Copy link
Collaborator

@mj-xmr I went ahead and merged #7396

@mj-xmr
Copy link
Contributor Author

mj-xmr commented Mar 30, 2021

@mj-xmr I went ahead and merged #7396

Got it. Will resolve very soon.
[EDIT]
@luigi1111 : Resolved.

Printing also available RAM. Add comprehensive description.
@mj-xmr mj-xmr force-pushed the ci-functional-tests-mining-adaptive branch from 553ba67 to 45f01f5 Compare March 30, 2021 03:49
@luigi1111 luigi1111 merged commit 9a5579f into monero-project:master Mar 30, 2021
@mj-xmr mj-xmr deleted the ci-functional-tests-mining-adaptive branch March 30, 2021 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants