
Speed up 25-cache-service.t with a shorter worker timeout #4362

Merged (1 commit) Nov 17, 2021

Conversation

@kraih (Member) commented Nov 16, 2021:

It turns out the test spends most of its time in a blocking SQL loop in Minion::Backend::SQLite.

Before:

All tests successful.
Files=1, Tests=23, 103 wallclock secs ( 0.07 usr  0.01 sys + 15.86 cusr  1.72 csys = 17.66 CPU)
Result: PASS

After:

All tests successful.
Files=1, Tests=23, 51 wallclock secs ( 0.05 usr  0.01 sys +  5.39 cusr  0.97 csys =  6.42 CPU)
Result: PASS

Progress: https://progress.opensuse.org/issues/102221

@@ -93,7 +93,7 @@ sub cache_minion_worker {
         require OpenQA::CacheService;
         local $ENV{MOJO_MODE} = 'test';
         note('Starting cache minion worker');
-        OpenQA::CacheService::run(qw(run));
+        OpenQA::CacheService::run(qw(run --dequeue-timeout 1));
Member commented on the change:
Wow, what is that? Can we speed up everything by a factor of 10 by making this 0.1? :) You could go into more detail in the git commit message. That would be nice.

@kraih (Member, Author) replied Nov 16, 2021:

Unfortunately there's a point where performance gets worse again (implementation detail upstream...). 😁

The Minion worker is designed as a daemon with a blocking mainloop that primarily blocks while waiting for new jobs to be dequeued. By default Minion::Backend::SQLite blocks for 5 seconds before handing control back to Minion::Worker, which does housekeeping (like reaping failed jobs) before blocking on dequeue again. This option limits the blocking time to 1 second, so our workers can stop 4 seconds earlier on average, and this test stops a lot of workers.
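The loop described above can be sketched as follows (a conceptual illustration in Python, not Minion's actual Perl implementation; the function and argument names are made up):

```python
def worker_mainloop(dequeue, should_stop, dequeue_timeout=5):
    """Conceptual sketch of the worker loop described above (NOT
    Minion's real code): dequeue() blocks for up to dequeue_timeout
    seconds, and only after it returns does the worker do its
    housekeeping and re-check whether it should stop."""
    handled = 0
    while not should_stop():
        job = dequeue(dequeue_timeout)  # blocks up to dequeue_timeout seconds
        if job is not None:
            job()  # perform the dequeued job
            handled += 1
        # housekeeping (reaping failed jobs and so on) would happen here
    return handled
```

Because the stop flag is only re-checked between dequeue calls, lowering the timeout bounds how long a stopping worker can linger inside the blocking call.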

@kraih (Member, Author) replied:

A timeout of 0 might work, but it's a little risky since the mainloop could start eating a lot of CPU. 🤔

@kraih (Member, Author) replied:

Yes, 0 is "a little" better.

All tests successful.
Files=1, Tests=23, 32 wallclock secs ( 0.04 usr  0.01 sys + 21.66 cusr  3.95 csys = 25.66 CPU)
Result: PASS

@kraih (Member, Author) replied:

But at the cost of almost 5 times the CPU cycles. I think 1 strikes the best balance between runtime and CPU cost.

@kraih (Member, Author) replied Nov 16, 2021:

The --dequeue-timeout option itself comes from Minion::Command::minion::worker, which ::run() calls indirectly.
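For illustration only, here is that mechanism sketched in Python with argparse (Minion's actual option handling lives in the Perl module Minion::Command::minion::worker; the names below are stand-ins): the worker command parses --dequeue-timeout, defaulting to 5 seconds, and the value becomes the time the backend is allowed to block while waiting for a job.

```python
import argparse

def parse_worker_args(argv):
    # Stand-in for the real Perl option handling: turn the
    # --dequeue-timeout CLI flag into the number of seconds the
    # backend may block while waiting for the next job.
    parser = argparse.ArgumentParser(prog="worker")
    parser.add_argument("--dequeue-timeout", type=int, default=5,
                        help="seconds the backend blocks waiting for a job")
    return parser.parse_args(argv)

args = parse_worker_args(["--dequeue-timeout", "1"])
# args.dequeue_timeout is now 1 instead of the default 5
```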

@codecov (bot) commented Nov 16, 2021:

Codecov Report

Merging #4362 (7900679) into master (bd5bb76) will not change coverage.
The diff coverage is 50.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4362   +/-   ##
=======================================
  Coverage   97.94%   97.94%           
=======================================
  Files         371      371           
  Lines       33705    33705           
=======================================
  Hits        33012    33012           
  Misses        693      693           
Impacted Files               Coverage            Δ
t/lib/OpenQA/Test/Utils.pm   82.26% <0.00%>      (ø)
t/25-cache-service.t         100.00% <100.00%>   (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bd5bb76...7900679.

@kraih (Member, Author) commented Nov 16, 2021:

As a bonus, full-stack.t gets a little faster too.

Before:

All tests successful.
Files=1, Tests=48, 170 wallclock secs ( 0.32 usr  0.00 sys + 52.77 cusr  2.73 csys = 55.82 CPU)
Result: PASS

After:

All tests successful.
Files=1, Tests=48, 147 wallclock secs ( 0.23 usr  0.04 sys + 52.23 cusr  2.84 csys = 55.34 CPU)
Result: PASS

@kraih (Member, Author) commented Nov 16, 2021:

Setting the timeout to 0 would improve the runtime a little more, but at the cost of CPU time, as this full-stack.t result illustrates:

All tests successful.
Files=1, Tests=48, 146 wallclock secs ( 0.22 usr  0.01 sys + 131.33 cusr 17.02 csys = 148.58 CPU)
Result: PASS

@okurz (Member) commented Nov 16, 2021:

That's very impressive and a great side effect. Still, I would appreciate it if you could put these details into the git commit message:

The `--dequeue-timeout` option comes from
`Minion::Command::minion::worker` and is used to decide how long the
Minion backend will block when waiting for the next job to be dequeued.
By reducing it from 5 seconds to 1 second we can speed up the worker
shutdown in our tests.

While it is possible to reduce the value further to 0 seconds, and
speed up tests a little more, that also has the negative side effect of
increasing CPU usage very significantly. So it is probably not worth
doing.
@kraih (Member, Author) commented Nov 17, 2021:

OK, I've added some more information to the commit message.

@kraih kraih merged commit 2b7d7cc into master Nov 17, 2021
@Martchus Martchus deleted the k/faster_cache_test branch November 17, 2021 16:35