New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make KVS garbage collection automatic #5840
Conversation
74a20e7
to
9e6a0ff
Compare
Strange: I'm getting a failure here, but I don't see how this PR could have affected this test. Maybe I'm not thinking hard enough.
|
Oops, it looks like this is missing from diff --git a/t/t9000-system.t b/t/t9000-system.t
index e982b8d3b..f81168d37 100755
--- a/t/t9000-system.t
+++ b/t/t9000-system.t
@@ -13,6 +13,8 @@ if test -n "$FLUX_ENABLE_SYSTEM_TESTS"; then
FLUX_TEST_INSTALLED_PATH=${FLUX_TEST_INSTALLED_PATH:-/usr/bin}
fi
fi
+# Append --logfile option if FLUX_TESTS_LOGFILE is set in environment:
+test -n "$FLUX_TESTS_LOGFILE" && set -- "$@" --logfile
. `dirname $0`/sharness.sh
# Do not run system tests by default unless FLUX_ENABLE_SYSTEM_TESTS |
Ah thanks, I'll toss that in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement! LGTM!
@@ -4,7 +4,7 @@ Wants=munge.service | |||
|
|||
[Service] | |||
Type=notify | |||
NotifyAccess=main | |||
NotifyAccess=all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a missing "to" in the commit message "Change to NotifyAccess=all (from =main)..."?
if (sd_notify_flag | ||
&& (keycount % 100 == 0 || keycount < 10)) { | ||
sd_notifyf (0, "EXTEND_TIMEOUT_USEC=%d", 10000000); // 10s | ||
sd_notifyf (0, "STATUS=flux-restore(1) has restored %d keys", keycount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does flux-restore(1)
know the total number of keys being restored? If so it might be useful to supply that here as a kind of progress indicator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't, unfortunately. AFAICT, like tar, libarchive doesn't know the full table of contents until it has streamed the entire archive, which it doesn't do up front because it might be a tape :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Perhaps in the future flux-dump
could write a final item to the archive in a well known path that gives the total number of keys if it becomes an issue.
Actually probably flux dump
is the bigger issue. On a recent system the dump took >6m -- it might be useful for admins to know how far along it is. Not sure if that one's even possible though.
Anyway, this isn't really applicable to this PR, so nevermind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah delaying shutdown for an open ended period of time is really not ok. But I'm not sure what else to do here. I hope that when it's taking too long, the sys admins will peek at systemctl status flux
and see that apparently something is actively happening before they start turning things off.
Well that produced a clue!
Ah the test is right after 0004-recovery.t which restarts the system instance, so perhaps it is now restoring a dump and not properly up when the job is submitted. |
Well that sorted the tests. I probably should add a test for the |
I guess I'll set MWP on this one and likely play with it some more on my test system before we tag. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #5840 +/- ##
==========================================
- Coverage 83.30% 83.27% -0.04%
==========================================
Files 510 510
Lines 82880 82903 +23
==========================================
- Hits 69047 69040 -7
- Misses 13833 13863 +30
|
Problem: processes other than the main one cannot send systemd status updates, but flux-dump(1) would like to do so. Set NotifyAccess=all instead of NotifyAccess=main to allow all members of the service's control group to send updates.
Problem: garbage collecting the KVS is a manual process that is often overlooked. Set content.dump=auto in the systemd unit file, so 'systemctl stop flux' will perform garbage collection.
Problem: when garbage collection is enabled by default in systemd, it may be handy to have a way to avoid it, for example if the system has been garbage collected recently, but it is still expected to take a long time. Add a flux shutdown --skip-gc option to avoid garbage collection.
Problem: if an instance that is configured to dump is shutdown via 'systemctl stop flux', it could run out the shutdown timer and be killed, interrupting the dump. Have flux-dump check for the broker attribute 'broker.sd-notify'. If it is set to something other than zero, then use sd_notify(3) to send systemd EXTEND_TIMEOUT_USEC messages as the dump progresses. In addition, send progress updates that are displayed under "Status:" in 'systemctl status flux'.
Problem: flux-dump(1) now sends updates to systemd via sd_notify() but flux-restore(1), which runs during rc1 does not provide similar updates. Do the same thing in flux-restore(1).
Problem: the --skip-gc option has no documentation. Add it to the man page.
Problem: t9000-system.t doesn't handle FLUX_TESTS_LOGFILE so logs are not shown on failure in CI. Add handling for that environment variable. Co-authoried-by: Mark A. Grondona <mark.grondona@gmail.com>
Problem: the 0004-recovery system test restarts the system instance, which can cause the next test to fail if the instance doesn't enter run state quickly enough. Add a test that waits for the instance to reach run state.
Problem: no tests show that flux shutdown --skip-gc prevent a scheduled dump from happening. Add a test.
This enables KVS garbage collection when flux is started with systemd, and wires
flux-dump(1)
andflux-restore(1)
into sd_notify(3) so that systemd won't kill the dump process if it takes too long, andsystemctl status flux
will show regular updates about how many keys have been dumped/restored.I've tested this on my home system by beefing up the content store and reducing the TimeoutStopSec=90 in the unit file to just a few seconds. Definitely kills flux-dump without this PR, and does not with it.
I probably need to add a little more testing but thought I'd post this as is since dinner just arrived!