systemd: add prolog/epilog service units #6040
Conversation
What happens on systems where nodes are shared, can the
I forgot most of what I know about systemd unit file templates, but perhaps if the housekeeping unit file was made into a template, it could be started per job with (Even in the exclusive node case, this would allow a sysadmin to quickly determine for what job a housekeeping script was running.) Just an idea.
If multiple jobs sharing a node start the epilog at once, the first one in will start the unit and the rest will block until that one is done. The script would run with environment variables associated with the first job. Yeah I see your point. This is maybe not the best idea for epilog on those systems (or maybe on any systems) where the prolog and epilog contain matched do/undo actions that apply to a single job. Perhaps this should just be part of the housekeeping PR.
That would solve that problem I think. If we're thinking of housekeeping as doing stuff that is relatively independent of a job then I like the single unit and the idempotency for that. We could go all in with this approach and add prolog@jobid and epilog@jobid units I suppose! :-)
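For illustration, a per-job templated unit along those lines might look roughly like the sketch below. The file name and script path are my assumptions, not anything from this PR; `%i` expands to the instance name, so `systemctl start epilog@<jobid>` starts one instance per job:

```ini
# epilog@.service -- hypothetical sketch of a per-job templated unit
[Unit]
Description=Flux epilog for job %i

[Service]
Type=oneshot
# The instance name (the job id) shows up in "systemctl status epilog@<jobid>"
# and can be handed to the script via the %i specifier.
Environment=FLUX_JOB_ID=%i
ExecStart=/etc/flux/system/epilog
```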
Yeah, I had kind of forgotten about that. However, the risk there is that not all per-job prolog/epilog scripts are guaranteed not to hang or get stuck, and if housekeeping is the only way that partial release can work, then the tendency will be to place even per-job scripts into housekeeping, especially on systems allocating exclusive nodes. If they can't be, then we might be stuck like we are now: holding back all nodes of a job when one node has a stuck epilog.
Well, that is a compelling point. One nice thing I just learned is that both

Maybe I'll just play with templated units a bit and see what I can learn.
Re-pushed with a templated unit file. This doesn't pass the other documented perilog environment variables FLUX_OWNER_USERID, FLUX_JOB_USERID, HOME, and USER, so maybe we need some way to do that. (Aside: those should probably be added to flux-environment(7).)
Pushed another update that reinstates the environment copy via

Would there be interest in having separate units for prolog and epilog? Then the housekeeping unit could just be added later. It seems like the benefit of running these site-provided scripts under systemd is high. I'm pretty sure systemd could reliably stop a fork bomb with its freeze/signal strategy, or in later kernels/systemds the
Yes, that seems like a good idea, though
We should check with the admins - I think currently output from the prolog and epilog is (somehow) forwarded to the management node so that it can be searched in one place. As long as there's still a way to do that, this should be fine.
FYI - Another thought: should these service units perhaps have
If we're encouraging this approach (which I think we should) then there is probably a support advantage in having it work uniformly at all sites rather than let each site cook up their own unit files. Plus providing the unit files reduces the inertia required to get it done. I guess it's obvious but if we implement a "real" node prolog/epilog in place of overloading the job manager perilog, then this approach of having the IMP start a systemd unit could be retained.
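As a rough sketch of what "providing the unit files" might look like from the site's side, the IMP could be pointed at a unit-starting helper through its run-command configuration. The section and key names below are illustrative assumptions from memory, not taken from this PR; check the flux-security documentation for the real schema:

```toml
# Hypothetical imp.toml fragment -- keys and paths are assumptions.
[run.prolog]
allowed-users = [ "flux" ]
allowed-environment = [ "FLUX_*" ]
path = "/etc/flux/system/prolog-start-unit"
```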
This change would make it so that only an indication of failure is logged to
(Hey that log message should include the hostname) The detailed output/errors would go to the journal. Any thoughts on that @ryanday36? I assumed that logs that go to the systemd journal are collectable in ways already familiar to the sys admins (like rsyslog).
Yeah, this process tree starts out as root so that won't work :-( I mean something like this but it kinda sucks to type
Agreed.
Ok, that sounds great. I'm all for this approach. It also gives us a chance to somehow recover the state of prolog/epilog/housekeeping workloads after a restart of the local broker or system instance.
OK, this now adds prolog/epilog service units, not housekeeping. I added an update to the admin guide also.
Just pushed some tweaks to the admin description to fix my poor word choices and also add the
I fixed this up based on what we learned wrapping the housekeeping scripts in a systemd unit.
This LGTM!
I didn't notice the approval - thanks! Setting MWP.
Problem: prolog/epilog scripts should be run in a systemd cgroup for reliable termination, logging to the local systemd journal, and debugging using well-known systemd tools. Add systemd oneshot service units for prolog and epilog, patterned after the housekeeping one, and helper scripts that can be configured as IMP run commands.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6040      +/-   ##
==========================================
- Coverage   83.33%   83.32%   -0.02%
==========================================
  Files         521      521
  Lines       84391    84391
==========================================
- Hits        70325    70316       -9
- Misses      14066    14075       +9
Problem: future housekeeping scripts should be run in a systemd cgroup for reliable termination, logging to the local systemd journal, and debugging using well-known systemd tools.

Add a systemd "oneshot" service unit for housekeeping. The service unit runs a user-provided `housekeeping` script. It is configured so that `systemctl start housekeeping` blocks until the run is complete and its exit code reflects the exit code of the housekeeping script.

Add a helper script that can be configured as an IMP run command. It runs `systemctl start housekeeping` and traps SIGTERM, which can be sent to enforce a timeout. Upon receipt of SIGTERM, it stops the unit and exits with a nonzero code.

To enable environment variables such as FLUX_JOB_ID to be passed into the user-provided housekeeping script via the systemd unit, the helper script dumps its environment into `/run/housekeeping.env`, which is read in by the unit.

Also of note: the user-provided scripts are automatically idempotent when run this way. systemd never starts multiple instances of the unit. If one is running when a start request is received, the second start blocks until the existing run finishes and reports its status.
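The helper described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the PR's actual script: for a self-contained demo, `systemctl` is stubbed with a shell function and the environment file is written under `/tmp`, whereas the real helper would invoke the real systemctl and write `/run/housekeeping.env`.

```shell
#!/bin/sh
# Hypothetical sketch of the IMP run helper -- not the PR's actual script.

systemctl() {
    # Stub for demonstration: record what would have been executed.
    echo "systemctl $*" >> /tmp/systemctl.log
}

# The real helper would use /run/housekeeping.env here.
ENVFILE=/tmp/housekeeping.env

# Dump the environment (e.g. FLUX_JOB_ID) where the unit's
# EnvironmentFile= directive can read it back in.
env > "$ENVFILE"

# SIGTERM can be sent to enforce a timeout: stop the unit, exit nonzero.
trap 'systemctl stop housekeeping; exit 143' TERM

# For a oneshot unit, "systemctl start" blocks until the run completes,
# and its exit status reflects the housekeeping script's own exit status.
systemctl start housekeeping
```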
I split this out to a separate PR because I think it can be used to improve the existing epilog even before housekeeping as proposed in #5818 is available. For example, on my test system, I renamed `/etc/flux/system/epilog` to `housekeeping` and configured the IMP with

and the job manager with

And now my epilog runs in the housekeeping unit. After a job runs, `journalctl -u housekeeping` says

(my epilog just burns cpu for 10s)

`systemctl status housekeeping` provides the usual informative output: