Handle Buildbot restarting more robustly #304
Comments
That's correct - I'm not as worried about the builder machines (personally) as we don't change their configuration very often. cc @edunham
A key component of a robust automated solution will likely involve waiting for Buildbot to not have any open jobs. A few questions:
If the latency here is short, I'd look towards a solution that integrates the waiting time into the highstate sequence. If the latency is longer (or likely to increase in the future), I'd prefer to do this more asynchronously - the Salt event bus should make this easy to do. Buildbot masters also seem to have a multi-master mode that could help make these transitions more seamless: https://docs.buildbot.net/current/manual/cfg-global.html#multi-master-mode Bonus points if we can rig up a "Buildbot is restarting..." message to be shown via nginx (i.e. also inform nginx of Buildbot up/down times).
Another consideration is that the Ubuntu machines (running Trusty) currently use Upstart for service management, but newer Ubuntu releases use systemd instead. It would be ideal if the chosen solution is init-agnostic, or at least has minimal coupling.
I've had more luck with:
The only issue has been ensuring that it really is run as the correct user, which I think is much easier to do in Salt? This leaves around a process that will do a It usually takes about 45-50 minutes for a given job to complete. Our homu job queue is between empty and 10 items deep at any time, and it's hard to predict when those times are :-) I'm a little afraid of something that takes down homu to let the Buildbot job end, because homu also handles all of the other queues on our other servo org repos. Does that sound reasonable? I do think it doesn't play well with Upstart/systemd, though.
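To put rough numbers on the latency question, here is a back-of-the-envelope sketch using the figures above (45-50 minutes per job, queue depth 0 to 10). It assumes queued jobs run one at a time and that nothing new is queued while waiting; both are assumptions for illustration, not something confirmed in this thread, and `drain_time_minutes` is a hypothetical helper.

```python
def drain_time_minutes(queue_depth: int, job_minutes: float) -> float:
    """Minutes until the queue is empty, assuming jobs run serially
    and no new jobs arrive in the meantime (illustrative assumption)."""
    return queue_depth * job_minutes


if __name__ == "__main__":
    # Job duration range and queue depths taken from the comment above.
    for depth in (0, 1, 10):
        low = drain_time_minutes(depth, 45)
        high = drain_time_minutes(depth, 50)
        print(f"queue depth {depth}: {low:.0f}-{high:.0f} minutes")
```

Under those assumptions a full queue takes 450-500 minutes to drain, which is well past what a blocking step in the highstate sequence could reasonably wait for and points towards the asynchronous approach.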
When updating the Buildbot configuration, we need to wait for Buildbot to not be executing any jobs before we can safely restart it.
See discussion in #300.
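The "wait until nothing is running" step could be sketched roughly as below. This is only an illustration: `fetch_builders` is a hypothetical callable that returns a mapping of builder name to a status dict, and the `currentBuilds` / `pendingBuilds` keys are an assumption about what the master's status output looks like, so they would need checking against our Buildbot version.

```python
import time
from typing import Callable, Mapping


def is_idle(builders: Mapping[str, Mapping]) -> bool:
    """Return True when no builder reports a running or pending build.

    The "currentBuilds" and "pendingBuilds" keys are assumed, not verified.
    """
    for status in builders.values():
        if status.get("currentBuilds") or status.get("pendingBuilds"):
            return False
    return True


def wait_until_idle(
    fetch_builders: Callable[[], Mapping[str, Mapping]],
    poll_interval: float = 60.0,
    timeout: float = 4 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until Buildbot is idle; return False if the timeout expires first."""
    deadline = clock() + timeout
    while True:
        if is_idle(fetch_builders()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

A caller would restart the master only when `wait_until_idle(...)` returns `True`. Note that polling alone leaves a window between the last idle check and the restart in which a new job can start, so whatever queues jobs would also need to be paused or told to hold off.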
Apparently there is a way to just reload the Buildbot configuration instead of restarting it, via `SIGHUP` or `buildbot reconfig`, but it's fragile, so I'd prefer not to do that: http://docs.buildbot.net/current/manual/cfg-intro.html?highlight=reconfig#reloading-the-config-file-reconfig
Just to be clear, this is all for the Buildbot master config + service, not the builder machines, yes?