WIP: Automatically restart Buildbot master on configuration change #505
|
I'm not 100% sure about how Upstart works, so we may want to try this manually on |
|
Upstart terminates the process and starts another on restart. The only difference between calling stop then start versus restart is that post-start and pre-stop and such logic is skipped for restart and the job control config is not reread. There is also respawn, which restarts the job automatically when it exits. http://upstart.ubuntu.com/cookbook/#respawn I don't think this patch is correct. What is the behavior you are observing and trying to fix? |
|
Hmm, it does sound like this patch won't work. Essentially, I want to teach Upstart how to restart Buildbot gracefully, so instead of running the steps at https://github.com/servo/servo/wiki/Buildbot-administration#dealing-with-troubles manually, we can always do Since I don't see us upgrading Buildbot soon, most of the time we only need to reload the configuration, so another option would be cleanly reloading Buildbot. Maybe using |
|
There are other job types you could set up for one off things like this. What does --clean do anyway? |
|
Buildbot's The main point is to 1) avoid dropping builds 2) require as little admin effort as possible to update the Buildbot config (if you force-kill it it has to be cleaned up by hand to restart). |
347a310 to 0330c52
0330c52 to 9be76f2
|
I looked into it more deeply and it seems Buildbot's reloading functionality has a lot of gotchas and edge cases it doesn't handle well. I also spent some more time with the Upstart Cookbook and I think I found another approach that may work - @metajack, does this look like a reasonable idea? (Still WIP.) |
|
Looks good, though I'm confused where the value of |
|
The Earlier I was looking into making it work without specifying a reason, but it looks like that causes other problems, so I think I need to add another Upstart job to start an instance at boot-up. |
|
Oh, I see it now. r=me if you think this is ready to go. |
fb5aa02 to 6c90434
|
First commit now fixes #552, and I included a bunch more changes too beyond the original PR. This should really make Buildbot much nicer and automated! r? @larsbergstrom or @edunham on the Salt/Buildbot changes Also, let me know what docs I should add with this PR! Most things should be handled automatically by Salt now. I highly recommend trying this out in Vagrant with the
FYI, strange output is due to saltstack/salt#35989 and can be fixed by a Salt update. |
For some reason (likely due to our "graceful" restarts of Buildbot), when Buildbot is restarted after its configuration is updated, it does not read the updated `.py` files but instead uses the existing compiled bytecode (`.pyc`) files, which reflect an older version and are not up to date. These bytecode files are also not regenerated, meaning the new configuration does not (fully) take effect.

To work around this, disable bytecode caching for the Buildbot master entirely to avoid using out-of-date bytecode.

saltfs-migration: Delete all `.pyc` files (recursively) in the Buildbot master directory (/home/servo/buildbot/master).
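The migration step above (recursively deleting stale `.pyc` files) can be sketched in Python. This is only an illustration, not the script used in this PR: the function name `delete_pyc_files` is hypothetical, and the only path assumed is the master directory named in the commit message.

```python
from pathlib import Path


def delete_pyc_files(root):
    """Recursively delete every .pyc file under root and return the removed paths."""
    removed = []
    for path in sorted(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()  # remove the stale compiled bytecode file
            removed.append(path)
    return removed
```

Calling `delete_pyc_files("/home/servo/buildbot/master")` would cover the directory mentioned above. Separately, Python stops writing new `.pyc` files when the `PYTHONDONTWRITEBYTECODE` environment variable is set (or `sys.dont_write_bytecode` is true); whether this PR uses that mechanism is not stated here.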
When upgrading the Buildbot master version, or starting from a fresh deploy (no existing database), the Buildbot database must be upgraded in order for Buildbot to start normally. (The DB is usually created during `buildbot create-master`, but we want to avoid checking in the database.) Add a `cmd.run` state that only runs if the Buildbot version is changed, using an `onchanges` requisite.

The upgrade script requires that Buildbot is not running, presumably to avoid conflicting updates, so Buildbot must be stopped before running the upgrade. We want to perform a clean stop, but the built-in `buildbot stop` command is nonblocking and will return immediately, without waiting for the existing Buildbot instance to finish. Buildbot has blocking/waiting for stop functionality built-in but not exposed, so add a small helper script to stop Buildbot and block until it is finished shutting down, and invoke it before the upgrade.

An alternative would be trying to use Upstart or Salt directly to stop Buildbot, as a clean shutdown boils down to sending a SIGUSR1 to the Buildbot process (only if one is running), in the Unix tradition. However, this would be hard to integrate with; in particular, we need to wait for the existing Buildbot process to finish running; the easiest way to integrate this into a Salt state (without writing a custom Salt state) is to start a process to do the waiting, hence the stop script.

Note that the upgrade-master command also adds various other cruft to the master directory; the Buildbot internal upgradeDatabase API is not called because it is layered in Twisted Reactor/inline callback goop, and it is simpler to just call the CLI command.

Also update the Buildbot master states to be more strict about using requisites for better ordering control, and re-order/space out states for a better reading flow.
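The "signal, then wait" idea described above can be sketched as follows. This is not the helper script added in this PR (which, per the commit message, reuses Buildbot's own unexposed blocking stop logic); it is a simplified illustration of the SIGUSR1 alternative. The function names are hypothetical, and reading the PID from a PID file in the master directory is an assumption.

```python
import errno
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except OSError as exc:
        # EPERM means the process exists but is owned by another user.
        return exc.errno == errno.EPERM
    return True


def wait_until_gone(pid, is_alive=pid_alive, poll_interval=1.0, timeout=None):
    """Block until is_alive(pid) is False; return False if the timeout expires first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while is_alive(pid):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def clean_stop(pid_file, poll_interval=1.0, timeout=None):
    """Send SIGUSR1 to the PID stored in pid_file (an assumed location), then wait."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return True  # no usable PID file: treat as "nothing is running"
    try:
        os.kill(pid, signal.SIGUSR1)  # request a clean shutdown
    except ProcessLookupError:
        return True  # the process has already exited
    return wait_until_gone(pid, poll_interval=poll_interval, timeout=timeout)
```

Because `clean_stop` only returns once the process is gone (or the optional timeout expires), a state that runs it before the database upgrade gets the blocking behaviour that the nonblocking `buildbot stop` lacks. Polling with signal 0 works when the waiting process is not the parent of the process being stopped; a parent would see its exited child as a zombie until it reaps it.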
Buildbot comes with built-in functionality for clean restarts, which entail starting a new Buildbot process that does the following:

- Cleanly shut down the existing buildmaster (existing instance) by waiting for pending builds to be finished, ending the existing process.
- Start a new buildmaster in the new process, taking over.

Note that the new process becomes the new daemon, and thus needs to linger/be kept alive and managed.

Upstart's built-in restart functionality hard-kills the existing process, which is undesirable; it's also hard to make Upstart wait for pending builds before stopping the existing process, as only Buildbot knows about any pending builds. Additionally, Buildbot has a limited reload functionality, but there are many pitfalls, gotchas, and inconsistencies, and it is not recommended for customized installations like ours. (e.g. re-loading imports doesn't work.)

Note that when a Buildbot clean restart is requested, there are multiple processes running simultaneously. Model this in Upstart by using an "instance" job, which makes it easy to queue restarts and monitor them, without having to leave processes running in e.g. tmux or screen.

Add an additional Upstart task to spawn a single instance of the Buildbot master service at boot-up, or during a Salt deploy if no instances are alive. More instances of the buildbot-master job can be started manually, or automatically by Salt, to queue up restarts. (The original process will exit gracefully after some time.) All instances will stop gracefully on shutdown.

Note that the switch to an instance job means each separate instance must be differentiated by a `reason` variable, which is not used for any other purpose; this is automatically set to the current date/time, suffixed with some metadata about the reason for the instance. This has a side effect of creating separate Upstart logs (in `/var/log/upstart`) for each instance, but the choice of reason keeps the logs sorted by date.
Finally, automate this by having Salt queue a new restart job instance if anything about the Buildbot master configuration (or package) changes.
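To illustrate why a date-prefixed `reason` keeps the per-instance logs sorted by date, here is a small Python sketch. The exact format used by this PR is not shown in the commit message, so the timestamp layout, the suffixes, and the name `make_reason` are all hypothetical.

```python
from datetime import datetime


def make_reason(suffix, now=None):
    """Build an instance label: a sortable timestamp followed by a short suffix."""
    now = now or datetime.now()
    # Zero-padded, most-significant-first fields make lexicographic order
    # match chronological order.
    return "{}-{}".format(now.strftime("%Y-%m-%d_%H-%M-%S"), suffix)
```

A value like this could then be passed when starting a new instance, for example with `start buildbot-master reason=<value>`, since Upstart accepts `KEY=VALUE` pairs on `start` for instance jobs.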
6c90434 to 0b69a00
|
Also, there is some manual migration required: Make sure to delete the existing |
|
|
aneeshusa commented Oct 6, 2016 (edited by larsbergstrom)
This should allow us to use Upstart to restart Buildbot cleanly, instead of having to `su` to servo and restart Buildbot manually.