Investigate zero-downtime upgrades #1295

strugee · 2017-03-15T19:11:43Z

It may be possible to perform zero-downtime upgrades for setups with clustering configured to use >1 worker. Essentially, the process would be:

1. Admin sends e.g. SIGUSR1 to the pump master process
2. A cluster worker is selected by the master process and told to shut down
3. The worker stops accepting new connections and finishes its current connections (note this may be tricky because WebSocket connections are long-lived)
4. Worker shuts down
5. Master process starts a new worker as normal
6. Repeat from step 2, selecting a different cluster worker each time, until all workers have been upgraded
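
The steps above can be sketched roughly as follows. This is only an illustration, not the code that eventually landed: `rollingRestart`, `installRestartSignal`, `stopWorker` and `startWorker` are hypothetical names, and the wiring assumes Node's built-in `cluster` module. The sketch defaults to SIGUSR2 because Node.js reserves SIGUSR1 for starting the debugger. Note that `worker.disconnect()` waits for open connections to close, so long-lived WebSocket connections would need to be closed explicitly (or the worker killed after a timeout), which this sketch does not attempt.

```javascript
"use strict";

// Restart workers one at a time so the remaining workers keep serving.
// `stopWorker(id)` should resolve once the old worker has drained and exited;
// `startWorker()` should resolve with the new worker's id once it is ready.
// Both hooks are hypothetical and supplied by the caller.
async function rollingRestart(workerIds, stopWorker, startWorker) {
  if (workerIds.length < 2) {
    throw new Error("rolling restart needs more than one worker");
  }
  const replacements = [];
  for (const id of workerIds) {
    await stopWorker(id);                   // steps 2-4: drain and stop
    replacements.push(await startWorker()); // step 5: start the replacement
  }
  return replacements;                      // step 6 is the loop itself
}

// One possible wiring onto Node's cluster module, run in the master process.
function installRestartSignal(cluster, signal = "SIGUSR2") {
  const stopWorker = (id) => new Promise((resolve) => {
    const worker = cluster.workers[id];
    worker.once("exit", resolve);
    worker.disconnect(); // closes servers, then waits for connections to end
  });
  const startWorker = () => new Promise((resolve) => {
    const worker = cluster.fork();
    worker.once("listening", () => resolve(worker.id));
  });
  process.on(signal, () => {
    rollingRestart(Object.keys(cluster.workers), stopWorker, startWorker)
      .catch((err) => console.error("rolling restart failed:", err));
  });
}
```

In the master process this would be used as `installRestartSignal(require("cluster"))`, after which `kill -USR2 <master pid>` triggers the restart. Refusing to run with a single worker is a deliberate choice here: with only one worker there is nothing left to serve requests while it restarts.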

Potential problems here:

This leaves the master process out-of-date, but we don't update it that often. Will this be a problem? Can we possibly exec() a new master process?
How do we deal with semver-major upgrades? This should work if there are only config changes, but database migrations will be tricky, since old and new workers run side by side during the restart. Seems like we could just document whether each release is compatible with the zero-downtime feature?
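
To make the per-release compatibility idea concrete, here is a minimal sketch of how such documentation could be consulted before a rolling restart. Everything in it is an assumption for illustration: `canRollingUpgrade` and the `documented` map are hypothetical, and the version numbers are made up rather than real releases. An explicit entry for the target release wins; without one, the sketch conservatively allows only upgrades within the same major version.

```javascript
"use strict";

// Hypothetical check: may old and new workers run side by side?
// `documented` maps a target version to true/false, mirroring a per-release
// note on whether that release supports zero-downtime upgrades.
function canRollingUpgrade(fromVersion, toVersion, documented = {}) {
  const major = (v) => Number.parseInt(String(v).split(".")[0], 10);
  const fromMajor = major(fromVersion);
  const toMajor = major(toVersion);
  if (Number.isNaN(fromMajor) || Number.isNaN(toMajor)) {
    throw new Error("unparseable version");
  }
  // An explicit note for the target release overrides the default rule.
  if (Object.prototype.hasOwnProperty.call(documented, toVersion)) {
    return documented[toVersion] === true;
  }
  // Default: assume a major bump may include a database migration.
  return fromMajor === toMajor;
}

// Example with made-up versions:
// a config-only major release explicitly marked as compatible.
const notes = { "2.0.0": true, "1.3.0": false };
console.log(canRollingUpgrade("1.2.0", "2.0.0", notes));
console.log(canRollingUpgrade("1.2.0", "1.3.0", notes));
```

If the check fails, the admin would fall back to an ordinary full restart.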


strugee added the admin label Mar 15, 2017

strugee modified the milestone: Future Mar 15, 2017

strugee added a commit that referenced this issue Aug 7, 2017

Add WIP zero-downtime restart support

9ca5179

Ref #1295

strugee mentioned this issue Aug 7, 2017

Zero-downtime restart support #1406

Merged

strugee added a commit that referenced this issue Aug 9, 2017

Add WIP zero-downtime restart support

71ade73

Ref #1295

strugee added a commit that referenced this issue Aug 11, 2017

Add WIP zero-downtime restart support

25f146f

Ref #1295

strugee mentioned this issue Aug 12, 2017

Refresh website branding pump-io/pump-io.github.io#19

Open

strugee added a commit that referenced this issue Aug 13, 2017

Add WIP zero-downtime restart support

0be95e7

Ref #1295

strugee added a commit that referenced this issue Aug 18, 2017

Add WIP zero-downtime restart support

aac5095

Ref #1295

strugee closed this as completed in f30890b Aug 18, 2017

strugee mentioned this issue Aug 18, 2017

Try to find a way to zero-downtime upgrade the master process too #1415

Open
