JEP-302: Evergreen snapshotting data safety system
Evergreen snapshotting data safety system
- Backwards Compatibility
- Infrastructure Requirements
- Prototype Implementation
Continuous Delivery is about making small incremental changes, making failures much more easily recoverable. In the context here, it means Jenkins Evergreen must be able to seamlessly upgrade Jenkins, but also roll back to the previously running version if an upgrade goes wrong. As Jenkins does not support downgrading alone, this document introduces the snapshotting system which enables that auto-downgrade capability.
Evergreen works with two main components: Jenkins itself, and the evergreen-client.
Upgrading and downgrading
Once an evergreen client has been instructed to perform an upgrade, it is responsible for the following operations:
(If needed) Initialize the git repository.
Take a snapshot (see Take a snapshot below)
Perform the instructed upgrade to the given Evergreen BOM 
Start new Jenkins version and check Jenkins state (see below Checking Jenkins health).
If a rollback is decided:
Take a snapshot .
Roll back to the previous data snapshot and Evergreen BOM version. Doing so, we will create an actual new commit using
git reset --hard HEAD~), to keep a durable track of where we went through, accessible through
Start (previous) Jenkins version
Report the outcome to the Evergreen backend.
Take a snapshot
Behind the scenes, this system uses
git for the purpose of snapshotting, excluding everything not ignored by
.gitignore in Git’s index.
In other words,
git status just after a snapshot action, would read:
nothing to commit, working tree clean
This system is not responsible for snapshotting data which is too large or is not sensible to store. To support snapshotting, Jenkins is configured in a way that physically separates as much as possible between those files which must be snapshotted, and those which must not. See Segregating configuration from binaries, build data, logs, etc for more details below.
An manual example of the process can be visualized with the following:
.gitignorecontent with current Evergreen release.
git add --all
git commit -m '[Upgrade] From BOM x.y.z to a.b.c'
The commit log should ideally be made understandable for humans. We will use tags to be able to revert/switch between snapshots in a programmatic reliable way. Each tag name should be designed so that it is clear and easy to link it to a given version of Evergreen.
Segregating configuration from binaries, build data, logs, etc
To support snapshotting only appropriate files, Jenkins must be configured with a few non-default values to better separate the more static, and critically important, files which must be preserved between upgrades. These files include:
Jenkins global configuration
Compared to the more ephemeral files, such as: build data, workspaces, exploded plugin files, and exploded core files.
To keep things simple, Evergreen uses a single Docker volume, but introduces an additional level to separate files that must be snapshotted, and those which don’t require snapshotting.
Incidentally, this keeps
Basically, instead of the typical
/var/jenkins_home, Evergreen introduces two subdirectories under
/evergreen, referred to hereafter as
$JENKINS_HOME) for to-be-snapshotted content, and
/evergreen/jenkins/varfor the rest.
On the filesystem, for example, this would be laid out as such:
/evergreen/jenkins/ ├── home │ ├── jobs │ │ └── the_job # configuration file only │ ├── nodes │ ├── plugins │ ├── secrets │ ├── updates │ ├── userContent │ └── users └── var ├── logs # JENKINS-50291 │ └── tasks ├── plugins # exploded plugins, using --pluginroot switch ├── jobs # JENKINS-50164 │ └── the_job │ ├── builds │ └── workspace └── war # using --webroot ├── META-INF ├── WEB-INF ├── ...
Files to store
Using the data segregation explained above, Evergreen snapshots almost everything under
Evergreen must have a
.gitignore file for some files that either cannot be moved elsewhere, or that should not be stored in the Git repository.
As mentioned above, this file will likely need to be iterated upon as needs change:
/plugins/ /updates/ /secrets/master.key
$JENKINS_HOME/plugins, this directory contains the hpi/jpi files before extraction.
Ideally, Evergreen would move this elsewhere under
$EVERGREEN_HOME/jenkins/var/plugins, but this is currently not yet doable, as
--pluginsroot only configures a different location for exploded plugins.
Checking Jenkins health
From the perspective of this proposal, health checking Jenkins itself is out of scope. But the driver of the upgrade, evergreen client, requires a way to determine whether or not a rollback should be executed.
This aspect is described in the dedicated JEP-306 covering Instance Client Health Checking.
Jenkins has never supported downgrading by itself, and it’s unlikely the core constructs will change in this regard anytime soon. The official way to revert an upgrade if something went wrong is to restore a previous backup.
In the context of Evergreen, it cannot rely on external backups to revert to the N-1 version as this would require regular manual user intervention, which is clearly not the desired user experience.
Scope of the data snapshotting
Snapshotting data is not a backup system.
The practical time frame where the snapshots are designed to be used is within the seconds or minutes after an upgrade has been initiated. If Jenkins, after it has been restarted, is deemed unhealthy, then an auto-rollback can be initiated.
If a version is determined to be problematic after a few days, the data snapshotting system will not be used. After a longer time period, where Jenkins has executed user-motivated workloads, generating new data, the snapshots can no longer be treated as a source of truth. Therefore rolling back outside of the "upgrade window" would risk data loss.
Errors discovered outside of this "upgrade window" should instead be resolved by new changes to Jenkins core, or an erring plugin, in order to solve the user’s issue.
Using filesystem-level tools offering a snapshotting feature, like LVM, ZFS or btrfs to give a few examples, was considered. But this was discounted because Evergreen vision is about providing an "easier to use and easier to manage Jenkins environment". As per the targeted audience, we obviously do not want to expect Evergreen users to be system experts able to set up a dedicated filesystem to operate Jenkins. And even with system expert, doing so would not make Evergreen a very easy and quick to use distribution of Jenkins.
Git offers in this matter a powerful user-space tool that allows Evergreen to version, and quickly roll back to some previous state if need be.
Git is also a very common tool nowadays for developers, hence it makes Evergreen more accessible to contributors.
Why not use compatibleSinceVersion metadata
For context, a plugin can indicate a
compatibleSinceVersion information, i.e. what is "the oldest version […] configuration-compatible with.". For example:
a plugin is being upgraded from version
In such case, if this plugin wrote configuration files, this means you cannot safely roll back to the
1.4 version of the plugin.
Conversely, with the following situation:
a plugin is being upgraded from version
1.4or less, or absent.
In such case, even if the plugin did write its updated configuration files on the disk, we can expect being able to safely rollback the plugin to the previous
1.4 version, while leaving the configuration file content that was just updated for
This situation is not specifically handled in this design. In other words, Evergreen will also roll back those files.
For two reasons:
this looks like an optimization. Hence as such, this is probably premature to try and be very smart with the way the downgrade will work ;
First, work must be done on the JEP to define criteria for selecting plugins to include in Jenkins Evergreen, so that there is a clear process and automated tests in place to check for correct
There are no backwards compatibility concerns related to this proposal.
Versioning secrets should not be an issue per se, as the data snapshotting system is designed to be local to the running instance. The Git repository data will never be pushed outside by the Evergreen code, so no data leak is normally expected from this side.
As users may have the unfortunate idea to push that repository elsewhere, not being aware they could leak secrets, Evergreen conservatively adds
secrets/master.key to the
Man In The Middle
The main issue here is that an attacker could for instance instruct the evergreen client to ignore everything (by putting
.gitignore), hence make it impossible to roll back.
But this would mean someone was able to talk with connected instances. So even if this is a valid concern, this is considered a larger scope issue that will be addressed through JENKINS-49844.
Hence there are no specific security risks related to this proposal.
There are no new infrastructure requirements related to this proposal.
We must create an image of Evergreen preconfigured with a complete set of representative data.
Creating/defining this data clearly requires human work, but the following checks are deemed automatable.
Before delivering updates on real connected instances, testing must occur in at least the following scenarios:
Apply the upgrade or downgrade, then check the instance is running fine 
Ad-hoc testing tools should be developed to be able to automatically assess the health of a Jenkins Evergreen instance after an upgrade or a downgrade.
Automatically giving some kind of health grade to a running instance is definitely a critical part of Jenkins Evergreen. Detailing this here is out of scope for this proposal. This logic however, should be centralized and used in both during automated tests, and in production for the evergreen-client to automatically analyze if a product instance is healthy or is not (and decide to roll back or not, for the current matter here).
Evergreen should leverage the Jenkins Acceptance Test Harness project for this purpose.
Leveraging Telemetry and live instances data
Evergreen is a connected system. That means we are able to know exactly what versions are running in production. This information must be used to test the actual possible upgrade paths.
Along the way, that also means Evergreen should continuously be able to adjust and enrich what is reported by the Evergreen clients from live instances to improve the associated combinations of tests we run.
A prototype implementation is available in the jenkins-infra/evergreen repository.