JEP-302: Evergreen snapshotting data safety system

Abstract

Jenkins Evergreen aims to provide an automatically updating distribution of Jenkins.

Continuous Delivery is about making small, incremental changes so that failures are much easier to recover from. Here, this means Jenkins Evergreen must be able to upgrade Jenkins seamlessly, and also roll back to the previously running version if an upgrade goes wrong. Because Jenkins does not support downgrading on its own, this document introduces the snapshotting system that enables this automatic downgrade capability.

Specification

Evergreen works with two main components: Jenkins itself, and the evergreen-client.

Upgrading and downgrading

Once the evergreen-client has been instructed to perform an upgrade, it is responsible for the following operations:

  1. (If needed) Initialize the git repository.

  2. Stop Jenkins

  3. Take a snapshot (see Take a snapshot below)

  4. Perform the instructed upgrade to the given Evergreen BOM [1]

  5. Start the new Jenkins version and check its state (see Checking Jenkins health below).

  6. If a rollback is decided:

    1. Take a snapshot [2].

    2. Roll back to the previous data snapshot and Evergreen BOM version. This creates a new commit using git revert (i.e. it avoids git reset --hard HEAD~), which keeps a durable record of every state the instance went through, accessible through git log.

    3. Start (previous) Jenkins version

  7. Report the outcome to the Evergreen backend.
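
The git side of steps 3 and 6 can be sketched as follows. This is only an illustration under stated assumptions, not the evergreen-client implementation: it runs in a throwaway directory standing in for $JENKINS_HOME, the file names and the pre-upgrade-demo tag are hypothetical, and stopping, upgrading and starting Jenkins are simulated by editing files.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for $JENKINS_HOME (hypothetical location).
demo_home=$(mktemp -d)
cd "$demo_home"
git init --quiet
git config user.name "evergreen-demo"
git config user.email "evergreen-demo@example.invalid"
git config commit.gpgsign false

# Step 3: snapshot taken before the upgrade.
echo "<version>1</version>" > config.xml
git add --all
git commit --quiet -m "[Upgrade] From BOM x.y.z to a.b.c"
git tag pre-upgrade-demo   # hypothetical tag name

# Steps 4-5 (simulated): the upgraded Jenkins rewrites its configuration
# and creates a file that did not exist before.
echo "<version>2</version>" > config.xml
echo "created by the new version" > new-file.xml

# Step 6.1: snapshot again so nothing written since the upgrade is lost.
git add --all
git commit --quiet -m "[Rollback] State before rolling back to BOM x.y.z"

# Step 6.2: roll back by adding a new commit rather than rewriting history.
git revert --no-edit HEAD > /dev/null

# All three commits remain visible in the history.
git log --oneline
```

After the revert, the working tree matches the pre-upgrade snapshot again, while the state captured in step 6.1 stays recoverable from the history.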

Take a snapshot

Behind the scenes, this system uses git for snapshotting, including in Git’s index everything not ignored by .gitignore. In other words, git status run just after a snapshot would read: nothing to commit, working tree clean

This system is not responsible for snapshotting data which is too large or is not sensible to store. To support snapshotting, Jenkins is configured to physically separate, as much as possible, the files which must be snapshotted from those which must not. See Segregating configuration from binaries, build data, logs, etc below for more details.

A manual example of the process looks like the following:

  1. Update .gitignore content with current Evergreen release.

  2. git add --all

  3. git commit -m '[Upgrade] From BOM x.y.z to a.b.c'

    ℹ️
    The commit log should ideally be made understandable for humans.
    We will use tags to revert to, or switch between, snapshots in a programmatically reliable way.
    Each tag name should be designed so that it is clear and easy to link it to a given version of Evergreen.
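
The manual steps above, plus tagging, could look like the sketch below. It runs in a temporary directory rather than a real $JENKINS_HOME, and the tag name evergreen-bom-x.y.z is a hypothetical naming scheme chosen for illustration, since the proposal does not fix one.

```shell
#!/bin/sh
set -eu

# Temporary directory standing in for $JENKINS_HOME (hypothetical location).
snapshot_home=$(mktemp -d)
cd "$snapshot_home"
git init --quiet
git config user.name "evergreen-demo"
git config user.email "evergreen-demo@example.invalid"
git config commit.gpgsign false

# Step 1: update .gitignore (a single illustrative entry here).
echo "/updates/" > .gitignore

# Some configuration that must be part of the snapshot.
mkdir -p jobs/the_job
echo "<project/>" > jobs/the_job/config.xml

# Steps 2 and 3: stage everything not ignored, then commit.
git add --all
git commit --quiet -m '[Upgrade] From BOM x.y.z to a.b.c'

# Tag the snapshot so tooling can find it without parsing commit messages.
git tag "evergreen-bom-x.y.z"

# Prints nothing when the working tree is clean.
git status --short

# Resolve the tag back to the snapshot commit.
git rev-parse "evergreen-bom-x.y.z^{commit}"
```

Keeping the human-readable description in the commit message and the machine-readable identifier in the tag means the two can evolve independently.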

Segregating configuration from binaries, build data, logs, etc

To support snapshotting only appropriate files, Jenkins must be configured with a few non-default values to better separate the more static, and critically important, files which must be preserved between upgrades. These files include:

  • Jenkins global configuration

  • Job/Pipeline configuration

These contrast with the more ephemeral files, such as build data, workspaces, exploded plugin files, and exploded core files.

To keep things simple, Evergreen uses a single Docker volume, but introduces an additional directory level to separate the files that must be snapshotted from those that do not require snapshotting. Incidentally, this keeps .gitignore short.

Instead of the typical /var/jenkins_home, Evergreen introduces two subdirectories under /evergreen (referred to hereafter as $EVERGREEN_HOME):

  • /evergreen/jenkins/home (=$JENKINS_HOME) for to-be-snapshotted content, and

  • /evergreen/jenkins/var for the rest.

On the filesystem, for example, this would be laid out as such:

/evergreen/jenkins/
├── home
│   ├── jobs
│   │   └── the_job # configuration file only
│   ├── nodes
│   ├── plugins
│   ├── secrets
│   ├── updates
│   ├── userContent
│   └── users
└── var
    ├── logs # JENKINS-50291
    │   └── tasks
    ├── plugins # exploded plugins, using --pluginroot switch
    ├── jobs # JENKINS-50164
    │   └── the_job
    │       ├── builds
    │       └── workspace
    └── war # using --webroot
        ├── META-INF
        ├── WEB-INF
        ├── ...
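
As a rough sketch of how this layout maps to launch options, the script below creates the two directories under a temporary root and assembles a Jenkins command line without running it. The temporary root and the jenkins.war location are placeholders. Only --webroot and --pluginroot are shown, and relocating logs, builds and workspaces (JENKINS-50291, JENKINS-50164) is configured separately and not covered here.

```shell
#!/bin/sh
set -eu

# Temporary stand-in for /evergreen so the sketch can run anywhere.
evergreen_root=$(mktemp -d)
jenkins_home="$evergreen_root/jenkins/home"   # snapshotted content
jenkins_var="$evergreen_root/jenkins/var"     # everything else

mkdir -p "$jenkins_home" \
         "$jenkins_var/war" \
         "$jenkins_var/plugins" \
         "$jenkins_var/logs" \
         "$jenkins_var/jobs"

# Assemble (but do not execute) the launch command.
# The jenkins.war path is a placeholder.
launch_command="JENKINS_HOME=$jenkins_home java -jar jenkins.war --webroot=$jenkins_var/war --pluginroot=$jenkins_var/plugins"

echo "$launch_command"
```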

Files to store

Using the data segregation explained above, Evergreen snapshots almost everything under /evergreen/jenkins/home.

Evergreen must have a .gitignore file for some files that either cannot be moved elsewhere, or that should not be stored in the Git repository. As mentioned above, this file will likely need to be iterated upon as needs change:

.gitignore
/plugins/
/updates/
/secrets/master.key

Regarding $JENKINS_HOME/plugins, this directory contains the hpi/jpi files before extraction. Ideally, Evergreen would move these elsewhere, under $EVERGREEN_HOME/jenkins/var/plugins, but this is not yet possible, as --pluginroot only configures a different location for exploded plugins.
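
To see the effect of these entries, the following sketch populates a temporary directory with placeholder files and checks which ones git actually tracks. All file names other than the .gitignore entries and secrets/master.key are made up for the demonstration.

```shell
#!/bin/sh
set -eu

# Temporary directory standing in for $JENKINS_HOME (hypothetical location).
ignore_demo=$(mktemp -d)
cd "$ignore_demo"
git init --quiet
git config user.name "evergreen-demo"
git config user.email "evergreen-demo@example.invalid"
git config commit.gpgsign false

# .gitignore entries copied from the listing above.
printf '%s\n' '/plugins/' '/updates/' '/secrets/master.key' > .gitignore

# Placeholder content; real files would be created by Jenkins.
mkdir -p plugins updates secrets
echo "plugin archive placeholder" > plugins/example.hpi
echo "{}" > updates/default.json
echo "placeholder key" > secrets/master.key
echo "placeholder secret" > secrets/other-secret
echo "<hudson/>" > config.xml

git add --all
git commit --quiet -m "Snapshot demo"

# List what was snapshotted: plugins/, updates/ and master.key are absent.
git ls-files

# Show which .gitignore rule excludes the master key.
git check-ignore -v secrets/master.key
```

Note that only secrets/master.key is excluded; other files under secrets/ are still part of the snapshot.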

Checking Jenkins health

From the perspective of this proposal, health checking Jenkins itself is out of scope. However, the driver of the upgrade, the evergreen-client, requires a way to determine whether a rollback should be executed.

This aspect is described in the dedicated JEP-306 covering Instance Client Health Checking.

Motivation

Jenkins has never supported downgrading by itself, and it’s unlikely the core constructs will change in this regard anytime soon. The official way to revert an upgrade if something went wrong is to restore a previous backup.

Evergreen cannot rely on external backups to revert to the N-1 version, as this would require regular manual user intervention, which is clearly not the desired user experience.

Reasoning

Scope of the data snapshotting

Snapshotting data is not a backup system.

The practical time frame where the snapshots are designed to be used is within the seconds or minutes after an upgrade has been initiated. If Jenkins, after it has been restarted, is deemed unhealthy, then an auto-rollback can be initiated.

If a version is determined to be problematic after a few days, the data snapshotting system will not be used. After a longer time period, where Jenkins has executed user-motivated workloads, generating new data, the snapshots can no longer be treated as a source of truth. Therefore rolling back outside of the "upgrade window" would risk data loss.

Errors discovered outside of this "upgrade window" should instead be resolved by new changes to Jenkins core, or an erring plugin, in order to solve the user’s issue.

Why Git

Filesystem-level tools offering a snapshotting feature, such as LVM, ZFS or btrfs, were considered. They were discounted because the Evergreen vision is about providing an "easier to use and easier to manage Jenkins environment". Given the targeted audience, we do not want to expect Evergreen users to be system experts able to set up a dedicated filesystem to operate Jenkins. And even for a system expert, doing so would not make Evergreen an easy and quick to use distribution of Jenkins.

Git offers in this matter a powerful user-space tool that allows Evergreen to version, and quickly roll back to some previous state if need be.

Git is also a very common tool nowadays for developers, hence it makes Evergreen more accessible to contributors.

Why not use compatibleSinceVersion metadata

For context, a plugin can declare a compatibleSinceVersion, i.e. "the oldest version […] configuration-compatible with". For example:

  • a plugin is being upgraded from version 1.4 to 1.5

  • it specifies compatibleSinceVersion=1.5

In such case, if this plugin wrote configuration files, this means you cannot safely roll back to the 1.4 version of the plugin.

Conversely, with the following situation:

  • a plugin is being upgraded from version 1.4 to 1.5

  • compatibleSinceVersion is 1.4 or less, or absent.

In such a case, even if the plugin did write its updated configuration files to disk, we can expect to be able to safely roll back the plugin to the previous 1.4 version, while leaving in place the configuration file content that was just updated for version 1.5.

This situation is not specifically handled in this design. In other words, Evergreen will also roll back those files.

For two reasons:

Backwards Compatibility

There are no backwards compatibility concerns related to this proposal.

Security

Secrets

Versioning secrets should not be an issue per se, as the data snapshotting system is designed to be local to the running instance. The Git repository data will never be pushed outside by the Evergreen code, so no data leak is normally expected from this side.

Because users might push that repository elsewhere without realizing they could leak secrets, Evergreen conservatively adds secrets/master.key to the .gitignore file.

Man In The Middle

The main issue here is that an attacker could, for instance, instruct the evergreen-client to ignore everything (by putting * in .gitignore), thereby making it impossible to roll back.

But this would mean someone was able to talk to connected instances. So even if this is a valid concern, it is considered a larger-scope issue that will be addressed through JENKINS-49844.

Hence there are no specific security risks related to this proposal.

Infrastructure Requirements

There are no new infrastructure requirements related to this proposal.

Testing

We must create an image of Evergreen preconfigured with a complete set of representative data.

Creating/defining this data clearly requires human work, but the following checks are deemed automatable.

Upgrading/downgrading

Before delivering updates on real connected instances, testing must occur in at least the following scenarios:

  • Apply the upgrade or downgrade, then check the instance is running fine [3]

Ad-hoc testing tools should be developed to be able to automatically assess the health of a Jenkins Evergreen instance after an upgrade or a downgrade.

Automatically giving some kind of health grade to a running instance is definitely a critical part of Jenkins Evergreen. Detailing this is out of scope for this proposal. This logic, however, should be centralized and used both during automated tests and in production, so that the evergreen-client can automatically determine whether a production instance is healthy (and, for the matter at hand, decide whether to roll back).

Evergreen should leverage the Jenkins Acceptance Test Harness project for this purpose.

Leveraging Telemetry and live instances data

Evergreen is a connected system. That means we are able to know exactly what versions are running in production. This information must be used to test the actual possible upgrade paths.

Along the way, that also means Evergreen should continuously be able to adjust and enrich what is reported by the Evergreen clients from live instances to improve the associated combinations of tests we run.

Prototype Implementation

A prototype implementation is available in the jenkins-infra/evergreen repository.

References


1. Bill Of Materials: this format is currently being designed, but will list everything constituting a version of Evergreen: WAR and exact versions of all plugins.
2. This way, if new files were created, we don’t just delete them in an unrecoverable way when going back to the previous snapshot.