Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
321 lines (236 sloc) 17.1 KB

JEP-202: External Artifact Storage

Abstract

Jenkins uses the master filesystem to store all generated artifacts, unless explicitly using a plugin that archives somewhere else. There are serious drawbacks to use filesystem when running in cloud or containerized environments. This proposal provides the APIs needed to store all artifacts in an external location without going through the master. It includes an initial implementation using AWS S3.

Specification

Jenkins agents upload and download artifacts directly to the external location using HTTP, no content traffic is sent over the remoting channel.

All external store operations except upload/download are executed from the master, so the agent does not need any permissions other than HTTP upload/download. Pre-signed urls with expiration can be used so the agent does not have access to the full store.

Stashes are stored as tarballs to ensure permissions and links are preserved.

Any operations using ArtifactManager remain unaffected and will transparently use the chosen implementation.

The flow looks like

  • Upload:

    • master creates blob metadata in external location

    • master gets pre-signed url for blob upload operation

    • url is sent to agent

    • agent creates a tarball in case of stashing

    • agent does the actual file upload through HTTP

  • Download:

    • master gets pre-signed url for blob download operation

    • url is sent to agent

    • agent does the actual file download through HTTP

    • agent unpacks the tarball in case of unstashing

  • Browser:

    • user clicks on artifact download link

    • master gets pre-signed url for blob download operation

    • browser is sent a HTTP redirect to that url so download is between external store and browser

This change adds more methods to the VirtualFile API to support this flow.

  • toExternalURL() Optionally obtains a URL which may be used to retrieve file contents from any process on any node.

  • mode() Gets the file Unix mode, if meaningful

  • readLink() If this file is a symlink, returns the link target.

  • list(String, String, boolean) Lists recursive files of this directory with pattern matching

In workflow-api-plugin a new StashAwareArtifactManager is also added, a mixin interface for an ArtifactManager which supports specialized stash behavior as well.

The VirtualFile.run(Callable) method, that already existed to optimize Workspace link browsing on agents, is now being used in artifact-related code in Jenkins core. It was important to implement this in the S3 plugin in order to ensure that Classic or Blue Ocean UI flows relating to artifacts, including simply opening the index page of a build, did not make a large number of network calls. The cache system allows a single S3 metadata call (possibly one HTTP request, depending on page size) to retrieve sufficient information for a complete page render of some or all of the artifact tree, without the rendering code (e.g., in Jenkins core) knowing anything about the storage system.

The apache-httpcomponents-client-4-api plugin is extended with a RobustHTTPClient which standardizes handling of blob upload and download operations.

Motivation

Jenkins uses the master filesystem to store all generated artifacts, unless explicitly using a plugin that archives somewhere else.

There are serious drawbacks to use filesystem when running in cloud or containerized environments:

  • Preventing easy scalability as big disks are expensive to move around vms

  • Causing all sorts of issues due to usage of remoting for file transfer. Going through the master as files are copied from/to the agent cause load, cpu, network issues that are hard to diagnose and recover from.

  • Causing a big performance hit due to the traffic and disk access, preventing the usage of distributed filesystems such as NFS, GlusterFS,…​

  • Providing limited functionality compared to a first class blob store (AWS S3, Azure Blob Storage, Google Blobstore). Organizations may have some requirements that are not available today: auditability, sophisticated retention policies, versioning,…​

From the usability point of view, currently a Jenkins installation sizing has to have into account the expected amount of disk needed or a restart, disk reprovisioning, disk swap,…​ instead of a infinite scale, pay per use that a blob store can offer. Cloud installations can automatically configure the blob store to use offering a much better first time user experience.

Several alternatives exist today but they all require changes to all the pipelines and job definitions to explicitly choose the backend to send artifacts to.

Even without this JEP people can use things like the S3 plugin to upload and download artifacts. But since the use of S3, and details about location, are baked into the script, we cannot publish general examples like tests-and-artifacts that are actually ready for people to use. That would contradict one of the goals of Evergreen, that you can get a reasonable workflow going in a few minutes.

Without ArtifactManager and VirtualFile integration, a number of integrations between plugins are impossible. For example, using only the S3 plugin, if you wish to copy artifacts from an upstream build, you cannot use the Copy Artifact plugin; you would need to devise your own system for passing an S3 bucket/path from the upstream build to the downstream build. When JENKINS-45455 is implemented, unstash from S3 will work automatically in a restarted Pipeline build to copy files stashed by the original build. Using only the S3 plugin, you would need to think about saving bucket/path to a variable that could be read by the restarted build. Blue Ocean will display an Artifacts tab for files uploaded to S3 via archiveArtifacts; with only the S3 plugin, you would need to go to Classic UI.

Core APIs already existed for customized artifact storage, but lacked the crucial capability to offer pre-signed URLs, making it impossible to provide a satisfactory S3 implementation. Only customized master-side storage (such as with Compress Artifacts) was really practical.

Reasoning

Initial implementation in AWS S3

AWS is the focus as it is the most widely used cloud provider, S3 being the prevalent blob store. Equivalent features to S3 exist in other cloud providers and artifact repositories.

The S3 implementation also uses Apache JClouds that abstracts most of the implementation from the underlying blob store.

Container (Bucket) and Path References

The implementation for S3 uses a master-wide configuration option to set the name of the container (S3 bucket) and path inside. (AWS-specific installers for Jenkins, such as for Evergreen, could preconfigure these fields.)

This means that different runs cannot store the artifacts in different buckets or paths, as we don’t expect that to be a common use case. It would be more common to move all the artifacts from one location to another and that could be easily achieved by moving the blobs in S3 and changing the master wide configuration parameters.

Interruptions

The VirtualFile API does not support InterruptedException, but there is no evidence that it matters. Test coverage confirms reasonable handling of error conditions including build timeouts and user aborts.

Security

Two possible implementations were considered:

Agents only need upload/download permissions

If agents only do upload/download operations we can use pre-signed urls so they will not be able to access other jobs artifacts. Other operations (list, create, delete,…​) would run on the master, which would be a performance hit for builds with many artifacts

Passing limited credentials to each agent

Masters need to run with elevated permissions to be able to create new roles and permissions on the fly for each job (AssumeRole in AWS). Those limited credentials would be passed on to the agent, who would use them to talk to the external store. All operations would run on agents, with less load on the master, although with extra role creation operations. But the configuration and setup would be considerably more complex, as well as the agent side download code, requiring larger refactorings and a more complicated core API. This temporary role creation does not exist in all clouds nor other artifact repositories. For instance, Azure Active Directory token lifetime is on public preview, and in Google Cloud ACLs are not temporary.

We opted for the first, simpler option.

Backwards Compatibility

Existing plugins using ArtifactManager API will continue to work using the new selected implementation. However, there are two classes of potential incompatibility.

File-oriented artifact reference

Various plugins call deprecated APIs which assume that build artifacts are stored as files inside the master’s build directory. These would already have been broken for users of the Compress Artifacts plugin, but that is rarely used, whereas we are proposing lots of people run with the S3 artifact manager. We could add telemetry so that such calls produce a warning in the system log, at least when the build actually does have a custom artifact manager selected.

As seen in this report, there are a number of plugins on the usual update center still calling Run.getArtifactsDir() and/or Run.Artifact.getFile(), despite the fact that these methods were deprecated in Jenkins 1.531 in 2013 as part of JENKINS-17236. These include:

Plugin Installations

Allure

2593

Artifact diff

433

Copy Artifact

36641

cucumber-perf

919

Deployer Framework

703

Deploy WebLogic

1250

HTTP POST

1498

Maven Repository Server

2023

MDT Deployment

80

NeoLoad

163

Perfecto Mobile

174

Protecode SC

26

Summary Display

1714

WebLOAD Load Testing

34

By far the most popular of these, Copy Artifact, is scheduled to be made compatible with this JEP as part of the reference implementation. (The first stage of that fix implements a longstanding RFE JENKINS-22637, originally filed for interoperability with Compress Artifacts. The second stage of the fix makes use of core APIs introduced in this JEP.)

The effect of calling the deprecated APIs when a cloud-based artifact manager is in use will vary by the plugin’s particular logic. In some cases, it may simply appear as if the build had no artifacts. JENKINS-22637 describes an error message when attempting to use Copy Artifact. As another example, Artifact diff will display a sidebar link as usual, but when clicked the rendered diff is empty, and the system log reports:

… org.jenkinsci.plugins.artifactdiff.FilePathDiff$Entry getStream
INFO: java.nio.file.NoSuchFileException: /var/jenkins_home/jobs/someproject/builds/123/archive/somefile.txt

Master-based file streaming

Some plugins using VirtualFiles corresponding to build artifacts are still calling open and then passing the stream to an agent or copying it to an HTTP response. This will work, but will be very expensive when using S3 storage. They need to be updated to call VirtualFile.toExternalURL. Finding a list of such plugins is more difficult since open is not deprecated. (Its use is appropriate as a fallback when toExternalURL is unavailable, or when the desired behavior is for artifact contents to be read by the Jenkins master process anyway.) Code inspection from this search turns up the following possible issues:

Plugin Installations

Maven Integration

124783

Security

Security considerations make agents need to be restricted to only access the artifacts needed. Having access to the blob store would mean access to other jobs artifacts.

Agents only do URL based upload/download operations and get the correct url to do so from the master.

In the common case where the vm instances are assigned roles (IAM role in AWS) the instance where the master runs should have access to the blob store but the agents should run in a different instance where its role does not allow it.

In a Kubernetes environment this means either using different node pools for masters or agents or using something like kube2iam to have different roles per pod.

Infrastructure Requirements

Ideally we could use Jenkins infrastructure to do live testing with S3, which is not currently possible due to lack of AWS account. But tests can be run from a EC2 instance or a local machine.

Testing

Automated tests for the common archive/unarchive and stash/unstash flow have been added to the ArtifactManager API to ensure all implementations comply.

The AWS S3 implementation tests exercise this flow plus add some extra S3 specific tests. They require an AWS account and S3 permissions and can be run from a EC2 instance or a local machine.

There is an abstraction layer allowing use of any blob store supported by Apache jclouds. This layer has its own mock tests confirming general behaviors.

Prototype Implementation

References