
Using AWS S3 for sstate and downloads mirrors

Matt Madison edited this page Sep 13, 2022 · 4 revisions

This distro includes features that enable the use of AWS S3 storage buckets as mirrors for shared state and downloads (source) artifacts. With suitable configuration, CI builders running in EC2 can populate the mirrors directly, both for their own use and for developers building the distro on their own workstations.

Components

Two components are added to the distro to implement these features: a replacement for the bitbake s3:// fetcher, and a bbclass file that implements on-the-fly replication of downloaded sources and sstate packages to a mirror site.

S3 fetcher replacement

OE-Core has rudimentary support for fetching artifacts from S3 via the s3:// fetcher in bitbake. That fetcher runs awscli commands in subprocesses for downloads, and its performance is fairly poor because every invocation of the aws s3 command has to create a new S3 session context.

This distro includes two Python modules that provide a drop-in replacement for the s3:// fetcher, using the boto3 Python package directly. The s3session module keeps a persistent S3 session within a running bitbake thread, so one session context can be reused for multiple transactions. The botos3fetcher module replaces the default s3:// fetcher with an implementation that uses s3session.
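The session-reuse idea can be illustrated with a per-thread cache. This is a minimal sketch, not the distro's s3session module: the names make_s3_client and ThreadLocalClientCache are hypothetical, and the cache takes a generic factory so it can be exercised without AWS access. In a bitbake context the factory would create a boto3 S3 client.

```python
import threading
from typing import Any, Callable


def make_s3_client() -> Any:
    """Hypothetical factory: build an S3 client from a new boto3 session."""
    import boto3  # imported lazily so the cache itself has no boto3 dependency

    return boto3.session.Session().client("s3")


class ThreadLocalClientCache:
    """Create one client per thread on first use, then keep reusing it."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> Any:
        client = getattr(self._local, "client", None)
        if client is None:
            # Pay the session/client setup cost only once per thread.
            # boto3 Session objects are not thread-safe, hence one per thread.
            client = self._factory()
            self._local.client = client
        return client
```

With cache = ThreadLocalClientCache(make_s3_client), repeated calls such as cache.get().download_file(bucket, key, filename) in the same thread share one client, instead of paying the setup cost on every transfer as a new aws subprocess would.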

Prerequisites

The Python environment under which you run bitbake must have the boto3 package installed.

Configuration

To enable the replacement fetcher, add

OE_IMPORTS += "oeaws.botos3fetcher"

to your build configuration.

If users need to supply AWS configuration and credential information through environment variables, they should add those variables to BB_ENV_EXTRAWHITE (renamed BB_ENV_PASSTHROUGH_ADDITIONS in kirkstone and later OE-Core branches) as part of their build setup or shell profile, so that bitbake passes them through to the build:

export BB_ENV_EXTRAWHITE="$BB_ENV_EXTRAWHITE AWS_CONFIG_FILE AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SHARED_CREDENTIALS_FILE AWS_SESSION_TOKEN AWS_DEFAULT_REGION"

On-the-fly replication to mirrors

Populating mirrors from CI automated builders is typically handled in one of two ways:

  • Directly serving the SSTATE_DIR and DL_DIR areas written to by the builders via NFS, or
  • Periodically syncing the SSTATE_DIR/DL_DIR trees to the mirror, or running a sync script after each build

This distro instead adds a bbclass file, sstate_mirror_update.bbclass, that replicates artifacts to the mirror as they are created. It adds postfuncs hooks to the do_fetch task and to each task listed in the SSTATETASKS variable; these hooks make a second copy of each downloaded file (for do_fetch) or newly created sstate package (for shared state tasks) at the mirror location. The mirror is thus populated on the fly as artifacts are added, without needing NFS to serve it.

The bbclass supports both local (file://) and S3 (s3://) URIs for the mirror location. The main reason to use this approach in AWS is to reduce costs by eliminating the need for expensive persistent EBS volumes to hold all of the downloads and/or sstate artifacts. CI builders can use only ephemeral storage (or much smaller EBS volumes, just large enough for temporary use during builds) and rely on much cheaper S3 for mass storage of downloads and sstate.
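The per-artifact copy such a hook performs can be sketched as follows. This is a hypothetical helper, not code from sstate_mirror_update.bbclass: the function name, the copy-then-rename step for file:// mirrors, and the use of boto3's upload_file for s3:// mirrors are assumptions chosen for illustration.

```python
import os
import shutil
from urllib.parse import urlparse


def replicate_to_mirror(local_path: str, relpath: str, mirror_uri: str) -> str:
    """Copy local_path to <mirror_uri>/<relpath> and return the destination.

    Hypothetical helper handling the two schemes the bbclass accepts:
    file:// (local directory) and s3:// (bucket plus optional key prefix).
    """
    parsed = urlparse(mirror_uri)
    if parsed.scheme == "file":
        dest = os.path.join(parsed.path, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Copy under a temporary name, then rename atomically, so that
        # concurrent readers never see a partially written file.
        tmp = f"{dest}.tmp.{os.getpid()}"
        shutil.copyfile(local_path, tmp)
        os.replace(tmp, dest)
        return dest
    if parsed.scheme == "s3":
        import boto3  # only needed for S3 mirrors

        bucket = parsed.netloc
        # Join the URI's key prefix (if any) with the artifact's relative path.
        key = "/".join(p for p in (parsed.path.strip("/"), relpath) if p)
        # S3 only makes the object visible once the upload completes.
        boto3.client("s3").upload_file(local_path, bucket, key)
        return f"s3://{bucket}/{key}"
    raise ValueError(f"unsupported mirror URI: {mirror_uri}")
```

For a do_fetch hook, relpath would presumably be the file's name within DL_DIR; for an sstate task, the package's path relative to SSTATE_DIR, so that the mirror layout matches what the fetch side expects.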

Configuration for uploading to mirrors

CI builders should include the following in their build configuration:

INHERIT += "sstate_mirror_update"

Alternatively, the above could be added to the distro configuration.

CI builders that will be populating the downloads mirror should also have the following in their build configuration:

BB_GENERATE_MIRROR_TARBALLS = "1"
UPDATE_DOWNLOADS_MIRROR = "1"
DOWNLOADS_MIRRORDIR = "s3://bucket/..."

replacing bucket and ... in the URI with the bucket name and prefix of your own downloads mirror location.

For sstate mirroring:

UPDATE_SSTATE_MIRROR = "1"
SSTATE_MIRRORDIR = "s3://bucket/..."

replacing bucket and ... in the URI with the bucket name and prefix of your sstate mirror location.

Configuration for downloading from the mirrors

CI builders and any users that need to use the S3-hosted mirrors should have the following in their build configuration:

SOURCE_MIRROR_URL = "s3://bucket/..."
INHERIT += "own-mirrors"
SSTATE_MIRRORS = "file://.* s3://bucket/.../PATH;downloadfilename=PATH"

Here, SOURCE_MIRROR_URL points at the downloads mirror location, and the URL in the SSTATE_MIRRORS entry points at the shared state mirror.
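To see what the SSTATE_MIRRORS entry does, its effect can be modeled in a few lines: the original sstate URL is roughly a file:// URL naming the package's path relative to SSTATE_DIR, it is matched against the regex on the left, and each PATH token on the right is replaced with that path. This is a deliberately simplified model, not bitbake's actual mirror code (which rewrites each URL component separately); the function name rewrite_sstate_url and the example key prefix sstate/ are hypothetical.

```python
import re
from typing import Optional


def rewrite_sstate_url(original: str, mirror_entry: str) -> Optional[str]:
    """Simplified model of applying one SSTATE_MIRRORS entry.

    original: URL requested by the sstate code, e.g. "file://ab/cd/sstate:...".
    mirror_entry: "<regex> <replacement>", as written in SSTATE_MIRRORS.
    """
    pattern, replacement = mirror_entry.split()
    if not re.match(pattern, original):
        return None  # this entry does not apply to the URL
    # Strip the scheme to get the package path relative to the mirror root.
    path = original.split("://", 1)[1]
    # Both the location and downloadfilename receive the same path.
    return replacement.replace("PATH", path)
```

For example, with the entry "file://.* s3://bucket/sstate/PATH;downloadfilename=PATH", a request for file://ab/cd/sstate:pkg.tgz becomes s3://bucket/sstate/ab/cd/sstate:pkg.tgz;downloadfilename=ab/cd/sstate:pkg.tgz, which is why the literal PATH token must be kept in the configuration.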

Access control

You can use either IAM access keys or IAM roles with an appropriate permissions policy to control access to the S3 buckets. For this distro, the EC2 instances running as CI builders are configured with an IAM role that grants them read/write access to the buckets. Users have IAM access keys that grant them read-only access.
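As a starting point for such policies, the sketch below builds an IAM policy document for one mirror bucket: read-only for users, with s3:PutObject added for CI builders. The function name and the exact action set are assumptions, not the policies used for this distro. s3:ListBucket is included because, without it, S3 answers requests for missing objects with 403 instead of 404, and mirror lookups probe for missing sstate objects constantly.

```python
import json
from typing import Any, Dict


def s3_mirror_policy(bucket: str, read_write: bool = False) -> Dict[str, Any]:
    """Build an illustrative IAM policy document for an S3 mirror bucket."""
    object_actions = ["s3:GetObject"]
    if read_write:
        # CI builders also need to upload new downloads/sstate objects.
        object_actions.append("s3:PutObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: lets clients see 404 for missing keys.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level permissions apply to every key in the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }
```

json.dumps(s3_mirror_policy("my-sstate-bucket", read_write=True), indent=2) yields JSON that can be pasted into the IAM console for the builder role; the bucket name here is a placeholder.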

Alternatives for using S3-hosted mirrors

The configuration described above directly uses the S3-hosted mirrors during builds. Some alternatives are:

  • Have users periodically run aws s3 sync --delete to synchronize the S3-hosted mirror to local disk. This provides faster access to the mirror during their builds (particularly over slow Internet links), at the expense of additional local storage to hold the mirror contents, plus the time spent on each sync.
  • Sync the mirror contents to a server on your local network that then serves the mirrors via NFS to local users.
  • Use an AWS Storage Gateway appliance to provide local access to the S3-hosted mirrors (if you can afford it).
  • Serve the S3 buckets via HTTP/HTTPS.

The best solution will depend on your network connectivity, the geographic distribution of your developers, budget, etc.

Lifecycle rules

Consider using S3 lifecycle rules to help manage the S3 bucket storage. For this distro, I set a 1-year lifecycle on downloads and a 30-day lifecycle on sstate artifacts, with expired objects permanently deleted.
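Rules like these can be applied with boto3. The sketch below assumes a single, non-versioned bucket with hypothetical downloads/ and sstate/ key prefixes; with separate buckets, apply one rule per bucket instead. In a versioned bucket, an Expiration action only adds a delete marker, so permanent deletion would additionally need a NoncurrentVersionExpiration action.

```python
from typing import Any, Dict


def lifecycle_config(expirations: Dict[str, int]) -> Dict[str, Any]:
    """Map {key_prefix: days} to an S3 LifecycleConfiguration document."""
    rules = []
    for index, (prefix, days) in enumerate(expirations.items()):
        rules.append(
            {
                "ID": f"expire-{index}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Objects are expired this many days after creation.
                "Expiration": {"Days": days},
            }
        )
    return {"Rules": rules}


def apply_lifecycle(bucket: str, config: Dict[str, Any]) -> None:
    """Install the configuration; this replaces any existing lifecycle rules."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

For the retention periods described above, apply_lifecycle("my-mirror-bucket", lifecycle_config({"downloads/": 365, "sstate/": 30})) would install both rules at once; because the call overwrites the bucket's whole lifecycle configuration, all rules for a bucket should be passed together.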