Swift-S3 Sync is a way to share data between on-premises OpenStack Swift deployments and Amazon S3 (or S3-clones). The project initially allowed for propagating any changes from Swift to S3 -- PUT, DELETE, or POST -- in an asynchronous fashion. Since then, it has evolved to support a limited set of data policy options to express the life cycle of the data and transparent access to data stored in S3.
- asynchronously propagates object operations to Amazon S3, Google Cloud Storage¹, S3-clones, and other Swift Clusters
- allows for an "archival" mode after a set time period
- on-line access to archived data through the Swift interface
¹Google Cloud Storage requires interoperability access to be enabled.
swift-s3-sync runs as a standalone process, intended to be used on Swift
container nodes. The container database provides the list of changes to the
objects in Swift (whether it was a metadata update, new object, or a deletion).
To provide on-line access to archived Swift objects, there is a Swift middleware component. If a Swift container was configured to be archived, the middleware will query the destination store for contents on a GET request, as well as splice the results of LIST requests between the two stores.
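The fallback and splicing behavior described above can be illustrated with a minimal sketch. It is not the middleware's actual code: the function names, the dictionary-based stores, and the rule that a local entry wins when both stores list the same name are all assumptions made for illustration.

```python
def splice_listings(local, remote):
    """Merge two name-sorted listings into one name-sorted listing.

    Assumption for this sketch: when both stores list the same name,
    the local entry is kept and the remote one is dropped.
    """
    merged = []
    i = j = 0
    while i < len(local) and j < len(remote):
        if local[i]["name"] < remote[j]["name"]:
            merged.append(local[i])
            i += 1
        elif local[i]["name"] > remote[j]["name"]:
            merged.append(remote[j])
            j += 1
        else:
            merged.append(local[i])
            i += 1
            j += 1
    # At most one of these two slices is non-empty.
    merged.extend(local[i:])
    merged.extend(remote[j:])
    return merged


def get_object(name, local_store, remote_store):
    """Serve a GET locally when possible, otherwise ask the remote store."""
    if name in local_store:
        return local_store[name]
    # Returns None when the object exists in neither store.
    return remote_store.get(name)
```

For example, splicing a local listing of `a` and `c` with a remote listing of `b`, `c`, and `d` yields a single listing of `a`, `b`, `c`, `d`, with the local `c` kept.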
There is no explicit coordination between the daemons.
Implicitly, they coordinate through their progress in the container database.
Each daemon looks up the number of container nodes in the system (with the
assumption that each node has a running daemon). Initially, each only handles
the objects assigned to it. Afterward, each one verifies that the other objects
have been processed, as well. This means that for each operation, there are
as many requests issued against the remote store as there are container
databases for the container. For example, in a three replica policy, there would
be three HEAD requests if an object PUT was performed (but only one PUT against
the remote store in the common case).
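The request counts in the example above can be reproduced with a small simulation. This is only a sketch of the idea: the `RemoteStore` class, the CRC-based `owner_index` assignment, and the two-pass `sync` loop are hypothetical and are not taken from swift-s3-sync itself.

```python
import zlib


class RemoteStore:
    """In-memory stand-in for the remote store that counts requests."""

    def __init__(self):
        self.objects = {}
        self.heads = 0
        self.puts = 0

    def head(self, name):
        self.heads += 1
        return name in self.objects

    def put(self, name, body):
        self.puts += 1
        self.objects[name] = body


def owner_index(name, node_count):
    """Pick the daemon that handles an object first (assumed scheme)."""
    return zlib.crc32(name.encode("utf-8")) % node_count


def process_row(name, body, remote):
    """Upload only if the remote copy is missing; one HEAD either way."""
    if not remote.head(name):
        remote.put(name, body)


def sync(rows, node_count, remote):
    # Pass 1: every daemon handles only the rows assigned to it.
    for node in range(node_count):
        for name, body in rows:
            if owner_index(name, node_count) == node:
                process_row(name, body, remote)
    # Pass 2: every daemon verifies the rows assigned to the others.
    for node in range(node_count):
        for name, body in rows:
            if owner_index(name, node_count) != node:
                process_row(name, body, remote)


remote = RemoteStore()
sync([("photo.jpg", b"data")], 3, remote)
print(remote.heads, remote.puts)
```

With three daemons and one new object, the owning daemon issues one HEAD and one PUT, and the two verifying daemons each issue one HEAD that finds the object already present.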
How to set up and use
swift-s3-sync depends on:
- container-crawler library
- botocore (unfortunately, we had to use our own fork, as a number of patches were difficult to merge upstream)
Until we can merge the boto patches, you will also have to install botocore from
our fork (do this before installing swift-s3-sync):
pip install -e git://firstname.lastname@example.org#egg=botocore
Build the package to be installed on the nodes with:
python ./setup.py build sdist
Install the tarball with:
pip install swift-s3-sync-<version>.tar.gz
You will also need to install the container-crawler library from Git:
pip install -e git://email@example.com#egg=container-crawler
After that, you should have the
swift-s3-sync executable available in
swift-s3-sync has to be invoked with a configuration file, specifying which
containers to watch, where the contents should be placed, as well as a number of
global settings. A sample configuration file is in the
To configure the Swift Proxy servers to use
swift-s3-sync to redirect requests
for archived objects, you have to add the following to the proxy pipeline:
[filter:swift_s3_shunt]
use = egg:swift-s3-sync#cloud-shunt
conf_file = <Path to swift-s3-sync config file>
This middleware should be in the pipeline before the DLO/SLO middleware.
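One way to sanity-check the ordering requirement is to parse the proxy configuration and compare pipeline positions. The sketch below uses only the Python standard library. The sample pipeline, the `conf_file` path, and the helper name `shunt_precedes_large_objects` are illustrative assumptions, not part of swift-s3-sync.

```python
import configparser

# Illustrative proxy-server.conf fragment; the pipeline contents and the
# conf_file path are made up for this example.
SAMPLE_PROXY_CONF = """
[pipeline:main]
pipeline = catch_errors proxy-logging cache swift_s3_shunt dlo slo proxy-server

[filter:swift_s3_shunt]
use = egg:swift-s3-sync#cloud-shunt
conf_file = /etc/swift-s3-sync/sync.json
"""


def shunt_precedes_large_objects(conf_text, shunt="swift_s3_shunt"):
    """Return True if the shunt filter appears before DLO/SLO in the pipeline."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    pipeline = parser.get("pipeline:main", "pipeline").split()
    if shunt not in pipeline:
        return False
    shunt_pos = pipeline.index(shunt)
    # Only compare against the large-object filters that are present.
    return all(shunt_pos < pipeline.index(name)
               for name in ("dlo", "slo") if name in pipeline)


print(shunt_precedes_large_objects(SAMPLE_PROXY_CONF))
```

The check returns False both when the filter is missing from the pipeline and when it appears after `dlo` or `slo`.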
You can run all tests (flake8, unit, and integration) by executing:
A line and branch coverage report in HTML for the unit tests will be written to
.coverhtml/, and on macOS, you can view the results with
You can run just the unit tests with
You can get a shell into the integration test container to run arbitrary commands within it like so:
You can run the integration tests by running
Non-integration test time is so low that there isn't any reason to make another command that only runs integration tests.
The integration tests need access to a Swift cluster and some sort of an S3 provider. Currently, they use a Docker container to provide Swift and are configured to talk to S3Proxy.
The cloud sync configuration for the tests is defined in
containers/swift-s3-sync/swift-s3-sync.conf. In particular, there are mappings for S3
sync and archive policies and the same for Swift. The S3 mappings point to
S3Proxy running in the swift-s3-sync container, listening on port 10080.
You can run a subset of the integration tests in the container as well:
docker exec -e DOCKER=true swift-s3-sync nosetests \
    /swift-s3-sync/test/integration/test_s3_sync:TestCloudSync.test_s3_sync
The tests create and destroy the Swift containers and S3 buckets configured in
the swift-s3-sync.conf file. If you need to examine the state of a Swift
container or S3 bucket after the tests have finished executing, you can set
NO_TEARDOWN=1 in the environment when you run the integration tests. This
will make the tearDownClass method a no-op. It may also introduce test
failures if different subclasses of TestCloudSyncBase end up operating on the
same Swift containers or S3 buckets.
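The effect of NO_TEARDOWN can be pictured with a small sketch. The class below is a hypothetical stand-in, not the project's TestCloudSyncBase, and the resource names are made up. In this sketch any non-empty value of NO_TEARDOWN skips the cleanup.

```python
import os
import unittest


class ExampleSyncTestBase(unittest.TestCase):
    """Hypothetical base class that creates shared resources once per class."""

    # Shared by all subclasses, which is why skipping teardown can make
    # later test classes trip over leftovers from earlier ones.
    created = []

    @classmethod
    def setUpClass(cls):
        cls.created.extend(["example-container", "example-bucket"])

    @classmethod
    def tearDownClass(cls):
        if os.environ.get("NO_TEARDOWN"):
            # Leave everything in place so it can be inspected afterwards.
            return
        del cls.created[:]
```

Because the resource list is shared across subclasses, a skipped teardown leaves state behind for the next test class, which is the kind of interference the paragraph above warns about.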
If you would like to examine the logs from each of the services, all logs are in /var/log (e.g. /var/log/swift-s3-sync.log).
For the cloud-connector service, you can view its logs by executing:
docker logs cloud-connector
Building and Deploying cloud-connector
Build cloud-connector Docker Image
You can build Docker images for cloud-connector using the
build_docker_image.py script. Many options have a sane default, but here is
an example invocation specifying all options and illustrating how you can
change the GitHub repository from which Swift is pulled:
cd containers/cloud-connector
./build_docker_image.py --swift-repo swiftstack/swift \
    --swift-tag ss-release-22.214.171.124 --swift-s3-sync-tag DEV \
    --config-bucket default-bucket-name-to-use \
    --repository swiftstack/cloud-connector
Publish the Image
If you want to publish the image after it's built, you can include the
--push flag, and
build_docker_image.py will push the built image for
You can also just push a built image using the
docker push command.
To deploy cloud-connector, you need the following inputs:
- A healthy Swift cluster with CloudSync configured, deployed, and happy.
- The image repository and tag of the cloud-connector Docker image you want to deploy.
- A JSON-format database of authorized cloud-connector users and their corresponding secret S3-API keys from the Swift cluster that the cloud-connector container will be pointed at. See here for an example.
- A copy of the CloudSync JSON-format config file as used inside the Swift cluster.
- A configuration file for cloud-connector. See
for an example. Of particular interest are the
swift_baseurl setting in the
conf_file setting in the
[app:proxy-server] section, and the
s3_passwd_json setting in the
[filter:cloud-connector-auth] section. This config file will determine what port the cloud-connector service listens on inside the container. How client traffic is delivered to that port depends on how the container is run.
- An S3-API object storage service that is "local" to where the cloud-connector
container will be deployed. For Amazon EC2, that would be S3. The endpoint of
this storage service will be
CONF_ENDPOINT later. If S3 is used, then
CONF_ENDPOINT does not need to be specified in the container environment.
- A bucket in the S3-API object storage service that will hold the
configuration files for the cloud-connector container. This bucket name
will be the
- S3-API credentials authorized to list and read objects in
CONF_BUCKET. These will be the
AWS_SECRET_ACCESS_KEY values later. If you are using Amazon ECS with an IAM Role to provide access to
CONF_BUCKET, then you do not need to specify
AWS_SECRET_ACCESS_KEY in the cloud-connector container environment.
With those in hand, perform these steps:
- Upload the S3-API user database and CloudSync config files into the
CONF_BUCKET. Note their key names and make sure they are correct in the
- Upload the
cloud-connector.conf config file into
CONF_BUCKET. The key name of this file in the S3-API object store will be the
- Run the container with the following environment variables (unless earlier
instructions specified that they were not necessary in your circumstances):
CONF_BUCKET
CONF_ENDPOINT
CONF_NAME
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
Configuring networking to deliver client traffic to the bound port inside the container is outside the scope of this document and depends entirely on the container runtime environment you use.
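Before starting the container, it can help to check that the environment matches the rules described above. The following is a minimal sketch and not part of cloud-connector. The function name `load_settings` and the `using_s3` and `using_iam_role` flags are hypothetical, and treating CONF_BUCKET and CONF_NAME as always required is an assumption.

```python
import os


def load_settings(environ=None, using_s3=False, using_iam_role=False):
    """Collect the container's settings and fail early if any are missing.

    CONF_ENDPOINT is skipped when Amazon S3 itself holds the configuration,
    and the AWS key pair is skipped when an IAM Role grants access to the
    bucket.
    """
    environ = os.environ if environ is None else environ
    required = ["CONF_BUCKET", "CONF_NAME"]
    if not using_s3:
        required.append("CONF_ENDPOINT")
    if not using_iam_role:
        required.extend(["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"])
    # Empty strings count as missing.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise ValueError(
            "missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in required}
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real process environment.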