Sync GCS with S3

Google official way

gsutil -m rsync -r -d gs://bucket s3://bucket
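Because `-d` deletes anything in the destination that is missing from the source, it can help to preview a run before doing it for real. A minimal sketch, assuming your AWS credentials are already in the boto configuration that gsutil reads for `s3://` URLs; the function name `preview_sync` is made up for illustration:

```shell
# Preview a GCS -> S3 sync without changing anything.
#   -m  run the transfer in parallel
#   -r  recurse through the whole bucket
#   -d  delete destination objects that no longer exist in the source
#   -n  dry run: only report what would be copied or deleted
preview_sync() {
  gsutil -m rsync -r -d -n "gs://$1" "s3://$2"
}
```

Call it as `preview_sync gcs-bucket-name s3-bucket-name`, and drop `-n` once the reported changes look right.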

But we want the buckets to sync automatically, right?

Sync huge files using Rclone and Cloud Run

This approach works better for syncing huge files, such as Cloud SQL backups, between GCS and S3. Cloud Scheduler calls a Cloud Run container, which uses rclone to sync the buckets. You can see the original article with this idea and code here.
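The sync step inside the container could look roughly like the sketch below. It is not the code from the original article. It assumes the container receives the `GS`, `S3` and `AWS_REGION` variables set at deploy time, that `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are present in its environment, and that GCS access comes from the service account the container runs as. The remote names `gcs` and `s3` and the function name `sync_buckets` are hypothetical.

```shell
# Define two rclone remotes through environment variables
# (RCLONE_CONFIG_<REMOTE>_<OPTION>) instead of an rclone.conf file,
# then mirror the GCS bucket into the S3 bucket.
sync_buckets() {
  # Source remote "gcs": with no key file or token configured, rclone
  # falls back to Application Default Credentials.
  export RCLONE_CONFIG_GCS_TYPE="google cloud storage"
  export RCLONE_CONFIG_GCS_BUCKET_POLICY_ONLY=true

  # Destination remote "s3": env_auth makes rclone read
  # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
  export RCLONE_CONFIG_S3_TYPE=s3
  export RCLONE_CONFIG_S3_PROVIDER=AWS
  export RCLONE_CONFIG_S3_ENV_AUTH=true
  export RCLONE_CONFIG_S3_REGION="$AWS_REGION"

  # "sync" makes the destination match the source, deleting extras.
  rclone sync "gcs:$GS" "s3:$S3"
}
```

Using `rclone copy` instead of `rclone sync` would leave objects that exist only in S3 untouched.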

Configure service accounts for Cloud Run and Cloud Scheduler:

export PROJECT_ID=$(gcloud config get-value core/project)
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
export REGION=us-central1
export RSYNC_SRC=gcs-bucket-name
export RSYNC_DEST=s3-bucket-name
export AWS_ACCESS_KEY_ID=your-aws-key
export AWS_SECRET_ACCESS_KEY=your-aws-secret
export AWS_REGION=your-region

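gcloud iam service-accounts create rsync-sa --display-name "RSYNC Service Account" --project $PROJECT_ID
export RSYNC_SERVER_SERVICE_ACCOUNT=rsync-sa@$PROJECT_ID.iam.gserviceaccount.com

gcloud iam service-accounts create rsync-scheduler --display-name "RSYNC Scheduler Account" --project $PROJECT_ID
export SCHEDULER_SERVER_SERVICE_ACCOUNT=rsync-scheduler@$PROJECT_ID.iam.gserviceaccount.com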

Configure the source GCS bucket

Configure Uniform Bucket Access Policy

gsutil iam ch serviceAccount:$RSYNC_SERVER_SERVICE_ACCOUNT:objectViewer gs://$RSYNC_SRC

Build and deploy Cloud Run image

The server.go has an extra, secondary check for the audience value that Cloud Scheduler sends. This step is not strictly necessary, since Cloud Run checks the audience value automatically (see Authenticating service-to-service).

This secondary check is left in to accommodate running the service on any other platform.
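To see which audience a given OIDC token actually carries, you can decode its payload by hand. The sketch below is not the check from server.go, and it does not verify the signature, so use it for inspection only. The function name `jwt_aud` is hypothetical.

```shell
# Print the "aud" claim of a JWT. Inspection only: nothing is verified.
jwt_aud() {
  # A JWT is header.payload.signature; take the payload and turn the
  # URL-safe base64 alphabet back into the standard one.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that the URL-safe encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="$payload==" ;;
    3) payload="$payload=" ;;
  esac
  # Decode the JSON payload and pull out the value of "aud".
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"aud"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For a token sent by the scheduler job, the printed value should equal the `AUDIENCE` the service was deployed with.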

To deploy, we first need to find out the URL for the Cloud Run instance.

First, build and deploy the Cloud Run instance (don't worry about the AUDIENCE value below):

# Image name assumes Container Registry (gcr.io) in the same project.
docker build -t gcr.io/$PROJECT_ID/rsync .

docker push gcr.io/$PROJECT_ID/rsync

gcloud beta run deploy rsync --image gcr.io/$PROJECT_ID/rsync \
  --set-env-vars AUDIENCE="" \
  --set-env-vars GS=$RSYNC_SRC \
  --set-env-vars S3=$RSYNC_DEST \
  --set-env-vars AWS_REGION=$AWS_REGION \
  --region $REGION --platform=managed \
  --no-allow-unauthenticated

Get the URL and redeploy

export AUDIENCE=$(gcloud beta run services describe rsync --platform=managed --region=$REGION --format="value(status.address.url)")
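If the describe call fails, `AUDIENCE` ends up empty and the redeploy would quietly keep the audience check blank. A small guard can catch that before redeploying. This is a sketch, and the function name `require_https_url` is hypothetical:

```shell
# Succeed only if the argument looks like an https:// URL.
require_https_url() {
  case $1 in
    https://?*) return 0 ;;
    *) echo "expected an https:// URL, got: '$1'" >&2
       return 1 ;;
  esac
}
```

Run `require_https_url "$AUDIENCE" || exit 1` before the second deploy.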

# Image name assumes Container Registry (gcr.io) in the same project.
gcloud beta run deploy rsync --image gcr.io/$PROJECT_ID/rsync \
  --set-env-vars AUDIENCE="$AUDIENCE" \
  --set-env-vars GS=$RSYNC_SRC \
  --set-env-vars S3=$RSYNC_DEST \
  --set-env-vars AWS_REGION=$AWS_REGION \
  --region $REGION --platform=managed \
  --no-allow-unauthenticated

Configure IAM permissions for the Scheduler to invoke Cloud Run:

gcloud run services add-iam-policy-binding rsync --region $REGION --platform=managed \
  --member=serviceAccount:$SCHEDULER_SERVER_SERVICE_ACCOUNT \
  --role=roles/run.invoker

Deploy Cloud Scheduler

First, allow Cloud Scheduler to assume its own service account's OIDC token:

envsubst < "bindings.tmpl" > "bindings.json"

Where the bindings file will have the root service account for Cloud Scheduler:

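  • bindings.tmpl:
{
  "bindings": [
    {
      "members": [
        "serviceAccount:service-$PROJECT_NUMBER@gcp-sa-cloudscheduler.iam.gserviceaccount.com"
      ],
      "role": "roles/cloudscheduler.serviceAgent"
    }
  ]
}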

Assign the IAM permission and schedule the job to run every day at 01:00:

gcloud iam service-accounts set-iam-policy $SCHEDULER_SERVER_SERVICE_ACCOUNT bindings.json -q

gcloud beta scheduler jobs create http rsync-schedule --schedule "0 1 * * *" \
  --http-method=GET \
  --uri=$AUDIENCE \
  --oidc-service-account-email=$SCHEDULER_SERVER_SERVICE_ACCOUNT
Sync small files with event-based sync and Cloud Functions

The rclone way works fine, but syncing everything on every run is expensive. The method below syncs only new files, whenever an object is created or modified in the GCS bucket. This method is used entirely "as is", as created and described in this repo.
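To make the idea concrete: for each change event, only the one object named in the event is copied. The repo's function is written for Node.js (its entry point is `syncGCS`), so the shell sketch below is only an analogy of that per-object step. It assumes gsutil has AWS credentials in its boto configuration, and the function name `sync_object` is hypothetical.

```shell
# Copy a single changed object from the GCS bucket to the S3 bucket,
# keeping the same object name on both sides.
sync_object() {
  src_bucket=$1
  object=$2
  dest_bucket=$3
  gsutil cp "gs://$src_bucket/$object" "s3://$dest_bucket/$object"
}
```

Calling `sync_object gcs-bucket-name backups/db.sql.gz s3-bucket-name` transfers one file instead of rescanning both buckets.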

First, you'll need to define some environment variables:

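# Name of your GCP project
export PROJECT=your-gcp-project
# Name for the runtime config (this MUST match the bucket name)
export CONFIG_NAME=your-config-name
# AWS region in which your S3 bucket was created
export S3_REGION=your-region
# Name of the S3 bucket
export S3_TARGET_BUCKET=s3-bucket-name
# Name for your Cloud Function
export CLOUD_FUNCTION_NAME=your-function-name
# GCS Bucket where Cloud Function zip files are stored.
export GCS_STAGING_BUCKET=your-staging-bucket
# GCS source bucket to be synced
export GCS_SOURCE_BUCKET=gcs-bucket-name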

Next, create the runtime config and variables

gcloud --project $PROJECT beta runtime-config configs create $CONFIG_NAME
gcloud --project $PROJECT beta runtime-config configs variables set aws-access-key $AWS_ACCESS_KEY_ID --config-name=$CONFIG_NAME
gcloud --project $PROJECT beta runtime-config configs variables set aws-secret-key $AWS_SECRET_ACCESS_KEY --config-name=$CONFIG_NAME
gcloud --project $PROJECT beta runtime-config configs variables set aws-region $S3_REGION --config-name=$CONFIG_NAME
gcloud --project $PROJECT beta runtime-config configs variables set aws-bucket $S3_TARGET_BUCKET --config-name=$CONFIG_NAME

Finally, deploy the Cloud Function

gcloud --project $PROJECT beta functions deploy $CLOUD_FUNCTION_NAME --stage-bucket $GCS_STAGING_BUCKET \
--trigger-event providers/ \
--trigger-resource $GCS_SOURCE_BUCKET \
--entry-point syncGCS --runtime nodejs10