This project contains configuration and glue code to facilitate mirroring and transcoding OpenStreetMap data into the AWS OSM Public Dataset.
functions/mirror contains a Lambda function intended for deployment using AWS SAM. Amazon EventBridge Scheduler triggers it periodically to compare the contents of the OSM PDS S3 bucket with what rsync reports is available on planet.openstreetmap.org. MD5 hashes are mirrored immediately (and serve as indicators of whether Batch jobs have already been submitted); larger files are queued for mirroring using AWS Batch.
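
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Lambda code: the function name `plan_mirror`, its inputs (a list of paths reported by rsync and a list of keys already in the bucket), and the convention that each data file has a sibling `<name>.md5` file are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def plan_mirror(
    remote_paths: Iterable[str], bucket_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Decide what to copy now and what to queue (hypothetical helper).

    Returns (md5_files_to_copy, data_files_to_queue).
    """
    remote = set(remote_paths)
    mirrored = set(bucket_keys)
    md5_to_copy: List[str] = []
    files_to_queue: List[str] = []

    for path in sorted(remote):
        if not path.endswith(".md5"):
            continue
        # Assumption: an .md5 already in the bucket means a Batch job
        # for the matching data file was submitted on an earlier run.
        if path in mirrored:
            continue
        md5_to_copy.append(path)
        data_file = path[: -len(".md5")]
        if data_file in remote:
            files_to_queue.append(data_file)

    return md5_to_copy, files_to_queue
```

In this sketch the small `.md5` files would be copied right away, and each path in the second list would become one AWS Batch job submission.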
osm-pds.sh is the main entrypoint and is responsible for most of the heavy lifting. It includes support for mirroring files from planet.openstreetmap.org and transcoding them into ORC. Transcoding uses OSM2ORC under the hood.
Dockerfile produces a Docker image, quay.io/mojodna/osm-pds-pipelines, intended for use by AWS Batch jobs. Quay automatically builds updated images when changes are pushed to GitHub.
aws/ contains configurations for an AWS Batch environment intended for use with the AWS command line interface.
To deploy the Lambda function:
cd functions/mirror
sam build
sam deploy
To build the Docker image locally (for testing):
make
To create an AWS Batch environment:
make compute-environment job-queue register-job-definitions
To manually submit jobs:
make submit-job job=aws/sample-mirror-changeset-job.json.hbs