Sample Cumulus Library Database

This repository holds a sample database that can be fed into the Cumulus Library.

How do I use this?

First, check out this repository so that you have local copies of the files to work with.

The next few sections walk you through creating a new S3 bucket to hold the sample data, plus an Athena database and workgroup to query that data.

Create CloudFormation stack

Log into AWS.
Go to the CloudFormation service.
Click Stacks on the left.
Click Create Stack (choose With new resources from the dropdown).
On the new screen, click Upload a template file.
Click the Choose file button.
Navigate to the aws-template.yaml template file in this repo.
Click Next.
Enter a stack name (can be anything).
You can edit the parameters, but it isn't necessary.
Click Next.
Click Next.
Scroll to the bottom and check I acknowledge that AWS CloudFormation might create IAM resources.
Click Submit.
Watch it create the stack; this might take a couple of minutes.
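
For readers who prefer scripting to the console, the steps above can be sketched in Python. This is a minimal sketch, not part of this repository: it assumes the third-party boto3 package is installed and AWS credentials are configured, and the function names build_create_stack_args and create_stack are hypothetical. The CAPABILITY_IAM value is an assumption about which acknowledgement the template needs.

```python
from pathlib import Path


def build_create_stack_args(stack_name, template_path="aws-template.yaml"):
    """Build keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        # Counterpart of "Upload a template file": send the template text inline.
        "TemplateBody": Path(template_path).read_text(encoding="utf-8"),
        # Counterpart of the IAM acknowledgement checkbox. Assumption: if the
        # template gives its IAM resources custom names, CloudFormation asks
        # for CAPABILITY_NAMED_IAM instead.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def create_stack(stack_name, template_path="aws-template.yaml", client=None):
    """Create the stack and block until CloudFormation reports completion."""
    if client is None:
        import boto3  # third-party; imported lazily so the helper above needs only the stdlib

        client = boto3.client("cloudformation")
    client.create_stack(**build_create_stack_args(stack_name, template_path))
    # The waiter polls until the stack reaches CREATE_COMPLETE or fails.
    client.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

Calling create_stack("my-sample-stack") from the repository root would use the template's default parameters, which matches leaving the parameters unedited in the console.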

Upload sample data

When the stack has finished creating, switch to the Resources tab and click on the S3Bucket link.
This will bring you to the newly created bucket.
Click the Upload button.
Click the Add folder button.
Navigate to the data folder in this repo and select the folder itself (don't select any files inside it; upload the whole data folder).
You should see a few files listed, with Folder column values such as data/condition/ or data/encounter/.
Click the Upload button.
When the upload finishes, you should be able to see files in the bucket, under the data/ folder.
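
The upload can also be scripted. The sketch below is illustrative only: it assumes boto3 is installed, the bucket name is whatever the stack created, and the function names plan_uploads and upload_folder are hypothetical. The point it shows is that each object key keeps the leading data/ prefix, which is what uploading the whole folder in the console achieves.

```python
from pathlib import Path


def plan_uploads(data_dir):
    """Pair every file under data_dir with an S3 key that keeps the folder name."""
    root = Path(data_dir).resolve()
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keys are relative to the parent of the folder, so they start
            # with the folder's own name (for example "data/condition/...").
            plan.append((path, path.relative_to(root.parent).as_posix()))
    return plan


def upload_folder(data_dir, bucket, client=None):
    """Upload every planned file to the given bucket."""
    if client is None:
        import boto3  # third-party; imported lazily so plan_uploads needs only the stdlib

        client = boto3.client("s3")
    for path, key in plan_uploads(data_dir):
        client.upload_file(str(path), bucket, key)
```

Printing the keys from plan_uploads("data") before uploading is a quick way to confirm that they begin with data/condition/, data/encounter/, and so on.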

Crawl sample data

Switch to the AWS Glue service.
Open up the Data Catalog section on the left sidebar and click on Crawlers.
Select the crawler you created (it will be named after the database you created above).
Click the Run button.
Wait for it to finish (the Last run state will read Succeeded). You should see 6 created in the Table changes column.
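
Running the crawler and waiting for it can be sketched as a small polling loop. This is an assumption-laden sketch rather than an official procedure: it relies on boto3's Glue client, the function name run_crawler and its timing parameters are hypothetical, and it assumes the crawler has already left the READY state by the time the first poll happens.

```python
import time


def run_crawler(name, client=None, poll_seconds=15, timeout_seconds=900):
    """Start a Glue crawler and return the status of its last crawl."""
    if client is None:
        import boto3  # third-party; imported lazily so a stand-in client can be passed

        client = boto3.client("glue")
    client.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = client.get_crawler(Name=name)["Crawler"]
        # The crawler returns to READY once the run has finished.
        if crawler["State"] == "READY":
            # Expected to be "SUCCEEDED" when the run went well.
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawler {name!r} did not finish within {timeout_seconds}s")
```

A return value of "SUCCEEDED" corresponds to the Succeeded last run state shown in the console.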

Confirm data made it to Athena

Switch to the Athena service.
Click on Query editor.
Click on the Workgroup dropdown on the upper right.
Select the name of the database you used for the stack; if you didn't change it, it's cumulus_library_sample_db.
A dialog will pop up; click Acknowledge.
On the left sidebar, click the database dropdown and choose the database name you entered above (likely cumulus_library_sample_db).
You should see that it has 6 tables (condition, encounter, etc.).
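
The same check can be made from a script by listing the tables registered for the database. This is a rough sketch: it assumes boto3 is installed, that the database lives in the default AwsDataCatalog catalog, and the function name list_tables is hypothetical. It lists table metadata rather than running a query through the workgroup, so it does not replace the console check above.

```python
def list_tables(database, client=None, catalog="AwsDataCatalog"):
    """Return the sorted table names that Athena sees in a database."""
    if client is None:
        import boto3  # third-party; imported lazily so a stand-in client can be passed

        client = boto3.client("athena")
    names = []
    kwargs = {"CatalogName": catalog, "DatabaseName": database}
    while True:
        page = client.list_table_metadata(**kwargs)
        names.extend(table["Name"] for table in page.get("TableMetadataList", []))
        token = page.get("NextToken")
        if not token:
            return sorted(names)
        # More results are available; request the next page.
        kwargs["NextToken"] = token
```

With the default database name, list_tables("cumulus_library_sample_db") should return 6 names, including condition and encounter.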

You are now ready to follow the further instructions in the Cumulus Library!

How was this database generated?

We started with a 1,000-patient Synthea dataset and ran it through Cumulus ETL for de-identification and compression.

smart-on-fhir/cumulus-library-sample-database
