This repository holds a sample database that can be fed into the Cumulus Library.
First, check out this repository so that you have local copies of the files to work with.
These next few sections will walk you through creating a new S3 bucket where we will put sample data and also an Athena database & workgroup to query that data.
- Log into AWS.
- Go to the CloudFormation service.
- Click Stacks on the left.
- Click Create Stack (choose With new resources from dropdown).
- On the new screen, click Upload a template file.
- Click the Choose file button.
- Navigate to the aws-template.yaml template file in this repo.
- Click Next.
- Enter a stack name (can be anything).
- You can edit the parameters, but it isn't necessary.
- Click Next.
- Click Next.
- Scroll to the bottom, click I acknowledge that AWS CloudFormation might create IAM resources.
- Click Submit.
- Watch it create the stack; this might take a couple of minutes.
- When it's done, switch to the Resources tab and click on the S3Bucket link.
- This will bring you to the newly created bucket.
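
If you would rather script the stack creation than click through the console, the steps above could be sketched with boto3 roughly as follows. This is only a sketch under assumptions: the stack name `cumulus-sample` is a hypothetical placeholder, the client is passed in by the caller, the template's default parameters are used, and the bucket's logical ID is assumed to be `S3Bucket` (the name shown in the Resources tab).

```python
from pathlib import Path


def create_sample_stack(cfn, template_path, stack_name="cumulus-sample"):
    """Create the stack from a local template and return the new bucket name.

    `cfn` is assumed to be a boto3 CloudFormation client, for example
    boto3.client("cloudformation"). `stack_name` is a hypothetical default.
    """
    template_body = Path(template_path).read_text(encoding="utf-8")
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # Scripted equivalent of the "might create IAM resources" acknowledgement.
        Capabilities=["CAPABILITY_IAM"],
    )
    # Block until creation completes; the waiter raises if creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    # Assumption: the bucket's logical ID in the template is "S3Bucket".
    detail = cfn.describe_stack_resource(
        StackName=stack_name, LogicalResourceId="S3Bucket"
    )["StackResourceDetail"]
    return detail["PhysicalResourceId"]
```

A call might look like `create_sample_stack(boto3.client("cloudformation"), "aws-template.yaml")`, run from the root of your checkout with AWS credentials already configured.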
- Click the Upload button.
- Click the Add folder button.
- Navigate to the data folder in this repo and upload the folder itself (don't select any files inside it, just upload the whole data folder).
- You should see a few files listed, with a Folder column value of data/condition/ or data/encounter/ etc.
- Click the Upload button.
- When it's done, you should be able to see files in the bucket, under the data/ folder.
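
The important detail in the upload steps is that every object key keeps the leading `data/` prefix. A minimal Python sketch of the same upload is shown below; the helper names `plan_uploads` and `upload_folder` are made up for illustration, and the S3 client is assumed to be a boto3 client supplied by the caller.

```python
from pathlib import Path


def plan_uploads(data_dir):
    """Pair every file under `data_dir` with an S3 key that keeps the folder name.

    For a folder named data, a file at data/condition/some_file gets the key
    "data/condition/some_file", mirroring what uploading the whole folder does.
    """
    data_dir = Path(data_dir).resolve()
    root = data_dir.parent  # keys are computed relative to the folder's parent
    return [
        (path, path.relative_to(root).as_posix())
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    ]


def upload_folder(s3, bucket, data_dir):
    """Upload the folder; `s3` is assumed to be boto3.client("s3")."""
    for path, key in plan_uploads(data_dir):
        s3.upload_file(str(path), bucket, key)
```

Calling `plan_uploads("data")` first and printing the keys is a cheap way to confirm the `data/` prefix before anything is sent to the bucket.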
- Switch to the AWS Glue service.
- Open up the Data Catalog section on the left sidebar and click on Crawlers.
- Select the crawler you created (will be named after the database you created above).
- Click the Run button.
- Wait for it to finish (a Succeeded last run state). You should see 6 created in the Table changes column.
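
The crawler run can also be started and polled from a script. The sketch below assumes a boto3 Glue client is passed in and that the crawler name matches the database name, as described above; `run_crawler` is a hypothetical helper name, not part of this repo.

```python
import time


def run_crawler(glue, crawler_name, poll_seconds=10):
    """Start the crawler, wait until it is idle again, and return the last status.

    `glue` is assumed to be boto3.client("glue"). The returned value is the
    status of the last crawl, e.g. "SUCCEEDED".
    """
    glue.start_crawler(Name=crawler_name)
    while True:
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # State is RUNNING or STOPPING during a crawl and READY once it is done.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
```

For example, `run_crawler(boto3.client("glue"), "cumulus_library_sample_db")` should eventually return `"SUCCEEDED"` if you kept the default database name.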
- Switch to the Athena service.
- Click on Query editor.
- Click on the Workgroup dropdown on the upper right.
- Select the name of the database you used for the stack - if you didn't change it, it's cumulus_library_sample_db.
- A dialog will pop up, click Acknowledge.
- On the left sidebar, click the database dropdown and choose the database name you entered above (likely cumulus_library_sample_db).
- You should see that it has 6 tables (condition, etc).
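
As a quick check that the tables are queryable, a row count can be run against one of them. The following is a rough sketch rather than part of this repo: `count_rows` is a hypothetical helper, the Athena client is assumed to be a boto3 client, and the workgroup is assumed to share the database's name and to have a query result location configured.

```python
import time


def count_rows(athena, table="condition", name="cumulus_library_sample_db",
               poll_seconds=1):
    """Run SELECT COUNT(*) on `table` and return the count as an int.

    `athena` is assumed to be boto3.client("athena"). `name` is used for both
    the workgroup and the database, following the steps above.
    """
    query_id = athena.start_query_execution(
        QueryString=f'SELECT COUNT(*) FROM "{table}"',
        QueryExecutionContext={"Database": name},
        WorkGroup=name,
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {status['State']}: {reason}")
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For SELECT queries, row 0 is the column header and row 1 holds the value.
    return int(rows[1]["Data"][0]["VarCharValue"])
```

A nonzero result from `count_rows(boto3.client("athena"))` suggests the crawler picked up the uploaded data.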
You are now ready to follow the further instructions in the Cumulus Library!
We started with a 1000-patient Synthea dataset, and ran it through Cumulus ETL for de-identification and compression.