smart-on-fhir/cumulus-library-sample-database

Sample Cumulus Library Database

This repository holds a sample database that can be fed into the Cumulus Library.

How do I use this?

First, check out this repository so that you have local copies of the files to work with.

These next few sections will walk you through creating a new S3 bucket where we will put sample data and also an Athena database & workgroup to query that data.

Create CloudFormation stack

  1. Log into AWS.
  2. Go to the CloudFormation service.
  3. Click Stacks on the left.
  4. Click Create Stack (choose With new resources from dropdown).
  5. On the new screen, click Upload a template file.
  6. Click the Choose file button.
  7. Navigate to the aws-template.yaml template file in this repo.
  8. Click Next.
  9. Enter a stack name (can be anything).
  10. You can edit the parameters, but it isn't necessary.
  11. Click Next.
  12. Click Next.
  13. Scroll to the bottom, click I acknowledge that AWS CloudFormation might create IAM resources.
  14. Click Submit.
  15. Watch it create the stack. This might take a couple of minutes.
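
If you would rather script the console steps above, here is a minimal sketch using boto3. It assumes boto3 is installed and your AWS credentials are configured. The function names and the example stack name are made up for illustration and are not part of this repo. The `CAPABILITY_IAM` value mirrors the acknowledgement checkbox in step 13; if the template gives its IAM resources custom names, `CAPABILITY_NAMED_IAM` would be needed instead.

```python
def build_create_stack_args(stack_name, template_body, parameters=None):
    """Assemble keyword arguments for CloudFormation's create_stack call."""
    args = {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Scripted equivalent of ticking the "might create IAM resources" box.
        "Capabilities": ["CAPABILITY_IAM"],
    }
    if parameters:
        # Only needed if you want to override the template's default parameters.
        args["Parameters"] = [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ]
    return args


def create_stack_and_wait(cfn_client, stack_name, template_body, parameters=None):
    """Create the stack, then block until CloudFormation reports completion."""
    cfn_client.create_stack(
        **build_create_stack_args(stack_name, template_body, parameters)
    )
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)


# Example usage (the stack name is arbitrary, as in step 9):
#   import boto3
#   from pathlib import Path
#   template = Path("aws-template.yaml").read_text()
#   create_stack_and_wait(boto3.client("cloudformation"), "cumulus-sample", template)
```

Passing the client in as an argument keeps the helper easy to try out against a stub before pointing it at a real account.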

Upload sample data

  1. When the stack has finished creating, switch to the Resources tab and click on the S3Bucket link.
  2. This will bring you to the newly created bucket.
  3. Click the Upload button.
  4. Click the Add folder button.
  5. Navigate to the data folder in this repo and select the folder itself (don't select individual files inside it; upload the whole data folder).
  6. You should see a few files listed, with a Folder column value of data/condition/ or data/encounter/ etc.
  7. Click the Upload button.
  8. When it's done, you should be able to see files in the bucket, under the data/ folder.
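
The upload can also be scripted. The sketch below walks a local copy of the data folder and uploads every file under a key that keeps the data/ prefix, which is what the console's Add folder button does. It assumes boto3 with configured credentials; the function names are hypothetical, and the bucket name is whatever your stack created.

```python
from pathlib import Path


def plan_uploads(data_dir):
    """Pair each file under data_dir with an S3 key that keeps the folder name."""
    data_dir = Path(data_dir).resolve()
    return sorted(
        # as_posix() gives forward slashes even on Windows, as S3 keys expect.
        (str(path), f"{data_dir.name}/{path.relative_to(data_dir).as_posix()}")
        for path in data_dir.rglob("*")
        if path.is_file()
    )


def upload_data_folder(s3_client, bucket, data_dir):
    """Upload every planned file to the bucket and return the keys written."""
    keys = []
    for local_path, key in plan_uploads(data_dir):
        s3_client.upload_file(local_path, bucket, key)
        keys.append(key)
    return keys


# Example usage (replace the bucket name with the one from the Resources tab):
#   import boto3
#   upload_data_folder(boto3.client("s3"), "your-bucket-name", "data")
```

Calling `plan_uploads("data")` on its own is a cheap way to preview the keys before anything is sent.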

Crawl sample data

  1. Switch to the AWS Glue service.
  2. Open up the Data Catalog section on the left sidebar and click on Crawlers.
  3. Select the crawler that the stack created (it will be named after the database you created above).
  4. Click the Run button.
  5. Wait for it to finish (the last run state will read Succeeded). You should see 6 created in the Table changes column.
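
As a scripted alternative to the steps above, this sketch starts the crawler and polls until Glue reports it idle again, then returns the status of the last crawl. It assumes boto3 with configured credentials; `run_crawler_and_wait` is a hypothetical helper name, and the crawler name in the usage comment assumes you kept the default database name.

```python
import time


def run_crawler_and_wait(glue_client, crawler_name, poll_seconds=15, sleep=time.sleep):
    """Start a Glue crawler, wait for it to go idle, return the last crawl status."""
    glue_client.start_crawler(Name=crawler_name)
    while True:
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler's State is RUNNING or STOPPING while busy, READY when idle.
        if crawler["State"] == "READY":
            # LastCrawl may be absent if the crawler has never completed a run.
            return crawler.get("LastCrawl", {}).get("Status")
        sleep(poll_seconds)


# Example usage (expects "SUCCEEDED" on a clean run):
#   import boto3
#   run_crawler_and_wait(boto3.client("glue"), "cumulus_library_sample_db")
```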

Confirm data made it to Athena

  1. Switch to the Athena service.
  2. Click on Query editor.
  3. Click on the Workgroup dropdown on the upper right.
  4. Select the workgroup with the same name as the database you used for the stack. If you didn't change it, it's cumulus_library_sample_db.
  5. A dialog will pop up. Click Acknowledge.
  6. On the left sidebar, click the database dropdown and choose the database name you entered above (likely cumulus_library_sample_db).
  7. You should see that it has 6 tables (condition, etc.).
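
The same check can be made from a script. This sketch submits a query through the stack's workgroup, polls until Athena finishes, and returns the result cells as plain lists. It assumes boto3 with configured credentials and that the workgroup already has a query result location set up; `run_athena_query` is a hypothetical helper name, and the names in the usage comment are the defaults mentioned above.

```python
import time


def run_athena_query(athena_client, sql, database, workgroup,
                     poll_seconds=2, sleep=time.sleep):
    """Run one Athena query and return its first page of rows as lists of strings."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query {status['State']}: {reason}")

    rows = athena_client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]


# Example usage (should list the tables the crawler created):
#   import boto3
#   run_athena_query(boto3.client("athena"), "SHOW TABLES",
#                    "cumulus_library_sample_db", "cumulus_library_sample_db")
```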

You are now ready to follow the further instructions in the Cumulus Library!

How was this database generated?

We started with a 1000-patient Synthea dataset, and ran it through Cumulus ETL for de-identification and compression.
