Tool to generate synthetic EHR data via Synthea and load into BigQuery

jmandel/synthea-to-bigquery

Quick and Dirty Data Loader

Download the Synthea data release and prepare for BigQuery:

  • Translate patient bundles into resource-specific .ndjson files (e.g. Patient.ndjson.gz), each containing all resources of a given type
  • Generate a BigQuery schema file for each resource type, based on the complete hierarchy of fields used within the resource-specific .ndjson files
  • Load all resource-specific .ndjson files into BigQuery

Run the scripts in order:

    ./01-prepare.sh
    ./02-load.sh

Note: As currently written, the full process takes roughly 3 hours.
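
The first step above (splitting patient bundles into one file per resource type) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `split_bundle` and `split_bundles` are hypothetical, and it assumes each input file is a single FHIR Bundle in JSON form with resources under `entry[].resource`.

```python
import gzip
import json
from pathlib import Path


def split_bundle(bundle, writers, out_dir):
    """Append each resource in one bundle to <resourceType>.ndjson.gz."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        if not resource or "resourceType" not in resource:
            continue  # skip entries that carry no typed resource
        rtype = resource["resourceType"]
        if rtype not in writers:
            # Open one gzip text stream per resource type, on first use.
            path = out_dir / f"{rtype}.ndjson.gz"
            writers[rtype] = gzip.open(path, "wt", encoding="utf-8")
        # NDJSON: exactly one JSON document per line.
        writers[rtype].write(json.dumps(resource) + "\n")


def split_bundles(bundle_paths, out_dir):
    """Split many bundle files; return the resource types that were seen."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers = {}
    try:
        for path in bundle_paths:
            with open(path, encoding="utf-8") as f:
                split_bundle(json.load(f), writers, out_dir)
    finally:
        for writer in writers.values():
            writer.close()
    return sorted(writers)
```

Keeping one open writer per resource type means each bundle is read once and never held in memory alongside the others; the trade-off is that all output goes through a single process.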

TODO

  • Clean up the logic for schema generation (currently it's unreadable / unmaintainable)
  • Make it easy to parallelize (e.g. multiple files per resource type, so generation can run in multiple processes at once)
  • Move schema generation to a post-processing step to avoid the runtime overhead of walking each JSON hierarchy when generating resource-specific .ndjson files
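
One way the last two items might fit together is to infer a schema per .ndjson file after the files are written, then merge the per-file schemas. The sketch below is an assumption about how that could look, not the repository's current logic; `infer`, `merge`, `infer_schema`, and `to_bq_fields` are hypothetical names. It emits the field layout used by BigQuery JSON schema files (`name`, `type`, `mode`, nested `fields`), falls back to STRING when types conflict, and treats nested arrays as a single REPEATED level.

```python
import json


def merge(a, b):
    """Combine two schema nodes; either may be None (nothing observed yet)."""
    if a is None:
        return b
    if b is None:
        return a
    mode = "REPEATED" if "REPEATED" in (a["mode"], b["mode"]) else "NULLABLE"
    if a["type"] == b["type"] == "RECORD":
        fields = dict(a["fields"])
        for key, child in b["fields"].items():
            fields[key] = merge(fields.get(key), child)
        return {"type": "RECORD", "mode": mode, "fields": fields}
    if a["type"] == b["type"]:
        return {"type": a["type"], "mode": mode}
    if {a["type"], b["type"]} == {"INTEGER", "FLOAT"}:
        return {"type": "FLOAT", "mode": mode}  # widen mixed numbers
    return {"type": "STRING", "mode": mode}  # assumed fallback on conflict


def infer(value):
    """Return a schema node for one JSON value, or None if uninformative."""
    if value is None:
        return None
    if isinstance(value, list):
        node = None
        for item in value:
            node = merge(node, infer(item))
        if node is not None:
            node["mode"] = "REPEATED"
        return node  # None for an empty array
    if isinstance(value, dict):
        fields = {}
        for key, child_value in value.items():
            child = infer(child_value)
            if child is not None:
                fields[key] = child
        return {"type": "RECORD", "mode": "NULLABLE", "fields": fields}
    if isinstance(value, bool):  # must precede int: bool subclasses int
        return {"type": "BOOLEAN", "mode": "NULLABLE"}
    if isinstance(value, int):
        return {"type": "INTEGER", "mode": "NULLABLE"}
    if isinstance(value, float):
        return {"type": "FLOAT", "mode": "NULLABLE"}
    return {"type": "STRING", "mode": "NULLABLE"}


def infer_schema(lines):
    """Fold every NDJSON line of one file (or shard) into a single node."""
    schema = None
    for line in lines:
        if line.strip():
            schema = merge(schema, infer(json.loads(line)))
    return schema


def to_bq_fields(fields):
    """Convert merged fields into a BigQuery-style list of column dicts."""
    columns = []
    for name in sorted(fields):
        node = fields[name]
        column = {"name": name, "type": node["type"], "mode": node["mode"]}
        if node["type"] == "RECORD":
            column["fields"] = to_bq_fields(node["fields"])
            if not column["fields"]:
                continue  # skip records in which no fields were observed
        columns.append(column)
    return columns
```

Because `merge` does not depend on the order of its inputs for the final field set, each shard of a resource type could be scanned in its own process and the resulting nodes merged at the end, e.g. `to_bq_fields(merge(schema_a, schema_b)["fields"])`.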
