google/fhir-data-pipes

What is this?

This repository includes pipelines that transform FHIR-formatted data from a FHIR server (such as HAPI, a GCP FHIR store, or even OpenMRS) into a data warehouse based on Apache Parquet files, or into another FHIR server. It also includes a Python query library that simplifies working with FHIR-based data warehouses.
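To give a rough feel for what "analytics-friendly" means here, the sketch below flattens a single FHIR Patient resource into a flat row using only the Python standard library. It is an illustration only: the `flatten_patient` helper, the sample data, and the column names are hypothetical and are not part of this repository, and the actual pipelines write Parquet files with their own schema rather than Python dictionaries.

```python
import json

# A minimal FHIR Patient resource (made-up sample data for illustration).
PATIENT_JSON = """
{
  "resourceType": "Patient",
  "id": "example-1",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
  "gender": "female",
  "birthDate": "1980-02-29"
}
"""


def flatten_patient(resource: dict) -> dict:
    """Pick a few fields out of a nested Patient resource (hypothetical helper).

    Only the first entry of the repeated `name` element is used; a real
    warehouse schema would keep the repeated structure instead.
    """
    names = resource.get("name") or [{}]
    first_name = names[0]
    return {
        "id": resource.get("id"),
        "family": first_name.get("family"),
        "given": " ".join(first_name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


row = flatten_patient(json.loads(PATIENT_JSON))
print(row)
```

Rows of this shape are easier to filter and aggregate than nested JSON, which is the general motivation for moving FHIR data into a columnar warehouse.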

These tools are intended to be generic and, eventually, to work with any FHIR-based data source and data warehouse. The main directories and their contents are:

  • pipelines/ *START HERE*: Batch and streaming pipelines to transform data from a FHIR-based source to an analytics-friendly data warehouse or another FHIR store.

  • docker/: Docker configurations for various servers/pipelines.

  • doc/: Documentation for project contributors. See the pipelines README and wiki for usage documentation.

  • utils/: Various artifacts for setting up an initial database, running pipelines, etc.

  • dwh/: Query library for working with distributed FHIR-based data warehouses.

  • bunsen/: A fork of a subset of the Bunsen project, used to transform FHIR JSON resources into Avro records with a SQL-on-FHIR schema.

  • e2e-tests/: Scripts for testing pipelines end-to-end.

NOTE: This project originally started as a collaboration between Google and the OpenMRS community.