Fred Hersch edited this page Aug 23, 2023 · 5 revisions

FHIR Data Pipes provides a series of pipelines to transform data from a FHIR server into either Apache Parquet files for analysis or another FHIR store for data integration. It also provides minimal support for integrating other tools to query the Parquet files.

Good places to start are listed in the right navigation bar, which links to all of the documentation.

Quick start guide

FHIR Analytics is made up of two core components:

  • Data Pipelines (fhir-data-pipes)
  • Query Libraries for generating views (FHIR Views)

FHIR Data Pipes: ETL pipelines that transform data from a FHIR source (either a FHIR store or a FHIR transformer/facade) into the SQL-on-FHIR schema, which can then be loaded into a SQL DWH (the current implementation uses Parquet for distributed storage).
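As a rough sketch of the transform step (the column choices below are illustrative, not the pipeline's actual schema; the real pipelines are built on Apache Beam and write Parquet), flattening FHIR Patient resources from a Bundle into tabular rows might look like:

```python
import json

# A minimal FHIR Bundle, as a FHIR server's search endpoint might return it.
BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "gender": "female", "birthDate": "1980-04-02"}},
    {"resource": {"resourceType": "Patient", "id": "p2",
                  "gender": "male", "birthDate": "1975-11-20"}}
  ]
}
""")

def flatten_patients(bundle):
    """Flatten Patient resources into flat rows suitable for a columnar store."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue  # a real pipeline handles every configured resource type
        rows.append({
            "id": resource.get("id"),
            "gender": resource.get("gender"),
            "birth_date": resource.get("birthDate"),
        })
    return rows

rows = flatten_patients(BUNDLE)
print(len(rows))  # → 2
```

In the actual pipelines this flattening is driven by the SQL-on-FHIR schema rather than hand-picked fields, and the rows are written as Parquet so distributed engines can query them.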

It should take about 45-60 minutes to get this set up and running with the provided sample code. Please let us know if you run into any issues.

We also provide query libraries, i.e. FHIR Views, that make it easier to write (otherwise complex) SQL-on-FHIR queries using Python and FHIRPath expressions. These are used to create views that further simplify the SQL you need to write to query the DWH.
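To illustrate the idea (this is not the FHIR Views API; the view definition and the mini path evaluator below are invented for this sketch, and real FHIR Views evaluate full FHIRPath), a view can be thought of as a set of named path expressions evaluated over each resource to yield flat, queryable columns:

```python
# A toy view definition: column name -> dotted path into the resource.
# Real FHIR Views use FHIRPath expressions; this sketch only handles
# simple dotted paths with numeric list indices, e.g. "name.0.family".
PATIENT_VIEW = {
    "patient_id": "id",
    "family_name": "name.0.family",
    "birth_date": "birthDate",
}

def eval_path(resource, path):
    """Walk a dotted path through nested dicts/lists; None if absent."""
    node = resource
    for part in path.split("."):
        if isinstance(node, list):
            index = int(part)
            node = node[index] if index < len(node) else None
        elif isinstance(node, dict):
            node = node.get(part)
        else:
            return None
        if node is None:
            return None
    return node

def apply_view(view, resources):
    """Evaluate every view column against every resource, producing rows."""
    return [{col: eval_path(r, path) for col, path in view.items()}
            for r in resources]

patients = [
    {"id": "p1", "birthDate": "1980-04-02",
     "name": [{"family": "Mwangi", "given": ["Amina"]}]},
]
print(apply_view(PATIENT_VIEW, patients))
# → [{'patient_id': 'p1', 'family_name': 'Mwangi', 'birth_date': '1980-04-02'}]
```

The payoff is that the nested-structure navigation lives in the view definition, so downstream SQL against the resulting columns stays simple.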

Why anchor on a common schema, i.e. SQL-on-FHIR?

By converging on a common schema, we can unlock capabilities including:

The FHIR-dbt-analytics project, by our sister team at Google, contains a suite of dbt macros for working with FHIR data in the SQL-on-FHIR schema, as well as a sample set of data quality metrics that can be visualized through a dashboard built on materialized views (the current demo is BigQuery-only; a prototype is available for Spark and Superset).

This approach could be used for common, shareable program indicators, and we believe there is a community opportunity here.

To learn more or to provide feedback, please get in touch via hello-ohs[AT]google.com.