
GCP Validation


Exploring the GCP healthcare API using the MIMIC-IV on FHIR dataset.


Prep GCP

Enable API

Enable the Healthcare API: https://console.cloud.google.com/healthcare/

This creates a service account for the Healthcare API named service-PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com. The project number can be found at: https://console.cloud.google.com/iam-admin/settings?project=PROJECT_NAME_HERE
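The project number can also be looked up from the command line, assuming gcloud is already authenticated against the project:

# print the project number for a given project id
gcloud projects describe PROJECT_ID_HERE --format='value(projectNumber)'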

Define variables

We'll use a few variables throughout:

export GCP_PROJECT_NUMBER=<project number>
export GCP_PROJECT_ID=<project id>
export GOOGLE_LOCATION=<project location>
export GOOGLE_BILLING_ACCOUNT=<billing account info>

export GOOGLE_DATASET=mimic-iv-fhir-dataset
export GOOGLE_DATASTORE=mimic-iv-fhir-v2-demo
export GOOGLE_TOPIC=mimic-fhir-bundles
export GOOGLE_IG_FOLDER='gs://mimic-fhir/implementation-guides/mimic-iv-on-fhir-ig/'
# if the Healthcare API service agent does not exist yet, it can be created with:
# gcloud beta services identity create --service=healthcare.googleapis.com --project=$GCP_PROJECT_ID

# allow the Healthcare API service account to read the IG files from Cloud Storage
gcloud projects add-iam-policy-binding ${GCP_PROJECT_ID} \
  --member=serviceAccount:service-${GCP_PROJECT_NUMBER}@gcp-sa-healthcare.iam.gserviceaccount.com \
  --role=roles/storage.objectViewer
# pick the GCP project
gcloud init
# create healthcare dataset to host the project
gcloud healthcare datasets create $GOOGLE_DATASET
# create a FHIR store
gcloud healthcare fhir-stores create $GOOGLE_DATASTORE --dataset=$GOOGLE_DATASET --version=R4 --enable-update-create
  • When a data store is created through the UI there is a setting to "Allow update create", which is off by default; the --enable-update-create flag above enables it at creation time.
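If the store was already created without the flag, update-as-create can also be turned on afterwards; a minimal sketch (verify the flag name with gcloud healthcare fhir-stores update --help for your gcloud version):

# enable update-as-create on an existing FHIR store
gcloud healthcare fhir-stores update $GOOGLE_DATASTORE \
  --dataset=$GOOGLE_DATASET \
  --enable-update-create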

Import Implementation Guide(s)

With the FHIR store created, we'll need to import an IG so we have the custom profiles to validate against. Loading the IG involves the following steps:

  1. Add the global array to the implementation guide JSON file, as described in the GCP Healthcare API documentation.
  • Effectively, you are adding a list of all the profiles along with their FHIR resource types:
"global": [
  {
    "type": "Patient",
    "profile": "http://mimic.mit.edu/fhir/mimic/StructureDefinition/mimic-patient"
  },
...    
]
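If you'd rather not hand-edit the file, the global list can be generated from the profile StructureDefinitions themselves. A minimal sketch using jq, run from mimic-profiles/output (the output file name below is just an example; only StructureDefinitions with kind = resource are included):

# build the "global" array from the profile StructureDefinitions
jq -n '[inputs | select(.kind == "resource") | {type: .type, profile: .url}]' StructureDefinition-*.json > global.json
# merge it into the ImplementationGuide JSON (written to a new file; rename as needed)
jq --slurpfile g global.json '. + {global: $g[0]}' \
  ImplementationGuide-kindlab.fhir.mimic.json > ImplementationGuide-with-global.json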
  2. Upload the IG to the Google bucket. Only upload the StructureDefinition, ValueSet, CodeSystem, and ImplementationGuide resources.

To import the IG, go to the folder containing the IG output (i.e. mimic-profiles/output after generating the IG with the publisher):

gsutil -m cp -r ImplementationGuide-kindlab.fhir.mimic.json "StructureDefinition*.json" "ValueSet*.json" "CodeSystem*.json" $GOOGLE_IG_FOLDER
gcloud healthcare fhir-stores import gcs $GOOGLE_DATASTORE \
  --dataset=$GOOGLE_DATASET \
  --gcs-uri=${GOOGLE_IG_FOLDER}* \
  --content-structure=resource-pretty
  3. Enable the implementation guide:
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/fhir+json; charset=utf-8" \
  --data '{"validationConfig": {"enabledImplementationGuides": ["http://mimic.mit.edu/fhir/mimic/ImplementationGuide/kindlab.fhir.mimic"], "disableProfileValidation": false}}' \
  "https://healthcare.googleapis.com/v1beta1/projects/$GCP_PROJECT_ID/locations/$GOOGLE_LOCATION/datasets/$GOOGLE_DATASET/fhirStores/$GOOGLE_DATASTORE?updateMask=validationConfig"
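To confirm the IG was enabled, read the FHIR store back and inspect the validationConfig in the response:

# GET the FHIR store resource and check validationConfig.enabledImplementationGuides
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://healthcare.googleapis.com/v1beta1/projects/$GCP_PROJECT_ID/locations/$GOOGLE_LOCATION/datasets/$GOOGLE_DATASET/fhirStores/$GOOGLE_DATASTORE"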

Import data into the FHIR Store

The default import function for the FHIR store does not validate resources against the profiles. We'll need to set up a Pub/Sub service to work as a queue for all the FHIR bundles being inserted into the FHIR store.

Setting up Pub/Sub

  • Create a topic for your pub/sub:
gcloud pubsub topics create $GOOGLE_TOPIC
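To confirm the topic exists (the Cloud Function deployed in the next section attaches its own subscription via its trigger, so no subscription needs to be created by hand):

gcloud pubsub topics describe $GOOGLE_TOPIC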

Configuring Cloud Functions

Create a Cloud Function that responds to the topic and inserts bundles into the Healthcare FHIR store. Whenever a bundle is posted to the topic, it will be processed by the Cloud Function.

  • The Cloud Function will accomplish the following:

    • Post bundles for validation to the Google Healthcare FHIR Store (cloud function script to post bundles)
    • Log validation errors into a BigQuery table (create table script)
    • Save failed bundles into Cloud Storage for debugging/reprocessing (the target location is set in the Cloud Function script)
  • Create the Cloud Function:

    • From the main directory of the mimic-fhir repo run the following command to create the cloud function:
gcloud functions deploy bundle_processor \
  --runtime=python39 \
  --region=$GOOGLE_LOCATION \
  --source=gcp/functions/bundle_processor/. \
  --entry-point=bundler \
  --trigger-topic=$GOOGLE_TOPIC \
  --timeout=300
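As a quick smoke test, you can publish a message to the topic and tail the function logs; the exact message format the function expects is defined in gcp/functions/bundle_processor, so the payload below is only a placeholder:

# publish a placeholder message (real payloads are built and posted by py_mimic_fhir)
gcloud pubsub topics publish $GOOGLE_TOPIC --message='{"test": true}'
# tail the Cloud Function logs to confirm it was invoked
gcloud functions logs read bundle_processor --region=$GOOGLE_LOCATION --limit=20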

Set up logging using BigQuery

  • From the main directory of the mimic-fhir repo, run the following commands to create the logging dataset and tables:
bq mk mimic_fhir_log
bq query --use_legacy_sql=false < gcp/bigquery/bundle_pass.sql
bq query --use_legacy_sql=false < gcp/bigquery/bundle_error.sql
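To confirm the dataset and tables were created:

# list the tables in the logging dataset and inspect one schema
bq ls mimic_fhir_log
bq show --schema --format=prettyjson mimic_fhir_log.bundle_error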
  • The bundle_error table will include:

column             description
logtime            Time the error was logged
bundle_group       Bundle group name (lab, medication, etc.)
bundle_id          A unique id combining the bundle_group and a random UUID4
bundle_dir         The location in Cloud Storage where the errors are written
error_text         The main error message from the Healthcare API
error_diagnostics  A longer explanation of the error
error_expression   The element in the resource where the error occurred
  • The bundle_pass table will include:

column        description
logtime       Time the bundle validation results were logged
patient_id    Patient identifier from mimic-fhir
bundle_group  Bundle group name (lab, medication, etc.)
bundle_id     A unique id combining the bundle_group and a random UUID4
bundle_dir    The location in Cloud Storage where the errors are written
starttime     Time the bundle validation request was sent
endtime       Time the Healthcare API responded with a successful validation

Run validation with py_mimic_fhir

  • Update your .env file to specify "GCP" as the VALIDATOR environment variable (a minimal .env sketch follows this list)
    • Run source .env to update your environment variables
  • Run the py_mimic_fhir validation!
    • py_mimic_fhir validate --num_patients=10 --num_cores=7
    • num_patients: how many patients to validate
    • num_cores: use at most one less than your computer's core count (or everything will get really slow on your machine)
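A minimal .env sketch is below. VALIDATOR is the variable named above; the remaining names are assumptions based on the variables used on this page, so check the mimic-fhir repo for the exact variable names py_mimic_fhir expects:

# minimal .env sketch; names other than VALIDATOR are assumptions, see the mimic-fhir repo
export VALIDATOR=GCP
export GCP_PROJECT=<project id>
export GCP_LOCATION=<project location>
export GCP_DATASET=mimic-iv-fhir-dataset
export GCP_FHIRSTORE=mimic-iv-fhir-v2-demo
export GCP_TOPIC=mimic-fhir-bundles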

Check results of FHIR validation

  • Check the BigQuery tables. Some useful queries below:
    • Bundle errors: SELECT * FROM `kind-lab.mimic_fhir_log.bundle_error`;
    • Bundles that passed validation:
      • Full table: SELECT * FROM `kind-lab.mimic_fhir_log.bundle_pass`;
      • Summary of validation run:
SELECT 
  bundle_dir, 
  MAX(endtime)-MIN(starttime) AS deltaT, 
  COUNT(DISTINCT patient_id) AS pat_count,
  COUNT(DISTINCT CONCAT(patient_id, '-',bundle_group) ) AS bundle_count, 
  COUNT(bundle_id) AS total_bundles_posted
FROM `kind-lab.mimic_fhir_log.bundle_pass`
GROUP BY bundle_dir
  • Check Cloud Storage to dive deeper into any errors (a gsutil sketch follows this list)
    • In BigQuery, the bundle_dir column gives the directory in Cloud Storage for the run
    • Open the specific bundle_id file and investigate the issue further
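For example, a failed bundle can be pulled down with gsutil; the bucket and path below are placeholders, so substitute the values from bundle_dir and bundle_id:

# list the failed bundles for a run (placeholder path; use bundle_dir from BigQuery)
gsutil ls gs://<your-error-bucket>/<bundle_dir>/
# copy a specific bundle locally for inspection (the exact file naming is set by the Cloud Function)
gsutil cp gs://<your-error-bucket>/<bundle_dir>/<bundle_id>.json .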