# MIMIC-FHIR Tutorial
This tutorial walks through importing, searching, and analyzing mimic-fhir resources with the [Pathling](https://pathling.csiro.au/) FHIR server. Pathling is a FHIR server optimized for analytics: alongside search, it provides the import, aggregate, and extract operations used in this notebook.

To begin, complete these steps:
- Start the [Pathling](https://pathling.csiro.au/) server by running the `docker-compose up` command in a terminal (install [Docker](https://docs.docker.com/engine/install/) and [docker-compose](https://docs.docker.com/compose/install/) first if needed)
- Ensure all mimic-fhir ndjson files are unzipped and stored in the *staging* folder beside the `docker-compose.yml`
- [Import](#import-all-mimic-fhir-resources-to-pathling) the mimic-fhir resources using this notebook
- Proceed to the [Index of Operations](#index-of-operations) for search, aggregation, and extract examples

In [None]:
from pathlib import Path
import requests
import json
import ndjson
import pandas as pd

import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams.update({'font.size': 20})

from fhirclient.models.parameters import Parameters, ParametersParameter
from py_mimic_fhir.lookup import MIMIC_FHIR_RESOURCES

import_folder = 'file:///usr/share/staging' 
server = 'http://localhost:8000/fhir'

<a id='index'></a>
### Index of operations
- [Import](#import-resources)
    - [Import MIMIC-FHIR](#import-all-mimic-fhir-resources-to-pathling)
- [Search](#search-resources)
    - [Gender with export](#search-and-export-by-gender)
    - [Atrial fibrillation patients](#search-for-atrial-fibrillation-patients)
    - [Atrial fibrillation patients taking metoprolol](#search-for-atrial-fibrillation-patients-taking-metoprolol)
- [Aggregate](#aggregate-resources)
    - [Gender](#aggregate-gender)
    - [Conditions](#aggregate-conditions)
    - [Male patients with atrial fibrillation](#aggregate-male-patients-with-atrial-fibrillation)
    - [Medication for atrial fibrillation patients](#aggregate-medication-from-patients-with-atrial-fibrillation)
    - [Procedures for atrial fibrillation patients](#aggregate-procedures-for-atrial-fibrillation-patients)
    - [Top lab events](#aggregate-lab-events)
    - [Top microbiology tests](#aggregate-microbiology-tests)
    - [Top microbiology organisms](#aggregate-microbiology-organisms)
    - [Top EMAR medication](#aggregate-top-emar-medication)
    - [Top ICU medication](#aggregate-top-icu-medication)
- [Extract](#extract-resource-table)




# Import Resources
[back to index](#index-of-operations)

In [None]:
def generate_import_parameters(import_folder, profile, resource, mode):
    """Build the Parameters resource that describes one ndjson file to import."""
    param_resource = Parameters()

    param_resource_type = ParametersParameter()
    param_resource_type.name = 'resourceType'
    param_resource_type.valueCode = resource

    # The url part is kept as a plain dict rather than a ParametersParameter
    param_url = {}
    param_url['name'] = 'url'
    param_url['valueUrl'] = f'{import_folder}/{profile}.ndjson'

    param_mode = ParametersParameter()
    param_mode.name = 'mode'
    param_mode.valueCode = mode

    param_source = ParametersParameter()
    param_source.name = 'source'
    param_source.part = [param_resource_type, param_url, param_mode]
    param_resource.parameter = [param_source]

    return param_resource.as_json()

def post_import_ndjson(server, param):
    """POST the import parameters to the server's $import operation."""
    url = f'{server}/$import'

    resp = requests.post(url, json=param, headers={"Content-Type": "application/fhir+json"})
    return resp
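
For reference, the request body that `generate_import_parameters` assembles can be sketched as a plain dictionary. This is an illustration only: `build_import_body` is a hypothetical helper that is not part of the notebook or of Pathling, and the `Patient` profile name is just an example of the `{profile}.ndjson` pattern. It mirrors the `resourceType`, `url`, and `mode` parts built above.

```python
import json


def build_import_body(import_folder, profile, resource, mode):
    """Return a $import request body as a plain dict (hypothetical helper)."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {
                "name": "source",
                "part": [
                    {"name": "resourceType", "valueCode": resource},
                    {"name": "url", "valueUrl": f"{import_folder}/{profile}.ndjson"},
                    {"name": "mode", "valueCode": mode},
                ],
            }
        ],
    }


# Example values only; the folder matches import_folder defined at the top of the notebook
body = build_import_body("file:///usr/share/staging", "Patient", "Patient", "merge")
print(json.dumps(body, indent=2))
```

Printing the body this way can help when comparing what is sent against an error message returned by the server.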

### Import all mimic-fhir resources to Pathling
[back to index](#index-of-operations)
* Place all files in a directory called `staging`, right beside this notebook
  * Data files can be placed in another location as long as the `docker-compose.yml` is updated to match
* NOTE: The files must be plain ndjson, not gzipped ndjson. Unzip any gzipped files before importing.
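
If the files are still gzipped, something along these lines could decompress them in place. It is a minimal sketch that assumes the compressed files end in `.ndjson.gz`; `gunzip_staging` is a hypothetical helper name, so adjust the glob pattern if the files use a different extension.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_staging(staging_dir):
    """Decompress every *.ndjson.gz in staging_dir next to the original file."""
    written = []
    for gz_path in sorted(Path(staging_dir).glob("*.ndjson.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        if out_path.exists():
            continue  # leave files that were already decompressed untouched
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path.name)
    return written
```

Calling `gunzip_staging('staging')` returns the names of the files it wrote, and the original `.gz` files are left in place.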

In [None]:
# 'overwrite' would suit a fresh load, but several profiles map to the same resource type
# (e.g. Observation), so each file has to be merged in rather than overwrite the previous one
mode = 'merge'

for profile, resource in MIMIC_FHIR_RESOURCES.items():
    # ObservationChartevents is skipped: it is too large and crashes the Observation searches
    if profile != 'ObservationChartevents':
        param = generate_import_parameters(import_folder, profile, resource, mode)
        resp = post_import_ndjson(server, param)
        print(f"{profile}: {resp.json()['issue'][0]['diagnostics']}")
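
The loop above assumes every response carries an `issue` list with a `diagnostics` string. A more defensive way to summarise a response might look like the sketch below. `import_diagnostics` is a hypothetical helper and the sample body is made up for illustration; the real messages come from the server.

```python
def import_diagnostics(body):
    """Collect the diagnostics strings from an OperationOutcome-like dict."""
    issues = body.get("issue", [])
    messages = [issue.get("diagnostics", "(no diagnostics)") for issue in issues]
    if messages:
        return "; ".join(messages)
    # Anything without issues is reported by its resourceType instead of raising KeyError
    return f"unexpected response: {body.get('resourceType', 'unknown')}"


# Made-up sample body, shaped like the response the import loop reads
sample = {
    "resourceType": "OperationOutcome",
    "issue": [{"severity": "information", "diagnostics": "sample import message"}],
}
print(import_diagnostics(sample))
```

Inside the loop this could be used as `print(f"{profile}: {import_diagnostics(resp.json())}")`.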

# Search Resources
[back to index](#index-of-operations)

Search returns the group of resources that share a common element, e.g. all female patients.

To target a specific group of resources, use the output of an aggregation to build the FHIRPath search string.
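
One way to go from an aggregation result to a search string: once a grouping has revealed the code of interest (for example ICD-9 `42731`, the atrial fibrillation code used later in this notebook), it can be dropped into a `subsumedBy` filter. The helper name `has_code_filter` is hypothetical and only wraps the string pattern used in the search cells.

```python
def has_code_filter(path, system, code):
    """Build a FHIRPath filter that is true when `path` holds a code subsumed by system|code."""
    return f"{path}.subsumedBy({system}|{code}) contains true"


ICD9 = "http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9"
af_filter = has_code_filter("reverseResolve(Condition.subject).code", ICD9, "42731")
print(af_filter)
```

The resulting string can be passed as `filter_path` to the search and aggregate helpers in this notebook.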


In [None]:
def get_query(server, resource_type, filter_path, count=10):
    url = f'{server}/{resource_type}?_query=fhirPath&filter={filter_path}&_count={count}' 

    resp = requests.get(url,  headers={"Content-Type": "application/fhir+json"} )
    return resp.json()

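def export_resources(resp, resource_type, output_path):
    # Create the output folder if it does not exist yet
    Path(output_path).mkdir(parents=True, exist_ok=True)
    output_file = f'{output_path}/{resource_type}.ndjson'

    with open(output_file, 'w') as outfile:
        writer = ndjson.writer(outfile)
        # A search with no matches may return a Bundle without an 'entry' key
        for entry in resp.get('entry', []):
            writer.writerow(entry['resource'])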
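
`get_query` interpolates the filter into the URL as written, so a filter containing characters such as `&` or `+` may be misread by the server. A possible alternative is to percent-encode the query values first. `build_search_url` is a hypothetical helper that uses only the standard library; the server address is the one defined at the top of the notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(server, resource_type, filter_path, count=10):
    """Return the FHIRPath search URL with every query value percent-encoded."""
    query = urlencode({"_query": "fhirPath", "filter": filter_path, "_count": count})
    return f"{server}/{resource_type}?{query}"


url = build_search_url("http://localhost:8000/fhir", "Patient", "gender = 'male'", 100)
print(url)

# Decoding the query string gives back the original filter text
print(parse_qs(urlsplit(url).query)["filter"][0])
```

With `requests`, passing the same dictionary through the `params=` argument achieves a similar result without building the string by hand.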

#### Search and export by gender
[back to index](#index-of-operations)

In [None]:
resource_type = 'Patient'
filter_path = "gender='male'" # ((reverseResolve(Condition.subject).code.coding.where($this.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|4019)).code).empty()) and (gender = 'male')
count = 100

resp = get_query(server, resource_type, filter_path, count)
resp

In [None]:
output_path = 'output'
export_resources(resp, resource_type, output_path)
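
To check what was exported, the ndjson file can be read back one line at a time. The sketch below uses only the standard library; `read_ndjson` and `count_by` are hypothetical helper names.

```python
import json
from collections import Counter


def read_ndjson(path):
    """Yield one parsed resource per non-empty line of an ndjson file."""
    with open(path) as infile:
        for line in infile:
            if line.strip():
                yield json.loads(line)


def count_by(resources, field):
    """Count how often each value of a top-level field occurs."""
    return Counter(resource.get(field, "MISSING") for resource in resources)
```

For the export above, `count_by(read_ndjson('output/Patient.ndjson'), 'gender')` should report only `male`, since that was the search filter.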

#### Search for atrial fibrillation patients

In [None]:
resource_type = 'Patient'
filter_path = "reverseResolve(Condition.subject).code.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731) contains true"
count = 10

resp = get_query(server, resource_type, filter_path, count)
resp

#### Search for atrial fibrillation patients taking metoprolol

In [None]:
resource_type = 'Patient'
filter_path = "reverseResolve(Condition.subject).code.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731) contains true \
               and reverseResolve(MedicationAdministration.subject).medicationCodeableConcept.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/medication-icu|225974) contains true"
count = 10

resp = get_query(server, resource_type, filter_path, count)
resp
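
The combined filter above is written as one string continued with a backslash, which also embeds the indentation of the second line in the filter text. As an alternative sketch, the two conditions can be kept as separate strings and joined with `and`. `and_filters` is a hypothetical helper; the parenthesised form follows the style of the commented-out filter in the gender search cell.

```python
def and_filters(*filters):
    """Join FHIRPath boolean expressions with `and`, parenthesising each one."""
    return " and ".join(f"({expression})" for expression in filters)


has_af = (
    "reverseResolve(Condition.subject).code"
    ".subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731) contains true"
)
takes_metoprolol = (
    "reverseResolve(MedicationAdministration.subject).medicationCodeableConcept"
    ".subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/medication-icu|225974) contains true"
)

combined = and_filters(has_af, takes_metoprolol)
print(combined)
```

Keeping each condition in its own variable also makes it easy to reuse `has_af` as the `filter_path` of the aggregations further down.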

# Aggregate Resources
[back to index](#index-of-operations)

In [None]:
def get_aggregate(server, resource_type, element_path, filter_path=None):
    url = f'{server}/{resource_type}/$aggregate?aggregation=count()&grouping={element_path}'

    if filter_path is not None:
        url = f'{url}&filter={filter_path}'
    

    resp = requests.get(url, headers={"Content-Type": "application/fhir+json"} )
    return resp.json()

def plot_aggregate(resp, title, limit, size=(12, 8), rotation=90, ascending=True, skip_missing=False):
    parameters = resp['parameter']
    list_label = []
    list_value = []
    for parameter in parameters:
        # A label part with two keys holds a name plus a value; otherwise the grouping value is missing
        if len(parameter['part'][0]) == 2:
            label_val = list(parameter['part'][0].values())[1]
        elif skip_missing:
            continue
        else:
            label_val = 'WITHOUT'
        list_label.append(label_val)
        list_value.append(parameter['part'][1]['valueUnsignedInt'])

    df = pd.DataFrame({'label': list_label, 'value': list_value})
    # Keep the `limit` largest groups first, then order them for display
    df_top = df.sort_values(by=['value'], ascending=False).head(limit)
    df_sorted = df_top.sort_values(by=['value'], ascending=ascending)
    plt.figure(figsize=size)
    plt.barh(df_sorted['label'], df_sorted['value'])
    # rotation is currently unused because the xticks call below is disabled
    # plt.xticks(rotation=rotation)
    plt.title(title)
    plt.show()
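
To inspect the numbers behind a plot, the same response can be flattened into plain tuples without pandas. This sketch assumes the response shape that `plot_aggregate` reads: each parameter has a label part followed by a part holding `valueUnsignedInt`. `aggregate_rows` is a hypothetical helper, and the sample response is made up.

```python
def aggregate_rows(resp, missing_label="WITHOUT"):
    """Turn a $aggregate response into (label, count) tuples, largest count first."""
    rows = []
    for parameter in resp.get("parameter", []):
        label_part, count_part = parameter["part"][0], parameter["part"][1]
        # The label part carries 'name' plus one value[x] key; when the grouping
        # expression produced no value there is only 'name'.
        values = [v for k, v in label_part.items() if k.startswith("value")]
        label = values[0] if values else missing_label
        rows.append((label, count_part["valueUnsignedInt"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample with the assumed shape, including one group without a label value
sample = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "grouping", "part": [{"name": "label", "valueCode": "female"},
                                      {"name": "result", "valueUnsignedInt": 3}]},
        {"name": "grouping", "part": [{"name": "label", "valueCode": "male"},
                                      {"name": "result", "valueUnsignedInt": 5}]},
        {"name": "grouping", "part": [{"name": "label"},
                                      {"name": "result", "valueUnsignedInt": 1}]},
    ],
}
print(aggregate_rows(sample))
```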


#### Aggregate gender
[back to index](#index-of-operations)

In [None]:
resource_type = 'Patient'
element_path = 'gender'
limit=10
size = [6,6]

resp = get_aggregate(server, resource_type, element_path)
plot_aggregate(resp, element_path, limit, size, rotation=45)

#### Aggregate conditions
[back to index](#index-of-operations)

In [None]:
resource_type = 'Condition'
element_path = 'code.coding.display'
title = f'{resource_type}: {element_path}'
limit = 10

resp = get_aggregate(server, resource_type, element_path)
plot_aggregate(resp, title, limit)

#### Aggregate male patients with atrial fibrillation
[back to index](#index-of-operations)

In [None]:
resource_type = 'Patient'
element_path = "reverseResolve(Condition.subject).code.coding.where(subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731)).display"
filter_path="gender='male'"
title = 'Males with atrial fibrillation'
limit = 10

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit, size=[6,6])

#### Aggregate medication from patients with atrial fibrillation
[back to index](#index-of-operations)

In [None]:
# patients with AF, what are the top meds?
resource_type = 'Patient'
element_path = "reverseResolve(MedicationAdministration.subject).medicationCodeableConcept.coding.display"
filter_path = "reverseResolve(Condition.subject).code.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731) contains true"
title = 'Top medication administered for atrial fibrillation patients'
limit = 20

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit)

#### Aggregate procedures for atrial fibrillation patients
[back to index](#index-of-operations)

In [None]:
# patients with atrial fibrillation and their procedures
resource_type = 'Patient'
element_path = "reverseResolve(Procedure.subject).code.coding.display"
filter_path = "reverseResolve(Condition.subject).code.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731) contains true"
title = f'{resource_type}: Top procedures for atrial fibrillation patients'
limit = 15

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit)

#### Aggregate lab events
[back to index](#index-of-operations)

In [None]:
resource_type = 'Observation'
element_path = "code.coding.display"
filter_path = "meta.where(profile.first()='http://fhir.mimic.mit.edu/StructureDefinition/mimic-observation-labevents').empty().not()"
title = f'{resource_type}: Top observation labs'
limit = 20

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit)

#### Aggregate microbiology tests
[back to index](#index-of-operations)

In [None]:
# top microbiology tests across all patients (no atrial fibrillation filter is applied here)
resource_type = 'Patient'
element_path = "reverseResolve(Observation.subject).code.coding.where(system.first()='http://fhir.mimic.mit.edu/CodeSystem/microbiology-test').display"
title = f'{resource_type}: Top microbiology tests'
limit = 20

resp = get_aggregate(server, resource_type, element_path)
plot_aggregate(resp, title, limit)

#### Aggregate microbiology organisms
[back to index](#index-of-operations)

In [None]:
# top microbiology organisms across all patients (no atrial fibrillation filter is applied here)
resource_type = 'Patient'
element_path = "reverseResolve(Observation.subject).code.coding.where(system.first()='http://fhir.mimic.mit.edu/CodeSystem/microbiology-organism').display"
title = f'{resource_type}: Top microbiology organisms'
limit = 20

resp = get_aggregate(server, resource_type, element_path)
plot_aggregate(resp, title, limit, skip_missing=True)

#### Aggregate top emar medication
[back to index](#index-of-operations)

In [None]:
resource_type = 'MedicationAdministration'
element_path = "medicationCodeableConcept.coding.code"
filter_path = "meta.where(profile.first()='http://fhir.mimic.mit.edu/StructureDefinition/mimic-medication-administration').empty().not()"
title = f'{resource_type}: Top EMAR medication administered'
limit = 10

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit)

#### Aggregate top ICU medication 
[back to index](#index-of-operations)

In [None]:
resource_type = 'MedicationAdministration'
element_path = "medicationCodeableConcept.coding.display"
filter_path = "meta.where(profile.first()='http://fhir.mimic.mit.edu/StructureDefinition/mimic-medication-administration-icu').empty().not()"
title = f'{resource_type}: Top ICU medication administered'
limit = 10

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit)

#### Aggregate top labs for atrial fibrillation patients
[back to index](#index-of-operations)

In [None]:
# top labs for patients with atrial fibrillation
resource_type = 'Patient'
element_path = "reverseResolve(Observation.subject).code.coding.where(system.first()='http://fhir.mimic.mit.edu/CodeSystem/d-labitems').display"
filter_path = "reverseResolve(Condition.subject).code.subsumedBy(http://fhir.mimic.mit.edu/CodeSystem/diagnosis-icd9|42731) contains true"
title = f'{resource_type}: Top labs run for atrial fibrillation patients'
limit = 20

resp = get_aggregate(server, resource_type, element_path, filter_path)
plot_aggregate(resp, title, limit)

## Extract resource table
[back to index](#index-of-operations)
- Run the `get_extract` function, then follow the link given in the response to download the result. The content is in CSV format.

In [None]:
def get_extract(server, resource_type, columns, limit):      
    url = f'{server}/{resource_type}/$extract?'

    for column in columns:
        url = f'{url}column={column}&'
    
    url = f'{url}limit={limit}'
    resp = requests.get(url, headers={"Content-Type": "application/fhir+json"} )
    return resp.json()

In [None]:
resource_type = 'Patient'
column1 = 'gender'
column2 = 'birthDate'
columns = [column1, column2]
limit = 10

resp = get_extract(server, resource_type, columns, limit)
resp
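
The download link can also be pulled out of the response programmatically. The sketch below assumes the response is a Parameters resource with a parameter named `url`; check the actual `resp` printed above before relying on it. `extract_download_url` is a hypothetical helper, and the sample response and its URL are made up.

```python
def extract_download_url(resp):
    """Return the first value of a parameter named 'url', or None if absent."""
    for parameter in resp.get("parameter", []):
        if parameter.get("name") == "url":
            for key, value in parameter.items():
                if key.startswith("value"):
                    return value
    return None


# Made-up sample with the assumed shape
sample = {
    "resourceType": "Parameters",
    "parameter": [{"name": "url", "valueUrl": "http://localhost:8000/fhir/example-result.csv"}],
}
print(extract_download_url(sample))
```

If a URL is found, `pd.read_csv(url)` can load the CSV directly into a DataFrame.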

## Streaming data (not fully implemented)

In [None]:
# The rest of this notebook talks to Pathling; the commands below target a HAPI FHIR server instead.
# Start HAPI FHIR before running them.
server = 'http://localhost:8080/fhir'
url = f'{server}/metadata?mode=terminology'
resp = requests.get(url=url, headers={"Content-Type": "application/fhir+json"})
resp.json()

#### $everything
- only writing out patient, condition, and procedure right now...

In [None]:
url = 'http://localhost:8080/fhir/Patient/a6e7e991-6801-5425-b435-4ca6b7decfcc/$everything?_type=Encounter'
resp = requests.get(url=url, headers={"Content-Type": "application/fhir+json"})

with open('output/patient_everything.ndjson', 'w') as patfile:
    writer = ndjson.writer(patfile)
    # Number the resources from 1 as they are written
    for i, entry in enumerate(resp.json()['entry'], start=1):
        print(f'writing resource {i}')
        writer.writerow(entry['resource'])
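
The cell above writes only the first Bundle that comes back. FHIR Bundles can carry a `link` entry with relation `next` that points to the following page, so a fuller export could follow those links. This is a sketch rather than a tested implementation against HAPI: whether and how the server pages `$everything` is an assumption, and `next_page_url` and `iter_entries` are hypothetical helpers.

```python
def next_page_url(bundle):
    """Return the URL of the Bundle's 'next' link, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_entries(first_page, fetch_page):
    """Yield resources from first_page and every following page.

    fetch_page is any callable that takes a URL and returns the parsed Bundle.
    """
    page = first_page
    while page is not None:
        for entry in page.get("entry", []):
            yield entry["resource"]
        url = next_page_url(page)
        page = fetch_page(url) if url else None
```

With `requests`, `fetch_page` could be `lambda url: requests.get(url).json()`, and each yielded resource can be passed to `writer.writerow` as in the cell above.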