# MIMIC Microbiology on FHIR
This notebook explores the microbiology data as it migrates from MIMIC to FHIR.
A few key areas are explored:
- Microbiology tests with no explicit result
- Microbiology tests with organisms and no susceptibility
- Microbiology tests with organisms and susceptibility
- Do we need to add a specimen resource? 
  - Currently not referenced, since no extra information is present (check with Alistair whether more can be made available)

To Do
- Add invariant to ObservationMicroTest profile to specify one of hasMember or valueString MUST be present
- Add Specimen resource?
  - Can we assume the collection time is the charttime?
  - Is it worth having specimen resource if all it has is an identifier, time, and patient link?
- Do dilution values need to be stored in ObservationMicroSusc?
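
The first to-do item can be prototyped in Python before it is written as a profile invariant. This is a minimal sketch, assuming resources are handled as plain dicts and reading "one of" as "at least one of"; `has_result_or_members` is a hypothetical helper, not part of the existing pipeline.

```python
def has_result_or_members(observation: dict) -> bool:
    """Return True if the Observation has a non-empty valueString or hasMember."""
    has_value = bool(observation.get("valueString"))
    has_members = bool(observation.get("hasMember"))
    return has_value or has_members


# Toy resources for illustration only (not real MIMIC data)
no_growth = {"resourceType": "Observation", "valueString": "NO GROWTH."}
with_org = {
    "resourceType": "Observation",
    "hasMember": [{"reference": "Observation/abc"}],
}
empty = {"resourceType": "Observation"}

checks = [has_result_or_members(r) for r in (no_growth, with_org, empty)]
print(checks)
```

Only the last toy resource fails the check, which is the case the invariant is meant to catch.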


In microbiology there are 3,397,914 rows that need to be translated into FHIR. That breaks down to:
- Test with no organism: 2,064,764 
- Test with organism no susc: 185,980
- Test with organism and susc: 1,147,170
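
The three categories above can be derived from two columns, following the same conditions used in the queries later in this notebook (`org_itemid IS NULL`, and `org_itemid IS NOT NULL AND interpretation IS NULL`). The sketch below uses a toy DataFrame with made-up values; `classify` is a hypothetical helper. It also confirms that the three reported counts add up to the reported total.

```python
import pandas as pd

# Toy rows standing in for microbiologyevents (itemids are made up)
rows = pd.DataFrame({
    "org_itemid": [None, 80002, 80002, None],
    "interpretation": [None, None, "S", None],
})


def classify(row) -> str:
    """Assign a row to one of the three categories used in this notebook."""
    if pd.isna(row["org_itemid"]):
        return "test_no_organism"
    if pd.isna(row["interpretation"]):
        return "organism_no_susc"
    return "organism_with_susc"


rows["category"] = rows.apply(classify, axis=1)
category_counts = rows["category"].value_counts().to_dict()
print(category_counts)

# The counts reported above should add up to the reported total
reported = {
    "test_no_organism": 2_064_764,
    "organism_no_susc": 185_980,
    "organism_with_susc": 1_147_170,
}
total = sum(reported.values())
print(total)
```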

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import psycopg2
import os
import uuid
from pathlib import Path
from dotenv import load_dotenv
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot') 

In [None]:
# load environment variables
load_dotenv(Path.cwd().parents[0] / '.env')

SQLUSER = os.getenv('SQLUSER')
SQLPASS = os.getenv('SQLPASS')
DBNAME_MIMIC = os.getenv('DBNAME_MIMIC')
HOST = os.getenv('HOST')

# Connect to DB
con = psycopg2.connect(dbname=DBNAME_MIMIC, user=SQLUSER, password=SQLPASS, host=HOST)

# initialize uuid namespaces
ns_mimic = uuid.uuid5(uuid.NAMESPACE_OID, 'MIMIC-IV')
ns_micro_test = uuid.uuid5(ns_mimic,'ObservationMicroTest')
ns_micro_org = uuid.uuid5(ns_mimic,'ObservationMicroOrg')
ns_micro_susc = uuid.uuid5(ns_mimic,'ObservationMicroSusc')
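
The namespaces above make resource ids reproducible: `uuid.uuid5` is deterministic, so the same namespace and key always give the same id, while a different namespace gives a different one. A small sketch of that property, using the `{micro_specimen_id}-{test_itemid}` key format that appears later in this notebook:

```python
import uuid

# Same namespace chain as in the cell above
ns_mimic = uuid.uuid5(uuid.NAMESPACE_OID, "MIMIC-IV")
ns_micro_test = uuid.uuid5(ns_mimic, "ObservationMicroTest")
ns_micro_org = uuid.uuid5(ns_mimic, "ObservationMicroOrg")

key = "364005-90039"  # micro_specimen_id-test_itemid
id_a = uuid.uuid5(ns_micro_test, key)
id_b = uuid.uuid5(ns_micro_test, key)
id_other_ns = uuid.uuid5(ns_micro_org, key)

print(id_a == id_b)         # same namespace and key
print(id_a == id_other_ns)  # same key, different namespace
```

This is why an id can be recomputed in the notebook from MIMIC columns and then used to look up the matching FHIR resource.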

## 1. Tests with no linked organisms
What is the value of storing a test with no organisms or explicit results? The results appear to be more of a non-result, with notes stored in the `comments` column.

To explore this we will pull in a sample of urine culture test with no organisms associated with them.

In [None]:
q_microtest = """
    SELECT * FROM mimic_hosp.microbiologyevents m 
    WHERE org_itemid IS NULL AND test_name = 'URINE CULTURE'
    LIMIT 100
"""
microtest = pd.read_sql_query(q_microtest,con)

In [None]:
microtest

In [None]:
microtest.comments[20:40]

What we see is that a test with no organism gives us only the timing and the comment. For urine cultures, the comment is 'No Growth' in the most straightforward case and 'Mixed Bacterial Flora' in the more detailed case. Both effectively state that the test could not be completed for various reasons.

The primary reasons for no test results appear to be
- no growth
- contamination
- not enough organism

Let's pull all comments from microbiologyevents and see the count for different reasons

In [None]:
q_microcomments = """
    SELECT comments FROM mimic_hosp.microbiologyevents m 
    WHERE org_itemid IS NULL AND comments IS NOT NULL
"""
microcomments = pd.read_sql_query(q_microcomments,con)

In [None]:
microcomments.head()

In [None]:
# Get the count for each distinct comment
df_comments = microcomments.groupby(['comments']).size().reset_index(name='counts')

# Grab the most commonly used comments and plot!
df_comments_plt = df_comments[df_comments.counts > 20000].copy() # copy to avoid SettingWithCopyWarning
df_comments_plt['comments'] = df_comments_plt.comments.str[0:30] # trim comments so they can be displayed
df_comments_plt = df_comments_plt.sort_values('counts', ascending=True)

In [None]:
plt.figure(figsize=[12,8])
plt.barh(df_comments_plt.comments, df_comments_plt.counts)
plt.xticks(rotation=45)
plt.xlabel('Comment counts')
plt.show()

From the plot of the most common comments for tests with no organism, we can clearly see a pattern. The majority of comments are non-results where the organism could not be isolated or grown. A couple do state results with positive/negative readings. 

Thus it is useful to store the test comments as a value in the FHIR Observation resource to specify why a result/organism is not present. Let's look at an example of just that in FHIR!

In [None]:
micro_id = 'd6e9f4ae-8449-5607-b5c1-264a3393e674'
# use micro_id in the query instead of repeating the literal
q_fhir_micotest = f"""
    SELECT * 
    FROM mimic_fhir.observation_micro_test
    WHERE id = '{micro_id}'
"""
fhir_microtest = pd.read_sql_query(q_fhir_micotest,con)
fhir_microtest

In [None]:
fhir_microtest.fhir[0]

The FHIR resource summarizes the test with no organism with the following information:
- code -> the test completed
- subject -> the patient the test was performed on
- valueString -> the result comments, since no organism specified
- effectiveDateTime -> the time the result was entered
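
The four elements listed above are enough to sketch how such a resource might be assembled. This is not the actual ETL: `build_micro_test_observation` is a hypothetical helper, the patient id and timestamp are placeholders, the `code` is reduced to free text, and `status` is set to `final` as an assumption because FHIR requires a status on every Observation.

```python
def build_micro_test_observation(obs_id, patient_id, test_name, comments, result_time):
    """Build a minimal test Observation for a test with no organism."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",                # assumption, not taken from MIMIC
        "code": {"text": test_name},      # the test completed
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueString": comments,          # result comments, no organism
        "effectiveDateTime": result_time,
    }


obs = build_micro_test_observation(
    obs_id="d6e9f4ae-8449-5607-b5c1-264a3393e674",
    patient_id="example-patient",         # placeholder
    test_name="URINE CULTURE",
    comments="NO GROWTH.",
    result_time="2150-01-01T00:00:00",    # placeholder
)
print(obs["valueString"])
```

Note that `hasMember` is absent here, since there are no organism resources to link to.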

## 2. Tests with organism and no susceptibility
There are some cases where an organism will be found for a test but no susceptibility results come through. The primary reason for this is healthcare providers putting the results in comments instead.

Breakdown of rows translating to FHIR
- Total: 185,980
- With comments: 42,300
  - Can report comments as valueString in place of the interpretation code
- Without comments: 143,680
  - Not really much we can report then...
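
A breakdown like this can be computed by filtering to organism rows with no interpretation and then splitting on whether `comments` is present. The sketch below runs on a toy DataFrame with made-up values. It also checks that the two reported sub-counts sum to 185,980, the figure given for this category in the overview at the top of the notebook.

```python
import pandas as pd

# Toy rows standing in for microbiologyevents (values are made up)
rows = pd.DataFrame({
    "org_itemid": [80002, 80002, 80004, None, 80002],
    "interpretation": [None, None, None, None, "R"],
    "comments": ["see comment", None, None, "NO GROWTH.", None],
})

# Organism present, no susceptibility result
no_susc = rows[rows["org_itemid"].notna() & rows["interpretation"].isna()]
with_comments = int(no_susc["comments"].notna().sum())
without_comments = int(no_susc["comments"].isna().sum())
print(with_comments, without_comments)

# Reported sub-counts from the list above
reported_total = 42_300 + 143_680
print(reported_total)
```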
  
In this case we will have a resource for the test and a resource for the organism. Some notes
- The test resource will have timing information and the references to organisms
- The organism resource will have the organism details but no results...

In [None]:
q_microtest2 = """
    SELECT m.* 
    FROM 
        mimic_hosp.microbiologyevents m 
        INNER JOIN fhir_etl.subjects sub
            ON m.subject_id = sub.subject_id
    WHERE org_itemid IS NOT NULL AND interpretation IS NULL
    LIMIT 100
"""
microtest2 = pd.read_sql_query(q_microtest2,con)
microtest2

The FHIR test resource just contains a link to the organism resources:

In [None]:
# pull in mimic_fhir resources for test and organism
micro_specimen_id = microtest2.iloc[0].micro_specimen_id
test_itemid = microtest2.iloc[0].test_itemid
uuid_test = uuid.uuid5(ns_micro_test,f'{micro_specimen_id}-{test_itemid}')

q_fhir_microtest2 = f"""
    SELECT * FROM mimic_fhir.observation_micro_test
    WHERE id = '{str(uuid_test)}'
"""
fhir_microtest2 = pd.read_sql_query(q_fhir_microtest2,con)

In [None]:
fhir_microtest2.fhir[0]

From the test Observation resource we can see the organism uuid in the `hasMember` element. Let's pull that in to see the organism Observation resource:

In [None]:
uuid_micro_org = fhir_microtest2.fhir[0]['hasMember'][0]['reference'].split('/')[1]
q_fhir_micro_org2= f"""
    SELECT * FROM mimic_fhir.observation_micro_org
    WHERE id = '{str(uuid_micro_org)}'
"""
fhir_micro_org2 = pd.read_sql_query(q_fhir_micro_org2,con)

In [None]:
fhir_micro_org2.fhir[0]

In the microorganism Observation resource there are a couple of interesting elements:
- derivedFrom -> this points back to the test Observation that pointed to this organism
- valueString -> provides the result context if no susceptibility data is present
  - This value is imputed if there is an organism with no antibiotics tested or comments. The imputed value is 'No susceptibility data present'
- effectiveDateTime -> This is the same time as the test
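
The `valueString` rule for organisms can be written as a small function. This is a sketch of the rule as described in this notebook, not the actual ETL code; `organism_value_string` is a hypothetical helper, and returning `None` when susceptibility results exist is an assumption.

```python
from typing import Optional

NO_SUSC_TEXT = "No susceptibility data present"


def organism_value_string(comments: Optional[str], has_susceptibility: bool) -> Optional[str]:
    """Pick the valueString for an organism Observation.

    - susceptibility results exist -> no valueString (assumption)
    - comments exist               -> report the comments
    - neither                      -> impute the placeholder text
    """
    if has_susceptibility:
        return None
    if comments and comments.strip():
        return comments.strip()
    return NO_SUSC_TEXT


print(organism_value_string(None, has_susceptibility=False))
print(organism_value_string("  see comment ", has_susceptibility=False))
print(organism_value_string(None, has_susceptibility=True))
```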

## 3. Test with organism and susceptibility
Tests with organism and susceptibility account for only about a third of the microbiology data in MIMIC, but they are the most useful data. Let's follow one test through the whole process: test->organism->susceptibility

A common microbiology event is testing of urine cultures, so let's follow an example of that through the whole process. 

![microbiology_workflow](img/microbiology_workflow.png)

Load in a urine culture test example

In [None]:
specimen_id = 364005
test_itemid = 90039 # URINE CULTURE
q_micro_test3 = f"""
    SELECT 
        microevent_id
        , micro_specimen_id
        , charttime
        , storetime
        , test_itemid
        , test_name
        , org_name
        , ab_itemid
        , ab_name
        , interpretation
    FROM mimic_hosp.microbiologyevents
    WHERE 
        micro_specimen_id = {specimen_id}
        AND test_itemid = {test_itemid}
"""
micro_test3 = pd.read_sql_query(q_micro_test3,con)

In [None]:
micro_test3

From the above table from MIMIC *microbiologyevents* we can see a urine culture example. Some things to note:
- There are 2 organisms that were identified
- Each of those organisms was tested for susceptibility to multiple antibiotics


The resulting translation to FHIR should have:
- 1 test Observation resource
- 2 organism Observation resources
- 18 susceptibility Observation resources
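
Expected counts like these can be derived directly from the MIMIC rows: one test per distinct specimen and test, one organism per distinct `org_name`, and one susceptibility result per row with an antibiotic. The sketch below uses a toy DataFrame with made-up organisms and antibiotic ids, so it produces smaller numbers than the real example.

```python
import pandas as pd

# Toy rows: one test, two organisms, five antibiotic results (values made up)
rows = pd.DataFrame({
    "micro_specimen_id": [1, 1, 1, 1, 1],
    "test_itemid": [90039] * 5,
    "org_name": ["ORG A", "ORG A", "ORG A", "ORG B", "ORG B"],
    "ab_itemid": [90001, 90002, 90003, 90001, 90002],
})

n_tests = len(rows[["micro_specimen_id", "test_itemid"]].drop_duplicates())
n_orgs = rows["org_name"].nunique()
n_susc = int(rows["ab_itemid"].notna().sum())
print(n_tests, n_orgs, n_susc)
```

Running the same three expressions on `micro_test3` gives the numbers to compare against what arrives in FHIR.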

What we see coming to FHIR:

In [None]:
test_uuid = uuid.uuid5(ns_micro_test, f'{specimen_id}-{test_itemid}')

q_fhir_microtest3 = f"""
    SELECT * FROM mimic_fhir.observation_micro_test
    WHERE id = '{test_uuid}'
"""
fhir_micro_test3 = pd.read_sql_query(q_fhir_microtest3,con)                       

We can see that the test Resource has the test 'URINE CULTURE' set as a `code` and references the two organisms in `hasMember` 

In [None]:
fhir_micro_test3.fhir[0]

Let's pull up one of the organisms, PSEUDOMONAS AERUGINOSA.

In [None]:
org_uuid = fhir_micro_test3.fhir[0]['hasMember'][1]['reference'].split('/')[1]

q_fhir_micro_org3 = f"""
    SELECT * FROM mimic_fhir.observation_micro_org
    WHERE id = '{org_uuid}'
"""
fhir_micro_org3 = pd.read_sql_query(q_fhir_micro_org3,con)   

In [None]:
fhir_micro_org3.fhir[0]

We now see that the ObservationMicroOrg resource points to multiple Observations in `hasMember`. These Observations are susceptibility results. Now let's pull up one of these to complete the full test->organism->susceptibility workflow

In [None]:
susc_uuid = fhir_micro_org3.fhir[0]['hasMember'][0]['reference'].split('/')[1]

q_fhir_micro_susc3 = f"""
    SELECT * FROM mimic_fhir.observation_micro_susc
    WHERE id = '{susc_uuid}'
"""
fhir_micro_susc3 = pd.read_sql_query(q_fhir_micro_susc3,con)   

In [None]:
fhir_micro_susc3.fhir[0]

We finally have some results! Let's dive into them. 
- `code` tells us which antibiotic was tested, Ciprofloxacin here
- `derivedFrom` points back to the organism resource we are testing
- `valueCodeableConcept` is the susceptibility result; here we have R, which is resistant
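
To make the `valueCodeableConcept` element concrete, here is a sketch of how an interpretation code could be wrapped into a CodeableConcept. The S/I/R display names follow common microbiology usage. The `system` URI is deliberately left out because the code system used by the profile is not shown here, any other interpretation values in the data are not handled, and `to_value_codeable_concept` is a hypothetical helper.

```python
# Common susceptibility interpretation codes (assumed subset)
INTERPRETATION_DISPLAY = {
    "S": "Susceptible",
    "I": "Intermediate",
    "R": "Resistant",
}


def to_value_codeable_concept(interpretation: str) -> dict:
    """Wrap an S/I/R code in a minimal CodeableConcept (no system set)."""
    code = interpretation.strip().upper()
    if code not in INTERPRETATION_DISPLAY:
        raise ValueError(f"Unhandled interpretation code: {interpretation!r}")
    return {"coding": [{"code": code, "display": INTERPRETATION_DISPLAY[code]}]}


concept = to_value_codeable_concept("R")
print(concept)
```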

We have now gone through the full microbiology workflow. Some summary notes:
- ObservationMicroTest
  - Marks the time the specimen was taken in `effectiveDateTime`
  - Specifies reason test has no results in `valueString`
  - Links to organisms tested in `hasMember`
- ObservationMicroOrg
  - Specifies reason an organism has no results in `valueString` if needed
  - Links to susceptibility results in `hasMember`
  - Links to the parent test in `derivedFrom`
- ObservationMicroSusc
  - Marks the time the result was available in `effectiveDateTime`
  - Results are presented in `valueCodeableConcept`
  - Links back to the organism observation in `derivedFrom`
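
The links summarized above can be followed programmatically. The sketch below walks test->organism->susceptibility over an in-memory dict of toy resources keyed by id, instead of querying the database. The ids, `ref_id`, and `walk_micro_test` are all hypothetical.

```python
def ref_id(reference: dict) -> str:
    """Extract the id from a reference such as {'reference': 'Observation/<id>'}."""
    return reference["reference"].split("/", 1)[1]


def walk_micro_test(test_id: str, resources: dict) -> dict:
    """Map each organism id under a test to its list of susceptibility ids."""
    tree = {}
    for org_ref in resources[test_id].get("hasMember", []):
        org_id = ref_id(org_ref)
        org = resources[org_id]
        tree[org_id] = [ref_id(r) for r in org.get("hasMember", [])]
    return tree


def obs_ref(obs_id: str) -> dict:
    return {"reference": f"Observation/{obs_id}"}


# Toy resources: one test, two organisms, two susceptibility results
resources = {
    "test-1": {"hasMember": [obs_ref("org-1"), obs_ref("org-2")]},
    "org-1": {
        "derivedFrom": [obs_ref("test-1")],
        "hasMember": [obs_ref("susc-1"), obs_ref("susc-2")],
    },
    "org-2": {
        "derivedFrom": [obs_ref("test-1")],
        "valueString": "No susceptibility data present",
    },
    "susc-1": {"derivedFrom": [obs_ref("org-1")]},
    "susc-2": {"derivedFrom": [obs_ref("org-1")]},
}

tree = walk_micro_test("test-1", resources)
print(tree)

# Every susceptibility result should point back to its organism
back_links_ok = all(
    ref_id(resources[s]["derivedFrom"][0]) == org_id
    for org_id, susc_ids in tree.items()
    for s in susc_ids
)
print(back_links_ok)
```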