# MIMIC-FHIR Medication
An exploration of the medication tables in MIMIC and how best to map them into FHIR

In [None]:
%matplotlib inline
# Imports
import pandas as pd
import numpy as np
import psycopg2
from pathlib import Path
import matplotlib.pyplot as plt

In [None]:
# Database connection
sqluser = 'postgres'
sqlpass = 'postgres'
dbname = 'mimic'
host = 'localhost'

db_conn = psycopg2.connect(dbname=dbname, user=sqluser, password=sqlpass, host=host)

## Medication
Look into the medication tables in MIMIC: `emar`, `emar_detail`, `inputevents`, `prescriptions`, and `pharmacy`

### 1. Medication Identifier
In FHIR, a medication is stored as its own resource that other medication resources (Request, Dispense, Administration) reference. MIMIC therefore needs a common identifier that relates across `emar`, `emar_detail`, `inputevents`, `prescriptions`, and `pharmacy`.

Potential identifiers
- GSN: generic sequence number for medication, found in `prescriptions`
- NDC: national drug code for medication, found in `prescriptions`
- product_code/product_description: medication details found in `emar_detail`
- drug name: The straight drug name, found in all tables

The final medication identifier could be a single one of the mentioned identifiers but could also be a combination depending on table linking limitations.
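
A minimal sketch of what this referencing could look like, assuming FHIR R4 field names and an id derived from the drug name. The helpers `medication_id`, `build_medication`, and `build_administration` are hypothetical, and the timestamp is made up for illustration.

```python
import hashlib


def medication_id(name: str) -> str:
    """Derive a stable id from a drug name (assumption: the name is the identifier)."""
    normalized = " ".join(name.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]


def build_medication(name: str) -> dict:
    """Build a bare-bones FHIR Medication resource as a plain dict."""
    return {
        "resourceType": "Medication",
        "id": medication_id(name),
        "code": {"text": name},
    }


def build_administration(med: dict, subject_id: int, charttime: str) -> dict:
    """Build a MedicationAdministration that points at the Medication resource."""
    return {
        "resourceType": "MedicationAdministration",
        "status": "completed",
        "medicationReference": {"reference": f"Medication/{med['id']}"},
        "subject": {"reference": f"Patient/{subject_id}"},
        "effectiveDateTime": charttime,
    }


# Illustrative values only
example_med = build_medication("Heparin")
example_admin = build_administration(example_med, 10012853, "2180-01-01T08:00:00-05:00")
```

Whichever identifier is chosen, the same derivation has to be reproducible from every source table, otherwise the references will not resolve.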

### 1.1 Medication Identifier - GSN
GSN is found in `prescriptions`. A typical GSN is a six-digit number.

In [None]:
q_gsn = """ SELECT length(gsn), count(pharmacy_id) FROM mimic_hosp.prescriptions 
            GROUP BY length(gsn) """
gsn = pd.read_sql_query(q_gsn, db_conn)

In [None]:
# GSN values should all be 6-digit numerics. The number of NA values and the varying
# lengths are concerning for use as an identifier. A single gsn field can apparently
# hold multiple GSNs separated by commas
gsn

In [None]:
# Plot of all gsn lengths, the length of 6 is clearly the majority
plt.bar(gsn['length'], gsn['count'], width=6)

In [None]:
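# Excluding the 6-digit GSNs (the majority), we can see that empty GSNs are present.
# Also not shown in the graph are the ~2 million NA values for GSN.
# Filter on the length value rather than a positional index, since GROUP BY order is not guaranteed
gsn_other = gsn[gsn['length'] != 6]
plt.bar(gsn_other['length'], gsn_other['count'], width=6)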

**Conclusion**: GSN has too much variance and missing data to be used as the medication identifier
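
To make the variance concrete, a rough sketch of how individual GSN values could be bucketed is shown here. It assumes multiple GSNs are separated by commas and/or whitespace; `classify_gsn` is a hypothetical helper and the sample values are invented.

```python
import re
from collections import Counter


def classify_gsn(raw):
    """Bucket a raw gsn field as missing, single, multiple, or malformed."""
    if raw is None or not raw.strip():
        return "missing"
    # Assumption: multiple GSNs are separated by commas and/or whitespace
    tokens = [t for t in re.split(r"[,\s]+", raw.strip()) if t]
    if not all(re.fullmatch(r"\d{6}", t) for t in tokens):
        return "malformed"
    return "single" if len(tokens) == 1 else "multiple"


# Invented sample values, not taken from MIMIC
samples = [None, "", "004489", "004489, 004490", "4489"]
summary = Counter(classify_gsn(s) for s in samples)
```

Applying a function like this to the `gsn` column would show how many rows fall outside the clean single-GSN case.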

### 1.2 Medication Identifier - NDC
The NDC is found in `prescriptions` and should be an 11-digit numeric identifier

In [None]:
q_ndc = """ SELECT length(ndc), count(pharmacy_id) FROM mimic_hosp.prescriptions 
            GROUP BY length(ndc) """
ndc = pd.read_sql_query(q_ndc, db_conn)
ndc

NDC values with length zero are primarily just zero values, i.e. missing data. Unfortunately, NDC is missing for ~2 million medication entries

**Conclusion**: NDC is missing too many values to be the medication identifier, but is a good target for medication codes in future concept mapping
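
If NDC is later used as a mapping target, the missing and zero values would need to be screened out first. One possible sketch follows, assuming that empty strings and all-zero values mean "missing"; `normalize_ndc` is a hypothetical helper.

```python
def normalize_ndc(raw):
    """Return an 11-digit NDC string, or None when the value is unusable."""
    if raw is None:
        return None
    value = raw.strip()
    # Assumption: empty and all-zero values stand for missing data
    if not value or set(value) == {"0"}:
        return None
    if len(value) == 11 and value.isascii() and value.isdigit():
        return value
    return None
```

Values that are neither missing nor 11 digits are also returned as `None` here; they could instead be logged for manual review.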

### 1.3 Medication Identifier - Product Code
The product code is found in `emar_detail`. One limitation of the product code is that it has no linkage to the ICU tables. It could potentially be linked back using just the drug name

In [None]:
q_product_code = """ 
    SELECT 
        SUM(CASE WHEN product_code IS NULL THEN 1 ELSE 0 END) AS null_count
        , SUM(CASE WHEN product_code IS NOT NULL THEN 1 ELSE 0 END) AS valid_count
    FROM mimic_hosp.emar_detail
"""
product_code = pd.read_sql_query(q_product_code, db_conn)
product_code

**Conclusion**: Again, there are far too many null values (~29 million) for this to be the sole identifier for medication

### 1.4 Medication Identifier - Drug Name
The drug name is common across all the medication tables in MIMIC. The main limitation is that product specific information would be missing if the drug name is used alone.

In [None]:
# Medication from prescriptions (10,255 distinct values)
q_pr_meds = "SELECT drug as medication FROM mimic_hosp.prescriptions"
pr_meds = pd.read_sql_query(q_pr_meds, db_conn)
pr_meds.medication.unique().size

In [None]:
# Medication from pharmacy (10,229 distinct values)
q_ph_meds = "SELECT medication FROM mimic_hosp.pharmacy"
ph_meds = pd.read_sql_query(q_ph_meds, db_conn)
ph_meds.medication.unique().size

In [None]:
# Medication from emar (4,293 distinct values)
q_em_meds = "SELECT medication FROM mimic_hosp.emar"
em_meds = pd.read_sql_query(q_em_meds, db_conn)
em_meds.medication.unique().size

In [None]:
# Medication from inputevents/d_items (470 distinct values)
q_ie_meds = """
    SELECT di.label as medication
    FROM mimic_icu.d_items di
    WHERE di.linksto = 'inputevents'
"""
ie_meds = pd.read_sql_query(q_ie_meds, db_conn)
ie_meds.medication.unique().size

In [None]:
# Check for NA values 
data = {'Tables' : ['prescriptions', 'pharmacy', 'emar', 'inputevents'],
        'NA Count' : [pr_meds.isna().sum().medication, 
                      ph_meds.isna().sum().medication, 
                      em_meds.isna().sum().medication, 
                      ie_meds.isna().sum().medication]}
print(pd.DataFrame(data))

So pharmacy and emar both have some NA values. Let's investigate the scale and the reason behind that.
The primary reason is likely IV meds

In [None]:
q_emar = """
    SELECT 
        em.medication
        , em.pharmacy_id as em_pharmacy_id
        , ed.pharmacy_id as ed_pharmacy_id
        --, ed.parent_field_ordinal
        , ed.product_code
    FROM 
        mimic_hosp.emar em
        LEFT JOIN mimic_hosp.emar_detail ed 
            ON em.emar_id = ed.emar_id 
    WHERE
        em.medication IS NULL
        AND ed.parent_field_ordinal IS NOT NULL
"""
emar = pd.read_sql_query(q_emar, db_conn)
# These rows still have pharmacy_ids which link back to actual values (a MIMIC update to fill these values?)

In [None]:
# len() gives the row count; DataFrame.size would count rows * columns
print(f'Rows in emar with null medication: {len(emar)}')
emar

In [None]:
# Most of the remaining values have a pharmacy_id or product code to link to, 
# but there is still a subset missing that too!

emar.loc[(emar.medication.isnull()) 
         & (emar.em_pharmacy_id.isnull()) 
         & (emar.ed_pharmacy_id.isnull()) 
         & (emar.product_code.isnull()) 
        ].info()

# These emar rows carry no information usable for FHIR, so we will need to filter
# out the ~50,000 emar events with no related medication. They are potentially IV,
# but nothing links to it (maybe poe)

In [None]:
# Check pharmacy table null values
q_pharma = """
    SELECT *
    FROM 
        mimic_hosp.pharmacy ph
    WHERE
        ph.medication IS NULL
"""
pharma = pd.read_sql_query(q_pharma, db_conn)
pharma.info()

The `proc_type` indicates what kind of medication is being delivered. Since `proc_type` is non-null for all rows where medication is null, we can glean the medication intent from it.

In [None]:
pharma.proc_type.unique()

In [None]:
pharma.groupby(['proc_type']).size()

The primary offender is IV/TPN, so we can decide if these just get mapped to one thing. The few irrigation/unit dose cases could probably just be omitted.
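
One way this decision could be expressed is sketched below. The `proc_type` labels in `IV_PROC_TYPES` and the placeholder name are assumptions and should be checked against the output of `pharma.proc_type.unique()`; `medication_or_fallback` is a hypothetical helper.

```python
# Assumed proc_type labels for the IV/TPN group; verify against the actual data
IV_PROC_TYPES = {"IV Large Volume", "IV Piggyback", "TPN"}

# Hypothetical placeholder name for pharmacy rows without a medication
IV_PLACEHOLDER = "IV/TPN (unspecified)"


def medication_or_fallback(medication, proc_type):
    """Return the medication name, a placeholder for IV/TPN rows, or None to omit."""
    if medication:
        return medication
    if proc_type in IV_PROC_TYPES:
        return IV_PLACEHOLDER
    # Everything else (e.g. irrigation, unit dose) is omitted
    return None
```

Returning `None` keeps the omission explicit, so the dropped rows can still be counted before they are filtered out.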

### 1.5 Medication Identifier - Decisions
Of these options, the best initial choice is the medication name. Limitations of using names:
- There are ~55,000 emar events without an explicit medication name or pharmacy link. These are likely IV meds.
- There are ~1 million pharmacy entries without an explicit medication name. These are primarily IV procedures, so they could again be grouped into IV meds.
- The `prescriptions` table can have multiple drugs under the same pharmacy_id, so they will need to be grouped
  - Proposal for now is to use the format MAIN_BASE_ADDITIVE based on drug_type to concatenate values 
  - Other option is to concatenate them alphabetically
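
The MAIN_BASE_ADDITIVE proposal could look roughly like the following sketch. It assumes `drug_type` takes the values `MAIN`, `BASE`, and `ADDITIVE`, that an underscore is the separator, and that repeated rows should be collapsed; `combined_names` is a hypothetical helper and the rows are invented.

```python
from collections import defaultdict

# Ordering implied by the MAIN_BASE_ADDITIVE format
DRUG_TYPE_ORDER = {"MAIN": 0, "BASE": 1, "ADDITIVE": 2}


def combined_names(rows, separator="_"):
    """Combine drug names per pharmacy_id, ordered by drug_type, then by name.

    rows: iterable of (pharmacy_id, drug_type, drug) tuples.
    """
    grouped = defaultdict(set)
    for pharmacy_id, drug_type, drug in rows:
        # Unknown drug types sort after the known ones
        rank = DRUG_TYPE_ORDER.get(drug_type, len(DRUG_TYPE_ORDER))
        grouped[pharmacy_id].add((rank, drug))
    return {
        pid: separator.join(drug for _, drug in sorted(entries))
        for pid, entries in grouped.items()
    }


# Invented example rows
rows = [
    (1, "BASE", "NS"),
    (1, "MAIN", "Vancomycin"),
    (2, "MAIN", "Heparin"),
    (1, "BASE", "NS"),
]
names = combined_names(rows)
```

The alphabetical alternative would only require dropping the rank from the sort key.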

## 2. Medication Examples
Look into specific cases for medication
- Pills - ranitidine/acetaminophen
- Infusion - heparin/norepinephrine
- Antibiotic - vancomycin
- Saline - IV entries

Grab the medication administrations from one patient who received all of them during a hospital stay

In [None]:
subject_id = 10012853
q_med_ex = f"""
    SELECT 
        em.*
        , ed.emar_seq
        , ed.parent_field_ordinal
        , ed.administration_type
        , ed.barcode_type
        , ed.dose_due
        , ed.dose_due_unit
        , ed.dose_given
        , ed.dose_given_unit
        , ed.product_amount_given
        , ed.product_unit
        , ed.product_description
        , ed.product_code
        , ed.infusion_rate
        , ed.infusion_rate_unit
        , ed.route
    FROM 
        mimic_hosp.emar em
        LEFT JOIN mimic_hosp.emar_detail ed
            ON em.emar_id = ed.emar_id    
    WHERE em.subject_id = {subject_id}
"""
med_ex = pd.read_sql_query(q_med_ex, db_conn)


Here are the different medications and how many times each was administered

In [None]:
med_ex.groupby(['medication']).size()

### 2.1 Medication Examples - Pills

In [None]:
pill = 'Acetaminophen' # Aspirin

In [None]:
idx = med_ex['parent_field_ordinal'].notnull()

In [None]:
emar = med_ex.loc[idx, :].copy()

In [None]:
pills = emar[(emar.medication == pill)]

In [None]:
# The typical product_unit for something like Acetaminophen is a tablet (TAB)
pills.product_unit

cols = ['subject_id', 'emar_id', 'pharmacy_id', 'charttime', 'medication',
       'administration_type', 'dose_given', 'dose_given_unit', 'product_unit']

pills[cols]

### 2.2 Medication Examples - Infusion
Look into heparin in emar and inputevents

### 2.2.1 - Infusion emar

In [None]:
infusion = 'Heparin'

In [None]:
infusions = med_ex[(med_ex.medication == infusion)]

In [None]:
cols = ['subject_id', 'emar_id', 'pharmacy_id', 'charttime', 'medication',
       'administration_type', 'dose_given', 'dose_given_unit', 'product_unit',
       'infusion_rate', 'infusion_rate_unit']

infusions[cols].head(n=10)

In [None]:
infusions.groupby(['medication']).size()

In [None]:
q_heparin = """
    SELECT * 
    FROM 
        mimic_hosp.emar em
        LEFT JOIN mimic_hosp.emar_detail ed
            ON em.emar_id = ed.emar_id   
    WHERE medication = 'Heparin'
    LIMIT 1000
"""
heparin = pd.read_sql_query(q_heparin, db_conn)

In [None]:
heparin.info()

In [None]:
heparin[0:10]

- Heparin is delivered with infusion, but the 

### 2.2.2 - Infusion inputevents