[Synthea](https://github.com/synthetichealth/synthea) aims to generate rich, high-quality, representative synthetic patient records.

Hand-built [modules](https://synthetichealth.github.io/module-builder/) reflect epidemiology with respect to prevalence, disease heterogeneity, prognosis, etc.

These are based on US statistics.

Data can be generated or downloaded directly, including [specialized](https://synthea.mitre.org/downloads) datasets.

The Synthea authors describe the approach in a [paper](https://academic.oup.com/jamia/article/25/3/230/4098271?login=true).

In [1]:
import pandas as pd

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [24]:
#read the tables of interest
#the remaining files relate to billing and organizations
patient_df = pd.read_csv('sample_data/patients.csv')
allergy_df = pd.read_csv('sample_data/allergies.csv')
careplan_df = pd.read_csv('sample_data/careplans.csv')
condition_df = pd.read_csv('sample_data/conditions.csv')
device_df = pd.read_csv('sample_data/devices.csv')
encounter_df = pd.read_csv('sample_data/encounters.csv')
imaging_study_df = pd.read_csv('sample_data/imaging_studies.csv')
immunization_df = pd.read_csv('sample_data/immunizations.csv')
medication_df = pd.read_csv('sample_data/medications.csv')
observation_df = pd.read_csv('sample_data/observations.csv')
procedure_df = pd.read_csv('sample_data/procedures.csv')

In [25]:
#see patients
patient_df.head()

Unnamed: 0,Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,MIDDLE,LAST,...,CITY,STATE,COUNTY,FIPS,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE,INCOME
0,6f971f99-f56f-233c-06c5-7cff1c1d0225,2016-10-27,,999-12-3237,,,,Harry448,Luciano237,Ratke343,...,Medfield,Massachusetts,Norfolk County,25021.0,2052,42.137196,-71.332594,9771.28,6436.31,975748
1,1598e485-f50d-ad05-086e-acdd7b897628,2016-09-03,,999-17-3111,,,,Tasia358,Mafalda94,Ledner144,...,Haverhill,Massachusetts,Essex County,25009.0,1830,42.745283,-71.104796,19127.77,8556.03,985308
2,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,1996-08-19,,999-46-4568,S99934773,X32481266X,Ms.,Royce974,,Simonis280,...,Fairhaven,Massachusetts,Bristol County,,0,41.674529,-70.852106,69450.55,81379.39,145771
3,9bdc9831-6723-44dd-5e29-f3f7f6494f70,2001-12-25,,999-75-5853,S99993381,X6826934X,Ms.,Esperanza675,Lucia634,Muñiz642,...,Medford,Massachusetts,Middlesex County,25017.0,2155,42.458645,-71.150454,5805.52,497471.14,4356
4,f3d34535-1d37-9e82-adbc-62ab1d0855c5,2013-07-23,,999-27-1061,,,,Halley419,Danita413,Sawayn19,...,Milford,Massachusetts,Worcester County,25027.0,1757,42.19227,-71.500335,1900.0,26679.64,12683


In [26]:
#condition records
condition_df.head()

#use patient id to link patients to condition records
patient_df = patient_df.rename(columns={"Id":"PATIENT"})
df = condition_df.merge(patient_df, how="left", on="PATIENT")

#filter patient cohort by features
#only female with essential hypertension
df[(df["GENDER"] == "F") & (df["DESCRIPTION"] == 'Essential hypertension (disorder)')].head()

Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION
0,2016-09-03,2016-10-08,1598e485-f50d-ad05-086e-acdd7b897628,b38c4a16-c781-06ae-e920-c92dc5031d8f,http://snomed.info/sct,314529007,Medication review due (situation)
1,2016-10-27,2017-02-02,6f971f99-f56f-233c-06c5-7cff1c1d0225,c100715b-9c57-f82f-b4e9-afcac8f7f68f,http://snomed.info/sct,314529007,Medication review due (situation)
2,2016-12-10,2017-05-13,1598e485-f50d-ad05-086e-acdd7b897628,7b370fc5-5855-baf9-79aa-776b9a78f624,http://snomed.info/sct,314529007,Medication review due (situation)
3,2017-07-23,2017-07-23,1598e485-f50d-ad05-086e-acdd7b897628,292007b6-aa6b-2722-1bf9-f69d6bde5891,http://snomed.info/sct,241929008,Acute allergic reaction
4,2017-08-12,2018-08-11,1598e485-f50d-ad05-086e-acdd7b897628,420fc381-f93b-3651-aafe-36566d006d9d,http://snomed.info/sct,314529007,Medication review due (situation)


Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION,BIRTHDATE,DEATHDATE,SSN,...,CITY,STATE,COUNTY,FIPS,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE,INCOME
143,2019-05-28,,5cb361ab-88fc-325b-6e06-90b6a4e2126c,f2ac528b-3447-3665-fdb6-2d88f0b5760a,http://snomed.info/sct,59621000,Essential hypertension (disorder),1997-03-25,,999-48-5026,...,Everett,Massachusetts,Middlesex County,25017.0,2149,42.437832,-71.092039,8070.91,481475.43,14436
200,1993-10-17,,c158e029-dafd-2ecd-0bd5-91947a306e72,f12bc9ee-17b0-1012-ae90-01fe2c99377d,http://snomed.info/sct,59621000,Essential hypertension (disorder),1971-08-15,,999-88-1946,...,Worcester,Massachusetts,Worcester County,25027.0,1605,42.268712,-71.810893,27038.77,1041546.4,6598
273,2021-11-06,,1db976ed-12be-c7cd-7b51-466774a2ca90,e9cbc37c-111f-8e90-089c-9deb443652bf,http://snomed.info/sct,59621000,Essential hypertension (disorder),1973-10-13,,999-17-3416,...,Greenfield,Massachusetts,Franklin County,25011.0,1301,42.629106,-72.637615,445018.31,585184.39,60295
386,2017-03-29,,8283a4f0-6967-3c89-7b9a-88e0299ee7f2,40f90c01-9011-4da8-cc95-1e9fd804d596,http://snomed.info/sct,59621000,Essential hypertension (disorder),1962-02-28,,999-10-3315,...,Lynn,Massachusetts,Essex County,25009.0,1907,42.495736,-71.01356,21066.72,1105875.88,11878
422,2014-07-16,,d75705b7-43f6-ff36-904d-22b0903fd248,51c509e3-5bc1-b245-26c3-e5f47e0d0596,http://snomed.info/sct,59621000,Essential hypertension (disorder),1941-03-05,,999-98-9045,...,Springfield,Massachusetts,Hampden County,25013.0,1105,42.071544,-72.624442,846084.52,544952.14,47818
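
Because one patient can have many condition rows, the merged table repeats the patient columns, so counting rows would overcount patients. The following is a minimal sketch of checking the join and counting distinct patients in a cohort. It uses small made-up stand-in frames rather than the Synthea files; `validate` and `indicator` are standard `DataFrame.merge` parameters, while the frame names and ids are illustrative assumptions.

```python
import pandas as pd

# Made-up stand-ins for patients.csv and conditions.csv (not real Synthea rows).
patients = pd.DataFrame({
    "PATIENT": ["p1", "p2", "p3"],
    "GENDER": ["F", "M", "F"],
})
conditions = pd.DataFrame({
    "PATIENT": ["p1", "p1", "p2", "p3", "p4"],
    "DESCRIPTION": [
        "Essential hypertension (disorder)",
        "Sprain of ankle",
        "Essential hypertension (disorder)",
        "Essential hypertension (disorder)",
        "Sprain of wrist",
    ],
})

# validate raises a MergeError if a patient id appears twice in `patients`;
# indicator adds a _merge column showing which rows found a match.
merged = conditions.merge(
    patients, how="left", on="PATIENT",
    validate="many_to_one", indicator=True,
)

# Condition rows whose patient id has no matching patient record.
unmatched = merged[merged["_merge"] == "left_only"]

# Cohort: female patients with essential hypertension, each counted once.
cohort = merged[
    (merged["GENDER"] == "F")
    & (merged["DESCRIPTION"] == "Essential hypertension (disorder)")
]
n_patients = cohort["PATIENT"].nunique()
```

In the notebook itself, `patients` and `conditions` would be replaced by the renamed `patient_df` and `condition_df`.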


Tables use different structured vocabularies:
SNOMED CT is the most common, tests use LOINC, immunizations use CVX, and medications use RxNorm.

In most tables, each row holds a single code (or node) for a finding or event associated with a patient.  
The exception is observations, where the LOINC CODE and DESCRIPTION identify a measure or test (BMI, QOLS, FEV, etc.) and VALUE, UNITS, and TYPE give the patient's measurement.
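
Since VALUE can hold either a number or a text finding, it generally needs converting before any arithmetic. The sketch below shows one way to pick out a single measure by its LOINC code and average it. The frame is a made-up stand-in with the same column layout as the observations table, and the values are illustrative, not taken from the sample data.

```python
import pandas as pd

# Made-up stand-in for observations.csv with the same column layout.
observations = pd.DataFrame({
    "PATIENT": ["p1", "p1", "p2", "p2"],
    "CODE": ["8302-2", "2514-8", "8302-2", "29463-7"],
    "DESCRIPTION": [
        "Body Height",
        "Ketones [Presence] in Urine by Test strip",
        "Body Height",
        "Body Weight",
    ],
    "VALUE": ["49.6", "Urine ketone test negative (finding)", "162.0", "70.5"],
    "UNITS": ["cm", "{nominal}", "cm", "kg"],
    "TYPE": ["numeric", "text", "numeric", "numeric"],
})

# VALUE mixes numbers and text, so select numeric rows by TYPE before converting.
numeric = observations[observations["TYPE"] == "numeric"].copy()
numeric["VALUE"] = pd.to_numeric(numeric["VALUE"])

# Select one measure by its LOINC code (8302-2 is Body Height in this dataset).
height = numeric[numeric["CODE"] == "8302-2"]
mean_height_cm = height["VALUE"].mean()
```

Filtering on TYPE first keeps text findings such as the urine test strip results out of the numeric conversion instead of silently turning them into missing values.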

In [27]:
#careplan table
careplan_df.head()

Unnamed: 0,Id,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,REASONCODE,REASONDESCRIPTION
0,ecfd8214-2880-81b9-c4ec-5fca6ff8c123,2017-07-24,,1598e485-f50d-ad05-086e-acdd7b897628,9a082496-10c2-0ec9-ae13-349400b8742d,384758001,Self-care interventions (procedure),,
1,fbfc639f-9936-fc08-80f4-1cb2bedc40de,2020-04-18,,1598e485-f50d-ad05-086e-acdd7b897628,fc62f7b7-c461-3589-4363-49c2b76a6711,711282006,Skin condition care,24079001.0,Atopic dermatitis
2,785c3a8e-60c7-4288-f3cc-5d8bd2066435,2020-09-24,2020-11-01,1598e485-f50d-ad05-086e-acdd7b897628,7f43472f-ec49-a67d-dcc1-3d8f9b126761,773513001,Physiotherapy care plan (record artifact),44465007.0,Sprain of ankle
3,24fed057-dc66-853a-7530-2ad38eec3287,2021-06-10,2021-06-23,1598e485-f50d-ad05-086e-acdd7b897628,3f9aadc0-5009-0282-bd7e-2ef879ec9052,53950000,Respiratory therapy,,
4,a077c57a-49bd-7d39-3bcb-bbde8fb1ed94,2018-07-08,2018-08-08,9bdc9831-6723-44dd-5e29-f3f7f6494f70,407c1d5d-de89-a22e-6441-b6c26bdb3ef2,773513001,Physiotherapy care plan (record artifact),70704007.0,Sprain of wrist


In [28]:
#observation table
observation_df.head()

Unnamed: 0,DATE,PATIENT,ENCOUNTER,CATEGORY,CODE,DESCRIPTION,VALUE,UNITS,TYPE
0,2016-09-03T12:50:21Z,1598e485-f50d-ad05-086e-acdd7b897628,b38c4a16-c781-06ae-e920-c92dc5031d8f,vital-signs,8302-2,Body Height,49.6,cm,numeric
1,2016-09-03T12:50:21Z,1598e485-f50d-ad05-086e-acdd7b897628,b38c4a16-c781-06ae-e920-c92dc5031d8f,vital-signs,72514-3,Pain severity - 0-10 verbal numeric rating [Sc...,1.0,{score},numeric
2,2016-09-03T12:50:21Z,1598e485-f50d-ad05-086e-acdd7b897628,b38c4a16-c781-06ae-e920-c92dc5031d8f,vital-signs,29463-7,Body Weight,3.3,kg,numeric
3,2016-09-03T12:50:21Z,1598e485-f50d-ad05-086e-acdd7b897628,b38c4a16-c781-06ae-e920-c92dc5031d8f,vital-signs,77606-2,Weight-for-length Per age and sex,29.7,%,numeric
4,2016-09-03T12:50:21Z,1598e485-f50d-ad05-086e-acdd7b897628,b38c4a16-c781-06ae-e920-c92dc5031d8f,vital-signs,8289-1,Head Occipital-frontal circumference Percentile,20.2,%,numeric


All patient records (recorded data points from medical events) are organized by Encounter as an overarching container.

Encounter provides a high-level view of a medical event for a given patient and serves as an index for all data generated at that encounter. 

A single encounter may cover multiple conditions, procedures, observations, and medications, each of which generates data.

Although structured, the encounter may not explicitly describe how this data should be interpreted:  
which diagnostic tests and treatments are associated with which conditions, patient criteria, etc., both within a single encounter and across the patient's history.
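
To get a feel for how much data hangs off each encounter, the row counts per table can be lined up by encounter id. This is a minimal sketch with made-up stand-in frames; the `tables` dictionary and the ids `e1` and `e2` are illustrative assumptions, and only the shared ENCOUNTER column is used.

```python
import pandas as pd

# Made-up stand-ins; only the shared ENCOUNTER column matters here.
tables = {
    "condition": pd.DataFrame({"ENCOUNTER": ["e1", "e2"]}),
    "procedure": pd.DataFrame({"ENCOUNTER": ["e1", "e1", "e1"]}),
    "observation": pd.DataFrame({"ENCOUNTER": ["e1", "e2", "e2"]}),
}

# Count rows per encounter in each table, then align the counts side by side.
# Encounters absent from a table get a count of 0.
counts = pd.DataFrame({
    name: table["ENCOUNTER"].value_counts()
    for name, table in tables.items()
}).fillna(0).astype(int)
```

In the notebook, `tables` could instead map names to the loaded frames, for example `{"condition": condition_df, "procedure": procedure_df}`. The counts show which tables an encounter touches, but not which rows relate to which condition.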

In [17]:
#single encounter row
#the SNOMED REASONCODE is not always provided
encounter_df.iloc[[100]]

Unnamed: 0,Id,START,STOP,PATIENT,ORGANIZATION,PROVIDER,PAYER,ENCOUNTERCLASS,CODE,DESCRIPTION,BASE_ENCOUNTER_COST,TOTAL_CLAIM_COST,PAYER_COVERAGE,REASONCODE,REASONDESCRIPTION
100,71c09f13-658e-f454-348f-82677e138ee9,2018-01-17T09:33:04Z,2018-01-17T10:49:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,dd280187-8662-3732-af81-595d10572911,834b48f3-853d-3097-9884-4357e67caf2f,a735bf55-83e9-331a-899d-a82a60b9f60c,ambulatory,185345009,Encounter for symptom,85.55,3179.93,2543.94,45816000.0,Pyelonephritis


In [38]:
#find all records associated with a single encounter
encounter_id = "71c09f13-658e-f454-348f-82677e138ee9"

#loop through tables
#display a table if it has any rows linked to that encounter id
from IPython.display import display

for name, table in [
        ("allergy", allergy_df),
        ("careplan", careplan_df),
        ("condition", condition_df),
        ("device", device_df),
        ("imaging", imaging_study_df),
        ("immunization", immunization_df),
        ("medication", medication_df),
        ("procedure", procedure_df),
        ("observation", observation_df)
        ]:

    #rows in this table recorded during the encounter
    matches = table[table["ENCOUNTER"] == encounter_id]
    if len(matches) > 0:
        print()
        print(name)
        #explicit display, so the output does not depend on ast_node_interactivity
        display(matches)
        print()


medication


Unnamed: 0,START,STOP,PATIENT,PAYER,ENCOUNTER,CODE,DESCRIPTION,BASE_COST,PAYER_COVERAGE,DISPENSES,TOTALCOST,REASONCODE,REASONDESCRIPTION
20,2018-01-17T10:49:41Z,2018-01-18T16:49:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,a735bf55-83e9-331a-899d-a82a60b9f60c,71c09f13-658e-f454-348f-82677e138ee9,309078,cefpodoxime 200 MG Oral Tablet,129.94,103.95,1,129.94,45816000.0,Pyelonephritis




procedure


Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION,BASE_COST,REASONCODE,REASONDESCRIPTION
227,2018-01-17T09:33:04Z,2018-01-17T09:39:47Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,84100007,History taking (procedure),431.4,,
228,2018-01-17T09:39:47Z,2018-01-17T09:49:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,223470000,Discussion about signs and symptoms (procedure),431.4,,
229,2018-01-17T09:49:41Z,2018-01-17T10:04:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,67879005,History and physical examination limited (pro...,431.4,,
230,2018-01-17T10:04:41Z,2018-01-17T10:19:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,386053000,Evaluation procedure (procedure),431.4,,
231,2018-01-17T10:19:41Z,2018-01-17T10:29:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,57617002,Urine specimen collection (procedure),431.4,,
232,2018-01-17T10:29:41Z,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,441550005,Urinalysis with reflex to microscopy and cultu...,431.4,,
233,2018-01-17T10:39:41Z,2018-01-17T10:49:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,http://snomed.info/sct,281789004,Antibiotic therapy (procedure),431.4,,




observation


Unnamed: 0,DATE,PATIENT,ENCOUNTER,CATEGORY,CODE,DESCRIPTION,VALUE,UNITS,TYPE
814,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5792-7,Glucose [Mass/volume] in Urine by Test strip,4.4,mg/dL,numeric
815,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,2514-8,Ketones [Presence] in Urine by Test strip,Urine ketone test negative (finding),{nominal},text
816,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5811-5,Specific gravity of Urine by Test strip,1.0,{nominal},numeric
817,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5803-2,pH of Urine by Test strip,6.2,pH,numeric
818,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5804-0,Protein [Mass/volume] in Urine by Test strip,12.4,mg/dL,numeric
819,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5802-4,Nitrite [Presence] in Urine by Test strip,Urine nitrite positive (finding),{nominal},text
820,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5794-3,Hemoglobin [Presence] in Urine by Test strip,Urine blood test = + (finding),{nominal},text
821,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5799-2,Leukocyte esterase [Presence] in Urine by Test...,Urine leukocyte test = + (finding),{nominal},text
822,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,5821-4,WBCs,34.4,/[HPF],numeric
823,2018-01-17T10:39:41Z,04bf2325-37b7-d5c9-ae57-e9798e45f5e2,71c09f13-658e-f454-348f-82677e138ee9,laboratory,13945-1,RBCs,1.6,/[HPF],numeric



