The following steps explore the generated Synthea dataset to understand what it contains. The intent is, in particular, to understand whether it contains records that might form part of an asthma cohort.
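Every step below reads a Synthea NDJSON export, in which each line holds one FHIR resource. A minimal sketch of a reusable reader is shown here; the helper name `read_ndjson` is hypothetical and is not part of the original notebook.

```python
import json


def read_ndjson(path):
    """Yield one parsed JSON object per non-blank line of an NDJSON file.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)
```

With such a helper, a cell could load all patients with `patients = list(read_ndjson('../output/synthea/filtered/fhir/Patient.ndjson'))`.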

In [1]:
import json
from collections import Counter
import pandas as pd

patients = []
# NDJSON: one FHIR Patient resource per line
with open('../output/synthea/filtered/fhir/Patient.ndjson') as file1:
    for line in file1:
        line = line.strip()
        if line:  # skip blank lines
            patients.append(json.loads(line))

rTypes = Counter()

for p in patients:
    rTypes[p['resourceType']] += 1
    # Count the extensions on this patient by URL
    extCounter = Counter()
    for e in p['extension']:
        extCounter[e['url']] += 1
    print(json.dumps(extCounter, indent=3))

print(json.dumps(rTypes, indent=3))

{
   "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race": 1,
   "http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity": 1,
   "http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName": 1,
   "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex": 1,
   "http://hl7.org/fhir/StructureDefinition/patient-birthPlace": 1,
   "http://synthetichealth.github.io/synthea/disability-adjusted-life-years": 1,
   "http://synthetichealth.github.io/synthea/quality-adjusted-life-years": 1
}
{
   "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race": 1,
   "http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity": 1,
   "http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName": 1,
   "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex": 1,
   "http://hl7.org/fhir/StructureDefinition/patient-birthPlace": 1,
   "http://synthetichealth.github.io/synthea/disability-adjusted-life-years": 1,
   "http://synthetichealth.gith

We can see the data in more compact form via a DataFrame. This shows that the same set of seven attributes is available for every patient listed.

In [2]:
patient_list = []
patient_ids = []
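for p in patients:
    # Count each extension by its short name (the last segment of its URL);
    # rTypes was already tallied in the previous cell, so it is not updated again
    extCounter = Counter()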
    for e in p['extension']:
        ext = e['url'].split('/')[-1]
        extCounter[ext] += 1
    patient_list.append(extCounter)
    patient_ids.append(p['id'])

pd.set_option("display.max_rows", None, "display.max_columns", None)
pdf = pd.DataFrame(patient_list)    
pdf

Unnamed: 0,us-core-race,us-core-ethnicity,patient-mothersMaidenName,us-core-birthsex,patient-birthPlace,disability-adjusted-life-years,quality-adjusted-life-years
0,1,1,1,1,1,1,1
1,1,1,1,1,1,1,1
2,1,1,1,1,1,1,1
3,1,1,1,1,1,1,1
4,1,1,1,1,1,1,1
5,1,1,1,1,1,1,1
6,1,1,1,1,1,1,1
7,1,1,1,1,1,1,1
8,1,1,1,1,1,1,1
9,1,1,1,1,1,1,1
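The observation that every patient carries the same extensions can also be checked programmatically rather than by reading the table. The sketch below uses small illustrative records with made-up URLs (not real Synthea output) and a hypothetical function name `extension_sets`; in the notebook it would be called on `patients`.

```python
from collections import Counter

# Illustrative stand-ins for parsed Patient resources (not real Synthea records).
sample_patients = [
    {"id": "p1", "extension": [{"url": "http://example.org/ext/a"},
                               {"url": "http://example.org/ext/b"}]},
    {"id": "p2", "extension": [{"url": "http://example.org/ext/b"},
                               {"url": "http://example.org/ext/a"}]},
]


def extension_sets(patients):
    """Count how many patients carry each distinct set of extension short names."""
    sets = Counter()
    for p in patients:
        names = frozenset(e["url"].split("/")[-1] for e in p.get("extension", []))
        sets[names] += 1
    return sets


distinct = extension_sets(sample_patients)
# A single key means every patient has exactly the same extensions.
print(len(distinct))
```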


The attributes above are informative but are unlikely to be part of a meaningful query when building an asthma cohort.

What else might we look at? Let's try Observations.

In [3]:
import pandas as pd

obsCounter = Counter()   # number of observations per patient
codeCounter = Counter()  # number of observations per display name
with open('../output/synthea/filtered/fhir/Observation.ndjson') as file1:
    for line in file1:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        o = json.loads(line)

        obsCounter[o['subject']['reference']] += 1
        # Label each observation by the display name of its first coding
        obs_display_name = o['code']['coding'][0]['display']
        codeCounter[obs_display_name] += 1


#Summarize
print(f"Number of patients with observations {len(obsCounter.keys())}")

#print("Observation count per patient")
#print(json.dumps(obsCounter, indent=3))
print("Coding counts")
#print(json.dumps(codeCounter, indent=3))
df = pd.DataFrame.from_dict(codeCounter,  orient='index')
pd.set_option("display.max_rows", None, "display.max_columns", None)
df

Number of patients with observations 125
Coding counts


Unnamed: 0,0
Body Height,1362
Pain severity - 0-10 verbal numeric rating [Score] - Reported,1451
Body Weight,1436
Body Mass Index,1247
Body mass index (BMI) [Percentile] Per age and gender,304
Blood Pressure,1698
Heart rate,1401
Respiratory rate,1401
Tobacco smoking status NHIS,1362
"Protocol for Responding to and Assessing Patients' Assets, Risks, and Experiences [PRAPARE]",983


### Drilling down on FEV1/FVC
One observation in the above listing that is of particular relevance to asthma is FEV1/FVC, the ratio of forced expiratory volume in one second to forced vital capacity.
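Because the listing of display names is long, a keyword search is one way to locate entries such as this. The sketch below assumes a `Counter` shaped like `codeCounter`; the counts are illustrative only, and `find_codes` is a hypothetical helper name.

```python
from collections import Counter

# Illustrative counts only; in the notebook, codeCounter from the previous cell
# would be passed in instead.
sample_codes = Counter({"Body Height": 3, "Heart rate": 2, "FEV1/FVC": 1})


def find_codes(counter, keyword):
    """Return {display name: count} for names containing keyword (case-insensitive)."""
    kw = keyword.lower()
    return {name: n for name, n in counter.items() if kw in name.lower()}


print(find_codes(sample_codes, "fev"))
```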

In [5]:
import pandas as pd

obs_of_interest = ['FEV1/FVC']

# Collect the observations whose display name is in obs_of_interest
obsCounter = Counter()
codeCounter = Counter()
interesting_obs = []
with open('../output/synthea/filtered/fhir/Observation.ndjson') as file1:
    for line in file1:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        o = json.loads(line)

        obsCounter[o['subject']['reference']] += 1
        obs_display_name = o['code']['coding'][0]['display']
        codeCounter[obs_display_name] += 1
        if obs_display_name in obs_of_interest:
            interesting_obs.append(o)

pCounter = Counter()
for o in interesting_obs:
    pCounter[o['subject']['reference']] +=1
print(json.dumps(pCounter, indent=3))

{
   "Patient/9ce771bb-8cba-e201-8da7-a64c8c952d5e": 8,
   "Patient/a4cb9a37-0bdc-274c-6a26-95851d4702de": 8,
   "Patient/fd42bb2e-5d66-c8e1-0a51-bca7713dce1d": 12,
   "Patient/06ad94ec-8b8f-d6b3-0603-cee340430b2d": 8,
   "Patient/fa76b112-45ea-3a91-4f05-a0e9faa4616d": 2
}


There are 38 observations of FEV1/FVC, a diagnostic indicator of asthma, and they come from only 5 patients. This suggests that the generator is attempting to tell a medically coherent story: FEV1/FVC measurements are taken repeatedly from the same patients.
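The totals quoted here follow directly from the per-patient counts printed above. As a quick check, with the counts copied from that output:

```python
# Per-patient FEV1/FVC observation counts, copied from the output above.
fev_counts = {
    "Patient/9ce771bb-8cba-e201-8da7-a64c8c952d5e": 8,
    "Patient/a4cb9a37-0bdc-274c-6a26-95851d4702de": 8,
    "Patient/fd42bb2e-5d66-c8e1-0a51-bca7713dce1d": 12,
    "Patient/06ad94ec-8b8f-d6b3-0603-cee340430b2d": 8,
    "Patient/fa76b112-45ea-3a91-4f05-a0e9faa4616d": 2,
}

total = sum(fev_counts.values())
print(f"{total} observations across {len(fev_counts)} patients")
# prints: 38 observations across 5 patients
```

In the notebook itself the same figures could be obtained from `sum(pCounter.values())` and `len(pCounter)`.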

What diagnoses do these 5 patients have? Can we explore how the FEV1/FVC measurements fit into each patient's course of clinical care (encounters) and the conditions they were diagnosed with?

In [7]:
patients_of_interest = list(pCounter.keys())
patient_conditions = {}
for p in pCounter.keys():
    patient_conditions[p] = {'conditions':[]}

with open('../output/synthea/filtered/fhir/Condition.ndjson') as file1:
    for line in file1:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cond = json.loads(line)

        subject = cond['subject']['reference']
        if subject in patients_of_interest:
            # Keep only the id part of the 'Encounter/<id>' reference
            encounter_id = cond['encounter']['reference'].split('/')[-1]
            cond_record = {'name': cond['code']['text'],
                           'clinicalStatus': cond['clinicalStatus']['coding'][0]['code'],
                           'encounter': encounter_id,
                           'recorded_date': cond['recordedDate']}
            patient_conditions[subject]['conditions'].append(cond_record)

#print(json.dumps(patient_conditions, indent=3))
for pt in patient_conditions:
    print(pt)
    
    # find the patient_fevs
    patient_fevs = [o for o in interesting_obs if pt == o['subject']['reference'] ]
    patient_fev_dict = {}
    for fev in patient_fevs:
        vq = fev['valueQuantity']
        encounter_id = fev['encounter']['reference'].split('/')[-1]
        if vq['unit'] == '%':
            fev_record = {
             'fev1/fvc_pct':vq['value'],
            'effectiveDateTime':fev['effectiveDateTime']}
            patient_fev_dict[encounter_id] = fev_record
        else:
            print(f'Warning: Patient {pt} has FEV1/FVC in units other than %')
    

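    fevdf = pd.DataFrame.from_dict(patient_fev_dict, orient='index')
    # sort_values returns a new frame, so the result must be assigned
    fevdf = fevdf.sort_values(by=['effectiveDateTime'])
    print(f'\nFEV1/FVC Observations for {pt}')
    display(fevdf)

    patdf = pd.DataFrame(patient_conditions[pt]['conditions'])
    # The original set_index('encounter') call discarded its result and had no
    # effect; it is only needed once the join below is enabled
    patdf = patdf.sort_values(by=['recorded_date'])
    print(f'\nConditions for {pt}')
    display(patdf)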
    
    # join the data frames, working on this!
    #fulldf = patdf.join(fevdf, lsuffix="_left", rsuffix="_right")
    #display (fulldf)
    
    print('_'*80)

    

Patient/9ce771bb-8cba-e201-8da7-a64c8c952d5e

FEV1/FVC Observations for Patient/9ce771bb-8cba-e201-8da7-a64c8c952d5e


Unnamed: 0,fev1/fvc_pct,effectiveDateTime
ed30d216-13fd-99b1-0d2c-0922336cba5e,58.019,2015-10-30T12:29:42-04:00
a5740340-5a5d-746f-657d-3abec6b66d05,75.523,2017-10-06T12:29:42-04:00
45c2be84-1668-d527-e543-977a4950079f,56.481,2018-10-12T12:29:42-04:00
34fe84de-30b9-15eb-ea5b-23f8e4042c89,67.648,2019-10-18T12:29:42-04:00
d2fa2627-9ec4-5810-bbf1-b4f5423f3c1b,70.58,2020-10-23T12:29:42-04:00
08609c3e-f2ed-d718-13c8-5cf38f808588,54.965,2021-09-03T12:29:42-04:00
1dfd9bb0-ef51-74d7-316f-8d1daf870b3a,76.418,2021-09-17T12:29:42-04:00
8465ef85-b85f-9c07-68e9-aac9d1a1775a,49.191,2021-10-29T12:29:42-04:00



Conditions for Patient/9ce771bb-8cba-e201-8da7-a64c8c952d5e


Unnamed: 0,name,clinicalStatus,encounter,recorded_date
0,Received higher education (finding),active,5a77dd45-2ff0-92e6-c1be-05cdb2a13b7f,1985-11-29T12:13:25-05:00
1,Part-time employment (finding),resolved,df972d0f-6093-928b-6f6c-25ced3ed7688,1992-12-11T12:00:55-05:00
2,Part-time employment (finding),resolved,ad1bcb02-6389-fbbd-d50e-4bbdaadd5c00,1995-12-15T12:27:06-05:00
3,Body mass index 30+ - obesity (finding),active,4c3bef84-0136-4c52-79f9-995786d4edc9,2001-12-21T11:29:42-05:00
4,Chronic sinusitis (disorder),active,5f602ebe-8b8e-2c96-6c40-7bbf5fbd1f59,2011-01-18T09:29:42-05:00
5,Stress (finding),resolved,02d64622-20dc-eec8-f60f-f4f7a0c52139,2011-03-04T12:05:54-05:00
6,Part-time employment (finding),resolved,ccba79e0-7e21-5e9a-c20d-f56beaf77c87,2011-03-25T13:13:58-04:00
7,Social isolation (finding),resolved,ccba79e0-7e21-5e9a-c20d-f56beaf77c87,2011-03-25T13:13:58-04:00
8,Full-time employment (finding),resolved,4e00f7ea-72db-d584-3369-1585642be82c,2011-10-21T13:25:04-04:00
9,Full-time employment (finding),resolved,24b69f69-0548-9616-a4f4-f3e0027a71f6,2013-10-25T13:00:59-04:00


________________________________________________________________________________
Patient/a4cb9a37-0bdc-274c-6a26-95851d4702de

FEV1/FVC Observations for Patient/a4cb9a37-0bdc-274c-6a26-95851d4702de


Unnamed: 0,fev1/fvc_pct,effectiveDateTime
55bc5852-57fc-7199-17c3-77c6469c1326,55.148,2001-02-28T12:04:50-05:00
8f2307c1-9e9f-0968-28aa-7bc45761a650,57.123,2002-03-06T12:04:50-05:00
8adbadd6-88f2-b51a-e315-ea9463db96c0,67.172,2003-03-19T12:04:50-05:00
77c84a9a-ecf3-176d-fb2e-977bbd4f2c5d,56.914,2004-03-24T12:04:50-05:00
4cfe85f7-ca36-5a6e-59b4-1b0e0ed846d9,55.229,2005-03-30T12:04:50-05:00
a905abf0-1c29-b2af-2b02-83f968554f1b,67.221,2006-04-05T13:04:50-04:00
cbbaaec0-fea3-1531-a03f-040941130cef,67.919,2007-04-11T13:04:50-04:00
8ca303db-9788-a306-a88d-cece6393b157,48.256,2007-12-12T12:04:50-05:00



Conditions for Patient/a4cb9a37-0bdc-274c-6a26-95851d4702de


Unnamed: 0,name,clinicalStatus,encounter,recorded_date
0,Childhood asthma,resolved,82018b5c-7963-6c14-8887-f048a6dcfcff,1984-05-14T13:04:50-04:00
1,Viral sinusitis (disorder),resolved,63ffbd5d-8bf6-652a-f03a-b680f5b9fc2a,1990-06-30T02:04:50-04:00
2,Viral sinusitis (disorder),resolved,910a7503-c165-0576-eda4-68ce3d60d2d0,1994-04-27T19:04:50-04:00
3,Viral sinusitis (disorder),resolved,b50ed155-e479-ae7b-1ef1-c67165caee8e,1995-12-21T02:04:50-05:00
4,Received higher education (finding),active,eb21a7bd-d7cd-bc55-fcd2-8702b595811d,1998-02-11T12:55:57-05:00
5,Full-time employment (finding),resolved,eb21a7bd-d7cd-bc55-fcd2-8702b595811d,1998-02-11T12:55:57-05:00
6,Social isolation (finding),resolved,eb21a7bd-d7cd-bc55-fcd2-8702b595811d,1998-02-11T12:55:57-05:00
7,Full-time employment (finding),resolved,4093b48c-4406-7624-9fca-2b47f0a78ee1,1999-02-17T12:59:25-05:00
8,Severe anxiety (panic) (finding),resolved,4093b48c-4406-7624-9fca-2b47f0a78ee1,1999-02-17T13:15:46-05:00
9,Full-time employment (finding),resolved,04ff58fd-8508-3f54-354c-cd1f1813826e,2000-02-23T12:49:07-05:00


________________________________________________________________________________
Patient/fd42bb2e-5d66-c8e1-0a51-bca7713dce1d

FEV1/FVC Observations for Patient/fd42bb2e-5d66-c8e1-0a51-bca7713dce1d


Unnamed: 0,fev1/fvc_pct,effectiveDateTime
0d7414d7-ebf9-53e7-a8fc-821af45882ad,50.015,1987-12-21T09:53:14-05:00
5a82db2e-d2f2-09b0-93ee-4af4355c2cc4,69.816,1988-05-23T10:53:14-04:00
8d15529e-74fc-0b50-5855-dfc3d1bdf397,68.331,1988-06-27T10:53:14-04:00
727e90cd-34ea-9595-6697-a99603c007d8,61.861,1988-12-26T09:53:14-05:00
3ac265de-df61-79c5-d03f-e9df7cd5b29c,52.382,1990-01-01T09:53:14-05:00
061d7056-ef8e-b431-91fb-5820c3bfe1bb,67.672,1991-01-07T09:53:14-05:00
5436069b-d4fd-43bd-22b7-93abb6a886f1,64.644,1992-01-13T09:53:14-05:00
771b5ee1-74a4-5a48-6f52-f5a460c6be11,55.611,1993-01-18T09:53:14-05:00
e8a2079d-00ae-3878-0adf-626f8964fc62,61.485,1994-01-24T09:53:14-05:00
2c8e417a-9210-4513-1017-d41b2784a972,74.505,1995-01-30T09:53:14-05:00



Conditions for Patient/fd42bb2e-5d66-c8e1-0a51-bca7713dce1d


Unnamed: 0,name,clinicalStatus,encounter,recorded_date
0,Body mass index 30+ - obesity (finding),active,9e6d21f5-35f2-7d4c-04b3-2bc85a18bc78,1971-09-20T10:53:14-04:00
1,Hypertension,active,910160ad-df69-27a7-8ee4-b9000ded07c7,1975-10-13T10:53:14-04:00
2,Only received primary school education (finding),active,910160ad-df69-27a7-8ee4-b9000ded07c7,1975-10-13T11:49:51-04:00
3,Stress (finding),resolved,910160ad-df69-27a7-8ee4-b9000ded07c7,1975-10-13T11:49:51-04:00
4,Refugee (person),active,910160ad-df69-27a7-8ee4-b9000ded07c7,1975-10-13T11:49:51-04:00
5,Chronic obstructive bronchitis (disorder),active,8779cbe1-3ef9-03f0-3c3f-a3faffb72f15,1978-10-30T09:53:14-05:00
6,Social isolation (finding),resolved,8779cbe1-3ef9-03f0-3c3f-a3faffb72f15,1978-10-30T10:30:56-05:00
7,Stress (finding),resolved,9acbf4d2-137b-8cf1-14f2-568438b2e09f,1979-11-05T10:39:39-05:00
8,Has a criminal record (finding),active,9acbf4d2-137b-8cf1-14f2-568438b2e09f,1979-11-05T10:39:39-05:00
9,Unhealthy alcohol drinking behavior (finding),resolved,afda4be1-bacc-8282-850e-817deff56377,1981-11-16T11:53:46-05:00


________________________________________________________________________________
Patient/06ad94ec-8b8f-d6b3-0603-cee340430b2d

FEV1/FVC Observations for Patient/06ad94ec-8b8f-d6b3-0603-cee340430b2d


Unnamed: 0,fev1/fvc_pct,effectiveDateTime
b73e2734-00e1-f60f-440c-27b57749e856,79.729,2014-04-01T13:10:54-04:00
1786c118-8b47-a830-8fae-319c2a6d1bdb,51.451,2015-04-07T13:10:54-04:00
5a81e7f9-053a-4714-96e5-c59c9a6e104b,63.842,2016-04-12T13:10:54-04:00
19ec3a55-98a5-bf3a-8908-4ffc414ad589,75.089,2017-04-18T13:10:54-04:00
c5a7fe22-ed26-5550-1392-9d425c4bae4c,65.77,2018-04-24T13:10:54-04:00
f143476e-dabc-5feb-f040-544fa5de08e6,50.413,2019-04-30T13:10:54-04:00
3b6d8728-eb43-5029-6374-79307ecb18f3,53.504,2020-05-05T13:10:54-04:00
6b3dc28d-cf60-a895-0785-01b744684070,66.299,2021-05-11T13:10:54-04:00



Conditions for Patient/06ad94ec-8b8f-d6b3-0603-cee340430b2d


Unnamed: 0,name,clinicalStatus,encounter,recorded_date
0,Perennial allergic rhinitis with seasonal vari...,resolved,b4c691e9-4edf-6cf7-92db-07a66bcecd43,1998-10-09T13:10:54-04:00
1,Viral sinusitis (disorder),resolved,c15e8026-8267-fa59-9bf3-c8059d05fbe3,2012-11-13T12:10:54-05:00
2,Received higher education (finding),active,cf07052b-4e9f-d9e3-6528-038d0e8fff38,2013-03-26T13:56:39-04:00
3,Part-time employment (finding),resolved,cf07052b-4e9f-d9e3-6528-038d0e8fff38,2013-03-26T13:56:39-04:00
4,Lack of access to transportation (finding),active,cf07052b-4e9f-d9e3-6528-038d0e8fff38,2013-03-26T13:56:39-04:00
5,Stress (finding),resolved,cf07052b-4e9f-d9e3-6528-038d0e8fff38,2013-03-26T13:56:39-04:00
6,Viral sinusitis (disorder),resolved,2bf992c5-a5b0-ba2a-209f-52b334ea136c,2013-06-26T16:10:54-04:00
7,Viral sinusitis (disorder),resolved,891e579e-0729-8f35-982a-2c95dbfe4289,2013-12-13T09:10:54-05:00
8,Chronic low back pain (finding),active,0b1a164e-4831-ffbb-8c9f-54560653e20c,2014-02-06T12:10:54-05:00
9,Chronic neck pain (finding),active,0b1a164e-4831-ffbb-8c9f-54560653e20c,2014-02-06T12:10:54-05:00


________________________________________________________________________________
Patient/fa76b112-45ea-3a91-4f05-a0e9faa4616d

FEV1/FVC Observations for Patient/fa76b112-45ea-3a91-4f05-a0e9faa4616d


Unnamed: 0,fev1/fvc_pct,effectiveDateTime
895de109-2b2b-caab-5bca-2d80aac863d8,47.668,2021-09-09T12:28:44-04:00
77a71ad1-9001-fd5b-15ad-9e4da6aeebd2,47.807,2021-10-21T12:28:44-04:00



Conditions for Patient/fa76b112-45ea-3a91-4f05-a0e9faa4616d


Unnamed: 0,name,clinicalStatus,encounter,recorded_date
0,Perennial allergic rhinitis with seasonal vari...,resolved,46e9421a-67ec-e88c-3e93-fb3e69b4c62a,2005-11-12T11:28:44-05:00
1,Childhood asthma,resolved,ba3583f5-79e8-20b4-d7af-d4c953bf6fa2,2007-07-10T12:28:44-04:00
2,Viral sinusitis (disorder),resolved,f1a92c74-4b22-65f4-15ea-4fc721ce2c8d,2015-04-22T09:28:44-04:00
3,Child attention deficit disorder,resolved,ee46b95a-af59-9a39-8ebf-e42bbef34fb3,2015-06-14T12:28:44-04:00
4,Risk activity involvement (finding),resolved,9f3d9598-b89f-7ada-1bde-e0c66c71d5a3,2016-09-22T13:10:37-04:00
5,Hypertension,active,c8cf2498-aee2-9384-5471-ad6b73705eb6,2020-10-15T12:28:44-04:00
6,Only received primary school education (finding),active,c8cf2498-aee2-9384-5471-ad6b73705eb6,2020-10-15T13:09:17-04:00
7,Full-time employment (finding),resolved,c8cf2498-aee2-9384-5471-ad6b73705eb6,2020-10-15T13:09:17-04:00
8,Lack of access to transportation (finding),active,c8cf2498-aee2-9384-5471-ad6b73705eb6,2020-10-15T13:09:17-04:00
9,Social isolation (finding),resolved,c8cf2498-aee2-9384-5471-ad6b73705eb6,2020-10-15T13:09:17-04:00


________________________________________________________________________________


The last example is a good indication of where an FEV1/FVC measurement aligns with a condition. The measurement on 2021-09-09 coincides with a diagnosis of chronic obstructive bronchitis. Perusing the data will likely identify others. A join of the FEV and conditions DataFrames would allow this to be done in a more automated and systematic way.
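One possible shape for such a join is sketched below. It assumes the two frames are laid out like `fevdf` (indexed by encounter id) and `patdf` (with an `encounter` column); the encounter ids, condition names, and values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Illustrative frames with hypothetical encounter ids and values.
fev = pd.DataFrame(
    {"fev1/fvc_pct": [55.0, 60.0],
     "effectiveDateTime": ["2020-01-01T12:00:00-05:00", "2020-06-01T12:00:00-04:00"]},
    index=["enc-1", "enc-2"],
)
conditions = pd.DataFrame(
    {"name": ["Condition A", "Condition B"],
     "encounter": ["enc-1", "enc-3"],
     "recorded_date": ["2020-01-01T12:00:00-05:00", "2019-03-01T12:00:00-05:00"]}
)

# Inner join: keep only conditions recorded in an encounter that also has an
# FEV1/FVC measurement.
joined = conditions.merge(fev, left_on="encounter", right_index=True, how="inner")
print(joined[["encounter", "name", "fev1/fvc_pct"]])
```

An inner join on the encounter id only finds conditions and measurements recorded in the same encounter. Matching a measurement to a diagnosis made in a nearby but different encounter would need a date-based comparison instead.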

Some conclusions about the dataset can already be drawn. For example, not all measurements of FEV1/FVC are associated with asthma patients.
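A check of this kind can be sketched as a keyword test over condition names. The data below is only a subset of the condition names shown in the outputs above, so it illustrates the approach rather than giving the full result; `mentions` is a hypothetical helper name.

```python
# A subset of the condition names displayed above for the five FEV1/FVC patients.
shown_conditions = {
    "Patient/9ce771bb-8cba-e201-8da7-a64c8c952d5e": [
        "Chronic sinusitis (disorder)", "Body mass index 30+ - obesity (finding)"],
    "Patient/a4cb9a37-0bdc-274c-6a26-95851d4702de": [
        "Childhood asthma", "Viral sinusitis (disorder)"],
    "Patient/fd42bb2e-5d66-c8e1-0a51-bca7713dce1d": [
        "Hypertension", "Chronic obstructive bronchitis (disorder)"],
    "Patient/06ad94ec-8b8f-d6b3-0603-cee340430b2d": [
        "Viral sinusitis (disorder)", "Chronic low back pain (finding)"],
    "Patient/fa76b112-45ea-3a91-4f05-a0e9faa4616d": [
        "Childhood asthma", "Hypertension"],
}


def mentions(names, keyword):
    """True if any condition name contains keyword (case-insensitive)."""
    kw = keyword.lower()
    return any(kw in name.lower() for name in names)


with_asthma = sorted(p for p, names in shown_conditions.items()
                     if mentions(names, "asthma"))
without_asthma = sorted(set(shown_conditions) - set(with_asthma))
print(len(with_asthma), "with an asthma condition;", len(without_asthma), "without")
```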