# Reading data formatted in the Event Stream Data Standard

In this tutorial, we will walk through how to generate a FEMR PatientDatabase using the Event Stream Data Standard (ESDS) format.

ESDS is a simple format for patient data supported by several other EHR analysis libraries.

See (TODO) for more details on the format.

In [1]:
import os
import csv
import shutil
import pyarrow.parquet as pq

INPUT_DIR = 'input/esds'

# Load the example dataset from the input directory
example_data = pq.read_table(os.path.join(INPUT_DIR, '0.parquet'))

## 1. Inspect the data

We have created a synthetic example ESDS dataset, stored as a Parquet file. It can be inspected manually by loading it with `pyarrow.parquet` and converting the table to Python objects.

In [2]:
patients = example_data.to_pylist()

print(patients[0])

{'subject_id': 3, 'static_measurements': [{'code': 'Birth/Birth', 'numeric_value': None, 'text_value': None, 'datetime_value': datetime.datetime(1970, 1, 7, 0, 0)}], 'events': [{'time': datetime.datetime(1970, 1, 7, 0, 0), 'measurements': []}, {'time': datetime.datetime(1990, 1, 7, 0, 0), 'measurements': [{'code': 'Gender/Gender', 'numeric_value': None, 'text_value': 'Female', 'datetime_value': None}, {'code': 'Race/Race', 'numeric_value': None, 'text_value': 'White', 'datetime_value': None}]}, {'time': datetime.datetime(2020, 7, 9, 0, 0), 'measurements': [{'code': 'Vitals/Blood Pressure', 'numeric_value': 160.0, 'text_value': None, 'datetime_value': None}]}, {'time': datetime.datetime(2020, 8, 9, 0, 0), 'measurements': [{'code': 'Vitals/HbA1c', 'numeric_value': 7.0, 'text_value': None, 'datetime_value': None}]}, {'time': datetime.datetime(2022, 5, 3, 0, 0), 'measurements': [{'code': 'ICD10CM/E11.4', 'numeric_value': None, 'text_value': None, 'datetime_value': None}]}, {'time': datetim
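The nested record above can be hard to read as a single dictionary. As a minimal sketch, the following flattens one record into `(time, code, value)` rows. The `record` dictionary is hand-written to mirror the first few fields of the printed output, and `flatten_events` is a hypothetical helper, not part of femr or ESDS.

```python
import datetime

# Hand-written stand-in that mirrors the start of the printed record above
record = {
    "subject_id": 3,
    "static_measurements": [
        {"code": "Birth/Birth", "numeric_value": None, "text_value": None,
         "datetime_value": datetime.datetime(1970, 1, 7)},
    ],
    "events": [
        {"time": datetime.datetime(1970, 1, 7), "measurements": []},
        {"time": datetime.datetime(1990, 1, 7), "measurements": [
            {"code": "Gender/Gender", "numeric_value": None,
             "text_value": "Female", "datetime_value": None},
            {"code": "Race/Race", "numeric_value": None,
             "text_value": "White", "datetime_value": None},
        ]},
        {"time": datetime.datetime(2020, 7, 9), "measurements": [
            {"code": "Vitals/Blood Pressure", "numeric_value": 160.0,
             "text_value": None, "datetime_value": None},
        ]},
    ],
}

VALUE_FIELDS = ("numeric_value", "text_value", "datetime_value")


def flatten_events(rec):
    """Hypothetical helper: yield (time, code, value) for every measurement."""
    for event in rec["events"]:
        for measurement in event["measurements"]:
            # Use whichever value field is populated; None if all are empty
            value = next(
                (measurement[k] for k in VALUE_FIELDS if measurement[k] is not None),
                None,
            )
            yield event["time"], measurement["code"], value


rows = list(flatten_events(record))
for time, code, value in rows:
    print(time.date(), code, value)
```

With the real dataset, the same helper could be applied to `patients[0]` in place of the stand-in `record`.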

## 2. Convert the ESDS dataset to an extract
We now convert the dataset we loaded above to an extract using the function [etl_esds](https://github.com/som-shahlab/femr/blob/main/src/femr/etl_pipelines/esds.py#L66) from the femr repo.

First, we create a folder to hold the extract and its associated files.

In [3]:
import shutil
import os

TARGET_DIR = 'trash/tutorial_2b'

if os.path.exists(TARGET_DIR):
    shutil.rmtree(TARGET_DIR)

# makedirs also creates the parent 'trash' folder if it does not exist yet
os.makedirs(TARGET_DIR)

We now run the ETL on the Parquet files in the `INPUT_DIR` folder.

The output extract is a FEMR [PatientDatabase](https://github.com/som-shahlab/femr/blob/Miking98-patch-1/tutorials/0_How%20FEMR%20Works%20%2B%20Toy%20Example.ipynb) that can be used directly by the FEMR pipeline.

In [4]:
# Paths where the extract and the extract log will be stored
LOG_DIR = os.path.join(TARGET_DIR, "logs")
EXTRACT_DIR = os.path.join(TARGET_DIR, "extract")

import femr
os.system(f"etl_esds {INPUT_DIR} {EXTRACT_DIR} {LOG_DIR} --num_threads 2")

2023-09-15 23:59:34,314 [MainThread  ] [INFO ]  Extracting from OMOP with arguments Namespace(esds_source='input/esds', target_location='/share/pi/nigam/ethanid/mimic_etl/femr/tutorials/trash/tutorial_2b/extract', temp_location='/share/pi/nigam/ethanid/mimic_etl/femr/tutorials/trash/tutorial_2b/logs', num_threads=2, athena_download=None)
2023-09-15 23:59:34,315 [MainThread  ] [INFO ]  Converting to events
2023-09-15 23:59:34,622 [MainThread  ] [INFO ]  Converting to patients
2023-09-15 23:59:34,714 [MainThread  ] [INFO ]  Converting to extract


Done with main 2023-09-15T23:59:34.85932994+00:00
Done with meta 2023-09-15T23:59:34.860242203+00:00
Converting to extract 2023-09-15 23:59:34.714758


0
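The trailing `0` above is the exit status returned by `os.system`, which is easy to overlook when the command fails. As a hedged alternative, the sketch below builds the same `etl_esds` invocation as an argument list and wraps it in `subprocess.run(..., check=True)`, which raises an exception on a non-zero exit status. `build_etl_command` and `run_etl` are hypothetical helpers introduced for illustration; only the positional arguments and the `--num_threads` flag are taken from the cell above.

```python
import shlex
import subprocess


def build_etl_command(input_dir, extract_dir, log_dir, num_threads=2):
    """Hypothetical helper: assemble the etl_esds call as an argument list."""
    return [
        "etl_esds",
        input_dir,
        extract_dir,
        log_dir,
        "--num_threads",
        str(num_threads),
    ]


def run_etl(command):
    """Hypothetical helper: run the command and raise if it exits non-zero."""
    return subprocess.run(command, check=True)


command = build_etl_command(
    "input/esds", "trash/tutorial_2b/extract", "trash/tutorial_2b/logs"
)

# Show the shell-equivalent command without executing it
print(shlex.join(command))
```

Calling `run_etl(command)` in place of the `os.system` line would surface a failed ETL as a `subprocess.CalledProcessError` instead of a silent non-zero return value.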

## 3. Open and view the data
We now open the femr extract generated in the previous step and take a look at it using the [PatientDatabase](https://github.com/som-shahlab/femr/blob/main/src/femr/extension/datasets.pyi#L24) class.

In [8]:
import femr.datasets

database = femr.datasets.PatientDatabase(EXTRACT_DIR)

# Number of patients
print("Num patients", len(database))

# Print out patient_id 3 (the first example patient in our synthetic dataset)
patient = database[3]
print(patient)

Num patients 201
Patient(patient_id=3, events=(Event(start=1970-01-07 00:00:00, code=Birth/Birth, value=0.0), Event(start=1990-01-07 00:00:00, code=Gender/Gender, value=Female), Event(start=1990-01-07 00:00:00, code=Race/Race, value=White), Event(start=2020-07-09 00:00:00, code=Vitals/Blood Pressure, value=160.0), Event(start=2020-08-09 00:00:00, code=Vitals/HbA1c, value=7.0), Event(start=2022-05-03 00:00:00, code=ICD10CM/E11.4), Event(start=2022-06-05 00:00:00, code=Note/ProgressNote, value=Patient Bob came to the clinic today), Event(start=2022-06-05 00:00:00, code=ICD10CM/E10.1), Event(start=2022-06-05 00:00:00, code=Drug/Atorvastatin), Event(start=2022-06-06 00:00:00, code=Note/ProgressNote, value=Complicated notes generally need escaping , "
 example), Event(start=2022-07-06 00:00:00, code=Drug/Multivitamins)))
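Beyond printing a whole patient, it is often useful to summarize the events. The sketch below counts events by the prefix of their code (the part before the first `/`). It assumes, based only on the printed representation above, that a patient exposes an `events` sequence and that each event has a `code` attribute. `count_code_prefixes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without an extract.

```python
from collections import Counter
from types import SimpleNamespace


def count_code_prefixes(patient):
    """Hypothetical helper: count events by the code prefix before the first '/'."""
    return Counter(event.code.split("/", 1)[0] for event in patient.events)


# Stand-in objects shaped like the printed Patient/Event output above
demo_patient = SimpleNamespace(
    patient_id=3,
    events=(
        SimpleNamespace(code="Birth/Birth"),
        SimpleNamespace(code="Vitals/Blood Pressure"),
        SimpleNamespace(code="Vitals/HbA1c"),
        SimpleNamespace(code="ICD10CM/E11.4"),
    ),
)

counts = count_code_prefixes(demo_patient)
print(counts.most_common())
```

If the attribute names match the assumption, passing `database[3]` instead of `demo_patient` should give the same kind of summary for the real extract.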
