# Creating a PatientDatabase from OMOP format data

In this tutorial, we walk through how to generate a FEMR dataset from OMOP 5.4 data. This is the recommended route for using FEMR, and most of FEMR's functionality assumes data has been processed this way.

Using this converter is simple: export your OMOP database tables as CSV files, then pass the folder containing those CSV files to FEMR's `etl_generic_omop` command.
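How you export the CSV files depends on where your OMOP data lives. The snippet below is a minimal sketch that assumes a hypothetical setup in which the OMOP tables are stored in a SQLite file. It also assumes the converter expects one `<table_name>.csv` per table with a header row of OMOP column names. `export_tables_to_csv` is an illustrative helper, not part of FEMR. For PostgreSQL or other engines, use that engine's own export tooling instead.

```python
import csv
import os
import sqlite3
from typing import List


def export_tables_to_csv(db_path: str, tables: List[str], output_dir: str) -> List[str]:
    """Hypothetical helper: dump each table to output_dir/<table>.csv with a header row."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        for table in tables:
            # Table names are interpolated into SQL, so only pass trusted names.
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [column[0] for column in cursor.description]
            path = os.path.join(output_dir, f"{table}.csv")
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                # csv.writer writes SQL NULL (None) as an empty field.
                writer.writerows(cursor)
            written.append(path)
    finally:
        conn.close()
    return written
```

For example, `export_tables_to_csv("omop.db", ["person", "concept", "observation"], "input/omop")` would produce a folder laid out like the one used in this tutorial.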

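Each OMOP table should be provided either as a file named "table_name.csv" or as a folder named "table_name", where "table_name" is the name of the table. If a folder is provided, it must contain the CSV files for that table. Tables you do not have can be left out: in the run below, the converter reports "Could not find any files" for each missing table and still completes the extract.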
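To sanity-check an input folder before running the ETL, the layout rule above can be expressed as a small helper. This is a hypothetical sketch (`find_table_sources` is not a FEMR function). It maps each table name to its CSV file(s) and raises an error for a table folder that contains no CSV files. It only recognizes plain `.csv` names.

```python
import os
from typing import Dict, List


def find_table_sources(input_dir: str) -> Dict[str, List[str]]:
    """Hypothetical helper: map each OMOP table name to the CSV file(s) holding it."""
    sources: Dict[str, List[str]] = {}
    for entry in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, entry)
        if os.path.isfile(path) and entry.endswith(".csv"):
            # "table_name.csv": a single file for the table
            sources[entry[: -len(".csv")]] = [path]
        elif os.path.isdir(path):
            # "table_name/": every CSV file inside the folder belongs to the table
            csv_files = sorted(
                os.path.join(path, name)
                for name in os.listdir(path)
                if name.endswith(".csv")
            )
            if not csv_files:
                raise ValueError(f"Table folder {path!r} contains no CSV files")
            sources[entry] = csv_files
    return sources
```

Run on the `input/omop` folder used in this tutorial, it would return the keys `concept`, `concept_relationship`, `observation`, and `person`, with `observation` mapping to the CSV files inside that folder.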

In [1]:
import os

INPUT_DIR = 'input/omop'

# Each available table is either a CSV file or a folder of CSV files (here, observation is a folder).
print(os.listdir(INPUT_DIR))

['concept.csv', 'person.csv', 'concept_relationship.csv', 'observation']


In [2]:
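import os
import shutil

import femr
import femr.etl_pipelines.simple

TARGET_DIR = 'trash/tutorial_2a'

# Start from a clean target directory (makedirs also creates 'trash/' if it is missing)
if os.path.exists(TARGET_DIR):
    shutil.rmtree(TARGET_DIR)

os.makedirs(TARGET_DIR)

# Paths for the extract and the extract logs; the ETL command writes its output there
LOG_DIR = os.path.join(TARGET_DIR, "logs")
EXTRACT_DIR = os.path.join(TARGET_DIR, "extract")

# Run the OMOP ETL with 2 threads; os.system returns the command's exit status (0 = success)
os.system(f"etl_generic_omop {INPUT_DIR} {EXTRACT_DIR} {LOG_DIR} --num_threads 2")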

Could not find any files for extractor _ConceptTableConverter(prefix='drug_exposure', file_suffix='', concept_id_field='drug_concept_id', string_value_field=None, numeric_value_field=None)
Could not find any files for extractor _ConceptTableConverter(prefix='visit', file_suffix='occurrence', concept_id_field=None, string_value_field=None, numeric_value_field=None)
Could not find any files for extractor _ConceptTableConverter(prefix='condition', file_suffix='occurrence', concept_id_field=None, string_value_field=None, numeric_value_field=None)
Could not find any files for extractor _ConceptTableConverter(prefix='death', file_suffix='', concept_id_field='death_type_concept_id', string_value_field=None, numeric_value_field=None)
Could not find any files for extractor _ConceptTableConverter(prefix='procedure', file_suffix='occurrence', concept_id_field=None, string_value_field=None, numeric_value_field=None)
Could not find any files for extractor _ConceptTableConverter(prefix='device_expos

2023-07-08 12:27:08,791 [MainThread  ] [INFO ]  Extracting from OMOP with arguments Namespace(omop_source='input/omop', target_location='/home/ethan/femr/tutorials/trash/tutorial_2a/extract', temp_location='/home/ethan/femr/tutorials/trash/tutorial_2a/logs', num_threads=2)
2023-07-08 12:27:08,791 [MainThread  ] [INFO ]  Converting to events
2023-07-08 12:27:08,830 [MainThread  ] [INFO ]  Got converter statistics {'person': defaultdict(<class 'int'>, {'input_rows': 100, 'valid_rows': 100, 'valid_events': 200}), 'observation': defaultdict(<class 'int'>, {'input_rows': 2000, 'valid_rows': 2000, 'valid_events': 2000})}
2023-07-08 12:27:08,831 [MainThread  ] [INFO ]  Converting to patients
2023-07-08 12:27:08,848 [MainThread  ] [INFO ]  Appling transformations
2023-07-08 12:27:08,881 [MainThread  ] [INFO ]  Got transform statistics {'<function remove_nones at 0x7fe8f31627a0>': defaultdict(<class 'int'>, {'lost_events': 0}), '<function delta_encode at 0x7fe8f3162830>': defaultdict(<class 'in

0
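The trailing `0` above is the return value of `os.system`, meaning the command exited successfully. A failed run would only show up as a non-zero number that is easy to overlook. As an alternative sketch, the same command can be launched with `subprocess.run(..., check=True)`, which raises `CalledProcessError` on failure. Passing an argument list also avoids shell-quoting problems with paths that contain spaces. The helper names `build_etl_command` and `run_etl` are hypothetical. The command name and `--num_threads` flag are taken from the cell above.

```python
import subprocess
from typing import List


def build_etl_command(input_dir: str, extract_dir: str, log_dir: str, num_threads: int = 2) -> List[str]:
    """Hypothetical helper: argument list for the etl_generic_omop call shown above."""
    return [
        "etl_generic_omop",
        input_dir,
        extract_dir,
        log_dir,
        "--num_threads",
        str(num_threads),
    ]


def run_etl(input_dir: str, extract_dir: str, log_dir: str, num_threads: int = 2) -> None:
    # check=True raises subprocess.CalledProcessError on a non-zero exit status
    # instead of returning it silently as os.system does.
    subprocess.run(build_etl_command(input_dir, extract_dir, log_dir, num_threads), check=True)
```

In the notebook this would be called as `run_etl(INPUT_DIR, EXTRACT_DIR, LOG_DIR, num_threads=2)`.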

# Open and view the data
We now open the FEMR extract generated in the previous step and inspect it using the `PatientDatabase` class (https://github.com/som-shahlab/femr/blob/main/src/femr/extension/datasets.pyi#L24).

In [3]:
import femr.datasets

database = femr.datasets.PatientDatabase(EXTRACT_DIR)

# Number of patients
print("Num patients", len(database))

# Look up a single patient by patient_id (here, 3) and print it
patient = database[3]
print(patient)

Num patients 99
Patient(patient_id=3, events=(Event(start=1979-01-01 00:00:00, code=Gender/M, value=None, omop_table=person), Event(start=1979-01-01 00:00:00, code=INVALID/4216316, value=None, omop_table=person), Event(start=1979-05-07 00:00:00, code=CODE/3543, value=None, omop_table=observation), Event(start=1979-10-21 00:00:00, code=CODE/4319, value=None, omop_table=observation), Event(start=1980-05-02 00:00:00, code=CODE/3283, value=None, omop_table=observation), Event(start=1980-09-01 00:00:00, code=CODE/2191, value=None, omop_table=observation), Event(start=1980-12-10 00:00:00, code=CODE/3892, value=None, omop_table=observation), Event(start=1981-05-16 00:00:00, code=CODE/2572, value=None, omop_table=observation), Event(start=1981-11-06 00:00:00, code=CODE/2911, value=None, omop_table=observation), Event(start=1982-05-14 00:00:00, code=CODE/2599, value=None, omop_table=observation), Event(start=1982-09-29 00:00:00, code=CODE/3411, value=None, omop_table=observation), Event(start=1
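Each patient is a `patient_id` plus a time-ordered sequence of events, and each event is printed with a `start` time, a `code`, a `value`, and the `omop_table` it came from. The sketch below shows two common summaries over that structure: counting events per source table and filtering by code prefix. To keep it runnable without an extract, it uses hypothetical stand-in `Patient`/`Event` dataclasses that mirror the printed fields. Whether real FEMR events expose `omop_table` as an attribute in exactly this way is an assumption based on the printed output above.

```python
import collections
import dataclasses
import datetime
from typing import Dict, List, Optional, Tuple


@dataclasses.dataclass(frozen=True)
class Event:
    # Hypothetical stand-in mirroring the fields in the printed Event repr.
    start: datetime.datetime
    code: str
    value: Optional[object] = None
    omop_table: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class Patient:
    # Hypothetical stand-in mirroring the printed Patient repr.
    patient_id: int
    events: Tuple[Event, ...]


def count_by_table(patient: Patient) -> Dict[str, int]:
    """Number of events each OMOP table contributed to this patient."""
    return dict(collections.Counter(event.omop_table for event in patient.events))


def events_with_prefix(patient: Patient, prefix: str) -> List[Event]:
    """Events whose code starts with a vocabulary prefix such as 'Gender/'."""
    return [event for event in patient.events if event.code.startswith(prefix)]
```

With a real extract, the same functions could be tried on `database[3]`, provided its events expose these attribute names.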