# Loading Data into FEMR

To use FEMR, you currently need to load your data with the command-line tools installed by `pip install femr`. FEMR works with a class called `PatientDatabase`, which reorganizes the loaded data into an indexed format on disk. This preemptively avoids memory issues on large datasets (e.g. more than 50 GB, or anything that doesn't fit into RAM).

This tutorial will go through how to get a `PatientDatabase` so you can use the rest of FEMR.

### Pre-requisite: Organize data into the FEMR "Simple" Format

The FEMR "simple" format is a CSV file with at least four columns: `patient_id`, `start`, `code`, and `value`.

Every row must fill in all of these fields except `value`, which may be left empty.

 - `patient_id` must be a 64-bit unsigned integer
 - `start` is an ISO 8601 timestamp string, ideally the time the event was first recorded in the source database
 - `code` is a string that must consist of two parts: 1) a vocabulary signifier and 2) the code itself, separated by a "/" character. For example, `ICD10CM/E11.4` indicates the ICD10 code E11.4.
 - `value` can be a numeric value, a string, or left empty
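
The rules above can be checked before running any FEMR tooling. The following is a minimal sketch, not part of FEMR: `validate_row` is a hypothetical helper that takes one CSV row as a dictionary of strings. It assumes that `datetime.fromisoformat` is an acceptable stand-in for ISO 8601 parsing (before Python 3.11 it accepts only a subset of ISO 8601).

```python
from datetime import datetime

UINT64_MAX = 2**64 - 1


def validate_row(row: dict) -> list:
    """Return a list of problems found in one simple-format row (empty if none).

    Hypothetical helper for illustration; FEMR itself may apply different checks.
    """
    problems = []

    # patient_id: must parse as an integer in the unsigned 64-bit range
    try:
        pid = int(row.get("patient_id") or "")
        if not 0 <= pid <= UINT64_MAX:
            problems.append("patient_id is outside the unsigned 64-bit range")
    except ValueError:
        problems.append("patient_id is not an integer")

    # start: must parse as an ISO 8601 date or datetime
    try:
        datetime.fromisoformat(row.get("start") or "")
    except ValueError:
        problems.append("start is not an ISO 8601 timestamp")

    # code: "<vocabulary>/<code>", with both parts non-empty
    vocabulary, separator, code = (row.get("code") or "").partition("/")
    if not (vocabulary and separator and code):
        problems.append("code is not of the form VOCABULARY/CODE")

    # value is optional, so it is not checked
    return problems


good = {"patient_id": "3", "start": "1970-01-07", "code": "Birth/Birth", "value": ""}
bad = {"patient_id": "-1", "start": "not a date", "code": "E11.4"}
print(validate_row(good))
print(validate_row(bad))
```

Returning a list of problems, rather than raising on the first one, makes it easier to report every issue in a large file in a single pass.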

A mock example is provided below:
```csv
patient_id,start,code,value
3,1970-01-07,Birth/Birth,
3,1990-01-07,Gender/Gender,Female
3,1990-01-07,Race/Race,White
3,2022-05-03,ICD10CM/E11.4,
```

You are free to produce this format however you like. Reach out if you'd like help scoping this out for your specific use case!
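
As one possible approach, the sketch below writes the simple format with Python's standard `csv` module. The input record layout (dictionaries with `vocabulary` and `code` keys) and the `write_simple_format` helper are assumptions made for illustration, not anything FEMR prescribes.

```python
import csv
import io

FIELDNAMES = ["patient_id", "start", "code", "value"]


def write_simple_format(records, out_file):
    """Write hypothetical source records as simple-format CSV rows."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        value = rec.get("value")
        writer.writerow(
            {
                "patient_id": rec["patient_id"],
                "start": rec["start"],
                # Join the vocabulary signifier and the code with "/"
                "code": f'{rec["vocabulary"]}/{rec["code"]}',
                # A missing value becomes an empty field
                "value": "" if value is None else value,
            }
        )


records = [
    {"patient_id": 3, "start": "1970-01-07", "vocabulary": "Birth", "code": "Birth"},
    {"patient_id": 3, "start": "1990-01-07", "vocabulary": "Gender", "code": "Gender", "value": "Female"},
    {"patient_id": 3, "start": "2022-05-03", "vocabulary": "ICD10CM", "code": "E11.4"},
]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_simple_format(records, buffer)
print(buffer.getvalue())
```

Using `csv.DictWriter` rather than string concatenation means values containing commas, quotes, or newlines (free-text notes, for instance) are quoted automatically.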

### Running the FEMR ETL Script

Once you have a file in the format above, install FEMR and then run the corresponding ETL script, `etl_simple_femr`.

The script takes the input CSV, an output path (needed later to load the `PatientDatabase`), and a temporary working location.

```python
!pip install femr
!etl_simple_femr ./example_data/example.csv ./example_data/example_etl_output ./example_data/example_etl_temp_location
```

```
2023-05-13 16:44:41,933 [MainThread  ] [INFO ]  Extracting from OMOP with arguments Namespace(simple_source='./example_data/example.csv', target_location='/Users/ericpan/GitHub/femr/tutorials/example_data/example_etl_output', temp_location='/Users/ericpan/GitHub/femr/tutorials/example_data/example_etl_temp_location', num_threads=1, athena_download=None)
2023-05-13 16:44:41,933 [MainThread  ] [INFO ]  Converting to events
2023-05-13 16:44:42,153 [MainThread  ] [INFO ]  Converting to patients
2023-05-13 16:44:42,155 [MainThread  ] [INFO ]  Converting to extract
Converting to extract 2023-05-13 16:44:42.155327
Done with main 2023-05-13T16:44:42.158837-07:00
Done with meta 2023-05-13T16:44:42.15948-07:00
```
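
Outside a notebook, the same command can be assembled and launched from a Python script. This is a rough sketch: `build_etl_command` and `run_etl` are hypothetical helpers, and the only thing taken from the tutorial is the order of the three positional arguments (source CSV, target location, temp location), as shown in the command and the logged `Namespace` above.

```python
import shlex
import subprocess
from pathlib import Path


def build_etl_command(source_csv, target_location, temp_location):
    """Assemble the etl_simple_femr call as an argument list (no shell quoting needed)."""
    return ["etl_simple_femr", str(source_csv), str(target_location), str(temp_location)]


def run_etl(source_csv, target_location, temp_location):
    """Run the ETL and raise CalledProcessError if it exits with a non-zero status."""
    cmd = build_etl_command(source_csv, target_location, temp_location)
    subprocess.run(cmd, check=True)


data_dir = Path("example_data")
cmd = build_etl_command(
    data_dir / "example.csv",
    data_dir / "example_etl_output",
    data_dir / "example_etl_temp_location",
)
# Print the command for inspection; call run_etl(...) to actually execute it
print(shlex.join(cmd))
```

Passing a list to `subprocess.run` avoids shell interpolation, which matters if paths contain spaces.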


### Loading the `PatientDatabase` object

Once you have the output folder from running the script, you can load the results into a `PatientDatabase` object in memory and use the rest of FEMR.

```python
from femr.datasets import PatientDatabase

patient_db = PatientDatabase("./example_data/example_etl_output")
print(type(patient_db))
```

```
<class 'femr.extension.datasets.PatientDatabase'>
```


### Using the `PatientDatabase` Object

We can now pass this `PatientDatabase`, or objects retrieved from it, to other parts of the FEMR pipeline.
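
Downstream code often groups or filters events by the vocabulary part of `code`. The sketch below illustrates the idea with stand-in objects rather than real FEMR classes: it assumes only that each event exposes a `code` attribute in the `VOCABULARY/CODE` form described earlier, and `count_events_by_vocabulary` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace


def vocabulary_of(code: str) -> str:
    """Return the part of a code string before the first "/"."""
    return code.split("/", 1)[0]


def count_events_by_vocabulary(events):
    """Count events per vocabulary; works on any iterable of objects with a .code attribute."""
    return Counter(vocabulary_of(event.code) for event in events)


# Stand-in events for illustration only (not femr Event objects)
events = [
    SimpleNamespace(start="1970-01-07", code="Birth/Birth", value=None),
    SimpleNamespace(start="2022-05-03", code="ICD10CM/E11.4", value=None),
    SimpleNamespace(start="2022-06-05", code="ICD10CM/E10.1", value=None),
]

counts = count_events_by_vocabulary(events)
print(counts["ICD10CM"])  # → 2
```

With a loaded database, the same function could presumably be applied to a patient's `events`, provided the real event objects expose `code` in the same way.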

```python
# Index by `patient_id` to get a `Patient` object
print(patient_db[3])

# Iterate over `patient_id`s
for pid in patient_db:
    print(pid)
    # Iterate over events for each patient
    for e in patient_db[pid].events:
        print(e)
```



Patient(patient_id=3, events=(Event(start=1970-01-07 00:00:00, code=Birth/Birth, value=None), Event(start=1990-01-07 00:00:00, code=Race/Race, value=White), Event(start=1990-01-07 00:00:00, code=Gender/Gender, value=Female), Event(start=2020-07-09 00:00:00, code=Vitals/Blood Pressure, value=160.0, units=mmHg), Event(start=2020-08-09 00:00:00, code=Vitals/HbA1c, value=7.0, dosage=%), Event(start=2022-05-03 00:00:00, code=ICD10CM/E11.4, value=None), Event(start=2022-06-05 00:00:00, code=Drug/Atorvastatin, value=None, units=mg, dosage=50), Event(start=2022-06-05 00:00:00, code=ICD10CM/E10.1, value=None), Event(start=2022-06-05 00:00:00, code=Note/ProgressNote, value=Patient Bob came to the clinic today), Event(start=2022-06-06 00:00:00, code=Note/ProgressNote, value=Complicated notes generally need escaping , "
 example), Event(start=2022-07-06 00:00:00, code=Drug/Multivitamins, value=None, units=ml, dosage=5)))
3
Event(start=1970-01-07 00:00:00, code=Birth/Birth, value=None)
Event(start=