# FEMR Ontology support

FEMR provides support for querying ontologies using the OMOP Vocabulary. 

Ontology queries make labeling functions easier to define (for example, by matching a code together with all of its descendants) and support richer feature generation.

# Downloading the OMOP Vocabulary

The OMOP Vocabulary can be downloaded for free from the [OHDSI ATHENA website](https://athena.ohdsi.org/).

# Processing the OMOP Vocabulary

`femr.ontology.Ontology` lets you process and then query the OMOP Vocabulary, optionally combining it with [code metadata from MEDS](https://github.com/Medical-Event-Data-Standard/meds/blob/e93f63a2f9642123c49a31ecffcdb84d877dc54a/src/meds/__init__.py#L94).

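```python
import femr.ontology

# path_to_athena: location of the downloaded ATHENA files; code_metadata: optional MEDS code metadata
ontology = femr.ontology.Ontology(path_to_athena, code_metadata)
```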
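As a rough illustration of the second argument, the sketch below builds a small `code_metadata` mapping by hand. The field names `description` and `parent_codes` are assumed from the MEDS code metadata definition linked above, and the `CUSTOM/...` codes are hypothetical; check both against the MEDS version you have installed.

```python
# Hypothetical site-specific codes that are absent from the OMOP Vocabulary.
# The "description" and "parent_codes" keys are an assumption based on the
# MEDS code metadata schema; verify them against your MEDS version.
code_metadata = {
    "CUSTOM/ulcer_drug_panel": {
        "description": "Locally defined ulcer drug order set",
        "parent_codes": ["ATC/A02B"],
    },
    "CUSTOM/unmapped_lab": {
        "description": "Lab test with no standard mapping",
        "parent_codes": [],
    },
}

# Basic sanity check before handing the mapping to the Ontology constructor.
for code, entry in code_metadata.items():
    assert isinstance(entry["description"], str)
    assert all(isinstance(parent, str) for parent in entry["parent_codes"])
```

Linking a custom code to a standard parent such as `ATC/A02B` is what would let hierarchy queries on the parent also reach the custom code.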

# Working with an Ontology object

The following code samples illustrate the main ways to use an `Ontology` object.

In [1]:
import pickle

# You can load / save ontology objects with pickle

with open('input/meds/ontology.pkl', 'rb') as f:
    ontology = pickle.load(f)

print("Loaded ontology")

Loaded ontology
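
Saving works the same way in reverse. The sketch below round-trips a stand-in dictionary through `pickle` so that it runs without FEMR installed; with a real `Ontology` object you would pass that object to `pickle.dump` instead. The temporary file location is an assumption made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for an Ontology object so the example runs without FEMR.
stand_in_ontology = {"ATC/A02B": {"parents": {"ATC/A02"}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ontology.pkl")

    # Save: open the file in binary write mode and dump the object.
    with open(path, "wb") as f:
        pickle.dump(stand_in_ontology, f)

    # Load: open the file in binary read mode and read the object back.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in_ontology)  # prints True
```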


In [3]:
# Ontologies downloaded from Athena tend to be very large because they contain many codes, including several that are no longer used.
# We therefore provide a function that prunes an ontology to a particular dataset of interest.
# This makes it much cheaper to store and use an ontology object, in terms of both disk space and RAM.

import datasets
dataset = datasets.Dataset.from_parquet("input/meds/data/*")

ontology.prune_to_dataset(dataset)

SNOMED/184099003
Gender/F
Race/White
ICD9CM/665.9
ICD9CM/17
ICD9CM/810.9
ICD9CM/819
ICD9CM/341.7
ICD9CM/319.3
ICD9CM/137.6
SNOMED/184099003
Gender/M
Race/Non-White
RxNorm/7827
ICD9CM/802.1
ICD9CM/100.8
ICD9CM/580.4
RxNorm/6468
RxNorm/7885
RxNorm/3221
ICD9CM/490.9
RxNorm/7809
ICD9CM/679.6
SNOMED/184099003
Gender/M
Race/White
RxNorm/9132
SNOMED/184099003
Gender/M
Race/White
RxNorm/8561
ICD9CM/909.0
RxNorm/2606
ICD9CM/764.5
RxNorm/3304
SNOMED/184099003
Gender/M
Race/Non-White
ICD9CM/918.4
RxNorm/5852
ICD9CM/220.1
RxNorm/2215
ICD9CM/494.4
ICD9CM/388.4
ICD9CM/503.6
RxNorm/1323
ICD9CM/242.0
RxNorm/9523
SNOMED/184099003
Gender/M
Race/Non-White
RxNorm/1727
RxNorm/5377
SNOMED/184099003
Gender/M
Race/Non-White
RxNorm/5607
ICD9CM/914.4
RxNorm/4049
ICD9CM/990.4
SNOMED/184099003
Gender/F
Race/Non-White
ICD9CM/833.3
ICD9CM/622
ICD9CM/833.2
ICD9CM/745.7
ICD9CM/625.5
RxNorm/9992
ICD9CM/224.9
ICD9CM/864.0
RxNorm/2318
ICD9CM/198.9
RxNorm/5733
ICD9CM/794.5
RxNorm/5716
ICD9CM/586.2
RxNorm/318
ICD9CM/726.5
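
To make the idea of pruning concrete, here is a minimal conceptual sketch on a toy parent map. It is not FEMR's implementation of `prune_to_dataset`; the helper names `collect_ancestors` and `prune_parent_map` are hypothetical, and the sketch assumes that pruning keeps the codes observed in the dataset together with all of their ancestors.

```python
from typing import Dict, Iterable, Set


def collect_ancestors(code: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return `code` together with every ancestor reachable through `parents`."""
    seen: Set[str] = set()
    stack = [code]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, set()))
    return seen


def prune_parent_map(
    parents: Dict[str, Set[str]], dataset_codes: Iterable[str]
) -> Dict[str, Set[str]]:
    """Keep only entries for codes in the dataset or among their ancestors."""
    keep: Set[str] = set()
    for code in dataset_codes:
        keep |= collect_ancestors(code, parents)
    return {code: set(ps) for code, ps in parents.items() if code in keep}


# Toy hierarchy: one ATC branch that the dataset uses and one it does not.
toy_parents = {
    "ATC/A02BX": {"ATC/A02B"},
    "ATC/A02B": {"ATC/A02"},
    "ATC/A02": {"ATC/A"},
    "ATC/B01": {"ATC/B"},
}

pruned = prune_parent_map(toy_parents, dataset_codes={"ATC/A02BX"})
print(sorted(pruned))  # ['ATC/A02', 'ATC/A02B', 'ATC/A02BX']
```

The unused `ATC/B01` branch disappears, which is where the disk and memory savings come from on a full vocabulary.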

In [4]:
# First, we can query the description for a particular code
print("Description", ontology.get_description("ATC/A02B"))

# Second, we can search for the parents of a particular code
print("Parents", ontology.get_parents("ATC/A02B"))

# Finally, we can search for the children of a particular code
print("Children", ontology.get_children("ATC/A02B"))

# For convenience, we also support recursive versions of the parent and child queries
print("All children", ontology.get_all_children("ATC/A02B"))
print("All parents", ontology.get_all_parents("ATC/A02B"))

Description DRUGS FOR PEPTIC ULCER AND GASTRO-OESOPHAGEAL REFLUX DISEASE (GORD)
Parents {'ATC/A02'}
Children {'ATC/A02BX'}
All children {'RxNorm/8705', 'RxNorm/6852', 'RxNorm/2353', 'RxNorm/4501', 'RxNorm/7815', 'RxNorm/2403', 'ATC/A02BX', 'RxNorm/2344', 'ATC/A02BX71', 'RxNorm/38574', 'RxNorm/8704', 'RxNorm/7019', 'ATC/A02BX77', 'RxNorm/2620', 'RxNorm/2018', 'RxNorm/2017', 'RxNorm/8730', 'ATC/A02B'}
All parents {'ATC/A', 'ATC/A02', 'ATC/A02B'}
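
Note that, in the output above, the recursive results include the queried code itself. A set like this is convenient for a simple labeling check. The sketch below is self-contained and does not call FEMR: the target set is a hand-copied subset of the `get_all_children("ATC/A02B")` output, while the patient records and the helper `has_any_code` are hypothetical and do not follow the MEDS schema.

```python
from typing import Iterable, Set

# Subset of the "All children" output above, copied by hand.
# With FEMR available, this would be ontology.get_all_children("ATC/A02B").
peptic_ulcer_drug_codes = {"ATC/A02B", "ATC/A02BX", "RxNorm/8705", "RxNorm/2353"}

# Hypothetical per-patient event codes, for illustration only.
patient_codes = {
    "patient_1": ["SNOMED/184099003", "Gender/F", "RxNorm/8705"],
    "patient_2": ["SNOMED/184099003", "Gender/M", "ICD9CM/665.9"],
}


def has_any_code(codes: Iterable[str], target_codes: Set[str]) -> bool:
    """Return True if at least one event code falls in the target set."""
    return any(code in target_codes for code in codes)


labels = {
    patient_id: has_any_code(codes, peptic_ulcer_drug_codes)
    for patient_id, codes in patient_codes.items()
}
print(labels)  # {'patient_1': True, 'patient_2': False}
```

Because the set already contains every descendant, the check is a plain membership test and needs no hierarchy traversal at labeling time.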
