# How to query the drug_exposure table

The primary objectives of this notebook are as follows:

- Replicate results from the Cohort Builder/Dataset Builder tools (cb/ds_tools): this notebook gives step-by-step instructions for reproducing drug_exposure results generated with the cb/ds_tools.

- Provide BigQuery examples: this notebook includes several practical BigQuery queries against the drug_exposure table to help users get familiar with BigQuery and work with OMOP tables more quickly.

**For Advanced Users:**

If you are already familiar with BigQuery and the OMOP Common Data Model, feel free to develop your own algorithms and analytical workflows tailored to your research objectives. This notebook can serve as a starting point for exploring the AoU dataset using BigQuery.

**Prerequisites**

This notebook shows examples that use the Anatomical Therapeutic Chemical (ATC) vocabulary with the drug_exposure table. The ATC vocabulary has a hierarchical structure (ATC 1st level, ATC 2nd level, and so on). This notebook introduces the concept_ancestor and concept_relationship tables, which map relationships between concepts within the ATC vocabulary and between ATC and other vocabularies (e.g., RxNorm). For example, it is possible to use the concept_relationship table to connect an ATC 5th-level concept directly to an RxNorm ingredient concept. Please note that the ATC hierarchy can be complex and mappings are not always straightforward. We recommend becoming familiar with the hierarchy before running your analysis.
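
The level of an ATC code can be read from its length: codes of 1, 3, 4, 5, and 7 characters correspond to the 1st through 5th levels. The sketch below illustrates this. The helper name `atc_level` is hypothetical and is not part of any OMOP or All of Us library.

```python
import re

# ATC codes follow a fixed pattern: letter, 2 digits, letter, letter, 2 digits.
# Each nested optional group adds one hierarchy level.
_ATC_PATTERN = re.compile(r"^[A-Z](\d{2}([A-Z]([A-Z](\d{2})?)?)?)?$")

# Code length -> ATC hierarchy level
_LENGTH_TO_LEVEL = {1: 1, 3: 2, 4: 3, 5: 4, 7: 5}


def atc_level(code: str) -> int:
    """Return the ATC hierarchy level (1-5) of a code such as 'C09BA02'."""
    code = code.strip().upper()
    if not _ATC_PATTERN.match(code):
        raise ValueError(f"Not a valid ATC code: {code!r}")
    return _LENGTH_TO_LEVEL[len(code)]


for example in ["C", "C09", "C09A", "C09AA", "C09BA02"]:
    print(example, "-> level", atc_level(example))
```

This is why the queries later in this notebook can select 5th-level ATC concepts with `LENGTH(concept_code) = 7`.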

It is highly recommended that you go through these official OMOP resources to gain a fundamental understanding of the OMOP model.

https://github.com/OHDSI/Vocabulary-v5.0/wiki/Vocab.-ATC

https://ohdsi.github.io/TheBookOfOhdsi/

As mentioned by the OHDSI team, "the quality of the hierarchy differs between domains, and the completion of the hierarchy system is an ongoing task. You are strongly encouraged to engage with the community if you believe you found a mistake or inaccuracy." Please visit https://forums.ohdsi.org/ to join the OHDSI forums.

# Example 1: Given an ATC code, how to find the standard drug_concept_id (RxNorm)

## Using cohort/dataset builder tools

Using the ATC 5th-level code `C09BA02` as an example, you can search for the term 'Enalapril' or the standard `concept_code=3827` in the cb/ds_tools; the resulting count is 4571.

In [None]:
from IPython.display import Image, display

# Display the image
display(Image(filename='drug_cb1.jpg'))


For an ATC code that is not 5th-level, you can search for the ATC code itself, e.g. 'C09AA'.

In [None]:
# Display the image
display(Image(filename='drug_cb3.jpg'))

The two cells below are generated by the cb/ds_tools.

In [None]:
import pandas
import os

# This query represents dataset "drug_enalapril_act5th_cb" for domain "person" and was generated for All of Us Controlled Tier Dataset v8
dataset_44856103_person_sql = """
    SELECT
        person.person_id,
        person.gender_concept_id,
        p_gender_concept.concept_name as gender,
        person.birth_datetime as date_of_birth,
        person.race_concept_id,
        p_race_concept.concept_name as race,
        person.ethnicity_concept_id,
        p_ethnicity_concept.concept_name as ethnicity,
        person.sex_at_birth_concept_id,
        p_sex_at_birth_concept.concept_name as sex_at_birth,
        person.self_reported_category_concept_id,
        p_self_reported_category_concept.concept_name as self_reported_category 
    FROM
        `""" + os.environ["WORKSPACE_CDR"] + """.person` person 
    LEFT JOIN
        `""" + os.environ["WORKSPACE_CDR"] + """.concept` p_gender_concept 
            ON person.gender_concept_id = p_gender_concept.concept_id 
    LEFT JOIN
        `""" + os.environ["WORKSPACE_CDR"] + """.concept` p_race_concept 
            ON person.race_concept_id = p_race_concept.concept_id 
    LEFT JOIN
        `""" + os.environ["WORKSPACE_CDR"] + """.concept` p_ethnicity_concept 
            ON person.ethnicity_concept_id = p_ethnicity_concept.concept_id 
    LEFT JOIN
        `""" + os.environ["WORKSPACE_CDR"] + """.concept` p_sex_at_birth_concept 
            ON person.sex_at_birth_concept_id = p_sex_at_birth_concept.concept_id 
    LEFT JOIN
        `""" + os.environ["WORKSPACE_CDR"] + """.concept` p_self_reported_category_concept 
            ON person.self_reported_category_concept_id = p_self_reported_category_concept.concept_id  
    WHERE
        person.PERSON_ID IN (SELECT
            distinct person_id  
        FROM
            `""" + os.environ["WORKSPACE_CDR"] + """.cb_search_person` cb_search_person  
        WHERE
            cb_search_person.person_id IN (SELECT
                criteria.person_id 
            FROM
                (SELECT
                    DISTINCT person_id, entry_date, concept_id 
                FROM
                    `""" + os.environ["WORKSPACE_CDR"] + """.cb_search_all_events` 
                WHERE
                    (concept_id IN(SELECT
                        DISTINCT ca.descendant_id 
                    FROM
                        `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria_ancestor` ca 
                    JOIN
                        (SELECT
                            DISTINCT c.concept_id       
                        FROM
                            `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria` c       
                        JOIN
                            (SELECT
                                CAST(cr.id as string) AS id             
                            FROM
                                `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria` cr             
                            WHERE
                                concept_id IN (1341927)             
                                AND full_text LIKE '%_rank1]%'       ) a 
                                ON (c.path LIKE CONCAT('%.', a.id, '.%') 
                                OR c.path LIKE CONCAT('%.', a.id) 
                                OR c.path LIKE CONCAT(a.id, '.%') 
                                OR c.path = a.id) 
                        WHERE
                            is_standard = 1 
                            AND is_selectable = 1) b 
                            ON (ca.ancestor_id = b.concept_id)) 
                        AND is_standard = 1)) criteria ) )"""

dataset_44856103_person_df = pandas.read_gbq(
    dataset_44856103_person_sql,
    dialect="standard",
    use_bqstorage_api=("BIGQUERY_STORAGE_API_ENABLED" in os.environ),
    progress_bar_type="tqdm_notebook")

dataset_44856103_person_df.head(5)

In [None]:
import pandas
import os

# This query represents dataset "drug_enalapril_act5th_cb" for domain "drug" and was generated for All of Us Controlled Tier Dataset v8
dataset_44856103_drug_sql = """
    SELECT
        d_exposure.person_id,
        d_exposure.drug_concept_id,
        d_standard_concept.concept_name as standard_concept_name,
        d_standard_concept.concept_code as standard_concept_code,
        d_standard_concept.vocabulary_id as standard_vocabulary,
        d_exposure.drug_exposure_start_datetime,
        d_exposure.drug_exposure_end_datetime,
        d_exposure.verbatim_end_date,
        d_exposure.drug_type_concept_id,
        d_type.concept_name as drug_type_concept_name,
        d_exposure.stop_reason,
        d_exposure.refills,
        d_exposure.quantity,
        d_exposure.days_supply,
        d_exposure.sig,
        d_exposure.route_concept_id,
        d_route.concept_name as route_concept_name,
        d_exposure.lot_number,
        d_exposure.visit_occurrence_id,
        d_visit.concept_name as visit_occurrence_concept_name,
        d_exposure.drug_source_value,
        d_exposure.drug_source_concept_id,
        d_source_concept.concept_name as source_concept_name,
        d_source_concept.concept_code as source_concept_code,
        d_source_concept.vocabulary_id as source_vocabulary,
        d_exposure.route_source_value,
        d_exposure.dose_unit_source_value 
    FROM
        ( SELECT
            * 
        FROM
            `""" + os.environ["WORKSPACE_CDR"] + """.drug_exposure` d_exposure 
        WHERE
            (
                drug_concept_id IN (SELECT
                    DISTINCT ca.descendant_id 
                FROM
                    `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria_ancestor` ca 
                JOIN
                    (SELECT
                        DISTINCT c.concept_id       
                    FROM
                        `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria` c       
                    JOIN
                        (SELECT
                            CAST(cr.id as string) AS id             
                        FROM
                            `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria` cr             
                        WHERE
                            concept_id IN (1341927)             
                            AND full_text LIKE '%_rank1]%'       ) a 
                            ON (c.path LIKE CONCAT('%.', a.id, '.%') 
                            OR c.path LIKE CONCAT('%.', a.id) 
                            OR c.path LIKE CONCAT(a.id, '.%') 
                            OR c.path = a.id) 
                    WHERE
                        is_standard = 1 
                        AND is_selectable = 1) b 
                        ON (ca.ancestor_id = b.concept_id)))  
                    AND (d_exposure.PERSON_ID IN (SELECT
                        distinct person_id  
                FROM
                    `""" + os.environ["WORKSPACE_CDR"] + """.cb_search_person` cb_search_person  
                WHERE
                    cb_search_person.person_id IN (SELECT
                        criteria.person_id 
                    FROM
                        (SELECT
                            DISTINCT person_id, entry_date, concept_id 
                        FROM
                            `""" + os.environ["WORKSPACE_CDR"] + """.cb_search_all_events` 
                        WHERE
                            (concept_id IN(SELECT
                                DISTINCT ca.descendant_id 
                            FROM
                                `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria_ancestor` ca 
                            JOIN
                                (SELECT
                                    DISTINCT c.concept_id       
                                FROM
                                    `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria` c       
                                JOIN
                                    (SELECT
                                        CAST(cr.id as string) AS id             
                                    FROM
                                        `""" + os.environ["WORKSPACE_CDR"] + """.cb_criteria` cr             
                                    WHERE
                                        concept_id IN (1341927)             
                                        AND full_text LIKE '%_rank1]%'       ) a 
                                        ON (c.path LIKE CONCAT('%.', a.id, '.%') 
                                        OR c.path LIKE CONCAT('%.', a.id) 
                                        OR c.path LIKE CONCAT(a.id, '.%') 
                                        OR c.path = a.id) 
                                WHERE
                                    is_standard = 1 
                                    AND is_selectable = 1) b 
                                    ON (ca.ancestor_id = b.concept_id)) 
                                AND is_standard = 1)) criteria ) ))) d_exposure 
        LEFT JOIN
            `""" + os.environ["WORKSPACE_CDR"] + """.concept` d_standard_concept 
                ON d_exposure.drug_concept_id = d_standard_concept.concept_id 
        LEFT JOIN
            `""" + os.environ["WORKSPACE_CDR"] + """.concept` d_type 
                ON d_exposure.drug_type_concept_id = d_type.concept_id 
        LEFT JOIN
            `""" + os.environ["WORKSPACE_CDR"] + """.concept` d_route 
                ON d_exposure.route_concept_id = d_route.concept_id 
        LEFT JOIN
            `""" + os.environ["WORKSPACE_CDR"] + """.visit_occurrence` v 
                ON d_exposure.visit_occurrence_id = v.visit_occurrence_id 
        LEFT JOIN
            `""" + os.environ["WORKSPACE_CDR"] + """.concept` d_visit 
                ON v.visit_concept_id = d_visit.concept_id 
        LEFT JOIN
            `""" + os.environ["WORKSPACE_CDR"] + """.concept` d_source_concept 
                ON d_exposure.drug_source_concept_id = d_source_concept.concept_id"""

dataset_44856103_drug_df = pandas.read_gbq(
    dataset_44856103_drug_sql,
    dialect="standard",
    use_bqstorage_api=("BIGQUERY_STORAGE_API_ENABLED" in os.environ),
    progress_bar_type="tqdm_notebook")

dataset_44856103_drug_df.head(5)

In [None]:
dataset_44856103_drug_df.person_id.nunique()

## How to write your own query to replicate this example?

In [None]:
import os
dataset=os.environ["WORKSPACE_CDR"]
dataset

Define a helper function, BQ(), to run queries.

In [None]:
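import pandas as pd
from google.cloud import bigquery
client = bigquery.Client()

def BQ(query: str, dataset=dataset):
    """Run a query against the default dataset and return the result as a DataFrame."""
    # default_dataset lets queries use unqualified table names such as `concept`
    job_config = bigquery.QueryJobConfig(default_dataset=dataset)
    query_job = client.query(query, job_config=job_config)  # API request
    df = query_job.result().to_dataframe()

    return df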

Take a look at this ATC code

In [None]:
code='C09BA02'

In [None]:
query=f"""
SELECT * FROM concept
WHERE concept_code IN ('{code}')
"""
df=BQ(query)
df.shape

In [None]:
df
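
The `IN ('{code}')` pattern above takes a single code. If you want to look up several codes at once, one option is to build the quoted list in Python first. The sketch below assumes a hypothetical helper, `sql_in_list`, and restricts values to letters and digits so that nothing unexpected is interpolated into the query text.

```python
def sql_in_list(codes):
    """Format codes as a quoted, comma-separated list for a SQL IN (...) clause."""
    cleaned = []
    for code in codes:
        code = str(code).strip()
        # ATC codes are alphanumeric; reject anything else before interpolation.
        if not code.isalnum():
            raise ValueError(f"Unexpected characters in code: {code!r}")
        cleaned.append(f"'{code}'")
    if not cleaned:
        raise ValueError("At least one code is required")
    return ", ".join(cleaned)


codes = ["C09BA02", "C09AA"]
query = f"""
SELECT * FROM concept
WHERE concept_code IN ({sql_in_list(codes)})
"""
print(query)
```

The resulting string can be passed to `BQ(query)` in the same way as the single-code query.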

Since this is an ATC 5th-level code, we need to find the mapped standard concept using the concept_relationship table with `relationship_id='Maps to'`.

In [None]:
query=f"""
SELECT 
      concept_id_1,
      c1.concept_name,c1.concept_code,c1.vocabulary_id,c1.concept_class_id,
      cr.relationship_id,
      concept_id_2,c2.concept_code, c2.vocabulary_id,c2.concept_class_id,
      c2.concept_name 
    FROM concept_relationship cr
    JOIN concept c1 ON cr.concept_id_1 = c1.concept_id
    JOIN concept c2 ON cr.concept_id_2 = c2.concept_id
    WHERE c1.concept_code IN ('{code}')
    AND relationship_id IN ('Maps to')
"""

df=BQ(query)
df.shape

In [None]:
df

We then use this standard `concept_id=1341927` to extract records from the drug_exposure table, using the concept_ancestor table to include its descendant concepts.

In [None]:
cid=(1341927)
cid

In [None]:
query=f"""

WITH df1 AS (
SELECT distinct descendant_concept_id
    FROM concept_ancestor    
    WHERE  ancestor_concept_id IN ({cid})
    )
    
SELECT COUNT(DISTINCT person_id) countp
FROM drug_exposure
WHERE drug_concept_id IN (SELECT descendant_concept_id FROM df1)
"""

df=BQ(query)
df.shape

We get exactly the same count as with the cohort/dataset builder tools.

In [None]:
df
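
To see the logic of these two steps without access to the CDR, the sketch below rebuilds it on a toy in-memory SQLite database. All concept IDs, person IDs, and rows are invented for illustration and do not correspond to real OMOP concepts or All of Us data. Only the table and column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (concept_id_1 INT, concept_id_2 INT, relationship_id TEXT);
CREATE TABLE concept_ancestor (ancestor_concept_id INT, descendant_concept_id INT);
CREATE TABLE drug_exposure (person_id INT, drug_concept_id INT);

-- Toy ATC concept 100 maps to toy standard ingredient 200.
INSERT INTO concept_relationship VALUES (100, 200, 'Maps to');
-- Ingredient 200 is its own descendant and has two product descendants.
INSERT INTO concept_ancestor VALUES (200, 200), (200, 201), (200, 202);
-- Person 1 has two matching records, person 2 has one, person 3 has an unrelated drug.
INSERT INTO drug_exposure VALUES (1, 201), (1, 202), (2, 200), (3, 999);
""")

query = """
WITH mapped AS (
    -- Step 1: follow 'Maps to' from the ATC concept to the standard concept
    SELECT concept_id_2
    FROM concept_relationship
    WHERE concept_id_1 = 100 AND relationship_id = 'Maps to'
),
descendants AS (
    -- Step 2: expand the standard concept to all of its descendants
    SELECT DISTINCT descendant_concept_id
    FROM concept_ancestor
    WHERE ancestor_concept_id IN (SELECT concept_id_2 FROM mapped)
)
SELECT COUNT(DISTINCT person_id) AS countp
FROM drug_exposure
WHERE drug_concept_id IN (SELECT descendant_concept_id FROM descendants)
"""
countp = conn.execute(query).fetchone()[0]
print(countp)
```

Person 1 is counted once despite having two matching records, because the count is over distinct `person_id` values.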

Feel free to test this all-in-one BigQuery query with any ATC code. We write a function for this purpose.

In [None]:
def get_count_pid(code):
    """Return the number of distinct persons with drug exposures mapped from an ATC code."""
    query=f"""

WITH df1 AS (
  -- Get all 5th-level descendants 
  SELECT ancestor_concept_id, ancestor.concept_code,ancestor.concept_class_id,
  descendant_concept_id,descendant.concept_code, descendant.vocabulary_id,descendant.concept_class_id,
  FROM concept ancestor
  JOIN concept_ancestor ON ancestor.concept_id = ancestor_concept_id
  JOIN concept descendant ON descendant.concept_id = descendant_concept_id
    WHERE ancestor.vocabulary_id = 'ATC' 
    AND ancestor.concept_code IN ('{code}')
    AND descendant.vocabulary_id = 'ATC'
    AND LENGTH(descendant.concept_code) = 7
   ),

df11 AS (SELECT 
      concept_id_1,
      c1.concept_name,c1.concept_code,c1.vocabulary_id,c1.concept_class_id,
      cr.relationship_id,
      concept_id_2,c2.concept_code, c2.vocabulary_id,c2.concept_class_id,
      c2.concept_name 
    FROM concept_relationship cr
    JOIN concept c1 ON cr.concept_id_1 = c1.concept_id
    JOIN concept c2 ON cr.concept_id_2 = c2.concept_id
    WHERE concept_id_1 IN (SELECT descendant_concept_id FROM df1)
    AND relationship_id IN ('Maps to')
    ),    
    
    df22 AS (
    SELECT DISTINCT descendant_concept_id
    FROM concept_ancestor    
    WHERE  ancestor_concept_id IN (SELECT concept_id_2 FROM df11)
    )
   
   SELECT
  COUNT(DISTINCT person_id) countp , '{code}' ATC_code
FROM drug_exposure
WHERE drug_concept_id IN (
  SELECT descendant_concept_id 
  FROM df22
)   
   """
    return BQ(query)

In [None]:
res=get_count_pid(code)
res

Test other ATC codes, as shown here

![](drug_cb2.jpg)

In [None]:
# Display the image
display(Image(filename='drug_cb2.jpg'))

In [None]:
# ATC 4th
code='C09AA'
res=get_count_pid(code)
res

In [None]:
# ATC 3rd
code='C09A'
res=get_count_pid(code)
res

In [None]:
# ATC 2nd
code='C09'
res=get_count_pid(code)
res

To get the maximal counts, you may also consider these additional ATC-specific relationships:

- 'ATC - RxNorm'
- 'ATC - RxNorm pr lat'
- 'ATC - RxNorm sec lat'
- 'ATC - RxNorm pr up'
- 'ATC - RxNorm sec up'

In [None]:
query=f"""

WITH df1 AS (
  -- Get all 5th-level descendants 
  SELECT ancestor_concept_id, ancestor.concept_code,ancestor.concept_class_id,
  descendant_concept_id,descendant.concept_code, descendant.vocabulary_id,descendant.concept_class_id,
  -- concept_id_2, relationship_id, c3.concept_code, c3.vocabulary_id
  FROM concept ancestor
  JOIN concept_ancestor ON ancestor.concept_id = ancestor_concept_id
  JOIN concept descendant ON descendant.concept_id = descendant_concept_id
    WHERE ancestor.vocabulary_id = 'ATC' 
    AND ancestor.concept_code IN ('{code}')
    AND descendant.vocabulary_id = 'ATC'
    AND LENGTH(descendant.concept_code) = 7
   ),

df11 AS (SELECT 
      concept_id_1,
      c1.concept_name,c1.concept_code,c1.vocabulary_id,c1.concept_class_id,
      cr.relationship_id,
      concept_id_2,c2.concept_code, c2.vocabulary_id,c2.concept_class_id,
      c2.concept_name 
    FROM concept_relationship cr
    JOIN concept c1 ON cr.concept_id_1 = c1.concept_id
    JOIN concept c2 ON cr.concept_id_2 = c2.concept_id
    WHERE concept_id_1 IN (select descendant_concept_id from df1)
    AND relationship_id IN ('ATC - RxNorm pr lat','ATC - RxNorm sec lat','ATC - RxNorm pr up',
'ATC - RxNorm sec up','ATC - RxNorm')
    ),    
    
    df22 AS (
    SELECT distinct descendant_concept_id
    FROM concept_ancestor    
    WHERE  ancestor_concept_id IN (SELECT concept_id_2 FROM df11)
    )

   
  SELECT
  COUNT(DISTINCT person_id) countp , '{code}' ATC_code
FROM drug_exposure
WHERE drug_concept_id IN (
  SELECT descendant_concept_id 
  FROM df22
)   

   
   """

In [None]:
res=BQ(query)
res
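
The query above and the one inside `get_count_pid` differ only in the list of `relationship_id` values. A minimal sketch of a shared query builder is shown below. The function name `build_count_query` and its parameters are hypothetical, and the SQL is a trimmed version of the queries above that keeps only the columns needed for the count. It should return the same count provided every related concept exists in the `concept` table.

```python
MAPS_TO = ("Maps to",)
ATC_RXNORM = (
    "ATC - RxNorm",
    "ATC - RxNorm pr lat",
    "ATC - RxNorm sec lat",
    "ATC - RxNorm pr up",
    "ATC - RxNorm sec up",
)


def build_count_query(code, relationship_ids=MAPS_TO):
    """Build the person-count SQL for one ATC code and a set of relationships."""
    # ATC codes are alphanumeric; reject anything else before interpolation.
    if not code.isalnum():
        raise ValueError(f"Unexpected characters in ATC code: {code!r}")
    rel_list = ", ".join(f"'{r}'" for r in relationship_ids)
    return f"""
WITH atc5 AS (
  -- 5th-level ATC descendants of the input code (7-character codes)
  SELECT ca.descendant_concept_id
  FROM concept ancestor
  JOIN concept_ancestor ca ON ancestor.concept_id = ca.ancestor_concept_id
  JOIN concept descendant ON descendant.concept_id = ca.descendant_concept_id
  WHERE ancestor.vocabulary_id = 'ATC'
    AND ancestor.concept_code = '{code}'
    AND descendant.vocabulary_id = 'ATC'
    AND LENGTH(descendant.concept_code) = 7
),
mapped AS (
  -- Concepts linked to those ATC codes through the chosen relationships
  SELECT cr.concept_id_2
  FROM concept_relationship cr
  WHERE cr.concept_id_1 IN (SELECT descendant_concept_id FROM atc5)
    AND cr.relationship_id IN ({rel_list})
),
drugs AS (
  -- All descendants of the mapped concepts
  SELECT DISTINCT descendant_concept_id
  FROM concept_ancestor
  WHERE ancestor_concept_id IN (SELECT concept_id_2 FROM mapped)
)
SELECT COUNT(DISTINCT person_id) AS countp, '{code}' AS ATC_code
FROM drug_exposure
WHERE drug_concept_id IN (SELECT descendant_concept_id FROM drugs)
"""


print(build_count_query("C09AA", MAPS_TO + ATC_RXNORM))
```

The returned string can be run with `BQ(...)`. Passing `MAPS_TO + ATC_RXNORM` combines both relationship sets in one query.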