# IQVIA NLP - FDA Drug Label Biomarkers

## API Description
Understanding biomarkers for drug efficacy or disease progression is critical in clinical development. Relevant data is often in unstructured text, such as clinical trial records, and drug labels.   For Data Scientists, the FDA Drug Label Biomarkers API solves this problem by offering an API that extracts gene/protein biomarkers from specific regions of FDA Drug labels (Indications and Usages section). Biomarker-disease relationships are provided (e.g.positive, negative); diseases are mapped to MedDRA, and mechanism of action data are also extracted.

## Accessing the API
In order to consume this API, you will first need to Request access to the FDA Drug Label Biomarkers API 
via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000ytJqMAAU/api-marketplaceiqvianlpfdadruglabelbiomarkerspreview .

Please refer to "API Documentation" to learn more about accessing and using the API.

## Notebook Description
This notebook is designed to biomarkers and their related diseases from from openFDA drug product labeling records. Users could specify records of interest by supplying FDA Drug Label document identifiers in the 'FDAID' parameter when posting the request.

### Authorization
The instructions for getting your credentials and the API endpoint URL can be found under the section "Get Started" and "How to use the API" following this link: https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000ytJqMAAU/api-marketplaceiqvianlpfdadruglabelbiomarkerspreview .

In [5]:
import getpass
import requests

# In this demo scenario, URL for US based customers
# api_marketplace_url = 'https://vt.us-rds.solutions.iqvia.com/fda/api/v1/biomarkers'
# In this demo scenario, URL for EU based customers
api_marketplace_url = 'https://vt.eu-apim.solutions.iqvia.com/eu/fda/api/v1/biomarkers'

mkp_user = input("Marketplace clientId: ")
mkp_password = getpass.getpass("Marketplace clientSecret: ")
mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}

print("Thanks for inputting your user name and password!")

Marketplace clientId: 77922ba7e417474a959a705fee4f272f
Marketplace clientSecret: ········
Thanks for inputting your user name and password!


### Example: Make a request for a list of known FDA Drug Label documents
FDA Drug Label Biomarker NLP API expects openFDA application number (FDAID) as input parameter. This example shows how to make a request to the API with a list of FDAIDs.


In [6]:
import requests

# Define input FDAIDs
input_fdaids = ['NDA204114', 'ANDA076183']

# Make a request
print("Posting request to extract biomarkers and related diseases from specified CT.gov documents...")

response = requests.get(api_marketplace_url, headers=mkp_headers, params={'FDAID': input_fdaids, 'rowLimit': 5})
# Check the response
if response.status_code == 200:
    print("Success!")
    results_json = response.json()
else:
    raise Exception(f"Error: {response}")

print(results_json)

Posting request to extract biomarkers and related diseases from specified CT.gov documents...
Success!
[{'doc_id': 'cdb2c4b4-a20a-3f8f-e053-2a95a90aaf72', 'results': [{'Biomarker': {'logical_column_id': 0, 'value': 'CYP3A4', 'indexed_spans_outer': [[69637, 69644]], 'indexed_spans_inner': [[69637, 69644]], 'text_spans_outer': [[71, 78]], 'text_spans_inner': [[71, 78]]}, 'Relation': {'logical_column_id': 1, 'value': 'inducer', 'indexed_spans_outer': [[69645, 69653]], 'indexed_spans_inner': [[69645, 69653]], 'text_spans_outer': [[79, 87]], 'text_spans_inner': [[79, 87]]}, '[SID] Diseases and Disorder': {'logical_column_id': 2, 'value': 'meddra', 'indexed_spans_outer': [[28552, 28558]], 'indexed_spans_inner': [[28552, 28558]], 'text_spans_outer': [[18, 24]], 'text_spans_inner': [[18, 24]]}, '[NID] Diseases and Disorder': {'logical_column_id': 2, 'value': '10028997'}, '[PT] Diseases and Disorder': {'logical_column_id': 2, 'value': 'Neoplasm malignant'}, 'Section': {'logical_column_id': 3, '

Now that we have got the JSON responses from the FDA Drug Label Biomarker API, we could convert the useful information associated with the keys into a pandas dataframe.

In [7]:
import pandas as pd

# initiate an empty dataframe
df = pd.DataFrame()
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Retrieve main results from the JSON response, please note this cell would fail if the request failed in the last step
for results in results_json:
    df_dict = {}
    for result_dict in results["results"]:
        for key, value_dict in result_dict.items():
            df_dict[key] = value_dict['value']
    df_dict["Doc Id"] = results["doc_id"]
    df = pd.concat([df, pd.DataFrame.from_records([{**df_dict}])], ignore_index=True)

# Check the dataframe
df

Unnamed: 0,doc_id,results
0,cdb2c4b4-a20a-3f8f-e053-2a95a90aaf72,"[{'Biomarker': {'logical_column_id': 0, 'value..."


### Optional: Make a request to openFDA drug product labeling API to get FDAIDs associated with a specific criteria

Search URL for the openFDA drug product labeling API that allows user to search for the FDAId's of 10 documents. Search can be customized by modifying the search parameters. In this example, the search will return 10 FDAIds of FDA Drug Labels about prescription drugs with indications and usage for cancer.


In [4]:
import json

# Search URL for the openFDA drug product labeling API
# Search can be customized by modifying the search parameters in the URL

url = "https://api.fda.gov/drug/label.json?search=openfda.product_type:prescription+AND+indications_and_usage:'cancer'&limit=10"

# load in JSON reponse from the openFDA API
response = requests.get(url)
body = json.loads(response.text)

# initialize list for FDAID
fdaids = []

# find the openfda application number in response (if exists) and add to FDAID list
for i in body['results']:
    try:
        #extract the document id (ANDA, ANADA, BA, BLA, BN, BP, DMF,K, MIF, NDA, NADA, P or VM)
        x = i["openfda"]
        fdaids.append(x['application_number'][0])
    except KeyError:
        print(f"No openFDA application number found for response results id: {i['id']}")
        continue

print(fdaids)

No openFDA application number found for response results id: ae413477-0e46-4763-8005-091f81dd6070
['NDA204114', 'ANDA076183', 'ANDA075798', 'ANDA040584', 'NDA050629', 'ANDA204345', 'ANDA212399', 'NDA021116', 'BLA761310']


That's it! Hope you find this tutorial useful! Bye!