# IQVIA NLP - Clinical Trials Biomarkers

## API Description
Understanding biomarkers for drug efficacy or disease progression is critical in clinical development, but the relevant data is often locked in unstructured text such as clinical trial records and drug labels. For data scientists, the ClinicalTrials.gov Biomarkers API addresses this problem by extracting gene/protein biomarkers from specific regions of ClinicalTrials.gov records (inclusion and exclusion criteria, primary and secondary outcomes).

Given an NCT ID, this simple API quickly identifies gene/disease biomarkers in the corresponding ClinicalTrials.gov record for known indications. Biomarker-disease relationships are reported (e.g. positive, negative), diseases are mapped to MedDRA, and mechanism of action data are also extracted.

## Accessing the API
To use this API, first request access to the Clinical Trials Biomarkers API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000ytPu7AAE/external-api-marketplaceclinicaltrialsbiomarkers .

Please refer to "API Documentation" to learn more about accessing and using the API.

## Notebook Description
This notebook is designed to extract biomarkers and their related diseases from ClinicalTrials.gov records. Users can specify records of interest by supplying ClinicalTrials.gov document identifiers in the 'NCTID' parameter when sending the request.

### Authorization
Instructions for obtaining your credentials and the API endpoint URL can be found in the sections "Get Started" and "How to use the API" at this link: https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000ytPu7AAE/external-api-marketplaceclinicaltrialsbiomarkers .

In [2]:
import getpass
import requests

# Endpoint URL for US-based customers (uncomment to use)
# api_marketplace_url = 'https://vt.us-rds.solutions.iqvia.com/clinicaltrial/api/v1/biomarkers'
# Endpoint URL for EU-based customers (used in this demo)
api_marketplace_url = 'https://vt.eu-apim.solutions.iqvia.com/eu/clinicaltrial/api/v1/biomarkers'

mkp_user = input("Marketplace clientId: ")
mkp_password = getpass.getpass("Marketplace clientSecret: ")
mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}

print("Thanks for inputting your user name and password!")

Marketplace clientId: ccb8d0c8612748dc955c794d3fc7b166
Marketplace clientSecret: ········
Checking your credentials, please wait...
Congratulations! Your credentials are accepted!
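
For unattended runs, the interactive prompts above could be replaced by environment variables. The following is only a minimal sketch: the variable names `IQVIA_CLIENT_ID` and `IQVIA_CLIENT_SECRET` and the helper `build_marketplace_headers` are hypothetical and not taken from the API documentation, while the `clientId` and `clientSecret` header names come from the cell above.

```python
import os


def build_marketplace_headers(env=None):
    """Build request headers from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    client_id = env.get("IQVIA_CLIENT_ID")
    client_secret = env.get("IQVIA_CLIENT_SECRET")

    # Report every missing variable at once instead of failing on the first one
    missing = [
        name
        for name, value in (
            ("IQVIA_CLIENT_ID", client_id),
            ("IQVIA_CLIENT_SECRET", client_secret),
        )
        if not value
    ]
    if missing:
        raise KeyError(f"Missing environment variable(s): {', '.join(missing)}")

    # Header names match the mkp_headers dictionary used in this notebook
    return {"clientId": client_id, "clientSecret": client_secret}
```

With both variables set, `mkp_headers = build_marketplace_headers()` would produce the same dictionary as the interactive cell.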


### Example: Make a request for a list of known Clinical Trials documents
The Clinical Trials Biomarker NLP API expects ClinicalTrials.gov document identifiers (NCTIDs) as its input parameter. This example shows how to make a request to the API with a list of NCTIDs.
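
Since ClinicalTrials.gov identifiers consist of the prefix NCT followed by eight digits, it can be worth checking the list before sending it. The sketch below is one possible way to do this; `split_valid_nctids` is a hypothetical helper and not part of the API.

```python
import re

# ClinicalTrials.gov identifiers: "NCT" followed by exactly eight digits
NCTID_PATTERN = re.compile(r"NCT[0-9]{8}")


def split_valid_nctids(candidates):
    """Return (valid, invalid) lists; valid IDs are stripped and upper-cased."""
    valid, invalid = [], []
    for candidate in candidates:
        cleaned = candidate.strip().upper()
        if NCTID_PATTERN.fullmatch(cleaned):
            valid.append(cleaned)
        else:
            invalid.append(candidate)
    return valid, invalid


valid, invalid = split_valid_nctids(['NCT04875351', ' nct03180086 ', 'NCT123'])
print(valid)    # ['NCT04875351', 'NCT03180086']
print(invalid)  # ['NCT123']
```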


In [3]:
import requests

# Define input NCTIDs
input_nctids = ['NCT04875351', 'NCT03180086']

# Make a request
print("Posting request to extract biomarkers and related diseases from specified CT.gov documents...")

response = requests.get(api_marketplace_url, headers=mkp_headers, params={'NCTID': input_nctids})
# Check the response and surface the status code and body if the request failed
if response.status_code == 200:
    print("Success!")
    results_json = response.json()
else:
    raise Exception(f"Error {response.status_code}: {response.text}")

print(results_json)

Posting request to extract biomarkers and related diseases from specified CT.gov documents...
Processing NCT04875351...
Success!
Processing NCT03180086...
Success!
[[{'Biomarker': 'ERBB2', 'Relation': 'negative', '[NID] MedDRA': '10006187', '[PT] MedDRA': 'Breast cancer', 'Section': 'Inclusion Criteria & Condition', 'NCTID': 'NCT04875351', 'Source': 'https://clinicaltrials.gov/show/NCT04875351', 'Hit': 'Breast Cancer... Inclusion Criteria: - Early stage (I, II or III) female breast cancer patients, who have completed 4-7 years of primary adjuvant endocrine therapy... - The primary tumor was HER2 negative or positive and node-negative or ...'}, {'Biomarker': 'ESR1', 'Relation': 'positive', '[NID] MedDRA': '10006187', '[PT] MedDRA': 'Breast cancer', 'Section': 'Inclusion Criteria & Condition', 'NCTID': 'NCT04875351', 'Source': 'https://clinicaltrials.gov/show/NCT04875351', 'Hit': 'Breast Cancer... Inclusion Criteria: - Early stage (I, II or III) female breast cancer patients, who have co

Now that we have the JSON response from the Clinical Trials Biomarker API, we can convert the information stored under its keys into a pandas dataframe.

In [6]:
import pandas as pd

# Show all rows and columns when displaying the dataframe
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Flatten the JSON response into one record per extracted hit.
# This assumes the shape printed above: a list (one entry per NCTID) of lists of flat dictionaries.
# Please note this cell will fail if the request failed in the last step.
records = [record for doc_results in results_json for record in doc_results]
df = pd.DataFrame.from_records(records)

# Check the dataframe
df

| | Biomarker | Relation | [NID] MedDRA | [PT] MedDRA | Section | NCTID | Source | Hit |
|---|---|---|---|---|---|---|---|---|
| 0 | ERBB2 | negative | 10006187 | Breast cancer | Inclusion Criteria & Condition | NCT04875351 | https://clinicaltrials.gov/show/NCT04875351 | Breast Cancer... Inclusion Criteria: - Early s... |
| 1 | ESR1 | positive | 10006187 | Breast cancer | Inclusion Criteria & Condition | NCT04875351 | https://clinicaltrials.gov/show/NCT04875351 | Breast Cancer... Inclusion Criteria: - Early s... |
| 2 | | | 10006187 | Breast cancer | Condition | NCT03180086 | https://clinicaltrials.gov/show/NCT03180086 | Breast Cancer |
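
The same records can also be summarised per trial without pandas. This is a minimal sketch that assumes the nested list-of-lists shape printed earlier; the sample data is abbreviated from that output, and `biomarkers_by_trial` is a hypothetical helper name.

```python
from collections import defaultdict


def biomarkers_by_trial(results_json):
    """Map each NCTID to a sorted list of (biomarker, relation) pairs."""
    grouped = defaultdict(set)
    for doc_results in results_json:
        for record in doc_results:
            biomarker = record.get("Biomarker")
            if biomarker:  # skip condition-only hits that carry no biomarker
                grouped[record["NCTID"]].add((biomarker, record.get("Relation", "")))
    return {nctid: sorted(pairs) for nctid, pairs in grouped.items()}


# Abbreviated sample in the assumed response shape (most keys omitted)
sample = [
    [
        {"Biomarker": "ERBB2", "Relation": "negative", "NCTID": "NCT04875351"},
        {"Biomarker": "ESR1", "Relation": "positive", "NCTID": "NCT04875351"},
    ],
    [{"[PT] MedDRA": "Breast cancer", "NCTID": "NCT03180086"}],
]
print(biomarkers_by_trial(sample))
# {'NCT04875351': [('ERBB2', 'negative'), ('ESR1', 'positive')]}
```

Trials whose hits contain only a condition, such as NCT03180086 in the sample, are left out of the summary.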


### Optional: Make a request to ClinicalTrials.gov API to get NCTIDs associated with a specific disease

The search URL below queries the ClinicalTrials.gov API for the NCT IDs of 10 documents. The search can be customized by modifying the expr value of the study_fields query. In this example, the search returns the NCT IDs of 10 clinical trials about breast cancer.

In [7]:
import json

# Search URL for the ClinicalTrials.gov API that returns the NCT IDs of 10 documents
# The search can be customized by modifying the expr value of the study_fields query
# This search returns the NCT IDs of 10 clinical trials about breast cancer
url = "https://clinicaltrials.gov/api/query/study_fields?expr=Breast+Cancer&fields=NCTId&min_rnk=1&max_rnk=10&fmt=json"
response = requests.get(url)

# Read in the JSON from the ClinicalTrials.gov API response
j = json.loads(response.text)

# Initialize the list of NCT IDs we will submit to the endpoint
nctids = []

# Pull out the NCT ID of each document returned by ClinicalTrials.gov and add it to the list
for item in j['StudyFieldsResponse']['StudyFields']:
    nctids.append(item['NCTId'][0])

print(nctids)

['NCT04875351', 'NCT03180086', 'NCT03662633', 'NCT02493569', 'NCT03598660', 'NCT04167605', 'NCT04516330', 'NCT04495244', 'NCT03343691', 'NCT05082740']
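
The study_fields endpoint used above belongs to the classic ClinicalTrials.gov API and may no longer be available, since ClinicalTrials.gov has moved to version 2 of its API. The sketch below shows how the same search might look there. It assumes the v2 `studies` endpoint, its `query.cond`, `fields` and `pageSize` parameters, and a response in which each study carries its identifier under `protocolSection.identificationModule.nctId`; please verify these against the current ClinicalTrials.gov API documentation. Both helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the ClinicalTrials.gov API v2 studies endpoint
V2_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_v2_search_url(condition, page_size=10):
    """Build a search URL that asks only for the NCT ID of each matching study."""
    params = {"query.cond": condition, "fields": "NCTId", "pageSize": page_size}
    return f"{V2_STUDIES_URL}?{urlencode(params)}"


def extract_nctids(payload):
    """Collect NCT IDs from a parsed v2 response (assumed structure)."""
    nctids = []
    for study in payload.get("studies", []):
        identification = study.get("protocolSection", {}).get("identificationModule", {})
        nctid = identification.get("nctId")
        if nctid:
            nctids.append(nctid)
    return nctids
```

Under these assumptions, `nctids = extract_nctids(requests.get(build_v2_search_url("Breast Cancer")).json())` would stand in for the loop above.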


That's it! Hope you find this tutorial useful! Bye!