# IQVIA NLP - Content Store Search

## API Description
In today’s complex and competitive environment, finding the right data to generate the right insight matters. External sources contain vast amounts of rich data, but going source by source to search for and ingest the right data is time-consuming and takes vital time away from developing impactful insights. The IQVIA NLP Content Store APIs help keep your team’s focus out of the data weeds and where it belongs: making strategic decisions to achieve your goals.

## Accessing the API
To use this API, first request access to the Content Store API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12lAAA/external-api-marketplaceiqvianlpcontentstoresearch

Please refer to "API Documentation" to learn more about accessing and using the API.

## Notebook Description
This notebook shows an example of using the IQVIA NLP - Content Store Search API.

### Authorization
Instructions for getting your credentials and the API endpoint URL can be found in the sections "Get Started" and "How to use the API" at this link: https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12lAAA/external-api-marketplaceiqvianlpcontentstoresearch

In [None]:
import getpass
import requests

# URL for US-based customers (uncomment to use)
# api_marketplace_url = 'https://vt.us-rds.solutions.iqvia.com/easl/api/v1/easl'
# URL for EU-based customers (used in this demo)
api_marketplace_url = 'https://vt.eu-apim.solutions.iqvia.com/eu/easl/api/v1/easl'

mkp_user = input("Marketplace clientId: ")
mkp_password = getpass.getpass("Marketplace clientSecret: ")
mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}

# No request is made here; the API checks the credentials on the first request below
print("Thanks for entering your clientId and clientSecret!")
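The cell above does not contact the API, so a blank or mistyped credential only surfaces on the first real request. A minimal local check could look like the sketch below. The helper name `build_marketplace_headers` is hypothetical, and trimming surrounding whitespace assumes that credentials never legitimately start or end with spaces.

```python
def build_marketplace_headers(client_id, client_secret):
    """Return the header dict used in this notebook, rejecting blank values."""
    # Assumption: leading/trailing whitespace is a copy-paste accident, not part of the credential
    client_id = client_id.strip()
    client_secret = client_secret.strip()
    if not client_id or not client_secret:
        raise ValueError("clientId and clientSecret must both be non-empty")
    return {"clientId": client_id, "clientSecret": client_secret}
```

It could replace the direct dict construction, for example `mkp_headers = build_marketplace_headers(mkp_user, mkp_password)`.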

### Example: Make a request with I2E query on selected I2E index
The Content Store Search API expects an I2E query and an index name as input parameters. This example shows how to make a request to the API with a pre-configured I2E query file as input.

In [4]:
import requests
import os

# Define input query and index to run
input_query = os.path.join(os.getcwd(), "demo_queries/Content_Store_Search-Post_EASL_Run/biomkr_reln-covid19_disease.i2qy")
index_name = "medline-full"

# Make a request
print("Posting query request...")
with open(input_query, "r") as query:
    response = requests.post(
        api_marketplace_url,
        headers=mkp_headers,
        # The query file is sent as a multipart upload with the I2E query MIME type
        files=dict(query=(input_query, query, "application/vnd.linguamatics.i2e.i2qy")),
        data=dict(index_name=index_name),
    )

# Check the response
if response.status_code == 200:
    print("Success!")
    body_json = response.json()
    print(f"Raw JSON response from the API is: {body_json}")
else:
    raise Exception(f"Error: {response}")


Posting query request...
Success!
Raw JSON response from the API is: {'id': '0cffce8a-f9b7-4024-8429-2d48dd091d2f', 'status': 'PENDING', 'submission_time': '2022-09-28T11:03:06.066308Z', 'completion_time': None}


The Post EASL Run API returns a run ID, and the run starts in the PENDING state. The results of the run can then be retrieved from the GET endpoint for that run ID.

In [6]:
run_id = body_json["id"]
run_url = api_marketplace_url+"/"+run_id

response = requests.get(run_url, headers=mkp_headers)

# Check the response
if response.status_code == 200:
    print("Success!")
    body = response.json()
    print(f"Raw JSON response from the API is: {body}")
else:
    raise Exception(f"Error: {response}")


Success!
Raw JSON response from the API is: {'id': '0cffce8a-f9b7-4024-8429-2d48dd091d2f', 'results': [{'Doc': '36149303', 'IndexName': 'medline-full'}, {'Doc': '36146835', 'IndexName': 'medline-full'}, {'Doc': '36143448', 'IndexName': 'medline-full'}, {'Doc': '36143338', 'IndexName': 'medline-full'}, {'Doc': '36143280', 'IndexName': 'medline-full'}, {'Doc': '36143211', 'IndexName': 'medline-full'}, {'Doc': '36143072', 'IndexName': 'medline-full'}, {'Doc': '36142913', 'IndexName': 'medline-full'}, {'Doc': '36142336', 'IndexName': 'medline-full'}, {'Doc': '36140771', 'IndexName': 'medline-full'}, {'Doc': '36140551', 'IndexName': 'medline-full'}, {'Doc': '36140543', 'IndexName': 'medline-full'}, {'Doc': '36140490', 'IndexName': 'medline-full'}, {'Doc': '36140471', 'IndexName': 'medline-full'}, {'Doc': '36140453', 'IndexName': 'medline-full'}, {'Doc': '36140055', 'IndexName': 'medline-full'}, {'Doc': '36139946', 'IndexName': 'medline-full'}, {'Doc': '36139597', 'IndexName': 'medline-full'
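Because the POST response reports the run as `PENDING`, a GET issued immediately afterwards may not yet contain results. The sketch below polls until the run looks finished. It is an illustration under assumptions: the helper name `wait_for_run` is hypothetical, and the notebook only shows the `PENDING` status and a finished body with a `results` key, so any other response shape is treated as finished.

```python
import time


def wait_for_run(fetch_run, poll_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_run() until the run is no longer pending, then return its JSON body.

    fetch_run is any zero-argument callable that returns the decoded JSON body
    of the GET request for the run.
    """
    for attempt in range(max_attempts):
        body = fetch_run()
        # Assumption: a finished run carries "results" or a status other than "PENDING"
        if "results" in body or body.get("status") != "PENDING":
            return body
        # Wait before the next attempt, but not after the last one
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"Run still pending after {max_attempts} attempts")
```

With the variables defined in this notebook, it could be called as `body = wait_for_run(lambda: requests.get(run_url, headers=mkp_headers).json())`. Passing the request as a callable keeps the polling logic independent of the HTTP client.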

Get the list of document IDs returned by the API.

In [7]:
doc_id_list = [result["Doc"] for result in body["results"]]
print(f"Document IDs are: {doc_id_list}")

Document IDs are: ['36149303', '36146835', '36143448', '36143338', '36143280', '36143211', '36143072', '36142913', '36142336', '36140771', '36140551', '36140543', '36140490', '36140471', '36140453', '36140055', '36139946', '36139597', '36139072', '36138306', '36138150', '36136251', '36136238', '36135003', '36134517', '36131504', '36131342', '36129404', '36129169', '36129046', '36126729', '36125534', '36125526', '36125152', '36125149', '36124586', '36124564', '36124254', '36123837', '36123740', '36123519', '36123384', '36119415', '36117150', '36116608', '36116582', '36114671', '36114180', '36112136', '36111943', '36111942', '36111789', '36111618', '36111511', '36111172', '36109825', '36106817', '36106293', '36104754', '36104292', '36104128', '36104051', '36102943', '36102684', '36102258', '36099790', '36099709', '36097650', '36097568', '36097300', '36096223', '36093351', '36090302', '36089786', '36086941', '36086888', '36084236', '36083979', '36080827', '36079087', '36078166', '36077708

Fetch features (here, ChEBI concepts) for the first two documents in the list.

In [11]:
features = "chebi"
body_json = []
for doc_id in doc_id_list[:2]:
    fetch_url = f"https://vt.eu-apim.solutions.iqvia.com/eu/fetch-features/api/v1/fetch-features?index={index_name}&features={features}&docIds={doc_id}"
    print(f"Making requests to get features for {doc_id}...")
    response = requests.get(fetch_url, headers=mkp_headers)
    # Check the response
    if response.status_code == 200:
        print("Success!")
        json_response = response.json()
        body_json.append(json_response)
    else:
        raise Exception(f"Error: {response}")
print(f"Raw JSON response from the API for documents of interest: {body_json}")

Making requests to get features...
Success!
Raw JSON response from the API is: [{'doc_id': '36149303', 'results': [{'Concept': {'logical_column_id': 0, 'value': 'drug', 'indexed_spans_outer': [[1494, 1498]], 'indexed_spans_inner': [[1494, 1498]], 'text_spans_outer': [[54, 58]], 'text_spans_inner': [[54, 58]]}, '[PT] Concept': {'logical_column_id': 0, 'value': 'drug'}, '[SID] Concept': {'logical_column_id': 0, 'value': 'chebi'}, '[NID] Concept': {'logical_column_id': 0, 'value': '23888'}, 'Ontology': {'logical_column_id': 1, 'value': 'Chemicals (ChEBI)', 'indexed_spans_outer': [[1494, 1498]], 'indexed_spans_inner': [[1494, 1498]], 'text_spans_outer': [[54, 58]], 'text_spans_inner': [[54, 58]]}, 'text': {'value': '... interindividual and intraindividual variations in drug efficacy and safety.'}}, {'Concept': {'logical_column_id': 0, 'value': 'drug', 'indexed_spans_outer': [[1575, 1579]], 'indexed_spans_inner': [[1575, 1579]], 'text_spans_outer': [[38, 42]], 'text_spans_inner': [[38, 42]]

Success!
Raw JSON response from the API is: []
Making requests to get features...
Success!
Raw JSON response from the API is: [{'doc_id': '36143338', 'results': [{'Concept': {'logical_column_id': 0, 'value': 'protein-coding', 'indexed_spans_outer': [[1412, 1426]], 'indexed_spans_inner': [[1412, 1426]], 'text_spans_outer': [[36, 50]], 'text_spans_inner': [[36, 50]]}, '[PT] Concept': {'logical_column_id': 0, 'value': 'protein'}, '[SID] Concept': {'logical_column_id': 0, 'value': 'chebi'}, '[NID] Concept': {'logical_column_id': 0, 'value': '36080'}, 'Ontology': {'logical_column_id': 1, 'value': 'Chemicals (ChEBI)', 'indexed_spans_outer': [[1412, 1426]], 'indexed_spans_inner': [[1412, 1426]], 'text_spans_outer': [[36, 50]], 'text_spans_inner': [[36, 50]]}, 'text': {'value': 'Despite the functional relevance of protein-coding regions, rare variants located ...'}}, {'Concept': {'logical_column_id': 0, 'value': 'protein-coding', 'indexed_spans_outer': [[1412, 1426]], 'indexed_spans_inner': [[
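The fetch cell interpolates the index name, feature name and document ID straight into the URL, which works for the values used here but would break if a value contained a reserved character such as `&` or a space. A small sketch of building the same URL with percent-encoding follows. The helper name `build_fetch_url` is hypothetical, the parameter names `index`, `features` and `docIds` are taken from the cell above, and it is an assumption that the endpoint accepts standard form-encoded query strings.

```python
from urllib.parse import urlencode

# EU endpoint used in the fetch cell of this notebook
FETCH_FEATURES_URL = "https://vt.eu-apim.solutions.iqvia.com/eu/fetch-features/api/v1/fetch-features"


def build_fetch_url(index_name, features, doc_id, base_url=FETCH_FEATURES_URL):
    """Build the fetch-features URL with percent-encoded query parameters."""
    query = urlencode({"index": index_name, "features": features, "docIds": doc_id})
    return f"{base_url}?{query}"
```

For the values in this notebook, the result is identical to the hand-built f-string URL.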

Convert the fetched features to a pandas dataframe.

In [13]:
import pandas as pd

# Show every row and column when the dataframe is displayed
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Build one row per document from the JSON responses collected in the previous cell.
# Note that this cell fails if a request failed in the previous cell.
records = []
for result_list in body_json:
    for result_dict in result_list:
        df_dict = {"doc_id": result_dict["doc_id"]}
        # Every hit writes to the same keys, so the row keeps the last hit of the document
        for output_dict in result_dict["results"]:
            for key, value_dict in output_dict.items():
                df_dict[key] = value_dict["value"]
        records.append(df_dict)

# Create the dataframe once, rather than concatenating inside the loop
df = pd.DataFrame.from_records(records)

# df.to_csv("content_store_search_output_demo.csv", index=False)

# Check the dataframe
df

| | doc_id | Concept | [PT] Concept | [SID] Concept | [NID] Concept | Ontology | text |
|---|---|---|---|---|---|---|---|
| 0 | 36149303 | pharmaceutical | pharmaceutical | chebi | 52217 | Chemicals (ChEBI) | ... scholars, health professionals, pharmaceut... |
| 1 | 36146835 | RNA | ribonucleic acid | chebi | 33697 | Chemicals (ChEBI) | ... cases, sequencing of wastewater-derived RN... |
| 2 | 36143338 | protein-coding | protein polypeptide chain | chebi | 16541 | Chemicals (ChEBI) | Despite the functional relevance of protein-co... |
| 3 | 36143280 | thrombin | Thrombin | chebi | 9574 | Chemicals (ChEBI) | ... GCA) including thromboelastography, thromb... |
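The conversion cell produces one row per document, because each hit in a document's `results` list writes to the same keys and only the last hit survives. If every hit is of interest, one option is to emit one row per hit instead. The sketch below is a variation, not the notebook's method: the helper name `flatten_hits` is hypothetical, and the sample is a trimmed copy of the fetch-features response shown earlier, with spans and truncated fields left out.

```python
def flatten_hits(body_json):
    """Return one dict per hit, each tagged with the document ID it came from."""
    rows = []
    for result_list in body_json:
        for result_dict in result_list:
            for hit in result_dict["results"]:
                row = {"doc_id": result_dict["doc_id"]}
                # Keep only the "value" field of each output column
                row.update({key: value_dict["value"] for key, value_dict in hit.items()})
                rows.append(row)
    return rows


# Trimmed sample shaped like the response above (one document with two hits)
sample = [[{"doc_id": "36149303", "results": [
    {"Concept": {"value": "drug"}, "[SID] Concept": {"value": "chebi"}, "[NID] Concept": {"value": "23888"}},
    {"Concept": {"value": "drug"}},
]}]]
rows = flatten_hits(sample)
print(len(rows))
```

The resulting list can be passed to `pd.DataFrame.from_records(rows)` to get a dataframe with one row per hit.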


That's it! Hope you find this tutorial useful! Bye!