# IQVIA NLP - Content Store Search and Feature Extraction

## API Description
In today’s complex and competitive environment, finding the right data to generate the right insight matters. External sources contain vast amounts of rich data, but going source by source to search for and ingest the right data is time-consuming and takes vital time away from developing impactful insights. The IQVIA NLP Content Store APIs are your go-to resource for keeping your team’s focus out of the data weeds and where it belongs: making strategic decisions to achieve your goals.

## Accessing the API
To use these APIs, first request access to the Content Store Search API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12lAAA/external-api-marketplaceiqvianlpcontentstoresearch

Then request access to the Content Store Feature Extraction API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12qAAA/external-api-marketplaceiqvianlpcontentstorefeatureextraction

Please refer to "API Documentation" to learn more about accessing and using the API.

## Notebook Description
This notebook walks through an example of using the IQVIA NLP - Content Store Search API together with the IQVIA NLP - Content Store Feature Extraction API.

### Authorization
The instructions for getting your credentials and the API endpoint URL can be found under the sections "Get Started" and "How to use the API" at this link: https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12lAAA/external-api-marketplaceiqvianlpcontentstoresearch

In [1]:
import getpass

# Endpoint URL for US-based customers (uncomment to use)
# api_marketplace_url = 'https://vt.us-rds.solutions.iqvia.com/easl/api/v1/easl'
# Endpoint URL for EU-based customers (used in this demo)
api_marketplace_url = 'https://vt.eu-apim.solutions.iqvia.com/eu/easl/api/v1/easl'

mkp_user = input("Content Store Search API clientId: ")
mkp_password = getpass.getpass("Content Store Search API clientSecret: ")
mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}

# Note: the credentials are not validated here; they are first checked when a request is sent
print("Thanks for inputting your user name and password!")

Marketplace clientId: bb7c0c66b89b4d75956b113c686be9bf
Marketplace clientSecret: ········
Thanks for inputting your user name and password!
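
Because the cell above only collects the credentials, a blank or whitespace-only entry goes unnoticed until the first request fails. A minimal sketch of a local check is shown below. `build_headers` is a hypothetical helper, not part of the API, and it only confirms that both values are non-empty, not that they are valid.

```python
def build_headers(client_id: str, client_secret: str) -> dict:
    """Return the header dict used in this notebook, rejecting blank values.

    Hypothetical helper: it checks only that both values are non-empty.
    Whether the credentials are valid is known only after a real request.
    """
    client_id = client_id.strip()
    client_secret = client_secret.strip()
    if not client_id or not client_secret:
        raise ValueError("clientId and clientSecret must both be non-empty")
    return {"clientId": client_id, "clientSecret": client_secret}


# Example with placeholder values (not real credentials)
example_headers = build_headers(" my-client-id ", "my-client-secret")
print(sorted(example_headers))
```

In the notebook, the result could replace the hand-built dictionary, for example `mkp_headers = build_headers(mkp_user, mkp_password)`.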


### Example: Make a request with I2E query on selected I2E index
The Content Store Search API expects an I2E query and an index name as input parameters. This example shows how to make a request to the API with a pre-configured I2E query file (`.i2qy`) as input.

In [5]:
import requests
import os

# Define the input query file and the index to run it against
input_query = os.path.join(os.path.dirname(os.getcwd()), "demo_queries/Content_Store_Search-Post_EASL_Run/biomkr_reln-covid19_disease.i2qy")
index_name = "medline-full"

# Post the query file as a multipart upload, with the index name as form data
print("Posting query request...")
with open(input_query, "r") as query:
    response = requests.post(
        api_marketplace_url,
        headers=mkp_headers,
        files={"query": (input_query, query, "application/vnd.linguamatics.i2e.i2qy")},
        data={"index_name": index_name},
    )

Posting query request...


In [6]:
# Check the response
if response.status_code == 200:
    print("Success!")
    body_json = response.json()
    print(f"Run Id from the API is: {body_json['id']}\nStatus of the run is: {body_json['status']}")
else:
    raise Exception(f"Error: HTTP {response.status_code}: {response.text}")

Success!
Raw JSON response from the API is: {'id': 'dd929002-a251-4ad8-8521-b5a54c59779e', 'status': 'PENDING', 'submission_time': '2022-11-30T10:43:17.481031Z', 'completion_time': None}


Now that the Post EASL Run API has returned a run ID, we can call the GET endpoint with that ID to retrieve the run's status and results.

In [7]:
run_id = body_json["id"]
run_url = f"{api_marketplace_url}/{run_id}"

response = requests.get(run_url, headers=mkp_headers)

# Check the response
if response.status_code == 200:
    print("Success!")
    body = response.json()
    print(f"Run Id from the API is: {body['id']}\nStatus of the run is: {body['status']}")
else:
    raise Exception(f"Error: HTTP {response.status_code}: {response.text}")


Success!
Raw JSON response from the API is: {'id': 'dd929002-a251-4ad8-8521-b5a54c59779e', 'results': [{'Doc': '36422480', 'IndexName': 'medline-full'}, {'Doc': '36417336', 'IndexName': 'medline-full'}, {'Doc': '36400969', 'IndexName': 'medline-full'}, {'Doc': '36374780', 'IndexName': 'medline-full'}, {'Doc': '36366324', 'IndexName': 'medline-full'}, {'Doc': '36366282', 'IndexName': 'medline-full'}, {'Doc': '36363686', 'IndexName': 'medline-full'}, {'Doc': '36363472', 'IndexName': 'medline-full'}, {'Doc': '36361818', 'IndexName': 'medline-full'}, {'Doc': '36359444', 'IndexName': 'medline-full'}, {'Doc': '36353011', 'IndexName': 'medline-full'}, {'Doc': '36352911', 'IndexName': 'medline-full'}, {'Doc': '36347056', 'IndexName': 'medline-full'}, {'Doc': '36346096', 'IndexName': 'medline-full'}, {'Doc': '36338506', 'IndexName': 'medline-full'}, {'Doc': '36335709', 'IndexName': 'medline-full'}, {'Doc': '36329334', 'IndexName': 'medline-full'}, {'Doc': '36320966', 'IndexName': 'medline-full'
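
The first response reported the status `PENDING`, so the results may not be available on the first GET call. A hedged sketch of a polling loop is shown below. `wait_for_run` and its parameters are hypothetical, and the only status value seen in this notebook is `PENDING`, so the loop simply waits until the status is anything else and leaves the interpretation of the final status to the caller. The fetch function is passed in so the loop can be tried without network access.

```python
import time
from typing import Callable


def wait_for_run(fetch_status: Callable[[], dict],
                 interval_seconds: float = 5.0,
                 max_attempts: int = 60) -> dict:
    """Call fetch_status until the returned status is no longer 'PENDING'.

    Hypothetical helper. A body without a 'status' key is treated as
    finished and returned as-is.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        if body.get("status") != "PENDING":
            return body
        # Sleep between attempts, but not after the last one
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"Run still PENDING after {max_attempts} attempts")


# Offline demonstration with canned responses instead of real API calls
canned = iter([
    {"id": "example-run", "status": "PENDING"},
    {"id": "example-run", "status": "PENDING"},
    {"id": "example-run", "status": "DONE", "results": []},  # "DONE" is a made-up status
])
final_body = wait_for_run(lambda: next(canned), interval_seconds=0)
print(final_body["status"])
```

In the notebook, the fetch function could be `lambda: requests.get(run_url, headers=mkp_headers).json()`, with an interval and attempt limit chosen to suit the expected run time.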

Get the list of document IDs returned by the API.

In [8]:
doc_id_list = [result["Doc"] for result in body["results"]]
print(f"Number of documents found: {len(doc_id_list)}")
print(f"Sampled Document IDs from results: {doc_id_list[:5]}")

Document IDs are: ['36422480', '36417336', '36400969', '36374780', '36366324', '36366282', '36363686', '36363472', '36361818', '36359444', '36353011', '36352911', '36347056', '36346096', '36338506', '36335709', '36329334', '36320966', '36320192', '36318846', '36318662', '36316288', '36315170', '36313911', '36313910', '36310154', '36308249', '36305225', '36299432', '36298704', '36298678', '36298522', '36298235', '36297248', '36297155', '36295791', '36295594', '36295478', '36295093', '36295088', '36294842', '36294770', '36294430', '36294309', '36293332', '36293181', '36292243', '36292068', '36292038', '36291035', '36290964', '36289811', '36289796', '36289725', '36289356', '36289276', '36285728', '36285702', '36285536', '36284383', '36283064', '36282139', '36281029', '36278053', '36273247', '36271948', '36271745', '36270074', '36262026', '36262025', '36257329', '36254583', '36253560', '36253553', '36252893', '36251682', '36251583', '36250841', '36246016', '36245940', '36245411', '36245027

Fetch features from the first two documents in the list.

In [17]:
features = ["chebi", "meddra"]
results_json = []
fetch_url = "https://vt.eu-apim.solutions.iqvia.com/eu/fetch-features/api/v1/fetch-features"

id_validation = input("Is your Content Store Feature Extraction clientId the same as Content Store Search (y/n)? ")

if id_validation.lower() == "y":
    print("Thank you!")
elif id_validation.lower() == "n":
    mkp_user = input("Content Store Feature Extraction API clientId: ")
    mkp_password = getpass.getpass("Content Store Feature Extraction API clientSecret: ")
    mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}
    print("Thanks for inputting your user name and password!")
else:
    print("Please type in 'y' or 'n'.")

print("Making requests to get features for specified documents...")
for feature in features:
    response = requests.get(fetch_url, headers=mkp_headers, params={"index": index_name, "features": feature, "docIds": doc_id_list[:2]})
    # Check the response
    if response.status_code == 200:
        print(f"Successfully fetched for feature: {feature}!")
        json_response = response.json()
        results_json.append(json_response)
    else:
        raise Exception(f"Error: HTTP {response.status_code}: {response.text}")
print(f"Example of the JSON response from the API for one document: {results_json[0][0]}")

Making requests to get features for specified documents...
Success!
Success!
Raw JSON response from the API for documents of interest: [[{'doc_id': '36422480', 'results': [{'Concept': {'logical_column_id': 0, 'value': 'biomarker', 'indexed_spans_outer': [[8625, 8634]], 'indexed_spans_inner': [[8625, 8634]], 'text_spans_outer': [[21, 30]], 'text_spans_inner': [[21, 30]]}, '[PT] Concept': {'logical_column_id': 0, 'value': 'biomarker'}, '[SID] Concept': {'logical_column_id': 0, 'value': 'chebi'}, '[NID] Concept': {'logical_column_id': 0, 'value': '59163'}, 'Ontology': {'logical_column_id': 1, 'value': 'Chemicals (ChEBI)', 'indexed_spans_outer': [[8625, 8634]], 'indexed_spans_inner': [[8625, 8634]], 'text_spans_outer': [[21, 30]], 'text_spans_inner': [[21, 30]]}, 'text': {'value': 'Smell dysfunction: A biomarker for covid-19.'}}, {'Concept': {'logical_column_id': 0, 'value': 'CD', 'indexed_spans_outer': [[15153, 15155]], 'indexed_spans_inner': [[15153, 15155]], 'text_spans_outer': [[28, 30
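
In the request above, `docIds` is passed as a Python list. The `requests` library encodes a list-valued parameter as a repeated key in the query string. The sketch below reproduces that encoding with the standard library so the resulting URL can be inspected without calling the API. Whether the Feature Extraction API also accepts other forms, such as a comma-separated value, is not covered by this notebook.

```python
from urllib.parse import urlencode

fetch_url = "https://vt.eu-apim.solutions.iqvia.com/eu/fetch-features/api/v1/fetch-features"

# Same parameter shape as the notebook request, using two document IDs from the results above
params = {
    "index": "medline-full",
    "features": "chebi",
    "docIds": ["36422480", "36417336"],
}

# doseq=True expands the list into one docIds=... pair per element
query_string = urlencode(params, doseq=True)
print(f"{fetch_url}?{query_string}")
```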

Convert the results to a pandas DataFrame.

In [18]:
import pandas as pd

# Show all rows and columns when displaying the dataframe
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Build one row per document and feature from the JSON responses.
# Note: this cell fails if the requests in the previous step did not succeed.
# When a document has several results, later values overwrite earlier ones, so only the last is kept.
rows = []
for result_list in results_json:
    for result_dict in result_list:
        row = {"Doc ID": result_dict["doc_id"]}
        for output_dict in result_dict["results"]:
            for key, value_dict in output_dict.items():
                row[key] = value_dict['value']
        rows.append(row)

df = pd.DataFrame.from_records(rows)

# df.to_csv("content_store_search_output_demo.csv", index=False)

# Check the dataframe
df

| | doc_id | Concept | [PT] Concept | [SID] Concept | [NID] Concept | Ontology | text |
|---|---|---|---|---|---|---|---|
| 0 | 36422480 | protease inhibitor | protease inhibitor | chebi | 37670 | Chemicals (ChEBI) | ... blocked by a clinically proven protease in... |
| 1 | 36417336 | colchicine | colchicine | chebi | 23359 | Chemicals (ChEBI) | Effect of colchicine vs standard care on cardi... |
| 2 | 36422480 | taste disorders | Taste disorders | meddra | 10043131 | MedDRA | Acute-onset smell and taste disorders in the c... |
| 3 | 36417336 | weight | Weight | meddra | 10047890 | MedDRA | ... an unhealthy phenotype: normal weight but ... |
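
The conversion loop above writes every result of a document into the same dictionary, so a document with several annotations ends up as a single row holding the last values. If one row per annotation is preferred, a sketch like the following could be used. `flatten_results` is a hypothetical helper, and the sample record is abridged from the raw response printed earlier, keeping only fields that are visible in that output.

```python
def flatten_results(result_list: list) -> list:
    """Return one flat dict per annotation instead of one per document."""
    rows = []
    for result_dict in result_list:
        for output_dict in result_dict["results"]:
            # Start a fresh row for every annotation so nothing is overwritten
            row = {"Doc ID": result_dict["doc_id"]}
            for key, value_dict in output_dict.items():
                row[key] = value_dict["value"]
            rows.append(row)
    return rows


# Abridged sample: two annotations for the same document
sample = [{
    "doc_id": "36422480",
    "results": [
        {
            "Concept": {"logical_column_id": 0, "value": "biomarker"},
            "[SID] Concept": {"logical_column_id": 0, "value": "chebi"},
            "[NID] Concept": {"logical_column_id": 0, "value": "59163"},
            "text": {"value": "Smell dysfunction: A biomarker for covid-19."},
        },
        {"Concept": {"logical_column_id": 0, "value": "CD"}},
    ],
}]

rows = flatten_results(sample)
print(len(rows))
```

The returned list can be passed to `pd.DataFrame.from_records(rows)`; fields missing from an annotation then appear as `NaN` in that row.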


That's it! Hope you find this tutorial useful! Bye!