# IQVIA NLP - Content Store Search and Feature Extraction

## API Description
In today’s complex and competitive environment, finding the right data to generate the right insight matters. External sources contain vast amounts of rich data, but going source by source to search for and ingest the right data is time-consuming and takes vital time away from developing impactful insights. The IQVIA NLP Content Store APIs are your go-to resource for keeping your team’s focus out of the data weeds and where it belongs: making strategic decisions to achieve your goals.

## Accessing the API
To use these APIs, first request access to the Content Store Search API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12lAAA/external-api-marketplaceiqvianlpcontentstoresearch

Then request access to the Content Store Feature Extraction API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12qAAA/external-api-marketplaceiqvianlpcontentstorefeatureextraction

Please refer to "API Documentation" to learn more about accessing and using the API.

## Notebook Description
This notebook walks through an example of using the IQVIA NLP - Content Store Search API together with the IQVIA NLP - Content Store Feature Extraction API: it runs an I2E query to find matching documents, then extracts features from those documents.

### Authorization
The instructions for getting your credentials and the API endpoint URL can be found under the sections "Get Started" and "How to use the API" at this link: https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000yu12lAAA/external-api-marketplaceiqvianlpcontentstoresearch

In [2]:
import getpass

# Ask the user for the API URL and Marketplace credentials
api_endpoint_url = input('Please enter the Content Store Search API URL: ').rstrip('/')

mkp_user = input("Marketplace clientId: ")
mkp_password = getpass.getpass("Marketplace clientSecret: ")
mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}

print("Thanks for inputting URL, your user name and password!")

Please enter the Content Store Search API URL: https://vt.eu-apim-devtest.solutions.iqvia.com/eu/easl/api/v2/easl
Marketplace clientId: a99b627bb38a4be38d2804869e3b4c40
Marketplace clientSecret: ········
Thanks for inputting URL, your user name and password!
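
If you run this notebook non-interactively, prompting for credentials is inconvenient. A minimal sketch of one alternative is shown below: read the credentials from environment variables and fall back to the prompts used above. The variable names `IQVIA_MKP_CLIENT_ID` and `IQVIA_MKP_CLIENT_SECRET` and the helper `load_credentials` are hypothetical and chosen only for this example.

```python
import getpass
import os


def load_credentials(env=os.environ, prompt_id=input, prompt_secret=getpass.getpass):
    """Build the Marketplace headers, preferring environment variables.

    IQVIA_MKP_CLIENT_ID and IQVIA_MKP_CLIENT_SECRET are hypothetical names
    used for this sketch; they are not defined by the API Marketplace.
    """
    client_id = env.get("IQVIA_MKP_CLIENT_ID") or prompt_id("Marketplace clientId: ")
    client_secret = env.get("IQVIA_MKP_CLIENT_SECRET") or prompt_secret("Marketplace clientSecret: ")
    # Same header layout as mkp_headers in the cell above
    return {"clientId": client_id, "clientSecret": client_secret}


# Demo with an explicit mapping so that no prompt is shown
headers = load_credentials(env={"IQVIA_MKP_CLIENT_ID": "demo-id",
                                "IQVIA_MKP_CLIENT_SECRET": "demo-secret"})
print(sorted(headers))
```

Calling `load_credentials()` with no arguments reads from the real environment and only prompts for values that are missing.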


### Example: Make a request with I2E query on selected I2E index
The Content Store Search API expects an I2E query and an index name as input parameters. This example shows how to make a request to the API with a pre-configured query file as input.

In [3]:
import requests
import os
import time

# Define input query and index to run
input_query = os.path.join(os.getcwd(), "demo_queries/Content_Store_Search-Post_EASL_Run/biomkr_reln-covid19_disease.i2qy")
index_name = "medline-full"

# Make a request
print("Posting query request...")
# Open the query file in binary mode, as recommended by requests for file uploads
with open(input_query, "rb") as query:
    response = requests.post(
        api_endpoint_url,
        headers=mkp_headers,
        files={'query': (input_query, query, "application/octet-stream")},
        data={'index_name': index_name},
    )

# Use the run id from the POST request to poll for results
run_identifier = response.json()['id'] if response.status_code == 202 else None

# Poll the API until results are available
while response.status_code == 202:
    print('Results are not available yet. Waiting 5 seconds before polling again...')
    time.sleep(5)
    response = requests.get(url=f'{api_endpoint_url}/{run_identifier}', headers=mkp_headers)

# Check the response
if response.status_code == 200:
    print("Success!")
    results_json = response.json()
else:
    raise Exception(f"Unexpected status code: {response.status_code}")

print(f"Number of documents found: {len(results_json)}")
print(f"Sampled results: {results_json[:5]}")

Posting query request...
Success!
Number of documents found: 509
Sampled results: [{'Doc': '36374780', 'IndexName': 'medline-full'}, {'Doc': '36366324', 'IndexName': 'medline-full'}, {'Doc': '36366282', 'IndexName': 'medline-full'}, {'Doc': '36363686', 'IndexName': 'medline-full'}, {'Doc': '36363472', 'IndexName': 'medline-full'}]
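
The polling loop above runs for as long as the API answers with status 202. A minimal sketch of the same pattern as a reusable helper with an upper bound on the waiting time is shown below. The name `poll_until_ready`, the `fetch` callable and the 300-second default are assumptions made for this example and are not part of the API. The `FakeResponse` class only stands in for `requests.Response` so that the sketch runs without network access.

```python
import time


def poll_until_ready(first_response, fetch, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll while the status code is 202 and return the final response.

    first_response: response to the initial POST (needs .status_code and .json()).
    fetch: callable that takes the run id and returns a new response, e.g.
           lambda run_id: requests.get(f"{url}/{run_id}", headers=headers)
    timeout: upper bound on the summed sleep intervals, not on wall-clock time.
    """
    response = first_response
    if response.status_code != 202:
        return response
    run_id = response.json()["id"]  # the run id comes from the POST response
    waited = 0.0
    while response.status_code == 202:
        if waited >= timeout:
            raise TimeoutError(f"Run {run_id} not finished after {timeout} seconds")
        sleep(interval)
        waited += interval
        response = fetch(run_id)
    return response


class FakeResponse:
    """Stand-in for requests.Response, used only to exercise the helper."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


# Simulate one "still running" reply followed by the finished result
replies = iter([FakeResponse(202, {"id": "run-1"}), FakeResponse(200, [{"Doc": "1"}])])
final = poll_until_ready(FakeResponse(202, {"id": "run-1"}),
                         fetch=lambda run_id: next(replies),
                         sleep=lambda seconds: None)
print(final.status_code)
```

With the real API, `fetch` would wrap the `requests.get` call from the cell above, and the status check that follows the loop stays unchanged.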


Get the list of document IDs returned by the Content Store Search API.

In [4]:
doc_id_list = [result["Doc"] for result in results_json]
print(f"Sampled Document IDs from results: {doc_id_list[:5]}")

Sampled Document IDs from results: ['36374780', '36366324', '36366282', '36363686', '36363472']
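
The search above returned 509 documents, while the next cell requests features for only the first two. If you want to cover more of the list, one option is to send the IDs in smaller batches. The sketch below drops repeated IDs and splits the rest into batches. The helper name `unique_batches` and the batch size of 2 are arbitrary choices for illustration. This notebook does not state any limit on `docIds`, or whether the search can return the same ID twice.

```python
def unique_batches(doc_ids, batch_size):
    """Drop duplicate IDs (keeping first-seen order) and split them into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    unique_ids = list(dict.fromkeys(doc_ids))  # dict keys preserve insertion order
    return [unique_ids[i:i + batch_size] for i in range(0, len(unique_ids), batch_size)]


# The five sampled IDs from the output above, plus one deliberate repeat
sample_ids = ['36374780', '36366324', '36366282', '36363686', '36363472', '36374780']
batches = unique_batches(sample_ids, batch_size=2)
print(batches)
```

Each batch could then be passed as the `docIds` value of a separate feature extraction request.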


Fetch features from the first two documents in the list.

In [7]:
features = ["chebi", "meddra"]
results_json = []
fetch_url = input('Please enter the Content Store Feature Extraction API URL: ').rstrip('/')

# Keep asking until the answer is 'y' or 'n'
id_validation = ""
while id_validation not in ("y", "n"):
    id_validation = input("Is your Content Store Feature Extraction clientId the same as Content Store Search (y/n)?").strip().lower()
    if id_validation not in ("y", "n"):
        print("Please type in 'y' or 'n'.")

if id_validation == "y":
    print("Thank you!")
else:
    # Different credentials: replace the headers used for the Search API
    mkp_user = input("Content Store Feature Extraction API clientId: ")
    mkp_password = getpass.getpass("Content Store Feature Extraction API clientSecret: ")
    mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}
    print("Thanks for inputting your user name and password!")

print("Making requests to get features for specified documents...")

response = requests.post(
    fetch_url,
    headers=mkp_headers,
    json={"index_name": index_name, "params": {"features": features, "docIds": doc_id_list[:2]}},
)

# Use the run id from the POST request to poll for results
run_identifier = response.json()['id'] if response.status_code == 202 else None

# Poll the API until results are available
while response.status_code == 202:
    print('Results are not available yet. Waiting 5 seconds before polling again...')
    time.sleep(5)
    response = requests.get(url=f'{fetch_url}/{run_identifier}', headers=mkp_headers)

# Check the response
if response.status_code == 200:
    print("Success!")
    results_json = response.json()
else:
    raise Exception(f"Unexpected status code: {response.status_code}")

print(f"JSON response from the API: {results_json}")

Please enter the Content Store Feature Extraction API URL: https://vt.eu-apim-devtest.solutions.iqvia.com/eu/fetch-features/api/v2/fetch-features
Is your Content Store Feature Extraction clientId the same as Content Store Search (y/n)?n
Content Store Feature Extraction API clientId: 39f89fc1b1ba4613b41a993642bde651
Content Store Feature Extraction API clientSecret: ········
Thanks for inputting your user name and password!
Making requests to get features for specified documents...
Success!
JSON response from the API: [{'doc_id': '36374780', 'results': [{'Concept': {'logical_column_id': 0, 'value': 'Covid-19', 'indexed_spans_outer': [[1959, 1967]], 'indexed_spans_inner': [[1959, 1967]], 'text_spans_outer': [[36, 44]], 'text_spans_inner': [[36, 44]]}, '[PT] Concept': {'logical_column_id': 0, 'value': 'COVID-19'}, '[SID] Concept': {'logical_column_id': 0, 'value': 'meddra'}, '[NID] Concept': {'logical_column_id': 0, 'value': '10084268'}, 'Ontology': {'logical_column_id': 1, 'value': 'MedD
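
Each entry in `results` maps a column name to a dictionary with a `value` key, and the `Concept` column also carries character-offset spans. The sketch below reads one such record, using only the fields visible in the truncated output above. Reading `[PT]`, `[SID]` and `[NID]` as preferred term, source and node identifier is an interpretation based on the values shown, not something this notebook states. The helper name `summarize_record` is hypothetical.

```python
# One record rebuilt from the fields visible in the output above. The real
# response contains more columns, which are truncated in this notebook.
record = {
    "Concept": {
        "logical_column_id": 0,
        "value": "Covid-19",
        "indexed_spans_outer": [[1959, 1967]],
        "indexed_spans_inner": [[1959, 1967]],
        "text_spans_outer": [[36, 44]],
        "text_spans_inner": [[36, 44]],
    },
    "[PT] Concept": {"logical_column_id": 0, "value": "COVID-19"},
    "[SID] Concept": {"logical_column_id": 0, "value": "meddra"},
    "[NID] Concept": {"logical_column_id": 0, "value": "10084268"},
}


def summarize_record(record):
    """Collect the matched text, its normalized form and its spans for one hit."""
    concept = record["Concept"]
    return {
        "matched_text": concept["value"],
        "preferred_term": record["[PT] Concept"]["value"],  # assumed meaning of [PT]
        "source": record["[SID] Concept"]["value"],         # assumed meaning of [SID]
        "node_id": record["[NID] Concept"]["value"],        # assumed meaning of [NID]
        "text_spans": concept.get("text_spans_inner", []),  # spans may be absent
    }


summary = summarize_record(record)
print(summary)
```

In this sample the span `[36, 44]` covers 8 characters, which matches the length of the matched text `Covid-19`.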

Convert the results to a pandas DataFrame.

In [9]:
import pandas as pd

# Show all rows and columns when displaying the DataFrame
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Flatten the JSON response into one row per result.
# Please note this cell would fail if the request failed in the last step.
rows = []
for inner_results in results_json:
    for result_dict in inner_results['results']:
        row = {'Doc Id': inner_results['doc_id']}
        for key, value_dict in result_dict.items():
            row[key] = value_dict['value']
        rows.append(row)

# Build the DataFrame once from all rows
df = pd.DataFrame.from_records(rows)

# Display the first 10 rows
df.head(10)

| | Doc Id | Concept | [PT] Concept | [SID] Concept | [NID] Concept | Ontology | text |
|---|---|---|---|---|---|---|---|
| 0 | 36374780 | Covid-19 | COVID-19 | meddra | 10084268 | MedDRA | ... the findings from patients with Covid-19 a... |
| 1 | 36374780 | Covid-19 | COVID-19 | meddra | 10084268 | MedDRA | Covid-19 is an example where MAIT ... |
| 2 | 36374780 | immunology | Immunology test | meddra | 10062297 | MedDRA | Seminars in immunology |
| 3 | 36374780 | ligands | ligand | chebi | 52214 | Chemicals (ChEBI) | ... cells, recognise quite distinct ligands, t... |
| 4 | 36374780 | SARS-COV2 | Severe acute respiratory syndrome | meddra | 10061982 | MedDRA | SARS-COV2 |
| 5 | 36374780 | vaccination | Immunisation | meddra | 10021430 | MedDRA | ... in human viral disease and vaccination. |
| 6 | 36374780 | vaccines | Immunisation | meddra | 10021430 | MedDRA | ... Covid-19 and responses to novel vaccines. |
| 7 | 36374780 | Viral infection | Viral infection | meddra | 10047461 | MedDRA | Viral infection |
| 8 | 36366324 | antigens | antigen | chebi | 59132 | Chemicals (ChEBI) | ... in antibody titers against SARS-CoV-2 anti... |
| 9 | 36366324 | biomarker | biomarker | chebi | 59163 | Chemicals (ChEBI) | vaccine biomarker |
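
Once the results are flattened, a quick summary shows which ontologies and terms dominate. The sketch below counts the ten rows displayed above using only the standard library, so it runs without the API response. With the DataFrame from the previous cell, `df["Ontology"].value_counts()` should give the per-ontology counts directly.

```python
from collections import Counter

# (Doc Id, [PT] Concept, Ontology) copied from the ten rows shown above
rows = [
    ("36374780", "COVID-19", "MedDRA"),
    ("36374780", "COVID-19", "MedDRA"),
    ("36374780", "Immunology test", "MedDRA"),
    ("36374780", "ligand", "Chemicals (ChEBI)"),
    ("36374780", "Severe acute respiratory syndrome", "MedDRA"),
    ("36374780", "Immunisation", "MedDRA"),
    ("36374780", "Immunisation", "MedDRA"),
    ("36374780", "Viral infection", "MedDRA"),
    ("36366324", "antigen", "Chemicals (ChEBI)"),
    ("36366324", "biomarker", "Chemicals (ChEBI)"),
]

# Number of hits per ontology across all rows
hits_per_ontology = Counter(ontology for _, _, ontology in rows)

# Number of hits per (document, preferred term) pair
hits_per_term = Counter((doc_id, term) for doc_id, term, _ in rows)

print(hits_per_ontology.most_common())
print(hits_per_term.most_common(2))
```

These counts cover only the first ten rows, so they describe the sample rather than the full result set.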


That's it! Hope you find this tutorial useful! Bye!