# IQVIA NLP - Healthcare Concepts

## API Description
An NLP API that recognizes key healthcare concepts relevant to healthcare organizations, such as drugs, diseases and smoking categorization, while taking into account the context and patterns in which they appear in the data.

## Accessing the API
To use this API, you first need to request access to the Healthcare Concepts API via this link:
https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000zhFVKAA2/api-marketplaceiqvianlphealthcareconceptspreview

Please refer to the "API Documentation" to learn more about accessing and using the API.

## Notebook Description
This notebook shows an example of using the Healthcare Concepts NLP API to extract healthcare concepts, such as populations and medications, from medical records.

### Authorization
Instructions for getting your credentials and the API endpoint URL can be found under the "Get Started" and "How to use the API" sections at this link: https://api-marketplace.work.iqvia.com/s/communityapi/a085w00000zhFVKAA2/api-marketplaceiqvianlphealthcareconceptspreview

In [1]:
import getpass

# Getting credentials
api_marketplace_url = input('Please enter the API URL: ').rstrip('/')

mkp_user = input("Marketplace clientId: ")
mkp_password = getpass.getpass("Marketplace clientSecret: ")
mkp_headers = {'clientId': mkp_user, 'clientSecret': mkp_password}

print("Thanks for inputting URL, your user name and password!")

Please enter the API URL: https://vt.eu-apim-devtest.solutions.iqvia.com/eu/multi/api/v2/coding/multi
Marketplace clientId: <your-client-id>
Marketplace clientSecret: ········
Thanks for inputting URL, your user name and password!
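Interactive prompts do not work when the notebook runs unattended (for example on a schedule). A minimal sketch, assuming you export the URL and credentials as environment variables: the names `IQVIA_API_URL`, `IQVIA_CLIENT_ID` and `IQVIA_CLIENT_SECRET` are hypothetical, so pick whatever suits your setup. It builds the same `clientId`/`clientSecret` headers as the cell above.

```python
import os
from typing import Dict, Mapping, Tuple

# Hypothetical variable names; they are not defined by the API.
REQUIRED_VARS = ("IQVIA_API_URL", "IQVIA_CLIENT_ID", "IQVIA_CLIENT_SECRET")


def load_marketplace_config(env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Return (API URL, request headers) read from environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Same trailing-slash cleanup as the input() version above
    url = env["IQVIA_API_URL"].rstrip("/")
    # Header names match the interactive cell above
    headers = {"clientId": env["IQVIA_CLIENT_ID"], "clientSecret": env["IQVIA_CLIENT_SECRET"]}
    return url, headers


# Usage: api_marketplace_url, mkp_headers = load_marketplace_config()
```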


### Example one: Make a request with text string as input
The Healthcare Concepts NLP API expects a string as the request data type. This example shows how to send a text string to the API as input.

In [2]:
import requests
import time

# Define input text
input_text = "HISTORY OF PRESENT ILLNESS:  This 60-year-old white male is referred to us by his medical physician with a complaint of recent finding of a both pancreatic lesion and lesions with left adrenal gland.  The patient's history dates back to at the end of the January of this past year when he began experiencing symptoms consistent with difficulty almost like a suffocating feeling whenever he would lie flat on his back.  He noticed whenever he would recline backwards, he would begin this feeling and it is so bad now that he can barely recline, very little before he has this feeling. He does have a history of frequent urination.  Has been followed by urologist for this.  There is no family history of pancreatic cancer.  There is a history of gallstone pancreatitis in the patient's sister. MEDICATIONS:  Include glipizide 5 mg b.i.d., metformin 500 mg b.i.d., Atacand 16 mg daily, metoprolol 25 mg b.i.d., Lipitor 10 mg daily, pantoprazole 40 mg daily, Flomax 0.4 mg daily, Detrol 4 mg daily, Zyrtec 10 mg daily, Advair Diskus 100/50 mcg one puff b.i.d., and fluticasone spray 50 mcg two sprays daily. PAST SURGICAL HISTORY:  He has not had any previous surgery."

# Make a request
print("Posting text strings...")
response = requests.post(api_marketplace_url, headers=mkp_headers, files={'source_data': input_text})

# Poll the API until results are available; a 202 response carries the run id
if response.status_code == 202:
    run_identifier = response.json()['id']
    while response.status_code == 202:
        print('Results are not available yet. Waiting 5 seconds before polling again...')
        time.sleep(5)
        response = requests.get(url=f'{api_marketplace_url}/{run_identifier}', headers=mkp_headers)

# Check the response
if response.status_code != 200:
    raise RuntimeError(f'Unexpected status code {response.status_code}: {response.text}')
results = response.json()

# Print the results
print(f'Results: \n{results}\n')


Posting text strings...
Results: 
[{'doc_id': 'source_data', 'results': [{'concept_1': {'logical_column_id': 0, 'value': '60-year-old', 'original_spans_outer': [[34, 45]], 'original_spans_inner': [[34, 45]], 'indexed_spans_outer': [[1656, 2057]], 'indexed_spans_inner': [[1656, 2057]], 'text_spans_outer': [[30, 41]], 'text_spans_inner': [[30, 41]]}, 'preferred_term_1': {'logical_column_id': 0, 'value': 'Adults'}, 'ontology_1': {'logical_column_id': 0, 'value': 'age'}, 'node_id_1': {'logical_column_id': 0, 'value': 'age:60y'}, 'category_1': {'logical_column_id': 1, 'value': 'Age'}, 'certainty': {'logical_column_id': 2, 'value': ''}, 'relation': {'logical_column_id': 3, 'value': ''}, 'concept_2': {'logical_column_id': 4, 'value': ''}, 'preferred_term_2': {'logical_column_id': 4, 'value': ''}, 'ontology_2': {'logical_column_id': 4, 'value': ''}, 'node_id_2': {'logical_column_id': 4, 'value': ''}, 'category_2': {'logical_column_id': 5, 'value': ''}, 'text': {'value': '... OF PRESENT ILLNESS
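The post-then-poll loop above appears in both examples, and it waits indefinitely if a run never finishes. The sketch below factors it into a helper with a timeout. It assumes the protocol the notebook relies on: the POST returns `202` with an `id` in its JSON body while the run is pending, and `GET {url}/{id}` keeps returning `202` until the results are ready. `wait_for_result` and its `interval`, `timeout`, `sleep` and `clock` parameters are our own hypothetical names, not part of the API; `sleep` and `clock` are injectable so the helper can be tested without a network.

```python
import time
from typing import Any, Callable


def wait_for_result(
    first_response: Any,
    fetch: Callable[[str], Any],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll until the API stops answering 202 Accepted, or raise after `timeout` seconds.

    first_response: response returned by the initial POST (e.g. a requests.Response).
    fetch: callable that takes the run id and returns the next response.
    """
    response = first_response
    if response.status_code != 202:
        return response  # results (or an error) came back immediately
    run_id = response.json()["id"]  # read the run id once, from the POST response
    deadline = clock() + timeout
    while response.status_code == 202:
        if clock() >= deadline:
            raise TimeoutError(f"Run {run_id} still pending after {timeout} seconds")
        sleep(interval)
        response = fetch(run_id)
    return response


# Usage with the notebook's variables (requires `requests`):
# response = requests.post(api_marketplace_url, headers=mkp_headers, files={"source_data": input_text})
# response = wait_for_result(
#     response,
#     lambda run_id: requests.get(f"{api_marketplace_url}/{run_id}", headers=mkp_headers),
# )
# ...then check response.status_code == 200 as in the cell above.
```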

Now that we have the JSON response from the Healthcare Concepts NLP API, we can convert the `value` of each key into a pandas DataFrame, with one row per extracted result.

In [3]:
import pandas as pd

pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Flatten the JSON response: one row per result, keeping only each field's 'value'.
# Note: this cell fails if the request in the previous cell failed.
rows = []
for inner_results in results:
    for result_dict in inner_results['results']:
        rows.append({key: value_dict['value'] for key, value_dict in result_dict.items()})
df = pd.DataFrame(rows)

# Check the dataframe
df

| | concept_1 | preferred_term_1 | ontology_1 | node_id_1 | category_1 | certainty | relation | concept_2 | preferred_term_2 | ontology_2 | node_id_2 | category_2 | text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60-year-old | Adults | age | age:60y | Age | | | | | | | | ... OF PRESENT ILLNESS : This 60-year-old whit... |
| 1 | adrenal gland | Adrenal Glands | nlm | D000311 | BodyStructure | | | | | | | | ... lesion and lesions with left adrenal gland . |
| 2 | Advair | fluticasone propionate + salmeterol | nci | salmeterol_+_fluticasone_propionate | MedicationName | | | | | | | | ... Zyrtec 10 mg daily , Advair Diskus 100 / 5... |
| 3 | Atacand | Agent Affecting Cardiovascular System | nci | C78274 | MedicationClass | | | | | | | | ... metformin 500 mg b.i.d. , Atacand 16 mg da... |
| 4 | Atacand | Candesartan Cilexetil | nci | C28903 | MedicationName | | DrugDosage | 16 mg daily | 16 mg every 1d | measurement | dosage-g:16 mg every 1d | Dosage | ... metformin 500 mg b.i.d. , Atacand 16 mg da... |
| 5 | back | Back | nci | C13062 | BodyStructure | | | | | | | | ... would lie flat on his back . |
| 6 | back | Back | nlm | D001415 | BodyStructure | | | | | | | | ... would lie flat on his back . |
| 7 | back | Structure of back of trunk (body structure) | snomed | 77568009 | BodyStructure | | | | | | | | ... would lie flat on his back . |
| 8 | complaint | Complaint | nci | C176714 | DiseaseOrSymptom | | | | | | | | ... his medical physician with a complaint of ... |
| 9 | complaint | Complaint (finding) | snomed | 409586006 | DiseaseOrSymptom | | | | | | | | ... his medical physician with a complaint of ... |
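The DataFrame keeps only each field's `value`, but `concept_1` also carries span offsets. Judging from the raw output above, `original_spans_outer` appear to be end-exclusive character offsets into the submitted text (`[[34, 45]]` picks out `60-year-old` from `input_text`), while `text_spans_outer` index into the shorter `text` snippet. That reading is an inference from this one response, not documented behaviour, so check it against the API documentation. A minimal sketch that slices the source text with those offsets; `concept_mentions` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def concept_mentions(source_text: str, doc_results: List[Dict]) -> List[Tuple[str, int, int, str]]:
    """Return (value, start, end, sliced_text) for every concept_1 span.

    Assumes original_spans_outer holds [start, end) offsets into source_text.
    """
    mentions = []
    for result_dict in doc_results:
        concept = result_dict["concept_1"]
        for start, end in concept.get("original_spans_outer", []):
            mentions.append((concept["value"], start, end, source_text[start:end]))
    return mentions


# Usage: concept_mentions(input_text, results[0]["results"])
# If the assumption holds, value and sliced_text match on every row.
```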


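### Example two: Make a request with a zip file as input

This example extracts a zip archive of text files, posts each file to the API in turn, and collects all the JSON responses.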

In [5]:
import os
import shutil
import time
import zipfile

import requests

# Define input zip location
input_zip = os.path.join(os.getcwd(), "demo_docs/HealthcareConcepts/HealthcareConcepts_demo.zip")

# Define a directory to extract the input zip file into (recreated on every run)
input_folder = os.path.join(os.getcwd(), "demo_docs/HealthcareConcepts/HealthcareConcepts_demo")
if os.path.isdir(input_folder):
    shutil.rmtree(input_folder)
os.mkdir(input_folder)

# Extract files from the input zip into the folder
with zipfile.ZipFile(input_zip, "r") as zip_ref:
    zip_ref.extractall(input_folder)
print(f"Documents extracted to: {input_folder}")

# Make one request per extracted file
print("Posting text files from the zip file...")
all_results = []
for filename in sorted(os.listdir(input_folder)):
    file_path = os.path.join(input_folder, filename)
    if not os.path.isfile(file_path):
        continue  # skip any sub-folders contained in the zip
    # Open in binary mode, as requests recommends for multipart uploads
    with open(file_path, "rb") as file:
        print(f"Posting {filename}...")
        response = requests.post(api_marketplace_url, headers=mkp_headers, files={'source_data': file})

    # Poll the API until results are available; a 202 response carries the run id
    if response.status_code == 202:
        run_identifier = response.json()['id']
        while response.status_code == 202:
            print('Results are not available yet. Waiting 5 seconds before polling again...')
            time.sleep(5)
            response = requests.get(url=f'{api_marketplace_url}/{run_identifier}', headers=mkp_headers)

    # Check the response
    if response.status_code != 200:
        raise RuntimeError(f'Unexpected status code {response.status_code}: {response.text}')
    all_results.append(response.json())

print("All done!")
print(f"All JSON responses are: {all_results}")

Documents extracted to: C:\Users\Hui.Feng\Documents\Git\api-marketplace-demo\demo_docs/HealthcareConcepts/HealthcareConcepts_demo
Posting text files from the zip file...
Posting M1.txt...
Posting M2.txt...
All done!
All JSON responses are: [[{'doc_id': 'M1', 'results': [{'concept_1': {'logical_column_id': 0, 'value': 'a consultation', 'original_spans_outer': [[1121, 1135]], 'original_spans_inner': [[1123, 1135]], 'indexed_spans_outer': [[45967, 46514]], 'indexed_spans_inner': [[46051, 46514]], 'text_spans_outer': [[8, 22]], 'text_spans_inner': [[10, 22]]}, 'preferred_term_1': {'logical_column_id': 0, 'value': 'Consultation (procedure)'}, 'ontology_1': {'logical_column_id': 0, 'value': 'snomed'}, 'node_id_1': {'logical_column_id': 0, 'value': '11429006'}, 'category_1': {'logical_column_id': 1, 'value': 'TreatmentAndProcedures'}, 'certainty': {'logical_column_id': 2, 'value': ''}, 'relation': {'logical_column_id': 3, 'value': ''}, 'concept_2': {'logical_column_id': 4, 'value': ''}, 'pref
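Extracting the archive to disk is not strictly necessary. The sketch below reads each member straight from the zip with the standard `zipfile` module, skipping directory entries. Passing a `(filename, content)` tuple in `files` is standard `requests` multipart syntax. Since `M1.txt` came back with `doc_id` `'M1'` above, the API appears to derive `doc_id` from the uploaded filename, but treat that as an observation rather than a guarantee. `iter_zip_documents` is a hypothetical helper name.

```python
import zipfile
from typing import BinaryIO, Iterator, Tuple, Union


def iter_zip_documents(zip_source: Union[str, BinaryIO]) -> Iterator[Tuple[str, bytes]]:
    """Yield (base filename, raw bytes) for every file in the archive, in name order."""
    with zipfile.ZipFile(zip_source) as zf:
        for info in sorted(zf.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue  # directory entries carry no document text
            name = info.filename.rsplit("/", 1)[-1]  # drop any folder prefix inside the zip
            yield name, zf.read(info)


# Usage with the notebook's variables (requires `requests`):
# for name, content in iter_zip_documents(input_zip):
#     response = requests.post(api_marketplace_url, headers=mkp_headers,
#                              files={"source_data": (name, content)})
#     ...then poll and check the response as in the cell above.
```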

As in Example one, you can convert the JSON output into a pandas DataFrame; this time each row also records the document it came from (`Doc ID`).

In [7]:
import pandas as pd

pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", 1000)

# Flatten all JSON responses: one row per result, tagged with the ID of its source document.
# Note: this cell fails if any request in the previous cell failed.
rows = []
for response_json in all_results:
    for inner_results in response_json:
        for result_dict in inner_results['results']:
            row = {"Doc ID": inner_results["doc_id"]}
            row.update({key: value_dict['value'] for key, value_dict in result_dict.items()})
            rows.append(row)
df = pd.DataFrame(rows)

# Check the first 10 rows of the dataframe
df.head(10)

| | Doc ID | concept_1 | preferred_term_1 | ontology_1 | node_id_1 | category_1 | certainty | relation | concept_2 | preferred_term_2 | ontology_2 | node_id_2 | category_2 | text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M1 | a consultation | Consultation (procedure) | snomed | 11429006 | TreatmentAndProcedures | | | | | | | | This is a consultation for the patient in rega... |
| 1 | M1 | A sterile dressing | Application of dressing (procedure) | snomed | 3895009 | TreatmentAndProcedures | | | | | | | | A sterile dressing was applied . |
| 2 | M1 | A sterile dressing | Sterilisation | meddra | 10062116 | TreatmentAndProcedures | | | | | | | | A sterile dressing was applied . |
| 3 | M1 | anesthetic | Agent Affecting Nervous System | nci | C78272 | MedicationClass | | | | | | | | Local anesthetic medication was infiltrated ar... |
| 4 | M1 | anesthetic | Anesthetic Agent | nci | C245 | MedicationName | | | | | | | | Local anesthetic medication was infiltrated ar... |
| 5 | M1 | biopsy | Biopsy | meddra | 10004720 | ExaminationName | Negated | | | | | | | The patient had no neurovascular deficits , et... |
| 6 | M1 | biopsy | Biopsy | meddra | 10004720 | ExaminationName | | | | | | | | A punch biopsy of the worrisome skin lesion ... |
| 7 | M1 | biopsy | Biopsy | nlm | D001706_group_5_03c30202b65ddd016bc296f733d29065 | ExaminationName | Negated | | | | | | | The patient had no neurovascular deficits , et... |
| 8 | M1 | biopsy | Biopsy | nlm | D001706_group_5_03c30202b65ddd016bc296f733d29065 | ExaminationName | | | | | | | | A punch biopsy of the worrisome skin lesion ... |
| 9 | M1 | biopsy | Biopsy | nlm | D001706_group_8_27c3321442554cf1ea542b7093929b1b | TreatmentAndProcedures | Negated | | | | | | | The patient had no neurovascular deficits , et... |
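The `certainty` column flags concepts that the text negates, such as the biopsy rows marked `Negated` above. A common next step is to drop those before counting what each document mentions. A minimal sketch over the same row dictionaries the cell above builds (`Doc ID`, `category_1`, `certainty`); only `Negated` and the empty string appear in this output, so any other certainty values the API may return are kept. `count_affirmed_categories` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable


def count_affirmed_categories(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count (Doc ID, category_1) pairs, skipping rows whose certainty is 'Negated'."""
    counts: Counter = Counter()
    for row in rows:
        if row.get("certainty") == "Negated":
            continue  # the document says this concept is absent
        counts[(row["Doc ID"], row["category_1"])] += 1
    return counts


# Usage with the DataFrame above: count_affirmed_categories(df.to_dict("records"))
```

With pandas directly, `df[df["certainty"] != "Negated"].groupby(["Doc ID", "category_1"]).size()` gives the same counts.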


That's it! We hope you find this tutorial useful.