# ☁️ Healthcare Natural Language API - (PoC)

⚠️ **Important Note:** The medical note included in this repository is entirely fictitious. It was generated for a fictional healthcare professional and does not belong to any real patient. This data is solely for testing and demonstration purposes.

## Project Overview

This project is a Proof of Concept (PoC) using Google Cloud's Healthcare Natural Language API. The goal is to test the API's ability to extract medical entities and relationships from unstructured clinical notes, leveraging pre-trained natural language models.

## About Healthcare Natural Language API

The Healthcare Natural Language API is part of the Google Cloud Healthcare API. It uses natural language processing (NLP) models to extract healthcare-related information from medical text.

### Key Features

The API can identify and extract:

- 🏥 **Medical concepts** such as medications, procedures, and health conditions.
- 📅 **Functional attributes** like temporal relationships, subjects, and certainty assessments.
- 🔗 **Relationships** between entities, such as side effects or drug dosages.

### Core Functionality

In this PoC, we focus on a single method:

- **Entity Analysis**: The `analyzeEntities` method inspects medical text to detect and return medical concepts and their relationships.

## Prerequisites

Before running this PoC, ensure you have completed the following steps:

1. ✅ **Google Cloud account**: You must have a Google Cloud account set up.
2. 🌐 **Enable APIs**: Ensure the Cloud Healthcare API and Healthcare Natural Language API are enabled.
3. 🛠️ **Install Google Cloud CLI (gcloud)**: Download and install the Google Cloud CLI.
4. 📄 **Create a `.env` file**: This file will store the necessary environment variables.

### Create a `.env` file

To configure the environment variables, create a `.env` file in your project directory and add the following content:

```
PROJECT_ID = "project_name"
LOCATION = "location_name"
TOKEN = "token_value"
```
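The notebook reads these three values with `os.getenv`, which silently returns `None` when a key is missing and only fails later at request time. Below is a minimal sketch that validates the settings up front. `NlpConfig` and `load_config` are hypothetical helpers, not part of any Google library, and the key names are assumed to match the `.env` file above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Keys expected in the .env file shown above (assumed names).
REQUIRED_VARS = ("PROJECT_ID", "LOCATION", "TOKEN")


@dataclass(frozen=True)
class NlpConfig:
    """Hypothetical container for the three settings used in this PoC."""
    project_id: str
    location: str
    token: str


def load_config(environ: Mapping[str, str] = os.environ) -> NlpConfig:
    """Read the required settings and fail fast if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return NlpConfig(
        project_id=environ["PROJECT_ID"],
        location=environ["LOCATION"],
        token=environ["TOKEN"],
    )
```

In the notebook this would run right after `load_dotenv()`, e.g. `config = load_config()`. Note that `load_dotenv()` does not overwrite variables already set in your shell unless it is called with `override=True`, which matters if an old `TOKEN` is still exported.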

### How to obtain the token value

Follow these steps to authenticate and get your access token:

1. **Authenticate with Google Cloud** by running the following command in your terminal:

```
gcloud auth login
```

2. **Get the access token** by running:

```
gcloud auth print-access-token
```

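Copy the token value and paste it into the `.env` file as the value of `TOKEN`. Access tokens are short-lived (typically about one hour), so regenerate the token and update `.env` when requests start failing with `401 Unauthorized`.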
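Because the token expires, copying it by hand gets tedious. Here is a minimal sketch that runs the same `gcloud auth print-access-token` command from Python, assuming `gcloud` is on your `PATH` and you have already run `gcloud auth login`. `fetch_access_token` is a hypothetical helper; the command is configurable so it can be swapped out or tested.

```python
import subprocess

# The command documented above; pass a different one to customize or test.
GCLOUD_TOKEN_CMD = ("gcloud", "auth", "print-access-token")


def fetch_access_token(cmd=GCLOUD_TOKEN_CMD) -> str:
    """Run the token command and return its output without the trailing newline."""
    # check=True raises CalledProcessError if gcloud exits with an error
    # (for example, when you are not logged in).
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    token = result.stdout.strip()
    if not token:
        raise RuntimeError("Token command returned empty output")
    return token
```

With this, `TOKEN = fetch_access_token()` could replace reading `TOKEN` from `.env`. On Windows, where `gcloud` is a `.cmd` script, you may need to pass the full path returned by `shutil.which("gcloud")` as the first element of `cmd`.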

## Goals

- Test entity extraction from clinical notes.
- Evaluate how well the API identifies medical concepts and maps relationships.
- Understand the output format and how to integrate the extracted data into a larger cloud architecture.

## Next Steps

1. Set up the Google Cloud Healthcare API.
2. Enable the Healthcare Natural Language API.
3. Implement entity analysis using the `analyzeEntities` method.
4. Analyze the results and review entity mapping.




```python
# Import libraries
import json
import os

import pandas as pd
import requests
from dotenv import load_dotenv

# Load the variables from the .env file into the process environment
load_dotenv()
```

```python
# Read configuration from environment variables
TOKEN = os.getenv("TOKEN")
LOCATION = os.getenv("LOCATION")
PROJECT_ID = os.getenv("PROJECT_ID")
NLP_SERVICE = "analyzeEntities"
URL = f"https://healthcare.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/services/nlp:{NLP_SERVICE}"
```

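```python
def read_txt(file: str) -> str:
    """Read a text file; UTF-8 keeps Spanish accents (e.g. 'metabólico') intact."""
    with open(file, "r", encoding="utf-8") as f:
        return f.read()


def get_entities(note: str) -> dict:
    """Send the note to analyzeEntities and return the parsed JSON response."""
    request_data = {"documentContent": note}
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    }
    response = requests.post(URL, headers=headers, json=request_data, timeout=60)
    # Fail loudly (e.g. 401 for an expired token) instead of returning None,
    # which would otherwise crash later with an unhelpful TypeError.
    response.raise_for_status()
    return response.json()


def get_entities_df(note: str) -> pd.DataFrame:
    """Return one DataFrame row per detected entity mention."""
    entities = get_entities(note)
    return pd.DataFrame(entities.get("entityMentions", []))


note = read_txt("data/note_es.txt")
entities = get_entities_df(note)
```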
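The `requests.post` call above makes a single attempt, so a transient rate limit or server error fails the whole run. Below is a minimal sketch of a `requests.Session` with automatic retries and exponential backoff using urllib3's `Retry`. It assumes urllib3 1.26 or newer (for `allowed_methods`); `make_session` is a hypothetical helper, and the retry counts are arbitrary starting values.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Session that retries rate limits (429) and 5xx errors with backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=(429, 500, 502, 503, 504),
        # POST is not retried by default. analyzeEntities only reads the text
        # and stores nothing, so retrying it is assumed to be safe here.
        allowed_methods=frozenset({"POST"}),
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Inside `get_entities`, `requests.post(...)` would become `session.post(...)` with the same arguments, where `session = make_session()` is created once and reused across notes.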

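## Understanding the API Response
The `analyzeEntities` method returns a JSON response whose `entityMentions` list contains one object per entity mention detected in the text. Each mention has the following fields:
* **`mentionId`:** A unique identifier for the mention within the response.
* **`type`:** The category of medical entity, such as `PROBLEM`, `MEDICINE`, or `BM_VALUE` (all three appear in the output below).
* **`text`:** The matched text fragment (`content`) and its position in the note (`beginOffset`).
* **`temporalAssessment`:** When the entity applies, e.g. `CURRENT` or `UPCOMING`; a phrase like "history of diabetes" indicates something from the past.
* **`certaintyAssessment`:** How certain the statement is, e.g. `LIKELY` or `CONDITIONAL`; "probable pneumonia" is less certain than a confirmed diagnosis.
* **`subject`:** The person the entity refers to, usually `PATIENT`, but it can also be a family member or `OTHER`.
* **`confidence`:** A score between 0 and 1 indicating the model's confidence in the detected mention.
* **`linkedEntities`:** IDs of knowledge-base concepts the mention maps to (e.g. `UMLS/C0011849` for "Diabetes Mellitus").

Each assessment field is itself an object with a `value` and its own `confidence`. Besides `entityMentions`, the response includes `entities`, which lists each linked concept once (so repeated mentions of the same term share one entry) with its preferred term and vocabulary codes such as SNOMED CT or RxNorm, and `relationships`, which pairs related mentions (e.g. a medication and its dosage) by their `mentionId`.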
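To see how these pieces connect, here is a minimal sketch that joins each mention to its linked concepts and turns relationships into readable text pairs. It assumes the response layout described above: `linkedEntities[].entityId` matching `entities[].entityId`, entities carrying `preferredTerm` and `vocabularyCodes`, and relationships carrying `subjectId`, `objectId`, and `confidence`. Check these names against a real response; `summarize_mentions` and `summarize_relationships` are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_mentions(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Attach knowledge-base terms and codes to each mention via linkedEntities."""
    # Index the top-level entities by ID, e.g. "UMLS/C0011849".
    entities = {e["entityId"]: e for e in response.get("entities", [])}
    rows = []
    for m in response.get("entityMentions", []):
        linked = [entities.get(le["entityId"], {}) for le in m.get("linkedEntities", [])]
        rows.append({
            "mentionId": m["mentionId"],
            "type": m["type"],
            "text": m["text"]["content"],
            "preferredTerms": [e.get("preferredTerm") for e in linked if e],
            "codes": [c for e in linked for c in e.get("vocabularyCodes", [])],
        })
    return rows


def summarize_relationships(
    response: Dict[str, Any],
) -> List[Tuple[Optional[str], Optional[str], Optional[float]]]:
    """Replace the mention IDs in each relationship with the mention text."""
    text_by_id = {m["mentionId"]: m["text"]["content"]
                  for m in response.get("entityMentions", [])}
    return [
        (text_by_id.get(r["subjectId"]), text_by_id.get(r["objectId"]), r.get("confidence"))
        for r in response.get("relationships", [])
    ]
```

Both functions take the parsed JSON, for example the return value of `get_entities(note)`, and can be passed to `pd.DataFrame(...)` for inspection.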

```python
# Display the DataFrame of entity mentions
entities
```


| | mentionId | type | text | temporalAssessment | certaintyAssessment | subject | confidence | linkedEntities |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | MEDICINE | {'content': 'Fatiga', 'beginOffset': 103} | {'value': 'CURRENT', 'confidence': 0.981151938... | {'value': 'LIKELY', 'confidence': 0.9993194341... | {'value': 'PATIENT', 'confidence': 0.999646961... | 0.562996 | |
| 1 | 2 | MEDICINE | {'content': 'Sobrepeso', 'beginOffset': 203} | {'value': 'CURRENT', 'confidence': 0.997980058... | {'value': 'LIKELY', 'confidence': 0.9992980360... | {'value': 'PATIENT', 'confidence': 0.999646961... | 0.682784 | |
| 2 | 3 | BM_VALUE | {'content': '29.5', 'beginOffset': 219} | | | | 0.524036 | |
| 3 | 4 | PROBLEM | {'content': 'Diabetes Mellitus', 'beginOffset'... | {'value': 'CURRENT', 'confidence': 0.999508738... | {'value': 'LIKELY', 'confidence': 0.9994295835... | {'value': 'PATIENT', 'confidence': 0.999646961... | 0.888809 | [{'entityId': 'UMLS/C0011849'}] |
| 4 | 5 | PROBLEM | {'content': 'hiperglucemia', 'beginOffset': 304} | {'value': 'CURRENT', 'confidence': 0.962841928... | {'value': 'LIKELY', 'confidence': 0.9762089848... | {'value': 'PATIENT', 'confidence': 0.999479115... | 0.727779 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 88 | 89 | MEDICINE | {'content': 'enfatizando', 'beginOffset': 2661} | {'value': 'CURRENT', 'confidence': 0.845963299... | {'value': 'LIKELY', 'confidence': 0.7537283301... | {'value': 'PATIENT', 'confidence': 0.994117796... | 0.434143 | |
| 89 | 90 | MEDICINE | {'content': 'cambios', 'beginOffset': 2673} | {'value': 'UPCOMING', 'confidence': 0.69364482... | {'value': 'CONDITIONAL', 'confidence': 0.78589... | {'value': 'OTHER', 'confidence': 0.78242671489... | 0.545428 | |
| 90 | 91 | MEDICINE | {'content': 'optimizar', 'beginOffset': 2752} | {'value': 'CURRENT', 'confidence': 0.654398858... | {'value': 'LIKELY', 'confidence': 0.6787457466... | {'value': 'PATIENT', 'confidence': 0.892929852... | 0.792120 | |
| 91 | 92 | MEDICINE | {'content': 'metabólico', 'beginOffset': 2773} | {'value': 'CURRENT', 'confidence': 0.717098295... | {'value': 'LIKELY', 'confidence': 0.5973162055... | {'value': 'PATIENT', 'confidence': 0.946841359... | 0.426089 | |
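The nested dictionaries in the `text`, `temporalAssessment`, `certaintyAssessment`, and `subject` columns make this table hard to filter or sort. Below is a minimal sketch that flattens them with `pandas.json_normalize` and keeps only mentions above a confidence threshold. `mentions_to_flat_df` and `confident_mentions` are hypothetical helpers, and the 0.7 default is an arbitrary starting point, not a value recommended by the API.

```python
import pandas as pd


def mentions_to_flat_df(response: dict) -> pd.DataFrame:
    """Flatten nested mention fields (text, assessments) into plain columns."""
    df = pd.json_normalize(response.get("entityMentions", []))
    # json_normalize joins nested keys with ".", e.g. "text.content" or
    # "subject.value"; underscores make the columns easier to use as attributes.
    return df.rename(columns=lambda c: c.replace(".", "_"))


def confident_mentions(df: pd.DataFrame, min_confidence: float = 0.7) -> pd.DataFrame:
    """Keep mentions whose overall confidence meets the threshold."""
    return df[df["confidence"] >= min_confidence].reset_index(drop=True)
```

Here `response` is the parsed JSON returned by the API. With the 0.7 threshold, rows above such as 'enfatizando' (0.434143) and 'metabólico' (0.426089) would be dropped while 'Diabetes Mellitus' (0.888809) would be kept; `flat["type"].value_counts()` then gives a quick overview of what remains.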
