<a href="https://colab.research.google.com/github/ganeshraju/Aadhar-uidaiBenchmark/blob/master/LLM_and_Healthcare_NLP_tasks_GCP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook demonstrates toy examples of LLM use cases in healthcare using the Google PaLM API on GCP. It is an experimental prototype of ideas, not a formal evaluation of the APIs or production code.**

# Install Libraries

In [None]:
!pip install openai
!pip install python-dotenv
!pip install --upgrade langchain
!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
!pip install --upgrade --user google-cloud-aiplatform \
                              google-cloud-storage \
                             'google-cloud-bigquery[pandas]' \
                              redis \
                              scann


# Setup

In [None]:
# Restart the kernel so that the newly installed packages are picked up
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [None]:
# Authenticate the notebook environment. Required for Google Cloud.
import sys
if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

In [None]:
# GCP Project Configs

PROJECT_ID = "useful-maxim-378604"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

REGION = "us-central1"   # @param {type: "string"}

LOCATION = "US"  # @param {type: "string"}  BigQuery location

Updated property [core/project].


In [None]:
# Import libraries

from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table
client = bigquery.Client(project=PROJECT_ID, location=LOCATION)
data_table.enable_dataframe_formatter()

# You may not need all the data sources. Choose whichever data source you want to use.

import vertexai
from vertexai.preview.language_models import (
    ChatModel,
    InputOutputTextPair,
    TextEmbeddingModel,
    TextGenerationModel,
)

from IPython.display import display, Markdown
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import seaborn as sns

In [None]:
# Initialize Vertex AI
vertexai.init(project=PROJECT_ID, location=REGION)

In [None]:
completions_model = TextGenerationModel.from_pretrained("text-bison@001")
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

In [None]:
# Helper function to call the model API.
# Healthcare NLP tasks may run into the token limit because clinical notes are long.

def get_completion(prompt, model, temperature, max_output_tokens):
    response = model.predict(
        prompt=prompt,
        temperature=temperature,
        max_output_tokens=max_output_tokens,
    )
    return response.text
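
The comment above notes that long clinical notes can run into the model's token limit. One way to stay under it is to split a note into smaller pieces before calling `get_completion`. The sketch below is only an illustration: it assumes a rough budget counted in whitespace-separated words rather than real tokens, and `chunk_text` and `max_words` are hypothetical names, not part of the Vertex AI SDK.

```python
def chunk_text(note, max_words=500):
    """Split a note into chunks of at most max_words words (a rough proxy for tokens)."""
    words = note.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# A tiny budget makes the split visible.
chunks = chunk_text("one two three four five", max_words=2)
print(chunks)
```

Each chunk could then be sent through `get_completion` separately. Splitting on words ignores sentence and section boundaries, so a real pipeline would likely need something smarter.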

# Use Case #1: NLP Entity and Context Extraction

# Chain of Thought Reasoning to provide additional context in the NLP extractions

In [None]:
text = f"""
 The patient is a 17-year-old female, who presents to the emergency room with foreign body and airway compromise and was taken to the operating room.  She was intubated and fishbone.
 """

In [None]:
# Plain string (not an f-string) so the braces in the JSON example below stay literal.
context = """
You are a Healthcare AI Assistant helping to extract entities and context from clinical text using the guidance stated below:
1 - Entity Recognition: You will have entity categories: Person, Location, Age, Gender, Disease/Problem, Anatomical Structure, Symptoms, Procedure, Medications, \
Medical Devices, Lab Test, Substance Abuse, Social Determinants.

2 - Entity Assertions: Probability of assertions in extracted entities. Classify the assertions made on given medical concepts as being present, absent,
or possible in the patient, conditionally present in the patient under certain circumstances, hypothetically present in the patient at some future point, \
and mentioned in the patient report but associated with someone other than the patient. In addition, perform Subject Assessment. Differentiate between \
"Patient" vs. "Family Member" in the text description.
For example: "John's father has diabetes". Attach "diabetes" to assertion status "Family".
Assertion Status: 1. Present, 2. Absent, 3. Possible, 4. Hypothetical, 5. Conditional, 6. Family

3 - Temporal Assessment: Extract the date or temporality of the entity using these rules:
Extract the actual date if a date is available in the text.
If no date is available, assess temporality and categorize it as:
1. Current
2. History

Input Data: A 60 year old male with a history of type-2 diabetes, diagnosed 10 years ago, takes 500 mg metformin.
Output:
[
    {
      "Name": "60-year-old",
      "Category": "Demographic Entity",
      "Assertion Status": "Present",
      "Temporality": "Current"
    },
    {
      "Name": "male",
      "Category": "Demographic Entity",
      "Assertion Status": "Present",
      "Temporality": "Current"
    },
    {
      "Name": "type-2 diabetes",
      "Category": "Disease/Problem",
      "Assertion Status": "Present",
      "Temporality": "10 years ago"
    },
    {
      "Name": "metformin",
      "Category": "medication",
      "Assertion Status": "Present",
      "Temporality": "Current"
    }
]
"""

# Prompt

In [None]:
# Initialize a prompt.
prompt = f"""
Perform the following actions. Let's think step by step and use the guidance provided. Take your time and try to answer accurately.
1 - Extract entity name, category, assertion status, and temporality.
2 - Output the JSON object that contains the following keys: entity name, category, assertion status, temporality.
If the question cannot be answered using the information provided, answer with "No entities found".
Context:{context}
text: {text}
"""

# System Response: NLP Extraction

In [None]:
temperature = 0.2  # Low temperature makes the output more deterministic
assistant_response = get_completion(prompt, completions_model, temperature, 1024)
print(assistant_response)

Output:
({'Name': '17-year-old', 'Category': 'Demographic Entity', 'Assertion Status': 'Present', 'Temporality': 'Current'}, {'Name': 'female', 'Category': 'Demographic Entity', 'Assertion Status': 'Present', 'Temporality': 'Current'}, {'Name': 'foreign body', 'Category': 'Symptom', 'Assertion Status': 'Present', 'Temporality': 'Current'}, {'Name': 'airway compromise', 'Category': 'Symptom', 'Assertion Status': 'Present', 'Temporality': 'Current'}, {'Name': 'operating room', 'Category': 'Location', 'Assertion Status': 'Present', 'Temporality': 'Current'}, {'Name': 'intubated', 'Category': 'Procedure', 'Assertion Status': 'Present', 'Temporality': 'Current'}, {'Name': 'fishbone', 'Category': 'Anatomical Structure', 'Assertion Status': 'Present', 'Temporality': 'Current'})
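
The response above is printed with single quotes and parentheses, so it is a Python-style literal rather than strict JSON. A minimal sketch of turning such a response into a list of dicts and checking each entity follows. It uses the four keys from the example output and the six assertion statuses listed in the guidance. `parse_entities` and `validate_entity` are hypothetical helper names, and the sketch assumes the response is either valid JSON or a plain Python literal.

```python
import ast
import json

REQUIRED_KEYS = {"Name", "Category", "Assertion Status", "Temporality"}
ALLOWED_ASSERTIONS = {
    "Present", "Absent", "Possible", "Hypothetical", "Conditional", "Family",
}


def parse_entities(raw):
    """Parse a model response into a list of entity dicts.

    Tries strict JSON first, then falls back to a Python literal,
    because the response above uses single quotes and parentheses.
    """
    raw = raw.strip()
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = ast.literal_eval(raw)  # evaluates literals only, never code
    if isinstance(parsed, dict):
        parsed = [parsed]
    return list(parsed)


def validate_entity(entity):
    """Return a list of problems found in one extracted entity dict."""
    problems = []
    missing = REQUIRED_KEYS - set(entity)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    status = entity.get("Assertion Status")
    if status is not None and status not in ALLOWED_ASSERTIONS:
        problems.append(f"unknown assertion status: {status}")
    return problems


# Two entities copied from the response above, shortened for illustration.
sample = (
    "({'Name': 'female', 'Category': 'Demographic Entity', "
    "'Assertion Status': 'Present', 'Temporality': 'Current'}, "
    "{'Name': 'intubated', 'Category': 'Procedure', "
    "'Assertion Status': 'Present', 'Temporality': 'Current'})"
)
for entity in parse_entities(sample):
    print(entity["Name"], validate_entity(entity))
```

`ast.literal_eval` raises an exception on malformed text, so a caller would want to catch that when the model returns free-form prose such as "No entities found".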


# Use Case #2: Write SOAP Notes from Patient-Clinician Conversation

In [None]:
# Set the context

context = f"""
You are a Healthcare AI Assistant helping to write SOAP Notes from the conversation text.
"""

In [None]:
# Input Data

conversation_1 = f"""
Hi, how's it going? I'm not feeling well today. I have some abdominal pain. I'm sorry to hear that. Can you tell me a little bit about that? Yes, I have had some pain for the last two weeks, in the mid abdomen, going to the lower abdomen. Have you had any nausea or vomiting or diarrhea? Yes, I've had some diarrhea. Anybody else at home that sick? Well, my husband and my son are also sick with some diarrhea and abdominal pain. Do you have any fevers? No, I don't have any fevers or chills. OK, let's take a look and examine you.
Well, I think you might have some gastroenteritis, an infection of the abdomen just caused by food poisoning. I think if you drink plenty of water and stick to a BRAT diet, it should pass. However, if it still lingers after several days, I think we need to run some tests. How does that sound? Thank you, doctor, that sounds like a plan.
"""

In [None]:
prompt = f"""
Perform the following actions. Let's think step by step and use the guidance provided. Take your time and try to answer accurately.
1 - Create a SOAP note from the patient-doctor conversation.
context: {context}
text: {conversation_1}
"""

In [None]:
temperature = 0.2  # Low temperature allows only a little creativity
assistant_response = get_completion(prompt, completions_model, temperature, 1024)
print(assistant_response)

**S**ubjective: Patient presents with a 2-week history of abdominal pain. The pain is located in the mid-abdomen and radiates to the lower abdomen. The pain is worse after eating and is associated with nausea, vomiting, and diarrhea. The patient denies fevers or chills.

**O**bjective: Vital signs are within normal limits. Abdominal exam reveals tenderness in the mid-abdomen and lower abdomen. There is no rebound tenderness or guarding.

**A**ssessment: Gastroenteritis

**P**lan: Patient is advised to drink plenty of fluids and stick to a BRAT diet (bananas, rice, applesauce, and toast). The patient is also given a prescription for loperamide to help with the diarrhea. The patient is instructed to follow up with the doctor if the symptoms do not improve after several days.
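
The response above marks each SOAP section with a bold initial (`**S**ubjective:`, `**O**bjective:`, and so on). If the sections need to be stored or checked separately, they can be split with a regular expression. This is a sketch that assumes the model keeps exactly this Markdown layout, which is not guaranteed. `split_soap` is a hypothetical helper, and the sample note is a shortened stand-in for the real response.

```python
import re

# Matches "**S**ubjective: text" up to the next bolded section initial or the end.
SECTION_PATTERN = re.compile(
    r"\*\*([SOAP])\*\*(\w+):\s*(.*?)(?=\n\s*\*\*[SOAP]\*\*|\Z)",
    re.DOTALL,
)


def split_soap(note):
    """Return a dict mapping section names (e.g. 'Subjective') to their text."""
    sections = {}
    for initial, rest, body in SECTION_PATTERN.findall(note):
        sections[initial + rest] = body.strip()
    return sections


sample_note = (
    "**S**ubjective: Abdominal pain for two weeks.\n\n"
    "**O**bjective: Tenderness in the mid-abdomen.\n\n"
    "**A**ssessment: Gastroenteritis\n\n"
    "**P**lan: Fluids and a BRAT diet."
)
for name, body in split_soap(sample_note).items():
    print(f"{name}: {body}")
```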


# Use Case #3: Summarize a Biomedical Research Article

In [None]:
abstract = """
Sepsis-associated acute kidney injury (S-AKI) is a frequent complication of the critically ill patient and is associated with unacceptable morbidity and mortality. \
Prevention of S-AKI is difficult because by the time patients seek medical attention, most have already developed acute kidney injury. Thus, early recognition is crucial \
to provide supportive treatment and limit further insults. Current diagnostic criteria for acute kidney injury has limited early detection; however, novel biomarkers \
of kidney stress and damage have been recently validated for risk prediction and early diagnosis of acute kidney injury in the setting of sepsis. Recent evidence shows \
that microvascular dysfunction, inflammation, and metabolic reprogramming are 3 fundamental mechanisms that may play a role in the development of S-AKI. \
However, more mechanistic studies are needed to better understand the convoluted pathophysiology of S-AKI and to translate these findings into potential treatment strategies \
and add to the promising pharmacologic approaches being developed and tested in clinical trials.
"""

In [None]:
# Set the context
context = f"""
You are a Healthcare AI Assistant helping to summarize biomedical articles related to Acute Kidney Injury.
"""

In [None]:
prompt = f"""
Your task is to generate a short summary of a medical article \
abstract from PubMed.
1 - Summarize the abstract below in at most 20 words, focusing on AKI prediction and treatment options.
2 - List the major topics discussed in the article. Classify the topics into:
    a. disease, b. symptom, c. procedure, d. medication.
article:{abstract}
context: {context}
"""

In [None]:
temperature = 0.2  # Low temperature allows only a little creativity
assistant_response = get_completion(prompt, completions_model, temperature, 1024)
print(assistant_response)

1 - Sepsis-associated acute kidney injury is a frequent complication of the critically ill patient. Novel biomarkers and therapeutic strategies are being developed to prevent and treat S-AKI.
2 - Topics:
    a. Disease: Acute kidney injury, sepsis
    b. Symptom: Acute kidney injury
    c. Procedure: Biomarkers
    d. Medication: None
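
The prompt asks for a summary of at most 20 words. A quick check such as the one below can flag a response that runs over that limit. It is a sketch that counts whitespace-separated words, which is only one possible definition of a word, and `within_word_limit` is a hypothetical helper name.

```python
def within_word_limit(summary, limit=20):
    """Return True if the summary has at most `limit` whitespace-separated words."""
    return len(summary.split()) <= limit


# Item 1 of the response above.
summary = (
    "Sepsis-associated acute kidney injury is a frequent complication of the "
    "critically ill patient. Novel biomarkers and therapeutic strategies are "
    "being developed to prevent and treat S-AKI."
)
print(len(summary.split()), within_word_limit(summary))
```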


# Use Case #4: Summarize a Patient's Medical Record

In [None]:
clinical_text = f"""
Plan:
CKD (Serology): due to likely hyperfiltration syndrome (patient drinking 2 gallons of water),- improved creatinine
-I warned the patient about hyperfiltration syndrome.  The patient is drinking way too much liquid, and may be causing worsening of his renal function, because of hyperfiltration syndrome.
- he needs to limit his fluid intake to 2L, a bit more if he exercises and sweats
- his blood pressure and blood sugar are decently controlled now

Proteinuria:   well controlled
- protein restriction discussed
- on ACE inhibitor or angiotensin receptor blocker: yes

Blood Pressure:  well controlled
- low salt diet
- current treatment plan is effective, no change in therapy Diabetes/[Synop]: stable Continue current plan. Review blood glucose monitoring. Discuss risks of poor blood glucose control. Review carbohydrate-controlled diet.

Preventative: HMM  Care Gap SS  Check labs now Follow up with me in 3 months
Cc to Medical assistant
=============================================================  SUBJECTIVE

John Smith is a male with diabetes mellitus 2, hypertension, bph, hyperlipidemia, aaa, high bmi, CAD/CABG, AF on coumadin, copd, anemia, osa, mdd, gerd, moderate chronic renal failure for month(s) who comes to see me for a follow up visit.  Patient says that he drinks about 2 gallons a day. Because his mouth is really dry.

Lab Results  Component
Value
Date  Creatinine
2.25 (H)
04/11/2023  Creatinine
2.08 (H)
04/02/2023  Creatinine
2.34 (H)
04/01/2023   ROS: + frequency - urgency + dysuria - hematuria - skin changes/rash + joint pains; takes tylenol  - sinus problems - epistaxis - cough with blood + stone history; many years ago x 4, last 1969 + urinary hesitancy + nocturia: 2 times a nigth  + leg edema - little in the left leg  - NSAID use    Reviewed: medical history with no changes 5/12/2023, social history with no changes  5/12/2023 and family history with no changes 5/12/2023      PHYSICAL EXAM
     BP Readings from Last 3 Encounters:
05/12/23  103/55
04/07/23  101/54
04/02/23  132/72
     Pulse Readings from Last 3 Encounters:
05/12/23  80
04/07/23  69
04/02/23  72
     Wt Readings from Last 3 Encounters:
05/12/23  116.9 kg (257 lb 11.2 oz)
04/07/23  115.7 kg (255 lb)
04/02/23  123.8 kg (273 lb)
     BMI Readings from Last 3 Encounters:
05/12/23  39.18 kg/m²
04/07/23  38.77 kg/m²
04/02/23  41.51 kg/m²
  General appearance - oriented to person, place, and time Chest - clear Heart - S1 and S2 normal Abdomen - soft Extremities - pedal edema: 0 +     RESULTS
Data Reviewed:  Reviewed lab results: Renal:       Lab Results
Component  Value  Date
   Estimated Glomerular Filtration Rate  29 (L)  04/11/2023
   Estimated Glomerular Filtration Rate  24 (L)  03/31/2023
   Estimated Glomerular Filtration Rate  35 (L)  11/23/2022
         Lab Results
Component  Value  Date
   Creatinine  2.25 (H)  04/11/2023
   Creatinine  2.08 (H)  04/02/2023
   Creatinine  2.34 (H)  04/01/2023
   Creatinine  2.63 (H)  03/31/2023
   Creatinine  1.94 (H)  11/23/2022
         Lab Results
Component  Value  Date
   BUN  29 (H)  04/02/2023
      Anemia:       Lab Results
Component  Value  Date
   Hgb  11.5 (L)  04/02/2023
   Hgb  11.5 (L)  04/01/2023
   Hematocrit  37.1 (L)  04/02/2023
   Hematocrit  36.3 (L)  04/01/2023
   MCV  90  04/02/2023
   MCV  89  04/01/2023
   Transferrin % saturation  37  11/23/2022
   Transferrin % saturation  7 (L)  11/22/2016
    Bone:       Lab Results
Component  Value  Date
   Calcium  8.7 (L)  03/31/2023
         Lab Results
Component  Value  Date
   Phosphorus  3.5  03/31/2023
   No results found for: PTHINTACT No results found for: VITD25    Potassium:       Lab Results
Component  Value  Date
   Potassium  5.0  04/11/2023
   Potassium  4.5  04/02/2023
   Potassium  4.2  04/01/2023
    Protein:      Lab Results
"""

In [None]:
# Set the context
context = f"""
You are a Healthcare AI Assistant helping to summarize the patient's situation from their medical history.
"""

In [None]:
prompt = f"""
Your task is to generate a short summary of the patient's medical record.
medical record: {clinical_text}
context: {context}
"""

In [None]:
temperature = 0.2  # Low temperature allows only a little creativity
assistant_response = get_completion(prompt, completions_model, temperature, 1024)
print(assistant_response)

The patient is a 72-year-old male with a history of diabetes mellitus type 2, hypertension, bph, hyperlipidemia, aaa, high bmi, cad/cabg, af on coumadin, copd, anemia, osa, mdd, gerd, moderate chronic renal failure for month(s) who comes to see me for a follow up visit.  Patient says that he drinks about 2 gallons a day. Because his mouth is really dry. 

The patient's blood pressure is well controlled. His blood sugar is also well controlled. His creatinine is elevated, but it is stable. His potassium is also elevated, but it is stable. His hemoglobin is low, but it is stable. His hematocrit is low, but it is stable. His mcv is low, but it is stable. His transferrin % saturation is low, but it is stable. His calcium is normal. His phosphorus is normal. His potassium is normal. His protein is normal.
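
The record lists lab values with `(H)` and `(L)` flags, so statements in a generated summary can be compared against structured rows. The sketch below pulls such rows out with a regular expression. It assumes rows shaped like `name  value (flag)  MM/DD/YYYY` with two or more spaces between columns, as in the tabular part of the record above. `LAB_ROW` and `extract_labs` are hypothetical names, and the sample rows are copied from the record.

```python
import re

LAB_ROW = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z %]*?)"         # component name
    r"[ \t]{2,}([\d.]+)"                     # numeric value
    r"(?: \(([HL])\))?"                      # optional (H) or (L) flag
    r"[ \t]{2,}(\d{2}/\d{2}/\d{4})[ \t]*$",  # result date
    re.MULTILINE,
)


def extract_labs(record):
    """Return one dict per lab row found in the record text."""
    rows = []
    for name, value, flag, date in LAB_ROW.findall(record):
        rows.append(
            {"name": name, "value": float(value), "flag": flag or None, "date": date}
        )
    return rows


sample_rows = """
   Creatinine  2.25 (H)  04/11/2023
   Potassium  5.0  04/11/2023
   Transferrin % saturation  7 (L)  11/22/2016
"""
for row in extract_labs(sample_rows):
    print(row)
```

Rows that are spilled across several lines, as in the first lab table of the record, do not match this pattern and would need separate handling.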


# Evaluate the LLM's Answer Against an "Expert" Human-Generated Answer (Work in Progress)

In [None]:
# This is an evaluation framework. I still have to work with MIMIC data to create the test set.
# Source https://github.com/openai/evals/blob/main/evals/registry/modelgraded/fact.yaml

def eval_with_ideal(test_set, assistant_answer):

    test_input = test_set['note']
    test_output = test_set['entity_list']
    llm_answer = assistant_answer

    system_message = """\
    You are an assistant that evaluates how well the data processing assistant \
    extracts entities by looking at the clinical note that the assistant \
    uses to generate its response.
    """

    user_message = f"""\
You are evaluating a submitted answer to a question based on the context \
that the agent uses to answer the question.
Here is the data:
[BEGIN DATA]
************
[Input]: {test_input}
************
[Expert Output]: {test_output}
************
[Submission]: {llm_answer}
************
[END DATA]

Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
(A) The submitted answer is a subset of the expert answer and is fully consistent with it.
(B) The submitted answer is a superset of the expert answer and is fully consistent with it.
(C) The submitted answer contains all the same details as the expert answer.
(D) There is a disagreement between the submitted answer and the expert answer.
(E) The answers differ, but these differences don't matter from the perspective of factuality.
Answer with a single letter: A, B, C, D, or E.
"""

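    # The text model takes a single prompt string, so join the system and user messages.
    prompt = f"{system_message}\n{user_message}"

    # get_completion expects max_output_tokens; 1024 matches the value used elsewhere in this notebook.
    response = get_completion(prompt, completions_model, temperature=0, max_output_tokens=1024)
    return response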
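
The grader is asked to pick one of the options (A) to (E), so the reply still has to be reduced to a single letter before results can be tallied. A minimal sketch follows, assuming the model either echoes the parenthesised option or replies with a bare letter. `parse_choice` is a hypothetical helper, not part of the evals framework referenced above.

```python
import re

VALID_CHOICES = {"A", "B", "C", "D", "E"}


def parse_choice(grader_response):
    """Return the chosen letter (A-E) from the grader's reply, or None if unclear."""
    # Prefer a parenthesised option such as "(B)", as written in the prompt.
    match = re.search(r"\(([A-E])\)", grader_response)
    if match:
        return match.group(1)
    # Otherwise accept a reply that is just the letter itself.
    stripped = grader_response.strip()
    return stripped if stripped in VALID_CHOICES else None


print(parse_choice("(B) The submitted answer is a superset of the expert answer."))
print(parse_choice(" D "))
print(parse_choice("The answers look similar."))
```

Returning `None` for anything ambiguous keeps unclear replies out of the tally instead of guessing a grade.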