## Demo: Leveraging Large Language Models (LLMs) to Validate Medical Claims with PubMed Research

This notebook demonstrates a workflow for validating medical claims using Large Language Models (LLMs) alongside scientific evidence sourced from PubMed. The steps are:

1. Claim Definition: Define a medical claim, provide an edge from a knowledge graph, or propose a hypothesis that you want to verify.

2. Evidence Retrieval: Use Milvus to search for and retrieve relevant sentences from PubMed articles.

3. Claim Verification: Apply LLMs to assess the accuracy of the medical claim (defined in step 1) based on the retrieved evidence (from step 2).

4. Result Analysis: Present the responses returned by the LLMs, and use summary statistics to interpret them.

##### After launching the Milvus container in Docker, wait until at least **1** node is ready to load the collections.

In [23]:
from pymilvus import connections
from pymilvus import utility

connections.connect(
  alias="default",
  uri="http://localhost:19530",
  token="root:Milvus",
)
info = utility.describe_resource_group(name='__default_resource_group')
num_available_node = info.num_available_node
print(f"Node availability: {num_available_node}")

Node availability: 1


##### If you see at least "1" printed from the cell above, you may continue running the following cells. Otherwise, wait a moment and rerun the cell above until at least "1" appears.
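
Rather than rerunning the cell by hand, the check can be wrapped in a small polling loop. The sketch below is one possible way to do that. `wait_for_nodes` and its parameters are hypothetical names, not part of pymilvus, and the node count is supplied through a callable so the loop stays independent of the connection details.

```python
import time


def wait_for_nodes(get_node_count, min_nodes=1, timeout_s=120.0, interval_s=5.0):
    """Poll get_node_count() until it reports at least min_nodes.

    Hypothetical helper (not part of pymilvus). Returns the last observed
    count, or raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        count = get_node_count()
        if count >= min_nodes:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} node(s) available after {timeout_s} s")
        time.sleep(interval_s)


# Usage with the connection opened above (assumes `utility` is imported):
# wait_for_nodes(
#     lambda: utility.describe_resource_group(
#         name='__default_resource_group').num_available_node
# )
```

The timeout and polling interval are arbitrary defaults; adjust them to how long the container usually takes to start on your machine.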

In [36]:
from pymilvus import MilvusClient
import time
import requests
import json
from utils import extract_non_think, generate_claim, embed_sentence

##### The following cell defines:

1. The medical claim or edge you'd like to verify.
2. The Milvus collection name (pubmed_sentence_XX, where XX ranges from 00 to 09) to search for supporting evidence.
3. The Large Language Model(s) you'd like to use for claim verification.

In [41]:
# provide an edge in the following format
edge = {'subject': 'ginger',
        'object': 'nausea',
        'predicate': 'Biolink:treats'}
claim = generate_claim(edge)

# or provide a claim which is a sentence
# claim = 'ginger treats nausea'

# vectorize the claim for semantic search conducted in the following step
claim_vector = embed_sentence(claim)

which_collection = 'pubmed_sentence_00'
LLMs = ['phi4', 'gemma3:4b', 'deepseek-r1:8b', 'llama3.1:8b', 'mistral:7b']
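
The implementation of `generate_claim` lives in `utils` and is not shown in this notebook. As a rough illustration of how an edge could be turned into a sentence, here is a minimal sketch. `edge_to_claim` is a hypothetical stand-in, and the assumption that the predicate is a prefixed, underscore-separated Biolink name is ours.

```python
def edge_to_claim(edge):
    """Turn a subject/predicate/object edge into a plain-English claim.

    Hypothetical stand-in for utils.generate_claim, whose actual
    implementation is not shown in this notebook.
    """
    # Drop the namespace prefix ("Biolink:treats" -> "treats").
    verb = edge['predicate'].split(':')[-1]
    # Assumption: multi-word predicates use underscores between words.
    verb = verb.replace('_', ' ')
    return f"{edge['subject']} {verb} {edge['object']}"


print(edge_to_claim({'subject': 'ginger',
                     'object': 'nausea',
                     'predicate': 'Biolink:treats'}))
# prints: ginger treats nausea
```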

##### Connect to the milvus-standalone container and load the collection named by `which_collection` above. (This can take about 1 minute.)

In [43]:
client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

client.load_collection(
    collection_name=which_collection,
    # replica_number=1
)

##### Perform a semantic search using your claim to retrieve relevant sentences from the Milvus collection with `client.search()`.

In [47]:
# semantic search
res = client.search(
    collection_name=which_collection,  # target collection
    data=claim_vector,  # query vectors
    limit=30,  # number of returned entities
    search_params={
        "params": {
            "radius": 0.75,
            "range_filter": 1.0
        }
    },
    output_fields=["sentence", "pmid"],  # specifies fields to be returned
)
pmids = set([i['entity']['pmid'] for i in res[0]])
context = [i['entity']['sentence'] for i in res[0]]
print(f"{len(context)} relevant sentences were retrieved from a subset of PubMed.")

15 relevant sentences were retrieved from a subset of PubMed.
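
The search asks for up to 30 hits, yet only 15 come back, because the `radius` and `range_filter` parameters discard hits outside a score band. The sketch below mimics that filter in plain Python. It assumes the collection uses a similarity metric such as cosine or inner product (the metric is not stated in this notebook), for which Milvus keeps hits with `radius < score <= range_filter`. `in_range` is a hypothetical helper and the scores are made-up illustration values, not results from the collection.

```python
def in_range(scores, radius=0.75, range_filter=1.0):
    """Keep scores in the half-open band (radius, range_filter]."""
    return [s for s in scores if radius < s <= range_filter]


# Toy similarity scores for illustration only.
toy_scores = [0.92, 0.81, 0.75, 0.60]
print(in_range(toy_scores))  # prints: [0.92, 0.81]
```

Raising `radius` makes the evidence stricter but may return fewer sentences; lowering it does the opposite.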


##### Then, generate a prompt using these retrieved sentences.

In [49]:
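# join outside the f-string: a backslash inside an f-string expression needs Python 3.12+
context_text = "\n".join(context)
prompt = f"""Claim: {claim}
Context:
{context_text}
Question: Does the context support the claim? ***Just return Yes or No.***
"""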
print(f"The prompt used for the LLM queries:\n{prompt}")

The prompt used for the LLM queries:
Claim: ginger treats nausea
Context:
Ginger has been used to treat numerous types of nausea and vomiting.
It is concluded that the efficiency of ginger in reducing nausea and vomiting may be based on a weak inhibitory effect of gingerols and shogaols at M (3) and 5-HT (3) receptors.
Ginger has also been studied for its efficacy for acute chemotherapy-induced nausea and vomiting (CINV).
Ginger has been used in postoperative and pregnancy-induced nausea and vomiting.
The aim of this study was to determine the effects of ginger in nausea and vomiting of pregnancy.
Ginger is efficacious for nausea and vomiting in pregnancy but is limited in its safety data.
Phase II trial of encapsulated ginger as a treatment for chemotherapy-induced nausea and vomiting.
Scientific studies suggest that ginger (Zingiber officinale) might have beneficial effects on nausea and vomiting associated with motion sickness, surgery, and pregnancy.
Protein and ginger for th

##### Finally, query the LLM(s) you specified earlier, collect their responses, and record the results for statistical analysis.

In [53]:
LLM_url = "http://localhost:11434/api/generate"
headers = {
    "Content-Type": "application/json"
}
results = []
responses = []
for LLM in LLMs:
    data = {
        "model": LLM,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(LLM_url, headers=headers, data=json.dumps(data))
    if response.status_code == 200:
        # parse the JSON body without overwriting the request payload in `data`
        reply = json.loads(response.text)
        actual_response = reply['response'].strip()
        # print(actual_response)
        # deepseek-r1 output is passed through extract_non_think before parsing
        if LLM == 'deepseek-r1:8b':
            actual_response = extract_non_think(actual_response)
        responses.append(f"{LLM}: {actual_response}")
        # record 1 for an answer starting with "yes", 0 for one starting with "no"
        if actual_response[:3].lower() == 'yes':
            results.append(1)
        elif actual_response[:2].lower() == 'no':
            results.append(0)
        else:
            print(f"Error: not a proper answer from {LLM}", actual_response)
    else:
        print("Error", LLM, response.status_code, response.text)

# fraction of parsable answers that were "Yes"; NaN if no model gave a parsable answer
score = sum(results) / len(results) if results else float('nan')
# join outside the f-string: nested quotes and backslashes there need Python 3.12+
joined_responses = "\n".join(responses)
print(f"{len(results)} LLMs were queried and returned responses:\n{joined_responses}.\nThe confidence score for this edge being correct is {score},\nbased on evidence from PMIDs {pmids}")


5 LLMs were queried and returned responses:
phi4: Yes. The context provides multiple references to studies and trials that indicate ginger is used to treat various types of nausea and vomiting, including chemotherapy-induced nausea, postoperative nausea, motion sickness, pregnancy-related nausea, and more. This supports the claim that ginger treats nausea.
gemma3:4b: Yes
deepseek-r1:8b: Yes.
llama3.1:8b: Yes
mistral:7b: Yes.
The confidence score for this edge being correct is 1.0,
based on evidence from PMIDs {20842754, 21305447, 20041096, 18403946, 18632524, 20436140, 20193490, 19250006, 19005687, 22060218, 18537470}
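
The prefix checks in the loop above accept any answer that starts with "no", including "Not enough evidence". A stricter parse is sketched below as one possible alternative. All three helpers are hypothetical: `strip_think` is only a stand-in for `extract_non_think` (whose implementation is not shown) and assumes the reasoning is wrapped in `<think>...</think>` tags.

```python
import re


def strip_think(text):
    """Remove <think>...</think> blocks (hypothetical stand-in for
    utils.extract_non_think, whose implementation is not shown)."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def parse_verdict(text):
    """Return 1 for a leading 'yes', 0 for a leading 'no', else None."""
    # \W* skips leading punctuation such as "**"; \b rejects "Not", "Nothing".
    match = re.match(r"\W*(yes|no)\b", text.strip(), flags=re.IGNORECASE)
    if match is None:
        return None
    return 1 if match.group(1).lower() == "yes" else 0


def confidence_score(verdicts):
    """Fraction of parsable verdicts equal to 1; None if there are none."""
    valid = [v for v in verdicts if v is not None]
    return sum(valid) / len(valid) if valid else None


print(parse_verdict(strip_think("<think>weighing the context</think>\nYes.")))
# prints: 1
```

Returning `None` for unparsable answers keeps them out of the score without raising, which mirrors how the loop above skips improper answers.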


##### (Optional: Release the collection to optimize RAM usage.)

In [55]:
client.release_collection(
    collection_name=which_collection
)

res = client.get_load_state(
    collection_name=which_collection
)

print(res)

{'state': <LoadState: NotLoad>}
