In [1]:
import os, sys
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../src")))

from tqdm import tqdm
from rag_prompt_template import *
from rag_util import *
from rag_moduler import *
from rag_extraction import *
from format_enforcer_wrapper import *
import json

In [2]:
abstract_samples = [
    "Traditional, trans-cervical thyroidectomy results in the presence of a neck scar, which has been shown to correlate with lower quality of life and lower patient satisfaction. Transoral thyroid surgery (TOTS) has been utilized as an alternative approach to avoid a cutaneous incision and scar by accessing the neck and thyroid through the oral cavity. This study was designed to evaluate patient preference through health-state utility scores for TOTS as compared to conventional trans-cervical thyroidectomy.",

    "Cardiac sarcoidosis is an inflammatory myocardial disease of unknown etiology. It is characterized by the deposition of non-caseating granulomas that may involve any part of the heart. Cardiac sarcoidosis is often under-diagnosed or recognized partly due to the heterogeneous clinical presentation of the disease. The three most frequent clinical manifestations of cardiac sarcoidosis are atrioventricular block, ventricular arrhythmias, and heart failure. A definitive diagnosis of cardiac sarcoidosis can be made with histology findings from an endomyocardial biopsy. However, the diagnosis in the majority of cases is based on findings from the clinical presentation and advanced imaging due to the low sensitivity of endomyocardial biopsy. The Heart Rhythm Society (HRS) 2014 expert consensus statement and the Japanese Ministry of Health and Welfare criteria are the two most commonly used diagnostic criteria sets. This review article summarizes the available evidence on cardiac sarcoidosis, focusing on the diagnostic criteria and stepwise approach to its management.",

    "Tracheal resection and anastomosis surgery is a safe operation and is used to treat various benign and malignant diseases of the trachea. However, tracheal stenosis is among the main anastomotic complications following this procedure. Surgeons use both the continuous and the interrupted suture techniques for tracheal anastomosis, but contradicting results in each technique's complications have been reported in various studies. In this study, we aimed to compare the outcome of these two different suture techniques and a relevant literature review.",

    "Airborne aerosol transmission, an established mechanism of SARS-CoV-2 spread, has been successfully mitigated in the health care setting through the adoption of universal masking. Upper airway endoscopy, however, requires direct access to the face, thereby potentially exposing the clinic environment to infectious particles. This study quantifies aerosol production during rigid nasal endoscopy (RNE) and RNE with debridement (RNED) as compared with intubation, a posited gold standard aerosol-generating procedure.",

    "Adenoid cystic carcinoma (ACC) is the most common malignant neoplasm involving the lacrimal glands, with high rates of recurrence and metastasis. During the pregnancy, reports of recurrence of ACC of the salivary glands and trachea have previously been published, but no lacrimal gland ACC recurrence has been reported. We present a 35-year-old woman with lacrimal gland ACC who was initially treated by surgical resection and adjunctive radiotherapy, but her cancer recurred during pregnancy, with rapid progression to cavernous sinuses and brain. Estrogen and progesterone receptors have been detected on lacrimal glands and ACCs of salivary glands. Thus, hormonal changes during pregnancy might contribute to the recurrence of ACC. However, the inherent invasive and recurrent nature of ACC could also account for the regrowth in this patient and further molecular studies can provide more accurate explanations."
]

In [3]:
# using_llm = "ds-r1-qwen"
using_llm = "mistralsmall"
# using_llm = "mistral-ft-multitask"
using_embed = "hitsnomed"
task = "infoextraction"
eval_dataset = "multitask"
using_extractor = "None"
using_generator = "None"
using_parser = "nuparser"
from datetime import datetime
date = datetime.now().strftime("%Y%m%d_%H%M%S")

PARAMETERS = {
    "llm_model_name": LLM[using_llm],
    "tokenizer_name": LLM["mistralsmall"],
    "embed_model_name": EMBED_MODEL[using_embed],
    "storage_dir": "../index/snomed_dataset_nodoc_commandr_hitsnomed",  # partial KG index, for testing
    # "storage_dir": "index/snomed_all_dataset_nodoc_hitsnomed",  # full KG index, for testing
    "input_text_dir": "../data/humandx_data/humandx_findings.json",
    "context_window": 32768,
    "max_new_tokens": 512,
    "case_num": 50,
    "verbose": True,
    "similarity_top_k": 30,
    "graph_store_query_depth": 2,
    "retriever_mode": "hybrid",
    "test_id": f"{date}_test_{task}_{using_llm}_{eval_dataset}_extractor_{using_extractor}",
}

In [4]:
llm = init_llm_service_context(
    llm_model_name=PARAMETERS["llm_model_name"],
    tokenizer_name=PARAMETERS["tokenizer_name"],
    embed_model_name=PARAMETERS["embed_model_name"],
    context_window=PARAMETERS["context_window"],
    max_new_tokens=PARAMETERS["max_new_tokens"],
    # quantization_config=None,
)




In [5]:
kg_index = init_kg_storage_context(llm, storage_dir=PARAMETERS["storage_dir"])

In [6]:
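# Retrieval settings come from PARAMETERS so the run configuration lives in one place.
query_engine = init_rag_pipeline(
    kg_index,
    similarity_top_k=PARAMETERS["similarity_top_k"],
    graph_store_query_depth=PARAMETERS["graph_store_query_depth"],
    include_text=False,
    retriever_mode=PARAMETERS["retriever_mode"],
    verbose=False,
)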

In [7]:
triple_extraction_prompt = """
Task: Extract SNOMED CT knowledge triples from the given medical text. 

Context: The input text is an excerpt from a medical document (e.g., research paper abstract, clinical note).

Extract triples in the format (entity; predicate; entity), where both entities are medical concepts and the predicate describes their semantic relationship.

Available predicates: [temporally follows, after, due to, has realization, associated with, has definitional manifestation, associated finding, associated aetiologic finding, interprets, associated morphology, causative agent, course, finding site, temporally related to, pathological process, direct morphology, is modification of, measures, direct substance, has active ingredient, using, part of, type].

This is the input: {text}

"""
# Build the hybrid triple engine once, from the same query engine and LLM, and reuse it for every abstract.
my_query_engine = create_hybrid_triple_engine(query_engine, llm)

for abstract in tqdm(abstract_samples[:1]):
    prompt = triple_extraction_prompt.format(text=abstract)
    print(f"Processing abstract: {abstract}")

    # 1) Plain RAG query: the answer is free text (a numbered list of triples).
    response = query_engine.query(prompt)
    print("Response: {}".format(response.response.replace("\n", " ")))
    print("========================================")

    # 2) Hybrid triple engine: yields objects with .subject, .predicate and .object.
    triples = my_query_engine(prompt)
    triple_list = [f"({t.subject}; {t.predicate}; {t.object})" for t in triples]
    print(f"extracted triples: {triple_list}")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Processing abstract: Traditional, trans-cervical thyroidectomy results in the presence of a neck scar, which has been shown to correlate with lower quality of life and lower patient satisfaction. Transoral thyroid surgery (TOTS) has been utilized as an alternative approach to avoid a cutaneous incision and scar by accessing the neck and thyroid through the oral cavity. This study was designed to evaluate patient preference through health-state utility scores for TOTS as compared to conventional trans-cervical thyroidectomy.




Response: 1. (Transoral thyroid surgery (TOTS); is modification of; Traditional, trans-cervical thyroidectomy) 2. (Transoral thyroid surgery (TOTS); associated finding; Presence of neck scar) 3. (Presence of neck scar; associated finding; Lower quality of life) 4. (Presence of neck scar; associated finding; Lower patient satisfaction) 5. (Trans-cervical thyroidectomy; associated finding; Presence of neck scar) 6. (Health-state utility scores; type; Method of measuring preference) 7. (Transoral thyroid surgery (TOTS); temporally follows; Traditional, trans-cervical thyroidectomy)


100%|██████████| 1/1 [01:07<00:00, 67.20s/it]

extracted triples: ['(Traditional trans-cervical thyroidectomy; has definitional manifestation; Neck scar)', '(Traditional trans-cervical thyroidectomy; associated finding; Lower quality of life)', '(Traditional trans-cervical thyroidectomy; associated finding; Lower patient satisfaction)', '(Transoral thyroid surgery; course; Through oral cavity)', '(Transoral thyroid surgery; is modification of; Traditional trans-cervical thyroidectomy)']
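The raw `query_engine.query` answer above is one numbered string, while `create_hybrid_triple_engine` returns triple objects, so the two runs are hard to compare by eye. The following is a minimal sketch, not part of `rag_extraction`: `Triple`, `parse_triple`, `parse_numbered_response`, `normalize` and `triple_key` are hypothetical helpers, and the code assumes the raw answer always uses the `N. (subject; predicate; object)` layout seen here. It rejects triples whose predicate is not in the prompt's predicate list and compares the two outputs after a deliberately crude normalization (lowercase, commas removed, parenthetical abbreviations such as "(TOTS)" dropped), which is my own choice rather than anything the pipeline does.

```python
import re
from typing import NamedTuple, Optional

# Copied from the "Available predicates" list in triple_extraction_prompt.
ALLOWED_PREDICATES = {
    "temporally follows", "after", "due to", "has realization", "associated with",
    "has definitional manifestation", "associated finding",
    "associated aetiologic finding", "interprets", "associated morphology",
    "causative agent", "course", "finding site", "temporally related to",
    "pathological process", "direct morphology", "is modification of", "measures",
    "direct substance", "has active ingredient", "using", "part of", "type",
}


class Triple(NamedTuple):
    """Hypothetical container mirroring the .subject/.predicate/.object fields used above."""
    subject: str
    predicate: str
    object: str


def parse_triple(chunk: str) -> Optional[Triple]:
    """Parse one '(subject; predicate; object)' string; None if malformed or off-list."""
    chunk = chunk.strip()
    if not (chunk.startswith("(") and chunk.endswith(")")):
        return None
    parts = [p.strip() for p in chunk[1:-1].split(";")]
    if len(parts) != 3 or parts[1].lower() not in ALLOWED_PREDICATES:
        return None
    return Triple(*parts)


def parse_numbered_response(text: str) -> list[Optional[Triple]]:
    """Split '1. (a; p; b) 2. (c; q; d) ...' into one parse result per item."""
    # Split only before "N. (" markers: entities can carry their own parentheses,
    # e.g. "Transoral thyroid surgery (TOTS)", so matching up to the first ")" fails.
    chunks = re.split(r"\s*\d+\.\s+(?=\()", text.strip())
    return [parse_triple(c) for c in chunks if c.strip()]


def normalize(entity: str) -> str:
    """Crude key: drop parentheticals like "(TOTS)" and commas, lowercase, squash spaces."""
    entity = re.sub(r"\s*\([^)]*\)", "", entity)
    entity = entity.replace(",", " ")
    return " ".join(entity.lower().split())


def triple_key(t: Triple) -> tuple[str, str, str]:
    """Comparison key for a triple: normalized entities, lowercased predicate."""
    return (normalize(t.subject), t.predicate.lower(), normalize(t.object))


if __name__ == "__main__":
    raw = ("1. (Transoral thyroid surgery (TOTS); is modification of; "
           "Traditional, trans-cervical thyroidectomy) "
           "2. (Health-state utility scores; type; Method of measuring preference)")
    hybrid = ["(Transoral thyroid surgery; is modification of; "
              "Traditional trans-cervical thyroidectomy)"]
    first = {triple_key(t) for t in parse_numbered_response(raw) if t}
    second = {triple_key(t) for t in map(parse_triple, hybrid) if t}
    print(first & second)
    # {('transoral thyroid surgery', 'is modification of', 'traditional trans-cervical thyroidectomy')}
```

Run on the full outputs above, every triple from both runs passes the predicate check, and the two sets share only the `is modification of` triple; every other triple differs in subject or predicate. Because `normalize` strips any parenthetical, not just abbreviations, treat the overlap as a rough agreement signal rather than an evaluation metric.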



