In [1]:
from IPython.display import Markdown, display  # used to render responses below
from tqdm import tqdm

from rag_prompt_template import *
from rag_util import *
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"

Initialise RAG pipeline
------

In [2]:
llm = init_llm_service_context(llm_model_name="../llm/Mistral-Small-Instruct-2409", 
                                tokenizer_name="../llm/Mistral-Small-Instruct-2409", 
                                embed_model_name="../llm/embedder/HiT-MiniLM-L12-SnomedCT",
                                context_window=32758,
                                max_new_tokens=2048,
                            )

Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
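
Prompt tokens and generated tokens share one context window, so the settings above leave a fixed budget for the query plus the retrieved KG context. The sketch below only shows that arithmetic. `prompt_token_budget` is a hypothetical helper, not part of `rag_util`, and it assumes the pipeline reserves `max_new_tokens` out of `context_window`.

```python
def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for generation."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


# Values taken from the init_llm_service_context call above.
budget = prompt_token_budget(context_window=32758, max_new_tokens=2048)
print(budget)  # 30710
```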

In [3]:
kg_index = init_kg_storage_context(llm, storage_dir="index/snomed_dataset_nodoc_commandr_minilml6v2")

In [4]:
query_engine = init_rag_pipeline(kg_index, 
                                 similarity_top_k=5, 
                                 graph_store_query_depth=5, 
                                 include_text=False, 
                                 retriever_mode="hybrid", 
                                 verbose=True)
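
The arguments `retriever_mode="hybrid"` and `similarity_top_k=5` suggest that triples are collected both by keyword match and by embedding similarity. The sketch below illustrates that idea without any library. It is not the code inside `init_rag_pipeline`, whose implementation is not shown here; `hybrid_retrieve` is a hypothetical name, and a bag-of-words cosine stands in for the real embedding model.

```python
import math
import re
from collections import Counter


def _tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_retrieve(query, keywords, triples, similarity_top_k=5):
    """Union of keyword hits and the top-k most similar triples (toy version)."""
    lowered = [k.lower() for k in keywords]
    # Keyword hits: subject or object contains one of the extracted keywords.
    keyword_hits = [
        t for t in triples
        if any(k in t[0].lower() or k in t[2].lower() for k in lowered)
    ]
    # Similarity hits: rank every triple against the query (stand-in embedding).
    q_vec = Counter(_tokens(query))
    ranked = sorted(
        triples,
        key=lambda t: _cosine(q_vec, Counter(_tokens(" ".join(t)))),
        reverse=True,
    )
    # Merge both lists, keeping first-seen order and dropping duplicates.
    return list(dict.fromkeys(keyword_hits + ranked[:similarity_top_k]))


triples = [
    ("Viral gastroenteritis (disorder)", "type", "Disorder"),
    ("Bacterial gastroenteritis (disorder)", "type", "Disorder"),
    ("Perforation of appendix (disorder)", "type", "Disorder"),
]
hits = hybrid_retrieve(
    "what is the type of Gastroenteritis caused by influenza?",
    ["Gastroenteritis", "influenza"],
    triples,
    similarity_top_k=1,
)
```

With `similarity_top_k=1`, the appendix triple is left out because it matches no keyword and ranks below the gastroenteritis triples.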

Simple Question-Answer example
------

In [5]:
response = query_engine.query("what is the type of Gastroenteritis caused by influenza?")
display(Markdown(f"<b>{response}</b>"))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Extracted keywords: ['caused', 'type', 'influenza', 'Gastroenteritis']

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


KG context:
The following are knowledge sequence in max depth 5 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...`
('Acute fulminating appendicitis with perforation AND peritonitis (disorder)', 'associated morphology', 'Rupture (morphologic abnormality)')
('Viral gastroenteritis (disorder)', 'associated morphology', 'Inflammatory morphology (morphologic abnormality)')
('Acute fulminating appendicitis with perforation AND peritonitis (disorder)', 'type', 'Disorder')
('Bacterial gastroenteritis (disorder)', 'type', 'Disorder')
('Viral gastroenteritis (disorder)', 'type', 'Disorder')

<b> The type of Gastroenteritis caused by influenza is not explicitly stated in the given context. The context only provides information about the types of Viral gastroenteritis and Bacterial gastroenteritis, but not specifically about Gastroenteritis caused by influenza.</b>

Triple Extraction example
------

In [6]:
text = "We report a case of fulminant hepatic failure associated with didanosine and masquerading as a surgical abdomen and compare the clinical , biologic , histologic , and ultrastructural findings with reports described previously ."

snomed_prompt = f"""\
Here is the context: {text}.\

Task: Extract the SNOMED CT triples from the given context with the format of (concept 1 ; relation ; concept 2).\

Here is the optional relation list: [temporally follows, after, due to, has realization, associated with, has definitional manifestation, 
associated finding, associated aetiologic finding, associated etiologic finding, interprets, associated morphology, causative agent, course, 
finding site, temporally related to, pathological process, direct morphology, is modification of, measures, direct substance, has active ingredient, using, part of].\

The steps are as follows:\
1. extract the concept 1 and concept 2 from the given context sentence, using the retrieved sub-graph.
2. select ONE most likely relation from the list for the extracted concepts.
3. output the triples in the format of (concept 1 ; relation ; concept 2) strictly.\
\

Provide your answer as follows:

Answer:::
Triples: (The extracted triples)\
Answer End:::\

You MUST provide values for 'Triples:' in your answer.\

"""
response = query_engine.query(snomed_prompt)
# display(Markdown(f"<b>{response}</b>"))

print("Results:")
print(extract_triple(str(response), notebook=True).replace(") (", ")\n("))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Extracted keywords: ['didanosine', 'ultrastructural', 'biologic', 'abdomen', 'KEYWORDS', 'surgical', 'associated', 'hepatic', 'histologic', 'clinical', '---------------------\nKEYWORDS: didanosine', 'failure']

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


KG context:
The following are knowledge sequence in max depth 5 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...`
('Intentional X-ray diagnostic contrast media overdose (disorder)', 'causative agent', 'Drug or medicament (substance)')
('Three column classification system burst fracture of vertebra (disorder)', 'type', 'Disorder')
('Closed fracture acetabulum, posterior column (disorder)', 'type', 'Disorder')
('Fibrous obliteration of appendix (disorder)', 'type', 'Disorder')
('Perforation of appendix (disorder)', 'type', 'Disorder')
Results:
- (Fulminant hepatic failure ; associated with ; Didanosine)   - (Fulminant hepatic failure ; associated finding ; Surgical abdomen)
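
The model returns triples as free text, so a downstream step usually has to parse them and check each relation against the list given in the prompt. The sketch below shows one way to do that. `parse_triples` and `check_relations` are hypothetical helpers, not the `extract_triple` function from `rag_util`, and the regular expression assumes concept names contain no parentheses or semicolons.

```python
import re

# Relation list copied from the prompt above.
ALLOWED_RELATIONS = {
    "temporally follows", "after", "due to", "has realization",
    "associated with", "has definitional manifestation", "associated finding",
    "associated aetiologic finding", "associated etiologic finding",
    "interprets", "associated morphology", "causative agent", "course",
    "finding site", "temporally related to", "pathological process",
    "direct morphology", "is modification of", "measures",
    "direct substance", "has active ingredient", "using", "part of",
}

# Assumption: no "(", ")" or ";" inside a concept or relation.
TRIPLE_RE = re.compile(r"\(\s*([^;()]+?)\s*;\s*([^;()]+?)\s*;\s*([^;()]+?)\s*\)")


def parse_triples(answer):
    """Return (concept 1, relation, concept 2) tuples found in the answer text."""
    return TRIPLE_RE.findall(answer)


def check_relations(triples):
    """Split triples into those using an allowed relation and the rest."""
    valid = [t for t in triples if t[1].lower() in ALLOWED_RELATIONS]
    rejected = [t for t in triples if t[1].lower() not in ALLOWED_RELATIONS]
    return valid, rejected


raw = (
    "- (Fulminant hepatic failure ; associated with ; Didanosine)   "
    "- (Fulminant hepatic failure ; associated finding ; Surgical abdomen)"
)
triples = parse_triples(raw)
valid, rejected = check_relations(triples)
```

A SNOMED CT fully specified name such as `Viral gastroenteritis (disorder)` would break this pattern, so a stricter parser is needed if the model echoes semantic tags.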


In [8]:
text = "We report a case of fulminant hepatic failure associated with didanosine and masquerading as a surgical abdomen and compare the clinical , biologic , histologic , and ultrastructural findings with reports described previously ."

snomed_description_generation_prompt = f"""\
Here is the context: {text}.\

Here is the optional relation list: [temporally follows, after, due to, has realization, associated with, has definitional manifestation, 
associated finding, associated aetiologic finding, associated etiologic finding, interprets, associated morphology, causative agent, course, 
finding site, temporally related to, pathological process, direct morphology, is modification of, measures, direct substance, has active ingredient, using, part of].\

Task: Generate the SNOMED CT descriptions for the given concept.

The steps are as follows:
1. extract a CONCEPT from the given context sentence, using the retrieved sub-graph.
2. generate an EXPRESSION in human-readable phrase that can describe the CONCEPT.
3. select one most likely relation from the list between the CONCEPT and the EXPRESSION.
4. generate descriptions in the format of (CONCEPT ; relation ; EXPRESSION). Each CONCEPT may have multiple descriptions.
5. repeat the step 1 to step 4.

Provide your answer as follows:

Answer:::
Concept: 
Descriptions: (The generated descriptions)
Answer End:::\

You MUST provide values for 'Concept' and 'Descriptions' in your answer.\

Few-shot examples:
Answer:::
Concept: apnea
Descriptions: (apnea ; interprets ; respiration observable) (apnea ; has interpretation ; absent) (apnea ; finding site ; structure of respiratory system)
Answer End:::

"""

response = query_engine.query(snomed_description_generation_prompt)
# display(Markdown(f"<b>{response}</b>"))

print(f"Results:\n{extract_triple(str(response), notebook=True, split_str1='Answer:::')}")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Extracted keywords: ['didanosine', 'ultrastructural', 'biologic', 'Answer', 'abdomen', 'described\nAnswer End:::', 'described', 'reports', 'KEYWORDS', 'fulminant', 'surgical abdomen', 'surgical', 'hepatic', 'End', 'findings', 'histologic', 'Answer:::\nKEYWORDS: fulminant hepatic failure', 'clinical', 'failure']

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


KG context:
The following are knowledge sequence in max depth 5 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...`
('Intentional X-ray diagnostic contrast media overdose (disorder)', 'causative agent', 'Drug or medicament (substance)')
('Accidental poisoning caused by cholinergic (disorder)', 'causative agent', 'Drug or medicament (substance)')
('Drug-induced hepatic necrosis (disorder)', 'causative agent', 'Drug or medicament (substance)')
('Closed fracture acetabulum, posterior column (disorder)', 'type', 'Disorder')
('Acute mesenteric arterial occlusion (disorder)', 'type', 'Disorder')
Results:
1Concept: fulminant hepatic failureDescriptions: (fulminant hepatic failure ; associated with ; didanosine) (fulminant hepatic failure ; associated with ; surgical abdomen)Answer: 2Concept: didanosineDescriptions: (didanosine ; causative agent ; fulminant hepatic failure)Answer: 3Concept: surgical abdomenDescriptions
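
The result above comes back as one run-together string, with the last block cut off. The sketch below shows how such output could be regrouped into a concept-to-descriptions mapping. `group_descriptions` is a hypothetical helper, not part of `rag_util`, and the patterns assume the `Concept:` / `Descriptions:` labels appear as in the output above and that concept names contain no parentheses or semicolons.

```python
import re

# One "(a ; relation ; b)" triple; assumes no parentheses or semicolons inside.
TRIPLE_RE = re.compile(r"\(\s*([^;()]+?)\s*;\s*([^;()]+?)\s*;\s*([^;()]+?)\s*\)")

# One "Concept: ... Descriptions: ..." block, ending at the next label or at
# the end of the string. The colon after "Descriptions" is optional because
# the last block in the output above is truncated.
BLOCK_RE = re.compile(
    r"Concept:\s*(.*?)\s*Descriptions:?\s*(.*?)(?=Answer:|Concept:|\Z)",
    re.DOTALL,
)


def group_descriptions(flat):
    """Map each concept to the list of description triples that follow it."""
    grouped = {}
    for concept, body in BLOCK_RE.findall(flat):
        grouped.setdefault(concept.strip(), []).extend(TRIPLE_RE.findall(body))
    return grouped


flat = (
    "1Concept: fulminant hepatic failureDescriptions: "
    "(fulminant hepatic failure ; associated with ; didanosine) "
    "(fulminant hepatic failure ; associated with ; surgical abdomen)"
    "Answer: 2Concept: didanosineDescriptions: "
    "(didanosine ; causative agent ; fulminant hepatic failure)"
    "Answer: 3Concept: surgical abdomenDescriptions"
)
grouped = group_descriptions(flat)
for concept, descriptions in grouped.items():
    print(concept, len(descriptions))
```

The truncated third block yields the concept `surgical abdomen` with an empty list, which makes incomplete generations easy to spot.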

Medical Diagnostics example
------

In [12]:
case_vignette = """
40 year old female presenting with chest pain
 Symptom: Worsening chest pain
 • Onset: 2 weeks ago
 • Associated with: Cough, dyspnea, fever
 • Complicated by: Fatigue
 Social history
 • Recent construction in Ohio
 Physical exam
 • Lungs: Wheezing
 Diagnostic: X-ray
 • Interpretation: Normal
"""

medical_diagnosis_prompt = """
Case vignette: {case_vignette}

According to the given case vignette, provide only the most probable differential diagnosis, no explanation, no recapitulation of the case information or task. 
Give a maximum of 5 answers, sorted by probability of being the correct diagnosis, most probable first, remove list numbering, 
and respond with each answer on a new line. Be as concise as possible, no need to be polite.

Provide your answer as follows:

Answer:::
Diagnosis: (the 5 most probable diagnoses, most probable first)
Answer End:::\

You MUST provide values for 'Diagnosis' in your answer.\
"""

response = query_engine.query(medical_diagnosis_prompt.format(case_vignette=case_vignette))
# display(Markdown(f"<b>{response}</b>"))
print(f"Results:\n{extract_triple(str(response), notebook=True, split_str1='Answer:::')}")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Extracted keywords: ['fever', 'x', 'chest', 'worsening', 'ray', 'fatigue', 'wheezing', 'cough', 'normal x-ray', 'normal', 'construction', 'Ohio', 'pain', 'dyspnea', 'chest pain']

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


KG context:
The following are knowledge sequence in max depth 5 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...`
('Postobstructive pneumonia (disorder)', 'due to', 'Respiratory obstruction (disorder)')
('Transient respiratory distress with sepsis (disorder)', 'type', 'Disorder')
('Aspiration pneumonia due to near drowning (disorder)', 'type', 'Disorder')
('Postoperative aspiration pneumonia (disorder)', 'type', 'Disorder')
('Pneumonia due to measles (disorder)', 'type', 'Disorder')
Results:
Diagnosis: Viral pneumoniaDiagnosis: Aspiration pneumoniaDiagnosis: BronchitisDiagnosis: AsthmaDiagnosis: Pneumonia due to measles
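
The diagnoses above are printed on a single line, each prefixed with `Diagnosis:`. The sketch below shows how they could be split into a ranked list. `parse_diagnoses` is a hypothetical helper, not part of `rag_util`, and it assumes the label text is exactly `Diagnosis:`.

```python
def parse_diagnoses(flat, max_items=5):
    """Split a run-together 'Diagnosis: ...' string into a ranked list."""
    seen = set()
    ranked = []
    for part in flat.split("Diagnosis:"):
        name = part.strip()
        # Skip empty fragments and case-insensitive repeats, keep the order.
        if name and name.lower() not in seen:
            seen.add(name.lower())
            ranked.append(name)
    return ranked[:max_items]


flat = (
    "Diagnosis: Viral pneumoniaDiagnosis: Aspiration pneumonia"
    "Diagnosis: BronchitisDiagnosis: AsthmaDiagnosis: Pneumonia due to measles"
)
diagnoses = parse_diagnoses(flat)
for rank, name in enumerate(diagnoses, start=1):
    print(rank, name)
```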
