** note: you need to restart the kernel for changes made to the modules (approach1 and approach2) to take effect
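A lighter alternative to a full restart is to reload the edited module in place. A minimal sketch with the standard library's `importlib.reload`; the throwaway module `demo_mod` is hypothetical and only stands in for something like `approaches.approach1`:

```python
import importlib
import os
import sys
import tempfile

# Skip writing .pyc files so this demo always recompiles from source.
sys.dont_write_bytecode = True

# Hypothetical throwaway module standing in for approaches/approach1.py.
tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, "demo_mod.py")
sys.path.insert(0, tmp_dir)

with open(module_path, "w") as f:
    f.write("LABEL = 'original'\n")

import demo_mod

before = demo_mod.LABEL

# Simulate editing the module on disk...
with open(module_path, "w") as f:
    f.write("LABEL = 'edited version'\n")

# ...then reload it instead of restarting the kernel.
demo_mod = importlib.reload(demo_mod)
after = demo_mod.LABEL

print(before, "->", after)
```

Names pulled in with `from module import name` still point at the old objects after a reload, so that import line has to be run again. In IPython/Jupyter, `%load_ext autoreload` followed by `%autoreload 2` automates the reload step.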

In [1]:
import json

def get_dataset(dataset_string):
    """Load a JSON dataset from the given file path."""
    with open(dataset_string, 'r') as file:
        data = json.load(file)
    return data

### sample code to test the different approaches with different models and different iterations of the synthetic dataset

In [2]:

from approaches.approach1 import approach1
from approaches.approach2 import approach2
from llms.llm_interaction import GroqClient

patient_records2 = get_dataset('datasets/patient_records2.json')
llm_client = GroqClient(model="llama-3.3-70b-versatile")

# results = approach1(patient_records2[0:5], llm_client)
results = approach2(patient_records2[0:5], llm_client)

Groq client initialized with model: llama-3.3-70b-versatile


Processing records:   0%|          | 0/5 [00:00<?, ?record/s]

### playing around with prompt templates and different models

In [1]:
from approaches.approach1 import approach1
from llms.llm_interaction import OpenAIClient, AnthropicClient, GroqClient

gpt4o = OpenAIClient(model="gpt-4o")
sonnet = AnthropicClient(model="claude-3-5-sonnet-20240620")
versatile = GroqClient(model="llama-3.3-70b-versatile")

prompt_template = "Check if this semantic regex - {regex} - exists in the following patient record - {record_text}."
system_prompt = "You are a helpful AI assistant that only answers True or False based on patient data and provided semantic regex matching."

results = approach1(patient_records2[0:5], gpt4o, prompt_template=prompt_template, system_prompt=system_prompt)
print("="*100)
results = approach1(patient_records2[0:5], sonnet, prompt_template=prompt_template, system_prompt=system_prompt)
print("="*100)
results = approach1(patient_records2[0:5], versatile, prompt_template=prompt_template, system_prompt=system_prompt)


ValueError: OPENAI_API_KEY not found in environment variables
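The error above means the OpenAI key was not set in the environment when the client was created. A small sketch of a pre-flight check that reports which clients cannot be built before any of them is constructed. Only `OPENAI_API_KEY` comes from the error message; the other two variable names and the helper `missing_api_keys` are assumptions:

```python
import os

# Environment variable names are assumptions, except OPENAI_API_KEY,
# which appears in the error message above.
REQUIRED_KEYS = {
    "OpenAIClient": "OPENAI_API_KEY",
    "AnthropicClient": "ANTHROPIC_API_KEY",
    "GroqClient": "GROQ_API_KEY",
}


def missing_api_keys(env=None):
    """Return the client names whose API key is unset or empty."""
    env = os.environ if env is None else env
    return [name for name, var in REQUIRED_KEYS.items() if not env.get(var)]


missing = missing_api_keys()
if missing:
    print("No API key found for:", ", ".join(missing))
```

Running this at the top of the notebook makes it easy to skip the models that have no key instead of failing partway through a cell.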

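The `prompt_template` in the cell above has two placeholders, `{regex}` and `{record_text}`, and the system prompt asks for a bare True/False reply. How `approach1` fills the template and reads the reply is not shown here. A rough sketch of one way to do it with `str.format`, where `build_prompt`, `parse_bool_answer`, and the sample regex and record are all made up for illustration:

```python
# Template copied from the cell above; the regex and record below are made-up examples.
prompt_template = (
    "Check if this semantic regex - {regex} - exists in the following "
    "patient record - {record_text}."
)


def build_prompt(template, regex, record_text):
    """Fill the two placeholders of the template with str.format."""
    return template.format(regex=regex, record_text=record_text)


def parse_bool_answer(raw):
    """Map a 'True'/'False' style reply to a bool; return None if unclear."""
    answer = raw.strip().strip(".\"'").lower()
    if answer == "true":
        return True
    if answer == "false":
        return False
    return None


prompt = build_prompt(
    prompt_template,
    regex="<fever>.*<cough>",  # hypothetical semantic regex
    record_text="Patient reports fever and cough",
)
print(prompt)
```

Returning `None` for anything that is not clearly True or False keeps malformed replies from being silently counted as a negative prediction.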
## naive approach 1 (no semantic regex, just a natural-language query)

In [3]:
from approaches.approach1 import approach1_naive
from llms.llm_interaction import OpenAIClient, GroqClient

gpt4o = OpenAIClient(model="gpt-4o")
versatile = GroqClient(model="llama-3.3-70b-versatile")

patient_records2_nl_query = get_dataset('datasets/patient_records2_nl_query.json')

results = approach1_naive(patient_records2_nl_query[0:5], gpt4o)
# pred, true = approach1_naive(patient_records2[0:5], sonnet)
results = approach1_naive(patient_records2_nl_query[0:5], versatile)

OpenAI client initialized with model: gpt-4o
Groq client initialized with model: llama-3.3-70b-versatile


Processing records: 100%|██████████| 5/5 [00:03<00:00,  1.43record/s]


accuracy of model over 5 generated records: 1.0
precision of model over 5 generated records: 1.0
recall of model over 5 generated records: 1.0
f1 of model over 5 generated records: 1.0


Processing records: 100%|██████████| 5/5 [00:01<00:00,  2.64record/s]

accuracy of model over 5 generated records: 1.0
precision of model over 5 generated records: 1.0
recall of model over 5 generated records: 1.0
f1 of model over 5 generated records: 1.0
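The four numbers printed above are the usual binary-classification metrics. How the notebook's modules compute them is not shown, so this is only a sketch of the standard definitions; `binary_metrics` and the sample labels are hypothetical:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for five records, not taken from the dataset.
y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The zero-denominator guards matter on small slices like `[0:5]`, where a model may predict no positives at all.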





In [None]:
from approaches.approach2 import approach2
from llms.llm_interaction import OpenAIClient, GroqClient

patient_records2 = get_dataset('datasets/patient_records2.json')
llm_client = GroqClient(model="llama-3.3-70b-versatile")


Groq client initialized with model: llama-3.3-70b-versatile
Processing 1 records...

Patient Record:
A 62-year-old former coal miner was admitted with shortness of breath, which initially presented as mild exertional dyspnea but progressively worsened over several weeks. The admission diagnosis was acute exacerbation of chronic obstructive pulmonary disease, compounded by pulmonary fibrosis likely secondary to long-standing occupational exposure to dust.

The history of present illness began approximately three months prior when the patient first started noticing breathing difficulties during his daily walks. These symptoms gradually worsened, despite his efforts to manage them with over-the-counter medication. Recently, he had a fever and productive cough, further increasing the severity of his condition.

Past medical history was notable for chronic obstructive pulmonary disease, hypertension, hyperlipidemia, and lung cancer, for which he had undergone chemotherapy two years prior. No

In [13]:
system_prompt = """
        You are a helpful AI assistant that strictly outputs a python list of tuples, where each tuple is (<semantic_symbol>, explanation) in the order they appear in the patient record.
        The output should be parseable with ast.literal_eval().
        """
prompt = """
            Given the following patient record, extract the following semantic symbols if they exist: {regex}. 
            Return a machine parseable python list of tuples, where each tuple is (<semantic_symbol>, explanation) in the order they appear in the patient record. Only include semantic symbols that are explicitly represented in the patient record. 
            IMPORTANT: Only return the list, nothing else.
            Make sure the order of the list reflects the order in which the semantic symbols appear in the patient record, not the order in which they are listed in the regex.
            The explanation should be a brief description of where/how the symbol appears in the text.
            \n\nPatient Record: {record_text}
            """
results = approach2(patient_records2[3:4], llm_client, verbose=True, order_sensitive=True, system_prompt=system_prompt, extraction_prompt_template=prompt)
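The system prompt above asks for a reply that `ast.literal_eval()` can parse into a list of `(<semantic_symbol>, explanation)` tuples. What `approach2` does with the reply internally is not shown. A minimal sketch of a defensive parser, assuming the model may wrap the list in a Markdown code fence; `parse_extraction` and the sample reply are hypothetical:

```python
import ast

FENCE = "`" * 3  # Markdown code-fence marker


def parse_extraction(raw):
    """Parse an LLM reply into a list of (symbol, explanation) tuples.

    Raises ValueError if the reply is not a list of 2-tuples of strings.
    """
    text = raw.strip()
    # Assumption: some models wrap the list in a code fence despite the prompt.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]         # drop the closing fence line
        text = "\n".join(lines).strip()

    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Reply is not a Python literal: {exc}") from exc

    if not isinstance(parsed, list):
        raise ValueError("Expected a list of tuples")
    for item in parsed:
        is_pair = isinstance(item, tuple) and len(item) == 2
        if not (is_pair and all(isinstance(x, str) for x in item)):
            raise ValueError(f"Malformed entry: {item!r}")
    return parsed


# Made-up reply for illustration only.
reply = '[("<fever>", "fever mentioned in the history"), ("<cough>", "productive cough")]'
extracted = parse_extraction(reply)
symbols_in_order = [symbol for symbol, _ in extracted]
print(symbols_in_order)
```

`ast.literal_eval` only evaluates literals, so unlike `eval` it will not run arbitrary code returned by the model. The list order is preserved, which is what an order-sensitive comparison would need.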

Processing 1 records...

Patient Record:
ADMISSION DIAGNOSIS
The patient is a 62-year-old machinist admitted with acute kidney injury and severe anemia, likely due to complications from a recent traumatic injury. The injury occurred when the patient was involved in a car accident on his way home from work.

HISTORY OF PRESENT ILLNESS
The patient reported experiencing sudden onset abdominal pain and vomiting about six hours prior to presentation, prompting his wife to call emergency services. On route to the hospital, the patient became hypotensive but responded well to fluid resuscitation administered by paramedics. 

PAST MEDICAL HISTORY
The patient has a lengthy medical history including hypertension, heart disease, hyperlipidemia, and a previous transfusion for gastrointestinal bleeding about ten years ago due to an ulcer. He takes atorvastatin and beta blockers for his cardiac condition and is known to occasionally miss taking his atenolol, leading to episodes of poorly controlled 