### Code to fetch the synthesized dataset

In [1]:
import json

def get_dataset(dataset_string):
    """Load a JSON dataset from the file path given in dataset_string."""
    with open(dataset_string, 'r') as file:
        data = json.load(file)
    return data

### Approach 1 with different models and prompts

In [None]:
from approach1 import approach1
from llms.llm_interaction import OpenAIClient, GroqClient

# Load the synthesized patient records
patient_records = get_dataset("datasets/patient_records2.json")

gpt4o = OpenAIClient(model="gpt-4o")
llama33 = GroqClient(model="llama-3.3-70b-versatile")

prompt_template = "Given the following patient record, identify if it matches the following semantic regex pattern: {regex} Return either true OR false, nothing else.\n\nPatient Record: {record_text}"

system_prompt = "You are a helpful AI assistant that only answers True or False based on patient data and provided semantic regex matching."
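The `{regex}` and `{record_text}` placeholders in `prompt_template` are presumably filled in per record inside `approach1`, whose source is not shown in this notebook. A minimal sketch of that substitution, assuming plain `str.format`; the helper name `build_prompt`, the pattern, and the record text are made up for illustration:

```python
# Sketch only: approach1's internals are not shown in this notebook.
prompt_template = (
    "Given the following patient record, identify if it matches the following "
    "semantic regex pattern: {regex} Return either true OR false, nothing else."
    "\n\nPatient Record: {record_text}"
)


def build_prompt(template, regex, record_text):
    """Fill the {regex} and {record_text} placeholders of a prompt template."""
    return template.format(regex=regex, record_text=record_text)


# Hypothetical values, for illustration only.
example_prompt = build_prompt(
    prompt_template,
    regex="<age over 65> .* <fatigue>",
    record_text="An eighty-year-old patient reports severe fatigue.",
)
print(example_prompt)
```

Because only the template is parsed by `str.format`, curly braces inside the record text itself would not cause a formatting error.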

#### Approach 1: GPT-4o

In [None]:
pred, true = approach1(patient_records[0:1], gpt4o, prompt_template=prompt_template, system_prompt=system_prompt, verbose=True)

#### Approach 1: llama-3.3 (via Groq)

In [None]:
pred, true = approach1(patient_records[0:1], llama33, prompt_template=prompt_template, system_prompt=system_prompt, verbose=True)
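The returned `pred` and `true` can then be compared directly. The sketch below assumes both are equal-length lists whose entries are booleans or the strings "true"/"false" in any casing (the user prompt asks for `true`/`false`, the system prompt for `True`/`False`). The helper names and sample values are hypothetical and are not results from this notebook.

```python
def to_bool(answer):
    """Normalize an answer such as 'True', ' false.' or True to a bool."""
    if isinstance(answer, bool):
        return answer
    text = str(answer).strip().strip(".").lower()
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"Unexpected answer: {answer!r}")


def accuracy(pred, true):
    """Fraction of predictions that agree with the labels."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    if not pred:
        return 0.0
    hits = sum(to_bool(p) == to_bool(t) for p, t in zip(pred, true))
    return hits / len(pred)


# Made-up values, for illustration only.
print(accuracy(["True", "false", "true "], [True, False, False]))
```

Raising on unexpected answers, rather than silently treating them as `False`, makes it easier to spot a model that ignores the "nothing else" instruction.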

### Approach 2 with different models and prompts

In [4]:
from approach2 import approach2
from llms.llm_interaction import OpenAIClient, GroqClient

# Load the synthesized patient records
patient_records = get_dataset("datasets/patient_records2.json")

gpt4o = OpenAIClient(model="gpt-4o")
llama33 = GroqClient(model="llama-3.3-70b-versatile")

# Instructions for the model to extract the semantic symbols from the patient record
extraction_prompt = "Given the following patient record, extract the following semantic symbols if they exist: {regex}. Return a machine parseable dictionary with <symbol>: extracted text pairs. IMPORTANT: Only return the dictionary, nothing else.\n\nPatient Record: {record_text}"

# General system instructions that align the model's output with the expected dictionary format
system_prompt = "You are a helpful AI assistant that strictly outputs a python dictionary with <symbol>: extracted text pairs, that can be parsed with ast.literal_eval()."

OpenAI client initialized with model: gpt-4o
Groq client initialized with model: llama-3.3-70b-versatile
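The system prompt asks for output that `ast.literal_eval()` can parse. How `approach2` handles the reply is not shown here, so the following is only a sketch of one defensive way to do it: strip an optional Markdown code fence, evaluate the literal, and fall back to an empty dictionary on failure. The function name `parse_symbol_dict` is hypothetical.

```python
import ast

FENCE = "`" * 3  # Markdown code-fence marker


def parse_symbol_dict(response_text):
    """Parse a model reply into a dict; return {} if it is not a dict literal."""
    text = response_text.strip()
    # Models sometimes wrap the dictionary in a Markdown code fence.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines).strip()
    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return {}
    return parsed if isinstance(parsed, dict) else {}
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text that comes back from a model.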


#### Approach 2: GPT-4o

In [None]:
pred, true = approach2(patient_records[0:1], gpt4o, extraction_prompt_template=extraction_prompt, system_prompt=system_prompt, verbose=True)

#### Approach 2: llama-3.3 (via Groq)

In [6]:
pred, true = approach2(patient_records[0:5], llama33, extraction_prompt_template=extraction_prompt, system_prompt=system_prompt, verbose=True)

Processing 5 records...

Patient Record:
ADMISSION DIAGNOSIS
Pancytosis with concerns for chronic myeloproliferative disorder versus myeloproliferative neoplasm was diagnosed in an eighty-year-old female retired school teacher, presenting to the emergency department upon advice of primary care for abnormal laboratory findings discovered during evaluation of fatigue.

HISTORY OF PRESENT ILLNESS
The patient reports several weeks of progressive and severe fatigue and weight loss that made daily tasks difficult, necessitating help with managing the household from family members. The primary care provider initially attributed these findings to aging, poor diet, and deconditioning but pursued laboratory evaluations when symptom severity necessitated an ER evaluation. This led to the discovery of abnormal blood work.

PAST MEDICAL HISTORY
Relevant medical conditions include coronary artery disease managed by medications including carvedilol, a prior appendicitis treated surgically at a young 