## Hospital Resource Management - Entity Extraction

**Course**: Knowledge Graphs with Large Language Models  
**Program**: MSc in AI and Data Science, 2025-2026  
**Instructor**: Panos Alexopoulos

---

## Overview

This notebook implements an LLM-based entity extraction system for populating a Hospital Resource Management knowledge graph.

**Target Entities:**
1. **Equipment** - Medical devices and equipment
2. **Department** - Hospital departments and clinical units

**Tasks:**
- Task 1: Entity Extractor Development
- Task 2: Extractor Evaluation - Precision & Recall
- Task 3: LLM-as-a-Judge Evaluator

## Setup

In [15]:
!pip install openai python-dotenv pandas scikit-learn -q

In [16]:
import os
import json
import pandas as pd
from typing import List, Dict, Tuple
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
print("✓ Setup complete")

✓ Setup complete


## Task 1: Entity Extractor

In [17]:
def create_extraction_prompt(text: str) -> str:
    """Create few-shot prompt for entity extraction"""
    return f"""You are an expert in medical entity extraction for knowledge graph population.

Extract two types of entities:
1. **Equipment**: Medical devices, diagnostic machines, surgical equipment
2. **Department**: Hospital departments and clinical units

Rules:
- Only extract terms that explicitly appear in the text
- Extract exact phrases as they appear
- Include variations (e.g., "MRI", "MRI machine")
- Return distinct entities only

Example 1:
Text: "The Emergency Department acquired a CT scanner and ventilators. Radiology will operate the CT scanner."
Output:
{{
  "Equipment": ["CT scanner", "ventilators"],
  "Department": ["Emergency Department", "Radiology"]
}}

Example 2:
Text: "Cardiology has an advanced MRI machine for cardiac imaging. The ICU needs access to this MRI."
Output:
{{
  "Equipment": ["MRI machine", "MRI"],
  "Department": ["Cardiology", "ICU"]
}}

Example 3:
Text: "Surgical robots are being deployed. The Neurology department requested access to the robotic surgery system."
Output:
{{
  "Equipment": ["Surgical robots", "robotic surgery system"],
  "Department": ["Neurology"]
}}

Now extract entities from:
Text: "{text}"

Output (JSON only):
"""

def extract_entities(text: str, model: str = "gpt-4o") -> Dict[str, List[str]]:
    """Extract Equipment and Department entities using GPT"""
    prompt = create_extraction_prompt(text)
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a medical entity extraction expert. Respond with valid JSON only."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.0,
            response_format={"type": "json_object"}
        )
        
        # The model is asked for a JSON object; parse it into a dict.
        result = json.loads(response.choices[0].message.content)
        
        # Guarantee both keys exist and deduplicate while preserving order.
        for entity_type in ("Equipment", "Department"):
            result[entity_type] = list(dict.fromkeys(result.get(entity_type) or []))
        
        return result
    except Exception as e:
        # Any API or parsing failure yields empty lists, so downstream code
        # cannot tell a failed call apart from "no entities found".
        print(f"Error: {e}")
        return {"Equipment": [], "Department": []}

print("✓ Extractor functions defined")

✓ Extractor functions defined
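
The prompt asks the model to extract only terms that explicitly appear in the text, but nothing in `extract_entities` checks that it did. A minimal sketch of such a check follows. `keep_verbatim_mentions` is a hypothetical helper, not part of the original notebook, and case-insensitive substring matching is an assumption about what "appears in the text" should mean.

```python
from typing import Dict, List


def keep_verbatim_mentions(text: str, entities: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop extracted mentions that do not occur in the source text.

    Hypothetical helper: matching is case-insensitive substring matching,
    which is an assumption, not the notebook's own rule.
    """
    haystack = text.lower()
    return {
        entity_type: [
            mention for mention in mentions
            if mention.strip() and mention.strip().lower() in haystack
        ]
        for entity_type, mentions in entities.items()
    }


# Made-up example: "MRI machine" does not occur in the text, so it is dropped.
example_text = "The ICU requested access to portable ultrasound devices."
example_output = {
    "Equipment": ["portable ultrasound devices", "MRI machine"],
    "Department": ["ICU"],
}
filtered = keep_verbatim_mentions(example_text, example_output)
print(filtered)
```

Substring matching is deliberately loose: it would accept "MRI" inside "MRI machine". A stricter variant could match on word boundaries instead.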


### Test Extractor

In [18]:
test_text = """The Emergency Department received three new ventilators and an advanced CT scanner. 
The Cardiology department is coordinating with Radiology to share the new MRI machine. 
The ICU requested access to portable ultrasound devices."""

result = extract_entities(test_text)

print("INPUT:")
print(test_text)
print("\nEXTRACTED:")
print(f"Equipment ({len(result['Equipment'])}): {result['Equipment']}")
print(f"Department ({len(result['Department'])}): {result['Department']}")

INPUT:
The Emergency Department received three new ventilators and an advanced CT scanner. 
The Cardiology department is coordinating with Radiology to share the new MRI machine. 
The ICU requested access to portable ultrasound devices.

EXTRACTED:
Equipment (4): ['ventilators', 'CT scanner', 'MRI machine', 'portable ultrasound devices']
Department (4): ['Emergency Department', 'Cardiology', 'Radiology', 'ICU']


## Task 2: Evaluation

In [19]:
def load_evaluation_dataset(filepath: str = "evaluation_dataset.json") -> List[Dict]:
    """Load manually annotated evaluation dataset"""
    with open(filepath, 'r', encoding='utf-8') as f:
        dataset = json.load(f)
    print(f"✓ Loaded {len(dataset)} texts")
    return dataset

def normalize_entity(entity: str) -> str:
    """Normalize entity for comparison"""
    return entity.lower().strip()

def calculate_metrics(predicted: List[str], ground_truth: List[str]) -> Tuple[float, float, float]:
    """Calculate Precision, Recall, F1"""
    pred_set = set(normalize_entity(e) for e in predicted)
    truth_set = set(normalize_entity(e) for e in ground_truth)
    
    tp = len(pred_set & truth_set)
    fp = len(pred_set - truth_set)
    fn = len(truth_set - pred_set)
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
    
    return precision, recall, f1

def evaluate_extractor(dataset: List[Dict]) -> pd.DataFrame:
    """Evaluate extractor on full dataset"""
    results = []
    
    for item in dataset:
        text_id = item['id']
        text = item['text']
        ground_truth = item['ground_truth']
        
        print(f"Processing text {text_id}...")
        predicted = extract_entities(text)
        
        for entity_type in ['Equipment', 'Department']:
            pred = predicted.get(entity_type, [])
            truth = ground_truth.get(entity_type, [])
            
            precision, recall, f1 = calculate_metrics(pred, truth)
            
            results.append({
                'text_id': text_id,
                'entity_type': entity_type,
                'precision': precision,
                'recall': recall,
                'f1_score': f1,
                'predicted': len(pred),
                'ground_truth': len(truth)
            })
    
    return pd.DataFrame(results)

print("✓ Evaluation functions defined")

✓ Evaluation functions defined
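
The metrics use exact matching after lowercasing and stripping, so a prediction such as "Cardiology department" does not match a ground-truth "Cardiology". The sketch below works through that case on made-up inputs. `exact_match_prf` is a hypothetical name; it repeats the logic of `calculate_metrics` only so that the cell runs on its own.

```python
from typing import List, Tuple


def exact_match_prf(predicted: List[str], ground_truth: List[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 with exact matching on normalized strings."""
    pred = {e.lower().strip() for e in predicted}
    truth = {e.lower().strip() for e in ground_truth}
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Made-up inputs: only "ICU" matches exactly, so tp = 1.
p, r, f = exact_match_prf(
    ["Cardiology department", "ICU"],
    ["Cardiology", "ICU", "Radiology"],
)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")

# With no predictions at all, precision is defined as 0.0 rather than left undefined.
empty = exact_match_prf([], ["ICU"])
print(empty)
```

The second call shows a design choice worth keeping in mind: a text with zero predictions scores 0.0 on all three metrics, which pulls down any average taken over texts.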


### Run Evaluation

In [20]:
dataset = load_evaluation_dataset("evaluation_dataset.json")
results = evaluate_extractor(dataset)

print("\n" + "="*60)
print("EVALUATION RESULTS")
print("="*60)

print("\nOverall Performance:")
print(f"  Precision: {results['precision'].mean():.3f}")
print(f"  Recall:    {results['recall'].mean():.3f}")
print(f"  F1 Score:  {results['f1_score'].mean():.3f}")

print("\nPer Entity Type:")
for entity_type in ['Equipment', 'Department']:
    subset = results[results['entity_type'] == entity_type]
    print(f"\n{entity_type}:")
    print(f"  Precision: {subset['precision'].mean():.3f}")
    print(f"  Recall:    {subset['recall'].mean():.3f}")
    print(f"  F1 Score:  {subset['f1_score'].mean():.3f}")

results.to_csv("evaluation_results.csv", index=False)
print("\n✓ Results saved to evaluation_results.csv")

✓ Loaded 12 texts
Processing text 1...
Processing text 2...
Processing text 3...
Processing text 4...
Processing text 5...
Processing text 6...
Processing text 7...
Processing text 8...
Processing text 9...
Processing text 10...
Processing text 11...
Processing text 12...

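============================================================
EVALUATION RESULTS
============================================================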

Overall Performance:
  Precision: 0.745
  Recall:    0.767
  F1 Score:  0.750

Per Entity Type:

Equipment:
  Precision: 0.903
  Recall:    0.924
  F1 Score:  0.902

Department:
  Precision: 0.587
  Recall:    0.611
  F1 Score:  0.597

✓ Results saved to evaluation_results.csv
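
The "Overall" and per-type figures above are means of per-text scores (macro-averaging), because the code calls `.mean()` on the per-row precision, recall and F1 columns. An alternative is micro-averaging, which pools true positives, false positives and false negatives across all texts before computing the metrics. The sketch below illustrates the difference on made-up counts. The notebook does not record `tp`, `fp` and `fn` per row, so those inputs are an assumption, and `micro_average` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def micro_average(counts: List[Tuple[int, int, int]]) -> Dict[str, float]:
    """Pool (tp, fp, fn) triples across rows, then compute the metrics once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts for two rows: one good row and one row with no predictions.
toy_counts = [(9, 1, 0), (0, 0, 5)]
micro = micro_average(toy_counts)

# Macro precision for the same rows: mean of 9/10 and 0.0 (no predictions).
macro_precision = (9 / 10 + 0.0) / 2

print(micro)
print(f"macro precision = {macro_precision:.3f}")
```

On these toy counts the micro-averaged precision is 0.9 while the macro-averaged precision is 0.45, because the empty row counts as a full zero in the macro mean but contributes no false positives to the pooled counts.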


### Detailed Results

In [21]:
print("\nDetailed Results by Text:")
display(results)


Detailed Results by Text:


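|  | text_id | entity_type | precision | recall | f1_score | predicted | ground_truth |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Equipment | 1.0 | 0.666667 | 0.8 | 2 | 3 |
| 1 | 1 | Department | 0.714286 | 1.0 | 0.833333 | 7 | 5 |
| 2 | 2 | Equipment | 0.8 | 1.0 | 0.888889 | 5 | 4 |
| 3 | 2 | Department | 1.0 | 1.0 | 1.0 | 1 | 1 |
| 4 | 3 | Equipment | 0.6 | 1.0 | 0.75 | 10 | 6 |
| 5 | 3 | Department | 0.333333 | 0.333333 | 0.333333 | 3 | 3 |
| 6 | 4 | Equipment | 1.0 | 0.666667 | 0.8 | 2 | 3 |
| 7 | 4 | Department | 0.0 | 0.0 | 0.0 | 0 | 7 |
| 8 | 5 | Equipment | 0.888889 | 1.0 | 0.941176 | 9 | 8 |
| 9 | 5 | Department | 0.0 | 0.0 | 0.0 | 0 | 5 |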


## Task 3: LLM-as-a-Judge

In [22]:
def create_judge_prompt(text: str, extracted: Dict[str, List[str]], entity_type: str) -> str:
    """Create prompt for LLM judge"""
    entities = extracted.get(entity_type, [])
    
    descriptions = {
        "Equipment": "medical devices, diagnostic machines, surgical equipment",
        "Department": "hospital departments, clinical units, organizational divisions"
    }
    
    return f"""You are an expert evaluator for medical entity extraction.

Judge if extracted entities are correct. An entity is CORRECT if:
1. It appears in the original text
2. It belongs to the specified entity type
3. It is a valid instance of that type

Entity Type: {entity_type} ({descriptions[entity_type]})

Original Text:
"{text}"

Extracted {entity_type}:
{json.dumps(entities, indent=2)}

For each entity, judge:
- CORRECT: appears in text and correctly classified
- INCORRECT: hallucinated or wrongly classified
- PARTIAL: partially correct

Output (JSON):
{{
  "evaluations": [
    {{"entity": "...", "judgment": "CORRECT|INCORRECT|PARTIAL", "reasoning": "..."}}
  ],
  "summary": {{
    "correct": <number>,
    "incorrect": <number>,
    "partial": <number>
  }}
}}
"""

def llm_judge(text: str, extracted: Dict[str, List[str]], entity_type: str, model: str = "gpt-4o") -> Dict:
    """Use LLM as judge to evaluate extraction"""
    prompt = create_judge_prompt(text, extracted, entity_type)
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are an expert evaluator. Respond with valid JSON only."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.0,
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        print(f"Error: {e}")
        return {"evaluations": [], "summary": {"correct": 0, "incorrect": 0, "partial": 0}}

print("✓ LLM judge functions defined")

✓ LLM judge functions defined
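
The judge prompt asks the model to return both per-entity verdicts and a `summary` of counts, and `llm_judge` returns that JSON as-is. Since the model's own tally is not guaranteed to agree with its verdict list, one option is to recompute the counts from `evaluations`. The sketch below shows this on a made-up judge output. `recount_summary` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter
from typing import Dict


def recount_summary(judge_output: Dict) -> Dict[str, int]:
    """Recompute verdict counts from the per-entity evaluations.

    Hypothetical helper: verdicts other than CORRECT, INCORRECT or PARTIAL
    are ignored, which is an assumption about how to treat malformed output.
    """
    counts = Counter(
        str(item.get("judgment", "")).strip().upper()
        for item in judge_output.get("evaluations", [])
    )
    return {
        "correct": counts["CORRECT"],
        "incorrect": counts["INCORRECT"],
        "partial": counts["PARTIAL"],
    }


# Made-up judge output whose summary deliberately disagrees with its verdicts.
example_judge_output = {
    "evaluations": [
        {"entity": "ventilators", "judgment": "CORRECT", "reasoning": "..."},
        {"entity": "new equipment", "judgment": "PARTIAL", "reasoning": "..."},
    ],
    "summary": {"correct": 2, "incorrect": 0, "partial": 0},
}
recounted = recount_summary(example_judge_output)
print(recounted)
print("summary consistent:", recounted == example_judge_output["summary"])
```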


### Test LLM Judge

In [23]:
sample_text = """The Emergency Department acquired two new ventilators and a CT scanner. 
The Cardiology team will help train staff on the new equipment."""

sample_extraction = {
    "Equipment": ["ventilators", "CT scanner", "new equipment"],
    "Department": ["Emergency Department", "Cardiology"]
}

print("Testing LLM Judge on Equipment:\n")
judge_result = llm_judge(sample_text, sample_extraction, "Equipment")
print(json.dumps(judge_result, indent=2))

Testing LLM Judge on Equipment:

{
  "evaluations": [
    {
      "entity": "ventilators",
      "judgment": "CORRECT",
      "reasoning": "The entity 'ventilators' appears in the original text and is correctly classified as Equipment."
    },
    {
      "entity": "CT scanner",
      "judgment": "CORRECT",
      "reasoning": "The entity 'CT scanner' appears in the original text and is correctly classified as Equipment."
    },
    {
      "entity": "new equipment",
      "judgment": "PARTIAL",
      "reasoning": "The entity 'new equipment' appears in the original text, but it is a general term and not a specific instance of Equipment. It partially matches the context but lacks specificity."
    }
  ],
  "summary": {
    "correct": 2,
    "incorrect": 0,
    "partial": 1
  }
}


### Evaluate Judge on Sample Texts

In [24]:
# Run the LLM judge on fresh extractions for a subset of the dataset
sample_texts = dataset[:3]  # First 3 texts

print("Evaluating LLM Judge on sample texts:\n")

for item in sample_texts:
    text = item['text']
    extracted = extract_entities(text)
    
    print(f"Text {item['id']}:")
    print(f"Source: {item['source']}\n")
    
    for entity_type in ['Equipment', 'Department']:
        result = llm_judge(text, extracted, entity_type)
        summary = result.get('summary', {})
        
        print(f"{entity_type}:")
        print(f"  Correct: {summary.get('correct', 0)}")
        print(f"  Incorrect: {summary.get('incorrect', 0)}")
        print(f"  Partial: {summary.get('partial', 0)}")
    
    print()

Evaluating LLM Judge on sample texts:

Text 1:
Source: UCI Health News - July 2024

Equipment:
  Correct: 2
  Incorrect: 0
  Partial: 0
Department:
  Correct: 6
  Incorrect: 0
  Partial: 1

Text 2:
Source: Northwestern Medical Center Press Release - 2024

Equipment:
  Correct: 4
  Incorrect: 1
  Partial: 0
Department:
  Correct: 1
  Incorrect: 0
  Partial: 0

Text 3:
Source: UNM Hospital News - February 2024

Equipment:
  Correct: 9
  Incorrect: 1
  Partial: 0
Department:
  Correct: 1
  Incorrect: 2
  Partial: 0
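
The judge only sees the extracted entities, never the ground truth, so its verdicts can estimate precision but not recall. A minimal sketch of turning a judge summary into a precision estimate follows. `judge_precision` and its `partial_weight` parameter are hypothetical and not part of the original notebook. How much credit a PARTIAL verdict should earn is an assumption left to the caller.

```python
from typing import Dict


def judge_precision(summary: Dict[str, int], partial_weight: float = 0.0) -> float:
    """Estimate precision from judge verdict counts.

    partial_weight is the credit given to each PARTIAL verdict
    (0.0 = counted as wrong, 1.0 = counted as right); a hypothetical knob.
    """
    correct = summary.get("correct", 0)
    incorrect = summary.get("incorrect", 0)
    partial = summary.get("partial", 0)
    total = correct + incorrect + partial
    if total == 0:
        return 0.0
    return (correct + partial_weight * partial) / total


# Counts taken from the judge output above.
text3_equipment = judge_precision({"correct": 9, "incorrect": 1, "partial": 0})
text1_department = judge_precision(
    {"correct": 6, "incorrect": 0, "partial": 1}, partial_weight=0.5
)
print(f"Text 3 Equipment:  {text3_equipment:.3f}")
print(f"Text 1 Department: {text1_department:.3f}")
```

For text 3 Equipment the judge-based estimate is 0.9, whereas the annotation-based precision in the detailed results is 0.6. The judge ran on a fresh extraction call, so the comparison is only indicative, but gaps like this suggest checking judge verdicts against the manual annotations before relying on them.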



## Conclusion

This notebook implements:
1. **Entity Extractor**: LLM-based extraction using GPT-4o with few-shot prompting
2. **Evaluation**: Precision/Recall/F1 metrics on 12 manually annotated real-world texts
3. **LLM Judge**: Automated evaluation system for extraction quality assessment

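On the 12-text evaluation set, the few-shot extractor reaches a macro-averaged F1 of 0.750. Equipment extraction is strong (precision 0.903, recall 0.924, F1 0.902), while Department extraction is considerably weaker (F1 0.597); in the detailed results, texts 4 and 5 yield no Department predictions at all. Few-shot prompting is therefore effective for Equipment, but Department extraction needs further work.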