# MedGemma-Council: Multi-Agent Clinical Decision Support

This notebook demonstrates the MedGemma-Council system -- a "Council of Experts" 
that debates clinical cases via a LangGraph state machine, powered by MedGemma 1.5 models.

## Architecture
```
Ingestion -> Supervisor Route -> Specialists (parallel) -> Safety Check
  -> [red flag] -> Emergency Synthesis -> END
  -> [safe] -> Conflict Check
    -> [conflict] -> Research (PubMed) -> Debate -> Conflict Check (loop, max 3)
    -> [no conflict] -> Synthesis -> Final Plan
```
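
The routing in the diagram can be sketched as a plain transition function. This is a minimal illustration, not the package's actual LangGraph wiring: only `safety_check`, `emergency_synthesis`, and the `red_flag_detected` key appear elsewhere in this notebook, while the other node names and the `conflict_detected` and `debate_round` keys are hypothetical. It also assumes that a conflict still unresolved after the third round falls through to synthesis, which the diagram does not state.

```python
MAX_DEBATE_ROUNDS = 3  # loop cap taken from the diagram


def next_node(current: str, state: dict) -> str:
    """Return the node that follows `current` for the given state (sketch only)."""
    if current == "safety_check":
        # A red flag short-circuits the normal debate flow.
        return "emergency_synthesis" if state.get("red_flag_detected") else "conflict_check"
    if current == "conflict_check":
        unresolved = state.get("conflict_detected", False)      # hypothetical key
        rounds_left = state.get("debate_round", 0) < MAX_DEBATE_ROUNDS  # hypothetical key
        return "research" if unresolved and rounds_left else "synthesis"
    # Every other edge in the diagram is unconditional.
    linear = {
        "ingestion": "supervisor_route",
        "supervisor_route": "specialists",
        "specialists": "safety_check",
        "research": "debate",
        "debate": "conflict_check",
        "emergency_synthesis": "END",
        "synthesis": "final_plan",
    }
    return linear[current]
```

Keeping the two conditional edges in one function makes the loop cap easy to see: the debate branch is only taken while both a conflict exists and the round counter is below the cap.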

## Available Specialists (10 agents)
- **Cardiology** (ACC/AHA), **Oncology** (NCCN), **Pediatrics** (AAP/WHO)
- **Radiology** (vision-based), **Psychiatry** (APA/DSM-5-TR)
- **Emergency Medicine** (ACLS/ATLS), **Dermatology** (AAD)
- **Neurology** (AAN/AHA-ASA), **Endocrinology** (ADA/Endocrine Society)
- **Research** (PubMed/MEDLINE literature retrieval)

## 1. Installation

In [None]:
# Install the medgemma-council package
# On Kaggle, the repo should be uploaded as a dataset or utility script
import os

# If running on Kaggle:
REPO_PATH = "/kaggle/working/medgemma-council"
if not os.path.exists(REPO_PATH):
    # Fallback for local development
    REPO_PATH = "."

# Install dependencies
!pip install -q langgraph langchain-core pydantic biopython datasets chromadb

# Install the package itself
!pip install -q -e {REPO_PATH}

In [None]:
# Verify installation
import os
import sys
sys.path.insert(0, os.path.join(REPO_PATH, "src"))

from graph import CouncilState, build_council_graph
from utils.safety import scan_for_red_flags, redact_pii
from utils.model_factory import ModelFactory
print("Installation verified successfully!")

## 2. Run Tests (Verify Integrity)

In [None]:
# Run the full test suite to verify everything works
# All 367 tests should pass in < 2 seconds (everything is mocked, no GPU needed)
!cd {REPO_PATH} && python -m pytest tests/ -v --tb=short 2>&1 | tail -30

## 3. CLI-First Workflow (Recommended for Kaggle)

The `council_cli` module provides the simplest way to run the council 
programmatically without any web framework.

In [None]:
# Import the CLI module
sys.path.insert(0, REPO_PATH)
from council_cli import run_council_cli, format_result, build_state

### Example 1: Cardiology Case

In [None]:
# Run a cardiology case
result = run_council_cli(
    age=65,
    sex="Male",
    chief_complaint="Chest pain radiating to left arm, onset 2 hours ago",
    history="Hypertension, Type 2 Diabetes, former smoker (quit 5 years ago)",
    medications=["Metformin 1000mg BID", "Lisinopril 20mg daily", "Aspirin 81mg daily"],
    vitals={"bp": "160/95", "hr": 92, "temp": 98.6, "spo2": 96, "rr": 18},
    labs={"troponin": 0.08, "bnp": 450, "creatinine": 1.4, "glucose": 210},
)

# Display formatted results
print(format_result(result, output_format="text"))

### Example 2: Pediatrics Case

In [None]:
result_peds = run_council_cli(
    age=5,
    sex="Female",
    chief_complaint="High fever (103F) for 3 days with cough and poor appetite",
    history="No significant past medical history, vaccinations up to date",
    medications=[],
    vitals={"temp": 103.0, "hr": 130, "rr": 28, "spo2": 94},
    labs={"wbc": 18000, "crp": 45},
)

print(format_result(result_peds, output_format="text"))

### Example 3: Multi-specialty Case (Oncology + Cardiology)

In [None]:
result_multi = run_council_cli(
    age=58,
    sex="Female",
    chief_complaint="Newly diagnosed breast cancer with pre-existing heart failure",
    history="HFrEF (LVEF 35%), NYHA Class II, breast mass found on screening mammography",
    medications=["Carvedilol 25mg BID", "Sacubitril/Valsartan 97/103mg BID", "Spironolactone 25mg"],
    vitals={"bp": "110/70", "hr": 68, "spo2": 97},
    labs={"bnp": 890, "troponin": 0.02, "lvef": 35},
)

print(format_result(result_multi, output_format="text"))

### Example 4: Neurology Case

In [None]:
result_neuro = run_council_cli(
    age=72,
    sex="Male",
    chief_complaint="Sudden onset right-sided weakness and slurred speech, 45 minutes ago",
    history="Atrial fibrillation (not on anticoagulation), Hypertension",
    medications=["Amlodipine 10mg daily"],
    vitals={"bp": "185/110", "hr": 88, "rr": 16, "spo2": 97},
    labs={"inr": 1.0, "glucose": 140, "platelets": 220000},
)

print(format_result(result_neuro, output_format="text"))

### Example 5: Endocrinology Case

In [None]:
result_endo = run_council_cli(
    age=48,
    sex="Female",
    chief_complaint="Uncontrolled diabetes with HbA1c 10.2%, recurrent DKA episodes",
    history="Type 1 Diabetes (20 years), Hashimoto thyroiditis, gastroparesis",
    medications=["Insulin glargine 40u daily", "Insulin lispro sliding scale", "Levothyroxine 100mcg"],
    vitals={"bp": "128/78", "hr": 98, "temp": 98.6, "spo2": 99},
    labs={"hba1c": 10.2, "glucose": 320, "tsh": 4.8, "creatinine": 1.1},
)

print(format_result(result_endo, output_format="text"))

## 4. Working with the Raw State

For more control, build and inspect the `CouncilState` directly.

In [None]:
# Build state manually
state = build_state(
    age=34,
    sex="Male",
    chief_complaint="Severe anxiety, insomnia for 3 months, suicidal ideation",
    history="Major depressive disorder, generalized anxiety disorder",
    medications=["Sertraline 100mg daily"],
)

# Inspect the state structure
print("State keys:", list(state.keys()))
print("Patient context:", state["patient_context"])
print("Red flag detected:", state.get("red_flag_detected", False))
print("Emergency override:", state.get("emergency_override", ""))
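
For orientation, the state shape implied by the keys used in this notebook could be sketched as a pair of `TypedDict` classes. This is an assumption-laden illustration, not the real `CouncilState` from `graph`: the class names and `build_state_sketch` are hypothetical, and the field list is limited to keys that appear in the cells of this notebook.

```python
from typing import Dict, List, Optional, TypedDict


class PatientContextSketch(TypedDict, total=False):
    """Hypothetical stand-in for the patient_context payload."""
    age: int
    sex: str
    chief_complaint: str
    history: str
    medications: List[str]


class CouncilStateSketch(TypedDict, total=False):
    """Hypothetical stand-in for CouncilState; the real class may differ."""
    patient_context: PatientContextSketch
    agent_outputs: Dict[str, str]
    red_flag_detected: bool
    emergency_override: str
    consensus_reached: bool
    final_plan: str


def build_state_sketch(
    age: int,
    sex: str,
    chief_complaint: str,
    history: str = "",
    medications: Optional[List[str]] = None,
) -> CouncilStateSketch:
    """Assemble an initial state with empty outputs and no flags raised."""
    return {
        "patient_context": {
            "age": age,
            "sex": sex,
            "chief_complaint": chief_complaint,
            "history": history,
            "medications": list(medications or []),
        },
        "agent_outputs": {},
        "red_flag_detected": False,
        "emergency_override": "",
        "consensus_reached": False,
        "final_plan": "",
    }
```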

In [None]:
# Check for red flags BEFORE running the council
from utils.safety import scan_for_red_flags

flags = scan_for_red_flags(state["patient_context"]["chief_complaint"])
print("Red flags detected:", flags["flagged"])
if flags["flagged"]:
    print("Flags:", flags["flags"])
    print("Emergency message:", flags["emergency_message"])

## 5. JSON Output (for downstream processing)

In [None]:
import json

# Get JSON output for programmatic use
json_output = format_result(result, output_format="json")
parsed = json.loads(json_output)

print("Final Plan:", parsed["final_plan"][:200], "...")
print("Consensus:", parsed["consensus_reached"])
print("Specialists consulted:", list(parsed["agent_outputs"].keys()))
print("Red flag detected:", parsed.get("red_flag_detected", False))

## 6. Using Medical Images

The RadiologyAgent and DermatologyAgent can process medical images 
using the MedGemma 1.5 4B multimodal model.

In [None]:
# Example with images (paths should point to actual image files)
# On Kaggle, images would be in /kaggle/input/your-dataset/

# result_with_images = run_council_cli(
#     age=72,
#     sex="Male",
#     chief_complaint="Persistent cough, weight loss, hemoptysis",
#     history="40 pack-year smoking history",
#     image_paths=[
#         "/kaggle/input/chest-xrays/current_cxr.png",
#         "/kaggle/input/chest-xrays/prior_cxr_6mo.png",
#     ],
# )
# print(format_result(result_with_images))

print("Image processing requires MedGemma 1.5 4B model on GPU.")
print("Uncomment the code above when running on Kaggle with T4 GPUs.")

## 7. Model Loading (GPU Setup)

On Kaggle with T4 GPUs, load the actual MedGemma models using 
`transformers` + `bitsandbytes` for 4-bit quantization:

In [None]:
# GPU Model Loading (uncomment on Kaggle with T4 GPUs)
#
# import os
# os.environ["MEDGEMMA_USE_REAL_MODELS"] = "true"
#
# from utils.model_factory import ModelFactory
# from utils.quantization import get_model_kwargs, detect_gpu_config
#
# # Check GPU availability
# gpu_config = detect_gpu_config()
# print(f"GPUs detected: {gpu_config}")
#
# # Create factory in real mode
# factory = ModelFactory()
#
# # Load MedGemma-27B with 4-bit NF4 quantization (auto-split across 2xT4)
# # Uses: BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
# # Memory budget: max_memory={0: "14GiB", 1: "14GiB"}
# text_model = factory.create_text_model()
# print("Text model loaded (4-bit NF4, ~13.5 GB across 2xT4)")
#
# # Load MedGemma 1.5 4B vision model (bfloat16 on single T4)
# vision_model = factory.create_vision_model()
# print("Vision model loaded (bfloat16, ~8 GB on single T4)")

print("Model loading requires Kaggle T4 GPU environment.")
print("Set MEDGEMMA_USE_REAL_MODELS=true to enable real model loading.")
print("In mock mode (default), all model calls return placeholder responses.")
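
The memory figures quoted in the comments above follow from simple arithmetic on parameter count and bits per weight. The helper below is a back-of-the-envelope sketch (the function name is hypothetical) that counts weight storage only, assuming decimal gigabytes and ignoring quantization constants, activations, and the KV cache.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    # params * bits / 8 gives bytes; the factor of 1e9 cancels with "billions".
    return params_billions * bits_per_param / 8


# 27B parameters at 4 bits per weight
print(weight_memory_gb(27, 4))
# 4B parameters at 16 bits per weight (bfloat16)
print(weight_memory_gb(4, 16))
```

Because runtime overhead sits on top of these numbers, the real footprint will be somewhat higher than the estimate, which is why the per-GPU `max_memory` budget leaves headroom.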

## 8. Evaluation Harness

Run the council against standard medical QA benchmarks to measure performance.

In [None]:
from evaluation.benchmarks import load_medqa, load_pubmedqa, load_medmcqa, format_medqa_prompt
from evaluation.evaluator import CouncilEvaluator
from evaluation.metrics import compute_accuracy, generate_report

# Preview benchmark data (uses HuggingFace datasets, mocked in tests)
# medqa_items = load_medqa(limit=5)
# print(f"MedQA sample: {medqa_items[0]['question'][:100]}...")
# print(f"Options: {medqa_items[0]['options']}")
# print(f"Answer: {medqa_items[0]['answer']}")

print("Available benchmarks:")
print("  - MedQA (GBaker/MedQA-USMLE-4-options): 1.27k USMLE-style questions")
print("  - PubMedQA (qiaojin/PubMedQA): 1k yes/no/maybe questions")
print("  - MedMCQA (openlifescienceai/medmcqa): 194k with 21 subject tags")
print("")
print("CLI usage:")
print("  python -m evaluation.runner --benchmark medqa --limit 100")
print("  python -m evaluation.runner --benchmark medmcqa --specialty Cardiology")

In [None]:
# Run evaluation programmatically (uncomment when models are loaded)
#
# from graph import build_council_graph
# from evaluation.evaluator import CouncilEvaluator
# from evaluation.benchmarks import load_medqa, format_medqa_prompt
# from evaluation.metrics import compute_accuracy, generate_report
#
# graph = build_council_graph()
# evaluator = CouncilEvaluator(graph=graph)
#
# items = load_medqa(limit=50)
# results = []
# for item in items:
#     prompt = format_medqa_prompt(item)
#     result = evaluator.evaluate_single(item["question"], item["answer"], prompt)
#     results.append(result)
#
# report = generate_report(results, benchmark_name="medqa")
# print(f"Accuracy: {report['accuracy']:.1%}")
# print(f"Total: {report['total']}")

print("Evaluation requires loaded models. Uncomment when running on Kaggle GPUs.")
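
As a rough picture of what multiple-choice scoring involves, the sketch below extracts an option letter from a free-text response and computes accuracy. It is not the package's `compute_accuracy`; the function names, the regular expression, and the assumption that responses contain a phrase like "answer is B" or "Answer: (C)" are all illustrative.

```python
import re
from typing import List, Optional

# Matches "answer is B", "Answer: (C)", "answer B" and similar phrasings.
_ANSWER = re.compile(r"[Aa]nswer\s*(?:is|:)?\s*\(?([A-D])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the option letter named in the response, or None if absent."""
    match = _ANSWER.search(response)
    return match.group(1) if match else None


def accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of predictions equal to the gold label; None counts as wrong."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(pred == label for pred, label in zip(predictions, gold))
    return correct / len(gold)


responses = ["The answer is B.", "Answer: (C)", "I am unsure."]
predictions = [extract_choice(r) for r in responses]
print(predictions, accuracy(predictions, ["B", "C", "A"]))
```

Treating an unparseable response as incorrect, rather than dropping it, keeps the denominator fixed so that runs with different refusal rates stay comparable.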

## 9. PMC-Patients & LLM-as-Judge Evaluation

Evaluate clinical plan quality using PMC-Patients cases and an LLM judge.

In [None]:
from evaluation.pmc_patients import load_pmc_patients, format_pmc_patient_prompt
from evaluation.retrieval_metrics import compute_mrr, compute_ndcg
from evaluation.llm_judge import LLMJudge, generate_judging_prompt

# Preview a judging prompt
sample_prompt = generate_judging_prompt(
    patient_context={"chief_complaint": "chest pain", "age": "65", "sex": "Male"},
    clinical_plan="Admit to CCU. Serial troponins q6h. Start heparin drip. Cardiology consult.",
)
print("=== Sample Judging Prompt ===")
print(sample_prompt)
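
A judge model returns free text, so something has to turn it into the `score` and `rationale` fields used in the next cell. The parser below is a hypothetical sketch of that step: it assumes the judge replies in a "Score: N" / "Rationale: ..." layout, which this notebook does not confirm, and `parse_judge_response` is not a function from the package.

```python
import re


def parse_judge_response(raw: str) -> dict:
    """Pull a 1-5 score and a rationale out of a judge reply (assumed layout)."""
    score_match = re.search(r"score\s*[:=]\s*([1-5])\b", raw, re.IGNORECASE)
    rationale_match = re.search(
        r"rationale\s*[:=]\s*(.+)", raw, re.IGNORECASE | re.DOTALL
    )
    return {
        # None signals that no valid score was found, so callers can retry or skip.
        "score": int(score_match.group(1)) if score_match else None,
        "rationale": rationale_match.group(1).strip() if rationale_match else "",
    }


reply = "Score: 4/5\nRationale: Appropriate workup, but no ECG ordered."
print(parse_judge_response(reply))
```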

In [None]:
# LLM-as-Judge evaluation (uncomment when models are loaded)
#
# judge = LLMJudge(llm=text_model)
#
# # Evaluate a single plan
# score = judge.evaluate_plan(
#     patient_context={"chief_complaint": "chest pain", "age": "65"},
#     clinical_plan="Admit to CCU. Start heparin drip. Cardiology consult.",
# )
# print(f"Score: {score['score']}/5")
# print(f"Rationale: {score['rationale']}")
#
# # Batch evaluation
# cases = [
#     {"patient_context": {"chief_complaint": "chest pain"}, "clinical_plan": "Plan A..."},
#     {"patient_context": {"chief_complaint": "headache"}, "clinical_plan": "Plan B..."},
# ]
# batch_scores = judge.evaluate_batch(cases)
# for i, s in enumerate(batch_scores):
#     print(f"Case {i+1}: {s['score']}/5")

print("LLM-as-Judge requires loaded models. Uncomment when running on Kaggle GPUs.")
print("")
print("Retrieval metrics available:")
print("  compute_mrr(results)    - Mean Reciprocal Rank")
print("  compute_ndcg(results)   - NDCG@k (default k=10)")
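
For readers unfamiliar with the two retrieval metrics, here is a self-contained sketch of both. The functions are illustrative and do not mirror the signatures of `compute_mrr` or `compute_ndcg`; each ranked list is assumed to be a list of relevance grades in rank order, and the ideal ordering for NDCG is taken from the retrieved list itself rather than from the full corpus.

```python
import math
from typing import List


def reciprocal_rank(relevance: List[int]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, grade in enumerate(relevance, start=1):
        if grade > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[List[int]]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(run) for run in runs) / len(runs) if runs else 0.0


def dcg_at_k(relevance: List[int], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(
        grade / math.log2(rank + 1)
        for rank, grade in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance: List[int], k: int = 10) -> float:
    """DCG normalized by the DCG of the same grades sorted best-first."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


runs = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(mean_reciprocal_rank(runs), ndcg_at_k([0, 1]))
```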

## 10. Guideline Ingestion (RAG Pipeline)

Ingest clinical guideline documents into the ChromaDB vector store for 
retrieval-augmented generation by specialist agents.

In [None]:
from tools.ingestion import GuidelineChunker, IngestionPipeline

# Preview the chunker
chunker = GuidelineChunker(chunk_size=512, chunk_overlap=64)
sample_text = (
    "ACC/AHA Guideline for Chest Pain Evaluation: "
    "Patients presenting with acute chest pain should receive an immediate "
    "12-lead ECG within 10 minutes of arrival. Serial troponin measurements "
    "should be obtained at 0 and 3 hours using high-sensitivity assays. "
    "Risk stratification using HEART score is recommended for all patients."
)

chunks = chunker.chunk_text(sample_text, source="acc_aha_chest_pain.pdf")
print(f"Input text: {len(sample_text)} chars")
print(f"Chunks produced: {len(chunks)}")
for chunk in chunks:
    print(f"  Chunk {chunk['chunk_index']}: {len(chunk['text'])} chars, source={chunk['source']}")
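
The effect of `chunk_size` and `chunk_overlap` is easiest to see in a sliding-window sketch. The function below is an illustration only, not the `GuidelineChunker` implementation: it assumes sizes are measured in characters (the real chunker may count tokens or split on sentences) and reuses the `chunk_index`, `text`, and `source` keys seen in the cell above.

```python
from typing import Dict, List


def chunk_text_sketch(
    text: str, source: str, chunk_size: int = 512, chunk_overlap: int = 64
) -> List[Dict]:
    """Split text into fixed-size character windows that overlap their neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(
            {"chunk_index": index, "text": text[start:start + chunk_size], "source": source}
        )
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo = "".join(chr(97 + i % 26) for i in range(1000))
for chunk in chunk_text_sketch(demo, source="demo.txt"):
    print(chunk["chunk_index"], len(chunk["text"]))
```

The overlap means the last `chunk_overlap` characters of one chunk reappear at the start of the next, so a sentence cut by a window boundary is still retrievable in full from one of the two chunks.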

In [None]:
# Ingest guideline documents (uncomment when you have guideline files)
#
# pipeline = IngestionPipeline(
#     persist_directory="data/vector_store/",
#     collection_name="guidelines",
#     chunk_size=512,
#     chunk_overlap=64,
# )
#
# # Ingest from directory (supports .pdf, .txt, .md)
# pipeline.ingest_directory("data/reference_docs/")
# stats = pipeline.get_stats()
# print(f"Ingested: {stats}")
#
# # Or via CLI:
# # !python scripts/ingest_guidelines.py --input-dir data/reference_docs/ --chunk-size 512

print("Guideline ingestion ready. Supports: .pdf, .txt, .md")
print("Place guideline files in data/reference_docs/ then run the pipeline.")
print("")
print("CLI usage:")
print("  python scripts/ingest_guidelines.py --input-dir data/reference_docs/")

## 11. Gradio UI (Alternative to Streamlit)

For interactive use on Kaggle (where Streamlit doesn't work well), 
use the Gradio interface:

In [None]:
# Install Gradio if not already installed
# !pip install -q gradio

# Launch the Gradio interface
# This will create an interactive UI in the notebook output
#
# import subprocess
# subprocess.Popen(["python", os.path.join(REPO_PATH, "app_gradio.py")])

print("To launch Gradio UI:")
print(f"  python {os.path.join(REPO_PATH, 'app_gradio.py')}")
print("Or import and launch directly:")
print("  from app_gradio import create_interface")
print("  demo = create_interface()")
print("  demo.launch(share=True)  # share=True for Kaggle")

## 12. Safety Guardrails Demo

In [None]:
from utils.safety import scan_for_red_flags, redact_pii, add_disclaimer

# Red flag detection
test_cases = [
    "Patient denies suicidal ideation but reports self-harm behavior.",
    "Lactate > 4, MAP < 65, suspected septic shock.",
    "Patient stable, vitals within normal limits.",
    "Acute ischemic stroke, onset 45 minutes ago.",
    "Pulseless ventricular tachycardia, initiating CPR.",
]

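for text in test_cases:
    # Use distinct names so this loop does not overwrite `result` (Example 1)
    # or `flags` (Section 4) from earlier cells
    scan = scan_for_red_flags(text)
    status = "RED FLAG" if scan["flagged"] else "CLEAR"
    flag_names = ", ".join(scan["flags"]) if scan["flags"] else "none"
    print(f"[{status}] {text[:60]}... -> {flag_names}")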

print("")
print("When red flags are detected in the graph, the safety_check node")
print("routes to emergency_synthesis, bypassing normal debate flow.")
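
To make the return shape concrete, here is a minimal keyword-based scanner that produces the same three keys (`flagged`, `flags`, `emergency_message`). It is a sketch under stated assumptions, not the logic inside `scan_for_red_flags`: the category names, phrase lists, and message text are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical categories and trigger phrases, chosen to cover the test cases above.
RED_FLAG_PHRASES: Dict[str, Tuple[str, ...]] = {
    "suicide_risk": ("suicidal ideation", "self-harm"),
    "sepsis": ("septic shock",),
    "stroke": ("ischemic stroke",),
    "cardiac_arrest": ("pulseless",),
}


def scan_sketch(text: str) -> dict:
    """Flag any category whose trigger phrase occurs in the text (case-insensitive)."""
    lowered = text.lower()
    flags: List[str] = [
        name
        for name, phrases in RED_FLAG_PHRASES.items()
        if any(phrase in lowered for phrase in phrases)
    ]
    message = "Red flag detected: escalate to emergency care." if flags else ""
    return {"flagged": bool(flags), "flags": flags, "emergency_message": message}


print(scan_sketch("Pulseless ventricular tachycardia, initiating CPR."))
print(scan_sketch("Patient stable, vitals within normal limits."))
```

A plain substring match ignores negation, so "denies suicidal ideation" would still be flagged. For a safety check that errs on the side of escalation this may be acceptable, but it is a design choice worth making deliberately.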

In [None]:
# PII redaction
text_with_pii = (
    "Patient John Smith, SSN 123-45-6789, phone (555) 123-4567, "
    "email john@hospital.com, MRN: 12345678, presents with chest pain."
)

print("Before redaction:")
print(text_with_pii)
print("\nAfter redaction:")
print(redact_pii(text_with_pii))
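
Pattern-based redaction of the identifiers in the sample text can be sketched with a few regular expressions. This is an illustration, not the body of `redact_pii`: the patterns and placeholder tokens are assumptions, they target US-style formats only, and they do not catch personal names, which generally require a named-entity recognizer rather than a regex.

```python
import re

# (placeholder, pattern) pairs, applied in order. All patterns are illustrative.
PII_PATTERNS = [
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[MRN]", re.compile(r"\bMRN:?\s*\d+\b", re.IGNORECASE)),
]


def redact_sketch(text: str) -> str:
    """Replace each matched identifier with its placeholder token."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = (
    "Patient John Smith, SSN 123-45-6789, phone (555) 123-4567, "
    "email john@hospital.com, MRN: 12345678, presents with chest pain."
)
print(redact_sketch(sample))
```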

---

## Summary

This notebook demonstrated:

1. **Installation** -- `pip install -e .` for the medgemma-council package
2. **CLI workflow** -- `run_council_cli()` for programmatic use
3. **Multiple case types** -- Cardiology, Pediatrics, Multi-specialty, Neurology, Endocrinology
4. **Safety guardrails** -- Red flag detection (with graph-level override), PII redaction, disclaimers
5. **Output formats** -- Text (human-readable) and JSON (machine-readable)
6. **GPU setup** -- `transformers` + `bitsandbytes` 4-bit NF4 quantization with `device_map="auto"`
7. **Evaluation harness** -- MedQA, PubMedQA, MedMCQA benchmarks
8. **PMC-Patients + LLM-as-Judge** -- Clinical plan scoring with retrieval metrics (MRR, NDCG)
9. **Guideline ingestion** -- RAG pipeline for clinical guidelines (PDF/TXT/MD -> ChromaDB)
10. **Gradio UI** -- Interactive web interface for Kaggle

To run with real models on Kaggle T4 GPUs, set `MEDGEMMA_USE_REAL_MODELS=true`
and ensure the MedGemma model weights are available as a dataset.