# MedGemma Inference Test

This notebook tests the MedGemma client with **Ollama backend**.

MedGemma is a collection of **Gemma 3** variants trained for medical text and image comprehension.
The 4B variant is **multimodal** — it can process both text and images.

## Prerequisites

1. Install Ollama: https://ollama.ai
2. Pull the model: `ollama pull MedAIBase/MedGemma1.5:4b-it-q8_0`
3. Make sure the Ollama server is running (it starts automatically, or run `ollama serve`)
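
The prerequisites above can also be checked from Python before running any inference cell. The following is a minimal sketch, not part of the project client: it assumes the default host `http://localhost:11434` and uses Ollama's `GET /api/tags` endpoint, which lists locally available models. The helper names `list_local_models` and `is_model_available` are hypothetical.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # assumed default address, same as used in this notebook


def list_local_models(host: str = OLLAMA_HOST, timeout: float = 5.0) -> list:
    """Return the model names reported by the local Ollama server (GET /api/tags)."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        data = json.load(resp)
    return [m.get("name", "") for m in data.get("models", [])]


def is_model_available(wanted: str, local_models: list) -> bool:
    """True if `wanted` is in `local_models`, treating a missing tag as ':latest'."""
    def normalize(name: str) -> str:
        # Only the last path segment can carry a tag (e.g. 'namespace/model:tag').
        return name if ":" in name.split("/")[-1] else f"{name}:latest"

    return normalize(wanted) in {normalize(m) for m in local_models}
```

With the server running, `is_model_available("MedAIBase/MedGemma1.5:4b-it-q8_0", list_local_models())` should return `True` once the pull in step 2 has finished.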

## Tests
- **Sections 1-8**: Setup and text-only inference (reference ranges, clinical interpretation, JSON extraction)
- **Sections 9-14**: **Image-based inference** (multimodal: prescription, lab report, PDF pages)

In [1]:
import sys
import os

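# Add project root to path.
# __file__ is not defined inside a notebook, so use the parent of the
# current working directory (the notebook is expected to run from a subfolder).
project_root = os.path.dirname(os.getcwd())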
if project_root not in sys.path:
    sys.path.insert(0, project_root)

print(f"Project root: {project_root}")

Project root: /Users/yaokouadio/Projects/STARTUP/universal-medical-ingestion-engine


## 1. Setup Logging

In [2]:
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print("Logging configured")

Logging configured


## 2. Create Client (Ollama Backend)

In [3]:
from src.medical_ingestion.medgemma.client import create_client, get_default_config

# Show default config
default_config = get_default_config('ollama')
print("Default Ollama config:")
for k, v in default_config.items():
    print(f"  {k}: {v}")

Default Ollama config:
  backend: ollama
  max_tokens: 1000
  temperature: 0.1
  use_cache: True
  ollama_host: http://localhost:11434
  ollama_model: MedAIBase/MedGemma1.5:4b-it-q8_0
  timeout: 120


In [4]:
# Create client with Ollama backend
config = {
    'backend': 'ollama',
    'ollama_host': 'http://localhost:11434',
    'ollama_model': 'medgemma-4b-local',
    'max_tokens': 512,
    'temperature': 0.1,
}

client = create_client(config)

print("Client created")
print(f"  Backend: {client.backend_type.value}")
print(f"  Model: {client.model_name}")

2026-02-05 19:53:20,119 - OllamaMedGemmaClient - INFO - Initialized Ollama client: http://localhost:11434 / medgemma-4b-local


Client created
  Backend: ollama
  Model: medgemma-4b-local


## 3. Health Check

In [5]:
# Check if Ollama is running and model is available
health = await client.health_check()

print("Health Check:")
print(f"  Healthy: {health['healthy']}")
print(f"  Backend: {health['backend']}")
print(f"  Model: {health['model']}")
print(f"  Details: {health['details']}")

if not health['healthy']:
    print("\n*** FIX REQUIRED ***")
    print("Run these commands:")
    print("  1. ollama serve  (if not running)")
    print("  2. ollama pull MedAIBase/MedGemma1.5:4b-it-q8_0")

Health Check:
  Healthy: True
  Backend: ollama
  Model: medgemma-4b-local
  Details: Ollama server running and model available


## 4. Test Inference - Simple Query

In [6]:
# Simple medical query
prompt = """What are the normal reference ranges for the following lab values?
- Hemoglobin
- White Blood Cell Count
- Platelet Count

Provide a brief answer."""

print("Prompt:")
print(prompt)
print("\n" + "="*60)
print("Generating response...\n")

result = await client.generate(prompt, max_tokens=256)

print("RESPONSE:")
print("="*60)
print(result['text'])
print("="*60)
print(f"\nPrompt tokens: {result['prompt_tokens']}")
print(f"Generated tokens: {result['generated_tokens']}")
print(f"Inference time: {result['inference_time']:.2f}s")
print(f"Tokens/sec: {result.get('tokens_per_second', 0):.1f}")

Prompt:
What are the normal reference ranges for the following lab values?
- Hemoglobin
- White Blood Cell Count
- Platelet Count

Provide a brief answer.

Generating response...



2026-02-05 19:53:40,990 - OllamaMedGemmaClient - INFO - Generated 185 tokens in 8.65s (31.1 tokens/sec)


RESPONSE:
Here are the normal reference ranges for the lab values you requested:

*   **Hemoglobin:** 12.0 - 16.0 g/dL (men), 12.0 - 16.0 g/dL (women)
*   **White Blood Cell Count:** 4.5 - 11.0 x 10^9/L (4.5 - 11.0 x 10^3/mm^3)
*   **Platelet Count:** 150 - 400 x 10^9/L (150 - 400 x 10^3/mm^3)

**Note:** These ranges can vary slightly depending on the laboratory and the specific testing method used. Always refer to the reference ranges provided by the specific lab that performed the test.

Prompt tokens: 43
Generated tokens: 185
Inference time: 8.65s
Tokens/sec: 31.1


## 5. Test Inference - Lab Result Interpretation

In [7]:
# Lab result interpretation
prompt = """You are a medical assistant AI designed to provide clinical interpretations of lab results.

INSTRUCTIONS:

1. Apply **diagnostic thresholds exactly**. Do NOT downgrade, hedge, or mislabel a diagnosis if a lab value meets established criteria.

   - Fasting glucose ≥126 mg/dL = Diabetes Mellitus
   - HbA1c ≥6.5% = Diabetes Mellitus
   - Total cholesterol ≥200 mg/dL = Hypercholesterolemia
   - LDL ≥130 mg/dL = High LDL
   - HDL <40 mg/dL (male) = Low HDL
   - Triglycerides ≥150 mg/dL = Hypertriglyceridemia

2. First, **determine the relevant diagnoses** based on the labs. Only include diagnoses that are supported by the numbers above.

3. Then, **explain the clinical significance** of these lab abnormalities in clear, concise language, suitable for a clinician.

4. Suggest **general next steps or risk implications**, but do NOT suggest treatments or medications unless explicitly asked.

5. **Output format must be structured as follows**:

Overall Assessment:
- <Brief summary of major diagnoses and risks>

Detailed Interpretation:
- <Lab Name>: <Value> — <Interpretation including normal range and significance>
- ...

Important: Never hedge. If thresholds are met, explicitly state the diagnosis.

---

Now analyze the following lab results for a 45-year-old male patient:

- Glucose (fasting): 142 mg/dL
- HbA1c: 7.2%
- Total Cholesterol: 245 mg/dL
- LDL: 165 mg/dL
- HDL: 38 mg/dL
- Triglycerides: 210 mg/dL
"""

print("Generating clinical interpretation...\n")

result = await client.generate(prompt, max_tokens=512)

print("CLINICAL INTERPRETATION:")
print("="*60)
print(result['text'])
print("="*60)
print(f"\nInference time: {result['inference_time']:.2f}s")
print(f"Tokens/sec: {result.get('tokens_per_second', 0):.1f}")

Generating clinical interpretation...



2026-02-05 19:56:55,538 - OllamaMedGemmaClient - INFO - Generated 136 tokens in 6.65s (31.1 tokens/sec)


CLINICAL INTERPRETATION:
Overall Assessment:
- The patient has diabetes mellitus, hypercholesterolemia, high LDL, low HDL, and hypertriglyceridemia.

Detailed Interpretation:
- Glucose (fasting): 142 mg/dL — Diabetes Mellitus
- HbA1c: 7.2% — Diabetes Mellitus
- Total Cholesterol: 245 mg/dL — Hypercholesterolemia
- LDL: 165 mg/dL — High LDL
- HDL: 38 mg/dL — Low HDL
- Triglycerides: 210 mg/dL — Hypertriglyceridemia

Inference time: 6.65s
Tokens/sec: 31.1
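
Because the prompt fixes explicit thresholds, the model's answer can be checked deterministically. The sketch below re-applies the cut-offs listed in the prompt to the same lab values and reports any diagnosis the response text fails to mention. The function names and dictionary keys are hypothetical and not part of the MedGemma client.

```python
def expected_diagnoses(labs: dict, sex: str = "male") -> list:
    """Apply the thresholds from the prompt to a dict of lab values (mg/dL, HbA1c in %)."""
    findings = []
    if labs.get("glucose_fasting", 0) >= 126 or labs.get("hba1c", 0) >= 6.5:
        findings.append("Diabetes Mellitus")
    if labs.get("total_cholesterol", 0) >= 200:
        findings.append("Hypercholesterolemia")
    if labs.get("ldl", 0) >= 130:
        findings.append("High LDL")
    # The prompt defines the HDL cut-off for male patients only.
    if sex == "male" and "hdl" in labs and labs["hdl"] < 40:
        findings.append("Low HDL")
    if labs.get("triglycerides", 0) >= 150:
        findings.append("Hypertriglyceridemia")
    return findings


def missing_diagnoses(response_text: str, expected: list) -> list:
    """Return expected diagnoses that do not appear (case-insensitively) in the response."""
    lowered = response_text.lower()
    return [d for d in expected if d.lower() not in lowered]


patient = {
    "glucose_fasting": 142,
    "hba1c": 7.2,
    "total_cholesterol": 245,
    "ldl": 165,
    "hdl": 38,
    "triglycerides": 210,
}
print(expected_diagnoses(patient))
```

In the notebook, `missing_diagnoses(result['text'], expected_diagnoses(patient))` returning an empty list would mean the response named every threshold-based diagnosis. This is a plain substring check, so it does not detect negations or extra, unsupported diagnoses.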


## 6. Test JSON Extraction

In [8]:
# Test structured output
prompt = """You are a medical data extraction AI. Your task is to extract lab values from the following report and return **only valid JSON**, strictly matching this structure:

{
  "patient": "name",
  "date": "YYYY-MM-DD",
  "results": [
    {
      "test": "name",
      "value": number,
      "unit": "unit",
      "normal": true/false
    }
  ]
}

Rules:
1. Do NOT include explanations, reasoning, or extra text.
2. All string values must be in double quotes.
3. Boolean values must be `true` or `false`.
4. Output must be **strict JSON parsable**.
5. Normal flag is true if the value falls within the given normal range.

Here is the report:

Patient: John Smith
Test Date: 2024-01-15
Hemoglobin: 14.2 g/dL (Normal: 13.5-17.5)
WBC: 8.5 x10^9/L (Normal: 4.5-11.0)
Platelets: 250 x10^9/L (Normal: 150-400)

Return the JSON now:
"""

print("Generating structured response...\n")

result = await client.generate(prompt, max_tokens=512, temperature=0.0)

print("Raw response:")
print(result['text'])
print("\n" + "="*60)

# Extract JSON
extracted = client.extract_json(result['text'])
if extracted:
    print("\nExtracted JSON:")
    import json
    print(json.dumps(extracted, indent=2))
else:
    print("\nCould not extract JSON from response")

Generating structured response...



2026-02-05 20:05:56,861 - OllamaMedGemmaClient - INFO - Generated 176 tokens in 9.59s (31.8 tokens/sec)


Raw response:
```json
{
  "patient": "John Smith",
  "date": "2024-01-15",
  "results": [
    {
      "test": "Hemoglobin",
      "value": 14.2,
      "unit": "g/dL",
      "normal": true
    },
    {
      "test": "WBC",
      "value": 8.5,
      "unit": "x10^9/L",
      "normal": true
    },
    {
      "test": "Platelets",
      "value": 250,
      "unit": "x10^9/L",
      "normal": true
    }
  ]
}
```


Extracted JSON:
{
  "patient": "John Smith",
  "date": "2024-01-15",
  "results": [
    {
      "test": "Hemoglobin",
      "value": 14.2,
      "unit": "g/dL",
      "normal": true
    },
    {
      "test": "WBC",
      "value": 8.5,
      "unit": "x10^9/L",
      "normal": true
    },
    {
      "test": "Platelets",
      "value": 250,
      "unit": "x10^9/L",
      "normal": true
    }
  ]
}
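
The raw response wraps the JSON in a Markdown code fence, so some extraction step is needed before parsing. How `client.extract_json` does this is not shown in this notebook; the sketch below is one possible approach, using the same brace-matching regular expression as the image cells later on, plus a check of each `normal` flag against the ranges given in the report. Both function names are hypothetical.

```python
import json
import re


def parse_json_response(text: str):
    """Parse the span from the first '{' to the last '}' as JSON; return None on failure."""
    match = re.search(r"\{[\s\S]*\}", text)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def flag_mismatches(extracted: dict, ranges: dict) -> list:
    """Return test names whose 'normal' flag disagrees with the (low, high) range given."""
    mismatches = []
    for row in extracted.get("results", []):
        bounds = ranges.get(row.get("test"))
        if bounds is None:
            continue  # no reference range supplied for this test
        low, high = bounds
        if (low <= row["value"] <= high) != row["normal"]:
            mismatches.append(row["test"])
    return mismatches


# Reference ranges copied from the report text in the prompt.
report_ranges = {
    "Hemoglobin": (13.5, 17.5),
    "WBC": (4.5, 11.0),
    "Platelets": (150, 400),
}
```

Calling `flag_mismatches(parse_json_response(result['text']), report_ranges)` on the response above should give an empty list, since all three values fall inside their ranges and are flagged `true`.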


## 7. View Statistics (Text-Only Tests)

In [9]:
stats = client.get_statistics()

print("MedGemma Client Statistics")
print("="*40)
for key, value in stats.items():
    if isinstance(value, float):
        print(f"{key}: {value:.2f}")
    else:
        print(f"{key}: {value}")

MedGemma Client Statistics
backend: ollama
model: medgemma-4b-local
inference_count: 3
cache_hits: 0
cache_hit_rate: 0.00
total_inference_time: 24.89
average_inference_time: 8.30
ollama_host: http://localhost:11434


## 8. Test Different Backends

You can switch backends by changing the config:

In [10]:
# Example: How to use different backends

print("Available backends:")
print()
print("1. OLLAMA (current - recommended):")
print("   config = {'backend': 'ollama'}")
print("   - Easy setup, efficient quantization")
print("   - Requires: ollama serve + ollama pull <model>")
print()
print("2. LOCAL (HuggingFace Transformers):")
print("   config = {'backend': 'local', 'model_path': './models/cache/medgemma'}")
print("   - Full control, works offline")
print("   - Requires: 16GB+ RAM, GPU recommended")
print()
print("3. API (not yet implemented):")
print("   config = {'backend': 'api'}")
print("   - Cloud-based inference")

Available backends:

1. OLLAMA (current - recommended):
   config = {'backend': 'ollama'}
   - Easy setup, efficient quantization
   - Requires: ollama serve + ollama pull <model>

2. LOCAL (HuggingFace Transformers):
   config = {'backend': 'local', 'model_path': './models/cache/medgemma'}
   - Full control, works offline
   - Requires: 16GB+ RAM, GPU recommended

3. API (not yet implemented):
   config = {'backend': 'api'}
   - Cloud-based inference


## 9. Helper: Send Image to Ollama

In [12]:
import aiohttp
import base64
import json
import time
from pathlib import Path

OLLAMA_HOST = "http://localhost:11434"
OLLAMA_MODEL = "medgemma-4b-local"

async def generate_with_image(
    prompt: str,
    image_path: str = None,
    image_bytes: bytes = None,
    model: str = OLLAMA_MODEL,
    max_tokens: int = 1024,
    temperature: float = 0.1,
    timeout_sec: int = 300
) -> dict:
    """
    Call Ollama /api/generate with an image (multimodal).
    
    Ollama accepts images as base64 strings in the 'images' array.
    Works with MedGemma 4B multimodal, llava, minicpm-v, etc.
    
    Args:
        prompt: Text prompt
        image_path: Path to image file (PNG, JPG, etc.)
        image_bytes: Raw image bytes (alternative to image_path)
        model: Ollama model name
        max_tokens: Max tokens to generate
        temperature: Sampling temperature
        timeout_sec: Request timeout
    
    Returns:
        dict with 'text', 'inference_time', 'prompt_tokens', 'generated_tokens'
    """
    # Encode image to base64
    if image_path:
        with open(image_path, 'rb') as f:
            img_b64 = base64.b64encode(f.read()).decode('utf-8')
        print(f"  Image: {Path(image_path).name} ({Path(image_path).stat().st_size / 1024:.0f} KB)")
    elif image_bytes:
        img_b64 = base64.b64encode(image_bytes).decode('utf-8')
        print(f"  Image: {len(image_bytes) / 1024:.0f} KB (from bytes)")
    else:
        raise ValueError("Provide image_path or image_bytes")
    
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [img_b64],
        "stream": False,
        "options": {
            "num_predict": max_tokens,
            "temperature": temperature,
        }
    }
    
    start = time.time()
    timeout = aiohttp.ClientTimeout(total=timeout_sec)
    
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(f"{OLLAMA_HOST}/api/generate", json=payload) as resp:
            if resp.status != 200:
                error = await resp.text()
                raise RuntimeError(f"Ollama error ({resp.status}): {error}")
            data = await resp.json()
    
    elapsed = time.time() - start
    text = data.get('response', '')
    prompt_tokens = data.get('prompt_eval_count', 0)
    gen_tokens = data.get('eval_count', 0)
    eval_ns = data.get('eval_duration', 0)
    tps = gen_tokens / (eval_ns / 1e9) if eval_ns > 0 else 0
    
    return {
        'text': text.strip(),
        'inference_time': elapsed,
        'prompt_tokens': prompt_tokens,
        'generated_tokens': gen_tokens,
        'tokens_per_second': tps,
    }

print("generate_with_image() helper ready")
print(f"  Model: {OLLAMA_MODEL}")

generate_with_image() helper ready
  Model: medgemma-4b-local


## 10. Image Test — Prescription (PNG)

Test MedGemma's ability to read a prescription image and extract medication info.

In [13]:
# Test with a prescription image
rx_image = os.path.join(project_root, "data/samples/prescriptions/ColorRx-English-Logo-HTWT-Generics.png")

print("=" * 60)
print("TEST: Prescription Image -> Describe contents")
print("=" * 60)

prompt = """Look at this medical document image. Describe what you see:
1. What type of document is this?
2. What information can you read from it?
3. List any medications, dosages, or patient info visible."""

result = await generate_with_image(prompt, image_path=rx_image, max_tokens=512)

print(f"\nRESPONSE ({result['inference_time']:.1f}s, {result['tokens_per_second']:.1f} tok/s):")
print("-" * 60)
print(result['text'])
print("-" * 60)
print(f"Prompt tokens: {result['prompt_tokens']}, Generated: {result['generated_tokens']}")

TEST: Prescription Image -> Describe contents
  Image: ColorRx-English-Logo-HTWT-Generics.png (129 KB)


RuntimeError: Ollama error (500): {"error":"Failed to create new sequence: failed to process inputs: this model is missing data required for image input"}
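
The 500 response above reports that the loaded model is missing the data required for image input, so the remaining image cells would presumably fail the same way with this model. One way to detect this before sending an image is to inspect the model metadata. The sketch below assumes Ollama's `POST /api/show` endpoint and assumes that the response contains a `capabilities` list that includes `"vision"` for multimodal models, as recent Ollama versions report; verify the field against your installed version. The function names are hypothetical.

```python
import json
import urllib.request


def show_model(model: str, host: str = "http://localhost:11434", timeout: float = 10.0) -> dict:
    """Fetch model metadata from Ollama via POST /api/show."""
    body = json.dumps({"model": model}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/show",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def supports_vision(show_response: dict) -> bool:
    """True if the metadata lists 'vision' among the model's capabilities (assumed field)."""
    return "vision" in show_response.get("capabilities", [])
```

For example, `supports_vision(show_model("medgemma-4b-local"))` could be used to skip the image tests with a clear message instead of a server error.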

## 11. Image Test — Lab Report (PNG)

Test MedGemma's ability to read a LabCorp result image and extract test values.

In [None]:
# Test with a lab report image (LabCorp positive result)
lab_image = os.path.join(project_root, "data/samples/labs/labcorp/labcorp-positive.png")

print("=" * 60)
print("TEST: Lab Report Image -> Extract test results as JSON")
print("=" * 60)

prompt = """Extract ALL test results from this lab report image as JSON.

Return JSON with this structure:
{"test_results": [{"name": "test name", "value": "result", "unit": "unit", "reference_range": "range", "abnormal_flag": "H/L/null"}]}

Rules:
- Extract EVERY test result visible in the image
- Include the exact values, units, and reference ranges shown
- Mark abnormal values with H (high) or L (low)
- Return ONLY valid JSON, no explanations"""

result = await generate_with_image(prompt, image_path=lab_image, max_tokens=1024)

print(f"\nRESPONSE ({result['inference_time']:.1f}s, {result['tokens_per_second']:.1f} tok/s):")
print("-" * 60)
print(result['text'][:2000])  # Truncate if very long
print("-" * 60)

# Try to parse JSON from response
from json_repair import repair_json
import re

raw = result['text']
json_match = re.search(r'(\{[\s\S]*\})', raw)
if json_match:
    try:
        parsed = json.loads(json_match.group(1))
        tests = parsed.get('test_results', [])
        print(f"\nParsed {len(tests)} test results:")
        for t in tests:
            flag = f" [{t.get('abnormal_flag')}]" if t.get('abnormal_flag') else ""
            print(f"  {t.get('name')}: {t.get('value')} {t.get('unit', '')}{flag}")
    except json.JSONDecodeError:
        try:
            repaired = repair_json(json_match.group(1))
            parsed = json.loads(repaired)
            tests = parsed.get('test_results', [])
            print(f"\nParsed {len(tests)} test results (after repair):")
            for t in tests:
                flag = f" [{t.get('abnormal_flag')}]" if t.get('abnormal_flag') else ""
                print(f"  {t.get('name')}: {t.get('value')} {t.get('unit', '')}{flag}")
        except Exception:
            print("\nCould not parse JSON from response")
else:
    print("\nNo JSON found in response")

## 12. Image Test — PDF Page (LabCorp CBC)

Convert the first page of a PDF to an image, then send to MedGemma.
This tests the full vision pipeline: PDF -> image -> multimodal extraction.
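
A short worked example of the rendering scale used in the next cell: PDF coordinates are measured in points (72 per inch), so a target resolution of 150 DPI corresponds to a zoom factor of 150/72 in both directions. The helper below is illustrative only, and the US Letter page size is an assumed example rather than the size of the sample report.

```python
def approx_pixels(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Approximate pixel size of a page rendered at `dpi`, given its size in PDF points."""
    # 72 points = 1 inch, so pixels = points * dpi / 72.
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(approx_pixels(612, 792, 150))  # (1275, 1650)
```

Raising the DPI makes small print easier to read but increases the image size and the number of bytes sent to the model.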

In [None]:
# Convert PDF first page to image, then send to MedGemma
import pymupdf

pdf_path = os.path.join(project_root, "data/samples/labs/labcorp/SampleLabCorpReport.pdf")

# Convert first page to PNG
doc = pymupdf.open(pdf_path)
page = doc[0]
mat = pymupdf.Matrix(150/72, 150/72)  # 150 DPI
pix = page.get_pixmap(matrix=mat)
page_bytes = pix.tobytes("png")
doc.close()

print(f"PDF: {Path(pdf_path).name}")
print(f"Page 1 rendered: {pix.width}x{pix.height} px, {len(page_bytes)/1024:.0f} KB")
print()

# First test: describe the document
print("=" * 60)
print("TEST A: PDF Page -> Describe document type")
print("=" * 60)

prompt_describe = """What type of medical document is this? 
Describe what you see: the layout, sections, and type of data present."""

result = await generate_with_image(prompt_describe, image_bytes=page_bytes, max_tokens=256)

print(f"\nRESPONSE ({result['inference_time']:.1f}s):")
print("-" * 60)
print(result['text'])
print("-" * 60)

In [None]:
# Second test: structured extraction from same PDF page
print("=" * 60)
print("TEST B: PDF Page -> Extract ALL test results as JSON")
print("=" * 60)

prompt_extract = """Extract ALL test results from this lab report image as JSON.

Return JSON:
{"patient": {"name": "full name", "dob": "date of birth"},
 "test_results": [{"name": "test name", "value": "result", "unit": "unit", "reference_range": "range", "abnormal_flag": "H/L/null"}]}

CRITICAL: Extract EVERY single test result row visible. Include CBC, differentials, chemistry, everything.
Return ONLY valid JSON:"""

result = await generate_with_image(prompt_extract, image_bytes=page_bytes, max_tokens=2048)

print(f"\nRESPONSE ({result['inference_time']:.1f}s, {result['tokens_per_second']:.1f} tok/s):")
print("-" * 60)
print(result['text'][:3000])
print("-" * 60)

# Parse and summarize
raw = result['text']
json_match = re.search(r'(\{[\s\S]*\})', raw)
if json_match:
    try:
        parsed = json.loads(json_match.group(1))
    except json.JSONDecodeError:
        try:
            repaired = repair_json(json_match.group(1))
            parsed = json.loads(repaired)
        except Exception:
            parsed = None
    
    if parsed:
        patient = parsed.get('patient', {})
        tests = parsed.get('test_results', [])
        print(f"\nPatient: {patient.get('name', 'N/A')}, DOB: {patient.get('dob', 'N/A')}")
        print(f"Total test results extracted: {len(tests)}")
        print()
        
        # Show all tests
        for t in tests:
            flag = f" [{t.get('abnormal_flag')}]" if t.get('abnormal_flag') else ""
            ref = f" (ref: {t.get('reference_range')})" if t.get('reference_range') else ""
            print(f"  {t.get('name')}: {t.get('value')} {t.get('unit', '')}{flag}{ref}")
        
        # Check for key CBC tests
        test_names_lower = [t.get('name', '').lower() for t in tests]
        cbc_expected = ['wbc', 'rbc', 'hemoglobin', 'hematocrit', 'platelet', 'neutrophil', 'lymphocyte', 'monocyte', 'eosinophil', 'basophil']
        found = [name for name in cbc_expected if any(name in tn for tn in test_names_lower)]
        missing = [name for name in cbc_expected if not any(name in tn for tn in test_names_lower)]
        print(f"\nCBC coverage: {len(found)}/{len(cbc_expected)}")
        if missing:
            print(f"  Missing: {missing}")
    else:
        print("\nCould not parse JSON")
else:
    print("\nNo JSON found in response")

## 13. Image Test — Handwritten Prescription (JPG)

The hardest test: can MedGemma read a handwritten prescription?
Handwriting is where multimodal models are expected to do better than traditional OCR.

In [None]:
# Test with handwritten prescription
hw_image = os.path.join(project_root, "data/samples/prescriptions/Hassprescription-768x576.jpg")

print("=" * 60)
print("TEST: Handwritten Prescription -> Read and extract")
print("=" * 60)

prompt = """This is a handwritten medical prescription. Read it carefully and extract:

1. Patient name (if visible)
2. Doctor/prescriber name (if visible)
3. Each medication with dosage and instructions
4. Date (if visible)

Be specific about what you can and cannot read. If text is illegible, say so."""

result = await generate_with_image(prompt, image_path=hw_image, max_tokens=512)

print(f"\nRESPONSE ({result['inference_time']:.1f}s, {result['tokens_per_second']:.1f} tok/s):")
print("-" * 60)
print(result['text'])
print("-" * 60)

## 14. Image Test — Multi-Page PDF (All Pages)

Process each page of a multi-page PDF separately and merge results.
This simulates what the pipeline would do with chunked vision extraction.

In [None]:
# Process ALL pages of a multi-page lab report PDF
pdf_path = os.path.join(project_root, "data/samples/labs/labcorp/SampleLabCorpReport.pdf")

doc = pymupdf.open(pdf_path)
print(f"PDF: {Path(pdf_path).name} — {len(doc)} pages")
print("=" * 60)

all_tests = []
total_time = 0

prompt_page = """Extract ALL test results from this lab report page as JSON.

Return JSON:
{"test_results": [{"name": "test name", "value": "result", "unit": "unit", "reference_range": "range", "abnormal_flag": "H/L/null"}]}

Extract EVERY row with a test name and value. Return [] if no test results on this page.
Return ONLY valid JSON:"""

for page_num in range(len(doc)):
    page = doc[page_num]
    mat = pymupdf.Matrix(150/72, 150/72)
    pix = page.get_pixmap(matrix=mat)
    pg_bytes = pix.tobytes("png")
    
    print(f"\nPage {page_num + 1}/{len(doc)} ({len(pg_bytes)/1024:.0f} KB)...")
    
    try:
        result = await generate_with_image(prompt_page, image_bytes=pg_bytes, max_tokens=2048)
        total_time += result['inference_time']
        
        raw = result['text']
        json_match = re.search(r'(\{[\s\S]*\})', raw)
        if json_match:
            try:
                parsed = json.loads(json_match.group(1))
            except json.JSONDecodeError:
                try:
                    repaired = repair_json(json_match.group(1))
                    parsed = json.loads(repaired)
                except Exception:
                    parsed = None
            
            if parsed:
                page_tests = parsed.get('test_results', [])
                # Filter tests with actual values
                page_tests = [t for t in page_tests if t.get('value') and str(t['value']).strip() not in ('', 'null', 'None')]
                all_tests.extend(page_tests)
                print(f"  -> {len(page_tests)} test results ({result['inference_time']:.1f}s)")
            else:
                print(f"  -> Could not parse JSON ({result['inference_time']:.1f}s)")
        else:
            print(f"  -> No JSON in response ({result['inference_time']:.1f}s)")
    except Exception as e:
        print(f"  -> Error: {e}")

# Record the page count before closing: a closed document can no longer report its length
num_pages = len(doc)
doc.close()

# Deduplicate by test name (keep the most complete entry)
FIELDS = ['value', 'unit', 'reference_range']
tests_by_name = {}
for t in all_tests:
    key = (t.get('name') or '').lower().strip()
    if not key:
        continue
    score = sum(1 for f in FIELDS if t.get(f))
    if key not in tests_by_name or score > sum(1 for f in FIELDS if tests_by_name[key].get(f)):
        tests_by_name[key] = t

unique_tests = list(tests_by_name.values())

print("\n" + "=" * 60)
print(f"TOTAL: {len(unique_tests)} unique test results from {num_pages} pages ({total_time:.1f}s total)")
print("=" * 60)
for t in unique_tests:
    flag = f" [{t.get('abnormal_flag')}]" if t.get('abnormal_flag') else ""
    print(f"  {t.get('name')}: {t.get('value')} {t.get('unit', '')}{flag}")

## 15. Cleanup

In [14]:
# Close the client session
await client.close()

print("Client closed. All tests completed!")

Client closed. All tests completed!


In [15]:
from transformers import pipeline
from PIL import Image
import requests
import torch

pipe = pipeline(
    "image-text-to-text",
    model="/Users/yaokouadio/Projects/STARTUP/universal-medical-ingestion-engine/models/medgemma-4b-it-Q4_K_M.gguf",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this X-ray"},
            {"type": "image", "image": image}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])


HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/Users/yaokouadio/Projects/STARTUP/universal-medical-ingestion-engine/models/medgemma-4b-it-Q4_K_M.gguf'. Use `repo_type` argument if needed.