# HealthVest AI - Lab Report Analyzer

**MedGemma Impact Challenge Submission**

An AI-powered lab report analyzer that helps Indian patients understand their blood test results in plain English.

## Problem
- Patients struggle to understand medical jargon in lab reports
- Reference ranges are confusing without context
- No easy way to track health trends over time

## Solution
Upload blood test report → Get plain English explanations for each value

## Key Features
- Uses MedGemma 1.5 4B - Google's medical AI model
- Extracts structured data from lab report images
- Generates patient-friendly explanations
- Supports Indian lab formats (Thyrocare, SRL, Dr. Lal PathLabs, Metropolis)

In [None]:
# Install dependencies (fix protobuf version conflict).
# Version specifiers are quoted so the shell does not treat ">" as a redirect.
!pip install -q "transformers>=4.50.0" accelerate pillow pdf2image
!pip install -q "protobuf>=3.20"

import warnings
warnings.filterwarnings('ignore')

In [None]:
import torch
import json
from PIL import Image
from transformers import pipeline
import os

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Load MedGemma Model

Using MedGemma 1.5 4B, Google's open-weight medical AI model. Access requires accepting the model license on Hugging Face.

In [None]:
# Model configuration - MedGemma 1.5 4B (latest version)
MODEL_ID = "google/medgemma-1.5-4b-it"

# Get HF token from Kaggle secrets
# IMPORTANT: In Kaggle, you must:
# 1. Add secret: Settings (right panel) > Secrets > Add Secret > Name: "HF_TOKEN"
# 2. ATTACH the secret: Toggle ON next to your HF_TOKEN secret
# 3. Accept model license at: https://huggingface.co/google/medgemma-1.5-4b-it

HF_TOKEN = None

# Method 1: Kaggle Secrets (preferred)
try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    HF_TOKEN = user_secrets.get_secret("HF_TOKEN")
    print("HF_TOKEN loaded from Kaggle Secrets")
except Exception as e:
    print(f"Kaggle secrets error: {e}")

# Method 2: Environment variable fallback
if not HF_TOKEN:
    HF_TOKEN = os.environ.get('HF_TOKEN', None)
    if HF_TOKEN:
        print("HF_TOKEN loaded from environment")

# No token found: print setup instructions
if not HF_TOKEN:
    print("ERROR: HF_TOKEN not found!")
    print("\nTo fix this on Kaggle:")
    print("1. Right panel > Secrets > Add Secret")
    print("2. Name: HF_TOKEN, Value: your_token")
    print("3. TOGGLE ON the secret to attach it to notebook")
    print("4. Accept license: https://huggingface.co/google/medgemma-1.5-4b-it")
else:
    # Confirm a token is present without printing any part of the secret
    print(f"HF_TOKEN found ({len(HF_TOKEN)} characters)")

In [None]:
# Load MedGemma using pipeline (recommended approach)
print("Loading MedGemma model (this takes 2-3 minutes on GPU)...")

pipe = pipeline(
    "image-text-to-text",
    model=MODEL_ID,
    token=HF_TOKEN,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

print("MedGemma loaded successfully!")

## Extraction Prompt

Prompts for extracting lab values from Indian lab report formats and for explaining each extracted value in plain language.

In [None]:
EXTRACTION_PROMPT = """You are a medical lab report analyzer. Extract all test values from this lab report image.

For each test, provide:
- test_name: Name of the test (e.g., "Hemoglobin", "Fasting Blood Sugar", "TSH")
- value: Numeric value as shown
- unit: Unit of measurement (e.g., "g/dL", "mg/dL", "mIU/L")
- reference_range: Normal range as shown on report
- status: "normal", "high", or "low" based on reference range

Return ONLY a JSON array. Example:
[
  {"test_name": "Hemoglobin", "value": 14.2, "unit": "g/dL", "reference_range": "13.0-17.0", "status": "normal"}
]

Extract ALL tests visible. Use exact values. Handle Indian lab formats (Thyrocare, SRL, Dr. Lal PathLabs).
"""

EXPLANATION_PROMPT = """You are a friendly medical educator. Explain this lab value simply:

Test: {test_name}
Value: {value} {unit}
Normal Range: {reference_range}
Status: {status}

In under 80 words, explain:
1. What this test measures
2. What your result means
3. One actionable tip (if needed)

Use simple language. Never diagnose - suggest discussing with doctor if abnormal.
"""

## Core Functions
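
The extraction prompt in the previous section asks for five fields per test, but a model response is not guaranteed to follow that schema. The sketch below shows one possible validation step. `normalize_record`, `normalize_records`, and the `"unknown"` status are hypothetical additions for illustration, not part of the notebook's pipeline.

```python
from typing import Any, Optional

REQUIRED_KEYS = ("test_name", "value", "unit", "reference_range", "status")
ALLOWED_STATUSES = {"normal", "high", "low"}


def normalize_record(record: Any) -> Optional[dict]:
    """Return a cleaned copy of one extracted record, or None if unusable."""
    if not isinstance(record, dict):
        return None
    if not all(key in record for key in REQUIRED_KEYS):
        return None

    # The prompt asks for a numeric value, but the model may return a string
    # such as "14.2" or "1,250". Booleans are rejected explicitly because
    # float(True) would otherwise succeed.
    raw_value = record["value"]
    if isinstance(raw_value, bool):
        return None
    if isinstance(raw_value, str):
        raw_value = raw_value.replace(",", "").strip()
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        return None

    # Assumption: any status outside the three prompted labels is kept
    # as "unknown" rather than guessed.
    status = str(record["status"]).strip().lower()
    if status not in ALLOWED_STATUSES:
        status = "unknown"

    return {
        "test_name": str(record["test_name"]).strip(),
        "value": value,
        "unit": str(record["unit"]).strip(),
        "reference_range": str(record["reference_range"]).strip(),
        "status": status,
    }


def normalize_records(records: list) -> list:
    """Keep only the records that pass normalize_record."""
    cleaned = (normalize_record(r) for r in records)
    return [r for r in cleaned if r is not None]
```

Applied to the list returned by `extract_lab_values` (defined in the next cell), a step like this would drop malformed entries before they reach the explanation loop.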

In [None]:
def extract_lab_values(image: Image.Image) -> list:
    """Extract lab values from a lab report image using MedGemma."""
    
    # Resize if too large
    max_size = 1024
    if max(image.size) > max_size:
        ratio = max_size / max(image.size)
        new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
        image = image.resize(new_size, Image.Resampling.LANCZOS)
    
    # Prepare message format for pipeline
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": EXTRACTION_PROMPT}
            ]
        }
    ]
    
    # Generate with pipeline (chat messages are passed via the `text` argument)
    output = pipe(text=messages, max_new_tokens=2048)
    response = output[0]["generated_text"][-1]["content"]
    
    # Parse the JSON array from the response
    start = response.find('[')
    end = response.rfind(']') + 1
    if start == -1 or end <= start:
        print("No JSON array found in model response")
        print(f"Raw response: {response[:500]}")
        return []

    try:
        return json.loads(response[start:end])
    except json.JSONDecodeError as e:
        print(f"JSON parsing error: {e}")
        print(f"Raw response: {response[:500]}")
        return []


def explain_lab_value(test_name: str, value: float, unit: str, 
                      reference_range: str, status: str) -> str:
    """Generate plain English explanation for a lab value."""
    
    prompt = EXPLANATION_PROMPT.format(
        test_name=test_name,
        value=value,
        unit=unit,
        reference_range=reference_range,
        status=status
    )
    
    # Text-only query
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt}
            ]
        }
    ]
    
    output = pipe(text=messages, max_new_tokens=200)
    response = output[0]["generated_text"][-1]["content"]
    
    return response


def analyze_report(image: Image.Image) -> dict:
    """Full analysis: extract values + generate explanations."""
    
    print("Extracting lab values...")
    lab_values = extract_lab_values(image)
    print(f"Found {len(lab_values)} tests")
    
    results = []
    for i, val in enumerate(lab_values):
        print(f"Explaining {i+1}/{len(lab_values)}: {val.get('test_name', 'Unknown')}...")
        
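        # Use neutral placeholders so a missing field is never presented
        # to the model as a real measurement or as a normal result
        explanation = explain_lab_value(
            val.get('test_name', 'Unknown'),
            val.get('value', 'N/A'),
            val.get('unit', ''),
            val.get('reference_range', 'N/A'),
            val.get('status', 'unknown')
        )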
        
        results.append({
            **val,
            'explanation': explanation
        })
    
    return {
        'total_tests': len(results),
        'normal': sum(1 for r in results if r.get('status') == 'normal'),
        'abnormal': sum(1 for r in results if r.get('status') in ['high', 'low']),
        'results': results
    }
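
The `find('[')` / `rfind(']')` slice in `extract_lab_values` fails when the model adds bracketed text before or after the array, for example a trailing "see note [1]". A more tolerant parser could try each `[` in turn using the standard library's `json.JSONDecoder.raw_decode`. This is a minimal sketch; `find_json_array` is a hypothetical helper name.

```python
import json


def find_json_array(text: str) -> list:
    """Return the first JSON array that parses cleanly from text, else []."""
    decoder = json.JSONDecoder()
    position = text.find("[")
    while position != -1:
        try:
            # raw_decode parses one JSON value starting at `position`
            # and ignores whatever follows it.
            parsed, _ = decoder.raw_decode(text, position)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            return parsed
        # Not a valid array here; move on to the next opening bracket.
        position = text.find("[", position + 1)
    return []


reply = 'Here is the data: [{"test_name": "TSH", "value": 2.5}] See note [1].'
print(find_json_array(reply))
```

Because parsing stops at the end of the first valid array, surrounding prose and markdown fences in the response do not need to be stripped first.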

## Demo: Text-Only Explanations

Test MedGemma's explanation capability with sample lab values (no image needed).

In [None]:
# Demo: Explain sample lab values without needing an image
sample_tests = [
    {"test_name": "Hemoglobin", "value": 11.5, "unit": "g/dL", "reference_range": "13.0-17.0", "status": "low"},
    {"test_name": "Fasting Blood Sugar", "value": 126, "unit": "mg/dL", "reference_range": "70-100", "status": "high"},
    {"test_name": "TSH", "value": 2.5, "unit": "mIU/L", "reference_range": "0.4-4.0", "status": "normal"},
]

print("Demo: Generating explanations for sample lab values\n")
print("="*60)

for test in sample_tests:
    print(f"\n{test['test_name']}: {test['value']} {test['unit']} ({test['status'].upper()})")
    print("-"*40)
    
    explanation = explain_lab_value(
        test['test_name'],
        test['value'],
        test['unit'],
        test['reference_range'],
        test['status']
    )
    print(explanation)
    print()

In [None]:
# Option 1: Upload a file using Kaggle's file browser
# Click "Add Input" in the right panel > Upload > Select your lab report image/PDF

# Option 2: Use a sample from Kaggle datasets
# from kaggle_datasets import KaggleDatasets

# List uploaded files
import glob
uploaded_files = glob.glob('/kaggle/input/**/*.*', recursive=True)
print("Available input files:")
for f in uploaded_files[:10]:
    print(f"  {f}")

# Load your lab report image
# Change this path to your uploaded file
IMAGE_PATH = "/kaggle/input/your-lab-report.jpg"  # Update this path

if os.path.exists(IMAGE_PATH):
    image = Image.open(IMAGE_PATH).convert('RGB')
    print(f"Loaded image: {IMAGE_PATH}")
    print(f"Image size: {image.size}")
else:
    print(f"File not found: {IMAGE_PATH}")
    print("Upload a lab report using 'Add Input' in the right panel")
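
The install cell pulls in `pdf2image`, and the upload instructions mention PDFs, but only image files are opened above. One way to accept both is sketched below, assuming `pdf2image` and its poppler dependency are available in the environment. `input_kind`, `load_report_pages`, and the 200 dpi default are hypothetical choices.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}


def input_kind(path: str) -> str:
    """Classify a path as "image", "pdf", or "unsupported" by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    return "unsupported"


def load_report_pages(path: str, dpi: int = 200) -> list:
    """Load a report as a list of RGB PIL images, one per page."""
    kind = input_kind(path)
    if kind == "pdf":
        # Imported lazily: pdf2image needs the poppler utilities installed.
        from pdf2image import convert_from_path
        return [page.convert("RGB") for page in convert_from_path(path, dpi=dpi)]
    if kind == "image":
        from PIL import Image
        return [Image.open(path).convert("RGB")]
    raise ValueError(f"Unsupported file type: {path}")
```

Each returned page could then be passed to `analyze_report` in turn, since a multi-page PDF report yields one image per page.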

In [None]:
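# Run analysis (requires the image loaded in the previous cell)
if 'image' not in globals():
    raise RuntimeError("No image loaded. Set IMAGE_PATH to an uploaded lab report and rerun the previous cell.")
results = analyze_report(image)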

print("\n" + "="*60)
print("ANALYSIS COMPLETE")
print("="*60)
print(f"Total tests: {results['total_tests']}")
print(f"Normal: {results['normal']}")
print(f"Abnormal: {results['abnormal']}")

In [None]:
# Display results with nice formatting
from IPython.display import HTML, display

def display_results(results):
    """Display analysis results with nice HTML formatting."""
    html = "<div style='font-family: Arial, sans-serif;'>"
    
    for r in results['results']:
        status = r.get('status', 'unknown')
        # Unrecognized statuses get a neutral gray badge instead of falling through to "Low"
        color = {'normal': '#28a745', 'high': '#dc3545', 'low': '#ffc107'}.get(status, '#6c757d')
        badge = {'normal': 'Normal', 'high': 'High', 'low': 'Low'}.get(status, 'Unknown')
        
        html += f"""
        <div style='border: 1px solid #ddd; border-left: 4px solid {color}; 
                    padding: 15px; margin: 10px 0; border-radius: 4px;'>
            <div style='display: flex; justify-content: space-between; align-items: center;'>
                <h3 style='margin: 0; color: #333;'>{r.get('test_name', 'Unknown')}</h3>
                <span style='background: {color}; color: white; padding: 4px 12px; 
                             border-radius: 20px; font-size: 12px;'>{badge}</span>
            </div>
            <p style='font-size: 24px; margin: 10px 0; color: #333;'>
                <strong>{r.get('value', 'N/A')}</strong> 
                <span style='font-size: 14px; color: #666;'>{r.get('unit', '')}</span>
            </p>
            <p style='color: #666; font-size: 13px; margin: 5px 0;'>
                Reference: {r.get('reference_range', 'N/A')}
            </p>
            <hr style='border: none; border-top: 1px solid #eee; margin: 10px 0;'>
            <p style='color: #444; line-height: 1.5;'>{r.get('explanation', 'No explanation available.')}</p>
        </div>
        """
    
    html += "</div>"
    display(HTML(html))

# Display results if available
if 'results' in dir() and results:
    display_results(results)

## Impact & Summary

### Problem We're Solving
In India, millions of patients receive lab reports filled with medical jargon, confusing reference ranges, and numbers that mean nothing to them. This creates anxiety and prevents patients from taking proactive steps to improve their health.

### How MedGemma Helps
MedGemma 1.5 enables us to:
1. **Extract** structured data from lab report images (OCR + understanding)
2. **Interpret** values by comparing to reference ranges
3. **Explain** results in simple, actionable language
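
In this notebook the "Interpret" step is left to the model, which returns the `status` field. For simple "low-high" ranges, that label can be cross-checked deterministically. The sketch below is one possible check, not the notebook's method. `classify_against_range` is a hypothetical helper, and it deliberately returns `None` for ranges it cannot parse (such as "< 200").

```python
import re
from typing import Optional

# Matches ranges such as "13.0-17.0" or "70 - 100"; \u2013 also accepts an en dash.
RANGE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[-\u2013]\s*(\d+(?:\.\d+)?)\s*$")


def classify_against_range(value: float, reference_range: str) -> Optional[str]:
    """Return "low", "normal", or "high"; None if the range is not "low-high"."""
    match = RANGE_PATTERN.match(reference_range)
    if match is None:
        return None  # one-sided or qualitative range; not handled here
    low, high = float(match.group(1)), float(match.group(2))
    if low > high:
        return None  # malformed range
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


# The three sample values from the text-only demo
print(classify_against_range(11.5, "13.0-17.0"))  # low
print(classify_against_range(126, "70-100"))      # high
print(classify_against_range(2.5, "0.4-4.0"))     # normal
```

A mismatch between this result and the model's `status` could be flagged for review rather than silently overwritten.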

### Real-World Impact
- **Accessibility**: Patients can understand their own health data
- **Empowerment**: Informed patients make better health decisions
- **Healthcare efficiency**: Doctors spend less time explaining basics
- **Early intervention**: Patients notice abnormalities sooner

### Technical Highlights
- Uses MedGemma 1.5 4B instruction-tuned model
- Handles multimodal input (image + text)
- Builds on a model trained on medical text and images
- Generates patient-friendly explanations

### Future Roadmap
- Mobile app for instant report scanning
- Trend tracking across multiple reports
- Regional language support (Hindi, Tamil, etc.)
- Integration with hospital systems

In [None]:
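# Save results to JSON (only if the analysis cell ran)
if 'results' in globals() and results:
    with open('analysis_results.json', 'w') as f:
        json.dump(results, f, indent=2)
    print("Results saved to analysis_results.json")
else:
    print("No results to save. Run the analysis cell first.")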