In [1]:
!pip install --upgrade --force-reinstall Pillow

Collecting Pillow
  Using cached pillow-12.1.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (8.8 kB)
Using cached pillow-12.1.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (7.0 MB)
Installing collected packages: Pillow
  Attempting uninstall: Pillow
    Found existing installation: pillow 12.1.0
    Uninstalling pillow-12.1.0:
      Successfully uninstalled pillow-12.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 5.50.0 requires pillow<12.0,>=8.0, but you have pillow 12.1.0 which is incompatible.
Successfully installed Pillow-12.1.0


# DocGemma Connect - Colab Demo

Medical AI assistant powered by MedGemma 4B with structured generation.

**Requirements:**
- GPU runtime (T4 or better)
- HuggingFace account with MedGemma access
- GitHub Personal Access Token (for private repo)

**Setup:**
1. Go to **Runtime > Change runtime type > T4 GPU**
2. Click the **key icon** in the left sidebar to open Secrets
3. Add `GITHUB_TOKEN` with your GitHub PAT
4. Add `HF_TOKEN` with your HuggingFace token (optional, for auto-login)

## 1. Install DocGemma from GitHub (Private Repo)

Since this is a private repository, you need a GitHub Personal Access Token (PAT).

**Create a PAT:** GitHub → Settings → Developer settings → Personal access tokens → Generate new token (classic) → Select `repo` scope

In [2]:
# Option 1: Use Colab Secrets (recommended)
# Add your GitHub PAT as a secret named "GITHUB_TOKEN" in the Colab sidebar (key icon)
from google.colab import userdata
GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
!pip install -q git+https://{GITHUB_TOKEN}@github.com/galinilin/docgemma-connect.git

# Option 2: Direct token (less secure - token visible in notebook)
# !pip install -q git+https://ghp_yourtoken@github.com/galinilin/docgemma-connect.git

  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
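The cell above interpolates the token directly into the pip URL. If you script the install instead, a small helper can fail early when the secret is missing and keep the token out of anything you print. This is a minimal sketch under that assumption; `build_install_spec` and `redact` are hypothetical names for illustration, not part of DocGemma or pip.

```python
from urllib.parse import quote

REPO = "galinilin/docgemma-connect"  # repository installed in this notebook


def build_install_spec(token, repo=REPO):
    """Return a pip requirement spec that installs `repo` over HTTPS with a token.

    Hypothetical helper: raises instead of silently building a URL containing "None".
    """
    if not token:
        raise ValueError("GITHUB_TOKEN is missing; add it under Colab Secrets")
    # Percent-encode defensively; classic PATs are normally plain alphanumerics.
    return f"git+https://{quote(token, safe='')}@github.com/{repo}.git"


def redact(spec):
    """Replace the credential in a git+https spec with '***' for safe logging."""
    prefix = "git+https://"
    if not spec.startswith(prefix) or "@" not in spec:
        return spec
    _credential, host_and_path = spec[len(prefix):].split("@", 1)
    return f"{prefix}***@{host_and_path}"


spec = build_install_spec("ghp_example123")  # placeholder token, not a real PAT
print(redact(spec))  # git+https://***@github.com/galinilin/docgemma-connect.git
```

The spec can then be handed to pip (for example via `subprocess.run([sys.executable, "-m", "pip", "install", "-q", spec], check=True)`) while only the redacted form is ever printed.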


## 2. Authenticate with HuggingFace

MedGemma is a gated model: your HuggingFace account must be granted access before the weights can be downloaded. Request access at: https://huggingface.co/google/medgemma-1.5-4b-it

In [3]:
from huggingface_hub import login
from google.colab import userdata

# Option 1: Use Colab Secrets (if you added HF_TOKEN)
try:
    HF_TOKEN = userdata.get('HF_TOKEN')
    login(token=HF_TOKEN)
    print("Logged in via Colab secret")
except Exception:
    # Option 2: Interactive login (will prompt for token)
    login()

Logged in via Colab secret
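The cell tries one token source (a Colab secret) and falls back to an interactive prompt. The same idea generalizes to an ordered list of sources; note that `huggingface_hub` also reads an `HF_TOKEN` environment variable on its own. A minimal sketch, where `resolve_token`, `from_env`, and `missing_secret` are hypothetical names for illustration:

```python
import os
from typing import Callable, Iterable, Optional

TokenSource = Callable[[], Optional[str]]


def from_env(name: str = "HF_TOKEN") -> TokenSource:
    """A source that reads a token from an environment variable."""
    return lambda: os.environ.get(name)


def resolve_token(sources: Iterable[TokenSource]) -> Optional[str]:
    """Return the first non-empty token from `sources`, or None.

    Sources are tried in order; an exception (e.g. a missing Colab secret)
    is treated as "no token here" rather than aborting the search.
    """
    for source in sources:
        try:
            token = source()
        except Exception:
            continue
        if token:
            return token
    return None


def missing_secret() -> Optional[str]:
    # Stands in for a secret lookup that fails, like userdata.get on a missing key.
    raise KeyError("HF_TOKEN")


os.environ["DEMO_HF_TOKEN"] = "hf_demo"  # demo variable name, not a real token
print(resolve_token([missing_secret, from_env("DEMO_HF_TOKEN")]))  # hf_demo
```

If `resolve_token` returns `None`, falling back to `login()` reproduces the interactive branch of the cell above.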


## 3. Load the Model

In [4]:
from docgemma import DocGemma

# Initialize and load the model
model = DocGemma()
model.load()

print(f"Model loaded on: {model.device}")
print(f"Using dtype: {model.dtype}")

`torch_dtype` is deprecated! Use `dtype` instead!



Model loaded on: cuda
Using dtype: torch.bfloat16
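The run above reports `torch.bfloat16`. Native bfloat16 tensor-core support on NVIDIA GPUs starts with compute capability 8.0 (Ampere), while a T4 is 7.5, so on a T4 bfloat16 may run without hardware acceleration. The sketch below is a diagnostic only, assuming the capability tuple comes from `torch.cuda.get_device_capability()`; `has_native_bf16` is a hypothetical helper, and switching to float16 is not guaranteed to be numerically safe for every model.

```python
from typing import Optional, Tuple


def has_native_bf16(capability: Optional[Tuple[int, int]]) -> bool:
    """True if a CUDA device with this (major, minor) capability has native bf16.

    `capability` would come from torch.cuda.get_device_capability(); None means
    no GPU. Native bfloat16 tensor-core support starts with Ampere (8.0).
    """
    if capability is None:
        return False
    major, _minor = capability
    return major >= 8


for name, cap in [("T4", (7, 5)), ("A100", (8, 0)), ("no GPU", None)]:
    print(f"{name}: native bf16 = {has_native_bf16(cap)}")
```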


## 4. Emergency Classification

Classify whether a medical query describes an emergency, using constrained generation so the model can only answer with one of the allowed labels.
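Constrained generation masks the model's choices at each decoding step so that the output can only be one of the allowed answers. DocGemma's internals are not shown in this notebook; the following is a toy sketch of prefix-constrained greedy decoding, assuming a character-level "tokenizer" and a stand-in `score` function in place of MedGemma's logits. The two labels match the results printed below.

```python
from typing import Callable, List

LABELS = ["emergency", "non_emergency"]


def allowed_next(prefix: str, labels: List[str]) -> List[str]:
    """Characters that keep `prefix` on a path to at least one label."""
    return sorted({lab[len(prefix)] for lab in labels
                   if lab.startswith(prefix) and len(lab) > len(prefix)})


def constrained_decode(score: Callable[[str, str], float],
                       labels: List[str] = LABELS) -> str:
    """Greedy decoding where each step is restricted to valid label continuations.

    score(prefix, ch) stands in for a model logit; characters outside
    allowed_next() are never considered, i.e. they are masked out.
    Assumes no label is a prefix of another, so decoding stops at a full label.
    """
    prefix = ""
    while prefix not in labels:
        options = allowed_next(prefix, labels)
        prefix += max(options, key=lambda ch: score(prefix, ch))
    return prefix


def prefers_n(prefix: str, ch: str) -> float:
    return 1.0 if ch == "n" else 0.0


def prefers_x(prefix: str, ch: str) -> float:
    # "x" is never allowed, so the mask forces a valid label anyway.
    return 1.0 if ch == "x" else 0.0


print(constrained_decode(prefers_n))  # non_emergency
print(constrained_decode(prefers_x))  # emergency
```

Even a scorer that favors an invalid character yields a valid label, which is the property that lets the cell below compare `result` against fixed strings.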

In [5]:
# Test emergency classification
test_cases = [
    "I have a mild headache",
    "I'm experiencing severe chest pain and difficulty breathing",
    "My child has a fever of 38C",
    "I think I'm having a stroke, my face feels numb",
]

print("Emergency Classification Results:")
print("-" * 50)
for case in test_cases:
    result = model.classify_emergency(case)
    print(f"Input: {case}")
    print(f"Result: {result}")
    print()

Emergency Classification Results:
--------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: I have a mild headache
Result: non_emergency



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: I'm experiencing severe chest pain and difficulty breathing
Result: emergency



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: My child has a fever of 38C
Result: non_emergency



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: I think I'm having a stroke, my face feels numb
Result: emergency



## 5. User Type Classification

Detect whether the user is a patient or a medical expert based on their language.

In [6]:
# Test user type classification
test_cases = [
    "My tummy hurts after eating",
    "Patient presents with acute epigastric pain, possible peptic ulcer",
    "What does my blood test result mean?",
    "Differential diagnosis for elevated troponin with normal ECG?",
]

print("User Type Classification Results:")
print("-" * 50)
for case in test_cases:
    result = model.classify_user_type(case)
    print(f"Input: {case}")
    print(f"Result: {result}")
    print()

User Type Classification Results:
--------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: My tummy hurts after eating
Result: patient



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: Patient presents with acute epigastric pain, possible peptic ulcer
Result: patient



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: What does my blood test result mean?
Result: patient



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Input: Differential diagnosis for elevated troponin with normal ECG?
Result: expert
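To compare classifier outputs with what you expect, a small tally is enough. The predictions below are copied from the run above; the expected labels are hypothetical, chosen for illustration (the second query uses clinical phrasing, so one might expect `expert` there). This is a sketch, not an evaluation methodology from DocGemma.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (query, predicted) pairs copied from the output above.
PREDICTIONS: List[Tuple[str, str]] = [
    ("My tummy hurts after eating", "patient"),
    ("Patient presents with acute epigastric pain, possible peptic ulcer", "patient"),
    ("What does my blood test result mean?", "patient"),
    ("Differential diagnosis for elevated troponin with normal ECG?", "expert"),
]

# Hypothetical reference labels for illustration, not ground truth.
EXPECTED: Dict[str, str] = {
    "My tummy hurts after eating": "patient",
    "Patient presents with acute epigastric pain, possible peptic ulcer": "expert",
    "What does my blood test result mean?": "patient",
    "Differential diagnosis for elevated troponin with normal ECG?": "expert",
}


def evaluate(preds: List[Tuple[str, str]],
             expected: Dict[str, str]) -> Tuple[float, Counter]:
    """Return accuracy and a Counter of (expected, predicted) label pairs."""
    confusion = Counter((expected[q], p) for q, p in preds)
    correct = sum(n for (exp, pred), n in confusion.items() if exp == pred)
    return correct / len(preds), confusion


accuracy, confusion = evaluate(PREDICTIONS, EXPECTED)
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.75
for (exp, pred), n in sorted(confusion.items()):
    print(f"expected={exp:<8} predicted={pred:<8} count={n}")
```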



## 6. Free-form Generation

Generate free-form medical responses, without constraining the output to a label set or schema.
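`model.generate` may log that the attention mask and pad token id were not set. An attention mask marks real tokens with 1 and padding with 0; for a single unpadded prompt every position is real, so the mask is all ones. It starts to matter once prompts of different lengths are batched. Because the pad id and the EOS id coincide here (the log above reports `eos_token_id`:1), the mask cannot be recovered from the ids afterwards and has to be built when padding. A pure-Python sketch of left padding with made-up token ids; in `transformers`, the tokenizer returns this mask as `attention_mask`, which can be passed to `generate`.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]],
             pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences and build the matching attention mask.

    The mask is 1 for real tokens and 0 for padding. It is produced at padding
    time because, when pad_id equals the EOS id, a genuine EOS token and a pad
    token look identical and the mask cannot be inferred from ids later.
    """
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Made-up ids; 1 plays the role of the shared pad/EOS id.
ids, mask = left_pad([[2, 45, 67, 89], [2, 12]], pad_id=1)
print(ids)   # [[2, 45, 67, 89], [1, 1, 2, 12]]
print(mask)  # [[1, 1, 1, 1], [0, 0, 1, 1]]
```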

In [7]:
# Generate a medical response
prompt = "What are the common causes of headaches and when should I see a doctor?"

response = model.generate(prompt, max_new_tokens=256)
print(f"Prompt: {prompt}")
print(f"\nResponse:\n{response}")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Prompt: What are the common causes of headaches and when should I see a doctor?

Response:
Headaches are incredibly common, and while most are harmless, they can sometimes indicate an underlying medical issue. Here's a breakdown of common causes and when it's time to seek medical attention:

**Common Causes of Headaches:**

* **Tension Headaches:**
    * **Description:** The most frequent type. Often described as a constant ache or pressure around the head, especially at the temples or back of the head and neck. Can feel like a tight band around the head.
    * **Causes:** Stress, anxiety, poor posture, fatigue, muscle tension in the neck and shoulders, eye strain, dehydration, skipping meals.
    * **Triggers:** Work stress, arguments, lack of sleep, long hours spent looking at screens.

* **Migraines:**
    * **Description:** Moderate to severe, often one-sided headaches. Can be throbbing or pulsating. Often accompanied by other symptoms like nausea, vomiting, and extreme sensitivity

## 7. Structured Generation with Pydantic

Generate responses that conform to a specific schema using Outlines.

In [9]:
from pydantic import BaseModel
from typing import Literal

# Define a structured output schema
class TriageResult(BaseModel):
    urgency: Literal["low", "medium", "high", "critical"]
    category: str
    recommendation: str

# Generate structured output
prompt = """Triage the following patient complaint and provide a structured assessment.

Patient complaint: I've had a persistent cough for 2 weeks with some yellow mucus.

Provide your assessment:"""

raw_json = model.generate_outlines(prompt, TriageResult, max_new_tokens=1024)  # JSON string
result = TriageResult.model_validate_json(raw_json)  # parse and validate against the schema
print(f"Urgency: {result.urgency}")
print(f"Category: {result.category}")
print(f"Recommendation: {result.recommendation}")

Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


Urgency: low
Category: respiratory
Recommendation: Monitor symptoms and consider seeking medical attention if symptoms worsen or persist beyond 3-4 weeks.
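Here `generate_outlines` returns a JSON string, and `TriageResult.model_validate_json` parses it and checks it against the schema. To see what that check involves, or to run it without Pydantic, the sketch below performs equivalent validation with the standard library. `Triage` and `parse_triage` are hypothetical names, and the JSON string is reconstructed from the printed fields for illustration; it is not the raw model output.

```python
import json
from dataclasses import dataclass
from typing import Literal, get_args

Urgency = Literal["low", "medium", "high", "critical"]
FIELDS = ("urgency", "category", "recommendation")


@dataclass(frozen=True)
class Triage:
    urgency: str
    category: str
    recommendation: str


def parse_triage(raw: str) -> Triage:
    """Parse and validate a JSON triage object, mirroring the TriageResult schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(FIELDS) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field in FIELDS:
        if not isinstance(data[field], str):
            raise ValueError(f"{field} must be a string")
    if data["urgency"] not in get_args(Urgency):
        raise ValueError(f"urgency must be one of {get_args(Urgency)}")
    return Triage(data["urgency"], data["category"], data["recommendation"])


raw = json.dumps({
    "urgency": "low",
    "category": "respiratory",
    "recommendation": "Monitor symptoms and consider seeking medical attention "
                      "if symptoms worsen or persist beyond 3-4 weeks.",
})
print(parse_triage(raw).urgency)  # low
```

With Outlines constraining generation to the schema, these checks should not fail in practice; they matter when the same parser is applied to unconstrained output.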


## 8. Combined Triage Pipeline Example

A simple decision-tree pipeline: check for an emergency first, then classify the user type and tailor the response to it.
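In this run, the patient-branch response starts with `<unused94>thought` and contains only the model's reasoning, cut off by `max_new_tokens=200` before any final answer. The sketch below separates such a trace from the answer, assuming the trace opens with `<unused94>thought` (as seen in the output) and closes with `<unused95>` (an assumption based on MedGemma's published thinking-mode examples; check your tokenizer). `split_thought` is a hypothetical helper, not part of DocGemma.

```python
from dataclasses import dataclass

# The opening marker appears verbatim in this notebook's output; the closing
# marker is an assumption, not confirmed by this notebook.
THOUGHT_OPEN = "<unused94>thought"
THOUGHT_CLOSE = "<unused95>"


@dataclass
class SplitResponse:
    thought: str
    answer: str
    truncated: bool  # True if a trace was opened but never closed


def split_thought(text: str) -> SplitResponse:
    """Separate a leading thinking trace from the final answer."""
    stripped = text.lstrip()
    if not stripped.startswith(THOUGHT_OPEN):
        return SplitResponse(thought="", answer=text.strip(), truncated=False)
    body = stripped[len(THOUGHT_OPEN):]
    if THOUGHT_CLOSE not in body:
        # Generation stopped mid-trace, so there is no answer to show.
        return SplitResponse(thought=body.strip(), answer="", truncated=True)
    thought, answer = body.split(THOUGHT_CLOSE, 1)
    return SplitResponse(thought=thought.strip(), answer=answer.strip(), truncated=False)


done = split_thought("<unused94>thought\nPlan the reply.<unused95>Rest and fluids help.")
cut = split_thought("<unused94>thought\nHere's a thinking process")
print(done.answer)    # Rest and fluids help.
print(cut.truncated)  # True
```

Inside `triage_query`, one option is to pass `response` through `split_thought` and, when `truncated` is true, regenerate with a larger `max_new_tokens` instead of returning raw reasoning to a patient.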

In [10]:
def triage_query(model, query: str) -> dict:
    """Simple triage pipeline using DocGemma's classification."""

    # Step 1: Check for emergency
    emergency_status = model.classify_emergency(query)

    if emergency_status == "emergency":
        return {
            "status": "EMERGENCY",
            "action": "Call emergency services immediately (911)",
            "query": query
        }

    # Step 2: Classify user type
    user_type = model.classify_user_type(query)

    # Step 3: Generate appropriate response
    if user_type == "patient":
        system_context = "Respond in simple, easy-to-understand language for a patient."
    else:
        system_context = "Respond with clinical terminology appropriate for a medical professional."

    full_prompt = f"{system_context}\n\nQuery: {query}"
    response = model.generate(full_prompt, max_new_tokens=200)

    return {
        "status": "non_emergency",
        "user_type": user_type,
        "response": response,
        "query": query
    }

# Test the pipeline
queries = [
    "I have sudden severe chest pain radiating to my arm",
    "What's the best way to treat a common cold?",
    "Recommended prophylaxis for DVT in post-operative patients?",
]

for query in queries:
    print("=" * 60)
    result = triage_query(model, query)
    for key, value in result.items():
        print(f"{key}: {value}")
    print()



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


status: EMERGENCY
action: Call emergency services immediately (911)
query: I have sudden severe chest pain radiating to my arm



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


status: non_emergency
user_type: patient
response: <unused94>thought
Here's a thinking process for generating the simple explanation about treating a common cold:

1.  **Identify the Target Audience:** The request specifies "simple, easy-to-understand language for a patient." This means avoiding medical jargon, using clear and concise sentences, and focusing on practical, actionable advice.

2.  **Understand the Core Question:** The patient wants to know the "best way to treat a common cold."

3.  **Initial Brainstorming - What *is* a common cold?**
    *   It's a viral infection.
    *   It affects the nose, throat, and sometimes lungs.
    *   Symptoms: runny/stuffy nose, sore throat, cough, sneezing, maybe a low fever, aches.
    *   It's usually mild and goes away on its own.
    *   It's *not* caused by bacteria (so antibiotics don't work).
query: What's the best way to treat a common cold?



Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:1 for open-end generation.


status: non_emergency
user_type: expert
response: The recommended prophylaxis for deep vein thrombosis (DVT) in post-operative patients typically involves the use of pharmacologic agents. The choice of agent and dosage depends on the patient's risk factors, the type and duration of surgery, and institutional protocols.

Commonly used pharmacologic agents include:

*   **Low-molecular-weight heparin (LMWH):** Such as enoxaparin or dalteparin. LMWH is often preferred due to its ease of administration (subcutaneous injection) and predictable pharmacokinetics. The dose is typically based on weight and adjusted for renal function.
*   **Unfractionated heparin (UFH):** Requires intravenous administration and frequent monitoring with activated partial thromboplastin time (aPTT).
*   **Fondaparinux:** An indirect factor Xa inhibitor administered subcutaneously, similar to LMWH.
*   **Dalteparin:** A low-molecular-weight heparin.
*   **Warfarin
query: Recommended prophylaxis for DVT in post-ope