# MedGemma 4B Multimodal Demo  
**Course:** AI in Healthcare — Multimodal Medical Chatbots  

This notebook shows how to load Google’s **MedGemma** 4B model from the Hugging Face Hub and run inference on both image‑plus‑text and text‑only prompts.

> **Educational use only – not for clinical decision‑making.**


Copyright 2025 Google LLC — Licensed under the Apache 2.0 License.  
See <https://www.apache.org/licenses/LICENSE-2.0> for details.

The authors make **no warranty** about the model outputs. Always verify conclusions with qualified healthcare professionals.


In [None]:
# Install core libraries (takes ~1 min)
!pip install --upgrade --quiet accelerate bitsandbytes transformers pillow

## 1 · Authenticate with Hugging Face  

1. [Create a free Hugging Face account](https://huggingface.co).
2. Accept the *MedGemma* usage terms on its [model page](https://huggingface.co/google/medgemma-4b-it).
3. Generate a **read** access token (`Settings → Access Tokens`).  

### Colab users  
*Open the left‑side **🔑 Secrets** panel → create a secret named **HF_TOKEN** with your token as its value → enable notebook access for it.*

### Local Jupyter users  
The cell below will prompt you for the token on first run.


In [None]:
import os, sys

google_colab = "google.colab" in sys.modules and not os.environ.get("VERTEX_PRODUCT")

if google_colab:
    # Colab: retrieve stored secret
    from google.colab import userdata
    os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
else:
    # Local / Colab Enterprise: interactive login if needed
    from huggingface_hub import get_token, notebook_login
    if get_token() is None:
        notebook_login()


## 2 · Load the MedGemma 4B model

In [None]:
from transformers import pipeline, BitsAndBytesConfig
import torch

model_variant = "4b-it"
model_id = f"google/medgemma-{model_variant}"

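model_kwargs = dict(
    torch_dtype=torch.bfloat16,
    device_map="auto",  # let accelerate place the weights on the available device(s)
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit weights so the model fits on a T4 GPU
)

# model_kwargs are forwarded to the underlying from_pretrained() call
pipe = pipeline("image-text-to-text", model=model_id, model_kwargs=model_kwargs)
pipe.model.generation_config.do_sample = False  # greedy decoding for repeatable output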

print("Model loaded on", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")
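
The 4-bit quantization setting in the cell above is what keeps the weights small enough for a T4. As a rough sketch of the arithmetic, assuming about 4 billion parameters (taken from the model name, not a measured count) and counting weights only (no activations, KV cache, or quantization metadata), the footprint can be estimated as shown here. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 4e9  # assumption: the "4B" in the model name taken at face value

for label, bits in [("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights shrink from roughly 7.5 GiB to roughly 1.9 GiB. Actual GPU usage will be higher once inference starts.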


## 3 · Example 1 — Describe a chest X‑ray

In [None]:
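import os

import requests
from IPython.display import Markdown, display
from PIL import Image

# Download the sample image once and cache it locally
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image_path = "sample_xray.png"
if not os.path.exists(image_path):
    # Wikimedia may reject requests that lack a descriptive User-Agent header
    resp = requests.get(image_url, headers={"User-Agent": "medgemma-demo"}, timeout=30)
    resp.raise_for_status()  # fail early instead of saving an error page
    with open(image_path, "wb") as f:
        f.write(resp.content)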

prompt = "Describe this chest X-ray"

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are an expert radiologist."}]},
    {"role": "user",   "content": [{"type": "text", "text": prompt},
                                   {"type": "image", "image": Image.open(image_path)}]}
]

output = pipe(messages, max_new_tokens=300)
response = output[0]["generated_text"][-1]["content"]

display(Markdown(f"**Prompt:** {prompt}"))
display(Image.open(image_path))
display(Markdown(f"**MedGemma:** {response}"))


## 4 · Example 2 — Text‑only medical question

In [None]:
from IPython.display import Markdown, display

prompt = "List key differences between bacterial and viral pneumonia."

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful medical assistant."}]},
    {"role": "user",   "content": [{"type": "text", "text": prompt}]}
]

output = pipe(messages, max_new_tokens=500)
response = output[0]["generated_text"][-1]["content"]

display(Markdown(f"**MedGemma:**\n\n{response}"))


## 5 · Your Turn  

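1. Replace `image_url` with the URL of a pathology or dermatology image, change `image_path` to a new filename (otherwise the cached X‑ray is reused and the download is skipped), and ask for findings.  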
2. Try different prompts such as *“Suggest next diagnostic steps”* or *“Summarize key abnormal findings.”*  
3. Adjust `max_new_tokens` to cap the answer length; a value that is too low cuts the response off mid‑sentence.  
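
For these experiments it can help to wrap the message structure from the two examples in a small helper. The sketch below is one possible way to do that: `build_messages` and its parameter names are hypothetical and not part of any library, and the default system prompt is simply copied from Example 2.

```python
from typing import Any, Optional


def build_messages(
    prompt: str,
    image: Optional[Any] = None,
    system_prompt: str = "You are a helpful medical assistant.",
) -> list:
    """Assemble a chat in the same structure the example cells pass to `pipe`."""
    user_content = [{"type": "text", "text": prompt}]
    if image is not None:
        # Only add an image entry for multimodal prompts
        user_content.append({"type": "image", "image": image})
    return [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": user_content},
    ]


# Text-only usage; pass a PIL image as `image` for a multimodal prompt
demo_messages = build_messages("Summarize key abnormal findings.")
print(demo_messages[1]["content"])
```

With the model loaded, a call such as `pipe(build_messages(prompt, image=Image.open(image_path)), max_new_tokens=300)` should then behave like the cell in Example 1.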

Feel free to experiment — and remember: **never rely solely on model output for clinical decisions**.
