https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4 \
https://huggingface.co/google/medgemma-4b-it

In [1]:
import os
import sys

In [2]:
# Set CUDA_VISIBLE_DEVICES to expose only device 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Set CUDA_VISIBLE_DEVICES to expose devices 0, 1, 2, and 3
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
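
`CUDA_VISIBLE_DEVICES` is read when CUDA is first initialized, so it only takes effect if it is set before the first CUDA call; setting it before importing `torch`, as this notebook does, is the safe option. A minimal sketch of a helper that builds the value from a list of device indices. The name `expose_gpus` is hypothetical and not part of any library.

```python
import os


def expose_gpus(device_ids):
    """Set CUDA_VISIBLE_DEVICES from a list of integer device indices.

    Hypothetical helper for illustration; call it before CUDA is initialized.
    """
    if not device_ids:
        raise ValueError("expected at least one device index")
    # The variable is a comma-separated list of indices, e.g. "0,1,2,3"
    value = ",".join(str(int(i)) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Same effect as the cell above: expose only device 0
expose_gpus([0])
```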

In [3]:
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

In [4]:
model_id = "google/medgemma-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00,  3.71s/it]


In [5]:
processor = AutoProcessor.from_pretrained(model_id)

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


#### Single Image Inference

In [6]:
# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
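# Fetch the image and raise on an HTTP error rather than passing an error page to PIL
response = requests.get(image_url, headers={"User-Agent": "example"}, stream=True)
response.raise_for_status()
image = Image.open(response.raw)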

In [7]:
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this X-ray"},
            {"type": "image", "image": image}
        ]
    }
]

In [8]:
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

In [9]:
input_len = inputs["input_ids"].shape[-1]

In [10]:
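with torch.inference_mode():
    # Greedy decoding (do_sample=False); the reply is capped at 200 new tokens
    generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    # generate() returns prompt + reply; keep only the newly generated tokens
    generation = generation[0][input_len:]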

In [11]:
decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)

Okay, based on the provided chest X-ray image, here's a description:

**Overall Impression:**

The image shows a normal chest X-ray. The heart size appears within normal limits, and the lungs are clear bilaterally. There are no obvious signs of acute pathology.

**Specific Findings:**

*   **Heart:** The heart size is within normal limits. The cardiothoracic ratio (the ratio of the heart's width to the chest's width) appears to be less than 0.5, which is generally considered normal.
*   **Lungs:** The lungs are clear bilaterally. There are no visible consolidations, effusions, or pneumothoraces. The lung markings are normal.
*   **Mediastinum:** The mediastinum (the space between the lungs containing the heart, great vessels, trachea, and esophagus) appears normal in width and alignment.
*   **Bones:** The ribs, clavicles, and
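
The reply above stops mid-sentence, which is what a reply looks like when it runs into the `max_new_tokens=200` cap instead of ending on its own. A minimal sketch of a check for that case, written against plain Python lists so it runs without the model. The helper name `hit_token_limit` and the toy token ids are hypothetical; with the real objects you would pass something like `generation.tolist()` and the model's end-of-sequence id(s), which is an assumption about how you wire it up.

```python
def hit_token_limit(new_token_ids, eos_token_ids, max_new_tokens):
    """Return True if generation stopped at the cap rather than at an EOS token.

    Hypothetical helper: new_token_ids holds only the newly generated ids,
    eos_token_ids is a collection of ids that mark the end of a reply.
    """
    if len(new_token_ids) < max_new_tokens:
        # Stopped early, so the model ended the reply itself
        return False
    # Used the full budget: truncated unless the last token happens to be EOS
    return new_token_ids[-1] not in eos_token_ids


# Toy example: id 1 stands in for an end-of-sequence token
truncated_reply = [5] * 200
finished_reply = [5, 6, 7, 1]
print(hit_token_limit(truncated_reply, {1}, 200))  # True
print(hit_token_limit(finished_reply, {1}, 200))   # False
```

If the check returns `True`, raising `max_new_tokens` and generating again is the simplest remedy.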


#### Batch Inference

In [12]:
message1 = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "./pathology_images/ck_PTGC_2x.jpeg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

message2 = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "./pathology_images/ck_PTGC_5x.jpeg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

messages = [message1, message2]
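
The two conversations differ only in the image path, so for larger batches the list can be built with a small helper. A sketch under that assumption; `build_message` and `batch_messages` are hypothetical names, and the prompts are copied from the cell above.

```python
def build_message(
    image_path,
    prompt="Describe this image in detail.",
    system_prompt="You are a helpful assistant.",
):
    """Build one chat conversation (system + user turn) for a single image."""
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt}],
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        },
    ]


image_paths = [
    "./pathology_images/ck_PTGC_2x.jpeg",
    "./pathology_images/ck_PTGC_5x.jpeg",
]

# One conversation per image; same structure as [message1, message2] above
batch_messages = [build_message(path) for path in image_paths]
```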

In [13]:
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
    padding=True,  # pad prompts to a common length so the batch stacks into one tensor
).to(model.device, dtype=torch.bfloat16)

In [14]:
with torch.inference_mode():
    generated_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    generated_ids_trimmed = [
        out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
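
The list comprehension above relies on `generate` returning, for each row, the (padded) prompt followed by the new tokens, so slicing each output at the length of its input row leaves only the reply. A toy illustration with plain lists instead of tensors; the token ids and the name `trim_prompts` are made up for the example.

```python
def trim_prompts(input_ids, generated_ids):
    """Drop the prompt prefix from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Toy batch: 0 is a padding id, both prompts are padded to length 4
input_ids = [
    [0, 11, 12, 13],
    [21, 22, 23, 24],
]
# Each output row repeats its prompt row, then appends two new tokens
generated_ids = [
    [0, 11, 12, 13, 101, 102],
    [21, 22, 23, 24, 201, 202],
]

trimmed = trim_prompts(input_ids, generated_ids)
print(trimmed)  # [[101, 102], [201, 202]]
```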

In [15]:
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

In [16]:
print(output_texts[0])

The image shows a microscopic view of a tissue sample, likely from a biological specimen. Here's a detailed description:

*   **Overall Appearance:** The image presents a relatively uniform, homogenous tissue structure. The color is a pale, almost translucent purple, suggesting a relatively low cellular density or a specific staining technique.

*   **Cellular Structure:** The image reveals a complex network of cells and their associated structures. The cells appear to be relatively small and compact, with distinct boundaries.

*   **Tissue Architecture:** The tissue appears to be organized into a three-dimensional structure, with cells arranged in a specific pattern. The arrangement of cells and their connections suggest a specific tissue type, although without further context, it's difficult to identify the exact tissue.

*   **Details:** The image shows a variety of cellular features, including nuclei, cytoplasm, and extracellular matrix. The presence of these features indicates tha

In [17]:
print(output_texts[1])

The image shows a microscopic view of a tissue sample. It appears to be a cross-section of a biological structure, possibly a lymph node or a similar organ.

Here's a more detailed breakdown:

*   **Color:** The image is predominantly a shade of purple, likely due to the staining process used to visualize the tissue.
*   **Texture:** The tissue appears to be a complex network of interconnected structures. There are areas of denser tissue and areas of more open space.
*   **Shape:** The overall shape is irregular and somewhat amorphous, suggesting a non-uniform structure.
*   **Details:** While the resolution is limited by the microscopic nature of the image, it's possible to discern some cellular structures and possibly some blood vessels or lymphatic vessels.
*   **Scale:** The scale bar indicates that the image is approximately 200 micrometers in size.

Overall, the image provides a glimpse into the intricate architecture of a biological tissue. It's a valuable tool for researchers s