https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct \
https://github.com/QwenLM/Qwen2.5-VL

In [1]:
import os

In [2]:
# Set CUDA_VISIBLE_DEVICES to expose only device 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Set CUDA_VISIBLE_DEVICES to expose devices 0, 1, 2, and 3
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
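`CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialized, which is why this cell runs before `torch` is imported. Below is a minimal sketch of a helper that builds the value from a list of GPU indices. `set_visible_devices` is a hypothetical name and is not part of any library.

```python
import os


def set_visible_devices(device_ids):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    Must run before CUDA is initialized, e.g. before `import torch`.
    """
    value = ",".join(str(int(i)) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Single GPU, as in the cell above
set_visible_devices([0])
# Four GPUs, matching the commented-out alternative:
# set_visible_devices([0, 1, 2, 3])
```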

In [3]:
import torch
import pandas as pd
from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info


#### Reading the Pathology Test Data

In [4]:
questions = pd.read_csv("../pathology_test_data/questions.csv")
questions

| Unnamed: 0 | image_name | question | answer |
|---|---|---|---|
| 0 | ck_PTGC_2x.jpeg | How can you best describe the low-power patter... | The lymph node shows a mixed follicular and in... |
| 1 | ck_PTGC_2x.jpeg | What are three main differential diagnostic co... | Based on the low-power pattern, the primary co... |
| 2 | ck_PTGC_5x.jpeg | What is the morphologic alteration being depic... | The image shows a central enlarged, somewhat i... |
| 3 | ck_PTGC_5x.jpeg | What is the expected immunoarchitecture of the... | This image shows an enlarged secondary follicl... |
| 4 | ck_serositis_4x.jpg | What is the most common source for the change ... | The changes here show extensive serositis, wit... |
| 5 | ck_serositis_4x.jpg | What is the specific anatomic region shown in ... | The right half shows the muscularis propria an... |
| 6 | ck_steatohepatitis_100x.jpg | If these histologic changes included conspicuo... | There is fatty liver disease and if concurrent... |
| 7 | ck_steatohepatitis_100x.jpg | What would the primary histologic feature to s... | Wilson's disease has many non-specific finding... |
| 8 | ck_steatohepatitis_200x.jpg | In an overweight adolescent with mildly increa... | In the background of steatosis, there is incre... |
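The `Unnamed: 0` column is what pandas produces when a CSV was saved together with its index, whose header cell is empty. Below is a minimal sketch using a made-up two-row CSV in memory. It shows that `index_col=0` restores that column as the index. With that option, `Unnamed: 0` would also not be carried into `qwen_25_vl_responses.csv` later on.

```python
import io

import pandas as pd

# Hypothetical two-row CSV that, like questions.csv, was saved with its index
csv_text = (
    ",image_name,question,answer\n"
    "0,a.jpeg,Q1,A1\n"
    "1,b.jpeg,Q2,A2\n"
)

# Default read: the blank-headed index column comes back as "Unnamed: 0"
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))  # ['Unnamed: 0', 'image_name', 'question', 'answer']

# index_col=0 turns that column back into the DataFrame index
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))  # ['image_name', 'question', 'answer']
```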


In [5]:
images_path = os.path.join(os.getcwd(), "../pathology_test_data/images")
images_list = os.listdir(images_path)
images_list

['ck_steatohepatitis_100x.jpg',
 'ck_PTGC_2x.jpeg',
 'ck_serositis_4x.jpg',
 'ck_steatohepatitis_200x.jpg',
 'ck_PTGC_5x.jpeg']
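Before running the model, it can help to confirm that every `image_name` referenced in `questions.csv` has a file in `images_list`. Otherwise loading would fail midway through the batch. Below is a minimal sketch; `missing_images` is a hypothetical helper.

```python
def missing_images(image_names, available_files):
    """Return referenced image names that have no matching file, sorted."""
    return sorted(set(image_names) - set(available_files))


# In the notebook this would be:
# missing = missing_images(questions["image_name"], images_list)
# assert not missing, f"Missing image files: {missing}"
```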

#### Loading the Model Checkpoint and Image Processor

In [6]:
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

Loading checkpoint shards: 100%|██████████| 5/5 [00:04<00:00,  1.16it/s]
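The cell above requests `flash_attention_2`, which fails at load time if the `flash-attn` package is not installed. Below is a minimal sketch of falling back to PyTorch's scaled-dot-product attention (`"sdpa"`, another value `attn_implementation` accepts). `pick_attn_implementation` is a hypothetical helper. It only checks that the package is importable, not that the GPU supports it.

```python
import importlib.util


def pick_attn_implementation():
    """Prefer FlashAttention-2 when flash_attn is importable, else PyTorch SDPA.

    Only checks that the package is installed; FlashAttention-2 additionally
    needs a supported NVIDIA GPU and a half-precision dtype such as bfloat16.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


# Usage (not run here, needs the imports from the cells above):
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen2.5-VL-7B-Instruct",
#     torch_dtype=torch.bfloat16,
#     attn_implementation=pick_attn_implementation(),
#     device_map="auto",
# )
```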


In [7]:
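# Default processor (image processor + tokenizer).
# Pad on the left so every prompt in the batch ends where generation begins.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
processor.tokenizer.padding_side = "left"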

The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.


You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
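Left padding matters because `generate` appends new tokens at the right end of every row. With right padding, a shorter prompt would be followed by pad tokens before its first generated token. Below is a pure-Python sketch of what left padding produces for two token-id lists. The ids and `pad_id=0` are made up for illustration.

```python
def left_pad(batch, pad_id):
    """Left-pad token-id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding positions the model should ignore
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[11, 12, 13], [21]], pad_id=0)
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```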


#### Creating the Message Structure

In [8]:
messages = []
for _, row in questions.iterrows():
    image_path = os.path.join(images_path, row["image_name"])
    question = row["question"]

    # One single-turn conversation per question: the image, then the text prompt
    message = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": image_path,
                },
                {"type": "text", "text": question},
            ],
        }
    ]

    messages.append(message)
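The same conversations can be built without `iterrows`, which constructs a pandas Series for every row. Below is a sketch that zips the two columns instead. `build_messages` is a hypothetical name, and the message layout is copied from the loop above.

```python
import os


def build_messages(image_names, questions, images_dir):
    """Build one single-turn Qwen2.5-VL conversation per (image, question) pair."""
    conversations = []
    for image_name, question in zip(image_names, questions):
        conversations.append(
            [
                {
                    "role": "user",
                    "content": [
                        {"type": "image", "image": os.path.join(images_dir, image_name)},
                        {"type": "text", "text": question},
                    ],
                }
            ]
        )
    return conversations


# Equivalent to the loop above:
# messages = build_messages(questions["image_name"], questions["question"], images_path)
```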

#### Batch Image Inference

In [9]:
# Preparation for inference
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in messages
]

In [10]:
image_inputs, video_inputs = process_vision_info(messages)
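`process_vision_info` opens each image and resizes it before the processor sees it. Roughly, both sides are rounded to multiples of 28 (14-pixel patches merged 2×2), and the total pixel count is kept within a `[min_pixels, max_pixels]` range. Each 28×28 block then ends up as one visual token. The sketch below is a simplified re-implementation modeled on `smart_resize` in `qwen_vl_utils` and omits its extreme aspect-ratio check. The bounds 256·28·28 and 1280·28·28 are the example range from the Qwen2.5-VL model card, not necessarily the defaults this notebook ran with.

```python
import math


def smart_resize(height, width, min_pixels, max_pixels, factor=28):
    """Simplified sketch of Qwen2.5-VL image resizing (no aspect-ratio check).

    Both sides become multiples of `factor`, and the area is kept within
    [min_pixels, max_pixels] while roughly preserving the aspect ratio.
    """
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: scale down, flooring each side to a multiple of factor
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too small: scale up, ceiling each side to a multiple of factor
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def visual_tokens(height, width, min_pixels, max_pixels, factor=28):
    """Approximate visual-token count: one token per factor x factor block."""
    h_bar, w_bar = smart_resize(height, width, min_pixels, max_pixels, factor)
    return (h_bar // factor) * (w_bar // factor)


MIN_PIXELS = 256 * 28 * 28
MAX_PIXELS = 1280 * 28 * 28
print(smart_resize(1000, 1000, MIN_PIXELS, MAX_PIXELS))   # (980, 980)
print(visual_tokens(1000, 1000, MIN_PIXELS, MAX_PIXELS))  # 1225
```

This gives a rough sense of how many tokens each pathology image adds to the batch. The model card also shows passing `min_pixels` and `max_pixels` to `AutoProcessor.from_pretrained` to cap that number.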

In [11]:
inputs = processor(
    text=texts,
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

In [12]:
# Batch Inference
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
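`model.generate` returns each prompt followed by its continuation, so slicing off `len(in_ids)` tokens leaves only the new text. Because the prompts were left-padded, every row of `inputs.input_ids` has the same length. Below is a plain-list sketch. The token ids (0 as padding, 2 as end-of-sequence) are hypothetical.

```python
def trim_prompt(input_ids, generated_ids):
    """Drop the (left-padded) prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


prompts = [[0, 0, 21], [11, 12, 13]]                    # left-padded, equal length
outputs = [[0, 0, 21, 5, 6, 2], [11, 12, 13, 7, 2, 0]]  # prompt + new tokens (+ right padding)
print(trim_prompt(prompts, outputs))  # [[5, 6, 2], [7, 2, 0]]
```

Rows that finish early are padded on the right with the pad token. `skip_special_tokens=True` in the decode step is expected to drop those, since Qwen's pad token is a special token.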

In [13]:
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
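The cells above produce all nine answers from one `generate` call over a single batch. With more questions or larger images, one batch can exhaust GPU memory. Below is a minimal sketch of running the same pipeline in fixed-size chunks. `run_in_batches`, `generate_batch`, and the batch size of 4 are hypothetical choices. `generate_batch` is left as comments because it needs the loaded model, and it only repackages the notebook's own calls.

```python
def run_in_batches(items, batch_size, fn):
    """Apply fn to consecutive chunks of items and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    results = []
    for start in range(0, len(items), batch_size):
        results.extend(fn(items[start:start + batch_size]))
    return results


# Hypothetical per-chunk function built from the notebook's own calls:
# def generate_batch(batch_messages):
#     texts = [processor.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
#              for m in batch_messages]
#     image_inputs, video_inputs = process_vision_info(batch_messages)
#     inputs = processor(text=texts, images=image_inputs, videos=video_inputs,
#                        padding=True, return_tensors="pt").to(model.device)
#     out = model.generate(**inputs, max_new_tokens=1024)
#     trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
#     return processor.batch_decode(trimmed, skip_special_tokens=True,
#                                   clean_up_tokenization_spaces=False)
#
# output_texts = run_in_batches(messages, 4, generate_batch)
```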

In [14]:
output_texts

['The provided image appears to be a histological section stained with hematoxylin and eosin (H&E), which is commonly used for examining tissue samples under a microscope. The image shows two distinct areas, likely representing different sections or structures within a lymph node.\n\n### Key Observations:\n1. **Left Side**: This area appears to have a more compact, densely packed structure with less interstitial space. It could represent a region of dense connective tissue or a specific type of cell aggregation.\n2. **Right Side**: This area has a more open, less densely packed structure with visible spaces between the cellular elements. This could indicate a region with more lymphoid tissue, possibly representing a lymphoid follicle or a sinusoidal region where lymphocytes are found.\n\n### Interpretation:\n- The left side might represent a region of dense connective tissue or fibrosis.\n- The right side could represent a lymphoid region, such as a lymphoid follicle or a sinusoidal re

#### Storing the Output Responses

In [15]:
os.makedirs("./responses", exist_ok=True)
questions['qwen_25_vl_response'] = output_texts
questions.to_csv("./responses/qwen_25_vl_responses.csv", index=False)
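Assigning `output_texts` as a column relies on there being exactly one response per row, in the same order as `questions`. Below is a minimal sketch that makes the length check explicit and confirms the CSV round-trips. `save_responses` is a hypothetical helper, and the demo writes to a temporary directory rather than `./responses`.

```python
import os
import tempfile

import pandas as pd


def save_responses(df, responses, column, path):
    """Attach model responses as a new column and write the table to CSV."""
    if len(responses) != len(df):
        raise ValueError(f"Expected {len(df)} responses, got {len(responses)}")
    out = df.copy()
    out[column] = responses
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    out.to_csv(path, index=False)
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "responses", "demo.csv")
    demo = pd.DataFrame({"image_name": ["a.jpeg"], "question": ["Q1"]})
    save_responses(demo, ["R1"], "qwen_25_vl_response", path)
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())  # ['image_name', 'question', 'qwen_25_vl_response']
```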