<a href="https://www.kaggle.com/code/shravankumar147/interior-or-exterior-classification-using-qwen2vl?scriptVersionId=219816823" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

In [12]:
import requests
from io import BytesIO
import re

In [2]:
# Load model and processor
# trust_remote_code only applies to Auto classes, so it is not passed to the
# concrete Qwen2VLForConditionalGeneration class (transformers ignores it there).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.float16
).to("cuda").eval()
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)


`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46


In [4]:
def load_image_from_url(url):
    """Load image from URL using requests"""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return Image.open(BytesIO(response.content)).convert("RGB")

In [13]:
# Classify an image fetched from a URL
def test_model(image_url, question):
    # Load image directly from URL
    image = load_image_from_url(image_url)
    
    # Create message template
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": question}
        ]
    }]
    
    # Process inputs
    text_prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(
        text=[text_prompt],
        images=[image],
        return_tensors="pt",
        padding=True
    ).to("cuda")

    # Generate response; a one-word label needs only a few new tokens
    generated_ids = model.generate(**inputs, max_new_tokens=10)

    # generate() returns prompt + completion. Drop the prompt tokens before
    # decoding: the question itself contains "Interior" and "Exterior", so
    # searching the full text would always match the prompt first.
    new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
    response = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]

    # Extract only the classification
    match = re.search(r'\b(Interior|Exterior)\b', response, re.IGNORECASE)
    return match.group(0) if match else "Classification failed"
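
The label-extraction step at the end of `test_model` can be pulled out into a small helper so it can be checked without loading the model. This is a minimal sketch, not part of the original notebook: `extract_label` and `LABEL_PATTERN` are hypothetical names, and the helper assumes the text passed in is only the model's reply, not the echoed prompt (which itself contains both words).

```python
import re

# Match either label as a whole word, in any letter case.
LABEL_PATTERN = re.compile(r"\b(interior|exterior)\b", re.IGNORECASE)


def extract_label(reply: str) -> str:
    """Return 'Interior' or 'Exterior', or a failure message if neither appears."""
    match = LABEL_PATTERN.search(reply)
    if match is None:
        return "Classification failed"
    # Normalize case so callers always see the same two spellings.
    return match.group(1).capitalize()


print(extract_label("interior."))
print(extract_label("This looks like an EXTERIOR shot"))
print(extract_label("A car dashboard"))
```

Normalizing the case means downstream code can compare against exactly two strings, whatever capitalization the model happens to produce.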

![image](https://www.shutterstock.com/image-photo/dark-car-interior-steering-wheel-600nw-2445598523.jpg)

In [14]:
# Example with online image
image_url = "https://www.shutterstock.com/image-photo/dark-car-interior-steering-wheel-600nw-2445598523.jpg"
question = "Classify image as Interior or Exterior, Just Interior or Exterior is enough, no further description is required"
print(test_model(image_url, question))

Interior
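
To classify more than one image, the single call above could be wrapped in a loop that records failures instead of stopping at the first bad URL. The sketch below is an assumption about how that might look, not code from the notebook: `classify_many` is a hypothetical helper, the `example.com` URLs are placeholders, and `fake_classify` is a stand-in for the model call so the block runs without a GPU.

```python
from typing import Callable, Dict, Iterable


def classify_many(urls: Iterable[str], classify: Callable[[str], str]) -> Dict[str, str]:
    """Apply `classify` to each URL, recording failures instead of stopping."""
    results: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = classify(url)
        except Exception as exc:  # e.g. a download error or an unreadable image
            results[url] = f"Error: {type(exc).__name__}"
    return results


def fake_classify(url: str) -> str:
    """Stand-in for the model call (hypothetical; decides from the URL text only)."""
    if "broken" in url:
        raise ValueError("could not load image")
    return "Interior" if "interior" in url else "Exterior"


demo_urls = [
    "https://example.com/car-interior.jpg",
    "https://example.com/street.jpg",
    "https://example.com/broken.jpg",
]
demo_results = classify_many(demo_urls, fake_classify)
for url, label in demo_results.items():
    print(f"{label:<20} {url}")
```

In the notebook, the stand-in would be replaced by something like `lambda url: test_model(url, question)`.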


In [15]:
# After loading the model:
total_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {total_params / 1e9:.2f} B")          # ~2.21 B parameters
print(f"Memory: {total_params * 2 / (1024**3):.2f} GB")   # ~4.11 GB at 2 bytes per parameter (float16)

Parameters: 2.21 B
Memory: 4.11 GB
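
The memory figure is just the parameter count times the bytes per parameter, divided by 1024³. A small sketch of the same arithmetic for other dtypes follows; `weight_memory_gib` and `BYTES_PER_PARAM` are hypothetical names, and `approx_params` is the rounded 2.21 B figure printed above, so results can differ from the notebook's output in the last digit. The estimate covers weights only, not activations or the CUDA context.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str = "float16") -> float:
    """Estimate the memory taken by model weights alone, in GiB (1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


approx_params = 2.21e9  # rounded value taken from the printed parameter count

for dtype in ("float32", "float16", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(approx_params, dtype):.2f} GiB")
```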


In [16]:
print(f"VRAM Used: {torch.cuda.memory_allocated() / (1024 ** 3):.2f} GB")

VRAM Used: 4.13 GB
