# <center>Critical AI</center>
<center>ENGL 54.41</center>
<center>Dartmouth College</center>
<center>Winter 2026</center>
<pre>Created: 02/20/2026

In [2]:
!pip install qwen-vl-utils > /dev/null 2>&1
import torch
import os
from glob import glob
from transformers import AutoProcessor, AutoConfig, Qwen2VLForConditionalGeneration
from transformers.models.qwen2_vl.modeling_qwen2_vl import Qwen2VisionTransformerPretrainedModel
from qwen_vl_utils import process_vision_info

In [3]:
if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = torch.device('mps')
elif torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print('Using device: {0}'.format(device))

Using device: mps


In [4]:
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
pretrained = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct",
                                                             device_map = "auto")

The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 



In [8]:
# Pixel budget for image preprocessing: smaller images mean fewer visual tokens,
# which saves memory and speeds up inference.
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28

def format_message(image, prompt = "", min_pixels = min_pixels, max_pixels = max_pixels):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": image,
                    "max_pixels": max_pixels,
                    "min_pixels": min_pixels,
                    "resized_height": 224,
                    "resized_width": 224
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]
    return messages
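
To make the pixel budget above concrete, here is a minimal sketch of the arithmetic, assuming (per the Qwen2-VL model card) that each 28x28-pixel block of the resized image becomes roughly one visual token. The helper name `estimate_visual_tokens` and the constant names are hypothetical and not part of `qwen_vl_utils`; the real preprocessing has its own rounding rules, which this sketch only approximates.

```python
PATCH = 28  # assumed side length, in pixels, of the block behind one visual token

def estimate_visual_tokens(height, width, patch=PATCH):
    """Rough visual-token count for an image of the given size (hypothetical helper)."""
    # Round each side to the nearest multiple of the patch size, with at least one patch.
    rows = max(1, round(height / patch))
    cols = max(1, round(width / patch))
    return rows * cols

# Same values as min_pixels and max_pixels in the cell above.
MIN_PIXEL_BUDGET = 256 * 28 * 28
MAX_PIXEL_BUDGET = 1280 * 28 * 28

print(MIN_PIXEL_BUDGET // (PATCH * PATCH))   # token budget implied by min_pixels
print(MAX_PIXEL_BUDGET // (PATCH * PATCH))   # token budget implied by max_pixels
print(estimate_visual_tokens(224, 224))      # tokens for the 224x224 resize in format_message
```

Under these assumptions the pixel budget spans 256 to 1280 visual tokens, while the fixed 224x224 resize works out to 8 x 8 = 64. Explicit `resized_height` and `resized_width` values appear to take precedence over the pixel budget in `qwen_vl_utils`, but that is worth verifying against the installed version.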

## Multi-modal Generation

The following function takes an image and a prompt and generates text. We can use it to produce captions, to extract information, or to supplement information in either the image or the text space within the model. The default prompt is "Describe this image," but experimental work suggests that the model has been fine-tuned for description regardless of the prompt: nonsensical prompts will likely still produce a description. What does it mean to have description as a default mode for these multi-modal models?

In [5]:
def get_response(image, prompt="Describe this image"):
    # Build the chat message once and reuse it for the text template and the vision inputs.
    messages = format_message(image, prompt=prompt)
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt")
    inputs = inputs.to(device)
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        outputs = pretrained.generate(**inputs, max_new_tokens=256)
    # generate() returns the prompt tokens followed by the new tokens; keep only the new ones.
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, outputs)
    ]
    response = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    return response
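
One way to probe the claim that description is the model's default mode is to send the same image through several prompts, including a nonsensical one, and read the responses side by side. The sketch below uses a hypothetical helper, `compare_prompts`, which is not part of the original notebook. It accepts any responder function with the same signature as `get_response`, and a stand-in responder (`fake_respond`, also hypothetical) keeps the example runnable without loading the model.

```python
def compare_prompts(image, prompts, respond):
    """Return {prompt: first response string} for one image (hypothetical helper)."""
    results = {}
    for prompt in prompts:
        # respond() is expected to return a list of strings, as get_response does.
        results[prompt] = respond(image, prompt)[0]
    return results

def fake_respond(image, prompt):
    # Stand-in for get_response so the sketch runs without the model.
    return ["[response to {0!r} for {1}]".format(prompt, image)]

prompts = ["Describe this image", "Purple monkey dishwasher?"]
for prompt, reply in compare_prompts("bridge.jpg", prompts, fake_respond).items():
    print(prompt, "->", reply)
```

In the notebook, passing `get_response` in place of `fake_respond` would run the actual model on each prompt.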

In [6]:
# download sample images
!wget -O fysep1440.jpg 'https://media.licdn.com/dms/image/v2/D4E22AQHn991ixEZcBA/feedshare-shrink_800/B4EZv_v4EcIMAg-/0/1769522300025?e=2147483647&v=beta&t=FdcWH2LO9koVo-_kB7rcA0PDT3hmmkuXgOxwQUSStWs'

--2026-02-27 10:53:38--  https://media.licdn.com/dms/image/v2/D4E22AQHn991ixEZcBA/feedshare-shrink_800/B4EZv_v4EcIMAg-/0/1769522300025?e=2147483647&v=beta&t=FdcWH2LO9koVo-_kB7rcA0PDT3hmmkuXgOxwQUSStWs
Resolving media.licdn.com (media.licdn.com)... 146.75.31.42
Connecting to media.licdn.com (media.licdn.com)|146.75.31.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 131801 (129K) [image/jpeg]
Saving to: ‘fysep1440.jpg’


2026-02-27 10:53:38 (3.74 MB/s) - ‘fysep1440.jpg’ saved [131801/131801]



In [9]:
get_response("fysep1440.jpg","What are some possible careers for these students?")

['The students in the picture are likely to pursue careers in engineering and computer science. Some possible career paths include:\n\n1. **Engineers**: They can work as engineers in various industries such as aerospace, automotive, electronics, and manufacturing.\n2. **Computer Scientists**: They can work as computer scientists in research and development, software engineering, and data science.\n3. **Software Developers**: They can work as software developers in various sectors such as finance, healthcare, and technology.\n4. **Data Scientists**: They can work as data scientists in industries such as finance, healthcare, and technology, analyzing and interpreting large datasets.\n5. **Project Managers**: They can work as project managers in engineering and computer science projects, overseeing the development and implementation of projects.\n6. **Consultants**: They can work as consultants in industries such as technology, healthcare, and finance, providing expertise and advice to cl

In [11]:
!wget -O bridge.jpg 'https://sites.dartmouth.edu/literaryartsbridge/files/2025/08/DSC_0029-1120x746.jpg'

--2026-02-27 10:54:29--  https://sites.dartmouth.edu/literaryartsbridge/files/2025/08/DSC_0029-1120x746.jpg
Resolving sites.dartmouth.edu (sites.dartmouth.edu)... 35.172.73.102, 184.72.224.80, 3.91.109.122, ...
Connecting to sites.dartmouth.edu (sites.dartmouth.edu)|35.172.73.102|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/e/2665/files/2025/08/DSC_0029-1120x746.jpg [following]
--2026-02-27 10:54:29--  https://bpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/e/2665/files/2025/08/DSC_0029-1120x746.jpg
Resolving bpb-us-e1.wpmucdn.com (bpb-us-e1.wpmucdn.com)... 109.61.86.193
Connecting to bpb-us-e1.wpmucdn.com (bpb-us-e1.wpmucdn.com)|109.61.86.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 116176 (113K) [image/jpeg]
Saving to: ‘bridge.jpg’


2026-02-27 10:54:29 (2.45 MB/s) - ‘bridge.jpg’ saved [116176/116176]



In [12]:
get_response("bridge.jpg","What are some possible careers for these students?")

["It's not possible to determine specific careers for the students based on the image alone. However, based on their interests and the context of the image, some possible careers could include:\n\n1. **Entrepreneurship**: The students may have a passion for starting their own businesses or working in the startup industry.\n2. **Education**: They could be interested in teaching or working in education, either as a teacher or in a related field.\n3. **Technology**: They might have a strong interest in technology, possibly working in software development, data analysis, or cybersecurity.\n4. **Healthcare**: They could be interested in healthcare, either as a doctor or a healthcare professional.\n5. **Environmental Science**: They might have a passion for environmental conservation, working in environmental organizations or research institutions.\n6. **Social Work**: They could be interested in social work, working in community services or helping people in need.\n7. **Arts**: They might h

In [13]:
!wget -O rauner.jpg 'https://admissions.dartmouth.edu/sites/admissions.prod/files/admissions/content/program-area/areastudy_805x550_humraunerlib.jpg'

--2026-02-27 10:54:47--  https://admissions.dartmouth.edu/sites/admissions.prod/files/admissions/content/program-area/areastudy_805x550_humraunerlib.jpg
Resolving admissions.dartmouth.edu (admissions.dartmouth.edu)... 104.16.117.63, 104.16.118.63
Connecting to admissions.dartmouth.edu (admissions.dartmouth.edu)|104.16.117.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 387948 (379K) [image/jpeg]
Saving to: ‘rauner.jpg’


2026-02-27 10:54:47 (8.17 MB/s) - ‘rauner.jpg’ saved [387948/387948]



In [14]:
get_response("rauner.jpg","What are some possible careers for these students?")

["The students in the image are likely studying printmaking, which involves creating artwork using printing techniques such as etching, lithography, and woodcut. Some possible careers for these students could include:\n\n1. **Printmaker**: They could work as artists or designers who create artwork using printing techniques.\n2. **Printer**: They could work in the printing industry, such as in bookbinding, magazine production, or graphic design.\n3. **Art Teacher**: They could teach art classes, focusing on printmaking, to students of all ages.\n4. **Artisan**: They could create custom artwork for clients, such as posters, prints, or limited edition pieces.\n5. **Illustrator**: They could work in the field of illustration, creating artwork for books, magazines, or other media.\n6. **Graphic Designer**: They could work in graphic design, creating visual content for websites, brochures, or other printed materials.\n7. **Art Collector**: They could specialize in collecting and studying pri
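
The three career responses above share a format: a numbered list whose items begin with a bolded label. To compare which careers the model assigns to each image, one option is to pull out those labels with a regular expression. This is a minimal sketch; `extract_labels` is a hypothetical helper, and the sample string is an abridged excerpt of the first response above.

```python
import re

def extract_labels(response_text):
    """Return the bolded labels of numbered list items such as '1. **Engineers**: ...'."""
    # Match a line that starts with digits and a period, then capture the text between ** markers.
    return re.findall(r"^\d+\.\s+\*\*(.+?)\*\*", response_text, flags=re.MULTILINE)

# Abridged excerpt of the response for fysep1440.jpg.
sample = (
    "Some possible career paths include:\n\n"
    "1. **Engineers**: They can work as engineers in various industries.\n"
    "2. **Computer Scientists**: They can work as computer scientists.\n"
    "3. **Software Developers**: They can work as software developers."
)
print(extract_labels(sample))
```

With the model loaded, the same helper could be applied to `get_response(...)[0]` for each image, so the label lists can be compared directly.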

In [15]:
!wget -O protest.jpg 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRAah_qZp0IDiYzSsGxyIjQHEzzDEf3D8eC_w&s'

--2026-02-27 10:55:04--  https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRAah_qZp0IDiYzSsGxyIjQHEzzDEf3D8eC_w&s
Resolving encrypted-tbn0.gstatic.com (encrypted-tbn0.gstatic.com)... 142.250.188.14
Connecting to encrypted-tbn0.gstatic.com (encrypted-tbn0.gstatic.com)|142.250.188.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9496 (9.3K) [image/jpeg]
Saving to: ‘protest.jpg’


2026-02-27 10:55:05 (14.9 MB/s) - ‘protest.jpg’ saved [9496/9496]



In [17]:
get_response("protest.jpg","What is happening in this photograph? What is the legacy of this image?")

['The photograph you provided shows a line of tanks moving down a street in Beijing, China. This image is often referred to as the "Tank Man" photograph, which was taken during the Tiananmen Square protests in 1989. The photograph captures a moment of resistance and defiance against the Chinese government\'s suppression of the protests.\n\nThe legacy of this image is significant in terms of its representation of the Tiananmen Square protests and the human rights issues that were at the center of the protests. The photograph has become a symbol of the struggle for freedom and democracy in China and has been used in various forms of media to raise awareness about human rights issues and the ongoing struggle for democracy in China.']