# DeepSeek-VL-Chat Test (7B params)

A chat-tuned vision-language model (VLM) for conversational question answering about images.

**Model:** `deepseek-ai/deepseek-vl-7b-chat`  
**Size:** 7B parameters  
**License:** Permissive  
**Features:** Chat-optimized, good accuracy  
**Requirements:** ~14GB disk; ~14GB RAM/VRAM for float16 weights (~28GB in float32, as used on CPU)
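
The RAM/VRAM requirement is dominated by the model weights, which can be approximated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the helper name `estimate_weight_memory_gb` is hypothetical, and it ignores activations, the KV cache, and framework overhead.

```python
# Bytes used per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Rough weight footprint of a 7B-parameter model at different precisions.
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: ~{estimate_weight_memory_gb(7e9, dtype):.0f} GB")
```

For 7B parameters this gives roughly 28 GB in float32 and 14 GB in float16, which is why the CPU fallback path (float32) needs about twice the memory of the CUDA path.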


In [None]:
import torch
from transformers import AutoProcessor, AutoModel
from PIL import Image
from vlm_utils import get_device_info, load_test_images, display_image, print_section, print_subsection

device = get_device_info()


Using device: mps
PyTorch version: 2.9.1
Using Apple Silicon MPS (Metal Performance Shaders)


## Load Test Images


In [None]:
image_files = load_test_images()


Found 1 image(s) to test:
  - sample_image.jpg


## Load DeepSeek-VL-Chat Model


In [None]:
print("Loading DeepSeek-VL-Chat...")
model_id = "deepseek-ai/deepseek-vl-7b-chat"

# Note: DeepSeek-VL has known issues with MPS backend
# Force CPU if on MPS to avoid errors
if device.type == 'mps':
    print("⚠️  DeepSeek-VL-Chat has compatibility issues with MPS, using CPU instead")
    device = torch.device('cpu')
    model_dtype = torch.float32
else:
    use_float16 = torch.cuda.is_available()
    model_dtype = torch.float16 if use_float16 else torch.float32

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=model_dtype,
    low_cpu_mem_usage=True
).to(device)
print("✓ DeepSeek-VL-Chat loaded!")


Loading DeepSeek-VL-Chat...
⚠️  DeepSeek-VL-Chat has compatibility issues with MPS, using CPU instead


You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.


ValueError: The checkpoint you are trying to load has model type `multi_modality` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
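
The `ValueError` above indicates that the installed Transformers version has no class registered for the `multi_modality` model type, so the `AutoModel` call cannot resolve the architecture. One possible workaround, sketched here under the assumption that the `deepseek_vl` package from DeepSeek's DeepSeek-VL GitHub repository is installed, is to import that package before loading so that its model classes are available. The helper names `choose_dtype_name` and `load_deepseek_vl` are hypothetical, and the `deepseek_vl` imports follow that repository's README; verify them against the installed version.

```python
def choose_dtype_name(device_type: str) -> str:
    """Mirror the notebook's rule: float16 on CUDA, float32 otherwise."""
    return "float16" if device_type == "cuda" else "float32"


def load_deepseek_vl(model_id: str = "deepseek-ai/deepseek-vl-7b-chat",
                     device_type: str = "cpu"):
    """Load the chat processor and model (assumes `deepseek_vl` is installed)."""
    # Heavy dependencies are imported lazily so this block can be defined
    # even where they are not installed.
    import torch
    from transformers import AutoModelForCausalLM
    # Assumption: importing deepseek_vl.models registers the `multi_modality`
    # architecture with the Transformers Auto classes.
    from deepseek_vl.models import VLChatProcessor

    # Keep the notebook's fallback: run on CPU when the detected device is MPS.
    target = "cpu" if device_type == "mps" else device_type

    processor = VLChatProcessor.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    dtype = getattr(torch, choose_dtype_name(device_type))
    model = model.to(dtype).to(torch.device(target)).eval()
    return processor, model
```

A call such as `processor, model = load_deepseek_vl(model_id, device.type)` would replace the `AutoProcessor`/`AutoModel` pair in the cell above. Whether it resolves the error depends on the installed package versions.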

## Define Inference Function


In [None]:
def describe_image(image_path, prompt="Describe this image in detail."):
    """Generate a description for an image using DeepSeek-VL-Chat."""
    # Assumes `processor` and `model` loaded successfully in the previous cell.
    # Convert to RGB so palette or alpha-channel images are handled uniformly.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=200)
    description = processor.decode(outputs[0], skip_special_tokens=True)
    return description
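
The function above assumes the generic processor-plus-`generate` interface. The DeepSeek-VL repository instead documents a conversation-based flow, which can be sketched as follows. This assumes a `VLChatProcessor` and model loaded through the `deepseek_vl` package; `build_conversation` and `describe_image_chat` are hypothetical names, and the calls on `processor` and `model` follow the repository's README rather than anything verified in this notebook.

```python
def build_conversation(image_path, prompt="Describe this image in detail."):
    """Build a single-turn conversation in the format the DeepSeek-VL README uses."""
    return [
        {
            "role": "User",
            # <image_placeholder> marks where the image is spliced into the prompt.
            "content": f"<image_placeholder>{prompt}",
            "images": [str(image_path)],
        },
        # An empty assistant turn asks the model to produce the reply.
        {"role": "Assistant", "content": ""},
    ]


def describe_image_chat(processor, model, image_path,
                        prompt="Describe this image in detail.",
                        max_new_tokens=200):
    """Sketch of inference via the deepseek_vl chat flow (assumed API)."""
    from deepseek_vl.utils.io import load_pil_images  # lazy: optional dependency

    conversation = build_conversation(image_path, prompt)
    pil_images = load_pil_images(conversation)
    inputs = processor(
        conversations=conversation, images=pil_images, force_batchify=True
    ).to(model.device)

    # Fuse image features and text tokens into one embedding sequence.
    inputs_embeds = model.prepare_inputs_embeds(**inputs)
    tokenizer = processor.tokenizer
    outputs = model.language_model.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=inputs.attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        use_cache=True,
    )
    return tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
```

Only `build_conversation` is independent of the model. The rest depends on the `deepseek_vl` package being importable and on the model having loaded.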


## Test on All Images


In [None]:
for image_path in image_files:
    print_section(f"Image: {image_path.name}")
    
    display_image(image_path)
    
    print_subsection("🔍 DeepSeek-VL-Chat Description:")
    try:
        desc = describe_image(image_path)
        print(desc)
    except Exception as e:
        print(f"Error: {e}")
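
When testing more than a handful of images, it can help to collect outcomes rather than only printing them, so that failures and timings can be reviewed afterwards. A minimal sketch, where `run_over_images` is a hypothetical helper and `describe_fn` stands in for `describe_image`:

```python
import time
from pathlib import Path


def run_over_images(image_paths, describe_fn):
    """Call describe_fn on each path; record its text or error plus elapsed seconds."""
    results = []
    for path in image_paths:
        start = time.perf_counter()
        try:
            record = {"image": Path(path).name, "ok": True, "text": describe_fn(path)}
        except Exception as exc:  # keep going so one bad image does not stop the run
            record = {"image": Path(path).name, "ok": False, "text": str(exc)}
        record["seconds"] = time.perf_counter() - start
        results.append(record)
    return results
```

Calling `run_over_images(image_files, describe_image)` would return one record per image, which can then be filtered on the `ok` field to count failures.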


## Custom Prompts

Try asking specific questions about an image.


In [None]:
if image_files:
    test_image = image_files[0]
    
    custom_prompts = [
        "What objects can you see in this image?",
        "What colors are prominent in this image?",
        "What is the main subject of this image?"
    ]
    
    print_section(f"Custom Prompts - {test_image.name}")
    display_image(test_image)
    
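    for prompt in custom_prompts:
        print_subsection(f"Q: {prompt}")
        # Catch errors per prompt, as in the loop over all images above.
        try:
            answer = describe_image(test_image, prompt)
            print(f"A: {answer}")
        except Exception as e:
            print(f"Error: {e}")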
