docs: Course section — Image Input and Vision Models #15

@rdwj

Description

Summary

Teach users how to send images to agents and configure vision model support. Vision capabilities enable agents to understand screenshots, diagrams, photos, and other visual content alongside text.

Course Section Outline

  • Content block format — mixing text and image_url blocks in messages
  • Configuring vision model endpoints in agent.yaml
  • Deploying Granite Vision 3.2-2B on vLLM with the correct launch flags
  • Sending images via the API — base64 encoding and file_id references from the upload endpoint
  • UI integration for image paste, drag-and-drop, and file picker upload
  • Model capability considerations — not every model supports vision; detect missing support and degrade gracefully
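The content-block format in the first bullet can be sketched as below. This assumes the common OpenAI-style chat-completions schema with `text` and `image_url` block types; the exact field names the gateway accepts may differ, and `build_vision_message` is a hypothetical helper for illustration, not part of any template.

```python
import base64

def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message mixing a text block with a base64 image_url block."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # When the upload endpoint / file_id path is not used, the image
            # travels inline as a data URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = build_vision_message("Describe this screenshot.", b"\x89PNG\r\n\x1a\n")
print([block["type"] for block in msg["content"]])  # → ['text', 'image_url']
```

The same message shape works whether the image came from a file picker, a paste event, or a drag-and-drop handler — the UI only needs to produce bytes plus a MIME type.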

Lab Exercise

Deploy a vision model on vLLM, create an agent configured to use it, and test image understanding through several scenarios: describe a photograph, extract text from a screenshot, and interpret a simple diagram. Verify that non-vision requests still route correctly.
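A minimal sketch of the "non-vision requests still route correctly" check, under the same assumed OpenAI-style content-block message shape described in the outline: if the configured model lacks vision support, image blocks are dropped and only the text parts are forwarded. The `supports_vision` flag stands in for whatever capability field the agent configuration exposes and is an assumption, not a real template field.

```python
def strip_image_blocks(message: dict, supports_vision: bool) -> dict:
    """Drop image_url blocks when the target model cannot handle them.

    Assumes `content` is either a plain string or a list of typed blocks.
    """
    content = message.get("content")
    if supports_vision or isinstance(content, str):
        return message  # nothing to strip
    text = " ".join(b["text"] for b in content if b.get("type") == "text")
    return {**message, "content": text}

vision_msg = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    ],
}
print(strip_image_blocks(vision_msg, supports_vision=False)["content"])
# → What is in this image?
```

Rejecting the request with a clear error instead of silently dropping the image is an equally valid design; the lab should verify whichever behavior the agent documents.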

Companion Issues

Companion issues filed on fips-agents/agent-template, fips-agents/gateway-template, fips-agents/ui-template, and fips-agents/fips-agents-cli.

Size

S

Metadata

Assignees

No one assigned

Labels

documentation — Improvements or additions to documentation
