Summary
Teach users how to send images to agents and configure vision model support. Vision capabilities enable agents to understand screenshots, diagrams, photos, and other visual content alongside text.
Course Section Outline
- Content block format — mixing text and image_url blocks in messages
- Configuring vision model endpoints in agent.yaml
- Deploying Granite Vision 3.2-2B on vLLM with the correct launch flags
- Sending images via the API — base64 encoding and file_id references from the upload endpoint
- UI integration for image paste, drag-and-drop, and file picker upload
- Model capability considerations — not all models support vision, how to handle gracefully
Lab Exercise
Deploy a vision model on vLLM, create an agent configured to use it, and test image understanding through several scenarios: describe a photograph, extract text from a screenshot, and interpret a simple diagram. Verify that non-vision requests still route correctly.
Companion Issues
Companion issues filed on fips-agents/agent-template, fips-agents/gateway-template, fips-agents/ui-template, and fips-agents/fips-agents-cli.
Size
S
Summary
Teach users how to send images to agents and configure vision model support. Vision capabilities enable agents to understand screenshots, diagrams, photos, and other visual content alongside text.
Course Section Outline
Lab Exercise
Deploy a vision model on vLLM, create an agent configured to use it, and test image understanding through several scenarios: describe a photograph, extract text from a screenshot, and interpret a simple diagram. Verify that non-vision requests still route correctly.
Companion Issues
Companion issues filed on fips-agents/agent-template, fips-agents/gateway-template, fips-agents/ui-template, and fips-agents/fips-agents-cli.
Size
S