GitHub - landing-ai/vision-agent: Vision agent

VisionAgent

VisionAgent is a library that uses agent frameworks to generate code that solves your vision task. Check out our Discord for updates and roadmaps. The fastest way to try VisionAgent is through our web application.

Installation

pip install vision-agent

export ANTHROPIC_API_KEY="your-api-key"
export GEMINI_API_KEY="your-api-key"

NOTE: We found that using Anthropic Claude-3.7 and Gemini-2.0-Flash-Exp together provides the best performance for VisionAgent. If you want to use a different LLM provider, or only one provider, see 'Using Other LLM Providers' below.

You will also need to set your VisionAgent API key to authenticate with the hosted vision tools that we provide through our APIs. The APIs are currently free to use, so you only need to obtain a key.

export VISION_AGENT_API_KEY="your-api-key"
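
Before running the agent, it can help to confirm that the keys above are visible to Python. The helper below is a minimal sketch using only the standard library; `missing_keys` and `REQUIRED_KEYS` are hypothetical names, not part of VisionAgent. If you use a single LLM provider, trim the list accordingly.

```python
import os

# Environment variables named in the installation steps above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "GEMINI_API_KEY", "VISION_AGENT_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```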

Documentation

VisionAgent Library Docs

Examples

Counting cans in an image

You can run VisionAgent in a local Jupyter Notebook: Counting cans in an image

Generating code

You can use VisionAgent to generate code to count the number of people in an image:

from vision_agent.agent import VisionAgentCoderV2
from vision_agent.models import AgentMessage

agent = VisionAgentCoderV2(verbose=True)
code_context = agent.generate_code(
    [
        AgentMessage(
            role="user",
            content="Count the number of people in this image",
            media=["people.png"]
        )
    ]
)

with open("generated_code.py", "w") as f:
    f.write(code_context.code + "\n" + code_context.test)
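
Generated code is not guaranteed to be well formed, so you may want to check that it parses before saving or running it. The sketch below uses the standard `ast` module; `combine_and_check` is a hypothetical helper, and the sample strings stand in for `code_context.code` and `code_context.test`.

```python
import ast


def combine_and_check(code: str, test: str) -> str:
    """Join code and test as in the example above and verify the result parses."""
    source = code + "\n" + test
    ast.parse(source)  # raises SyntaxError if the combined source is malformed
    return source


# Stand-in strings; in practice pass code_context.code and code_context.test.
sample = combine_and_check(
    "def count_people():\n    return 3",
    "assert count_people() == 3",
)
```

A `SyntaxError` here signals that the output should be regenerated or fixed by hand before it is written to `generated_code.py`.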

Using the tools directly

VisionAgent produces code that uses our tools. You can also use the tools directly. For example, to detect people in an image and visualize the results:

import vision_agent.tools as T
import matplotlib.pyplot as plt

image = T.load_image("people.png")
dets = T.countgd_object_detection("person", image)
# visualize the countgd bounding boxes on the image
viz = T.overlay_bounding_boxes(image, dets)

# save the visualization to a file
T.save_image(viz, "people_detected.png")

# display the visualization
plt.imshow(viz)
plt.show()
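
To turn detections into a count, you can filter them by confidence first. The sketch below assumes each detection is a dictionary with `label`, `score`, and `bbox` keys; that layout is an assumption about the tool output, and the sample data is hand-written rather than real model output. `filter_detections` is a hypothetical helper.

```python
def filter_detections(dets, min_score=0.5, label=None):
    """Keep detections at or above min_score, optionally matching a label."""
    kept = []
    for det in dets:
        if det["score"] < min_score:
            continue
        if label is not None and det["label"] != label:
            continue
        kept.append(det)
    return kept


# Hand-written sample in the assumed detection format.
sample_dets = [
    {"label": "person", "score": 0.92, "bbox": [0.10, 0.20, 0.30, 0.80]},
    {"label": "person", "score": 0.35, "bbox": [0.50, 0.25, 0.65, 0.75]},
    {"label": "person", "score": 0.71, "bbox": [0.70, 0.22, 0.85, 0.78]},
]

confident = filter_detections(sample_dets, min_score=0.5, label="person")
print(f"People detected: {len(confident)}")
```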

You can also run the tools on video files:

import vision_agent.tools as T

frames_and_ts = T.extract_frames_and_timestamps("people.mp4")
# extract the frames from the frames_and_ts list
frames = [f["frame"] for f in frames_and_ts]

# run the countgd tracking on the frames
tracks = T.countgd_sam2_video_tracking("person", frames)
# visualize the countgd tracking results on the frames and save the video
viz = T.overlay_segmentation_masks(frames, tracks)
T.save_video(viz, "people_detected.mp4")
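
Tracking results can also be summarized per frame. The sketch below assumes `tracks` is a list with one entry per frame, each entry being a list of tracked objects; that structure is an assumption, and the sample data is hand-written. `objects_per_frame` and `peak_frame` are hypothetical helpers.

```python
def objects_per_frame(tracks):
    """Return the number of tracked objects in each frame."""
    return [len(frame_tracks) for frame_tracks in tracks]


def peak_frame(tracks):
    """Return the index of the frame with the most objects, or None if empty."""
    counts = objects_per_frame(tracks)
    if not counts:
        return None
    return max(range(len(counts)), key=counts.__getitem__)


# Hand-written sample: three frames with one, two, and zero tracked people.
sample_tracks = [
    [{"label": "person"}],
    [{"label": "person"}, {"label": "person"}],
    [],
]

counts = objects_per_frame(sample_tracks)
busiest = peak_frame(sample_tracks)
```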

Using Other LLM Providers

You can use other LLM providers by changing config.py in the vision_agent/configs directory. For example, to switch to Anthropic, run:

cp vision_agent/configs/anthropic_config.py vision_agent/configs/config.py

You can also edit config.py yourself to use a different LLM provider. For example, to change the planner from Anthropic to OpenAI, replace this code:

    planner: Type[LMM] = Field(default=AnthropicLMM)
    planner_kwargs: dict = Field(
        default_factory=lambda: {
            "model_name": "claude-3-7-sonnet-20250219",
            "temperature": 0.0,
            "image_size": 768,
        }
    )

with this code:

    planner: Type[LMM] = Field(default=OpenAILMM)
    planner_kwargs: dict = Field(
        default_factory=lambda: {
            "model_name": "gpt-4o-2024-11-20",
            "temperature": 0.0,
            "image_size": 768,
            "image_detail": "low",
        }
    )
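
The snippets above pair a model class with the keyword arguments used to construct it, so both fields should change together. The following is a self-contained sketch of that pattern only: it uses standard-library dataclasses instead of the `Field` declarations shown above, and `StubLMM`, `StubOpenAILMM`, `SketchConfig`, and `create_planner` are hypothetical stand-ins, not VisionAgent classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Type


class StubLMM:
    """Stand-in for a model wrapper; it only records its keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class StubOpenAILMM(StubLMM):
    """Stand-in for an OpenAI-backed wrapper."""


@dataclass
class SketchConfig:
    # The class to instantiate and the arguments to instantiate it with.
    planner: Type[StubLMM] = StubOpenAILMM
    planner_kwargs: Dict[str, Any] = field(
        default_factory=lambda: {
            "model_name": "gpt-4o-2024-11-20",
            "temperature": 0.0,
            "image_size": 768,
            "image_detail": "low",
        }
    )

    def create_planner(self) -> StubLMM:
        # Hypothetical helper: build the planner from its paired kwargs.
        return self.planner(**self.planner_kwargs)


planner = SketchConfig().create_planner()
print(type(planner).__name__, planner.kwargs["model_name"])
```

Using `default_factory` gives each config instance its own dictionary, so editing one instance's arguments does not affect another.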

NOTE: VisionAgent moves fast and we are constantly updating and changing the library. If you have any questions or need help, please reach out to us on our discord channel.
