Skip to content

liamchalcroft/gaze

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

GAZE: Grounded Agentic Zero-shot Evaluation

CI PyPI Python 3.10+ License: MIT codecov OpenSSF Scorecard Docs

GAZE (Grounded Agentic Zero-shot Evaluation) is a modular Python framework for multi-turn agentic vision-language model (VLM) systems, built for medical image analysis.

A radiologist rarely reads a scan in a single glance: they zoom, adjust the window, compare regions, and consult the literature before writing a report. A vision-language model, by contrast, reads an image once and produces text in a single forward pass. GAZE closes that gap by giving a VLM viewer-level tools (zoom, windowing, contrast, edge detection) and literature retrieval (PubMed, Open-i), then running it as a multi-turn loop with schema-validated outputs and full tool-call traces for auditability. It applies to any visual reasoning task, not only medical imaging.

Features

  • Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
  • 25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
  • Task processors -- abstract base class with dependency injection for prompts, schemas, and validation
  • Model adapters -- OpenAI API (including OpenRouter), LM Studio for local models, HuggingFace Transformers
  • Verifiers integration -- reward functions and multi-turn environments for RL training via verifiers

Tools at a glance

The model can call these during reasoning (multi-turn mode); the full set of 25 is in the tool reference.

Category Representative tools
Inspect zoom, crop, rotate, flip_horizontal
Enhance adjust_contrast, adjust_brightness, window_level, equalize_histogram
Analyze threshold, detect_edges, morphological, symmetry_diff
Retrieve search_web (PubMed), search_images (Open-i)

Installation

pip install gaze-vlm

With extras for specific examples:

pip install gaze-vlm[nova]          # NOVA brain-MRI benchmark
pip install gaze-vlm[gemex]         # GEMeX visual grounding
pip install gaze-vlm[agentclinic]   # AgentClinic diagnostic reasoning
pip install gaze-vlm[pubmedqa]      # PubMedQA text-only QA
pip install gaze-vlm[vqa-rad]       # VQA-RAD radiology VQA
pip install gaze-vlm[medmarks]      # MedMarks-compatible NOVA environment
pip install gaze-vlm[verifiers]     # RL reward functions

For development:

git clone https://github.com/liamchalcroft/gaze.git
cd gaze
uv sync

Quick start

Subclass AgenticProcessorBase and implement four methods:

import asyncio
from pathlib import Path
from gaze import AgenticProcessorBase

class MyProcessor(AgenticProcessorBase):
    def get_system_prompt(self, images, metadata):
        return "You are a medical imaging expert."

    def get_user_message(self, images, metadata):
        return f"Analyze this scan. History: {metadata.get('history', '')}"

    def get_response_schema(self):
        return {
            "type": "json_schema",
            "json_schema": {
                "name": "analysis",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "findings": {"type": "string"},
                        "continue": {"type": "boolean"},
                    },
                    "required": ["findings", "continue"],
                    "additionalProperties": False,
                },
            },
        }

    def validate_response(self, response):
        return "findings" in response

async def main():
    # `async with` releases shared search/HTTP resources on exit.
    async with MyProcessor(model_name="openai/gpt-4o", use_tools=True) as processor:
        result = await processor.analyze(
            images=Path("scan.jpg"),
            metadata={"modality": "MRI", "history": "Patient presents with headache"},
        )
        print(result.final_response)

asyncio.run(main())

The model returns JSON each turn with "continue": true to keep reasoning or "continue": false when done. result.final_response is the validated JSON from the last turn, for example:

{"findings": "No acute intracranial abnormality.", "continue": false}

Architecture

src/gaze/
    base.py          AgenticProcessorBase -- subclass this
    types.py         ToolCall, ToolResult, Turn, AgenticResult (all frozen)
    config.py        Frozen dataclasses: GazeConfig, SearchConfig, etc.
    exceptions.py    GazeError hierarchy
    models/          AdapterProtocol, OpenAIAdapter, LMStudioAdapter, HuggingFaceAdapter
    tools/           Tool, ToolRegistry, 23 visual tools, 2 search tools
    retrieval/       PubMed (NCBI E-utilities), Open-i image search
    prompts/         Jinja2 templates via minijinja
    verifiers/       RL reward functions and multi-turn environments
    utils/           IoU, JSON extraction, type coercion, confidence clamping

The import path is gaze (the package lives under src/gaze/).

Examples

Five complete example applications are included:

Example Task Dataset
nova/ Brain MRI analysis (caption + diagnosis + localization) c-i-ber/Nova
gemex_thinkvg/ Visual grounding with chain-of-thought MIMIC-CXR (PhysioNet)
agentclinic_nejm/ Multi-turn diagnostic reasoning AgentClinic NEJM
pubmedqa/ Medical Q&A (text-only) PubMedQA
vqa_rad/ Radiology VQA VQA-RAD

Each example includes a CLI, evaluation metrics, and run scripts for local models.

Local models (LM Studio)

All examples support local model inference via LM Studio:

uv run python -m examples.nova.src.cli \
  --model qwen3.5-a3b \
  --base-url http://localhost:1234/v1 \
  --mode single_turn \
  --max-samples 5

Environment variables

Variable Required Description
OPENROUTER_API_KEY or OPENAI_API_KEY Yes (for cloud models) Model API access
NCBI_API_KEY No Higher PubMed rate limits
NCBI_EMAIL No PubMed API compliance
GAZE_ALLOW_CUSTOM_BASE_URL No Set to 1 to send API keys to a non-allowlisted model host

Development

uv sync                          # Install dependencies
make check                       # Quality gate: lint + format + typecheck + lockfile + tests
make check-nova                  # Torch-gated + example tests (installs the nova extra)
uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run pyright src/              # Type check
uv run pytest tests/ -x          # Run tests

Stability and versioning

GAZE follows Semantic Versioning. While the project is pre-1.0, minor releases may include breaking changes to the public API; each is recorded in the Changelog. The public API is the set of names exported from the top-level gaze package (gaze.__all__); anything underscore-prefixed or imported from a submodule is internal and may change without notice. From 1.0 onward, removals will ship with a deprecation warning for at least one minor release.

Documentation

Citation

If you use GAZE in your research, please cite:

@article{alim2026gaze,
  title   = {{GAZE}: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain {MRI}},
  author  = {Alim, Duaa and Alim, Mogtaba and Chalcroft, Liam},
  journal = {arXiv preprint arXiv:2605.00876},
  year    = {2026},
  note    = {Accepted at AIiH 2026},
}

The preprint is available at arXiv:2605.00876.

License

MIT

About

Grounded agentic framework for medical vision-language models: viewer-level image tools, multi-turn tool use, and PubMed/Open-i literature retrieval

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages