GitHub - liamchalcroft/gaze: Grounded agentic framework for medical vision-language models: viewer-level image tools, multi-turn tool use, and PubMed/Open-i literature retrieval

GAZE (Grounded Agentic Zero-shot Evaluation) is a modular Python framework for multi-turn agentic vision-language model (VLM) systems, built for medical image analysis.

A radiologist rarely reads a scan in a single glance: they zoom, adjust the window, compare regions, and consult the literature before writing a report. A vision-language model, by contrast, reads an image once and produces text in a single forward pass. GAZE closes that gap by giving a VLM viewer-level tools (zoom, windowing, contrast, edge detection) and literature retrieval (PubMed, Open-i), then running it as a multi-turn loop with schema-validated outputs and full tool-call traces for auditability. It applies to any visual reasoning task, not only medical imaging.

Features

Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
Task processors -- abstract base class with dependency injection for prompts, schemas, and validation
Model adapters -- OpenAI API (including OpenRouter), LM Studio for local models, HuggingFace Transformers
Verifiers integration -- reward functions and multi-turn environments for RL training via verifiers

Tools at a glance

The model can call these during reasoning (multi-turn mode); the full set of 25 is in the tool reference.

Category	Representative tools
Inspect	`zoom`, `crop`, `rotate`, `flip_horizontal`
Enhance	`adjust_contrast`, `adjust_brightness`, `window_level`, `equalize_histogram`
Analyze	`threshold`, `detect_edges`, `morphological`, `symmetry_diff`
Retrieve	`search_web` (PubMed), `search_images` (Open-i)
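
To make the Enhance row concrete, here is a minimal pure-Python sketch of what window/level adjustment does in general: intensities are clipped to a window defined by a center and a width, then rescaled to 0-255 for display. This illustrates the technique only. It is not GAZE's `window_level` implementation, and the function name `apply_window` and its parameters are assumptions made for the example.

```python
def apply_window(pixels, center, width):
    """Clip a 2D list of intensities to [center - width/2, center + width/2]
    and rescale the result to the 0-255 display range.

    Hypothetical helper for illustration; not part of the gaze package.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    low = center - width / 2.0
    high = center + width / 2.0
    windowed = []
    for row in pixels:
        new_row = []
        for value in row:
            clipped = min(max(value, low), high)  # clamp into the window
            new_row.append(round(255 * (clipped - low) / (high - low)))
        windowed.append(new_row)
    return windowed


# Values below the window map to 0, values above it map to 255.
demo = apply_window([[0, 20, 100, 300]], center=50, width=100)
print(demo)
```

A narrow window spreads a small intensity range over the full display range, which is why changing it can reveal low-contrast structures.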

Installation

pip install gaze-vlm

With extras for specific examples:

pip install "gaze-vlm[nova]"          # NOVA brain-MRI benchmark
pip install "gaze-vlm[gemex]"         # GEMeX visual grounding
pip install "gaze-vlm[agentclinic]"   # AgentClinic diagnostic reasoning
pip install "gaze-vlm[pubmedqa]"      # PubMedQA text-only QA
pip install "gaze-vlm[vqa-rad]"       # VQA-RAD radiology VQA
pip install "gaze-vlm[medmarks]"      # MedMarks-compatible NOVA environment
pip install "gaze-vlm[verifiers]"     # RL reward functions
# The quotes keep shells such as zsh from treating the brackets as a glob pattern.

For development:

git clone https://github.com/liamchalcroft/gaze.git
cd gaze
uv sync

Quick start

Subclass AgenticProcessorBase and implement four methods:

import asyncio
from pathlib import Path
from gaze import AgenticProcessorBase

class MyProcessor(AgenticProcessorBase):
    def get_system_prompt(self, images, metadata):
        return "You are a medical imaging expert."

    def get_user_message(self, images, metadata):
        return f"Analyze this scan. History: {metadata.get('history', '')}"

    def get_response_schema(self):
        return {
            "type": "json_schema",
            "json_schema": {
                "name": "analysis",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "findings": {"type": "string"},
                        "continue": {"type": "boolean"},
                    },
                    "required": ["findings", "continue"],
                    "additionalProperties": False,
                },
            },
        }

    def validate_response(self, response):
        return "findings" in response

async def main():
    # `async with` releases shared search/HTTP resources on exit.
    async with MyProcessor(model_name="openai/gpt-4o", use_tools=True) as processor:
        result = await processor.analyze(
            images=Path("scan.jpg"),
            metadata={"modality": "MRI", "history": "Patient presents with headache"},
        )
        print(result.final_response)

asyncio.run(main())

The model returns JSON each turn with "continue": true to keep reasoning or "continue": false when done. result.final_response is the validated JSON from the last turn, for example:

{"findings": "No acute intracranial abnormality.", "continue": false}
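
The "continue" flag can be pictured as the stop condition of a simple loop. The sketch below is not GAZE's internal loop. It is a standalone illustration with a stand-in model, assuming that each turn yields a JSON string with the two fields from the schema above. The names `run_turns`, `model_step`, and `fake_model` are hypothetical.

```python
import json


def run_turns(model_step, max_turns=5):
    """Call model_step until it returns "continue": false or max_turns is reached.

    model_step is a hypothetical callable: it receives the list of earlier
    responses and returns the next response as a JSON string.
    """
    history = []
    for turn in range(max_turns):
        response = json.loads(model_step(history))
        # Minimal check mirroring the two required fields in the schema.
        if (
            not isinstance(response, dict)
            or "findings" not in response
            or not isinstance(response.get("continue"), bool)
        ):
            raise ValueError(f"turn {turn}: response does not match the schema")
        history.append(response)
        if not response["continue"]:
            break
    return history


def fake_model(history):
    """Stand-in for a VLM: reasons for two turns, then finishes."""
    if len(history) < 2:
        return json.dumps(
            {"findings": f"Partial observation {len(history) + 1}.", "continue": True}
        )
    return json.dumps(
        {"findings": "No acute intracranial abnormality.", "continue": False}
    )


turns = run_turns(fake_model)
final_response = turns[-1]
print(len(turns), final_response)
```

The turn limit matters because a model that never sets "continue" to false would otherwise loop indefinitely; here the loop simply stops and returns what it has.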

Architecture

src/gaze/
    base.py          AgenticProcessorBase -- subclass this
    types.py         ToolCall, ToolResult, Turn, AgenticResult (all frozen)
    config.py        Frozen dataclasses: GazeConfig, SearchConfig, etc.
    exceptions.py    GazeError hierarchy
    models/          AdapterProtocol, OpenAIAdapter, LMStudioAdapter, HuggingFaceAdapter
    tools/           Tool, ToolRegistry, 23 visual tools, 2 search tools
    retrieval/       PubMed (NCBI E-utilities), Open-i image search
    prompts/         Jinja2 templates via minijinja
    verifiers/       RL reward functions and multi-turn environments
    utils/           IoU, JSON extraction, type coercion, confidence clamping

The import path is gaze (the package lives under src/gaze/).
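
For readers unfamiliar with two of the helpers named under utils/, the following sketch shows what IoU and confidence clamping usually compute. It makes no claim about the signatures in gaze: the function names, the `(x1, y1, x2, y2)` box format, and the `[0, 1]` confidence range are assumptions chosen for the example.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not the gaze.utils API.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height are zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def clamp_confidence(value):
    """Force a model-reported confidence into the assumed [0.0, 1.0] range."""
    return max(0.0, min(1.0, float(value)))


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(clamp_confidence(1.7))
```

IoU is the usual score for localization tasks: 1.0 means the predicted and reference boxes coincide, 0.0 means they do not overlap.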

Examples

Five complete example applications are included:

Example	Task	Dataset
`nova/`	Brain MRI analysis (caption + diagnosis + localization)	c-i-ber/Nova
`gemex_thinkvg/`	Visual grounding with chain-of-thought	MIMIC-CXR (PhysioNet)
`agentclinic_nejm/`	Multi-turn diagnostic reasoning	AgentClinic NEJM
`pubmedqa/`	Medical Q&A (text-only)	PubMedQA
`vqa_rad/`	Radiology VQA	VQA-RAD

Each example includes a CLI, evaluation metrics, and run scripts for local models.

Local models (LM Studio)

All examples support local model inference via LM Studio:

uv run python -m examples.nova.src.cli \
  --model qwen3.5-a3b \
  --base-url http://localhost:1234/v1 \
  --mode single_turn \
  --max-samples 5

Environment variables

Variable	Required	Description
`OPENROUTER_API_KEY` or `OPENAI_API_KEY`	Yes (for cloud models)	Model API access
`NCBI_API_KEY`	No	Higher PubMed rate limits
`NCBI_EMAIL`	No	PubMed API compliance
`GAZE_ALLOW_CUSTOM_BASE_URL`	No	Set to `1` to send API keys to a non-allowlisted model host
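
A script that wraps GAZE may want to fail early when no cloud key is configured. The sketch below checks the two key variables from the table. The helper name `resolve_api_key` and the order in which the variables are tried are assumptions for illustration, and GAZE's own lookup may behave differently.

```python
import os


def resolve_api_key(env=None):
    """Return (variable_name, key) for the first non-empty API key variable.

    Hypothetical helper; the OPENROUTER-before-OPENAI order is an assumption.
    """
    env = os.environ if env is None else env
    for name in ("OPENROUTER_API_KEY", "OPENAI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return name, value
    raise RuntimeError(
        "Set OPENROUTER_API_KEY or OPENAI_API_KEY to use a cloud model."
    )


# Demo with an explicit mapping so the example does not depend on the real environment.
source, key = resolve_api_key({"OPENAI_API_KEY": "sk-example"})
print(source)
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test.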

Development

uv sync                          # Install dependencies
make check                       # Quality gate: lint + format + typecheck + lockfile + tests
make check-nova                  # Torch-gated + example tests (installs the nova extra)
uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run pyright src/              # Type check
uv run pytest tests/ -x          # Run tests

Stability and versioning

GAZE follows Semantic Versioning. While the project is pre-1.0, minor releases may include breaking changes to the public API; each is recorded in the Changelog. The public API is the set of names exported from the top-level gaze package (gaze.__all__); anything underscore-prefixed or imported from a submodule is internal and may change without notice. From 1.0 onward, removals will ship with a deprecation warning for at least one minor release.
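
The rule above can be checked mechanically. As a rough sketch, the hypothetical helper below treats a name as public only if the package lists it in `__all__` and it is not underscore-prefixed. It is demonstrated on the standard-library `json` package so that it runs without GAZE installed; calling it with `"gaze"` as the package name is the intended use once the package is available.

```python
import importlib


def is_public(package_name, name):
    """Return True if name is exported via the package's __all__ and is not private.

    Hypothetical helper for illustration; not part of the gaze package.
    """
    module = importlib.import_module(package_name)
    exported = getattr(module, "__all__", ())
    return name in exported and not name.startswith("_")


print(is_public("json", "dumps"))
print(is_public("json", "_default_encoder"))
```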

Documentation

Citation

If you use GAZE in your research, please cite:

@article{alim2026gaze,
  title   = {{GAZE}: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain {MRI}},
  author  = {Alim, Duaa and Alim, Mogtaba and Chalcroft, Liam},
  journal = {arXiv preprint arXiv:2605.00876},
  year    = {2026},
  note    = {Accepted at AIiH 2026},
}

The preprint is available at arXiv:2605.00876.

License

MIT

