📄 Paper | 🤗 CodeV-SFT | 🤗 CodeV-RL | 📊 CodeV-RL-Data
CodeV is a code-based visual agent that achieves faithful visual reasoning by generating and executing Python code with visual tools. Unlike traditional vision-language models that may achieve high accuracy while exhibiting unfaithful reasoning (invoking tools on irrelevant regions or ignoring outputs), CodeV ensures that intermediate visual tool outputs actually contain queried evidence.
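To make this concrete, here is a minimal, hypothetical sketch (not taken from the CodeV codebase) of the kind of Python a code-based visual agent might emit: it crops a candidate region so the resulting patch can be inspected as evidence in the next reasoning step.

```python
# Hypothetical illustration of the kind of code a code-based visual agent might emit:
# crop a candidate region, then return it for inspection in the next reasoning step.
from PIL import Image

def crop_region(image_path: str, box: tuple[int, int, int, int]) -> Image.Image:
    """Crop a region (left, upper, right, lower) from the input image."""
    image = Image.open(image_path)
    return image.crop(box)

# The agent would choose the box based on its reasoning, e.g. around a suspected object.
patch = crop_region("input.jpg", (120, 80, 360, 240))
patch.save("patch.jpg")  # the saved patch becomes a new visual observation
```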
This repository provides the Tool-Aware Policy Optimization (TAPO) training code and evaluation framework for CodeV.
CodeV framework: Code-based visual reasoning with executable tools
- 🎯 Faithful Visual Reasoning: Ensures intermediate tool outputs contain actual evidence, not just correct final answers
- 🔧 Tool-Aware Policy Optimization (TAPO): Novel RL framework with dense rewards based on visual tool inputs/outputs
- 📝 Code-Based Visual Agent: Represents visual tools as executable Python code for verifiable reasoning
- 🚀 Strong Performance: Competitive or superior accuracy with substantially higher faithful tool-use rates
- TAPO training framework augmenting GRPO with dense visual tool rewards (a toy sketch follows this list)
- Safe and powerful Python sandbox for executing visual tool code
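As a rough illustration of the dense-reward idea, the sketch below blends an answer-correctness term with a term scored on intermediate tool calls. This is an assumption-laden toy example; the exact reward terms and weights used by TAPO are defined in the paper and in the code under rl/.

```python
# Illustrative sketch only: combine an outcome reward with a dense tool reward,
# following TAPO's idea of scoring intermediate tool inputs/outputs. The weight
# and scoring functions here are assumptions, not the paper's definitions.
def tapo_style_reward(answer_correct: bool,
                      tool_scores: list[float],
                      tool_weight: float = 0.5) -> float:
    """Blend a final-answer reward with a dense reward over tool calls."""
    outcome = 1.0 if answer_correct else 0.0
    # Each tool call is scored (e.g., by a judge) on whether its output actually
    # contains the queried evidence; average the scores over the trajectory.
    tool_term = sum(tool_scores) / len(tool_scores) if tool_scores else 0.0
    return outcome + tool_weight * tool_term
```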
See rl/README.md for detailed capabilities and training instructions.
- Comprehensive evaluation framework based on VLMEvalKit
- Evaluation protocols with tool use, including a Python sandbox and a crop API
For supervised fine-tuning (SFT), please use LLaMA-Factory or ms-swift, since VeRL does not currently support multimodal SFT. For the SFT training data, please refer to Thyme.
Performance on visual search and reasoning benchmarks
CodeV achieves:
- ✅ Competitive or superior accuracy on visual search and reasoning benchmarks (VStarBench, HRBench, MathVista)
- ✅ Substantially increased faithful tool-use rates compared to baselines
Faithful tool-use rates: CodeV vs. baselines (Thyme, Pixel-Reasoner, etc.)
Key Finding: Explicitly supervising intermediate tool behavior (via TAPO) is crucial for building trustworthy agentic visual reasoning systems.
TAPO training dynamics: response length/tool calls (left) and reward progression (right)
See paper for complete results and analysis
# 1. Install package
cd rl && pip install -e . --no-deps
# 2. Install required dependencies
pip install -r requirements_codev.txt
# 3. Prepare data
hf download RenlyH/CodeV-RL-Data --local-dir data/
python scripts/extract_images_from_parquet.py \
--parquet_path data/codev_rl_data.parquet \
--image_save_path data/images
# 4. Setup LLM judge for reward model (on separate node with REWARD_NODE_IP)
vllm serve Qwen/Qwen2.5-VL-32B-Instruct --port 8000
# 5. Configure judge endpoint
export LLM_AS_A_JUDGE_BASE="http://REWARD_NODE_IP:8000/v1"
# 6. Login to Weights & Biases
wandb login
# 7. Start training
bash examples/agent/codev.sh

See rl/README.md for hardware requirements and detailed instructions.
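The judge in steps 4-5 is an OpenAI-compatible vLLM server, so it can be smoke-tested with a standard client before launching training. A minimal check (the prompt and parameters below are placeholders, not the judge prompt used during TAPO training):

```python
# Quick sanity check of the LLM-as-a-judge endpoint; the prompt is a placeholder.
import os
from openai import OpenAI

client = OpenAI(base_url=os.environ["LLM_AS_A_JUDGE_BASE"], api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    messages=[{"role": "user", "content": "Reply with OK if you are reachable."}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```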
Note: Install VLMEvalKit in a separate conda environment. Avoid sharing the same environment with RL training.
# 1. Create conda environment
conda create -n codev-eval python=3.10 -y
conda activate codev-eval
# 2. Install VLMEvalKit
cd VLMEvalKit && pip install -e .
# 3. Serve the model with vLLM (online serving recommended)
vllm serve RenlyH/CodeV-RL --port 8000
# 4. Run evaluation on the served model
python run.py --config codev_eval.yaml

Supported Benchmarks: VStarBench, HRBench, MathVista, and many more via VLMEvalKit integration.
Supported eval protocols: Python sandbox, crop API.
Recommendation: Use vLLM online serving (v0.10.0+) for testing models. This provides better performance and easier integration with the evaluation framework.
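Because the model is served through vLLM's OpenAI-compatible API, it can also be queried directly outside VLMEvalKit, which is handy for spot-checking outputs before a full run. A minimal sketch (the image URL and question are placeholders):

```python
# Spot-check the served CodeV-RL model outside the evaluation harness.
# The image URL and question below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="RenlyH/CodeV-RL",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            {"type": "text", "text": "What is written on the small sign in the background?"},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```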
CodeV/
├── rl/ # TAPO training code
│ ├── verl/ # RL framework
│ ├── scripts/ # Training scripts
│ ├── examples/ # Example configurations
│ └── README.md # Detailed RL capabilities
├── VLMEvalKit/ # Evaluation framework
│ ├── vlmeval/ # VLMEvalKit integration
│ ├── scripts/ # Evaluation helper scripts
│ └── run.py # Main evaluation script
├── assets/ # Figures and visualizations
└── CLAUDE.md # Guidance for Claude Code
If you find CodeV useful for your research, please cite:
@article{hou2025codev,
title={CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization},
author={Hou, Xinhai and Xu, Shaoyuan and Biyani, Manan and Li, Mayan and Liu, Jia and Hollon, Todd C and Wang, Bryan},
journal={arXiv preprint arXiv:2511.19661},
year={2025}
}

- Pixel-Reasoner: Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
- OpenThinkImg: OPENTHINKIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
- DeepEyes: DeepEyes: Incentivizing “Thinking with Images” via Reinforcement Learning
- REVPT: Reinforced Visual Perception with Tools
- Thyme: Thyme: Think Beyond Images
This project is licensed under the Apache 2.0 License.
For questions and issues, please open an issue on GitHub or contact xinhaih@umich.edu.