CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization

📄 Paper | 🤗 CodeV-SFT | 🤗 CodeV-RL | 📊 CodeV-RL-Data

Overview

CodeV is a code-based visual agent that achieves faithful visual reasoning by generating and executing Python code that invokes visual tools. Unlike traditional vision-language models, which can reach high accuracy while reasoning unfaithfully (e.g., invoking tools on irrelevant regions or ignoring their outputs), CodeV ensures that intermediate visual tool outputs actually contain the queried evidence.
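
For intuition, a single CodeV reasoning step might emit and execute a snippet like the one below (a hypothetical illustration; the file path and coordinates are made up, not actual model output):

# Hypothetical example of the kind of code a CodeV rollout might
# generate to gather visual evidence before answering.
from PIL import Image

image = Image.open("street_scene.jpg")  # illustrative path

# Crop the region suspected to contain the queried object, then
# upsample it so small details become legible.
crop = image.crop((420, 180, 560, 300))  # (left, top, right, bottom)
zoomed = crop.resize((crop.width * 4, crop.height * 4), Image.LANCZOS)

# The zoomed crop is fed back to the model as a new visual
# observation for the next reasoning step.
zoomed.save("evidence_crop.jpg")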

This repository provides the Tool-Aware Policy Optimization (TAPO) training code and evaluation framework for CodeV.

CodeV Framework
CodeV framework: Code-based visual reasoning with executable tools

Key Features

  • 🎯 Faithful Visual Reasoning: Ensures intermediate tool outputs contain actual evidence, not just correct final answers
  • 🔧 Tool-Aware Policy Optimization (TAPO): Novel RL framework with dense rewards based on visual tool inputs/outputs
  • 📝 Code-Based Visual Agent: Represents visual tools as executable Python code for verifiable reasoning
  • 🚀 Strong Performance: Competitive or superior accuracy with substantially higher faithful tool-use rates

What's Included

✅ Tool-Aware Policy Optimization (rl/)

  • TAPO training framework augmenting GRPO with dense rewards on visual tool inputs/outputs (sketched below)
  • Safe, sandboxed Python environment for executing visual tool code
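
As a rough illustration of the dense-reward idea (the actual reward terms and weights live in rl/ and differ in detail; the function below is a made-up sketch, not the implementation):

# Illustrative sketch of a TAPO-style dense reward; the names and
# the 0.5 weight are invented for exposition, not taken from rl/.
def tapo_reward(answer_correct, judge_faithful):
    """Combine a sparse final-answer reward with dense tool-level rewards.

    answer_correct: bool, whether the final answer matches the reference.
    judge_faithful: list of bools, one per tool call, from an LLM judge
                    that checks whether each intermediate tool output
                    actually contains the queried evidence.
    """
    answer_reward = 1.0 if answer_correct else 0.0
    if not judge_faithful:
        return answer_reward
    # Dense term: reward faithful tool behavior, not just the answer.
    faithful_rate = sum(judge_faithful) / len(judge_faithful)
    return answer_reward + 0.5 * faithful_rate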

See rl/README.md for detailed capabilities and training instructions.

✅ Evaluation Framework (VLMEvalKit/)

  • Comprehensive evaluation framework based on VLMEvalKit
  • Evaluation protocols with tool use, including a Python sandbox and a crop API

🔗 For SFT Training

For supervised fine-tuning (SFT), please use LLaMA-Factory or ms-swift, since VeRL does not currently support multimodal SFT. For the training data, please refer to Thyme, which provides the SFT dataset.

Performance Highlights

Main Results

Main Results
Performance on visual search and reasoning benchmarks

CodeV achieves:

  • Competitive or superior accuracy on visual search and reasoning benchmarks (VStarBench, HRBench, MathVista)
  • Substantially increased faithful tool-use rates compared to baselines

Faithfulness Results

Faithfulness Results
Faithful tool-use rates: CodeV vs. baselines (Thyme, Pixel-Reasoner, etc.)

Key Finding: Explicitly supervising intermediate tool behavior (via TAPO) is crucial for building trustworthy agentic visual reasoning systems.

Training Dynamics

Response Length and Tool Calls Reward Curve
TAPO training dynamics: response length/tool calls (left) and reward progression (right)

See the paper for complete results and analysis.

Quick Start

RL Training

# 1. Install package
cd rl && pip install -e . --no-deps

# 2. Install required dependencies
pip install -r requirements_codev.txt

# 3. Prepare data
hf download RenlyH/CodeV-RL-Data --local-dir data/
python scripts/extract_images_from_parquet.py \
    --parquet_path data/codev_rl_data.parquet \
    --image_save_path data/images

# 4. Setup LLM judge for reward model (on separate node with REWARD_NODE_IP)
vllm serve Qwen/Qwen2.5-VL-32B-Instruct --port 8000

# 5. Configure judge endpoint
export LLM_AS_A_JUDGE_BASE="http://REWARD_NODE_IP:8000/v1"

# 6. Login to Weights & Biases
wandb login

# 7. Start training
bash examples/agent/codev.sh
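
The judge configured in steps 4-5 is an OpenAI-compatible vLLM server, so it can be sanity-checked from Python before training (a minimal check; the prompt is illustrative):

# Sanity-check the LLM judge endpoint from step 5.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["LLM_AS_A_JUDGE_BASE"],  # http://REWARD_NODE_IP:8000/v1
    api_key="EMPTY",  # vLLM does not require a real key by default
)
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    messages=[{"role": "user", "content": "Reply with OK if you are reachable."}],
)
print(response.choices[0].message.content)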

See rl/README.md for hardware requirements and detailed instructions.

Evaluation

Note: Install VLMEvalKit in a separate conda environment. Avoid sharing the same environment with RL training.

# 1. Create conda environment
conda create -n codev-eval python=3.10 -y
conda activate codev-eval

# 2. Install VLMEvalKit
cd VLMEvalKit && pip install -e .

# 3. Serve the model with vLLM (online serving recommended)
vllm serve RenlyH/CodeV-RL --port 8000

# 4. Run evaluation on the served model
python run.py --config codev_eval.yaml

Supported Benchmarks: VStarBench, HRBench, MathVista, and many more via VLMEvalKit integration.

Supported eval protocols: Python sandbox, crop API.

Recommendation: Use vLLM online serving (v0.10.0+) for testing models. This provides better performance and easier integration with the evaluation framework.
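
Because the model is served through vLLM's OpenAI-compatible API, you can also smoke-test it directly before launching the full harness (illustrative; adjust the host, port, and image path to your setup):

# Quick smoke test against the served model; the real evaluation
# should go through run.py as shown above.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local test image as a data URL for the chat API.
with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="RenlyH/CodeV-RL",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Describe the image."},
        ],
    }],
)
print(response.choices[0].message.content)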

Repository Structure

CodeV/
├── rl/                      # TAPO training code
│   ├── verl/               # RL framework
│   ├── scripts/            # Training scripts
│   ├── examples/           # Example configurations
│   └── README.md           # Detailed RL capabilities
├── VLMEvalKit/              # Evaluation framework
│   ├── vlmeval/            # VLMEvalKit integration
│   ├── scripts/            # Evaluation helper scripts
│   └── run.py              # Main evaluation script
├── assets/                  # Figures and visualizations
└── CLAUDE.md               # Guidance for Claude Code

Citation

If you find CodeV useful for your research, please cite:

@article{hou2025codev,
  title={CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization},
  author={Hou, Xinhai and Xu, Shaoyuan and Biyani, Manan and Li, Mayan and Liu, Jia and Hollon, Todd C and Wang, Bryan},
  journal={arXiv preprint arXiv:2511.19661},
  year={2025}
}

Related Projects

  • Pixel-Reasoner: Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
  • OpenThinkImg: OPENTHINKIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
  • DeepEyes: DeepEyes: Incentivizing “Thinking with Images” via Reinforcement Learning
  • REVPT: Reinforced Visual Perception with Tools
  • Thyme: Thyme: Think Beyond Images

License

This project is licensed under the Apache 2.0 License.

Contact

For questions and issues, please open an issue on GitHub or contact xinhaih@umich.edu.
