AgentOCR: Reimagining Agent History via Optical Self-Compression

AgentOCR addresses the critical bottleneck of rapidly growing textual histories in multi-turn LLM agent training by representing observation-action history as compact rendered images. This approach exploits the superior information density of visual tokens, substantially reducing token consumption while preserving agent performance.

Key Features:

Visual Token Representation: Renders history as compact images, achieving >50% token reduction with >95% performance preservation
Segment Optical Caching: Hashable segment decomposition with visual cache enables 20× rendering speedup
Agentic Self-Compression: Agent learns to adaptively emit compression rates via compression-aware reward training
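The segment caching idea in the list above can be illustrated with a small sketch. This is not AgentOCR's implementation: it assumes one segment per history turn, and `SegmentRenderCache`, `segment_key`, and `fake_render` are hypothetical names, with `fake_render` standing in for a real text-to-image renderer.

```python
import hashlib
from typing import Callable, Dict, List


def segment_key(segment: str) -> str:
    """Content hash used as the cache key for one history segment."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


class SegmentRenderCache:
    """Caches rendered segments so unchanged history is not re-rendered."""

    def __init__(self, render_segment: Callable[[str], bytes]) -> None:
        self._render = render_segment      # hypothetical renderer: text -> image bytes
        self._cache: Dict[str, bytes] = {}
        self.render_calls = 0              # counts actual (uncached) renders

    def render_history(self, segments: List[str]) -> List[bytes]:
        rendered = []
        for seg in segments:
            key = segment_key(seg)
            if key not in self._cache:     # cache miss: render once and store
                self._cache[key] = self._render(seg)
                self.render_calls += 1
            rendered.append(self._cache[key])
        return rendered


def fake_render(segment: str) -> bytes:
    # Stand-in for an expensive rasterization step (assumption for this demo).
    return b"IMG:" + segment.encode("utf-8")


cache = SegmentRenderCache(fake_render)
history = ["obs: you are in a kitchen", "act: open fridge"]
cache.render_history(history)              # renders both turns
history.append("obs: the fridge is open")
images = cache.render_history(history)     # renders only the new turn
print(cache.render_calls)                  # 3
```

Because the history grows by appending turns, keying the cache on segment content means each step pays only for the newly added segment, which is the kind of saving the rendering speedup refers to.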

Installation

Install veRL

conda create -n AgentOCR python==3.12 -y
conda activate AgentOCR

pip3 install vllm==0.11.0

pip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip3 install -e .

Install Supported Environments

1. ALFWorld

Install with pip:

pip3 install gymnasium==0.29.1
pip3 install stable-baselines3==2.6.0
pip3 install alfworld

Download the PDDL and game files and the pre-trained MaskRCNN detector (stored in ~/.cache/alfworld/):

alfworld-download -f

2. Search

cd ./agent_system/environments/env_package/search/third_party
pip install -e .
pip install gym==0.26.2

Prepare the dataset (saved to ~/data/searchR1_processed_direct):

cd repo_root/
python examples/data_preprocess/preprocess_search_r1_dataset.py

Since faiss-gpu is not available via pip, we set up a separate conda environment for the local retrieval server. Running this server uses around 6GB of GPU memory per GPU, so account for this in your training run configuration. Build the retriever environment:

conda create -n retriever python=3.10 -y
conda activate retriever

conda install numpy==1.26.4 -y
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

pip install transformers datasets pyserini huggingface_hub
conda install faiss-gpu==1.8.0 -c pytorch -c nvidia -y
pip install uvicorn fastapi
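The retriever's memory footprint of roughly 6GB per GPU, noted above, reduces the share of each GPU that the training job can claim. The helper below is a back-of-the-envelope sketch: the function name, the 80 GB card, and the 2 GB safety margin are illustrative assumptions, and the resulting fraction would be applied to whatever memory-utilization setting your training configuration exposes.

```python
def max_training_fraction(total_gb: float, retriever_gb: float = 6.0,
                          margin_gb: float = 2.0) -> float:
    """Fraction of one GPU left for training after reserving retriever memory."""
    usable = total_gb - retriever_gb - margin_gb
    if usable <= 0:
        raise ValueError("GPU too small to share with the retrieval server")
    return usable / total_gb


# Example: an 80 GB GPU (assumed size) shared with the retrieval server.
fraction = max_training_fraction(80.0)
print(f"{fraction:.2f}")  # 0.90
```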

Download the index:

conda activate retriever

local_dir=~/data/searchR1
python examples/search/searchr1_download.py --local_dir $local_dir
cat $local_dir/part_* > $local_dir/e5_Flat.index
gzip -d $local_dir/wiki-18.jsonl.gz

Start the local flat e5 retrieval server:

conda activate retriever

# Redirect stdout and stderr to a file instead of the terminal:
# we have observed that printing to the terminal causes spikes in server response times.
bash examples/search/retriever/retrieval_launch.sh > retrieval_server.log 2>&1
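Since the server's output goes to a log file rather than the terminal, it can help to confirm from another shell that the server is accepting connections before starting training. The sketch below polls a TCP port using only the standard library. `wait_for_port` is a hypothetical helper, and the host and port shown in the comment are placeholders; check retrieval_launch.sh for the values actually used.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)  # not up yet: wait and retry
    return False


# Example call (placeholder address, replace with your server's host and port):
#   wait_for_port("127.0.0.1", 8000, timeout_s=120.0)
```

A successful connection only shows that the port is open, not that the index has finished loading, so a first test query against the server remains a useful extra check.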

Run Examples

We provide training scripts for ALFWorld and Search-based QA tasks:

# ALFWorld
bash train_alfworld.sh

# Search
bash train_search.sh

Acknowledgement

AgentOCR is built upon verl-agent and veRL, which provide the foundational infrastructure for multi-turn agent training and efficient RL training for LLMs.

The supported environments are adapted from ALFWorld for embodied AI tasks and Search-R1 for search-based question answering. We extend our gratitude to the authors and contributors of these projects for their valuable work.
