EARL: The Promise of RL for Autoregressive Image Editing

🎉 Exciting News: EARL: The Promise of RL for Autoregressive Image Editing is accepted at NeurIPS 2025! 🎉

Official code for the paper The Promise of RL for Autoregressive Image Editing (EARL).

This README provides an overview of how to prepare and tokenize data for EARL image editing using either a Hugging Face dataset or local data. It also covers SFT, RL training and evaluation steps.

Prerequisites - Installation

git clone https://github.com/mair-lab/EARL
cd EARL
python -m venv /path/to/envs/EARL
. /path/to/envs/EARL/bin/activate
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install vllm==0.8.4
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install -r requirements.txt
export PYTHONPATH=$(pwd)

Model

Try the model quickly: vLLM inference

Patch vLLM to support Emu3

vim /path/to/venv/lib/python3.10/site-packages/vllm/model_executor/models/registry.py

Add the following line to the file line 166

_MULTIMODAL_MODELS = {    
    # add this line
    "Emu3ForCausalLM": ("llama", "LlamaForCausalLM"), 
    # end of adding
    "AriaForConditionalGeneration": ("aria", "AriaForConditionalGeneration"), # already exists
    ...
}

Run inference on Omniedit

EARL on Hugging Face 🤗: mair-lab/earl-sft-simple.rl-simple-n-complex-scaled

. ./scripts/batch_eval.sh

Notes:

vLLM support infinite batch size, so technically you can pass entire validation set at once.

Trained Models on HF 🤗

Trained models are available on Hugging Face 🤗: mair-lab/earl collection

Dataset

Dataset on HF 🤗

For convenience, we provide pre-tokenized training data on Hugging Face 🤗: mair-lab/earl-datasets

Tokenization

We have tokenization code compatible with both Hugging Face datasets and local files. Follow the steps below to tokenize your data.

To tokenize your data, use the following command. Make sure the images are in PIL format, if not you can slightly modify the code to read the image in the provided format. You can specify which key in your dataset corresponds to each required component (original image, edited image, and text instruction). You need to provide list of which keys are part of CoT. Please see ./scripts/tokenize.sh:

python ./emu3/train_image_editing/prepare_data.py --dataset-name "$data_name" --output-path "$output_dir" --original-image-key "$original_image_key" --edit-instruction "$edit_instruction" --CoT-keys "${CoT_keys[@]}" --edited-image "$edited_image" --image-area $image_area --random-seed 42

Parameters:

--dataset-name: Path to your dataset, which can be either:
- A Hugging Face dataset name (e.g., "username/dataset-name")
- A local path to parquet files containing your dataset
--output-path: Directory where the tokenized data will be saved
--original-image-key: The key/column name in your dataset that contains the original images
--edit-instruction: The key/column name containing the text instructions for editing
--CoT-keys: Array of keys/column names that are part of the Chain of Thought (CoT) process
--edited-image: The key/column name containing the edited/result images
--image-area: The target area for the images (in pixels)
--random-seed: Random seed for reproducibility
--source: Source HF or local.
- Default: HF

For multi-GPU tokenization, use the provided script:

. ./scripts/tokenize.sh

SFT Training

Trained models are available on Hugging Face 🤗: mair-lab/earl collection

For training, choose between DeepSpeed Stage Zero 3 or DeepSpeed Stage Zero 3 offload based on your available resources. Use:

. ./scripts/ie_sft.sh
. ./scripts/ie_sft_offload.sh

Ensure that the DATAPATH and VALPATH variables point to your tokenized data list from the previous step, and adjust EXP_NAME and --output_dir.

Note: Adjust --cot_keys to mention which keys of CoT is intended to use for training if you want to train with reasoning. See example below.

--cot_keys 'reasoning_verbose'

The --coefficients argument is used to control the coefficient of sampling for each dataset.

If provided, each dataset is sampled based on its coefficient. If not provided, the script automatically oversamples smaller datasets to match the size of the largest dataset. Important Note: If you want automatic upsampling, remove the --coefficients argument when running the script.

RL Training

The RL training is designed to be user-friendly with clear configuration options and automatic checkpointing. Check out rl/README.md for detailed instructions!

The current configs in rl/configs/ are sample configurations. The final configurations used in the paper will be released soon.

Evaluation

See README file inside baselines/ directory.

Citation

If you use this code or paper in your research, please cite our work:

@inproceedings{earl2025,
  title={The Promise of RL for Autoregressive Image Editing}, 
  author={Saba Ahmadi and Rabiul Awal and Ankur Sikarwar and Amirhossein Kazemnejad and Ge Ya Luo and Juan A. Rodriguez and Sai Rajeswar and Siva Reddy and Christopher Pal and Benno Krojer and Aishwarya Agrawal},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EARL: The Promise of RL for Autoregressive Image Editing

Table of Contents

Prerequisites - Installation

Model

Try the model quickly: vLLM inference

Trained Models on HF 🤗

Dataset

Dataset on HF 🤗

Tokenization

SFT Training

RL Training

Evaluation

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
VIEScore		VIEScore
assets		assets
baselines		baselines
emu3		emu3
evaluation		evaluation
rl		rl
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

EARL: The Promise of RL for Autoregressive Image Editing

Table of Contents

Prerequisites - Installation

Model

Try the model quickly: vLLM inference

Trained Models on HF 🤗

Dataset

Dataset on HF 🤗

Tokenization

SFT Training

RL Training

Evaluation

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages