The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon tasks. First, single-agent models struggle to balance high-level planning and low-level execution capabilities, suffering from pervasive responsibility coupling and capability conflicts. Second, agents lack awareness of the task state, leading to progress loss in long-horizon tasks. To address these challenges, we propose a staged execution-feedback reinforcement learning algorithm. Rather than training a unified policy model, we focus on training high-level scheduling models. Specifically, we propose and train two agents: a Coordinator, responsible for strategic planning and task decomposition; and a State Tracker, responsible for context compression and information management to maintain the task's state and coherence. Building on these, we construct the Coordinator-Executor-State Tracker (CES) multi-agent framework, which can be integrated with any low-level Executor model and assists the Executor in solving long-horizon tasks through task scheduling and state management. Experiments on long-horizon task benchmarks demonstrate that CES significantly enhances the system's planning and state management capabilities. Further analysis confirms that our trained high-level scheduling module is a generalizable, plug-and-play component that significantly improves the long-horizon capabilities of various Executors.
- Multi-Agent Decoupling: We build the CES multi-agent framework, featuring general-purpose, plug-and-play high-level components (Coordinator and State Tracker) that can integrate with various Executors and enhance their long-horizon abilities (see the sketch after this list).
- State Context Compression: We introduce a State Tracker whose core task is dynamic context compression and state summarization, effectively resolving the state-unawareness problem and maintaining the agent's logical coherence in long-horizon tasks.
- Staged Execution-Feedback RL: We propose a staged execution-feedback RL strategy. Its core idea is to decouple high-level capabilities from low-level execution: it freezes a pre-trained Executor and uses the reward signals derived from its execution to exclusively train the high-level Coordinator and State Tracker.
- Compelling Performance: Extensive experiments demonstrate that our method significantly enhances the long-horizon scheduling and state management capabilities of various Executor models and surpasses existing baselines.
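For intuition, here is a minimal Python sketch of how the three roles interact at inference time. All class and method names below (`CES`, `plan`, `act`, `compress`, the `env` interface) are illustrative placeholders, not this repository's actual API.

```python
# Minimal sketch of the CES loop. All names here (CES, plan, act, compress,
# the env interface) are illustrative placeholders, not this repo's real API.

class CES:
    def __init__(self, coordinator, executor, tracker):
        self.coordinator = coordinator  # high-level planner (trained)
        self.executor = executor        # low-level GUI model (frozen, pluggable)
        self.tracker = tracker          # state summarizer (trained)

    def run(self, task, env, max_steps=50):
        state = ""  # compressed task state maintained by the State Tracker
        for _ in range(max_steps):
            # 1. Coordinator: strategic planning and task decomposition.
            subgoal = self.coordinator.plan(task, state, env.screenshot())
            if subgoal == "DONE":
                break
            # 2. Executor: ground the subgoal into a concrete GUI action.
            action = self.executor.act(subgoal, env.screenshot())
            observation = env.step(action)
            # 3. State Tracker: compress history into a short state summary so
            #    long-horizon context never overflows and progress is not lost.
            state = self.tracker.compress(state, subgoal, action, observation)
        return state
```

Because the Executor only ever receives a subgoal and the current screen, any grounding model can be slotted into that role without retraining the high-level components.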
Create a conda virtual environment:
conda create --name ces python=3.10
conda activate ces
pip install -r requirements.txt
We use LLaMA-Factory for the warm-up SFT.
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
The SFT data is already placed in LLaMA-Factory/data, named planner_vl_sft and memory_sft respectively.
You just need to run:
bash examples/sft/train_coordinator.sh
bash examples/sft/train_tracker.sh
Then merge the trained LoRA adapters into the base models:
llamafactory-cli export examples/merge_lora/qwen2_5vl_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
Download the dataset from Hugging Face and put it in ./data.
We use GUI-R1-7B as the Executor model, so download it first.
You can also try other, more powerful models, which may yield higher performance.
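For example, you can fetch both with huggingface_hub; the repo IDs below are hypothetical placeholders, so substitute the actual dataset and model repositories.

```python
# Example download via huggingface_hub; both repo IDs are hypothetical
# placeholders: substitute the actual dataset and GUI-R1-7B repositories.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="<dataset-repo-id>", repo_type="dataset",
                  local_dir="./data")
snapshot_download(repo_id="<gui-r1-7b-repo-id>",
                  local_dir="./models/GUI-R1-7B")
```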
Set the SFT_model and data paths in train_coordinator.sh, and then run:
cd ../
bash examples/train_rl/train_coordinator.sh
Note the path of the trained Coordinator, set it in train_tracker.sh, and then run:
bash examples/train_rl/train_tracker.sh
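The scripts above wrap the actual verl-based training. Purely for intuition, here is a toy, self-contained sketch of the staged execution-feedback idea; every class and function in it is a simplified placeholder, not the real training code.

```python
# Schematic of staged execution-feedback RL, NOT the verl implementation
# this repo uses. Everything below is a toy placeholder.
import random

class ToyModule:
    """Stand-in for a trainable high-level model (Coordinator or Tracker)."""
    def __init__(self, name):
        self.name = name
        self.updates = 0

    def policy_update(self, trajectory, reward):
        # Real training would apply a reward-weighted policy-gradient step.
        self.updates += 1

def rollout(trainable, frozen_executor, fixed_partner, task):
    """Placeholder for running the full CES loop in a GUI environment."""
    trajectory = [task]              # stand-in for (state, action) pairs
    success = random.random() > 0.5  # stand-in for the env's success check
    return trajectory, success

def train_stage(trainable, frozen_executor, fixed_partner, tasks):
    """Train one high-level module while the Executor stays frozen; the
    reward is pure execution feedback (task success or failure)."""
    for task in tasks:
        traj, success = rollout(trainable, frozen_executor, fixed_partner, task)
        trainable.policy_update(traj, reward=1.0 if success else 0.0)

# Stage 1 trains the Coordinator, Stage 2 the State Tracker, mirroring
# train_coordinator.sh followed by train_tracker.sh.
coordinator, tracker = ToyModule("coordinator"), ToyModule("tracker")
train_stage(coordinator, frozen_executor=None, fixed_partner=tracker, tasks=range(4))
train_stage(tracker, frozen_executor=None, fixed_partner=coordinator, tasks=range(4))
```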
We evaluate directly on the original data:
python examples/eval/eval.py
We thank the authors of the following code repositories: verl, LLaMA-Factory, vLLM, SWIRL, and GUI-R1.
If you find our work helpful, please cite our paper. Thank you very much!
@article{deng2025training,
title={Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation},
author={Deng, Zehao and Ju, Tianjie and Wu, Zheng and Zhang, Zhuosheng and Liu, Gongshen},
journal={arXiv preprint arXiv:2511.22235},
year={2025}
}

