A Long-Horizon GUI Automation Agent Framework with Enhanced Perception, Deep Reflection, and Compensating Execution
LongHorizonUI is an agent framework designed for long-horizon GUI automation tasks. Existing GUI agents suffer rapid success-rate degradation on long-horizon tasks (>10 steps) due to error accumulation. LongHorizonUI addresses this problem through three core modules:
| Module | Description |
|---|---|
| Multi-source Enhanced Perceiver (MEP) | Runs icon detection and OCR in parallel, resolves compound widget ambiguity via IoU semantic binding, and repairs missing key elements with template matching |
| Deep Reflective Decider (DRD) | Multi-step look-ahead reasoning, retrospective action review, and causal inference on UI states for high-quality action decisions |
| Compensating Action Executor (CAE) | Three-level fallback strategy (Index → Relative → Absolute+ε), post-execution verification, progress monitoring, and automatic rollback |
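The IoU semantic binding in MEP can be illustrated with a minimal sketch (not the actual implementation; function names and the threshold are illustrative): each detected icon box is paired with the OCR text box that overlaps it most, which disambiguates compound widgets such as a labeled button detected twice by the two parallel detectors.

```python
# Sketch of IoU-based semantic binding: attach OCR text to icon detections.
# All names and the 0.1 threshold are hypothetical, for illustration only.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def bind_labels(icon_boxes, ocr_results, threshold=0.1):
    """Bind each icon box to the best-overlapping OCR text, if any."""
    bound = []
    for box in icon_boxes:
        best = max(ocr_results, key=lambda t: iou(box, t["box"]), default=None)
        label = best["text"] if best and iou(box, best["box"]) >= threshold else None
        bound.append({"box": box, "label": label})
    return bound

icons = [(10, 10, 50, 50), (100, 10, 140, 50)]
ocr = [{"box": (12, 12, 48, 48), "text": "Send"}]
bound = bind_labels(icons, ocr)  # first icon bound to "Send", second unbound
```

Icons without a sufficiently overlapping text box keep a `None` label, which is where the template-matching repair step would take over.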
```bash
# Clone the repository
git clone <your-repo-url>
cd LongHorizonUI

# Install dependencies
pip install -r requirements.txt

# Copy the environment template
cp .env.example .env

# Edit the .env file and fill in the API keys for your LLM provider
```

Supported LLM Providers:
| Provider | Required Environment Variables |
|---|---|
| Gemini (Recommended) | LLM_PROJECT, LLM_LOCATION |
| Azure OpenAI | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY |
| OpenAI | OPENAI_ENDPOINT, OPENAI_API_KEY |
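For example, a Gemini configuration in `.env` might look like this (the values below are placeholders; only the variable names come from the table above):

```
# .env — example Gemini configuration (placeholder values)
LLM_PROJECT=my-gcp-project
LLM_LOCATION=us-central1
```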
We provide the LongGUIBench benchmark dataset for evaluation:
🤗 LongGUIBench Dataset (link available upon publication)
After downloading, place the data under the data/ directory:
```
data/
├── general/                    # General application scenarios
│   ├── app_a/
│   │   ├── task_001/
│   │   │   ├── screenshot/     # UI screenshot sequences
│   │   │   │   ├── 001.png
│   │   │   │   ├── 002.png
│   │   │   │   └── ...
│   │   │   └── task_infos.json # Task description and annotations
│   │   └── ...
│   ├── app_b/
│   └── ...
└── game/                       # Game application scenarios
    ├── hero/
    └── ...
```
task_infos.json format example:

```json
{
  "task_name": "Create a new email in a mail app and send it to a contact",
  "task_steps": [
    {"action": "Click the menu button in the top-left corner"},
    {"action": "Select the compose email option"},
    {"action": "Enter the recipient address"},
    {"action": "Enter the email subject"},
    {"action": "Click the send button"}
  ]
}
```

No phone connection required: offline mode simulates the agent's full reasoning and execution pipeline from pre-recorded screenshot sequences. Suitable for:
- Offline evaluation and experiment reproduction
- Development and debugging without an Android device
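A hypothetical sketch of how an offline run might iterate one task directory (the real `run.py` may differ): load `task_infos.json`, then replay the screenshot sequence step by step alongside the annotated actions.

```python
# Hypothetical loader for one LongGUIBench task directory, for illustration.
import json
from pathlib import Path

def iter_offline_task(task_dir):
    """Yield (step_index, screenshot_path, annotated_action) per step."""
    task_dir = Path(task_dir)
    info = json.loads((task_dir / "task_infos.json").read_text())
    shots = sorted((task_dir / "screenshot").glob("*.png"))
    steps = info.get("task_steps", [])
    for i, shot in enumerate(shots):
        # Screenshots beyond the annotated steps carry no reference action.
        action = steps[i]["action"] if i < len(steps) else None
        yield i, shot, action
```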
```bash
# Low instruction mode (detailed step-by-step instructions provided)
python run.py offline \
    --data_dir data/general/app_a \
    --instruction_level low \
    --provider gemini \
    --model gemini-2.5-pro

# High instruction mode (only task description provided, agent plans autonomously)
python run.py offline \
    --data_dir data/game/game_a \
    --instruction_level high \
    --provider gemini \
    --model gemini-2.5-pro
```

| Parameter | Description | Default |
|---|---|---|
| `--provider` | LLM provider | `gemini` |
| `--model` | Model name | `gemini-2.5-pro` |
| `--instruction_level` | Instruction level: `high` / `low` | `low` |
| `--max_steps` | Maximum execution steps | `100` |
| `--temperature` | LLM sampling temperature | `0.4` |
| `--output_dir` | Output directory | `./output` |
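The options and defaults above could be collected with `argparse` along these lines (a hypothetical sketch, not the actual `run.py`):

```python
# Hypothetical CLI sketch mirroring the documented offline-mode options.
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="run.py")
    sub = p.add_subparsers(dest="mode", required=True)
    off = sub.add_parser("offline", help="replay pre-recorded screenshots")
    off.add_argument("--data_dir", required=True)
    off.add_argument("--provider", default="gemini")
    off.add_argument("--model", default="gemini-2.5-pro")
    off.add_argument("--instruction_level", choices=["high", "low"], default="low")
    off.add_argument("--max_steps", type=int, default=100)
    off.add_argument("--temperature", type=float, default=0.4)
    off.add_argument("--output_dir", default="./output")
    return p

args = build_parser().parse_args(["offline", "--data_dir", "data/general/app_a"])
```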
On our self-constructed LongGUIBench, LongHorizonUI significantly outperforms existing methods across both general and game long-horizon scenarios.
On the ScreenSpot cross-platform UI grounding benchmark, LongHorizonUI surpasses previous state-of-the-art methods, validating the effectiveness of the IoU semantic binding strategy in the enhanced perception module.
If this project is helpful for your research, please cite:
```bibtex
@inproceedings{anonymous2026longhorizonui,
  title={LongHorizon{UI}: A Unified Framework for Robust Long-Horizon Task Automation of {GUI} Agents},
  author={Anonymous},
  booktitle={Conference on Learning Representations},
  year={2026},
  url={#}
}
```

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.


