kwai/MemGUI-Agent

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Official implementation, training, and evaluation code for MemGUI-Agent.


Demo video: MemGUI-8B-SFT-Demo.mp4. The full demo is available on YouTube.

Main Results

[Figure: Context efficiency and benchmark performance of MemGUI-Agent]

MemGUI-Agent improves results in both the zero-shot 235B setting and the trained 8B setting on long-horizon mobile GUI benchmarks. On MemGUI-Bench, MemGUI-Agent-235B reaches 62.5% Pass@3 and MemGUI-8B-SFT reaches 23.4% Pass@1. On the out-of-distribution MobileWorld GUI-Only benchmark, MemGUI-Agent-235B reaches a 29.1% success rate and MemGUI-8B-SFT reaches 17.9%.

Full benchmark trajectories are available on the official pages: MemGUI-Bench and MobileWorld.

For captioned leaderboard tables and paper figures, see the project page.

Overview

MemGUI-Agent is an end-to-end mobile GUI agent for long-horizon tasks that require remembering progress, preserving UI facts, and controlling prompt growth. Its core interface, ConAct (Context-as-Action), makes context management part of each model response instead of an external module.

ConAct maintains three structured fields: Folded Action History, Folded UI State, and Recent Step Record.

[Figure: ConAct framework]

MemGUI-Agent updates folded history, UI memory, and recent step records while producing the next GUI action. We evaluate both a zero-shot 235B ConAct agent with unchanged backbone weights and MemGUI-8B-SFT, an 8B agent trained on MemGUI-3K.
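The idea of carrying context inside each response can be sketched as follows. This is a minimal illustration, not the repository's implementation: the class names `ConActContext` and `ConActResponse`, the function `build_prompt`, the field comments, and the action dictionary format are all hypothetical. Only the three field names come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ConActContext:
    """Hypothetical container for the three ConAct fields."""
    folded_action_history: str = ""  # assumed: folded summary of earlier actions
    folded_ui_state: str = ""        # assumed: UI facts kept for later steps
    recent_step_record: str = ""     # assumed: record of the most recent step


@dataclass
class ConActResponse:
    """Hypothetical model response: updated context plus the next GUI action."""
    context: ConActContext
    action: dict = field(default_factory=dict)  # action schema is an assumption


def build_prompt(task: str, ctx: ConActContext) -> str:
    """Assemble the next prompt from the task and the three context fields."""
    return "\n".join([
        f"Task: {task}",
        f"Folded Action History: {ctx.folded_action_history}",
        f"Folded UI State: {ctx.folded_ui_state}",
        f"Recent Step Record: {ctx.recent_step_record}",
    ])


# One step: the context emitted by the model replaces the previous context,
# so the next prompt is built from three fields rather than the full history.
response = ConActResponse(
    context=ConActContext(
        folded_action_history="Opened the notes app.",
        folded_ui_state="Note title field is empty.",
        recent_step_record="Tapped the 'New note' button.",
    ),
    action={"type": "tap", "x": 540, "y": 1200},
)
ctx = response.context
prompt = build_prompt("Create a shopping list note", ctx)
print(prompt)
```

Because the prompt is rebuilt from the three fields at every step, its size depends on how aggressively the model folds history rather than on the number of steps taken so far.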

MemGUI-3K

MemGUI-3K contains 2,956 successful mobile GUI trajectories, 82,103 task steps, and 64,430 evaluator-approved reasonable steps.
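Two derived figures follow directly from these counts: the average trajectory length and the share of steps approved by the evaluator. The short calculation below uses only the three numbers stated above. The variable names are illustrative.

```python
trajectories = 2_956
task_steps = 82_103
reasonable_steps = 64_430

# Average number of task steps per successful trajectory.
avg_steps_per_trajectory = task_steps / trajectories

# Share of all task steps that the evaluator approved as reasonable.
reasonable_fraction = reasonable_steps / task_steps

print(f"Average steps per trajectory: {avg_steps_per_trajectory:.1f}")
print(f"Evaluator-approved share of steps: {reasonable_fraction:.1%}")
```

This works out to roughly 27.8 steps per trajectory, with about 78.5% of steps approved.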

[Figure: MemGUI-3K dataset statistics]

Dataset usage: data/memgui3k/README.md.

Repository Layout

MemGUI-Agent/
|-- data/
|   `-- memgui3k/                  # Dataset download, restore, packaging, and conversion tools
|-- evaluation/
|   `-- memgui3k_offline_eval/     # Step-level offline evaluation on MemGUI-3K
|-- scripts/                       # Convenience entrypoints
|-- training/
|   `-- ms_swift/                  # MemGUI-8B-SFT ms-swift LoRA SFT template
|-- website/                       # Project-page notes
|-- requirements.txt
`-- README.md

Quick Start

Install the Python dependencies used by the public utilities:

pip install -r requirements.txt

Download MemGUI-3K from Hugging Face:

bash scripts/download_memgui3k.sh

Restore screenshots into data/MemGUI-3K/images/:

bash scripts/restore_memgui3k_images.sh

Build step-level multimodal training JSONL files:

bash scripts/build_memgui3k_training_data.sh

This writes:

data/MemGUI-3K/training_data/
|-- train_sft.jsonl
`-- test_sft.jsonl
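To sanity-check the generated files, a small script can count records and list the top-level keys in use. This is a sketch, not part of the repository: the helper `summarize_jsonl` is hypothetical, and no record schema is assumed beyond one JSON value per line.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file."""
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys


if __name__ == "__main__":
    for name in ("train_sft.jsonl", "test_sft.jsonl"):
        path = Path("data/MemGUI-3K/training_data") / name
        if path.exists():
            n, keys = summarize_jsonl(path)
            print(f"{name}: {n} records, keys={sorted(keys)}")
        else:
            print(f"{name}: not found (run the build script first)")
```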

Training MemGUI-8B-SFT

MemGUI-8B-SFT is trained with ms-swift from Qwen3-VL-8B-Instruct. The released template keeps the paper's key hyperparameters:

| Parameter | Value |
| --- | --- |
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| Training type | LoRA SFT |
| Epochs | 1 |
| Learning rate | 1e-4 |
| LoRA rank / alpha | 8 / 32 |
| Target modules | all-linear |
| Max length | 32768 |
| Per-device train batch size | 2 |
| Gradient accumulation | 8 |
| GPUs | 8 |
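Two quantities implied by the hyperparameters above can be worked out directly. The calculation assumes plain data parallelism across all 8 GPUs and the standard LoRA scaling factor of alpha divided by rank (not a variant such as rsLoRA). The source states neither, so treat both results as estimates.

```python
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 8
lora_rank = 8
lora_alpha = 32

# Samples contributing to each optimizer step, assuming every GPU is a
# data-parallel replica (an assumption, not stated in the source).
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Standard LoRA scales the low-rank update by alpha / rank (assumed here).
lora_scaling = lora_alpha / lora_rank

print(f"Effective batch size: {effective_batch_size}")  # 128
print(f"LoRA scaling factor: {lora_scaling}")           # 4.0
```

If you train on fewer GPUs, raising gradient accumulation proportionally keeps the effective batch size at the same value under these assumptions.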

Run the public template:

bash training/ms_swift/train_memgui_8b_sft.sh

See training/ms_swift/README.md for the full command and environment variables.

Evaluation

The offline evaluation toolkit compares model outputs with MemGUI-3K gold step responses and reports action matching, memory actions, folding quality, and format compliance. See evaluation/memgui3k_offline_eval/README.md.
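As a rough illustration of step-level action matching, the sketch below computes an exact-match rate between predicted and gold actions. It is not the toolkit's metric definition: the function `action_match_rate`, the dictionary action format, and the exact-equality criterion are all assumptions made for the example. The toolkit README documents the real metrics.

```python
def action_match_rate(predictions, golds):
    """Fraction of steps whose predicted action equals the gold action exactly."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    matches = sum(p == g for p, g in zip(predictions, golds))
    return matches / len(golds)


# Hypothetical actions for three steps; the schema is illustrative only.
gold_actions = [
    {"type": "tap", "target": "search_box"},
    {"type": "type", "text": "coffee"},
    {"type": "tap", "target": "submit"},
]
predicted_actions = [
    {"type": "tap", "target": "search_box"},
    {"type": "type", "text": "coffee"},
    {"type": "swipe", "direction": "up"},
]

rate = action_match_rate(predicted_actions, gold_actions)
print(f"Exact-match action rate: {rate:.2f}")
```

A real evaluator would likely need looser matching for coordinate-based actions, where a tap can be correct without hitting the exact gold pixel.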

For end-to-end rollout scripts, trajectories, and evaluation results, see the official MemGUI-Bench and MobileWorld pages.

Contact

For questions about the paper, code, or released artifacts, contact guangyiliu@zju.edu.cn or the corresponding author at yongliu@iipc.zju.edu.cn.

Citation

@article{liu2026memgui,
  title={MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
  author={Liu, Guangyi and Wu, Gao and Liu, Congxiao and Zhao, Pengxiang and Liu, Liang and Li, Mading and Zhang, Qi and Wang, Mengyan and Guo, Liang and Liu, Yong},
  journal={arXiv preprint arXiv:2606.19926},
  year={2026}
}

License

Code in this repository is released under the Apache License 2.0.
