DigiData: Training and Evaluating General-Purpose Mobile Control Agents

This repository is the home of the DigiData paper.

Updates

[Nov-10-25]: DigiData paper is released. 🔥🔥

Dataset Release

Coming soon...

Running DigiData-Bench

Note

We will release the full task set and scaffolding code soon, allowing you to create your own agent and evaluate it on the complete DigiData-Bench suite. For now, you can run the end-to-end benchmark on the provided demo set with either GPT-4o or Llama 4 by following the instructions below.

Step 1: Install the Required Packages

Create a conda environment by running:

conda create --name digidata_bench python=3.12

Then activate it with:

conda activate digidata_bench

Finally, install the required packages by running:

pip install -r requirements.txt

Step 2: Set up the Environment

Follow the instructions here to set up the environment. After this step, you should have a running emulator and an open terminal window in which the Appium server is running.
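Before moving on, it can help to confirm that the Appium server is actually reachable. The following is a minimal sketch, not part of this repository: the helper name `appium_is_ready` is hypothetical, and the URL assumes Appium's default port (4723) with the server's status endpoint at `/status`. Older Appium 1.x setups typically serve it under `/wd/hub/status` instead, so adjust the URL to match your installation.

```python
import http.client
import json
import urllib.error
import urllib.request


def appium_is_ready(status_url: str = "http://127.0.0.1:4723/status",
                    timeout: float = 3.0) -> bool:
    """Return True if the status URL answers with HTTP 200 and a JSON body.

    The default URL is an assumption (Appium's default port and the
    /status endpoint); pass a different URL if your server differs.
    """
    try:
        with urllib.request.urlopen(status_url, timeout=timeout) as response:
            # The status endpoint is expected to return JSON; anything
            # else suggests a different service is listening on the port.
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a non-JSON body.
        return False
```

Calling `appium_is_ready()` from a Python shell should return `True` once the server from this step is up. A `False` result usually means the server is not running, or is listening on a different port or base path.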

Step 3: Set up Model API Key

Our default driver queries the model through OpenAI's API, so you will need to set up an API key.

To use an OpenAI model, get an API key from here and set it as an environment variable called OPENAI_API_KEY:

export OPENAI_API_KEY=<YOUR_API_KEY>

To use a Llama 4 model, get an API key from here, set api_key_name in the configuration file to LLAMA_API_KEY, and set the key as an environment variable called LLAMA_API_KEY:

export LLAMA_API_KEY=<YOUR_API_KEY>
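A missing or misnamed key tends to surface only once the benchmark makes its first model call, so a quick pre-flight check can save a run. The sketch below is illustrative only: `key_name_from_config` and `find_api_key` are hypothetical helpers, and it assumes `api_key_name` is a top-level field of the JSON config that falls back to `OPENAI_API_KEY` when absent. Neither assumption is confirmed by this README.

```python
import os
from typing import Mapping, Optional


def key_name_from_config(config: Mapping[str, object],
                         default: str = "OPENAI_API_KEY") -> str:
    """Return the environment variable name the config points at.

    Assumption: `api_key_name` is a top-level field and defaults to
    OPENAI_API_KEY when it is not present.
    """
    return str(config.get("api_key_name", default))


def find_api_key(api_key_name: str,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored under `api_key_name`, or raise if missing or empty."""
    env = os.environ if env is None else env
    value = env.get(api_key_name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {api_key_name} is not set; "
            f"run: export {api_key_name}=<YOUR_API_KEY>"
        )
    return value
```

For example, after loading the config with `json.load`, `find_api_key(key_name_from_config(config))` raises a `RuntimeError` that names the missing variable, rather than letting the run fail later inside the driver.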

Step 4: Run the Benchmark

We provide a script to run the benchmark end-to-end. You can run it as follows:

python benchmark.py --config_filepath "configs/demo_3_bench_gpt4o.json"

This script runs a subset of the benchmark containing only 3 tasks, for demonstration purposes. The full task list will be released soon.
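If you want to launch the demo run from another Python script (for instance, to repeat it after changing the config), the command above can be wrapped as follows. This is a sketch under stated assumptions: `build_benchmark_command` and `run_benchmark` are hypothetical helpers, not part of the repository, and the script must be started from the repository root so that `benchmark.py` resolves.

```python
import subprocess
import sys
from pathlib import Path


def build_benchmark_command(config_filepath: str) -> list[str]:
    """Build the same command line as shown above, using the current interpreter."""
    return [sys.executable, "benchmark.py", "--config_filepath", config_filepath]


def run_benchmark(config_filepath: str) -> int:
    """Run benchmark.py with the given config and return its exit code."""
    if not Path(config_filepath).is_file():
        # Fail early with a clear message instead of inside the benchmark.
        raise FileNotFoundError(f"Config file not found: {config_filepath}")
    completed = subprocess.run(build_benchmark_command(config_filepath), check=False)
    return completed.returncode
```

Using `sys.executable` keeps the subprocess inside the active `digidata_bench` environment, so it does not depend on whichever `python` happens to be first on `PATH`.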

License

The Data is released under CC BY 4.0. The CoT and descriptions are outputs of Llama 4, and subject to the Llama 4 license (https://github.com/meta-llama/llama-models/tree/main/models/llama4). If you use this portion of the data to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. Third party content pulled from other locations is subject to its own licenses and you may have other legal obligations or restrictions that govern your use of that content.

Citation

@misc{sun2025digidatatrainingevaluatinggeneralpurpose,
      title={DigiData: Training and Evaluating General-Purpose Mobile Control Agents}, 
      author={Yuxuan Sun and Manchen Wang and Shengyi Qian and William R. Wong and Eric Gan and Pierluca D'Oro and Alejandro Castillejo Munoz and Sneha Silwal and Pedro Matias and Nitin Kamra and Satwik Kottur and Nick Raines and Xuanyi Zhao and Joy Chen and Joseph Greer and Andrea Madotto and Allen Bolourchi and James Valori and Kevin Carlberg and Karl Ridgeway and Joseph Tighe},
      year={2025},
      eprint={2511.07413},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2511.07413}, 
}

