Mobile agents rely on Large Language Models (LLMs) to plan and execute tasks on smartphone user interfaces (UIs). While cloud-based LLMs achieve high task accuracy, they require uploading the full UI state at every step, exposing unnecessary and often irrelevant information. In contrast, local LLMs avoid UI uploads but suffer from limited capacity, resulting in lower task success rates. We propose CORE, a COllaborative framework that combines the strengths of cloud and local LLMs to Reduce UI Exposure, while maintaining task accuracy for mobile agents. The pipeline is shown below.
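To make the cloud–local split concrete, here is a rough illustrative sketch (not the actual CORE implementation): a local model judges which UI elements are relevant to the task, and only those are forwarded to the cloud planner. The `UIElement` structure and the precomputed `relevant` flags below are hypothetical stand-ins for the local LLM's judgment:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UIElement:
    """A simplified UI element; real agents work with full view hierarchies."""
    text: str
    relevant: bool  # in practice this judgment would come from the local LLM

def filter_ui_state(elements: List[UIElement]) -> List[str]:
    """Keep only elements judged task-relevant, so the cloud LLM
    never sees the rest of the screen."""
    return [e.text for e in elements if e.relevant]

screen = [
    UIElement("Create event", True),
    UIElement("Private email preview", False),  # sensitive, stays on-device
    UIElement("Save", True),
]
print(filter_ui_state(screen))  # only the relevant elements are uploaded
```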
Make sure you have an Android device or emulator available, and the following installed on your computer:
- Java Development Kit (JDK)
- Android SDK
We recommend using Conda to manage your Python environment:
conda create -n CORE python=3.8
conda activate CORE
pip install -r requirements.txt
ADB Keyboard is required for input automation.
- Download ADBKeyBoard.apk.
- Install the APK on your Android device or emulator:
adb install ADBKeyBoard.apk
- Set ADB Keyboard as the default input method:
adb shell ime set com.android.adbkeyboard/.AdbIME
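Before running these steps, it can help to confirm that adb actually sees an authorized device. A small sketch that parses the output of `adb devices` (the sample output string below is illustrative):

```python
def parse_adb_devices(output: str) -> list:
    """Return serials of devices in the 'device' (authorized) state
    from `adb devices` output."""
    lines = output.strip().splitlines()[1:]  # skip the "List of devices attached" header
    serials = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials

sample = """List of devices attached
emulator-5554\tdevice
0123456789ABCDEF\tunauthorized
"""
print(parse_adb_devices(sample))  # ['emulator-5554']
```

In a real script you would feed this the captured stdout of `adb devices` and abort early if the list comes back empty.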
You can use the following public datasets for evaluation:
To use OpenAI's models, set your API key as an environment variable OPENAI_API_KEY.
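A minimal guard that fails fast when the variable is missing can save a confusing error mid-run. The helper below is a hypothetical convenience, not part of the repo (the `setdefault` line injects a placeholder key purely so the demo runs):

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail early if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key

os.environ.setdefault("OPENAI_API_KEY", "sk-example")  # demo placeholder only
print(bool(require_api_key()))  # True once the key is present
```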
Follow the instructions at Ollama to deploy a local model (e.g., Gemma2-9B).
The following command runs the automation pipeline on an Android app:
python start.py -pn "com.simplemobiletools.calendar" -an "Calendar" -o "output" -task "create a new event, the task is 'laundry', save it" -keep_app
- -pn: Package name of the app
- -an: App name
- -o: Output folder
- -task: Natural language instruction
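These flags could be wired up with argparse roughly as follows. This is a sketch, not the repo's actual `start.py`; in particular the semantics of `-keep_app` (keep the app running afterwards) are an assumption:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI matching the flags documented above."""
    parser = argparse.ArgumentParser(
        description="Run the automation pipeline on an Android app")
    parser.add_argument("-pn", required=True, help="Package name of the app")
    parser.add_argument("-an", required=True, help="App name")
    parser.add_argument("-o", default="output", help="Output folder")
    parser.add_argument("-task", required=True,
                        help="Natural language instruction")
    parser.add_argument("-keep_app", action="store_true",
                        help="Keep the app running afterwards (assumed semantics)")
    return parser

args = build_parser().parse_args([
    "-pn", "com.simplemobiletools.calendar",
    "-an", "Calendar",
    "-task", "create a new event, the task is 'laundry', save it",
])
print(args.pn, args.o, args.keep_app)
```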
- Make sure your Android device or emulator is properly connected to your computer, with developer mode (USB debugging) enabled.
- Ensure the target app is already installed before running the script. You can try open-source apps from Simple Mobile Tools.
- This project reuses and adapts code from:
