The Android Phone Agent is a Python-based tool that uses Claude AI to automate interactions with Android devices. Modified from the AndroidPhoneAgent project, it captures screenshots, analyzes them using Claude AI, and performs simulated touch operations based on AI-generated instructions.
- Screen capture and analysis using Claude AI
- Simulated cursor movements and clicks
- Conversation logging
- Configurable AI model parameters
- User-friendly graphical interface for easy interaction and real-time feedback
- Automatic flashing of the Android Mirroring app window when starting a task
- Compatible with any Android model connected through the Android Mirroring macOS app
-
Clone this repository:
git clone git@github.com:linlide/AndroidPhoneAgent.git cd AndroidPhoneAgent -
Install the required dependencies using the provided requirements.txt file:
pip install -r requirements.txt
Run the script with the following command:
python main.py
This will launch the graphical user interface. Enter your Anthropic API key, configure the parameters, and provide a task description in the input fields. Click the "Start Task" button to begin the automation process.
The application allows you to configure the following parameters through the GUI:
- API Key: Your Anthropic API key
- Model: The Claude AI model to use (default: "claude-3-5-sonnet-20240620")
- Max Tokens: Maximum number of tokens in Claude's response (default: 2048)
- Temperature: Temperature for Claude's responses (0.0 to 1.0, default: 0.7)
- Max Messages: Maximum number of messages in the conversation (default: 20)
- Task Description: The task you want the agent to perform on the mirrored Android screen
The project is organized into multiple files for better modularity and maintainability:
main.py: Entry point of the applicationgui.py: Contains the MainWindow class and GUI-related codeagent.py: Contains the AndroidPhoneAgent class for interacting with the Android mirroring and Claude APIconstants.py: Contains constant values like SYSTEM_PROMPT and TOOLSscreen.py: Contains utility functions for screen capture, window management, and cursor operations
- When the "Start Task" button is clicked, the application attempts to bring the Android Mirroring app window to the front and flash it to draw attention.
- The script captures a screenshot of the mirrored Android screen.
- The screenshot is sent to Claude AI for analysis.
- Claude AI provides instructions for the next action (move cursor, click, etc.).
- The script executes the instructed action using PyAutoGUI.
- This process repeats until the task is completed or the maximum number of messages is reached.
- The GUI provides real-time updates on the task progress and displays the current screenshot.
- The effectiveness of the automation depends on the quality of the mirrored display and the complexity of the task.
- The script relies on the Anthropic API, so an active internet connection is required.
- The window flashing feature requires the Android Mirroring app to be running and visible on the screen.
- The AI's ability to interact with the Android interface may vary depending on the specific app or task being performed.
Contributions to improve the Android Mirroring Agent are welcome. Please feel free to submit issues or pull requests.
This project is modified from iPhoneMirroringAgent by instapal-ai. Special thanks to their original work and contributions.
The main modifications include:
- Added Android device support
- Enhanced ADB integration for direct device control
- Improved UI XML capture for better interaction accuracy
- Added more interaction tools (swipe, long press, key events)
- Optimized screenshot capture process
This project is licensed under the MIT License. See the LICENSE file for details.
