Merged
1 change: 1 addition & 0 deletions README.md
@@ -27,6 +27,7 @@ Choose an example below to get started. Each example includes step-by-step instr
| **[VLM Multi-Turn Math](docs/vlm_geo3k_multiturn.md)** | geometry 3k math problem solving with tool calling | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/r39htm2o?nw=nwuserzhusq20) |
| **[LLM Gomoku Agent](docs/gomoku_multiturn.md)** | A multi-turn gomoku agent | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/7a7ggkw3?nw=nwuserzhusq20) |
| **[LLM AlfWorld Agent](docs/alfworld_multiturn.md)** | A multi-turn alfworld agent | [wandb](https://wandb.ai/1125027232/opentinker-public/runs/3jrlolk7?nw=nwuser1125027232) |
| **[LLM Android World Agent](docs/android_world_multiturn.md)** | A multi-turn android world agent | |


## 📦 Installation
232 changes: 232 additions & 0 deletions docs/android_world_multiturn.md
@@ -0,0 +1,232 @@
# LLM Android Agent (AndroidWorld Multi-Turn)

This example demonstrates training a language model to complete tasks in the Android operating system environment using AndroidWorld.

## Overview

**AndroidWorld** is a dynamic benchmarking environment in which autonomous agents interact with the Android operating system. The agent perceives the screen as a list of UI elements and acts on it by clicking, typing, and scrolling.

Tasks include:
- Adding contacts
- Managing settings
- Browsing information
- Sending messages
- And more...

## Prerequisites

1. Complete the [Installation](../README.md#-installation) steps.
2. **Environment Setup**: Install the Android SDK and start an emulator. See the **[Detailed Environment Setup](#detailed-environment-setup)** section below for instructions.
3. Get your IP address: `hostname -I`

## Step 1: Start the Scheduler (Server Side)

```bash
bash opentinker/scripts/launch_scheduler.sh --scheduler-port <scheduler_port>
```

## Step 2: Start the AndroidWorld Environment (Server Side)

Before starting the environment server, ensure your Android Emulator is running (see setup below).

```bash
python -m opentinker.environment.android_world.android_world_server \
--port 8092 \
--max_steps 50 \
--split train
```

**Server Options:**

- `--port`: Server port (default: 8082; 8092 is recommended so that it matches the client config)
- `--max_steps`: Max steps per episode (default: 50)
- `--split`: Dataset split (`train`, `eval_in_distribution`, `eval_out_of_distribution`)
- `--shards`: Number of parallel server instances (for parallel training)

## Step 3: Run Training

```bash
python opentinker/client/android_world_rl.py \
tokenizer_path=Qwen/Qwen2.5-3B-Instruct \
batch_size=4 \
val_batch_size=50 \
num_steps=1000 \
save_freq=20000 \
test_freq=10 \
scheduler_url=http://<server_endpoint>:<scheduler_port> \
interaction.config.env_port=8092 \
interaction.config.env_host=<env_server_endpoint>
```

**Training Parameters:**

- `num_steps`: Total training steps (alternative: use `num_epochs`)
- `batch_size`: Training batch size
- `val_batch_size`: Validation samples per evaluation
- `test_freq`: Validation frequency (every N steps)
- `adv_estimator`: Advantage estimator (`gae`, `grpo`, `grpo_per_step`)

## Reward Structure

| Event | Reward |
| :--------------- | ------ |
| Task Success | +10.0 |
| Task Failure | -1.0 |
| Per Step Penalty | -0.01 |
| Invalid Action | -0.1 |
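
As a rough illustration of how these values could add up over one episode, the sketch below sums them for a hypothetical trajectory. It assumes the rewards are simply additive, that the step penalty applies to every step, and that the success or failure reward is given once at the end. The table does not confirm any of this, and `episode_return` is a hypothetical helper, not part of OpenTinker.

```python
# Reward values copied from the table above.
TASK_SUCCESS = 10.0
TASK_FAILURE = -1.0
STEP_PENALTY = -0.01
INVALID_ACTION = -0.1


def episode_return(num_steps: int, num_invalid: int, success: bool) -> float:
    """Sum the rewards for one episode (ASSUMPTION: they are additive)."""
    if num_invalid > num_steps:
        raise ValueError("num_invalid cannot exceed num_steps")
    terminal = TASK_SUCCESS if success else TASK_FAILURE
    total = terminal + num_steps * STEP_PENALTY + num_invalid * INVALID_ACTION
    # Round away floating-point noise from the small penalties.
    return round(total, 6)


if __name__ == "__main__":
    # 12 steps, 1 of them invalid, task solved.
    print(episode_return(num_steps=12, num_invalid=1, success=True))
```

Under these assumptions a short successful episode stays close to +10, while an episode that hits the 50-step limit and fails ends near -1.5, so the terminal reward dominates the per-step terms.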

## Example Actions

The agent interacts with the environment by outputting JSON commands referencing UI element indices:

- **Click**: `{"action_type": "click", "index": 4}`
- **Type**: `{"action_type": "input_text", "text": "Alice", "index": 2}`
- **Scroll**: `{"action_type": "scroll", "direction": "down"}`
- **Open App**: `{"action_type": "open_app", "app_name": "Settings"}`
- **Navigate Home**: `{"action_type": "navigate_home"}`
- **Navigate Back**: `{"action_type": "navigate_back"}`
- **Answer Question**: `{"action_type": "answer", "text": "It is 5 PM."}`
- **Finish Task**: `{"action_type": "status", "goal_status": "complete"}`
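
A minimal sketch of how such a command could be checked before it is sent to the environment is shown below. The required fields per action type are inferred only from the examples above; the real environment may accept more action types or optional fields. `parse_action` and `REQUIRED_FIELDS` are hypothetical names, not OpenTinker APIs.

```python
import json

# Required fields per action type, inferred from the examples above.
REQUIRED_FIELDS = {
    "click": {"index"},
    "input_text": {"text", "index"},
    "scroll": {"direction"},
    "open_app": {"app_name"},
    "navigate_home": set(),
    "navigate_back": set(),
    "answer": {"text"},
    "status": {"goal_status"},
}


def parse_action(raw: str) -> dict:
    """Parse a JSON action string and check that the expected fields exist."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    action_type = action.get("action_type")
    if not isinstance(action_type, str) or action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = REQUIRED_FIELDS[action_type] - set(action)
    if missing:
        raise ValueError(f"{action_type} is missing fields: {sorted(missing)}")
    return action


if __name__ == "__main__":
    print(parse_action('{"action_type": "click", "index": 4}'))
```

Rejecting malformed output early like this is one way to decide when an action counts as invalid, although how the environment itself makes that decision is not specified here.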

## Configuration Reference

See [`opentinker/client/client_config/android_world_param.yaml`](../opentinker/client/client_config/android_world_param.yaml) for full configuration options.

---

## Detailed Environment Setup

### 1. Android SDK & Command Line Tools

If you do not have Android Studio installed, you can set up the command-line tools manually.

1. **Create Directory Structure:**
```bash
mkdir -p /usr/local/android-sdk/cmdline-tools
cd /usr/local/android-sdk/cmdline-tools
```

2. **Download Command Line Tools:**
```bash
wget https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O cmdline-tools.zip
unzip cmdline-tools.zip
mv cmdline-tools latest
rm cmdline-tools.zip
```

3. **Install SDK Components:**
```bash
export ANDROID_HOME=/usr/local/android-sdk
export PATH=$ANDROID_HOME/cmdline-tools/latest/bin:$PATH

# Accept licenses
yes | sdkmanager --licenses --sdk_root=$ANDROID_HOME

# Install Platform Tools (adb), the Android 33 platform, Build Tools, and the Emulator
sdkmanager "platform-tools" "platforms;android-33" "build-tools;34.0.0" "emulator" --sdk_root=$ANDROID_HOME
```

4. **Configure Environment Variables:**
Add the following to your shell configuration file (`~/.bashrc` or `~/.zshrc`):
```bash
export JAVA_HOME="/usr/local/android-studio/jbr" # Or your JDK path
export ANDROID_HOME="/usr/local/android-sdk"
export PATH="$JAVA_HOME/bin:$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$ANDROID_HOME/emulator:$PATH"
```

### 2. Create Android Virtual Device (AVD)

Create an AVD named `AndroidWorldAvd` targeting Android 13 (Tiramisu, API 33).

1. **Install System Image:**
* For x86_64 (Standard PC):
```bash
sdkmanager "system-images;android-33;google_apis;x86_64" --sdk_root=$ANDROID_HOME
```
* For ARM64 (Apple Silicon or Software Emulation on x86):
```bash
sdkmanager "system-images;android-33;google_apis;arm64-v8a" --sdk_root=$ANDROID_HOME
```

2. **Create AVD:**
```bash
echo "no" | avdmanager create avd --name AndroidWorldAvd --package "system-images;android-33;google_apis;x86_64" --device "pixel_6"
```
*(Replace `x86_64` with `arm64-v8a` if applicable)*

### 3. Launch Emulator

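Start the emulator in a separate terminal or as a background process. Wrapping the command in `sg kvm -c` runs it under the `kvm` group, which the emulator needs in order to access `/dev/kvm` when your login session does not already have that group.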

* **Standard Launch (with GUI):**
```bash
sg kvm -c "emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554"
```

* **Headless Launch (Server/Docker):**
```bash
sg kvm -c "emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554 -no-window -no-audio"
```

* **Software Emulation (No KVM):**
If hardware acceleration is unavailable, add `-accel off`. **Warning: Performance will be very low.**
```bash
emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554 -no-window -no-audio -accel off
```
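
The three launch variants above differ only in a few flags, so a launcher can assemble them from two switches. The sketch below does this in Python using only the flags shown in the commands above. `emulator_command` is a hypothetical helper and is not how `run_android.sh` is known to work.

```python
import shlex


def emulator_command(avd: str = "AndroidWorldAvd", grpc_port: int = 8554,
                     headless: bool = False, use_kvm: bool = True) -> str:
    """Build one of the emulator launch commands shown above as a string."""
    args = ["emulator", "-avd", avd, "-no-snapshot", "-grpc", str(grpc_port)]
    if headless:
        args += ["-no-window", "-no-audio"]
    if not use_kvm:
        # Software emulation: no kvm group needed, so skip the `sg kvm` wrapper.
        args += ["-accel", "off"]
        return shlex.join(args)
    # Run under the kvm group so the emulator can open /dev/kvm.
    return shlex.join(["sg", "kvm", "-c", shlex.join(args)])


if __name__ == "__main__":
    print(emulator_command(headless=True))
    print(emulator_command(headless=True, use_kvm=False))
```

`shlex.join` quotes the inner command with single quotes rather than the double quotes used above; the shell treats both forms the same way for these arguments.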

## Quick Start with `run_android.sh`

For multi-emulator parallel training, we provide an all-in-one launcher script, [`opentinker/scripts/run_android.sh`](../opentinker/scripts/run_android.sh), that automates AVD creation, emulator startup, the environment server, and the training client.

### Usage

Run each step in a **separate terminal**:

```bash
# Step 0 (one-time): Create N AVDs for parallel training
bash opentinker/scripts/run_android.sh setup-avds

# Step 1: Start the scheduler
bash opentinker/scripts/run_android.sh scheduler

# Step 2: Start N Android emulators in parallel
bash opentinker/scripts/run_android.sh simulator

# Step 3: Start the sharded environment server (after emulators fully boot)
bash opentinker/scripts/run_android.sh env

# Step 4: Launch RL training
bash opentinker/scripts/run_android.sh client
```

### Environment Variables

All settings are configurable via environment variables:

| Variable | Default | Description |
| :------- | :------ | :---------- |
| `NUM_EMULATORS` | `4` | Number of parallel emulators |
| `NUM_GPUS` | `4` | Number of GPUs for model parallelism |
| `GPUS` | `[0,1,2,3]` | GPU device list |
| `MODEL_PATH` | `Qwen/Qwen2.5-3B-Instruct` | Model path or HuggingFace ID |
| `AVD_NAME` | `AndroidWorldAvd` | AVD name prefix (creates `{AVD_NAME}_0`, `{AVD_NAME}_1`, ...) |
| `EMULATOR_HEADLESS` | `1` | Set `0` to show emulator GUI |
| `EMULATOR_NO_KVM` | `0` | Set `1` for software emulation (slow) |
| `SCHEDULER_PORT` | `9780` | Scheduler listen port |
| `ENV_PORT` | `9092` | Environment server base port |

**Example** — scale to 8 emulators on 8 GPUs:

```bash
NUM_EMULATORS=8 NUM_GPUS=8 GPUS="[0,1,2,3,4,5,6,7]" bash opentinker/scripts/run_android.sh setup-avds
# Then run scheduler / simulator / env / client with the same env vars
```

---

## Troubleshooting

* **"KVM is not found"**: Ensure virtualization is enabled in your BIOS/Hypervisor. On Linux, check permissions for `/dev/kvm`. If in a container, run with `--device /dev/kvm`.
* **Emulator crashes immediately**: Check the emulator logs. Running an x86_64 image on an ARM host, or vice versa, causes the emulator to fail, so use the system image that matches your host architecture.
* **"ADB command not found"**: Ensure `platform-tools` is in your `$PATH`.
* **"Process system isn't responding"**: Common in software emulation (`-accel off`). Wait for the system to stabilize or dismiss the dialog.
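
For the KVM case, a quick programmatic check can tell a missing device apart from a permissions problem. The sketch below uses only standard-library calls; `kvm_status` is a hypothetical helper, and the three return strings are labels chosen for this example.

```python
import os


def kvm_status(path: str = "/dev/kvm") -> str:
    """Return 'missing', 'no-permission', or 'ok' for the KVM device node."""
    if not os.path.exists(path):
        # Virtualization is disabled, or the device was not passed to the container.
        return "missing"
    if not os.access(path, os.R_OK | os.W_OK):
        # The node exists but the current user or group cannot open it.
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print(kvm_status())
```

A result of `missing` points to the BIOS/hypervisor or `--device /dev/kvm` fixes above, while `no-permission` points to group membership for `/dev/kvm`.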