
feat(env): support android_world env per-turn training#30

Merged
lwaekfjlk merged 4 commits into open-tinker:main from PolarisDane:feature/android-world-multi-emulator
Feb 20, 2026

Conversation

@PolarisDane
Contributor

📑 Description

Added support for training in the android_world environment. Training currently proceeds in a per-turn fashion because full multi-turn episodes exceed the model's context window limit.

ℹ Additional Information

The Android emulator may require hardware virtualization support or privileged commands on your device. Make sure these are available before launching training.
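
The check above can be scripted. A minimal pre-flight sketch (the function name and the specific checks — `/dev/kvm` for hardware virtualization on Linux, `adb` on `PATH` — are illustrative assumptions, not part of this PR):

```python
import os
import shutil

def check_emulator_prereqs() -> list:
    """Return a list of missing prerequisites (empty if all are present)."""
    missing = []
    # On Linux, the emulator's hardware acceleration needs the KVM device.
    if not os.path.exists("/dev/kvm"):
        missing.append("KVM device /dev/kvm (hardware virtualization)")
    elif not os.access("/dev/kvm", os.R_OK | os.W_OK):
        missing.append("read/write access to /dev/kvm (check group membership)")
    # ADB must be reachable for emulator control and port binding.
    if shutil.which("adb") is None:
        missing.append("adb binary on PATH (Android platform-tools)")
    return missing

if __name__ == "__main__":
    problems = check_emulator_prereqs()
    for p in problems:
        print("missing:", p)
```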

…nfigurations

Summary of changes:
- Completed android_world_multiturn.md with setup, usage, and reward structure.
- Updated android_world_param.yaml to set default env_shards to 1.
- Added ADB and port environment variables to android_world_game.py.
- Adjusted launch_scheduler.sh for local environment paths and GPU configuration.
- Implemented prompt truncation in generic_agent_loop.py to prevent tensor size mismatch (replacing submodule modification).
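
To illustrate the last point, a hedged sketch of the kind of prompt truncation that prevents a tensor size mismatch: clip every tokenized prompt to a fixed budget so batch tensors share a common maximum length. The function name and the keep-the-most-recent-tokens policy are assumptions, not the PR's actual code:

```python
def truncate_prompt(token_ids: list, max_prompt_len: int) -> list:
    """Clip a tokenized prompt to at most max_prompt_len tokens."""
    if len(token_ids) <= max_prompt_len:
        return token_ids
    # Drop the oldest tokens: recent observations matter most for the
    # next action, and the result can never exceed max_prompt_len.
    return token_ids[-max_prompt_len:]
```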
…ining

Core Android World environment:
- AndroidWorldGame with multi-emulator shard support
- AndroidWorldServer with emulator-to-worker binding
- Multimodal VL prompt templates (INITIAL/ACTION split)
- gym_environment_interaction with worker-to-endpoint binding
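
The shard-to-emulator binding in the list above can be sketched as follows. Android emulators conventionally take console port 5554 + 2·i with the ADB port one above; the helper names here are hypothetical, not the PR's API:

```python
def emulator_ports(shard_index: int, base_console_port: int = 5554) -> tuple:
    """Console and ADB ports for the emulator serving a given shard."""
    console_port = base_console_port + 2 * shard_index
    adb_port = console_port + 1
    return console_port, adb_port

def bind_workers(env_shards: int) -> dict:
    """Map each worker/shard index to its emulator's ADB serial
    (e.g. 'emulator-5554'), giving a stable worker-to-emulator binding."""
    return {i: f"emulator-{emulator_ports(i)[0]}" for i in range(env_shards)}
```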

Android agent loop:
- AndroidAgentLoop (1190 lines) with multimodal VL support
- Per-turn training mode with expansion_index
- <obs> separator based generation context optimization
- Agent registry entry in agent.yaml
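
One plausible reading of the `<obs>` separator optimization above: splitting the rollout transcript at the final `<obs>` tag yields a stable prefix (reusable generation context) and the suffix the model must actually generate from. The semantics and names below are assumptions for illustration only:

```python
OBS_SEP = "<obs>"

def split_generation_context(transcript: str) -> tuple:
    """Split a transcript at the last <obs> separator into
    (stable context prefix, suffix to generate from)."""
    idx = transcript.rfind(OBS_SEP)
    if idx == -1:
        # No observation tag yet: everything is the generation suffix.
        return "", transcript
    cut = idx + len(OBS_SEP)
    return transcript[:cut], transcript[cut:]
```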

Per-turn training system:
- PerTurnAgentLoopManager expanding multi-turn episodes into per-turn samples
- per_turn_agent_loop.py with expansion logic and reward gamma discounting
- Backend patches: ray_trainer.py (PerTurnAgentLoopManager import),
  rollout.py (per_turn_training/per_turn_reward_gamma config)
- http_training_server.py expansion of batch tensors via expansion_index
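
The expansion-plus-discounting idea above can be sketched in a few lines: one training sample per turn, each carrying the gamma-discounted return from that turn onward and its `expansion_index` back into the episode. Field names and the discounting convention are assumptions, not the code in `per_turn_agent_loop.py`:

```python
def expand_episode(turn_rewards: list, gamma: float) -> list:
    """Expand one multi-turn episode into per-turn samples with
    discounted returns computed in a single backward pass."""
    returns = [0.0] * len(turn_rewards)
    running = 0.0
    for t in reversed(range(len(turn_rewards))):
        running = turn_rewards[t] + gamma * running
        returns[t] = running
    return [{"expansion_index": t, "reward": r} for t, r in enumerate(returns)]
```

With a terminal reward of 1.0 and gamma 0.5, earlier turns receive progressively smaller credit, which is the usual motivation for per-turn discounting.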

Infrastructure improvements:
- base_game.py/base_game_environment.py: agent_loop_name class attribute
- job_scheduler.py: ROLLOUT_TRACE_JOB_ID env var + KL divergence forwarding
- generic_agent_loop.py: per-job trace subdirectory isolation, JSONL fix
- actor.yaml: ppo_max_token_len_per_gpu 32768

Scripts and config:
- run_android.sh with env var configuration (no hardcoded paths)
- launch_scheduler.sh with generic trace dir
- launch_http_server.py with generic checkpoint dir
- scheduler.yaml and android_world_param.yaml cleaned
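
The "no hardcoded paths" cleanup above amounts to reading locations from environment variables with fallbacks. A minimal Python sketch (variable names are hypothetical, not the PR's actual configuration keys):

```python
import os

def get_path(env_var: str, default: str) -> str:
    """Resolve a directory from the environment, falling back to a
    repo-relative default instead of a machine-specific hardcoded path."""
    return os.environ.get(env_var, default)

CHECKPOINT_DIR = get_path("ANDROID_WORLD_CKPT_DIR", "./checkpoints")
TRACE_DIR = get_path("ROLLOUT_TRACE_DIR", "./traces")
```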
@lwaekfjlk (Collaborator) left a comment


LGTM

@lwaekfjlk changed the title from "feature/android_world env per-turn training" to "feat(env): support android_world env per-turn training" on Feb 20, 2026
@lwaekfjlk lwaekfjlk merged commit 0b31a0c into open-tinker:main Feb 20, 2026
2 of 13 checks passed
