
feat(env): support android_world env per-turn training#30

Merged
lwaekfjlk merged 4 commits into open-tinker:main from PolarisDane:feature/android-world-multi-emulator
Feb 20, 2026

Conversation

@PolarisDane
Contributor

📑 Description

Added support for training in the android_world environment. Training currently proceeds in a per-turn fashion because full multi-turn episodes exceed the model's context window limit.

ℹ Additional Information

The Android emulator may require hardware virtualization support or privileged commands on your device. Make sure these are available before launching training.
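
The check above can be scripted. A minimal pre-flight sketch (the function name and the specific checks — `/dev/kvm` for hardware virtualization on Linux, `adb` on `PATH` — are illustrative assumptions, not part of this PR):

```python
import os
import shutil

def check_emulator_prereqs() -> list:
    """Return a list of missing prerequisites (empty if all are present)."""
    missing = []
    # On Linux, the emulator's hardware acceleration needs the KVM device.
    if not os.path.exists("/dev/kvm"):
        missing.append("KVM device /dev/kvm (hardware virtualization)")
    elif not os.access("/dev/kvm", os.R_OK | os.W_OK):
        missing.append("read/write access to /dev/kvm (check group membership)")
    # ADB must be reachable for emulator control and port binding.
    if shutil.which("adb") is None:
        missing.append("adb binary on PATH (Android platform-tools)")
    return missing

if __name__ == "__main__":
    problems = check_emulator_prereqs()
    for p in problems:
        print("missing:", p)
```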

…nfigurations

Summary of changes:
- Completed android_world_multiturn.md with setup, usage, and reward structure.
- Updated android_world_param.yaml to set default env_shards to 1.
- Added ADB and port environment variables to android_world_game.py.
- Adjusted launch_scheduler.sh for local environment paths and GPU configuration.
- Implemented prompt truncation in generic_agent_loop.py to prevent tensor size mismatch (replacing submodule modification).
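
To illustrate the last point, a hedged sketch of the kind of prompt truncation that prevents a tensor size mismatch: clip every tokenized prompt to a fixed budget so batch tensors share a common maximum length. The function name and the keep-the-most-recent-tokens policy are assumptions, not the PR's actual code:

```python
def truncate_prompt(token_ids: list, max_prompt_len: int) -> list:
    """Clip a tokenized prompt to at most max_prompt_len tokens."""
    if len(token_ids) <= max_prompt_len:
        return token_ids
    # Drop the oldest tokens: recent observations matter most for the
    # next action, and the result can never exceed max_prompt_len.
    return token_ids[-max_prompt_len:]
```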
…ining

Core Android World environment:
- AndroidWorldGame with multi-emulator shard support
- AndroidWorldServer with emulator-to-worker binding
- Multimodal VL prompt templates (INITIAL/ACTION split)
- gym_environment_interaction with worker-to-endpoint binding
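
The shard-to-emulator binding in the list above can be sketched as follows. Android emulators conventionally take console port 5554 + 2·i with the ADB port one above; the helper names here are hypothetical, not the PR's API:

```python
def emulator_ports(shard_index: int, base_console_port: int = 5554) -> tuple:
    """Console and ADB ports for the emulator serving a given shard."""
    console_port = base_console_port + 2 * shard_index
    adb_port = console_port + 1
    return console_port, adb_port

def bind_workers(env_shards: int) -> dict:
    """Map each worker/shard index to its emulator's ADB serial
    (e.g. 'emulator-5554'), giving a stable worker-to-emulator binding."""
    return {i: f"emulator-{emulator_ports(i)[0]}" for i in range(env_shards)}
```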

Android agent loop:
- AndroidAgentLoop (1190 lines) with multimodal VL support
- Per-turn training mode with expansion_index
- <obs> separator based generation context optimization
- Agent registry entry in agent.yaml
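
One plausible reading of the `<obs>` separator optimization above: splitting the rollout transcript at the final `<obs>` tag yields a stable prefix (reusable generation context) and the suffix the model must actually generate from. The semantics and names below are assumptions for illustration only:

```python
OBS_SEP = "<obs>"

def split_generation_context(transcript: str) -> tuple:
    """Split a transcript at the last <obs> separator into
    (stable context prefix, suffix to generate from)."""
    idx = transcript.rfind(OBS_SEP)
    if idx == -1:
        # No observation tag yet: everything is the generation suffix.
        return "", transcript
    cut = idx + len(OBS_SEP)
    return transcript[:cut], transcript[cut:]
```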

Per-turn training system:
- PerTurnAgentLoopManager expanding multi-turn episodes into per-turn samples
- per_turn_agent_loop.py with expansion logic and reward gamma discounting
- Backend patches: ray_trainer.py (PerTurnAgentLoopManager import),
  rollout.py (per_turn_training/per_turn_reward_gamma config)
- http_training_server.py expansion of batch tensors via expansion_index
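
The expansion-plus-discounting idea above can be sketched in a few lines: one training sample per turn, each carrying the gamma-discounted return from that turn onward and its `expansion_index` back into the episode. Field names and the discounting convention are assumptions, not the code in `per_turn_agent_loop.py`:

```python
def expand_episode(turn_rewards: list, gamma: float) -> list:
    """Expand one multi-turn episode into per-turn samples with
    discounted returns computed in a single backward pass."""
    returns = [0.0] * len(turn_rewards)
    running = 0.0
    for t in reversed(range(len(turn_rewards))):
        running = turn_rewards[t] + gamma * running
        returns[t] = running
    return [{"expansion_index": t, "reward": r} for t, r in enumerate(returns)]
```

With a terminal reward of 1.0 and gamma 0.5, earlier turns receive progressively smaller credit, which is the usual motivation for per-turn discounting.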

Infrastructure improvements:
- base_game.py/base_game_environment.py: agent_loop_name class attribute
- job_scheduler.py: ROLLOUT_TRACE_JOB_ID env var + KL divergence forwarding
- generic_agent_loop.py: per-job trace subdirectory isolation, JSONL fix
- actor.yaml: ppo_max_token_len_per_gpu 32768

Scripts and config:
- run_android.sh with env var configuration (no hardcoded paths)
- launch_scheduler.sh with generic trace dir
- launch_http_server.py with generic checkpoint dir
- scheduler.yaml and android_world_param.yaml cleaned
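
The "no hardcoded paths" cleanup above amounts to reading locations from environment variables with fallbacks. A minimal Python sketch (variable names are hypothetical, not the PR's actual configuration keys):

```python
import os

def get_path(env_var: str, default: str) -> str:
    """Resolve a directory from the environment, falling back to a
    repo-relative default instead of a machine-specific hardcoded path."""
    return os.environ.get(env_var, default)

CHECKPOINT_DIR = get_path("ANDROID_WORLD_CKPT_DIR", "./checkpoints")
TRACE_DIR = get_path("ROLLOUT_TRACE_DIR", "./traces")
```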
@lwaekfjlk (Collaborator) left a comment


LGTM

@lwaekfjlk changed the title from "feature/android_world env per-turn training" to "feat(env): support android_world env per-turn training" on Feb 20, 2026
@lwaekfjlk lwaekfjlk merged commit 0b31a0c into open-tinker:main Feb 20, 2026
2 of 13 checks passed
