# ARPO Training - UI-TARS-2B (Colab GPU + Mac OSWorld)\n\nTrain UI-TARS-2B on 128 OSWorld tasks using Colab GPU for inference and Mac for training orchestration.\n\n## Prerequisites\n\n- ✅ Colab GPU server running (`GPU_Server_for_OSWorld.ipynb`)\n- ✅ VMware Fusion + Ubuntu VM ready\n- ✅ wandb configured\n\n**See**: `PRE_TRAINING_CHECKLIST.md` for complete setup verification

## 1. Environment Check

In [None]:
import os\nimport sys\nimport json\nfrom pathlib import Path\n\nARPO_ROOT = Path(\"/Users/hanszhu/Desktop/ARPO_replicate\")\nos.chdir(ARPO_ROOT)\nsys.path.insert(0, str(ARPO_ROOT))\n\nprint(f\"✅ Working directory: {os.getcwd()}\")\nprint(f\"✅ Python: {sys.executable}\")\n\n# Check dependencies\ntry:\n    import torch, transformers, wandb\n    print(f\"✅ PyTorch {torch.__version__}\")\n    print(f\"✅ Transformers {transformers.__version__}\")\n    print(f\"✅ wandb {wandb.__version__}\")\nexcept ImportError as e:\n    print(f\"❌ Missing: {e}\")
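The dependency check above imports all three packages in one statement, so it reports only the first one that fails. A minimal sketch of an alternative, using only the standard library: `importlib.util.find_spec` locates a top-level package without importing it, so every missing package can be listed at once. The helper name `find_missing` is hypothetical and not part of the notebook.

```python
from importlib.util import find_spec


def find_missing(packages):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in packages if find_spec(name) is None]


# Same three dependencies the notebook checks.
missing = find_missing(["torch", "transformers", "wandb"])
print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only confirms that the packages can be found; version numbers still require the actual import shown in the cell above.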

## 2. Training Configuration

In [None]:
config = {\n    # Model\n    \"model\": \"ByteDance-Seed/UI-TARS-2B-SFT\",\n    \"inference_server\": \"https://YOUR-NGROK-URL/v1\",  # ⬅️ UPDATE FROM COLAB!\n    \n    # Training\n    \"tasks\": 128,\n    \"num_envs\": 4,\n    \"rollouts_per_task\": 4,\n    \"epochs\": 1,\n    \"max_steps\": 16,\n    \"batch_size\": 8,\n    \n    # Paths\n    \"train_data\": str(ARPO_ROOT / \"test_data\" / \"osworld_examples\" / \"train_all_128.json\"),\n    \"result_dir\": str(ARPO_ROOT / \"results_training_128\"),\n    \"checkpoint_dir\": str(ARPO_ROOT / \"checkpoints_training_128\"),\n    \n    # wandb\n    \"wandb_entity\": \"hanszhu05-university-of-pennsylvania-org\",\n    \"wandb_project\": \"arpo-uitars-training\",\n}\n\nprint(\"Training Configuration:\")\nprint(json.dumps(config, indent=2))\nprint()\nprint(f\"Expected time: ~34-68 hours for {config['epochs']} epoch(s)\")
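To make the size of the run concrete, the configuration values can be multiplied out. This is a rough sketch, assuming every task is rolled out `rollouts_per_task` times per epoch and that each rollout may use up to `max_steps` environment steps; the helper name `rollout_budget` is hypothetical.

```python
def rollout_budget(tasks, rollouts_per_task, max_steps, epochs=1):
    """Upper bound on rollouts and environment steps for a run."""
    rollouts = tasks * rollouts_per_task * epochs
    return {"rollouts": rollouts, "max_env_steps": rollouts * max_steps}


# Values copied from the training configuration above.
budget = rollout_budget(tasks=128, rollouts_per_task=4, max_steps=16)
print(budget)  # {'rollouts': 512, 'max_env_steps': 8192}
```

If every rollout ran to the step limit, the 34-68 hour estimate would correspond to roughly 15-30 seconds per environment step. The notebook does not state how the estimate was derived, so treat this only as a sanity check.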

## 3. Verify Colab Server Connection

In [None]:
import requests\n\nserver_url = config[\"inference_server\"].replace(\"/v1\", \"\")\n\nif \"YOUR-NGROK-URL\" in server_url:\n    print(\"❌ Update config['inference_server'] with your Colab ngrok URL!\")\nelse:\n    try:\n        response = requests.get(f\"{server_url}/health\", timeout=5)\n        if response.status_code == 200:\n            print(f\"✅ Server reachable: {server_url}\")\n            print(f\"Server: {response.json()}\")\n        else:\n            print(f\"❌ Server returned {response.status_code}\")\n    except Exception as e:\n        print(f\"❌ Cannot reach server: {e}\")\n        print(\"Make sure Colab GPU server is running!\")
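The cell above derives the server root with `str.replace("/v1", "")`, which removes every occurrence of `/v1` in the URL, not only the trailing one. A minimal sketch of a stricter variant that strips only a trailing `/v1` before appending the `/health` path used above. The helper name `health_url` and the hostnames in the example are placeholders, not part of the notebook.

```python
def health_url(inference_server: str) -> str:
    """Build the health-check URL from an OpenAI-style base URL ending in /v1."""
    base = inference_server.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return f"{base}/health"


# Placeholder hostname for illustration only.
print(health_url("https://example.ngrok.app/v1"))  # https://example.ngrok.app/health
```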

## 4. Update OSWorld Agent Config

In [None]:
import shutil\n\nagent_file = ARPO_ROOT / \"OSWorld\" / \"mm_agents\" / \"uitars_agent.py\"\nbackup_file = agent_file.with_suffix('.py.backup_training')\n\n# Back up the original agent file once so Cleanup can restore it\nif not backup_file.exists():\n    shutil.copy(agent_file, backup_file)\n    print(f\"✅ Created backup: {backup_file}\")\n\n# Point the agent's base_url at the Colab inference server\nif \"YOUR-NGROK-URL\" not in config[\"inference_server\"]:\n    lines = agent_file.read_text().split('\\n')\n    updated = False\n    for i, line in enumerate(lines):\n        # Only touch a base_url= line that follows an __init__ line within five lines\n        if 'base_url=' in line and any('__init__' in prev for prev in lines[max(0, i-5):i+1]):\n            lines[i] = f'        base_url=\"{config[\"inference_server\"]}\",'\n            updated = True\n            break\n    if updated:\n        agent_file.write_text('\\n'.join(lines))\n        print(f\"✅ Updated agent to: {config['inference_server']}\")\n    else:\n        print(\"⚠️  No base_url= line found near __init__; agent file left unchanged\")\nelse:\n    print(\"⚠️  Update config['inference_server'] first!\")
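The line-by-line patch above depends on where `base_url=` sits relative to `__init__`. As an alternative, here is a minimal sketch that rewrites the first quoted `base_url` assignment with a regular expression and reports whether anything changed. The helper `set_base_url` and the sample source text are hypothetical; the real layout of `uitars_agent.py` is not shown in this notebook.

```python
import re

# Matches base_url="..." or base_url='...' with the same quote on both sides.
BASE_URL_PATTERN = re.compile(r"""base_url\s*=\s*(["'])[^"']*\1""")


def set_base_url(source: str, new_url: str) -> tuple[str, bool]:
    """Replace the first quoted base_url assignment; return (text, changed)."""
    patched, count = BASE_URL_PATTERN.subn(
        lambda match: f'base_url="{new_url}"', source, count=1
    )
    return patched, count == 1


# Hypothetical snippet standing in for the agent file's contents.
sample = 'client = OpenAI(\n    base_url="http://localhost:8000/v1",\n    api_key="empty",\n)\n'
patched, changed = set_base_url(sample, "https://example.ngrok.app/v1")
print(changed)
print(patched)
```

Returning the `changed` flag lets the caller skip writing the file, and skip the success message, when no `base_url` assignment was found.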

## 5. Initialize wandb

In [None]:
import wandb\n\n# Initialize wandb\nrun = wandb.init(\n    entity=config[\"wandb_entity\"],\n    project=config[\"wandb_project\"],\n    name=\"uitars-2b-128tasks-epoch1\",\n    config=config,\n    tags=[\"ui-tars-2b\", \"128-tasks\", \"colab-gpu\", \"1-epoch\"],\n)\n\nprint(f\"✅ wandb run started: {wandb.run.url}\")\nprint(f\"View at: https://wandb.ai/{config['wandb_entity']}/{config['wandb_project']}\")

## 6. Run Training\n\n⚠️ **This will take ~34-68 hours!** Make sure:\n- Colab server stays running (keep tab open)\n- Stable internet connection\n- Mac stays awake (disable sleep)
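One way to keep the Mac awake without changing system settings is to launch the long-running command through macOS's built-in `caffeinate` utility, whose `-i` flag prevents idle sleep while the wrapped command runs. The sketch below only builds the argument list; the helper name `keep_awake` is hypothetical, and the example command is a placeholder rather than the notebook's full training command.

```python
import platform


def keep_awake(cmd, system=None):
    """Prefix cmd with `caffeinate -i` on macOS; return it unchanged elsewhere."""
    system = system or platform.system()
    if system == "Darwin":
        return ["caffeinate", "-i", *cmd]
    return list(cmd)


# Placeholder command for illustration.
print(keep_awake(["python", "run_uitars.py", "--headless"]))
```

The resulting list can be passed to `subprocess.run` in place of the original command. Closing the laptop lid can still put the machine to sleep, so this does not replace the checklist above.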

In [None]:
import subprocess\nimport time\n\n# Create output directories\nos.makedirs(config[\"result_dir\"], exist_ok=True)\nos.makedirs(config[\"checkpoint_dir\"], exist_ok=True)\n\nprint(\"🚀 Starting ARPO Training...\")\nprint(f\"📁 Results: {config['result_dir']}\")\nprint(f\"💾 Checkpoints: {config['checkpoint_dir']}\")\nprint(\"=\"*70)\n\nstart_time = time.time()\n\n# Training command (run with the same interpreter as this notebook)\ncmd = [\n    sys.executable, \"run_uitars.py\",\n    \"--headless\",\n    \"--observation_type\", \"screenshot\",\n    \"--max_steps\", str(config[\"max_steps\"]),\n    \"--model\", \"ui-tars-2b\",\n    \"--temperature\", \"0.7\",\n    \"--max_tokens\", \"256\",\n    \"--test_config_base_dir\", \"../test_data/osworld_examples\",\n    \"--test_all_meta_path\", config[\"train_data\"],\n    \"--result_dir\", config[\"result_dir\"],\n]\n\nprint(\"Training with:\")\nprint(f\"  Model: {config['model']}\")\nprint(f\"  Tasks: {config['tasks']}\")\nprint(f\"  Max steps: {config['max_steps']}\")\nprint(f\"  VMs: {config['num_envs']}\")\nprint()\n\n# Note: This runs evaluation-style. For full ARPO training with VERL,\n# use scripts/train_uitars_2b_arpo.sh\nprint(\"⚠️  Running in evaluation mode (for testing)\")\nprint(\"For full ARPO training with experience replay, use:\")\nprint(\"  bash scripts/train_uitars_2b_arpo.sh\")\nprint()\n\ntry:\n    result = subprocess.run(\n        cmd,\n        cwd=ARPO_ROOT / \"OSWorld\",\n        capture_output=False,  # Show output in real-time\n        text=True,\n    )\n    \n    elapsed = time.time() - start_time\n    # A non-zero exit code means the run failed, so do not report success\n    if result.returncode == 0:\n        print(f\"\\n✅ Complete in {elapsed/3600:.1f} hours\")\n    else:\n        print(f\"\\n❌ Exited with code {result.returncode} after {elapsed/3600:.1f} hours\")\n    \nexcept KeyboardInterrupt:\n    print(\"\\n🛑 Training interrupted\")\nexcept Exception as e:\n    print(f\"\\n❌ Error: {e}\")

## 7. View Results

In [None]:
# Analyze training results\nresults = []\nfor result_file in Path(config[\"result_dir\"]).rglob(\"result.txt\"):\n    try:\n        score = float(result_file.read_text().strip())\n        results.append(score)\n    except (OSError, ValueError):\n        # Skip result files that are unreadable or do not contain a number\n        pass\n\nif results:\n    print(\"=\"*70)\n    print(f\"📊 Training Results ({len(results)} tasks)\")\n    print(\"=\"*70)\n    print(f\"Average Score: {sum(results)/len(results):.3f}\")\n    print(f\"Success Rate: {sum(1 for r in results if r >= 0.9)/len(results)*100:.1f}%\")\n    print(f\"Passed: {sum(1 for r in results if r >= 0.9)}/{len(results)}\")\n    print(\"=\"*70)\n    \n    # Log to wandb\n    if wandb.run:\n        wandb.log({\n            \"final_average_score\": sum(results)/len(results),\n            \"final_success_rate\": sum(1 for r in results if r >= 0.9)/len(results),\n            \"tasks_completed\": len(results),\n        })\nelse:\n    print(\"⚠️  No results found yet - training may still be running\")
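The cell above recomputes the average and the pass count several times. A minimal sketch that gathers the same statistics in one place, using the same 0.9 pass threshold as the notebook; the helper name `summarize_scores` and the example scores are hypothetical.

```python
def summarize_scores(scores, threshold=0.9):
    """Return task count, average score, pass count, and success rate."""
    if not scores:
        return None
    passed = sum(1 for score in scores if score >= threshold)
    return {
        "tasks": len(scores),
        "average": sum(scores) / len(scores),
        "passed": passed,
        "success_rate": passed / len(scores),
    }


# Made-up scores for illustration; real values come from result.txt files.
print(summarize_scores([1.0, 0.0, 1.0, 0.5]))
# {'tasks': 4, 'average': 0.625, 'passed': 2, 'success_rate': 0.5}
```

The returned dictionary can feed both the printed report and the `wandb.log` call, so the two always agree.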

## 8. Cleanup

In [None]:
# Finish wandb run\nif wandb.run:\n    wandb.finish()\n    print(\"✅ wandb run finished\")\n\n# Restore original agent config\nif backup_file.exists():\n    shutil.copy(backup_file, agent_file)\n    print(\"✅ Restored original agent config\")
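The restore step above runs only if this cell is executed, so a crashed or restarted kernel leaves the agent file patched. A minimal sketch of a context manager that restores a file even when the body raises. The name `restored_afterwards` is hypothetical, and unlike the notebook it keeps the backup in a temporary directory rather than next to the agent file.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def restored_afterwards(path):
    """Yield path for temporary edits, then restore its original contents."""
    path = Path(path)
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / path.name
        shutil.copy2(path, backup)
        try:
            yield path
        finally:
            # Runs on normal exit and on exceptions alike.
            shutil.copy2(backup, path)


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo = Path(demo_dir) / "agent.py"
    demo.write_text("original")
    with restored_afterwards(demo) as target:
        target.write_text("patched")
    print(demo.read_text())  # original
```

The temporary backup is lost if the process itself is killed, so the on-disk backup the notebook creates remains the safer option for multi-day runs.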

---\n\n## Summary\n\nThis notebook provides a simplified training workflow. For more details:\n\n- **Full training**: Use `scripts/train_uitars_2b_arpo.sh` with the VERL framework\n- **Documentation**: See `TRAINING_WITH_COLAB.md`\n- **Paper details**: See `docs/PAPER_SUMMARY.md`\n- **Troubleshooting**: See `docs/TROUBLESHOOTING.md`\n\n**wandb Dashboard**: https://wandb.ai/hanszhu05-university-of-pennsylvania-org/arpo-uitars-training