# ARPO Smoke Test - GPU Cluster

**4 tasks, multi-GPU options (1/4/8 GPUs)**

**Prerequisites** (one-time setup):
```bash
pip install vllm flash-attn liger-kernel
newgrp docker  # Enable Docker access
```

Then run cells below to start training.

---

## üìã Prerequisites Check

Verify your environment is ready:

In [1]:
import os
import subprocess
import sys

# Change to repo directory
os.chdir('/home/kevinzyz/hansenzuishuai')
print(f"üìÅ Working directory: {os.getcwd()}")

# Docker helper function
def run_docker_cmd(cmd, **kwargs):
    """Run docker command, try with sudo if permission denied"""
    try:
        result = subprocess.run(['docker'] + cmd, check=True, capture_output=True, **kwargs)
        return result, False  # No sudo needed
    except (subprocess.CalledProcessError, PermissionError):
        try:
            result = subprocess.run(['sudo', 'docker'] + cmd, check=True, capture_output=True, **kwargs)
            return result, True  # Sudo needed
        except:
            raise RuntimeError("Docker not accessible even with sudo")

# Check Docker
try:
    result, use_sudo = run_docker_cmd(['ps'])
    if use_sudo:
        print("‚úÖ Docker access OK (using sudo)")
        os.environ['DOCKER_SUDO'] = 'sudo'
    else:
        print("‚úÖ Docker access OK (no sudo)")
        os.environ['DOCKER_SUDO'] = ''
except Exception as e:
    print(f"‚ùå Docker not accessible: {e}")
    print("Make sure Docker is running")

# Check GPU
try:
    import torch
    print(f"‚úÖ PyTorch {torch.__version__}")
    print(f"‚úÖ CUDA available: {torch.cuda.is_available()}")
    print(f"‚úÖ GPU count: {torch.cuda.device_count()}")
except:
    print("‚ö†Ô∏è  PyTorch not imported")

# Check files
print(f"‚úÖ Config exists: {os.path.exists('configs/smoke.yaml')}")
print(f"‚úÖ Data exists: {os.path.exists('test_data/osworld_examples/train_smoke_4.json')}")
print(f"‚úÖ OSWorld exists: {os.path.exists('OSWorld/run_uitars.py')}")

üìÅ Working directory: /home/kevinzyz/hansenzuishuai
‚úÖ Docker access OK (using sudo)
‚úÖ PyTorch 2.9.1+cu128
‚úÖ CUDA available: True
‚úÖ GPU count: 8
‚úÖ Config exists: True
‚úÖ Data exists: True
‚úÖ OSWorld exists: True


## üì¶ Install Dependencies

**Prerequisites** (one-time setup in terminal):
```bash
pip install vllm  # Required
pip install flash-attn --no-build-isolation  # Optional (5-10 min)
```

This cell installs notebook-specific packages.

In [4]:
# Install all required packages
print("üì¶ Installing packages...")
!pip install -q transformers==4.57.6 accelerate==1.4.0
!pip install -q 'ray[default]' omegaconf wandb tensorboard
!pip install -q tensordict torchdata codetiming
!pip install -q mathruler pylatexenc qwen-vl-utils
!pip install -q datasets mlflow swanlab

# OSWorld dependencies (cloud providers, file handling)
!pip install -q PyDrive2 azure-identity azure-mgmt-compute azure-mgmt-network
!pip install -q boto3 chardet borb pypdf2

print("‚úÖ All packages installed")

# Note: GPU packages should be pre-installed in base environment:
# - vllm (required)
# - flash-attn (optional, 5-10 min compile)
# - liger-kernel (optional)

# Install OSWorld dependencies
print("üì¶ Installing OSWorld dependencies...")
!pip install -q numpy gymnasium Pillow opencv-python pyautogui pynput psutil tqdm pandas matplotlib
!pip install -q openai requests requests-toolbelt backoff flask PyYAML
!pip install -q lxml cssselect xmltodict beautifulsoup4 tiktoken loguru
!pip install -q openpyxl python-docx python-pptx pypdf pymupdf pdfplumber odfpy
!pip install -q ImageHash scikit-image librosa mutagen pyacoustid pygame
!pip install -q PyGetWindow rapidfuzz playwright func-timeout formulas fastdtw easyocr gdown
!pip install -q wrapt_timeout_decorator fabric

print("‚úÖ OSWorld dependencies installed")

# Install OSWorld without strict dependencies
%cd OSWorld
print("üì¶ Installing OSWorld...")
!pip install --no-deps -e .
%cd ..

# Verify critical imports
print("\nüîç Verifying critical packages...")
try:
    import torch
    print(f"  ‚úÖ PyTorch {torch.__version__}")
except: print("  ‚ùå PyTorch missing")

try:
    import transformers
    print(f"  ‚úÖ Transformers {transformers.__version__}")
except: print("  ‚ùå Transformers missing")

try:
    import vllm
    print(f"  ‚úÖ vLLM {vllm.__version__}")
except: print("  ‚ùå vLLM missing")

try:
    import ray
    print(f"  ‚úÖ Ray {ray.__version__}")
except: print("  ‚ùå Ray missing")

try:
    import desktop_env
    print("  ‚úÖ OSWorld (desktop_env)")
except:
    sys.path.insert(0, '/home/kevinzyz/hansenzuishuai/OSWorld')
    import desktop_env
    print("  ‚úÖ OSWorld (via path)")

# Add repo to path
sys.path.insert(0, '/home/kevinzyz/hansenzuishuai')
print("\n‚úÖ All dependencies ready!")

üì¶ Installing packages...
‚úÖ Packages installed
üì¶ Installing OSWorld dependencies...
‚úÖ OSWorld dependencies installed
/home/kevinzyz/hansenzuishuai/OSWorld
üì¶ Installing OSWorld...
Obtaining file:///home/kevinzyz/hansenzuishuai/OSWorld
  Installing build dependencies ... [?25ldone
[?25h  Checking if build backend supports build_editable ... [?25ldone
[?25h  Getting requirements to build editable ... [?25ldone
[?25h  Preparing editable metadata (pyproject.toml) ... [?25ldone
[?25hBuilding wheels for collected packages: desktop_env
  Building editable for desktop_env (pyproject.toml) ... [?25ldone
[?25h  Created wheel for desktop_env: filename=desktop_env-0.1.5-0.editable-py3-none-any.whl size=11723 sha256=cc25367429d7b7b3b12ef73c3012a4fd436e6654f221e7eaef2ed297fe40444c
  Stored in directory: /tmp/pip-ephem-wheel-cache-inhw5cxy/wheels/d1/20/8a/2857614ae731a7c26d71f046279c3e4d3a26423db0cce73c5c
Successfully built desktop_env
Installing collected packages: desktop_env
 

## üê≥ Setup Docker

Pull OSWorld Docker image and verify:

In [2]:
# Get docker command (with or without sudo)
docker_cmd = os.environ.get('DOCKER_SUDO', '')

# Verify Docker is working
try:
    result, _ = run_docker_cmd(['ps'])
    print("‚úÖ Docker is accessible")
    print(result.stdout.decode())
except Exception as e:
    print(f"‚ùå Docker not accessible: {e}")
    raise RuntimeError("Docker not available")

# Check if image exists
try:
    result, _ = run_docker_cmd(['images', 'happysixd/osworld-docker'])
    images_output = result.stdout.decode()
except:
    images_output = ""

if 'happysixd/osworld-docker' not in images_output:
    print("üì• Pulling OSWorld Docker image (10-15 minutes)...")
    if docker_cmd:
        !sudo docker pull happysixd/osworld-docker:latest
    else:
        !docker pull happysixd/osworld-docker:latest
    print("‚úÖ Image downloaded")
else:
    print("‚úÖ OSWorld Docker image already available")

# Verify image
if docker_cmd:
    !sudo docker images | grep osworld
else:
    !docker images | grep osworld

‚úÖ Docker is accessible
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

‚úÖ OSWorld Docker image already available
happysixd/osworld-docker:latest                                 fe8d9a5e5ad6        260MB             0B        


## ‚öôÔ∏è Setup Ray

**SKIP THIS CELL** - Ray will be initialized by the training script.

If you want to start Ray manually instead, run in terminal:
```bash
ray stop
ray start --head --port=6379 --resources='{"docker:10.100.4.6": 128}'
```

In [1]:
# SKIP - Ray will be auto-initialized by training script
print("‚è© Skipping Ray initialization (training script handles it)")
print("")
print("If training fails with Ray resource errors, run in terminal:")
print("  ray stop")
print("  ray start --head --port=6379 --resources='{\"docker:10.100.4.6\": 128}'")
print("")
print("Then re-run the training cell.")

  from .autonotebook import tqdm as notebook_tqdm
2026-02-02 20:00:09,128	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.


Local IP: 10.100.4.6


2026-02-02 20:00:12,757	INFO worker.py:1998 -- Started a local Ray instance. View the dashboard at [1m[32mhttp://127.0.0.1:8265 [39m[22m


‚úÖ Ray initialized with Docker resources
Cluster resources: {'CPU': 4.0, 'object_store_memory': 200000000000.0, 'memory': 1598925414400.0, 'GPU': 4.0, 'node:__internal_head__': 1.0, 'docker:10.100.4.6': 128.0, 'accelerator_type:A100': 1.0, 'node:10.100.4.6': 1.0}




## üìä Setup Weights & Biases

Configure W&B for training monitoring:

In [5]:
import wandb
from getpass import getpass

# Login to W&B
wandb_key = getpass('Enter your W&B API key (from https://wandb.ai/authorize): ')
os.environ['WANDB_API_KEY'] = wandb_key

wandb.login(relogin=True)
print("‚úÖ W&B authenticated")

[34m[1mwandb[0m: (1) Create a W&B account
[34m[1mwandb[0m: (2) Use an existing W&B account
[34m[1mwandb[0m: (3) Don't visualize my results
[34m[1mwandb[0m: Enter your choice:[34m[1mwandb[0m: You chose 'Use an existing W&B account'
[34m[1mwandb[0m: Logging into https://api.wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: Create a new API key at: https://wandb.ai/authorize?ref=models
[34m[1mwandb[0m: Store your API key securely and do not share it.
[34m[1mwandb[0m: Paste your API key and hit enter:[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/kevinzyz/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mhanszhu05[0m ([33mhanszhu05-university-of-pennsylvania[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


‚úÖ W&B authenticated


## üîß Configure OSWorld for Docker

Update OSWorld files to use Docker (already done, but verify):

In [6]:
# Check if already configured for docker
with open('OSWorld/run_uitars.py', 'r') as f:
    content = f.read()
    if 'provider_name="docker"' in content:
        print("‚úÖ OSWorld already configured for Docker")
    else:
        print("‚ö†Ô∏è  Updating OSWorld for Docker...")
        # Backup and update
        !cp OSWorld/run_uitars.py OSWorld/run_uitars.py.bak
        !sed -i 's/provider_name="vmware"/provider_name="docker"/g' OSWorld/run_uitars.py
        !sed -i 's/provider_name="vmware"/provider_name="docker"/g' OSWorld/run_multienv_uitars.py
        print("‚úÖ OSWorld updated for Docker")

‚úÖ OSWorld already configured for Docker


## üìù Review Configuration

Check the training configuration:

In [7]:
!cat configs/smoke.yaml

# ARPO Smoke Test Configuration - 4 tasks, Docker, ~1 hour
# For GPU cluster with Docker support

data:
  train_files: test_data/osworld_examples/train_smoke_4.json  # 4 tasks only
  val_files: test_data/osworld_examples/train_smoke_4.json
  prompt_key: instruction
  answer_key: null
  image_key: images
  max_prompt_length: 32768
  max_response_length: 4096
  rollout_batch_size: 1
  val_batch_size: -1
  shuffle: true
  seed: 1
  max_pixels: 2116800
  min_pixels: 2800

algorithm:
  adv_estimator: grpo
  disable_kl: true  # No KL divergence (ARPO)
  use_kl_loss: false
  kl_coef: 0
  enable_replay: true  # Experience replay buffer (key ARPO feature!)

worker:
  actor:
    global_batch_size: 4  # Small for smoke test
    micro_batch_size_per_device_for_update: 1
    micro_batch_size_per_device_for_experience: 1
    max_grad_norm: 1.0
    padding_free: false
    ulysses_sequence_parallel_size: 1
    ppo_epochs: 1
    clip_ratio_low: 0.2
    clip_ratio_high: 0.3
    model:
      model_path: 

## üìä Review Training Data

See what tasks will be trained:

In [8]:
import json

with open('test_data/osworld_examples/train_smoke_4.json', 'r') as f:
    data = json.load(f)

print("Training tasks:")
for domain, tasks in data.items():
    print(f"  {domain}: {len(tasks)} task(s)")
    for task in tasks:
        print(f"    - {task}")

total_tasks = sum(len(tasks) for tasks in data.values())
print(f"\nTotal: {total_tasks} tasks")

Training tasks:
  chrome: 1 task(s)
    - 06fe7178-4491-4589-810f-2e2bc9502122
  gimp: 1 task(s)
    - 3fb3efba-eb56-427b-b16e-a07c8e6cb969
  vs_code: 1 task(s)
    - 0a291cc3-74bb-46c4-8309-d2c0f2b05138
  os: 1 task(s)
    - c64f5fe6-7b76-4c0a-acc7-f3ab50cdce49

Total: 4 tasks


## üöÄ Run Training

Start the ARPO training:

**You have 8 GPUs!** Choose your config by editing the cell below:
- `smoke.yaml` - 1 GPU, ~1 hour
- `smoke_4gpu.yaml` - 4 GPUs, ~25-30 min ‚≠ê **RECOMMENDED**
- `smoke_8gpu.yaml` - 8 GPUs, ~15-20 min (maximum speed)

**Monitor progress:**
- In this notebook (console output)
- On W&B dashboard: https://wandb.ai/hanszhu05/arpo-smoke-test
- GPU usage: Run `nvidia-smi` in another terminal

In [1]:
print("üöÄ Starting ARPO training...")
print("")
print("You have 8 GPUs available! Choose a configuration:")
print("  1. smoke.yaml      - 1 GPU,  2 envs  (~1 hour)")
print("  2. smoke_4gpu.yaml - 4 GPUs, 8 envs  (~25-30 min) ‚≠ê RECOMMENDED")
print("  3. smoke_8gpu.yaml - 8 GPUs, 16 envs (~15-20 min)")
print("")

# Choose config (change this line to select different config)
config = "configs/smoke_4gpu.yaml"  # ‚≠ê Change this to use different GPU config

print(f"Using config: {config}")
print("")
print("üìù NOTE: Due to Docker permissions, please run from terminal:")
print("="*60)
print("")
print("cd /home/kevinzyz/hansenzuishuai")
print("./run_training_with_docker.sh")
print("")
print("="*60)
print("")
print("This script handles:")
print("  ‚úÖ Docker group activation")
print("  ‚úÖ Ray cluster setup")  
print("  ‚úÖ Training execution")
print("")
print("Monitor at: https://wandb.ai/hanszhu05/arpo-smoke-test")
print("")
print("Or copy/paste this into your terminal:")
print("-" * 60)
print("cd /home/kevinzyz/hansenzuishuai && ./run_training_with_docker.sh")

SyntaxError: invalid syntax (3952096707.py, line 26)

## ‚úÖ Check Results

After training completes, check the results:

In [None]:
# Check if checkpoints were created
import os

checkpoint_dir = 'checkpoints_smoke'

if os.path.exists(checkpoint_dir):
    print(f"‚úÖ Checkpoints saved to: {checkpoint_dir}")
    print("\nCheckpoint contents:")
    !ls -lh {checkpoint_dir}
else:
    print("‚ùå No checkpoints found")

# Show W&B link
print("\nüìä View detailed metrics at:")
print("   https://wandb.ai/hanszhu05/arpo-smoke-test")

## üéØ Next Steps

If smoke test succeeds:

1. **Scale to 32 tasks**:
   - Edit `configs/smoke.yaml`
   - Change: `train_files: test_data/osworld_examples/train_subset_32.json`

2. **Scale to 128 tasks**:
   - Change: `train_files: test_data/osworld_examples/train_all_128.json`

3. **Increase parallelism**:
   - Edit `env.num_envs` in config (e.g., from 2 to 4)
   - Use more GPUs: `trainer.n_gpus_per_node` (you have 8 available!)

4. **Monitor GPU usage**:
   ```bash
   watch -n 1 nvidia-smi
   ```

## üßπ Cleanup (Optional)

After training, you can clean up Ray:

In [None]:
# Shutdown Ray
ray.shutdown()
print("‚úÖ Ray shutdown")

# Stop Docker containers (if any still running)
docker_cmd = os.environ.get('DOCKER_SUDO', '')
if docker_cmd:
    !sudo docker ps -q | xargs -r sudo docker stop
else:
    !docker ps -q | xargs -r docker stop
print("‚úÖ Docker containers stopped")