# Train a Unitree G1 Locomotion Policy on Google Colab

This notebook trains a locomotion policy for the Unitree G1 humanoid robot using MuJoCo physics.

**No local GPU required!** Runs on Colab's free CPU or GPU runtimes.

## Workflow:
1. Install dependencies
2. Clone repository
3. Train policy (2-6 hours)
4. Download trained model
5. Visualize locally with MuJoCo

## Step 1: Check Runtime

Check whether a GPU is available (optional, but training is faster with one).

In [None]:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    device = 'cuda'
else:
    print("Using CPU (training will be slower but still works!)")
    device = 'cpu'

## Step 2: Install Dependencies

Install MuJoCo and the other required packages (no Isaac Gym needed!)

In [None]:
!pip install mujoco==3.2.3
!pip install scipy
!pip install pyyaml
!pip install tensorboard
!pip install rsl-rl

print("\n✓ Dependencies installed!")

## Step 3: Clone Repository

In [None]:
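import os

# Use an absolute path so re-running this cell does not clone into a nested folder
REPO_DIR = '/content/unitree_rl_gym'

# Clone repo if not already cloned
if not os.path.exists(REPO_DIR):
    !git clone https://github.com/unitreerobotics/unitree_rl_gym.git {REPO_DIR}
    print("✓ Repository cloned!")
else:
    print("✓ Repository already exists!")

# Change to repo directory
%cd {REPO_DIR}

# Install package (without Isaac Gym)
!pip install -e . --no-deps
# Do not pin numpy==1.20: it only supports Python 3.7-3.9, so installing it
# fails on Colab's newer Python. Keep Colab's preinstalled numpy instead.
!pip install matplotlib

print("\n✓ Package installed!")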

## Step 4: Configure Training

Set the training parameters.

In [None]:
# Training configuration
NUM_ENVS = 256 if device == 'cpu' else 512  # More envs with GPU
MAX_ITERATIONS = 10000  # ~2-6 hours depending on hardware
EXPERIMENT_NAME = 'g1_colab_training'
RUN_NAME = 'run_001'

print("Configuration:")
print(f"  Device: {device}")
print(f"  Num envs: {NUM_ENVS}")
print(f"  Max iterations: {MAX_ITERATIONS}")
print(f"  Experiment: {EXPERIMENT_NAME}")
print(f"  Run: {RUN_NAME}")
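To see what `NUM_ENVS` and `MAX_ITERATIONS` mean in terms of simulated experience, here is a back-of-the-envelope sketch. It rests on two assumptions. First, the PPO runner collects `num_steps_per_env = 24` steps per environment per iteration, which is upstream legged_gym's default. Second, one policy step covers 0.02 s of simulated time. The `g1_mujoco` task may use different values, and `training_budget` is a hypothetical helper.

```python
def training_budget(num_envs, max_iterations, steps_per_env=24, policy_dt=0.02):
    """Estimate how much experience training collects.

    steps_per_env: rollout steps per environment per PPO iteration
        (assumed: legged_gym's default num_steps_per_env = 24)
    policy_dt: simulated seconds per policy step (assumed: 0.02 s)
    """
    samples_per_iter = num_envs * steps_per_env
    total_samples = samples_per_iter * max_iterations
    simulated_hours = total_samples * policy_dt / 3600
    return samples_per_iter, total_samples, simulated_hours

for envs in (256, 512):
    per_iter, total, hours = training_budget(envs, 10_000)
    print(f"{envs} envs: {per_iter:,} samples/iter, {total:,} total, "
          f"~{hours:,.0f} simulated hours")
```

In the notebook, `training_budget(NUM_ENVS, MAX_ITERATIONS)` gives the numbers for your own settings. For example, halving `MAX_ITERATIONS` for a quick test halves the total experience.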

## Step 5: Start Training

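Training takes roughly 2-6 hours, depending on hardware.

**TIP:** This cell blocks the notebook until training finishes. If you want to watch progress live, run the TensorBoard cell (Step 6) *before* this one.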

In [None]:
# Train policy
!python legged_gym/scripts/train_mujoco.py \
    --task g1_mujoco \
    --num_envs {NUM_ENVS} \
    --max_iterations {MAX_ITERATIONS} \
    --device {device} \
    --rl_device {device} \
    --experiment_name {EXPERIMENT_NAME} \
    --run_name {RUN_NAME}

print("\n✓ Training complete!")
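A `!python` line does not raise when the script exits with an error, so the "Training complete!" message above prints even if training crashed. One alternative is sketched below. It builds the same command (same flags as the cell above) as an argument list, streams its output into the cell, and raises if the exit code is non-zero. `build_train_command` and `run_training` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

def build_train_command(num_envs, max_iterations, device, experiment_name, run_name):
    """Return the Step 5 training command as an argument list (same flags)."""
    return [
        sys.executable, "legged_gym/scripts/train_mujoco.py",
        "--task", "g1_mujoco",
        "--num_envs", str(num_envs),
        "--max_iterations", str(max_iterations),
        "--device", device,
        "--rl_device", device,
        "--experiment_name", experiment_name,
        "--run_name", run_name,
    ]

def run_training(cmd):
    """Hypothetical helper: stream the command's output and raise if it fails."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            print(line, end="")  # echo each log line into the notebook output
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"training command exited with code {returncode}")
    print("\n✓ Training complete!")
```

Usage in the notebook: `run_training(build_train_command(NUM_ENVS, MAX_ITERATIONS, device, EXPERIMENT_NAME, RUN_NAME))`.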

## Step 6: Monitor Training (Optional)

Launch TensorBoard to monitor training progress in real time.

In [None]:
# Load TensorBoard
%load_ext tensorboard
%tensorboard --logdir logs/{EXPERIMENT_NAME}

## Step 7: Export Policy for Deployment

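Locate the latest checkpoint and load its weights. The checkpoint stores a state dict (`model_state_dict`), not a network module. `export_policy_as_jit`, however, expects an actor-critic module. To export to TorchScript, you must first rebuild the network and load the weights into it.

In [None]:
import glob
import os
import re
import torch

def checkpoint_iteration(path):
    # model_1500.pt -> 1500, so checkpoints sort numerically, not alphabetically
    match = re.search(r'model_(\d+)\.pt$', path)
    return int(match.group(1)) if match else -1

# Find the most recently modified run directory
log_dirs = sorted(glob.glob(f'logs/{EXPERIMENT_NAME}/*{RUN_NAME}*'), key=os.path.getmtime)
if not log_dirs:
    print("Error: No training runs found!")
else:
    latest_run = log_dirs[-1]
    print(f"Latest run: {latest_run}")

    # Find the checkpoint with the highest iteration number
    checkpoints = sorted(glob.glob(f'{latest_run}/model_*.pt'), key=checkpoint_iteration)
    if not checkpoints:
        print("Error: No checkpoints found!")
    else:
        latest_checkpoint = checkpoints[-1]
        print(f"Latest checkpoint: {latest_checkpoint}")

        # Load the checkpoint; it holds weights (a state dict), not a module
        loaded_dict = torch.load(latest_checkpoint, map_location=device)
        state_dict = loaded_dict['model_state_dict']
        print(f"Loaded {len(state_dict)} parameter tensors")

        # To export: rebuild the actor-critic network from the task config,
        # call actor_critic.load_state_dict(state_dict), then pass that module
        # to legged_gym.utils.helpers.export_policy_as_jit(actor_critic, latest_run)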
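Rebuilding the network before calling `export_policy_as_jit` requires its layer sizes, which you can read from the checkpoint. The sketch below assumes the policy is rsl_rl's feed-forward `ActorCritic`, whose actor is an `nn.Sequential` with parameter names like `actor.0.weight`. In that layout, Linear layers sit at even indices with activations in between, and each Linear weight has shape `(out, in)`. Recurrent policies (`memory_a.*` keys) are rejected rather than handled. `infer_actor_dims` is a hypothetical helper that takes a plain dict of name to shape, so it does not need PyTorch. The shapes in the example are made up.

```python
import re

def infer_actor_dims(param_shapes):
    """Infer (num_obs, hidden_dims, num_actions) from actor Linear weight shapes.

    param_shapes: dict mapping parameter name -> shape tuple.
    A Linear weight has shape (out_features, in_features).
    """
    if any(name.startswith("memory_a.") for name in param_shapes):
        raise ValueError("recurrent policy detected; this sketch only handles MLP actors")
    layers = []
    for name, shape in param_shapes.items():
        match = re.fullmatch(r"actor\.(\d+)\.weight", name)
        if match:
            layers.append((int(match.group(1)), shape))
    if not layers:
        raise ValueError("no actor.N.weight entries found")
    layers.sort()  # order by position in the Sequential
    shapes = [shape for _, shape in layers]
    num_obs = shapes[0][1]
    hidden_dims = [shape[0] for shape in shapes[:-1]]
    num_actions = shapes[-1][0]
    return num_obs, hidden_dims, num_actions

# Made-up shapes for illustration only
example = {
    "std": (12,),
    "actor.0.weight": (256, 48), "actor.0.bias": (256,),
    "actor.2.weight": (128, 256), "actor.2.bias": (128,),
    "actor.4.weight": (12, 128), "actor.4.bias": (12,),
}
print(infer_actor_dims(example))  # (48, [256, 128], 12)
```

In the notebook, call it on the loaded checkpoint with `infer_actor_dims({k: tuple(v.shape) for k, v in loaded_dict['model_state_dict'].items()})`. The activation function cannot be recovered from the weights and must match the training config when you construct the network.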

## Step 8: Download Trained Model

Download the trained policy to your local machine

In [None]:
from google.colab import files
import shutil

# Zip the run directory found in Step 7 (latest_run), keeping its folder path
archive_path = shutil.make_archive('trained_model', 'zip', root_dir='.', base_dir=latest_run)

# Download
files.download(archive_path)

print("\n✓ Model downloaded! Extract the zip file on your local machine.")
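The run directory holds every intermediate checkpoint (upstream legged_gym saves one every 50 iterations by default), so the archive can get large. The sketch below packs only the newest `model_*.pt` plus the TensorBoard `events.out.tfevents.*` files, using the standard-library `zipfile` module. `zip_latest_checkpoint` is a hypothetical helper, and the file patterns are assumptions based on the names used in Step 7.

```python
import glob
import os
import re
import zipfile

def checkpoint_iteration(path):
    """model_1500.pt -> 1500, so checkpoints sort numerically, not alphabetically."""
    match = re.search(r"model_(\d+)\.pt$", path)
    return int(match.group(1)) if match else -1

def zip_latest_checkpoint(run_dir, out_path="trained_model_latest.zip"):
    """Zip only the newest model_*.pt and the TensorBoard event files of one run."""
    run_dir = os.path.normpath(run_dir)
    checkpoints = sorted(
        glob.glob(os.path.join(run_dir, "model_*.pt")), key=checkpoint_iteration
    )
    if not checkpoints:
        raise FileNotFoundError(f"no model_*.pt files in {run_dir}")
    events = glob.glob(os.path.join(run_dir, "events.out.tfevents.*"))
    parent = os.path.dirname(run_dir) or "."
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in [checkpoints[-1], *events]:
            # Store paths relative to the run's parent so the run folder name is kept
            zf.write(path, arcname=os.path.relpath(path, parent))
    return out_path, checkpoints[-1]
```

Usage in Colab: `archive, ckpt = zip_latest_checkpoint(latest_run)` followed by `files.download(archive)`.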

## Step 9: Visualize Locally

On your local machine (after downloading):

```bash
# Extract the zip file
unzip trained_model.zip

# Point policy_path in deploy/deploy_mujoco/configs/g1.yaml at your policy file
# (the deploy script loads a TorchScript export, not a raw model_*.pt checkpoint)
# Then run:
python deploy/deploy_mujoco/deploy_mujoco.py g1.yaml
```

The robot should walk in the MuJoCo viewer!
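Editing the YAML by hand works fine. If you prefer to script the change, the sketch below is one option. It assumes the config stores the policy location under a top-level `policy_path:` key, as upstream unitree_rl_gym's `g1.yaml` does. It rewrites only that one line with a regular expression, leaving comments and other keys untouched. The file you point it at should be a TorchScript policy, since the deploy script loads the policy with `torch.jit.load`. `set_policy_path` and `update_config_file` are hypothetical helpers.

```python
import re

_POLICY_PATH_LINE = re.compile(r"^(policy_path:[ \t]*).*$", re.MULTILINE)

def set_policy_path(yaml_text, new_path):
    """Return yaml_text with the top-level policy_path value replaced.

    The whole value after the key (including any trailing comment) is replaced;
    all other lines are left exactly as they were.
    """
    if not _POLICY_PATH_LINE.search(yaml_text):
        raise KeyError("no top-level 'policy_path:' line found")
    # Single-quoted YAML does not process backslash escapes (safe for Windows paths);
    # a literal single quote is written as two single quotes
    quoted = "'" + new_path.replace("'", "''") + "'"
    return _POLICY_PATH_LINE.sub(lambda m: m.group(1) + quoted, yaml_text, count=1)

def update_config_file(config_file, new_path):
    """Rewrite policy_path inside a YAML config file in place."""
    with open(config_file, encoding="utf-8") as f:
        text = f.read()
    with open(config_file, "w", encoding="utf-8") as f:
        f.write(set_policy_path(text, new_path))
```

On your local machine, this might look like `update_config_file("deploy/deploy_mujoco/configs/g1.yaml", "/absolute/path/to/policy.pt")`, where the second argument is a placeholder for your exported file.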

## Training Tips

### Speed up training:
- Use a GPU runtime (Runtime → Change runtime type → GPU)
- Increase `NUM_ENVS` to 512 or 1024 with GPU
- Reduce `MAX_ITERATIONS` for faster testing (e.g., 1000)

### Resume training:
```bash
# In a Colab cell, prefix this command with "!"
python legged_gym/scripts/train_mujoco.py \
    --task g1_mujoco \
    --experiment_name g1_colab_training \
    --resume \
    --load_run <run_directory_name>
```
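In upstream legged_gym, `--load_run` takes the name of a run folder inside `logs/<experiment_name>/`, not a full path. Run folders are named with a timestamp that starts with a month abbreviation (for example `Feb01_...`), followed by the run name. Because of that, sorting names alphabetically can pick the wrong run: `Feb` sorts before `Jan`. The sketch below picks the most recently modified folder instead, assuming `train_mujoco.py` keeps upstream's layout. `latest_run_name` is a hypothetical helper.

```python
import os

def latest_run_name(experiment_dir):
    """Return the folder name (not the full path) of the most recently modified run.

    Skips the 'exported' folder, which upstream legged_gym also ignores when
    looking for runs.
    """
    with os.scandir(experiment_dir) as entries:
        runs = [
            entry for entry in entries
            if entry.is_dir() and entry.name != "exported"
        ]
    if not runs:
        raise FileNotFoundError(f"no run folders in {experiment_dir}")
    return max(runs, key=lambda entry: entry.stat().st_mtime).name
```

Usage: `latest_run_name("logs/g1_colab_training")` returns the folder name to pass to `--load_run`.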

### Adjust rewards:
Edit `legged_gym/envs/g1/mujoco_g1_config.py` to tune the reward weights.
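In upstream legged_gym, each entry in the config's `rewards.scales` class is matched to an environment method named `_reward_<name>`. Zero-weight terms are dropped, the remaining weights are multiplied by the policy step `dt`, and the weighted terms are summed every step. Below is a stdlib sketch of that bookkeeping. It assumes the MuJoCo config and environment follow the same convention, and the weights, term values, and `dt` are made up for illustration.

```python
def prepare_reward_scales(scales, dt):
    """Drop zero-weight terms and multiply the remaining weights by dt."""
    return {name: weight * dt for name, weight in scales.items() if weight != 0}

def total_reward(term_values, active_scales):
    """Weighted sum of the per-step reward terms; every active term needs a value."""
    return sum(weight * term_values[name] for name, weight in active_scales.items())

# Hypothetical weights and per-step term values, for illustration only
scales = {"tracking_lin_vel": 1.0, "orientation": -1.0, "feet_air_time": 0.0}
active = prepare_reward_scales(scales, dt=0.02)
step_terms = {"tracking_lin_vel": 0.8, "orientation": 0.1, "feet_air_time": 0.5}
print(active)
print(total_reward(step_terms, active))
```

Under this convention, setting a weight to 0 removes the term entirely, so its reward function is never evaluated. Negative weights act as penalties.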

### Monitor metrics:
- TensorBoard shows rewards, episode length, learning rate, and more
- Check the logs roughly every 100 iterations
- Training typically converges around 5000-8000 iterations

## Troubleshooting

### Out of memory:
- Reduce `NUM_ENVS` to 128 or 64

### Training too slow:
- Switch to GPU runtime
- Reduce `MAX_ITERATIONS` for testing

### Import errors:
- Restart runtime and re-run all cells
- Check that all dependencies installed correctly

### Policy not learning:
- Check TensorBoard for increasing rewards
- Verify that the robot does not fall immediately (check the termination rate)
- Adjust reward scales in config file