# Unitree G1 Robot Training with Mujoco (Colab-Compatible)

This notebook trains a locomotion policy for the Unitree G1 humanoid robot using:
- **Mujoco** physics simulation (CPU/GPU compatible, works on Colab)
- **PPO** reinforcement learning algorithm (rsl_rl)
- **No Isaac Gym required** - runs entirely on Google Colab free tier

After training, you can download the policy and visualize it locally with Mujoco.

---

## Setup Instructions

1. **Runtime**: Go to `Runtime > Change runtime type` and select `GPU` (T4 recommended)
2. **Run all cells** in order
3. **Training time**: ~35 hours for the default 1,000 iterations on a T4 GPU (you can stop earlier and resume later)
4. **Download trained models** at the end

---

## 1. Install Dependencies

Install required packages for Mujoco-based training.

In [None]:
# Install dependencies
!pip install -q mujoco==3.2.3
!pip install -q scipy
!pip install -q pyyaml
!pip install -q tensorboard
!pip install -q rsl-rl-lib
!pip install -q matplotlib
!pip install -q numpy

print("\n✅ Dependencies installed successfully!")
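
If an import fails later, it can help to see which of the packages above actually installed. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of the repository.

```python
from importlib import metadata

# Distribution names copied from the pip commands above.
REQUIRED = ["mujoco", "scipy", "pyyaml", "tensorboard",
            "rsl-rl-lib", "matplotlib", "numpy"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be reinstalled by re-running the cell above.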

## 2. Clone Repository and Install Package

Clone the modified unitree_rl_mugym repository with Mujoco support.

In [None]:
import os

# Remove the old clone if it exists
if os.path.exists('unitree_rl_mugym'):
    !rm -rf unitree_rl_mugym
    print("Removed old repository")

# Clone the repository
!git clone https://github.com/julienokumu/unitree_rl_mugym.git
print("\n✅ Repository cloned!")

# Change to repo directory
%cd unitree_rl_mugym

# Install package (without Isaac Gym dependencies)
!pip install -q -e . --no-deps

print("\n✅ Package installed successfully!")

## 3. Verify Installation

Check that Mujoco and the package are properly installed.

In [None]:
import mujoco
import torch
from legged_gym.envs.g1.mujoco_g1_env import MujocoG1Robot
from legged_gym.envs.g1.mujoco_g1_config import MujocoG1RoughCfg, MujocoG1RoughCfgPPO

print("✅ All imports successful!")
print(f"Mujoco version: {mujoco.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 4. Start Training

Train the G1 locomotion policy using PPO with Mujoco simulation.

**Training Parameters:**
- **Environments**: 512 parallel simulations
- **Device**: CUDA (GPU)
- **Max iterations**: 1000 (adjustable - see options below)
- **Save interval**: Every 50 iterations
- **Expected time**: ~2 hours for 50 iterations, ~3.5 hours for 100 iterations

**Training Options:**
- **Quick test** (5 iterations): ~10 minutes - verify everything works
- **Short session** (50 iterations): ~2 hours - fits Colab free tier nicely
- **Medium session** (100 iterations): ~3.5 hours - good progress
- **Full training** (1000 iterations): ~35 hours - requires multiple resume sessions

**Notes:**
- Colab free tier sessions last at most ~12 hours and can disconnect sooner, so plan accordingly
- Models are automatically saved every 50 iterations
- You can interrupt training anytime (Ctrl+C or stop button)
- Resume training later from any checkpoint
- Monitor progress in real time with TensorBoard (see section 5)
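
To plan a long run around the session limit, the figures above can be combined into a rough schedule. This sketch assumes the notebook's own estimates (1000 iterations in ~35 hours, checkpoints every 50 iterations, 12-hour sessions); `plan_sessions` is a hypothetical helper, not part of the training script.

```python
import math


def plan_sessions(total_iterations, hours_per_iteration, session_hours, save_interval):
    """Return (iterations_per_session, sessions_needed).

    Iterations per session are rounded down to a multiple of save_interval so
    that every session ends on a saved checkpoint.
    """
    fit = int(session_hours / hours_per_iteration)
    per_session = (fit // save_interval) * save_interval
    if per_session == 0:
        raise ValueError("session too short to reach a single checkpoint")
    return per_session, math.ceil(total_iterations / per_session)


# Figures from this notebook: 1000 iterations in ~35 hours, 12-hour sessions,
# checkpoints every 50 iterations.
per_session, sessions = plan_sessions(1000, 35 / 1000, 12, 50)
print(f"{per_session} iterations per session, {sessions} sessions in total")
```

With these numbers, a session fits 300 iterations and the full run takes four sessions. Passing a smaller `session_hours` leaves headroom for early disconnects.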

In [None]:
# Start training with Mujoco
# Default: 1000 iterations (~35 hours total, save every 50)
# For 2-hour session: Use --max_iterations 50
# For quick test: Use --max_iterations 5
!python legged_gym/scripts/train_mujoco.py \
    --device cuda \
    --num_envs 512 \
    --max_iterations 50

print("\n✅ Training completed!")

## 5. Monitor Training Progress (Optional)

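Launch TensorBoard to visualize training metrics. Colab executes one cell at a time, so run this cell before the training cell if you want the charts to update while training is in progress; run afterwards, it shows the metrics already logged.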

In [None]:
# Load TensorBoard extension
%load_ext tensorboard

# Launch TensorBoard
%tensorboard --logdir logs/g1_colab_training

print("\n📊 TensorBoard is running above. You can monitor:")
print("  - Mean reward per episode")
print("  - Value function loss")
print("  - Policy loss and entropy")
print("  - Individual reward components")

## 6. Resume Training (Optional)

If training was interrupted, you can resume from the latest checkpoint.

In [None]:
# Resume training from last checkpoint
!python legged_gym/scripts/train_mujoco.py \
    --device cuda \
    --num_envs 512 \
    --resume \
    --load_run -1 \
    --checkpoint -1

print("\n✅ Resumed training completed!")
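
Before resuming, you may want to confirm which checkpoint is the most recent one on disk. This is a minimal sketch, assuming checkpoints are named `model_<iteration>.pt` (the pattern the download cell globs for). It is not how `train_mujoco.py` resolves `--checkpoint -1` internally, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed naming scheme: model_<iteration>.pt
CHECKPOINT_RE = re.compile(r"model_(\d+)\.pt$")


def latest_checkpoint(run_dir):
    """Return (iteration, path) of the highest-numbered checkpoint, or None."""
    found = []
    for path in Path(run_dir).glob("model_*.pt"):
        match = CHECKPOINT_RE.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Compare by iteration number, not by file name, so 100 ranks above 50.
    return max(found, key=lambda item: item[0]) if found else None
```

Call it with the path of a run directory under `logs/g1_colab_training`. A return value of `None` means no checkpoint has been written yet.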

## 7. Download Trained Models

Download the trained policy to test locally with Mujoco visualization.

In [None]:
import os
from google.colab import files
import glob

# Find the latest run directory
log_dir = "logs/g1_colab_training"
runs = sorted([d for d in os.listdir(log_dir) if os.path.isdir(os.path.join(log_dir, d))])

if runs:
    latest_run = runs[-1]
    run_path = os.path.join(log_dir, latest_run)
    
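    # Find all model checkpoints, ordered by iteration number
    # (a plain string sort would place model_100.pt before model_50.pt)
    models = sorted(
        glob.glob(os.path.join(run_path, "model_*.pt")),
        key=lambda p: int(os.path.basename(p)[len("model_"):-len(".pt")]),
    )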
    
    if models:
        print(f"Found {len(models)} model checkpoints in run: {latest_run}\n")
        
        # Download the latest model
        latest_model = models[-1]
        print(f"Downloading: {latest_model}")
        files.download(latest_model)
        
        print("\n✅ Model downloaded!")
        print("\nTo visualize locally:")
        print("  1. Install: pip install mujoco==3.2.3")
        print("  2. Clone repo: git clone https://github.com/julienokumu/unitree_rl_mugym.git")
        print(f"  3. Copy downloaded model to: unitree_rl_mugym/logs/g1_colab_training/{latest_run}/")
        print("  4. Run: python deploy/deploy_mujoco/deploy_mujoco.py g1.yaml")
    else:
        print("No model checkpoints found. Training may not have reached the first save interval.")
else:
    print("No training runs found.")

## 8. Create Checkpoint Archive (Optional)

Create a zip file with all training checkpoints and logs.

In [None]:
import os
import shutil
from google.colab import files

# Find latest run
log_dir = "logs/g1_colab_training"
runs = sorted([d for d in os.listdir(log_dir) if os.path.isdir(os.path.join(log_dir, d))])

if runs:
    latest_run = runs[-1]
    run_path = os.path.join(log_dir, latest_run)
    
    # Create zip archive
    archive_name = f"g1_training_{latest_run}"
    shutil.make_archive(archive_name, 'zip', run_path)
    
    print(f"Created archive: {archive_name}.zip")
    print(f"Size: {os.path.getsize(archive_name + '.zip') / 1024 / 1024:.2f} MB\n")
    
    # Download archive
    files.download(f"{archive_name}.zip")
    
    print("\n✅ Archive downloaded!")
else:
    print("No training runs found.")

---

## Training Information

### What the Robot Learns

The G1 humanoid robot learns to:
- **Walk** in different directions (forward, backward, sideways)
- **Turn** at various angular velocities
- **Maintain balance** and upright posture
- **Coordinate leg movements** for stable bipedal locomotion
- **Follow velocity commands** (vx, vy, vyaw)

### Reward Components

The policy is trained to maximize rewards for:
- **Velocity tracking**: Following commanded velocities
- **Survival**: Staying upright and alive
- **Contact patterns**: Proper gait with alternating foot contact
- **Smooth motion**: Minimal joint acceleration and action changes

And minimize penalties for:
- **Energy consumption**: High torques
- **Undesired contacts**: Body parts touching the ground
- **Joint limits**: Approaching position/velocity limits
- **Orientation errors**: Tilting or falling
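
The way such terms combine can be illustrated with a small sketch. The exponential tracking kernel, the `sigma` value, and every weight below are assumptions chosen for illustration; the actual reward functions and scales are defined in the repository's G1 config and may differ.

```python
import math


def tracking_reward(commanded, measured, sigma=0.25):
    """Exponential kernel: 1.0 for a perfect match, decaying with squared error."""
    error = sum((c - m) ** 2 for c, m in zip(commanded, measured))
    return math.exp(-error / sigma)


def total_reward(terms, weights):
    """Weighted sum of named reward terms; penalties carry negative weights."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weights, for illustration only.
weights = {"tracking": 1.0, "alive": 0.15, "torque": -0.01}
terms = {
    "tracking": tracking_reward(commanded=(1.0, 0.0), measured=(0.8, 0.1)),
    "alive": 1.0,    # the robot is still upright this step
    "torque": 4.0,   # sum of squared joint torques (made-up value)
}
print(f"step reward: {total_reward(terms, weights):.3f}")
```

Positive weights reward the behaviors in the first list, and negative weights implement the penalties in the second.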

### Network Architecture

**Actor (Policy Network):**
- Input: 47-dim observations (velocities, orientation, commands, joint states, phase)
- LSTM: 47 → 64 (temporal processing)
- MLP: 64 → 32 → 12 actions (joint position targets)

**Critic (Value Network):**
- Input: 50-dim privileged observations (includes ground truth velocities)
- LSTM: 50 → 64 (temporal processing)
- MLP: 64 → 32 → 1 value (state value estimate)
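
To give a sense of how small these networks are, the parameter counts can be worked out from the layer sizes above. The sketch assumes a single-layer LSTM with PyTorch-style parameters (two bias vectors per gate) and fully connected layers with biases, and it counts only the layers listed here. Extra parameters such as a learned action standard deviation are not included.

```python
def lstm_params(input_size, hidden_size):
    """Single-layer LSTM: 4 gates, each with input weights, recurrent weights
    and two bias vectors (PyTorch convention)."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


def mlp_params(layer_sizes):
    """Fully connected layers with biases, e.g. [64, 32, 12]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


actor = lstm_params(47, 64) + mlp_params([64, 32, 12])
critic = lstm_params(50, 64) + mlp_params([64, 32, 1])
print(f"actor: {actor} parameters, critic: {critic} parameters")
```

Under these assumptions the actor has 31,404 parameters and the critic 31,809, with the LSTM accounting for most of each.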

### Hyperparameters

- **Algorithm**: PPO (Proximal Policy Optimization)
- **Environments**: 512 parallel simulations
- **Steps per env**: 24 steps per rollout
- **Max iterations**: 1000 (adjustable, ~35 hours total)
- **Save interval**: 50 iterations (~2 hours)
- **Learning rate**: 0.001
- **Discount (γ)**: 0.99
- **GAE λ**: 0.95
- **Clip parameter**: 0.2
- **Entropy coefficient**: 0.01
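
Three of these values feed directly into textbook formulas: γ and λ into Generalized Advantage Estimation, and the clip parameter into PPO's clipped surrogate objective. The sketch below is a plain-Python illustration of those formulas for a single rollout with no episode terminations. It is not the rsl_rl implementation, which works on batched tensors and handles episode resets.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (no terminations)."""
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then exponentially weighted sum of future errors.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip=0.2):
    """PPO objective for one sample: the ratio is clipped to [1-clip, 1+clip]
    and the more pessimistic of the two estimates is kept."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


print(gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0))
print(clipped_surrogate(ratio=1.5, advantage=2.0))
```

With 512 environments and 24 steps per rollout, each PPO update works on 512 × 24 = 12,288 transitions.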

### Training Time Estimates (T4 GPU)

- **5 iterations**: ~10 minutes (quick test)
- **50 iterations**: ~2 hours (perfect for Colab free tier)
- **100 iterations**: ~3.5 hours (good progress)
- **500 iterations**: ~17.5 hours (needs resume)
- **1000 iterations**: ~35 hours (multiple sessions required)
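
These estimates are consistent with the ~98 steps/s throughput quoted under Troubleshooting, if that figure is read as environment steps summed across all 512 environments and rollout collection is taken to dominate each iteration. Both are assumptions. A short sketch of the arithmetic, with `estimate_hours` as a hypothetical helper:

```python
def estimate_hours(iterations, num_envs=512, steps_per_env=24, steps_per_sec=98):
    """Wall-clock hours if throughput is steps_per_sec across all environments."""
    transitions = iterations * num_envs * steps_per_env
    return transitions / steps_per_sec / 3600


for n in (5, 50, 100, 500, 1000):
    print(f"{n:>5} iterations: ~{estimate_hours(n):.1f} h")
```

Each iteration collects 512 × 24 = 12,288 transitions, about 125 seconds at 98 steps/s, which gives roughly 34.8 hours for 1000 iterations. To re-estimate for your own session, pass the throughput you observe as `steps_per_sec`.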

---

## Troubleshooting

**Training is slow:**
- Make sure you selected GPU runtime (Runtime > Change runtime type > GPU)
- A T4 GPU reaches ~98 steps/s, which is normal for this setup

**Out of memory:**
- Reduce `--num_envs` to 256 or 128
- Restart runtime and try again

**Session disconnected:**
- Colab free tier has time limits (~12 hours)
- Download checkpoints periodically
- Resume training from last checkpoint when reconnected
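
One way to keep checkpoints safe across disconnects is to copy them to persistent storage at intervals. This is a minimal sketch, assuming checkpoints are named `model_*.pt`. `backup_checkpoints` is a hypothetical helper, and the Google Drive path in the comment is only an example of a destination.

```python
import shutil
from pathlib import Path


def backup_checkpoints(run_dir, backup_dir):
    """Copy model_*.pt files that are not yet in backup_dir; return copied names."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(run_dir).glob("model_*.pt")):
        dst = backup / src.name
        if not dst.exists():
            shutil.copy2(src, dst)
            copied.append(src.name)
    return copied


# Example (assumes Drive was mounted first with
# `from google.colab import drive; drive.mount("/content/drive")`):
# backup_checkpoints("logs/g1_colab_training/<run>",
#                    "/content/drive/MyDrive/g1_checkpoints")
```

Because files already present in the destination are skipped, the function can be called repeatedly without recopying earlier checkpoints.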

**Import errors:**
- Re-run the dependency installation cell
- Restart runtime if needed

---

## Repository

**GitHub**: [julienokumu/unitree_rl_mugym](https://github.com/julienokumu/unitree_rl_mugym)

Based on the original [unitreerobotics/unitree_rl_gym](https://github.com/unitreerobotics/unitree_rl_gym) with modifications for Mujoco compatibility and Google Colab support.

---

**Happy Training! 🤖🚀**