# 🏋️ Getting Started with NeMo Gym

**A Complete Step-by-Step Guide for Newcomers**

> 💡 **Tip:** Run this notebook from within the cloned `Gym` directory. NeMo Gym will be installed automatically when you run the setup cells.

---

NeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.

## What You'll Learn

This notebook covers:

1. **🔧 Environment Setup** - Install NeMo Gym and configure NVIDIA API keys
2. **🚀 Quick Start** - Start servers and interact with your first agent (powered by [Nemotron Super NIM](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1))
3. **📊 Rollout Collection** - Generate verified training data
4. **🏢 Workplace Assistant Tutorial** - Explore multi-step tool calling
5. **💪 Training with RL** - Understand GPU requirements and training options

---

## 💻 Hardware & GPU Requirements

| Component | NeMo Gym Library | RL Training (Optional) |
|-----------|------------------|------------------------|
| **GPU** | ❌ Not required | ✅ Required |
| **CPU** | Any modern x86_64 or ARM64 | Any modern x86_64 |
| **RAM** | 8 GB minimum (16 GB+ recommended) | 64 GB+ per node |
| **Storage** | 2-5 GB | 100 GB+ (shared filesystem) |

### Training GPU Requirements

| Training Framework | GPU Requirements |
|-------------------|------------------|
| **Unsloth (Colab)** | 1× T4 GPU (16GB VRAM) - Free tier available |
| **Unsloth (Local)** | 1× GPU with 16GB+ VRAM |
| **NeMo RL (Single-node)** | 8× NVIDIA GPUs (80GB+ each, e.g., H100/A100) |
| **NeMo RL (Multi-node)** | 8+ nodes × 8 GPUs (80GB+ each) |

**Note:** This notebook focuses on NeMo Gym setup and exploration, which does NOT require a GPU.

---

# Part 1: Environment Setup

## Prerequisites

Before starting, ensure you have:
- **Git** installed
- **Python 3.12+** installed
- **NVIDIA API key** from [NVIDIA Build](https://build.nvidia.com/) (free tier available)

## Step 1.1: Verify Python Version

In [None]:
import sys

print(f"Python version: {sys.version}")
# Compare as a tuple so any release at or above 3.12 passes, including a future 4.x.
if sys.version_info >= (3, 12):
    print("✅ Python version is compatible (3.12+)")
else:
    print("❌ Please upgrade to Python 3.12 or higher")
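
Version checks are easiest to get right when the version is compared as a tuple instead of field by field. The following is a minimal sketch of that idea; `is_supported` and `MIN_VERSION` are hypothetical names used only for illustration and are not part of NeMo Gym:

```python
import sys

MIN_VERSION = (3, 12)  # minimum Python version listed in the prerequisites


def is_supported(version=None, minimum=MIN_VERSION):
    """Return True when `version` (a tuple such as (3, 12)) is at least `minimum`."""
    if version is None:
        version = tuple(sys.version_info[:2])
    # Tuples compare element by element, so (4, 0) >= (3, 12) is True,
    # whereas a check like `major >= 3 and minor >= 12` would reject 4.0.
    return tuple(version[:2]) >= minimum


print(is_supported((3, 11)))  # False
print(is_supported((3, 12)))  # True
print(is_supported((4, 0)))   # True
```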

## Step 1.2: Clone the Repository (If Not Already Done)

If you're running this notebook outside the Gym repository, clone it first:

```bash
git clone git@github.com:NVIDIA-NeMo/Gym.git
cd Gym
```

Then open this notebook from inside the `Gym` directory.

## Step 1.3: Install & Verify NeMo Gym

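Run the cell below to install NeMo Gym from source. It always performs a clean reinstall (uninstalling any existing copy first) so that the environment is consistent: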

In [None]:
import sys
import subprocess
import os

def ensure_pip():
    """Ensure pip is available in the current Python environment."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        print("📦 Installing pip...")
        subprocess.run([sys.executable, "-m", "ensurepip", "--upgrade"], 
                      capture_output=True)
        subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "pip"],
                      capture_output=True)

def run_pip(*args):
    """Run pip command and return success status."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", *args],
        capture_output=True,
        text=True
    )
    return result.returncode == 0, result.stderr

def clean_install_nemo_gym():
    """Uninstall and reinstall NeMo Gym from scratch."""
    
    # Step 0: Ensure pip is available
    ensure_pip()
    
    # Step 1: Uninstall existing nemo_gym if present
    print("🧹 Uninstalling existing NeMo Gym (if any)...")
    run_pip("uninstall", "nemo-gym", "-y")
    run_pip("uninstall", "nemo_gym", "-y")
    
    # Clear any cached imports
    mods_to_remove = [k for k in sys.modules if k.startswith('nemo_gym')]
    for mod in mods_to_remove:
        del sys.modules[mod]
    
    # Step 2: Install fresh from source
    print("📦 Installing NeMo Gym from source... (this may take a minute)")
    success, err = run_pip("install", "-e", ".", "--no-cache-dir")
    
    if success:
        print("✅ NeMo Gym installed successfully!")
        return True
    else:
        print(f"❌ Installation failed: {err}")
        return False

# Always do a clean install to ensure consistency
print("🔄 Setting up NeMo Gym (clean install)...\n")
if clean_install_nemo_gym():
    import nemo_gym
    print("\n✅ NeMo Gym is ready!")

## Step 1.4: Configure Your NVIDIA API Key

Create an `env.yaml` file with your API credentials. This file keeps secrets out of version control.

We'll use **NVIDIA's Nemotron Super NIM**, a 49B-parameter model optimized for reasoning, tool calling, and instruction following, with a 128K-token context length.

In [None]:
import os
from pathlib import Path

# Get the Gym project root directory
# Adjust this path if running from a different location
GYM_ROOT = Path(os.getcwd())
if not (GYM_ROOT / "nemo_gym").exists():
    # Try parent directory
    GYM_ROOT = GYM_ROOT.parent
    if not (GYM_ROOT / "nemo_gym").exists():
        print("⚠️  Cannot find Gym root directory. Please run this notebook from the Gym directory.")

ENV_YAML_PATH = GYM_ROOT / "env.yaml"
print(f"📁 Gym root directory: {GYM_ROOT}")
print(f"📄 env.yaml path: {ENV_YAML_PATH}")

In [None]:
# Check if env.yaml already exists
if ENV_YAML_PATH.exists():
    print("✅ env.yaml already exists")
    print("\nCurrent content (API key masked):")
    content = ENV_YAML_PATH.read_text()
    for line in content.split('\n'):
        if 'api_key' in line.lower():
            key_part = line.split(':')[0]
            print(f"{key_part}: nvapi-****...****")
        else:
            print(line)
else:
    print("❌ env.yaml not found. Please create it in the next step.")
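
The cell above hides the key entirely. When more than one key is in use, it can help to keep the last few characters visible so the keys can be told apart. A minimal sketch of that variation follows; `mask_secret` is a hypothetical helper, and the `nvapi-` prefix is taken from the key format mentioned in this notebook:

```python
def mask_secret(value: str, prefix: str = "nvapi-", visible: int = 4) -> str:
    """Mask a secret, keeping a known prefix and the last `visible` characters."""
    value = value.strip()
    head = prefix if value.startswith(prefix) else ""
    body = value[len(head):]
    if len(body) <= visible:
        # Too short to reveal anything safely: hide the whole body.
        return head + "*" * len(body)
    return head + "*" * (len(body) - visible) + body[-visible:]


# A made-up key used only to show the output format.
print(mask_secret("nvapi-abcdefgh1234"))  # nvapi-********1234
```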

### Create or Update env.yaml

**Option A:** In the cell below, replace `nvapi-...` with your actual NVIDIA API key, then run it:

In [None]:
# ⚠️ REPLACE with your actual NVIDIA API key
NVIDIA_API_KEY = "nvapi-..."  # <-- Replace this!
POLICY_MODEL = "nvidia/llama-3.3-nemotron-super-49b-v1"  # Nemotron Super NIM

if NVIDIA_API_KEY == "nvapi-...":
    print("⚠️  Please replace 'nvapi-...' with your actual NVIDIA API key")
    print("\nYou can get your API key from: https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1")
else:
    env_yaml_content = f"""policy_base_url: https://integrate.api.nvidia.com/v1
policy_api_key: {NVIDIA_API_KEY}
policy_model_name: {POLICY_MODEL}
"""
    
    ENV_YAML_PATH.write_text(env_yaml_content)
    print("✅ env.yaml created successfully!")
    print(f"📍 Location: {ENV_YAML_PATH}")

**Option B:** Create the file manually via terminal:

```bash
echo "policy_base_url: https://integrate.api.nvidia.com/v1
policy_api_key: nvapi-your-nvidia-api-key
policy_model_name: nvidia/llama-3.3-nemotron-super-49b-v1" > env.yaml
```
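
Both options put the key into a cell or a shell command. A possible third variation, sketched here under the assumption that the key has been exported in an environment variable, builds the same three-line file without the key ever appearing in the notebook. The variable name `NVIDIA_API_KEY` and the helper `render_env_yaml` are illustrative choices, not something NeMo Gym reads or provides:

```python
import os


def render_env_yaml(api_key: str,
                    model: str = "nvidia/llama-3.3-nemotron-super-49b-v1",
                    base_url: str = "https://integrate.api.nvidia.com/v1") -> str:
    """Build the three-line env.yaml content used in this notebook."""
    if not api_key or api_key == "nvapi-...":
        raise ValueError("Set a real NVIDIA API key before writing env.yaml")
    return (
        f"policy_base_url: {base_url}\n"
        f"policy_api_key: {api_key}\n"
        f"policy_model_name: {model}\n"
    )


# NVIDIA_API_KEY is an assumed environment variable name (see the note above).
api_key = os.environ.get("NVIDIA_API_KEY", "")
if api_key:
    # Print a masked preview only; never echo the real key.
    print(render_env_yaml(api_key).replace(api_key, "nvapi-****"))
else:
    print("NVIDIA_API_KEY is not set; skipping")
```

To write the file, pass the returned string to `ENV_YAML_PATH.write_text(...)` as in Option A.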

## Step 1.5: Validate API Key

In [None]:
# Validate the API key before proceeding
import requests

try:
    # Read config directly from env.yaml (no NeMo Gym dependencies needed)
    # Simple parser that works without PyYAML
    config = {}
    with open(ENV_YAML_PATH, 'r') as f:
        for line in f:
            line = line.strip()
            if line and ':' in line and not line.startswith('#'):
                key, value = line.split(':', 1)
                config[key.strip()] = value.strip()
    
    base_url = config['policy_base_url']
    api_key = config['policy_api_key']
    model_name = config['policy_model_name']
    
    # Test with a simple request using requests library
    response = requests.post(
        f"{base_url}/chat/completions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": model_name,
            "messages": [{"role": "user", "content": "Say hello"}],
            "max_tokens": 10
        },
        timeout=30
    )
    
    if response.status_code == 200:
        result = response.json()
        print("✅ API key validated successfully!")
        print(f"Model: {model_name}")
        print(f"Response: {result['choices'][0]['message']['content']}")
    else:
        print(f"❌ API validation failed: {response.status_code}")
        print(f"Response: {response.text}")

except FileNotFoundError:
    print("❌ env.yaml not found. Please create it in the previous step.")
except Exception as e:
    print(f"❌ API validation failed: {e}")
    print("\nTroubleshooting:")
    print("- Check your NVIDIA API key is correct (should start with 'nvapi-')")
    print("- Get a free API key from: https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1")
    print("- Ensure env.yaml is in the Gym root directory")
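
The line-based parser in the cell above only works for flat `key: value` files. The sketch below isolates that logic so its limits are easier to see; `parse_flat_yaml` is a hypothetical helper, and the quote stripping is an addition that the cell above does not perform. Nested mappings, lists, and trailing inline comments are not handled, so a real YAML library is the better choice for anything beyond this file:

```python
def parse_flat_yaml(text: str) -> dict:
    """Parse flat `key: value` lines; nested YAML and lists are not supported."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs such as https://... stay intact.
        key, value = line.split(":", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        config[key.strip()] = value
    return config


# Made-up sample content in the same shape as env.yaml.
sample = """# comment line
policy_base_url: https://integrate.api.nvidia.com/v1
policy_api_key: "nvapi-example"
"""
print(parse_flat_yaml(sample))
```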

---

# Part 2: Quick Start - Your First Agent

Now let's start the NeMo Gym servers and interact with an agent!

## Understanding the Architecture

NeMo Gym uses a server-based architecture:

```
┌──────────────────┐     ┌──────────────────┐
│   Head Server    │────▶│  Agent Server    │
│   (port 11000)   │     │ (auto-assigned)  │
└──────────────────┘     └────────┬─────────┘
                                  │
                    ┌─────────────┴─────────────┐
                    ▼                           ▼
           ┌────────────────┐         ┌────────────────┐
           │ Model Server   │         │Resources Server│
           │ (LLM Inference)│         │ (Tools/Verify) │
           └────────────────┘         └────────────────┘
```

## Step 2.1: Start the Servers

**⚠️ Important:** Run this in a **separate terminal** (not in this notebook):

```bash
# Navigate to Gym directory
cd /path/to/Gym

# Activate virtual environment (if using one)
source .venv/bin/activate

# Start servers (using vllm_model config for NVIDIA NIM API)
config_paths="resources_servers/example_single_tool_call/configs/example_single_tool_call.yaml,\
responses_api_models/vllm_model/configs/vllm_model.yaml"
ng_run "+config_paths=[${config_paths}]"
```

> **Note:** The `vllm_model` config converts Responses API requests to the Chat Completions API format that NVIDIA NIM supports, and handles role mapping (e.g., "developer" → "system").

**Expected output:**
```
INFO:     Started server process [12345]
INFO:     Uvicorn running on http://127.0.0.1:11000 (Press CTRL+C to quit)
INFO:     Started server process [12346]  
INFO:     Uvicorn running on http://127.0.0.1:62920 (Press CTRL+C to quit)
...
```

## Step 2.2: Wait for Servers and Verify

In [None]:
import requests
import time

def check_servers(max_attempts=10, delay=2):
    """Check if NeMo Gym servers are running."""
    for attempt in range(max_attempts):
        try:
            response = requests.get("http://localhost:11000/server_instances", timeout=5)
            if response.status_code == 200:
                print("✅ Servers are running!")
                print("\nRegistered servers:")
                for server in response.json():
                    print(f"  - {server['name']}: {server['host']}:{server['port']}")
                return True
        except requests.exceptions.RequestException:
            # Connection refused or timed out: the servers are probably still starting.
            pass
        # Also wait after a non-200 reply, so the loop never spins without a delay.
        print(f"⏳ Waiting for servers... (attempt {attempt + 1}/{max_attempts})")
        time.sleep(delay)

    print("❌ Servers not responding. Please start them with ng_run.")
    return False

# Check if servers are running
servers_ok = check_servers()

## Step 2.3: Interact with the Simple Agent

If the servers are running, let's interact with the agent!

In [None]:
import requests
import json

if servers_ok:
    print("🤖 Interacting with simple agent...\n")
    
    # Get the agent server URL from the head server
    try:
        instances = requests.get("http://localhost:11000/server_instances", timeout=5).json()
        agent_server = next(
            (s for s in instances if "simple_agent" in s["name"]), 
            None
        )
        
        if not agent_server:
            print("❌ Agent server not found. Make sure you started the correct config.")
        else:
            agent_url = f"http://{agent_server['host']}:{agent_server['port']}/v1/responses"
            
            # Send request to the agent
            # Note: Using 'system' role (not 'developer') for NVIDIA NIM compatibility
            response = requests.post(
                agent_url,
                json={
                    "input": [
                        {
                            "role": "system",
                            "content": "You are a helpful personal assistant."
                        },
                        {
                            "role": "user", 
                            "content": "What's the weather in San Francisco?"
                        }
                    ],
                    "tools": [
                        {
                            "type": "function",
                            "name": "get_weather",
                            "description": "Get weather for a city",
                            "parameters": {
                                "type": "object",
                                "properties": {
                                    "city": {"type": "string", "description": "City name"}
                                },
                                "required": ["city"],
                                "additionalProperties": False
                            },
                            "strict": True
                        }
                    ]
                },
                timeout=60
            )
            
            if response.status_code == 200:
                result = response.json()
                print("✅ Agent interaction successful!")
                print("\n📋 Response:")
                print(json.dumps(result, indent=2))
            else:
                print(f"❌ Agent error: {response.status_code}")
                print(response.text)

    except Exception as e:
        print(f"❌ Error: {e}")
else:
    print("⚠️  Please start the servers first (see Step 2.1)")
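
The raw JSON printed by the cell above can be long. The sketch below pulls out only the tool calls, assuming the agent returns an OpenAI Responses API style payload whose `output` list contains items of type `function_call` with a `name` and JSON-encoded `arguments`. That shape is an assumption about the response, and `extract_function_calls` is a hypothetical helper:

```python
import json


def extract_function_calls(result: dict) -> list:
    """Return (name, arguments) pairs for every function_call item in `output`."""
    calls = []
    for item in result.get("output", []):
        if item.get("type") != "function_call":
            continue
        raw_args = item.get("arguments") or "{}"
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            args = {"_raw": raw_args}  # keep malformed arguments visible
        calls.append((item.get("name"), args))
    return calls


# Hand-written sample payload for illustration, not real agent output.
sample = {
    "output": [
        {"type": "function_call", "name": "get_weather",
         "arguments": "{\"city\": \"San Francisco\"}"},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "It is sunny."}]},
    ]
}
print(extract_function_calls(sample))
```

With live servers, the same helper can be applied to the `result` dictionary from the cell above.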

### What Just Happened?

1. **User Query:** "What's the weather in San Francisco?"
2. **Agent Action:** Called the `get_weather` tool with `{"city": "San Francisco"}`
3. **Tool Response:** Returned weather data
4. **Final Response:** Agent provided a natural language answer

This demonstrates the core loop of NeMo Gym: **Query → Tool Calls → Verification → Response**
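
The loop can be sketched as a toy, single-process simulation. Everything here is a stand-in: `fake_model` plays the model server, the `TOOLS` table plays the resources server's tools, `verify` is a made-up reward rule, and the "sunny" result is invented. It shows the control flow only and is not how NeMo Gym is implemented:

```python
import json


def fake_model(messages):
    """Stand-in for the model server: request a tool once, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"type": "function_call", "name": "get_weather",
                "arguments": json.dumps({"city": "San Francisco"})}
    return {"type": "message", "text": f"The weather is {tool_results[-1]['content']}."}


# Stand-in for tools hosted by the resources server (result is invented).
TOOLS = {"get_weather": lambda city: "sunny"}


def verify(trace):
    """Toy verifier: reward 1.0 if the expected tool was called, else 0.0."""
    return 1.0 if any(step["name"] == "get_weather" for step in trace) else 0.0


def run_episode(query, max_steps=4):
    """Agent loop: query the model, execute tool calls, score the episode."""
    messages = [{"role": "user", "content": query}]
    trace = []
    for _ in range(max_steps):
        out = fake_model(messages)
        if out["type"] == "message":
            return {"response": out["text"], "tool_calls": trace, "reward": verify(trace)}
        args = json.loads(out["arguments"])
        result = TOOLS[out["name"]](**args)
        trace.append({"name": out["name"], "arguments": args})
        messages.append({"role": "tool", "content": result})
    # Step budget exhausted without a final message.
    return {"response": None, "tool_calls": trace, "reward": verify(trace)}


episode = run_episode("What's the weather in San Francisco?")
print(episode["response"], episode["reward"])
```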

---

# Part 3: Rollout Collection

Rollouts are complete records of task executions with verification scores. They're the training data for RL!

## Step 3.1: Inspect the Example Dataset

In [None]:
import json

example_data_path = GYM_ROOT / "resources_servers/example_single_tool_call/data/example.jsonl"

print("üìÑ Example dataset content:\n")

with open(example_data_path, 'r') as f:
    for i, line in enumerate(f):
        if i >= 2:  # Show first 2 examples
            break
        data = json.loads(line)
        print(f"Example {i+1}:")
        print(json.dumps(data, indent=2))
        print("\n" + "="*60 + "\n")

## Step 3.2: Collect Rollouts

Run this in your **second terminal** (with servers running in the first):

```bash
# Activate environment
source .venv/bin/activate

# Collect rollouts
ng_collect_rollouts +agent_name=example_single_tool_call_simple_agent \
    +input_jsonl_fpath=resources_servers/example_single_tool_call/data/example.jsonl \
    +output_jsonl_fpath=results/example_single_tool_call_rollouts.jsonl \
    +limit=5 \
    +num_repeats=2 \
    +num_samples_in_parallel=3
```

### Parameters Explained:

| Parameter | Description |
|-----------|-------------|
| `+agent_name` | Which agent to use |
| `+input_jsonl_fpath` | Path to input dataset |
| `+output_jsonl_fpath` | Path to save rollouts |
| `+limit` | Max examples to process |
| `+num_repeats` | Rollouts per example |
| `+num_samples_in_parallel` | Concurrent requests |
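
As a quick sanity check on these parameters, the arithmetic below estimates how many rollouts a run should produce. It assumes `limit` is applied to the dataset first and each kept example is then repeated `num_repeats` times, which is how the table reads but is not verified against the implementation. `rollout_plan` and the dataset size of 100 are hypothetical:

```python
import math


def rollout_plan(dataset_size, limit, num_repeats, num_samples_in_parallel):
    """Estimate rollout counts under the assumptions stated above."""
    examples = min(dataset_size, limit)
    rollouts = examples * num_repeats
    # Lower bound on sequential rounds if requests ran in fixed-size waves.
    min_rounds = math.ceil(rollouts / num_samples_in_parallel)
    return {"examples": examples, "rollouts": rollouts, "min_rounds": min_rounds}


# Values from the command above, with an assumed dataset of 100 examples.
print(rollout_plan(dataset_size=100, limit=5, num_repeats=2, num_samples_in_parallel=3))
```

Under these assumptions the command above would write 10 rollouts, which is a useful number to compare against the line count of the output file.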

## Step 3.3: View Collected Rollouts

In [None]:
rollouts_path = GYM_ROOT / "results/example_single_tool_call_rollouts.jsonl"

if rollouts_path.exists():
    print("📊 Collected rollouts:\n")
    
    with open(rollouts_path, 'r') as f:
        for i, line in enumerate(f):
            if i >= 2:  # Show first 2 rollouts
                break
            rollout = json.loads(line)
            print(f"Rollout {i+1}:")
            print(f"  Reward: {rollout.get('reward', 'N/A')}")
            print(f"  Keys: {list(rollout.keys())}")
            print()
else:
    print("⚠️  Rollouts file not found. Please run ng_collect_rollouts first.")
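
Beyond looking at individual rollouts, a short summary of the rewards is often the first thing worth checking. The sketch below assumes each JSONL line carries a numeric `reward` key, as the cell above does; `summarize_rewards` is a hypothetical helper and the sample lines are made up:

```python
import json
from statistics import mean


def summarize_rewards(jsonl_lines):
    """Summarize the numeric `reward` values found in an iterable of JSONL lines."""
    rewards = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        reward = json.loads(line).get("reward")
        if isinstance(reward, (int, float)):
            rewards.append(float(reward))
    if not rewards:
        return {"count": 0, "mean_reward": None, "nonzero_fraction": None}
    return {
        "count": len(rewards),
        "mean_reward": mean(rewards),
        "nonzero_fraction": sum(r > 0 for r in rewards) / len(rewards),
    }


# Made-up sample lines; with real data, pass an open file handle instead.
sample_lines = ['{"reward": 1.0}', '{"reward": 0.0}', '{"reward": 1.0}', '{"reward": 1.0}']
print(summarize_rewards(sample_lines))
```

For the file collected in Step 3.2, `summarize_rewards(open(rollouts_path))` would read it line by line.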

## Step 3.4: Launch the Rollout Viewer (Optional)

For a visual interface to explore rollouts:

```bash
ng_viewer +jsonl_fpath=results/example_single_tool_call_rollouts.jsonl
```

Then visit: http://127.0.0.1:7860

---

# Part 4: Workplace Assistant Tutorial

Now let's explore a more complex environment: **Workplace Assistant**. This is a multi-step agentic tool-use environment that tests a model's ability to execute business tasks.

## About Workplace Assistant

### Features:
- **5 Databases:** Email, Calendar, Analytics, Project Management, CRM
- **27 Tools:** Distributed across databases for various operations
- **690+ Tasks:** Common business activities (emails, meetings, project management)
- **Multi-step reasoning:** Tasks require 1-6 tool calls in sequence

### Example Multi-Step Task:

**User:** "John is taking over all of Akira's leads that are interested in software. Can you reassign them in the CRM?"

**Expected Steps:**
1. Look up Akira's email: `company_directory_find_email_address(name="Akira")`
2. Look up John's email: `company_directory_find_email_address(name="John")`  
3. Search for leads: `customer_relationship_manager_search_customers(...)`
4. Update each lead, one call per lead (steps 4-6): `customer_relationship_manager_update_customer(...)`

## Step 4.1: Stop Current Servers

First, stop the example servers by pressing **Ctrl+C** in the terminal running `ng_run`.

## Step 4.2: Start Workplace Assistant Servers

In your terminal:

```bash
# Navigate to Gym directory
cd /path/to/Gym
source .venv/bin/activate

# Start Workplace Assistant servers (using vllm_model config)
config_paths="responses_api_models/vllm_model/configs/vllm_model.yaml,\
resources_servers/workplace_assistant/configs/workplace_assistant.yaml"
ng_run "+config_paths=[${config_paths}]"
```

## Step 4.3: Explore the Workplace Assistant Dataset

In [None]:
workplace_data_path = GYM_ROOT / "resources_servers/workplace_assistant/data/train.jsonl"

if workplace_data_path.exists():
    print("📋 Workplace Assistant training data:\n")
    
    with open(workplace_data_path, 'r') as f:
        for i, line in enumerate(f):
            if i >= 1:  # Show first example
                break
            data = json.loads(line)
            
            # Extract user query
            inputs = data.get('responses_create_params', {}).get('input', [])
            for msg in inputs:
                if msg.get('role') == 'user':
                    print(f"User Query: {msg.get('content', 'N/A')[:200]}...")
            
            # Show number of tools available
            tools = data.get('responses_create_params', {}).get('tools', [])
            print(f"\nNumber of available tools: {len(tools)}")
            print("\nFirst 5 tools:")
            for tool in tools[:5]:
                print(f"  - {tool.get('name', 'N/A')}")
else:
    print("⚠️  Workplace Assistant data not found locally. Downloading from HuggingFace...")
    
    # Download from HuggingFace
    import subprocess
    
    # Create data directory if it doesn't exist
    data_dir = GYM_ROOT / "resources_servers/workplace_assistant/data"
    data_dir.mkdir(parents=True, exist_ok=True)
    
    # Download using huggingface_hub
    try:
        from huggingface_hub import hf_hub_download
        
        # Download train.jsonl
        train_file = hf_hub_download(
            repo_id="nvidia/Nemotron-RL-agent-workplace_assistant",
            filename="train.jsonl",
            repo_type="dataset",
            local_dir=str(data_dir),
            local_dir_use_symlinks=False
        )
        print(f"✅ Downloaded train.jsonl to {data_dir}")
        
        # Download test.jsonl if available
        try:
            test_file = hf_hub_download(
                repo_id="nvidia/Nemotron-RL-agent-workplace_assistant",
                filename="test.jsonl",
                repo_type="dataset",
                local_dir=str(data_dir),
                local_dir_use_symlinks=False
            )
            print(f"✅ Downloaded test.jsonl to {data_dir}")
        except Exception:
            print("ℹ️  test.jsonl not found, skipping")
            
    except ImportError:
        print("Installing huggingface_hub...")
        subprocess.run([sys.executable, "-m", "pip", "install", "huggingface_hub", "-q"])
        print("Please re-run this cell after installation.")

## Step 4.4: Collect Workplace Assistant Rollouts

**⚠️ Important:** Make sure you have started the Workplace Assistant servers (Step 4.2) before running this command!

In a **second terminal** (while servers are running in the first), collect rollouts:

```bash
cd /path/to/Gym
source .venv/bin/activate

ng_collect_rollouts +agent_name=workplace_assistant_simple_agent \
    +input_jsonl_fpath=resources_servers/workplace_assistant/data/train.jsonl \
    +output_jsonl_fpath=results/workplace_assistant_rollouts.jsonl \
    +limit=3 \
    +num_samples_in_parallel=2
```

**Note:** Workplace Assistant tasks take longer due to multi-step tool calling (expect 2-5 minutes).

If you see `Missing key workplace_assistant_simple_agent`, it means the Workplace Assistant servers aren't running. Go back to Step 4.2 and start them.
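
One way to confirm which servers are up before collecting rollouts is to look for the agent in the list returned by the head server's `/server_instances` endpoint (used in Step 2.2). The sketch below works on an already fetched list; `find_registered` is a hypothetical helper, the sample entries and ports are made up, and matching on the server name is an assumption based on how Step 2.3 locates the agent:

```python
def find_registered(instances, agent_name):
    """Return the first server entry whose name contains `agent_name`, else None."""
    return next((s for s in instances if agent_name in s.get("name", "")), None)


# Hand-written sample in the shape printed in Step 2.2 (name, host, port).
# The second server name and both ports are invented for illustration.
sample_instances = [
    {"name": "workplace_assistant_simple_agent", "host": "127.0.0.1", "port": 62920},
    {"name": "policy_model", "host": "127.0.0.1", "port": 62921},
]

print(find_registered(sample_instances, "workplace_assistant_simple_agent") is not None)
print(find_registered(sample_instances, "example_single_tool_call_simple_agent") is not None)
```

With live servers, `instances` would come from `requests.get("http://localhost:11000/server_instances", timeout=5).json()`.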

---

# Part 5: Training with Reinforcement Learning

Now that you understand rollout collection, let's explore training options!

## Training Pathways

### Option A: Quick Start with Unsloth (No Setup Required)

**Best for:** Learning, experimentation, small-scale training

| Feature | Details |
|---------|--------|
| **GPU Required** | 1× T4 (free on Colab) or 16GB+ local GPU |
| **Model** | Qwen-2.5 3B with LoRA |
| **Algorithm** | GRPO (Group Relative Policy Optimization) |
| **Time** | ~30 minutes |

**Run in Google Colab:**
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/nemo_gym_sudoku.ipynb
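
GRPO, the algorithm named in the table above, scores each rollout relative to the other rollouts sampled for the same prompt, which is why collecting several rollouts per example matters. A common formulation normalizes rewards within the group, $A_i = (r_i - \mu) / (\sigma + \epsilon)$. The sketch below implements that formula only; whether Unsloth or NeMo RL use exactly this normalization is not stated in this notebook, and `group_relative_advantages` is a hypothetical name:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of one prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rollouts that beat the group average get a positive advantage, the rest get a negative one, and a group where all rewards are equal yields all zeros and so carries no learning signal.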

---

### Option B: Production Training with NeMo RL

**Best for:** Large-scale production training, multi-node clusters

| Feature | Details |
|---------|--------|
| **GPU Required** | 8× H100/A100 (80GB+ each) per node |
| **Model** | Nemotron Nano 9B v2 |
| **Algorithm** | GRPO with tensor parallelism |
| **Time** | 3-5 hours |

**See the full tutorial:** [NeMo RL GRPO Training](docs/tutorials/nemo-rl-grpo/)

## NeMo RL Training Setup (Summary)

For production training with NeMo RL, you'll need:

### 1. Hardware Requirements
- **Single-node:** 1 node × 8 GPUs (H100/A100 with 80GB+ VRAM)
- **Multi-node:** 8+ nodes × 8 GPUs each
- **RAM:** 64 GB+ per node
- **Storage:** 100 GB+ shared filesystem

### 2. Software Setup
```bash
# Use the NeMo RL container
CONTAINER_IMAGE_PATH=nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano

# Clone NeMo RL with Gym submodule
git clone https://github.com/NVIDIA-NeMo/RL
cd RL
git submodule update --init --recursive
```

### 3. Data Preparation
```bash
cd 3rdparty/Gym-workspace/Gym
uv venv --python 3.12 --allow-existing .venv
source .venv/bin/activate
uv sync --active --extra dev

# Add HuggingFace token
echo "hf_token: {your HF token}" >> env.yaml

# Download and prepare data
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,\
resources_servers/workplace_assistant/configs/workplace_assistant.yaml"

ng_prepare_data "+config_paths=[${config_paths}]" \
    +output_dirpath=data/workplace_assistant \
    +mode=train_preparation \
    +should_download=true \
    +data_source=huggingface
```

### 4. Training Command
```bash
CONFIG_PATH=examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml

# Download model first
HF_HOME=$PWD/.cache/ HF_TOKEN={your HF token} \
    hf download nvidia/NVIDIA-Nemotron-Nano-9B-v2

# Run training
TORCH_CUDA_ARCH_LIST="9.0 10.0" \
HF_HOME=$PWD/.cache/ \
WANDB_API_KEY={your W&B API key} \
uv run python examples/nemo_gym/run_grpo_nemo_gym.py \
    --config=$CONFIG_PATH \
    ++logger.wandb.project="my-nemo-gym-training" \
    ++grpo.max_num_steps=10
```

---

# 🎓 Summary & Next Steps

## What You've Learned

1. ✅ **Installation & Setup** - Installed NeMo Gym and configured API keys
2. ✅ **Architecture** - Understood the server-based design (Head, Agent, Model, Resources)
3. ✅ **Agent Interaction** - Ran a simple agent with tool calling
4. ✅ **Rollout Collection** - Generated verified training data
5. ✅ **Workplace Assistant** - Explored multi-step agentic environments
6. ✅ **Training Options** - Learned about Unsloth and NeMo RL pathways

## GPU Requirements Summary

| Activity | GPU Required? |
|----------|---------------|
| NeMo Gym setup & exploration | ❌ No |
| Rollout collection | ❌ No (uses API) |
| Unsloth training (Colab) | ✅ 1× T4 (free) |
| Unsloth training (local) | ✅ 16GB+ VRAM |
| NeMo RL training | ✅ 8× H100/A100 per node |

## Next Steps

1. **Build a Custom Environment:**
   - Tutorial: `docs/tutorials/creating-resource-server.md`
   
2. **Try Other Resource Servers:**
   - Math: `resources_servers/math_with_judge/`
   - Code Generation: `resources_servers/code_gen/`
   - Instruction Following: `resources_servers/instruction_following/`

3. **Production Training:**
   - Full tutorial: `docs/tutorials/nemo-rl-grpo/`
   - Multi-node setup: `docs/tutorials/nemo-rl-grpo/multi-node-training.md`

## Resources

- **Documentation:** https://docs.nvidia.com/nemo/gym/latest/
- **GitHub:** https://github.com/NVIDIA-NeMo/Gym
- **HuggingFace Datasets:** https://huggingface.co/nvidia (search for Nemotron-RL)
- **Report Issues:** https://github.com/NVIDIA-NeMo/Gym/issues

---

## 🧹 Cleanup

When you're done, stop the servers by pressing **Ctrl+C** in the terminal running `ng_run`.

Optional cleanup commands:
```bash
# Clean up Ray processes
ray stop --force

# Remove generated rollouts (if desired)
rm -rf results/
```