# LLM Reinforcement Learning - Getting Started

This notebook demonstrates how to set up and get started with Large Language Model (LLM) reinforcement learning. We'll cover:

1. **Environment Setup**: Installing PyTorch CPU version and dependencies
2. **Basic Components**: Understanding RL agents, environments, and models
3. **Simple Example**: Creating a basic text generation environment
4. **Training Demo**: Simple PPO training example

This tutorial assumes no CUDA-capable GPU is available, so it uses the CPU-only build of PyTorch.

## 1. Check Current PyTorch Installation

Let's first check if PyTorch is installed and whether CUDA is available.

In [1]:
try:
    import torch
    print(f"PyTorch version: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"CUDA version: {torch.version.cuda}")
        print(f"Number of GPUs: {torch.cuda.device_count()}")
    else:
        print("Running on CPU")
except ImportError:
    print("PyTorch is not installed yet")

PyTorch version: 2.5.1
CUDA available: False
Running on CPU


## 2. Install CPU-Only PyTorch with Conda

Since CUDA is not available, we'll install the CPU-only build of PyTorch with conda. Run the `conda install` command below in your terminal (without the leading `!`), or run the cell as-is:

In [3]:
# Install the CPU-only PyTorch build (the leading "!" runs a shell command)
!conda install pytorch torchvision torchaudio cpuonly -c pytorch -y

# Optional: uncomment to install additional ML packages
# !pip install transformers datasets accelerate gymnasium tensorboard wandb matplotlib seaborn

print("Run the above commands in your terminal to install PyTorch and dependencies")

Run the above commands in your terminal to install PyTorch and dependencies
Retrieving notices: done
Channels:
 - pytorch
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.






## 3. Verify PyTorch Installation

After installation, let's verify that PyTorch is working correctly:

In [4]:
import torch
import torch.nn as nn
import numpy as np

print("=== PyTorch Installation Verification ===")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

# Test basic operations
x = torch.randn(3, 3)
y = torch.randn(3, 3)
z = torch.matmul(x, y)

print(f"\nBasic tensor operations working: {z.shape == (3, 3)}")
print("✅ PyTorch is installed and working correctly!")

=== PyTorch Installation Verification ===
PyTorch version: 2.5.1
CUDA available: False
Device: cpu

Basic tensor operations working: True
✅ PyTorch is installed and working correctly!


## 4. Test PyTorch with Simple Tensor Operations

Let's perform some basic tensor operations to ensure everything is working:

In [6]:
# Create some tensors
print("=== Tensor Operations Test ===")

# Basic tensor creation
a = torch.tensor([1, 2, 3, 4])
b = torch.randn(2, 3)
c = torch.zeros(3, 3)

print(f"Tensor a: {a}")
print(f"Tensor b shape: {b.shape}")
print(f"Tensor c (zeros): \n{c}")

# Basic operations
result = a + 10
print(f"\nTensor arithmetic (a + 10): {result}")

# Matrix multiplication
x = torch.randn(3, 4)
y = torch.randn(4, 2)
z = torch.mm(x, y)
print(f"\nMatrix multiplication result shape: {z.shape}")

# Neural network layer test
linear_layer = nn.Linear(4, 2)
input_tensor = torch.randn(1, 4)
output = linear_layer(input_tensor)
print(f"\nNeural network layer output shape: {output.shape}")

print("\n✅ All tensor operations successful!")



=== Tensor Operations Test ===
Tensor a: tensor([1, 2, 3, 4])
Tensor b shape: torch.Size([2, 3])
Tensor c (zeros): 
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

Tensor arithmetic (a + 10): tensor([11, 12, 13, 14])

Matrix multiplication result shape: torch.Size([3, 2])

Neural network layer output shape: torch.Size([1, 2])

✅ All tensor operations successful!


## 5. Introduction to LLM Reinforcement Learning

Now that PyTorch is working, let's explore the basics of LLM reinforcement learning:

### Key Components:

1. **Environment**: Defines the text generation task and reward function
2. **Agent**: The RL algorithm (e.g., PPO) that learns to generate better text
3. **Model**: The language model (e.g., GPT-2) that generates text
4. **Reward Function**: Evaluates the quality of generated text

### Simple Example - Text Generation Environment
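
The four components listed above can be illustrated with a toy environment. This is a minimal sketch in plain Python, not the project's actual environment API: `ToyTextEnv`, `VOCAB`, `run_random_episode`, and the distinct-token reward are all hypothetical names and choices made for illustration. A real setup would use a tokenizer's vocabulary, a language model as the policy, and a task-specific reward.

```python
import random

# Hypothetical toy vocabulary; a real setup would use a tokenizer's vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]


class ToyTextEnv:
    """Minimal text generation environment (illustrative only).

    State:  the list of tokens generated so far.
    Action: an index into VOCAB (the next token to append).
    Reward: given only when the episode ends, as the fraction of
            generated words that are distinct (a stand-in for a real
            quality score).
    """

    def __init__(self, max_len=5):
        self.max_len = max_len
        self.tokens = []

    def reset(self):
        """Start a new episode with an empty sequence."""
        self.tokens = []
        return list(self.tokens)

    def step(self, action):
        """Append one token and return (state, reward, done)."""
        token = VOCAB[action]
        self.tokens.append(token)
        done = token == "<eos>" or len(self.tokens) >= self.max_len
        reward = self._reward() if done else 0.0
        return list(self.tokens), reward, done

    def _reward(self):
        """Fraction of distinct words, ignoring the end-of-sequence token."""
        words = [t for t in self.tokens if t != "<eos>"]
        if not words:
            return 0.0
        return len(set(words)) / len(words)


def run_random_episode(env, seed=0):
    """Roll out one episode with a uniformly random policy."""
    rng = random.Random(seed)
    state, reward, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(rng.randrange(len(VOCAB)))
    return state, reward


tokens, reward = run_random_episode(ToyTextEnv())
print(f"Generated tokens: {tokens}")
print(f"Episode reward: {reward:.2f}")
```

The random policy here plays the role of the agent. An RL algorithm such as PPO would replace it with a model whose token probabilities are updated to increase the episode reward.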

## 7. Next Steps and Resources

Congratulations! You now have a working environment for LLM reinforcement learning. Here's what you can do next:

### 🚀 Next Steps:

1. **Explore the Project Structure**: Check out the `src/` directory for:
   - `agents/`: PPO and other RL algorithms
   - `models/`: Language model wrappers for RL
   - `environments/`: Text generation environments
   - `training/`: Training and evaluation scripts

2. **Run Training Examples**:
   ```bash
   python src/training/train_rl_agent.py --config configs/ppo_config.yaml
   ```

3. **Experiment with Different Models**:
   - Try different base models (GPT-2, DistilGPT-2)
   - Experiment with different reward functions
   - Test various RL algorithms
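
As an illustration of what "different reward functions" might look like, here is a small sketch of two simple rewards and a weighted combination. All function names, weights, and the `target_len` default are hypothetical. They are not taken from the project's `src/` code, and real rewards are usually learned models or task-specific metrics.

```python
def length_reward(text, target_len=20):
    """Return 1.0 when the word count equals target_len, decaying linearly to 0."""
    n_words = len(text.split())
    return max(0.0, 1.0 - abs(n_words - target_len) / target_len)


def keyword_reward(text, keywords):
    """Return the fraction of keywords that appear as whole words in the text."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits / len(keywords)


def combined_reward(text, keywords, target_len=20, w_len=0.5, w_kw=0.5):
    """Weighted sum of the two rewards above (weights are arbitrary)."""
    return (w_len * length_reward(text, target_len)
            + w_kw * keyword_reward(text, keywords))


sample = "the cat sat down"
print(length_reward(sample, target_len=4))
print(keyword_reward(sample, ["cat", "dog"]))
print(combined_reward(sample, ["cat", "dog"], target_len=4))
```

Swapping the reward changes what the agent is pushed toward, so comparing runs under two or three simple rewards like these is a cheap way to check that training responds to the signal at all.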

### 📚 Key Concepts to Learn:

- **Proximal Policy Optimization (PPO)**: The main RL algorithm we use
- **Policy Gradient Methods**: How RL agents learn from rewards
- **Text Generation Environments**: Defining tasks and rewards
- **Language Model Fine-tuning**: Adapting models for specific tasks
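
To make the PPO entry more concrete, the sketch below computes the clipped surrogate objective from the PPO paper for a single sample, in plain Python so the arithmetic is visible. The function name and the `clip_eps=0.2` default are illustrative assumptions. A real implementation would operate on batched tensors of per-token log-probabilities and minimize the negated mean.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probability of the action under the current
    policy and under the policy that collected the data.
    """
    # Probability ratio between the new and old policies.
    ratio = math.exp(new_logp - old_logp)
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage with a ratio of about 1.5: the gain is capped at 1.2 * advantage.
print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=2.0))

# Negative advantage with a ratio of about 0.5: the clipped (more negative) term is used.
print(ppo_clipped_objective(math.log(0.5), 0.0, advantage=-1.0))
```

The clipping removes the incentive to move the policy far from the one that generated the data in a single update, which is the "proximal" part of the name.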

### 🔗 Useful Resources:

- [Hugging Face Transformers](https://huggingface.co/docs/transformers/)
- [OpenAI Spinning Up in Deep RL](https://spinningup.openai.com/)
- [PPO Paper](https://arxiv.org/abs/1707.06347)
- [RLHF Paper](https://arxiv.org/abs/2203.02155)

### 🛠️ Development Tips:

- Start with simple environments and small models
- Monitor training with TensorBoard or Weights & Biases
- Use CPU for prototyping, GPU for larger experiments
- Experiment with different reward functions

Happy learning! 🎉

## 8. GRPO vs PPO: Better Choice for CPU Training

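**GRPO (Group Relative Policy Optimization)** is often a better fit than PPO for CPU training of LLMs. Instead of training a separate value network, it samples several completions per prompt and scores each one relative to the rest of its group. Here's why that helps: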

### 🚀 Why GRPO is Better for CPU:

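1. **Memory Efficiency**: GRPO needs no separate value (critic) network, so there is one less model to hold in memory and update
2. **Computational Efficiency**: Advantages come from normalizing rewards within a group of completions, with no GAE pass over per-token value estimates
3. **Stable Training**: With no value function to fit, one source of instability is removed, although results still depend on group size and reward variance
4. **Batch Processing**: Each prompt's completions are generated and scored as one group, which maps onto plain batched generation. The trade-off is that several completions must be sampled per prompt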

### 🔄 GRPO vs PPO Key Differences:

| Aspect | PPO | GRPO |
|--------|-----|------|
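| **Memory Usage** | Higher (policy plus value network; stores values and advantages) | Lower (no value network; group-relative rewards) |
| **Computation** | GAE over per-token value estimates | Mean/std normalization of rewards within each group |
| **Stability** | Sensitive to value-function accuracy | No critic to fit; sensitive to group size and reward variance |
| **CPU Performance** | Moderate (extra forward and backward passes for the critic) | Often cheaper per update, but needs several completions per prompt |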

### 💡 When to Use Each:

- **Use GRPO**: CPU training, limited memory, need stability
- **Use PPO**: GPU clusters, complex environments, established pipelines

Let's implement a simple GRPO agent to compare!
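
As a first step toward that comparison, the sketch below covers only the part that distinguishes GRPO from PPO: turning the rewards of one group of completions (all sampled for the same prompt) into advantages by subtracting the group mean and dividing by the group standard deviation. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration. A full agent would feed these advantages into a clipped policy update and typically add a KL penalty against a reference model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of completions for the same prompt.

    Each completion's advantage is its reward relative to the group:
    (reward - group mean) / (group std + eps).
    No value network is involved.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by some reward function.
group_rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(group_rewards))

# If every completion gets the same reward, all advantages are zero,
# so that group contributes no learning signal.
print(group_relative_advantages([0.5, 0.5, 0.5]))
```

Completions that beat their group average get a positive advantage and the rest get a negative one, which is why GRPO can work without the value estimates that PPO's GAE computation relies on.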

In [None]:
# Test Azure ML compute with a hello-world script.
# This first part only writes the script locally; nothing is submitted yet.

import os

print("🧪 Testing Azure ML Compute - Hello World")
print("=" * 45)

# Create a simple test script
test_script = """
import sys
import torch
import platform

print("=== Hello World from Azure ML Compute ===")
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CPU count: {torch.get_num_threads()}")
print("✅ Compute is working correctly!")
"""

# Write test script to file
test_file = "test_compute.py"
with open(test_file, "w") as f:
    f.write(test_script)

print(f"📝 Created test script: {test_file}")
print(f"💡 Next steps:")
print(f"   1. Submit this script to your Azure ML compute cluster")
print(f"   2. Use Azure ML extension or CLI to run: az ml job create")
print(f"   3. Check output to verify compute is working")

print(f"\n🔗 Test script contents:")
print(test_script)

# Submit a hello-world job to an Azure ML compute instance.
# This part connects to your workspace and runs the script on your compute.
# Requires the azure-ai-ml and azure-identity packages.

from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

print("🚀 Submitting Hello World to Azure ML Compute Instance")
print("=" * 55)

# You need to fill these in with your actual values
SUBSCRIPTION_ID = "your-subscription-id"
RESOURCE_GROUP = "your-resource-group" 
WORKSPACE_NAME = "your-workspace-name"
COMPUTE_NAME = "your-compute-instance-name"  # NOT a cluster - single compute instance

try:
    # Connect to workspace
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id=SUBSCRIPTION_ID,
        resource_group_name=RESOURCE_GROUP,
        workspace_name=WORKSPACE_NAME
    )
    
    # Create hello world script
    hello_script = """
import sys
import torch
import platform
import os

print("=== Hello World from Azure ML Compute Instance ===")
print(f"Host: {platform.node()}")
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CPU cores: {os.cpu_count()}")
print("✅ Your compute instance is working!")
"""
    
    # Write script to file
    os.makedirs("./hello_job", exist_ok=True)
    with open("./hello_job/hello.py", "w") as f:
        f.write(hello_script)
    
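    # Create job
    # NOTE: hello.py imports torch, so the job environment must provide PyTorch.
    # The curated sklearn environment below may not include it; if the job fails
    # on the import, swap in an environment from your workspace that does.
    job = command(
        code="./hello_job",
        command="python hello.py",
        compute=COMPUTE_NAME,
        environment="azureml://registries/azureml/environments/sklearn-1.0/versions/1"
    )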
    
    # Submit job
    submitted_job = ml_client.jobs.create_or_update(job)
    print(f"✅ Job submitted: {submitted_job.name}")
    print(f"🔗 View in studio: {submitted_job.studio_url}")
    
except Exception as e:
    print(f"❌ Error: {e}")
    print("💡 Please update the workspace details above with your actual values")

🧪 Testing Azure ML Compute - Hello World
📝 Created test script: test_compute.py
💡 Next steps:
   1. Submit this script to your Azure ML compute cluster
   2. Use Azure ML extension or CLI to run: az ml job create
   3. Check output to verify compute is working

🔗 Test script contents:

import sys
import torch
import platform

print("=== Hello World from Azure ML Compute ===")
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CPU count: {torch.get_num_threads()}")
print("✅ Compute is working correctly!")

