# `gpt_oss` usage
> This notebook walks through the major components of the `gpt_oss` package.



### **Core Components**
1. **Tokenizer**: o200k_base with Harmony extensions
2. **Stub Generation**: Testing without full models
3. **Browser Tool**: Web search capabilities (with Exa backend)
4. **Python Tool**: Code execution in isolated Docker containers
5. **Model Components**: PyTorch architecture elements
6. **Evaluation**: Answer extraction and normalization
7. **Harmony Integration**: Message structures and roles
8. **API Utilities**: Reasoning effort and tool routing

### **Advanced Features**
9. **PyTorch Model Architecture**: Complete transformer implementation with RMSNorm, RoPE, attention blocks
10. **Custom Tools & Integration**: Tool creation framework and end-to-end workflows

### **Backend Support**
- **PyTorch**: Full transformer with attention, MLP, normalization
- **Triton**: GPU-optimized kernels (requires CUDA)
- **vLLM**: Production serving with tensor parallelism
- **Metal**: Apple Silicon optimization

### **Dependencies**
The first cell installs the required packages and adds a local GPT-OSS checkout to the Python path:
- Core GPT-OSS components, imported from the local checkout via `sys.path`
- PyTorch for model components
- OpenAI Harmony for message structures
- FastAPI, aiohttp, and other utilities

### **External Requirements**
- **EXA_API_KEY**: Required for browser tool web searches
- **Docker**: Required for Python tool code execution

The `gpt_oss` package is a toolkit for language model inference, evaluation, and tool integration across several hardware backends.



## Setup and Dependencies

First, let's ensure all required packages are installed and set up the Python path:

In [None]:
# Install all required packages and set up Python path
import subprocess
import sys
import os

# Install all packages in one command
packages = [
    "torch>=2.0.0",
    "openai-harmony",
    "tiktoken",
    "safetensors>=0.5.0",
    "chz>=0.3.0",
    "pydantic>=2.11.7",
    "fastapi>=0.116.1",
    "aiohttp>=3.12.14",
    "structlog",
    "requests>=2.31.0",
    "termcolor",
    "docker>=7.1.0"
]

print("Installing all required packages...")
try:
    # Install all packages in one pip command
    subprocess.check_call([sys.executable, "-m", "pip", "install"] + packages + ["-q"])
    print("✓ All packages installed successfully")
except subprocess.CalledProcessError as e:
    print(f"✗ Package installation failed: {e}")
    print("Attempting to install packages individually...")
    for package in packages:
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package, "-q"])
            print(f"  ✓ {package}")
        except subprocess.CalledProcessError:
            print(f"  ✗ {package}")

# Add the GPT-OSS directory to the Python path
# This will eventually be `pip install gpt-oss`
# expanduser() is needed because sys.path entries do not expand "~"
gpt_oss_path = os.path.expanduser('~/labs/gpt-oss')
if gpt_oss_path not in sys.path:
    sys.path.insert(0, gpt_oss_path)

print(f"\n✓ Added {gpt_oss_path} to Python path")
print("✓ Setup complete!")

Installing all required packages...
✓ All packages installed successfully

✓ Added /Users/hhegadehallimadh/labs/gpt-oss to Python path
✓ Setup complete!
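
A quiet `pip install` can succeed while an import still fails, for example when the pip name and the import name differ. The following is a minimal sketch for checking importability after the setup cell. The helper `missing_modules` is hypothetical and not part of `gpt_oss`, and the list of import names is an assumption based on the packages installed above.

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

# Import names, not pip names (assumed: openai-harmony imports as openai_harmony)
required = ["torch", "openai_harmony", "tiktoken", "safetensors", "docker"]
print("Missing modules:", missing_modules(required) or "none")
```

Only top-level names are passed here, because `find_spec` imports parent packages when given a dotted name.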


## 1. Tokenizer Usage

The tokenizer is based on o200k_base with Harmony-specific extensions:

In [83]:
from gpt_oss.tokenizer import get_tokenizer

# Initialize tokenizer
tokenizer = get_tokenizer()

# Encode text to tokens
text = "Hello, world! This is a test."
tokens = tokenizer.encode(text)
decoded = tokenizer.decode(tokens)

print(f"Original text: {text}")
print(f"Tokens: {tokens}")
print(f"Decoded: {decoded}")
print(f"Token count: {len(tokens)}")

Original text: Hello, world! This is a test.
Tokens: [13225, 11, 2375, 0, 1328, 382, 261, 1746, 13]
Decoded: Hello, world! This is a test.
Token count: 9


## 2. Stub Generation

Test token generation without requiring a full model:

In [84]:
from gpt_oss.responses_api.utils import stub_infer_next_token

# Example tokens for "Hello, world"
input_tokens = [13225, 11, 2375]

# Generate next token with different temperatures
for temp in [0.0, 0.5, 1.0]:
    next_token = stub_infer_next_token(input_tokens, temperature=temp)
    print(f"Temperature {temp}: Next token = {next_token}")

Temperature 0.0: Next token = 17196
Temperature 0.5: Next token = 200008
Temperature 1.0: Next token = 17
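
The stub stands in for a model, so its return values should not be read as evidence of how `temperature` shapes sampling. For comparison, the sketch below shows how a sampler commonly applies temperature to logits. It is a generic illustration, not the behavior of `stub_infer_next_token`; the helper name and the example logits are made up.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick an index from logits; temperature 0 falls back to greedy argmax."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

example_logits = [0.1, 2.0, -1.0]  # hypothetical scores for three tokens
for temp in [0.0, 0.5, 1.0]:
    print(f"Temperature {temp}: index {sample_with_temperature(example_logits, temp)}")
```

At temperature 0.0 the result is always index 1, the largest logit. The other two lines vary from run to run.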


## 3. Browser Tool Setup

Import the browser tool classes and, if `EXA_API_KEY` is set, create a tool instance. The imports themselves work without an API key:

In [85]:
# Import browser tool classes (works without API key for import only)
try:
    from gpt_oss.tools.simple_browser import SimpleBrowserTool
    from gpt_oss.tools.simple_browser.backend import ExaBackend
    
    print("Browser tool classes imported successfully")
    print("SimpleBrowserTool class:", SimpleBrowserTool.__name__)
    print("ExaBackend class:", ExaBackend.__name__)
    
    # Check if API key is available
    import os
    if os.environ.get("EXA_API_KEY"):
        print("✓ EXA_API_KEY is set - browser tool can be used")
        
        # Create browser tool instance
        backend = ExaBackend(source="web")
        browser = SimpleBrowserTool(backend=backend)
        print(f"Tool name: {browser.name}")
        print(f"Tool instruction: {browser.instruction[:100]}...")
    else:
        print("⚠️ EXA_API_KEY not set - browser tool requires this environment variable")
        print("To use the browser tool, set: export EXA_API_KEY='your-api-key'")
        print("Get an API key from: https://exa.ai")
        
except Exception as e:
    print(f"Browser tool import failed: {e}")

Browser tool classes imported successfully
SimpleBrowserTool class: SimpleBrowserTool
ExaBackend class: ExaBackend
⚠️ EXA_API_KEY not set - browser tool requires this environment variable
To use the browser tool, set: export EXA_API_KEY='your-api-key'
Get an API key from: https://exa.ai


## 4. Python Docker Tool

Execute Python code in isolated Docker containers (requires Docker running):

In [86]:
# Python Docker Tool - Execute code in isolated containers
try:
    from gpt_oss.tools.python_docker.docker_tool import PythonTool, call_python_script
    import docker
    
    print("Python Docker tool classes imported successfully")
    
    # Check if Docker is running
    try:
        docker_client = docker.from_env()
        docker_client.ping()
        print("✓ Docker is running")
        
        # Example 1: Direct script execution
        script = """
import math
import numpy as np

# Mathematical calculations
print("Mathematical calculations:")
print(f"Pi = {math.pi}")
print(f"Square root of 2 = {math.sqrt(2)}")

# NumPy operations
arr = np.array([1, 2, 3, 4, 5])
print(f"\\nNumPy array: {arr}")
print(f"Mean: {np.mean(arr)}")
print(f"Standard deviation: {np.std(arr)}")

# Data structure example
data = {'name': 'GPT-OSS', 'version': '1.0', 'features': ['tokenization', 'inference', 'tools']}
print(f"\\nData structure: {data}")
"""
        
        print("\nExecuting Python script in Docker container...")
        try:
            output = call_python_script(script)
            print("Script output:")
            print(output)
        except Exception as e:
            print(f"Script execution failed: {e}")
            print("Note: Requires python:3.11 Docker image")
        
        # Example 2: Using PythonTool with async
        print("\n--- Using PythonTool with async ---")
        import asyncio
        from openai_harmony import Message, Author, Role, TextContent
        
        async def test_python_tool():
            tool = PythonTool()
            
            # Create a message with Python code
            code_message = Message(
                author=Author(role=Role.USER, name="user"),
                content=[TextContent(text="""
# Generate Fibonacci sequence
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[-1] + fib[-2])
        return fib

result = fibonacci(10)
print(f"First 10 Fibonacci numbers: {result}")
print(f"Sum: {sum(result)}")
""")]
            ).with_recipient("python")
            
            # Process the code
            async for response in tool.process(code_message):
                print("Tool output:")
                print(response.content[0].text)
        
        # Run async function
        await test_python_tool()
        
    except docker.errors.DockerException:
        print("✗ Docker is not running")
        print("To use Python Docker tool:")
        print("1. Install Docker: https://docs.docker.com/get-docker/")
        print("2. Start Docker Desktop or Docker daemon")
        print("3. Pull Python image: docker pull python:3.11")
        
except ImportError as e:
    print(f"Python Docker tool import failed: {e}")
    print("The tool requires the 'docker' package: pip install docker")

Python Docker tool classes imported successfully
✗ Docker is not running
To use Python Docker tool:
1. Install Docker: https://docs.docker.com/get-docker/
2. Start Docker Desktop or Docker daemon
3. Pull Python image: docker pull python:3.11


## 5. Model Architecture Components

Explore the PyTorch model components:

In [87]:
import torch
from gpt_oss.torch.model import swiglu, sdpa

# Test SwiGLU activation
x = torch.randn(2, 4)
output = swiglu(x)
print(f"Input shape: {x.shape}")
print(f"SwiGLU output shape: {output.shape}")
print(f"Input: {x}")
print(f"Output: {output}")

Input shape: torch.Size([2, 4])
SwiGLU output shape: torch.Size([2, 2])
Input: tensor([[ 0.4551,  0.0592,  0.0381, -1.7763],
        [-0.6815, -0.9640,  1.2013,  1.2982]])
Output: tensor([[ 0.3300, -0.0153],
        [-0.0059,  2.4445]])
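
The output has half as many columns as the input because the last dimension is split into gate and linear halves. The pure-Python sketch below reproduces the printed numbers under stated assumptions: gate and linear values are interleaved (even and odd positions), the gate uses `g * sigmoid(alpha * g)` with `alpha = 1.702`, values are clamped at `limit = 7.0`, and the linear half is offset by 1. These details were chosen to match the output above and are not confirmed by the notebook; `swiglu_reference` is a hypothetical name.

```python
import math

def swiglu_reference(row, alpha=1.702, limit=7.0):
    """Pure-Python SwiGLU over one row with interleaved gate/linear values."""
    # Even positions feed the gate, clamped from above only
    gate = [min(value, limit) for value in row[0::2]]
    # Odd positions feed the linear branch, clamped on both sides
    linear = [max(-limit, min(value, limit)) for value in row[1::2]]
    # g * sigmoid(alpha * g) * (l + 1)
    return [g / (1.0 + math.exp(-alpha * g)) * (l + 1.0)
            for g, l in zip(gate, linear)]

# Rows copied from the printed input tensor above
print(swiglu_reference([0.4551, 0.0592, 0.0381, -1.7763]))
print(swiglu_reference([-0.6815, -0.9640, 1.2013, 1.2982]))
```

Both rows agree with the printed tensor to about three decimal places, which is as close as the four-digit inputs allow.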


## 6. Evaluation Framework

Explore the evaluation utilities:

In [88]:
from gpt_oss.evals.aime_eval import extract_boxed_text, normalize_number

# Test answer extraction
test_text = r"The answer is \boxed{42}."
extracted = extract_boxed_text(test_text)
normalized = normalize_number(extracted)

print(f"Original text: {test_text}")
print(f"Extracted answer: {extracted}")
print(f"Normalized: {normalized}")

Original text: The answer is \boxed{42}.
Extracted answer: 42
Normalized: 42
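
To make the two steps concrete, here is a minimal sketch of answer extraction and normalization. It is not the implementation in `gpt_oss.evals.aime_eval`; the helpers `extract_last_boxed` and `normalize_integer` are hypothetical and only illustrate one way to handle nested braces and digit cleanup.

```python
import re

def extract_last_boxed(text):
    r"""Return the contents of the last \boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # the marker's own opening brace
    chars = []
    for char in text[start + len(marker):]:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(char)
    return None  # braces never balanced

def normalize_integer(answer):
    """Drop thousands separators and return the leading digits, or None."""
    match = re.match(r"\d+", answer.replace(",", "").strip())
    return match.group(0) if match else None

print(extract_last_boxed(r"The answer is \boxed{42}."))  # 42
print(normalize_integer("1,024"))                        # 1024
```

Counting brace depth lets the extractor return `\frac{1}{2}` intact, which a non-greedy regular expression would cut short at the first closing brace.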


## 7. Harmony Integration

Work with Harmony message structures:

In [89]:
from openai_harmony import Message, TextContent, Author, Role

# Create a user message
user_message = Message(
    author=Author(role=Role.USER, name="user"),
    content=[TextContent(text="Hello, how are you?")]
)

# Create an assistant message
assistant_message = Message(
    author=Author(role=Role.ASSISTANT, name="assistant"),
    content=[TextContent(text="I'm doing well, thank you for asking!")]
)

print(f"User: {user_message.content[0].text}")
print(f"Assistant: {assistant_message.content[0].text}")

User: Hello, how are you?
Assistant: I'm doing well, thank you for asking!


## 8. API Server Components

Explore API server utilities:

In [90]:
from gpt_oss.responses_api.api_server import get_reasoning_effort, is_not_builtin_tool

# Test reasoning effort mapping
for effort in ["low", "medium", "high"]:
    reasoning_effort = get_reasoning_effort(effort)
    print(f"Effort '{effort}' maps to: {reasoning_effort}")

# Test tool routing
tools = ["browser", "python", "builtin_tool"]
for tool in tools:
    is_custom = is_not_builtin_tool(tool)
    print(f"Tool '{tool}' is custom: {is_custom}")

Effort 'low' maps to: ReasoningEffort.LOW
Effort 'medium' maps to: ReasoningEffort.MEDIUM
Effort 'high' maps to: ReasoningEffort.HIGH
Tool 'browser' is custom: True
Tool 'python' is custom: False
Tool 'builtin_tool' is custom: True


## 9. PyTorch Model Architecture

Explore the complete PyTorch backend with all model components:

In [91]:
from gpt_oss.torch.model import (
    ModelConfig, RMSNorm, RotaryEmbedding, AttentionBlock, 
    MLPBlock, TransformerBlock, Transformer
)
import torch

# Model configuration (using smaller values for testing)
config = ModelConfig(
    num_hidden_layers=2,
    vocab_size=1000,
    hidden_size=64,
    head_dim=16,
    num_attention_heads=4,
    num_key_value_heads=2,
    initial_context_length=128
)

print(f"Model config: {config.num_hidden_layers} layers, {config.vocab_size} vocab size")

# Device setup (force CPU for compatibility)
device = torch.device("cpu")
print(f"Using device: {device}")

# RMS Normalization
rms_norm = RMSNorm(num_features=config.hidden_size, device=device)
x = torch.randn(2, 10, config.hidden_size, device=device)
normalized = rms_norm(x)
print(f"RMSNorm input shape: {x.shape}, output shape: {normalized.shape}")

# Rotary Embedding
rope = RotaryEmbedding(
    head_dim=config.head_dim,
    base=config.rope_theta,
    dtype=torch.float32,
    device=device
)

batch_size, seq_len = 2, 10
query = torch.randn(batch_size, seq_len, config.num_attention_heads, config.head_dim, device=device)
key = torch.randn(batch_size, seq_len, config.num_key_value_heads, config.head_dim, device=device)
rotated_q, rotated_k = rope(query, key)
print(f"RoPE query shape: {query.shape} -> {rotated_q.shape}")

Model config: 2 layers, 1000 vocab size
Using device: cpu
RMSNorm input shape: torch.Size([2, 10, 64]), output shape: torch.Size([2, 10, 64])
RoPE query shape: torch.Size([2, 10, 4, 16]) -> torch.Size([2, 10, 4, 16])
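
Both layers preserve shape, so the printed shapes alone say little about what they compute. The sketch below spells out the arithmetic for a single vector in plain Python. It assumes the usual RMSNorm definition and a rotary embedding that pairs element `i` with element `i + half`. The `eps`, the `base` value, and the pairing convention are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def rms_norm(x, scale=None, eps=1e-5):
    """Divide x by its root-mean-square, then apply an optional per-feature scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    scale = scale or [1.0] * len(x)
    return [v / rms * s for v, s in zip(x, scale)]

def rope_rotate(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i + half]) by an angle that grows with position."""
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        # Lower-indexed pairs rotate faster than higher-indexed ones
        theta = position * base ** (-2.0 * i / len(x))
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out[i] = x[i] * cos_t - x[i + half] * sin_t
        out[i + half] = x[i + half] * cos_t + x[i] * sin_t
    return out

vector = [3.0, 4.0, 0.0, 0.0]
normalized = rms_norm(vector)
rotated = rope_rotate(vector, position=5)
print("RMS after norm:", math.sqrt(sum(v * v for v in normalized) / len(normalized)))
print("Length before/after RoPE:", math.hypot(*vector), math.hypot(*rotated))
```

The normalized vector has a root-mean-square close to 1, and the rotation leaves the vector length unchanged, which is why RoPE can encode position without rescaling queries or keys.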


## 10. Advanced Tools and Complete Integration

Custom tool creation and complete workflow example:

In [92]:
import asyncio
from gpt_oss.tokenizer import get_tokenizer
from gpt_oss.torch.model import ModelConfig
from gpt_oss.responses_api.utils import stub_infer_next_token
from gpt_oss.tools.tool import Tool
from openai_harmony import Message, Author, Role, TextContent
from typing import AsyncIterator

# Custom tool example
class MathTool(Tool):
    @property
    def name(self) -> str:
        return "math_calculator"
    
    @property
    def instruction(self) -> str:
        return "A calculator tool that can evaluate mathematical expressions."
    
    async def _process(self, message: Message) -> AsyncIterator[Message]:
        expression = message.content[0].text.strip()
        
        try:
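            # Restricted evaluation: builtins are emptied and only a few names are exposed.
            # This narrows what eval() can reach but is not a full sandbox; avoid untrusted input.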
            allowed_names = {
                'abs': abs, 'round': round, 'min': min, 'max': max, 
                'sum': sum, 'pow': pow, '__builtins__': {}
            }
            
            result = eval(expression, allowed_names)
            response_text = f"Result: {result}"
        except Exception as e:
            response_text = f"Error: {str(e)}"
        
        response = Message(
            author=Author(role=Role.TOOL, name=self.name),
            content=[TextContent(text=response_text)]
        ).with_recipient("assistant")
        
        yield response

# Complete workflow demonstration
async def complete_workflow_demo():
    print("=== Complete GPT-OSS Workflow Demo ===")
    
    # 1. Initialize components
    tokenizer = get_tokenizer()
    config = ModelConfig()
    math_tool = MathTool()
    
    # 2. Process user input
    user_input = "Calculate 2 + 2 * 3"
    print(f"User input: {user_input}")
    
    # 3. Tokenization
    tokens = tokenizer.encode(user_input)
    print(f"Tokenized: {len(tokens)} tokens")
    
    # 4. Tool usage
    tool_message = Message(
        author=Author(role=Role.USER, name="user"),
        content=[TextContent(text="2 + 2 * 3")]
    )
    
    async for response in math_tool.process(tool_message):
        print(f"Tool result: {response.content[0].text}")
    
    # 5. Model configuration info
    print(f"Model: {config.num_hidden_layers} layers, {config.vocab_size} vocab")
    
    # 6. Stub generation
    next_token = stub_infer_next_token(tokens[:5], temperature=0.7)
    print(f"Generated token: {next_token}")
    
    print("Workflow complete!")

# For Jupyter notebooks, use await directly instead of asyncio.run()
# This works because Jupyter already has an event loop running
await complete_workflow_demo()

# Note: If running in a regular Python script (not Jupyter), use:
# asyncio.run(complete_workflow_demo())

=== Complete GPT-OSS Workflow Demo ===
User input: Calculate 2 + 2 * 3
Tokenized: 9 tokens
Tool result: Result: 8
Model: 36 layers, 201088 vocab
Generated token: 659
Workflow complete!
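
`MathTool` evaluates expressions with `eval` and an emptied `__builtins__`, which restricts the available names but does not reliably contain hostile input. One alternative, sketched below, parses the expression and evaluates only whitelisted arithmetic nodes. The function `evaluate_arithmetic` is hypothetical and is not part of `gpt_oss`; it supports fewer features than the `eval` version (no `abs`, `min`, or `pow`).

```python
import ast
import operator

# Exponentiation is left out on purpose: inputs like 9**9**9 can exhaust CPU and memory
_BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def evaluate_arithmetic(expression):
    """Evaluate +, -, *, /, % over numeric literals and reject everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, and other node types end up here
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

print(evaluate_arithmetic("2 + 2 * 3"))  # 8
```

Inside `_process`, the call to `eval(expression, allowed_names)` could be swapped for `evaluate_arithmetic(expression)`. The existing `except Exception` branch would then report rejected input as an error message.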


### **Next Steps**
To use GPT-OSS with actual models:
1. Download or train model checkpoints
2. Set up API keys for external tools (EXA_API_KEY for browser)
3. Configure Docker for Python tool execution
4. Scale to GPU/distributed setups for production use