# PAL MCP Integration with Nebius Token Factory

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nebius/token-factory-cookbook/blob/main/agents/pal-mcp-integration/pal_nebius_tutorial.ipynb)
[![](https://img.shields.io/badge/Powered%20by-Nebius-orange?style=flat&labelColor=darkblue&color=orange)](http://tokenfactory.nebius.com/)

## Overview

This tutorial demonstrates how to integrate **Nebius Token Factory** with **PAL MCP Server** to orchestrate multiple AI models through Claude Code.

**PAL MCP** (Provider Abstraction Layer - Model Context Protocol) enables:
- Multi-model orchestration from Claude Code
- Conversation continuity across different AI models
- Specialized tools for code review, debugging, consensus

**Nebius Token Factory** provides:
- Access to top open-source models (Qwen3, DeepSeek, Llama, GLM, GPT-OSS)
- OpenAI-compatible API
- Extended context windows (up to 262K tokens)

## Prerequisites

- **Claude Code CLI** installed ([claude.ai/code](https://claude.ai/code))
- **Nebius API Key** from [tokenfactory.nebius.com](https://tokenfactory.nebius.com/)
- **Python 3.10+** and **uv** package manager

## References

- [PAL MCP Server GitHub](https://github.com/BeehiveInnovations/pal-mcp-server)
- [Nebius Token Factory Documentation](https://docs.tokenfactory.nebius.com/)
- [Claude Code Documentation](https://claude.ai/code)

## 1 - Getting Started

### 1.1 - Get Your Nebius API Key

1. Visit [Nebius Token Factory](https://tokenfactory.nebius.com/)
2. Sign up for a free account
3. Navigate to the API Keys section
4. Copy your API key

### 1.2 - If Running on Google Colab

Add `NEBIUS_API_KEY` to the **Secrets** panel:

![Colab Secrets](https://github.com/nebius/token-factory-cookbook/raw/main/images/google-colab-1.png)

### 1.3 - If Running Locally

We'll configure the PAL MCP Server `.env` file during installation.

## 2 - Install Dependencies

We'll install the OpenAI SDK to demonstrate direct API calls and compare with PAL MCP usage.

In [None]:
%%capture
!pip install -q openai python-dotenv

## 3 - Load Configuration

In [None]:
import os
import sys

# Recommended way of getting configuration
if os.getenv("COLAB_RELEASE_TAG"):
    print("Running in Colab")
    from google.colab import userdata
    NEBIUS_API_KEY = userdata.get('NEBIUS_API_KEY')
else:
    print("NOT running in Colab")
    from dotenv import load_dotenv
    load_dotenv()
    NEBIUS_API_KEY = os.getenv('NEBIUS_API_KEY')

# Quick hack (not recommended) - hardcode key
# NEBIUS_API_KEY = "your_key_here"

if NEBIUS_API_KEY:
    print('✅ NEBIUS_API_KEY found')
    os.environ['NEBIUS_API_KEY'] = NEBIUS_API_KEY
else:
    raise RuntimeError('❌ NEBIUS_API_KEY NOT found')

## 4 - Understand Nebius Models Available via PAL MCP

PAL MCP provides access to Nebius Token Factory models with convenient aliases.

### Key Models

| Model | Alias | Context | Best For |
|-------|-------|---------|----------|
| **Qwen3-235B-Instruct** | `nebius-qwen3` | 262K | General reasoning, coding |
| **Qwen3-235B-Thinking** | `nebius-qwen3-thinking` | 262K | Extended reasoning |
| **GPT-OSS-120B** | `nebius-gpt-oss` | 128K | OpenAI-style reasoning |
| **DeepSeek-R1** | `nebius-deepseek-r1` | 128K | Chain-of-thought reasoning |
| **DeepSeek-V3.2** | `nebius-deepseek` | 128K | Fast general purpose |
| **Llama-3.3-70B** | `nebius-llama` | 128K | Balanced performance |
| **GLM-4.5** | `nebius-glm` | 128K | Vision + function calling |
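
For quick checks in scripts, the table above can be mirrored as a small dictionary. This is only a sketch: the aliases and context sizes are copied from the table (with 262K and 128K taken as round numbers), while `estimate_tokens`, `fits_context`, and the four-characters-per-token rule are illustrative assumptions, not part of PAL MCP or the Nebius API.

```python
# Context windows from the table above, in tokens. The round numbers are an
# assumption for illustration; exact limits may differ per model.
CONTEXT_TOKENS = {
    "nebius-qwen3": 262_000,
    "nebius-qwen3-thinking": 262_000,
    "nebius-gpt-oss": 128_000,
    "nebius-deepseek-r1": 128_000,
    "nebius-deepseek": 128_000,
    "nebius-llama": 128_000,
    "nebius-glm": 128_000,
}


def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def fits_context(alias: str, prompt: str, reserve_for_reply: int = 4_000) -> bool:
    """Return True if the prompt plus a reply budget fits the alias's window."""
    return estimate_tokens(prompt) + reserve_for_reply <= CONTEXT_TOKENS[alias]


print(fits_context("nebius-llama", "x" * 400_000))  # True  (about 100K tokens)
print(fits_context("nebius-llama", "x" * 600_000))  # False (about 150K tokens)
print(fits_context("nebius-qwen3", "x" * 600_000))  # True  (fits in 262K)
```

A check like this is only a first filter; the model's real tokenizer decides the actual count.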

## 5 - Test Direct Nebius API Access

Before setting up PAL MCP, let's verify direct API access works.

### 5.1 - Initialize OpenAI Client for Nebius

In [None]:
from openai import OpenAI

# Create client pointing to Nebius Token Factory
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.environ.get('NEBIUS_API_KEY')
)

print("✅ Nebius client initialized")

### 5.2 - Test Basic Model Call

In [None]:
%%time

# Test with Qwen3-235B
response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain what Nebius Token Factory is in one sentence."}
    ],
    temperature=0.7
)

print("\n📝 Model Response:")
print(response.choices[0].message.content)
print("\n📊 Token Usage:", response.usage)

### 5.3 - Test Multiple Models

Let's test different Nebius models to see their capabilities.

In [None]:
# Define models to test
models_to_test = [
    ("Qwen/Qwen3-235B-A22B-Instruct-2507", "Qwen3 235B"),
    ("openai/gpt-oss-120b", "GPT-OSS 120B"),
    ("deepseek-ai/DeepSeek-V3.2", "DeepSeek V3.2"),
    ("meta-llama/Llama-3.3-70B-Instruct", "Llama 3.3 70B")
]

question = "What is 7 * 8 + 15?"

print(f"🧮 Testing question: {question}\n")
print("=" * 60)

for model_id, model_name in models_to_test:
    try:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": question}],
            temperature=0.3,
            max_tokens=100
        )
        
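        # content can be None, e.g. when a reasoning model spends its whole
        # max_tokens budget before producing a final answer
        answer = (response.choices[0].message.content or "").strip()
        print(f"\n🤖 {model_name}:")
        print(f"   {answer[:150]}..." if len(answer) > 150 else f"   {answer}")

    except Exception as e:
        print(f"\n❌ {model_name}: Error - {str(e)[:100]}")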

print("\n" + "=" * 60)

## 6 - Install and Configure PAL MCP Server

Now let's set up PAL MCP to orchestrate these models through Claude Code.

### 6.1 - Installation Steps

**Note**: These commands should be run in your terminal, not in this notebook.

```bash
# Clone PAL MCP Server
cd ~/Desktop  # or your preferred location
git clone https://github.com/BeehiveInnovations/pal-mcp-server.git
cd pal-mcp-server

# Run automatic setup
./run-server.sh
```

The setup script will:
1. Create Python virtual environment
2. Install dependencies
3. Prompt for API keys
4. Configure Claude Code integration
5. Test the connection

### 6.2 - Configure Nebius Integration

Edit `~/Desktop/pal-mcp-server/.env` to add your Nebius configuration:

```bash
# Nebius Token Factory API Key
NEBIUS_API_KEY=your_api_key_here

# Optional: Restrict available models (leave empty for all)
# NEBIUS_ALLOWED_MODELS=nebius-qwen3,nebius-deepseek-r1,nebius-gpt-oss

# Model Selection
DEFAULT_MODEL=auto  # Let Claude choose best model

# Tool Configuration (enable essential tools, disable heavy ones)
DISABLED_TOOLS=analyze,refactor,testgen,secaudit,docgen,tracer

# Conversation Settings
CONVERSATION_TIMEOUT_HOURS=24
MAX_CONVERSATION_TURNS=40

# Logging
LOG_LEVEL=INFO
```
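
To sanity-check the file before starting the server, a few lines of Python can read it back. The sketch below uses a deliberately simple hand-written parser (`parse_env` and `split_list` are hypothetical helpers, not PAL functions) and assumes plain `KEY=VALUE` lines with optional trailing ` # comment` text, as in the example above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "auto  # Let Claude choose"
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def split_list(value: str) -> list[str]:
    """Turn a comma-separated setting into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]


example = """
NEBIUS_API_KEY=your_api_key_here
DEFAULT_MODEL=auto  # Let Claude choose best model
DISABLED_TOOLS=analyze,refactor,testgen,secaudit,docgen,tracer
"""

env = parse_env(example)
print(env["DEFAULT_MODEL"])
print(split_list(env["DISABLED_TOOLS"]))
```

To check a real file, pass `open(path).read()` to `parse_env` instead of the inline example.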

### 6.3 - Verify Claude Code Configuration

Check that Claude Code is configured to use PAL MCP:

```bash
# View Claude Code settings
cat ~/.claude/settings.json
```

You should see PAL configured under `mcpServers`:

```json
{
  "mcpServers": {
    "pal": {
      "command": "/path/to/pal-mcp-server/.pal_venv/bin/python",
      "args": ["/path/to/pal-mcp-server/server.py"],
      "env": {
        "NEBIUS_API_KEY": "your_key",
        "DEFAULT_MODEL": "auto"
      }
    }
  }
}
```
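
The same check can be scripted. The following sketch assumes the settings file has the shape shown above; `pal_config_problems` is a hypothetical helper name, and it works on the JSON text so it can be tried without touching a real file.

```python
import json


def pal_config_problems(settings_text: str) -> list[str]:
    """Return a list of problems found in the 'pal' MCP server entry."""
    settings = json.loads(settings_text)
    pal = settings.get("mcpServers", {}).get("pal")
    if pal is None:
        return ["no 'pal' entry under 'mcpServers'"]

    problems = []
    if not pal.get("command"):
        problems.append("'command' is missing")
    if not pal.get("args"):
        problems.append("'args' is missing")
    if not pal.get("env", {}).get("NEBIUS_API_KEY"):
        problems.append("'env.NEBIUS_API_KEY' is missing")
    return problems


sample = """
{
  "mcpServers": {
    "pal": {
      "command": "/path/to/pal-mcp-server/.pal_venv/bin/python",
      "args": ["/path/to/pal-mcp-server/server.py"],
      "env": {"NEBIUS_API_KEY": "your_key", "DEFAULT_MODEL": "auto"}
    }
  }
}
"""

print(pal_config_problems(sample))  # [] means the entry looks complete
print(pal_config_problems("{}"))
```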

## 7 - Using PAL MCP with Claude Code

Now that PAL MCP is installed, let's explore how to use it from Claude Code.

PAL tools can be invoked two ways:
1. **Slash commands** (direct): `/pal:chat`, `/pal:consensus`, `/pal:codereview`, etc.
2. **Natural language** (indirect): Claude interprets your request and calls the appropriate PAL tool

### 7.1 - List Available Models

```
/pal:listmodels
```

Claude will show all available Nebius models with their aliases and capabilities.

### 7.2 - Simple Chat with a Nebius Model

```
/pal:chat with nebius-qwen3 explain what a Python decorator is
```

Or name the model with an explicit `model=` parameter:

```
/pal:chat model=nebius-deepseek-r1 What are the tradeoffs between Redis and Memcached for session storage?
```

### 7.3 - Extended Reasoning

```
/pal:thinkdeep with nebius-deepseek-r1 analyze the time complexity of merge sort vs timsort
```

DeepSeek-R1 provides visible chain-of-thought reasoning with its analysis.

### 7.4 - Quick Planning

```
/pal:planner with nebius-gpt-oss break down the migration from REST to GraphQL for our user service
```

## 8 - Multi-Model Consensus Workflows

One of PAL MCP's most powerful features is multi-model consensus: getting multiple Nebius models to weigh in on a decision.

### 8.1 - Architecture Decision

```
/pal:consensus with nebius-qwen3, nebius-deepseek-r1, and nebius-gpt-oss Should we use REST API or GraphQL for our new microservice?
```

**What happens:**
1. Claude sends the question to all three models
2. Each model provides its perspective:
   - **Qwen3**: General reasoning with broad coverage
   - **DeepSeek-R1**: Deep chain-of-thought analysis
   - **GPT-OSS**: OpenAI-style balanced assessment
3. Claude synthesizes a final recommendation with agreement/disagreement summary
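
The three steps above can be pictured as a fan-out followed by a synthesis prompt. The sketch below is not PAL's implementation; `gather_opinions`, `build_synthesis_prompt`, and the offline stand-in `fake_ask` are hypothetical names used only to illustrate the flow. In real use, `ask` would wrap an API call such as `client.chat.completions.create`.

```python
from typing import Callable


def gather_opinions(question: str, models: list[str],
                    ask: Callable[[str, str], str]) -> dict[str, str]:
    """Send the same question to every model and collect answers by alias."""
    opinions = {}
    for model in models:
        try:
            opinions[model] = ask(model, question)
        except Exception as exc:  # one failing model should not sink the round
            opinions[model] = f"[no answer: {exc}]"
    return opinions


def build_synthesis_prompt(question: str, opinions: dict[str, str]) -> str:
    """Combine all answers into one prompt for the synthesizing model."""
    parts = [f"Question: {question}", ""]
    for model, answer in opinions.items():
        parts.append(f"--- {model} ---")
        parts.append(answer)
    parts.append("")
    parts.append("Summarize where these answers agree and disagree, then recommend.")
    return "\n".join(parts)


def fake_ask(model: str, question: str) -> str:
    """Offline stand-in for a real API call (hypothetical)."""
    if model == "nebius-gpt-oss":
        raise TimeoutError("simulated timeout")
    return f"{model} prefers REST"


opinions = gather_opinions(
    "REST or GraphQL?",
    ["nebius-qwen3", "nebius-deepseek-r1", "nebius-gpt-oss"],
    fake_ask,
)
print(build_synthesis_prompt("REST or GraphQL?", opinions))
```

Catching errors per model is a design choice: a single timeout is recorded as a missing opinion rather than aborting the whole round.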

### 8.2 - Code Design Consensus

```
/pal:consensus with nebius-qwen3 and nebius-llama Should we use inheritance or composition for this class hierarchy?
```

### 8.3 - Algorithm Selection

```
/pal:consensus with nebius-deepseek-r1, nebius-gpt-oss, and nebius-qwen3-thinking Choose the best sorting algorithm for: 10M records, 2GB memory limit, partially sorted data
```

### 8.4 - Technology Stack Decision

```
/pal:consensus with nebius-qwen3, nebius-deepseek, and nebius-gpt-oss For a real-time chat app, should we use WebSockets, SSE, or long polling?
```

## 9 - Code Review Workflows

PAL MCP's `codereview` tool enables comprehensive multi-model code reviews.

### 9.1 - Single Model Review

```
/pal:codereview with nebius-qwen3 Review the auth module for security issues and code quality
```

Claude will:
- Walk through code systematically
- Track confidence levels (exploring → low → medium → high → certain)
- Identify issues by severity (critical → high → medium → low)
- Provide actionable feedback
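
If you copy review findings into your own tooling, the severity scale above gives a natural sort order. A minimal sketch, assuming findings are stored as `(severity, description)` tuples; the tuple shape and the sample findings are illustrative, not PAL's output format.

```python
# Severity scale from the list above, most urgent first.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def sort_by_severity(findings: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order findings from critical to low; ties keep their original order."""
    rank = {name: position for position, name in enumerate(SEVERITY_ORDER)}
    return sorted(findings, key=lambda finding: rank[finding[0].lower()])


# Hypothetical findings, in the order a reviewer happened to report them.
findings = [
    ("LOW", "Manual loop could be sum()"),
    ("CRITICAL", "eval() on user input"),
    ("MEDIUM", "No empty-list check"),
    ("CRITICAL", "SQL built with string formatting"),
]

for severity, description in sort_by_severity(findings):
    print(f"{severity:<8} {description}")
```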

### 9.2 - Multi-Model Collaborative Review

```
/pal:codereview with nebius-qwen3 and nebius-deepseek-r1 Review src/api/ for security vulnerabilities and race conditions
```

Then follow up with a plan:

```
/pal:planner with nebius-gpt-oss Based on the previous code review findings, create a fix strategy
```

**Workflow:**
1. **Qwen3**: Initial comprehensive analysis
2. **DeepSeek-R1**: Deep reasoning catches subtle issues
3. **Claude**: Consolidates findings from both
4. **GPT-OSS**: Creates prioritized implementation plan

### 9.3 - Pre-Commit Validation

After making changes, validate before committing:

```
/pal:precommit with nebius-qwen3 Validate my staged changes for regressions
```

### 9.4 - Challenge Your Own Code

```
/pal:challenge with nebius-gpt-oss Is my singleton pattern here actually necessary, or am I over-engineering?
```

## 10 - Advanced Multi-Model Patterns

### 10.1 - Thinking Model Chain

Combine thinking models for maximum reasoning depth:

```
/pal:thinkdeep with nebius-deepseek-r1 Analyze this algorithm for correctness and edge cases
```

Then validate with a second thinking model:

```
/pal:chat with nebius-qwen3-thinking Validate the previous analysis. Are there cases it missed?
```

### 10.2 - Debugging with PAL

```
/pal:debug with nebius-deepseek-r1 Investigate this race condition in the payment processing module
```

DeepSeek-R1 performs systematic root cause analysis with confidence tracking.

### 10.3 - API Documentation Lookup

When you need current docs (not stale training data):

```
/pal:apilookup Get the latest FastAPI documentation for WebSocket support
```

### 10.4 - Vision Model Integration

Analyze diagrams or screenshots with vision-capable models:

```
/pal:chat with nebius-glm Analyze this architecture diagram and explain the data flow
```

```
/pal:chat with nebius-qwen-vl What does this error screenshot show?
```

### 10.5 - CLI-to-CLI Bridging

Spawn isolated sub-agents for context-heavy tasks:

```
/pal:clink with cli_name="gemini" role="codereviewer" Audit the auth module for security issues
```

The sub-agent works in a fresh context and returns only the final report.

## 11 - Example: Complete Code Review Workflow

Let's walk through a complete example using Python code.

### 11.1 - Sample Code to Review

In [None]:
# Sample code with intentional issues
sample_code = '''
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total = total + num
    return total / len(numbers)

def process_user_data(user_input):
    # Execute user command directly
    result = eval(user_input)
    return result

class DatabaseConnection:
    def __init__(self, host, password):
        self.host = host
        self.password = password
        print(f"Connecting to {host} with password: {password}")
    
    def execute(self, query):
        # SQL injection vulnerable
        full_query = f"SELECT * FROM users WHERE name = '{query}'"
        return full_query
'''

print("📝 Sample Code for Review:")
print(sample_code)

### 11.2 - Review Using Multiple Nebius Models

In Claude Code, run:

```
/pal:codereview with nebius-qwen3 and nebius-deepseek-r1 Review this code for security issues, bugs, and performance problems

[paste the sample_code above]
```

**Expected Analysis:**

**From Qwen3:**
- **CRITICAL**: `eval()` usage in `process_user_data` (arbitrary code execution)
- **CRITICAL**: SQL injection in `DatabaseConnection.execute()`
- **HIGH**: Password logging in `DatabaseConnection.__init__`
- **MEDIUM**: No error handling in `calculate_average` (division by zero)
- **LOW**: Inefficient loop in `calculate_average` (use `sum()`)

**From DeepSeek-R1 (with chain-of-thought reasoning):**
- Analyzes attack vectors for `eval()` injection
- Demonstrates SQL injection exploit scenario
- Considers edge cases (empty list, None values)
- Suggests parameterized queries and input validation

**Then create a fix plan:**

```
/pal:planner with nebius-gpt-oss Based on the code review above, create a prioritized fix strategy
```

Claude combines both analyses into a single action plan.
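
For reference, here is one way the fixes named in the expected analysis could look once applied. This is a hand-written sketch, not the output of any model: it assumes `sqlite3` as the database driver (other drivers use `%s` instead of `?` as the placeholder), and `parse_user_literal` and `find_user` are hypothetical replacements for the vulnerable functions.

```python
import ast
import sqlite3


def calculate_average(numbers):
    """Average with an explicit empty-input check instead of ZeroDivisionError."""
    if not numbers:
        raise ValueError("numbers must not be empty")
    return sum(numbers) / len(numbers)


def parse_user_literal(user_input: str):
    """Accept only Python literals (numbers, strings, lists, dicts, ...).

    ast.literal_eval raises ValueError or SyntaxError for anything else,
    including function calls, so no user code is executed.
    """
    return ast.literal_eval(user_input)


def find_user(conn: sqlite3.Connection, name: str):
    """Parameterized query: the driver treats `name` strictly as data."""
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

print(find_user(conn, "alice"))        # [('alice',)]
print(find_user(conn, "' OR '1'='1"))  # [] (the injection string matches nothing)
print(parse_user_literal("[1, 2, 3]"))  # [1, 2, 3]
```

The password finding is addressed by omission: the sketch never prints or logs credentials.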

### 11.3 - Simulate Model Responses

Let's simulate what each Nebius model might focus on:

In [None]:
# Simulate reviews from different models
reviews = {
    "Qwen3-235B": {
        "focus": "Comprehensive coverage",
        "findings": [
            "CRITICAL: eval() allows arbitrary code execution",
            "CRITICAL: SQL injection vulnerability",
            "HIGH: Password exposed in logs",
            "MEDIUM: No error handling for empty list",
            "LOW: Use sum() instead of manual loop"
        ]
    },
    "DeepSeek-R1": {
        "focus": "Deep security reasoning",
        "findings": [
            "CRITICAL: eval() - attacker could execute: eval('__import__(\"os\").system(\"rm -rf /\")')",
            "CRITICAL: SQL injection - input \"'; DROP TABLE users; --\" would be catastrophic",
            "HIGH: Password in plaintext memory could be scraped via debugging",
            "MEDIUM: calculate_average([]) causes ZeroDivisionError",
            "Recommendation: Use ast.literal_eval() or JSON parsing instead of eval()"
        ]
    },
    "GPT-OSS-120B": {
        "focus": "Balanced security + architecture",
        "findings": [
            "CRITICAL: Replace eval() with safer alternatives (json.loads, ast.literal_eval)",
            "CRITICAL: Use parameterized queries (e.g., cursor.execute(\"SELECT * FROM users WHERE name=%s\", (name,)))",
            "HIGH: Use logging.debug() instead of print(), mask sensitive data",
            "MEDIUM: Add input validation and empty list check",
            "Architecture note: Consider using ORM (SQLAlchemy) to prevent SQL injection"
        ]
    }
}

print("🔍 Simulated Multi-Model Code Review\n")
print("=" * 70)

for model, review in reviews.items():
    print(f"\n🤖 {model}")
    print(f"   Focus: {review['focus']}\n")
    for finding in review['findings']:
        print(f"   • {finding}")
    print()

print("=" * 70)
print("\n💡 Consensus: All models agree on critical security issues")
print("   Priority: Fix eval() and SQL injection immediately")
print("   Next: Address password logging and error handling")

## 12 - Model Selection Strategy

### 12.1 - When to Use Each Model

| Scenario | Recommended Model | Why |
|----------|-------------------|-----|
| **Quick questions** | `nebius-llama` | Fast, efficient, good for simple tasks |
| **Code generation** | `nebius-qwen3` | Excellent coding capabilities, 262K context |
| **Deep reasoning** | `nebius-deepseek-r1` | Chain-of-thought, visible reasoning |
| **Extended thinking** | `nebius-qwen3-thinking` | Dedicated thinking mode |
| **Security audit** | `nebius-deepseek-r1` | Strong at analyzing attack vectors |
| **Architecture review** | `nebius-gpt-oss` | Balanced, OpenAI-style reasoning |
| **Vision tasks** | `nebius-glm` or `nebius-qwen-vl` | Multimodal capabilities |
| **Consensus** | Mix of 3 models | Diverse perspectives |

### 12.2 - Cost Optimization

- **Tier 1 (Expensive, High Intelligence)**: Qwen3-235B, GPT-OSS-120B
- **Tier 2 (Moderate)**: DeepSeek-V3.2, Llama-3.3-70B
- **Tier 3 (Cost-effective)**: Llama-3.1-8B, Gemma-27B

**Strategy**: Start with Tier 2-3 for exploration, escalate to Tier 1 for production.
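
The strategy can be written down as a small lookup. In this sketch the tier contents come from the list above, while the stage names `"exploration"` and `"production"` and the `candidate_models` helper are illustrative assumptions.

```python
# Tiers as listed above (1 = most expensive, 3 = most cost-effective).
TIERS = {
    1: ["Qwen3-235B", "GPT-OSS-120B"],
    2: ["DeepSeek-V3.2", "Llama-3.3-70B"],
    3: ["Llama-3.1-8B", "Gemma-27B"],
}


def candidate_models(stage: str) -> list[str]:
    """Return models to try for a project stage, cheapest first."""
    if stage == "exploration":
        return TIERS[3] + TIERS[2]
    if stage == "production":
        return TIERS[1]
    raise ValueError(f"unknown stage: {stage!r}")


print(candidate_models("exploration"))
print(candidate_models("production"))
```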

## 13 - Conversation Continuity Example

One of PAL's key features is maintaining context across models. Each tool call builds on the previous conversation thread.

### Step-by-step in Claude Code:

**Step 1**: Start a conversation with one model:
```
/pal:chat with nebius-qwen3 Explain microservices architecture patterns for e-commerce
```

**Step 2**: Continue with a different model (it sees Step 1's full context):
```
/pal:chat with nebius-deepseek-r1 Based on the previous explanation, what are the most common pitfalls?
```

**Step 3**: Get consensus (both models see the full thread):
```
/pal:consensus with nebius-gpt-oss and nebius-llama Given the discussion so far, what are the top 3 best practices?
```

**Direct API calls do not give you this for free**: each call is stateless, so you would have to pass the earlier turns along yourself. PAL maintains conversation state across models, so DeepSeek-R1 in Step 2 knows exactly what Qwen3 said in Step 1.
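
Conceptually, continuity means one shared history that is handed to whichever model answers next. The sketch below illustrates that idea only; it is not how PAL stores threads. `ask_with_history` and the offline stand-in `fake_send` are hypothetical, and a real `send` would forward the messages to a chat completions endpoint.

```python
from typing import Callable

Message = dict[str, str]


def ask_with_history(history: list[Message], model: str, prompt: str,
                     send: Callable[[str, list[Message]], str]) -> str:
    """Append the prompt, let `model` answer with the full thread, record the reply."""
    history.append({"role": "user", "content": prompt})
    reply = send(model, list(history))  # the model sees every earlier turn
    # Tag the reply with its author so later models know who said what.
    history.append({"role": "assistant", "content": f"[{model}] {reply}"})
    return reply


def fake_send(model: str, messages: list[Message]) -> str:
    """Offline stand-in that reports how much context it received."""
    return f"saw {len(messages)} message(s)"


history: list[Message] = []
print(ask_with_history(history, "nebius-qwen3", "Explain microservices patterns", fake_send))
print(ask_with_history(history, "nebius-deepseek-r1", "What are the common pitfalls?", fake_send))
```

The second model receives three messages: the first prompt, the first model's reply, and its own prompt.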

## 14 - Debugging Workflow Example

### 14.1 - Systematic Debug with PAL

```
/pal:debug with nebius-deepseek-r1 This function hits a race condition under high concurrency. Investigate the root cause

[paste error trace and relevant code]
```

DeepSeek-R1 performs:
1. **Hypothesis generation** (confidence: exploring)
2. **Code analysis** (confidence: low → medium)
3. **Root cause identification** (confidence: high)
4. **Solution proposal** (confidence: certain)

### 14.2 - Validate the Fix

```
/pal:chat with nebius-qwen3 Review this proposed fix for the race condition. Check correctness and edge cases

[paste proposed solution]
```

### 14.3 - Pre-Commit Check

```
/pal:precommit with nebius-gpt-oss Validate that the race condition fix doesn't introduce regressions
```

## 15 - Best Practices Summary

### ✅ Do's

1. **Use auto mode** for general tasks and let PAL choose the best model
2. **Leverage consensus** for important decisions (3 models minimum)
3. **Combine thinking models** (DeepSeek-R1 + Qwen3-Thinking) for complex reasoning
4. **Use codereview workflow** before major commits
5. **Validate with precommit** to catch regressions
6. **Start with smaller models** for exploration, scale up for production
7. **Maintain conversation continuity** by referencing previous exchanges

### ❌ Don'ts

1. **Don't use expensive models** for simple tasks (use `nebius-llama-8b` or `nebius-gemma`)
2. **Don't skip validation**: always get a second opinion on critical code
3. **Don't ignore consensus**: if models disagree, investigate why
4. **Don't use vision models** for text-only tasks (wastes resources)
5. **Don't exceed context limits**: break large files into chunks
6. **Don't forget to configure** `NEBIUS_ALLOWED_MODELS` for cost control
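
For the advice on context limits, a simple line-based splitter is often enough. This is a minimal sketch: `chunk_lines` is a hypothetical helper, and it budgets by characters rather than tokens, so choose `max_chars` conservatively for the model you target.

```python
def chunk_lines(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of whole lines, each at most max_chars long.

    A single line longer than max_chars becomes its own oversized chunk
    rather than being cut mid-line.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


text = "".join(f"line {i}\n" for i in range(10))  # ten 7-character lines
chunks = chunk_lines(text, max_chars=20)
print(len(chunks))  # 5 (two lines per chunk)
print(repr(chunks[0]))
```

Splitting on line boundaries keeps code statements intact, which matters more for reviews than filling every chunk to the limit.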

### 🎯 Workflow Patterns (Slash Commands)

**Pattern 1: Quick Task**
```
/pal:chat with nebius-llama What does this error mean?
```

**Pattern 2: Code Review**
```
/pal:codereview with nebius-qwen3 Review src/auth/ for issues
/pal:precommit with nebius-qwen3 Validate my changes
```

**Pattern 3: Complex Decision**
```
/pal:consensus with nebius-qwen3, nebius-deepseek-r1, nebius-gpt-oss Should we use Kafka or RabbitMQ?
/pal:thinkdeep with nebius-deepseek-r1 Dig deeper into the latency tradeoffs
/pal:planner with nebius-gpt-oss Create an implementation plan based on the consensus
```

**Pattern 4: Full Development Cycle**
```
/pal:planner with nebius-qwen3 Design the new payment module
/pal:chat with nebius-qwen3 Help implement the PaymentProcessor class
/pal:codereview with nebius-qwen3 and nebius-deepseek-r1 Review the payment module
/pal:debug with nebius-deepseek-r1 Investigate the timeout in processPayment()
/pal:precommit with nebius-gpt-oss Final check before merge
```

## 16 - Troubleshooting

### Issue: PAL tools not available in Claude Code

**Solution:**
```bash
# Check Claude Code settings
cat ~/.claude/settings.json

# Restart Claude Code
# Test with: "use pal to list models"
```

### Issue: "Model not found" error

**Solution:**
```bash
# Check allowed models in .env
cat ~/Desktop/pal-mcp-server/.env | grep NEBIUS_ALLOWED_MODELS

# Leave empty to enable all models
NEBIUS_ALLOWED_MODELS=
```

### Issue: Timeout errors

**Solution:**
```bash
# Increase timeouts in .env
CUSTOM_READ_TIMEOUT=1800.0
CUSTOM_WRITE_TIMEOUT=1800.0
```

### Issue: API key not working

**Solution:**
```bash
# Verify API key format (no quotes, no spaces)
cat ~/Desktop/pal-mcp-server/.env | grep NEBIUS_API_KEY

# Test direct API access
curl -H "Authorization: Bearer $NEBIUS_API_KEY" \
     https://api.studio.nebius.com/v1/models
```
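
The format rule in the comment above (no quotes, no spaces) can also be checked mechanically. The sketch below is illustrative: `key_line_problems` is a hypothetical helper that inspects a single `.env` line and does not contact the API.

```python
def key_line_problems(line: str) -> list[str]:
    """Return formatting problems found in a NEBIUS_API_KEY=... line."""
    line = line.rstrip("\n")
    name, separator, value = line.partition("=")
    if not separator:
        return ["line has no '='"]

    problems = []
    if name != "NEBIUS_API_KEY":
        problems.append("variable name is not exactly NEBIUS_API_KEY")
    if not value:
        problems.append("value is empty")
    if any(ch.isspace() for ch in value):
        problems.append("value contains whitespace")
    if any(quote in value for quote in "'\""):
        problems.append("value contains quote characters")
    return problems


print(key_line_problems("NEBIUS_API_KEY=abc123"))    # []
print(key_line_problems('NEBIUS_API_KEY="abc123"'))
print(key_line_problems("NEBIUS_API_KEY = abc123"))
```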

## 17 - Next Steps

### Explore More Features

1. **Try other PAL tools:**
   - `analyze` - Codebase architecture analysis
   - `refactor` - Intelligent refactoring suggestions
   - `testgen` - Test generation
   - `secaudit` - Security audits

2. **Experiment with model combinations:**
   - Find your optimal 3-model consensus set
   - Test different models for different task types

3. **Build custom workflows:**
   - Create project-specific review checklists
   - Define team conventions for model selection

### Additional Resources

- [PAL MCP Documentation](https://github.com/BeehiveInnovations/pal-mcp-server/blob/main/docs/index.md)
- [Nebius Model Catalog](https://tokenfactory.nebius.com/)
- [Claude Code Guide](https://claude.ai/code)
- [Token Factory Cookbook](https://github.com/nebius/token-factory-cookbook)

### Community

- [PAL MCP Issues](https://github.com/BeehiveInnovations/pal-mcp-server/issues)
- [Token Factory Discord](https://discord.gg/nebius)

---

**Happy multi-model orchestration! 🚀**