# Adding New LLM Models to LLMRouter

**Estimated Time:** 30 minutes  
**Level:** Advanced  
**Prerequisites:** 00_Quick_Start, 01_Installation_and_Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ulab-uiuc/LLMRouter/blob/main/tutorials/notebooks/09_Adding_New_LLM_Models.ipynb)

## Learning Objectives

By the end of this tutorial, you will:
- ✅ Understand the LLM candidates format
- ✅ Add new models to the router pool
- ✅ Generate model embeddings
- ✅ Configure API endpoints
- ✅ Test new models

---

In [None]:
# Setup
!git clone https://github.com/ulab-uiuc/LLMRouter.git
%cd LLMRouter
!pip install -e . -q

## 1. Understanding LLM Candidates Format

LLM candidates are defined in JSON format. Let's examine the structure.

In [None]:
import json

# Load existing LLM data
with open('data/example_data/llm_candidates/default_llm.json', 'r') as f:
    llm_data = json.load(f)

# Show one example
example_key = list(llm_data.keys())[0]
print(f"Example LLM: {example_key}")
print(json.dumps(llm_data[example_key], indent=2))

**Fields** (all required unless marked optional):

```jsonc
{
  "model_name": {                    // User-friendly name
    "model": "provider/model-id",    // API model identifier
    "size": "7B",                    // Model size (optional)
    "cost": 0.001,                   // Cost per 1K tokens (optional)
    "description": "...",            // Model description (for embeddings)
    "embedding": [0.1, 0.2, ...]     // Model embedding vector (optional)
  }
}
```
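Optional fields may be missing, so a quick check before saving can catch typos and wrong types. The sketch below is a hypothetical helper (`validate_llm_entry` is not part of LLMRouter). It treats `model` and `description` as required and type-checks the optional fields, based only on the field list above.

```python
# Hypothetical helper (not an LLMRouter API): checks one candidate entry
# against the field list above. "model" and "description" are treated as
# required; size, cost, and embedding are optional but type-checked.
def validate_llm_entry(name, entry):
    """Return a list of problems found in a single LLM candidate entry."""
    problems = []
    for field in ("model", "description"):
        value = entry.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{name}: missing or empty '{field}'")
    if "size" in entry and not isinstance(entry["size"], str):
        problems.append(f"{name}: 'size' should be a string such as '7B'")
    if "cost" in entry and not isinstance(entry["cost"], (int, float)):
        problems.append(f"{name}: 'cost' should be a number")
    if "embedding" in entry:
        emb = entry["embedding"]
        if not isinstance(emb, list) or not all(isinstance(x, (int, float)) for x in emb):
            problems.append(f"{name}: 'embedding' should be a list of numbers")
    return problems

good = {"model": "provider/model-id", "description": "A demo model", "size": "7B", "cost": 0.001}
print(validate_llm_entry("demo", good))  # → []
print(validate_llm_entry("broken", {"cost": "cheap"}))
```

To check a whole pool, loop over `llm_data.items()` and collect the problems for each entry.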

## 2. Adding a New Model - Method 1: Manual Entry

Let's add a new model to the existing pool.

In [None]:
# Create a copy of the LLM data
import copy
new_llm_data = copy.deepcopy(llm_data)

# Add a new model
new_llm_data["gpt-4-turbo"] = {
    "model": "openai/gpt-4-turbo",
    "size": "Unknown",
    "cost": 0.01,  # $0.01 per 1K tokens
    "description": "GPT-4 Turbo is OpenAI's most advanced model with 128K context window, optimized for speed and cost."
}

new_llm_data["claude-3-opus"] = {
    "model": "anthropic/claude-3-opus",
    "size": "Unknown",
    "cost": 0.015,
    "description": "Claude 3 Opus is Anthropic's most capable model with strong reasoning and analysis capabilities."
}

new_llm_data["llama-3-70b"] = {
    "model": "meta/llama-3-70b-instruct",
    "size": "70B",
    "cost": 0.0005,
    "description": "Llama 3 70B is Meta's open-source large language model with strong performance across tasks."
}

print(f"✅ Added {len(new_llm_data) - len(llm_data)} new models")
print(f"Total models: {len(new_llm_data)}")
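Assigning `new_llm_data["name"] = {...}` silently replaces an entry that already uses that key. When that happens, the "Added N new models" count undercounts. The sketch below uses a hypothetical helper, `add_model`, that refuses to overwrite unless asked and leaves the original pool untouched.

```python
import copy

def add_model(pool, name, entry, overwrite=False):
    """Return a new pool with `entry` added under `name`.

    Hypothetical helper: deep-copies the pool so the input dict is unchanged,
    and raises KeyError instead of silently replacing an existing model.
    """
    if name in pool and not overwrite:
        raise KeyError(f"'{name}' already exists in the pool")
    updated = copy.deepcopy(pool)
    updated[name] = dict(entry)
    return updated

base = {"llama-3-70b": {"model": "meta/llama-3-70b-instruct", "size": "70B"}}
extended = add_model(base, "my-model", {"model": "provider/my-model", "description": "Example entry"})
print(sorted(extended))  # → ['llama-3-70b', 'my-model']
print(len(base))         # → 1 (original pool unchanged)
```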

## 3. Generating Model Embeddings

Some routers (e.g., KNN, Graph) use model embeddings to understand model capabilities.

**Two approaches:**
1. **Use model descriptions** (recommended) - Generate from text descriptions
2. **Manual embeddings** - If you have pre-computed embeddings

In [None]:
# Method 1: Generate embeddings from descriptions
import copy

from llmrouter.utils import get_longformer_embedding

def add_embeddings_to_llm_data(llm_data):
    """Generate embeddings for LLMs based on their descriptions."""
    llm_data_with_embeddings = copy.deepcopy(llm_data)
    
    for model_name, model_info in llm_data_with_embeddings.items():
        # Skip models that already have an embedding
        if 'embedding' in model_info:
            print(f"✓ {model_name}: embedding exists")
            continue
        
        # Get the description
        description = model_info.get('description', '')
        if not description:
            # Build a basic description from the available info
            description = f"{model_name} is a language model"
            if 'size' in model_info:
                description += f" with {model_info['size']} parameters"
        
        # Generate the embedding
        print(f"🔄 Generating embedding for {model_name}...")
        embedding = get_longformer_embedding(description)
        
        # Convert a tensor/array to a flat list of floats for JSON serialization
        if hasattr(embedding, 'reshape'):
            embedding = embedding.reshape(-1).tolist()
        else:
            embedding = list(embedding)
        model_info['embedding'] = embedding
        print(f"✅ {model_name}: embedding generated ({len(embedding)} dims)")
    
    return llm_data_with_embeddings

# Generate embeddings
new_llm_data_with_embeddings = add_embeddings_to_llm_data(new_llm_data)
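If two descriptions are too generic, their embeddings can end up nearly identical. An embedding-based router then has little to tell those models apart. This is a hypothetical sanity check (not part of LLMRouter) that ranks model pairs by cosine similarity, using plain Python so it works on the JSON-ready lists produced above.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_pairs(llm_data, top_k=3):
    """Return the top_k (similarity, model_a, model_b) tuples, highest first."""
    names = [n for n, info in llm_data.items() if "embedding" in info]
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            sim = cosine_similarity(llm_data[names[i]]["embedding"],
                                    llm_data[names[j]]["embedding"])
            pairs.append((sim, names[i], names[j]))
    pairs.sort(reverse=True)
    return pairs[:top_k]

# Toy vectors for illustration; in the notebook, pass new_llm_data_with_embeddings
toy = {
    "a": {"embedding": [1.0, 0.0]},
    "b": {"embedding": [0.9, 0.1]},
    "c": {"embedding": [0.0, 1.0]},
}
for sim, m1, m2 in most_similar_pairs(toy, top_k=2):
    print(f"{m1} vs {m2}: {sim:.3f}")
```

Pairs with similarity very close to 1.0 are good candidates for more specific descriptions.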

## 4. Saving the Updated LLM Data

Save the new LLM configuration to a file.

In [None]:
# Save to new file
output_path = 'data/example_data/llm_candidates/my_custom_llm.json'

with open(output_path, 'w') as f:
    json.dump(new_llm_data_with_embeddings, f, indent=2)

print(f"✅ Saved to: {output_path}")
print(f"Total models: {len(new_llm_data_with_embeddings)}")
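`add_embeddings_to_llm_data` skips entries that already have an embedding. Those entries may therefore come from a different embedding model than the new ones, with a different dimension. This sketch assumes embedding-based routers expect one consistent dimension. The hypothetical `check_saved_pool` helper reloads the file and reports which dimensions it finds.

```python
import json
import os
import tempfile

def check_saved_pool(path):
    """Reload a saved LLM pool and return the set of embedding lengths.

    A single value in the returned set means every embedded model has the
    same dimension; models without an embedding are listed separately.
    """
    with open(path, "r") as f:
        pool = json.load(f)
    missing = [name for name, info in pool.items() if "embedding" not in info]
    if missing:
        print(f"Models without embeddings: {missing}")
    dims = {len(info["embedding"]) for info in pool.values() if "embedding" in info}
    print(f"{len(pool)} models, embedding dimension(s): {sorted(dims)}")
    return dims

# Demo on a throwaway file; in the notebook, call check_saved_pool(output_path)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_pool.json")
with open(demo_path, "w") as f:
    json.dump({"a": {"embedding": [0.1, 0.2]}, "b": {"embedding": [0.3, 0.4, 0.5]}}, f)
demo_dims = check_saved_pool(demo_path)  # mixed dimensions: {2, 3}
```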

## 5. Creating a Custom Configuration

Create a router configuration that uses your new LLM pool.

In [None]:
import yaml

# Create configuration
config = {
    'data_path': {
        'llm_data': 'data/example_data/llm_candidates/my_custom_llm.json',
        'query_data_test': 'data/example_data/query_data/default_query_test.jsonl',
        'routing_data_test': 'data/example_data/routing_data/default_routing_test_data.jsonl',
    },
    'metric': {
        'weights': {
            'performance': 1,
            'cost': 0,
            'llm_judge': 0,
        }
    },
    'api_endpoint': 'https://integrate.api.nvidia.com/v1',
}

# Save configuration
config_path = 'my_custom_config.yaml'
with open(config_path, 'w') as f:
    yaml.dump(config, f)

print(f"✅ Created config: {config_path}")
!cat {config_path}
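A typo in any `data_path` entry only shows up later, as a file-not-found error during inference. This hypothetical helper checks each path relative to the repository root, which is the working directory after `%cd LLMRouter`. Calling `missing_data_paths(config)` in the notebook should return an empty dict if every file was found.

```python
import os

def missing_data_paths(config, base_dir="."):
    """Return {key: path} for data_path entries that are not existing files."""
    missing = {}
    for key, rel_path in config.get("data_path", {}).items():
        full_path = os.path.join(base_dir, rel_path)
        if not os.path.isfile(full_path):
            missing[key] = full_path
    return missing

# Illustration with a path that does not exist
demo_config = {"data_path": {"llm_data": "does/not/exist.json"}}
print(missing_data_paths(demo_config))
```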

## 6. Testing with New Models

Let's test routing with the new LLM pool.

In [None]:
# Test with smallest_llm router (doesn't require training)
!llmrouter infer \
  --router smallest_llm \
  --config my_custom_config.yaml \
  --query "Explain machine learning" \
  --route-only \
  --verbose
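We have not verified how the `smallest_llm` router reads the `size` field. The sketch below only shows why non-numeric sizes such as `"Unknown"` (used for the GPT-4 Turbo and Claude 3 Opus entries above) need explicit handling in any size-based selection. Both functions are hypothetical. Here, models with an unparseable size are skipped.

```python
import re

def parse_size_billions(size):
    """Parse '7B', '70B', or '500M' into billions of parameters.

    Returns None for values such as 'Unknown' that carry no number.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([BbMm])\s*", str(size))
    if not match:
        return None
    value = float(match.group(1))
    return value if match.group(2).lower() == "b" else value / 1000

def smallest_model(pool):
    """Pick the model with the smallest parseable size; None if none parse."""
    sized = [(parse_size_billions(info.get("size")), name) for name, info in pool.items()]
    sized = [(s, name) for s, name in sized if s is not None]
    return min(sized)[1] if sized else None

pool = {
    "gpt-4-turbo": {"size": "Unknown"},
    "llama-3-70b": {"size": "70B"},
    "small-model": {"size": "7B"},
}
print(smallest_model(pool))  # → small-model
```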

## 7. API Integration

To actually use these models, you need to configure API access.

**Supported Providers (via LiteLLM):**
- OpenAI (GPT-4, GPT-3.5, etc.)
- Anthropic (Claude)
- Google (Gemini)
- Open-weight models such as Llama and Mistral (e.g., via Hugging Face)
- NVIDIA NIM
- Custom OpenAI-compatible endpoints

In [None]:
import os

# Set API keys (Colab secrets are recommended).
# You can also supply multiple keys for load balancing.

# Single key:
# os.environ['API_KEYS'] = 'your-api-key'

# Multiple keys (JSON list):
# os.environ['API_KEYS'] = '["key1", "key2", "key3"]'

# Example using Colab secrets (google.colab is only available inside Colab):
try:
    from google.colab import userdata
    api_key = userdata.get('NVIDIA_API_KEY')
    os.environ['API_KEYS'] = api_key
    print("✅ API key loaded from Colab secrets")
except Exception:
    print("⚠️ No API key found. Set NVIDIA_API_KEY in Colab secrets.")
    print("   Or manually: os.environ['API_KEYS'] = 'your-key'")
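`API_KEYS` can hold either a single key or a JSON list. This page does not show how LLMRouter itself parses the variable or balances load across keys. The sketch below is a hypothetical illustration: it reads both formats and rotates keys round-robin. It uses a separate `DEMO_API_KEYS` variable so it does not overwrite the real one.

```python
import itertools
import json
import os

def load_api_keys(env_var="API_KEYS"):
    """Read keys from an env var holding a single key or a JSON list of keys."""
    raw = os.environ.get(env_var, "").strip()
    if not raw:
        return []
    if raw.startswith("["):
        return [str(k) for k in json.loads(raw) if k]
    return [raw]

# Demo with a separate variable so the real API_KEYS is left alone
os.environ["DEMO_API_KEYS"] = '["key1", "key2", "key3"]'
key_cycle = itertools.cycle(load_api_keys("DEMO_API_KEYS"))  # simple round-robin
demo_rotation = [next(key_cycle) for _ in range(4)]
print(demo_rotation)  # → ['key1', 'key2', 'key3', 'key1']
```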

## 8. Real Inference with New Models

Now let's actually call the LLM APIs.

In [None]:
# Run real inference (will call API)
# WARNING: This will use API credits!

!llmrouter infer \
  --router smallest_llm \
  --config my_custom_config.yaml \
  --query "What is 2+2?" \
  --max-tokens 50 \
  --verbose
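Before running real inference, it helps to estimate the spend. The entries above store a single `cost` per 1K tokens. This sketch assumes that price applies to prompt and completion tokens alike (real providers often price them separately) and that `--max-tokens` caps the generated tokens. The 10-token prompt size is a rough guess.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens, cost_per_1k):
    """Rough cost of one call: total tokens / 1000 * price per 1K tokens."""
    return (prompt_tokens + completion_tokens) / 1000 * cost_per_1k

# Worst case for the call above: ~10 prompt tokens plus up to 50 generated tokens
for name, cost in {"gpt-4-turbo": 0.01, "claude-3-opus": 0.015, "llama-3-70b": 0.0005}.items():
    print(f"{name}: ~${estimate_cost_usd(10, 50, cost):.5f}")
```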

## 9. Advanced: Model Metadata

You can store custom metadata fields alongside the standard ones. Custom routers can read these fields to make better routing decisions.

In [None]:
# Example with extended metadata
advanced_llm_data = {
    "specialized-coder": {
        "model": "provider/code-model",
        "size": "13B",
        "cost": 0.002,
        "description": "Specialized in code generation and debugging",
        
        # Custom metadata
        "capabilities": ["code", "debugging", "algorithms"],
        "languages": ["python", "javascript", "java", "c++"],
        "context_length": 8192,
        "latency_ms": 500,
        "strengths": "code generation",
        "weaknesses": "creative writing",
    },
    "general-assistant": {
        "model": "provider/general-model",
        "size": "70B",
        "cost": 0.005,
        "description": "General-purpose assistant with broad knowledge",
        
        "capabilities": ["qa", "writing", "analysis", "reasoning"],
        "context_length": 32768,
        "latency_ms": 1200,
        "strengths": "reasoning and analysis",
        "weaknesses": "highly specialized tasks",
    }
}

# This metadata can be used by custom routers!
print(json.dumps(advanced_llm_data, indent=2))
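As one example of how a custom router might use these fields, the hypothetical rule below picks the cheapest model whose `capabilities` list contains the requested tag. It is not an LLMRouter API. The pool is a trimmed copy of the two entries above, so the block runs on its own.

```python
def route_by_capability(pool, capability):
    """Return the cheapest model advertising `capability`, or None if none do."""
    candidates = [
        (info.get("cost", float("inf")), name)
        for name, info in pool.items()
        if capability in info.get("capabilities", [])
    ]
    return min(candidates)[1] if candidates else None

# Trimmed copy of advanced_llm_data (capabilities and cost only)
pool = {
    "specialized-coder": {"cost": 0.002, "capabilities": ["code", "debugging", "algorithms"]},
    "general-assistant": {"cost": 0.005, "capabilities": ["qa", "writing", "analysis", "reasoning"]},
}
print(route_by_capability(pool, "code"))       # → specialized-coder
print(route_by_capability(pool, "reasoning"))  # → general-assistant
print(route_by_capability(pool, "vision"))     # → None
```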

## 10. Best Practices

### Model Selection Criteria

When adding models, consider:

1. **Diversity**: Include models with different strengths
   - Small fast models (e.g., 7B)
   - Large capable models (e.g., 70B)
   - Specialized models (code, math, etc.)

2. **Cost Range**: Mix of expensive and cheap models
   - Budget-friendly: < $0.001/1K tokens
   - Mid-range: $0.001-$0.01/1K tokens  
   - Premium: > $0.01/1K tokens

3. **Latency**: Balance speed and quality
   - Fast: < 500ms
   - Medium: 500-2000ms
   - Slow: > 2000ms
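To see how well a pool covers these ranges, you can bucket each model. The helpers below are hypothetical and encode the thresholds above. The listed ranges share their endpoints, so here $0.01 counts as mid-range and 500 ms as medium.

```python
from collections import Counter

def cost_tier(cost_per_1k):
    """Bucket a price per 1K tokens: budget < 0.001 <= mid-range <= 0.01 < premium."""
    if cost_per_1k < 0.001:
        return "budget"
    if cost_per_1k <= 0.01:
        return "mid-range"
    return "premium"

def latency_tier(latency_ms):
    """Bucket latency: fast < 500 <= medium <= 2000 < slow."""
    if latency_ms < 500:
        return "fast"
    if latency_ms <= 2000:
        return "medium"
    return "slow"

# Costs of the three models added in section 2
pool_costs = {"gpt-4-turbo": 0.01, "claude-3-opus": 0.015, "llama-3-70b": 0.0005}
print(Counter(cost_tier(c) for c in pool_costs.values()))
```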

### Description Guidelines

Good descriptions help routers learn model characteristics:

```
✅ Good: "GPT-4 Turbo excels at complex reasoning, code generation, 
         and detailed analysis. Best for tasks requiring deep understanding."

❌ Bad: "GPT-4 is a model."
```

Include:
- Main strengths
- Typical use cases
- Special capabilities
- Known limitations
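One way to make sure every description covers these points is to build it from structured parts. The template below is only an illustration (the `build_description` helper is hypothetical). Any wording that states strengths, use cases, capabilities, and limitations works.

```python
def build_description(name, strengths, use_cases, capabilities=(), limitations=()):
    """Compose a description from strengths, use cases, capabilities, and limitations."""
    parts = [f"{name} excels at {', '.join(strengths)}."]
    parts.append(f"Best for {', '.join(use_cases)}.")
    if capabilities:
        parts.append(f"Special capabilities: {', '.join(capabilities)}.")
    if limitations:
        parts.append(f"Known limitations: {', '.join(limitations)}.")
    return " ".join(parts)

desc = build_description(
    "specialized-coder",
    strengths=["code generation", "debugging"],
    use_cases=["programming tasks"],
    limitations=["creative writing"],
)
print(desc)
```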

## Summary

### What You Learned:
- ✅ LLM candidates JSON format
- ✅ Adding new models manually
- ✅ Generating model embeddings
- ✅ Creating custom configurations
- ✅ API integration
- ✅ Best practices for model selection

### Key Files Created:
1. `data/example_data/llm_candidates/my_custom_llm.json` - New LLM pool
2. `my_custom_config.yaml` - Configuration using new models

### Next Steps:
- **[10_Creating_Custom_Datasets.ipynb](10_Creating_Custom_Datasets.ipynb)** - Create training data
- **[03_Training_Single_Round_Routers.ipynb](03_Training_Single_Round_Routers.ipynb)** - Train with new models
- **[11_Advanced_Customization.ipynb](11_Advanced_Customization.ipynb)** - Advanced techniques