## **Nugen Intelligence**
<img src="https://nugen.in/logo.png" alt="Nugen Logo" width="200"/>

Domain-aligned foundational models at industry-leading speeds, with zero data retention. To learn more, visit [Nugen](https://docs.nugen.in/introduction).


## **The Complete Guide to LLM Routing: Concepts and Implementation**

## **Part 1: Understanding LLM Routing - Core Concepts**

**What is LLM Routing?**

Think of LLM routing like a skilled receptionist at a hospital. When a patient comes in, the receptionist needs to decide whether to send them to the general physician, the cardiologist, or another specialist. Similarly, LLM routing is the process of directing different types of questions or tasks to the most appropriate AI model.

**Why Do We Need LLM Routing?**

Imagine you have several AI models, each trained for different purposes:

- One might be excellent at creative writing
- Another might specialize in legal matters
- A third might excel at programming tasks

Just as you wouldn't want a dermatologist performing heart surgery, you wouldn't want a creative writing model handling legal questions. LLM routing helps ensure each query gets handled by the most qualified model.
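To make the receptionist idea concrete, here is a deliberately simplified sketch. The three "models" are hypothetical stand-in functions and the keyword lists are invented for illustration. The implementation in Part 2 replaces the keyword matching with an LLM that picks the route.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for real models: each "specialist" is just a function.
def creative_model(query: str) -> str:
    return f"[creative] {query}"

def legal_model(query: str) -> str:
    return f"[legal] {query}"

def coding_model(query: str) -> str:
    return f"[coding] {query}"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "creative": creative_model,
    "legal": legal_model,
    "coding": coding_model,
}

# Illustrative keyword lists; a real router would use an LLM or a classifier.
KEYWORDS = {
    "legal": ("legal", "court", "statute", "lawsuit"),
    "coding": ("program", "function", "bug", "code"),
}

def toy_route(query: str) -> str:
    """Return the name of the first specialist whose keywords appear in the query."""
    lowered = query.lower()
    for name, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return name
    return "creative"  # default when nothing matches

def answer(query: str) -> str:
    """Send the query to whichever specialist the toy router picked."""
    return SPECIALISTS[toy_route(query)](query)

print(answer("Write a program to check if a number is prime."))
print(answer("What are my legal rights after an accident?"))
```

Keyword matching breaks down quickly on real queries, which is the motivation for letting a model make the routing decision instead.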

## **Part 2: Code Implementation Explained**

**Imports and Setup**

In [1]:
from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Dict
import requests
import json

### **1. Basic LLM Communication (run_nugen_llm)**

In [2]:
def run_nugen_llm(user_prompt: str, model: str, api_token: str, system_prompt: str = None):
    """
    Makes a request to the Nugen API endpoint.
    """
    url = "https://api.nugen.in/inference/completions"
    
    # Combine system prompt and user prompt if system prompt exists
    final_prompt = f"{system_prompt}\n\n{user_prompt}" if system_prompt else user_prompt
    
    payload = {
        "max_tokens": 1000,  # integer token cap (was the string "1000")
        "model": model,
        "prompt": final_prompt,
        "temperature": 0.1
    }
    
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    
    try:
        # A timeout keeps the call from hanging indefinitely on a stalled connection
        response = requests.post(url, json=payload, headers=headers, timeout=30)
        response.raise_for_status()
        response_data = response.json()
        
        # Extract text from Nugen response format
        if 'choices' in response_data and len(response_data['choices']) > 0:
            return response_data['choices'][0]['text']
        return "Error: No valid response content"
        
    except Exception as e:
        print(f"Error in run_nugen_llm: {str(e)}")
        return f"Error: {str(e)}"


This function handles the basic communication with the AI model. Think of it as the telephone system that lets you talk to any doctor in our hospital analogy. It:

- Combines any system instructions with the user's question
- Sends the request to the AI service
- Handles the response and any potential errors

**Key Features:**

- Temperature set to 0.1 for consistent responses
- Max tokens limited to 1000 for controlled response length
- Basic error handling with informative messages
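To inspect what gets sent without making a network call, the prompt-combining step can be pulled into a small helper. This is a sketch only: `build_payload` is a hypothetical name that does not appear in the notebook, and the field values simply repeat the settings listed above.

```python
from typing import Optional

def build_payload(user_prompt: str, model: str, system_prompt: Optional[str] = None) -> dict:
    """Assemble a request body like the one in run_nugen_llm, without calling the API."""
    # The system prompt, when present, is prepended and separated by a blank line
    final_prompt = f"{system_prompt}\n\n{user_prompt}" if system_prompt else user_prompt
    return {
        "max_tokens": 1000,   # response length cap
        "model": model,
        "prompt": final_prompt,
        "temperature": 0.1,   # low temperature for consistent output
    }

payload = build_payload(
    "Summarize this clause.",
    "nugen-flash-instruct",
    system_prompt="You are concise.",
)
print(payload["prompt"])
```

Because the endpoint takes a single `prompt` string, the system prompt is not a separate field. It is simply prepended to the user's text.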

### **2. JSON Response Handling (JSON_nugen_llm)**

In [3]:
def JSON_nugen_llm(user_prompt: str, schema: type[BaseModel], api_token: str,
                   system_prompt: str = None, default_route: str = "nugen-flash-instruct"):
    """
    Makes a request to Nugen API expecting JSON response.

    Returns the parsed, schema-validated response as a dict. If parsing or
    validation fails, returns a fallback decision pointing at default_route.
    """
    # Add explicit instruction for JSON response
    json_instruction = (
        "IMPORTANT: Your response must be a valid JSON object matching this schema:\n"
        f"{json.dumps(schema.model_json_schema())}\n\n"
        "Respond ONLY with the JSON object, no other text."
    )

    # Combine prompts, skipping the system prompt when it is not provided
    parts = [json_instruction, system_prompt, user_prompt]
    final_prompt = "\n\n".join(p for p in parts if p)

    try:
        # Use nugen-flash-instruct for routing decisions
        response = run_nugen_llm(
            user_prompt=final_prompt,
            model="nugen-flash-instruct",
            api_token=api_token
        )

        # First try to parse the response as is
        try:
            parsed = json.loads(response)
        except json.JSONDecodeError:
            # If that fails, try to find a JSON-like structure in the text
            start_idx = response.find('{')
            end_idx = response.rfind('}') + 1
            if start_idx == -1 or end_idx <= start_idx:
                raise ValueError("No valid JSON found in response")
            parsed = json.loads(response[start_idx:end_idx])

        # Validate against the Pydantic schema so missing or extra fields are caught
        return schema.model_validate(parsed).model_dump()

    except (ValidationError, ValueError) as e:
        # json.JSONDecodeError is a subclass of ValueError, so it is covered here
        print(f"Error in JSON_nugen_llm: {e}")
        # Return a default route to prevent crashes
        return {
            "route": default_route,
            "reason": "Error in processing route selection"
        }

This function is like a translator that ensures all communication follows a specific format. It:

- Tells the AI exactly what format to use (through schema)
- Tries to find valid JSON even if the response isn't perfect
- Provides fallback options if things go wrong

**Important Features:**

- Uses Pydantic for schema validation
- Includes multiple JSON extraction attempts
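The extraction fallback can be tried on its own, without the API. The sketch below isolates that logic in a hypothetical helper named `extract_json` (not part of the notebook) and feeds it a clean response and a "chatty" one where the model wrapped the JSON in extra text.

```python
import json

def extract_json(text: str) -> dict:
    """Parse text as JSON, falling back to the outermost {...} span."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Take everything from the first '{' to the last '}'
        start = text.find("{")
        end = text.rfind("}") + 1
        if start != -1 and end > start:
            return json.loads(text[start:end])
        raise ValueError("No valid JSON found in response")

clean = '{"route": "nugen-flash-instruct", "reason": "general task"}'
chatty = "Sure! Here is the answer:\n" + clean + "\nHope that helps."

# Both inputs should yield the same dictionary
print(extract_json(clean) == extract_json(chatty))
```

This approach assumes the response contains a single JSON object. If the model emits two separate objects, the span from the first `{` to the last `}` will not parse, and the caller has to handle the resulting error.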


### **3. The Router Workflow (router_workflow)**

In [4]:
def router_workflow(input_query: str, routes: Dict[str, str], api_token: str) -> str:
    """
    Router workflow for Nugen API.
    """
    ROUTER_PROMPT = """Given a user prompt/query: {user_query}, select the best option out of the following routes:
    {routes}. Answer only in JSON format."""
    
    # Create schema for route selection
    class Schema(BaseModel):
        # Reject any fields other than route and reason (Pydantic v2 style config)
        model_config = {"extra": "forbid"}

        route: str = Field(..., description="The selected model route")
        reason: str = Field(
            ...,
            description="Short one-liner explanation why this route was selected for the task in the prompt/query."
        )
    
    # Call LLM to select route
    selected_route = JSON_nugen_llm(
        user_prompt=ROUTER_PROMPT.format(user_query=input_query, routes=routes),
        schema=Schema,
        api_token=api_token
    )
    
    # Guard against a route name that is not in the table: fall back to the first route
    if selected_route["route"] not in routes:
        selected_route = {
            "route": next(iter(routes)),
            "reason": "Router returned an unknown route; using the default"
        }
    
    print(f"Selected route: {selected_route['route']}\nReason: {selected_route['reason']}\n")
    
    # Use selected model for the actual response
    response = run_nugen_llm(
        user_prompt=input_query,
        model=selected_route["route"],
        api_token=api_token
    )
    
    print(f"Response: {response}\n")
    return response


This is the main orchestrator, like our hospital's intake system. It:

- Takes in the user's query
- Consults the routing model to decide which specialist (model) to use
- Sends the query to the chosen model
- Returns the response to the user
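The four steps above can be exercised offline by swapping the API call for a stub. In this sketch, `route_and_answer`, `fake_llm`, and the `model-a`/`model-b` names are all hypothetical. The point is only to show the two-call shape of the workflow: one call to choose, one call to answer.

```python
import json
from typing import Callable, Dict

def route_and_answer(
    query: str,
    routes: Dict[str, str],
    call_llm: Callable[[str, str], str],
    router_model: str,
) -> Dict[str, str]:
    """Two-step flow: ask the router model for a route, then query the chosen model."""
    router_prompt = (
        f"Given a user prompt/query: {query}, select the best option out of "
        f"the following routes:\n{routes}. Answer only in JSON format."
    )
    # Call 1: the router model returns a JSON decision
    decision = json.loads(call_llm(router_prompt, router_model))
    # Call 2: the original query goes to the selected model
    answer = call_llm(query, decision["route"])
    return {"route": decision["route"], "reason": decision["reason"], "response": answer}

# Hypothetical stub standing in for the real API so the flow runs offline.
def fake_llm(prompt: str, model: str) -> str:
    if prompt.startswith("Given a user prompt/query"):
        return json.dumps({"route": "model-b", "reason": "stubbed decision"})
    return f"{model} answered: {prompt}"

routes = {"model-a": "General tasks", "model-b": "Legal questions"}
result = route_and_answer("What is a tort?", routes, fake_llm, router_model="model-a")
print(result)
```

Passing the LLM call in as a parameter is a design choice made for this sketch. It keeps the routing logic testable without an API token.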

### **The Routing Process Step by Step**

In [5]:
if __name__ == "__main__":
    import os

    # Model routes
    model_routes = {
        "nugen-flash-instruct": "General purpose model for various tasks, best for creative and general queries",
        "llama-v3p2-3b-instruct": "Specialized model for legal questions and questions related to fraud, acts passed by courts, judgments, statutes, and situations where legal action is involved"
    }

    # Example prompts
    prompt_list = [
        "Write a program to check if a number is prime.",
        "I had an accident. What are my legal rights?",
    ]

    # Your API token, read from the environment rather than hardcoded in the notebook.
    # NUGEN_API_TOKEN is an arbitrary variable name chosen for this example.
    api_token = os.environ["NUGEN_API_TOKEN"]

    # Process each prompt
    for i, prompt in enumerate(prompt_list, 1):
        print(f"\nTask {i}: {prompt}")
        print("=" * 40)
        router_workflow(prompt, model_routes, api_token)


Task 1: Write a program to check if a number is prime.
Selected route: nugen-flash-instruct
Reason: The prompt is asking for a general programming task, which is more suitable for a general-purpose model.

Response:  A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.

## Step 1: Define the Problem and the Approach
We need to write a program that checks if a given number is prime. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. We will approach this by creating a function that takes an integer as input and returns a boolean value indicating whether the number is prime or not.

## Step 2: Plan the Algorithm
The algorithm will work as follows:
- If the number is less than or equal to 1, it is not prime.
- Check if the number has any divisors other than 1 and itself by iterating from 2 to the square root of the number.
- If any divisor is found, the number is not prime.
- If no d

**Writing Effective Routes**

When defining model routes:

- Be specific about each model's strengths
- Include clear examples of appropriate use cases
- Define boundaries between models clearly
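As an illustration of these three guidelines, the route table might be written as below. The wording of the descriptions is an example, not a tested recommendation, and `format_routes` is a hypothetical helper for rendering the table into a router prompt one route per line.

```python
from typing import Dict

# Illustrative descriptions: each states a strength, an example query, and a boundary.
routes: Dict[str, str] = {
    "nugen-flash-instruct": (
        "General-purpose model. Use for programming, creative writing, and "
        "everyday questions, e.g. 'Write a program to check if a number is prime.' "
        "Do not use for questions about laws or legal rights."
    ),
    "llama-v3p2-3b-instruct": (
        "Legal specialist. Use for statutes, court judgments, fraud, and legal "
        "rights, e.g. 'I had an accident, what are my legal rights?' "
        "Do not use for coding or creative tasks."
    ),
}

def format_routes(routes: Dict[str, str]) -> str:
    """Render routes as a bulleted list, one route per line, for a router prompt."""
    return "\n".join(f"- {name}: {description}" for name, description in routes.items())

print(format_routes(routes))
```

Listing one route per line, with the model name first, makes it easier for the router model to copy the name back exactly.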

**Handling Different Query Types**

The system automatically handles:

- Programming questions → general-purpose model
- Legal queries → legal specialist model
- Unclear cases → falls back to general-purpose model
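The fallback for unclear cases is worth making explicit, because a router model can return a name that is misspelled, differently cased, or not in the table at all. The following is a minimal sketch of one way to normalize the selection. `resolve_route` and the two model names are hypothetical.

```python
from typing import Dict

def resolve_route(selected: str, routes: Dict[str, str], default: str) -> str:
    """Return a valid route name, falling back to `default` for unknown selections."""
    # Remove surrounding whitespace and stray quote characters
    candidate = selected.strip().strip("\"'")
    if candidate in routes:
        return candidate
    # Tolerate case differences in the router's output
    by_lowercase = {name.lower(): name for name in routes}
    return by_lowercase.get(candidate.lower(), default)

routes = {
    "general-model": "General purpose tasks",
    "legal-model": "Legal questions",
}

print(resolve_route(" Legal-Model ", routes, default="general-model"))
print(resolve_route("medical-model", routes, default="general-model"))
```

Falling back silently keeps the pipeline running, but logging each fallback is advisable so that frequent misses show up as a signal to rewrite the route descriptions.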

**Conclusion**

LLM routing is a powerful technique for getting the most out of multiple AI models. By understanding both the conceptual framework and the technical implementation, you can create robust systems that direct queries to the most appropriate model for the task at hand.