# LLM Routing

RedisVL provides an `LLMRouter` that uses semantic similarity to route queries to the most appropriate LLM model tier. Instead of sending every request to your most expensive model, the router matches incoming queries against reference phrases to determine the right level of model capability.

**The problem**: Applications default to using the most capable (and expensive) LLM for all queries, even when a simpler model would do just fine:
- *"Hello, how are you?"* does **not** need Claude Opus 4.5 ($5/M input tokens)
- *"Hello, how are you?"* is perfectly handled by GPT-4.1 Nano ($0.10/M input tokens)

The `LLMRouter` solves this by classifying queries into **model tiers** (e.g., simple, standard, expert) using Redis vector search over a set of reference phrases that define each tier's "semantic surface area."

This notebook walks through every aspect of the LLM Router:
1. Quick start with pretrained config
2. Routing queries across tiers
3. Defining custom tiers
4. Cost-optimized routing
5. Multi-match routing and aggregation methods
6. Dynamic tier management
7. Persistence and serialization
8. Async usage
9. LiteLLM integration

In [1]:
import os
import warnings

os.environ["TOKENIZERS_PARALLELISM"] = "false"
warnings.filterwarnings("ignore", message=".*IProgress.*")

## Quick Start with a Pretrained Config

The fastest way to get started is with the built-in `"default"` pretrained configuration. It ships with **3 model tiers** and **pre-computed embeddings** (from `sentence-transformers/all-mpnet-base-v2`), so it loads instantly without needing to embed anything.

The three tiers are grounded in [Bloom's Taxonomy](https://en.wikipedia.org/wiki/Bloom%27s_taxonomy) of cognitive complexity:

| Tier | Bloom's Level | Model | Cost (input) | Example Tasks |
|------|--------------|-------|-------------|---------------|
| **simple** | Remember / Understand | `openai/gpt-4.1-nano` | $0.10/M | Greetings, factual QA, format conversion |
| **standard** | Apply / Analyze | `anthropic/claude-sonnet-4-5` | $3.00/M | Code explanation, summarization, analysis |
| **expert** | Evaluate / Create | `anthropic/claude-opus-4-5` | $5.00/M | Research, system architecture, formal proofs |

In [2]:
from redisvl.extensions.llm_router import LLMRouter

router = LLMRouter.from_pretrained(
    "default",
    redis_url="redis://localhost:6379",
)

In [3]:
# Inspect what was loaded
print("Router name:", router.name)
print("Tier count:", len(router.tiers))
print()

for tier in router.tiers:
    print(f"--- {tier.name} ---")
    print(f"  Model:      {tier.model}")
    print(f"  References: {len(tier.references)} phrases")
    print(f"  Threshold:  {tier.distance_threshold}")
    print(f"  Cost (in):  ${tier.metadata.get('cost_per_1k_input', 'N/A')}/1k tokens")
    print(f"  Bloom's:    {tier.metadata.get('blooms_taxonomy', [])}")
    print()

Router name: llm-router-default
Tier count: 3

--- simple ---
  Model:      openai/gpt-4.1-nano
  References: 18 phrases
  Threshold:  0.5
  Cost (in):  $0.0001/1k tokens
  Bloom's:    ['Remember', 'Understand']

--- standard ---
  Model:      anthropic/claude-sonnet-4-5
  References: 18 phrases
  Threshold:  0.6
  Cost (in):  $0.003/1k tokens
  Bloom's:    ['Apply', 'Analyze']

--- expert ---
  Model:      anthropic/claude-opus-4-5
  References: 18 phrases
  Threshold:  0.7
  Cost (in):  $0.005/1k tokens
  Bloom's:    ['Evaluate', 'Create']



In [4]:
# View the underlying Redis index
!rvl index info -i llm-router-default



Index Information:
╭────────────────────────┬────────────────────────┬────────────────────────┬────────────────────────┬────────────────────────┬╮
│ Index Name             │ Storage Type           │ Prefixes               │ Index Options          │ Indexing               │
├────────────────────────┼────────────────────────┼────────────────────────┼────────────────────────┼────────────────────────┼┤
| llm-router-default     | HASH                   | ['llm-router-default'] | []                     | 0                      |
╰────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴╯
Index Fields:
╭─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬╮
│ Name            │ Attribute       │ Type            │ Field Option    │ Option Value    │ Field Option    │ Op

## Routing Queries

The `route()` method takes a text query, embeds it, and finds the best matching tier. It returns an `LLMRouteMatch` with the tier name, model string, distance, confidence, and any alternative matches.

In [5]:
# A simple greeting -> routes to the 'simple' tier
match = router.route("hi, how are you doing today?")
match

LLMRouteMatch(tier='simple', model='openai/gpt-4.1-nano', distance=0.382999330759, confidence=0.8085003346205, alternatives=[], metadata={'provider': 'openai', 'cost_per_1k_input': 0.0001, 'cost_per_1k_output': 0.0004, 'blooms_taxonomy': ['Remember', 'Understand']})

In [6]:
print(f"Tier:        {match.tier}")
print(f"Model:       {match.model}")
print(f"Distance:    {match.distance:.4f}")
print(f"Confidence:  {match.confidence:.4f}")
print(f"Alternatives: {match.alternatives}")

Tier:        simple
Model:       openai/gpt-4.1-nano
Distance:    0.3830
Confidence:  0.8085
Alternatives: []


The `distance` is the cosine distance (0-2, lower is closer). The `confidence` is derived as `1 - distance/2`, giving a 0-1 score where higher is better.

The `alternatives` field shows other tiers that also matched, along with their distances. This is useful for understanding how close the decision was.
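
The distance-to-confidence mapping described above is easy to check by hand. The following is a minimal sketch in plain Python; `confidence_from_distance` is a hypothetical helper written for this illustration and is not part of RedisVL.

```python
def confidence_from_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a confidence score in [0, 1]."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be between 0 and 2")
    return 1.0 - distance / 2.0


# Distance reported for the greeting query above
greeting_confidence = confidence_from_distance(0.382999330759)
print(f"{greeting_confidence:.4f}")  # prints 0.8085
```

A distance of 0 maps to a confidence of 1, and the maximum cosine distance of 2 maps to 0.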

Let's try more queries across the complexity spectrum:

In [7]:
# Simple tier queries
simple_queries = [
    "what is the capital of France?",
    "thanks for your help!",
    "translate hello to Spanish",
]

print("=== Simple Tier Queries ===")
for q in simple_queries:
    m = router.route(q)
    print(f"  '{q}'")
    print(f"    -> {m.tier} ({m.model}) distance={m.distance:.4f}")

=== Simple Tier Queries ===
  'what is the capital of France?'
    -> simple (openai/gpt-4.1-nano) distance=0.1417
  'thanks for your help!'
    -> simple (openai/gpt-4.1-nano) distance=0.1270
  'translate hello to Spanish'
    -> simple (openai/gpt-4.1-nano) distance=0.2537


In [8]:
# Standard tier queries
standard_queries = [
    "explain how garbage collection works in Java",
    "write unit tests for this Python class",
    "compare and contrast microservices vs monolith architectures",
]

print("=== Standard Tier Queries ===")
for q in standard_queries:
    m = router.route(q)
    print(f"  '{q}'")
    print(f"    -> {m.tier} ({m.model}) distance={m.distance:.4f}")

=== Standard Tier Queries ===
  'explain how garbage collection works in Java'
    -> standard (anthropic/claude-sonnet-4-5) distance=0.0000
  'write unit tests for this Python class'
    -> standard (anthropic/claude-sonnet-4-5) distance=0.4141
  'compare and contrast microservices vs monolith architectures'
    -> standard (anthropic/claude-sonnet-4-5) distance=0.1792


In [9]:
# Expert tier queries
expert_queries = [
    "architect a fault-tolerant distributed database replication system",
    "prove this mathematical theorem using formal methods",
    "design a novel algorithm for NP-hard graph partitioning",
]

print("=== Expert Tier Queries ===")
for q in expert_queries:
    m = router.route(q)
    print(f"  '{q}'")
    print(f"    -> {m.tier} ({m.model}) distance={m.distance:.4f}")

=== Expert Tier Queries ===
  'architect a fault-tolerant distributed database replication system'
    -> expert (anthropic/claude-opus-4-5) distance=0.2583
  'prove this mathematical theorem using formal methods'
    -> expert (anthropic/claude-opus-4-5) distance=0.2947
  'design a novel algorithm for NP-hard graph partitioning'
    -> expert (anthropic/claude-opus-4-5) distance=0.2461


In [10]:
# A query that doesn't match any tier returns an empty match
match = router.route("xyzzy plugh random gibberish 12345 asdf")
print(f"No match: tier={match.tier}, model={match.model}")
print(f"Bool check: {bool(match)}")

No match: tier=None, model=None
Bool check: False


In [11]:
# Clean up
router.delete()

## Defining Custom Tiers

For production use, you'll want to define tiers tailored to your application. Each `ModelTier` specifies:

- **`name`**: Unique identifier (e.g., `"simple"`, `"coding"`, `"research"`)
- **`model`**: A [LiteLLM-compatible](https://docs.litellm.ai/docs/providers) model string (e.g., `"anthropic/claude-sonnet-4-5"`)
- **`references`**: Example phrases that define this tier's semantic surface area. More references generally give broader coverage.
- **`metadata`**: Arbitrary dict for costs, capabilities, provider info, etc.
- **`distance_threshold`**: Maximum cosine distance for matching (Redis COSINE: 0-2). Lower values require stricter matching.

The quality of your reference phrases is the most important factor in routing accuracy. They should be representative of the *kinds* of queries you expect for each tier.

In [12]:
from redisvl.extensions.llm_router import LLMRouter, ModelTier

tiers = [
    ModelTier(
        name="simple",
        model="openai/gpt-4.1-nano",
        references=[
            "hello",
            "hi there",
            "thanks",
            "goodbye",
            "what time is it?",
            "how are you?",
            "yes",
            "no",
            "ok thanks",
        ],
        metadata={
            "provider": "openai",
            "cost_per_1k_input": 0.0001,
            "cost_per_1k_output": 0.0004,
        },
        distance_threshold=0.5,
    ),
    ModelTier(
        name="reasoning",
        model="anthropic/claude-sonnet-4-5",
        references=[
            "analyze this code for bugs",
            "explain how neural networks learn",
            "compare and contrast these approaches",
            "write a detailed blog post about",
            "debug this issue step by step",
            "summarize this research paper",
            "write unit tests for this class",
            "refactor this code for readability",
        ],
        metadata={
            "provider": "anthropic",
            "cost_per_1k_input": 0.003,
            "cost_per_1k_output": 0.015,
        },
        distance_threshold=0.6,
    ),
    ModelTier(
        name="expert",
        model="anthropic/claude-opus-4-5",
        references=[
            "prove this mathematical theorem",
            "architect a distributed system for millions of users",
            "write a research paper analyzing",
            "review this legal contract for issues",
            "design a novel algorithm for",
            "create a comprehensive security audit report",
        ],
        metadata={
            "provider": "anthropic",
            "cost_per_1k_input": 0.005,
            "cost_per_1k_output": 0.025,
        },
        distance_threshold=0.7,
    ),
]

router = LLMRouter(
    name="custom-router",
    tiers=tiers,
    redis_url="redis://localhost:6379",
    overwrite=True,
)

In [13]:
print("Tier names:", router.tier_names)
print("Thresholds:", router.tier_thresholds)
print("Default tier:", router.default_tier)

Tier names: ['simple', 'reasoning', 'expert']
Thresholds: {'simple': 0.5, 'reasoning': 0.6, 'expert': 0.7}
Default tier: None


In [14]:
# Verify routing works
match = router.route("hello, how are you?")
print(f"'{match.tier}' -> {match.model}")

match = router.route("analyze this code for bugs and security issues")
print(f"'{match.tier}' -> {match.model}")

match = router.route("design a fault-tolerant consensus protocol")
print(f"'{match.tier}' -> {match.model}")

'simple' -> openai/gpt-4.1-nano
'reasoning' -> anthropic/claude-sonnet-4-5
'expert' -> anthropic/claude-opus-4-5


### Using Pre-Computed Vectors

If you've already embedded the query (e.g., from an upstream pipeline), you can pass the vector directly to avoid double-embedding:

In [15]:
vector = router.vectorizer.embed("hello")
match = router.route(vector=vector)
print(f"Pre-computed vector route: {match.tier} ({match.model})")

Pre-computed vector route: simple (openai/gpt-4.1-nano)


## Cost-Optimized Routing

When multiple tiers have similar semantic distances, cost optimization adds a penalty proportional to the model's cost. This biases the router toward cheaper tiers when the semantic match is close.

The formula is:
```
adjusted_distance = distance + (cost_per_1k_input * cost_weight)
```

The `cost_weight` parameter (0-1) controls how much cost influences the decision. Default is 0.1.
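
To get a feel for the size of this penalty, the formula can be applied by hand. The sketch below is plain Python rather than RedisVL code: `adjusted_distance` is a hypothetical helper, the costs are the `cost_per_1k_input` values from the custom tiers defined earlier, and the raw distances are made-up numbers chosen to show a near tie.

```python
def adjusted_distance(distance: float, cost_per_1k_input: float, cost_weight: float = 0.1) -> float:
    """Apply the cost penalty from the formula above."""
    return distance + cost_per_1k_input * cost_weight


# cost_per_1k_input values from the custom tiers defined earlier
costs = {"reasoning": 0.003, "expert": 0.005}
# Hypothetical raw distances for a single query (illustrative only)
raw = {"reasoning": 0.5500, "expert": 0.5497}

adjusted = {
    tier: adjusted_distance(raw[tier], costs[tier], cost_weight=0.3)
    for tier in raw
}

for tier in raw:
    penalty = adjusted[tier] - raw[tier]
    print(f"{tier}: raw={raw[tier]:.4f} penalty={penalty:.4f} adjusted={adjusted[tier]:.4f}")

print("Closest by raw distance:     ", min(raw, key=raw.get))
print("Closest by adjusted distance:", min(adjusted, key=adjusted.get))
```

With per-1k costs this small, the penalty at `cost_weight=0.3` is 0.0009 for the `reasoning` tier and 0.0015 for the `expert` tier, so under this formula it only changes the outcome when two tiers are nearly tied.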

In [16]:
from redisvl.extensions.llm_router.schema import RoutingConfig

cost_router = LLMRouter(
    name="cost-router",
    tiers=tiers,
    routing_config=RoutingConfig(
        cost_optimization=True,
        cost_weight=0.3,  # Higher weight = stronger cost preference
    ),
    redis_url="redis://localhost:6379",
    overwrite=True,
)

# Compare results with and without cost optimization
query = "help me understand this code"

match_default = router.route(query)
match_cost = cost_router.route(query)

print(f"Query: '{query}'")
print(f"  Default routing:       {match_default.tier} (distance={match_default.distance:.4f})")
print(f"  Cost-optimized routing: {match_cost.tier} (distance={match_cost.distance:.4f})")

cost_router.delete()

Query: 'help me understand this code'
  Default routing:       reasoning (distance=0.5674)
  Cost-optimized routing: reasoning (distance=0.5674)


## Multi-Match Routing (`route_many`)

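Use `route_many()` to get multiple tier matches, ordered by distance. This is useful when you want to see how a query scores across tiers, or to implement fallback logic. The list can contain fewer than `max_k` entries, as in the example below, where only one tier matches.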

In [17]:
matches = router.route_many("explain machine learning concepts in detail", max_k=3)
matches

[LLMRouteMatch(tier='reasoning', model='anthropic/claude-sonnet-4-5', distance=0.524655163288, confidence=0.737672418356, alternatives=[], metadata={'provider': 'anthropic', 'cost_per_1k_input': 0.003, 'cost_per_1k_output': 0.015})]

In [18]:
for i, m in enumerate(matches):
    print(f"  #{i+1}: {m.tier} (distance={m.distance:.4f}, model={m.model})")

  #1: reasoning (distance=0.5247, model=anthropic/claude-sonnet-4-5)
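
One way to use the ordered list for fallback is to walk it until a model you can currently call turns up. This is a sketch under assumptions: `Match` is a stand-in dataclass exposing the same `tier`, `model`, and `distance` attributes shown on `LLMRouteMatch` above, and `pick_available`, `available_models`, and `default_model` are hypothetical names that are not part of RedisVL.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Match:
    """Stand-in for LLMRouteMatch with only the fields this sketch needs."""
    tier: Optional[str]
    model: Optional[str]
    distance: Optional[float]


def pick_available(matches: Iterable[Match], available_models: Set[str], default_model: str) -> str:
    """Return the first matched model that is available, else the default."""
    for m in matches:  # matches are assumed to be ordered by distance
        if m.model in available_models:
            return m.model
    return default_model


# Hypothetical matches, closest first
matches_example = [
    Match("reasoning", "anthropic/claude-sonnet-4-5", 0.52),
    Match("expert", "anthropic/claude-opus-4-5", 0.61),
]

# Pretend only the expert model is reachable right now
chosen = pick_available(matches_example, {"anthropic/claude-opus-4-5"}, "openai/gpt-4.1-nano")
print(chosen)  # prints anthropic/claude-opus-4-5
```

In a real application the list passed in would come from `router.route_many(...)`, and the default would be whatever model you are willing to fall back to when nothing matches.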


### Aggregation Methods

Each tier may have multiple reference phrases. The router aggregates distances across all matching references within a tier using one of three methods:

- **`avg`** (default): Average distance across all matching references
- **`min`**: Minimum distance (closest single reference match)
- **`sum`**: Sum of all distances

The `min` method is useful when you want a single strong match to be decisive:

In [19]:
from redisvl.extensions.llm_router.schema import DistanceAggregationMethod

query = "analyze this code and find potential bugs"

# Default: avg aggregation
matches_avg = router.route_many(query, max_k=3, aggregation_method=DistanceAggregationMethod.avg)
# Min aggregation
matches_min = router.route_many(query, max_k=3, aggregation_method=DistanceAggregationMethod.min)

print("AVG aggregation:")
for m in matches_avg:
    print(f"  {m.tier}: distance={m.distance:.4f}")

print("\nMIN aggregation:")
for m in matches_min:
    print(f"  {m.tier}: distance={m.distance:.4f}")

AVG aggregation:
  reasoning: distance=0.4042
  expert: distance=0.6231

MIN aggregation:
  reasoning: distance=0.0758
  expert: distance=0.5718


Note the different distances: with `min`, the distance reflects the single closest reference, while `avg` averages across all matching references in each tier.
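
The three aggregation methods can be illustrated without Redis. The sketch below uses made-up per-reference distances for two tiers and plain Python reducers; it mirrors the definitions in the list above and is not the router's actual implementation.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical distances from one query to each tier's matching references
reference_distances: Dict[str, List[float]] = {
    "reasoning": [0.08, 0.45, 0.55],
    "expert": [0.57, 0.68],
}

# One reducer per aggregation method described above
aggregators: Dict[str, Callable[[List[float]], float]] = {
    "avg": mean,
    "min": min,
    "sum": sum,
}


def aggregate(method: str) -> Dict[str, float]:
    """Collapse each tier's reference distances into a single score."""
    reducer = aggregators[method]
    return {tier: reducer(ds) for tier, ds in reference_distances.items()}


for method in aggregators:
    scores = aggregate(method)
    formatted = ", ".join(f"{tier}={score:.4f}" for tier, score in scores.items())
    print(f"{method}: {formatted}")
```

Note that `sum` grows with the number of matching references, so a tier with many moderately close references can end up with a larger aggregated distance than a tier with a single, more distant one.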

## Dynamic Tier Management

Tiers can be added, removed, and updated at runtime without recreating the router.

### Add a new tier

In [20]:
local_tier = ModelTier(
    name="local",
    model="ollama/llama3.2",
    references=["ok", "sure", "yes", "no", "got it"],
    metadata={"provider": "ollama", "cost_per_1k_input": 0},
    distance_threshold=0.3,
)
router.add_tier(local_tier)
print("Tiers after add:", router.tier_names)

Tiers after add: ['simple', 'reasoning', 'expert', 'local']


### Add references to an existing tier

If a tier's semantic coverage is too narrow, you can expand it by adding more reference phrases:

In [21]:
router.add_tier_references(
    tier_name="simple",
    references=["howdy partner", "greetings friend", "hey what's up"]
)

tier = router.get_tier("simple")
print(f"Simple tier now has {len(tier.references)} references:")
for ref in tier.references:
    print(f"  - {ref}")

Simple tier now has 12 references:
  - hello
  - hi there
  - thanks
  - goodbye
  - what time is it?
  - how are you?
  - yes
  - no
  - ok thanks
  - howdy partner
  - greetings friend
  - hey what's up


### Update a tier's distance threshold

You can tune the strictness of matching per tier. Lower thresholds require closer matches:

In [22]:
print("Before:", router.tier_thresholds)

router.update_tier_threshold("simple", 0.4)  # Stricter matching

print("After:", router.tier_thresholds)

Before: {'simple': 0.5, 'reasoning': 0.6, 'expert': 0.7, 'local': 0.3}
After: {'simple': 0.4, 'reasoning': 0.6, 'expert': 0.7, 'local': 0.3}
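
The effect of tightening a threshold can be sketched in plain Python. This is an illustration only: `best_tier_within_threshold` is a hypothetical helper, the distances are made up, and the sketch assumes a tier matches when its distance is less than or equal to its threshold (this notebook does not show how the library treats the exact boundary).

```python
from typing import Dict, Optional


def best_tier_within_threshold(
    distances: Dict[str, float], thresholds: Dict[str, float]
) -> Optional[str]:
    """Return the closest tier whose distance is within its own threshold."""
    candidates = {
        tier: dist for tier, dist in distances.items() if dist <= thresholds[tier]
    }
    if not candidates:
        return None  # nothing matched, analogous to an empty route match
    return min(candidates, key=candidates.get)


# Hypothetical aggregated distances for one query
query_distances = {"simple": 0.45, "reasoning": 0.65}

# Thresholds before and after the update shown above
before = {"simple": 0.5, "reasoning": 0.6, "expert": 0.7, "local": 0.3}
after = {**before, "simple": 0.4}

print(best_tier_within_threshold(query_distances, before))  # prints simple
print(best_tier_within_threshold(query_distances, after))   # prints None
```

A query at distance 0.45 from the `simple` tier matches under the 0.5 threshold but falls outside the stricter 0.4 threshold, so in this sketch it no longer matches any tier.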


### Remove a tier

In [23]:
router.remove_tier("local")
print("Tiers after remove:", router.tier_names)

# Reset the simple threshold back to 0.5 for the rest of the demo
router.update_tier_threshold("simple", 0.5)
print("Thresholds:", router.tier_thresholds)

Tiers after remove: ['simple', 'reasoning', 'expert']
Thresholds: {'simple': 0.5, 'reasoning': 0.6, 'expert': 0.7}


### Retrieve a tier by name

In [24]:
tier = router.get_tier("reasoning")
print(f"Tier: {tier.name}")
print(f"Model: {tier.model}")
print(f"Threshold: {tier.distance_threshold}")
print(f"References: {tier.references}")

# Non-existent tier returns None
print(f"\nNon-existent: {router.get_tier('nonexistent')}")

Tier: reasoning
Model: anthropic/claude-sonnet-4-5
Threshold: 0.6
References: ['analyze this code for bugs', 'explain how neural networks learn', 'compare and contrast these approaches', 'write a detailed blog post about', 'debug this issue step by step', 'summarize this research paper', 'write unit tests for this class', 'refactor this code for readability']

Non-existent: None


## Persistence and Serialization

Routers can be serialized and restored in several formats.

### Dictionary round-trip

In [25]:
router_dict = router.to_dict()
router_dict

{'name': 'custom-router',
 'tiers': [{'name': 'simple',
   'model': 'openai/gpt-4.1-nano',
   'references': ['hello',
    'hi there',
    'thanks',
    'goodbye',
    'what time is it?',
    'how are you?',
    'yes',
    'no',
    'ok thanks',
    'howdy partner',
    'greetings friend',
    "hey what's up"],
   'metadata': {'provider': 'openai',
    'cost_per_1k_input': 0.0001,
    'cost_per_1k_output': 0.0004},
   'distance_threshold': 0.5},
  {'name': 'reasoning',
   'model': 'anthropic/claude-sonnet-4-5',
   'references': ['analyze this code for bugs',
    'explain how neural networks learn',
    'compare and contrast these approaches',
    'write a detailed blog post about',
    'debug this issue step by step',
    'summarize this research paper',
    'write unit tests for this class',
    'refactor this code for readability'],
   'metadata': {'provider': 'anthropic',
    'cost_per_1k_input': 0.003,
    'cost_per_1k_output': 0.015},
   'distance_threshold': 0.6},
  {'name': 'expe

In [26]:
# Restore from dict (reconnects to same Redis index since name matches)
router_from_dict = LLMRouter.from_dict(
    router_dict,
    redis_url="redis://localhost:6379",
)

assert router_from_dict.to_dict() == router.to_dict()
print("Dict round-trip OK")

Dict round-trip OK


### YAML serialization

In [27]:
router.to_yaml("llm_router.yaml", overwrite=True)

router_from_yaml = LLMRouter.from_yaml(
    "llm_router.yaml",
    redis_url="redis://localhost:6379",
)

assert router_from_yaml.to_dict() == router.to_dict()
print("YAML round-trip OK")

YAML round-trip OK


### Reconnect to an existing router

If the router's Redis index still exists, you can reconnect without needing the original config. The router config is persisted in Redis alongside the index:

In [28]:
router_reconnected = LLMRouter.from_existing(
    name="custom-router",
    redis_url="redis://localhost:6379",
)

print(f"Reconnected to '{router_reconnected.name}' with tiers: {router_reconnected.tier_names}")

# Routing still works
match = router_reconnected.route("hi there, how are you?")
print(f"Route test: {match.tier} ({match.model})")

Reconnected to 'custom-router' with tiers: ['simple', 'reasoning', 'expert']
Route test: simple (openai/gpt-4.1-nano)


### Export/Import with pre-computed embeddings

For sharing router configs across environments (or loading without an embedding model), you can export with pre-computed vectors:

In [29]:
# Export: embeds all references and saves vectors alongside text
router.export_with_embeddings("my_router_pretrained.json")
print("Exported with embeddings")

Exported with embeddings


In [30]:
# Peek at the structure
import json

with open("my_router_pretrained.json") as f:
    data = json.load(f)

print(f"Config name: {data['name']}")
print(f"Vectorizer: {data['vectorizer']}")
print(f"Tiers: {len(data['tiers'])}")
print(f"First reference vector length: {len(data['tiers'][0]['references'][0]['vector'])}")

Config name: custom-router
Vectorizer: {'type': 'hf', 'model': 'sentence-transformers/all-mpnet-base-v2'}
Tiers: 3
First reference vector length: 768


In [31]:
# Import: loads pre-computed vectors directly, no embedding needed
router_imported = LLMRouter.from_pretrained(
    "my_router_pretrained.json",
    redis_url="redis://localhost:6379",
)

match = router_imported.route("hi there, how are you?")
print(f"Imported router route: {match.tier} ({match.model})")

# Note: router_imported shares the same Redis index as `router`
# (same name in the exported config), so we don't delete it separately.

Imported router route: simple (openai/gpt-4.1-nano)


## Async Usage

The `AsyncLLMRouter` provides the same functionality using async I/O. Because Python's `__init__` can't be async, instantiate it with the `create()` classmethod factory, or with `from_pretrained()` / `from_existing()`, which are also async.

In [32]:
import logging
logging.getLogger("redisvl.utils.vectorize.base").setLevel(logging.ERROR)

from redisvl.extensions.llm_router import AsyncLLMRouter, ModelTier

# Create from pretrained (async)
async_router = await AsyncLLMRouter.from_pretrained(
    "default",
    redis_url="redis://localhost:6379",
)

print(f"Async router tiers: {async_router.tier_names}")

Async router tiers: ['simple', 'standard', 'expert']


In [33]:
# Route queries (async)
match = await async_router.route("hi, how are you?")
print(f"Simple: {match.tier} ({match.model})")

match = await async_router.route("explain how neural networks learn")
print(f"Standard: {match.tier} ({match.model})")

match = await async_router.route("architect a fault-tolerant distributed system")
print(f"Expert: {match.tier} ({match.model})")

Simple: simple (openai/gpt-4.1-nano)
Standard: standard (anthropic/claude-sonnet-4-5)
Expert: expert (anthropic/claude-opus-4-5)


In [34]:
# Route many (async)
matches = await async_router.route_many("summarize this research paper", max_k=3)
for m in matches:
    print(f"  {m.tier}: distance={m.distance:.4f}")

  standard: distance=0.5297
  expert: distance=0.6355


In [35]:
# Or create with custom tiers (async)
custom_async = await AsyncLLMRouter.create(
    name="async-custom",
    tiers=[
        ModelTier(
            name="fast",
            model="openai/gpt-4o-mini",
            references=["hello", "thanks", "what is"],
            distance_threshold=0.5,
        ),
        ModelTier(
            name="smart",
            model="openai/gpt-4o",
            references=["analyze this", "explain how", "compare these"],
            distance_threshold=0.6,
        ),
    ],
    redis_url="redis://localhost:6379",
    overwrite=True,
)

match = await custom_async.route("hi there!")
print(f"Custom async: {match.tier} ({match.model})")

await custom_async.delete()

Custom async: fast (openai/gpt-4o-mini)


In [36]:
await async_router.delete()

## LiteLLM Integration

The router returns LiteLLM-compatible model strings, making integration straightforward. Here's a typical pattern:

```python
from litellm import completion
from redisvl.extensions.llm_router import LLMRouter

router = LLMRouter.from_pretrained("default", redis_url="redis://localhost:6379")

def smart_completion(query: str, **kwargs):
    """Route to the best model, then call it."""
    match = router.route(query)
    
    if not match:
        # Fallback to a default model if no tier matched
        model = "anthropic/claude-sonnet-4-5"
    else:
        model = match.model
        print(f"Routed to {match.tier} tier ({model}), confidence={match.confidence:.2f}")
    
    return completion(
        model=model,
        messages=[{"role": "user", "content": query}],
        **kwargs,
    )

# Simple query -> GPT-4.1 Nano ($0.10/M)
response = smart_completion("What is 2 + 2?")

# Complex query -> Opus 4.5 ($5/M)
response = smart_completion("Design a distributed consensus algorithm with Byzantine fault tolerance")
```

For async applications:

```python
from litellm import acompletion
from redisvl.extensions.llm_router import AsyncLLMRouter

router = await AsyncLLMRouter.from_pretrained("default", redis_url="redis://localhost:6379")

async def smart_completion(query: str, **kwargs):
    match = await router.route(query)
    model = match.model if match else "anthropic/claude-sonnet-4-5"
    return await acompletion(
        model=model,
        messages=[{"role": "user", "content": query}],
        **kwargs,
    )
```
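
Because each match carries the tier's `metadata`, the same dictionary can be used for a rough cost estimate per request. The sketch below assumes the `cost_per_1k_input` and `cost_per_1k_output` keys used in the tier definitions earlier (these are user-supplied metadata, so your own tiers may use different keys); `estimate_cost_usd` and the token counts are hypothetical.

```python
def estimate_cost_usd(metadata: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from per-1k-token prices stored in tier metadata."""
    cost_in = metadata.get("cost_per_1k_input", 0.0)
    cost_out = metadata.get("cost_per_1k_output", 0.0)
    return (input_tokens / 1000) * cost_in + (output_tokens / 1000) * cost_out


# Metadata values taken from the simple and expert tiers defined earlier
simple_meta = {"cost_per_1k_input": 0.0001, "cost_per_1k_output": 0.0004}
expert_meta = {"cost_per_1k_input": 0.005, "cost_per_1k_output": 0.025}

# Hypothetical request size: 2,000 input tokens and 500 output tokens
simple_cost = estimate_cost_usd(simple_meta, 2000, 500)
expert_cost = estimate_cost_usd(expert_meta, 2000, 500)

print(f"simple tier: ${simple_cost:.4f}")
print(f"expert tier: ${expert_cost:.4f}")
```

In practice you would pass `match.metadata` and the token counts reported by your LLM client instead of the hard-coded values.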

## Cleanup

In [37]:
router.delete()

In [38]:
# Remove temp files
import os
for f in ["llm_router.yaml", "my_router_pretrained.json"]:
    if os.path.exists(f):
        os.remove(f)