# Threshold Optimization

After setting up a `SemanticRouter` or `SemanticCache`, it's best to tune the `distance_threshold` to get the most performance out of your system. RedisVL provides helper classes to make this lightweight optimization easy.

> **Note:** Threshold optimization relies on `python > 3.9`.

## CacheThresholdOptimizer

Let's say you set up the following semantic cache with a `distance_threshold` of `X` and store the entries:

- prompt: `what is the capital of france?` response: `paris`
- prompt: `what is the capital of morocco?` response: `rabat`

In [1]:
from redisvl.extensions.cache.llm import SemanticCache
from redisvl.utils.vectorize import HFTextVectorizer

sem_cache = SemanticCache(
    name="sem_cache",                                       # underlying search index name
    redis_url="redis://localhost:6379",                     # redis connection url string
    distance_threshold=0.5,                                 # semantic cache distance threshold
    vectorizer=HFTextVectorizer("redis/langcache-embed-v1") # embedding model
)

paris_key = sem_cache.store(prompt="what is the capital of france?", response="paris")
rabat_key = sem_cache.store(prompt="what is the capital of morocco?", response="rabat")




This works well, but we want to make sure the cache only returns hits for appropriate questions. If we test the cache with a question that should not get a cached response, we see that the current `distance_threshold` is too high:

In [2]:
sem_cache.check("what's the capital of britain?")



[{'entry_id': 'c990cc06e5e77570e5f03360426d2b7f947cbb5a67daa8af8164bfe0b3e24fe3',
  'prompt': 'what is the capital of france?',
  'response': 'paris',
  'vector_distance': 0.335606634617,
  'inserted_at': 1746051375.81,
  'updated_at': 1746051375.81,
  'key': 'sem_cache:c990cc06e5e77570e5f03360426d2b7f947cbb5a67daa8af8164bfe0b3e24fe3'}]
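
The entry above is returned because its `vector_distance` (about 0.336) falls inside the configured threshold of 0.5. The following is a minimal sketch of that comparison, not RedisVL's implementation: `is_cache_hit` is a hypothetical helper, the inclusive `<=` comparison is an assumption, and 0.2 is an arbitrary tighter threshold chosen only for illustration.

```python
def is_cache_hit(vector_distance: float, distance_threshold: float) -> bool:
    """Return True when a cached entry is close enough to count as a hit.

    Hypothetical helper for illustration; assumes an inclusive comparison.
    """
    return vector_distance <= distance_threshold


# Distance reported above for "what's the capital of britain?"
# against the cached France entry.
britain_distance = 0.335606634617

hit_with_loose_threshold = is_cache_hit(britain_distance, 0.5)  # current setting
hit_with_tight_threshold = is_cache_hit(britain_distance, 0.2)  # illustrative value
print(hit_with_loose_threshold, hit_with_tight_threshold)
```

Lowering the threshold is what removes the unwanted hit; the open question is how far to lower it without losing wanted hits.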

### Define test_data and optimize

With the `CacheThresholdOptimizer` you can quickly tune the distance threshold by providing some test data in the form:

```python
[
    {
        "query": "What's the capital of Britain?",
        "query_match": ""
    },
    {
        "query": "What's the capital of France??",
        "query_match": paris_key
    },
    {
        "query": "What's the capital city of Morocco?",
        "query_match": rabat_key
    },
]
```

The threshold optimizer will then efficiently execute and score different thresholds against what is currently populated in your cache, and automatically update the cache's threshold to the best setting:

In [3]:
from redisvl.utils.optimize import CacheThresholdOptimizer

test_data = [
    {
        "query": "What's the capital of Britain?",
        "query_match": ""
    },
    {
        "query": "What's the capital of France??",
        "query_match": paris_key
    },
    {
        "query": "What's the capital city of Morocco?",
        "query_match": rabat_key
    },
]

print(f"Distance threshold before: {sem_cache.distance_threshold} \n")
optimizer = CacheThresholdOptimizer(sem_cache, test_data)
optimizer.optimize()
print(f"Distance threshold after: {sem_cache.distance_threshold} \n")

Distance threshold before: 0.5 





Distance threshold after: 0.10372881355932204 



We can also see that we no longer match on the incorrect example:

In [4]:
sem_cache.check("what's the capital of britain?")



[]

But still match on highly relevant prompts:

In [5]:
sem_cache.check("what's the capital city of france?")



[{'entry_id': 'c990cc06e5e77570e5f03360426d2b7f947cbb5a67daa8af8164bfe0b3e24fe3',
  'prompt': 'what is the capital of france?',
  'response': 'paris',
  'vector_distance': 0.043138384819,
  'inserted_at': 1746051375.81,
  'updated_at': 1746051375.81,
  'key': 'sem_cache:c990cc06e5e77570e5f03360426d2b7f947cbb5a67daa8af8164bfe0b3e24fe3'}]
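
To build intuition for this kind of tuning, the sketch below sweeps a grid of candidate thresholds over labeled examples and keeps the one with the best F1 score. It is an illustration under stated assumptions, not the `CacheThresholdOptimizer` implementation: the helper names, the candidate grid, the simplified F1 bookkeeping, and the sample distances (rounded or invented for the example) are all hypothetical.

```python
from typing import List, Tuple

# (key of the nearest cached entry, its distance, expected key or "" for no match)
Sample = Tuple[str, float, str]


def f1_at_threshold(samples: List[Sample], threshold: float) -> float:
    """Score one candidate threshold with a simplified F1."""
    tp = fp = fn = 0
    for nearest_key, distance, expected_key in samples:
        # The nearest entry is only returned if it is within the threshold.
        predicted = nearest_key if distance <= threshold else ""
        if predicted and predicted == expected_key:
            tp += 1
        else:
            if predicted:
                fp += 1  # returned an entry that should not have matched
            if expected_key:
                fn += 1  # failed to return the expected entry
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep(samples: List[Sample], candidates: List[float]) -> Tuple[float, float]:
    """Return the first candidate threshold that reaches the best score."""
    best_threshold, best_score = candidates[0], -1.0
    for threshold in candidates:
        score = f1_at_threshold(samples, threshold)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical distances, loosely modeled on the cache example.
samples = [
    ("paris", 0.336, ""),       # Britain question: should not match anything
    ("paris", 0.043, "paris"),  # France question: should match the France entry
    ("rabat", 0.085, "rabat"),  # Morocco question: should match the Morocco entry
]
candidates = [i / 100 for i in range(1, 51)]  # 0.01, 0.02, ..., 0.50

best_threshold, best_score = sweep(samples, candidates)
print(best_threshold, best_score)
```

Any threshold between the farthest wanted match and the nearest unwanted one scores equally well here, which is why a larger and more varied test set gives a more reliable result.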

## RouterThresholdOptimizer

As in the caching case, you can optimize the distance thresholds of your router's routes.

### Define the routes

In [6]:
from redisvl.extensions.router import Route

routes = [
    Route(
        name="greeting",
        references=["hello", "hi"],
        metadata={"type": "greeting"},
        distance_threshold=0.5,
    ),
    Route(
        name="farewell",
        references=["bye", "goodbye"],
        metadata={"type": "farewell"},
        distance_threshold=0.5,
    ),
]

### Initialize the SemanticRouter

In [7]:
import os
from redisvl.extensions.router import SemanticRouter
from redisvl.utils.vectorize import HFTextVectorizer

os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Initialize the SemanticRouter
router = SemanticRouter(
    name="greeting-router",
    vectorizer=HFTextVectorizer(),
    routes=routes,
    redis_url="redis://localhost:6379",
    overwrite=True # Blow away any other routing index with this name
)



### Provide test_data

In [8]:
test_data = [
    # Greetings
    {"query": "hello", "query_match": "greeting"},
    {"query": "hi", "query_match": "greeting"},
    {"query": "hey", "query_match": "greeting"},
    {"query": "greetings", "query_match": "greeting"},
    {"query": "good morning", "query_match": "greeting"},
    {"query": "good afternoon", "query_match": "greeting"},
    {"query": "good evening", "query_match": "greeting"},
    {"query": "howdy", "query_match": "greeting"},
    {"query": "what's up", "query_match": "greeting"},
    {"query": "yo", "query_match": "greeting"},
    {"query": "hiya", "query_match": "greeting"},
    {"query": "salutations", "query_match": "greeting"},
    {"query": "how's it going", "query_match": "greeting"},
    {"query": "how are you", "query_match": "greeting"},
    {"query": "nice to meet you", "query_match": "greeting"},
    # Farewells
    {"query": "goodbye", "query_match": "farewell"},
    {"query": "bye", "query_match": "farewell"},
    {"query": "see you later", "query_match": "farewell"},
    {"query": "take care", "query_match": "farewell"},
    {"query": "farewell", "query_match": "farewell"},
    {"query": "have a good day", "query_match": "farewell"},
    {"query": "see you soon", "query_match": "farewell"},
    {"query": "catch you later", "query_match": "farewell"},
    {"query": "so long", "query_match": "farewell"},
    {"query": "peace out", "query_match": "farewell"},
    {"query": "later", "query_match": "farewell"},
    {"query": "all the best", "query_match": "farewell"},
    {"query": "take it easy", "query_match": "farewell"},
    {"query": "have a good one", "query_match": "farewell"},
    {"query": "cheerio", "query_match": "farewell"},
    # Null matches
    {"query": "what's the capital of britain?", "query_match": ""},
    {"query": "what does laffy taffy taste like?", "query_match": ""},
]
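
A `query_match` label that does not correspond to any route name would presumably never be scored as a correct match, so it can be worth checking the labels before optimizing. The sketch below is a hypothetical helper, not part of RedisVL; it assumes each item is a dict with `query` and `query_match` keys and that an empty string means no route should match, as in the data above.

```python
from typing import Dict, List, Set


def find_invalid_labels(
    test_data: List[Dict[str, str]], route_names: Set[str]
) -> List[Dict[str, str]]:
    """Return items whose shape or label does not line up with any route."""
    invalid = []
    for item in test_data:
        has_keys = "query" in item and "query_match" in item
        label = item.get("query_match", "")
        # An empty label means "no route should match" and is always allowed.
        if not has_keys or (label != "" and label not in route_names):
            invalid.append(item)
    return invalid


sample_data = [
    {"query": "hello", "query_match": "greeting"},
    {"query": "bye", "query_match": "farewell"},
    {"query": "what's the capital of britain?", "query_match": ""},
    {"query": "cheerio", "query_match": "farwell"},  # deliberate typo
]
problems = find_invalid_labels(sample_data, {"greeting", "farewell"})
print(problems)
```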

### Optimize

> **Note:** By default, route distance threshold optimization uses a random search to find the best thresholds because, unlike caching, there are many thresholds to optimize concurrently.

In [9]:
from redisvl.utils.optimize import RouterThresholdOptimizer

print(f"Route thresholds before: {router.route_thresholds} \n")
optimizer = RouterThresholdOptimizer(router, test_data)
optimizer.optimize()

Route thresholds before: {'greeting': 0.5, 'farewell': 0.5} 




Eval metric F1: start 0.438, end 0.812 
Ending thresholds: {'greeting': 0.5828282828282831, 'farewell': 0.7545454545454545}
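
The general idea of a random search over several thresholds can be sketched in plain Python. This is not the `RouterThresholdOptimizer` implementation: the function names, step size, iteration count, and sample distances are hypothetical, the score is plain accuracy rather than F1 to keep the example short, each sample considers only its single nearest route, and thresholds are clamped to [0, 2] on the assumption that cosine distance is in use.

```python
import random
from typing import Dict, List, Tuple

# (nearest route, distance to that route, expected route or "" for no match)
Sample = Tuple[str, float, str]


def accuracy(samples: List[Sample], thresholds: Dict[str, float]) -> float:
    """Fraction of samples routed as expected under the given thresholds."""
    correct = 0
    for route, distance, expected in samples:
        predicted = route if distance <= thresholds[route] else ""
        correct += predicted == expected
    return correct / len(samples)


def random_search(
    samples: List[Sample],
    thresholds: Dict[str, float],
    iterations: int = 200,
    step: float = 0.2,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Randomly perturb all thresholds at once and keep strict improvements."""
    rng = random.Random(seed)  # seeded so repeated runs behave the same
    best = dict(thresholds)
    best_score = accuracy(samples, best)
    for _ in range(iterations):
        candidate = {
            name: min(2.0, max(0.0, value + rng.uniform(-step, step)))
            for name, value in best.items()
        }
        score = accuracy(samples, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical distances for illustration only.
samples = [
    ("greeting", 0.30, "greeting"),
    ("greeting", 0.62, "greeting"),
    ("farewell", 0.55, "farewell"),
    ("farewell", 0.70, "farewell"),
    ("greeting", 0.90, ""),
    ("farewell", 0.95, ""),
]
start = {"greeting": 0.5, "farewell": 0.5}
start_score = accuracy(samples, start)
tuned, tuned_score = random_search(samples, start)
print(start_score, tuned_score, tuned)
```

Because every route's threshold moves in each iteration, the search explores combinations that tuning one threshold at a time would miss, at the cost of results that depend on the random seed and the number of iterations.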


### Test it out

In [10]:
# Query the router with a statement
route_match = router("hi there")
route_match



RouteMatch(name='greeting', distance=0.295984089375)

## Cleanup

In [11]:
router.delete()
sem_cache.delete()