**⚠️ IMPORTANT! ⚠️**

---

**🔓 This notebook is read-only:** Use **File ▸ Save a copy in Drive** to make it yours, then run the cells ✅

---

📚 **This notebook** is part of the **Input Validation & Guardrails** series:

- [Input Validation](https://colab.research.google.com/drive/1bQwv3o1dBfTlTz4koGF5AXFt3ArnwwtA#copy=true) - Basic input validation
- [Toxicity Detection](https://colab.research.google.com/drive/1cd73fGf-DkyM_tMHrh_eYQ7NWWsn5wgd#copy=true) - Block harmful content
- [PII Detection](https://colab.research.google.com/drive/1hcNw8n8w3oEk2MfGQoG1XuJphfbhcgwM#copy=true) - Protect sensitive information
- [Prompt Injection Detection](https://colab.research.google.com/drive/1eNuATWd2HbuLUvJptiXgmsAgWkgwPVDv#copy=true) - Prevent manipulation attempts
- [Off-Topic Detection](https://colab.research.google.com/drive/1cd73fGf-DkyM_tMHrh_eYQ7NWWsn5wgd#copy=true) - Keep conversations focused

# 🎯 Input Validation & Guardrails

## Off-Topic Detection for LLM Applications

Welcome to this notebook on **Off-Topic Detection**, the final line of defense that keeps your RAG system focused on its intended purpose!

Even if a query is safe, non-toxic, and well-formed, it might simply be **irrelevant** to your system's domain. If your RAG system is built to answer questions about "Company HR Policy," it has no business answering questions about "Python coding" or the "History of Rome."


## 🚫 Why Off-Topic Detection Matters

Without this guardrail, your system will attempt to retrieve documents for irrelevant queries. Since top-k vector search **always returns something** (the k nearest chunks, however poor the match), the LLM will receive irrelevant context chunks.

This leads to three major problems:

| Problem | Description | Impact |
|---------|-------------|--------|
| **🤖 Hallucinations** | LLM tries to force an answer using irrelevant context | Incorrect/misleading responses |
| **💸 Resource Waste** | You pay for embedding, retrieval, and generation on unanswerable queries | Wasted costs |
| **🏷️ Brand Risk** | Your specialized bot acts like a general-purpose chatbot | Dilutes product purpose |
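
To make the "always returns something" point concrete, here is a minimal sketch that uses hand-made 2-D vectors in place of real embeddings. The vectors, the document names, and the `top_k` helper are illustrative assumptions, not part of this notebook's pipeline. The point is only that a top-k search hands back k chunks even when none of them is close to the query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy "document embeddings" (hypothetical values, for illustration only)
docs = {
    "vacation policy": [0.9, 0.1],
    "expense reports": [0.8, 0.3],
    "dress code": [0.7, 0.2],
}

def top_k(query_vec, k=2):
    """Return the k most similar docs as (score, name). Never returns an empty list."""
    scored = sorted(((cosine(query_vec, v), name) for name, v in docs.items()), reverse=True)
    return scored[:k]

unrelated_query = [-0.2, 1.0]  # points in a very different direction from every doc
results = top_k(unrelated_query)
print(results)  # still two "hits", just with low scores
```

Every score here is well below the 0.35 threshold used later in the notebook, yet the retriever still returns two chunks for the LLM to work with.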

### Example Scenario:
```text
User asks: "What is the capital of France?" (to your HR Policy bot)

❌ Without Off-Topic Detection:
   → Vector search returns random HR docs with low similarity
   → LLM receives irrelevant context about vacation policies
   → LLM either hallucinates or gives a confused answer
   → You paid for the full pipeline for a question you can't answer

✅ With Off-Topic Detection:
   → Query flagged as off-topic (low similarity to HR anchors)
   → Immediate response: "I can only help with HR-related questions."
   → No wasted resources, clear user guidance
```
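
The "with detection" path above amounts to a short-circuit in front of the pipeline. The sketch below shows that shape under stated assumptions: `fake_guardrail` and `fake_rag` are hypothetical stand-ins (a keyword check and a canned answer), used only so the example runs without a model or API key.

```python
REFUSAL = "I can only help with HR-related questions."

def answer(query, is_on_topic, run_rag):
    """Run the expensive RAG pipeline only when the guardrail accepts the query."""
    if not is_on_topic(query):
        return REFUSAL  # short-circuit: no embedding, retrieval, or generation cost
    return run_rag(query)

# Hypothetical stand-ins for illustration
calls = []

def fake_guardrail(query):
    return "vacation" in query.lower()

def fake_rag(query):
    calls.append(query)  # record that the pipeline actually ran
    return f"[answer grounded in HR docs for: {query}]"

print(answer("What is the capital of France?", fake_guardrail, fake_rag))
print(answer("What is our vacation policy?", fake_guardrail, fake_rag))
print(f"Pipeline ran {len(calls)} time(s)")
```

In the real notebook, the `is_on_topic` argument would be backed by one of the guardrail classes defined in the following sections.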

## 📋 Prerequisites

Before we start, make sure you have:
- ✅ A Google Colab environment (you're already here!)
- ✅ Internet connection
- ✅ An OpenAI API Key (we'll guide you through this)


## 🔧 Environment Setup

First, let's install the required libraries.

In [None]:
# Install the required libraries
!pip install openai sentence-transformers -q

In [None]:
# Import all required libraries
from openai import OpenAI
from sentence_transformers import SentenceTransformer

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


## 🔑 Secure API Key Management

### Obtaining Your API Key

To get your API key:

1. Go to [Hooloovoo Help Desk - AI Control Center](https://hooloovoo.atlassian.net/servicedesk/customer/portal/48)
2. Navigate to **AI API Keys** section
3. Submit the request (NOTE: Provider should be OpenAI)

### Security Best Practices

**DO:**
- ✅ Store keys as environment variables or in secure storage
- ✅ Use separate keys for development and production
- ✅ Set usage limits and monitoring
- ✅ Rotate keys periodically

**DON'T:**
- ❌ Hard-code keys in your source code
- ❌ Commit keys to version control
- ❌ Share keys publicly or in screenshots
- ❌ Share your keys with others
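
Outside an interactive notebook, the first "DO" item usually means reading the key from the environment instead of typing it in. The sketch below is one way to do that; the helper names `load_api_key` and `mask` are assumptions for illustration, and the key shown is a placeholder, not a real credential.

```python
import os

def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Fetch the key from the environment and fail loudly if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or use a secrets manager")
    return key

def mask(key):
    """Show only the last 4 characters, so the key is safe to log."""
    return "*" * max(len(key) - 4, 0) + key[-4:]

# Demo with an explicit dict instead of the real environment
demo_env = {"OPENAI_API_KEY": "sk-example-not-a-real-key"}
print(mask(load_api_key(demo_env)))
```

Failing at startup when the variable is missing tends to be easier to debug than an authentication error on the first request.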

### Setting Up Your API Key

Run the cell below and paste your API key when prompted (it will be hidden for security).

In [None]:
# Securely input your API key
import os
from getpass import getpass

# Prompt for API key (input will be hidden)
api_key = getpass("Enter your OpenAI API Key: ")

# Set it as an environment variable
os.environ["OPENAI_API_KEY"] = api_key

# Initialize the OpenAI client
client = OpenAI(api_key=api_key)

print("✅ API key configured successfully!")

Enter your OpenAI API Key: ··········
✅ API key configured successfully!


## 🧪 Helper Function

Let's create a simple helper function to make API calls easier to read and understand.

In [None]:
def call_llm(prompt, model="gpt-4.1-nano", temperature=0.7):
    """Send a prompt to the Responses API and return the text output."""

    # Build parameters dictionary
    params = {
        "model": model,
        "input": prompt,
    }

    # gpt-5-codex is called without a temperature setting
    if model != "gpt-5-codex":
        params["temperature"] = temperature

    response = client.responses.create(**params)
    return response.output_text

print("✅ Helper function defined!")

✅ Helper function defined!


## 🧭 The Approach: Semantic Boundary Checking

Instead of training a classifier to recognize every possible off-topic category (which is impossible), we use **Similarity-Based Validation**:

1. **Define anchor queries**: a list of "canonical" on-topic example queries
2. **Embed the anchors**: convert them to vectors in semantic space
3. **Compare incoming queries**: check cosine similarity against the anchors
4. **Apply threshold**: if the best similarity is too low, reject as off-topic
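
Steps 3 and 4 can be sketched in a few lines of plain Python. The 3-D vectors below are hypothetical stand-ins for real embeddings (which have hundreds of dimensions), and `is_on_topic` is an illustrative helper rather than the class built in the next section.

```python
import math

def cosine(a, b):
    """cos(a, b) = dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_on_topic(query_vec, anchor_vecs, threshold=0.35):
    """Take the best similarity against any anchor, then apply the threshold."""
    best = max(cosine(query_vec, a) for a in anchor_vecs)
    return best >= threshold, best

anchors = [[1.0, 0.0, 0.2], [0.8, 0.3, 0.0]]  # hypothetical anchor embeddings
print(is_on_topic([0.9, 0.1, 0.1], anchors))  # points the same way as the anchors
print(is_on_topic([0.0, 0.1, 1.0], anchors))  # nearly orthogonal to both anchors
```

Using the maximum over anchors means a query only needs to resemble one anchor to pass, which is why anchor coverage matters as much as the threshold.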


## 🛠️ Building the Topic Guardrail

Let's create a `TopicGuardrail` class that:
1. Takes a list of on-topic example queries as "anchors"
2. Embeds them into vector space
3. Compares new queries against these anchors using cosine similarity
4. Rejects queries that fall below the similarity threshold

In [None]:
import numpy as np
from typing import Optional

class TopicGuardrail:
    """
    Semantic boundary checker for off-topic detection.
    Uses cosine similarity against anchor queries to determine relevance.
    """

    def __init__(
        self,
        on_topic_examples: list[str],
        threshold: float = 0.35,
        model_name: str = 'all-MiniLM-L6-v2'
    ):
        """
        Initialize the topic guardrail.

        Args:
            on_topic_examples: List of example queries that ARE in scope
            threshold: Minimum similarity to be considered on-topic (0.0 to 1.0)
            model_name: Sentence transformer model to use
        """
        print(f"Loading embedding model: {model_name}...")
        self.model = SentenceTransformer(model_name)
        self.threshold = threshold
        self.on_topic_examples = on_topic_examples

        # Pre-compute embeddings for all on-topic examples
        print(f"Embedding {len(on_topic_examples)} anchor queries...")
        self.topic_embeddings = self.model.encode(on_topic_examples)

        print("✅ Topic guardrail ready!")

    def check(self, query: str) -> tuple[bool, dict]:
        """
        Check if a query is on-topic.

        Args:
            query: The user's input query

        Returns:
            tuple: (is_on_topic, details)
                - is_on_topic: True if query is within scope
                - details: Dict with similarity scores and matched anchor
        """
        # Embed the incoming query
        query_embedding = self.model.encode([query])[0]

        # Calculate cosine similarity against all anchor embeddings
        # Cosine similarity = dot(A, B) / (norm(A) * norm(B))
        similarities = np.dot(self.topic_embeddings, query_embedding) / (
            np.linalg.norm(self.topic_embeddings, axis=1) *
            np.linalg.norm(query_embedding)
        )

        # Find the best matching anchor
        max_idx = int(np.argmax(similarities))
        max_similarity = float(similarities[max_idx])
        best_match = self.on_topic_examples[max_idx]

        # Determine if on-topic based on threshold
        is_on_topic = max_similarity >= self.threshold

        return is_on_topic, {
            "max_similarity": max_similarity,
            "best_match": best_match,
            "threshold": self.threshold,
            "all_similarities": dict(zip(self.on_topic_examples, similarities.tolist()))
        }

print("✅ TopicGuardrail class defined!")

✅ TopicGuardrail class defined!


## 🏢 Example: HR Policy Assistant

Let's create a topic guardrail for an **HR Policy Assistant** that should only answer questions about company policies, benefits, and HR procedures.

In [None]:
# Define anchor queries for an HR Policy Assistant
hr_anchor_queries = [
    # Vacation & Time Off
    "What is our vacation policy?",
    "How do I request time off?",
    "How many sick days do I get?",

    # Benefits
    "What are the health insurance options?",
    "How does the 401k matching work?",
    "What is the parental leave policy?",

    # Procedures
    "How do I submit an expense report?",
    "What is the dress code?",
    "How do I report harassment?",

    # General HR
    "What are the office hours?",
    "How do I update my direct deposit?",
    "What is the remote work policy?",
]

# Create the topic guardrail
hr_guardrail = TopicGuardrail(
    on_topic_examples=hr_anchor_queries,
    threshold=0.35  # Best-anchor cosine similarity must be at least 0.35 to count as on-topic
)

Loading embedding model: all-MiniLM-L6-v2...


Embedding 12 anchor queries...
✅ Topic guardrail ready!


## ✅ Testing On-Topic Queries

Let's test with queries that SHOULD be allowed because they relate to HR topics.

In [None]:
# Test with on-topic queries
on_topic_queries = [
    "How many vacation days do I get per year?",
    "Can I work from home on Fridays?",
    "What's the process for requesting maternity leave?",
    "Does the company match 401k contributions?",
    "Where do I submit my travel expenses?",
]

print("Testing ON-TOPIC queries:\n")
print("=" * 70)

for query in on_topic_queries:
    is_on_topic, details = hr_guardrail.check(query)
    status = "✅ ON-TOPIC" if is_on_topic else "❌ OFF-TOPIC"

    print(f"\nQuery: \"{query}\"")
    print(f"Result: {status}")
    print(f"Similarity: {details['max_similarity']:.2f} (threshold: {details['threshold']})")
    print(f"Best match: \"{details['best_match']}\"")

Testing ON-TOPIC queries:


Query: "How many vacation days do I get per year?"
Result: ✅ ON-TOPIC
Similarity: 0.64 (threshold: 0.35)
Best match: "How many sick days do I get?"

Query: "Can I work from home on Fridays?"
Result: ✅ ON-TOPIC
Similarity: 0.37 (threshold: 0.35)
Best match: "What are the office hours?"

Query: "What's the process for requesting maternity leave?"
Result: ✅ ON-TOPIC
Similarity: 0.62 (threshold: 0.35)
Best match: "What is the parental leave policy?"

Query: "Does the company match 401k contributions?"
Result: ✅ ON-TOPIC
Similarity: 0.71 (threshold: 0.35)
Best match: "How does the 401k matching work?"

Query: "Where do I submit my travel expenses?"
Result: ✅ ON-TOPIC
Similarity: 0.65 (threshold: 0.35)
Best match: "How do I submit an expense report?"


## ❌ Testing Off-Topic Queries

Now let's test with queries that should be REJECTED because they have nothing to do with HR.

In [None]:
# Test with off-topic queries
off_topic_queries = [
    "What is the capital of France?",
    "How do I write a Python function?",
    "What's the weather like today?",
    "Can you help me with my math homework?",
    "Who won the Super Bowl last year?",
    "What is machine learning?",
]

print("Testing OFF-TOPIC queries:\n")
print("=" * 70)

for query in off_topic_queries:
    is_on_topic, details = hr_guardrail.check(query)
    status = "✅ ON-TOPIC" if is_on_topic else "❌ OFF-TOPIC"

    print(f"\nQuery: \"{query}\"")
    print(f"Result: {status}")
    print(f"Similarity: {details['max_similarity']:.2f} (threshold: {details['threshold']})")
    print(f"Best match: \"{details['best_match']}\"")

Testing OFF-TOPIC queries:


Query: "What is the capital of France?"
Result: ❌ OFF-TOPIC
Similarity: 0.15 (threshold: 0.35)
Best match: "What is the dress code?"

Query: "How do I write a Python function?"
Result: ❌ OFF-TOPIC
Similarity: 0.21 (threshold: 0.35)
Best match: "What is the dress code?"

Query: "What's the weather like today?"
Result: ❌ OFF-TOPIC
Similarity: 0.28 (threshold: 0.35)
Best match: "How many sick days do I get?"

Query: "Can you help me with my math homework?"
Result: ❌ OFF-TOPIC
Similarity: 0.08 (threshold: 0.35)
Best match: "How does the 401k matching work?"

Query: "Who won the Super Bowl last year?"
Result: ❌ OFF-TOPIC
Similarity: 0.07 (threshold: 0.35)
Best match: "How do I submit an expense report?"

Query: "What is machine learning?"
Result: ❌ OFF-TOPIC
Similarity: 0.16 (threshold: 0.35)
Best match: "What is the remote work policy?"


## 📊 Visualizing the Similarity Scores

Let's create a visual comparison showing how on-topic vs off-topic queries score against our anchors.

In [None]:
# Combined visualization
all_test_queries = [
    ("On-topic", "How many vacation days do I get?"),
    ("On-topic", "What's the health insurance deductible?"),
    ("On-topic", "Can I expense my home office setup?"),
    ("Off-topic", "What is the capital of France?"),
    ("Off-topic", "How do I cook pasta?"),
    ("Off-topic", "Explain quantum computing"),
]

print("📊 SIMILARITY SCORE VISUALIZATION")
print("=" * 70)
print(f"Threshold: {hr_guardrail.threshold}")
print("=" * 70)

for category, query in all_test_queries:
    is_on_topic, details = hr_guardrail.check(query)
    score = details["max_similarity"]

    # Create visual bar
    bar_length = int(score * 40)
    threshold_pos = int(hr_guardrail.threshold * 40)

    # Build the bar with threshold marker
    bar = ""
    for i in range(40):
        if i == threshold_pos:
            bar += "|"  # Threshold marker
        elif i < bar_length:
            bar += "█"
        else:
            bar += "░"

    status = "✅" if is_on_topic else "❌"

    print(f"\n[{category}] \"{query[:35]}{'...' if len(query) > 35 else ''}\"")
    print(f"  [{bar}] {score:.2f} {status}")

print("\n" + "=" * 70)
print("Legend: | = threshold, █ = similarity score")

📊 SIMILARITY SCORE VISUALIZATION
Threshold: 0.35

[On-topic] "How many vacation days do I get?"
  [██████████████|█████████████░░░░░░░░░░░░] 0.71 ✅

[On-topic] "What's the health insurance deducti..."
  [██████████████|██████████░░░░░░░░░░░░░░░] 0.65 ✅

[On-topic] "Can I expense my home office setup?"
  [█████████████░|░░░░░░░░░░░░░░░░░░░░░░░░░] 0.34 ❌

[Off-topic] "What is the capital of France?"
  [█████░░░░░░░░░|░░░░░░░░░░░░░░░░░░░░░░░░░] 0.15 ❌

[Off-topic] "How do I cook pasta?"
  [████░░░░░░░░░░|░░░░░░░░░░░░░░░░░░░░░░░░░] 0.11 ❌

[Off-topic] "Explain quantum computing"
  [████

## 🔧 Tuning the Threshold

The threshold is crucial for balancing **coverage** vs **precision**:

| Threshold | Effect | Use Case |
|-----------|--------|----------|
| **Lower (0.25)** | More permissive, allows borderline queries | General-purpose assistants |
| **Medium (0.35)** | Balanced approach | Most production systems |
| **Higher (0.50)** | More strict, only very relevant queries pass | High-precision systems |

Let's see how different thresholds affect our results:

In [None]:
# Test different thresholds
test_queries = [
    ("How many PTO days do new employees get?", "Clear HR question"),
    ("What's the company policy on side projects?", "Borderline HR question"),
    ("How do I improve my productivity at work?", "General work question"),
    ("What programming languages should I learn?", "Off-topic tech question"),
]

thresholds = [0.25, 0.35, 0.45, 0.55]

print("📈 THRESHOLD SENSITIVITY ANALYSIS")
print("=" * 80)

# Header
print(f"\n{'Query':<40} | ", end="")
for t in thresholds:
    print(f"t={t:<5} | ", end="")
print("\n" + "-" * 80)

for query, description in test_queries:
    # Get the base similarity score
    _, details = hr_guardrail.check(query)
    score = details["max_similarity"]

    # Display query
    display_query = query[:38] + ".." if len(query) > 40 else query
    print(f"{display_query:<40} | ", end="")

    # Check against each threshold
    for t in thresholds:
        status = "✅" if score >= t else "❌"
        print(f"{status} {score:.2f} | ", end="")

    print(f" ({description})")

📈 THRESHOLD SENSITIVITY ANALYSIS

Query                                    | t=0.25  | t=0.35  | t=0.45  | t=0.55  |
--------------------------------------------------------------------------------
How many PTO days do new employees get?  | ✅ 0.46 | ✅ 0.46 | ✅ 0.46 | ❌ 0.46 |  (Clear HR question)
What's the company policy on side proj.. | ✅ 0.48 | ✅ 0.48 | ✅ 0.48 | ❌ 0.48 |  (Borderline HR question)
How do I improve my productivity at wo.. | ✅ 0.31 | ❌ 0.31 | ❌ 0.31 | ❌ 0.31 |  (General work question)
What programming languages should I le.. | ❌ 0.08 | ❌ 0.08 | ❌ 0.08 | ❌ 0.08 |  (Off-topic tech question)
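
When you have labeled examples, a threshold can be chosen by counting errors on each side instead of eyeballing a table. The sketch below reuses the best-anchor scores printed in the on-topic and off-topic test outputs earlier in this notebook (rounded to two decimals). The `sweep` helper is an assumption for illustration, and 11 queries is far too small a sample to settle on a production value.

```python
# Best-anchor scores copied from the test outputs earlier in this notebook
on_topic_scores = [0.64, 0.37, 0.62, 0.71, 0.65]
off_topic_scores = [0.15, 0.21, 0.28, 0.08, 0.07, 0.16]

def sweep(thresholds):
    """For each threshold, count rejected on-topic and accepted off-topic queries."""
    rows = []
    for t in thresholds:
        false_rejects = sum(s < t for s in on_topic_scores)
        false_accepts = sum(s >= t for s in off_topic_scores)
        rows.append((t, false_rejects, false_accepts))
    return rows

for t, fr, fa in sweep([0.25, 0.35, 0.45, 0.55]):
    print(f"t={t:.2f}  false rejects={fr}  false accepts={fa}")
```

On this tiny sample, 0.35 is the only one of the four values with no errors on either side. The on-topic query that scored 0.37 shows how little margin there is.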


---

## 🎯 Improving Embedding Accuracy: Fine-Tuning Considerations

While the `all-MiniLM-L6-v2` model works well for general use cases, you may find that it doesn't capture the nuances of your specific domain. Here are strategies to improve accuracy:

### When Embedding-Based Detection Falls Short

| Symptom | Cause | Solution |
|---------|-------|----------|
| Too many on-topic queries rejected | Threshold too high, or anchors don't cover all topics | Lower the threshold OR add more anchors |
| Too many off-topic queries accepted | Threshold too low, or model doesn't separate your domain well | Raise the threshold OR fine-tune the model |
| Borderline queries misclassified | General-purpose embeddings lack domain specificity | Fine-tune on domain data |

### Fine-Tuning Open-Source Embedding Models

If you have labeled data (queries marked as on-topic or off-topic), you can fine-tune open-source models for better accuracy:

**Popular Models for Fine-Tuning:**
- [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) - Fast, good baseline
- [`sentence-transformers/all-mpnet-base-v2`](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) - Higher quality, roughly 5x slower
- [`BAAI/bge-small-en-v1.5`](https://huggingface.co/BAAI/bge-small-en-v1.5) - Excellent for retrieval tasks
- [`intfloat/e5-small-v2`](https://huggingface.co/intfloat/e5-small-v2) - Strong performance on semantic similarity

**Fine-Tuning Approach (Conceptual):**

```python
# Conceptual example - not executed in this notebook
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# 0. Load the base model to fine-tune
model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Prepare training data as pairs
train_examples = [
    # (query, anchor, label) - label=1 if similar, 0 if not
    InputExample(texts=["How do I get time off?", "What is our vacation policy?"], label=1.0),
    InputExample(texts=["What's the weather?", "What is our vacation policy?"], label=0.0),
    # ... more examples
]

# 2. Create dataloader
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# 3. Define loss function (pulls labeled-similar pairs together, pushes others apart)
train_loss = losses.CosineSimilarityLoss(model)

# 4. Fine-tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100
)
```

**When to Fine-Tune:**
- You have 500+ labeled query pairs
- Domain-specific vocabulary (legal, medical, technical)
- Current accuracy is below 85% on your test set
- False positive/negative rate is unacceptable for your use case

> *💡 **Tip:***
>
> *Start with more anchors and threshold tuning before investing in fine-tuning. Often, adding 10-20 well-chosen anchor queries can dramatically improve accuracy without the complexity of training.*
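
One low-effort way to find "well-chosen" anchors is to collect queries you know are on-topic but that the guardrail rejected. The sketch below assumes you keep `(query, best-anchor score, human label)` records; the three rows are taken from the visualization output earlier in this notebook, and `anchor_candidates` is a hypothetical helper.

```python
def anchor_candidates(labeled_results, threshold=0.35):
    """Return known on-topic queries the guardrail rejected, lowest score first."""
    missed = [
        (score, query)
        for query, score, is_on_topic_label in labeled_results
        if is_on_topic_label and score < threshold
    ]
    return [query for score, query in sorted(missed)]

# (query, best-anchor score, human label) from the visualization output above
labeled_results = [
    ("Can I expense my home office setup?", 0.34, True),
    ("How many vacation days do I get?", 0.71, True),
    ("What is the capital of France?", 0.15, False),
]

print(anchor_candidates(labeled_results))
```

Here the home-office expense question surfaces as a candidate: it is a legitimate HR query that fell just under the threshold, which suggests the expense-related anchors are too narrow.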

---

# 🤖 A Different Approach: LLM-Based Off-Topic Detection

While embedding-based detection is fast and cost-effective, **LLM-based classification** often handles nuanced decisions more accurately. Modern small LLMs like `gpt-4.1-nano` can reason about topic relevance at low cost per query.

### Why Use LLM for Off-Topic Detection?

| Advantage | Description |
|-----------|-------------|
| **🎯 Higher Accuracy** | LLMs understand context, intent, and nuance better than cosine similarity |
| **🧠 Reasoning Capability** | Can explain WHY a query is off-topic, useful for debugging |
| **📝 No Anchor Management** | Describe your domain in natural language instead of curating examples |
| **🔄 Easy Updates** | Change the system description without re-embedding anchors |
| **🌐 Handles Edge Cases** | Better at borderline queries that embeddings might misclassify |

### When to Choose LLM Over Embeddings?

| Use Case | Best Approach |
|----------|---------------|
| High volume, low latency (>1000 req/min) | Embeddings |
| High accuracy requirements | LLM |
| Limited anchor examples available | LLM |
| Complex, nuanced domain boundaries | LLM |
| Cost-sensitive production | Embeddings |
| Borderline query handling | LLM |
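
The two approaches can also be combined: let the cheap embedding score settle the clear cases and call the LLM only for the gray zone. This is a sketch of that idea, not something the notebook implements. The `low` and `high` cut-offs borrow the 0.25 and 0.50 values from the threshold table as an assumption, and `fake_scores` and `fake_llm` are hypothetical stand-ins so the example runs offline.

```python
def cascade_check(query, embed_score, llm_check, low=0.25, high=0.50):
    """Two-stage check: embedding score first, LLM only for scores in [low, high)."""
    score = embed_score(query)
    if score >= high:
        return True, "embedding: clearly on-topic"
    if score < low:
        return False, "embedding: clearly off-topic"
    return llm_check(query), "llm: gray zone"

# Hypothetical stand-ins so the sketch runs without a model or API key
fake_scores = {"401k match?": 0.71, "capital of France?": 0.15, "side projects policy?": 0.40}
llm_calls = []

def fake_llm(query):
    llm_calls.append(query)  # track how often the expensive path is taken
    return True

for q in fake_scores:
    print(q, cascade_check(q, fake_scores.get, fake_llm))
print(f"LLM calls: {len(llm_calls)} of {len(fake_scores)} queries")
```

Only one of the three queries reaches the LLM, so most traffic keeps the embedding path's latency and cost.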

In [None]:
import json
import time

class LLMTopicGuardrail:
    """
    LLM-based off-topic detection using fast, efficient classification.
    Uses a small model (gpt-4.1-nano) for cost-effective, accurate decisions.
    """

    def __init__(self, domain_description: str, allowed_topics: list[str]):
        """
        Initialize the LLM topic guardrail.

        Args:
            domain_description: Clear description of what the system handles
            allowed_topics: List of allowed topic categories
        """
        self.domain_description = domain_description
        self.allowed_topics = allowed_topics
        self.model = "gpt-4.1-nano"  # Fast, cheap, accurate for classification

        # Build the classification prompt template
        self.system_prompt = self._build_system_prompt()

    def _build_system_prompt(self) -> str:
        """Build an optimized system prompt for fast classification."""
        topics_str = ", ".join(self.allowed_topics)

        return f"""You are a query classifier. Your task is to determine if a user query is ON-TOPIC or OFF-TOPIC.

DOMAIN: {self.domain_description}

ALLOWED TOPICS: {topics_str}

RULES:
- ON-TOPIC: Query relates to any of the allowed topics
- OFF-TOPIC: Query is unrelated to the domain (general knowledge, other domains, chitchat)

Respond with ONLY valid JSON in this exact format:
{{"decision": "ON-TOPIC" or "OFF-TOPIC", "reason": "brief explanation"}}"""

    def check(self, query: str) -> tuple[bool, dict]:
        """
        Check if a query is on-topic using LLM classification.

        Args:
            query: The user's input query

        Returns:
            tuple: (is_on_topic, details)
        """
        start_time = time.time()

        # Build the prompt
        prompt = f"{self.system_prompt}\n\nQUERY: {query}\n\nJSON:"

        response = ""  # stays empty if the API call itself fails
        try:
            # Call LLM with temperature 0 for consistent classification
            response = call_llm(prompt, model=self.model, temperature=0)

            # Parse JSON response
            result = json.loads(response.strip())

            is_on_topic = result.get("decision", "").upper() == "ON-TOPIC"
            reason = result.get("reason", "No reason provided")

        except json.JSONDecodeError as e:
            # Fallback: try to extract the decision from the raw text
            print(f"⚠️ Error parsing JSON: {e}")
            response_upper = response.upper()
            is_on_topic = "ON-TOPIC" in response_upper and "OFF-TOPIC" not in response_upper
            reason = "Parse error, inferred from response"

        except Exception as e:
            # API or network failure: fail closed and treat the query as off-topic
            print(f"⚠️ LLM call failed: {e}")
            is_on_topic = False
            reason = f"LLM call failed: {type(e).__name__}"

        elapsed_ms = (time.time() - start_time) * 1000

        return is_on_topic, {
            "decision": "ON-TOPIC" if is_on_topic else "OFF-TOPIC",
            "reason": reason,
            "latency_ms": elapsed_ms,
            "model": self.model
        }

print("✅ LLMTopicGuardrail class defined!")

✅ LLMTopicGuardrail class defined!
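
Models can wrap their JSON in a Markdown code fence or add a sentence around it, which makes a bare `json.loads()` fail. The sketch below is one tolerant way to parse such replies. `parse_decision` is a hypothetical helper, not part of the class above, and it assumes the reply contains at most one JSON object.

```python
import json
import re

def parse_decision(raw):
    """Extract {"decision", "reason"} from raw model text, or return None."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)  # first "{" through last "}"
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    decision = str(data.get("decision", "")).upper()
    if decision not in {"ON-TOPIC", "OFF-TOPIC"}:
        return None
    return {"decision": decision, "reason": data.get("reason", "")}

fence = "`" * 3  # build the Markdown fence marker without typing it literally
fenced_reply = fence + 'json\n{"decision": "off-topic", "reason": "geography"}\n' + fence

print(parse_decision(fenced_reply))
print(parse_decision("Sorry, I cannot help with that."))
```

Returning `None` instead of guessing leaves the caller free to decide whether an unparseable reply should fail closed or trigger a retry.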


## 🏢 Example: HR Policy Assistant with LLM Detection

Let's create an LLM-based topic guardrail for the same HR Policy Assistant scenario.

In [None]:
# Create an LLM-based topic guardrail for HR
llm_hr_guardrail = LLMTopicGuardrail(
    domain_description="Company HR Policy Assistant that answers questions about employee policies, benefits, procedures, and workplace guidelines.",
    allowed_topics=[
        "vacation and time off",
        "health insurance and benefits",
        "401k and retirement",
        "parental leave",
        "expense reports",
        "dress code",
        "harassment reporting",
        "remote work policies",
        "office hours",
        "payroll and compensation"
    ]
)

print("✅ LLM HR Topic Guardrail created!")
print(f"   Model: {llm_hr_guardrail.model}")
print(f"   Topics: {len(llm_hr_guardrail.allowed_topics)} categories")

✅ LLM HR Topic Guardrail created!
   Model: gpt-4.1-nano
   Topics: 10 categories


## 🧪 Testing LLM-Based Detection

Let's test the LLM guardrail with a variety of queries and see how it performs.

In [None]:
# Test queries - mix of on-topic and off-topic
test_queries = [
    # Clear on-topic
    ("How many vacation days do new employees get?", "On-topic"),
    ("What's the process for requesting parental leave?", "On-topic"),
    ("Does the company match 401k contributions?", "On-topic"),

    # Borderline / nuanced
    ("Can I work from a coffee shop?", "Borderline - remote work related"),
    ("What should I wear to a client meeting?", "Borderline - dress code related"),
    ("How do I improve my work-life balance?", "Borderline - could be HR or general"),

    # Clear off-topic
    ("What is the capital of France?", "Off-topic"),
    ("How do I write a Python function?", "Off-topic"),
    ("What's the best pizza topping?", "Off-topic"),
]

print("🤖 LLM-BASED OFF-TOPIC DETECTION")
print("=" * 70)

total_latency = 0
for query, expected in test_queries:
    is_on_topic, details = llm_hr_guardrail.check(query)
    total_latency += details["latency_ms"]

    status = "✅ ON-TOPIC" if is_on_topic else "❌ OFF-TOPIC"

    print(f"\n📝 Query: \"{query}\"")
    print(f"   Expected: {expected}")
    print(f"   Result: {status}")
    print(f"   Reason: {details['reason']}")
    print(f"   Latency: {details['latency_ms']:.0f}ms")

avg_latency = total_latency / len(test_queries)
print("\n" + "=" * 70)
print(f"📊 Average latency: {avg_latency:.0f}ms per query")

🤖 LLM-BASED OFF-TOPIC DETECTION

📝 Query: "How many vacation days do new employees get?"
   Expected: On-topic
   Result: ✅ ON-TOPIC
   Reason: The query pertains to vacation days, which is an allowed topic within employee policies.
   Latency: 728ms

📝 Query: "What's the process for requesting parental leave?"
   Expected: On-topic
   Result: ✅ ON-TOPIC
   Reason: The query pertains to parental leave, which is an employee benefit covered under workplace policies.
   Latency: 1927ms

📝 Query: "Does the company match 401k contributions?"
   Expected: On-topic
   Result: ✅ ON-TOPIC
   Reason: The query pertains to 401k contributions, which is an allowed topic related to retirement benefits.
   Latency: 608ms

📝 Query: "Can I work from a coffee shop?"
   Expected: Borderline - remote work related
   Result: ✅ ON-TOPIC
   Reason: The query relates to remote work policies, which is an allowed topic.
   Latency: 582ms

📝 Query: "What should I wear to a client meetin

### ⚡️ Latency optimizations

This optimized guardrail focuses on **reducing tokens, avoiding retries, and enabling cache hits**:

- **System prompt kept separate and stable** (sent as a system message, not concatenated into one big string). This reduces prompt churn and keeps the prefix cache-friendly. Note that OpenAI's prompt caching applies only to prompts of 1,024 tokens or more, so a prompt this short gains little from it on its own.
- **Structured Outputs (schema-enforced)** so the model returns JSON that matches the schema: no manual `json.loads()`, no text-matching fallback, no formatting retries.
- **Leaner response**: `is_on_topic` is a `bool` rather than a string, and `reason` is optional (keep it while testing, ask the model to omit it with `include_reason=False` for the fastest path).
- **Hard cap on output length** via `max_output_tokens` (we only need `is_on_topic` plus an optional short `reason`).
- **Disable storage** (`store=False`) for slightly leaner requests (and better privacy defaults).

In [None]:
import time
from typing import Literal, Optional, Tuple, Dict, Any, List

from openai import OpenAI
from pydantic import BaseModel, Field


class TopicDecision(BaseModel):
    is_on_topic: bool
    # Make reason optional so you can run "ultra-fast mode" with fewer output tokens.
    reason: Optional[str] = Field(default=None, max_length=120)


class OptimizedLLMTopicGuardrail:
    """
    Optimized LLM-based off-topic detection for low latency.

    Key knobs:
      - system prompt as a stable system message (better caching, less prompt noise)
      - structured outputs (no JSON parsing drama)
      - max_output_tokens kept tiny
      - cache-friendly prompts (stable system prefix; check() accepts an optional cache_key)
    """

    def __init__(
        self,
        domain_description: str,
        allowed_topics: List[str],
        *,
        client: Optional[OpenAI] = None,
        model: str = "gpt-4.1-nano",
        temperature: float = 0.0,
        max_output_tokens: int = 40,
        include_reason: bool = True,
        store: bool = False,
    ):
        self.client = client or OpenAI()
        self.model = model
        self.temperature = temperature
        self.max_output_tokens = max_output_tokens
        self.include_reason = include_reason
        self.store = store

        # Precompute a compact, stable system prompt once (important for cache hit rate).
        topics = "\n".join(f"- {t.strip()}" for t in allowed_topics if t and t.strip())
        self.system_prompt = (
            "You are a strict topic classifier.\n"
            "Return ONLY the JSON object matching the schema.\n"
            "Set is_on_topic to true if the query clearly relates to the domain/topics; otherwise false.\n"
            f"DOMAIN: {domain_description.strip()}\n"
            f"TOPICS:\n{topics}\n"
        )

    def check(self, query: str, *, cache_key: Optional[str] = None) -> Tuple[bool, Dict[str, Any]]:
        """
        Args:
            query: user input
            cache_key: stable identifier to improve cache hit rates (e.g., hashed user/org id)

        Returns:
            (is_on_topic, details_dict)
        """
        start = time.time()

        # Keep the user message tiny. The heavy, stable prefix is in the system message.
        input_messages = [
            {"role": "system", "content": self.system_prompt},
            {
                "role": "user",
                "content": (
                    f"QUERY: {query}\n"
                    + ("Include a short reason." if self.include_reason else "No reason.")
                ),
            },
        ]

        # Forward the cache key only when the caller supplies one.
        # Assumes an SDK/API version that supports `prompt_cache_key`.
        extra_kwargs: Dict[str, Any] = {}
        if cache_key:
            extra_kwargs["prompt_cache_key"] = cache_key

        try:
            # Structured outputs: no JSON parsing, no retry loops for formatting.
            # max_output_tokens: hard cap on generated tokens.
            resp = self.client.responses.parse(
                model=self.model,
                input=input_messages,
                text_format=TopicDecision,
                temperature=self.temperature,
                max_output_tokens=self.max_output_tokens,
                store=self.store,
                **extra_kwargs,
            )

            parsed: TopicDecision = resp.output_parsed

            latency_ms = (time.time() - start) * 1000

            return parsed.is_on_topic, {
                "reason": parsed.reason or "",
                "latency_ms": latency_ms,
                "model": self.model,
            }

        except Exception as e:
            # Fail-safe behavior for a guardrail: treat as OFF_TOPIC (or route to stricter path).
            latency_ms = (time.time() - start) * 1000
            return False, {
                "reason": f"classification_error: {type(e).__name__}",
                "latency_ms": latency_ms,
                "model": self.model,
            }
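
The `cache_key` argument of `check()` is described as a stable identifier such as a hashed user or org ID. One way to derive such a key is sketched below. The helper name `make_cache_key`, the `namespace` prefix, and the 16-character truncation are illustrative assumptions, not part of this notebook or of the OpenAI SDK.

```python
import hashlib


def make_cache_key(org_id: str, namespace: str = "hr-topic-guardrail") -> str:
    """Hypothetical helper: derive a stable cache key without exposing the raw ID."""
    digest = hashlib.sha256(f"{namespace}:{org_id}".encode("utf-8")).hexdigest()
    # Truncating to 16 hex characters is an arbitrary readability choice.
    return f"{namespace}-{digest[:16]}"


key = make_cache_key("org_12345")
print(key)
```

Because the same input always yields the same key, repeated calls for one organization can share it, for example `guardrail.check(query, cache_key=key)`.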

Let's run another test. First, create a new guardrail object:

In [None]:
# Keep the topic list in a variable so it can be reused for reporting below.
optimized_topics = [
    "vacation and time off",
    "health insurance and benefits",
    "401k and retirement",
    "parental leave",
    "expense reports",
    "dress code",
    "harassment reporting",
    "remote work policies",
    "office hours",
    "payroll and compensation",
]

optimized_llm_hr_guardrail = OptimizedLLMTopicGuardrail(
    domain_description="Company HR Policy Assistant that answers questions about employee policies, benefits, procedures, and workplace guidelines.",
    allowed_topics=optimized_topics,
    client=client,
    # Disable the reason in production to save output tokens. Enable it while testing
    # or while tuning the prompt, temperature, or model choice.
    include_reason=False,
)

print("✅ LLM HR Topic Guardrail created!")
print(f"   Model: {optimized_llm_hr_guardrail.model}")
print(f"   Topics: {len(optimized_topics)} categories")

✅ LLM HR Topic Guardrail created!
   Model: gpt-4.1-nano
   Topics: 10 categories


In [None]:
# Test queries - mix of on-topic and off-topic
test_queries = [
    # Clear on-topic
    ("How many vacation days do new employees get?", "On-topic"),
    ("What's the process for requesting parental leave?", "On-topic"),
    ("Does the company match 401k contributions?", "On-topic"),

    # Borderline / nuanced
    ("Can I work from a coffee shop?", "Borderline - remote work related"),
    ("What should I wear to a client meeting?", "Borderline - dress code related"),
    ("How do I improve my work-life balance?", "Borderline - could be HR or general"),

    # Clear off-topic
    ("What is the capital of France?", "Off-topic"),
    ("How do I write a Python function?", "Off-topic"),
    ("What's the best pizza topping?", "Off-topic"),
]

print("🤖 LLM-BASED OFF-TOPIC DETECTION")
print("=" * 70)

total_latency = 0
for query, expected in test_queries:
    is_on_topic, details = optimized_llm_hr_guardrail.check(query)
    total_latency += details["latency_ms"]

    status = "✅ ON-TOPIC" if is_on_topic else "❌ OFF-TOPIC"

    print(f"\n📝 Query: \"{query}\"")
    print(f"   Expected: {expected}")
    print(f"   Result: {status}")
    print(f"   Reason: {details['reason']}")
    print(f"   Latency: {details['latency_ms']:.0f}ms")

avg_latency = total_latency / len(test_queries)
print("\n" + "=" * 70)
print(f"📊 Average latency: {avg_latency:.0f}ms per query")

🤖 LLM-BASED OFF-TOPIC DETECTION

📝 Query: "How many vacation days do new employees get?"
   Expected: On-topic
   Result: ✅ ON-TOPIC
   Reason: The query pertains to vacation days, which is an allowed topic within employee policies.
   Latency: 521ms

📝 Query: "What's the process for requesting parental leave?"
   Expected: On-topic
   Result: ✅ ON-TOPIC
   Reason: The query pertains to parental leave, which is an employee benefit covered under workplace policies.
   Latency: 645ms

📝 Query: "Does the company match 401k contributions?"
   Expected: On-topic
   Result: ✅ ON-TOPIC
   Reason: The query relates to 401k and retirement benefits, which are within the allowed topics.
   Latency: 1041ms

📝 Query: "Can I work from a coffee shop?"
   Expected: Borderline - remote work related
   Result: ✅ ON-TOPIC
   Reason: The query relates to remote work policies, which is an allowed topic.
   Latency: 607ms

📝 Query: "What should I wear to a client meeting?"
   Expec

## ⚡ Performance Comparison: Embeddings vs LLM

Let's compare both approaches on the same set of queries to see the trade-offs.

In [None]:
# Compare both approaches
comparison_queries = [
    "How do I request time off for a medical appointment?",
    "What benefits are available for part-time employees?",
    "Can I expense my home office furniture?",
    "What programming language should I learn?",
    "Who won the World Cup?",
    "Is there a policy about bringing pets to the office?",  # Borderline
]

print("⚡ EMBEDDING vs LLM COMPARISON")
print("=" * 75)
print(f"{'Query':<45} {'Embedding':<15} {'LLM':<15}")
print("-" * 75)

embedding_times = []
llm_times = []

for query in comparison_queries:
    # Embedding approach
    start = time.time()
    emb_on_topic, emb_details = hr_guardrail.check(query)
    emb_time = (time.time() - start) * 1000
    embedding_times.append(emb_time)

    # LLM approach
    llm_on_topic, llm_details = optimized_llm_hr_guardrail.check(query)
    llm_times.append(llm_details["latency_ms"])

    # Format results
    emb_result = f"{'✅' if emb_on_topic else '❌'} ({emb_details['max_similarity']:.2f})"
    llm_result = f"{'✅' if llm_on_topic else '❌'}"

    display_query = query[:43] + ".." if len(query) > 45 else query
    print(f"{display_query:<45} {emb_result:<15} {llm_result:<15}")

print("-" * 75)
print(f"{'Average Latency:':<45} {sum(embedding_times)/len(embedding_times):.0f}ms{'':<9} {sum(llm_times)/len(llm_times):.0f}ms")

⚡ EMBEDDING vs LLM COMPARISON
Query                                         Embedding       LLM
---------------------------------------------------------------------------
How do I request time off for a medical app.. ✅ (0.74)        ✅
What benefits are available for part-time e.. ❌ (0.34)        ✅
Can I expense my home office furniture?       ❌ (0.34)        ✅
What programming language should I learn?     ❌ (0.09)        ❌
Who won the World Cup?                        ❌ (0.09)        ❌
Is there a policy about bringing pets to th.. ✅ (0.36)        ❌
---------------------------------------------------------------------------
Average Latency:                              19ms          476ms


## 🎯 Choosing the Right Approach

Based on our comparison, here's a decision framework:

| Factor | Embedding-Based | LLM-Based |
|--------|-----------------|-----------|
| **Latency** | ~10-50ms | ~500-1500ms |
| **Cost per query** | ~\$0.00001 (compute only) | ~\$0.0001-0.001 |
| **Accuracy on clear cases** | Excellent | Excellent |
| **Accuracy on borderline** | Good | Excellent |
| **Explainability** | Limited (similarity score) | High (reasoning provided) |
| **Setup effort** | Requires anchor curation | Just describe domain |
| **Maintenance** | Update anchors manually | Update description |
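
To see how these figures combine in a hybrid setup, it helps to work out the expected per-query latency and cost as a function of how often the embedding check escalates to the LLM. The sketch below uses rough midpoints of the ranges in the table, and the 20% escalation rate is purely an assumed value for illustration. Measure your own traffic before relying on any of these numbers.

```python
def expected_hybrid_cost(
    escalation_rate: float,
    emb_latency_ms: float = 30.0,    # assumed midpoint of ~10-50ms
    llm_latency_ms: float = 1000.0,  # assumed midpoint of ~500-1500ms
    emb_cost_usd: float = 0.00001,
    llm_cost_usd: float = 0.0005,    # assumed value inside ~$0.0001-0.001
) -> tuple[float, float]:
    """Every query pays for the embedding check; only escalated queries also pay for the LLM."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    latency = emb_latency_ms + escalation_rate * llm_latency_ms
    cost = emb_cost_usd + escalation_rate * llm_cost_usd
    return latency, cost


latency_ms, cost_usd = expected_hybrid_cost(escalation_rate=0.20)
print(f"Expected latency: {latency_ms:.0f}ms, expected cost: ${cost_usd:.5f} per query")
```

Both quantities grow linearly with the escalation rate, so the width of the borderline zone is the main lever for trading cost against accuracy.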

### Recommended Patterns:

**Pattern 1: Embedding-First (Cost Optimized)**
```text
User Query → Embedding Check → If borderline → LLM Check → Decision
```

**Pattern 2: LLM-Only (Accuracy Optimized)**
```text
User Query → LLM Check → Decision
```

**Pattern 3: Parallel (Latency + Accuracy)**
```text
User Query → [Embedding Check || LLM Check] → Combine Results → Decision
```
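
Pattern 3 can be sketched with a thread pool. In the example below, the two stub classes are hypothetical stand-ins for the real guardrails so that the code runs without API calls. The combination rule (trust the embedding result outside an assumed 0.30-0.50 band, otherwise defer to the LLM) is one possible policy, not the only one.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Tuple


class StubEmbeddingGuardrail:
    """Hypothetical stand-in for the embedding guardrail; returns a fixed similarity."""

    def __init__(self, similarity: float):
        self.similarity = similarity

    def check(self, query: str) -> Tuple[bool, Dict[str, Any]]:
        return self.similarity >= 0.35, {"max_similarity": self.similarity}


class StubLLMGuardrail:
    """Hypothetical stand-in for the LLM guardrail; sleeps to mimic network latency."""

    def __init__(self, verdict: bool, delay_s: float = 0.05):
        self.verdict = verdict
        self.delay_s = delay_s

    def check(self, query: str) -> Tuple[bool, Dict[str, Any]]:
        time.sleep(self.delay_s)
        return self.verdict, {"reason": "stubbed verdict"}


def parallel_topic_check(query, embedding_guardrail, llm_guardrail, band=(0.30, 0.50)):
    """Run both checks concurrently; embedding decides clear cases, LLM decides the rest."""
    low, high = band
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        emb_future = pool.submit(embedding_guardrail.check, query)
        llm_future = pool.submit(llm_guardrail.check, query)
        _, emb_details = emb_future.result()
        llm_on_topic, _ = llm_future.result()

    similarity = emb_details["max_similarity"]
    if similarity >= high:
        decision, decided_by = True, "embedding"
    elif similarity < low:
        decision, decided_by = False, "embedding"
    else:
        decision, decided_by = llm_on_topic, "llm"

    return decision, {
        "decided_by": decided_by,
        "similarity": similarity,
        "latency_ms": (time.perf_counter() - start) * 1000,
    }


on_topic, details = parallel_topic_check(
    "Can I bring my dog to work?", StubEmbeddingGuardrail(0.40), StubLLMGuardrail(False)
)
print(on_topic, details["decided_by"])
```

Because both calls always run, this sketch pays the LLM cost on every query. The benefit is that wall-clock latency is roughly that of the slower check rather than the sum of the two.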

In [None]:
def hybrid_topic_check(
    query: str,
    embedding_guardrail: TopicGuardrail,
    llm_guardrail: LLMTopicGuardrail,
    borderline_threshold: tuple = (0.30, 0.50)
) -> tuple[bool, dict]:
    """
    Hybrid approach: Use embeddings first, escalate to LLM for borderline cases.

    This pattern optimizes for both cost and accuracy:
    - Clear on-topic (similarity >= 0.50): Accept immediately
    - Clear off-topic (similarity < 0.30): Reject immediately
    - Borderline (0.30-0.50): Use LLM for final decision

    Args:
        query: User query to check
        embedding_guardrail: Pre-configured embedding guardrail
        llm_guardrail: Pre-configured LLM guardrail
        borderline_threshold: (low, high) thresholds for borderline zone
    """
    low_thresh, high_thresh = borderline_threshold

    # Step 1: Fast embedding check
    start = time.time()
    emb_on_topic, emb_details = embedding_guardrail.check(query)
    emb_time = (time.time() - start) * 1000

    similarity = emb_details["max_similarity"]

    # Step 2: Decide based on confidence
    if similarity >= high_thresh:
        # High confidence on-topic - no need for LLM
        return True, {
            "method": "embedding",
            "decision": "ON-TOPIC",
            "reason": f"High similarity ({similarity:.2f}) to '{emb_details['best_match']}'",
            "similarity": similarity,
            "latency_ms": emb_time,
            "llm_used": False
        }

    elif similarity < low_thresh:
        # High confidence off-topic - no need for LLM
        return False, {
            "method": "embedding",
            "decision": "OFF-TOPIC",
            "reason": f"Low similarity ({similarity:.2f}) - not related to domain",
            "similarity": similarity,
            "latency_ms": emb_time,
            "llm_used": False
        }

    else:
        # Borderline case - escalate to LLM
        llm_on_topic, llm_details = llm_guardrail.check(query)
        total_time = emb_time + llm_details["latency_ms"]

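        return llm_on_topic, {
            "method": "hybrid (escalated to LLM)",
            # Derive the label from the boolean so any guardrail returning (bool, details) works.
            "decision": "ON-TOPIC" if llm_on_topic else "OFF-TOPIC",
            "reason": llm_details.get("reason", ""),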
            "similarity": similarity,
            "latency_ms": total_time,
            "llm_used": True
        }

print("✅ Hybrid topic check function defined!")

# Test the hybrid approach
hybrid_test_queries = [
    "What is our vacation policy?",           # Clear on-topic
    "What is the capital of France?",         # Clear off-topic
    "Can I bring my dog to work sometimes?",  # Borderline
    "How do I handle stress at work?",        # Borderline
    "What's the dress code for casual Friday?", # On-topic
]

print("🔀 HYBRID APPROACH RESULTS")
print("=" * 75)

for query in hybrid_test_queries:
    is_on_topic, details = hybrid_topic_check(query, hr_guardrail, llm_hr_guardrail)

    status = "✅ ON-TOPIC" if is_on_topic else "❌ OFF-TOPIC"
    llm_indicator = "🤖" if details["llm_used"] else "📊"

    print(f"\n{llm_indicator} Query: \"{query}\"")
    print(f"   Decision: {status}")
    print(f"   Method: {details['method']}")
    print(f"   Reason: {details['reason']}")
    print(f"   Latency: {details['latency_ms']:.0f}ms")

print("\n" + "=" * 75)
print("Legend: 📊 = Embedding only, 🤖 = LLM escalation")

✅ Hybrid topic check function defined!
🔀 HYBRID APPROACH RESULTS

📊 Query: "What is our vacation policy?"
   Decision: ✅ ON-TOPIC
   Method: embedding
   Reason: High similarity (1.00) to 'What is our vacation policy?'
   Latency: 24ms

📊 Query: "What is the capital of France?"
   Decision: ❌ OFF-TOPIC
   Method: embedding
   Reason: Low similarity (0.15) - not related to domain
   Latency: 22ms

🤖 Query: "Can I bring my dog to work sometimes?"
   Decision: ❌ OFF-TOPIC
   Method: hybrid (escalated to LLM)
   Reason: The query relates to bringing pets to work, which is not covered under the specified HR policies.
   Latency: 607ms

📊 Query: "How do I handle stress at work?"
   Decision: ❌ OFF-TOPIC
   Method: embedding
   Reason: Low similarity (0.26) - not related to domain
   Latency: 22ms

📊 Query: "What's the dress code for casual Friday?"
   Decision: ✅ ON-TOPIC
   Method: embedding
   Reason: High similarity (0.81) to 'What is the dress code?'
   Late

## 🎓 Interactive Demo: Test Your Own Queries

Try modifying the query below to see how the HR guardrail responds!

In [None]:
# 🎯 TRY IT YOURSELF!
# Modify this query and run the cell

your_query = "How do I request a day off for a doctor's appointment?"

print("=" * 60)
print("YOUR QUERY ANALYSIS")
print("=" * 60)

is_on_topic, details = hr_guardrail.check(your_query)

print(f"\n📝 Query: \"{your_query}\"")
print(f"\n📊 Results:")
print(f"   Status: {'✅ ON-TOPIC' if is_on_topic else '❌ OFF-TOPIC'}")
print(f"   Similarity Score: {details['max_similarity']:.3f}")
print(f"   Threshold: {details['threshold']}")
print(f"   Best Matching Anchor: \"{details['best_match']}\"")

# Show top 3 most similar anchors
print(f"\n🔍 Top 3 Most Similar Anchors:")
sorted_sims = sorted(
    details['all_similarities'].items(),
    key=lambda x: x[1],
    reverse=True
)[:3]

for i, (anchor, sim) in enumerate(sorted_sims, 1):
    print(f"   {i}. \"{anchor}\" → {sim:.3f}")

print("\n" + "=" * 60)

YOUR QUERY ANALYSIS

📝 Query: "How do I request a day off for a doctor's appointment?"

📊 Results:
   Status: ✅ ON-TOPIC
   Similarity Score: 0.625
   Threshold: 0.35
   Best Matching Anchor: "How do I request time off?"

🔍 Top 3 Most Similar Anchors:
   1. "How do I request time off?" → 0.625
   2. "What are the office hours?" → 0.373
   3. "How many sick days do I get?" → 0.283



## 📚 Summary: Off-Topic Detection Approaches

| Approach | Best For | Latency | Cost | Accuracy |
|----------|----------|---------|------|----------|
| **Embedding-Based** | High volume, cost-sensitive | ~10-50ms | Very Low | Good |
| **LLM-Based** | High accuracy needs, complex domains | ~500-1500ms | Low | Excellent |
| **Hybrid** | Balance of cost and accuracy | ~10-1500ms (depends on escalation) | Optimized | Excellent |

### Key Takeaways:

1. **Start with embeddings**: Fast, cheap, and effective for most cases
2. **Add LLM for borderline cases**: Gets the best of both worlds
3. **Consider fine-tuning**: If embedding accuracy isn't sufficient
4. **Use gpt-4.1-nano**: Fast and cheap for classification tasks
5. **Design clear domain descriptions**: Better prompts = better LLM accuracy
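
Escalating borderline cases only works well if the borderline band is placed sensibly. One rough way to pick it is from a small hand-labeled set of similarity scores, as sketched below. The function name `estimate_borderline_band`, the `margin` default, and the sample data are all hypothetical. The scores are made up for illustration and are not results from this notebook.

```python
def estimate_borderline_band(scored_examples, margin: float = 0.02):
    """
    scored_examples: iterable of (similarity, is_on_topic) pairs from a labeled set.
    Returns (low, high) such that no labeled on-topic example falls below `low`
    and no labeled off-topic example falls at or above `high`.
    """
    examples = list(scored_examples)
    on_scores = [s for s, on_topic in examples if on_topic]
    off_scores = [s for s, on_topic in examples if not on_topic]
    if not on_scores or not off_scores:
        raise ValueError("need at least one on-topic and one off-topic example")

    below_all_on = min(on_scores) - margin    # nothing on-topic seen under this score
    above_all_off = max(off_scores) + margin  # nothing off-topic seen at or over this score
    # Sorting also covers cleanly separated classes, where the unobserved gap becomes the band.
    return min(below_all_on, above_all_off), max(below_all_on, above_all_off)


# Made-up labeled scores, for illustration only.
labeled = [(0.74, True), (0.62, True), (0.34, True), (0.09, False), (0.15, False), (0.36, False)]
low, high = estimate_borderline_band(labeled)
print(f"Suggested borderline band: {low:.2f} to {high:.2f}")
```

The resulting pair could be passed as `borderline_threshold` to `hybrid_topic_check`. A wider margin sends more queries to the LLM, which raises cost but reduces the chance of a confident wrong answer from the embedding check.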