**If you use our code, please cite:**

@misc{2024,<br>
  title = {Semantic Cache from Scratch},<br>
  author = {Hamza Farooq and Darshil Modi and Kanwal Mehreen and Nazila Shafiei},<br>
  keywords = {Semantic Cache},<br>
  year = {2024},<br>
  copyright = {APACHE 2.0 license}<br>
}

## Semantic Cache

Semantic caching speeds up retrieval-augmented workflows by storing the answers to previous queries and reusing them when a new query is semantically similar, instead of running a fresh retrieval every time. In this notebook, we'll build a lightweight semantic cache from scratch using:

- **Nomic text embeddings** (`nomic-ai/nomic-embed-text-v1.5`) to convert documents and queries into dense vectors  
- **FAISS** (Facebook AI Similarity Search) to index and quickly search those vectors  
- **Traversaal Pro API** to perform RAG over the AWS documentation corpus when a cache miss occurs  

Rather than running full retrieval and generation for every query, our cache lets us:

1. **Embed** each new query and check if it's already "covered" by a cached result  
2. **Fall back** to a full RAG retrieval (and store the new result) only when necessary  
3. **Skip the cache entirely** for time-sensitive questions that need a fresh answer  
4. **Invoke** the Traversaal Pro API for document-grounded answers on cache misses  
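Before bringing in models and APIs, the hit/miss idea fits in a few lines. The sketch below is a toy, not the notebook's implementation: it assumes the caller already has embedding vectors (hand-picked 2-D vectors stand in for Nomic embeddings), scans entries linearly instead of using FAISS, and takes a hypothetical `backend` function in place of the Traversaal Pro call. Time-sensitivity routing is left out.

```python
import math
from typing import Callable, List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class ToySemanticCache:
    """Illustration only: a linear scan stands in for FAISS, and callers pass vectors."""

    def __init__(self, backend: Callable[[str], str], min_similarity: float = 0.9):
        self.backend = backend                  # hypothetical stand-in for the RAG call
        self.min_similarity = min_similarity    # hit if cosine >= this value
        self.entries: List[Tuple[List[float], str]] = []
        self.backend_calls = 0

    def ask(self, question: str, vector: List[float]) -> str:
        # 1. Find the most similar cached vector (None if the cache is empty)
        best: Optional[Tuple[List[float], str]] = max(
            self.entries, key=lambda entry: cosine(vector, entry[0]), default=None
        )
        if best is not None and cosine(vector, best[0]) >= self.min_similarity:
            return best[1]                      # cache hit: reuse the stored answer
        # 2. Cache miss: call the backend once and remember its answer
        self.backend_calls += 1
        answer = self.backend(question)
        self.entries.append((vector, answer))
        return answer

# Hand-picked 2-D vectors in place of real embeddings
toy = ToySemanticCache(backend=lambda q: f"answer to: {q}")
print(toy.ask("What is S3?", [1.0, 0.0]))          # miss -> backend
print(toy.ask("Explain Amazon S3", [0.98, 0.2]))   # hit (cosine about 0.98) -> "answer to: What is S3?"
print(toy.ask("What is IAM?", [0.0, 1.0]))         # miss -> backend
print(toy.backend_calls)                           # 2
```

Only misses reach the backend and only misses add entries; a hit returns the answer stored for the earlier, similar question.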

This approach reduces redundant compute, lowers end-to-end latency, and makes RAG pipelines more efficient, especially when query patterns exhibit repetition or high similarity. We'll walk through:

1. Loading the Nomic embed model with `trust_remote_code=True`  
2. Building a FAISS index for fast L2 nearest-neighbor lookup  
3. Implementing the core cache hit/miss logic with a time-sensitivity filter  
4. Falling back to Traversaal Pro RAG API for live document retrieval on cache misses  
5. Measuring performance gains against a "no-cache" baseline  

By the end, you'll have a reusable semantic cache scaffold that you can plug into any RAG or search-over-embeddings pipeline. Let's get started!
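One detail matters for the hit threshold later on: FAISS's `IndexFlatL2` reports *squared* Euclidean distances, and for unit-length embeddings (the cache below encodes with `normalize_embeddings=True`) the squared distance and cosine similarity are tied by ‖a - b‖² = 2 - 2·cos(a, b). The sketch below checks that identity with NumPy on random unit vectors and converts a squared-L2 threshold into a cosine bound; the helper name `squared_l2_to_cosine` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random 768-d vectors, normalised to unit length like the cache's embeddings
a = rng.normal(size=768)
a /= np.linalg.norm(a)
b = rng.normal(size=768)
b /= np.linalg.norm(b)

squared_l2 = float(np.sum((a - b) ** 2))  # the kind of value IndexFlatL2 reports
cos_sim = float(a @ b)                    # cosine similarity of unit vectors

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.isclose(squared_l2, 2 - 2 * cos_sim))  # True

def squared_l2_to_cosine(d2: float) -> float:
    """Convert a squared L2 distance between unit vectors into cosine similarity."""
    return 1.0 - d2 / 2.0

print(squared_l2_to_cosine(0.2))  # about 0.9: a 0.2 threshold means cosine >= 0.9
```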

## Setup and Dependencies

In [None]:
# Install the necessary libraries
!pip install -U faiss-cpu sentence_transformers transformers python-dotenv einops

In [None]:
# Import the necessary libraries

# FAISS for efficient similarity search over vector embeddings
import faiss  # Builds and queries exact (flat) or approximate nearest-neighbor indices

# Lightweight SQL database (imported here but not used below; the cache persists to JSON)
import sqlite3  # Optional alternative persistence layer for cache entries or metrics

# SentenceTransformers wrapper around transformer models for text embeddings
from sentence_transformers import SentenceTransformer  # Loads Nomic/embed or other SBERT-style models

# PyTorch backend required by SentenceTransformer and optional model fine-tuning
import torch  # Tensor operations, GPU acceleration, and model inference support

# Transformers library components for optional causal-LLM answer generation (not used by the cache below)
from transformers import AutoModelForCausalLM, AutoTokenizer
#   - AutoModelForCausalLM: Load pretrained language models (e.g., GPT variants)
#   - AutoTokenizer: Tokenize text input/output for the LLM

# Core numerical library for array and matrix operations on embeddings
import numpy as np  # Handles vector math, concatenation, and statistical computations

# Pretty-printing complex Python objects during development/debugging
from pprint import pprint  # Nicely formats nested dicts or lists when exploring outputs

## Define the Retrieval Functions

This notebook uses **two different APIs** depending on whether a question is stable or time-sensitive:

| Question type | Backend | Cached? |
|---|---|---|
| Stable / document-grounded | **Traversaal Pro** (RAG over AWS guidebook) | ✅ Yes |
| Time-sensitive / live data | **SerpApi** (Google search results) | ❌ Never |

---

## Traversaal Pro: RAG as a Service

[Traversaal Pro](https://pro.traversaal.ai) is a hosted RAG platform. You upload documents into a project; the API handles chunking, embedding, retrieval, and generation. In this notebook the corpus is the **AWS Guidebook**.

**API details:**

| Property | Value |
|---|---|
| Endpoint | `POST https://pro-documents.traversaal-api.com/documents/search` |
| Auth | `Authorization: Bearer <your_token>` |
| Request | `{"query": "...", "generation": true}` |
| Response | `{"response": "...", "references": [{score, chunk_text, ...}]}` |

Sign up at [pro.traversaal.ai](https://pro.traversaal.ai) to get your Bearer token.
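The table above maps onto a single HTTP call. As a dependency-free sketch, the code below builds that request with Python's standard `urllib.request` without sending it, so the method, bearer header, and JSON body can be inspected offline. The token is a placeholder, the helper name is illustrative, and the 60-second timeout in the commented-out send is an arbitrary choice rather than a documented Traversaal Pro limit. The notebook itself uses `requests`.

```python
import json
import urllib.request

TRAVERSAAL_URL = "https://pro-documents.traversaal-api.com/documents/search"

def build_traversaal_request(query: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the table above."""
    body = json.dumps({"query": query, "generation": True}).encode("utf-8")
    return urllib.request.Request(
        TRAVERSAAL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_traversaal_request("What is an S3 bucket in AWS?", token="<your_token>")
print(req.get_method())                 # POST
print(req.get_header("Authorization"))  # Bearer <your_token>
print(json.loads(req.data))             # {'query': 'What is an S3 bucket in AWS?', 'generation': True}

# To actually send it (network access and a valid token required):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     answer = json.load(resp)["response"]
```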

---

## SerpApi: Live Internet Search

[SerpApi](https://serpapi.com) provides structured Google search results via a REST API. We use it for time-sensitive questions that require up-to-date information from the web (current events, live pricing, outages, etc.). Those answers must never be served from the cache.

**API details:**

| Property | Value |
|---|---|
| Endpoint | `GET https://serpapi.com/search.json` |
| Auth | `?api_key=<your_key>` query param |
| Key params | `q=<query>`, `engine=google`, `num=5` |
| Response | `organic_results[].snippet`, `answer_box` |

Sign up at [serpapi.com](https://serpapi.com) for a free API key (100 searches/month on the free tier).
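Likewise, the SerpApi parameters translate into a query string. This sketch builds the URL with `urllib.parse.urlencode` and pulls text out of a response using only the two fields listed in the table. The helper names are illustrative, and the `sample` dictionary is hand-written to match that shape; it is not real SerpApi output. Because the API key travels in the URL, avoid logging full request URLs.

```python
from urllib.parse import urlencode

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"

def build_serpapi_url(query: str, api_key: str, num: int = 5) -> str:
    """Build the GET URL from the parameters listed in the table above."""
    params = {"q": query, "engine": "google", "num": num, "api_key": api_key}
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"

def extract_snippets(data: dict, limit: int = 5) -> list:
    """Collect the answer box text (if any), then organic-result snippets."""
    snippets = []
    answer_box = data.get("answer_box", {})
    direct = answer_box.get("answer") or answer_box.get("snippet")
    if direct:
        snippets.append(direct)
    for result in data.get("organic_results", [])[:limit]:
        if result.get("snippet"):
            snippets.append(result["snippet"])
    return snippets

# Hand-written sample shaped like the fields above (NOT real SerpApi output)
sample = {
    "answer_box": {"snippet": "No outages reported."},
    "organic_results": [{"snippet": "Status page A"}, {"title": "no snippet"}],
}
print(build_serpapi_url("is aws down", api_key="<your_key>"))
print(extract_snippets(sample))  # ['No outages reported.', 'Status page A']
```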

In [None]:
import os
import requests  # HTTP client for REST API calls 

# ── Credential loading (works on Colab and locally) ──────────────────────────
# On Colab:  store keys in the Secrets panel (🔑 left sidebar)
#              TRAVERSAAL_PRO_API_KEY
#              SERP_API_KEY
# Locally:   keys are read from Module_3_Agentic_RAG/.env
#              traversaal_pro_api_key=<token>
#              serp_api_key=<key>

try:
    from google.colab import userdata
    traversaal_pro_api_key = userdata.get("TRAVERSAAL_PRO_API_KEY")
    serp_api_key = userdata.get("SERP_API_KEY")
    print("Running on Colab: credentials loaded from Secrets.")
except ImportError:
    from dotenv import load_dotenv, find_dotenv
    load_dotenv(find_dotenv())   # walks up the directory tree to find .env
    # .env uses lowercase key names; fall back to uppercase too
    traversaal_pro_api_key = os.getenv("traversaal_pro_api_key") or os.getenv("TRAVERSAAL_PRO_API_KEY")
    serp_api_key = os.getenv("serp_api_key") or os.getenv("SERP_API_KEY")
    print("Running locally: credentials loaded from .env file.")

print(f"Traversaal Pro key loaded: {'✅' if traversaal_pro_api_key else '❌ MISSING'}")
print(f"SerpApi key loaded:        {'✅' if serp_api_key else '❌ MISSING'}")


# ── Traversaal Pro: RAG over AWS Guidebook ──────────────────────────────────

def make_prediction(query: str) -> dict:
    """
    Query the Traversaal Pro RAG API with a natural language question.

    The API performs retrieval over the configured document corpus (AWS Guidebook)
    and returns a generated answer together with source chunk references.

    Request:
        POST https://pro-documents.traversaal-api.com/documents/search
        {"query": "...", "generation": true}

    Response:
        {
          "response": "<generated answer string>",
          "references": [
            {
              "score": 0.81,
              "file_id": "...",
              "chunk_index": 1,
              "chunk_text": "...",
              "original_file_name": "aws-guide.pdf"
            },
            ...
          ]
        }

    Args:
        query (str): Natural language question answerable from the AWS Guidebook.

    Returns:
        dict | None: Full API response with 'response' and 'references' keys,
        or None if the request fails or the body is not valid JSON.
    """
    if not traversaal_pro_api_key:
        raise RuntimeError("Missing TRAVERSAAL_PRO_API_KEY: add it to Colab Secrets or .env")

    url = "https://pro-documents.traversaal-api.com/documents/search"
    headers = {
        "Authorization": f"Bearer {traversaal_pro_api_key}",
        "Content-Type": "application/json",
    }
    payload = {"query": query, "generation": True}

    try:
        response = requests.post(url, json=payload, headers=headers, timeout=60)  # avoid hanging forever
        if response.status_code == 200:
            print("Traversaal Pro: request successful.")
            try:
                return response.json()
            except ValueError:
                print("Response was not valid JSON.")
                return None
        else:
            print(f"Traversaal Pro: request failed ({response.status_code}): {response.text}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Traversaal Pro: request error: {e}")
        return None


# ── SerpApi: Live Google Search ─────────────────────────────────────────────

def search_live(query: str) -> str:
    """
    Search Google in real time using SerpApi and return a formatted answer.

    Used exclusively for time-sensitive questions (current events, live pricing,
    outages, etc.) where a cached answer would quickly become stale.
    Results are intentionally NOT stored in the semantic cache.

    Args:
        query (str): The time-sensitive question to search for.

    Returns:
        str: A formatted string combining the answer box (if present) and
             top organic result snippets.
    """
    if not serp_api_key:
        raise RuntimeError("Missing SERP_API_KEY ‚Äî add it to Colab Secrets or .env")

    print("SerpApi: fetching live search results üåê ...")
    params = {
        "q": query,
        "api_key": serp_api_key,
        "engine": "google",
        "num": 5,
    }

    try:
        response = requests.get("https://serpapi.com/search.json", params=params, timeout=30)
        response.raise_for_status()
        data = response.json()

        parts = []

        # Answer box: Google's highlighted direct answer (most relevant)
        answer_box = data.get("answer_box", {})
        if answer_box.get("answer"):
            parts.append(f"[Direct Answer] {answer_box['answer']}")
        elif answer_box.get("snippet"):
            parts.append(f"[Direct Answer] {answer_box['snippet']}")

        # Top organic results ‚Äî titles + snippets
        for i, result in enumerate(data.get("organic_results", [])[:5], start=1):
            title = result.get("title", "")
            snippet = result.get("snippet", "")
            link = result.get("link", "")
            if snippet:
                parts.append(f"[{i}] {title}\n    {snippet}\n    Source: {link}")

        if not parts:
            return "No results found."

        return "\n\n".join(parts)

    except requests.exceptions.RequestException as e:
        return f"SerpApi request error: {e}"
    except Exception as e:
        return f"Unexpected error: {e}"

In [None]:
# Test Traversaal Pro: stable AWS question (answer comes from the AWS Guidebook)
result = make_prediction("What is an S3 bucket in AWS?")
if result is None:
    print("No result returned; check the API key and the error printed above.")
else:
    print("Generated answer:")
    print(result.get("response"))
    print("\nTop source reference:")
    if result.get("references"):
        top_ref = result["references"][0]
        print(f"  Score: {top_ref['score']:.3f}")
        print(f"  File:  {top_ref['original_file_name']}")
        print(f"  Chunk: {top_ref['chunk_text'][:200]}...")

In [None]:
# Test SerpApi: time-sensitive question (live internet search, NOT from documents)
live_answer = search_live("Are there any AWS outages right now?")
print(live_answer)

### Define SemanticCaching Class

In this cell we define `SemanticCaching`, a lightweight cache with dual-backend routing:

1. **Time-sensitive guard**: detects temporal keywords and routes to **SerpApi** (live Google search), bypassing the cache entirely.  
2. **FAISS lookup**: for stable questions, checks whether a semantically similar question was already answered. If so, returns the cached answer immediately.  
3. **Traversaal Pro fallback**: on a cache miss, queries the **AWS Guidebook RAG** to get a document-grounded answer, then stores it for future hits.  
4. **JSON persistence**: cache entries (questions, embeddings, answers) are saved to disk so the cache survives notebook restarts.  
5. **Latency logging**: every call reports whether it was a hit, miss, or live search, and how long it took.
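Persistence only pays off if the vector index is rebuilt from the saved embeddings when the cache is loaded; otherwise FAISS row ids and positions in the JSON lists drift apart after a restart. The sketch below shows that round trip with the standard library and NumPy, using a brute-force squared-L2 search as a stand-in for `faiss.IndexFlatL2` (with FAISS you would `add` the reloaded matrix to a fresh index). The helper names, toy 3-d vectors, and temporary file path are illustrative.

```python
import json
import os
import tempfile

import numpy as np

def save_cache(path: str, cache: dict) -> None:
    """Write questions, embeddings (plain lists), and answers to a JSON file."""
    with open(path, "w") as f:
        json.dump(cache, f)

def load_cache(path: str, dim: int):
    """Reload the cache and rebuild the embedding matrix in the same row order."""
    with open(path) as f:
        cache = json.load(f)
    # Row i of the matrix corresponds to cache["answers"][i]
    matrix = np.asarray(cache["embeddings"], dtype=np.float32).reshape(-1, dim)
    # With FAISS: index = faiss.IndexFlatL2(dim); index.add(matrix)
    return cache, matrix

def nearest(matrix: np.ndarray, query: np.ndarray):
    """Brute-force stand-in for IndexFlatL2.search: (squared distance, row id)."""
    d2 = np.sum((matrix - query) ** 2, axis=1)
    row = int(np.argmin(d2))
    return float(d2[row]), row

demo_cache = {
    "questions": ["What is S3?", "What is IAM?"],
    "embeddings": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # toy 3-d vectors
    "answers": ["S3 answer", "IAM answer"],
}
path = os.path.join(tempfile.mkdtemp(), "cache.json")
save_cache(path, demo_cache)

reloaded, matrix = load_cache(path, dim=3)
d2, row = nearest(matrix, np.array([0.0, 0.99, 0.1], dtype=np.float32))
print(row, reloaded["answers"][row])  # 1 IAM answer
```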

---

### What Should (and Should NOT) Be Semantically Cached?

The cache is backed by the **AWS Guidebook** via Traversaal Pro. Since documentation is stable, most AWS concept questions are excellent cache candidates. The exceptions are anything that requires live, up-to-the-minute data.

#### ✅ Good to cache: stable AWS documentation answers
| Question | Why it's safe to cache |
|---|---|
| *"What is an S3 bucket in AWS?"* | Core concept, always the same |
| *"How does AWS Lambda work?"* | Stable service behaviour |
| *"What is AWS IAM?"* | Conceptual definition from docs |
| *"What is the difference between EC2 and ECS?"* | Architectural comparison |
| *"How does Amazon CloudFront work?"* | Service explanation |
| *"What is an AWS VPC?"* | Networking concept |

#### ❌ Do NOT cache: time-sensitive questions (answers change, even for AWS)
| Question | Why it must NOT be cached | Backend |
|---|---|---|
| *"Are there any AWS outages right now?"* | Status changes minute to minute | SerpApi |
| *"What are the latest AWS features this week?"* | New releases announced daily | SerpApi |
| *"What is the current EC2 pricing today?"* | AWS updates pricing periodically | SerpApi |
| *"Is AWS S3 down right now?"* | Real-time health check | SerpApi |
| *"What new services did AWS announce this month?"* | New info every month | SerpApi |

The `is_time_sensitive()` method catches these using a keyword list and routes them to SerpApi; they never touch the FAISS index.
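Because a keyword check on a lowercase string is easy to write as a plain substring test, short keywords can fire inside longer words: `"now"` matches `"know"` and `"live"` matches `"deliver"`. The sketch below contrasts substring matching with a word-boundary regular expression on a small, illustrative subset of the keyword list; allowing an optional trailing `s` so that `"outage"` still matches `"outages"` is an assumption of this sketch.

```python
import re

KEYWORDS = ["now", "live", "current", "outage", "this week"]  # illustrative subset

def naive_match(question: str) -> bool:
    """Plain substring test: 'now' also fires inside 'know'."""
    q = question.lower()
    return any(k in q for k in KEYWORDS)

def word_match(question: str) -> bool:
    """Whole-word/phrase test with an optional plural 's' (e.g. 'outages')."""
    q = question.lower()
    return any(re.search(rf"\b{re.escape(k)}s?\b", q) for k in KEYWORDS)

for q in [
    "How do I know which S3 storage class to use?",
    "How does CloudFront deliver content?",
    "Are there any AWS outages?",
    "Is S3 down right now?",
]:
    print(f"naive={naive_match(q)!s:5} word={word_match(q)!s:5} {q}")
```

Whole-word matching removes these false positives, but the list is still a heuristic: a question about the Amazon Forecast service, for example, would still hit the `"forecast"` keyword.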

In [None]:
import faiss            # Efficient similarity search over vector embeddings
import json             # Read/write cache from a JSON file
import numpy as np      # Numerical operations on embeddings
from sentence_transformers import SentenceTransformer  # Load Nomic embed model
from transformers import AutoTokenizer, AutoModelForCausalLM  # (Optional) LLM for answer gen
import time             # Measure latency

class SemanticCaching:
    """
    A semantic cache that routes queries to the right backend:

    ┌──────────────────────────────────────────────────────────┐
    │  Query                                                   │
    │    │                                                     │
    │    ├─ Time-sensitive? ──YES──▶ SerpApi (live search)     │
    │    │                           NOT cached                │
    │    │                                                     │
    │    └─ Stable? ──▶ FAISS lookup                           │
    │                     │                                    │
    │                     ├─ HIT  ──▶ return cached answer ⚡  │
    │                     │                                    │
    │                     └─ MISS ──▶ Traversaal Pro (RAG)     │
    │                                 store → return           │
    └──────────────────────────────────────────────────────────┘
    """

    # Keywords that signal the question is time-sensitive and must NOT be cached.
    # Answers to these questions change over time, so caching would return stale results.
    TIME_SENSITIVE_KEYWORDS = [
        "today", "tonight", "now", "currently", "current",
        "latest", "recent", "recently", "right now", "at the moment",
        "at present", "as of now", "this week", "this month", "this year",
        "this quarter", "this season", "this morning", "this afternoon",
        "this evening", "this weekend", "yesterday", "tomorrow",
        "last week", "last month", "last year", "upcoming", "live",
        "breaking", "just happened", "what time", "what day", "what date",
        "happening now", "events today", "news today", "news this week",
        "stock price", "share price", "weather", "forecast", "temperature",
        "real-time", "realtime", "schedule today", "outage", "down right now",
        "is aws down", "aws status",
    ]

    def __init__(self, json_file='cache.json', clear_on_init=False):
        # Initialize a flat (exact) FAISS index; IndexFlatL2 returns *squared* L2 distances
        self.index = faiss.IndexFlatL2(768)
        if self.index.is_trained:
            print('Index trained')

        # Initialize the Sentence Transformer model (768-dim Nomic embeddings)
        self.encoder = SentenceTransformer('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True)

        # Squared-L2 threshold for cache hits (lower = stricter). For unit-length
        # embeddings d^2 = 2 - 2*cos, so 0.2 corresponds to cosine similarity >= 0.9.
        self.euclidean_threshold = 0.2

        # JSON file to persist cache entries
        self.json_file = json_file

        # Load cache or clear already loaded cache
        if clear_on_init:
            self.clear_cache()
        else:
            self.load_cache()

    # ------------------------------------------------------------------
    # Time-sensitivity detection
    # ------------------------------------------------------------------

    def is_time_sensitive(self, question: str) -> bool:
        """
        Returns True if the question is time-sensitive and should NOT be cached.

        Time-sensitive questions reference current events, live data, or time-bound
        information whose answers change frequently. These are routed to SerpApi
        for a real-time Google search answer instead of the document RAG system.

        Examples that return True (→ SerpApi, never cached):
            'Are there any AWS outages right now?'
            'What are the latest AWS features released this week?'
            'What is the current EC2 pricing today?'
            'Is AWS S3 down right now?'

        Examples that return False (→ check cache, then Traversaal Pro if miss):
            'What is an S3 bucket in AWS?'
            'How does AWS Lambda work?'
            'What is AWS IAM?'
            'What is the difference between EC2 and ECS?'
        """
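        import re  # local import keeps this method self-contained

        question_lower = question.lower()
        # Match whole words/phrases only (a trailing "s" is allowed, e.g. "outages"),
        # so "now" no longer fires inside "know" and "live" no longer fires inside "deliver".
        return any(
            re.search(rf"\b{re.escape(keyword)}s?\b", question_lower)
            for keyword in self.TIME_SENSITIVE_KEYWORDS
        )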

    # ------------------------------------------------------------------
    # Cache persistence
    # ------------------------------------------------------------------

    def clear_cache(self):
        """Clears in-memory cache, resets FAISS index, and overwrites the JSON file."""
        self.cache = {
            'questions': [],
            'embeddings': [],
            'answers': [],
            'response_text': []
        }
        self.index = faiss.IndexFlatL2(768)
        self.save_cache()
        print("Semantic cache cleared.")

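    def load_cache(self):
        """Load an existing cache (or start empty) and rebuild the FAISS index from it."""
        try:
            with open(self.json_file, 'r') as file:
                self.cache = json.load(file)
        except FileNotFoundError:
            self.cache = {'questions': [], 'embeddings': [], 'answers': [], 'response_text': []}

        # Re-add stored embeddings so FAISS row ids line up with the cache lists again;
        # without this, hits after a restart would return the wrong row's answer.
        self.index.reset()
        if self.cache['embeddings']:
            self.index.add(np.asarray(self.cache['embeddings'], dtype=np.float32))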

    def save_cache(self):
        """Persist cache back to disk."""
        with open(self.json_file, 'w') as file:
            json.dump(self.cache, file)

    # ------------------------------------------------------------------
    # Main query method
    # ------------------------------------------------------------------

    def ask(self, question: str) -> str:
        """
        Route the question to the correct backend and return an answer.

        Routing logic:
          1. Time-sensitive  → SerpApi (live Google search); answer NOT cached
          2. Cache HIT       → return stored answer instantly
          3. Cache MISS      → Traversaal Pro (RAG over AWS docs); answer stored
        """
        start_time = time.time()

        # ── 1. Time-sensitivity guard ──────────────────────────────────
        # Live search via SerpApi; result intentionally not stored
        if self.is_time_sensitive(question):
            print("⏰ Time-sensitive question: routing to SerpApi (live search, not cached).")
            response_text = search_live(question)
            print(f"Time taken: {time.time() - start_time:.3f}s")
            return response_text

        try:
            # ── 2. Cache lookup ───────────────────────────────────────
            embedding = self.encoder.encode([question], normalize_embeddings=True)
            D, I = self.index.search(embedding, 1)  # D holds squared L2 distances

            # I == -1 means the index is empty (no neighbour found)
            if I[0][0] != -1 and D[0][0] <= self.euclidean_threshold:
                row_id = int(I[0][0])
                # For unit vectors, cosine similarity = 1 - d^2 / 2
                print(f'✅ Cache hit at row: {row_id} | cosine similarity: {1 - D[0][0] / 2:.4f}')
                print(f"Time taken: {time.time() - start_time:.3f}s")
                return self.cache['response_text'][row_id]

            # ── 3. Cache miss → Traversaal Pro RAG ───────────────────
            answer, response_text = self.generate_answer(question)

            self.cache['questions'].append(question)
            self.cache['embeddings'].append(embedding[0].tolist())
            self.cache['answers'].append(answer)
            self.cache['response_text'].append(response_text)
            self.index.add(embedding)
            self.save_cache()
            print(f"Time taken: {time.time() - start_time:.3f}s")

            return response_text

        except Exception as e:
            raise RuntimeError(f"Error during 'ask' method: {e}") from e

    def generate_answer(self, question: str):
        """
        Call Traversaal Pro to answer a stable document-grounded question.

        Uses the AWS Guidebook corpus loaded into your Traversaal Pro project.
        Extracts the 'response' field from the API reply as the answer text.

        Returns:
            tuple: (full API result dict, answer string)
        """
        result = make_prediction(question)
        if result is None:
            # make_prediction already printed the HTTP/JSON error; don't cache a failure
            raise RuntimeError("Traversaal Pro returned no result, so nothing was cached.")
        # Traversaal Pro returns {"response": "...", "references": [...]}
        response_text = result.get('response', str(result))
        return result, response_text

In [None]:
# Instantiate the semantic cache: builds/loads FAISS index, encoder, and JSON cache
cache = SemanticCaching()

# Uncomment to re-instantiate the semantic cache and clear existing cache entries
# cache = SemanticCaching(clear_on_init=True)

### Testing the Semantic Cache

We validate the `SemanticCaching` class using AWS Guidebook questions. These are stable, document-grounded questions, ideal for caching because the answers don't change over time.

Watch the routing in action:
- **First ask** of a question → cache miss → Traversaal Pro RAG → answer stored  
- **Rephrased version** of the same question → cache hit (if similar enough) → instant return  
- **Time-sensitive question** → SerpApi live search → never stored
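The timings printed by `ask()` come from `time.time()`; for measuring intervals, `time.perf_counter()` is the monotonic, higher-resolution clock. The sketch below wraps any callable and tallies elapsed time per label so hits and misses can be compared. The `timed` helper, the labels, and the two stub functions are hypothetical stand-ins, not part of the notebook's class.

```python
import time
from collections import defaultdict
from statistics import mean

timings = defaultdict(list)  # label -> list of elapsed seconds

def timed(label: str, fn, *args):
    """Call fn(*args), record elapsed time under `label`, and return its result."""
    start = time.perf_counter()
    result = fn(*args)
    timings[label].append(time.perf_counter() - start)
    return result

# Hypothetical stand-ins: a slow "backend" call and an instant "cache" lookup
def slow_backend(q):
    time.sleep(0.05)
    return f"fresh answer for {q}"

def cached_lookup(q):
    return f"cached answer for {q}"

timed("miss", slow_backend, "What is S3?")
timed("hit", cached_lookup, "Explain S3")
for label, values in timings.items():
    print(f"{label}: n={len(values)} mean={mean(values) * 1000:.1f} ms")
```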

In [None]:
# Q1: Cache miss (Traversaal Pro answers from AWS docs and stores the result)
question1 = "What is an S3 bucket in AWS?"
answer1 = cache.ask(question1)
print(answer1)

# Q2: Cache miss (different AWS service)
question2 = "How does AWS Lambda work?"
answer2 = cache.ask(question2)
print(answer2)

# Q3: Cache miss (another AWS service)
question3 = "What is AWS IAM used for?"
answer3 = cache.ask(question3)
print(answer3)

# Q4: Cache miss (networking concept)
question4 = "What is an Amazon VPC?"
answer4 = cache.ask(question4)
print(answer4)

# Note:
# All four are distinct enough to each get a separate Traversaal Pro call.
# Next, we'll test cache hits with rephrased versions of these questions.

In [None]:
# Expected cache HIT: rephrased version of Q1 ("What is an S3 bucket in AWS?")
# If the nearest stored embedding is within the distance threshold, the answer returns instantly
print(cache.ask("Can you explain what Amazon S3 buckets are?"))

In [None]:
# Cache HIT: exact same question as Q3
print(cache.ask("What is AWS IAM used for?"))

In [None]:
# Cache MISS: new AWS concept not yet in cache → Traversaal Pro call
print(cache.ask("What is Amazon CloudFront and how does it work?"))

In [None]:
# Expected cache HIT: semantically similar to the CloudFront question above
print(cache.ask("How does AWS CloudFront serve content to users?"))

In [None]:
# Cache MISS: new question about DynamoDB → Traversaal Pro call + stored
print(cache.ask("What is Amazon DynamoDB and when should I use it?"))

In [None]:
# Expected cache HIT: rephrased DynamoDB question (depends on the similarity threshold)
print(cache.ask("When would I choose DynamoDB over other databases on AWS?"))

In [None]:
# Cache MISS: different enough not to match DynamoDB → Traversaal Pro call
print(cache.ask("How does Amazon RDS differ from DynamoDB?"))

### Testing the Time-Sensitivity Filter + Dual-Backend Routing

Here we demonstrate the full routing logic:

| Question type | Detected by | Backend | Cached? |
|---|---|---|---|
| Contains temporal keyword | `is_time_sensitive()` → `True` | **SerpApi** (live Google search) | ❌ Never |
| Stable AWS concept | `is_time_sensitive()` → `False` + cache miss | **Traversaal Pro** (AWS docs RAG) | ✅ Stored |
| Previously seen question | `is_time_sensitive()` → `False` + cache hit | **FAISS cache** | ✅ Already stored |

**AWS-specific time-sensitive examples.** Even though they're about AWS, these need live answers:
- *"Are there any AWS outages right now?"* → changes minute to minute  
- *"What are the latest AWS features released this week?"* → new announcements daily  
- *"What is the current EC2 pricing today?"* → pricing can be updated by AWS at any time  

**AWS stable examples.** These come from documentation and don't change:
- *"What is an S3 bucket?"* → always the same concept
- *"How does Lambda work?"* → core service behaviour is stable
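The decision order in the table can be written as a small pure function. This is a sketch of the order `ask()` applies (time-sensitivity first, then the FAISS distance check), with a hypothetical `route` name and string labels; `nearest_sq_dist` stands for the squared L2 distance FAISS returns for the closest cached question, or `None` when the cache is empty.

```python
from typing import Optional

def route(time_sensitive: bool, nearest_sq_dist: Optional[float], threshold: float = 0.2) -> str:
    """Return which backend the routing table would pick."""
    if time_sensitive:
        return "serpapi"      # live search, never stored
    if nearest_sq_dist is not None and nearest_sq_dist <= threshold:
        return "cache"        # hit: reuse the stored answer
    return "traversaal"       # miss: RAG call, then store

print(route(True, 0.01))   # serpapi (time-sensitivity wins even over a close match)
print(route(False, 0.05))  # cache
print(route(False, 0.6))   # traversaal
print(route(False, None))  # traversaal (empty cache)
```

Checking time-sensitivity first is what guarantees a live question is never answered from the cache, even when a near-identical question happens to be stored.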

In [None]:
# Classification check ‚Äî see which questions are flagged before running any queries
time_sensitive_aws = [
    "Are there any AWS outages right now?",
    "What are the latest AWS features released this week?",
    "What is the current EC2 pricing today?",
    "Is AWS S3 down right now?",
    "What new services did AWS announce this month?",
    "What is the current AWS free tier limit as of now?",
]

stable_aws = [
    "What is an S3 bucket in AWS?",
    "How does AWS Lambda work?",
    "What is AWS IAM?",
    "What is the difference between EC2 and ECS?",
    "How does Amazon CloudFront work?",
    "What is an AWS VPC?",
]

print("=== Time-Sensitive AWS Questions (→ SerpApi, never cached) ===")
for q in time_sensitive_aws:
    flag = cache.is_time_sensitive(q)
    label = "⏰ SerpApi (live)" if flag else "✅ Traversaal Pro (cached)"
    print(f"  [{label}] {q}")

print("\n=== Stable AWS Questions (→ Traversaal Pro on miss, cached) ===")
for q in stable_aws:
    flag = cache.is_time_sensitive(q)
    label = "⏰ SerpApi (live)" if flag else "✅ Traversaal Pro (cached)"
    print(f"  [{label}] {q}")

In [None]:
# Time-sensitive AWS questions → routed to SerpApi, NEVER stored in cache
# Ask the same question twice: both calls go live, and nothing accumulates in FAISS

print("--- Query A (time-sensitive: outage check) ---")
print(cache.ask("Are there any AWS outages right now?"))

print("\n--- Query A again (still time-sensitive → SerpApi again, not cached) ---")
print(cache.ask("Are there any AWS outages right now?"))

print("\n--- Query B (time-sensitive: pricing) ---")
print(cache.ask("What is the current EC2 pricing today?"))

# Verify the cache count has not grown due to these time-sensitive calls
print(f"\nCache entries (should be same as before): {len(cache.cache['questions'])}")
print("Cached questions:", cache.cache['questions'])

In [None]:
# Stable AWS question → cache miss on first call (Traversaal Pro), expected cache hit on second
print("--- Query C (stable AWS, first call → Traversaal Pro) ---")
print(cache.ask("How do you configure S3 bucket policies?"))

print("\n--- Query D (semantically similar to C → expected cache hit) ---")
print(cache.ask("What is the way to set up an S3 bucket access policy?"))

print(f"\nTotal cached entries now: {len(cache.cache['questions'])}")
print("Cached questions:", cache.cache['questions'])