# Context Caching with Gemini 3 Demo

In [1]:
# Imports
from pathlib import Path

from kanoa import AnalyticsInterpreter

## Step 1: Locate Knowledge Base

We'll use the example climate science knowledge base included in this repository.
It contains markdown files covering CO2 emissions, ocean temperatures, and methodology.

In [None]:
# Use the example climate science knowledge base
KB_PATH = (
    Path(__file__).parent / "knowledge_base_demo" / "climate_science_kb"
    if "__file__" in dir()
    else Path("knowledge_base_demo/climate_science_kb")
)

# List KB files
kb_files = list(KB_PATH.glob("*.md"))
print(f"Knowledge base path: {KB_PATH}")
print(f"Files: {[f.name for f in kb_files]}")

# Show total content size (len() counts characters; str.count("") would be off by one per file)
total_chars = sum(len(f.read_text(encoding="utf-8")) for f in kb_files)
print(f"Total content: ~{total_chars:,} characters")

## Step 2: Initialize Interpreter with Caching

Create an `AnalyticsInterpreter` with context caching enabled.
The `cache_ttl` parameter sets how long the cache persists, in seconds.

In [None]:
# Initialize interpreter with context caching
# cache_ttl=3600 means the cache is valid for 1 hour (3600 seconds)

interpreter = AnalyticsInterpreter(
    backend="gemini-3",
    kb_path=str(KB_PATH),
    cache_ttl=3600,  # 1 hour cache TTL
)

print(f"Backend: {interpreter.backend}")
print(f"Cache TTL: {interpreter._backend.cache_ttl}s")
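
To make the TTL semantics concrete, here is a minimal sketch of time-based expiry. It is an illustration only and does not show how kanoa or the Gemini backend tracks cache lifetime; `CacheEntry` and `is_expired` are hypothetical names introduced for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical record of a cache and the time it was created."""

    name: str
    ttl_seconds: float
    # time.monotonic() is unaffected by system clock adjustments.
    created_at: float = field(default_factory=time.monotonic)

    def is_expired(self, now: Optional[float] = None) -> bool:
        """Return True once ttl_seconds have elapsed since creation."""
        current = time.monotonic() if now is None else now
        return current - self.created_at >= self.ttl_seconds


# Fixed timestamps keep the example deterministic.
entry = CacheEntry(name="climate_science_kb", ttl_seconds=3600, created_at=0.0)
print(entry.is_expired(now=1800.0))  # half an hour in: still valid
print(entry.is_expired(now=3600.0))  # one hour in: expired
```

With `cache_ttl=3600`, a query issued after the hour has elapsed would be expected to behave like the first query again and pay the full input cost.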

## Step 3: First Query (Cache Miss)

The first query uploads the knowledge base and creates the cache.
Nothing is cached yet, so the full KB content is billed as regular input tokens and the cached token count should be 0.

In [None]:
# First query - this creates the cache
result1 = interpreter.interpret(
    prompt="What is the current rate of CO2 increase per year?"
)

print("=" * 60)
print("FIRST QUERY RESULTS (Cache Creation)")
print("=" * 60)
print(f"\nResponse:\n{result1.text[:500]}...\n")

if result1.usage:
    print(f"Input tokens:  {result1.usage.input_tokens:,}")
    print(f"Output tokens: {result1.usage.output_tokens:,}")
    print(f"Cached tokens: {result1.usage.cached_tokens or 0:,}")
    print(f"Cache savings: ${result1.usage.cache_savings:.4f}")

## Step 4: Second Query (Cache Hit)

The second query reuses the cached knowledge base.
Notice the **cached tokens** are now non-zero, and **cache savings** shows the cost reduction.

In [None]:
# Second query - this reuses the cache
result2 = interpreter.interpret(
    prompt="What are the sea level projections for 2100 under SSP5-8.5?"
)

print("=" * 60)
print("SECOND QUERY RESULTS (Cache Hit)")
print("=" * 60)
print(f"\nResponse:\n{result2.text[:500]}...\n")

if result2.usage:
    print(f"Input tokens:  {result2.usage.input_tokens:,}")
    print(f"Output tokens: {result2.usage.output_tokens:,}")
    print(f"Cached tokens: {result2.usage.cached_tokens or 0:,}")
    print(f"Cache savings: ${result2.usage.cache_savings:.4f}")

## Understanding Cost Savings

Cached input tokens are billed at a lower rate than standard input tokens, so repeated queries against the same knowledge base cost less:

| Token Type | Price | Description |
| --- | --- | --- |
| Standard Input | $2.00 per 1M tokens | Regular input tokens |
| Cached Input | $0.50 per 1M tokens | Tokens read from the cache |
| Cache Storage | $0.20 per 1M tokens per hour | Cached tokens held in storage |

**Savings formula** (per cache-hit query): `savings = (cached_tokens / 1M) * ($2.00 - $0.50)`

For a 10,000 token knowledge base queried 10 times (input tokens only, cache storage excluded):
- Without caching: `10 × 10,000 × $2.00/1M = $0.20`
- With caching: `10,000 × $2.00/1M + 9 × 10,000 × $0.50/1M = $0.065`
- **67.5% savings** (`($0.20 - $0.065) / $0.20`)
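
The arithmetic above can be reproduced with a short script. This is a sketch that uses only the prices from the table; the function names are illustrative and not part of kanoa, and the optional `storage_hours` argument is an added assumption for estimating the storage charge.

```python
# Prices in USD per 1M tokens, taken from the table above.
STANDARD_INPUT_PRICE = 2.00
CACHED_INPUT_PRICE = 0.50
# Storage price in USD per 1M cached tokens per hour, from the table above.
STORAGE_PRICE_PER_HOUR = 0.20


def cost_without_cache(kb_tokens: int, queries: int) -> float:
    """Every query pays the standard input price for the whole knowledge base."""
    return queries * kb_tokens * STANDARD_INPUT_PRICE / 1_000_000


def cost_with_cache(kb_tokens: int, queries: int, storage_hours: float = 0.0) -> float:
    """The first query pays the standard price; later queries pay the cached price."""
    if queries < 1:
        raise ValueError("queries must be at least 1")
    first = kb_tokens * STANDARD_INPUT_PRICE / 1_000_000
    rest = (queries - 1) * kb_tokens * CACHED_INPUT_PRICE / 1_000_000
    storage = storage_hours * kb_tokens * STORAGE_PRICE_PER_HOUR / 1_000_000
    return first + rest + storage


baseline = cost_without_cache(10_000, 10)
cached = cost_with_cache(10_000, 10)
savings_pct = 100 * (baseline - cached) / baseline

print(f"Without caching: ${baseline:.3f}")
print(f"With caching:    ${cached:.3f}")
print(f"Savings:         {savings_pct:.1f}%")
```

Passing `storage_hours=1` adds the storage charge for a one-hour TTL, which for a 10,000 token knowledge base comes to $0.002 at the listed rate.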

## Step 5: Clear Cache (Optional)

You can manually clear the cache if you've updated your knowledge base.

In [None]:
# Clear the cache to force a refresh
interpreter.clear_cache()
print("Cache cleared. Next query will create a new cache.")

## Summary

kanoa's context caching feature:

1. **Automatically** caches your knowledge base content
2. **Reuses** the cache for subsequent queries (same content hash)
3. **Saves ~75%** on input token costs for cached content ($0.50 instead of $2.00 per 1M tokens)
4. **Tracks** cached tokens and savings in the `UsageInfo` object
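
Point 2 mentions reuse keyed on a content hash. One way such a hash could be computed is sketched below; this is an assumption for illustration, not kanoa's actual implementation, and `kb_content_hash` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def kb_content_hash(kb_path: Path) -> str:
    """Return a SHA-256 digest over the names and contents of all *.md files."""
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order.
    for md_file in sorted(kb_path.glob("*.md")):
        digest.update(md_file.name.encode("utf-8"))
        digest.update(b"\0")  # separator between file name and content
        digest.update(md_file.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()


# Demonstrate on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp)
    (kb / "co2.md").write_text("# CO2\n", encoding="utf-8")
    before = kb_content_hash(kb)
    unchanged = kb_content_hash(kb)
    (kb / "co2.md").write_text("# CO2 (revised)\n", encoding="utf-8")
    after = kb_content_hash(kb)

print(before == unchanged)  # same content, same hash
print(before == after)      # edited content, different hash
```

A hash that changes whenever a file changes is what lets an unchanged knowledge base hit the existing cache while an edited one triggers a new cache.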

### When to Use Context Caching

- ✅ Interactive analysis sessions with multiple queries
- ✅ Batch processing against a stable knowledge base
- ✅ Knowledge bases > 2,048 tokens (minimum for caching benefit)

### When NOT to Use Context Caching

- ❌ Single-shot queries (cache creation overhead)
- ❌ Rapidly changing knowledge bases
- ❌ Very small knowledge bases (< 2,048 tokens)
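
The checklists above can be folded into a small pre-flight check. This is a rough sketch: the 4-characters-per-token ratio is a common heuristic assumed here, not a real tokenizer, and `estimate_tokens` and `should_cache` are hypothetical helpers rather than kanoa functions.

```python
MIN_CACHE_TOKENS = 2_048  # threshold quoted in the lists above
CHARS_PER_TOKEN = 4       # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN


def should_cache(kb_text: str, expected_queries: int, kb_is_stable: bool = True) -> bool:
    """Apply the three criteria: stable KB, multiple queries, large enough KB."""
    return (
        kb_is_stable
        and expected_queries > 1
        and estimate_tokens(kb_text) > MIN_CACHE_TOKENS
    )


large_kb = "x" * 20_000  # roughly 5,000 tokens under the heuristic
small_kb = "x" * 4_000   # roughly 1,000 tokens under the heuristic

print(should_cache(large_kb, expected_queries=10))
print(should_cache(large_kb, expected_queries=1))
print(should_cache(small_kb, expected_queries=10))
print(should_cache(large_kb, expected_queries=10, kb_is_stable=False))
```

Only the first call meets all three criteria. For borderline sizes, an exact token count from the backend is a better guide than the character heuristic.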