# Context Caching with Gemini 3 Demo

## Step 1: Setup & Knowledge Base

We'll initialize the interpreter with **context caching enabled** and set up our knowledge base. Although *caching is enabled by default for the Gemini backend*, we configure it explicitly here to show the relevant API options.

As a stand-in for a proprietary document, we use the **WMO State of the Global Climate 2025 Update** (presented at COP30). We chose this report because its publication date (November 4, 2025) falls *after* the knowledge cutoff of the Gemini 3 model family (January 2025), so the model must rely on the provided context rather than on its training data.

For details on how `kanoa` handles file uploads to Google, please refer to the [Gemini Backend Documentation](../docs/source/backends/gemini.md).

In [None]:
from pathlib import Path

import kanoa
from kanoa import AnalyticsInterpreter

# 1. Configuration
# Set global verbosity (True = Info/Uploads, 2 = Debug/Payloads)
kanoa.options.verbose = True

# 2. Define Knowledge Base Resources
# We use the WMO State of the Climate 2025 Update PDF
KB_FILENAME = "State of the Climate 2025 Update COP30 (31 oct).pdf"
KB_DIR = Path("knowledge_base_demo")

# The URL contains spaces and parentheses, which kanoa handles automatically
KB_URL = f"https://wmo.int/sites/default/files/2025-11/{KB_FILENAME}"

# 3. Initialize Interpreter with Caching
# cache_ttl=3600 means the cache is valid for 1 hour
interpreter = AnalyticsInterpreter(
    backend="gemini-3",
    cache_ttl=3600,
)

# 4. Attach Knowledge Base & Add Resource
# We explicitly set the path to keep files in the example folder
interpreter = interpreter.with_kb(kb_path=KB_DIR, kb_type="pdf")

# This downloads the file if missing, or verifies it exists
# We don't need to pass filename; kanoa infers it from the URL
KB_PATH = interpreter.get_kb().add_resource(uri=KB_URL)

# 5. Trigger Load & Verify
# This will trigger the upload to Gemini (if not cached) and print status
interpreter.get_kb().get_context()

print("-" * 40)
print(f"Backend: {interpreter.backend_name}")
print(f"Cache TTL: {interpreter.backend.cache_ttl_seconds}s")
print(f"Knowledge Base: {KB_PATH}")
print(f"File Size: {KB_PATH.stat().st_size / (1024 * 1024):.2f} MB")

## Step 2: Verify Token Count

Before running our query, let's verify the token count of our PDF knowledge base.
This helps us understand the scale of the context we're caching.

In [None]:
# Use the interpreter's built-in cost checker to verify token count and cost
# This validates against warning/approval thresholds
result = interpreter.check_kb_cost()

if result:
    print("Knowledge Base Token Check:")
    print(f"Status: {result.level.upper()}")
    print(f"Token Count: {result.token_count:,}")
    print(f"Estimated Cost: ${result.estimated_cost:.4f}")
    print(f"Message: {result.message}")
else:
    print("No files uploaded or backend does not support file uploads.")

### Checking Cache Status

You can check the status of the context cache for your current knowledge base without running a query. This is useful for verifying whether a cache exists and for inspecting its properties (TTL, token count).

In [None]:
import json

# Check cache status
status = interpreter.get_cache_status()

print("Current cache status:")
print(json.dumps(status, indent=2, default=str))

## Step 3: First Query (Cache Miss)

The first query creates the cache, uploading the knowledge base first if it has not been uploaded already.
Because nothing is cached yet, the KB content is billed at the full input rate.

In [None]:
# First query - this creates the cache
result1 = interpreter.interpret(
    custom_prompt="Summarize the key findings regarding global temperature anomalies in 2025 from the WMO report."
)

print("=" * 60)
print("FIRST QUERY RESULTS (Cache Creation)")
print("=" * 60)
print(f"\nResponse:\n{result1.text[:500]}...\n")

if result1.usage:
    print(f"Input tokens:  {result1.usage.input_tokens:,}")
    print(f"Output tokens: {result1.usage.output_tokens:,}")
    print(f"Cached tokens: {result1.usage.cached_tokens or 0:,}")
    print("Cache savings: $0.0000 (Cache Creation)")

## Step 4: Second Query (Cache Hit)

The second query reuses the cached knowledge base.
Notice that **cached tokens** is now non-zero and **cache savings** shows the resulting cost reduction.

In [None]:
# Second query - this reuses the cache
result2 = interpreter.interpret(
    custom_prompt="What specific outcomes or decisions from COP30 in Belem are mentioned in relation to climate finance?"
)

print("=" * 60)
print("SECOND QUERY RESULTS (Cache Hit)")
print("=" * 60)
print(f"\nResponse:\n{result2.text[:500]}...\n")

if result2.usage:
    print(f"Input tokens:  {result2.usage.input_tokens:,}")
    print(f"Output tokens: {result2.usage.output_tokens:,}")
    print(f"Cached tokens: {result2.usage.cached_tokens or 0:,}")
    print(f"Cache savings: ${result2.usage.cache_savings or 0.0:.4f}")
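
One way to compare the two runs is to look at the fraction of input tokens served from the cache. The sketch below is illustrative only: it uses `SimpleNamespace` stand-ins with made-up token counts rather than real `result.usage` objects, it assumes that `input_tokens` includes the cached tokens, and `cached_fraction` is a hypothetical helper that is not part of kanoa.

```python
from types import SimpleNamespace


def cached_fraction(usage) -> float:
    """Return the share of input tokens that were read from the cache."""
    cached = getattr(usage, "cached_tokens", None) or 0
    total = getattr(usage, "input_tokens", None) or 0
    if total <= 0:
        return 0.0
    # Clamp to 1.0 in case a backend reports cached tokens separately
    # from input tokens (an assumption; check your backend's accounting).
    return min(cached / total, 1.0)


# Made-up numbers; in the notebook, pass result1.usage and result2.usage instead.
first = SimpleNamespace(input_tokens=12_000, cached_tokens=None)
second = SimpleNamespace(input_tokens=12_000, cached_tokens=11_500)

print(f"First query:  {cached_fraction(first):.1%} cached")
print(f"Second query: {cached_fraction(second):.1%} cached")
```

A value near zero on the first query and a high value on the second is the pattern you would expect from a cache miss followed by a cache hit.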

## Understanding Cost Savings

Context caching reduces the input cost of repeated queries because cached tokens are billed at a lower rate than standard input tokens:

| Token Type | Price per 1M tokens | Description |
| --- | --- | --- |
| Standard Input | $2.00 | Regular input tokens |
| Cached Input | $0.50 | Tokens from cache |
| Cache Storage | $0.20 per hour | Storage charge per 1M cached tokens for each hour the cache is kept |

**Savings formula**: `(cached_tokens / 1M) * ($2.00 - $0.50) = savings`

For a 10,000-token knowledge base queried 10 times:
- Without caching: `10 × 10,000 × $2.00/1M = $0.20`
- With caching: `10,000 × $2.00/1M + 9 × 10,000 × $0.50/1M = $0.065`
- **67.5% savings** on input tokens (cache storage is billed separately)
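
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that uses the prices from the table in this section and ignores the storage charge; `input_cost` is a hypothetical helper, not a kanoa function.

```python
STANDARD_PER_M = 2.00  # USD per 1M standard input tokens (from the table above)
CACHED_PER_M = 0.50    # USD per 1M cached input tokens (from the table above)


def input_cost(kb_tokens: int, queries: int, cached: bool) -> float:
    """Input cost in USD for sending the same knowledge base `queries` times."""
    if queries <= 0:
        return 0.0
    if not cached:
        return queries * kb_tokens * STANDARD_PER_M / 1_000_000
    # The first query pays the standard rate (cache creation);
    # the remaining queries read the knowledge base from the cache.
    first = kb_tokens * STANDARD_PER_M / 1_000_000
    rest = (queries - 1) * kb_tokens * CACHED_PER_M / 1_000_000
    return first + rest


without = input_cost(10_000, 10, cached=False)
with_cache = input_cost(10_000, 10, cached=True)
savings_pct = 100 * (without - with_cache) / without

print(f"Without caching: ${without:.3f}")
print(f"With caching:    ${with_cache:.3f}")
print(f"Savings:         {savings_pct:.1f}%")
```

The savings percentage approaches 75% as the number of queries grows, because the one full-price query becomes a smaller share of the total.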

## Step 5: Clear Cache (Optional)

You can manually clear the cache if you've updated your knowledge base.

In [None]:
# Clear the cache to force a refresh
interpreter.clear_cache()
print("Cache cleared. Next query will create a new cache.")

## Summary

kanoa's context caching feature:

1. **Automatically** caches your knowledge base content
2. **Reuses** the cache for subsequent queries (same content hash)
3. **Saves ~75%** on the input cost of cached tokens on each cache hit (total savings depend on how many queries reuse the cache)
4. **Tracks** cached tokens and savings in the `UsageInfo` object
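
Item 2 relies on a content hash to decide whether an existing cache can be reused. How kanoa computes that hash is not shown in this notebook, so the sketch below only illustrates the general idea, using a SHA-256 digest over file names and file bytes. `kb_content_hash` is a hypothetical helper and the file contents are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path


def kb_content_hash(kb_dir: Path, pattern: str = "*.pdf") -> str:
    """Hash file names and contents so that any change yields a different key."""
    digest = hashlib.sha256()
    # Sort for a stable order, so the hash does not depend on directory listing order.
    for path in sorted(kb_dir.glob(pattern)):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp)
    (kb / "report.pdf").write_bytes(b"version 1")  # placeholder bytes, not a real PDF
    before = kb_content_hash(kb)
    unchanged = kb_content_hash(kb)
    (kb / "report.pdf").write_bytes(b"version 2")
    after = kb_content_hash(kb)

print("Same content, same key:     ", before == unchanged)
print("Changed content, same key:  ", before == after)
```

With a key like this, an unchanged knowledge base maps to the existing cache, while any edit to a file produces a new key and therefore a new cache.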

### When to Use Context Caching

- ✅ Interactive analysis sessions with multiple queries
- ✅ Batch processing against a stable knowledge base
- ✅ Knowledge bases > 2,048 tokens (minimum for caching benefit)

### When NOT to Use Context Caching

- ❌ Single-shot queries (cache creation overhead)
- ❌ Rapidly changing knowledge bases
- ❌ Very small knowledge bases (< 2,048 tokens)
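
These guidelines can be turned into a rough estimate by setting the storage charge against the input-token savings. The sketch below is an approximation under stated assumptions: it uses the prices listed earlier in this notebook, bills storage for the full TTL, and treats knowledge bases below 2,048 tokens as not cached. `net_savings` is a hypothetical helper, not part of kanoa.

```python
STANDARD_PER_M = 2.00          # USD per 1M standard input tokens
CACHED_PER_M = 0.50            # USD per 1M cached input tokens
STORAGE_PER_M_PER_HOUR = 0.20  # USD per 1M cached tokens per hour
MIN_CACHE_TOKENS = 2_048       # threshold quoted in the lists above


def net_savings(kb_tokens: int, queries: int, ttl_hours: float = 1.0) -> float:
    """Estimated USD saved by caching; a negative value means caching costs more."""
    if kb_tokens < MIN_CACHE_TOKENS or queries < 1:
        return 0.0  # assumption: no cache is created, so nothing is saved or spent
    hits = queries - 1  # the first query creates the cache at the standard rate
    saved = hits * kb_tokens * (STANDARD_PER_M - CACHED_PER_M) / 1_000_000
    storage = kb_tokens * STORAGE_PER_M_PER_HOUR * ttl_hours / 1_000_000
    return saved - storage


for tokens, queries in [(10_000, 1), (10_000, 10), (1_000, 10)]:
    print(f"{tokens:>6,} tokens, {queries:>2} queries: ${net_savings(tokens, queries):+.4f}")
```

Under these assumptions a single-shot query ends up slightly negative because it pays for storage without any cache hits, which matches the first item in the list above.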