# TOON Encoding Test for Brave Search Results

This notebook tests different approaches to encoding Brave Search results in TOON format.
Goal: Find the correct way to encode search results so TOON uses tabular format (eliminating repeated field names).
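Here is what the goal looks like concretely. In tabular form, the field names appear once in a header, and each result becomes one delimited row. The toy renderer below sketches that layout for rows that already have identical keys and primitive values. `toy_tabular` is a hypothetical helper, not part of `toon_format`. It always JSON-quotes strings, so it does not reproduce TOON's real quoting rules; use `toon_format.encode` for actual output.

```python
import json
from typing import Any, Dict, List

def toy_tabular(name: str, rows: List[Dict[str, Any]]) -> str:
    """Render uniform, primitive-valued rows as a TOON-style tabular block.

    Toy illustration only: strings are always JSON-quoted here, whereas the
    real TOON encoder quotes a value only when it has to.
    """
    fields = list(rows[0].keys())
    # Header states the row count and the field names exactly once
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        # json.dumps renders str/int/float/bool/None as "..", 1, true, null
        lines.append("  " + ",".join(json.dumps(row[f]) for f in fields))
    return "\n".join(lines)

rows = [{"title": "A", "rank": 1}, {"title": "B", "rank": 2}]
print(toy_tabular("results", rows))
```

This prints a `results[2]{title,rank}:` header followed by one indented row per result, with no field names repeated.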


In [1]:
# Install required packages if needed
%pip install git+https://github.com/toon-format/toon-python.git


Collecting git+https://github.com/toon-format/toon-python.git
  Cloning https://github.com/toon-format/toon-python.git to /tmp/pip-req-build-uybjmlck
  Running command git clone --filter=blob:none --quiet https://github.com/toon-format/toon-python.git /tmp/pip-req-build-uybjmlck
  Resolved https://github.com/toon-format/toon-python.git to commit 9c4f0c0c24f2a0b0b376315f4b8707f8c9006de6
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: toon_format
  Building wheel for toon_format (pyproject.toml) ... done
  Created wheel for toon_format: filename=toon_format-0.9.0b1-py3-none-any.whl size=36285 sha256=298ea641dec9bbf68d69226a6cd31b4cb504fe79b43bc82a1675c5be86b814bb
  Stored in directory: /tmp/pip-ephem-wheel-cache-_69tffru/wheels/71/5b/77/bc062fea7c190909f65ce491c147046ee0face81c520fba311
Successfully built toon_format

In [2]:
import json
from toon_format import encode, decode
from typing import Dict, Any, List


## Step 1: Fetch Real Brave Search API Results

This cell fetches real search results from the Brave Search API.
If no API key is set, or if the request fails or returns no results, it falls back to sample data.

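**Important for TOON tabular format:** All results must have the same keys (uniform structure), and every value must be a primitive (string, number, boolean, or null). Only then can TOON use tabular arrays like `results[10]{field1,field2,field3}:` instead of repeating the field names for every result.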
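Both conditions can be checked directly. This is a minimal sketch that assumes the TOON rule exactly as stated above: identical key sets and primitive values only. `tabular_eligible` and `PRIMITIVES` are hypothetical names and are not part of `toon_format`.

```python
from typing import Any, List

# Value types assumed to fit in a TOON table cell: strings, numbers, booleans, null
PRIMITIVES = (str, int, float, bool, type(None))

def tabular_eligible(rows: List[Any]) -> bool:
    """True if rows is a non-empty list of dicts with identical keys and only primitive values."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        return False
    keys = set(rows[0])
    return all(
        set(row) == keys and all(isinstance(v, PRIMITIVES) for v in row.values())
        for row in rows
    )

print(tabular_eligible([{"title": "A", "rank": 1}, {"title": "B", "rank": None}]))
print(tabular_eligible([{"title": "A", "tags": ["x"]}, {"title": "B", "tags": []}]))
```

Applied to the fallback `sample_results`, this check returns `False`, because `extra_snippets` is a list and `properties` is a dict, even though every result has the same keys.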


In [3]:
# ------------------------------------------------------
# Step 1: Fetch Real Brave Search API Results (or sample)
# ------------------------------------------------------
# This cell tries to fetch Brave Search API results using an API key, and
# falls back to built-in sample data if no key is set or the request fails.
#
# - For security, the API key is read from BRAVE_API_KEY (preferred) or set manually below.
# - You can also paste in your own Brave API JSON below.

import os
from typing import Optional

import requests

# Use your Brave API key here, or set the BRAVE_API_KEY environment variable
BRAVE_API_KEY = os.getenv("BRAVE_API_KEY", "")  # Or paste your key here as a string

# Set your search query here (can be any string)
search_query = "best laptops for programming 2024"

def fetch_brave_results(query: str, api_key: str, count: int = 10) -> Optional[List[Dict[str, Any]]]:
    """
    Fetch web search results from the Brave Search API.
    Return None on a non-200 response (e.g. an invalid key) or a request error.
    """
    url = "https://api.search.brave.com/res/v1/web/search"
    headers = {"Accept": "application/json", "X-Subscription-Token": api_key}
    params = {
        "q": query,
        "count": count,
        "safesearch": "off"
    }
    try:
        resp = requests.get(url, headers=headers, params=params, timeout=10)
        if resp.status_code == 200:
            json_data = resp.json()
            # Brave API: web results are nested under "web" -> "results"
            web_results = (json_data.get("web") or {}).get("results", [])
            print(f"✅ Successfully fetched {len(web_results)} Brave results")
            return web_results
        else:
            print(f"⚠️  Brave API error: HTTP {resp.status_code}: {resp.text[:240]}")
            return None
    except Exception as ex:
        print(f"⚠️  Exception occurred accessing Brave API: {ex}")
        return None

# Try live fetch, else fallback to sample data
sample_results: List[Dict[str, Any]] = []

if BRAVE_API_KEY:
    brave_results = fetch_brave_results(search_query, BRAVE_API_KEY, count=10)
    if brave_results is not None and isinstance(brave_results, list) and len(brave_results) > 0:
        sample_results = brave_results
else:
    print(
        "⚠️  No BRAVE_API_KEY found! Set an environment variable, or paste your API key above."
        "\nFalling back to built-in example data."
    )

# Fallback sample data: a simplified stand-in for Brave results. It lacks
# live-API fields such as type, page_age, profile, meta_url, and thumbnail.
if not sample_results:
    sample_results = [
        {
            "title": "The Best Laptops for Programming in 2024",
            "url": "https://example.com/best-laptops-programming",
            "description": "Our top picks for developers, with performance, portability, and battery life in mind.",
            "is_source_local": False,
            "extra_snippets": ["Works for Python, Java, C++, and more."],
            "properties": {"site_name": "LaptopMag", "favicon": "https://favicon.example.com/laptopmag.png"}
        },
        {
            "title": "Top 10 Laptops Every Developer Recommends",
            "url": "https://techsite.com/best-dev-laptops",
            "description": "Hand-picked laptops for coders and engineers: specs, comparison, value.",
            "is_source_local": False,
            "extra_snippets": [],
            "properties": {"site_name": "TechSite", "favicon": "https://favicon.example.com/techsite.png"}
        },
        {
            "title": "Ultimate Guide: Buying a Laptop for Coding",
            "url": "https://devguide.com/laptop-buying",
            "description": "A detailed guide on what to look for when buying a programming laptop.",
            "is_source_local": False,
            "extra_snippets": ["Find out what specs matter for devs."],
            "properties": {"site_name": "DevGuide", "favicon": None}
        },
    ]
    print(f"ℹ️  Using {len(sample_results)} sample search results.")

# Preview first result (for sanity check)
if sample_results:
    print("First search result:\n", json.dumps(sample_results[0], indent=2))
else:
    print("❌ No search results available!")



⚠️  No BRAVE_API_KEY found! Set an environment variable, or paste your API key above.
Falling back to built-in example data.
ℹ️  Using 3 sample search results.
First search result:
 {
  "title": "The Best Laptops for Programming in 2024",
  "url": "https://example.com/best-laptops-programming",
  "description": "Our top picks for developers, with performance, portability, and battery life in mind.",
  "is_source_local": false,
  "extra_snippets": [
    "Works for Python, Java, C++, and more."
  ],
  "properties": {
    "site_name": "LaptopMag",
    "favicon": "https://favicon.example.com/laptopmag.png"
  }
}


## Step 1.5: Verify Uniform Structure

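Check whether all results have the same keys. Uniform keys are necessary for TOON tabular format but not sufficient. Every value must also be a primitive, so nested values such as `properties` (a dict) and `extra_snippets` (a list) still prevent tabular output, as Step 2 shows.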
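Live results may not all carry the same optional fields. One way to make their keys uniform is to pad every row to the union of all keys. This is a minimal sketch: `pad_to_union` is a hypothetical helper, and the example rows are illustrative. Padding with `None` (encoded as `null`) is an assumption. An empty string would also work, but it cannot be told apart from a genuinely empty value.

```python
from typing import Any, Dict, List

def pad_to_union(rows: List[Dict[str, Any]], fill: Any = None) -> List[Dict[str, Any]]:
    """Give every row the union of all keys (first-seen order), filling gaps with `fill`."""
    # dict.fromkeys de-duplicates while preserving first-seen order
    columns = list(dict.fromkeys(key for row in rows for key in row))
    return [{key: row.get(key, fill) for key in columns} for row in rows]

rows = [
    {"title": "A", "page_age": "2024-01-01"},
    {"title": "B", "language": "en"},
]
for row in pad_to_union(rows):
    print(row)
```

Padding fixes only the key check. Nested values still have to be flattened before TOON can use a table.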


In [5]:
# Check if all results have uniform structure (same keys)
def check_uniform_structure(results: List[Dict]) -> tuple[bool, set]:
    """Check if all results have the same keys."""
    if not results:
        return True, set()
    
    first_keys = set(results[0].keys())
    for i, result in enumerate(results[1:], 1):
        result_keys = set(result.keys())
        if result_keys != first_keys:
            print(f"❌ Result {i} has different keys!")
            print(f"   Expected: {sorted(first_keys)}")
            print(f"   Got:      {sorted(result_keys)}")
            print(f"   Missing:  {first_keys - result_keys}")
            print(f"   Extra:    {result_keys - first_keys}")
            return False, first_keys
    
    return True, first_keys

is_uniform, keys = check_uniform_structure(sample_results)

if is_uniform:
    print(f"✅ All {len(sample_results)} results have uniform structure")
    print(f"   Keys: {sorted(keys)}")
    print("   Uniform keys are necessary for TOON tabular format; values must also be primitives.")
else:
    print("⚠️  Results have inconsistent structure")
    print("   TOON will use mixed array format (less efficient)")


✅ All 3 results have uniform structure
   Keys: ['description', 'extra_snippets', 'is_source_local', 'properties', 'title', 'url']
   Uniform keys are necessary for TOON tabular format; values must also be primitives.


## Step 2: Test Current Approach (Direct Encoding)

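This is the current approach: encoding the nested structure directly. Because `extra_snippets` and `properties` are not primitive values, TOON cannot use tabular format here. Instead, it writes each result as a list item that repeats every field name.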
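Nested encoding costs more because each result restates every field name, including nested ones such as `site_name` and `favicon`, while a tabular header lists them only once. The hypothetical `key_chars` helper below gives a rough count of the characters spent on keys. It ignores indentation, colons, and quoting, so treat it as an estimate of the overhead rather than the encoder's exact output size.

```python
from typing import Any

def key_chars(obj: Any) -> int:
    """Total characters used by dict keys, recursing into nested dicts and lists."""
    if isinstance(obj, dict):
        return sum(len(key) + key_chars(value) for key, value in obj.items())
    if isinstance(obj, list):
        return sum(key_chars(item) for item in obj)
    return 0

rows = [
    {"title": "A", "properties": {"site_name": "X", "favicon": None}},
    {"title": "B", "properties": {"site_name": "Y", "favicon": None}},
]
total = key_chars(rows)                  # keys written by every row
repeated = total - key_chars(rows[0])    # key characters restated after the first row
print(total, repeated)
```

In the nested layout, `repeated` grows linearly with the number of results. A tabular header pays the key cost once.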


In [6]:
# Current approach: encode nested structure directly
current_approach = {"results": sample_results, "count": len(sample_results)}
current_toon = encode(current_approach)

print("=== CURRENT APPROACH (Nested Objects) ===")
print(f"Length: {len(current_toon)} characters")
print(f"\nTOON output (first 1000 chars):")
print(current_toon[:1000])
print("\n...")
print(f"\nFull length: {len(current_toon)} chars")


=== CURRENT APPROACH (Nested Objects) ===
Length: 1109 characters

TOON output (first 1000 chars):
results[3]:
  - title: The Best Laptops for Programming in 2024
    url: "https://example.com/best-laptops-programming"
    description: "Our top picks for developers, with performance, portability, and battery life in mind."
    is_source_local: false
    extra_snippets[1]: "Works for Python, Java, C++, and more."
    properties:
      site_name: LaptopMag
      favicon: "https://favicon.example.com/laptopmag.png"
  - title: Top 10 Laptops Every Developer Recommends
    url: "https://techsite.com/best-dev-laptops"
    description: "Hand-picked laptops for coders and engineers: specs, comparison, value."
    is_source_local: false
    extra_snippets[0]:
    properties:
      site_name: TechSite
      favicon: "https://favicon.example.com/techsite.png"
  - title: "Ultimate Guide: Buying a Laptop for Coding"
    url: "https://devguide.com/laptop-buying"
    description: A detailed guide on 

...

Full length: 1109 chars

## Step 3: Test Flattening Approach

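Flatten nested objects into primitive fields so TOON can use tabular format. Keys of nested dicts are prefixed with their parent key and `_`, so `site_name` inside `properties` becomes `properties_site_name`. Lists of primitives are joined into one `|`-delimited string.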


In [7]:
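def flatten_for_toon(obj: Any, prefix: str = "") -> Any:
    """
    Flatten nested objects into primitive fields for TOON tabular format.

    Nested dict keys are joined to their parent key with "_"; lists of
    primitives become "|"-joined strings and other lists become JSON strings.
    The result is lossy: it cannot be split back unambiguously when keys
    already contain "_" or list items contain "|", and an empty nested dict
    contributes no columns at all.
    """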
    if isinstance(obj, dict):
        flattened = {}
        for key, value in obj.items():
            new_key = f"{prefix}_{key}" if prefix else key
            
            if isinstance(value, dict):
                # Recursively flatten nested dictionaries
                flattened.update(flatten_for_toon(value, new_key))
            elif isinstance(value, list):
                # Convert lists to pipe-delimited strings
                if not value:
                    flattened[new_key] = ""
                elif all(isinstance(v, (str, int, float, bool, type(None))) for v in value):
                    flattened[new_key] = "|".join(str(v) if v is not None else "" for v in value)
                else:
                    flattened[new_key] = json.dumps(value)
            else:
                # Primitive value - keep as-is
                flattened[new_key] = value
        return flattened
    elif isinstance(obj, list):
        return [flatten_for_toon(item, prefix) for item in obj]
    else:
        return obj

# Test flattening
flattened_results = [flatten_for_toon(result) for result in sample_results]
print("\n=== FLATTENED STRUCTURE ===")
print(f"First flattened result:")
print(json.dumps(flattened_results[0], indent=2))



=== FLATTENED STRUCTURE ===
First flattened result:
{
  "title": "The Best Laptops for Programming in 2024",
  "url": "https://example.com/best-laptops-programming",
  "description": "Our top picks for developers, with performance, portability, and battery life in mind.",
  "is_source_local": false,
  "extra_snippets": "Works for Python, Java, C++, and more.",
  "properties_site_name": "LaptopMag",
  "properties_favicon": "https://favicon.example.com/laptopmag.png"
}
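The flattener above is lossy in two ways. First, `_` also appears inside original keys (`is_source_local`, `site_name`), so `properties_site_name` cannot be split back unambiguously. Second, `|`-joined lists lose their element boundaries if a snippet itself contains `|`.

The reversible variant below stores lists and empty dicts as JSON strings. It keeps an empty nested dict as its own column, so rows stay uniform. It assumes that no original key contains the chosen separator (`.` by default). `flatten_dotted` is a hypothetical helper, and this notebook does not check how `toon_format` quotes dotted keys.

```python
import json
from typing import Any, Dict

def flatten_dotted(obj: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into sep-joined keys.

    Lists and empty dicts are stored as JSON strings so they can be restored
    exactly with json.loads; other values are kept unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_dotted(value, name, sep))
        elif isinstance(value, (list, dict)):
            flat[name] = json.dumps(value)  # e.g. '["a|b", "c"]' or '{}'
        else:
            flat[name] = value
    return flat

row = {
    "title": "Guide",
    "extra_snippets": ["a|b", "c"],
    "properties": {"site_name": "DevGuide", "favicon": None},
    "meta": {},
}
print(flatten_dotted(row))
```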


In [8]:
# Encode flattened results
flattened_approach = {"results": flattened_results, "count": len(flattened_results)}
flattened_toon = encode(flattened_approach)

print("=== FLATTENED APPROACH ===")
print(f"Length: {len(flattened_toon)} characters")
print(f"\nTOON output:")
print(flattened_toon)
print(f"\n\nLength comparison:")
print(f"Current (nested): {len(current_toon)} chars")
print(f"Flattened: {len(flattened_toon)} chars")
print(f"Savings: {len(current_toon) - len(flattened_toon)} chars ({(1 - len(flattened_toon)/len(current_toon))*100:.1f}% reduction)")


=== FLATTENED APPROACH ===
Length: 825 characters

TOON output:
results[3]{title,url,description,is_source_local,extra_snippets,properties_site_name,properties_favicon}:
  The Best Laptops for Programming in 2024,"https://example.com/best-laptops-programming","Our top picks for developers, with performance, portability, and battery life in mind.",false,"Works for Python, Java, C++, and more.",LaptopMag,"https://favicon.example.com/laptopmag.png"
  Top 10 Laptops Every Developer Recommends,"https://techsite.com/best-dev-laptops","Hand-picked laptops for coders and engineers: specs, comparison, value.",false,"",TechSite,"https://favicon.example.com/techsite.png"
  "Ultimate Guide: Buying a Laptop for Coding","https://devguide.com/laptop-buying",A detailed guide on what to look for when buying a programming laptop.,false,Find out what specs matter for devs.,DevGuide,null
count: 3


Length comparison:
Current (nested): 1109 chars
Flattened: 825 chars
Savings: 284 chars (25.6% reduction)


## Step 4: Test Alternative - Manual Tabular Structure

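Manually map each result onto a fixed set of primitive columns. The columns use the field names this notebook expects from the live Brave API (`type`, `page_age`, `profile`, `meta_url`, `thumbnail`, `hash`). With the sample data, most of these columns come out empty, and the sample's `is_source_local` and `properties` fields are not carried over.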
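The manual mapping can be generalized into a column spec plus a path lookup, so that the schema lives in one place. This sketch makes several assumptions. `COLUMNS`, `get_path`, and `project` are hypothetical names. The `profile_name` and `meta_url_favicon` paths mirror the manual mapping in the next cell. Missing values default to `None` rather than `""`.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical column spec: (output column, key path into a raw result)
COLUMNS: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("title", ("title",)),
    ("url", ("url",)),
    ("profile_name", ("profile", "name")),
    ("meta_url_favicon", ("meta_url", "favicon")),
)

def get_path(obj: Any, path: Tuple[str, ...], default: Any = None) -> Any:
    """Follow a key path through nested dicts; return default on any missing step."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj

def project(results: List[Dict[str, Any]],
            columns: Sequence[Tuple[str, Tuple[str, ...]]] = COLUMNS,
            default: Any = None) -> List[Dict[str, Any]]:
    """Map raw results onto a fixed column list so every row has the same keys."""
    return [{name: get_path(r, path, default) for name, path in columns} for r in results]

raw = [
    {"title": "A", "url": "https://a.example", "profile": {"name": "Site A"}},
    {"title": "B", "url": "https://b.example", "meta_url": {"favicon": "https://b.example/f.png"}},
]
for row in project(raw):
    print(row)
```

Using `None` keeps "missing" distinct from "empty". The cost is that it encodes as `null` (4 characters) instead of `""` (2 characters). Any leaf value that is a list still needs converting to a string before encoding.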


In [9]:
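# Manual tabular structure: map each result onto fixed primitive columns.
# Fields a result lacks default to "" (in the sample data, everything except
# title, url, description and extra_snippets).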
tabular_results = []
for result in sample_results:
    tabular_results.append({
        "type": result.get("type", ""),
        "title": result.get("title", ""),
        "url": result.get("url", ""),
        "description": result.get("description", ""),
        "page_age": result.get("page_age", ""),
        "profile_name": result.get("profile", {}).get("name", "") if result.get("profile") else "",
        "meta_url_favicon": result.get("meta_url", {}).get("favicon", "") if result.get("meta_url") else "",
        "thumbnail_original": result.get("thumbnail", {}).get("original", "") if result.get("thumbnail") else "",
        "extra_snippets": "|".join(result.get("extra_snippets", [])) if result.get("extra_snippets") else "",
        "hash": result.get("hash", "")
    })

print("\n=== MANUAL TABULAR STRUCTURE ===")
print(f"First result:")
print(json.dumps(tabular_results[0], indent=2))



=== MANUAL TABULAR STRUCTURE ===
First result:
{
  "type": "",
  "title": "The Best Laptops for Programming in 2024",
  "url": "https://example.com/best-laptops-programming",
  "description": "Our top picks for developers, with performance, portability, and battery life in mind.",
  "page_age": "",
  "profile_name": "",
  "meta_url_favicon": "",
  "thumbnail_original": "",
  "extra_snippets": "Works for Python, Java, C++, and more.",
  "hash": ""
}


In [10]:
# Encode manual tabular structure
tabular_approach = {"results": tabular_results, "count": len(tabular_results)}
tabular_toon = encode(tabular_approach)

print("=== MANUAL TABULAR APPROACH ===")
print(f"Length: {len(tabular_toon)} characters")
print(f"\nTOON output:")
print(tabular_toon)
print(f"\n\nLength comparison:")
print(f"Current (nested): {len(current_toon)} chars")
print(f"Flattened: {len(flattened_toon)} chars")
print(f"Manual tabular: {len(tabular_toon)} chars")
print(f"\nBest savings: {len(current_toon) - min(len(flattened_toon), len(tabular_toon))} chars")


=== MANUAL TABULAR APPROACH ===
Length: 753 characters

TOON output:
results[3]{type,title,url,description,page_age,profile_name,meta_url_favicon,thumbnail_original,extra_snippets,hash}:
  "",The Best Laptops for Programming in 2024,"https://example.com/best-laptops-programming","Our top picks for developers, with performance, portability, and battery life in mind.","","","","","Works for Python, Java, C++, and more.",""
  "",Top 10 Laptops Every Developer Recommends,"https://techsite.com/best-dev-laptops","Hand-picked laptops for coders and engineers: specs, comparison, value.","","","","","",""
  "","Ultimate Guide: Buying a Laptop for Coding","https://devguide.com/laptop-buying",A detailed guide on what to look for when buying a programming laptop.,"","","","",Find out what specs matter for devs.,""
count: 3


Length comparison:
Current (nested): 1109 chars
Flattened: 825 chars
Manual tabular: 753 chars

Best savings: 356 chars


## Step 5: Verify Decoding Works

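Make sure the TOON output decodes back to the records that were encoded. Note that this recovers the flattened `tabular_results` rows, not the original nested search results.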
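Decoding returns the flattened rows. Recovering the original nested objects requires an inverse of the flattening step, which the `_` separator used in Step 3 cannot provide unambiguously: `properties_site_name` could split as `properties` → `site_name` or as `properties_site` → `name`. This sketch assumes keys were joined with a separator that never occurs inside a key, such as `.`. `unflatten_dotted` is a hypothetical helper, and columns holding JSON-encoded lists would still need `json.loads`.

```python
from typing import Any, Dict

def unflatten_dotted(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Rebuild nested dicts from sep-joined keys, e.g. {"a.b": 1} -> {"a": {"b": 1}}.

    Assumes no key is both a leaf and a prefix of another key
    (e.g. "a" alongside "a.b").
    """
    nested: Dict[str, Any] = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = nested
        for part in parents:
            # Create intermediate dicts on first use
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

row = {"title": "Guide", "properties.site_name": "DevGuide", "properties.favicon": None}
print(unflatten_dotted(row))
```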


In [11]:
# Test decoding the best approach
try:
    decoded = decode(tabular_toon)
    print("=== DECODED TABULAR TOON ===")
    print(f"Type: {type(decoded)}")
    print(f"Keys: {list(decoded.keys()) if isinstance(decoded, dict) else 'N/A'}")
    if isinstance(decoded, dict) and "results" in decoded:
        print(f"Number of results: {len(decoded['results'])}")
        print(f"\nFirst decoded result:")
        print(json.dumps(decoded["results"][0], indent=2))
except Exception as e:
    print(f"Error decoding: {e}")


=== DECODED TABULAR TOON ===
Type: <class 'dict'>
Keys: ['results', 'count']
Number of results: 3

First decoded result:
{
  "type": "",
  "title": "The Best Laptops for Programming in 2024",
  "url": "https://example.com/best-laptops-programming",
  "description": "Our top picks for developers, with performance, portability, and battery life in mind.",
  "page_age": "",
  "profile_name": "",
  "meta_url_favicon": "",
  "thumbnail_original": "",
  "extra_snippets": "Works for Python, Java, C++, and more.",
  "hash": ""
}


## Step 6: Compare All Approaches

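Summary of all approaches. Lengths are character counts, which are only a proxy for LLM tokens. The comparison is also not strictly like-for-like. The manual tabular approach drops `is_source_local` and `properties` and adds columns that are empty for the sample data, so its shorter output partly reflects dropped data.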
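The percentages in the summary follow `(1 - variant / baseline) * 100`, applied to character counts. The small helper below makes that arithmetic explicit, using the lengths reported by the cells above; `reduction_vs` is a hypothetical name. Because characters are only a proxy, running a tokenizer for the target model would give a more faithful comparison.

```python
from typing import Dict

def reduction_vs(baseline: int, lengths: Dict[str, int]) -> Dict[str, float]:
    """Percent size reduction of each variant relative to the baseline, to one decimal."""
    return {name: round((1 - n / baseline) * 100, 1) for name, n in lengths.items()}

# Character counts reported by the cells above
print(reduction_vs(1109, {"Flattened": 825, "Manual tabular": 753}))
```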


In [12]:
print("=== SUMMARY ===")
print(f"\n1. Current (nested objects):")
print(f"   Length: {len(current_toon)} chars")
print(f"   Format: YAML-like with repeated field names")
print(f"   Uses tabular format: NO")

print(f"\n2. Flattened approach:")
print(f"   Length: {len(flattened_toon)} chars")
print(f"   Format: {'Tabular' if 'results[' in flattened_toon and '{' in flattened_toon else 'Mixed'}")
print(f"   Uses tabular format: {'YES' if 'results[' in flattened_toon and '{' in flattened_toon else 'NO'}")

print(f"\n3. Manual tabular approach:")
print(f"   Length: {len(tabular_toon)} chars")
print(f"   Format: {'Tabular' if 'results[' in tabular_toon and '{' in tabular_toon else 'Mixed'}")
print(f"   Uses tabular format: {'YES' if 'results[' in tabular_toon and '{' in tabular_toon else 'NO'}")

best_length = min(len(current_toon), len(flattened_toon), len(tabular_toon))
best_approach = "Current" if best_length == len(current_toon) else ("Flattened" if best_length == len(flattened_toon) else "Manual Tabular")

print(f"\n✅ BEST APPROACH: {best_approach}")
print(f"   Character savings vs current: {(1 - best_length/len(current_toon))*100:.1f}%")


=== SUMMARY ===

1. Current (nested objects):
   Length: 1109 chars
   Format: YAML-like with repeated field names
   Uses tabular format: NO

2. Flattened approach:
   Length: 825 chars
   Format: Tabular
   Uses tabular format: YES

3. Manual tabular approach:
   Length: 753 chars
   Format: Tabular
   Uses tabular format: YES

✅ BEST APPROACH: Manual Tabular
   Character savings vs current: 32.1%
