# Baseline agentic search loop

This notebook implements a basic agentic search loop:

* We have a set of furniture in our catalog
* We tell the Agent our preferences
* The agent uses the search tool to recommend furniture
* The code is ported from [this blog post](https://softwaredoug.com/blog/2025/10/06/how-much-does-reasoning-improve-search-quality)

## Better 'starting' conditions

* Optional addition of few shot prompt + query narrative

Why? The agent's behavior seems to be heavily influenced by the initial prompt, so these additions give it better conditions to start from.

## Error checking harness

Why? It lets the agent know when it's made a mistake so it can try again. We should always be very defensive with agents.
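
As a rough illustration of the idea (not the harness used later in this notebook), here is a minimal sketch of a validate-and-retry loop. `run_with_feedback`, `fake_agent`, and `expect_three` are hypothetical names invented for this example; the fake agent stands in for an LLM call.

```python
from typing import Callable, Optional


def run_with_feedback(generate: Callable[[list[str]], list[int]],
                      validate: Callable[[list[int]], Optional[str]],
                      max_attempts: int = 3) -> list[int]:
    """Call generate until validate reports no error, feeding errors back in."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        result = generate(feedback)
        error = validate(result)
        if error is None:
            return result
        # Tell the "agent" what went wrong so the next attempt can fix it
        feedback.append(f"{error}. Please try again")
    raise RuntimeError(f"No valid result after {max_attempts} attempts")


# Hypothetical stand-in for the agent: it returns too few results
# until it has been told about the mistake.
def fake_agent(feedback: list[str]) -> list[int]:
    return [1, 2, 3] if feedback else [1, 2]


# Hypothetical check: we want exactly three results
def expect_three(results: list[int]) -> Optional[str]:
    if len(results) != 3:
        return f"Expected 3 results, got {len(results)}"
    return None


print(run_with_feedback(fake_agent, expect_three))
```

Capping the number of attempts is a design choice in this sketch; it keeps a confused agent from looping forever.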

In [None]:
!pip install git+https://github.com/softwaredoug/cheat-at-search.git
from cheat_at_search.data_dir import mount
mount(use_gdrive=True)    # colab, share data across notebook runs on gdrive
# mount(use_gdrive=False) # <- colab without gdrive
# mount(use_gdrive=False, manual_path="/path/to/directory")  # <- force data path to specific directory, ie you're running locally.


Collecting git+https://github.com/softwaredoug/cheat-at-search.git
  Cloning https://github.com/softwaredoug/cheat-at-search.git to /tmp/pip-req-build-wwz4iaj0
  Running command git clone --filter=blob:none --quiet https://github.com/softwaredoug/cheat-at-search.git /tmp/pip-req-build-wwz4iaj0
  Resolved https://github.com/softwaredoug/cheat-at-search.git to commit 6a08d097f1d6eaa068fb61af47c621df1682f5e2
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Get an OpenAI Key

This will prompt you for an OpenAI Key to interact with GPT-5

In [None]:
from cheat_at_search.data_dir import key_for_provider
from openai import OpenAI

OPENAI_KEY = key_for_provider("openai")

openai = OpenAI(api_key=OPENAI_KEY)

## Load the Wayfair corpus

We'll recommend products only from this corpus

In [None]:
from cheat_at_search.wands_data import corpus

corpus['category'] = corpus['category'].str.strip()
corpus['sub_category'] = corpus['sub_category'].str.strip()

corpus

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,doc_id,title,description,category,sub_category,cat_subcat,title_snowball,description_snowball,category_snowball
0,0,solid wood platform bed,Beds,Furniture / Bedroom Furniture / Beds & Headboa...,"good , deep sleep can be quite difficult to ha...",overallwidth-sidetoside:64.7|dsprimaryproducts...,15.0,4.5,15.0,"[overallwidth-sidetoside:64.7, dsprimaryproduc...",0,solid wood platform bed,"good , deep sleep can be quite difficult to ha...",Furniture,Bedroom Furniture,Furniture / Bedroom Furniture,"Terms({'platform', 'bed', 'wood', 'solid'})","Terms({'both', 'nowher', 'age', 'like', 'act',...",Terms({'furnitur'})
1,1,all-clad 7 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,"create delicious slow-cooked meals , from tend...",capacityquarts:7|producttype : slow cooker|pro...,100.0,2.0,98.0,"[capacityquarts:7, producttype : slow cooker, ...",1,all-clad 7 qt . slow cooker,"create delicious slow-cooked meals , from tend...",Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances,"Terms({'qt', 'cooker', 'slow', '7', 'clad', 'a...","Terms({'prepar', 'flavor', 'healthi', 'electr'...","Terms({'kitchen', 'tabletop'})"
2,2,all-clad electrics 6.5 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,prepare home-cooked meals on any schedule with...,features : keep warm setting|capacityquarts:6....,208.0,3.0,181.0,"[features : keep warm setting, capacityquarts:...",2,all-clad electrics 6.5 qt . slow cooker,prepare home-cooked meals on any schedule with...,Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances,"Terms({'6', 'cooker', 'qt', '5', 'slow', 'elec...","Terms({'hour', 'prepar', 'and', 'essenti', 'fe...","Terms({'kitchen', 'tabletop'})"
3,3,all-clad all professional tools pizza cutter,"Slicers, Peelers And Graters",Browse By Brand / All-Clad,this original stainless tool was designed to c...,overallwidth-sidetoside:3.5|warrantylength : l...,69.0,4.5,42.0,"[overallwidth-sidetoside:3.5, warrantylength :...",3,all-clad all professional tools pizza cutter,this original stainless tool was designed to c...,Browse By Brand,All-Clad,Browse By Brand / All-Clad,"Terms({'profession', 'cutter', 'clad', 'pizza'...","Terms({'sharp', 'complement', 'and', 'easili',...","Terms({'brows', 'brand', 'by'})"
4,4,baldwin prestige alcott passage knob with roun...,Door Knobs,Home Improvement / Doors & Door Hardware / Doo...,the hardware has a rich heritage of delivering...,compatibledoorthickness:1.375 '' |countryofori...,70.0,5.0,42.0,"[compatibledoorthickness:1.375 '' , countryofo...",4,baldwin prestige alcott passage knob with roun...,the hardware has a rich heritage of delivering...,Home Improvement,Doors & Door Hardware,Home Improvement / Doors & Door Hardware,"Terms({'round', 'rosett', 'baldwin', 'passag',...","Terms({'modern', 'and', 'half', 'hardwar', 'pa...","Terms({'improv', 'home'})"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42989,42989,malibu pressure balanced diverter fixed shower...,Shower Panels,Home Improvement / Bathroom Remodel & Bathroom...,the malibu pressure balanced diverter fixed sh...,producttype : shower panel|spraypattern : rain...,3.0,4.5,2.0,"[producttype : shower panel, spraypattern : ra...",42989,malibu pressure balanced diverter fixed shower...,the malibu pressure balanced diverter fixed sh...,Home Improvement,Bathroom Remodel & Bathroom Fixtures,Home Improvement / Bathroom Remodel & Bathro...,"Terms({'divert', 'fix', 'pressur', 'malibu', '...","Terms({'and', 'bodi', 'easili', 'panel', 'styl...","Terms({'improv', 'home'})"
42990,42990,emmeline 5 piece breakfast dining set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,,basematerialdetails : steel| : gray wood|ofhar...,1314.0,4.5,864.0,"[basematerialdetails : steel, : gray wood, of...",42990,emmeline 5 piece breakfast dining set,,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture,"Terms({'5', 'breakfast', 'emmelin', 'piec', 's...",Terms(set()),Terms({'furnitur'})
42991,42991,maloney 3 piece pub table set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,this pub table set includes 1 counter height t...,additionaltoolsrequirednotincluded : power dri...,49.0,4.0,41.0,[additionaltoolsrequirednotincluded : power dr...,42991,maloney 3 piece pub table set,this pub table set includes 1 counter height t...,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture,"Terms({'3', 'tabl', 'piec', 'pub', 'maloney', ...","Terms({'bistro', '1', 'rectangular', 'and', 'w...",Terms({'furnitur'})
42992,42992,fletcher 27.5 '' wide polyester armchair,Teen Lounge Furniture|Accent Chairs,Furniture / Living Room Furniture / Chairs & S...,"bring iconic , modern style to your space in a...",legmaterialdetails : rubberwood|backheight-sea...,1746.0,4.5,1226.0,"[legmaterialdetails : rubberwood, backheight-s...",42992,fletcher 27.5 '' wide polyester armchair,"bring iconic , modern style to your space in a...",Furniture,Living Room Furniture,Furniture / Living Room Furniture,"Terms({'5', 'polyest', 'fletcher', '27', 'wide...","Terms({'modern', 'detail', 'while', 'and', 'cr...",Terms({'furnitur'})


### Index the furniture

We'll index title, description, and category with basic (Snowball) stemming so we can retrieve products by keyword

In [None]:
from searcharray import SearchArray
from cheat_at_search.tokenizers import snowball_tokenizer

corpus['title_snowball'] = SearchArray.index(corpus['title'].fillna(''), snowball_tokenizer)
corpus['description_snowball'] = SearchArray.index(corpus['description'].fillna(''), snowball_tokenizer)
corpus['category_snowball'] = SearchArray.index(corpus['category'].fillna(''), snowball_tokenizer)

2026-02-11 04:13:04,438 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-11 04:13:04,453 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-11 04:13:04,460 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-11 04:13:04,809 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-11 04:13:05,182 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-11 04:13:05,562 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-11 04:13:05,918 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-11 04:13:06,156 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-11 04:13:06,166 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-11 04:13:06,175 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-11 04:13:06,227 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-11 04:13:06,291 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-11 04:13:06,294 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-11 04:13:06,341 - searcharray.indexing - INFO - Indexing from tokenization complete
2026-02-11 04:13:06,383 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-11 04:13:06,393 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-11 04:13:06,396 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-11 04:13:07,568 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-11 04:13:08,818 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-11 04:13:10,802 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-11 04:13:12,835 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-11 04:13:13,434 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-11 04:13:13,487 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-11 04:13:13,533 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-11 04:13:14,180 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-11 04:13:14,629 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-11 04:13:14,631 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-11 04:13:14,878 - searcharray.indexing - INFO - Indexing from tokenization complete
2026-02-11 04:13:15,075 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-11 04:13:15,086 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-11 04:13:15,089 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-11 04:13:15,280 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-11 04:13:15,469 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-11 04:13:15,657 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-11 04:13:15,847 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-11 04:13:16,053 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-11 04:13:16,055 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-11 04:13:16,058 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-11 04:13:16,066 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-11 04:13:16,076 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-11 04:13:16,080 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-11 04:13:16,100 - searcharray.indexing - INFO - Indexing from tokenization complete


## Create a furniture products search function

Here is a function that searches the Wayfair product dataset. It's just a Python function that returns the `top_k` best-matching products (5 by default).

Right now we'll call it directly; soon we'll help ChatGPT interact with it.
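
To make the scoring idea concrete, here is a small pure-Python sketch of summing per-term BM25 scores across two fields, with title matches weighted 10x and description matches 1x. This is a toy under stated assumptions: `bm25_field_scores`, `weighted_search`, the `k1` / `b` defaults, and the three tiny documents are all made up for illustration, and the exact BM25 variant the indexing library uses may differ.

```python
import math
from collections import Counter


def bm25_field_scores(term, docs, k1=1.2, b=0.75):
    """BM25 score of one term against each tokenized doc in a single field."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = sum(1 for d in docs if term in d)
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    scores = []
    for d in docs:
        tf = Counter(d)[term]
        # Length normalization: longer-than-average docs are penalized
        norm = tf + k1 * (1 - b + b * len(d) / avg_len)
        scores.append(idf * tf * (k1 + 1) / norm)
    return scores


# Hypothetical, already-tokenized toy corpus (one entry per product)
titles = [["wood", "bed"], ["slow", "cooker"], ["pizza", "cutter"]]
descriptions = [["sleep", "well"], ["wood", "handle", "cooker"], ["sharp", "steel"]]


def weighted_search(query_terms, title_weight=10, description_weight=1):
    """Sum weighted per-field BM25 scores for every query term."""
    totals = [0.0] * len(titles)
    for term in query_terms:
        for i, score in enumerate(bm25_field_scores(term, titles)):
            totals[i] += title_weight * score
        for i, score in enumerate(bm25_field_scores(term, descriptions)):
            totals[i] += description_weight * score
    return totals


scores = weighted_search(["wood"])
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

Here the product with "wood" in its title outranks the one that only mentions it in the description, which is the effect the title weight is meant to have.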

In [None]:
import numpy as np
from typing import Union

from pydantic import BaseModel, Field
from typing import Optional, Literal


Categories = Literal['Furniture', 'Kitchen & Tabletop', 'Browse By Brand',
                     'Home Improvement', 'Décor & Pillows', 'Outdoor',
                     'Storage & Organization', 'Bed & Bath', 'Baby & Kids',
                     'Pet', 'Lighting', 'Rugs', 'School Furniture and Supplies',
                     'Commercial Business Furniture', 'Holiday Décor', 'Fountains',
                     'Contractor', 'Appliances', 'Sale', 'Reception Area',
                     'Foodservice', 'Institutional Furniture Parts & Accessories',
                     'Landscaping Screens & Bridges', 'Shop Product Type', 'Clips',
                     'Slicers, Peelers And Graters', 'Bed Accessories',
                     'Accommodations', 'Buffet Accessories', 'Specialty Serving',
                     'Display Cases', 'Key Organizers', 'Ergonomic Accessories',
                     'Slow Cookers', 'Bath Rugs & Mats', 'Furniture Cushions',
                     'Early Education', 'Learning Resources',
                     'Physical Education Equipment', 'Faux Plants and Trees',
                     'Desk Parts', 'Serving Dishes & Platters', 'Water Filter Pitchers',
                     'Shower Curtain Rods', 'Table Accessories',
                     'Sandboxes & Sand Toys', 'Meeting & Collaborative Spaces',
                     'Desktop Organizers & Desk Pads',
                     'Napkin Rings, Place Card Holders & Food Markers',
                     'Partition & Panel Hardware Accessories', 'Cash Handling', 'Hooks',
                     'Novelty Lighting', 'Protection Plans',
                     'Stages, Risers and Accessories']

def search_wayfair(keywords: str,
                   category: Optional[Categories] = None,
                   top_k: int = 5
                   ) -> list[dict[str, Union[str, int, float]]]:
    """Search the wayfair home goods + furniture catalog, get top_k results

    This is direct keyword search along with optional category filtering.

    Args:
        keywords: The search query string.
        category: category to filter products by, unfiltered when not present
        top_k: The number of top results to return.

    Returns:
        Search results as a list of dictionaries with 'id', 'title', 'description', 'category', and 'score' keys.

    """
    print("search", keywords)
    required_keywords = [term[1:] for term in keywords.split() if term.startswith("+")]
    bm25_scores = np.zeros(len(corpus))
    for term in snowball_tokenizer(keywords):
        bm25_scores += corpus['title_snowball'].array.score(term) * 10
        bm25_scores += corpus['description_snowball'].array.score(term) * 1

    # for required_term in snowball_tokenizer(" ".join(required_keywords)):
    #     required_score = (corpus['title_snowball'].array.score(required_term) +
    #                       corpus['description_snowball'].array.score(required_term))
    #     bm25_scores[required_score == 0] = 0
    # Filter by category
    if category:
        print("Filtering by category:", category)
        cat_tokenized = snowball_tokenizer(category)
        category_mask = corpus['category_snowball'].array.score(cat_tokenized) > 0
        bm25_scores = bm25_scores * category_mask


    # argsort is ascending, so take the last top_k and reverse for best-first
    top_k_indices = np.argsort(bm25_scores)[-top_k:][::-1]
    bm25_scores = bm25_scores[top_k_indices]
    top_products = corpus.iloc[top_k_indices].copy()
    top_products.loc[:, 'score'] = bm25_scores

    results = []
    for _, row in top_products.iterrows():
        results.append({
            'id': row['doc_id'],
            'title': row['title'],
            'description': row['description'],
            'category': row['category'],
            'score': row['score']
        })
    return results



search_wayfair("geometric style couch", top_k=5)

search geometric style couch


[{'id': 6346,
  'title': 'kaat 3 - light candle style geometric chandelier',
  'description': 'welcome guests to your home with a splash of statement lighting , illuminate your bedroom or light up your dining room table with this charismatic geometric chandelier . made from steel in a handsome metallic finish , this alluring design showcases a simple circular canopy , a straight down rod , and a distinctive openwork geometric frame around a contemporary candelabra-style base . this hardwired modern luminary accommodates three lightbulbs of up to 60 w each ( bulbs not included ) .',
  'category': 'Lighting',
  'score': 45.974431455135345},
 {'id': 39358,
  'title': 'deloris 4 - light candle style geometric chandelier',
  'description': 'this 4-light pendant features a unique design that enhances the contemporary . it also adds a modern style atmosphere to your home for a more fashionable feel .',
  'category': 'Lighting',
  'score': 43.722705125808716},
 {'id': 22831,
  'title': 'matild

## Describe the search tool to the LLM

There is a specific schema for telling OpenAI about our tools / functions. However, the cheat at search library has added some conveniences:

* We use the function name as the name to OpenAI
* We use the doc string to get a description
* The typing information gets encoded in parameters and return value

So, importantly, all of these things (the name, the docstring, and the types) are part of the prompt.
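
To show why the name, docstring, and types matter, here is a simplified sketch of turning a Python function into a tool description. It is not how `make_tool_adapter` is implemented; `describe_tool`, `JSON_TYPES`, and `search_catalog` are hypothetical names for this illustration, and only a few basic types are handled.

```python
import inspect
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Minimal mapping from Python types to JSON schema type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn) -> dict:
    """Build a function-tool description from a function's signature."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        hint = hints.get(name, str)
        # Optional[X] is Union[X, None]; unwrap it to X
        if get_origin(hint) is Union:
            hint = next(a for a in get_args(hint) if a is not type(None))
        properties[name] = {"type": JSON_TYPES.get(hint, "string")}
        # Parameters without a default must be supplied by the model
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object",
                       "properties": properties,
                       "required": required},
    }


def search_catalog(keywords: str,
                   category: Optional[str] = None,
                   top_k: int = 5) -> list:
    """Search a hypothetical catalog by keyword."""
    return []


tool = describe_tool(search_catalog)
print(tool)
```

Everything in the resulting dictionary is text the model reads, so renaming the function or editing its docstring changes what the model is told.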

### Annoying serialization / deserialization

When we get it in an OpenAI-friendly format, we also keep around some book-keeping for annoying serialization / deserialization of the arguments

With this we get some plumbing information in a 3-tuple:
* The pydantic model used to validate the arguments
* The tool as OpenAI sees it
* The function to call to delegate to this tool

Don't get too lost in the weeds here. In future notebooks, cheat-at-search helper code will just do this for you behind the scenes.
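
If it helps to see the shape of that 3-tuple, here is a standard-library-only sketch. A dataclass stands in for the pydantic model and `SimpleNamespace` stands in for the function call item the model emits; `EchoArgs`, `echo`, `call_echo`, and `toy_tool_info` are hypothetical and are not part of cheat-at-search.

```python
import json
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass
class EchoArgs:
    """Stand-in for the generated pydantic arguments model."""
    text: str
    times: int = 1


def echo(text: str, times: int = 1) -> list[str]:
    """The plain Python function we want to expose as a tool."""
    return [text] * times


def call_echo(args: EchoArgs):
    """Call the function and return both the Python and JSON results."""
    py_resp = echo(**asdict(args))
    return py_resp, json.dumps(py_resp)


echo_spec = {"type": "function", "name": "echo"}  # abbreviated tool description
toy_tool_info = {"echo": (EchoArgs, echo_spec, call_echo)}

# A tool call as a model might emit it: arguments arrive as a JSON string
item = SimpleNamespace(name="echo", call_id="call_1",
                       arguments='{"text": "hi", "times": 2}')

ArgsModel, _, tool_fn = toy_tool_info[item.name]
args = ArgsModel(**json.loads(item.arguments))   # deserialize the arguments
py_resp, json_resp = tool_fn(args)               # run the tool
output = {"type": "function_call_output",
          "call_id": item.call_id,
          "output": json_resp}                   # serialize for the model
print(output)
```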

In [None]:
from cheat_at_search.agent.pydantize import make_tool_adapter

search_tool = make_tool_adapter(search_wayfair)

tool_info = {search_wayfair.__name__: search_tool}
tool_info

{'search_wayfair': (cheat_at_search.agent.pydantize.Search_wayfairArgs,
  {'type': 'function',
   'name': 'search_wayfair',
   'description': "Search the wayfair home goods + furniture catalog, get top_k results\n\n    This is direct keyword search along with optional category filtering.\n\n    Args:\n        keywords: The search query string.\n        category: category to filter products by, unfiltered when not present\n        top_k: The number of top results to return.\n\n    Returns:\n        Search results as a list of dictionaries with 'id', 'title', 'description', 'category', and 'score' keys.",
   'parameters': {'properties': {'keywords': {'title': 'Keywords',
      'type': 'string'},
     'category': {'anyOf': [{'enum': ['Furniture',
         'Kitchen & Tabletop',
         'Browse By Brand',
         'Home Improvement',
         'Décor & Pillows',
         'Outdoor',
         'Storage & Organization',
         'Bed & Bath',
         'Baby & Kids',
         'Pet',
         'Li

In [None]:
def call_tool(tool_info, item) -> dict:

    # Lookup how the agent wants to call the tool
    tool_name = item.name
    tool = tool_info[tool_name]
    ToolArgsModel = tool[0]
    tool_fn = tool[2]
    fn_args: ToolArgsModel = ToolArgsModel.model_validate_json(item.arguments)

    print(f"Calling {tool_name} with args {fn_args}")
    # The tool call function itself (ie search)
    # wrapped in something helping with serialization
    py_resp, json_resp = tool_fn(fn_args)
    print("output", py_resp)

    # Package the tool's result in the shape the model expects
    return {
        "type": "function_call_output",
        "call_id": item.call_id,
        "output": json_resp,
    }


## Option 1: add few shot examples to the prompt

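We add some labeled examples from the data to the prompt, showing what's considered relevant / irrelevant. These examples are independent of the query.

What this code does is grab the search judgments merged with the products and sample about 10 of them at random, split across the Exact, Partial, and Irrelevant labels. We then append each one to the prompt along with its human label.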
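
The sampling split can be sketched without pandas. The toy below takes roughly a third of the examples from each of Exact and Irrelevant and fills the rest with Partial; `sample_by_label` and the synthetic `rows` are hypothetical and only mirror the idea, not the exact rows the notebook cell picks.

```python
import random


def sample_by_label(rows, k=10, seed=42):
    """Pick about k rows, balanced across Exact / Irrelevant / Partial."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)

    def take(label, n):
        # Never ask for more rows than the label actually has
        pool = by_label.get(label, [])
        return rng.sample(pool, min(n, len(pool)))

    exact = take("Exact", k // 3)
    irrelevant = take("Irrelevant", k // 3)
    partial = take("Partial", k - len(exact) - len(irrelevant))
    chosen = exact + irrelevant + partial
    rng.shuffle(chosen)  # avoid presenting the labels in blocks
    return chosen


# Synthetic judgments: 20 queries, one row per label
rows = [{"query": f"q{i}", "label": label}
        for i in range(20)
        for label in ("Exact", "Partial", "Irrelevant")]

examples = sample_by_label(rows)
counts = {label: sum(r["label"] == label for r in examples)
          for label in ("Exact", "Partial", "Irrelevant")}
print(counts)
```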

In [None]:
from cheat_at_search.wands_data import labeled_query_products
import pandas as pd

def build_few_shot_prompt(prompt, k=10) -> str:
    labeled = labeled_query_products
    if len(labeled) == 0:
        # Nothing to sample from; return the prompt unchanged
        return prompt
    relevant = labeled[labeled['label'] == 'Exact']
    irrelevant = labeled[labeled['label'] == 'Irrelevant']
    # Get k // 3 relevant (3 when k=10)
    relevant = relevant.sample(min(k // 3, len(relevant)), random_state=42)
    # Get k // 3 irrelevant (3 when k=10)
    irrelevant = irrelevant.sample(min(k // 3, len(irrelevant)), random_state=42)
    # Get the rest Partial
    partial = labeled[labeled['label'] == 'Partial']
    partial = partial.sample(min(k - len(relevant) - len(irrelevant), len(partial)), random_state=42)

    # Format into prompt
    labeled = pd.concat([relevant, irrelevant, partial]).sample(frac=1, random_state=42)
    for item in labeled.to_dict(orient='records'):
        print(item)
        prompt += f"""

        User Query: {item['query']}
        Product Name: {item['title']}
        Product Description: {item['description']}
        Product Category: {item['category']}
        Human Label: {item['label']}

        """
    print("Prompt is:")
    print(prompt)
    return prompt

## Optional - give a narrative description of query intent

This only helps if the LLM guesses the user's intent correctly; a wrong guess can send the agent searching for the wrong thing.

In [None]:
def interpret_query(keywords):
    system_prompt = f"""
        Interpret search queries for a home goods / furniture
        store like Wayfair into a description of what's needed in a few
        sentences.

        State it in the voice of the user "I am looking for <detailed info>"

        Pick the most likely intent, don't guess at multiple ones.
    """
    user_prompt = f"""
        User Query: {keywords}
    """
    inputs = []
    inputs.append({"role": "system", "content": system_prompt})

    inputs.append({"role": "user", "content": user_prompt})
    resp = openai.responses.create(
            model="gpt-5",
            input=inputs
    )
    return resp.output[-1].content[-1].text

# interpret_query("led 60")

## Setup agentic loop

Set up and run the agentic loop within a minimal harness.
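
Stripped of API details, the loop is: ask the model, run any tools it requests, append the results, and repeat until it stops asking for tools. Here is a toy sketch with a scripted stand-in for the model; `scripted_model`, `toy_search`, and `run_loop` are hypothetical and make no real API calls.

```python
from types import SimpleNamespace


def toy_search(keywords: str) -> list[str]:
    """Hypothetical search tool."""
    return [f"result for {keywords}"]


def scripted_model(inputs: list) -> list:
    """Hypothetical model: requests one search, then gives a final answer."""
    has_tool_output = any(
        isinstance(i, dict) and i.get("type") == "function_call_output"
        for i in inputs
    )
    if not has_tool_output:
        return [SimpleNamespace(type="function_call", call_id="c1",
                                arguments="pouf")]
    return [SimpleNamespace(type="message", text="done")]


def run_loop(inputs: list, max_turns: int = 5) -> list:
    """Keep calling the model until a turn contains no tool calls."""
    for _ in range(max_turns):
        output = scripted_model(inputs)
        inputs.extend(output)
        calls = [item for item in output if item.type == "function_call"]
        if not calls:
            return inputs  # the model answered without needing a tool
        for call in calls:
            # Run the tool and hand the result back to the model
            inputs.append({"type": "function_call_output",
                           "call_id": call.call_id,
                           "output": toy_search(call.arguments)})
    raise RuntimeError("model kept calling tools")


transcript = run_loop([{"role": "user", "content": "small woven pouf"}])
print(len(transcript), transcript[-1].text)
```

Note that the check for tool calls looks at the whole turn at once, so a trailing non-tool item cannot hide an earlier tool call.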

In [None]:
import textwrap
from time import sleep


system_prompt = """
    You take user search queries and use a search tool to find furniture / home goods products.

    Look at the search tools you have, their limitations, how they work, etc when forming your plan.

    Finally return results to the user per the SearchResults schema, ranked best to worst.

    Gather results until you have 10 best matches you can find. It's important to return at least 10.

    Consider possibly

    * Not searching categories if no relevant results found

    It's very important you consider carefully the correct ranking as you'll be evaluated on
    how close that is to the average furniture shopper's ideal ranking.

    Here are some examples of products and relevant / irrelevant results

"""
system_prompt = build_few_shot_prompt(system_prompt)



class SearchResults(BaseModel):
    """The ranked, top 10 search results ordered most relevant to least."""
    results_summary: str = Field(description="The message from you summarizing what you found")

    ranked_results: list[int] = Field(description="Top ranked search results (their doc_ids)")


def agent_run(tool_info,
              text_format,
              inputs,
              model='gpt-5',
              summary=True):

    tool_calls = True
    resp = None
    while tool_calls:
        failing = True
        num_failures = 0
        while failing:
            try:
                # print(inputs)
                resp = openai.responses.parse(
                    model=model,
                    input=inputs,
                    tools=[tool[1] for tool in tool_info.values()],
                    reasoning={
                        "effort": "medium",
                        "summary": "auto" if summary else "none"
                    },
                    text_format=text_format
                )
                failing = False
            except Exception:
                failing = True
                num_failures += 1
                if num_failures > 3:
                    raise
                sleep(1)
        inputs += resp.output
        if summary:
            usage = resp.usage
            print("--")
            print(f"InpTok: {usage.input_tokens}")
            print(f"OutTok: {usage.output_tokens}")
            for item in resp.output:
                if item.type == "reasoning":
                    print("Reasoning:")
                    for summary_item in item.summary:
                        print(textwrap.fill(summary_item.text, 80), "\n")
                    item.summary = []

        # Assume no tool calls until we see one; resetting this inside the
        # loop would let a trailing non-tool item hide an earlier tool call
        tool_calls = False
        for item in resp.output:
            if item.type == "function_call":
                tool_calls = True
                # Get the tool, and package up the call
                # to the tool (our python function)
                tool_response = call_tool(tool_info, item)

                # Provide function call results to the model
                inputs.append(tool_response)
    return resp, inputs


def _error_msg(error):
    print(error)
    return {"role": "user", "content": f"Oh this isn't good, it turns out: {error}. Please try again"}



def search(query):
    """A little search harness."""
    inputs = [{"role": "system", "content": system_prompt},
            {"role": "user", "content": interpret_query(query)}]
    error = True
    resp = None
    search_tool = make_tool_adapter(search_wayfair)

    tool_info = {search_wayfair.__name__: search_tool}
    while error:
        resp, inputs = agent_run(tool_info,
                                 text_format=SearchResults,
                                 inputs=inputs)
        # Validate what came back, ensure it fits
        num_results = len(resp.output_parsed.ranked_results)
        if num_results != 10:
            inputs.append(_error_msg(f"Expected 10 ranked_results, got {num_results}"))
            continue
        # Look up each doc id in the response. A `continue` inside a for loop
        # only skips to the next id, so collect the misses before retrying
        known_ids = set(corpus['doc_id'].tolist())
        missing = [item for item in resp.output_parsed.ranked_results
                   if item not in known_ids]
        if missing:
            inputs.append(_error_msg(f"Doc ids {missing} are not in corpus"))
            continue
        error = False

    return resp.output_parsed


resp = search("small woven pouf")
resp

{'query_id': 24, 'query': 'wood coffee table set by storage', 'query_class': 'Living Room Table Sets', 'doc_id': 5696, 'id': 2385, 'product_id_x': 5696, 'label': 'Partial', 'grade': 1.0, 'product_id_y': 5696, 'product_name': 'stuber 3 piece coffee table set', 'product_class': 'Living Room Table Sets', 'category hierarchy': 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Table Sets', 'product_description': 'sleek mid-century modern design is complemented by rustic elements to create the ever timeless jones tables . natural finished reclaimed fir adds history and time-worn character , while black iron hairpin legs add an industrial component . this set of 3 includes one coffee table and two end tables . instantly bring cohesiveness to one room by using all 3 tables together or divide them up amongst different rooms . the sturdy hairpin legs fold to allow for easy portability . each tabletop is lacquer finished for long-lasting protection and durability .', 'produ

SearchResults(results_summary='I prioritized small, round poufs with natural fibers (jute/cotton), neutral tones, tight/hand-woven textures, and sizes closest to 14–16 in. high and ≤18 in. wide. Best match is a true 16" round woven style. Next are cotton-woven options likely available in neutral colors; two are 18" cubes/rounds in a natural color (slightly tall). A 17" round is close on size though material is unspecified. Natural jute options are included but note they’re 20" wide (exceed your width). A chevron option meets size but isn’t neutral. If you want only items strictly within ≤18" W and 14–16" H, the top few are your best bets; the rest are partial matches due to size or pattern.', ranked_results=[15836, 41491, 20759, 23332, 23333, 3271, 31035, 19189, 5866, 14723])

## Compare results to just BM25

Run the agent on a sample of queries so the results can be compared against the BM25 baseline.
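
For intuition about what such a comparison measures, here is a sketch of NDCG, one common way to score a ranking against graded judgments. Whether `run_strategy` reports exactly this metric is an assumption, and the linear gain used here is a simplification; `dcg` and `ndcg` are hypothetical helpers, not library functions.

```python
import math


def dcg(grades):
    """Discounted cumulative gain: lower ranks count for less."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg(ranked_grades, all_grades, k=10):
    """DCG of the ranking divided by the DCG of the best possible ranking."""
    ideal = dcg(sorted(all_grades, reverse=True)[:k])
    return dcg(ranked_grades[:k]) / ideal if ideal > 0 else 0.0


# Grades of the judged products for one made-up query
judged = [2, 1, 0]

perfect = ndcg([2, 1, 0], judged)    # best product first
reversed_ = ndcg([0, 1, 2], judged)  # best product last
print(perfect, reversed_)
```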

In [None]:
from cheat_at_search.strategy import SearchStrategy
from cheat_at_search.search import run_strategy
from cheat_at_search.wands_data import judgments

class AgenticSearchStrategy(SearchStrategy):
    def __init__(self,
                 corpus,
                 workers=1):
        super().__init__(corpus, workers=workers)

    def search(self, query, k):
        agentic_query = "Find me: " + query
        print("_________---___________")
        print(agentic_query)
        print("_________---___________")
        resp = search(query)
        return (resp.ranked_results,
                [1.0] * len(resp.ranked_results))

# Get 8 random queries from judgments
seed = 1234
np.random.seed(seed)
random_queries = np.random.choice(judgments['query'].unique(), 8)
selected_judgments = judgments[judgments['query'].isin(random_queries)]
selected_judgments

strategy = AgenticSearchStrategy(corpus, workers=4)
graded_agentic = run_strategy(strategy, selected_judgments)

_________---___________
Find me: carolyn console table
_________---___________
_________---___________
Find me: queen wingback chair
_________---___________
_________---___________
Find me: wishbone chair
_________---___________
_________---___________
Find me: dull bed with shirt head board
_________---___________


Searching:   0%|          | 0/8 [00:00<?, ?it/s]

--
InpTok: 1834
OutTok: 427
Reasoning:
**Searching for console table**  I need to use the search tool to find the
"Carolyn Console Table" from Wayfair. The user is looking for a specific model
from the Carolyn collection, aiming for a standard entryway size that suits a
hallway with a slim profile and a neutral wood or painted finish. My goal is to
display the exact table and close matches, and I’ll ensure to provide at least
10 ranked results, considering furniture as the category. 

**Searching variations for console tables**  I think I’ll need to conduct
multiple searches for the "Carolyn console table" and related terms. I'll try
variations like "Carolyn collection console table," "Carolyn entryway table,"
and "Carolyn hallway table," plus alternative spellings like "Caroline."
Initially, I want to find the exact table, but I may also check for versions
like "Carolyn Narrow Console Table" and "Carolyn sideboard." However, I'll stick
to console tables and relevant matches from the c

Searching:  12%|█▎        | 1/8 [01:18<09:09, 78.46s/it]

--
InpTok: 9885
OutTok: 3299
Reasoning:
**Assessing search results**  I've gathered multiple search results, and now I
need to pick at least 10 that best match the user's criteria. The user wants a
full-size bed frame with a short, low-profile headboard that's simple and
understated – nothing that will dominate the room. It should be a platform bed
that doesn’t require a box spring, with a neutral finish or upholstery.  Looking
at one of the items, I see a "full platform bed" listed. However, it features a
bright yellow color that's definitely not neutral. I also can't clearly confirm
if it has a low headboard. Despite being a platform bed, the color won't fit the
user's needs. 

**Evaluating more options**  Now I’m assessing more search results for the
user's full-size bed frame needs. One option I found is a "metal platform bed"
in bright yellow, which doesn't meet the neutral color requirement and likely
has a standard headboard height—not ideal!   Then, there's a "marjorie full
pla

Searching:  25%|██▌       | 2/8 [01:24<03:34, 35.67s/it]

--
InpTok: 14181
OutTok: 2586
Reasoning:
**Identifying close matches**  I'm considering if there are close matches for
the "Carolyn" collection, like potential variants "Caroline" and "Carolynn," but
they aren't present. I’ll include the carolyn coffee table with storage as a
close match, since it shares design elements with the collection, even if it’s
not an exact match. I should rank it around 7 or 8 as a close match. I'll also
explicitly show other relevant consoles like the narrow rectangular console
table and the Bouldin console table. 

**Evaluating console options**  I've come across a few console tables that could
work well. First, there's the Saunders table (16562) with turned legs and
drawers, which has an older style and likely a neutral painted finish—perfect
for a hallway. Then, I see the Bayle table (29954), which is apothecary-inspired
and made of solid acacia with metal legs, though I’m unsure if it might be
deeper than expected. Finally, there's the Westcott console t

Searching:  38%|███▊      | 3/8 [01:34<02:01, 24.24s/it]

--
InpTok: 15894
OutTok: 1454
Reasoning:
**Summarizing wingback chairs**  I've got enough wingback chairs to make a top
10 list, but none are explicitly labeled as Queen Anne with cabriole legs.
Still, I should be clear in my results summary: while I found traditional
wingback accent chairs in neutral linen, only a few mention Queen Anne cabriole
legs. I’ll make sure to follow the instructions for results to be ranked
according to the schema provided. Now, I’ll focus on ranking the top options for
the user. 

**Describing wingback chair options**  I’m still looking at options and focusing
on those that match neutral beige linen while also being comfortable for
reading. One possibility I found is the Robby chair, which is 29 inches wide,
tufted in linen, and likely features wood legs. It seems to fit the criteria of
having that neutral tone while ensuring a cozy reading experience. I’ll keep
exploring similar options to enrich the recommendations! 

**Gathering wingback chair options** 

Searching:  50%|█████     | 4/8 [02:00<01:38, 24.71s/it]

--
InpTok: 18586
OutTok: 1574
Reasoning:
**Identifying dining chair options**  It seems I’m focusing on finding a list of
ten solid wood dining chairs. Let’s summarize the potential options:   1) The
4196 Cantero chair is a Wishbone reproduction with a natural cord; it looks
perfect.  2) The 5574 Forney chair features quality solid beech.  3) The 35721
Gunnur chair, described as Nordic, might not be stackable, even though it says
it is.  4) The 37312 Norvin chair is related to the Wegner Y-chair.  5) The
17645 Andria chair has a black hemp seat, which is less ideal. 

**Listing additional dining chair options**  Continuing with the dining chair
options, here are some more possibilities:  6) The 5573 Burchfield chair has a
black finish with a paper cord seat, and requires no assembly, but it’s of lower
relevance.  7) The 41475 Sease chair is wishbone-inspired but features an
upholstered seat, which is less ideal.  8) The 30654 Agustin chair has a natural
rush seat; I'm not sure if it's 

Searching:  62%|██████▎   | 5/8 [03:36<02:31, 50.58s/it]

--
InpTok: 5541
OutTok: 2861
Reasoning:
**Identifying top light fixtures**  It looks like we're not considering
"Cliffside." The Dolley, Paquette, and Burton options fit the criteria and can
work with adjustable downrods/chain and dimmer compatibility. We've got more
than ten results, but we need to narrow this down to the top ten best matches.
The focus is on features and finding options that specify "medium-width." Some
items highlight suitability for intimate dining or living, like the Latrell.
I'll rank them based on their key attributes. 

**Clarifying light fixture options**  The dimmable feature isn't specified for
every option. Looking at the fixtures: Denise has a five-light drum with an open
wire cage and seems good. Paquette is dimmer compatible and has an adjustable
downrod with an inner fabric drum shade, also looking solid. Harlan features a
fabric drum shade with five candelabra-base bulbs and an adjustable chain. Glynn
similarly has a fabric drum shade, is dimmer compat

Searching:  75%|███████▌  | 6/8 [03:37<01:07, 33.69s/it]

--
InpTok: 14403
OutTok: 793
Reasoning:
**Finalizing the ranking**  I’m finalizing the ranking and need at least ten
results. The instruction suggests not searching categories if no relevant
results are found, but we've already established that Lighting is relevant. I
should ensure I rank strongly by matching all the constraints, such as dimmable
features and warm white range, while considering multi-pack value. I’m debating
whether a smart bulb, which is 3000K, should be ranked lower than a normal A19
dimmable bulb at 2700K. Since the average shopper likely prefers the standard
A19 shape, I'll place the smart bulb above the ST19. 

**Ranking standard bulbs**  Many consumers might not want smart bulbs, and the
user didn't mention them specifically; they only requested dimmable bulbs, not
smart ones. To avoid going off-target, I’m considering placing smart bulbs below
the ST19. The user asked for standard medium E26 base bulbs that output about
800 lumens, warm white, and preferably dim

Searching:  88%|████████▊ | 7/8 [03:43<00:24, 24.47s/it]

--
InpTok: 25929
OutTok: 1040
Reasoning:
**Compiling matches for preferences**  The tool has some limitations, but I can
still compile a list of 10 best matches, even if some aren’t perfect. Here’s
what I’ve got:  1. 9800 Dickson – perfect 2. 2070 Mayes – good 3. 15836 Aubree
16 – good 4. 50 Hallatrow hand woven – likely good 5. 19532 Woven seagrass pouf
– good 6. 3271 Limnia 17 – acceptable 7. 5866 Odin 20" tufted jute – slightly
larger, lower rank maybe 9 8. 19189 Finkel 20" jute – also 20", lower rank maybe
10 9. 4527 Overstreet 18" round burlap-look – somewhat neutral, rank 7 10. 5117
Heeter 14" round geometric cotton – might be okay, rank 8  I need to ensure
these options align with shopper preferences for tightly woven natural-fiber
options, neutral tones, shape retention, and lightweight designs. 

**Refining pouf selection**  I’ve noted that cotton knits don't hold their shape
well, and we want to steer clear of "garst." In contrast, water hyacinth and
jute woven options typica

Searching: 100%|██████████| 8/8 [03:44<00:00, 28.02s/it]

--
InpTok: 22703
OutTok: 1817
Reasoning:
**Ranking furniture styles**  I'm trying to rank some furniture pieces based on
their modernity. The 17718 Beckville Light Oak 2-drawer with copper handles has
modern lines and light wood, but the hardware isn't silver, so I might place it
around #5 or lower.   Then there's the 20896 Posen 1-drawer weathered oak, which
has carved legs and might not fit the modern style, plus I'm unsure about its
hardware. Let's keep assessing these! 

**Ranking furniture options**  I'm assigning ranks to some furniture pieces
based on their styles and materials. The 37345 Tellier 1-drawer natural oak has
modern lines and includes open shelves. I'd rank it high, maybe #2 or #3.  Next,
the 20254 Odonoghue 2-drawer light brown pine is transitional, so I’d say it
ranks mid.  The 37311 Fiorella has issues with height, possibly landing it at #8
or #9. The 704 and 34027 options are pretty generic and likely rank low.   For
the 39686 Judsonia, I think it falls low to mi




In [None]:
from cheat_at_search.search import ndcgs, graded_bm25, vs_ideal
# 5,5,4,7
ndcgs(graded_agentic).sort_index(), ndcgs(graded_agentic).mean()

(query
 carolyn console table                             0.333333
 cliffside 5 light candle style drum chandelier    0.333333
 dull bed with shirt head board                    0.000000
 led 60                                            0.333333
 light wood nightstand with silver accents         0.162625
 queen wingback chair                              0.304882
 small woven pouf                                  0.750621
 wishbone chair                                    0.951949
 Name: ndcg, dtype: float64,
 np.float64(0.3962595402610305))
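
For readers unfamiliar with the metric, here is a minimal sketch of NDCG over graded labels (0 to 2, as in the judgments). It uses the common `(2^grade - 1) / log2(rank + 1)` gain and discount, which is an assumption: it is not necessarily the exact formula that `ndcgs` in `cheat_at_search` uses. The helper names `dcg` and `ndcg_at_k` are illustrative only.

```python
import math

def dcg(grades):
    """Discounted cumulative gain for a list of grades in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))

def ndcg_at_k(ranked_grades, all_grades, k=10):
    """DCG of the top-k ranking divided by the DCG of the ideal top-k ordering.

    ranked_grades: grades of the returned documents, in ranked order
    all_grades:    grades of every judged document for the query
    """
    ideal = sorted(all_grades, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg(ranked_grades[:k]) / ideal_dcg

# A ranking that puts the best document last scores below 1.0
print(ndcg_at_k([0, 1, 2], [2, 1, 0]))
```

A score of 1.0 means the ranking matches the ideal ordering of the judged documents; 0.0 means no relevant document appeared in the top k.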

In [None]:
bm25_ndcgs = ndcgs(graded_bm25[graded_bm25['query'].isin(random_queries)])
bm25_ndcgs.sort_index(), bm25_ndcgs.mean()

(query
 carolyn console table                             0.333333
 cliffside 5 light candle style drum chandelier    0.333333
 dull bed with shirt head board                    0.000000
 led 60                                            0.333333
 light wood nightstand with silver accents         0.288624
 queen wingback chair                              0.307727
 small woven pouf                                  0.468003
 wishbone chair                                    0.674750
 Name: ndcg, dtype: float64,
 np.float64(0.3423881136250733))
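
The two outputs above are easier to read side by side. Below is a minimal sketch that assumes the per-query NDCG values are copied by hand from the printed outputs; in the notebook you would build the same table directly from the two `ndcgs(...)` results. The names `agentic`, `bm25`, and `comparison` are illustrative.

```python
import pandas as pd

# Per-query NDCG values copied from the two outputs above
agentic = pd.Series({
    "carolyn console table": 0.333333,
    "cliffside 5 light candle style drum chandelier": 0.333333,
    "dull bed with shirt head board": 0.000000,
    "led 60": 0.333333,
    "light wood nightstand with silver accents": 0.162625,
    "queen wingback chair": 0.304882,
    "small woven pouf": 0.750621,
    "wishbone chair": 0.951949,
}, name="agentic")

bm25 = pd.Series({
    "carolyn console table": 0.333333,
    "cliffside 5 light candle style drum chandelier": 0.333333,
    "dull bed with shirt head board": 0.000000,
    "led 60": 0.333333,
    "light wood nightstand with silver accents": 0.288624,
    "queen wingback chair": 0.307727,
    "small woven pouf": 0.468003,
    "wishbone chair": 0.674750,
}, name="bm25")

# Align on query and compute the per-query difference (agentic minus BM25)
comparison = pd.concat([agentic, bm25], axis=1)
comparison["delta"] = comparison["agentic"] - comparison["bm25"]

print(comparison.sort_values("delta", ascending=False))
print("mean delta:", comparison["delta"].mean())
```

On this 8-query sample, two queries improve, two regress, and four tie, so the gain in mean NDCG comes almost entirely from `small woven pouf` and `wishbone chair`. With so few queries, treat the difference as suggestive, not conclusive.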

In [None]:
def evaluate_results(query, document):
    """Evaluate the results, return relevance label 0-2"""
    ...
    # Run ranking model

    # Compute quality of document (or set of documents)

    # Return label