# Classic RAG

This notebook uses the Wayfair dataset to demonstrate 'single-shot' classic RAG.

We do exactly one search, retrieve results for the agent, and ask the agent to incorporate them in answering the user's question.
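To make the shape of that flow concrete before wiring in a real model, here is a minimal sketch of the three steps with the LLM and the search engine passed in as plain callables. The names `single_shot_rag`, `fake_make_query`, `fake_search`, and `fake_answer` are hypothetical stand-ins, not part of `cheat_at_search` or the OpenAI SDK.

```python
from typing import Callable


def single_shot_rag(
    question: str,
    make_query: Callable[[str], str],
    search: Callable[[str], list[dict]],
    answer: Callable[[str, list[dict]], str],
) -> str:
    """One query, one search, one answer: no retries or follow-up searches."""
    query = make_query(question)       # step 1: turn the question into a search query
    results = search(query)            # step 2: exactly one retrieval call
    return answer(question, results)   # step 3: answer using the retrieved results


# Hypothetical stand-ins so the sketch runs without an API key or an index.
def fake_make_query(question: str) -> str:
    return question.lower().replace("help me find a ", "")


def fake_search(query: str) -> list[dict]:
    catalog = [{"id": 1, "title": "modern geometric couch"},
               {"id": 2, "title": "slow cooker"}]
    terms = set(query.split())
    return [p for p in catalog if terms & set(p["title"].split())]


def fake_answer(question: str, results: list[dict]) -> str:
    titles = ", ".join(r["title"] for r in results)
    return f"Based on the catalog, consider: {titles}"


reply = single_shot_rag("Help me find a modern couch",
                        fake_make_query, fake_search, fake_answer)
print(reply)
```

The rest of the notebook fills in each callable with a real component: GPT-5 for query generation and answering, and a BM25 keyword search for retrieval.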

In [None]:
!pip install git+https://github.com/softwaredoug/cheat-at-search.git
from cheat_at_search.data_dir import mount
mount(use_gdrive=True)    # colab, share data across notebook runs on gdrive
# mount(use_gdrive=False) # <- colab without gdrive
# mount(use_gdrive=False, manual_path="/path/to/directory")  # <- force data path to specific directory, ie you're running locally.


Collecting git+https://github.com/softwaredoug/cheat-at-search.git
  Cloning https://github.com/softwaredoug/cheat-at-search.git to /tmp/pip-req-build-yo78rvhk
  Running command git clone --filter=blob:none --quiet https://github.com/softwaredoug/cheat-at-search.git /tmp/pip-req-build-yo78rvhk
  Resolved https://github.com/softwaredoug/cheat-at-search.git to commit 6a08d097f1d6eaa068fb61af47c621df1682f5e2
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting openai<2.0.0,>=1.84.0 (from cheat_at_search==0.1.0)
  Downloading openai-1.109.1-py3-none-any.whl.metadata (29 kB)
Collecting pystemmer<4.0.0,>=3.0.0 (from cheat_at_search==0.1.0)
  Downloading PyStemmer-3.0.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting searcharray<0.0.74,>=0.0.73 (from cheat_at_search==0.1.0)
  Downloading 

## Get an OpenAI Key

This cell prompts you for an OpenAI key, which is used to call GPT-5.

In [None]:
from cheat_at_search.data_dir import key_for_provider
from openai import OpenAI

OPENAI_KEY = key_for_provider("openai")

openai = OpenAI(api_key=OPENAI_KEY)

## Load the Wayfair corpus

We'll recommend products only from this corpus

In [None]:
from cheat_at_search.wands_data import corpus

corpus['category'] = corpus['category'].str.strip()

corpus

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,doc_id,title,description,category,sub_category,cat_subcat,title_snowball,description_snowball
0,0,solid wood platform bed,Beds,Furniture / Bedroom Furniture / Beds & Headboa...,"good , deep sleep can be quite difficult to ha...",overallwidth-sidetoside:64.7|dsprimaryproducts...,15.0,4.5,15.0,"[overallwidth-sidetoside:64.7, dsprimaryproduc...",0,solid wood platform bed,"good , deep sleep can be quite difficult to ha...",Furniture,Bedroom Furniture,Furniture / Bedroom Furniture,"Terms({'solid', 'platform', 'bed', 'wood'})","Terms({'type', 'well', 'problem', 'our', 'emph..."
1,1,all-clad 7 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,"create delicious slow-cooked meals , from tend...",capacityquarts:7|producttype : slow cooker|pro...,100.0,2.0,98.0,"[capacityquarts:7, producttype : slow cooker, ...",1,all-clad 7 qt . slow cooker,"create delicious slow-cooked meals , from tend...",Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances,"Terms({'cooker', 'clad', 'all', 'qt', 'slow', ...","Terms({'entertain', 'ingredi', 'come', 'meal',..."
2,2,all-clad electrics 6.5 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,prepare home-cooked meals on any schedule with...,features : keep warm setting|capacityquarts:6....,208.0,3.0,181.0,"[features : keep warm setting, capacityquarts:...",2,all-clad electrics 6.5 qt . slow cooker,prepare home-cooked meals on any schedule with...,Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances,"Terms({'cooker', 'clad', 'all', 'qt', '6', '5'...","Terms({'cooker', 'hour', 'insert', 'slow', 'me..."
3,3,all-clad all professional tools pizza cutter,"Slicers, Peelers And Graters",Browse By Brand / All-Clad,this original stainless tool was designed to c...,overallwidth-sidetoside:3.5|warrantylength : l...,69.0,4.5,42.0,"[overallwidth-sidetoside:3.5, warrantylength :...",3,all-clad all professional tools pizza cutter,this original stainless tool was designed to c...,Browse By Brand,All-Clad,Browse By Brand / All-Clad,"Terms({'pizza', 'clad', 'all', 'profession', '...","Terms({'rotari', 'through', 'pasta', 'easili',..."
4,4,baldwin prestige alcott passage knob with roun...,Door Knobs,Home Improvement / Doors & Door Hardware / Doo...,the hardware has a rich heritage of delivering...,compatibledoorthickness:1.375 '' |countryofori...,70.0,5.0,42.0,"[compatibledoorthickness:1.375 '' , countryofo...",4,baldwin prestige alcott passage knob with roun...,the hardware has a rich heritage of delivering...,Home Improvement,Doors & Door Hardware,Home Improvement / Doors & Door Hardware,"Terms({'baldwin', 'rosett', 'with', 'round', '...","Terms({'baldwin', 'rosett', 'is', 'someon', 'p..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42989,42989,malibu pressure balanced diverter fixed shower...,Shower Panels,Home Improvement / Bathroom Remodel & Bathroom...,the malibu pressure balanced diverter fixed sh...,producttype : shower panel|spraypattern : rain...,3.0,4.5,2.0,"[producttype : shower panel, spraypattern : ra...",42989,malibu pressure balanced diverter fixed shower...,the malibu pressure balanced diverter fixed sh...,Home Improvement,Bathroom Remodel & Bathroom Fixtures,Home Improvement / Bathroom Remodel & Bathro...,"Terms({'divert', 'pressur', 'head', 'panel', '...","Terms({'head', 'is', 'overs', 'easili', 'the',..."
42990,42990,emmeline 5 piece breakfast dining set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,,basematerialdetails : steel| : gray wood|ofhar...,1314.0,4.5,864.0,"[basematerialdetails : steel, : gray wood, of...",42990,emmeline 5 piece breakfast dining set,,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture,"Terms({'piec', 'set', 'breakfast', 'dine', '5'...",Terms(set())
42991,42991,maloney 3 piece pub table set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,this pub table set includes 1 counter height t...,additionaltoolsrequirednotincluded : power dri...,49.0,4.0,41.0,[additionaltoolsrequirednotincluded : power dr...,42991,maloney 3 piece pub table set,this pub table set includes 1 counter height t...,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture,"Terms({'piec', 'set', 'tabl', '3', 'pub', 'mal...","Terms({'entertain', 'as', 'set', 'make', 'coff..."
42992,42992,fletcher 27.5 '' wide polyester armchair,Teen Lounge Furniture|Accent Chairs,Furniture / Living Room Furniture / Chairs & S...,"bring iconic , modern style to your space in a...",legmaterialdetails : rubberwood|backheight-sea...,1746.0,4.5,1226.0,"[legmaterialdetails : rubberwood, backheight-s...",42992,fletcher 27.5 '' wide polyester armchair,"bring iconic , modern style to your space in a...",Furniture,Living Room Furniture,Furniture / Living Room Furniture,"Terms({'fletcher', 'polyest', '5', 'wide', 'ar...","Terms({'support', 'upholsteri', 'taper', 'stai..."


### Index the furniture

We'll index title and description with basic stemming to be able to retrieve them
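As a rough illustration of why stemming helps retrieval, the sketch below builds a tiny inverted index with a deliberately crude suffix stripper. It is a toy under stated assumptions: `toy_stem`, `toy_tokenize`, and `build_index` are hypothetical helpers, and the real Snowball stemmer used by `snowball_tokenizer` applies far more rules than this.

```python
from collections import defaultdict


def toy_stem(token: str) -> str:
    """Crude suffix stripping; a real Snowball stemmer has many more rules."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def toy_tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and stem each token."""
    return [toy_stem(tok) for tok in text.lower().split()]


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Map each stemmed term to the set of document positions containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in toy_tokenize(text):
            index[term].add(doc_id)
    return index


titles = ["leather couches", "couch riser", "dining tables"]
index = build_index(titles)

# "couches" and "couch" collapse to the same term, so one lookup finds both titles.
print(index[toy_stem("couches")])
```

Because the query is stemmed the same way as the documents, a search for "couches" can match a title that only says "couch".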

In [None]:
from searcharray import SearchArray
from cheat_at_search.tokenizers import snowball_tokenizer

corpus['title_snowball'] = SearchArray.index(corpus['title'].fillna(''), snowball_tokenizer)
corpus['description_snowball'] = SearchArray.index(corpus['description'].fillna(''), snowball_tokenizer)

2026-02-03 14:53:58,925 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-03 14:53:58,944 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-03 14:53:58,951 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-03 14:53:59,575 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-03 14:54:00,116 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-03 14:54:00,739 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-03 14:54:01,138 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-03 14:54:01,472 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-03 14:54:01,489 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-03 14:54:01,508 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-03 14:54:01,604 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-03 14:54:01,691 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-03 14:54:01,699 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-03 14:54:01,790 - searcharray.indexing - INFO - Indexing from tokenization complete
2026-02-03 14:54:01,849 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-03 14:54:01,869 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-03 14:54:01,871 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-03 14:54:03,801 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-03 14:54:05,626 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-03 14:54:10,061 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-03 14:54:15,403 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-03 14:54:16,240 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-03 14:54:16,350 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-03 14:54:16,384 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-03 14:54:17,158 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-03 14:54:17,434 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-03 14:54:17,437 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-03 14:54:17,656 - searcharray.indexing - INFO - Indexing from tokenization complete


## Create a furniture products search function

Here is a function that searches the Wayfair product dataset. It's a plain Python function that returns the top 10 matching products.

For now we'll call it directly; soon we'll help ChatGPT interact with it.

In [None]:
import numpy as np
from typing import Union

def search_furniture(keywords: str) -> list[dict[str, Union[str, int, float]]]:
    """Search the available furniture products, get top 10 furniture.

    This is just a naive BM25 / keyword search of the product title and description.
    Don't expect sophisticated synonyms or semantic search. Just basic keyword with
    some stemming.

    """
    print("search", keywords)
    required_keywords = [term[1:] for term in keywords.split() if term.startswith("+")]
    bm25_scores = np.zeros(len(corpus))
    for term in snowball_tokenizer(keywords):
        bm25_scores += corpus['title_snowball'].array.score(term) * 7
        bm25_scores += corpus['description_snowball'].array.score(term) * 4

    for required_term in snowball_tokenizer(" ".join(required_keywords)):
        required_score = (corpus['title_snowball'].array.score(required_term) +
                          corpus['description_snowball'].array.score(required_term))
        bm25_scores[required_score == 0] = 0

    # Take the 10 highest-scoring rows, best first
    top_k_indices = np.argsort(bm25_scores)[-10:][::-1]
    top_scores = bm25_scores[top_k_indices]
    top_products = corpus.iloc[top_k_indices].copy()
    top_products.loc[:, 'score'] = top_scores

    # Return only the fields the LLM needs to see
    results = []
    for _, row in top_products.iterrows():
        results.append({
            'id': row['doc_id'],
            'title': row['title'],
            'description': row['description'],
            'score': row['score']
        })
    return results



search_furniture("geometric style +couch")

search geometric style +couch


[{'id': 1217,
  'title': 'extra large and wide couch riser',
  'description': 'our largest and oversized couch , furniture , and bed riser . made for those extra-large couch and furniture legs . we created these to allow one time stacking . tested to lift over 6,000 pounds - we made it heavy duty . includes a leather pad to keep legs from sliding off the top and a rubber base to prevent slipping on the floor . fits almost all sofas , couches , beds , large legs , and feet .',
  'score': 37.797197341918945},
 {'id': 25326,
  'title': 'pixar cars 2 in 1 flip open kids foam couch',
  'description': "now your little one can have their very own place to sit with the marshmallow furniture children 's 2-in-1 flip open foam kids sofa . this couch for toddlers is the perfect place for them to call their own while they read , eat snacks , watch tv , or nap . this marshmallow furniture children 's 2-in-1 flip open foam futon-style sofa is made of lightweight foam so kiddos can move it around from
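The `+` prefix in `search_furniture` marks a term as required: any product that scores zero for that term in both title and description has its total score reset to zero. The sketch below isolates that logic in plain Python. The per-term scores are made-up numbers rather than real BM25 values, and `split_required` and `score_docs` are hypothetical helper names.

```python
def split_required(keywords: str) -> tuple[list[str], list[str]]:
    """Return (all terms, required terms); a leading '+' marks a term as required."""
    all_terms, required = [], []
    for raw in keywords.split():
        term = raw.lstrip("+")
        all_terms.append(term)
        if raw.startswith("+"):
            required.append(term)
    return all_terms, required


def score_docs(keywords: str, term_scores: dict[str, list[float]], n_docs: int) -> list[float]:
    """Sum per-term scores, then zero out docs that miss any required term."""
    all_terms, required = split_required(keywords)
    totals = [0.0] * n_docs
    for term in all_terms:
        for doc, s in enumerate(term_scores.get(term, [0.0] * n_docs)):
            totals[doc] += s
    for term in required:
        per_doc = term_scores.get(term, [0.0] * n_docs)
        totals = [t if per_doc[doc] > 0 else 0.0 for doc, t in enumerate(totals)]
    return totals


# Hypothetical per-term scores for three documents (not real BM25 values).
term_scores = {
    "geometric": [2.0, 0.0, 1.0],
    "couch": [0.0, 3.0, 1.5],
}
totals = score_docs("geometric +couch", term_scores, n_docs=3)
print(totals)
```

Document 0 matches "geometric" but not the required "couch", so it drops out even though it had a nonzero score. This is also why the real results above are all couches, whether or not they are geometric.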

### Structured response

Here we define a simple Pydantic model that the LLM fills in to describe the search request.

In [None]:
from pydantic import BaseModel, Field
from typing import Optional, Literal


Categories = Literal['Furniture',
                     'Home Improvement',
                     'Décor & Pillows',
                     'Outdoor',
                     'Storage & Organization',
                     'Lighting',
                     'Rugs',
                     'Bed & Bath',
                     'Kitchen & Tabletop',
                     'Baby & Kids',
                     'School Furniture and Supplies',
                     'Appliances',
                     'Holiday Décor',
                     'Commercial Business Furniture',
                     'Pet',
                     'Contractor',
                     'Sale',
                     'Foodservice ',
                     'Reception Area',
                     'Clips']


class SearchRequest(BaseModel):
    """A simple keyword search to the furniture search index."""
    search_query: str = Field(..., description="The search query")

    category: list[Categories] = Field([], description="Filter by category, empty for no filters")


SearchRequest.model_json_schema()

{'description': 'A simple keyword search to the furniture search index.',
 'properties': {'search_query': {'description': 'The search query',
   'title': 'Search Query',
   'type': 'string'},
  'category': {'default': [],
   'description': 'Filter by category, empty for no filters',
   'items': {'enum': ['Furniture',
     'Home Improvement',
     'Décor & Pillows',
     'Outdoor',
     'Storage & Organization',
     'Lighting',
     'Rugs',
     'Bed & Bath',
     'Kitchen & Tabletop',
     'Baby & Kids',
     'School Furniture and Supplies',
     'Appliances',
     'Holiday Décor',
     'Commercial Business Furniture',
     'Pet',
     'Contractor',
     'Sale',
     'Foodservice ',
     'Reception Area',
     'Clips'],
    'type': 'string'},
   'title': 'Category',
   'type': 'array'}},
 'required': ['search_query'],
 'title': 'SearchRequest',
 'type': 'object'}
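`SearchRequest` lets the model choose categories, but `search_furniture` above only accepts keywords, so the `category` field is never applied in this notebook. As one possible way to use it, the sketch below post-filters rows by category. `filter_by_category` is a hypothetical helper, and the rows are plain dicts that mirror the corpus `category` column.

```python
def filter_by_category(rows: list[dict], categories: list[str]) -> list[dict]:
    """Keep rows whose category is in `categories`; an empty list means no filter."""
    if not categories:
        return list(rows)
    # Strip whitespace so values like 'Foodservice ' match the stripped corpus column.
    wanted = {c.strip() for c in categories}
    return [row for row in rows if row["category"] in wanted]


rows = [
    {"doc_id": 0, "title": "solid wood platform bed", "category": "Furniture"},
    {"doc_id": 1, "title": "all-clad 7 qt . slow cooker", "category": "Kitchen & Tabletop"},
]
only_furniture = filter_by_category(rows, ["Furniture"])
unfiltered = filter_by_category(rows, [])
print([r["doc_id"] for r in only_furniture], len(unfiltered))
```

The `strip()` matters because the corpus column is stripped at load time while the `Categories` literal still contains `'Foodservice '` with a trailing space.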

## Gather initial prompts

* System prompt - the general task: generate a search query to look up furniture in our catalog
* User prompt - what the user has given as a task (here, asking for a modern couch with geometric style)



In [None]:
system_prompt = """
Users are coming to explore a catalog of furniture.

Generate a search query
"""

inputs = []
inputs.append({"role": "system", "content": system_prompt})

prompt = """
Help me find a modern couch with geometric style
"""

inputs.append({"role": "user", "content": prompt})


resp = openai.responses.parse(
    model="gpt-5",
    input=inputs,
    text_format=SearchRequest
)
resp.output_parsed

SearchRequest(search_query='modern geometric couch sofa contemporary angular clean lines boxy minimalist', category=['Furniture'])

In [None]:
furniture = search_furniture(resp.output_parsed.search_query)
furniture

search modern geometric couch sofa contemporary angular clean lines boxy minimalist


[{'id': 31141,
  'title': 'convertible sectional sofa couch , l-shaped couch with modern linen fabric for small space dark grey',
  'description': 'small sofa and space-saving : modern and stylish design indoor sofa set fit perfectly with any indoor decor . this sofa sets clean lines , solid construction , and a comfortable finish that the whole family will love , perfect for an apartment , a studio , a condo , or a small space . this small space reversible sectional sofa that works well in any corner or living room . chaise lounge base can go on the left or right freely as you like . sports velcros on the bottom of the cushions can avoid slipping when seating .',
  'score': 58.74964261054993},
 {'id': 6012,
  'title': 'francis contemporary patio sofa with cushions',
  'description': "soak up some sun in style with this sleek patio sofa . it 's made with a clean-lined aluminum frame , so it 's not only delivering an updated take on outdoor furniture , but it also resists water and uv l

## Give results back to the LLM

In classic RAG, we give the search results back to the LLM and then ask it for a summary.
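The next cell passes the results as `str(furniture)`, the Python repr of the list. That works, but full descriptions and raw scores take up prompt space. A hedged alternative is to render each result as one short line; `format_results` and `max_desc_chars` are hypothetical names, and the sample rows are abbreviated illustrations rather than exact search output.

```python
def format_results(results: list[dict], max_desc_chars: int = 80) -> str:
    """Render search results as short numbered lines for the prompt."""
    lines = []
    for rank, r in enumerate(results, start=1):
        desc = r.get("description")
        if not isinstance(desc, str):   # missing descriptions may be None or NaN
            desc = ""
        if len(desc) > max_desc_chars:
            desc = desc[:max_desc_chars].rstrip() + "..."
        lines.append(f"{rank}. [id {r['id']}] {r['title']}: {desc}")
    return "\n".join(lines)


# Abbreviated sample rows for illustration only.
results = [
    {"id": 1217, "title": "extra large and wide couch riser",
     "description": "our largest and oversized couch , furniture , and bed riser ."},
    {"id": 42990, "title": "emmeline 5 piece breakfast dining set",
     "description": None},
]
context = format_results(results, max_desc_chars=20)
print(context)
```

Keeping the `id` in each line lets the model cite products the way it does in the answers shown in this notebook.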

In [None]:
system_prompt = """
Answer the users request. Note results have been appended to help you answer.
"""

inputs = []
inputs.append({"role": "system", "content": system_prompt})

prompt = """
Help me find a modern couch with geometric style
"""

inputs.append({"role": "user", "content": prompt})
inputs.append({"role": "user", "content": str(furniture)})

resp = openai.responses.create(
    model="gpt-5",
    input=inputs,
)
inputs += resp.output
print(resp.output)

[ResponseReasoningItem(id='rs_084f880c0f4e7ba40069820c3c1ea4819584d6c299d360c647', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_084f880c0f4e7ba40069820c4dfec08195bc7d12068b84db61', content=[ResponseOutputText(annotations=[], text='Here are strong, modern options with clean, geometric lines from the results you shared:\n\n- Chafin Contemporary Leather Sofa (ID 38685)\n  - Bold geometric look: straight arms, tufted box cushions, integrated stainless steel frame. Great for a sleek, modern living room or office.\n\n- Francis Contemporary Patio Sofa with Cushions (ID 6012)\n  - Outdoor-friendly with a clean-lined aluminum frame, wide square arms, and an open, blocky silhouette. Modern and minimal.\n\n- Modern L-Shaped Reversible Sectional Sofa Couch with Solid Wood Legs (ID 654)\n  - Rectilinear profile and crisp cushions; an L-shaped form that reads very geometric while staying cozy for living rooms.\n\n- Convertible Sectio

In [None]:
print(resp.output[-1].content[-1].text)

Here are strong, modern options with clean, geometric lines from the results you shared:

- Chafin Contemporary Leather Sofa (ID 38685)
  - Bold geometric look: straight arms, tufted box cushions, integrated stainless steel frame. Great for a sleek, modern living room or office.

- Francis Contemporary Patio Sofa with Cushions (ID 6012)
  - Outdoor-friendly with a clean-lined aluminum frame, wide square arms, and an open, blocky silhouette. Modern and minimal.

- Modern L-Shaped Reversible Sectional Sofa Couch with Solid Wood Legs (ID 654)
  - Rectilinear profile and crisp cushions; an L-shaped form that reads very geometric while staying cozy for living rooms.

- Convertible Sectional Sofa Couch, L-Shaped, Modern Linen, Dark Grey (ID 31141)
  - Compact, clean lines, reversible chaise, and squared cushions. Good geometric look for small spaces.

- Sectional Couch with Reversible Chaise, Modular L-Shape (ID 33593)
  - Modular “block” components for a very geometric, customizable layout.

## Put all this in a chat loop

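This is the classic RAG loop: on each turn we generate a search query, run one search, append the results to the chat, and let the model answer before reading the next user message.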

In [None]:
search_system_prompt = """
Users are coming to explore a catalog of furniture.

Generate a search query
"""

chat_system_prompt = """
Answer the users request. Note results have been appended to help you answer.
"""

search_query_inputs = []
search_query_inputs.append({"role": "system", "content": search_system_prompt})

prompt = """
Help me find a modern couch with geometric style
"""

search_query_inputs.append({"role": "user", "content": prompt})

chat_inputs = []
chat_inputs.append({"role": "system", "content": chat_system_prompt})
chat_inputs.append({"role": "user", "content": prompt})


for _ in range(5):
    resp = openai.responses.parse(
        model="gpt-5",
        input=search_query_inputs,
        text_format=SearchRequest
    )
    search_query_inputs += resp.output
    search_settings = resp.output_parsed

    furniture = search_furniture(search_settings.search_query)

    # Now take that and continue the chat
    chat_inputs.append({"role": "user", "content": str(furniture)})
    resp = openai.responses.create(
        model="gpt-5",
        input=chat_inputs,
    )
    chat_inputs += resp.output
    print(resp.output[-1].content[-1].text)
    user_response = input("User: ")
    chat_inputs.append({"role": "user", "content": user_response})
    search_query_inputs.append({"role": "user", "content": user_response})


search modern geometric couch sofa contemporary angular clean lines
Great brief. From the results you shared, here are the best fits for a modern, geometric look, ranked with quick notes:

Top picks
- Chafin Contemporary Leather Sofa (id: 38685)
  - Why it fits: Straight track arms, boxy silhouette, exposed stainless-steel frame/bar = strong geometric lines.
  - Vibe: Sleek, architectural, contemporary/office-chic.
  - Consider: Likely a firmer sit; black LeatherSoft (easy care, modern, slightly formal).

- Francis Contemporary Patio Sofa (id: 6012)
  - Why it fits: Clean-lined aluminum frame, wide square arms, open blocky base.
  - Vibe: Minimal, modern outdoor piece that can read very geometric.
  - Consider: Designed for outdoors (UV/water resistant). Works on covered patios or modern terraces.

Great if you want an L-shape/small-space solution
- Convertible Sectional Sofa, L-Shaped, Modern Linen (id: 31141)
  - Why it fits: Clean lines and a simple, rectilinear profile; reversible 

KeyboardInterrupt: Interrupted by user
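The run above ends with a `KeyboardInterrupt` because the loop has no exit other than its five-turn cap. Below is a minimal sketch of the same two-history structure with an explicit exit condition. The LLM calls, the search, and `input()` are replaced by hypothetical stubs so the sketch runs on its own; `chat_loop` and its parameters are illustrative names, not part of any library.

```python
from typing import Callable


def chat_loop(
    first_prompt: str,
    next_user_message: Callable[[], str],
    make_query: Callable[[list[dict]], str],
    search: Callable[[str], list[dict]],
    answer: Callable[[list[dict]], str],
    max_turns: int = 5,
) -> list[str]:
    """Run up to `max_turns` search-then-answer turns; stop on an empty or 'quit' message."""
    query_history = [{"role": "user", "content": first_prompt}]  # feeds query generation
    chat_history = [{"role": "user", "content": first_prompt}]   # feeds the final answer
    replies = []
    for _ in range(max_turns):
        results = search(make_query(query_history))
        chat_history.append({"role": "user", "content": str(results)})
        reply = answer(chat_history)
        chat_history.append({"role": "assistant", "content": reply})
        replies.append(reply)

        user_message = next_user_message().strip()
        if user_message.lower() in ("", "quit", "exit"):
            break
        chat_history.append({"role": "user", "content": user_message})
        query_history.append({"role": "user", "content": user_message})
    return replies


# Hypothetical stubs standing in for the LLM calls, the search, and input().
scripted = iter(["something in leather", "quit"])
replies = chat_loop(
    "Help me find a modern couch",
    next_user_message=lambda: next(scripted),
    make_query=lambda history: history[-1]["content"],
    search=lambda q: [{"id": 1, "title": f"result for {q}"}],
    answer=lambda history: f"answer #{sum(m['role'] == 'assistant' for m in history) + 1}",
)
print(replies)
```

In the notebook itself, `next_user_message` could be `lambda: input("User: ")`, and the two LLM callables would wrap the `openai.responses.parse` and `openai.responses.create` calls from the cell above.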