# Synonym generation with an LLM

<small>
(from <a href="http://maven.com/softwaredoug/cheat-at-search">Cheat at Search with Agents</a> training course by Doug Turnbull.)
</small>

Let's get familiar with the code we'll use for this class by doing what a lot of search teams did when they first heard about LLMs. They asked:

* Can I generate synonyms using LLMs?

We'll expand each query with LLM-generated synonyms and see whether that improves NDCG.

In [None]:
!pip install git+https://github.com/softwaredoug/cheat-at-search.git

Collecting git+https://github.com/softwaredoug/cheat-at-search.git
  Cloning https://github.com/softwaredoug/cheat-at-search.git to /tmp/pip-req-build-ylds9986
  Running command git clone --filter=blob:none --quiet https://github.com/softwaredoug/cheat-at-search.git /tmp/pip-req-build-ylds9986
  Resolved https://github.com/softwaredoug/cheat-at-search.git to commit 6a08d097f1d6eaa068fb61af47c621df1682f5e2
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


### Choose Gdrive or Instance Drive

* **Save money / convenience** - set `use_gdrive` to True and mount your Google Drive. Data will be cached there. Beware of the annoying permissions you need to grant this notebook.

* **Higher privacy / more cost** - set `use_gdrive` to False and the data will be stored only as long as this notebook's runtime is running. Eventually the runtime will be deallocated; you'll lose this cache and need to re-enter your OpenAI key when prompted.

* **High privacy / save money / higher maintenance burden** - Download the ipynb and run it in your own Jupyter. Set `CHEAT_AT_SEARCH_DATA_PATH` to some place on your system.

In [None]:
from cheat_at_search.data_dir import mount
mount(use_gdrive=True)    # colab, share data across notebook runs on gdrive
# mount(use_gdrive=False) # <- colab without gdrive
# mount(use_gdrive=False, manual_path="/path/to/directory")  # <- force data path to specific directory, ie you're running locally.

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import helpers

Import the following helpers:

* `run_strategy` -- this runs a "strategy" and gives us the search results for each query back (more on this in a second)
* `graded_bm25` -- a BM25 search baseline. A dump of the search results of every test query in the Wayfair dataset run using a BM25 baseline. Useful to compare our attempts against.
* `ndcgs` -- Take one of the sets of search results (ie `graded_bm25`) and get the NDCG of each query
* `ndcg_delta` -- Compare two sets of search results (ie `graded_bm25` vs `graded_my_cool_experiment`) and see which queries do better / worse
* `vs_ideal` -- Take a set of search results (ie `graded_bm25`) and compare against the ideal according to the ground truth data.

In [None]:
from cheat_at_search.search import run_strategy, graded_bm25, ndcgs, ndcg_delta, vs_ideal

## Import WANDS data

Import the [Wayfair Annotated Dataset (WANDS)](https://github.com/wayfair/WANDS), a labeled furniture e-commerce dataset. It has 480 e-commerce queries, ~43K furniture / home goods products, and relevance labels for query/product pairs. In WANDS, relevance labels range from 0 (not at all relevant) to 2 (relevant).

Below you see a sample of the corpus as a pandas dataframe.

In [None]:
from cheat_at_search.wands_data import products, judgments

products

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,doc_id,title,description,category,sub_category,cat_subcat,product_name_snowball,product_description_snowball
0,0,solid wood platform bed,Beds,Furniture / Bedroom Furniture / Beds & Headboa...,"good , deep sleep can be quite difficult to ha...",overallwidth-sidetoside:64.7|dsprimaryproducts...,15.0,4.5,15.0,"[overallwidth-sidetoside:64.7, dsprimaryproduc...",0,solid wood platform bed,"good , deep sleep can be quite difficult to ha...",Furniture,Bedroom Furniture,Furniture / Bedroom Furniture,"Terms({'bed', 'platform', 'solid', 'wood'})","Terms({'ani', 'better', 'under', 'time', 'emph..."
1,1,all-clad 7 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,"create delicious slow-cooked meals , from tend...",capacityquarts:7|producttype : slow cooker|pro...,100.0,2.0,98.0,"[capacityquarts:7, producttype : slow cooker, ...",1,all-clad 7 qt . slow cooker,"create delicious slow-cooked meals , from tend...",Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances,"Terms({'7', 'cooker', 'slow', 'all', 'qt', 'cl...","Terms({'unit', 'dish', 'you', 'steel', 'sleek'..."
2,2,all-clad electrics 6.5 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,prepare home-cooked meals on any schedule with...,features : keep warm setting|capacityquarts:6....,208.0,3.0,181.0,"[features : keep warm setting, capacityquarts:...",2,all-clad electrics 6.5 qt . slow cooker,prepare home-cooked meals on any schedule with...,Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances,"Terms({'cooker', '5', 'slow', 'electr', 'all',...","Terms({'meal', 'ani', 'slow', 'safe', 'prepar'..."
3,3,all-clad all professional tools pizza cutter,"Slicers, Peelers And Graters",Browse By Brand / All-Clad,this original stainless tool was designed to c...,overallwidth-sidetoside:3.5|warrantylength : l...,69.0,4.5,42.0,"[overallwidth-sidetoside:3.5, warrantylength :...",3,all-clad all professional tools pizza cutter,this original stainless tool was designed to c...,Browse By Brand,All-Clad,Browse By Brand / All-Clad,"Terms({'tool', 'cutter', 'all', 'profession', ...","Terms({'cut', 'pastri', 'design', 'featur', 'p..."
4,4,baldwin prestige alcott passage knob with roun...,Door Knobs,Home Improvement / Doors & Door Hardware / Doo...,the hardware has a rich heritage of delivering...,compatibledoorthickness:1.375 '' |countryofori...,70.0,5.0,42.0,"[compatibledoorthickness:1.375 '' , countryofo...",4,baldwin prestige alcott passage knob with roun...,the hardware has a rich heritage of delivering...,Home Improvement,Doors & Door Hardware,Home Improvement / Doors & Door Hardware,"Terms({'round', 'with', 'knob', 'passag', 'alc...","Terms({'modern', 'ani', 'has', 'effortless', '..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42989,42989,malibu pressure balanced diverter fixed shower...,Shower Panels,Home Improvement / Bathroom Remodel & Bathroom...,the malibu pressure balanced diverter fixed sh...,producttype : shower panel|spraypattern : rain...,3.0,4.5,2.0,"[producttype : shower panel, spraypattern : ra...",42989,malibu pressure balanced diverter fixed shower...,the malibu pressure balanced diverter fixed sh...,Home Improvement,Bathroom Remodel & Bathroom Fixtures,Home Improvement / Bathroom Remodel & Bathro...,"Terms({'shower', 'pressur', 'panel', 'malibu',...","Terms({'includ', 'unit', 'ani', 'overs', 'an',..."
42990,42990,emmeline 5 piece breakfast dining set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,,basematerialdetails : steel| : gray wood|ofhar...,1314.0,4.5,864.0,"[basematerialdetails : steel, : gray wood, of...",42990,emmeline 5 piece breakfast dining set,,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture,"Terms({'5', 'breakfast', 'set', 'emmelin', 'di...",Terms(set())
42991,42991,maloney 3 piece pub table set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,this pub table set includes 1 counter height t...,additionaltoolsrequirednotincluded : power dri...,49.0,4.0,41.0,[additionaltoolsrequirednotincluded : power dr...,42991,maloney 3 piece pub table set,this pub table set includes 1 counter height t...,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture,"Terms({'pub', 'tabl', 'maloney', '3', 'set', '...","Terms({'includ', 'make', 'will', 'meal', 'ani'..."
42992,42992,fletcher 27.5 '' wide polyester armchair,Teen Lounge Furniture|Accent Chairs,Furniture / Living Room Furniture / Chairs & S...,"bring iconic , modern style to your space in a...",legmaterialdetails : rubberwood|backheight-sea...,1746.0,4.5,1226.0,"[legmaterialdetails : rubberwood, backheight-s...",42992,fletcher 27.5 '' wide polyester armchair,"bring iconic , modern style to your space in a...",Furniture,Living Room Furniture,Furniture / Living Room Furniture,"Terms({'fletcher', '5', 'armchair', '27', 'wid...","Terms({'modern', 'showcas', 'your', 'solid', '..."


## Synonym generation

We'll first set up the scaffolding for the query -> synonym mapping. We expect to get back a list of phrases, each mapped to its synonyms.


### Pydantic Models for Structured Output

[Pydantic](https://docs.pydantic.dev/latest/) is a Python library for defining structs or simple data classes. It's a useful way to serialize data to/from underlying data formats (ie JSON, protobuf), and we'll largely work at this level of abstraction.

We're using [OpenAI's structured output](https://platform.openai.com/docs/guides/structured-outputs), which means:

* Using pydantic to define the expected output (with a description that the model can use)
* Creating a 'struct like' view of the data we want OpenAI to produce.
* Forcing OpenAI to return a specific format, and not begging it to return parsable JSON

This pattern of using structured outputs is common across other vendors such as Ollama, Gemini, etc., though there may be mild differences in how the Pydantic types are interpreted.
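
To make the "struct-like view" concrete, here is a minimal sketch of the round trip: JSON that matches the schema parses into typed objects, and JSON that doesn't is rejected. The `Mapping` and `Expansion` classes and both JSON strings are hypothetical stand-ins for this illustration; they are not part of `cheat_at_search` or the OpenAI SDK.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Mapping(BaseModel):
    """Hypothetical stand-in: one phrase and its synonyms."""
    phrase: str = Field(..., description="Phrase taken from the query")
    synonyms: List[str] = Field(..., description="Equivalent phrases")


class Expansion(BaseModel):
    """Hypothetical stand-in: a query plus its synonym mappings."""
    keywords: str
    synonyms: List[Mapping]


# JSON shaped like the schema parses into typed Python objects.
good_json = '{"keywords": "suede couch", "synonyms": [{"phrase": "couch", "synonyms": ["sofa"]}]}'
parsed = Expansion.model_validate_json(good_json)
first_synonym = parsed.synonyms[0].synonyms[0]

# JSON missing a required field raises instead of being silently accepted.
bad_json = '{"keywords": "suede couch"}'
try:
    Expansion.model_validate_json(bad_json)
    rejected = False
except ValidationError:
    rejected = True

print(first_synonym, rejected)
```

The same validation is what lets the rest of the notebook treat LLM output as ordinary Python attributes rather than raw strings.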

In [None]:
from pydantic import BaseModel, Field
from typing import List
from cheat_at_search.enrich import AutoEnricher


class Query(BaseModel):
    """
    Base model for search queries, containing common query attributes.
    """
    keywords: str = Field(
        ...,
        description="The original search query keywords sent in as input"
    )


class SynonymMapping(BaseModel):
    """
    Model for mapping phrases in the query to equivalent phrases or synonyms.
    """
    phrase: str = Field(
        ...,
        description="The original phrase from the query"
    )
    synonyms: List[str] = Field(
        ...,
        description="List of synonyms or equivalent phrases for the original phrase"
    )


class QueryWithSynonyms(Query):
    """
    Extended model for search queries that includes synonyms for keywords.
    Inherits from the base Query model.
    """
    synonyms: List[SynonymMapping] = Field(
        ...,
        description="Mapping of phrases in the query to equivalent phrases or synonyms"
    )




In [None]:
QueryWithSynonyms.model_json_schema()

{'$defs': {'SynonymMapping': {'description': 'Model for mapping phrases in the query to equivalent phrases or synonyms.',
   'properties': {'phrase': {'description': 'The original phrase from the query',
     'title': 'Phrase',
     'type': 'string'},
    'synonyms': {'description': 'List of synonyms or equivalent phrases for the original phrase',
     'items': {'type': 'string'},
     'title': 'Synonyms',
     'type': 'array'}},
   'required': ['phrase', 'synonyms'],
   'title': 'SynonymMapping',
   'type': 'object'}},
 'description': 'Extended model for search queries that includes synonyms for keywords.\nInherits from the base Query model.',
 'properties': {'keywords': {'description': 'The original search query keywords sent in as input',
   'title': 'Keywords',
   'type': 'string'},
  'synonyms': {'description': 'Mapping of phrases in the query to equivalent phrases or synonyms',
   'items': {'$ref': '#/$defs/SynonymMapping'},
   'title': 'Synonyms',
   'type': 'array'}},
 'require

## Direct enrichment

In [None]:
from cheat_at_search.data_dir import key_for_provider
from openai import OpenAI

openai_key = key_for_provider("openai")


client = OpenAI(
   api_key=openai_key,
)
prompts = []
prompts.append({"role": "system",
               "content": "You are a search query synonym generator for furniture e-commerce"})
prompts.append({"role": "user", "content": "Please generate synonyms for query: suede couch"})

response = client.responses.parse(
   model="gpt-4o",
   input=prompts,
   text_format=QueryWithSynonyms
)

response.output_parsed

QueryWithSynonyms(keywords='suede couch', synonyms=[SynonymMapping(phrase='suede', synonyms=['microfiber', 'faux suede', 'suede-like', 'soft fabric']), SynonymMapping(phrase='couch', synonyms=['sofa', 'settee', 'loveseat', 'divan'])])
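
The response is nested: each phrase from the query carries its own list of synonyms. As a rough sketch of how that structure can be walked, the snippet below copies the output above into plain dicts (to stay dependency-free) and flattens it into pairs. The helper name `expansions` is hypothetical.

```python
# Plain-dict copy of the "suede couch" response shown above.
response = {
    "keywords": "suede couch",
    "synonyms": [
        {"phrase": "suede",
         "synonyms": ["microfiber", "faux suede", "suede-like", "soft fabric"]},
        {"phrase": "couch",
         "synonyms": ["sofa", "settee", "loveseat", "divan"]},
    ],
}


def expansions(response):
    """Yield (original phrase, synonym) pairs from the nested mapping."""
    for mapping in response["synonyms"]:
        for synonym in mapping["synonyms"]:
            yield mapping["phrase"], synonym


pairs = list(expansions(response))
print(len(pairs), pairs[0], pairs[-1])
```

With the real Pydantic object the same two loops would use attribute access (`mapping.synonyms`) instead of dict keys.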

### Synonym generation code

We use `AutoEnricher` in this class. It wraps the calls to OpenAI and lives in the `cheat_at_search` package.

Notice that when constructing it, we provide three values:

* `model` -- the underlying LLM to use. If you open ChatGPT, you'll notice a dropdown of models to choose from. Each has pros/cons in cost and quality.
* `system_prompt` -- the general behavior of the agent, priming it for the task it's about to perform
* `response_model` -- the Pydantic class to use to generate structured outputs

We can then call `enricher.enrich(prompt)` and get back an instance of `QueryWithSynonyms`.

Notice too that `get_prompt` generates a prompt given a search query.

In [None]:
syn_enricher = AutoEnricher(model="openai/gpt-5-nano",
                            system_prompt="You are a helpful AI assistant extracting synonyms from queries.",
                            response_model=QueryWithSynonyms)

def get_prompt(query: str):
    prompt = f"""
        Extract synonyms from the following query that will help us find relevant products for the query.

        {query}
    """

    return prompt

print(get_prompt("rack glass"))


        Extract synonyms from the following query that will help us find relevant products for the query.

        rack glass
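
As the printed prompt shows, the triple-quoted f-string keeps the indentation of the source code. That is probably harmless for an LLM, but if you'd rather send a clean prompt, one option is `textwrap.dedent` from the standard library. The function name `get_prompt_dedented` is hypothetical, and the sketch assumes single-line queries.

```python
import textwrap


def get_prompt_dedented(query: str) -> str:
    """Hypothetical variant of get_prompt that strips the shared indentation."""
    prompt = f"""
        Extract synonyms from the following query that will help us find relevant products for the query.

        {query}
    """
    # dedent removes the common leading whitespace; strip drops the outer blank lines.
    return textwrap.dedent(prompt).strip()


print(get_prompt_dedented("rack glass"))
```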
    


In [None]:
def query_to_syn(query: str):
    return syn_enricher.enrich(get_prompt(query))

query_to_syn("foldout blue ugly love seat")

QueryWithSynonyms(keywords='foldout blue ugly love seat', synonyms=[SynonymMapping(phrase='foldout', synonyms=['pull-out', 'sofa bed', 'futon bed', 'convertible sofa']), SynonymMapping(phrase='blue', synonyms=['navy blue', 'azure', 'cobalt', 'blue color', 'blue shade', 'blue upholstery']), SynonymMapping(phrase='ugly', synonyms=['unattractive', 'unsightly', 'plain', 'hideous', 'unappealing', 'dull']), SynonymMapping(phrase='love seat', synonyms=['loveseat', 'two-seater sofa', 'smaller sofa', 'duo-seater'])])

### Snowball tokenizer

We'll use a [snowball stemmer](https://www.nltk.org/api/nltk.stem.SnowballStemmer.html) when we index the data. `snowball_tokenizer` is just a function that takes a string and returns a list of tokens, each snowball stemmed.

In [None]:
from cheat_at_search.tokenizers import snowball_tokenizer
snowball_tokenizer("fancy furniture")

['fanci', 'furnitur']

### Build a SearchStrategy -- Enrich, index, search

A SearchStrategy emulates a typical search system, but in a mini form suitable for dorking around in this notebook.

Notice the indexing in `__init__`:

```
self.index['product_name_snowball'] = SearchArray.index(
    products['product_name'],
    snowball_tokenizer
)
```

Then later we `search`, summing up BM25 scores across different fields:

```
        # ***
        # For each token, get the BM25 score of that token in product name and
        # product description. Sum them
        for token in tokenized:
            bm25_scores += self.name_boost * self.index['product_name_snowball'].array.score(token)
            bm25_scores += self.description_boost * self.index['product_description_snowball'].array.score(
                token)
```

Further down, we also add the BM25 score of each generated synonym phrase.
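
Stripped of the index, this scoring is a weighted sum of per-field score arrays followed by a top-k sort. The sketch below uses invented scores for four documents in place of real BM25 values; only the boost values (9.3 and 4.1) are taken from the `SynonymSearch` defaults in this notebook.

```python
import numpy as np

# Invented per-document scores standing in for BM25 output (4 documents).
name_scores = np.array([0.0, 1.2, 0.4, 0.0])
description_scores = np.array([0.5, 0.3, 0.0, 0.0])
synonym_scores = np.array([0.0, 0.0, 2.0, 0.0])  # added without a boost

name_boost, description_boost = 9.3, 4.1

# Weighted sum across fields, plus the synonym phrase matches.
total = (name_boost * name_scores
         + description_boost * description_scores
         + synonym_scores)

# Negate so argsort (ascending) puts the highest scores first, then keep k.
k = 2
top_k = np.argsort(-total)[:k]
print(top_k, total[top_k])
```

Because the synonym scores are added unboosted, a synonym match nudges the ranking but a strong name match still dominates.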

#### SearchArray

We use the lexical search library [SearchArray](http://github.com/softwaredoug/search-array) for simple lexical searches. (See the notebooks and information in the prework for the class.)

In the case of synonyms, a lot of teams trying this have a mature lexical search system like Elasticsearch. Instead of adding embedding retrieval to the search, they try this hack to see if they can cheat at search.

In [None]:
from searcharray import SearchArray
from cheat_at_search.strategy.strategy import SearchStrategy
import numpy as np


class SynonymSearch(SearchStrategy):
    def __init__(self, products, synonym_generator,
                 name_boost=9.3,
                 description_boost=4.1):
        """ Build an index."""
        super().__init__(products)
        self.index = products
        self.name_boost = name_boost
        self.description_boost = description_boost

        #*****
        # Take an array of text (here `products['product_name']`)
        # Tokenize it with snowball (the passed function)
        # Produce a searchable index on "product_name_snowball"
        self.index['product_name_snowball'] = SearchArray.index(
            products['product_name'],
            snowball_tokenizer
        )
        self.index['product_description_snowball'] = SearchArray.index(
            products['product_description'], snowball_tokenizer)
        self.query_to_syn = synonym_generator

    def search(self, query, k=10):
        """Dumb baseline lexical search with LLM generated synonyms"""
        # ***
        # Tokenize the query with snowball
        tokenized = snowball_tokenizer(query)
        bm25_scores = np.zeros(len(self.index))

        # ***
        # For each token, get the BM25 score of that token in product name and
        # product description. Sum them
        for token in tokenized:
            bm25_scores += self.name_boost * self.index['product_name_snowball'].array.score(token)
            bm25_scores += self.description_boost * self.index['product_description_snowball'].array.score(
                token)

        # ***
        # Generate synonyms
        synonyms = self.query_to_syn(query)

        # ***
        # Boost by each synonym phrase
        # (repeat the same above, except we add the BM25 scores of the generated synonyms)
        all_single_tokens = set()
        for mapping in synonyms.synonyms:
            for phrase in mapping.synonyms:
                tokenized = snowball_tokenizer(phrase)
                bm25_scores += self.index['product_name_snowball'].array.score(tokenized)
                bm25_scores += self.index['product_description_snowball'].array.score(tokenized)
                for token in tokenized:
                    all_single_tokens.add(token)

        # ***
        # Boost by each single token
        # for token in all_single_tokens:
        #     bm25_scores += self.index['product_name_snowball'].array.score(token)
        #     bm25_scores += self.index['product_description_snowball'].array.score(token)

        # ***
        # Sort by BM25 scores
        top_k = np.argsort(-bm25_scores)[:k]
        scores = bm25_scores[top_k]

        return top_k, scores


syns = SynonymSearch(products, query_to_syn)

2026-02-09 03:05:47,953 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-09 03:05:47,995 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-09 03:05:48,007 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-09 03:05:48,829 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-09 03:05:49,616 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-09 03:05:50,311 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-09 03:05:51,327 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-09 03:05:52,092 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-09 03:05:52,102 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-09 03:05:52,121 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-09 03:05:52,206 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-09 03:05:52,284 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-09 03:05:52,288 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-09 03:05:52,382 - searcharray.indexing - INFO - Indexing from tokenization complete
2026-02-09 03:05:52,437 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2026-02-09 03:05:52,453 - searcharray.indexing - INFO - 0 Batch Start tokenization
2026-02-09 03:05:52,460 - searcharray.indexing - INFO - Tokenizing 42994 documents
2026-02-09 03:05:53,837 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2026-02-09 03:05:54,936 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2026-02-09 03:05:56,104 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2026-02-09 03:05:57,355 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2026-02-09 03:05:58,182 - searcharray.indexing - INFO - Tokenization -- vstacking
2026-02-09 03:05:58,226 - searcharray.indexing - INFO - Tokenization -- DONE
2026-02-09 03:05:58,251 - searcharray.indexing - INFO - Inverting docs->terms
2026-02-09 03:05:59,051 - searcharray.indexing - INFO - Encoding positions to bit array
2026-02-09 03:05:59,982 - searcharray.indexing - INFO - Batch tokenization complete
2026-02-09 03:05:59,985 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2026-02-09 03:06:00,302 - searcharray.indexing - INFO - Indexing from tokenization complete


### Run strategy, get results back

We call `run_strategy`, which behind the scenes passes every WANDS query to the `syns` strategy to get search results. It then collects them all into `graded_syns`, which has 480 queries times 10 results per query (4800 rows).

In [None]:
# for each query
#   results = syns.search(query)
#   -- Give each result a 'grade'
#   --- Compute DCG
graded_syns = run_strategy(syns, judgments)
graded_syns

Searching: 100%|██████████| 480/480 [00:19<00:00, 24.69it/s]


Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,...,cat_subcat,score,query,query_id,rank,grade,discounted_gain,idcg,dcg,ndcg
0,7465,hair salon chair,Massage Chairs|Recliners,Furniture / Living Room Furniture / Chairs & S...,offers a wide selection of professional salon ...,fauxleathertype : pu|legheight-toptobottom:18|...,69.0,4.5,53.0,"[fauxleathertype : pu, legheight-toptobottom:1...",...,Furniture / Living Room Furniture,88.476816,salon chair,0,1,2.0,3.00,8.786905,8.364683,0.951949
1,7468,mercer41 hair salon chair hydraulic styling ch...,Massage Chairs,Furniture / Living Room Furniture / Chairs & S...,mercer41 beauty offers a wide selection profes...,seatfillmaterial : foam|waterrepellant : no re...,1.0,5.0,1.0,"[seatfillmaterial : foam, waterrepellant : no ...",...,Furniture / Living Room Furniture,76.936114,salon chair,0,2,2.0,1.50,8.786905,8.364683,0.951949
2,25431,barberpub salon massage chair,Massage Chairs,Furniture / Living Room Furniture / Chairs & S...,salon chairs are a wonderful avenue for hairst...,supplierintendedandapproveduse : non residenti...,4.0,5.0,4.0,[supplierintendedandapproveduse : non resident...,...,Furniture / Living Room Furniture,67.237749,salon chair,0,3,2.0,1.00,8.786905,8.364683,0.951949
3,39428,barber salon reclining massage chair,Massage Chairs|Recliners,Furniture / Living Room Furniture / Chairs & S...,heavy-duty hydraulic recline barber chair salo...,seatwidth-sidetoside:19|upholsterycolor : blac...,1.0,5.0,0.0,"[seatwidth-sidetoside:19, upholsterycolor : bl...",...,Furniture / Living Room Furniture,63.400725,salon chair,0,4,2.0,0.75,8.786905,8.364683,0.951949
4,36910,beauty spa salon barber chair,Massage Chairs,Furniture / Living Room Furniture / Chairs & S...,this barber chair would be a perfect choice fo...,upholsterymaterial : leather match|color : red...,18.0,5.0,14.0,"[upholsterymaterial : leather match, color : r...",...,Furniture / Living Room Furniture,63.346391,salon chair,0,5,2.0,0.60,8.786905,8.364683,0.951949
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4795,39976,wall mounted wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,"the latest addition to this collection , this ...",overallheight-toptobottom:4|design : wall moun...,34.0,4.5,18.0,"[overallheight-toptobottom:4, design : wall mo...",...,Kitchen & Tabletop / Tableware & Drinkware,78.468853,rack glass,487,6,0.0,0.00,8.786905,0.000000,0.000000
4796,40243,madisen hanging wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,complement your farmhouse kitchen decor with t...,producttype : wine glass rack|overallwidth-sid...,29.0,5.0,20.0,"[producttype : wine glass rack, overallwidth-s...",...,Kitchen & Tabletop / Tableware & Drinkware,77.515130,rack glass,487,7,0.0,0.00,8.786905,0.000000,0.000000
4797,40244,kena hanging wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,spruce up your farmhouse kitchen decor with th...,warrantylength:1 year|producttype : wine glass...,23.0,5.0,18.0,"[warrantylength:1 year, producttype : wine gla...",...,Kitchen & Tabletop / Tableware & Drinkware,77.191142,rack glass,487,8,0.0,0.00,8.786905,0.000000,0.000000
4798,40245,podgorni hanging wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,display and protect your delicate wine or marg...,overallheight-toptobottom:1.5|stemwarecapacity...,6.0,4.0,6.0,"[overallheight-toptobottom:1.5, stemwarecapaci...",...,Kitchen & Tabletop / Tableware & Drinkware,76.073124,rack glass,487,9,0.0,0.00,8.786905,0.000000,0.000000


### Look at one search result...

In [None]:
graded_syns[graded_syns['query'] == "wood bar stools"]

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,...,cat_subcat,score,query,query_id,rank,grade,discounted_gain,idcg,dcg,ndcg
4340,17625,adona solid wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,this classic bar stool is solid wood for firm ...,levelofassembly : partial assembly|overalldept...,3.0,4.0,3.0,"[levelofassembly : partial assembly, overallde...",...,Furniture / Kitchen & Dining Furniture,66.594088,wood bar stools,440,1,2.0,3.0,8.786905,7.503571,0.853949
4341,39984,solid wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,outfit the home bar or accent your favorite se...,overalldepth-fronttoback:13.4|dsprimaryproduct...,330.0,4.5,253.0,"[overalldepth-fronttoback:13.4, dsprimaryprodu...",...,Furniture / Kitchen & Dining Furniture,66.542115,wood bar stools,440,2,2.0,1.5,8.786905,7.503571,0.853949
4342,24132,bergstrom solid wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,these solid wood bar stools add a contemporary...,seatdepth-fronttoback:13|legbasetype:4 legs|se...,207.0,5.0,154.0,"[seatdepth-fronttoback:13, legbasetype:4 legs,...",...,Furniture / Kitchen & Dining Furniture,66.494473,wood bar stools,440,3,2.0,1.0,8.786905,7.503571,0.853949
4343,4888,gollapalli solid wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,enhance the beauty of your home with the inclu...,upholsterymaterial : faux leather|seatbacktype...,,,,"[upholsterymaterial : faux leather, seatbackty...",...,Furniture / Kitchen & Dining Furniture,66.376257,wood bar stools,440,4,1.0,0.25,8.786905,7.503571,0.853949
4344,5105,peatman solid wood counter & bar stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,this set of two bar stools is a great addition...,seatbacktype : backless|productcare : do not u...,2.0,4.5,2.0,"[seatbacktype : backless, productcare : do not...",...,Furniture / Kitchen & Dining Furniture,65.998547,wood bar stools,440,5,2.0,0.6,8.786905,7.503571,0.853949
4345,18577,pala wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,this stool is an exclusive counter and bar wit...,supplierintendedandapproveduse : non residenti...,5.0,5.0,5.0,[supplierintendedandapproveduse : non resident...,...,Furniture / Kitchen & Dining Furniture,65.768304,wood bar stools,440,6,1.0,0.166667,8.786905,7.503571,0.853949
4346,39979,evelino solid wood counter and bar stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,"spruce up your kitchen , breakfast , or den wi...",seatwidth-sidetoside:13.58|legbasetype:4 legs|...,231.0,4.5,144.0,"[seatwidth-sidetoside:13.58, legbasetype:4 leg...",...,Furniture / Kitchen & Dining Furniture,65.62989,wood bar stools,440,7,2.0,0.428571,8.786905,7.503571,0.853949
4347,37300,axelle solid wood bar and counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,this bar & counter stool is a sublime stool fu...,dswoodtone : medium wood|seatmaterialdetails :...,298.0,4.5,203.0,"[dswoodtone : medium wood, seatmaterialdetails...",...,Furniture / Kitchen & Dining Furniture,65.56473,wood bar stools,440,8,1.0,0.125,8.786905,7.503571,0.853949
4348,4884,abramowitz solid wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,give yourself a place to sit right next to you...,overallproductweight:15.84|framematerial : sol...,,,,"[overallproductweight:15.84, framematerial : s...",...,Furniture / Kitchen & Dining Furniture,65.445647,wood bar stools,440,9,2.0,0.333333,8.786905,7.503571,0.853949
4349,34277,crystelle solid wood bar & counter stool,Bar Stools,Furniture / Kitchen & Dining Furniture / Bar F...,this crystelle solid wood bar & counter stool ...,dssecondaryproductstyle : contemporary industr...,21.0,4.5,17.0,[dssecondaryproductstyle : contemporary indust...,...,Furniture / Kitchen & Dining Furniture,64.735635,wood bar stools,440,10,1.0,0.1,8.786905,7.503571,0.853949
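
The `discounted_gain`, `dcg`, `idcg`, and `ndcg` columns can be reproduced by hand. Judging only from the numbers in this table (not from the `cheat_at_search` source), the gain looks like `2**grade - 1`, the discount like `1 / rank`, and the ideal ranking like ten results all graded 2. A minimal sketch under those assumptions:

```python
# Grades for "wood bar stools" at ranks 1..10, copied from the table above.
grades = [2, 2, 2, 1, 2, 1, 2, 1, 2, 1]


def dcg(grades):
    """Sum of (2**grade - 1) / rank, with rank starting at 1 (assumed formula)."""
    return sum((2 ** g - 1) / rank for rank, g in enumerate(grades, start=1))


# Assumed ideal: every one of the 10 slots holds a grade-2 product.
idcg = dcg([2] * len(grades))
ndcg = dcg(grades) / idcg

print(round(dcg(grades), 6), round(idcg, 6), round(ndcg, 6))
```

Under these assumptions the three values agree with the `dcg` (7.503571), `idcg` (8.786905), and `ndcg` (0.853949) columns for this query.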


In [None]:
query_to_syn("wood bar stools")

QueryWithSynonyms(keywords='wood bar stools', synonyms=[SynonymMapping(phrase='wood bar stools', synonyms=['wooden bar stools', 'timber bar stools', 'wood barseat stools', 'wood bar-height stools', 'wood barstools', 'wooden counter stools', 'wood counter height stools', 'rustic wood bar stools', 'oak bar stools', 'pine bar stools'])])

## Analyze the results

Let's look at the results to see how we did against a BM25 baseline.

Here we get the NDCG of each query with `ndcgs`, then take the mean over all queries, for both BM25 and our synonym variant.

In [None]:
ndcgs(graded_bm25).mean(), ndcgs(graded_syns).mean()

(np.float64(0.5411098691836396), np.float64(0.5506856004154811))
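
As a rough illustration of what a call like `ndcgs(...).mean()` amounts to, here is a minimal sketch. It assumes each graded frame carries an `ndcg` column whose value is repeated on every row of a query. The helper `per_query_ndcg` and the toy data are hypothetical, not the library's implementation.

```python
import pandas as pd

def per_query_ndcg(graded: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for `ndcgs`: one NDCG value per query."""
    # Assumption: `ndcg` is constant within a query, so the first row is enough.
    return graded.groupby("query")["ndcg"].first()

# Toy graded results: two queries, two ranked rows each.
toy = pd.DataFrame({
    "query": ["a", "a", "b", "b"],
    "rank": [1, 2, 1, 2],
    "ndcg": [0.8, 0.8, 0.4, 0.4],
})

per_query = per_query_ndcg(toy)
mean_ndcg = per_query.mean()  # each query counts once, however many rows it has
print(per_query)
print(mean_ndcg)
```

Collapsing to one value per query before averaging keeps queries with more result rows from weighing more heavily in the mean.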

### Win / loss against BM25 baseline

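`ndcg_delta` shows us the per-query NDCG difference. Positive values are queries where the synonym variant beat BM25.

* We note some massive wins (`midcentury tv unit` gains about 0.67 NDCG)
* We unfortunately also note massive variance in outcomes (`king size bed` loses about 0.29), which makes this a risky change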

In [None]:
ndcg_delta(graded_syns, graded_bm25)

query,ndcg
midcentury tv unit,0.665763
cover set for outdoor furniture,0.452965
7qt slow cooker,0.446371
bathroom vanity knobs,0.409701
desk for kids,0.315811
...,...
closet storage with zipper,-0.131825
adjustable height artist stool,-0.151741
bathroom single faucet,-0.181547
king size bed,-0.287901
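
To put a number on "massive variance", one option is to count wins and losses and look at the spread of the deltas. The sketch below assumes two per-query NDCG series indexed by query. `delta_summary` and the toy values are hypothetical and do not come from the library.

```python
import pandas as pd

def delta_summary(variant: pd.Series, baseline: pd.Series) -> dict:
    """Summarize the per-query NDCG change of a variant against a baseline."""
    # Subtraction aligns on the query index; sort so the biggest wins come first.
    delta = (variant - baseline).sort_values(ascending=False)
    return {
        "delta": delta,
        "wins": int((delta > 0).sum()),
        "losses": int((delta < 0).sum()),
        "ties": int((delta == 0).sum()),
        "mean": float(delta.mean()),
        "std": float(delta.std()),  # spread of outcomes: a rough proxy for risk
    }

# Toy per-query NDCG values (not taken from the notebook's data).
baseline = pd.Series({"q1": 0.50, "q2": 0.75, "q3": 0.25, "q4": 0.50})
variant = pd.Series({"q1": 1.00, "q2": 0.50, "q3": 0.25, "q4": 0.75})

summary = delta_summary(variant, baseline)
print(summary["delta"])
print({k: v for k, v in summary.items() if k != "delta"})
```

A small positive mean next to a large standard deviation is the pattern described above: the average improves slightly while individual queries swing widely in both directions.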


### Examine a single query (what went right/wrong?)

First we see what BM25 produced...

In [None]:
QUERY = "seat cushions desk"

In [None]:
graded_bm25[graded_bm25['query'] == QUERY][['rank', 'product_name', 'product_description', 'grade']]

Unnamed: 0,rank,product_name,product_description,grade
2930,1,ergonomic memory foam seat cushion,work and drive in absolute comfort with the er...,2.0
2931,2,chiavari seat cushion,hard cushions are the most popular choice in t...,1.0
2932,3,deluxe seat cushion,the deluxe seat and back cushion by sacro-ease...,1.0
2933,4,deep outdoor seat cushion,this seat & back deep seating cushions feature...,1.0
2934,5,outdoor seat cushion,add personality and comfort to your outdoor pa...,1.0
2935,6,indoor seat cushion,the classic buffalo check pattern comes to lif...,1.0
2936,7,indoor/outdoor seat cushion,become your own personal designer with their f...,1.0
2937,8,outdoor seat/back cushion,this seat and back cushion adds a boost of sof...,1.0
2938,9,outdoor sunbrella seat cushion,this outdoor wicker seat cushion is made for c...,0.0
2939,10,gel seat cushion,sleekly designed with an ergonomic shape for r...,2.0


...then what the synonym variant produced.

In [None]:
graded_syns[graded_syns['query'] == QUERY][['rank', 'product_name', 'product_description', 'grade']]

Unnamed: 0,rank,product_name,product_description,grade
2930,1,ergonomic memory foam seat cushion,work and drive in absolute comfort with the er...,2.0
2931,2,deep outdoor seat cushion,this seat & back deep seating cushions feature...,1.0
2932,3,outdoor seat cushion,add personality and comfort to your outdoor pa...,1.0
2933,4,indoor seat cushion,the classic buffalo check pattern comes to lif...,1.0
2934,5,chiavari seat cushion,hard cushions are the most popular choice in t...,1.0
2935,6,deluxe seat cushion,the deluxe seat and back cushion by sacro-ease...,1.0
2936,7,indoor/outdoor seat cushion,become your own personal designer with their f...,1.0
2937,8,ergonomic gel seat cushion,support your seat using the gel cushion . sitt...,2.0
2938,9,gel seat cushion,sleekly designed with an ergonomic shape for r...,2.0
2939,10,devrek outdoor seat cushion,the smoky gray tone of this outdoor oversized ...,1.0
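
Comparing the two top-10 lists by eye is tedious, so a small helper can show which products moved, entered, or dropped out. This is a minimal sketch using the product names and ranks from the two tables above. `rank_changes` is a hypothetical helper. It joins on `product_name` only because that is the column shown here; a document id would be a safer key if names repeat.

```python
import pandas as pd

def rank_changes(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Outer-join two ranked lists and report how each product moved."""
    merged = before.merge(after, on="product_name", how="outer",
                          suffixes=("_bm25", "_syn"))
    # Positive = moved up in the synonym variant; NaN = only in one list.
    merged["moved"] = merged["rank_bm25"] - merged["rank_syn"]
    return merged.sort_values("rank_syn", na_position="last").reset_index(drop=True)

bm25_names = [
    "ergonomic memory foam seat cushion", "chiavari seat cushion",
    "deluxe seat cushion", "deep outdoor seat cushion", "outdoor seat cushion",
    "indoor seat cushion", "indoor/outdoor seat cushion",
    "outdoor seat/back cushion", "outdoor sunbrella seat cushion",
    "gel seat cushion",
]
syn_names = [
    "ergonomic memory foam seat cushion", "deep outdoor seat cushion",
    "outdoor seat cushion", "indoor seat cushion", "chiavari seat cushion",
    "deluxe seat cushion", "indoor/outdoor seat cushion",
    "ergonomic gel seat cushion", "gel seat cushion",
    "devrek outdoor seat cushion",
]
bm25 = pd.DataFrame({"product_name": bm25_names, "rank": range(1, 11)})
syns = pd.DataFrame({"product_name": syn_names, "rank": range(1, 11)})

changes = rank_changes(bm25, syns)
entered = set(changes[changes["rank_bm25"].isna()]["product_name"])
dropped = set(changes[changes["rank_syn"].isna()]["product_name"])
print(changes)
print("entered:", entered)
print("dropped:", dropped)
```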


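Next we line up the synonym variant's results (the `*_actual` columns) against the ideal ranking for this query (the `*_ideal` columns).

In [None]: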
against_ideal = vs_ideal(graded_syns, judgments, products)
against_ideal[against_ideal['query'] == QUERY]

Unnamed: 0,query_id,query,doc_id_ideal,grade_ideal,rank_ideal,title_ideal,title_actual,rank_actual,doc_id_actual,grade_actual,dcg,ndcg
2930,297,seat cushions desk,6263.0,2.0,1.0,amamedic mesh seat cushion,ergonomic memory foam seat cushion,1,39975,2.0,5.40119,0.614686
2931,297,seat cushions desk,11134.0,2.0,2.0,yeslife seat cushion,deep outdoor seat cushion,2,10294,1.0,5.40119,0.614686
2932,297,seat cushions desk,15659.0,2.0,3.0,office chair seat cushion,outdoor seat cushion,3,9648,1.0,5.40119,0.614686
2933,297,seat cushions desk,27626.0,2.0,4.0,pressure relief non-slip orthopedic seat cushion,indoor seat cushion,4,5542,1.0,5.40119,0.614686
2934,297,seat cushions desk,27635.0,2.0,5.0,ergonomic gel seat cushion,chiavari seat cushion,5,28919,1.0,5.40119,0.614686
2935,297,seat cushions desk,30395.0,2.0,6.0,creative office pillow plush back seat cushion,deluxe seat cushion,6,25206,1.0,5.40119,0.614686
2936,297,seat cushions desk,32638.0,2.0,7.0,gel seat cushion,indoor/outdoor seat cushion,7,20124,1.0,5.40119,0.614686
2937,297,seat cushions desk,39975.0,2.0,8.0,ergonomic memory foam seat cushion,ergonomic gel seat cushion,8,27635,2.0,5.40119,0.614686
2938,297,seat cushions desk,198.0,1.0,9.0,indoor chair cushion,gel seat cushion,9,32638,2.0,5.40119,0.614686
2939,297,seat cushions desk,202.0,1.0,10.0,chair indoor seat cushion,devrek outdoor seat cushion,10,30768,1.0,5.40119,0.614686
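
The `dcg` and `ndcg` columns above can be reproduced by hand, which helps when sanity-checking a single query. The sketch below assumes a gain of `2**grade - 1`, a `1/rank` discount, and a normalizer of ten results at the top grade of 2. These choices are inferred from the numbers in the table (they give 5.40119 and 0.614686), not taken from the library's source, so treat them as assumptions.

```python
def dcg(grades):
    """DCG with gain 2**grade - 1 and a 1/rank discount (assumed, inferred from the table)."""
    return sum((2 ** g - 1) / rank for rank, g in enumerate(grades, start=1))

# grade_actual column for "seat cushions desk", ranks 1..10.
actual_grades = [2, 1, 1, 1, 1, 1, 1, 2, 2, 1]
# Assumed normalizer: a top 10 made entirely of grade-2 results.
max_grades = [2] * 10

query_dcg = dcg(actual_grades)
query_ndcg = query_dcg / dcg(max_grades)
print(round(query_dcg, 5), round(query_ndcg, 6))
```

Normalizing by the `grade_ideal` column instead (eight 2s and two 1s) would give a slightly higher NDCG than the one reported, which is why the all-top-grade normalizer is assumed here.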


In [None]:
query_to_syn(QUERY)

QueryWithSynonyms(keywords='seat cushions desk', synonyms=[SynonymMapping(phrase='seat cushions desk', synonyms=['seat cushions', 'desk cushions', 'chair cushions', 'cushions for seat', 'office chair cushions', 'desk chair cushions', 'seat pads', 'desk pad cushions'])])