# Testing Qwen3 embeddings on WANDS

## Objective

In this project, the goal is to:

- Set up a local Quepid instance as a safe playground for experimentation.
- Programmatically load the WANDS dataset into Quepid, creating multiple cases from the same dataset to test different configurations and scenarios.
- Compare scoring approaches by evaluating and contrasting methods for measuring search quality across those cases.

By the end, we’ll have a reproducible workflow for running relevance experiments locally and benchmarking scoring strategies using the WANDS dataset.

## What is WANDS

WANDS is a human-annotated dataset from Wayfair for evaluating product search relevance. It includes 480 queries, ~43K products, and 233K query-product relevance labels (Exact, Partial, Irrelevant), plus rich product metadata—ideal for training and benchmarking search models.

## What is Quepid

Quepid is an open-source search relevance tuning and evaluation tool that bridges the gap between search engineers and domain experts. It lets you run queries, inspect results, and score them against a gold standard of relevance judgments — all in a collaborative interface. With support for search engines like Elasticsearch, Solr, and OpenSearch, Quepid makes it easier to experiment with ranking changes, track their impact over time, and communicate improvements to non-technical stakeholders. Whether you’re iterating on query configurations or benchmarking machine-learning-based ranking models, Quepid gives you a structured way to measure and improve search quality.

## Initial setup

#### Set up infrastructure

`cp .env.example .env`

`vi .env`

In [2]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml build

In [3]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml run quepid-api-quepid bin/rake db:migrate

In [4]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml run quepid-api-quepid bin/rake db:seed

In [7]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml run quepid-api-quepid bundle exec thor user:create -a admin@example.com "Admin User" supersecret

In [8]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml run quepid-api-quepid bundle exec thor user:add_api_key admin@example.com

In [None]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml up


In [15]:
!docker compose -f docker-compose.yml -f docker-compose-ollama.yml exec ollama /usr/bin/ollama pull qwen3-embedding

In [16]:
QUEPID_TOKEN = '71180e219a105c619965498298bcbf81886d8dd22f64df2742c63acabeb5f8e4'  # paste the token you created earlier

#### Services

After `docker compose up`, you will have running instances of Elasticsearch, Quepid, and the Quepid HTTP API (with a sandbox):

- API sandbox: http://localhost:8081/api/docs
- Quepid: http://localhost:3000/

#### Config

In [17]:
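WANDS_INDEX = 'http://localhost:9200/wands'
QUEPID_AUTH = {
    "Authorization": f"Bearer {QUEPID_TOKEN}"
}
# Short alias: the Quepid API calls later in the notebook pass `headers = AUTH`.
AUTH = QUEPID_AUTH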

### Python dependencies

In [18]:
!pip install pandas requests tqdm openai pyarrow



In [20]:
import requests
import json

from tqdm import tqdm
import pandas as pd

### WANDS

In [3]:
!git clone https://github.com/wayfair/WANDS.git

fatal: destination path 'WANDS' already exists and is not an empty directory.


In [21]:
query_df = pd.read_csv("WANDS/dataset/query.csv", sep='\t')
query_df

Unnamed: 0,query_id,query,query_class
0,0,salon chair,Massage Chairs
1,1,smart coffee table,Coffee & Cocktail Tables
2,2,dinosaur,Kids Wall Décor
3,3,turquoise pillows,Accent Pillows
4,4,chair and a half recliner,Recliners
...,...,...,...
475,483,rustic twig,Faux Plants and Trees
476,484,nespresso vertuo next premium by breville with...,Espresso Machines
477,485,pedistole sink,Kitchen Sinks
478,486,54 in bench cushion,Furniture Cushions


In [22]:
product_df = pd.read_csv("WANDS/dataset/product.csv", sep='\t')
product_df

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count
0,0,solid wood platform bed,Beds,Furniture / Bedroom Furniture / Beds & Headboa...,"good , deep sleep can be quite difficult to ha...",overallwidth-sidetoside:64.7|dsprimaryproducts...,15.0,4.5,15.0
1,1,all-clad 7 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,"create delicious slow-cooked meals , from tend...",capacityquarts:7|producttype : slow cooker|pro...,100.0,2.0,98.0
2,2,all-clad electrics 6.5 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,prepare home-cooked meals on any schedule with...,features : keep warm setting|capacityquarts:6....,208.0,3.0,181.0
3,3,all-clad all professional tools pizza cutter,"Slicers, Peelers And Graters",Browse By Brand / All-Clad,this original stainless tool was designed to c...,overallwidth-sidetoside:3.5|warrantylength : l...,69.0,4.5,42.0
4,4,baldwin prestige alcott passage knob with roun...,Door Knobs,Home Improvement / Doors & Door Hardware / Doo...,the hardware has a rich heritage of delivering...,compatibledoorthickness:1.375 '' |countryofori...,70.0,5.0,42.0
...,...,...,...,...,...,...,...,...,...
42989,42989,malibu pressure balanced diverter fixed shower...,Shower Panels,Home Improvement / Bathroom Remodel & Bathroom...,the malibu pressure balanced diverter fixed sh...,producttype : shower panel|spraypattern : rain...,3.0,4.5,2.0
42990,42990,emmeline 5 piece breakfast dining set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,,basematerialdetails : steel| : gray wood|ofhar...,1314.0,4.5,864.0
42991,42991,maloney 3 piece pub table set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,this pub table set includes 1 counter height t...,additionaltoolsrequirednotincluded : power dri...,49.0,4.0,41.0
42992,42992,fletcher 27.5 '' wide polyester armchair,Teen Lounge Furniture|Accent Chairs,Furniture / Living Room Furniture / Chairs & S...,"bring iconic , modern style to your space in a...",legmaterialdetails : rubberwood|backheight-sea...,1746.0,4.5,1226.0


In [8]:
labels_df = pd.read_csv("WANDS/dataset/label.csv", sep='\t')
labels_df

Unnamed: 0,id,query_id,product_id,label
0,0,0,25434,Exact
1,1,0,12088,Irrelevant
2,2,0,42931,Exact
3,3,0,2636,Exact
4,4,0,42923,Exact
...,...,...,...,...
233443,234010,478,15439,Partial
233444,234011,478,451,Partial
233445,234012,478,30764,Irrelevant
233446,234013,478,16796,Partial


## Build embeddings

In [24]:
from openai import OpenAI
openai_client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'
)

In [30]:
def make_embedding(text: str) -> list[float]:
    resp = openai_client.embeddings.create(
        model="qwen3-embedding",
        input=text,
    )
    return resp.data[0].embedding

def embed_row(row) -> list[float] | None:
    if pd.isna(row["product_name"]):
        return None
    return make_embedding(row['product_name'])

def embed_row_instruct(row) -> list[float] | None:
    if pd.isna(row["product_name"]):
        return None
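    # The f-prefix is required; without it the literal text "{row['product_name']}" is embedded for every row.
    instruction = f"Instruct: Embed the product title for e-commerce search\nProduct: {row['product_name']}"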
    return make_embedding(instruction)
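
Embedding one product name per request is what makes the runs below take hours. A minimal sketch of batching follows, assuming the backend accepts a list of inputs per call, as the OpenAI embeddings API does. `chunked`, `embed_in_batches`, and the `embed_batch` callable are hypothetical names introduced here, not part of the original notebook.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 32,
) -> list[list[float]]:
    """Embed `texts` batch by batch, preserving the input order."""
    vectors: list[list[float]] = []
    for batch in chunked(texts, batch_size):
        batch_vectors = embed_batch(batch)
        # Guard against a backend that drops or merges inputs.
        if len(batch_vectors) != len(batch):
            raise RuntimeError("backend returned a different number of vectors")
        vectors.extend(batch_vectors)
    return vectors
```

With the client defined above, `embed_batch` could plausibly be `lambda batch: [d.embedding for d in openai_client.embeddings.create(model="qwen3-embedding", input=list(batch)).data]`; whether this is actually faster depends on how the local Ollama server handles list inputs.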

In [27]:
# make_embedding('hello world')

In [28]:
tqdm.pandas()
product_df["name_vector1"] = product_df.progress_apply(embed_row, axis=1)

100%|████████████████████████████████████████████████████████████████████████████████| 42994/42994 [4:33:05<00:00,  2.62it/s]


In [31]:
tqdm.pandas()
product_df["name_vector2"] = product_df.progress_apply(embed_row_instruct, axis=1)

100%|████████████████████████████████████████████████████████████████████████████████| 42994/42994 [8:10:00<00:00,  1.46it/s]


In [32]:
product_df.to_parquet(
    "data/qwen-embeddings.parquet",
    index=False
)

In [None]:
product_df2 = pd.read_parquet(
    "data/qwen-embeddings.parquet"
)

In [33]:
product_df

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,name_vector1,name_vector2
0,0,solid wood platform bed,Beds,Furniture / Bedroom Furniture / Beds & Headboa...,"good , deep sleep can be quite difficult to ha...",overallwidth-sidetoside:64.7|dsprimaryproducts...,15.0,4.5,15.0,"[0.03973584622144699, 0.0051023829728364944, 0...","[0.024189326912164688, 0.002437025774270296, -..."
1,1,all-clad 7 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,"create delicious slow-cooked meals , from tend...",capacityquarts:7|producttype : slow cooker|pro...,100.0,2.0,98.0,"[-0.00834231823682785, 0.016884781420230865, 0...","[0.024189326912164688, 0.002437025774270296, -..."
2,2,all-clad electrics 6.5 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,prepare home-cooked meals on any schedule with...,features : keep warm setting|capacityquarts:6....,208.0,3.0,181.0,"[-0.011743362061679363, 0.017630895599722862, ...","[0.024189326912164688, 0.002437025774270296, -..."
3,3,all-clad all professional tools pizza cutter,"Slicers, Peelers And Graters",Browse By Brand / All-Clad,this original stainless tool was designed to c...,overallwidth-sidetoside:3.5|warrantylength : l...,69.0,4.5,42.0,"[-0.0019922717474400997, 0.024797895923256874,...","[0.024189326912164688, 0.002437025774270296, -..."
4,4,baldwin prestige alcott passage knob with roun...,Door Knobs,Home Improvement / Doors & Door Hardware / Doo...,the hardware has a rich heritage of delivering...,compatibledoorthickness:1.375 '' |countryofori...,70.0,5.0,42.0,"[0.0046699391677975655, 0.023147614672780037, ...","[0.024189326912164688, 0.002437025774270296, -..."
...,...,...,...,...,...,...,...,...,...,...,...
42989,42989,malibu pressure balanced diverter fixed shower...,Shower Panels,Home Improvement / Bathroom Remodel & Bathroom...,the malibu pressure balanced diverter fixed sh...,producttype : shower panel|spraypattern : rain...,3.0,4.5,2.0,"[0.006178273819386959, -0.00010368255607318133...","[0.024189326912164688, 0.002437025774270296, -..."
42990,42990,emmeline 5 piece breakfast dining set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,,basematerialdetails : steel| : gray wood|ofhar...,1314.0,4.5,864.0,"[0.0053030638955533504, 0.019402146339416504, ...","[0.024189326912164688, 0.002437025774270296, -..."
42991,42991,maloney 3 piece pub table set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,this pub table set includes 1 counter height t...,additionaltoolsrequirednotincluded : power dri...,49.0,4.0,41.0,"[0.009952022694051266, 0.020193250849843025, -...","[0.024189326912164688, 0.002437025774270296, -..."
42992,42992,fletcher 27.5 '' wide polyester armchair,Teen Lounge Furniture|Accent Chairs,Furniture / Living Room Furniture / Chairs & S...,"bring iconic , modern style to your space in a...",legmaterialdetails : rubberwood|backheight-sea...,1746.0,4.5,1226.0,"[0.009815959259867668, 0.025633202865719795, 0...","[0.024189326912164688, 0.002437025774270296, -..."


In [45]:
len(list(product_df["name_vector1"][0]))

4096
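
A cheap sanity check before indexing is to confirm that the embeddings actually differ from row to row; a column in which every vector is identical usually means the text sent to the model was not what was intended. The helper below is a sketch, and `count_distinct_vectors` is a hypothetical name introduced for illustration.

```python
def count_distinct_vectors(vectors, sample_size=1000, precision=6):
    """Return (distinct, checked) for the first `sample_size` non-null vectors."""
    seen = set()
    checked = 0
    for vector in vectors:
        if vector is None:
            continue
        # Round so that tiny floating-point noise does not count as a difference.
        seen.add(tuple(round(float(x), precision) for x in vector))
        checked += 1
        if checked >= sample_size:
            break
    return len(seen), checked
```

For example, `count_distinct_vectors(product_df["name_vector2"])` should report roughly as many distinct vectors as rows checked, apart from products that share a name.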

## Set up Elasticsearch

### Create index

In [36]:
inference = requests.put(
    'http://localhost:9200/_inference/text_embedding/qwen3-embeddings',
    json={
        "service": "custom",
        "service_settings": {
            "secret_parameters": {
                "api_key": "ollama"
            },
            "url": "http://ollama:11434/v1/embeddings",
            "headers": {
                "Authorization": "Bearer ${api_key}",
                "Content-Type": "application/json"
            },
            "request": "{\"model\":\"qwen3-embedding\",\"input\":${input}}",
            "response": {
                "json_parser": {
                    "text_embeddings": "$.data[*].embedding[*]"
                }
            }
        }
    }
)

In [37]:
inference.json()

{'inference_id': 'qwen3-embeddings',
 'task_type': 'text_embedding',
 'service': 'custom',
 'service_settings': {'similarity': 'dot_product',
  'dimensions': 4096,
  'url': 'http://ollama:11434/v1/embeddings',
  'headers': {'Authorization': 'Bearer ${api_key}',
   'Content-Type': 'application/json'},
  'request': '{"model":"qwen3-embedding","input":${input}}',
  'response': {'json_parser': {'text_embeddings': '$.data[*].embedding[*]'}},
  'input_type': {'translation': {}, 'default': ''},
  'rate_limit': {'requests_per_minute': 10000},
  'batch_size': 10},
 'chunking_settings': {'strategy': 'word',
  'max_chunk_size': 250,
  'overlap': 100}}

In [51]:
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "name_embedding": {
                "type": "dense_vector",
                "dims": 4096,
                "index": True,
                "similarity": "cosine"
            }
        }
    }
}

In [47]:
index1 = requests.put(
    'http://localhost:9200/product-1',
    json=mapping
)
index1.json()

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'product-1'}

In [49]:
mapping_response = requests.get(
    'http://localhost:9200/product-1/_mapping',
)
mapping_response.json()

{'product-1': {'mappings': {'properties': {'name': {'type': 'text'},
    'name_embedding': {'type': 'dense_vector',
     'dims': 4096,
     'index': True,
     'similarity': 'cosine',
     'index_options': {'type': 'int8_hnsw',
      'm': 16,
      'ef_construction': 100}}}}}}
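
The inference endpoint reports `dot_product` while the index mapping uses `cosine`. For unit-length vectors the two give the same value, which the sketch below illustrates in plain Python. It also inverts the documented Elasticsearch scoring for `cosine` fields, `_score = (1 + cosine) / 2`. The function names are hypothetical, and with `int8_hnsw` quantization the reported scores are approximations.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def es_score_to_cosine(score):
    """Invert Elasticsearch's cosine scoring, _score = (1 + cosine) / 2."""
    return 2 * score - 1
```

Under that formula, a hit with `_score` 0.9 corresponds to a cosine similarity of about 0.8.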

In [53]:
index2 = requests.put(
    'http://localhost:9200/product-2',
    json=mapping
)
index2.json()

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'product-2'}

### Index products

In [54]:
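def index_record(index, id, name, embedding):
    # Compare against None explicitly: product_id 0 is falsy and a truthiness check would skip it.
    if id is None or pd.isna(name) or embedding is None:
        return None
    try:
        return requests.post(
            f"http://localhost:9200/{index}/_doc/{id}",
            json={
                'name': name,
                'name_embedding': embedding
            }
        )
    except requests.RequestException as exc:
        # Report the failure instead of silently dropping the document.
        print(f"failed to index {id}: {exc}")
        return None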
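
Posting one document per request works, but Elasticsearch also offers the `_bulk` API, which takes newline-delimited JSON: an action line followed by a document line for each record. The sketch below only builds that body; `build_bulk_body` is a hypothetical helper, and the field names follow the mapping above.

```python
import json


def build_bulk_body(index, records):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    `records` is an iterable of (doc_id, name, embedding) tuples.
    Records with a missing id, name, or embedding are skipped.
    """
    lines = []
    for doc_id, name, embedding in records:
        if doc_id is None or not isinstance(name, str) or not name or embedding is None:
            continue
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps({"name": name, "name_embedding": list(embedding)}))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

The body could then be sent in chunks of a few hundred products with `requests.post("http://localhost:9200/_bulk", data=body, headers={"Content-Type": "application/x-ndjson"})`. With 4096-dimensional vectors each document is large, so a modest chunk size is probably sensible.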

In [55]:
for index, row in tqdm(product_df.iterrows(), total=len(product_df)):
    _ = index_record('product-1', row['product_id'], row['product_name'], row['name_vector1'])

100%|█████████████████████████████████████████████████████████████████████████████████| 42994/42994 [06:06<00:00, 117.45it/s]


In [56]:
for index, row in tqdm(product_df.iterrows(), total=len(product_df)):
    _ = index_record('product-2', row['product_id'], row['product_name'], row['name_vector2'])

100%|█████████████████████████████████████████████████████████████████████████████████| 42994/42994 [05:34<00:00, 128.72it/s]


### Search in data

In [61]:
r = requests.post('http://localhost:9200/product-1/_search', 
json={
  "knn": {
    "field": "name_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "qwen3-embeddings",
        "model_text": "dinosaur"
      }
    },
    "k": 10,
    "num_candidates": 300
  },
  "_source": "name"
})

r.json()


{'took': 481,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 10, 'relation': 'eq'},
  'max_score': 0.90430737,
  'hits': [{'_index': 'product-1',
    '_id': '34735',
    '_score': 0.90430737,
    '_source': {'name': 'dinosaur standup'}},
   {'_index': 'product-1',
    '_id': '38628',
    '_score': 0.8929882,
    '_source': {'name': 'dinosaur skull with base'}},
   {'_index': 'product-1',
    '_id': '34737',
    '_score': 0.8918686,
    '_source': {'name': 'dinosaur wall decor'}},
   {'_index': 'product-1',
    '_id': '12165',
    '_score': 0.8910303,
    '_source': {'name': 'dinosaur green & blue - graphic art print'}},
   {'_index': 'product-1',
    '_id': '29736',
    '_score': 0.8905773,
    '_source': {'name': 'dinosaur storage bin'}},
   {'_index': 'product-1',
    '_id': '11877',
    '_score': 0.88856506,
    '_source': {'name': 'dinosaur flying swimming and land dinosaurs with lake and mountains dino park a

In [57]:
response = requests.post(
    "http://localhost:9200/product-1/_search",
    json={
        "size": 0,
        "track_total_hits": True
    }
)
response.json()

{'took': 80,
 'timed_out': False,
 'terminated_early': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 42993, 'relation': 'eq'},
  'max_score': None,
  'hits': []}}

In [72]:
def search_query(query='#$query##'):
    return {
      "knn": {
        "field": "name_embedding",
        "query_vector_builder": {
          "text_embedding": {
            "model_id": "qwen3-embeddings",
            "model_text": query
          }
        },
        "k": 10,
        "num_candidates": 300
      },
      "_source": "name"
    }


def search(index, query, instruction=None):
    if instruction:
        query = instruction.format(query=query)
    print(query)
    response = requests.post(
        f"http://localhost:9200/{index}/_search",
        json=search_query(query)
    )
    return response.json()
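
The raw search responses are hard to compare by eye. A small sketch that reduces a response to `(id, score, name)` tuples is shown below; `top_hits` is a hypothetical helper that assumes the response shape shown in the outputs above.

```python
def top_hits(response, limit=10):
    """Return (doc_id, score, name) tuples from an Elasticsearch search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        (hit["_id"], hit["_score"], hit.get("_source", {}).get("name"))
        for hit in hits[:limit]
    ]
```

For instance, `top_hits(search('product-1', 'dinosaur'), limit=3)` would list the three best matches with their scores.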

In [76]:
instruction = "Instruct: Embed the product title for e-commerce search\nProduct name: {query}"
# print(search('product-1', 'dinosaur'))
# print(search('product-2', 'dinosaur', instruction))
print(search('product-1', 'men shoes 43'))
print(search('product-2', 'men shoes 43', instruction))




men shoes 43
{'took': 343, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10, 'relation': 'eq'}, 'max_score': 0.86428356, 'hits': [{'_index': 'product-1', '_id': '22396', '_score': 0.86428356, '_source': {'name': '45 pair shoe rack'}}, {'_index': 'product-1', '_id': '39396', '_score': 0.84891415, '_source': {'name': 'closet organizer 45 pair shoe storage cabinet'}}, {'_index': 'product-1', '_id': '6996', '_score': 0.84537125, '_source': {'name': '30 pair shoe rack'}}, {'_index': 'product-1', '_id': '40956', '_score': 0.84462166, '_source': {'name': '3 pair shoe storage'}}, {'_index': 'product-1', '_id': '29881', '_score': 0.84303, '_source': {'name': 'fit and stylish shoe closet shoes - print'}}, {'_index': 'product-1', '_id': '36863', '_score': 0.84227943, '_source': {'name': '45 pair shoe storage cabinet'}}, {'_index': 'product-1', '_id': '28414', '_score': 0.84218884, '_source': {'name': 'rebrilliant 40 pair stack

## Loading data into Quepid

### Create team

In [14]:
team = requests.post(
    'http://localhost:8081/api/teams/', 
    headers = AUTH,
    json={
        "name": "wands"
    }   
)

team = team.json()

In [15]:
team

{'id': 1,
 'name': 'wands',
 'created_at': '2025-08-15T09:55:27.968Z',
 'updated_at': '2025-08-15T09:55:27.968Z'}

### Create search endpoint

In [16]:
endpoint = requests.post(
    'http://localhost:8081/api/search_endpoints/', 
    headers = AUTH,
    json={
        "name": "wands",
        "endpoint_url": "http://quepid-api-elasticsearch:9200/wands/_search",
        "search_engine": "es",
        "api_method": "POST",
        "proxy_requests": 1,   
    }   
)

endpoint = endpoint.json()


In [17]:
endpoint

{'id': 1,
 'name': 'wands',
 'owner': 1,
 'search_engine': 'es',
 'endpoint_url': 'http://quepid-api-elasticsearch:9200/wands/_search',
 'api_method': 'POST',
 'custom_headers': None,
 'archived': 0,
 'created_at': '2025-08-15T09:55:33.136Z',
 'updated_at': '2025-08-15T09:55:33.136Z',
 'basic_auth_credential': None,
 'mapper_code': None,
 'proxy_requests': 1,
 'options': None}

### Create cases

In [18]:
# list scorers
scorers = requests.get(
    'http://localhost:8081/api/scorers/', 
    headers = AUTH
)
{s['id']: s['name'] for s in scorers.json()['items']}

{1: 'nDCG@10',
 2: 'DCG@10',
 3: 'CG@10',
 4: 'P@10',
 5: 'AP@10',
 6: 'RR@10',
 7: 'ERR@10'}
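
For intuition about what scorer 1 measures, here is a sketch of a common textbook formulation of nDCG@k, using an exponential gain and a log2 discount. This is an assumption about the general metric, not a copy of Quepid's scorer, which may differ in details such as the gain function or how the ideal ranking is built.

```python
import math


def dcg(ratings, k=10):
    """Discounted cumulative gain of the first k ratings, in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ratings[:k]))


def ndcg(ratings, all_ratings=None, k=10):
    """nDCG@k of a ranked list of ratings.

    `all_ratings` optionally lists every known rating for the query and is
    used to build the ideal ranking; by default the ranked list itself is used.
    """
    ideal = sorted(all_ratings if all_ratings is not None else ratings, reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(ratings, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

With the rating scale used later in this notebook (Exact = 3, Partial = 2, Irrelevant = 0), a result list rated `[3, 2, 0]` is already in ideal order and scores 1.0.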

In [116]:
case1 = requests.post(
    'http://localhost:8081/api/case/', 
    headers = AUTH,
    json={
        "name": "wands",
        "scorer_id": 1,
        "book_id": 0,
        "search_endpoint_id": endpoint.get('id'),
        "search_query": json.dumps(search_query())
    }   
)

# search_query_boosted() is not defined in this notebook; define it before running this cell.
case2 = requests.post(
    'http://localhost:8081/api/case/',
    headers = AUTH,
    json={
        "name": "wands boosted",
        "scorer_id": 1,
        "book_id": 0,
        "search_endpoint_id": endpoint.get('id'),
        "search_query": json.dumps(search_query_boosted())
    }
)

case1 = case1.json()
case2 = case2.json()

In [117]:
print(case1)
print(case2)

{'id': 1, 'case_name': 'wands', 'last_try_number': 1, 'owner': 1, 'archived': 0, 'scorer_id': 1, 'created_at': '2025-08-14T14:03:36.927Z', 'updated_at': '2025-08-14T14:03:36.927Z', 'book_id': None, 'public': None, 'options': None, 'nightly': 1}
{'id': 2, 'case_name': 'wands boosted', 'last_try_number': 1, 'owner': 1, 'archived': 0, 'scorer_id': 1, 'created_at': '2025-08-14T14:03:36.953Z', 'updated_at': '2025-08-14T14:03:36.953Z', 'book_id': None, 'public': None, 'options': None, 'nightly': 1}


### Load queries and judgements

In [28]:
def add_query(case, query):
    quepid_query = requests.post(
        f'http://localhost:8081/api/query/{case.get("id")}/', 
        headers = AUTH,
        json={
            "query_text": query
        }   
    )
    if quepid_query.status_code == 200:
        return quepid_query.json() 


def add_label(query_id, doc_id, label):
    # print([query, doc_id, label])
    return requests.post(
        f'http://localhost:8081/api/rating/query/{query_id}/rating/', 
        headers = AUTH,
        json={
            "doc_id": str(doc_id),
            "rating": label_to_rating(label)
        }   
    )


def label_to_rating(label):
    if label == 'Partial':
        return 2
    if label == 'Exact':
        return 3
    return 0


def add_labels(quepid_query, query_labels):
    for _, label in query_labels.iterrows():
        add_label(quepid_query, label['product_id'], label['label'])
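
The loading loop filters the full label table once per query. A sketch of grouping the labels in a single pass is shown below, using the same rating scale as `label_to_rating`. `RATINGS` and `ratings_by_query` are hypothetical names, and the input is assumed to be plain `(query_id, product_id, label)` tuples.

```python
from collections import defaultdict

# Same scale as label_to_rating: anything unknown is treated as irrelevant.
RATINGS = {"Exact": 3, "Partial": 2, "Irrelevant": 0}


def ratings_by_query(labels):
    """Group (query_id, product_id, label) triples into {query_id: {doc_id: rating}}."""
    grouped = defaultdict(dict)
    for query_id, product_id, label in labels:
        grouped[query_id][str(product_id)] = RATINGS.get(label, 0)
    return dict(grouped)
```

The tuples can be produced from the DataFrame with `labels_df[["query_id", "product_id", "label"]].itertuples(index=False, name=None)`.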


In [119]:
for index, row in tqdm(query_df.iterrows(), total=len(query_df)):
    query_labels_df = labels_df[labels_df['query_id'] == row['query_id']]
    for case in [case1, case2]:
        if quepid_query := add_query(case, row['query']):
            add_labels(quepid_query.get('id'), query_labels_df)

100%|██████████████████████████████████████████████████████████████████████████| 480/480 [1:17:02<00:00,  9.63s/it]


In [120]:
requests.get(
    'http://localhost:8081/api/case/1/', 
    headers = AUTH   
).json()

{'id': 1,
 'case_name': 'wands',
 'last_try_number': 1,
 'owner': 1,
 'archived': 0,
 'scorer_id': 1,
 'created_at': '2025-08-14T14:03:37Z',
 'updated_at': '2025-08-14T14:03:37Z',
 'book_id': None,
 'public': None,
 'options': None,
 'nightly': 1}

What is the latest score?

![Title](quepid-wands.png)