# 4 Best Hybrid Search Configuration

This notebook runs different hybrid search configurations, calculates the metrics for each configuration, and compares the results to the metrics calculated for the baseline run in the previous notebook.

We use the same query set to ensure a fair comparison.

## Imports and Setup

In [32]:
import pandas as pd
import numpy as np
import requests
import json
import mercury as mr
import itertools
from tqdm.notebook import tqdm_notebook

app = mr.App(title="Let's Run a Hybrid Search", static_notebook=True)

## Load Query Sets and Ratings

Load the query sets and the ratings (judgments) created in notebook 3.

In [12]:
# Set the boolean value accordingly to use either the small (small_b = True) or the large dataset (small_b = False).
small_b = True

if small_b:
    df_train = pd.read_csv('../data/query_train_small.csv')
    df_test = pd.read_csv('../data/query_test_small.csv')
else:
    df_train = pd.read_csv('../data/query_train.csv')
    df_test = pd.read_csv('../data/query_test.csv')
df_query_set = pd.concat([df_train, df_test])

In [13]:
# Import the ratings generated in the previous notebook
df_ratings = pd.read_csv('../data/ratings.csv', sep="\t", names=['query', 'docid', 'rating', 'idx'])
df_ratings

|  | query | docid | rating | idx |
| --- | --- | --- | --- | --- |
| 0 | 09 g6 wheel not cover grey | B07WSBV4PK | 0 | 0 |
| 1 | 09 g6 wheel not cover grey | B07WS62XTM | 0 | 0 |
| 2 | 09 g6 wheel not cover grey | B07VLX8QWK | 0 | 0 |
| 3 | 09 g6 wheel not cover grey | B07WNHR1JT | 0 | 0 |
| 4 | 09 g6 wheel not cover grey | B07WJ3B8ZZ | 0 | 0 |
| ... | ... | ... | ... | ... |
| 4710 | youth size dirt bike helmet | B074W42CDK | 3 | 249 |
| 4711 | youth size dirt bike helmet | B00Y4G7U7C | 3 | 249 |
| 4712 | youth size dirt bike helmet | B00RTD45HU | 3 | 249 |
| 4713 | youth size dirt bike helmet | B07CVKRSRL | 3 | 249 |


In [14]:
# Collect the distinct queries from the ratings (groupby sorts them alphabetically)
df_queries = df_ratings.groupby(by='query', as_index=False).agg({
    'rating': ['count']
})
df_query_idx = df_queries['query']

In [15]:
# Use the row number of each distinct query as its numeric query id ('idx')
df_query_idx = pd.DataFrame(df_query_idx)

df_query_idx = df_query_idx.reset_index()

df_query_idx = df_query_idx.rename(columns={'index': 'idx'})
df_query_idx

|  | idx | query |
| --- | --- | --- |
| 0 | 0 | 09 g6 wheel not cover grey |
| 1 | 1 | 1 1/4 pop up bathroom sink drain without overflow |
| 2 | 2 | 1 ‘ velcro without adhesive for sewing |
| 3 | 3 | 1.5 inch heel |
| 4 | 4 | 1/2 inch binder |
| ... | ... | ... |
| 245 | 245 | wood arm chair |
| 246 | 246 | work clothes for women |
| 247 | 247 | work vest for men |
| 248 | 248 | youth compound bow left hand |


## Query OpenSearch with the Hybrid Search Configurations

Before running all configurations, let's verify that hybrid search works: we create a search pipeline and use it in a single query.

In [17]:
keyword_weight = 0.3

In [18]:
neural_weight = round(1.0 - keyword_weight, 2)
print(f"Keyword Weight is {keyword_weight} and Neural Weight is {neural_weight}")

Keyword Weight is 0.3 and Neural Weight is 0.7


In [19]:
# Get model_id
# This assumes that the installation has only one model. Change the query if you have
# more models and need to pick a specific one

url = "http://localhost:9200/_plugins/_ml/models/_search"

headers = {
    'Content-Type': 'application/json'
}

payload = {
  "query": {
    "match_all": {}
  },
  "size": 1
}

response = requests.request("POST", url, headers=headers, data=json.dumps(payload))

model_id = response.json()['hits']['hits'][0]['_source']['model_id']

In [20]:
# Normalization and combination techniques for this test pipeline; the weights come from
# keyword_weight and neural_weight defined above
normalization = 'min_max'
combination = 'arithmetic_mean'
pipeline = 'hybrid-search-pipeline'

url = "http://localhost:9200/_search/pipeline/" + pipeline

print(f"Setting default model id to: {model_id}")
payload = {
  "request_processors": [
    {
      "neural_query_enricher" : {
        "description": "Sets the default model ID at index and field levels",
        "default_model_id": model_id,
        "neural_field_default_id": {
           "title_embedding": model_id
        }
      }
    }
  ],
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": normalization
        },
        "combination": {
          "technique": combination,
          "parameters": {
            "weights": [
              keyword_weight,
              neural_weight
            ]
          }
        }
      }
    }
  ]
}


response = requests.request("PUT", url, headers=headers, data=json.dumps(payload))
mr.JSON(response.json(), level=4)

Setting default model id to: 1Zr9VpIBFSlgWAuGFzOL


In [24]:
url = "http://localhost:9200/ecommerce/_search?search_pipeline=" + pipeline

# Send a single test query as a hybrid search query to OpenSearch using the pipeline created above
query = "iphone case"

payload = {
  "_source": {
    "excludes": [
      "title_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "multi_match" : {
              "type":       "best_fields",
              "fields":     [
                "product_id^100",
                "product_bullet_point^3",
                "product_color^2",
                "product_brand^5",
                "product_description",
                "product_title^10"
              ],
              "operator":   "and",
              "query":      query
            }
        },
        {
          "neural": {
            "title_embedding": {
              "query_text": query,
              "k": 100
            }
          }
        }
      ]
    }
  },
  "size": 100
}

response = requests.request("POST", url, headers=headers, data=json.dumps(payload)).json()

In [27]:
mr.JSON(response, level=2)

## Try out all Hybrid Search Configurations

Our global hybrid search optimization tries out all 66 (2 × 3 × 11) parameter combinations drawn from the following values:
* normalization technique: [`l2`, `min_max`]
* combination technique: [`arithmetic_mean`, `harmonic_mean`, `geometric_mean`]
* keyword search weight: [`0.0`, `0.1`, `0.2`, `0.3`, `0.4`, `0.5`, `0.6`, `0.7`, `0.8`, `0.9`, `1.0`]
* neural search weight: [`1.0`, `0.9`, `0.8`, `0.7`, `0.6`, `0.5`, `0.4`, `0.3`, `0.2`, `0.1`, `0.0`]

Neural and keyword search weights always add up to `1.0`, so a keyword search weight of `0.1` automatically comes with a neural search weight of `0.9`, a keyword search weight of `0.2` comes with a neural search weight of `0.8`, etc.
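To make these parameters concrete, here is a minimal sketch of the textbook formulas behind the two normalization and three combination techniques, in plain Python. It is an illustration under assumptions, not OpenSearch's implementation: how the `normalization-processor` treats documents returned by only one sub-query, zero scores, or lists of identical scores may differ, and the helper names (`min_max`, `l2`, `arithmetic_mean`, `geometric_mean`, `harmonic_mean`) are hypothetical.

```python
import math


def min_max(scores):
    """Rescale one sub-query's scores to [0, 1] using that list's min and max."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # assumption: identical scores all map to 1.0
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def l2(scores):
    """Divide each score by the Euclidean (L2) norm of the score list."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm else [0.0] * len(scores)


def arithmetic_mean(values, weights):
    """Weighted arithmetic mean of normalized sub-query scores."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def _positive_pairs(values, weights):
    # Assumption: zero scores or weights are skipped, since log(0) and 1/0 are undefined
    return [(w, v) for w, v in zip(weights, values) if w > 0 and v > 0]


def geometric_mean(values, weights):
    """Weighted geometric mean: exp(sum(w * ln v) / sum(w))."""
    pairs = _positive_pairs(values, weights)
    if not pairs:
        return 0.0
    total = sum(w for w, _ in pairs)
    return math.exp(sum(w * math.log(v) for w, v in pairs) / total)


def harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w) / sum(w / v)."""
    pairs = _positive_pairs(values, weights)
    if not pairs:
        return 0.0
    return sum(w for w, _ in pairs) / sum(w / v for w, v in pairs)


# One document's normalized keyword and neural scores, combined with
# keyword weight 0.3 and neural weight 0.7
doc_scores = [0.5, 1.0]
weights = [0.3, 0.7]
for combine in (arithmetic_mean, geometric_mean, harmonic_mean):
    print(combine.__name__, round(combine(doc_scores, weights), 4))
```

For this document the arithmetic mean gives 0.85, the geometric mean about 0.81, and the harmonic mean about 0.77: the harmonic and geometric means pull the combined score toward the weaker of the two sub-query scores.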

### Create a DataFrame with all possible combinations of hybrid search configurations

In [28]:
# Define the possible values for each parameter
normalization_values = ['min_max', 'l2']
combination_values = ['arithmetic_mean', 'harmonic_mean', 'geometric_mean']
keyword_values = [round(i * 0.1, 1) for i in range(11)]

# Create all possible combinations of normalization, combination, and keyword weight
combinations = list(itertools.product(normalization_values, combination_values, keyword_values))

# Calculate the vector (neural) weight as 1.0 - keyword weight, rounded to avoid float noise such as 0.30000000000000004
data = [(norm, comb, kw, round(1.0 - kw, 1)) for norm, comb, kw in combinations]

# Create DataFrame
df_hybrid_search_params = pd.DataFrame(data, columns=['normalization', 'combination', 'keyword', 'vector'])

# Create a column with a pipeline name made up of its components
df_hybrid_search_params['pipeline'] = df_hybrid_search_params.normalization.apply(str) + \
    df_hybrid_search_params.combination.apply(str) + df_hybrid_search_params.keyword.apply(str)

df_hybrid_search_params.head()

|  | normalization | combination | keyword | vector | pipeline |
| --- | --- | --- | --- | --- | --- |
| 0 | min_max | arithmetic_mean | 0.0 | 1.0 | min_maxarithmetic_mean0.0 |
| 1 | min_max | arithmetic_mean | 0.1 | 0.9 | min_maxarithmetic_mean0.1 |
| 2 | min_max | arithmetic_mean | 0.2 | 0.8 | min_maxarithmetic_mean0.2 |
| 3 | min_max | arithmetic_mean | 0.3 | 0.7 | min_maxarithmetic_mean0.3 |
| 4 | min_max | arithmetic_mean | 0.4 | 0.6 | min_maxarithmetic_mean0.4 |


### Iterate over all hybrid search configurations

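We execute each query from the training set against each hybrid search configuration and store the first 100 results in a DataFrame for the upcoming metrics calculation. Each configuration is sent as a temporary search pipeline in the request body, so no pipeline has to be created up front.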
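The request body sent for each query and configuration can be summarized as a small function. This is a hypothetical helper (`build_hybrid_payload` is not part of the notebook or of any library) that mirrors the payload used in the loop: a `hybrid` query with a keyword `multi_match` sub-query and a `neural` sub-query, plus a temporary search pipeline defined inline under `search_pipeline` that carries the normalization technique, the combination technique, and the weights. The field names and boosts are copied from the notebook; the model id in the example is a placeholder.

```python
import json


def build_hybrid_payload(query_text, model_id, normalization, combination,
                         keyword_weight, neural_weight, k=100, size=100):
    """Hypothetical helper: build the hybrid search request body for one query and configuration."""
    return {
        "_source": {"excludes": ["title_embedding"]},
        "query": {
            "hybrid": {
                "queries": [
                    # Keyword sub-query; its weight is the first entry in "weights"
                    {"multi_match": {
                        "type": "best_fields",
                        "fields": ["product_id^100", "product_bullet_point^3", "product_color^2",
                                   "product_brand^5", "product_description", "product_title^10"],
                        "operator": "and",
                        "query": query_text,
                    }},
                    # Neural sub-query; its weight is the second entry in "weights"
                    {"neural": {"title_embedding": {"query_text": query_text, "k": k}}},
                ]
            }
        },
        # Temporary search pipeline that applies only to this request
        "search_pipeline": {
            "request_processors": [{
                "neural_query_enricher": {
                    "default_model_id": model_id,
                    "neural_field_default_id": {"title_embedding": model_id},
                }
            }],
            "phase_results_processors": [{
                "normalization-processor": {
                    "normalization": {"technique": normalization},
                    "combination": {
                        "technique": combination,
                        "parameters": {"weights": [keyword_weight, neural_weight]},
                    },
                }
            }],
        },
        "size": size,
    }


# Example: body for l2 + arithmetic_mean with weights 0.4/0.6 ("my-model-id" is a placeholder)
body = build_hybrid_payload("iphone case", "my-model-id", "l2", "arithmetic_mean", 0.4, 0.6)
print(json.dumps(body["search_pipeline"]["phase_results_processors"], indent=2))
```

Factoring the body out like this would keep the training and test loops from drifting apart; the notebook itself inlines the same dictionary in both loops.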

In [56]:
# Collect one dict per search hit and build the DataFrame once after the loops;
# calling pd.concat for every hit copies the growing DataFrame each time and gets very slow.
# Columns of df_relevance:
# query_id: the id of the query, as the trec_eval tool needs a numeric id rather than a query string
# query_string: the query text
# product_id: the id of the product in the hit list
# position: the position of the product in the result set
# relevance: relevance score as given by the search engine
# run: the name of the hybrid search configuration
rows = []
url = "http://localhost:9200/ecommerce/_search"
df_train_queries = df_query_idx[df_query_idx['query'].isin(df_train['query_string'])]

# Loop 1: iterate over all hybrid search configurations
for config in tqdm_notebook(df_hybrid_search_params.itertuples(), total=len(df_hybrid_search_params)):
    norm = config.normalization
    combi = config.combination
    keywordness = round(config.keyword, 2)
    neuralness = round(config.vector, 2)
    pipeline_name = config.pipeline

    # Loop 2: send every training query as a hybrid search query; the configuration
    # is passed as a temporary search pipeline in the request body
    for q in df_train_queries.itertuples():
        payload = {
          "_source": {
            "excludes": [
              "title_embedding"
            ]
          },
          "query": {
            "hybrid": {
              "queries": [
                {
                  "multi_match" : {
                      "type":       "best_fields",
                      "fields":     [
                        "product_id^100",
                        "product_bullet_point^3",
                        "product_color^2",
                        "product_brand^5",
                        "product_description",
                        "product_title^10"
                      ],
                      "operator":   "and",
                      "query":      q.query
                    }
                },
                {
                  "neural": {
                    "title_embedding": {
                      "query_text": q.query,
                      "k": 100
                    }
                  }
                }
              ]
            }
          },
          "search_pipeline": {
            "request_processors": [
              {
                "neural_query_enricher" : {
                  "description": "one of many search pipelines for experimentation",
                  "default_model_id": model_id,
                  "neural_field_default_id": {
                    "title_embedding": model_id
                  }
                }
              }
            ],
            "phase_results_processors": [
              {
                "normalization-processor": {
                  "normalization": {
                    "technique": norm
                  },
                  "combination": {
                    "technique": combi,
                    "parameters": {
                      "weights": [
                        keywordness,
                        neuralness
                      ]
                    }
                  }
                }
              }
            ]
          },
          "size": 100
        }

        response = requests.request("POST", url, headers=headers, data=json.dumps(payload)).json()
        # Loop 3: store every hit together with its position in the result set
        for position, hit in enumerate(response['hits']['hits']):
            rows.append({
                'query_id': str(q.idx),
                'query_string': q.query,
                'product_id': hit["_id"],
                'position': position,
                'relevance': hit["_score"],
                'run': pipeline_name
            })

df_relevance = pd.DataFrame(rows)


In [57]:
df_relevance.head(3)

|  | query_id | query_string | product_id | position | relevance | run |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 09 g6 wheel not cover grey | B01N23EA38 | 0 | 1.0 | min_maxarithmetic_mean0.0 |
| 1 | 0 | 09 g6 wheel not cover grey | B07L8T3NMN | 1 | 0.819022 | min_maxarithmetic_mean0.0 |
| 2 | 0 | 09 g6 wheel not cover grey | B000630I12 | 2 | 0.799753 | min_maxarithmetic_mean0.0 |


Each query returns up to 100 results, so each pipeline contributes at most _number of queries_ × 100 rows to the resulting DataFrame.

In [58]:
df_relevance[df_relevance['run'] == "min_maxarithmetic_mean0.0"].shape[0]

20000

In [59]:
df_relevance.shape[0]

1320000

## Merge Search Results with Ratings

In [60]:
df_ratings.columns = ['query_string', 'product_id', 'rating', 'query_id']
df_ratings.head(3)

|  | query_string | product_id | rating | query_id |
| --- | --- | --- | --- | --- |
| 0 | 09 g6 wheel not cover grey | B07WSBV4PK | 0 | 0 |
| 1 | 09 g6 wheel not cover grey | B07WS62XTM | 0 | 0 |
| 2 | 09 g6 wheel not cover grey | B07VLX8QWK | 0 | 0 |


In [61]:
# Make sure query ids are strings in both DataFrames (otherwise the merge keys do not match) and positions are integers
df_relevance['query_id'] = df_relevance['query_id'].astype(str)
df_relevance['position'] = df_relevance['position'].astype(int)
df_ratings['query_id'] = df_ratings['query_id'].astype(str)
# Remove duplicates from the ratings DataFrame
df_unique_ratings = df_ratings.drop_duplicates(subset=['product_id', 'query_id'])

In [62]:
# Merge results on query_id and product_id so that the resulting DataFrame has the ratings together with the search results.
# validate='many_to_one' ensures there is at most one rating per query-doc pair: the same query-doc pair
# appears once per search pipeline in the results, but it can only have one rating.
df_merged = df_relevance.merge(df_unique_ratings, on=['query_id', 'product_id'], how='left', validate='many_to_one')
# Both DataFrames carry a query_string column; keep the one from the search results
df_merged = df_merged.drop(columns=['query_string_y']).rename(columns={"query_string_x": "query_string"})

df_merged.head(3)

In [63]:
# Count the rows without ratings: the higher the count, the less reliable the metrics will be
nan_count_rating = df_merged['rating'].isna().sum()
print(f"There are {df_merged.shape[0]} rows and {nan_count_rating} do not contain a rating")

There are 1320000 rows and 1209406 do not contain a rating
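As a worked check on these numbers: 66 configurations with 20,000 rows each account for the 1,320,000 rows, and the share of search results without a rating follows directly from the printed counts. The figures below are copied from the outputs above; nothing is recomputed from the data.

```python
n_configs = 2 * 3 * 11          # normalization x combination x keyword weights
rows_per_pipeline = 20_000      # rows for a single run, printed above
total_rows = n_configs * rows_per_pipeline
unrated_rows = 1_209_406        # rows without a rating, printed above

unrated_share = unrated_rows / total_rows
print(f"{total_rows} rows, {unrated_share:.1%} without a rating")
```

Roughly nine out of ten retrieved documents have no rating, so the metrics below rest on a small fraction of the results, which is why `ratio_of_ratings` is tracked alongside them.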


## Calculate Metrics

Iterate over the queries in the query set and, for every combination of query and pipeline, calculate DCG@10, NDCG@10, precision@10, and the share of results that have a rating (`ratio_of_ratings`). Store the results in a DataFrame.
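The metric implementations live in the shared `utils` module, which is not shown here. For orientation, the following is a minimal sketch of one common way to define DCG@10, NDCG@10, and precision@10 over a ranked list of ratings. The gain formula `(2**rating - 1) / log2(rank + 1)`, treating unrated results as 0, and counting a result as relevant when its rating is above 0 are all assumptions; the shared utilities may make different choices, so this sketch is not expected to reproduce the notebook's numbers. The function names are hypothetical.

```python
import math


def dcg_at_k(ratings, k=10):
    """DCG@k for ratings in ranked order; None (unrated) counts as 0."""
    return sum(
        (2 ** (r or 0) - 1) / math.log2(rank + 1)
        for rank, r in enumerate(ratings[:k], start=1)
    )


def ndcg_at_k(ratings, reference_ratings, k=10):
    """DCG@k divided by the DCG@k of the ideal ordering of all known ratings for the query."""
    ideal = dcg_at_k(sorted(reference_ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0


def precision_at_k(ratings, k=10, threshold=0):
    """Share of the top k positions holding a result rated above `threshold`."""
    return sum(1 for r in ratings[:k] if r is not None and r > threshold) / k


# Ranked ratings for one query: relevant hits at ranks 2 and 5, unrated hits at ranks 1 and 4
ranked = [None, 3, 0, None, 1]
print(dcg_at_k(ranked), ndcg_at_k(ranked, [3, 1, 0]), precision_at_k(ranked))
```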


In [65]:
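# Import the shared utils file metrics.py under a separate name so that the list below
# does not shadow the module (re-running this cell would otherwise fail)
from utils import metrics as metrics_utils

# (name, function, reference search; currently unused) for every metric to calculate
metrics = [
    ("dcg", metrics_utils.dcg_at_10, None),
    ("ndcg", metrics_utils.ndcg_at_10, None),
    ("prec@10", metrics_utils.precision_at_k, None),
    ("ratio_of_ratings", metrics_utils.ratio_of_ratings, None)
]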

In [66]:
reference = {query: df for query, df in df_ratings.groupby("query_string")}

df_metrics = []
for m_name, m_function, ref_search in metrics:
    for (query_string, run), df_gr in df_merged.groupby(["query_string", "run"]):
        metric = m_function(df_gr, reference=reference[query_string])
        df_metrics.append(pd.DataFrame({
            "query": [query_string],
            "pipeline": [run],
            "metric": [m_name],
            "value": [metric],
        }))
df_metrics = pd.concat(df_metrics)

In [67]:
df_metrics

|  | query | pipeline | metric | value |
| --- | --- | --- | --- | --- |
| 0 | 09 g6 wheel not cover grey | l2arithmetic_mean0.0 | dcg | 0.0 |
| 0 | 09 g6 wheel not cover grey | l2arithmetic_mean0.1 | dcg | 0.0 |
| 0 | 09 g6 wheel not cover grey | l2arithmetic_mean0.2 | dcg | 0.0 |
| 0 | 09 g6 wheel not cover grey | l2arithmetic_mean0.3 | dcg | 0.0 |
| 0 | 09 g6 wheel not cover grey | l2arithmetic_mean0.4 | dcg | 0.0 |
| ... | ... | ... | ... | ... |
| 0 | youth size dirt bike helmet | min_maxharmonic_mean0.6 | ratio_of_ratings | 0.0 |
| 0 | youth size dirt bike helmet | min_maxharmonic_mean0.7 | ratio_of_ratings | 0.0 |
| 0 | youth size dirt bike helmet | min_maxharmonic_mean0.8 | ratio_of_ratings | 0.0 |
| 0 | youth size dirt bike helmet | min_maxharmonic_mean0.9 | ratio_of_ratings | 0.1 |


In [68]:
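# Match the file name to the selected dataset size
file_suffix = "_small" if small_b else ""
df_metrics.to_csv(f'../data/metrics_query_train{file_suffix}.csv', index=False)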

## Calculate Metrics per Pipeline by Averaging the Query Metrics

In [69]:
df_metrics_per_pipeline = df_metrics.pivot_table(index="pipeline", columns="metric", values="value", aggfunc=lambda x: x.mean().round(2))
df_metrics_per_pipeline = df_metrics_per_pipeline.reset_index()
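One caveat about the `aggfunc` above: it rounds each pipeline's mean to two decimals before sorting, so configurations whose means differ slightly end up tied (several l2/arithmetic_mean pipelines share an NDCG@10 of 0.28 in the tables below), and their order among each other is not meaningful. A sketch of the alternative is to rank on the unrounded means and round only for display. The per-query values here are made up for illustration; in the notebook, the equivalent change would be to pass `aggfunc="mean"` and apply `.round(2)` after sorting.

```python
from statistics import mean

# Made-up per-query NDCG@10 values for two pipelines (illustration only)
ndcg_per_query = {
    "pipeline_a": [0.281, 0.279, 0.284],
    "pipeline_b": [0.276, 0.280, 0.283],
}

rounded = {name: round(mean(values), 2) for name, values in ndcg_per_query.items()}
exact = {name: mean(values) for name, values in ndcg_per_query.items()}

# The rounded means tie, so sorting on them cannot separate the pipelines
print(rounded)
# Sorting on the exact means gives a well-defined order; round only when displaying
for name in sorted(exact, key=exact.get, reverse=True):
    print(name, round(exact[name], 4))
```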

### Top five Pipelines by NDCG@10 Descending

In [70]:
df_metrics_per_pipeline.sort_values(by='ndcg', ascending=False).head(5)

|  | pipeline | dcg | ndcg | prec@10 | ratio_of_ratings |
| --- | --- | --- | --- | --- | --- |
| 4 | l2arithmetic_mean0.4 | 6.12 | 0.28 | 0.31 | 0.32 |
| 5 | l2arithmetic_mean0.5 | 6.12 | 0.28 | 0.31 | 0.32 |
| 6 | l2arithmetic_mean0.6 | 6.07 | 0.28 | 0.31 | 0.32 |
| 7 | l2arithmetic_mean0.7 | 6.1 | 0.28 | 0.31 | 0.32 |
| 8 | l2arithmetic_mean0.8 | 6.07 | 0.28 | 0.31 | 0.32 |


### Top five Pipelines by DCG@10 Descending

In [71]:
df_metrics_per_pipeline.sort_values(by='dcg', ascending=False).head(5)

|  | pipeline | dcg | ndcg | prec@10 | ratio_of_ratings |
| --- | --- | --- | --- | --- | --- |
| 4 | l2arithmetic_mean0.4 | 6.12 | 0.28 | 0.31 | 0.32 |
| 5 | l2arithmetic_mean0.5 | 6.12 | 0.28 | 0.31 | 0.32 |
| 7 | l2arithmetic_mean0.7 | 6.1 | 0.28 | 0.31 | 0.32 |
| 9 | l2arithmetic_mean0.9 | 6.08 | 0.28 | 0.31 | 0.32 |
| 6 | l2arithmetic_mean0.6 | 6.07 | 0.28 | 0.31 | 0.32 |


### Top five Pipelines by Precision@10 Descending

In [72]:
df_metrics_per_pipeline.sort_values(by='prec@10', ascending=False).head(5)

|  | pipeline | dcg | ndcg | prec@10 | ratio_of_ratings |
| --- | --- | --- | --- | --- | --- |
| 4 | l2arithmetic_mean0.4 | 6.12 | 0.28 | 0.31 | 0.32 |
| 5 | l2arithmetic_mean0.5 | 6.12 | 0.28 | 0.31 | 0.32 |
| 6 | l2arithmetic_mean0.6 | 6.07 | 0.28 | 0.31 | 0.32 |
| 7 | l2arithmetic_mean0.7 | 6.1 | 0.28 | 0.31 | 0.32 |
| 8 | l2arithmetic_mean0.8 | 6.07 | 0.28 | 0.31 | 0.32 |


In [74]:
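# Match the file name to the selected dataset size
file_suffix = "_small" if small_b else ""
df_merged.to_csv(f'../data/results_and_ratings_query_set{file_suffix}.csv')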

In [76]:
# Inspect the results of one pipeline for a single query from the query set

query = 'youth size dirt bike helmet'

df_merged[(df_merged['query_string'] == query) & (df_merged['run'] == 'min_maxarithmetic_mean0.0')]

|  | query_id | query_string | product_id | position | relevance | run | rating |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 19900 | 249 | youth size dirt bike helmet | B07917XT2B | 0 | 1.000000 | min_maxarithmetic_mean0.0 |  |
| 19901 | 249 | youth size dirt bike helmet | B0745L42J8 | 1 | 0.829768 | min_maxarithmetic_mean0.0 |  |
| 19902 | 249 | youth size dirt bike helmet | B0874J3ZTK | 2 | 0.757054 | min_maxarithmetic_mean0.0 |  |
| 19903 | 249 | youth size dirt bike helmet | B07ZCVMPK8 | 3 | 0.757054 | min_maxarithmetic_mean0.0 |  |
| 19904 | 249 | youth size dirt bike helmet | B07BRVHW2J | 4 | 0.703876 | min_maxarithmetic_mean0.0 |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 19995 | 249 | youth size dirt bike helmet | B07H6Q7YPB | 95 | 0.024611 | min_maxarithmetic_mean0.0 |  |
| 19996 | 249 | youth size dirt bike helmet | B075FDFF69 | 96 | 0.010447 | min_maxarithmetic_mean0.0 |  |
| 19997 | 249 | youth size dirt bike helmet | B07BDF6CNJ | 97 | 0.008554 | min_maxarithmetic_mean0.0 |  |
| 19998 | 249 | youth size dirt bike helmet | B075RSKGSY | 98 | 0.003211 | min_maxarithmetic_mean0.0 |  |


In [78]:
df_metrics[(df_metrics['query'] == query) & (df_metrics['pipeline'] == 'min_maxarithmetic_mean0.0')]

|  | query | pipeline | metric | value |
| --- | --- | --- | --- | --- |
| 0 | youth size dirt bike helmet | min_maxarithmetic_mean0.0 | dcg | 1.20412 |
| 0 | youth size dirt bike helmet | min_maxarithmetic_mean0.0 | ndcg | 0.049301 |
| 0 | youth size dirt bike helmet | min_maxarithmetic_mean0.0 | prec@10 | 0.1 |
| 0 | youth size dirt bike helmet | min_maxarithmetic_mean0.0 | ratio_of_ratings | 0.1 |


## Evaluate the Best Hybrid Search Configuration

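We identified the best hybrid search configuration by running the training queries against all 66 combinations: l2 normalization with the arithmetic mean and a keyword weight of 0.4 (neural weight 0.6) ranks first by NDCG@10, DCG@10, and precision@10. Note that several l2/arithmetic_mean configurations tie on the rounded NDCG@10 and precision@10 values.

Now we run the test queries against this winning configuration and compare the resulting metrics with the baseline from notebook 3.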

In [82]:
# Parameters of the best hybrid search configuration found on the training set

norm = "l2"
combi = "arithmetic_mean"
keywordness = 0.4
neuralness = 0.6
pipeline_name = "l2arithmetic_mean0.4"
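Because the test run is labelled with a hard-coded `pipeline_name`, it can help to derive the label from the parameters in the same way the training DataFrame does (normalization, combination, and keyword weight concatenated as strings), so that a typo cannot silently mislabel the run. A minimal sketch, where `pipeline_label` is a hypothetical helper assuming the naming scheme of `df_hybrid_search_params`:

```python
def pipeline_label(normalization, combination, keyword_weight):
    """Hypothetical helper: build a run label like the 'pipeline' column of df_hybrid_search_params."""
    return f"{normalization}{combination}{keyword_weight}"


best = {"normalization": "l2", "combination": "arithmetic_mean", "keyword": 0.4}
label = pipeline_label(best["normalization"], best["combination"], best["keyword"])
print(label)
```

This relies on the keyword weights being rounded, as `round(i * 0.1, 1)` does: an unrounded `3 * 0.1` prints as `0.30000000000000004` and would produce a different label.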

In [84]:
# Collect one dict per search hit and build the DataFrame once after the loop
# (columns: query_id, query_string, product_id, position, relevance, run)
rows_test = []
url = "http://localhost:9200/ecommerce/_search"
df_test_queries = df_query_idx[df_query_idx['query'].isin(df_test['query_string'])]

# Send every test query as a hybrid search query; the best configuration is passed
# as a temporary search pipeline in the request body
for q in tqdm_notebook(df_test_queries.itertuples(), total=len(df_test_queries)):

    payload = {
      "_source": {
        "excludes": [
          "title_embedding"
        ]
      },
      "query": {
        "hybrid": {
          "queries": [
            {
              "multi_match" : {
                  "type":       "best_fields",
                  "fields":     [
                    "product_id^100",
                    "product_bullet_point^3",
                    "product_color^2",
                    "product_brand^5",
                    "product_description",
                    "product_title^10"
                  ],
                  "operator":   "and",
                  "query":      q.query
                }
            },
            {
              "neural": {
                "title_embedding": {
                  "query_text": q.query,
                  "k": 100
                }
              }
            }
          ]
        }
      },
      "search_pipeline": {
        "request_processors": [
          {
            "neural_query_enricher" : {
              "description": "one of many search pipelines for experimentation",
              "default_model_id": model_id,
              "neural_field_default_id": {
                "title_embedding": model_id
              }
            }
          }
        ],
        "phase_results_processors": [
          {
            "normalization-processor": {
              "normalization": {
                "technique": norm
              },
              "combination": {
                "technique": combi,
                "parameters": {
                  "weights": [
                    keywordness,
                    neuralness
                  ]
                }
              }
            }
          }
        ]
      },
      "size": 100
    }

    response = requests.request("POST", url, headers=headers, data=json.dumps(payload)).json()
    # Store every hit together with its position in the result set
    for position, hit in enumerate(response['hits']['hits']):
        rows_test.append({
            'query_id': str(q.idx),
            'query_string': q.query,
            'product_id': hit["_id"],
            'position': position,
            'relevance': hit["_score"],
            'run': pipeline_name
        })

df_relevance_test = pd.DataFrame(rows_test)


In [85]:
df_relevance_test.head(3)

|  | query_id | query_string | product_id | position | relevance | run |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 1 1/4 pop up bathroom sink drain without overflow | B07DN8NKMV | 0 | 0.138455 | l2arithmetic_mean0.4 |
| 1 | 1 | 1 1/4 pop up bathroom sink drain without overflow | B07DN8CZ21 | 1 | 0.135583 | l2arithmetic_mean0.4 |
| 2 | 1 | 1 1/4 pop up bathroom sink drain without overflow | B08MT7ZDN2 | 2 | 0.130087 | l2arithmetic_mean0.4 |


In [86]:
# Make sure query ids are strings (otherwise the merge keys do not match) and positions are integers
df_relevance_test['query_id'] = df_relevance_test['query_id'].astype(str)
df_relevance_test['position'] = df_relevance_test['position'].astype(int)

In [87]:
# Merge results on query_id and product_id so that the resulting DataFrame has the ratings together with the search results.
# validate='many_to_one' ensures there is at most one rating per query-doc pair.
df_merged_test = df_relevance_test.merge(df_unique_ratings, on=['query_id', 'product_id'], how='left', validate='many_to_one')
# Both DataFrames carry a query_string column; keep the one from the search results
df_merged_test = df_merged_test.drop(columns=['query_string_y']).rename(columns={"query_string_x": "query_string"})

df_merged_test.head(3)

In [88]:
df_metrics_test_set = []
for m_name, m_function, ref_search in metrics:
    for (query_string, run), df_gr in df_merged_test.groupby(["query_string", "run"]):
        metric = m_function(df_gr, reference=reference[query_string])
        df_metrics_test_set.append(pd.DataFrame({
            "query": [query_string],
            "pipeline": [run],
            "metric": [m_name],
            "value": [metric],
        }))
df_metrics_test_set = pd.concat(df_metrics_test_set)

In [89]:
df_metrics_test_set.head(3)

|  | query | pipeline | metric | value |
| --- | --- | --- | --- | --- |
| 0 | 1 1/4 pop up bathroom sink drain without overflow | l2arithmetic_mean0.4 | dcg | 1.722706 |
| 0 | 1 ‘ velcro without adhesive for sewing | l2arithmetic_mean0.4 | dcg | 2.150515 |
| 0 | 10x6 plastic register cover without vent | l2arithmetic_mean0.4 | dcg | 1.83505 |


In [90]:
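# Match the file name to the selected dataset size
file_suffix = "_small" if small_b else ""
df_metrics_test_set.to_csv(f'../data/metrics_query_test{file_suffix}.csv', index=False)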

In [91]:
df_metrics_test_per_pipeline = df_metrics_test_set.pivot_table(index="pipeline", columns="metric", values="value", aggfunc=lambda x: x.mean().round(2))
df_metrics_test_per_pipeline = df_metrics_test_per_pipeline.reset_index()

In [92]:
df_metrics_test_per_pipeline

|  | pipeline | dcg | ndcg | prec@10 | ratio_of_ratings |
| --- | --- | --- | --- | --- | --- |
| 0 | l2arithmetic_mean0.4 | 6.27 | 0.28 | 0.32 | 0.33 |


## Metrics for Small Query Set

Compared to the BM25 baseline from the previous notebook, the best hybrid search configuration improves all three metrics on the test queries:

| Metric    | Baseline BM25 | Global Hybrid Search Optimizer |
| --------- | ------------- | ------------------------------ |
| DCG       | 6.03          | 6.27                           |
| NDCG      | 0.26          | 0.28                           |
| Precision | 0.30          | 0.32                           |
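To put these differences in proportion, the relative improvements can be computed from the table values. The inputs are already rounded to two decimals, so the NDCG and precision changes (0.02 each) are coarse estimates.

```python
baseline = {"DCG": 6.03, "NDCG": 0.26, "Precision": 0.30}
hybrid = {"DCG": 6.27, "NDCG": 0.28, "Precision": 0.32}

# Relative change of the hybrid configuration over the BM25 baseline
relative_gain = {m: (hybrid[m] - baseline[m]) / baseline[m] for m in baseline}
for metric, gain in relative_gain.items():
    print(f"{metric}: {gain:+.1%}")
```

This gives roughly +4.0% DCG, +7.7% NDCG, and +6.7% precision.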
