## Step 1: Setup - Connecting to Elasticsearch and preparing trec_eval
In this step, we make sure our environment is ready. 
We connect to the search engine and prepare the **official NIST trec_eval tool**. 
This tool is written in C, so we need to compile it using the `make` command.
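
The compile step can also be triggered from Python. This is only a minimal sketch: it assumes the trec_eval source code has already been downloaded into a local `trec_eval/` folder (the folder used by the executable path later in this notebook). The helper name `compile_trec_eval` and its `run` parameter are made up for illustration.

```python
import subprocess
from pathlib import Path

def compile_trec_eval(source_dir="trec_eval", run=subprocess.run):
    """run 'make' inside source_dir and return the expected path of the binary."""
    source = Path(source_dir)
    # check=True raises CalledProcessError if the build fails
    run(["make"], cwd=source, check=True)
    return source / "trec_eval"

# usage (needs 'make' and a C compiler on the system):
# compile_trec_eval("trec_eval")
```

The `run` parameter exists only so the function can be tried out without actually starting a build.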

In [16]:
import pandas as pd
import json
import nltk
from elasticsearch import Elasticsearch, helpers
from tqdm import tqdm
import collections
import math
import os
import subprocess
from nltk.corpus import wordnet, stopwords

In [17]:
# --- project configuration: change these paths to match your system ---

# 1. search engine settings
index_name = "trec_product_search"

# 2. data file paths (where your corpus and queries are)
corpus_path  = "product_catalogue_esci.jsonl"
queries_path = "qid2query.tsv"
qrels_path   = "product-search-dev.qrels.txt"

In [18]:
# --- part 1: connect to elasticsearch ---
# we assume the database is running on the standard local port
es = Elasticsearch("http://localhost:9200")

# check if the connection was successful
if es.ping():
    print(f"connected to {es.info()['version']['number']}")
else:
    print("ERROR with elasticsearch, check if running")


connected to 9.1.4


## 🛠️ Step 2: Setting up the Grading Tool
For the final evaluation, we use the official **trec_eval** tool.
Because the tool is compiled locally, the location of the executable
can differ from system to system. If you are the examiner, please update the
`trec_eval_executable` path below to point to your compiled version.
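
Once the path is correct, the tool can also be called from Python and its scores read into a dictionary. The sketch below assumes the output layout that appears later in this notebook (metric name, `all`, value). The function names `parse_trec_eval_output` and `run_trec_eval` are illustrative helpers, not part of trec_eval.

```python
import subprocess

def parse_trec_eval_output(text):
    """turn lines like 'recip_rank  all  0.4039' into {'recip_rank': 0.4039}."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        # expected layout: metric name, 'all' (summary over all queries), value
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                continue  # the value is not numeric, skip this line
    return scores

def run_trec_eval(executable, qrels_file, run_file, metrics=("recip_rank",)):
    """call trec_eval with one '-m' flag per metric and return the parsed scores."""
    command = [executable]
    for metric in metrics:
        command += ["-m", metric]
    command += [qrels_file, run_file]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_trec_eval_output(completed.stdout)

# usage (needs the compiled tool plus a qrels file and a run file):
# run_trec_eval("./trec_eval/trec_eval", "product-search-dev.qrels.txt",
#               "run_baseline.txt", metrics=("recip_rank", "recall.100"))
```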

In [19]:
# --- examiner: change this path if your trec_eval is in a different folder ---
# this variable points directly to the compiled 'trec_eval' file
trec_eval_executable = "./trec_eval/trec_eval"

# --- check if the tool is accessible ---
# we check if the file exists at the path provided above
if os.path.exists(trec_eval_executable):
    print(f"success: grading tool found at {trec_eval_executable}")
    
    # we test if the tool actually runs by starting it without any arguments
    # (we only check that the process starts; the exit code is not inspected)
    # we use 'subprocess' to run a quick terminal command
    try:
        # this is like typing './trec_eval/trec_eval' in the terminal
        test_run = subprocess.run([trec_eval_executable], capture_output=True, text=True)
        print("✅ success: tool is executable and responding.")
    except Exception as error_message:
        print(f"error: the tool was found but cannot run. error: {error_message}")
else:
    print(f"error: trec_eval not found at {trec_eval_executable}")
    print("please check the path or compile the tool in your terminal first")

# --- define the metrics we want to measure ---
# we use the standard metrics for search engine quality:
# 1. recip_rank: how high up is the first correct result? (MRR)
# 2. recall.100: did we find the correct results in our top 100?
# 3. ndcg_cut.100: how good is the overall ranking of the top 100?
evaluation_metrics = "-m recip_rank -m recall.100 -m ndcg_cut.100"

success: grading tool found at ./trec_eval/trec_eval
✅ success: tool is executable and responding.


## 📊 Step 3: Indexing - Building the Product Database
In this step, we prepare the search engine to receive our data. 
We define a **Custom Analyzer** called `product_cleaner`. This cleaner ensures that:
* **Lowercase**: Searching for "SONY" or "sony" gives the same result.
* **Stop Words**: Common words like "a", "the", and "is" are ignored because they carry little meaning and would only add noise to the matching.
* **Stemming**: Words are reduced to their root, so searching for "gaming" also finds products labeled "games" or "game".

We merge the product description and bullet points into a single field called `contents` to give the search engine more data to work with.
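
To see what the analyzer does to a piece of text, Elasticsearch's analyze API can be used once the index exists. The sketch below assumes `client` is a connected `Elasticsearch` object and that the index was created with the `product_cleaner` analyzer. The helper name `analyzed_tokens` is made up for illustration.

```python
def analyzed_tokens(client, index, analyzer, text):
    """return the list of tokens that the given analyzer produces for text."""
    response = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [entry["token"] for entry in response["tokens"]]

# usage (needs a running elasticsearch and the index from this step):
# analyzed_tokens(es, index_name, "product_cleaner", "The SONY Gaming Phones")
```

Comparing the returned tokens for a query and for a product title is a quick way to check whether the two can match at all.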

In [20]:
index_configurations = {
  "settings": {
    "analysis": {
      "analyzer": {
        "product_cleaner": {
          "type": "custom",
          "tokenizer": "standard", # splits sentences into individual words
          "filter": [
              "lowercase",       # makes everything lower case
              "my_stop_words",   # removes boring words (the, a, is)
              "my_stemmer"       # chops words to their root (phones -> phone)
          ]
        }
      },
      "filter": {
        "my_stemmer": {
            "type": "stemmer", 
            "language": "english"
        },
        "my_stop_words": {
            "type": "stop", 
            "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {"type": "keyword"}, # 'keyword' means we want the exact ID, no cleaning
      "title": {
          "type": "text", 
          "analyzer": "product_cleaner"
      },
      "contents": {
          "type": "text", 
          "analyzer": "product_cleaner"
      },
      "brand": {
          "type": "text", 
          "analyzer": "product_cleaner"
      }
    }
  }
}

In [21]:
# --- 2. create the index in elasticsearch ---
# first, we delete the index if it already exists (to start fresh)
if es.indices.exists(index=index_name):
    print(f"deleting old index: {index_name}")
    es.indices.delete(index=index_name)

# now we create the index using the configurations we defined above
es.indices.create(index=index_name, body=index_configurations)
print(f"✅ success: fresh index '{index_name}' created.")

deleting old index: trec_product_search
✅ success: fresh index 'trec_product_search' created.


In [None]:
# --- 3. read the file and upload the data ---
def prepare_products_for_upload():
    """ 
    this helper function reads our json file line-by-line 
    and prepares a 'bundle' for the search engine.
    """
    print(f"reading data from: {corpus_path}")
    
    with open(corpus_path, 'r', encoding='utf-8') as file:
        for line in file:
            try:
                # convert the text line into a python dictionary
                item = json.loads(line)
                
                # identify the correct ID
                # we prefer 'trecid' if it exists, otherwise we use 'product_id'
                final_id = str(item.get("trecid") or item.get("product_id"))
                
                # combine the description and bullet points into one 'contents' field
                description = item.get("product_description", "") or ""
                bullets = item.get("product_bullet_point", "") or ""
                full_text_contents = f"{description} {bullets}"

                # yield (give back) the formatted product to the uploader
                yield {
                    "_index": index_name,
                    "_id": final_id,
                    "_source": {
                        "id": final_id,
                        "title": item.get("product_title", "") or "",
                        "contents": full_text_contents,
                        "brand": item.get("product_brand", "") or ""
                    }
                }
            except Exception as e:
                # if one line is broken, we skip it and keep going
                continue

# --- 4. run the bulk uploader ---
# we use 'helpers.bulk' to upload many products at once (very fast)
print("starting indexing, expect this to take 2-3 minutes")
# helpers.bulk returns the number of successes and a list of failed items
success_count, errors = helpers.bulk(es, prepare_products_for_upload(), chunk_size=5000)

print(f"success: indexed {success_count} products")
if errors:
    print(f"error: {len(errors)} products failed to index.")

starting upload... please wait.
reading data from: product_catalogue_esci.jsonl
✅ success: indexed 1118990 products.


In [23]:
# first look into the corpus
with open(corpus_path, 'r', encoding='utf-8') as f:
    for i in range(20):
        print(json.loads(f.readline()))

# we find the fields 'product_id', 'product_title',  'product_description', 'product_bullet_point', 'product_brand', 'product_color_name', 'product_locale', 'trecid'

{'product_id': 'B003O0MNGC', 'product_title': 'Delta BreezSignature VFB25ACH 80 CFM Exhaust Bath Fan with Humidity Sensor', 'product_description': None, 'product_bullet_point': 'Virtually silent at less than 0.3 sones\nPrecision engineered with DC brushless motor for extended reliability\nEasily switch in and out of humidity sensing mode by toggling wall switch\nENERGY STAR qualified for efficient cost-saving operation\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances', 'product_brand': 'DELTA ELECTRONICS (AMERICAS) LTD.', 'product_color_name': 'White', 'product_locale': 'us', 'trecid': 201460}
{'product_id': 'B00MARNO5Y', 'product_title': 'Aero Pure AP80RVLW Super Quiet 80 CFM Recessed Fan/Light Bathroom Ventilation Fan with White Trim Ring', 'product_description': None, 'product_bullet_point': 'Super quiet 80CFM energy efficient fan virtually disappears into the ceiling leaving only a recessed light in view\nMay be

## 🧠 Strategy A: Semantic Search (Query Expansion)
The goal of this strategy is to solve the **"Vocabulary Mismatch"** problem. 
For example, if a user searches for "sneakers," but a product is listed as "running shoes," a standard search might miss it. 

We use the **NLTK WordNet** library to:
1. Identify the words in the user's query.
2. Find synonyms (synsets) for those words.
3. Add those synonyms to the search query to "cast a wider net."

We also apply **Field Boosting** here, so that matches in the `title` and `brand` fields still count more than matches in `contents`, including matches on the added synonyms.
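
The three steps can be illustrated without WordNet. The sketch below uses a small hard-coded synonym table (`TOY_SYNONYMS`, invented for this example) in place of the real dictionary, so it only shows the mechanics of expansion, not the quality of WordNet's suggestions.

```python
# hypothetical synonym table standing in for the wordnet lookup
TOY_SYNONYMS = {
    "sneakers": ["running shoes", "trainers"],
    "phone": ["telephone"],
}

def expand_query(query, synonyms=TOY_SYNONYMS):
    """keep every original word and append its synonyms, without duplicates."""
    expanded = []
    for word in query.split():
        expanded.append(word)                               # step 1: the user's word
        expanded.extend(synonyms.get(word.lower(), []))     # step 2: its synonyms
    # step 3: dict.fromkeys removes duplicates but keeps the first-seen order
    return " ".join(dict.fromkeys(expanded))

print(expand_query("red sneakers"))  # prints: red sneakers running shoes trainers
```

With this expansion, a product listed as "running shoes" can now match a query that only said "sneakers".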

In [29]:
# --- 1. setup the synonym dictionary ---
# we check if the wordnet data is already downloaded
try:
    nltk.data.find('corpora/wordnet')
except LookupError:
    print("downloading wordnet dictionary...")
    nltk.download('wordnet')
    nltk.download('omw-1.4')
    nltk.download('stopwords')

# we create a list of common 'stop words' (like 'the', 'is') to ignore
english_stop_words = set(stopwords.words('english'))

# we create a 'memory' (cache) so we don't have to look up the same word twice
synonym_memory_cache = {}

def get_synonyms_for_query(user_text):
    """
    this function takes a sentence and adds synonyms for each word.
    example: 'phone' -> 'phone telephone cellular'
    """
    individual_words = user_text.split()
    expanded_words_list = []
    
    for word in individual_words:
        # always keep the original word the user typed!
        expanded_words_list.append(word)
        
        # if we already looked up this word, get it from memory
        if word in synonym_memory_cache:
            expanded_words_list.extend(synonym_memory_cache[word])
            continue
            
        # we only look for synonyms for important words (not 'the', 'a', etc.)
        if word.lower() not in english_stop_words and len(word) > 2:
            found_synonyms = set()
            try:
                # we look for 'nouns' (objects) specifically
                for synset in wordnet.synsets(word, pos=wordnet.NOUN):
                    for lemma in synset.lemmas():
                        # wordnet uses underscores like 'cell_phone', we replace with spaces
                        clean_synonym = lemma.name().replace('_', ' ')
                        
                        # add it to our set if it's actually a different word
                        if clean_synonym.lower() != word.lower():
                            found_synonyms.add(clean_synonym)
            except Exception:
                pass # if wordnet fails, we just move to the next word
            
            # store the results in our memory cache
            synonym_list = list(found_synonyms)
            synonym_memory_cache[word] = synonym_list
            expanded_words_list.extend(synonym_list)
            
    # remove any duplicates and join the words back into one string
    # (a set does not keep the word order, which does not matter for an 'or' query)
    return " ".join(set(expanded_words_list))

def search_strategy_a_semantic(query_text, result_limit=100):
    """
    runs the search using our synonym expansion and field boosting.
    """
    # first, expand the query
    rich_query = get_synonyms_for_query(query_text)
    
    # define the search body for elasticsearch
    search_body = {
        "size": result_limit,
        "query": {
            "multi_match": {
                "query": rich_query,
                # we boost the title (3x) and brand (2x) because they are most important
                "fields": ["title^3", "contents", "brand^2"],
                # we use 'or' because a document won't have ALL synonyms
                "operator": "or" 
            }
        }
    }
    
    # execute the search
    response = es.search(index=index_name, body=search_body)
    return response['hits']['hits']

# quick test to see if it works
test_query = "mobile phone"
print(f"original: {test_query}")
print(f"expanded: {get_synonyms_for_query(test_query)}")

downloading wordnet dictionary...
original: mobile phone
expanded: sound telephone earphone Mobile River mobile telephone set earpiece speech sound headphone phone


[nltk_data] Downloading package wordnet to /home/marvin/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/marvin/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package stopwords to /home/marvin/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## 🚀 Strategy B: Structural Boosting (Optimized Weights)
This strategy focuses on the **Importance of Fields**. We rely on the high quality of specific metadata.

Instead of expanding the query with synonyms, we search for the exact words the user provided but tell the engine which fields "matter" more. We use **Field Boosting** (using the `^` symbol) to assign weights:
* **Title (^3)**: The most important indicator of relevance.
* **Brand (^2)**: Very important for brand-specific searches.
* **Color (^1.5)**: Important for specific aesthetic queries.
* **Contents (^1)**: Provides context but is secondary to the title.

We use the `best_fields` type, which scores each product by its single best-matching field (after boosting), so a strong match in one field is enough for a high score.
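
How boosts and `best_fields` interact can be shown with a small calculation. This is a simplified sketch: the per-field scores below are invented numbers, whereas the real scores come from Elasticsearch's relevance model and depend on the index. The function name `best_fields_score` is illustrative only.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """combine per-field scores the way a best_fields multi_match does:
    take the highest boosted score, plus tie_breaker times the other scores."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)

boosts = {"title": 3, "brand": 2, "color": 1.5, "contents": 1}
# hypothetical raw scores of one product for one query
field_scores = {"title": 2.0, "brand": 0.5, "contents": 4.0}

print(best_fields_score(field_scores, boosts))  # prints: 6.0
```

Without boosting, the `contents` match (4.0) would decide the score. With the title weighted three times, the weaker title match (2.0 x 3 = 6.0) wins instead.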

In [30]:
def search_strategy_b_boosting(user_query, result_limit=100):
    """
    executes the search using exact terms but with prioritized field weights.
    we include the 'color' field here for more precise matching.
    """
    
    # we build the search request for elasticsearch
    search_body = {
        "size": result_limit,
        "query": {
            "multi_match": {
                "query": user_query,
                # we define our boosting weights here:
                "fields": [
                    "title^3",    # title is 3x more important
                    "brand^2",    # brand is 2x more important
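                    # note: the mapping and upload in step 3 do not index a 'color' field,
                    # so this entry only has an effect once that field is added to the index
                    "color^1.5",  # color is 1.5x more important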
                    "contents"    # contents has the standard weight (1)
                ],
                
                # 'best_fields' finds the single field that matches best 
                # and uses its score. this is great for product search.
                "type": "best_fields"
            }
        }
    }
    
    # execute the search against our index
    response = es.search(index=index_name, body=search_body)
    
    # return the list of hits
    return response['hits']['hits']

# simple test run
test_search = "black wireless headphones"
print(f"testing structural boosting for: '{test_search}'")
# results = search_strategy_b_boosting(test_search)

testing structural boosting for: 'black wireless headphones'


## 📊 Final Step: Scientific Comparison of Strategies
In this final stage, we run every strategy on the same set of queries to see which one performs best.
We evaluate the performance using three standard retrieval metrics:
* **MRR (Mean Reciprocal Rank)**: Measures how quickly the first relevant result appears.
* **Recall@100**: Measures what percentage of relevant products we actually found in the top 100.
* **NDCG@100**: Measures the overall quality and "ranking order" of the top 100 results.
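
For a single query, the three metrics can be computed by hand. The sketch below is a simplification that treats relevance as binary (a product is either correct or not). trec_eval works with the relevance values from the qrels file, so its NDCG numbers can differ from this version. The product IDs are invented.

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0 if none was found."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """share of all relevant products that appear in the top k."""
    if not relevant_ids:
        return 0.0
    found = len(set(ranked_ids[:k]) & set(relevant_ids))
    return found / len(relevant_ids)

def binary_ndcg_at_k(ranked_ids, relevant_ids, k):
    """dcg of the ranking divided by the dcg of a perfect ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc_id in enumerate(ranked_ids[:k]) if doc_id in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranking = ["p7", "p2", "p9", "p4"]   # what the engine returned (hypothetical)
relevant = {"p2", "p4", "p5"}        # what the answer key lists (hypothetical)

print(reciprocal_rank(ranking, relevant))      # first hit at rank 2
print(recall_at_k(ranking, relevant, 100))     # 2 of 3 relevant products found
print(binary_ndcg_at_k(ranking, relevant, 100))
```

The scores reported by trec_eval are these per-query values averaged over all evaluated queries.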

We will generate three **Run Files** (`.txt` files) in the standard TREC format and then pass them 
to the `trec_eval` tool for the final grades.
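
Each line of a run file has six columns: query ID, the literal `Q0`, document ID, rank, score, and a run name. A minimal sketch of the formatting, using invented hits in the shape that Elasticsearch returns (`_id` and `_score`), with hypothetical helper names:

```python
def format_run_line(query_id, doc_id, rank, score, run_name):
    """build one line in trec run format: qid Q0 doc_id rank score run_name."""
    return f"{query_id} Q0 {doc_id} {rank} {score:.4f} {run_name}"

def format_run(query_id, hits, run_name):
    """turn a ranked list of hits into run lines; ranks start at 1."""
    return [
        format_run_line(query_id, hit["_id"], rank, hit["_score"], run_name)
        for rank, hit in enumerate(hits, start=1)
    ]

# hypothetical hits for a hypothetical query id 42
example_hits = [{"_id": "201460", "_score": 12.3456789}, {"_id": "15", "_score": 9.5}]
for line in format_run("42", example_hits, "baseline"):
    print(line)
# prints:
# 42 Q0 201460 1 12.3457 baseline
# 42 Q0 15 2 9.5000 baseline
```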

In [32]:
def run_the_final_experiment(query_limit=1000):
    """
    this function runs all three strategies and compares them 
    using the trec_eval tool.
    """
    print("--- starting the final scientific experiment ---")
    
    # 1. load the answer key (qrels) so we know which products are actually correct
    # we only want to test queries that have known correct answers
    print(f"loading answer key from: {qrels_path}")
    correct_answers_map = collections.defaultdict(set)
    valid_query_ids = set()
    
    with open(qrels_path, 'r') as f:
        for line in f:
            parts = line.strip().split()
            # if the relevance score is greater than 0, it is a correct match
            if len(parts) == 4 and int(parts[3]) > 0:
                valid_query_ids.add(parts[0])
                correct_answers_map[parts[0]].add(parts[2])

    # 2. load the user queries from the tsv file
    print(f"loading queries from: {queries_path}")
    with open(queries_path, 'r') as f:
        all_query_lines = f.readlines()

    # 3. prepare the files where we will save our search results (run files)
    # we open them in 'write' mode to start fresh
    file_baseline = open("run_baseline.txt", "w")
    file_wordnet  = open("run_wordnet.txt", "w")
    file_boosting = open("run_boosting.txt", "w")
    
    processed_count = 0
    progress_bar = tqdm(total=query_limit)
    
    # 4. the main experiment loop
    for line in all_query_lines:
        if processed_count >= query_limit: 
            break
            
        parts = line.strip().split('\t')
        if len(parts) < 2: 
            continue
            
        query_id = parts[0]
        query_text = parts[1]
        
        # skip this query if we don't have an answer key for it
        if query_id not in valid_query_ids: 
            continue
            
        # --- RUN 1: BASELINE (Control Group) ---
        # we search with equal weights and no synonyms
        # size goes inside the body; passing it next to 'body' triggers a DeprecationWarning
        results_baseline = es.search(index=index_name, body={
            "size": 100,
            "query": {"multi_match": {"query": query_text, "fields": ["title", "contents", "brand"]}}
        })['hits']['hits']
        
        for i, hit in enumerate(results_baseline):
            # write in trec format: qid Q0 doc_id rank score run_name
            file_baseline.write(f"{query_id} Q0 {hit['_id']} {i+1} {hit['_score']:.4f} baseline\n")
            
        # --- RUN 2: WORDNET (Strategy A) ---
        results_wordnet = search_strategy_a_semantic(query_text, result_limit=100)
        for i, hit in enumerate(results_wordnet):
            file_wordnet.write(f"{query_id} Q0 {hit['_id']} {i+1} {hit['_score']:.4f} wordnet\n")
            
        # --- RUN 3: BOOSTING (Strategy B) ---
        results_boosting = search_strategy_b_boosting(query_text, result_limit=100)
        for i, hit in enumerate(results_boosting):
            file_boosting.write(f"{query_id} Q0 {hit['_id']} {i+1} {hit['_score']:.4f} boosting\n")
            
        processed_count += 1
        progress_bar.update(1)

    # close all files to save the data
    progress_bar.close()
    file_baseline.close(); file_wordnet.close(); file_boosting.close()
    
    # 5. call the official trec_eval tool (the executable configured in step 2)
    print("\n\n📊 --- FINAL SCIENTIFIC SCORES --- 📊")
    
    # we use the executable path and metrics we defined in step 2
    # the command looks like: ./trec_eval/trec_eval -m ... qrels_file run_file
    
    print("\n>>> 1. BASELINE (The Control)")
    !{trec_eval_executable} {evaluation_metrics} {qrels_path} run_baseline.txt
    
    print("\n>>> 2. STRATEGY A (WordNet / Semantic)")
    !{trec_eval_executable} {evaluation_metrics} {qrels_path} run_wordnet.txt
    
    print("\n>>> 3. STRATEGY B (Boosting / Structural)")
    !{trec_eval_executable} {evaluation_metrics} {qrels_path} run_boosting.txt

# --- run the finale! ---
run_the_final_experiment(1000)

--- starting the final scientific experiment ---
loading answer key from: product-search-dev.qrels.txt
loading queries from: qid2query.tsv



100%|██████████| 1000/1000 [00:19<00:00, 50.24it/s]




📊 --- FINAL SCIENTIFIC SCORES --- 📊

>>> 1. BASELINE (The Control)
recip_rank            	all	0.4039
recall_100            	all	0.3650
ndcg_cut_100          	all	0.2829

>>> 2. STRATEGY A (WordNet / Semantic)
recip_rank            	all	0.2726
recall_100            	all	0.2536
ndcg_cut_100          	all	0.1856

>>> 3. STRATEGY B (Boosting / Structural)
recip_rank            	all	0.4526
recall_100            	all	0.3858
ndcg_cut_100          	all	0.3079


In [34]:
def big_parameter_tuning(sample_size=200):
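    """
    this function tests every combination of field weights (grid search)
    and returns the combination with the best average NDCG@10.
    progress is shown in one progress bar instead of prints inside the loop.
    """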
    print(f"--- starting grid search on {sample_size} queries ---")
    
    title_weights       = [1, 2, 3, 4, 5]
    description_weights = [1, 2, 3]
    brand_weights       = [1, 2, 3, 4, 5]
    color_weights       = [1, 2, 3]
    
    best_ndcg = 0
    best_weights = {}

    # load the answer key
    qrels = collections.defaultdict(set)
    with open(qrels_path, 'r') as f:
        for line in f:
            p = line.split()
            # skip malformed lines; a relevance score above 0 marks a correct match
            if len(p) == 4 and int(p[3]) > 0:
                qrels[p[0]].add(p[2])

    # load queries
    with open(queries_path, 'r') as f:
        all_lines = f.readlines()
        sample_queries = []
        for line in all_lines:
            parts = line.strip().split('\t')
            if len(parts) >= 2 and parts[0] in qrels:
                # keep only (query_id, query_text) so the loop below can unpack two values
                sample_queries.append(parts[:2])
            if len(sample_queries) >= sample_size:
                break

    total_combos = len(title_weights) * len(description_weights) * len(brand_weights) * len(color_weights)
    
    # one progress bar for the whole grid; no prints inside the loop keeps the output readable
    pbar = tqdm(total=total_combos, desc="optimizing weights")

    for t_w in title_weights:
        for d_w in description_weights:
            for b_w in brand_weights:
                for c_w in color_weights:
                    scores = []
                    for qid, txt in sample_queries:
                        # size goes inside the body to avoid DeprecationWarnings
                        search_body = {
                            "size": 10,
                            "query": {
                                "multi_match": {
                                    "query": txt,
                                    "fields": [
                                        f"title^{t_w}", 
                                        f"contents^{d_w}", 
                                        f"brand^{b_w}", 
                                        f"color^{c_w}"
                                    ],
                                    "type": "best_fields"
                                }
                            }
                        }
                        
                        # we only pass index and body
                        res = es.search(index=index_name, body=search_body)['hits']['hits']
                        
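                        # binary NDCG@10: a relevant hit at position i (0-based) adds 1/log2(i+2)
                        dcg = sum(1.0/math.log2(i+2) for i, h in enumerate(res) if h['_id'] in qrels[qid])
                        # ideal dcg: all relevant products (at most 10) ranked at the very top
                        idcg = sum(1.0/math.log2(i+2) for i in range(min(len(qrels[qid]), 10)))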
                        if idcg > 0:
                            scores.append(dcg/idcg)
                    
                    avg_ndcg = sum(scores) / len(scores) if scores else 0
                    
                    if avg_ndcg > best_ndcg:
                        best_ndcg = avg_ndcg
                        best_weights = {"title": t_w, "contents": d_w, "brand": b_w, "color": c_w}
                        # update progress bar with the current best score
                        pbar.set_postfix({"best_ndcg": f"{best_ndcg:.4f}"})
                    
                    pbar.update(1)
                    
    pbar.close()
    return best_weights

# run the tuner
final_optimized_weights = big_parameter_tuning(200)

--- starting grid search on 200 queries ---


optimizing weights: 100%|██████████| 225/225 [01:55<00:00,  1.94it/s, best_ndcg=0.2264]
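
The 225 steps in the progress bar correspond to 5 x 3 x 5 x 3 weight combinations. As a sketch of an alternative to the four nested loops, the same grid can be enumerated with `itertools.product`. The helper names `weight_grid` and `fields_for` are made up for this example.

```python
from itertools import product

title_weights       = [1, 2, 3, 4, 5]
description_weights = [1, 2, 3]
brand_weights       = [1, 2, 3, 4, 5]
color_weights       = [1, 2, 3]

def weight_grid():
    """yield one dictionary per weight combination, in the same order as nested loops."""
    for t_w, d_w, b_w, c_w in product(title_weights, description_weights,
                                      brand_weights, color_weights):
        yield {"title": t_w, "contents": d_w, "brand": b_w, "color": c_w}

def fields_for(weights):
    """turn a weight dictionary into the 'field^boost' strings used by multi_match."""
    return [f"{field}^{boost}" for field, boost in weights.items()]

combos = list(weight_grid())
print(len(combos))            # prints: 225
print(fields_for(combos[0]))  # prints: ['title^1', 'contents^1', 'brand^1', 'color^1']
```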


In [35]:
def search_strategy_c_optimized(user_query, result_limit=100):
    """
    the ultimate search strategy using weights found by the grid search.
    """
    # we pull the weights from the 'final_optimized_weights' variable
    t_w = final_optimized_weights.get('title', 3)
    d_w = final_optimized_weights.get('contents', 1)
    b_w = final_optimized_weights.get('brand', 2)
    c_w = final_optimized_weights.get('color', 1)

    search_body = {
        "size": result_limit,
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": [
                    f"title^{t_w}", 
                    f"contents^{d_w}", 
                    f"brand^{b_w}", 
                    f"color^{c_w}"
                ],
                "type": "best_fields"
            }
        }
    }
    
    response = es.search(index=index_name, body=search_body)
    return response['hits']['hits']

In [36]:
def run_final_scientific_experiment(query_limit=1000):
    print("--- 🔬 starting the final scientific experiment ---")
    
    # 1. create the result files
    file_base = open("run_baseline.txt", "w")
    file_wordnet = open("run_wordnet.txt", "w")
    file_manual = open("run_manual_boost.txt", "w")
    file_optimized = open("run_ml_optimized.txt", "w")
    
    # 2. load queries and answer key
    qrels = collections.defaultdict(set)
    with open(qrels_path, 'r') as f:
        for line in f:
            p = line.split()
            # skip malformed lines; a relevance score above 0 marks a correct match
            if len(p) == 4 and int(p[3]) > 0:
                qrels[p[0]].add(p[2])

    with open(queries_path, 'r') as f:
        all_query_lines = f.readlines()

    processed = 0
    # we clean up the tqdm output to prevent empty rows
    pbar = tqdm(total=query_limit, desc="running search strategies", leave=True)
    
    for line in all_query_lines:
        if processed >= query_limit: break
        
        parts = line.strip().split('\t')
        if len(parts) < 2 or parts[0] not in qrels: continue
        
        qid, txt = parts[0], parts[1]
        
        # --- A. Baseline ---
        # we pass size inside the body to avoid deprecation warnings
        res_base = es.search(index=index_name, body={
            "size": 100, 
            "query": {"multi_match": {"query": txt, "fields": ["title", "contents", "brand"]}}
        })['hits']['hits']
        for i, h in enumerate(res_base):
            file_base.write(f"{qid} Q0 {h['_id']} {i+1} {h['_score']:.4f} baseline\n")
            
        # --- B. WordNet (Strategy A) ---
        res_wn = search_strategy_a_semantic(txt, result_limit=100)
        for i, h in enumerate(res_wn):
            file_wordnet.write(f"{qid} Q0 {h['_id']} {i+1} {h['_score']:.4f} wordnet\n")
            
        # --- C. Manual Boosting (Strategy B) ---
        res_man = search_strategy_b_boosting(txt, result_limit=100)
        for i, h in enumerate(res_man):
            file_manual.write(f"{qid} Q0 {h['_id']} {i+1} {h['_score']:.4f} manual\n")
            
        # --- D. ML Optimized (The Winner) ---
        res_opt = search_strategy_c_optimized(txt, result_limit=100)
        for i, h in enumerate(res_opt):
            file_optimized.write(f"{qid} Q0 {h['_id']} {i+1} {h['_score']:.4f} optimized\n")
            
        processed += 1
        pbar.update(1)

    pbar.close()
    file_base.close(); file_wordnet.close(); file_manual.close(); file_optimized.close()
    
    # 3. Grade everything with trec_eval
    print("\n\n📊 --- FINAL SCIENTIFIC GRADES --- 📊")
    
    # we use the metrics and tool path we defined in Step 2
    cmd_prefix = f"{trec_eval_executable} {evaluation_metrics} {qrels_path}"
    
    print("\n>>> 1. BASELINE (Control)")
    !{cmd_prefix} run_baseline.txt
    
    print("\n>>> 2. STRATEGY A (WordNet)")
    !{cmd_prefix} run_wordnet.txt
    
    print("\n>>> 3. STRATEGY B (Manual Boosting)")
    !{cmd_prefix} run_manual_boost.txt
    
    print("\n>>> 4. OPTIMIZED (ML Grid Search Results)")
    !{cmd_prefix} run_ml_optimized.txt

# Execute the final run
run_final_scientific_experiment(1000)

--- 🔬 starting the final scientific experiment ---


running search strategies: 100%|██████████| 1000/1000 [00:22<00:00, 45.38it/s]




📊 --- FINAL SCIENTIFIC GRADES --- 📊

>>> 1. BASELINE (Control)
recip_rank            	all	0.4039
recall_100            	all	0.3650
ndcg_cut_100          	all	0.2829

>>> 2. STRATEGY A (WordNet)
recip_rank            	all	0.2726
recall_100            	all	0.2536
ndcg_cut_100          	all	0.1856

>>> 3. STRATEGY B (Manual Boosting)
recip_rank            	all	0.4526
recall_100            	all	0.3858
ndcg_cut_100          	all	0.3079

>>> 4. OPTIMIZED (ML Grid Search Results)
recip_rank            	all	0.4549
recall_100            	all	0.4027
ndcg_cut_100          	all	0.3181
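
To make the grades easier to compare, the change of each strategy relative to the baseline can be computed from the scores printed above. The sketch only reuses those reported numbers; the function name `relative_change_percent` is illustrative.

```python
# scores copied from the trec_eval output above
baseline = {"recip_rank": 0.4039, "recall_100": 0.3650, "ndcg_cut_100": 0.2829}
runs = {
    "wordnet":      {"recip_rank": 0.2726, "recall_100": 0.2536, "ndcg_cut_100": 0.1856},
    "manual_boost": {"recip_rank": 0.4526, "recall_100": 0.3858, "ndcg_cut_100": 0.3079},
    "optimized":    {"recip_rank": 0.4549, "recall_100": 0.4027, "ndcg_cut_100": 0.3181},
}

def relative_change_percent(run_scores, baseline_scores):
    """percentage change of every metric compared with the baseline."""
    return {
        metric: 100.0 * (run_scores[metric] - baseline_scores[metric]) / baseline_scores[metric]
        for metric in baseline_scores
    }

changes = {name: relative_change_percent(scores, baseline) for name, scores in runs.items()}
for name, change in changes.items():
    summary = ", ".join(f"{metric}: {value:+.1f}%" for metric, value in change.items())
    print(f"{name}: {summary}")
```

Since the grid search tuned its weights on the first 200 of these same queries, the gain of the optimized run is probably somewhat optimistic.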
