Zuveir Jameer

8 April 2025

Batch 1, which contains four search engines.

The code below runs the 75 queries against an Elasticsearch instance that contains the TREC 2021 Clinical Trials data.

The Elasticsearch instance contains ~26,000 indexed documents.

The default Elasticsearch similarity (ranking function) is BM25.
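
As a rough illustration of what BM25 scoring does, the sketch below computes the textbook Okapi BM25 contribution of a single query term to one document. The function name and the example numbers are assumptions made for illustration. k1 = 1.2 and b = 0.75 are Elasticsearch's default parameters, but the constants Lucene applies internally may differ slightly from this textbook form.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one query term for one document.

    tf          : how often the term occurs in the document
    doc_len     : length of the document field (in terms)
    avg_doc_len : average field length over the index
    n_docs      : number of documents in the index
    doc_freq    : number of documents that contain the term
    """
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Long documents are penalised relative to the average length.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Hypothetical numbers: a 26,000-document index, term found in 50 documents.
once = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=26000, doc_freq=50)
twice = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=26000, doc_freq=50)
print(f"tf=1: {once:.3f}, tf=2: {twice:.3f}")
```

The score rises with term frequency but flattens out, and it drops for longer documents and for terms that occur in many documents.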

In the code below you will run searches using the following:
1. Semantic search on the tokenised 'detailed_description' field (stored in detailed_descrption_vector) with ClinicalBERT
2. BM25 as the initial search, followed by a semantic search to re-rank the results
3. BM25 only
4. BM25W, where a weight has been added to the 'detailed_description' field
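
The lexical and re-ranking approaches in the list above could be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: the helper names, the extra field `brief_title`, and the boost value are assumptions, and the re-ranking step is done client-side on vectors that are assumed to have been fetched with the BM25 hits.

```python
import math


def bm25_query(query_text, top_k=10):
    """Plain BM25: a match query on the text field (default similarity)."""
    return {"size": top_k, "query": {"match": {"detailed_description": query_text}}}


def weighted_bm25_query(query_text, top_k=10, boost=2.0):
    """Weighted BM25: the field^boost syntax raises one field's contribution.
    'brief_title' is a hypothetical second field used only for illustration."""
    return {
        "size": top_k,
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["brief_title", f"detailed_description^{boost}"],
            }
        },
    }


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_by_cosine(candidates, query_vector):
    """Re-rank BM25 candidates, given as (doc_id, vector) pairs, by similarity
    to the query embedding. Returns doc ids, most similar first."""
    ranked = sorted(candidates, key=lambda c: cosine(c[1], query_vector), reverse=True)
    return [doc_id for doc_id, _ in ranked]


# Toy 2-dimensional vectors standing in for ClinicalBERT embeddings.
bm25_hits = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("C", [1.0, 1.0])]
print(rerank_by_cosine(bm25_hits, query_vector=[0.0, 2.0]))
```

Re-ranking only a short BM25 candidate list keeps the semantic step cheap, at the cost of never surfacing documents that BM25 missed entirely.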

In [1]:
from pprint import pprint
from elasticsearch import Elasticsearch, helpers
from credentials import USERNAME, PASSWORD
import json

In [2]:
CERT_FINGERPRINT = "24a71503f93ed6e422bc8782254a826382fadd83ca09234fa7c0297173132530"

In [3]:
# Connect using an API key
es = Elasticsearch(
    "https://localhost:9200",
    api_key="eHdSWmo1VUJOYkdYV0x0X3QyTnY6cU9kQVJ1Y0VRdmFWSTc5bzB6THoxUQ==",
    ssl_assert_fingerprint=CERT_FINGERPRINT,
)

In [4]:
# Connect using credentials and the SSL certificate fingerprint
# (this client replaces the API-key client created in the previous cell)
es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=(USERNAME, PASSWORD),
    ssl_assert_fingerprint=CERT_FINGERPRINT,
)

Check whether connected to ES

In [5]:
client_info = es.info()
print('Connected to elasticsearch')
pprint(client_info.body)

Connected to elasticsearch
{'cluster_name': 'docker-cluster',
 'cluster_uuid': 'daxPe0rzR9Gx5iOkHZsZTQ',
 'name': '200fd7a2e004',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2025-02-28T10:07:26.089129809Z',
             'build_flavor': 'default',
             'build_hash': 'a091390de485bd4b127884f7e565c0cad59b10d2',
             'build_snapshot': False,
             'build_type': 'docker',
             'lucene_version': '9.12.0',
             'minimum_index_compatibility_version': '7.0.0',
             'minimum_wire_compatibility_version': '7.17.0',
             'number': '8.17.3'}}


### Semantic search only using ClinicalBERT 

ClinicalBERT was used to create the token embeddings stored in the detailed_descrption_vector dense vector field.
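
The token embeddings are collapsed into a single document vector by mean pooling over the non-padding tokens. The toy example below shows that idea in plain Python. The function name `masked_mean` and the numbers are illustrative assumptions; the notebook itself does this with PyTorch tensors.

```python
def masked_mean(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1; ignore padding (mask 0)."""
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                sums[i] += value
    # Guard against division by zero when every position is padding.
    count = max(count, 1e-9)
    return [s / count for s in sums]


# Two real tokens followed by one padding position with arbitrary values.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean(tokens, mask)
print(pooled)  # [2.0, 3.0]: the padding vector does not affect the result
```

Without the mask, padding to a fixed length of 512 tokens would drag short texts towards the padding embedding.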

In [None]:
import json
import numpy as np
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel
import torch
import time
import pandas as pd
from datetime import datetime
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained("medicalai/ClinicalBERT")
model = AutoModel.from_pretrained("medicalai/ClinicalBERT")
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def mean_pooling(model_output, attention_mask):
    """
    Perform mean pooling on token embeddings while taking the attention mask into account.
    """
    token_embeddings = model_output.last_hidden_state  # (batch_size, seq_len, hidden_size)
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, dim=1)
    sum_mask = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
    return sum_embeddings / sum_mask

def get_embedding(text, max_length=512):
    """
    Compute the embedding for the given text using ClinicalBERT.
    Returns the embedding as a list of floats.
    """
    encoded_input = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt"
    ).to(device)

    with torch.no_grad():
        model_output = model(**encoded_input)
    # Mean-pool the token embeddings, ignoring padding positions
    embedding = mean_pooling(model_output, encoded_input['attention_mask'])
    # L2 normalisation (added 22 March) is currently disabled; cosine similarity
    # does not depend on vector length, so the ranking is unaffected.
    # embedding = F.normalize(embedding, p=2, dim=1)
    return embedding[0].cpu().numpy().tolist()


# Evaluation Metric Functions
def precision_at_k(re_ranked_docs, relevant_docs, k):
    top_k_docs = re_ranked_docs[:k]
    num_relevant_in_k = sum(1 for doc in top_k_docs 
                            if doc in relevant_docs and relevant_docs[doc] > 0)
    return num_relevant_in_k / k if k else 0.0

def recall_at_k(re_ranked_docs, relevant_docs, k):
    top_k_docs = re_ranked_docs[:k]
    total_relevant = sum(1 for doc, rel in relevant_docs.items() if rel > 0)
    if total_relevant == 0:
        return 0.0
    num_relevant_in_k = sum(1 for doc in top_k_docs 
                            if doc in relevant_docs and relevant_docs[doc] > 0)
    return num_relevant_in_k / total_relevant

def f1_at_k(re_ranked_docs, relevant_docs, k):
    p = precision_at_k(re_ranked_docs, relevant_docs, k)
    r = recall_at_k(re_ranked_docs, relevant_docs, k)
    if (p + r) == 0:
        return 0.0
    return 2 * (p * r) / (p + r)

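def average_precision(re_ranked_docs, relevant_docs):
    """
    Mean of precision@rank over the ranks at which a relevant doc was retrieved.
    Note: the sum is divided by the number of relevant docs *retrieved*, not by the
    total number of relevant docs in the qrels, so values are higher than standard AP.
    """
    if not relevant_docs:
        return 0.0

    # 0-based positions of the relevant docs in the ranking
    relevant_indices = [
        i for i, doc in enumerate(re_ranked_docs)
        if doc in relevant_docs and relevant_docs[doc] > 0
    ]
    if not relevant_indices:
        return 0.0

    ap_sum = 0.0
    for rank_pos in relevant_indices:
        # Precision at the 1-based rank of this relevant doc
        ap_sum += precision_at_k(re_ranked_docs, relevant_docs, rank_pos + 1)
    return ap_sum / len(relevant_indices)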

def ndcg_at_k(re_ranked_docs, relevant_docs, k):
    """nDCG@k with exponential gain (2^rel - 1) and a log2 rank discount."""
    def dcg(scores):
        return sum((2**s - 1) / np.log2(idx + 2) for idx, s in enumerate(scores))

    top_k_docs = re_ranked_docs[:k]
    # Unjudged docs are treated as non-relevant (gain 0)
    actual_scores = [relevant_docs.get(doc, 0) for doc in top_k_docs]
    actual_dcg = dcg(actual_scores)
    
    sorted_rels = sorted(relevant_docs.values(), reverse=True)[:k]
    ideal_dcg = dcg(sorted_rels)
    
    if ideal_dcg == 0:
        return 0.0
    return actual_dcg / ideal_dcg

def mrr_at_k(re_ranked_docs, relevant_docs, k):
    for i, doc_id in enumerate(re_ranked_docs[:k], start=1):
        if doc_id in relevant_docs and relevant_docs[doc_id] > 0:
            return 1.0 / i
    return 0.0

# Load queries and qrels from JSONL files
def load_jsonl(filepath):
    data = {}
    with open(filepath, "r") as f:
        for line in f:
            entry = json.loads(line)
            qid = entry["query_id"]
            # If it's a query record
            if "text" in entry:
                data[qid] = entry["text"].strip()
            else:
                # It's a qrel record
                if qid not in data:
                    data[qid] = {}
                doc_id = entry["doc_id"]
                relevance = int(entry.get("relevance", 0))
                data[qid][doc_id] = relevance
    return data

queries = load_jsonl("trec_ct_2021_query.jsonl") 
qrels = load_jsonl("trec_ct_2021_qrels.jsonl")    


# Semantic-Only Search on tokenised embeddings
def semantic_search(query_text, top_k=10):
    """
    Search the entire index using a script_score
    with 'detailed_descrption_vector' and query vector.
    """
    query_vector = get_embedding(query_text)
    vector_query_body = {
        "size": top_k,  # put size in the body to avoid deprecation warnings
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "must": [
                            {"exists": {"field": "detailed_descrption_vector"}}
                        ]
                    }
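                },
                "script": {
                    # cosineSimilarity lies in [-1, 1]; adding 1.0 keeps the score
                    # non-negative, which script_score requires.
                    "source": "cosineSimilarity(params.query_vector, 'detailed_descrption_vector') + 1.0",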
                    "params": {"query_vector": query_vector}
                }
            }
        }
    }

    response = es.search(index="ir_dev_index", body=vector_query_body)
    # Return doc_id from _source if that's how Qrels references them
    print("Time taken for semantic search: {} ms".format(response["took"]))
    return [hit["_source"]["doc_id"] for hit in response["hits"]["hits"]]

K = 10               # top K for retrieval & evaluation
RELEVANCE_THRESHOLD = 2  # Consider docs with relevance >= x as relevant when evaluating

def get_relevant_docs(qrel_dict, relevance_threshold=1):
    return {doc_id: rel for doc_id, rel in qrel_dict.items() if rel >= relevance_threshold}

def get_all_docs(qrel_dict):
    """Return the entire dictionary of {doc_id: relevance}, including relevance=0."""
    return {doc_id: rel for doc_id, rel in qrel_dict.items()}

all_metrics = []

for qid, query_text in tqdm(queries.items(), total=len(queries)):
    if not query_text:
        continue
    
    # Track the total time
    start_time = time.time()

    # Retrieve top K docs via semantic search
    retrieved_docs = semantic_search(query_text, top_k=K)
    
    # Keep only qrels with relevance >= threshold (used for the binary metrics)
    relevant_docs = get_relevant_docs(qrels.get(qid, {}), relevance_threshold=RELEVANCE_THRESHOLD)
    
    # All judged docs (no threshold), so nDCG uses the graded relevance labels 0, 1, 2
    ndcg_relevant_docs = get_all_docs(qrels.get(qid, {}))

    # Mark the end time for entire query processing
    total_time = time.time() - start_time

    # Compute metrics
    prec_val = precision_at_k(retrieved_docs, relevant_docs, K)
    rec_val = recall_at_k(retrieved_docs, relevant_docs, K)
    f_val = f1_at_k(retrieved_docs, relevant_docs, K)
    ap_val = average_precision(retrieved_docs, relevant_docs)
    ndcg_val = ndcg_at_k(retrieved_docs, ndcg_relevant_docs, K)
    mrr_val = mrr_at_k(retrieved_docs, relevant_docs, K)

    metrics = {
        "query_id": qid,
        "precision": prec_val,
        "recall": rec_val,
        "f1": f_val,
        "average_precision": ap_val,
        "ndcg": ndcg_val,
        "mrr": mrr_val,
        "time_seconds": total_time
    }
    all_metrics.append(metrics)
    print(f"Query {qid} retrieved: {retrieved_docs}")
    print(f" Query {qid} metrics: Precision@10: {prec_val:.4f}, Recall@10: {rec_val:.4f}, F1@10: {f_val:.4f}, AP: {ap_val:.4f}, nDCG@10: {ndcg_val:.4f}")
    print(f"  Total time for query: {total_time:.2f} seconds")

# Compute overall metrics
if all_metrics:
    mean_precision = np.mean([m["precision"] for m in all_metrics])
    mean_recall = np.mean([m["recall"] for m in all_metrics])
    mean_f1 = np.mean([m["f1"] for m in all_metrics])
    mean_ap = np.mean([m["average_precision"] for m in all_metrics])
    mean_ndcg = np.mean([m["ndcg"] for m in all_metrics])
    mean_mrr = np.mean([m["mrr"] for m in all_metrics])
    avg_time = np.mean([m["time_seconds"] for m in all_metrics])

    print("\n--- Overall Evaluation (Token Clinical Bert Semantic-Only) ---")
    current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"Date/time:{current_time}")
    print(f"Relevance threshold >= {RELEVANCE_THRESHOLD}")
    print(f"Mean Precision@{K}: {mean_precision:.4f}")
    print(f"Mean Recall@{K}: {mean_recall:.4f}")
    print(f"Mean F1@{K}: {mean_f1:.4f}")
    print(f"Mean Average Precision (MAP): {mean_ap:.4f}")
    print(f"Mean nDCG@{K}: {mean_ndcg:.4f}")
    print(f"Mean MRR@{K}: {mean_mrr:.4f}")
    print(f"Average total time per query: {avg_time:.2f} seconds")
else:
    print("No queries processed.")

# Create a one-row DataFrame with the mean metrics over all queries
data = {
    "Test": ["Token Embedding Only."],
    "Date/Time": [current_time],
    "Relevance Threshold": [RELEVANCE_THRESHOLD],
    f"Mean Precision@{K}": [mean_precision],
    f"Mean Recall@{K}": [mean_recall],
    f"Mean F1@{K}": [mean_f1],
    "Mean Average Precision (MAP)": [mean_ap],
    f"Mean nDCG@{K}": [mean_ndcg],
    f"Mean MRR@{K}": [mean_mrr],
    "Avg Time per Query (s)": [avg_time],
}
df = pd.DataFrame(data)

# Export to an Excel file
xlsx_filename = "evaluation_results_TOKEN_embeddings_only.xlsx"
df.to_excel(xlsx_filename, index=False)

print(f"\nMetrics have been exported to {xlsx_filename}")

  4%|▍         | 3/75 [00:00<00:07, 10.20it/s]

Time taken for semantic search: 13 ms
Query 1 retrieved: ['NCT03201367', 'NCT04201002', 'NCT02910596', 'NCT04694053', 'NCT04450316', 'NCT04645264', 'NCT02776202', 'NCT04159077', 'NCT04829890', 'NCT04553562']
 Query 1 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.11 seconds
Time taken for semantic search: 13 ms
Query 2 retrieved: ['NCT04061213', 'NCT00976625', 'NCT00478296', 'NCT02910349', 'NCT02316587', 'NCT04299971', 'NCT03863132', 'NCT04078867', 'NCT04830787', 'NCT00820352']
 Query 2 metrics: Precision@10: 0.1000, Recall@10: 0.0072, F1@10: 0.0135, AP: 0.2000, nDCG@10: 0.2146
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 3 retrieved: ['NCT03902262', 'NCT02809651', 'NCT03377062', 'NCT02325154', 'NCT03080779', 'NCT01654198', 'NCT00395330', 'NCT03496545', 'NCT02097212', 'NCT00313703']
 Query 3 metrics: Precision@10: 0.1000, Recall@10: 0.0312, F1@10: 0.0476, AP: 0.3333, nDCG@10: 0.1

  7%|▋         | 5/75 [00:00<00:06, 10.34it/s]

Time taken for semantic search: 11 ms
Query 4 retrieved: ['NCT02407080', 'NCT03647631', 'NCT02291796', 'NCT00706641', 'NCT01281033', 'NCT03902262', 'NCT03201367', 'NCT00498238', 'NCT03712345', 'NCT02768259']
 Query 4 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 5 retrieved: ['NCT01050361', 'NCT03902262', 'NCT01905488', 'NCT01252875', 'NCT01281033', 'NCT04005222', 'NCT04299971', 'NCT02781207', 'NCT02961127', 'NCT01458860']
 Query 5 metrics: Precision@10: 0.1000, Recall@10: 0.0084, F1@10: 0.0155, AP: 0.1000, nDCG@10: 0.0857
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 6 retrieved: ['NCT02200328', 'NCT02776202', 'NCT03902262', 'NCT00006132', 'NCT04614142', 'NCT00789997', 'NCT01678365', 'NCT02086825', 'NCT03378479', 'NCT00003805']
 Query 6 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0

 12%|█▏        | 9/75 [00:00<00:06, 10.42it/s]

Time taken for semantic search: 12 ms
Query 7 retrieved: ['NCT02086825', 'NCT03547895', 'NCT03201367', 'NCT03987893', 'NCT04622449', 'NCT03372499', 'NCT03461250', 'NCT03513900', 'NCT02069678', 'NCT01178372']
 Query 7 metrics: Precision@10: 0.3000, Recall@10: 0.0240, F1@10: 0.0444, AP: 0.2500, nDCG@10: 0.2182
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 8 retrieved: ['NCT02086825', 'NCT03902262', 'NCT03547895', 'NCT01868373', 'NCT02340845', 'NCT00357851', 'NCT01739634', 'NCT02548715', 'NCT00641030', 'NCT00439231']
 Query 8 metrics: Precision@10: 0.1000, Recall@10: 0.0109, F1@10: 0.0196, AP: 0.1000, nDCG@10: 0.0636
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 9 retrieved: ['NCT00642564', 'NCT01054599', 'NCT01320670', 'NCT00510783', 'NCT00486239', 'NCT04004702', 'NCT02569970', 'NCT01391091', 'NCT04287140', 'NCT04186286']
 Query 9 metrics: Precision@10: 0.1000, Recall@10: 0.0769, F1@10: 0.0870, AP: 0.2500, nDCG@10: 0.2

 15%|█▍        | 11/75 [00:01<00:06, 10.40it/s]

Time taken for semantic search: 13 ms
Query 10 retrieved: ['NCT02776202', 'NCT00273559', 'NCT00789997', 'NCT00512070', 'NCT01408836', 'NCT02148549', 'NCT00006669', 'NCT00054158', 'NCT01892241', 'NCT00050817']
 Query 10 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 11 retrieved: ['NCT00595725', 'NCT04694053', 'NCT03243825', 'NCT02910596', 'NCT04424485', 'NCT03002051', 'NCT03797326', 'NCT00579865', 'NCT03630926', 'NCT01075269']
 Query 11 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0496
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 12 retrieved: ['NCT00001894', 'NCT03706833', 'NCT01678144', 'NCT04820764', 'NCT02071654', 'NCT00413998', 'NCT04147884', 'NCT04078867', 'NCT01368575', 'NCT04009434']
 Query 12 metrics: Precision@10: 0.2000, Recall@10: 0.0192, F1@10: 0.0351, AP: 0.4167, nDCG@1

 20%|██        | 15/75 [00:01<00:05, 10.27it/s]

Time taken for semantic search: 13 ms
Query 13 retrieved: ['NCT02910596', 'NCT03201367', 'NCT03341221', 'NCT04333602', 'NCT04450316', 'NCT03729882', 'NCT04507932', 'NCT02889471', 'NCT02508142', 'NCT04437108']
 Query 13 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 14 retrieved: ['NCT02059993', 'NCT03902262', 'NCT01281033', 'NCT00852124', 'NCT02329522', 'NCT00616005', 'NCT02097212', 'NCT01266317', 'NCT03547895', 'NCT02035345']
 Query 14 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0528
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 15 retrieved: ['NCT03902262', 'NCT03201367', 'NCT02340845', 'NCT02086825', 'NCT00121901', 'NCT03259880', 'NCT00487396', 'NCT04645030', 'NCT03944148', 'NCT03561441']
 Query 15 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 23%|██▎       | 17/75 [00:01<00:05,  9.77it/s]

Time taken for semantic search: 49 ms
Query 16 retrieved: ['NCT02059993', 'NCT01281033', 'NCT01266317', 'NCT00840463', 'NCT03994848', 'NCT03779750', 'NCT02097212', 'NCT00722254', 'NCT02781207', 'NCT00614900']
 Query 16 metrics: Precision@10: 0.1000, Recall@10: 0.0161, F1@10: 0.0278, AP: 0.1000, nDCG@10: 0.0636
  Total time for query: 0.13 seconds
Time taken for semantic search: 11 ms
Query 17 retrieved: ['NCT02776202', 'NCT04694053', 'NCT01731886', 'NCT00892346', 'NCT00393380', 'NCT04614142', 'NCT01227187', 'NCT00613093', 'NCT01408836', 'NCT03779750']
 Query 17 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 7 ms
Query 18 retrieved: ['NCT03503175', 'NCT01186237', 'NCT02150083', 'NCT00458471', 'NCT03713411', 'NCT03609580', 'NCT00416234', 'NCT03462602', 'NCT04835766', 'NCT00252941']
 Query 18 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10

 28%|██▊       | 21/75 [00:02<00:05, 10.19it/s]

Time taken for semantic search: 5 ms
Query 19 retrieved: ['NCT02781207', 'NCT02291796', 'NCT01281033', 'NCT03304847', 'NCT01097330', 'NCT01252823', 'NCT00247533', 'NCT04193813', 'NCT00919373', 'NCT01798277']
 Query 19 metrics: Precision@10: 0.4000, Recall@10: 0.0348, F1@10: 0.0640, AP: 0.5167, nDCG@10: 0.4284
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 20 retrieved: ['NCT01116167', 'NCT04553562', 'NCT04143295', 'NCT03453632', 'NCT02010814', 'NCT03647631', 'NCT03371589', 'NCT00006278', 'NCT02501395', 'NCT03823053']
 Query 20 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 14 ms
Query 21 retrieved: ['NCT03902262', 'NCT03201367', 'NCT00105066', 'NCT02340845', 'NCT02377947', 'NCT00523874', 'NCT04850547', 'NCT02086825', 'NCT04553562', 'NCT00234494']
 Query 21 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10

 31%|███       | 23/75 [00:02<00:05, 10.25it/s]

Time taken for semantic search: 13 ms
Query 22 retrieved: ['NCT03902262', 'NCT01492179', 'NCT01991223', 'NCT03591419', 'NCT02380898', 'NCT04450316', 'NCT01388933', 'NCT04231279', 'NCT01652183', 'NCT04553562']
 Query 22 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0528
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 23 retrieved: ['NCT00493974', 'NCT00395330', 'NCT04027322', 'NCT02329522', 'NCT00175565', 'NCT02682862', 'NCT00608764', 'NCT01296087', 'NCT02066129', 'NCT00318708']
 Query 23 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0926
  Total time for query: 0.09 seconds
Time taken for semantic search: 16 ms
Query 24 retrieved: ['NCT03272217', 'NCT00859339', 'NCT02156843', 'NCT00216125', 'NCT01071928', 'NCT03238235', 'NCT01034631', 'NCT03117010', 'NCT01215136', 'NCT01880632']
 Query 24 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 36%|███▌      | 27/75 [00:02<00:04, 10.46it/s]

Time taken for semantic search: 11 ms
Query 25 retrieved: ['NCT03647631', 'NCT03630926', 'NCT04529369', 'NCT01891149', 'NCT04201002', 'NCT00055991', 'NCT01707771', 'NCT04253249', 'NCT01256944', 'NCT04032860']
 Query 25 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 13 ms
Query 26 retrieved: ['NCT03902262', 'NCT03201367', 'NCT04553562', 'NCT02340845', 'NCT02380898', 'NCT04450316', 'NCT01388933', 'NCT04231279', 'NCT03259880', 'NCT03017274']
 Query 26 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 11 ms
Query 27 retrieved: ['NCT03201367', 'NCT04032860', 'NCT03461250', 'NCT01360268', 'NCT02340845', 'NCT00849745', 'NCT04035837', 'NCT03740789', 'NCT04405011', 'NCT01100398']
 Query 27 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 39%|███▊      | 29/75 [00:02<00:04, 10.21it/s]

Time taken for semantic search: 12 ms
Query 28 retrieved: ['NCT01657487', 'NCT02876289', 'NCT02329522', 'NCT00493974', 'NCT01742338', 'NCT03033758', 'NCT01114386', 'NCT01865500', 'NCT00852124', 'NCT04382625']
 Query 28 metrics: Precision@10: 0.3000, Recall@10: 0.0210, F1@10: 0.0392, AP: 0.6250, nDCG@10: 0.3843
  Total time for query: 0.12 seconds
Time taken for semantic search: 12 ms
Query 29 retrieved: ['NCT00234494', 'NCT03970668', 'NCT04585542', 'NCT02086825', 'NCT00911625', 'NCT00216125', 'NCT01215136', 'NCT00893425', 'NCT02776202', 'NCT01408836']
 Query 29 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 30 retrieved: ['NCT00715572', 'NCT03699098', 'NCT03641638', 'NCT03106935', 'NCT01832753', 'NCT00380614', 'NCT00450463', 'NCT04850547', 'NCT00497575', 'NCT02265328']
 Query 30 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 44%|████▍     | 33/75 [00:03<00:04, 10.23it/s]

Time taken for semantic search: 12 ms
Query 31 retrieved: ['NCT03630926', 'NCT03144037', 'NCT02780960', 'NCT04201002', 'NCT00285207', 'NCT03647631', 'NCT02540421', 'NCT03324373', 'NCT02253719', 'NCT00055991']
 Query 31 metrics: Precision@10: 0.1000, Recall@10: 0.0222, F1@10: 0.0364, AP: 0.1111, nDCG@10: 0.0663
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 32 retrieved: ['NCT03902262', 'NCT00216125', 'NCT00859339', 'NCT04850547', 'NCT03117010', 'NCT02377947', 'NCT02472509', 'NCT03341221', 'NCT03201367', 'NCT01215136']
 Query 32 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 33 retrieved: ['NCT00021866', 'NCT03144037', 'NCT00055991', 'NCT03640273', 'NCT03867240', 'NCT03498742', 'NCT00009672', 'NCT00101725', 'NCT02636569', 'NCT03324373']
 Query 33 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 47%|████▋     | 35/75 [00:03<00:03, 10.24it/s]

Time taken for semantic search: 12 ms
Query 34 retrieved: ['NCT00908076', 'NCT04808843', 'NCT00185575', 'NCT03499301', 'NCT04373317', 'NCT00449865', 'NCT04573504', 'NCT03479190', 'NCT01276041', 'NCT00361218']
 Query 34 metrics: Precision@10: 0.1000, Recall@10: 0.0127, F1@10: 0.0225, AP: 0.1000, nDCG@10: 0.0636
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 35 retrieved: ['NCT04133662', 'NCT00361218', 'NCT04808843', 'NCT00313703', 'NCT00608764', 'NCT03479190', 'NCT04573504', 'NCT00464711', 'NCT04664231', 'NCT02367729']
 Query 35 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 36 retrieved: ['NCT00105066', 'NCT04357119', 'NCT02556749', 'NCT00640757', 'NCT04850547', 'NCT01215136', 'NCT00862758', 'NCT01562678', 'NCT00948402', 'NCT01385098']
 Query 36 metrics: Precision@10: 0.3000, Recall@10: 0.0163, F1@10: 0.0309, AP: 0.6250, nDCG@1

 52%|█████▏    | 39/75 [00:03<00:03, 10.24it/s]

Time taken for semantic search: 11 ms
Query 37 retrieved: ['NCT00313703', 'NCT03479190', 'NCT00395330', 'NCT00397462', 'NCT03647631', 'NCT03280849', 'NCT00608764', 'NCT00971074', 'NCT04573504', 'NCT04568005']
 Query 37 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 14 ms
Query 38 retrieved: ['NCT00105066', 'NCT03902262', 'NCT02086994', 'NCT03201367', 'NCT03117010', 'NCT04612803', 'NCT03647631', 'NCT00852124', 'NCT00613093', 'NCT02377947']
 Query 38 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 9 ms
Query 39 retrieved: ['NCT01795131', 'NCT00676247', 'NCT04497012', 'NCT03902262', 'NCT02325492', 'NCT00692224', 'NCT04850547', 'NCT04121572', 'NCT02472509', 'NCT03448952']
 Query 39 metrics: Precision@10: 0.1000, Recall@10: 0.0294, F1@10: 0.0455, AP: 0.1667, nDCG@10

 55%|█████▍    | 41/75 [00:04<00:03, 10.33it/s]

Time taken for semantic search: 12 ms
Query 40 retrieved: ['NCT03479190', 'NCT02501395', 'NCT03500861', 'NCT01875458', 'NCT04553562', 'NCT02541292', 'NCT04130204', 'NCT03371589', 'NCT03626298', 'NCT00006278']
 Query 40 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 6 ms
Query 41 retrieved: ['NCT00908076', 'NCT00449865', 'NCT04051060', 'NCT00493974', 'NCT02896816', 'NCT04373317', 'NCT00313703', 'NCT01909531', 'NCT00608764', 'NCT00812851']
 Query 41 metrics: Precision@10: 0.1000, Recall@10: 0.0075, F1@10: 0.0139, AP: 0.5000, nDCG@10: 0.2899
  Total time for query: 0.09 seconds
Time taken for semantic search: 4 ms
Query 42 retrieved: ['NCT03902262', 'NCT03201367', 'NCT02989337', 'NCT02086994', 'NCT04357119', 'NCT00852124', 'NCT01011517', 'NCT03026686', 'NCT02340845', 'NCT03017274']
 Query 42 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10:

 60%|██████    | 45/75 [00:04<00:02, 10.39it/s]

Time taken for semantic search: 13 ms
Query 43 retrieved: ['NCT00269399', 'NCT01388933', 'NCT00794924', 'NCT04694053', 'NCT03902262', 'NCT03201367', 'NCT03261297', 'NCT01714557', 'NCT03529318', 'NCT04450316']
 Query 43 metrics: Precision@10: 0.1000, Recall@10: 0.0159, F1@10: 0.0274, AP: 1.0000, nDCG@10: 0.2201
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 44 retrieved: ['NCT00764764', 'NCT03769246', 'NCT03479190', 'NCT03385408', 'NCT04150445', 'NCT00862758', 'NCT04782349', 'NCT02159612', 'NCT03823053', 'NCT02501395']
 Query 44 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0836
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 45 retrieved: ['NCT03902262', 'NCT02377947', 'NCT03201367', 'NCT00216125', 'NCT03117010', 'NCT04850547', 'NCT01215136', 'NCT02340845', 'NCT03212872', 'NCT02265328']
 Query 45 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 63%|██████▎   | 47/75 [00:04<00:02, 10.42it/s]

Time taken for semantic search: 11 ms
Query 46 retrieved: ['NCT03902262', 'NCT00859339', 'NCT00216125', 'NCT02377947', 'NCT03117010', 'NCT04645030', 'NCT03238235', 'NCT01732107', 'NCT03201367', 'NCT01215136']
 Query 46 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 4 ms
Query 47 retrieved: ['NCT03902262', 'NCT03201367', 'NCT02291796', 'NCT02035345', 'NCT03496545', 'NCT00706641', 'NCT02086994', 'NCT02248233', 'NCT00616005', 'NCT04005222']
 Query 47 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0231
  Total time for query: 0.09 seconds
Time taken for semantic search: 13 ms
Query 48 retrieved: ['NCT03453632', 'NCT03007173', 'NCT04254510', 'NCT03647631', 'NCT00301405', 'NCT04002219', 'NCT00143598', 'NCT03626298', 'NCT01652625', 'NCT04142632']
 Query 48 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10

 68%|██████▊   | 51/75 [00:04<00:02, 10.21it/s]

Time taken for semantic search: 12 ms
Query 49 retrieved: ['NCT03651388', 'NCT02223936', 'NCT00023101', 'NCT01891149', 'NCT02830321', 'NCT03708497', 'NCT03568370', 'NCT00021866', 'NCT04352062', 'NCT00134745']
 Query 49 metrics: Precision@10: 0.1000, Recall@10: 0.0286, F1@10: 0.0444, AP: 0.1000, nDCG@10: 0.0636
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 50 retrieved: ['NCT01165489', 'NCT02337088', 'NCT03726697', 'NCT00776789', 'NCT03551600', 'NCT04083768', 'NCT01682122', 'NCT04062747', 'NCT03448952', 'NCT01245296']
 Query 50 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 10 ms
Query 51 retrieved: ['NCT03902262', 'NCT02086994', 'NCT02830321', 'NCT03201367', 'NCT02989337', 'NCT03026686', 'NCT04450316', 'NCT03017274', 'NCT04645030', 'NCT00269399']
 Query 51 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 71%|███████   | 53/75 [00:05<00:02, 10.30it/s]

Time taken for semantic search: 11 ms
Query 52 retrieved: ['NCT03261297', 'NCT00953667', 'NCT04254510', 'NCT00269399', 'NCT00971074', 'NCT03017274', 'NCT03529318', 'NCT04491799', 'NCT03187795', 'NCT02502487']
 Query 52 metrics: Precision@10: 0.1000, Recall@10: 0.0244, F1@10: 0.0392, AP: 0.2500, nDCG@10: 0.0948
  Total time for query: 0.10 seconds
Time taken for semantic search: 6 ms
Query 53 retrieved: ['NCT00216125', 'NCT03902262', 'NCT00859339', 'NCT01880632', 'NCT03117010', 'NCT01215136', 'NCT00234494', 'NCT04121572', 'NCT00537173', 'NCT00030498']
 Query 53 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 54 retrieved: ['NCT03902262', 'NCT03117010', 'NCT00216125', 'NCT04441489', 'NCT04267965', 'NCT01732107', 'NCT00859339', 'NCT01215136', 'NCT03201367', 'NCT03976843']
 Query 54 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10

 76%|███████▌  | 57/75 [00:05<00:01, 10.41it/s]

Time taken for semantic search: 6 ms
Query 55 retrieved: ['NCT04626817', 'NCT03651388', 'NCT00786500', 'NCT03500861', 'NCT03453632', 'NCT03647631', 'NCT03201367', 'NCT00332631', 'NCT03268681', 'NCT00613093']
 Query 55 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 11 ms
Query 56 retrieved: ['NCT04201002', 'NCT04825626', 'NCT04674241', 'NCT01203618', 'NCT00128232', 'NCT02459652', 'NCT00993356', 'NCT04424485', 'NCT03144037', 'NCT02629549']
 Query 56 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0844
  Total time for query: 0.09 seconds
Time taken for semantic search: 9 ms
Query 57 retrieved: ['NCT03902262', 'NCT03201367', 'NCT03259880', 'NCT04850547', 'NCT02340845', 'NCT04231279', 'NCT01520675', 'NCT04553562', 'NCT04450316', 'NCT03261297']
 Query 57 metrics: Precision@10: 0.1000, Recall@10: 0.0093, F1@10: 0.0169, AP: 0.3333, nDCG@10:

 79%|███████▊  | 59/75 [00:05<00:01, 10.05it/s]

Time taken for semantic search: 11 ms
Query 58 retrieved: ['NCT03902262', 'NCT01388933', 'NCT01639170', 'NCT04694053', 'NCT01986686', 'NCT03201367', 'NCT04450316', 'NCT01991223', 'NCT03477253', 'NCT03591419']
 Query 58 metrics: Precision@10: 0.1000, Recall@10: 0.0182, F1@10: 0.0308, AP: 0.1000, nDCG@10: 0.0636
  Total time for query: 0.12 seconds
Time taken for semantic search: 5 ms
Query 59 retrieved: ['NCT01742338', 'NCT00493974', 'NCT03498742', 'NCT02329522', 'NCT00694850', 'NCT02824523', 'NCT00565955', 'NCT02682862', 'NCT00497523', 'NCT03033758']
 Query 59 metrics: Precision@10: 0.1000, Recall@10: 0.0053, F1@10: 0.0101, AP: 0.1429, nDCG@10: 0.0946
  Total time for query: 0.09 seconds
Time taken for semantic search: 11 ms
Query 60 retrieved: ['NCT03902262', 'NCT00216125', 'NCT03272217', 'NCT00859339', 'NCT02156843', 'NCT03238235', 'NCT01732107', 'NCT01034631', 'NCT01071928', 'NCT03117010']
 Query 60 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10

 84%|████████▍ | 63/75 [00:06<00:01, 10.27it/s]

Time taken for semantic search: 11 ms
Query 61 retrieved: ['NCT03144037', 'NCT03647631', 'NCT03630926', 'NCT03797326', 'NCT04457700', 'NCT00542191', 'NCT04201002', 'NCT04694053', 'NCT03560102', 'NCT03092518']
 Query 61 metrics: Precision@10: 0.2000, Recall@10: 0.0167, F1@10: 0.0308, AP: 0.2111, nDCG@10: 0.1775
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 62 retrieved: ['NCT01458860', 'NCT04738864', 'NCT00384618', 'NCT04290767', 'NCT04299971', 'NCT02961127', 'NCT04300686', 'NCT02884141', 'NCT04850547', 'NCT02775318']
 Query 62 metrics: Precision@10: 0.1000, Recall@10: 0.0154, F1@10: 0.0267, AP: 1.0000, nDCG@10: 0.2201
  Total time for query: 0.10 seconds
Time taken for semantic search: 10 ms
Query 63 retrieved: ['NCT03902262', 'NCT04553562', 'NCT04450316', 'NCT03201367', 'NCT02340845', 'NCT01388933', 'NCT03017274', 'NCT03259880', 'NCT02380898', 'NCT04231279']
 Query 63 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 87%|████████▋ | 65/75 [00:06<00:00, 10.33it/s]

Time taken for semantic search: 13 ms
Query 64 retrieved: ['NCT01360268', 'NCT03461250', 'NCT01905254', 'NCT03740789', 'NCT03201367', 'NCT04032860', 'NCT00849745', 'NCT00663182', 'NCT03212872', 'NCT01872988']
 Query 64 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0316
  Total time for query: 0.10 seconds
Time taken for semantic search: 12 ms
Query 65 retrieved: ['NCT00105066', 'NCT01215136', 'NCT00893425', 'NCT00537173', 'NCT00216125', 'NCT01281033', 'NCT03413462', 'NCT00852124', 'NCT04121572', 'NCT01011517']
 Query 65 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.09 seconds
Time taken for semantic search: 11 ms
Query 66 retrieved: ['NCT00105066', 'NCT00302926', 'NCT03641638', 'NCT00715572', 'NCT04175964', 'NCT00715715', 'NCT00640757', 'NCT00456885', 'NCT00497653', 'NCT02010814']
 Query 66 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 92%|█████████▏| 69/75 [00:06<00:00, 10.51it/s]

Time taken for semantic search: 12 ms
Query 67 retrieved: ['NCT03647631', 'NCT02113514', 'NCT02797600', 'NCT01402583', 'NCT01116167', 'NCT03630926', 'NCT00055991', 'NCT03144037', 'NCT04553562', 'NCT01095198']
 Query 67 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0212
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 68 retrieved: ['NCT00751426', 'NCT01408836', 'NCT01561248', 'NCT04622449', 'NCT00922883', 'NCT01360268', 'NCT02776202', 'NCT01410578', 'NCT00380614', 'NCT02464124']
 Query 68 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0367
  Total time for query: 0.09 seconds
Time taken for semantic search: 13 ms
Query 69 retrieved: ['NCT00021866', 'NCT03144037', 'NCT03867240', 'NCT01276041', 'NCT01754493', 'NCT03640273', 'NCT00971074', 'NCT01099657', 'NCT00009672', 'NCT03498742']
 Query 69 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@1

 95%|█████████▍| 71/75 [00:06<00:00, 10.46it/s]

Time taken for semantic search: 13 ms
Query 70 retrieved: ['NCT00908076', 'NCT00185575', 'NCT00053846', 'NCT00608764', 'NCT04051060', 'NCT04573504', 'NCT00395330', 'NCT00848406', 'NCT00313703', 'NCT00449865']
 Query 70 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 11 ms
Query 71 retrieved: ['NCT00105066', 'NCT04357119', 'NCT02556749', 'NCT00640757', 'NCT00003906', 'NCT01215136', 'NCT01707771', 'NCT01977417', 'NCT01281033', 'NCT00456885']
 Query 71 metrics: Precision@10: 0.2000, Recall@10: 0.0150, F1@10: 0.0280, AP: 0.3929, nDCG@10: 0.2334
  Total time for query: 0.09 seconds
Time taken for semantic search: 4 ms
Query 72 retrieved: ['NCT03902262', 'NCT00105066', 'NCT03201367', 'NCT04352062', 'NCT00380614', 'NCT02086994', 'NCT03117010', 'NCT00613093', 'NCT04413149', 'NCT03212872']
 Query 72 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10

100%|██████████| 75/75 [00:07<00:00, 10.30it/s]

Time taken for semantic search: 4 ms
Query 73 retrieved: ['NCT03902262', 'NCT01795131', 'NCT02325492', 'NCT00676247', 'NCT04121572', 'NCT04850547', 'NCT04497012', 'NCT01385098', 'NCT04531163', 'NCT00692224']
 Query 73 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0528
  Total time for query: 0.09 seconds
Time taken for semantic search: 12 ms
Query 74 retrieved: ['NCT03479190', 'NCT02501395', 'NCT03500861', 'NCT04130204', 'NCT01875458', 'NCT03453632', 'NCT02541292', 'NCT00111241', 'NCT00449865', 'NCT00971074']
 Query 74 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000
  Total time for query: 0.10 seconds
Time taken for semantic search: 13 ms
Query 75 retrieved: ['NCT00908076', 'NCT04051060', 'NCT00493974', 'NCT00449865', 'NCT00936585', 'NCT02896816', 'NCT00646204', 'NCT03446807', 'NCT01366027', 'NCT04130204']
 Query 75 metrics: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10


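The per-query lines above can be read with a small worked example. This is a minimal sketch, not the notebook's evaluation code: the document IDs are hypothetical, and the count of 65 relevant documents is an assumption chosen so that the numbers resemble the Query 62 line (one relevant hit at rank 1). It also assumes that AP is averaged over the relevant documents that were actually retrieved, which is the convention used by the `average_precision` function in this notebook.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision_retrieved(ranked, relevant):
    """AP averaged over the relevant documents that were retrieved."""
    hits, precisions = 0, []
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical ranking of 10 documents; only the first one is relevant.
ranked = [f"D{i}" for i in range(1, 11)]
# Hypothetical relevant set: D1 plus 64 documents that were never retrieved.
relevant = {"D1"} | {f"X{i}" for i in range(64)}

p = precision_at_k(ranked, relevant, 10)
r = recall_at_k(ranked, relevant, 10)
f1 = 2 * p * r / (p + r)
ap = average_precision_retrieved(ranked, relevant)
print(f"Precision@10: {p:.4f}, Recall@10: {r:.4f}, F1@10: {f1:.4f}, AP: {ap:.4f}")
```

Under this convention a single relevant document at rank 1 gives AP = 1.0 even though recall is very low, so AP values here are not directly comparable with MAP figures that divide by the total number of relevant documents.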


### BM25 and semantic search with ClinicalBERT

Initial retrieval with BM25, followed by re-ranking of the retrieved candidates with ClinicalBERT embeddings.
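The retrieve-then-re-rank idea can be sketched without Elasticsearch. This is a toy illustration under stated assumptions: `lexical_score` is a simple term-overlap stand-in for BM25 (not the real scoring function), the documents and their 2-dimensional vectors are hypothetical, and `two_stage_search` is a name introduced here only for the example. The `+ 1.0` on the cosine similarity mirrors the script score used in the cell below.

```python
import math


def lexical_score(query, text):
    """Toy stand-in for BM25: number of distinct query terms found in the text."""
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_search(query, query_vec, docs, candidate_size, top_k):
    # Stage 1: lexical ranking; only documents with at least one matching term qualify.
    scored = [(lexical_score(query, d["text"]), doc_id) for doc_id, d in docs.items()]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    candidates = [doc_id for score, doc_id in ranked if score > 0][:candidate_size]
    # Stage 2: re-rank the candidates only, by cosine similarity shifted to be non-negative.
    reranked = sorted(
        candidates,
        key=lambda doc_id: cosine(query_vec, docs[doc_id]["vec"]) + 1.0,
        reverse=True,
    )
    return reranked[:top_k]


# Hypothetical documents with hypothetical embeddings.
docs = {
    "A": {"text": "asthma trial in adults", "vec": [0.1, 0.9]},
    "B": {"text": "adult asthma inhaler study", "vec": [0.9, 0.1]},
    "C": {"text": "pediatric wheezing cohort", "vec": [1.0, 0.0]},
}
result = two_stage_search("asthma adults", [1.0, 0.0], docs, candidate_size=2, top_k=2)
print(result)
```

Document C has the vector closest to the query but shares no terms with it, so it never enters the candidate set. The second stage can only reorder what the first stage returns, which is why the candidate size matters for recall.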

In [None]:
import json
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel
import torch
import numpy as np
import time
import pandas as pd
from datetime import datetime

# Load ClinicalBERT tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("medicalai/ClinicalBERT")
model = AutoModel.from_pretrained("medicalai/ClinicalBERT")


def mean_pooling(model_output, attention_mask):
    """
    Perform mean pooling on token embeddings while taking the attention mask into account.
    """
    token_embeddings = model_output.last_hidden_state  # (batch_size, sequence_length, hidden_size)
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, dim=1)
    sum_mask = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
    return sum_embeddings / sum_mask

def get_embedding(text, max_length=512):
    """
    Compute and return the embedding for the given text using ClinicalBERT.
    Returns the embedding as a list of floats.
    """
    encoded_input = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt"
    )
    with torch.no_grad():
        model_output = model(**encoded_input)
    embedding = mean_pooling(model_output, encoded_input['attention_mask'])
    return embedding[0].cpu().numpy().tolist()

# Evaluation Metric Functions
def precision_at_k(re_ranked_docs, relevant_docs, k):
    top_k_docs = re_ranked_docs[:k]
    num_relevant_in_k = sum(1 for doc in top_k_docs 
                            if doc in relevant_docs and relevant_docs[doc] > 0)
    return num_relevant_in_k / k if k else 0.0

def recall_at_k(re_ranked_docs, relevant_docs, k):
    top_k_docs = re_ranked_docs[:k]
    total_relevant = sum(1 for doc, rel in relevant_docs.items() if rel > 0)
    if total_relevant == 0:
        return 0.0
    num_relevant_in_k = sum(1 for doc in top_k_docs 
                            if doc in relevant_docs and relevant_docs[doc] > 0)
    return num_relevant_in_k / total_relevant

def f1_at_k(re_ranked_docs, relevant_docs, k):
    p = precision_at_k(re_ranked_docs, relevant_docs, k)
    r = recall_at_k(re_ranked_docs, relevant_docs, k)
    if (p + r) == 0:
        return 0.0
    return 2 * (p * r) / (p + r)

def average_precision(re_ranked_docs, relevant_docs):
    if not relevant_docs:
        return 0.0
    
    relevant_indices = [
        i for i, doc in enumerate(re_ranked_docs)
        if doc in relevant_docs and relevant_docs[doc] > 0
    ]
    if not relevant_indices:
        return 0.0
    
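    # Note: the average is taken over the relevant documents that were retrieved,
    # not over all relevant documents in the qrels.
    ap_sum = 0.0
    for rank_pos in relevant_indices:
        # rank_pos is a 0-based position, so the precision cutoff is rank_pos + 1.
        ap_sum += precision_at_k(re_ranked_docs, relevant_docs, rank_pos + 1)
    return ap_sum / len(relevant_indices)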

def ndcg_at_k(re_ranked_docs, relevant_docs, k):
    def dcg(scores):
        return sum(( (2**s - 1) / np.log2(idx + 2)) for idx, s in enumerate(scores))
    
    top_k_docs = re_ranked_docs[:k]
    actual_scores = [relevant_docs[doc] if doc in relevant_docs else 0 for doc in top_k_docs]
    actual_dcg = dcg(actual_scores)
    
    sorted_rels = sorted(relevant_docs.values(), reverse=True)[:k]
    ideal_dcg = dcg(sorted_rels)
    
    if ideal_dcg == 0:
        return 0.0
    return actual_dcg / ideal_dcg

def mrr_at_k(re_ranked_docs, relevant_docs, k):
    for i, doc_id in enumerate(re_ranked_docs[:k], start=1):
        if doc_id in relevant_docs and relevant_docs[doc_id] > 0:
            return 1.0 / i
    return 0.0

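# Load queries and qrels from JSONL files
def load_jsonl(filepath):
    """
    Load a JSONL file keyed by "query_id".
    Lines with a "text" field are treated as queries: {query_id: text}.
    All other lines are treated as qrels: {query_id: {doc_id: relevance}}.
    """
    data = {}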
    with open(filepath, "r") as f:
        for line in f:
            entry = json.loads(line)
            key = entry["query_id"]
            if "text" in entry:
                data[key] = entry["text"].strip()
            else:
                if key not in data:
                    data[key] = {}
                data[key][entry["doc_id"]] = int(entry.get("relevance", 0))
    return data

# Stage 1: initial candidate retrieval (BM25) over the text fields "title", "summary" and "detailed_descrption".
def initial_candidate_retrieval(query_text, candidate_size=500):
    initial_body = {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["title", "summary", "detailed_descrption"]
            }
        }
    }
    response = es.search(index="ir_dev_index", body=initial_body, size=candidate_size)
    candidate_ids = [hit["_id"] for hit in response["hits"]["hits"]]
    print("Time taken to for lexical search: {} ms".format(response["took"]))
    return candidate_ids

# Stage 2: vector re-ranking on the candidate set.
def vector_re_ranking(query_text, candidate_ids, top_k=10):
    query_vector = get_embedding(query_text)  # Embed the query with ClinicalBERT (get_embedding() is defined above).
    vector_query_body = {
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "filter": [
                            {"ids": {"values": candidate_ids}},
                            {"exists": {"field": "detailed_descrption_vector"}}
                        ]
                    }
                },
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'detailed_descrption_vector') + 1.0",
                    "params": {"query_vector": query_vector}
                }
            }
        }
    }
    response = es.search(index="ir_dev_index", body=vector_query_body, size=top_k)
    print("Time taken to re-rank: {} ms".format(response["took"]))
    return [hit["_source"]["doc_id"] for hit in response["hits"]["hits"]]

# Load queries and qrels
queries = load_jsonl("trec_ct_2021_query.jsonl")
qrels = load_jsonl("trec_ct_2021_qrels.jsonl") 

# Set RELEVANCE_THRESHOLD: any document with relevance >= RELEVANCE_THRESHOLD counts as relevant for evaluation.
RELEVANCE_THRESHOLD = 2
def get_relevant_docs(qrel_dict, relevance_threshold=1):
    # Use the threshold passed by the caller rather than silently reading the global.
    return {doc_id: rel for doc_id, rel in qrel_dict.items() if rel >= relevance_threshold}

# For ndcg calculation
def get_all_docs(qrel_dict):
    """Return the entire dictionary of {doc_id: relevance}, including relevance=0."""
    return {doc_id: rel for doc_id, rel in qrel_dict.items()}

CANDIDATE_SIZE = 1000  # number of top BM25 hits kept as candidates for re-ranking
K = 10  # top K for re-ranking and evaluation
all_metrics = []

print("Processing queries and computing evaluation metrics...")
for qid, query_text in tqdm(queries.items(), total=len(queries)):
    if not query_text:
        continue

    # Track the total time
    start_time = time.time()
    
    # Stage 1: retrieve candidate document IDs.
    candidate_ids = initial_candidate_retrieval(query_text, candidate_size=CANDIDATE_SIZE)
    # Stage 2: re-rank candidates via vector search.
    retrieved_docs = vector_re_ranking(query_text, candidate_ids, top_k=K)
    
    # Mark the end time for entire query processing
    total_time = time.time() - start_time

    # Get relevant docs for this query from qrels using the RELEVANCE_THRESHOLD (for evaluation)
    relevant_docs = get_relevant_docs(qrels.get(qid, {}), relevance_threshold=RELEVANCE_THRESHOLD)
    
    # All judged documents for this query (no threshold), used to compute nDCG on graded relevance (0, 1, 2)
    ndcg_relevant_docs = get_all_docs(qrels.get(qid, {}))
    
    # Compute evaluation measures using our functions.
    prec = precision_at_k(retrieved_docs, relevant_docs, K)
    rec = recall_at_k(retrieved_docs, relevant_docs, K)
    f1 = f1_at_k(retrieved_docs, relevant_docs, K)
    ap = average_precision(retrieved_docs, relevant_docs)
    ndcg = ndcg_at_k(retrieved_docs, ndcg_relevant_docs, K)
    mrr_val = mrr_at_k(retrieved_docs, relevant_docs, K)
    
    metrics = {
        "query_id": qid,
        "precision": prec,
        "recall": rec,
        "f1": f1,
        "average_precision": ap,
        "ndcg": ndcg,
        "mrr": mrr_val,
        "time_seconds": total_time
    }
    all_metrics.append(metrics)
    
    print(f"Query {qid}: Precision@{K}: {prec:.4f}, Recall@{K}: {rec:.4f}, F1@{K}: {f1:.4f}, AP: {ap:.4f}, nDCG@{K}: {ndcg:.4f}, mrr@{K}: {mrr_val:.4f}")
    print(f"  Total time for query: {total_time:.2f} seconds")

# Compute overall evaluation metrics.
if all_metrics:
    avg_precision = np.mean([m["precision"] for m in all_metrics])
    avg_recall = np.mean([m["recall"] for m in all_metrics])
    avg_f1 = np.mean([m["f1"] for m in all_metrics])
    mean_ap = np.mean([m["average_precision"] for m in all_metrics])
    mean_ndcg = np.mean([m["ndcg"] for m in all_metrics])
    avg_mrr = np.mean([m["mrr"] for m in all_metrics])
    avg_time = np.mean([m["time_seconds"] for m in all_metrics])
    
    
    print("\nBM25L + Semantic search (Clinical BERT) - Overall evaluation metrics:")
    current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"Date/time:{current_time}")
    print(f"\nThreshold =: {RELEVANCE_THRESHOLD}")
    print(f"Mean Precision@{K}: {avg_precision:.4f}")
    print(f"Mean Recall@{K}: {avg_recall:.4f}")
    print(f"Mean F1@{K}: {avg_f1:.4f}")
    print(f"Mean Average Precision (MAP): {mean_ap:.4f}")
    print(f"Mean nDCG@{K}: {mean_ndcg:.4f}")
    print(f"Mean MRR@{K}: {avg_mrr:.4f}")
    print(f"Average total time per query: {avg_time:.2f} seconds")
else:
    print("No queries processed.")

# Create a DataFrame to export the computed results for all queries
data = {
    "Test": "BM25L and Token Embedding.",
    "Date/Time": [current_time],
    "Relevance Threshold": [RELEVANCE_THRESHOLD],
    f"Mean Precision@{K}": [avg_precision],
    f"Mean Recall@{K}": [avg_recall],
    f"Mean F1@{K}": [avg_f1],
    "Mean Average Precision (MAP)": [mean_ap],
    f"Mean nDCG@{K}": [mean_ndcg],
    f"Mean MRR@{K}": [avg_mrr],
    "Avg Time per Query (s)": [avg_time],
}
df = pd.DataFrame(data)

# Export to an Excel file
xlsx_filename = "evaluation_results_Lexical_and_TOKEN_embeddings.xlsx"
df.to_excel(xlsx_filename, index=False)

print(f"\nMetrics have been exported to {xlsx_filename}")


Processing queries and computing evaluation metrics...


  1%|▏         | 1/75 [00:01<02:09,  1.75s/it]

Time taken to for lexical search: 1314 ms
Time taken to re-rank: 60 ms
Query 1: Precision@10: 0.1000, Recall@10: 0.0213, F1@10: 0.0351, AP: 0.1667, nDCG@10: 0.0784, mrr@10: 0.1667
  Total time for query: 1.75 seconds
Time taken to for lexical search: 587 ms


  3%|▎         | 2/75 [00:03<01:47,  1.47s/it]

Time taken to re-rank: 325 ms
Query 2: Precision@10: 0.1000, Recall@10: 0.0072, F1@10: 0.0135, AP: 0.2000, nDCG@10: 0.2146, mrr@10: 0.2000
  Total time for query: 1.27 seconds


  4%|▍         | 3/75 [00:03<01:24,  1.17s/it]

Time taken to for lexical search: 415 ms
Time taken to re-rank: 73 ms
Query 3: Precision@10: 0.1000, Recall@10: 0.0312, F1@10: 0.0476, AP: 0.3333, nDCG@10: 0.1563, mrr@10: 0.3333
  Total time for query: 0.82 seconds
Time taken to for lexical search: 295 ms


  5%|▌         | 4/75 [00:04<01:12,  1.02s/it]

Time taken to re-rank: 133 ms
Query 4: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.77 seconds


  7%|▋         | 5/75 [00:05<00:59,  1.18it/s]

Time taken to for lexical search: 176 ms
Time taken to re-rank: 42 ms
Query 5: Precision@10: 0.1000, Recall@10: 0.0084, F1@10: 0.0155, AP: 0.1250, nDCG@10: 0.1151, mrr@10: 0.1250
  Total time for query: 0.56 seconds
Time taken to for lexical search: 159 ms


  8%|▊         | 6/75 [00:06<01:05,  1.05it/s]

Time taken to re-rank: 682 ms
Query 6: Precision@10: 0.1000, Recall@10: 0.0085, F1@10: 0.0157, AP: 0.1111, nDCG@10: 0.1872, mrr@10: 0.1111
  Total time for query: 1.16 seconds


  9%|▉         | 7/75 [00:06<00:55,  1.22it/s]

Time taken to for lexical search: 176 ms
Time taken to re-rank: 31 ms
Query 7: Precision@10: 0.3000, Recall@10: 0.0240, F1@10: 0.0444, AP: 0.2611, nDCG@10: 0.2208, mrr@10: 0.2000
  Total time for query: 0.54 seconds


 11%|█         | 8/75 [00:07<00:47,  1.40it/s]

Time taken to for lexical search: 145 ms
Time taken to re-rank: 19 ms
Query 8: Precision@10: 0.1000, Recall@10: 0.0109, F1@10: 0.0196, AP: 0.2000, nDCG@10: 0.0851, mrr@10: 0.2000
  Total time for query: 0.49 seconds


 12%|█▏        | 9/75 [00:07<00:42,  1.56it/s]

Time taken to for lexical search: 134 ms
Time taken to re-rank: 17 ms
Query 9: Precision@10: 0.1000, Recall@10: 0.0769, F1@10: 0.0870, AP: 0.2500, nDCG@10: 0.2306, mrr@10: 0.2500
  Total time for query: 0.48 seconds


 13%|█▎        | 10/75 [00:08<00:37,  1.72it/s]

Time taken to for lexical search: 119 ms
Time taken to re-rank: 17 ms
Query 10: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.44 seconds


 15%|█▍        | 11/75 [00:08<00:35,  1.81it/s]

Time taken to for lexical search: 150 ms
Time taken to re-rank: 22 ms
Query 11: Precision@10: 0.1000, Recall@10: 0.0189, F1@10: 0.0317, AP: 0.1250, nDCG@10: 0.2974, mrr@10: 0.1250
  Total time for query: 0.49 seconds


 16%|█▌        | 12/75 [00:09<00:34,  1.85it/s]

Time taken to for lexical search: 186 ms
Time taken to re-rank: 27 ms
Query 12: Precision@10: 0.2000, Recall@10: 0.0192, F1@10: 0.0351, AP: 0.4167, nDCG@10: 0.2048, mrr@10: 0.3333
  Total time for query: 0.52 seconds


 17%|█▋        | 13/75 [00:09<00:32,  1.92it/s]

Time taken to for lexical search: 157 ms
Time taken to re-rank: 10 ms
Query 13: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0600, mrr@10: 0.0000
  Total time for query: 0.47 seconds


 19%|█▊        | 14/75 [00:10<00:31,  1.93it/s]

Time taken to for lexical search: 161 ms
Time taken to re-rank: 26 ms
Query 14: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0367, mrr@10: 0.0000
  Total time for query: 0.51 seconds


 20%|██        | 15/75 [00:10<00:30,  1.96it/s]

Time taken to for lexical search: 152 ms
Time taken to re-rank: 15 ms
Query 15: Precision@10: 0.1000, Recall@10: 0.0070, F1@10: 0.0131, AP: 0.1429, nDCG@10: 0.0734, mrr@10: 0.1429
  Total time for query: 0.49 seconds


 21%|██▏       | 16/75 [00:11<00:34,  1.71it/s]

Time taken to for lexical search: 138 ms
Time taken to re-rank: 15 ms
Query 16: Precision@10: 0.1000, Recall@10: 0.0161, F1@10: 0.0278, AP: 0.2000, nDCG@10: 0.1072, mrr@10: 0.2000
  Total time for query: 0.75 seconds


 23%|██▎       | 17/75 [00:12<00:34,  1.68it/s]

Time taken to for lexical search: 269 ms
Time taken to re-rank: 35 ms
Query 17: Precision@10: 0.1000, Recall@10: 0.0079, F1@10: 0.0146, AP: 0.1250, nDCG@10: 0.0915, mrr@10: 0.1250
  Total time for query: 0.63 seconds


 24%|██▍       | 18/75 [00:12<00:31,  1.81it/s]

Time taken to for lexical search: 124 ms
Time taken to re-rank: 15 ms
Query 18: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.45 seconds


 25%|██▌       | 19/75 [00:13<00:30,  1.87it/s]

Time taken to for lexical search: 152 ms
Time taken to re-rank: 33 ms
Query 19: Precision@10: 0.4000, Recall@10: 0.0348, F1@10: 0.0640, AP: 0.5750, nDCG@10: 0.4473, mrr@10: 1.0000
  Total time for query: 0.50 seconds


 27%|██▋       | 20/75 [00:13<00:29,  1.89it/s]

Time taken to for lexical search: 157 ms
Time taken to re-rank: 46 ms
Query 20: Precision@10: 0.1000, Recall@10: 0.0238, F1@10: 0.0385, AP: 0.5000, nDCG@10: 0.1705, mrr@10: 0.5000
  Total time for query: 0.51 seconds


 28%|██▊       | 21/75 [00:14<00:27,  1.97it/s]

Time taken to for lexical search: 103 ms
Time taken to re-rank: 25 ms
Query 21: Precision@10: 0.1000, Recall@10: 0.0204, F1@10: 0.0339, AP: 0.1429, nDCG@10: 0.0734, mrr@10: 0.1429
  Total time for query: 0.45 seconds


 29%|██▉       | 22/75 [00:14<00:27,  1.92it/s]

Time taken to for lexical search: 160 ms
Time taken to re-rank: 77 ms
Query 22: Precision@10: 0.1000, Recall@10: 0.0141, F1@10: 0.0247, AP: 0.1250, nDCG@10: 0.1306, mrr@10: 0.1250
  Total time for query: 0.55 seconds


 31%|███       | 23/75 [00:15<00:27,  1.90it/s]

Time taken to for lexical search: 145 ms
Time taken to re-rank: 55 ms
Query 23: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0737, mrr@10: 0.0000
  Total time for query: 0.53 seconds


 32%|███▏      | 24/75 [00:15<00:26,  1.92it/s]

Time taken to for lexical search: 156 ms
Time taken to re-rank: 42 ms
Query 24: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.51 seconds


 33%|███▎      | 25/75 [00:16<00:25,  1.97it/s]

Time taken to for lexical search: 141 ms
Time taken to re-rank: 29 ms
Query 25: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.48 seconds


 35%|███▍      | 26/75 [00:16<00:23,  2.09it/s]

Time taken to for lexical search: 92 ms
Time taken to re-rank: 12 ms
Query 26: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0231, mrr@10: 0.0000
  Total time for query: 0.41 seconds


 36%|███▌      | 27/75 [00:17<00:25,  1.89it/s]

Time taken to for lexical search: 307 ms
Time taken to re-rank: 29 ms
Query 27: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0683, mrr@10: 0.0000
  Total time for query: 0.64 seconds


 37%|███▋      | 28/75 [00:17<00:23,  2.01it/s]

Time taken to for lexical search: 81 ms
Time taken to re-rank: 15 ms
Query 28: Precision@10: 0.4000, Recall@10: 0.0280, F1@10: 0.0523, AP: 0.6528, nDCG@10: 0.4992, mrr@10: 1.0000
  Total time for query: 0.42 seconds


 39%|███▊      | 29/75 [00:18<00:22,  2.04it/s]

Time taken to for lexical search: 127 ms
Time taken to re-rank: 21 ms
Query 29: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.47 seconds


 40%|████      | 30/75 [00:18<00:21,  2.07it/s]

Time taken to for lexical search: 130 ms
Time taken to re-rank: 16 ms
Query 30: Precision@10: 0.1000, Recall@10: 0.0208, F1@10: 0.0345, AP: 0.1000, nDCG@10: 0.2297, mrr@10: 0.1000
  Total time for query: 0.47 seconds


 41%|████▏     | 31/75 [00:19<00:20,  2.10it/s]

Time taken to for lexical search: 135 ms
Time taken to re-rank: 19 ms
Query 31: Precision@10: 0.1000, Recall@10: 0.0222, F1@10: 0.0364, AP: 0.5000, nDCG@10: 0.1672, mrr@10: 0.5000
  Total time for query: 0.46 seconds


 43%|████▎     | 32/75 [00:19<00:19,  2.18it/s]

Time taken to for lexical search: 96 ms
Time taken to re-rank: 15 ms
Query 32: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 44%|████▍     | 33/75 [00:19<00:18,  2.24it/s]

Time taken to for lexical search: 87 ms
Time taken to re-rank: 13 ms
Query 33: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 45%|████▌     | 34/75 [00:20<00:17,  2.30it/s]

Time taken to for lexical search: 88 ms
Time taken to re-rank: 10 ms
Query 34: Precision@10: 0.1000, Recall@10: 0.0127, F1@10: 0.0225, AP: 0.3333, nDCG@10: 0.1100, mrr@10: 0.3333
  Total time for query: 0.41 seconds


 47%|████▋     | 35/75 [00:20<00:17,  2.27it/s]

Time taken to for lexical search: 92 ms
Time taken to re-rank: 14 ms
Query 35: Precision@10: 0.1000, Recall@10: 0.0122, F1@10: 0.0217, AP: 0.2500, nDCG@10: 0.0948, mrr@10: 0.2500
  Total time for query: 0.45 seconds


 48%|████▊     | 36/75 [00:21<00:16,  2.33it/s]

Time taken to for lexical search: 78 ms
Time taken to re-rank: 12 ms
Query 36: Precision@10: 0.3000, Recall@10: 0.0163, F1@10: 0.0309, AP: 0.6806, nDCG@10: 0.4459, mrr@10: 1.0000
  Total time for query: 0.40 seconds


 49%|████▉     | 37/75 [00:21<00:16,  2.34it/s]

Time taken to for lexical search: 102 ms
Time taken to re-rank: 13 ms
Query 37: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 51%|█████     | 38/75 [00:21<00:15,  2.36it/s]

Time taken to for lexical search: 81 ms
Time taken to re-rank: 11 ms
Query 38: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 52%|█████▏    | 39/75 [00:22<00:15,  2.34it/s]

Time taken to for lexical search: 99 ms
Time taken to re-rank: 10 ms
Query 39: Precision@10: 0.1000, Recall@10: 0.0294, F1@10: 0.0455, AP: 0.1667, nDCG@10: 0.0784, mrr@10: 0.1667
  Total time for query: 0.44 seconds


 53%|█████▎    | 40/75 [00:22<00:16,  2.18it/s]

Time taken to for lexical search: 201 ms
Time taken to re-rank: 18 ms
Query 40: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.53 seconds


 55%|█████▍    | 41/75 [00:23<00:15,  2.21it/s]

Time taken to for lexical search: 105 ms
Time taken to re-rank: 14 ms
Query 41: Precision@10: 0.3000, Recall@10: 0.0224, F1@10: 0.0417, AP: 0.3619, nDCG@10: 0.4459, mrr@10: 0.5000
  Total time for query: 0.44 seconds


 56%|█████▌    | 42/75 [00:23<00:15,  2.19it/s]

Time taken to for lexical search: 99 ms
Time taken to re-rank: 10 ms
Query 42: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0651, mrr@10: 0.0000
  Total time for query: 0.46 seconds


 57%|█████▋    | 43/75 [00:24<00:14,  2.23it/s]

Time taken to for lexical search: 107 ms
Time taken to re-rank: 19 ms
Query 43: Precision@10: 0.1000, Recall@10: 0.0159, F1@10: 0.0274, AP: 1.0000, nDCG@10: 0.2201, mrr@10: 1.0000
  Total time for query: 0.43 seconds


 59%|█████▊    | 44/75 [00:24<00:13,  2.27it/s]

Time taken to for lexical search: 97 ms
Time taken to re-rank: 17 ms
Query 44: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0259, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 60%|██████    | 45/75 [00:25<00:13,  2.29it/s]

Time taken to for lexical search: 92 ms
Time taken to re-rank: 11 ms
Query 45: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.43 seconds


 61%|██████▏   | 46/75 [00:25<00:12,  2.28it/s]

Time taken to for lexical search: 98 ms
Time taken to re-rank: 11 ms
Query 46: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.44 seconds


 63%|██████▎   | 47/75 [00:26<00:12,  2.24it/s]

Time taken to for lexical search: 118 ms
Time taken to re-rank: 19 ms
Query 47: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.47 seconds


 64%|██████▍   | 48/75 [00:26<00:11,  2.25it/s]

Time taken to for lexical search: 92 ms
Time taken to re-rank: 8 ms
Query 48: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.44 seconds


 65%|██████▌   | 49/75 [00:26<00:11,  2.29it/s]

Time taken to for lexical search: 90 ms
Time taken to re-rank: 15 ms
Query 49: Precision@10: 0.1000, Recall@10: 0.0286, F1@10: 0.0444, AP: 0.5000, nDCG@10: 0.1389, mrr@10: 0.5000
  Total time for query: 0.42 seconds


 67%|██████▋   | 50/75 [00:27<00:10,  2.31it/s]

Time taken to for lexical search: 97 ms
Time taken to re-rank: 16 ms
Query 50: Precision@10: 0.1000, Recall@10: 0.0714, F1@10: 0.0833, AP: 0.1111, nDCG@10: 0.0663, mrr@10: 0.1111
  Total time for query: 0.43 seconds


 68%|██████▊   | 51/75 [00:27<00:10,  2.33it/s]

Time taken to for lexical search: 91 ms
Time taken to re-rank: 13 ms
Query 51: Precision@10: 0.1000, Recall@10: 0.0357, F1@10: 0.0526, AP: 0.1667, nDCG@10: 0.0784, mrr@10: 0.1667
  Total time for query: 0.42 seconds


 69%|██████▉   | 52/75 [00:28<00:09,  2.31it/s]

Time taken to for lexical search: 88 ms
Time taken to re-rank: 6 ms
Query 52: Precision@10: 0.1000, Recall@10: 0.0244, F1@10: 0.0392, AP: 0.5000, nDCG@10: 0.1650, mrr@10: 0.5000
  Total time for query: 0.44 seconds


 71%|███████   | 53/75 [00:28<00:09,  2.32it/s]

Time taken to for lexical search: 70 ms
Time taken to re-rank: 13 ms
Query 53: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.43 seconds


 72%|███████▏  | 54/75 [00:29<00:09,  2.28it/s]

Time taken to for lexical search: 112 ms
Time taken to re-rank: 10 ms
Query 54: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.45 seconds


 73%|███████▎  | 55/75 [00:29<00:08,  2.31it/s]

Time taken to for lexical search: 95 ms
Time taken to re-rank: 14 ms
Query 55: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 75%|███████▍  | 56/75 [00:29<00:08,  2.30it/s]

Time taken to for lexical search: 96 ms
Time taken to re-rank: 20 ms
Query 56: Precision@10: 0.1000, Recall@10: 0.0196, F1@10: 0.0328, AP: 0.1250, nDCG@10: 0.1873, mrr@10: 0.1250
  Total time for query: 0.43 seconds


 76%|███████▌  | 57/75 [00:30<00:07,  2.32it/s]

Time taken to for lexical search: 88 ms
Time taken to re-rank: 8 ms
Query 57: Precision@10: 0.2000, Recall@10: 0.0185, F1@10: 0.0339, AP: 0.3929, nDCG@10: 0.2122, mrr@10: 0.5000
  Total time for query: 0.42 seconds


 77%|███████▋  | 58/75 [00:30<00:07,  2.33it/s]

Time taken to for lexical search: 93 ms
Time taken to re-rank: 17 ms
Query 58: Precision@10: 0.1000, Recall@10: 0.0182, F1@10: 0.0308, AP: 0.3333, nDCG@10: 0.1100, mrr@10: 0.3333
  Total time for query: 0.43 seconds


 79%|███████▊  | 59/75 [00:31<00:06,  2.32it/s]

Time taken to for lexical search: 117 ms
Time taken to re-rank: 22 ms
Query 59: Precision@10: 0.4000, Recall@10: 0.0212, F1@10: 0.0402, AP: 0.2958, nDCG@10: 0.3350, mrr@10: 0.2000
  Total time for query: 0.44 seconds


 80%|████████  | 60/75 [00:31<00:06,  2.31it/s]

Time taken to for lexical search: 112 ms
Time taken to re-rank: 21 ms
Query 60: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.44 seconds


 81%|████████▏ | 61/75 [00:32<00:06,  2.20it/s]

Time taken to for lexical search: 146 ms
Time taken to re-rank: 40 ms
Query 61: Precision@10: 0.1000, Recall@10: 0.0083, F1@10: 0.0154, AP: 0.3333, nDCG@10: 0.1661, mrr@10: 0.3333
  Total time for query: 0.50 seconds


 83%|████████▎ | 62/75 [00:32<00:06,  2.11it/s]

Time taken to for lexical search: 177 ms
Time taken to re-rank: 22 ms
Query 62: Precision@10: 0.1000, Recall@10: 0.0154, F1@10: 0.0267, AP: 1.0000, nDCG@10: 0.2201, mrr@10: 1.0000
  Total time for query: 0.52 seconds


 84%|████████▍ | 63/75 [00:33<00:05,  2.17it/s]

Time taken to for lexical search: 98 ms
Time taken to re-rank: 11 ms
Query 63: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.43 seconds


 85%|████████▌ | 64/75 [00:33<00:05,  1.96it/s]

Time taken to for lexical search: 118 ms
Time taken to re-rank: 26 ms
Query 64: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0316, mrr@10: 0.0000
  Total time for query: 0.62 seconds


 87%|████████▋ | 65/75 [00:34<00:04,  2.08it/s]

Time taken to for lexical search: 89 ms
Time taken to re-rank: 16 ms
Query 65: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 88%|████████▊ | 66/75 [00:34<00:04,  2.18it/s]

Time taken to for lexical search: 78 ms
Time taken to re-rank: 14 ms
Query 66: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.40 seconds


 89%|████████▉ | 67/75 [00:34<00:03,  2.24it/s]

Time taken to for lexical search: 87 ms
Time taken to re-rank: 21 ms
Query 67: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0993, mrr@10: 0.0000
  Total time for query: 0.41 seconds


 91%|█████████ | 68/75 [00:35<00:03,  2.25it/s]

Time taken to for lexical search: 98 ms
Time taken to re-rank: 18 ms
Query 68: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.1177, mrr@10: 0.0000
  Total time for query: 0.44 seconds


 92%|█████████▏| 69/75 [00:35<00:02,  2.22it/s]

Time taken to for lexical search: 87 ms
Time taken to re-rank: 10 ms
Query 69: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.46 seconds


 93%|█████████▎| 70/75 [00:36<00:02,  2.25it/s]

Time taken to for lexical search: 91 ms
Time taken to re-rank: 12 ms
Query 70: Precision@10: 0.2000, Recall@10: 0.0182, F1@10: 0.0333, AP: 0.2917, nDCG@10: 0.2079, mrr@10: 0.3333
  Total time for query: 0.43 seconds


 95%|█████████▍| 71/75 [00:36<00:01,  2.28it/s]

Time taken to for lexical search: 101 ms
Time taken to re-rank: 9 ms
Query 71: Precision@10: 0.2000, Recall@10: 0.0150, F1@10: 0.0280, AP: 0.4167, nDCG@10: 0.2393, mrr@10: 0.5000
  Total time for query: 0.43 seconds


 96%|█████████▌| 72/75 [00:37<00:01,  2.33it/s]

Time taken to for lexical search: 76 ms
Time taken to re-rank: 14 ms
Query 72: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr@10: 0.0000
  Total time for query: 0.40 seconds


 97%|█████████▋| 73/75 [00:37<00:00,  2.35it/s]

Time taken to for lexical search: 86 ms
Time taken to re-rank: 16 ms
Query 73: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0537, mrr@10: 0.0000
  Total time for query: 0.42 seconds


 99%|█████████▊| 74/75 [00:37<00:00,  2.31it/s]

Time taken to for lexical search: 103 ms
Time taken to re-rank: 17 ms
Query 74: Precision@10: 0.1000, Recall@10: 0.0909, F1@10: 0.0952, AP: 0.2000, nDCG@10: 0.0851, mrr@10: 0.2000
  Total time for query: 0.45 seconds


100%|██████████| 75/75 [00:38<00:00,  1.95it/s]

Time taken to for lexical search: 98 ms
Time taken to re-rank: 15 ms
Query 75: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0245, mrr@10: 0.0000
  Total time for query: 0.43 seconds

BM25L + Semantic search (Clinical BERT) - Overall evaluation metrics:
Date/time:2025-04-10 08:17:38

Threshold =: 2
Mean Precision@10: 0.0773
Mean Recall@10: 0.0122
Mean F1@10: 0.0191
Mean Average Precision (MAP): 0.1695
Mean nDCG@10: 0.1076
Mean MRR@10: 0.1858
Average total time per query: 0.51 seconds





### Lexical search only (Elasticsearch BM25)

In [None]:
import json
import numpy as np
from elasticsearch import Elasticsearch, helpers
from tqdm import tqdm
import time
import pandas as pd
from datetime import datetime

# Evaluation Metric Functions
def precision_at_k(re_ranked_docs, relevant_docs, k):
    top_k_docs = re_ranked_docs[:k]
    num_relevant_in_k = sum(1 for doc in top_k_docs 
                            if doc in relevant_docs and relevant_docs[doc] > 0)
    return num_relevant_in_k / k if k else 0.0

def recall_at_k(re_ranked_docs, relevant_docs, k):
    top_k_docs = re_ranked_docs[:k]
    total_relevant = sum(1 for doc, rel in relevant_docs.items() if rel > 0)
    if total_relevant == 0:
        return 0.0
    num_relevant_in_k = sum(1 for doc in top_k_docs 
                            if doc in relevant_docs and relevant_docs[doc] > 0)
    return num_relevant_in_k / total_relevant

def f1_at_k(re_ranked_docs, relevant_docs, k):
    p = precision_at_k(re_ranked_docs, relevant_docs, k)
    r = recall_at_k(re_ranked_docs, relevant_docs, k)
    if (p + r) == 0:
        return 0.0
    return 2 * (p * r) / (p + r)

def average_precision(re_ranked_docs, relevant_docs):
    # Note: the sum is normalised by the number of relevant documents that were
    # retrieved, not by the total number of relevant documents for the query.
    if not relevant_docs:
        return 0.0
    
    relevant_indices = [
        i for i, doc in enumerate(re_ranked_docs)
        if doc in relevant_docs and relevant_docs[doc] > 0
    ]
    if not relevant_indices:
        return 0.0
    
    ap_sum = 0.0
    for rank_pos in relevant_indices:
        # rank_pos is 0-based, so precision is taken at cutoff rank_pos + 1.
        ap_sum += precision_at_k(re_ranked_docs, relevant_docs, rank_pos + 1)
    return ap_sum / len(relevant_indices)

def ndcg_at_k(re_ranked_docs, relevant_docs, k):
    def dcg(scores):
        return sum(( (2**s - 1) / np.log2(idx + 2)) for idx, s in enumerate(scores))
    
    top_k_docs = re_ranked_docs[:k]
    actual_scores = [relevant_docs[doc] if doc in relevant_docs else 0 for doc in top_k_docs]
    actual_dcg = dcg(actual_scores)
    
    sorted_rels = sorted(relevant_docs.values(), reverse=True)[:k]
    ideal_dcg = dcg(sorted_rels)
    
    if ideal_dcg == 0:
        return 0.0
    return actual_dcg / ideal_dcg

def mrr_at_k(re_ranked_docs, relevant_docs, k):
    for i, doc_id in enumerate(re_ranked_docs[:k], start=1):
        if doc_id in relevant_docs and relevant_docs[doc_id] > 0:
            return 1.0 / i
    return 0.0

# Define the text-based query search 
def search_text_based(query_text, K=10):
    search_body = {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["title", "summary", "detailed_descrption"]
            }
        }
    }
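    # Pass the query via `query=`; the `body=` argument is deprecated in the 8.x Python client.
    response = es.search(index="ir_dev_index", query=search_body["query"], size=K)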
    print("Time taken to retrieve (ES only): {} ms".format(response["took"]))
    return [hit["_source"]["doc_id"] for hit in response['hits']['hits']]

# Load queries from a JSONL file   
def load_queries(file_path):
    queries = {}
    with open(file_path, "r") as f:
        for line in f:
            doc = json.loads(line)
            queries[doc["query_id"]] = doc["text"].strip()
    return queries

# Load qrels from a JSONL file
def load_qrels(file_path):
    qrels = {}
    with open(file_path, "r") as f:
        for line in f:
            doc = json.loads(line)
            qid = doc["query_id"]
            relevance = int(doc.get("relevance", 0))
            # Only keep documents meeting the threshold; else, assign 0.
            relevance = relevance if relevance >= RELEVANCE_THRESHOLD else 0
            if qid not in qrels:
                qrels[qid] = {}
            qrels[qid][doc["doc_id"]] = relevance
    return qrels

RELEVANCE_THRESHOLD = 2  # Only documents with relevance >= x are considered relevant for evaluation
K = 10  # Top-K documents for evaluation

# Filter qrels based on relevance threshold (for evaluation)
def get_relevant_docs(qrel_dict, threshold=RELEVANCE_THRESHOLD):
    return {doc_id: rel for doc_id, rel in qrel_dict.items() if rel >= threshold}

# For ndcg calculation
def get_all_docs(qrel_dict):
    """Return the entire dictionary of {doc_id: relevance}, including relevance=0."""
    return {doc_id: rel for doc_id, rel in qrel_dict.items()}

if __name__ == "__main__":
    # Set file paths for your queries and qrels JSONL files.
    queries_file = "trec_ct_2021_query.jsonl"
    qrels_file = "trec_ct_2021_qrels.jsonl"

    # Load queries and qrels.
    queries = load_queries(queries_file)
    qrels = load_qrels(qrels_file)

    all_metrics = []

    print("BM25L - Processing queries and computing evaluation metrics...")
    for qid, qtext in tqdm(queries.items(), total=len(queries)):
        if not qtext:
            continue
        
        # Track the total time
        start_time = time.time()
        # Stage 1: Retrieve top-K documents using text-based search.
        retrieved_docs = search_text_based(qtext, K=K)

        # Mark the end time for entire query processing
        total_time = time.time() - start_time

        # Get relevant docs for this query using the relevance threshold.
        relevant_docs = get_relevant_docs(qrels.get(qid, {}), threshold=RELEVANCE_THRESHOLD)

        # All judged docs for this query, used for nDCG. load_qrels has already set
        # grades below RELEVANCE_THRESHOLD to 0, so only grades >= threshold contribute gain.
        ndcg_relevant_docs = get_all_docs(qrels.get(qid, {}))

        # Compute evaluation measures.
        prec = precision_at_k(retrieved_docs, relevant_docs, K)
        rec = recall_at_k(retrieved_docs, relevant_docs, K)
        f1 = f1_at_k(retrieved_docs, relevant_docs, K)
        ap = average_precision(retrieved_docs, relevant_docs)
        ndcg = ndcg_at_k(retrieved_docs, ndcg_relevant_docs, K)
        mrr_val = mrr_at_k(retrieved_docs, relevant_docs, K)
        
        metrics = {
            "query_id": qid,
            "precision": prec,
            "recall": rec,
            "f1": f1,
            "average_precision": ap,
            "ndcg": ndcg,
            "mrr": mrr_val,
            "time_seconds": total_time
        }
        all_metrics.append(metrics)
        print(f"Query {qid}: Precision@{K}: {prec:.4f}, Recall@{K}: {rec:.4f}, F1@{K}: {f1:.4f}, AP: {ap:.4f}, nDCG@{K}: {ndcg:.4f}, mrr@{K}: {mrr_val:.4f}")
        print(f"  Total time for query: {total_time:.2f} seconds")

    # Compute overall evaluation metrics.
    if all_metrics:
        avg_precision = np.mean([m["precision"] for m in all_metrics])
        avg_recall = np.mean([m["recall"] for m in all_metrics])
        avg_f1 = np.mean([m["f1"] for m in all_metrics])
        mean_ap = np.mean([m["average_precision"] for m in all_metrics])
        mean_ndcg = np.mean([m["ndcg"] for m in all_metrics])
        mean_mrr = np.mean([m["mrr"] for m in all_metrics])
        avg_time = np.mean([m["time_seconds"] for m in all_metrics])

        print("\nBM25L(standard es) - Overall evaluation metrics:")
        current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"Date/time:{current_time}")
        print(f"\nThreshold = {RELEVANCE_THRESHOLD}")
        print(f"Mean Precision@{K}: {avg_precision:.4f}")
        print(f"Mean Recall@{K}: {avg_recall:.4f}")
        print(f"Mean F1@{K}: {avg_f1:.4f}")
        print(f"Mean Average Precision (MAP): {mean_ap:.4f}")
        print(f"Mean nDCG@{K}: {mean_ndcg:.4f}")
        print(f"Mean MRR@{K}: {mean_mrr:.4f}")
        print(f"Average total time per query: {avg_time:.2f} seconds")
    else:
        print("No queries processed.")

# Create a DataFrame to export the computed results for all queries
data = {
    "Test": "BM25L Only.",
    "Date/Time": [current_time],
    "Relevance Threshold": [RELEVANCE_THRESHOLD],
    f"Mean Precision@{K}": [avg_precision],
    f"Mean Recall@{K}": [avg_recall],
    f"Mean F1@{K}": [avg_f1],
    "Mean Average Precision (MAP)": [mean_ap],
    f"Mean nDCG@{K}": [mean_ndcg],
    f"Mean MRR@{K}": [mean_mrr],
    "Avg Time per Query (s)": [avg_time],
}
df = pd.DataFrame(data)

# Export to an Excel file
xlsx_filename = "BM25L_only.xlsx"
df.to_excel(xlsx_filename, index=False)

print(f"\nMetrics have been exported to {xlsx_filename}")

BM25L - Processing queries and computing evaluation metrics...


  3%|▎         | 2/75 [00:00<00:04, 16.50it/s]

Time taken to retrieve (ES only): 46 ms
Query 1: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 46 ms
Query 2: Precision@10: 0.2000, Recall@10: 0.0145, F1@10: 0.0270, AP: 0.1964, nDCG@10: 0.1428, mrr10: 0.1429
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 37 ms
Query 3: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds


  5%|▌         | 4/75 [00:00<00:04, 16.89it/s]

Time taken to retrieve (ES only): 49 ms
Query 4: Precision@10: 0.1000, Recall@10: 0.0227, F1@10: 0.0370, AP: 0.1667, nDCG@10: 0.0784, mrr10: 0.1667
  Total time for query: 0.06 seconds


  8%|▊         | 6/75 [00:00<00:04, 16.06it/s]

Time taken to retrieve (ES only): 56 ms
Query 5: Precision@10: 0.3000, Recall@10: 0.0252, F1@10: 0.0465, AP: 0.2944, nDCG@10: 0.2431, mrr10: 0.3333
  Total time for query: 0.07 seconds
Time taken to retrieve (ES only): 41 ms
Query 6: Precision@10: 0.3000, Recall@10: 0.0256, F1@10: 0.0472, AP: 0.5444, nDCG@10: 0.3621, mrr10: 1.0000
  Total time for query: 0.06 seconds


 11%|█         | 8/75 [00:00<00:04, 15.60it/s]

Time taken to retrieve (ES only): 49 ms
Query 7: Precision@10: 0.4000, Recall@10: 0.0320, F1@10: 0.0593, AP: 0.3458, nDCG@10: 0.3098, mrr10: 0.2500
  Total time for query: 0.07 seconds
Time taken to retrieve (ES only): 41 ms
Query 8: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds


 13%|█▎        | 10/75 [00:00<00:04, 15.86it/s]

Time taken to retrieve (ES only): 43 ms
Query 9: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 44 ms
Query 10: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 27 ms
Query 11: Precision@10: 0.1000, Recall@10: 0.0189, F1@10: 0.0317, AP: 0.1250, nDCG@10: 0.0694, mrr10: 0.1250
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 27 ms
Query 12: Precision@10: 0.5000, Recall@10: 0.0481, F1@10: 0.0877, AP: 0.7889, nDCG@10: 0.5989, mrr10: 1.0000
  Total time for query: 0.05 seconds


 17%|█▋        | 13/75 [00:00<00:03, 17.64it/s]

Time taken to retrieve (ES only): 41 ms
Query 13: Precision@10: 0.1000, Recall@10: 0.0185, F1@10: 0.0312, AP: 0.3333, nDCG@10: 0.1100, mrr10: 0.3333
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 44 ms
Query 14: Precision@10: 0.1000, Recall@10: 0.0147, F1@10: 0.0256, AP: 0.1111, nDCG@10: 0.0663, mrr10: 0.1111
  Total time for query: 0.06 seconds


 20%|██        | 15/75 [00:00<00:03, 16.27it/s]

Time taken to retrieve (ES only): 62 ms
Query 15: Precision@10: 0.2000, Recall@10: 0.0140, F1@10: 0.0261, AP: 0.7500, nDCG@10: 0.3149, mrr10: 1.0000
  Total time for query: 0.08 seconds


 23%|██▎       | 17/75 [00:01<00:03, 15.81it/s]

Time taken to retrieve (ES only): 63 ms
Query 16: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.09 seconds
Time taken to retrieve (ES only): 30 ms
Query 17: Precision@10: 0.6000, Recall@10: 0.0472, F1@10: 0.0876, AP: 0.6458, nDCG@10: 0.5578, mrr10: 0.5000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 37 ms
Query 18: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds


 25%|██▌       | 19/75 [00:01<00:03, 16.28it/s]

Time taken to retrieve (ES only): 39 ms
Query 19: Precision@10: 0.2000, Recall@10: 0.0174, F1@10: 0.0320, AP: 0.4167, nDCG@10: 0.2173, mrr10: 0.5000
  Total time for query: 0.06 seconds


 28%|██▊       | 21/75 [00:01<00:03, 16.73it/s]

Time taken to retrieve (ES only): 41 ms
Query 20: Precision@10: 0.1000, Recall@10: 0.0238, F1@10: 0.0385, AP: 0.5000, nDCG@10: 0.1389, mrr10: 0.5000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 36 ms
Query 21: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 32 ms
Query 22: Precision@10: 0.1000, Recall@10: 0.0141, F1@10: 0.0247, AP: 0.1250, nDCG@10: 0.0694, mrr10: 0.1250
  Total time for query: 0.05 seconds


 31%|███       | 23/75 [00:01<00:02, 17.40it/s]

Time taken to retrieve (ES only): 43 ms
Query 23: Precision@10: 0.1000, Recall@10: 0.0119, F1@10: 0.0213, AP: 0.1667, nDCG@10: 0.0784, mrr10: 0.1667
  Total time for query: 0.06 seconds


 33%|███▎      | 25/75 [00:01<00:02, 17.47it/s]

Time taken to retrieve (ES only): 39 ms
Query 24: Precision@10: 0.3000, Recall@10: 0.0270, F1@10: 0.0496, AP: 0.5278, nDCG@10: 0.3558, mrr10: 1.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 43 ms
Query 25: Precision@10: 0.1000, Recall@10: 0.0135, F1@10: 0.0238, AP: 0.1000, nDCG@10: 0.0636, mrr10: 0.1000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 52 ms
Query 26: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.07 seconds


 36%|███▌      | 27/75 [00:01<00:02, 16.66it/s]

Time taken to retrieve (ES only): 43 ms
Query 27: Precision@10: 0.1000, Recall@10: 0.0175, F1@10: 0.0299, AP: 0.5000, nDCG@10: 0.1389, mrr10: 0.5000
  Total time for query: 0.06 seconds


 39%|███▊      | 29/75 [00:01<00:02, 16.00it/s]

Time taken to retrieve (ES only): 34 ms
Query 28: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 61 ms
Query 29: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.08 seconds
Time taken to retrieve (ES only): 33 ms
Query 30: Precision@10: 0.1000, Recall@10: 0.0208, F1@10: 0.0345, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 22 ms
Query 31: Precision@10: 0.1000, Recall@10: 0.0222, F1@10: 0.0364, AP: 0.1429, nDCG@10: 0.0734, mrr10: 0.1429
  Total time for query: 0.04 seconds


 45%|████▌     | 34/75 [00:02<00:02, 17.97it/s]

Time taken to retrieve (ES only): 38 ms
Query 32: Precision@10: 0.1000, Recall@10: 0.0385, F1@10: 0.0556, AP: 0.3333, nDCG@10: 0.1100, mrr10: 0.3333
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 30 ms
Query 33: Precision@10: 0.2000, Recall@10: 0.0099, F1@10: 0.0188, AP: 0.3611, nDCG@10: 0.2051, mrr10: 0.5000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 40 ms
Query 34: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 38 ms
Query 35: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds


 51%|█████     | 38/75 [00:02<00:02, 17.84it/s]

Time taken to retrieve (ES only): 38 ms
Query 36: Precision@10: 0.1000, Recall@10: 0.0054, F1@10: 0.0103, AP: 0.1000, nDCG@10: 0.0636, mrr10: 0.1000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 40 ms
Query 37: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 36 ms
Query 38: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 42 ms
Query 39: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds


 56%|█████▌    | 42/75 [00:02<00:02, 16.41it/s]

Time taken to retrieve (ES only): 39 ms
Query 40: Precision@10: 0.1000, Recall@10: 0.0909, F1@10: 0.0952, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 37 ms
Query 41: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 66 ms
Query 42: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.08 seconds
Time taken to retrieve (ES only): 26 ms
Query 43: Precision@10: 0.1000, Recall@10: 0.0159, F1@10: 0.0274, AP: 0.1250, nDCG@10: 0.0694, mrr10: 0.1250
  Total time for query: 0.04 seconds


 60%|██████    | 45/75 [00:02<00:01, 17.39it/s]

Time taken to retrieve (ES only): 37 ms
Query 44: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 44 ms
Query 45: Precision@10: 0.2000, Recall@10: 0.0138, F1@10: 0.0258, AP: 0.2429, nDCG@10: 0.1585, mrr10: 0.2000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 49 ms
Query 46: Precision@10: 0.1000, Recall@10: 0.0091, F1@10: 0.0167, AP: 0.1111, nDCG@10: 0.0663, mrr10: 0.1111
  Total time for query: 0.07 seconds


 63%|██████▎   | 47/75 [00:02<00:01, 16.53it/s]

Time taken to retrieve (ES only): 53 ms
Query 47: Precision@10: 0.3000, Recall@10: 0.0234, F1@10: 0.0435, AP: 0.3333, nDCG@10: 0.2547, mrr10: 0.3333
  Total time for query: 0.07 seconds


 65%|██████▌   | 49/75 [00:02<00:01, 17.06it/s]

Time taken to retrieve (ES only): 39 ms
Query 48: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 28 ms
Query 49: Precision@10: 0.1000, Recall@10: 0.0286, F1@10: 0.0444, AP: 0.1667, nDCG@10: 0.0784, mrr10: 0.1667
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 35 ms
Query 50: Precision@10: 0.3000, Recall@10: 0.2143, F1@10: 0.2500, AP: 0.5556, nDCG@10: 0.3273, mrr10: 0.5000
  Total time for query: 0.05 seconds


 68%|██████▊   | 51/75 [00:03<00:01, 17.65it/s]

Time taken to retrieve (ES only): 35 ms
Query 51: Precision@10: 0.1000, Recall@10: 0.0357, F1@10: 0.0526, AP: 0.1111, nDCG@10: 0.0663, mrr10: 0.1111
  Total time for query: 0.05 seconds


 72%|███████▏  | 54/75 [00:03<00:01, 18.28it/s]

Time taken to retrieve (ES only): 38 ms
Query 52: Precision@10: 0.2000, Recall@10: 0.0488, F1@10: 0.0784, AP: 0.4167, nDCG@10: 0.2173, mrr10: 0.5000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 24 ms
Query 53: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 40 ms
Query 54: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 50 ms
Query 55: Precision@10: 0.3000, Recall@10: 0.1304, F1@10: 0.1818, AP: 0.5778, nDCG@10: 0.3715, mrr10: 1.0000
  Total time for query: 0.07 seconds


 79%|███████▊  | 59/75 [00:03<00:00, 19.04it/s]

Time taken to retrieve (ES only): 29 ms
Query 56: Precision@10: 0.4000, Recall@10: 0.0784, F1@10: 0.1311, AP: 0.4683, nDCG@10: 0.3733, mrr10: 0.5000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 37 ms
Query 57: Precision@10: 0.1000, Recall@10: 0.0093, F1@10: 0.0169, AP: 0.1000, nDCG@10: 0.0636, mrr10: 0.1000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 28 ms
Query 58: Precision@10: 0.2000, Recall@10: 0.0364, F1@10: 0.0615, AP: 0.5833, nDCG@10: 0.2489, mrr10: 0.5000
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 27 ms
Query 59: Precision@10: 0.2000, Recall@10: 0.0106, F1@10: 0.0201, AP: 0.2778, nDCG@10: 0.1763, mrr10: 0.3333
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 32 ms
Query 60: Precision@10: 0.1000, Recall@10: 0.0067, F1@10: 0.0126, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
  Total time for query: 0.05 seconds


 83%|████████▎ | 62/75 [00:03<00:00, 19.16it/s]

Time taken to retrieve (ES only): 27 ms
Query 61: Precision@10: 0.1000, Recall@10: 0.0083, F1@10: 0.0154, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 44 ms
Query 62: Precision@10: 0.1000, Recall@10: 0.0154, F1@10: 0.0267, AP: 0.1429, nDCG@10: 0.0734, mrr10: 0.1429
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 42 ms
Query 63: Precision@10: 0.1000, Recall@10: 0.0286, F1@10: 0.0444, AP: 0.2500, nDCG@10: 0.0948, mrr10: 0.2500
  Total time for query: 0.06 seconds


 85%|████████▌ | 64/75 [00:03<00:00, 18.87it/s]

Time taken to retrieve (ES only): 37 ms
Query 64: Precision@10: 0.2000, Recall@10: 0.0308, F1@10: 0.0533, AP: 0.6429, nDCG@10: 0.2935, mrr10: 1.0000
  Total time for query: 0.05 seconds


 89%|████████▉ | 67/75 [00:03<00:00, 20.01it/s]

Time taken to retrieve (ES only): 39 ms
Query 65: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds
Time taken to retrieve (ES only): 29 ms
Query 66: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 17 ms
Query 67: Precision@10: 0.1000, Recall@10: 0.0167, F1@10: 0.0286, AP: 0.5000, nDCG@10: 0.1389, mrr10: 0.5000
  Total time for query: 0.03 seconds
Time taken to retrieve (ES only): 26 ms
Query 68: Precision@10: 0.1000, Recall@10: 0.0233, F1@10: 0.0377, AP: 0.2500, nDCG@10: 0.0948, mrr10: 0.2500
  Total time for query: 0.04 seconds
Time taken to retrieve (ES only): 27 ms
Query 69: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.04 seconds


 96%|█████████▌| 72/75 [00:04<00:00, 19.54it/s]

Time taken to retrieve (ES only): 55 ms
Query 70: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.07 seconds
Time taken to retrieve (ES only): 35 ms
Query 71: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 37 ms
Query 72: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds
Time taken to retrieve (ES only): 40 ms
Query 73: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.06 seconds


100%|██████████| 75/75 [00:04<00:00, 17.58it/s]

Time taken to retrieve (ES only): 37 ms
Query 74: Precision@10: 0.2000, Recall@10: 0.1818, F1@10: 0.1905, AP: 0.7500, nDCG@10: 0.3149, mrr10: 1.0000
  Total time for query: 0.08 seconds
Time taken to retrieve (ES only): 32 ms
Query 75: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
  Total time for query: 0.05 seconds

BM25L(standard es) - Overall evaluation metrics:
Date/time:2025-03-25 12:00:09

Threshold = 2
Mean Precision@10: 0.1093
Mean Recall@10: 0.0208
Mean F1@10: 0.0312
Mean Average Precision (MAP): 0.2424
Mean nDCG@10: 0.1164
Mean MRR@10: 0.2740
Average total time per query: 0.06 seconds

Metrics have been exported to BM25L_only.xlsx
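
For a quick side-by-side, the overall means printed for the BM25 + ClinicalBERT re-rank run and for this BM25-only run can be compared directly. The sketch below is only illustrative: the values are copied from the two summaries in this notebook, and the dictionary names and the `relative_change` helper are hypothetical, not part of the original code.

```python
# Mean metrics copied from the two "Overall evaluation metrics" summaries
# (both runs report a relevance threshold of 2 and K = 10).
bm25_only = {"P@10": 0.1093, "R@10": 0.0208, "F1@10": 0.0312,
             "MAP": 0.2424, "nDCG@10": 0.1164, "MRR@10": 0.2740}
rerank = {"P@10": 0.0773, "R@10": 0.0122, "F1@10": 0.0191,
          "MAP": 0.1695, "nDCG@10": 0.1076, "MRR@10": 0.1858}


def relative_change(baseline, other):
    """Percentage change of `other` relative to `baseline`, per metric."""
    return {
        name: 100.0 * (other[name] - baseline[name]) / baseline[name]
        for name in baseline
    }


changes = relative_change(bm25_only, rerank)
for name, pct in changes.items():
    print(f"{name}: BM25 only = {bm25_only[name]:.4f}, "
          f"re-ranked = {rerank[name]:.4f} ({pct:+.1f}%)")
```

On these numbers every mean metric is lower for the re-ranked run than for BM25 alone, and the average time per query is higher (0.51 s against 0.06 s). The nDCG figures may not be like-for-like: the re-rank output shows non-zero nDCG for queries with zero precision (for example Query 64), which suggests that run kept grade-1 judgements for nDCG, whereas `load_qrels` in this cell sets them to 0.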





### BM25W Description boosted ^2

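In Elasticsearch, a field can be given more weight in a 'multi_match' query by appending '^' and a boost factor to its name. Here 'detailed_descrption' is boosted by 2 ('detailed_descrption^2'), while 'title' and 'summary' keep the default weight of 1.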
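
As a minimal sketch of how the boosted field list can be assembled, the helper below builds a `multi_match` body and appends `^boost` to selected fields. The function name `build_boosted_query` and its `boosts` parameter are hypothetical and only for illustration; the field names match the ones used in this notebook, and no Elasticsearch connection is needed to run it.

```python
def build_boosted_query(query_text, fields, boosts=None):
    """Build a multi_match body, appending ^boost to any field named in `boosts`."""
    boosts = boosts or {}
    weighted_fields = [
        f"{field}^{boosts[field]}" if field in boosts else field
        for field in fields
    ]
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": weighted_fields,
            }
        }
    }


body = build_boosted_query(
    "example query text",  # placeholder text, not a TREC query
    ["title", "summary", "detailed_descrption"],
    boosts={"detailed_descrption": 2},
)
print(body["query"]["multi_match"]["fields"])
```

The boost multiplies the score contributed by the boosted field, so matches in the detailed description count for more when documents are ranked.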

In [None]:
import json
import numpy as np
from elasticsearch import Elasticsearch, helpers
from tqdm import tqdm
import time
import pandas as pd
from datetime import datetime

# Evaluation Functions
def precision_at_k(retrieved, relevant, k):
    retrieved_k = retrieved[:k]
    total_relevant = len([doc for doc, rel in relevant.items() if rel > 0])
    if total_relevant == 0:
        return 0.0
    correct = sum(1 for doc in retrieved_k if doc in relevant and relevant[doc] > 0)
    return correct / k

def recall_at_k(retrieved, relevant, k):
    retrieved_k = retrieved[:k]
    total_relevant = len([doc for doc, rel in relevant.items() if rel > 0])
    if total_relevant == 0:
        return 0.0
    correct = sum(1 for doc in retrieved_k if doc in relevant and relevant[doc] > 0)
    return correct / total_relevant

def f1_at_k(retrieved, relevant, k):
    p = precision_at_k(retrieved, relevant, k)
    r = recall_at_k(retrieved, relevant, k)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

def mrr_at_k(retrieved, relevant, k):
    # Use the function arguments rather than the global retrieved_docs/relevant_docs.
    for i, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant and relevant[doc_id] > 0:
            return 1.0 / i
    return 0.0

def average_precision(retrieved, relevant):
    if not relevant:
        return 0.0
    
    relevant_indices = [
        i for i, doc in enumerate(retrieved)
        if doc in relevant and relevant[doc] > 0
    ]
    if not relevant_indices:
        return 0.0
    
    ap_sum = 0.0
    for rank_pos in relevant_indices:
        # rank_pos is 0-based, so precision is taken at cutoff rank_pos + 1.
        ap_sum += precision_at_k(retrieved, relevant, rank_pos + 1)
    return ap_sum / len(relevant_indices)

def ndcg_at_k(retrieved, relevant, k):
    def dcg(scores):
        return sum((2**rel - 1) / np.log2(i + 1) for i, rel in enumerate(scores, start=1))
    retrieved_scores = [relevant.get(doc, 0) for doc in retrieved[:k]]
    ideal_scores = sorted(relevant.values(), reverse=True)[:k]
    if not ideal_scores or dcg(ideal_scores) == 0:
        return 0.0
    return dcg(retrieved_scores) / dcg(ideal_scores)

# Define the text-based query search
def search_text_based(query_text, K=10):
    search_body = {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["title", "summary", "detailed_descrption^2"]
            }
        }
    }
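    # Pass the query via `query=`; the `body=` argument is deprecated in the 8.x Python client.
    response = es.search(index="ir_dev_index", query=search_body["query"], size=K)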
    # Return document IDs stored in _source["doc_id"]
    return [hit["_source"]["doc_id"] for hit in response['hits']['hits']]

# --- Load queries from a JSONL file ---
def load_queries(file_path):
    queries = {}
    with open(file_path, "r") as f:
        for line in f:
            doc = json.loads(line)
            queries[doc["query_id"]] = doc["text"].strip()
    return queries

# Load qrels from a JSONL file 
def load_qrels(file_path):
    qrels = {}
    with open(file_path, "r") as f:
        for line in f:
            doc = json.loads(line)
            qid = doc["query_id"]
            relevance = int(doc.get("relevance", 0))
            # Only keep documents meeting the threshold; else, assign 0. (for evaluation only)
            relevance = relevance if relevance >= RELEVANCE_THRESHOLD else 0
            if qid not in qrels:
                qrels[qid] = {}
            qrels[qid][doc["doc_id"]] = relevance
    return qrels

RELEVANCE_THRESHOLD = 2  # Only documents with relevance >= 2 are considered relevant during evaluation
K = 10  # Top-K documents for evaluation

# Filter qrels based on relevance threshold for evaluation
def get_relevant_docs(qrel_dict, threshold=RELEVANCE_THRESHOLD):
    return {doc_id: rel for doc_id, rel in qrel_dict.items() if rel >= threshold}

# For ndcg calculation
def get_all_docs(qrel_dict):
    """Return the entire dictionary of {doc_id: relevance}, including relevance=0."""
    return {doc_id: rel for doc_id, rel in qrel_dict.items()}


if __name__ == "__main__":
    queries_file = "trec_ct_2021_query.jsonl"
    qrels_file = "trec_ct_2021_qrels.jsonl"

    # Load queries and qrels.
    queries = load_queries(queries_file)
    qrels = load_qrels(qrels_file)

    all_metrics = []

    print("BM25LW - Processing queries and computing evaluation metrics...")
    for qid, qtext in tqdm(queries.items(), total=len(queries)):
        if not qtext:
            continue
        
        # Start timing the search for this query
        start_time = time.time()
        
        # Retrieve top-K documents using text-based search.
        retrieved_docs = search_text_based(qtext, K=K)
        
        # Mark the end time for entire query processing
        total_time = time.time() - start_time

        # Get relevant docs for this query using the relevance threshold.
        relevant_docs = get_relevant_docs(qrels.get(qid, {}), threshold=RELEVANCE_THRESHOLD)

        # All judged docs for this query, used for nDCG. load_qrels has already set
        # grades below RELEVANCE_THRESHOLD to 0, so the grades seen here are 0 or 2.
        ndcg_relevant_docs = get_all_docs(qrels.get(qid, {}))
        
        # Compute evaluation measures.
        prec = precision_at_k(retrieved_docs, relevant_docs, K)
        rec = recall_at_k(retrieved_docs, relevant_docs, K)
        f1 = f1_at_k(retrieved_docs, relevant_docs, K)
        ap = average_precision(retrieved_docs, relevant_docs)
        ndcg = ndcg_at_k(retrieved_docs, ndcg_relevant_docs, K)
        mrr_val = mrr_at_k(retrieved_docs, relevant_docs, K)

        metrics = {
            "query_id": qid,
            "precision": prec,
            "recall": rec,
            "f1": f1,
            "average_precision": ap,
            "ndcg": ndcg,
            "mrr": mrr_val,
            "time_seconds": total_time
        }
        all_metrics.append(metrics)
        print(f"Query {qid}: Precision@{K}: {prec:.4f}, Recall@{K}: {rec:.4f}, F1@{K}: {f1:.4f}, AP: {ap:.4f}, nDCG@{K}: {ndcg:.4f}, mrr{K}: {mrr_val:.4f}")

    # Compute overall evaluation metrics.
    if all_metrics:
        avg_precision = np.mean([m["precision"] for m in all_metrics])
        avg_recall = np.mean([m["recall"] for m in all_metrics])
        avg_f1 = np.mean([m["f1"] for m in all_metrics])
        mean_ap = np.mean([m["average_precision"] for m in all_metrics])
        mean_ndcg = np.mean([m["ndcg"] for m in all_metrics])
        mean_mrr = np.mean([m["mrr"] for m in all_metrics])
        avg_time = np.mean([m["time_seconds"] for m in all_metrics])

        print("\nBM25LW - description boosted ^2 - Overall evaluation metrics:")
        current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"Date/time:{current_time}")
        print(f"\nThreshold = {RELEVANCE_THRESHOLD}")
        print(f"Mean Precision@{K}: {avg_precision:.4f}")
        print(f"Mean Recall@{K}: {avg_recall:.4f}")
        print(f"Mean F1@{K}: {avg_f1:.4f}")
        print(f"Mean Average Precision (MAP): {mean_ap:.4f}")
        print(f"Mean nDCG@{K}: {mean_ndcg:.4f}")
        print(f"Mean MRR@{K}: {mean_mrr:.4f}")
        print(f"Average total time per query: {avg_time:.2f} seconds")
    else:
        print("No queries processed.")

# Create a one-row DataFrame holding the summary metrics for export
data = {
    "Test": ["BM25LW Boosted"],
    "Date/Time": [current_time],
    "Relevance Threshold": [RELEVANCE_THRESHOLD],
    f"Mean Precision@{K}": [avg_precision],
    f"Mean Recall@{K}": [avg_recall],
    f"Mean F1@{K}": [avg_f1],
    "Mean Average Precision (MAP)": [mean_ap],
    f"Mean nDCG@{K}": [mean_ndcg],
    f"Mean MRR@{K}": [mean_mrr],
    "Avg Time per Query (s)": [avg_time],
}
df = pd.DataFrame(data)

# Export to an Excel file
xlsx_filename = "BM25LW Boosted.xlsx"
df.to_excel(xlsx_filename, index=False)

print(f"\nMetrics have been exported to {xlsx_filename}")


BM25LW - Processing queries and computing evaluation metrics...


  0%|          | 0/75 [00:00<?, ?it/s]

Query 1: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


  3%|▎         | 2/75 [00:00<00:04, 17.87it/s]

Query 2: Precision@10: 0.2000, Recall@10: 0.0145, F1@10: 0.0270, AP: 0.1964, nDCG@10: 0.1428, mrr10: 0.1429
Query 3: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


  5%|▌         | 4/75 [00:00<00:03, 18.15it/s]

Query 4: Precision@10: 0.1000, Recall@10: 0.0227, F1@10: 0.0370, AP: 0.1667, nDCG@10: 0.0784, mrr10: 0.1667
Query 5: Precision@10: 0.3000, Recall@10: 0.0252, F1@10: 0.0465, AP: 0.2944, nDCG@10: 0.2431, mrr10: 0.3333


  8%|▊         | 6/75 [00:00<00:03, 17.36it/s]

Query 6: Precision@10: 0.2000, Recall@10: 0.0171, F1@10: 0.0315, AP: 0.6250, nDCG@10: 0.2895, mrr10: 1.0000
Query 7: Precision@10: 0.4000, Recall@10: 0.0320, F1@10: 0.0593, AP: 0.3458, nDCG@10: 0.3098, mrr10: 0.2500


 11%|█         | 8/75 [00:00<00:04, 15.87it/s]

Query 8: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 13%|█▎        | 10/75 [00:00<00:04, 16.13it/s]

Query 9: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 10: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 11: Precision@10: 0.1000, Recall@10: 0.0189, F1@10: 0.0317, AP: 0.1429, nDCG@10: 0.0734, mrr10: 0.1429


 17%|█▋        | 13/75 [00:00<00:03, 18.14it/s]

Query 12: Precision@10: 0.5000, Recall@10: 0.0481, F1@10: 0.0877, AP: 0.7889, nDCG@10: 0.5989, mrr10: 1.0000
Query 13: Precision@10: 0.1000, Recall@10: 0.0185, F1@10: 0.0312, AP: 0.3333, nDCG@10: 0.1100, mrr10: 0.3333
Query 14: Precision@10: 0.1000, Recall@10: 0.0147, F1@10: 0.0256, AP: 0.1250, nDCG@10: 0.0694, mrr10: 0.1250


 23%|██▎       | 17/75 [00:01<00:03, 16.87it/s]

Query 15: Precision@10: 0.2000, Recall@10: 0.0140, F1@10: 0.0261, AP: 0.7500, nDCG@10: 0.3149, mrr10: 1.0000
Query 16: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 17: Precision@10: 0.6000, Recall@10: 0.0472, F1@10: 0.0876, AP: 0.6458, nDCG@10: 0.5578, mrr10: 0.5000
Query 18: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 28%|██▊       | 21/75 [00:01<00:03, 17.05it/s]

Query 19: Precision@10: 0.2000, Recall@10: 0.0174, F1@10: 0.0320, AP: 0.4167, nDCG@10: 0.2173, mrr10: 0.5000
Query 20: Precision@10: 0.1000, Recall@10: 0.0238, F1@10: 0.0385, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
Query 21: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 22: Precision@10: 0.2000, Recall@10: 0.0282, F1@10: 0.0494, AP: 0.1714, nDCG@10: 0.1370, mrr10: 0.1429


 33%|███▎      | 25/75 [00:01<00:02, 17.81it/s]

Query 23: Precision@10: 0.1000, Recall@10: 0.0119, F1@10: 0.0213, AP: 0.1667, nDCG@10: 0.0784, mrr10: 0.1667
Query 24: Precision@10: 0.3000, Recall@10: 0.0270, F1@10: 0.0496, AP: 0.5278, nDCG@10: 0.3558, mrr10: 1.0000
Query 25: Precision@10: 0.1000, Recall@10: 0.0135, F1@10: 0.0238, AP: 0.1000, nDCG@10: 0.0636, mrr10: 0.1000
Query 26: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 39%|███▊      | 29/75 [00:01<00:02, 17.15it/s]

Query 27: Precision@10: 0.1000, Recall@10: 0.0175, F1@10: 0.0299, AP: 0.5000, nDCG@10: 0.1389, mrr10: 0.5000
Query 28: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 29: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 30: Precision@10: 0.1000, Recall@10: 0.0208, F1@10: 0.0345, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000


 43%|████▎     | 32/75 [00:01<00:02, 18.03it/s]

Query 31: Precision@10: 0.1000, Recall@10: 0.0222, F1@10: 0.0364, AP: 0.1667, nDCG@10: 0.0784, mrr10: 0.1667
Query 32: Precision@10: 0.1000, Recall@10: 0.0385, F1@10: 0.0556, AP: 0.3333, nDCG@10: 0.1100, mrr10: 0.3333
Query 33: Precision@10: 0.2000, Recall@10: 0.0099, F1@10: 0.0188, AP: 0.3750, nDCG@10: 0.2083, mrr10: 0.5000


 45%|████▌     | 34/75 [00:01<00:02, 17.60it/s]

Query 34: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 48%|████▊     | 36/75 [00:02<00:02, 17.77it/s]

Query 35: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 36: Precision@10: 0.1000, Recall@10: 0.0054, F1@10: 0.0103, AP: 0.1111, nDCG@10: 0.0663, mrr10: 0.1111
Query 37: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 51%|█████     | 38/75 [00:02<00:02, 17.61it/s]

Query 38: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 53%|█████▎    | 40/75 [00:02<00:02, 17.32it/s]

Query 39: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 40: Precision@10: 0.1000, Recall@10: 0.0909, F1@10: 0.0952, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
Query 41: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 60%|██████    | 45/75 [00:02<00:01, 16.76it/s]

Query 42: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 43: Precision@10: 0.1000, Recall@10: 0.0159, F1@10: 0.0274, AP: 0.1250, nDCG@10: 0.0694, mrr10: 0.1250
Query 44: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 45: Precision@10: 0.2000, Recall@10: 0.0138, F1@10: 0.0258, AP: 0.2429, nDCG@10: 0.1585, mrr10: 0.2000


 65%|██████▌   | 49/75 [00:02<00:01, 16.34it/s]

Query 46: Precision@10: 0.1000, Recall@10: 0.0091, F1@10: 0.0167, AP: 0.1111, nDCG@10: 0.0663, mrr10: 0.1111
Query 47: Precision@10: 0.3000, Recall@10: 0.0234, F1@10: 0.0435, AP: 0.3694, nDCG@10: 0.2646, mrr10: 0.3333
Query 48: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 49: Precision@10: 0.2000, Recall@10: 0.0571, F1@10: 0.0889, AP: 0.2250, nDCG@10: 0.1584, mrr10: 0.2500


 71%|███████   | 53/75 [00:03<00:01, 17.63it/s]

Query 50: Precision@10: 0.2000, Recall@10: 0.1429, F1@10: 0.1667, AP: 0.5833, nDCG@10: 0.2489, mrr10: 0.5000
Query 51: Precision@10: 0.1000, Recall@10: 0.0357, F1@10: 0.0526, AP: 0.1111, nDCG@10: 0.0663, mrr10: 0.1111
Query 52: Precision@10: 0.2000, Recall@10: 0.0488, F1@10: 0.0784, AP: 0.4167, nDCG@10: 0.2173, mrr10: 0.5000
Query 53: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


 77%|███████▋  | 58/75 [00:03<00:00, 18.86it/s]

Query 54: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 55: Precision@10: 0.2000, Recall@10: 0.0870, F1@10: 0.1212, AP: 0.7500, nDCG@10: 0.3149, mrr10: 1.0000
Query 56: Precision@10: 0.4000, Recall@10: 0.0784, F1@10: 0.1311, AP: 0.4683, nDCG@10: 0.3733, mrr10: 0.5000
Query 57: Precision@10: 0.1000, Recall@10: 0.0093, F1@10: 0.0169, AP: 0.1429, nDCG@10: 0.0734, mrr10: 0.1429
Query 58: Precision@10: 0.2000, Recall@10: 0.0364, F1@10: 0.0615, AP: 0.5833, nDCG@10: 0.2489, mrr10: 0.5000


 83%|████████▎ | 62/75 [00:03<00:00, 18.74it/s]

Query 59: Precision@10: 0.2000, Recall@10: 0.0106, F1@10: 0.0201, AP: 0.3929, nDCG@10: 0.2122, mrr10: 0.5000
Query 60: Precision@10: 0.1000, Recall@10: 0.0067, F1@10: 0.0126, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
Query 61: Precision@10: 0.1000, Recall@10: 0.0083, F1@10: 0.0154, AP: 1.0000, nDCG@10: 0.2201, mrr10: 1.0000
Query 62: Precision@10: 0.1000, Recall@10: 0.0154, F1@10: 0.0267, AP: 0.1429, nDCG@10: 0.0734, mrr10: 0.1429


 88%|████████▊ | 66/75 [00:03<00:00, 18.37it/s]

Query 63: Precision@10: 0.1000, Recall@10: 0.0286, F1@10: 0.0444, AP: 0.2500, nDCG@10: 0.0948, mrr10: 0.2500
Query 64: Precision@10: 0.2000, Recall@10: 0.0308, F1@10: 0.0533, AP: 0.6429, nDCG@10: 0.2935, mrr10: 1.0000
Query 65: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 66: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 67: Precision@10: 0.2000, Recall@10: 0.0333, F1@10: 0.0571, AP: 0.3750, nDCG@10: 0.2083, mrr10: 0.5000


 92%|█████████▏| 69/75 [00:03<00:00, 20.29it/s]

Query 68: Precision@10: 0.1000, Recall@10: 0.0233, F1@10: 0.0377, AP: 0.2500, nDCG@10: 0.0948, mrr10: 0.2500
Query 69: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 70: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 71: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000


100%|██████████| 75/75 [00:04<00:00, 17.61it/s]

Query 72: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 73: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000
Query 74: Precision@10: 0.2000, Recall@10: 0.1818, F1@10: 0.1905, AP: 0.7500, nDCG@10: 0.3149, mrr10: 1.0000
Query 75: Precision@10: 0.0000, Recall@10: 0.0000, F1@10: 0.0000, AP: 0.0000, nDCG@10: 0.0000, mrr10: 0.0000

BM25LW - description boosted ^2 - Overall evaluation metrics:
Date/time:2025-03-25 12:00:13

Threshold = 2
Mean Precision@10: 0.1093
Mean Recall@10: 0.0200
Mean F1@10: 0.0303
Mean Average Precision (MAP): 0.2562
Mean nDCG@10: 0.1186
Mean MRR@10: 0.2857
Average total time per query: 0.06 seconds

Metrics have been exported to BM25LW Boosted.xlsx
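
A note on reading these numbers: several queries (for example Query 20) report AP = 1.0000 alongside nDCG@10 = 0.2201 and Recall@10 = 0.0238. The sketch below tries to reproduce that pattern on synthetic data, as a sanity check rather than a re-run of the experiment. It assumes the query has 42 judged documents of grade 2 (inferred from 1/42 ≈ 0.0238), that exactly one of them is retrieved at rank 1, and that `precision_at_k` defined earlier is the number of hits in the top k divided by k. The document IDs and the helper names `ap_over_retrieved` and `ap_over_all_relevant` are illustrative and do not come from the notebook.

```python
import math

def dcg(grades):
    # Same gain and discount as ndcg_at_k above: (2^rel - 1) / log2(rank + 1)
    return sum((2**g - 1) / math.log2(i + 1) for i, g in enumerate(grades, start=1))

def ndcg_at_k(retrieved, qrels, k):
    gains = [qrels.get(doc, 0) for doc in retrieved[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def precisions_at_hits(retrieved, qrels):
    # Precision measured at the rank of every relevant document retrieved
    hits, values = 0, []
    for rank, doc in enumerate(retrieved, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            values.append(hits / rank)
    return values

def ap_over_retrieved(retrieved, qrels):
    # Normalisation used in this notebook: divide by relevant docs retrieved
    values = precisions_at_hits(retrieved, qrels)
    return sum(values) / len(values) if values else 0.0

def ap_over_all_relevant(retrieved, qrels):
    # Conventional normalisation: divide by all relevant docs for the query
    n_relevant = sum(1 for g in qrels.values() if g > 0)
    return sum(precisions_at_hits(retrieved, qrels)) / n_relevant if n_relevant else 0.0

# Synthetic qrels: 42 relevant documents, all grade 2 (an assumption)
qrels = {f"rel{i}": 2 for i in range(42)}
# Synthetic ranking: one relevant document at rank 1, then nine unjudged ones
retrieved = ["rel0"] + [f"other{i}" for i in range(9)]

ndcg = ndcg_at_k(retrieved, qrels, 10)
ap_notebook = ap_over_retrieved(retrieved, qrels)
ap_conventional = ap_over_all_relevant(retrieved, qrels)

print(f"nDCG@10: {ndcg:.4f}")
print(f"AP (divided by relevant retrieved): {ap_notebook:.4f}")
print(f"AP (divided by all relevant): {ap_conventional:.4f}")
```

Under these assumptions the first AP variant is 1.0 because the only relevant document retrieved sits at rank 1, while nDCG@10 stays low because the ideal top 10 would contain ten grade-2 documents. The MAP of 0.2562 reported above is therefore likely higher than a MAP normalised by all relevant documents would be, which matters when comparing against published TREC figures.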



