# Exercise 10: Re-ranking

**Objective:** Implement and evaluate a two-stage Information Retrieval pipeline, and analyze the impact of re-ranking on ranking quality.

## Part 1. Corpus preparation

* Load the corpus (documents/passages).
* Load the queries.
* Load the qrels (relevance judgments).
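
As a rough sketch of the data shapes involved: BEIR's `GenericDataLoader.load` returns three dictionaries (corpus, queries, qrels). The toy values below are invented for illustration, and `check_qrels` is a hypothetical helper, not part of BEIR; it only shows one way to sanity-check that every judgment points at a known query and document.

```python
# Toy stand-ins for the three structures (values are invented for illustration).
corpus = {
    "d1": {"title": "Doc one", "text": "BM25 is a lexical ranking function."},
    "d2": {"title": "Doc two", "text": "Cross-encoders score query-document pairs."},
}
queries = {"q1": "what is bm25"}
qrels = {"q1": {"d1": 1}}  # query_id -> {doc_id: relevance}


def check_qrels(corpus, queries, qrels):
    """Return (query_id, doc_id) pairs in qrels that point at unknown IDs."""
    missing = []
    for qid, judged in qrels.items():
        for doc_id in judged:
            if qid not in queries or doc_id not in corpus:
                missing.append((qid, doc_id))
    return missing


print(check_qrels(corpus, queries, qrels))  # [] when every ID is known
```

An empty list means the judgments are consistent with the loaded corpus and queries.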

In [1]:
!pip install beir
from beir import util
from beir.datasets.data_loader import GenericDataLoader
import pandas as pd


In [2]:
DATASET_NAME = "scifact"
DATA_DIR = "../data/beir_datasets"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{DATASET_NAME}.zip"
util.download_and_unzip(url, DATA_DIR)

'../data/beir_datasets/scifact'

In [3]:
dataset_path = DATA_DIR + "/" + DATASET_NAME
corpus, queries, qrels = GenericDataLoader(dataset_path).load(split="test")

In [4]:
df_corpus = (
    pd.DataFrame.from_dict(corpus, orient="index")
      .reset_index()
      .rename(columns={"index": "doc_id"})
)

df_corpus

| | doc_id | text | title |
|---|---|---|---|
| 0 | 4983 | Alterations of the architecture of cerebral wh... | Microstructural development of human newborn c... |
| 1 | 5836 | Myelodysplastic syndromes (MDS) are age-depend... | Induction of myelodysplasia by myeloid-derived... |
| 2 | 7912 | ID elements are short interspersed elements (S... | BC1 RNA, the transcript from a master gene for... |
| 3 | 18670 | DNA methylation plays an important role in bio... | The DNA Methylome of Human Peripheral Blood Mo... |
| 4 | 19238 | Two human Golli (for gene expressed in the oli... | The human myelin basic protein gene is include... |
| ... | ... | ... | ... |
| 5178 | 195689316 | BACKGROUND The main associations of body-mass ... | Body-mass index and cause-specific mortality i... |
| 5179 | 195689757 | A key aberrant biological difference between t... | Targeting metabolic remodeling in glioblastoma... |
| 5180 | 196664003 | A signaling pathway transmits information from... | Signaling architectures that transmit unidirec... |
| 5181 | 198133135 | AIMS Trabecular bone score (TBS) is a surrogat... | Association between pre-diabetes, type 2 diabe... |


In [5]:
df_queries = (
    pd.DataFrame.from_dict(queries, orient="index", columns=["query"])
      .reset_index()
      .rename(columns={"index": "query_id"})
)

df_queries

| | query_id | query |
|---|---|---|
| 0 | 1 | 0-dimensional biomaterials show inductive prop... |
| 1 | 3 | 1,000 genomes project enables mapping of genet... |
| 2 | 5 | 1/2000 in UK have abnormal PrP positivity. |
| 3 | 13 | 5% of perinatal mortality is due to low birth ... |
| 4 | 36 | A deficiency of vitamin B12 increases blood le... |
| ... | ... | ... |
| 295 | 1379 | Women with a higher birth weight are more like... |
| 296 | 1382 | aPKCz causes tumour enhancement by affecting g... |
| 297 | 1385 | cSMAC formation enhances weak ligand signalling. |
| 298 | 1389 | mTORC2 regulates intracellular cysteine levels... |


In [6]:
rows = []
for qid, docs in qrels.items():
    for doc_id, rel in docs.items():
        rows.append({
            "query_id": qid,
            "doc_id": doc_id,
            "relevance": rel
        })

df_qrels = pd.DataFrame(rows)
df_qrels

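| | query_id | doc_id | relevance |
|---|---|---|---|
| 0 | 1 | 31715818 | 1 |
| 1 | 3 | 14717500 | 1 |
| 2 | 5 | 13734012 | 1 |
| 3 | 13 | 1606628 | 1 |
| 4 | 36 | 5152028 | 1 |
| ... | ... | ... | ... |
| 334 | 1379 | 17450673 | 1 |
| 335 | 1382 | 17755060 | 1 |
| 336 | 1385 | 306006 | 1 |
| 337 | 1389 | 23895668 | 1 |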


In [7]:
# Pick an arbitrary query that has several relevant documents
qid = "133"

print("Query:")
print(df_queries.loc[df_queries["query_id"] == qid, "query"].values[0])

print("\nRelevant documents for this query:")
df_qrels[(df_qrels["query_id"] == qid) & (df_qrels["relevance"] > 0)]

Query:
Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Relevant documents for this query:

| | query_id | doc_id | relevance |
|---|---|---|---|
| 31 | 133 | 38485364 | 1 |
| 32 | 133 | 6969753 | 1 |
| 33 | 133 | 17934082 | 1 |
| 34 | 133 | 16280642 | 1 |
| 35 | 133 | 12640810 | 1 |


## Part 2. Initial retrieval (baseline)

* Implement the initial retrieval stage with BM25.
* Compute the metrics Recall@10 and nDCG@10.
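
To make the two metrics concrete, here is a minimal per-query sketch in plain Python. It assumes the exponential-gain form of nDCG, with gain `2^rel - 1` and a `log2` rank discount; the function names and the toy ranking are hypothetical and chosen only for illustration.

```python
import math


def recall_at_k_single(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top k."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def ndcg_at_k_single(ranked_ids, judgments, k=10):
    """nDCG@k with exponential gain (2^rel - 1) and log2 discount."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


# Hypothetical ranking: two relevant documents exist ("b" and "z"),
# but only "b" is retrieved, at rank 2.
ranking = ["a", "b", "c"]
judgments = {"b": 1, "z": 1}

print(recall_at_k_single(ranking, judgments.keys()))  # 0.5
print(ndcg_at_k_single(ranking, judgments))
```

Recall only asks whether a relevant document made it into the top k, while nDCG also penalizes it for sitting at rank 2 instead of rank 1.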

In [15]:
# Part 2: Initial retrieval (baseline) with BM25

!pip install rank-bm25

from rank_bm25 import BM25Okapi
import numpy as np
from collections import defaultdict

# Prepare the corpus for BM25 (only the "text" field is indexed; titles are not used)
corpus_texts = df_corpus["text"].tolist()
corpus_ids = df_corpus["doc_id"].tolist()
tokenized_corpus = [text.lower().split() for text in corpus_texts]

# Build the BM25 index
bm25 = BM25Okapi(tokenized_corpus)

# Run retrieval for every query and keep the 10 highest-scoring documents
all_results = {}
for _, row in df_queries.iterrows():
    qid = row["query_id"]
    query_text = row["query"]

    tokenized_query = query_text.lower().split()
    scores = bm25.get_scores(tokenized_query)
    top_10_indices = np.argsort(scores)[::-1][:10]

    # Dict insertion order preserves the ranking (best document first)
    all_results[qid] = {corpus_ids[idx]: scores[idx] for idx in top_10_indices}

# Convert the qrels to a nested dictionary: query_id -> {doc_id: relevance}
qrels_dict = defaultdict(dict)
for _, row in df_qrels.iterrows():
    qrels_dict[row["query_id"]][row["doc_id"]] = row["relevance"]

# Compute Recall@k, averaged over queries with at least one relevant document
def recall_at_k(results, qrels, k=10):
    recalls = []
    for qid, retrieved in results.items():
        if qid not in qrels:
            continue
        relevant = set([d for d, r in qrels[qid].items() if r > 0])
        if len(relevant) == 0:
            continue
        retrieved_top_k = set(list(retrieved.keys())[:k])
        recalls.append(len(retrieved_top_k & relevant) / len(relevant))
    return np.mean(recalls)

# Compute nDCG@k with exponential gain (2^rel - 1) and log2 discount
def ndcg_at_k(results, qrels, k=10):
    ndcgs = []
    for qid, retrieved in results.items():
        if qid not in qrels:
            continue
        relevances = [qrels[qid].get(doc_id, 0) for doc_id in list(retrieved.keys())[:k]]
        dcg = sum((2**rel - 1) / np.log2(i + 2) for i, rel in enumerate(relevances))
        ideal = sorted(qrels[qid].values(), reverse=True)[:k]
        idcg = sum((2**rel - 1) / np.log2(i + 2) for i, rel in enumerate(ideal))
        ndcgs.append(dcg / idcg if idcg > 0 else 0.0)
    return np.mean(ndcgs)

# Results
print("RESULTS - BM25 BASELINE")
print(f"Recall@10: {recall_at_k(all_results, qrels_dict, k=10):.4f}")
print(f"nDCG@10:   {ndcg_at_k(all_results, qrels_dict, k=10):.4f}")

RESULTS - BM25 BASELINE
Recall@10: 0.6688
nDCG@10:   0.5438


## Part 3. Re-ranking with a _cross-encoder_

* Re-rank the top-k candidates for each query.
* Identify which documents change position within the top 10.
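
The core of a second-stage re-ranker can be sketched without any model: take the first-stage candidates, score each one again, and sort. In the sketch below the scores are hard-coded, hypothetical numbers standing in for whatever the second-stage model would return, and `rerank` is an illustrative helper name.

```python
def rerank(candidates, score_fn):
    """Re-order candidate doc IDs by a second-stage score, highest first."""
    return sorted(candidates, key=score_fn, reverse=True)


# Hypothetical first-stage output and second-stage scores.
first_stage = ["d3", "d1", "d7", "d2"]
second_stage_scores = {"d3": 0.10, "d1": 0.95, "d7": 0.40, "d2": 0.80}

reranked = rerank(first_stage, second_stage_scores.get)
print(reranked)  # ['d1', 'd2', 'd7', 'd3']

# Same documents, different order: a set-based metric evaluated at
# k = len(candidates) cannot change, while rank-sensitive metrics can.
assert set(reranked) == set(first_stage)
```

One consequence of this design: when only the top 10 candidates are re-ranked and recall is also measured at 10, Recall@10 is fixed by the first stage. Moving it would require re-ranking a deeper candidate pool (for example the top 100) and then truncating to 10.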

In [16]:
# Part 3: Re-ranking with a cross-encoder

!pip install sentence-transformers

from sentence_transformers import CrossEncoder

# Load the cross-encoder model
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Re-rank the BM25 results
reranked_results = {}

for qid, retrieved_docs in all_results.items():
    query_text = df_queries.loc[df_queries["query_id"] == qid, "query"].values[0]

    # Build the (query, document) pairs
    doc_ids = list(retrieved_docs.keys())
    doc_texts = []
    for doc_id in doc_ids:
        doc_text = df_corpus.loc[df_corpus["doc_id"] == doc_id, "text"].values[0]
        doc_texts.append(doc_text)

    pairs = [[query_text, doc_text] for doc_text in doc_texts]

    # Score each pair with the cross-encoder
    ce_scores = model.predict(pairs)

    # Re-rank by score, highest first
    ranked_indices = np.argsort(ce_scores)[::-1]
    reranked_results[qid] = {doc_ids[idx]: float(ce_scores[idx]) for idx in ranked_indices}

# Metrics after re-ranking. Recall@10 cannot change here because
# re-ranking only reorders the same 10 BM25 candidates.
print("RESULTS - CROSS-ENCODER RE-RANKING")
print(f"Recall@10: {recall_at_k(reranked_results, qrels_dict, k=10):.4f}")
print(f"nDCG@10:   {ndcg_at_k(reranked_results, qrels_dict, k=10):.4f}")

# Compare position changes for an example query
qid_example = "133"
print(f"\n{'='*50}")
print(f"COMPARISON FOR QUERY {qid_example}")
print(f"{'='*50}")

query_text = df_queries.loc[df_queries["query_id"] == qid_example, "query"].values[0]
print(f"Query: {query_text}\n")

bm25_docs = list(all_results[qid_example].keys())
reranked_docs = list(reranked_results[qid_example].keys())

print("Position | BM25 Doc ID | Cross-Encoder Doc ID | Change")
print("-" * 60)

for i in range(10):
    bm25_doc = bm25_docs[i]
    ce_doc = reranked_docs[i]

    # Check whether this rank holds a different document after re-ranking
    if bm25_doc == ce_doc:
        cambio = "="
    else:
        # Find the new position of the BM25 document
        try:
            new_pos = reranked_docs.index(bm25_doc) + 1
            cambio = f"→ pos {new_pos}"
        except ValueError:
            cambio = "→ outside top-10"

    # Mark relevant documents
    rel_bm25 = "✓" if qrels_dict[qid_example].get(bm25_doc, 0) > 0 else " "
    rel_ce = "✓" if qrels_dict[qid_example].get(ce_doc, 0) > 0 else " "

    print(f"{i+1:2d}       | {bm25_doc:9s} {rel_bm25} | {ce_doc:9s} {rel_ce}      | {cambio}")

# Summary of changes
cambios = sum(1 for i in range(10) if bm25_docs[i] != reranked_docs[i])
print(f"\nDocuments that changed position: {cambios}/10")

RESULTS - CROSS-ENCODER RE-RANKING
Recall@10: 0.6688
nDCG@10:   0.5968

COMPARISON FOR QUERY 133
Query: Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Position | BM25 Doc ID | Cross-Encoder Doc ID | Change
------------------------------------------------------------
 1       | 26688294    | 12640810  ✓      | → pos 9
 2       | 37964706    | 9507605          | → pos 5
 3       | 9507605     | 17934082  ✓      | → pos 2
 4       | 5270265     | 86694016         | → pos 10
 5       | 12785130    | 37964706         | → pos 8
 6       | 45764440    | 5821617          | → pos 7
 7       | 86694016    | 45764440         | → pos 4
 8       | 12640810  ✓ | 12785130         | → pos 1
 9       | 5821617     | 26688294         | → pos 6
10       | 17934082  ✓ | 5270265          | → pos 3

Documents that changed position: 10/10


## Part 4. Re-ranking with _LTR_

* Re-rank the top-k candidates for each query.
* Identify which documents change position within the top 10.
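
A learned ranker is normally fitted on queries that are disjoint from the ones used for evaluation; otherwise the reported metrics measure memorization rather than generalization. The sketch below shows one possible way to split at the query level and to derive the per-query group sizes that group-wise rankers typically expect. Both helper names, the 20% test fraction, and the toy IDs are assumptions made for illustration.

```python
import random
from itertools import groupby


def split_query_ids(query_ids, test_fraction=0.2, seed=0):
    """Shuffle query IDs and split them into disjoint train/test lists."""
    ids = sorted(set(query_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def group_sizes(row_query_ids):
    """Sizes of consecutive runs of equal query IDs (rows must be grouped)."""
    return [len(list(run)) for _, run in groupby(row_query_ids)]


train_qids, test_qids = split_query_ids(["q1", "q2", "q3", "q4", "q5"])
print(len(train_qids), len(test_qids))  # 4 1

# One entry per query, in row order: 3 rows for q1, 2 for q2, 1 for q3.
print(group_sizes(["q1", "q1", "q1", "q2", "q2", "q3"]))  # [3, 2, 1]
```

Splitting by query rather than by row keeps all candidates of a query on the same side of the split, which is what makes the held-out score meaningful.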

In [17]:
# Part 4: Re-ranking with Learning to Rank (LTR)

!pip install xgboost scikit-learn

import xgboost as xgb
import numpy as np

# Feature extraction for LTR
def extract_features(query_text, doc_text, bm25_score):
    """Extract the feature vector for one (query, document) pair."""
    query_lower = query_text.lower()
    doc_lower = doc_text.lower()

    # Whitespace tokenization
    query_terms = set(query_lower.split())
    doc_terms = doc_lower.split()

    # 1. BM25 score
    f1 = bm25_score

    # 2. Number of query terms found in the document (substring match)
    f2 = sum(1 for term in query_terms if term in doc_lower)

    # 3. Fraction of query terms found in the document
    f3 = f2 / len(query_terms) if len(query_terms) > 0 else 0

    # 4. Document length in tokens
    f4 = len(doc_terms)

    # 5. Mean position of query-term matches in the document
    #    (falls back to the document length when nothing matches)
    positions = []
    for term in query_terms:
        for i, doc_term in enumerate(doc_terms):
            if term in doc_term:
                positions.append(i)
    f5 = np.mean(positions) if positions else len(doc_terms)

    # 6. Query-term density
    f6 = f2 / len(doc_terms) if len(doc_terms) > 0 else 0

    return [f1, f2, f3, f4, f5, f6]

# Build the training data
X_train = []
y_train = []
qids_train = []

for qid, retrieved_docs in all_results.items():
    if qid not in qrels_dict:
        continue

    query_text = df_queries.loc[df_queries["query_id"] == qid, "query"].values[0]

    for doc_id, bm25_score in retrieved_docs.items():
        doc_text = df_corpus.loc[df_corpus["doc_id"] == doc_id, "text"].values[0]

        # Extract features
        features = extract_features(query_text, doc_text, bm25_score)
        X_train.append(features)

        # Label: relevance of the document for this query
        y_train.append(qrels_dict[qid].get(doc_id, 0))
        qids_train.append(qid)

X_train = np.array(X_train)
y_train = np.array(y_train)

print(f"Training data: {len(X_train)} (query, document) pairs")

# Train the LTR model with XGBoost
model_ltr = xgb.XGBRanker(
    objective='rank:pairwise',
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6
)

# Group sizes per query for XGBoost (rows of a query are contiguous)
group_sizes = []
current_qid = None
count = 0
for qid in qids_train:
    if qid != current_qid:
        if current_qid is not None:
            group_sizes.append(count)
        current_qid = qid
        count = 1
    else:
        count += 1
group_sizes.append(count)

# Note: the model is fitted on the same test queries and labels that are
# re-ranked and evaluated below, so the LTR metrics are optimistic and
# are not a held-out estimate.
model_ltr.fit(X_train, y_train, group=group_sizes)

# Re-rank with LTR
ltr_results = {}

for qid, retrieved_docs in all_results.items():
    query_text = df_queries.loc[df_queries["query_id"] == qid, "query"].values[0]

    doc_ids = list(retrieved_docs.keys())
    X_test = []

    for doc_id in doc_ids:
        doc_text = df_corpus.loc[df_corpus["doc_id"] == doc_id, "text"].values[0]
        bm25_score = retrieved_docs[doc_id]
        features = extract_features(query_text, doc_text, bm25_score)
        X_test.append(features)

    # Predict scores
    ltr_scores = model_ltr.predict(np.array(X_test))

    # Re-rank by score, highest first
    ranked_indices = np.argsort(ltr_scores)[::-1]
    ltr_results[qid] = {doc_ids[idx]: float(ltr_scores[idx]) for idx in ranked_indices}

# Metrics after LTR re-ranking
print("\nRESULTS - LTR RE-RANKING")
print(f"Recall@10: {recall_at_k(ltr_results, qrels_dict, k=10):.4f}")
print(f"nDCG@10:   {ndcg_at_k(ltr_results, qrels_dict, k=10):.4f}")

# Compare position changes for an example query
qid_example = "133"
print(f"\n{'='*50}")
print(f"COMPARISON FOR QUERY {qid_example}")
print(f"{'='*50}")

query_text = df_queries.loc[df_queries["query_id"] == qid_example, "query"].values[0]
print(f"Query: {query_text}\n")

bm25_docs = list(all_results[qid_example].keys())
ltr_docs = list(ltr_results[qid_example].keys())

print("Position | BM25 Doc ID | LTR Doc ID  | Change")
print("-" * 60)

for i in range(10):
    bm25_doc = bm25_docs[i]
    ltr_doc = ltr_docs[i]

    if bm25_doc == ltr_doc:
        cambio = "="
    else:
        try:
            new_pos = ltr_docs.index(bm25_doc) + 1
            cambio = f"→ pos {new_pos}"
        except ValueError:
            cambio = "→ outside top-10"

    rel_bm25 = "✓" if qrels_dict[qid_example].get(bm25_doc, 0) > 0 else " "
    rel_ltr = "✓" if qrels_dict[qid_example].get(ltr_doc, 0) > 0 else " "

    print(f"{i+1:2d}       | {bm25_doc:9s} {rel_bm25} | {ltr_doc:9s} {rel_ltr} | {cambio}")

cambios = sum(1 for i in range(10) if bm25_docs[i] != ltr_docs[i])
print(f"\nDocuments that changed position: {cambios}/10")

Training data: 3000 (query, document) pairs

RESULTS - LTR RE-RANKING
Recall@10: 0.6688
nDCG@10:   0.6729

COMPARISON FOR QUERY 133
Query: Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.

Position | BM25 Doc ID | LTR Doc ID  | Change
------------------------------------------------------------
 1       | 26688294    | 17934082  ✓ | → pos 5
 2       | 37964706    | 12640810  ✓ | → pos 6
 3       | 9507605     | 5270265     | → pos 4
 4       | 5270265     | 9507605     | → pos 3
 5       | 12785130    | 26688294    | → pos 10
 6       | 45764440    | 37964706    | → pos 9
 7       | 86694016    | 5821617     | → pos 8
 8       | 12640810  ✓ | 86694016    | → pos 2
 9       | 5821617     | 45764440    | → pos 7
10       | 17934082  ✓ | 12785130    | → pos 1

Documents that changed position: 10/10


## Part 5. Post re-ranking evaluation

Compute the metrics:
* nDCG@10
* MAP
* Recall@10
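
MAP is the mean over queries of average precision (AP), so a single-query AP example may help when reading the numbers. The sketch below uses a truncated AP@k normalized by the total number of relevant documents; the function name and the toy rankings are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids, k=10):
    """AP@k: sum of precision at each relevant hit, divided by |relevant|."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Relevant documents at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(average_precision(["a", "x", "b", "y"], {"a", "b"}))

# Two of three relevant documents retrieved, both ranked first:
# every hit has precision 1, so AP equals recall (2/3).
print(average_precision(["a", "b", "x"], {"a", "b", "c"}))
```

The second case is worth keeping in mind: with this normalization, AP@k equals Recall@k exactly when every retrieved relevant document is ranked above every non-relevant one, so a MAP@10 equal to Recall@10 indicates a perfect ordering of the candidates for every query.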

In [18]:
# Part 5: Post re-ranking evaluation

import numpy as np
from collections import defaultdict

# Mean Average Precision, truncated at k
def map_at_k(results, qrels, k=10):
    """
    Compute MAP@k (Mean Average Precision), normalizing each query's AP
    by its total number of relevant documents.
    """
    aps = []

    for qid, retrieved in results.items():
        if qid not in qrels:
            continue

        relevant_docs = set([d for d, r in qrels[qid].items() if r > 0])
        if len(relevant_docs) == 0:
            continue

        # Documents retrieved in the top k
        retrieved_top_k = list(retrieved.keys())[:k]

        # Compute Average Precision
        num_relevant_seen = 0
        sum_precisions = 0.0

        for i, doc_id in enumerate(retrieved_top_k, 1):
            if doc_id in relevant_docs:
                num_relevant_seen += 1
                precision_at_i = num_relevant_seen / i
                sum_precisions += precision_at_i

        ap = sum_precisions / len(relevant_docs) if len(relevant_docs) > 0 else 0.0
        aps.append(ap)

    return np.mean(aps) if aps else 0.0

# Compute all metrics for the three methods
print("="*60)
print("COMPARATIVE EVALUATION")
print("="*60)

methods = {
    "BM25": all_results,
    "Cross-Encoder": reranked_results,
    "LTR": ltr_results
}

results_table = []

for method_name, results in methods.items():
    recall = recall_at_k(results, qrels_dict, k=10)
    ndcg = ndcg_at_k(results, qrels_dict, k=10)
    map_score = map_at_k(results, qrels_dict, k=10)

    results_table.append({
        "Method": method_name,
        "Recall@10": recall,
        "nDCG@10": ndcg,
        "MAP@10": map_score
    })

# Print the results as a table
print("\n{:<15} {:<12} {:<12} {:<12}".format("Method", "Recall@10", "nDCG@10", "MAP@10"))
print("-" * 60)

for row in results_table:
    print("{:<15} {:<12.4f} {:<12.4f} {:<12.4f}".format(
        row["Method"],
        row["Recall@10"],
        row["nDCG@10"],
        row["MAP@10"]
    ))

# Relative improvement over the BM25 baseline
print("\n" + "="*60)
print("RELATIVE IMPROVEMENT vs BM25")
print("="*60)

baseline = results_table[0]

for i in range(1, len(results_table)):
    method = results_table[i]
    print(f"\n{method['Method']}:")

    recall_improvement = ((method['Recall@10'] - baseline['Recall@10']) / baseline['Recall@10']) * 100
    ndcg_improvement = ((method['nDCG@10'] - baseline['nDCG@10']) / baseline['nDCG@10']) * 100
    map_improvement = ((method['MAP@10'] - baseline['MAP@10']) / baseline['MAP@10']) * 100

    print(f"  Recall@10: {recall_improvement:+.2f}%")
    print(f"  nDCG@10:   {ndcg_improvement:+.2f}%")
    print(f"  MAP@10:    {map_improvement:+.2f}%")

# Per-query analysis: find the queries with the largest improvement
print("\n" + "="*60)
print("QUERIES WITH THE LARGEST nDCG@10 IMPROVEMENT")
print("="*60)

query_improvements = []

for qid in all_results.keys():
    if qid not in qrels_dict:
        continue

    # Per-query nDCG for each method
    ndcg_bm25 = ndcg_at_k({qid: all_results[qid]}, qrels_dict, k=10)
    ndcg_ce = ndcg_at_k({qid: reranked_results[qid]}, qrels_dict, k=10)
    ndcg_ltr = ndcg_at_k({qid: ltr_results[qid]}, qrels_dict, k=10)

    improvement_ce = ndcg_ce - ndcg_bm25
    improvement_ltr = ndcg_ltr - ndcg_bm25

    query_text = df_queries.loc[df_queries["query_id"] == qid, "query"].values[0]

    query_improvements.append({
        "qid": qid,
        "query": query_text,
        "bm25": ndcg_bm25,
        "ce_improvement": improvement_ce,
        "ltr_improvement": improvement_ltr
    })

# Sort by the largest improvement with the cross-encoder
query_improvements.sort(key=lambda x: x["ce_improvement"], reverse=True)

print("\nTop 5 queries with the largest improvement (Cross-Encoder):")
for i, item in enumerate(query_improvements[:5], 1):
    print(f"\n{i}. Query ID: {item['qid']}")
    print(f"   Query: {item['query'][:80]}...")
    print(f"   BM25 nDCG@10: {item['bm25']:.4f}")
    print(f"   CE improvement: {item['ce_improvement']:+.4f}")
    print(f"   LTR improvement: {item['ltr_improvement']:+.4f}")

COMPARATIVE EVALUATION

Method          Recall@10    nDCG@10      MAP@10      
------------------------------------------------------------
BM25            0.6688       0.5438       0.4993      
Cross-Encoder   0.6688       0.5968       0.5672      
LTR             0.6688       0.6729       0.6688      

RELATIVE IMPROVEMENT vs BM25

Cross-Encoder:
  Recall@10: +0.00%
  nDCG@10:   +9.74%
  MAP@10:    +13.60%

LTR:
  Recall@10: +0.00%
  nDCG@10:   +23.74%
  MAP@10:    +33.96%

QUERIES WITH THE LARGEST nDCG@10 IMPROVEMENT

Top 5 queries with the largest improvement (Cross-Encoder):

1. Query ID: 692
   Query: Leuko-increased blood increases infectious complications in red blood cell trans...
   BM25 nDCG@10: 0.3155
   CE improvement: +0.6845
   LTR improvement: +0.6845

2. Query ID: 693
   Query: Leuko-reduced blood reduces infectious complications in red blood cell transfusi...
   BM25 nDCG@10: 0.3155
   CE improvement: +0.6845
   LTR improvement: +0.6845

3. Query ID: 1259
   Query: The relationship between a breast cancer patie