# Week 5 - Lab 2: Index Tuning and Recall Testing

**Duration:** 90-120 minutes  
**Level:** Advanced  
**Prerequisites:** Week 5 Lessons 2-3, Lab 1

---

## 🎯 Learning Objectives

In this lab, you will:
- Understand HNSW index parameters (M, ef_construction, ef_search)
- Implement recall@k measurement with ground truth
- Benchmark latency (p50, p95, p99) for different configurations
- Tune index parameters for quality/speed trade-offs
- Measure memory footprint and compression impact
- Compare HNSW vs IVF-based indexes

---

## 📋 Lab Outline

1. Setup and Data Generation
2. Exercise 1: Build FAISS HNSW Index
3. Exercise 2: Measure Recall@k with Ground Truth
4. Exercise 3: Latency Benchmarking
5. Exercise 4: HNSW Parameter Sweep (ef_search)
6. Exercise 5: IVF Index Comparison
7. Exercise 6: Product Quantization (PQ) for Compression
8. Bonus Challenge: Multi-dimensional Analysis

---

## 1. Setup and Data Generation

In [None]:
# Install required packages
!pip install -q openai faiss-cpu numpy python-dotenv

In [None]:
import os
import time
import json
import numpy as np
import faiss
from typing import List, Dict, Set, Tuple
from openai import OpenAI
from dotenv import load_dotenv
from collections import defaultdict

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("✅ Setup complete!")
print(f"FAISS version: {faiss.__version__}")
print(f"NumPy version: {np.__version__}")

### Generate Synthetic Corpus

We'll create a larger corpus (1000 documents) with known structure for ground truth testing.

In [None]:
# Generate synthetic documents with categories
CATEGORIES = {
    "architecture": [
        "microservices design patterns and best practices",
        "event-driven architecture with message queues",
        "API gateway design and implementation strategies",
        "service mesh and network policies",
        "distributed systems and consistency models",
    ],
    "database": [
        "SQL query optimization and indexing strategies",
        "NoSQL databases comparison and use cases",
        "database sharding and replication techniques",
        "ACID properties and transaction management",
        "vector databases for semantic search applications",
    ],
    "ml": [
        "machine learning model training and evaluation",
        "deep learning architectures and neural networks",
        "natural language processing with transformers",
        "computer vision and convolutional networks",
        "reinforcement learning algorithms and applications",
    ],
    "devops": [
        "kubernetes cluster management and orchestration",
        "CI/CD pipeline design and automation",
        "infrastructure as code with Terraform",
        "monitoring and observability with Prometheus",
        "container security and best practices",
    ],
}

def generate_corpus(n_docs: int = 1000) -> List[Dict]:
    """Generate synthetic corpus with categories."""
    corpus = []
    categories = list(CATEGORIES.keys())
    
    for i in range(n_docs):
        cat = categories[i % len(categories)]
        templates = CATEGORIES[cat]
        template = templates[i % len(templates)]
        
        # Add variation
        text = f"{template} - document {i} variation {i % 10}"
        
        corpus.append({
            "id": f"doc_{i}",
            "text": text,
            "category": cat,
        })
    
    return corpus

CORPUS = generate_corpus(1000)
print(f"Generated {len(CORPUS)} documents across {len(CATEGORIES)} categories")
print(f"Sample: {CORPUS[0]['text'][:80]}...")
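`generate_corpus` picks the category with `i % 4` and the template with `i % 5`. Because 4 and 5 share no common factor, every (category, template) pair appears equally often, so the corpus is 20 template sentences repeated 50 times each, differing only in the document number and variation suffix. If the two counts shared a factor, some templates would never be used. A quick pure-Python check of that indexing scheme (`pair_coverage` is an illustrative helper, not part of the lab):

```python
from collections import Counter
from math import gcd

def pair_coverage(n_docs: int, n_categories: int, n_templates: int) -> Counter:
    """Count each (category index, template index) pair produced by
    the same `i % len(...)` indexing that generate_corpus uses."""
    return Counter((i % n_categories, i % n_templates) for i in range(n_docs))

coverage = pair_coverage(1000, 4, 5)
print(len(coverage), set(coverage.values()))   # 20 pairs, each used 50 times
print(gcd(4, 5))                               # 1, so every pair is reachable

# With 4 categories and 6 templates (gcd = 2), only half of the 24 pairs appear.
print(len(pair_coverage(1000, 4, 6)))          # 12
```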

In [None]:
def get_embeddings_batch(texts: List[str], model: str = "text-embedding-3-small") -> np.ndarray:
    """Get embeddings for texts in batch."""
    cleaned = [t.replace("\n", " ") for t in texts]
    response = client.embeddings.create(input=cleaned, model=model)
    embeddings = [item.embedding for item in response.data]
    return np.array(embeddings, dtype=np.float32)


# Generate embeddings for corpus (batched)
print("Generating embeddings for corpus (this may take 30-60 seconds)...")
batch_size = 100
all_embeddings = []

for i in range(0, len(CORPUS), batch_size):
    batch = CORPUS[i:i+batch_size]
    texts = [doc["text"] for doc in batch]
    embs = get_embeddings_batch(texts)
    all_embeddings.append(embs)
    print(f"  Processed {min(i+batch_size, len(CORPUS))}/{len(CORPUS)} documents")
    time.sleep(0.5)  # Rate limiting

corpus_embeddings = np.vstack(all_embeddings)
print(f"✅ Generated embeddings: {corpus_embeddings.shape}")
print(f"Memory: {corpus_embeddings.nbytes / (1024**2):.2f} MB")
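The FAISS indexes in this lab use L2 distance. OpenAI's documentation states that its embeddings are normalized to length 1, and for unit vectors ‖a − b‖² = 2 − 2·(a·b), so ranking by L2 distance and ranking by inner product (cosine similarity) give the same order. A small numpy check on synthetic unit vectors (not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(200, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)   # unit-normalize each row
query = rng.normal(size=64)
query /= np.linalg.norm(query)

l2_sq = ((docs - query) ** 2).sum(axis=1)   # squared Euclidean distance
ip = docs @ query                           # inner product, equal to cosine here

# For unit vectors: ||d - q||^2 = 2 - 2 * (d . q)
print(np.allclose(l2_sq, 2 - 2 * ip))                      # True
print(np.array_equal(np.argsort(l2_sq), np.argsort(-ip)))  # True: same ranking
```

On the lab's data, `np.linalg.norm(corpus_embeddings, axis=1)` should be close to 1 everywhere if that normalization holds.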

### Generate Test Queries with Ground Truth

Each query targets one category, and every document in that category counts as relevant. This label-based ground truth reflects embedding quality as much as index accuracy.

In [None]:
# Generate test queries with known relevant documents
TEST_QUERIES = [
    {
        "text": "microservices architecture patterns",
        "category": "architecture",
    },
    {
        "text": "SQL database optimization techniques",
        "category": "database",
    },
    {
        "text": "machine learning neural networks",
        "category": "ml",
    },
    {
        "text": "kubernetes container orchestration",
        "category": "devops",
    },
    {
        "text": "vector database semantic search",
        "category": "database",
    },
]

# Build ground truth: documents in same category are relevant
def build_ground_truth(queries: List[Dict], corpus: List[Dict]) -> Dict[str, Set[str]]:
    """Build ground truth mappings from queries to relevant doc IDs."""
    ground_truth = {}
    
    for i, query in enumerate(queries):
        query_id = f"q_{i}"
        target_cat = query["category"]
        
        # All docs in same category are relevant
        relevant = {doc["id"] for doc in corpus if doc["category"] == target_cat}
        ground_truth[query_id] = relevant
    
    return ground_truth

GROUND_TRUTH = build_ground_truth(TEST_QUERIES, CORPUS)

print(f"Created {len(TEST_QUERIES)} test queries")
print(f"Sample ground truth for q_0: {len(GROUND_TRUTH['q_0'])} relevant docs")
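Category labels measure whether the embedding model puts same-topic documents together, which mixes embedding quality with index quality. To isolate the error introduced by the approximate index itself, a common alternative is to treat the exact (brute-force) nearest neighbors as ground truth and report the overlap. A minimal numpy sketch; `exact_knn` and `ann_recall` are hypothetical helper names, not part of the lab:

```python
import numpy as np

def exact_knn(corpus: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Brute-force top-k neighbor ids by squared L2 distance, shape (n_queries, k)."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for all pairs at once
    d2 = (
        (queries ** 2).sum(1, keepdims=True)
        - 2 * queries @ corpus.T
        + (corpus ** 2).sum(1)
    )
    top = np.argpartition(d2, k - 1, axis=1)[:, :k]            # unordered top-k
    order = np.take_along_axis(d2, top, axis=1).argsort(axis=1)
    return np.take_along_axis(top, order, axis=1)               # sorted by distance

def ann_recall(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Mean fraction of the exact top-k that the approximate search returned.
    Padding ids such as -1 never match, so they count as misses."""
    k = exact_ids.shape[1]
    hits = [len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits)) / k
```

With the lab's data, this would compare `exact_knn(corpus_embeddings, query_embs, 10)` against the `indices` array returned by `index.search(query_embs, 10)`.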

---

## Exercise 1: Build FAISS HNSW Index

**Task:** Build an HNSW index with FAISS and understand key parameters.

**Key Parameters:**
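- `M`: Number of graph neighbors per node (FAISS allows 2·M on the base layer; typical: 16-48)
- `ef_construction`: Candidate-list size (search width) while building the graph (typical: 100-400)
- `ef_search`: Candidate-list size (search width) at query time; larger values raise both recall and latency (typical: 50-400)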
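`M` also drives memory. A rough per-vector estimate for `IndexHNSWFlat`, assuming FAISS's layout of float32 vectors plus 4-byte neighbor ids with 2·M neighbor slots on the base layer. Upper layers (reached by only a small fraction of nodes) and bookkeeping arrays are ignored, so treat the result as a lower bound:

```python
def hnsw_flat_bytes_per_vector(dim: int, M: int) -> int:
    """Approximate bytes per vector for a flat-storage HNSW index (lower bound).

    Assumes float32 vector storage and 4-byte neighbor ids, with 2*M neighbor
    slots on the base layer. Upper-layer links and bookkeeping are ignored.
    """
    vector_bytes = 4 * dim
    base_layer_link_bytes = 4 * 2 * M
    return vector_bytes + base_layer_link_bytes

for M in (16, 32, 48):
    per_vec = hnsw_flat_bytes_per_vector(1536, M)
    print(M, per_vec, f"{1000 * per_vec / 1024**2:.2f} MB for 1000 docs")
```

At 1536 dimensions the graph links add only a few percent on top of the raw vectors, so `M` is mostly a recall/build-time knob here rather than a memory one.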

In [None]:
def build_hnsw_index(
    embeddings: np.ndarray,
    M: int = 32,
    ef_construction: int = 200,
) -> faiss.Index:
    """
    Build FAISS HNSW index.
    
    Args:
        embeddings: Embedding matrix (n_docs, dim)
        M: Number of connections per node
        ef_construction: Search width during construction
    """
    # TODO: Implement HNSW index construction
    # 1. Get dimensionality
    # 2. Create IndexHNSWFlat
    # 3. Set ef_construction (hnsw.efConstruction)
    # 4. Add embeddings
    
    dim = embeddings.shape[1]
    
    # Create HNSW index
    index = faiss.IndexHNSWFlat(dim, M)
    index.hnsw.efConstruction = ef_construction
    
    # Add vectors
    print(f"Building index with M={M}, ef_construction={ef_construction}...")
    start = time.time()
    index.add(embeddings)
    elapsed = time.time() - start
    
    print(f"✅ Index built in {elapsed:.2f}s")
    print(f"   Total vectors: {index.ntotal}")
    
    return index


# Build baseline index
index_baseline = build_hnsw_index(corpus_embeddings, M=32, ef_construction=200)
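To see what `ef_search` controls, here is a simplified single-layer version of the best-first search HNSW runs on each layer: it keeps the `ef` closest nodes found so far and stops once the closest unexpanded candidate is farther than the worst kept result. Real HNSW adds a hierarchy of sparser layers to choose a good entry point; this sketch omits that, and the tiny graph is hypothetical:

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

def beam_search(
    graph: Dict[int, List[int]],
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    entry: int,
    ef: int,
    k: int,
) -> List[Tuple[float, int]]:
    """Best-first search on one graph layer, keeping the ef closest nodes found."""
    def dist(i: int) -> float:
        return math.dist(vectors[i], query)

    ef = max(ef, k)                      # never keep fewer candidates than k
    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]           # min-heap: closest unexpanded node first
    results = [(-d0, entry)]             # max-heap via negation: worst kept on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                        # nothing left can improve the results
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the current worst
    return sorted((-neg_d, i) for neg_d, i in results)[:k]

# Hypothetical 1-D "embeddings": node 3 sits on the query, but it is only
# reachable through node 2, which looks worse than node 1 at first.
vectors = [[0.0], [6.0], [1.0], [9.0]]
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = [9.0]

print(beam_search(graph, vectors, query, entry=0, ef=1, k=1))  # [(3.0, 1)]: stuck
print(beam_search(graph, vectors, query, entry=0, ef=2, k=1))  # [(0.0, 3)]: found
```

A wider beam keeps the less promising node 2 alive long enough to reach node 3, which is the same mechanism behind the recall gains from a larger `ef_search`.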

---

## Exercise 2: Measure Recall@k with Ground Truth

**Task:** Implement recall@k measurement against ground truth.

In [None]:
def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Calculate recall@k."""
    # TODO: Implement recall@k
    # recall@k = |retrieved[:k] ∩ relevant| / min(k, |relevant|)
    
    topk = set(retrieved[:k])
    hits = len(topk & relevant)
    denominator = min(k, len(relevant))
    
    return hits / max(1, denominator)


def search_index(
    index: faiss.Index,
    query_emb: np.ndarray,
    k: int = 10,
    ef_search: int = None,
) -> Tuple[List[int], List[float], float]:
    """
    Search HNSW index.
    
    Returns:
        (indices, distances, latency_ms)
    """
    # TODO: Set ef_search if provided (for HNSW indexes)
    if ef_search is not None and hasattr(index, 'hnsw'):
        index.hnsw.efSearch = ef_search
    
    # Search with timing
    start = time.perf_counter()
    distances, indices = index.search(query_emb.reshape(1, -1), k)
    latency_ms = (time.perf_counter() - start) * 1000
    
    return indices[0].tolist(), distances[0].tolist(), latency_ms


def evaluate_index(
    index: faiss.Index,
    queries: List[Dict],
    ground_truth: Dict[str, Set[str]],
    k: int = 10,
    ef_search: int = None,
) -> Dict:
    """Evaluate index on test queries."""
    recalls = []
    latencies = []
    
    # Generate query embeddings
    query_texts = [q["text"] for q in queries]
    query_embs = get_embeddings_batch(query_texts)
    
    for i, query_emb in enumerate(query_embs):
        query_id = f"q_{i}"
        relevant = ground_truth[query_id]
        
        # Search
        indices, _, latency = search_index(index, query_emb, k=k, ef_search=ef_search)
        
        # Convert indices to doc IDs (FAISS pads with -1 when it finds fewer than k results)
        retrieved_ids = [CORPUS[idx]["id"] for idx in indices if idx >= 0]
        
        # Calculate recall
        recall = recall_at_k(retrieved_ids, relevant, k)
        recalls.append(recall)
        latencies.append(latency)
    
    return {
        "recall@k": np.mean(recalls),
        "latency_p50_ms": np.percentile(latencies, 50),
        "latency_p95_ms": np.percentile(latencies, 95),
        "latency_p99_ms": np.percentile(latencies, 99),
        "recalls": recalls,
        "latencies": latencies,
    }


# Evaluate baseline
results_baseline = evaluate_index(index_baseline, TEST_QUERIES, GROUND_TRUTH, k=10, ef_search=200)

print("Baseline HNSW (M=32, ef_construction=200, ef_search=200):")
print(f"  Recall@10: {results_baseline['recall@k']:.3f}")
print(f"  Latency p50: {results_baseline['latency_p50_ms']:.2f}ms")
print(f"  Latency p95: {results_baseline['latency_p95_ms']:.2f}ms")
print(f"  Latency p99: {results_baseline['latency_p99_ms']:.2f}ms")
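Note what the denominator does in `recall_at_k`. Each category holds 250 of the 1000 documents, so for k = 10 the `min(k, |relevant|)` cap makes the denominator 10 and the metric equals precision@10. Textbook recall@k divides by |relevant| and could never exceed 10/250 = 0.04 on this corpus. A small comparison on hypothetical ids:

```python
from typing import List, Set

def capped_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Hits in the top k divided by min(k, |relevant|), as in recall_at_k above."""
    hits = len(set(retrieved[:k]) & relevant)
    return hits / max(1, min(k, len(relevant)))

def standard_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Hits in the top k divided by the total number of relevant documents."""
    hits = len(set(retrieved[:k]) & relevant)
    return hits / max(1, len(relevant))

relevant = {f"doc_{i}" for i in range(0, 1000, 4)}   # 250 ids, like one category
retrieved = [f"doc_{i}" for i in range(0, 40, 4)]    # 10 retrieved, all relevant
print(len(relevant))                                  # 250
print(capped_recall_at_k(retrieved, relevant, 10))    # 1.0
print(standard_recall_at_k(retrieved, relevant, 10))  # 0.04
```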

---

## Exercise 3: Latency Benchmarking

**Task:** Run multiple queries and measure latency distribution.

In [None]:
def benchmark_latency(
    index: faiss.Index,
    query_emb: np.ndarray,
    k: int = 10,
    ef_search: int = None,
    n_runs: int = 100,
) -> Dict:
    """Benchmark search latency with multiple runs."""
    # TODO: Implement latency benchmarking
    # 1. Run search n_runs times
    # 2. Collect latencies
    # 3. Calculate percentiles
    
    latencies = []
    
    for _ in range(n_runs):
        _, _, latency = search_index(index, query_emb, k=k, ef_search=ef_search)
        latencies.append(latency)
    
    return {
        "mean_ms": np.mean(latencies),
        "std_ms": np.std(latencies),
        "p50_ms": np.percentile(latencies, 50),
        "p95_ms": np.percentile(latencies, 95),
        "p99_ms": np.percentile(latencies, 99),
        "min_ms": np.min(latencies),
        "max_ms": np.max(latencies),
    }


# Benchmark with first query
query_emb = get_embeddings_batch([TEST_QUERIES[0]["text"]])[0]
bench_results = benchmark_latency(index_baseline, query_emb, k=10, ef_search=200, n_runs=100)

print("Latency Benchmark (100 runs):")
print(f"  Mean: {bench_results['mean_ms']:.2f}ms ± {bench_results['std_ms']:.2f}ms")
print(f"  p50:  {bench_results['p50_ms']:.2f}ms")
print(f"  p95:  {bench_results['p95_ms']:.2f}ms")
print(f"  p99:  {bench_results['p99_ms']:.2f}ms")
print(f"  Min:  {bench_results['min_ms']:.2f}ms")
print(f"  Max:  {bench_results['max_ms']:.2f}ms")
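Repeating one query 100 times measures cache-warm latency for that single query, and the first few calls can be slower than the rest. A generic sketch that discards warm-up runs and reports percentiles with the standard library; `benchmark` and its `warmup` parameter are illustrative, not part of the lab code:

```python
import statistics
import time
from typing import Callable, Dict

def benchmark(fn: Callable[[], object], n_runs: int = 100, warmup: int = 5) -> Dict[str, float]:
    """Time fn() n_runs times after `warmup` untimed calls; report ms statistics."""
    for _ in range(warmup):
        fn()                                        # untimed warm-up calls
    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    cuts = statistics.quantiles(latencies_ms, n=100)   # 99 cut points: p1..p99
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "mean_ms": statistics.fmean(latencies_ms),
    }

stats = benchmark(lambda: sum(range(10_000)))
print({name: round(value, 4) for name, value in stats.items()})
```

For the lab's index, the callable would be something like `lambda: index_baseline.search(query_emb.reshape(1, -1), 10)`.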

---

## Exercise 4: HNSW Parameter Sweep (ef_search)

**Task:** Sweep ef_search parameter and plot recall vs latency trade-off.

In [None]:
# TODO: Sweep ef_search values and measure recall + latency
ef_search_values = [10, 20, 50, 100, 200, 400, 800]
sweep_results = []

print("Sweeping ef_search parameter...\n")

for ef_search in ef_search_values:
    print(f"Testing ef_search={ef_search}...")
    
    results = evaluate_index(
        index_baseline,
        TEST_QUERIES,
        GROUND_TRUTH,
        k=10,
        ef_search=ef_search
    )
    
    sweep_results.append({
        "ef_search": ef_search,
        "recall@10": results["recall@k"],
        "latency_p50_ms": results["latency_p50_ms"],
        "latency_p95_ms": results["latency_p95_ms"],
    })
    
    print(f"  Recall@10: {results['recall@k']:.3f}")
    print(f"  Latency p95: {results['latency_p95_ms']:.2f}ms\n")

# Display results table
print("\n=== ef_search Parameter Sweep Results ===")
print("ef_search | Recall@10 | p50 (ms) | p95 (ms)")
print("----------|-----------|----------|----------")
for r in sweep_results:
    print(f"{r['ef_search']:9d} | {r['recall@10']:9.3f} | {r['latency_p50_ms']:8.2f} | {r['latency_p95_ms']:8.2f}")

### Analysis: Recall vs Latency Trade-off

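**Expected observations** (on a 1000-document corpus the differences may be small, since HNSW is close to exact search at this scale, and with only 5 test queries the percentiles are noisy):
- Lower `ef_search` → faster queries, lower recall
- Higher `ef_search` → slower queries, higher recall
- Diminishing returns: recall plateaus at high `ef_search`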

**Production Recommendation:**
- Find the "knee" of the curve where recall improvement plateaus
- Balance with latency SLO (e.g., p95 < 100ms)
- Typical sweet spot: ef_search = 100-200
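One way to make "find the knee" concrete is to pick the smallest `ef_search` whose recall is within a tolerance of the best recall observed, among settings that meet the latency SLO. A sketch over rows shaped like `sweep_results`; the tolerance and SLO defaults are placeholders to adapt:

```python
from typing import Dict, List, Optional

def pick_ef_search(
    rows: List[Dict],
    recall_tolerance: float = 0.01,
    p95_slo_ms: float = 100.0,
) -> Optional[int]:
    """Smallest ef_search within `recall_tolerance` of the best recall seen,
    among rows whose p95 latency meets the SLO. Returns None if none qualify."""
    within_slo = [r for r in rows if r["latency_p95_ms"] <= p95_slo_ms]
    if not within_slo:
        return None
    best = max(r["recall@10"] for r in within_slo)
    good_enough = [r for r in within_slo if r["recall@10"] >= best - recall_tolerance]
    return min(r["ef_search"] for r in good_enough)
```

After the sweep above, `pick_ef_search(sweep_results)` returns the chosen value, or `None` if no setting meets the SLO.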

---

## Exercise 5: IVF Index Comparison

**Task:** Build IVF (Inverted File) index and compare with HNSW.

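**IVF Parameters:**
- `nlist`: Number of clusters (Voronoi cells); a common rule of thumb is about sqrt(n_docs), i.e. ~32 for 1000 documents. The lab uses `nlist=100`, which leaves about 10 vectors per cluster.
- `nprobe`: Number of clusters scanned per query (typical: 1-20); with small clusters, a low `nprobe` can return fewer than k results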
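The sizing rules can be checked with a little arithmetic. By default, FAISS's k-means prints a warning when it receives fewer than 39 training points per centroid (`min_points_per_centroid`) and subsamples above 256 per centroid (`max_points_per_centroid`). Even 32 clusters would leave only about 31 points per centroid with 1000 documents, so a training warning is expected either way; the index still builds. A sketch, with those defaults assumed:

```python
import math
from typing import Tuple

# FAISS k-means defaults (ClusteringParameters); assumed here, check your version
MIN_POINTS_PER_CENTROID = 39
MAX_POINTS_PER_CENTROID = 256

def suggest_nlist(n_vectors: int) -> int:
    """sqrt(n) rule of thumb for the number of IVF clusters."""
    return max(1, round(math.sqrt(n_vectors)))

def training_range(nlist: int) -> Tuple[int, int]:
    """(size below which FAISS warns, size above which it subsamples)."""
    return nlist * MIN_POINTS_PER_CENTROID, nlist * MAX_POINTS_PER_CENTROID

print(suggest_nlist(1000))    # 32
print(training_range(32))     # (1248, 8192)
print(training_range(100))    # (3900, 25600)
```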

In [None]:
def build_ivf_index(
    embeddings: np.ndarray,
    nlist: int = 100,
) -> faiss.Index:
    """
    Build FAISS IVF index.
    
    Args:
        embeddings: Embedding matrix
        nlist: Number of Voronoi cells
    """
    # TODO: Implement IVF index construction
    # 1. Create quantizer (IndexFlatL2)
    # 2. Create IndexIVFFlat
    # 3. Train on embeddings
    # 4. Add embeddings
    
    dim = embeddings.shape[1]
    
    # Create quantizer and IVF index
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFFlat(quantizer, dim, nlist)
    
    # Train and add
    print(f"Training IVF index with nlist={nlist}...")
    start = time.time()
    index.train(embeddings)
    index.add(embeddings)
    elapsed = time.time() - start
    
    print(f"✅ Index built in {elapsed:.2f}s")
    print(f"   Total vectors: {index.ntotal}")
    
    return index


def search_ivf_index(
    index: faiss.IndexIVF,
    query_emb: np.ndarray,
    k: int = 10,
    nprobe: int = 10,
) -> Tuple[List[int], List[float], float]:
    """Search IVF index with nprobe parameter."""
    index.nprobe = nprobe
    
    start = time.perf_counter()
    distances, indices = index.search(query_emb.reshape(1, -1), k)
    latency_ms = (time.perf_counter() - start) * 1000
    
    return indices[0].tolist(), distances[0].tolist(), latency_ms


# Build IVF index
index_ivf = build_ivf_index(corpus_embeddings, nlist=100)
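Under the hood, IVF assigns every vector to its nearest centroid at build time and, at query time, scans only the `nprobe` lists whose centroids are closest to the query. A small numpy sketch of that idea, with a few Lloyd (k-means) iterations standing in for FAISS's trainer; the helper names are illustrative. With `nprobe == nlist` every list is scanned, so the result matches exact search:

```python
import numpy as np

def sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared L2 distances, shape (len(a), len(b))."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def build_toy_ivf(data: np.ndarray, nlist: int, n_iter: int = 10, seed: int = 0):
    """Train centroids with Lloyd iterations and bucket vector ids by centroid."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), nlist, replace=False)].copy()
    for _ in range(n_iter):
        assign = sq_dists(data, centroids).argmin(1)
        for c in range(nlist):
            members = data[assign == c]
            if len(members):                    # leave empty clusters in place
                centroids[c] = members.mean(0)
    assign = sq_dists(data, centroids).argmin(1)
    inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, inverted_lists

def search_toy_ivf(data, centroids, inverted_lists, query, k: int, nprobe: int):
    """Scan only the nprobe closest lists; may return fewer than k ids."""
    probe = np.argsort(sq_dists(query[None], centroids)[0])[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in probe])
    d = sq_dists(query[None], data[candidates])[0]
    return candidates[np.argsort(d)[:k]]

rng = np.random.default_rng(42)
data = rng.normal(size=(500, 8))
query = rng.normal(size=8)
centroids, lists = build_toy_ivf(data, nlist=20)
exact = np.argsort(sq_dists(query[None], data)[0])[:10]
for nprobe in (1, 5, 20):
    found = search_toy_ivf(data, centroids, lists, query, k=10, nprobe=nprobe)
    print(nprobe, len(set(found) & set(exact)) / 10)   # overlap with exact top-10
```

Unlike this sketch, FAISS returns exactly k slots and pads missing results with `-1`, which is why the evaluation code must skip negative ids.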

In [None]:
# Evaluate IVF with different nprobe values
nprobe_values = [1, 5, 10, 20, 50]
ivf_results = []

print("Evaluating IVF index...\n")

for nprobe in nprobe_values:
    print(f"Testing nprobe={nprobe}...")
    
    recalls = []
    latencies = []
    
    query_texts = [q["text"] for q in TEST_QUERIES]
    query_embs = get_embeddings_batch(query_texts)
    
    for i, query_emb in enumerate(query_embs):
        query_id = f"q_{i}"
        relevant = GROUND_TRUTH[query_id]
        
        indices, _, latency = search_ivf_index(index_ivf, query_emb, k=10, nprobe=nprobe)
        retrieved_ids = [CORPUS[idx]["id"] for idx in indices if idx >= 0]  # skip -1 padding
        
        recall = recall_at_k(retrieved_ids, relevant, 10)
        recalls.append(recall)
        latencies.append(latency)
    
    ivf_results.append({
        "nprobe": nprobe,
        "recall@10": np.mean(recalls),
        "latency_p50_ms": np.percentile(latencies, 50),
        "latency_p95_ms": np.percentile(latencies, 95),
    })
    
    print(f"  Recall@10: {np.mean(recalls):.3f}")
    print(f"  Latency p95: {np.percentile(latencies, 95):.2f}ms\n")

# Display comparison
print("\n=== IVF Parameter Sweep Results ===")
print("nprobe | Recall@10 | p50 (ms) | p95 (ms)")
print("-------|-----------|----------|----------")
for r in ivf_results:
    print(f"{r['nprobe']:6d} | {r['recall@10']:9.3f} | {r['latency_p50_ms']:8.2f} | {r['latency_p95_ms']:8.2f}")

### HNSW vs IVF Comparison

**HNSW Advantages:**
- Higher recall at similar latency
- No training required (simpler pipeline)
- Better for dynamic data (easier to add vectors)

**IVF Advantages:**
- Lower memory footprint
- Better for very large datasets (>10M vectors)
- Can be combined with PQ for compression

---

## Exercise 6: Product Quantization (PQ) for Compression

**Task:** Apply Product Quantization to compress vectors and measure impact.

**PQ Parameters:**
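- `m`: Number of subquantizers (`dim` must be divisible by `m`); each vector is stored as `m` codes
- `nbits`: Bits per code (typical: 8, i.e. 256 centroids per subquantizer), so each vector takes `m * nbits / 8` bytes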
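PQ splits each vector into `m` sub-vectors, learns a codebook of `2**nbits` centroids per sub-space, and stores only the index of the nearest centroid in each sub-space. A compact numpy sketch of training, encoding and decoding with a small k-means per sub-space (FAISS's trainer is more elaborate; `train_pq`, `pq_encode` and `pq_decode` are illustrative names):

```python
import numpy as np

def train_pq(data: np.ndarray, m: int, nbits: int, n_iter: int = 15, seed: int = 0) -> np.ndarray:
    """Learn one k-means codebook per sub-space; shape (m, 2**nbits, dim // m)."""
    n, dim = data.shape
    if dim % m != 0:
        raise ValueError("dim must be divisible by m")
    dsub, ksub = dim // m, 2 ** nbits
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, ksub, dsub), dtype=data.dtype)
    for j in range(m):
        sub = data[:, j * dsub:(j + 1) * dsub]
        cent = sub[rng.choice(n, ksub, replace=False)].copy()   # needs n >= ksub
        for _ in range(n_iter):
            assign = ((sub[:, None] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(ksub):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        codebooks[j] = cent
    return codebooks

def pq_encode(data: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Replace each sub-vector by the id of its nearest centroid (assumes nbits <= 8)."""
    m, _, dsub = codebooks.shape
    codes = np.empty((len(data), m), dtype=np.uint8)
    for j in range(m):
        sub = data[:, j * dsub:(j + 1) * dsub]
        codes[:, j] = ((sub[:, None] - codebooks[j][None]) ** 2).sum(-1).argmin(1)
    return codes

def pq_decode(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Approximate reconstruction: concatenate the selected centroids."""
    return np.hstack([codebooks[j][codes[:, j]] for j in range(codebooks.shape[0])])

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 8)).astype(np.float32)
codebooks = train_pq(data, m=4, nbits=4)
codes = pq_encode(data, codebooks)
recon = pq_decode(codes, codebooks)
print(codes.shape, codes.dtype)        # (500, 4) uint8
print(data.nbytes / codes.nbytes)      # 8.0: 32 bytes -> 4 bytes per vector
print(float(((data - recon) ** 2).mean()), float(data.var()))
```

The reconstruction error printed on the last line is the information PQ throws away; it is what later shows up as lost recall.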

In [None]:
def build_ivf_pq_index(
    embeddings: np.ndarray,
    nlist: int = 100,
    m: int = 96,  # text-embedding-3-small is 1536-dim, 1536/96 = 16
    nbits: int = 8,
) -> faiss.Index:
    """
    Build FAISS IVF-PQ index (compressed).
    
    Args:
        embeddings: Embedding matrix
        nlist: Number of Voronoi cells
        m: Number of subquantizers
        nbits: Bits per subquantizer
    """
    # TODO: Implement IVF-PQ index
    # 1. Create quantizer
    # 2. Create IndexIVFPQ
    # 3. Train and add
    
    dim = embeddings.shape[1]
    
    if dim % m != 0:
        raise ValueError(f"Dimension {dim} must be divisible by m={m}")
    
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, nbits)
    
    print(f"Training IVF-PQ index (nlist={nlist}, m={m}, nbits={nbits})...")
    start = time.time()
    index.train(embeddings)
    index.add(embeddings)
    elapsed = time.time() - start
    
    # Calculate compression ratio for the PQ codes alone
    # (ignores inverted-list ids, coarse centroids and PQ codebooks)
    original_bytes = embeddings.nbytes
    compressed_bytes_per_vector = m * nbits // 8  # m codes of nbits bits each
    compressed_bytes = len(embeddings) * compressed_bytes_per_vector
    ratio = original_bytes / compressed_bytes
    
    print(f"✅ Index built in {elapsed:.2f}s")
    print(f"   Original size: {original_bytes / (1024**2):.2f} MB")
    print(f"   Compressed size (codes only): {compressed_bytes / (1024**2):.2f} MB")
    print(f"   Compression ratio: {ratio:.1f}x")
    
    return index


# Build IVF-PQ index
index_ivf_pq = build_ivf_pq_index(corpus_embeddings, nlist=100, m=96, nbits=8)
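The compression ratio printed by `build_ivf_pq_index` counts only the PQ codes. The index also stores a 64-bit id per vector in its inverted lists, the `nlist` coarse centroids, and the PQ codebooks (`2**nbits` centroids spanning all `dim` dimensions). At 1000 vectors those fixed costs dominate. An arithmetic sketch, assuming float32 centroids and int64 ids and ignoring any query-time tables; `faiss.serialize_index(index).nbytes` gives a measured serialized size to compare against:

```python
def ivfpq_storage_bytes(n: int, dim: int, nlist: int, m: int, nbits: int) -> dict:
    """Approximate stored bytes for an IVF-PQ index, by component."""
    return {
        "pq_codes": n * m * nbits // 8,          # the compressed vectors
        "list_ids": n * 8,                       # one int64 id per stored vector
        "coarse_centroids": nlist * dim * 4,     # float32 IVF centroids
        "pq_codebooks": (2 ** nbits) * dim * 4,  # m codebooks of 2**nbits x dim/m floats
    }

parts = ivfpq_storage_bytes(n=1000, dim=1536, nlist=100, m=96, nbits=8)
raw = 1000 * 1536 * 4
total = sum(parts.values())
print(parts)
print(f"codes-only ratio: {raw / parts['pq_codes']:.0f}x, overall ratio: {raw / total:.1f}x")
```

As `n` grows the fixed terms become negligible, and the ratio approaches 4·dim / (code bytes + 8 id bytes), about 59x for this configuration.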

In [None]:
# Evaluate IVF-PQ
print("Evaluating IVF-PQ index (nprobe=10)...")

recalls = []
latencies = []

query_texts = [q["text"] for q in TEST_QUERIES]
query_embs = get_embeddings_batch(query_texts)

for i, query_emb in enumerate(query_embs):
    query_id = f"q_{i}"
    relevant = GROUND_TRUTH[query_id]
    
    indices, _, latency = search_ivf_index(index_ivf_pq, query_emb, k=10, nprobe=10)
    retrieved_ids = [CORPUS[idx]["id"] for idx in indices if idx >= 0]  # skip -1 padding
    
    recall = recall_at_k(retrieved_ids, relevant, 10)
    recalls.append(recall)
    latencies.append(latency)

print(f"\nIVF-PQ Results (nprobe=10):")
print(f"  Recall@10: {np.mean(recalls):.3f}")
print(f"  Latency p50: {np.percentile(latencies, 50):.2f}ms")
print(f"  Latency p95: {np.percentile(latencies, 95):.2f}ms")
print("\nNote: PQ trades some recall for much smaller codes; compare with IVF-Flat at nprobe=10 to measure the cost on this data")
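At query time IVF-PQ does not decompress vectors. It builds a small table of distances from each query sub-vector to every centroid in that sub-space, then scores each stored code with `m` table lookups (asymmetric distance computation). The score equals the distance to the *reconstructed* vector, not the original one, which is where the recall loss comes from. A self-contained numpy sketch with random codebooks; FAISS's IVF-PQ additionally encodes residuals relative to the coarse centroid, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)
m, ksub, dsub, n = 4, 16, 3, 50
codebooks = rng.normal(size=(m, ksub, dsub))          # stand-in for trained codebooks
codes = rng.integers(0, ksub, size=(n, m), dtype=np.uint8)
query = rng.normal(size=m * dsub)

# Distance table: table[j, c] = ||query_j - codebooks[j, c]||^2
q_sub = query.reshape(m, 1, dsub)
table = ((q_sub - codebooks) ** 2).sum(-1)            # shape (m, ksub)

# Score every stored code with m lookups each
adc = table[np.arange(m), codes].sum(1)               # shape (n,)

# Same numbers via explicit reconstruction (slower, for checking)
recon = np.hstack([codebooks[j][codes[:, j]] for j in range(m)])
print(np.allclose(adc, ((recon - query) ** 2).sum(1)))   # True
```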

---

## Bonus Challenge: Multi-dimensional Analysis

**Task:** Create a comparison table across all index types and configurations.

In [None]:
# TODO: Compile a comprehensive comparison across index types and configurations

# Find specific configurations from sweep results
hnsw_100 = next((r for r in sweep_results if r["ef_search"] == 100), None)
hnsw_200 = next((r for r in sweep_results if r["ef_search"] == 200), None)
ivf_10 = next((r for r in ivf_results if r["nprobe"] == 10), None)

comparison = [
    {
        "Index Type": "HNSW",
        "Config": "ef_search=100",
        "Recall@10": hnsw_100["recall@10"] if hnsw_100 else 0.0,
        "p95 (ms)": hnsw_100["latency_p95_ms"] if hnsw_100 else 0.0,
        "Memory (MB)": corpus_embeddings.nbytes / (1024**2),
    },
    {
        "Index Type": "HNSW",
        "Config": "ef_search=200",
        "Recall@10": hnsw_200["recall@10"] if hnsw_200 else 0.0,
        "p95 (ms)": hnsw_200["latency_p95_ms"] if hnsw_200 else 0.0,
        "Memory (MB)": corpus_embeddings.nbytes / (1024**2),
    },
    {
        "Index Type": "IVF",
        "Config": "nprobe=10",
        "Recall@10": ivf_10["recall@10"] if ivf_10 else 0.0,
        "p95 (ms)": ivf_10["latency_p95_ms"] if ivf_10 else 0.0,
        "Memory (MB)": corpus_embeddings.nbytes / (1024**2),
    },
    {
        "Index Type": "IVF-PQ",
        "Config": "nprobe=10, m=96",
        "Recall@10": np.mean(recalls),
        "p95 (ms)": np.percentile(latencies, 95),
        "Memory (MB)": (len(CORPUS) * 96) / (1024**2),  # PQ codes only (m=96, 1 byte each)
    },
]

print("\n=== Index Type Comparison ===")
print(f"{'Index Type':<12} | {'Config':<20} | {'Recall@10':<10} | {'p95 (ms)':<10} | {'Memory (MB)':<12}")
print("-" * 80)
for row in comparison:
    print(f"{row['Index Type']:<12} | {row['Config']:<20} | {row['Recall@10']:<10.3f} | {row['p95 (ms)']:<10.2f} | {row['Memory (MB)']:<12.2f}")
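With three objectives (recall up, latency and memory down) there is rarely a single best row. A common way to shortlist is to keep only configurations that no other row dominates, meaning no other row is at least as good on all three and strictly better on one. A sketch over dicts shaped like the `comparison` rows:

```python
from typing import Dict, List

def dominates(a: Dict, b: Dict) -> bool:
    """True if row a is at least as good as b on every metric and better on one."""
    at_least = (
        a["Recall@10"] >= b["Recall@10"]
        and a["p95 (ms)"] <= b["p95 (ms)"]
        and a["Memory (MB)"] <= b["Memory (MB)"]
    )
    strictly = (
        a["Recall@10"] > b["Recall@10"]
        or a["p95 (ms)"] < b["p95 (ms)"]
        or a["Memory (MB)"] < b["Memory (MB)"]
    )
    return at_least and strictly

def pareto_front(rows: List[Dict]) -> List[Dict]:
    """Rows that no other row dominates."""
    return [r for r in rows if not any(dominates(o, r) for o in rows if o is not r)]
```

Running `for row in pareto_front(comparison): print(row["Index Type"], row["Config"])` lists the configurations worth considering further.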

---

## 🎉 Lab Complete!

### What You Learned

- ✅ Built and configured FAISS HNSW indexes
- ✅ Measured recall@k with ground truth
- ✅ Benchmarked latency (p50, p95, p99)
- ✅ Tuned ef_search for quality/speed trade-offs
- ✅ Compared HNSW vs IVF indexes
- ✅ Applied Product Quantization for compression
- ✅ Analyzed multi-dimensional trade-offs

### Key Takeaways

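1. **HNSW** is a strong default for collections up to roughly 10M vectors when high recall matters and memory allows
2. **ef_search** is the main query-time knob for the recall/latency trade-off
3. **IVF** (especially IVF-PQ) scales better to very large collections (>10M vectors)
4. **PQ compression** shrinks per-vector storage sharply (64x for the codes in this lab's m=96, nbits=8 setup) at a recall cost you need to measure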
5. Always measure with ground truth and production query patterns

### Production Recommendations

- Start with HNSW (M=32, ef_construction=200)
- Tune ef_search to meet latency SLO (target p95 < 100ms)
- Monitor recall@k continuously with canary queries
- Consider IVF-PQ for cost optimization at scale
- Always test with representative query distribution

### Next Steps

1. Test with your own corpus and queries
2. Integrate with production vector database (Pinecone, Weaviate, Qdrant)
3. Set up continuous evaluation pipeline
4. Move on to Week 5 Lesson 4: Production RAG Systems

### Resources

- Week 5 Resources: [../resources/README.md](../resources/README.md)
- Index Tuning Cheatsheet: [../resources/index-tuning-cheatsheet.md](../resources/index-tuning-cheatsheet.md)
- Recall vs Latency Guide: [../resources/recall-vs-latency-evaluation.md](../resources/recall-vs-latency-evaluation.md)