# Context

In `make_synthetic_questions.ipynb`, we generated synthetic questions to bootstrap evaluation of the retrieval system in our hardware store's Q&A system.

This notebook shows the first step in calculating precision and recall with different retrieval parameters. We will run more advanced experiments in future notebooks after we have these baseline scores.

## Data

Here is a brief review of the data.

In [3]:
import json
import lancedb
import os
import pandas as pd
from typing import List, Dict
from concurrent.futures import ThreadPoolExecutor

pd.set_option("display.max_colwidth", 160)

db = lancedb.connect("./lancedb")
reviews_table = db.open_table("reviews")
reviews_table.to_pandas().head()

Unnamed: 0,id,product_title,product_description,review,vector
0,0,High-Speed Drill Bit Set,"This high-speed drill bit set includes 15 professional-grade bits that are perfect for drilling through wood, metal, and plastic. Made from premium steel, t...","I recently bought the High-Speed Drill Bit Set and am thoroughly impressed. These bits cut through wood, metal, and plastic like a hot knife through butter....","[0.012096934, -0.016582856, -0.024458809, -0.017401762, 0.021676935, -0.022688525, -0.0054975115, 0.08998337, 0.034249555, -0.03386419, 0.07567659, -0.00561..."
1,1,High-Speed Drill Bit Set,"This high-speed drill bit set includes 15 professional-grade bits that are perfect for drilling through wood, metal, and plastic. Made from premium steel, t...",This drill bit set is a game-changer! The bits are sharp and perfect for my woodworking projects. I was able to drill precise holes in oak and pine without ...,"[0.004564493, -0.008680302, 0.00031745803, -0.026921017, -0.00273093, -0.025931612, 0.01688891, 0.02678296, 0.010406008, -0.023377566, 0.08955265, -0.020812..."
2,2,High-Speed Drill Bit Set,"This high-speed drill bit set includes 15 professional-grade bits that are perfect for drilling through wood, metal, and plastic. Made from premium steel, t...","I've used many drill bit sets over the years, but this High-Speed Drill Bit Set stands out. The 15 professional-grade bits have handled everything from dril...","[0.0156312, -0.0011927274, -0.004294711, -0.04323878, 0.0030402269, -0.0394292, -0.026048033, 0.052905604, 0.03709583, -0.022083685, 0.08314418, -0.01344069..."
3,3,High-Speed Drill Bit Set,"This high-speed drill bit set includes 15 professional-grade bits that are perfect for drilling through wood, metal, and plastic. Made from premium steel, t...","As a professional contractor, I need reliable tools, and this drill bit set does not disappoint. The set includes every size I need, and they all perform ex...","[0.003315595, 0.006962443, 0.009330287, -0.0075881425, -0.012305427, -0.026549296, -0.03496557, 0.023874737, -0.010250433, 0.0022972994, 0.060901437, -0.029..."
4,4,High-Speed Drill Bit Set,"This high-speed drill bit set includes 15 professional-grade bits that are perfect for drilling through wood, metal, and plastic. Made from premium steel, t...","My home renovation project has been a breeze thanks to this High-Speed Drill Bit Set. I’ve used it on wood, plastic, and even some metal fixtures, and it pe...","[0.014810753, 0.013838572, -0.0072553484, -0.031877924, 0.014870764, -0.041311678, -0.01651507, 0.058954958, 0.0143666705, -0.026452918, 0.054202076, -0.002..."


In [4]:
with open("synthetic_eval_dataset.json", "r") as f:
    synthetic_questions = json.load(f)
synthetic_questions[:5]

[{'question': 'How well do these drill bits perform on different materials?',
  'answer': 'These bits cut through wood, metal, and plastic like a hot knife through butter.',
  'chunk_id': '0'},
 {'question': 'What is the quality of the construction of the drill bits?',
  'answer': 'The drill bits are made from premium steel, offering incredible durability and showing no wear after multiple uses.',
  'chunk_id': '0'},
 {'question': 'How good are the bits for drilling through different materials?',
  'answer': 'The bits work great for drilling precise holes in both wood (like oak and pine) and plastic without getting dull.',
  'chunk_id': '1'},
 {'question': 'What’s the deal with the carrying case in this drill bit set?',
  'answer': 'The carrying case is well-organized, making it easy to find the right bit quickly for storage and transport.',
  'chunk_id': '1'},
 {'question': 'How many bits are in the High-Speed Drill Bit Set and what are they made of?',
  'answer': 'The High-Speed Dril

## Set Up Evaluation

Load the evaluation questions into a structured format.

In [5]:
from pydantic import BaseModel


class EvalQuestion(BaseModel):
    question: str
    answer: str
    chunk_id: str


eval_questions = [EvalQuestion(**question) for question in synthetic_questions]

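Build a simple search function. For each retrieved result, it returns a boolean indicating whether that result is the chunk the question was generated from.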

In [6]:
def run_simple_request(q: EvalQuestion, n_return_vals=5):
    results = (
        reviews_table.search(q.question).select(["id"]).limit(n_return_vals).to_list()
    )
    return [str(q.chunk_id) == str(r["id"]) for r in results]
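
To illustrate the shape of that return value without a database, here is a minimal sketch. The `fake_results` list is a hypothetical stand-in for the rows LanceDB returns, and the ids in it are made up for illustration.

```python
# Hypothetical stand-in for the rows returned by
# reviews_table.search(...).select(["id"]).limit(3).to_list()
fake_results = [{"id": 7}, {"id": 1}, {"id": 3}]

# The chunk the synthetic question was generated from (stored as a string)
chunk_id = "1"

# Same comparison as run_simple_request: cast both sides to str so that
# integer ids from the table match string chunk_ids from the JSON file
hits = [str(chunk_id) == str(r["id"]) for r in fake_results]
print(hits)
```

Here the target chunk is the second result, so the list has a single `True` in the second position. A question whose chunk was not retrieved yields a list of all `False` values.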

Now do the benchmarking. For simplicity, this cell only compares retrieval sizes (5, 10, and 20 results) using plain semantic search.

In [7]:
def score(hits):
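    # This implementation assumes each question has exactly one relevant chunk
    # (the one it was generated from), so recall is the number of hits per question.
    n_retrieval_requests = len(hits)
    total_retrievals = sum(len(sublist) for sublist in hits)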
    true_positives = sum(sum(sublist) for sublist in hits)
    precision = true_positives / total_retrievals if total_retrievals > 0 else 0
    recall = true_positives / n_retrieval_requests if n_retrieval_requests > 0 else 0
    return {"precision": precision, "recall": recall}


def score_simple_search(n_to_retrieve: int) -> Dict[str, float]:
    # parallelize to speed this up 5-10X
    with ThreadPoolExecutor() as executor:
        hits = list(
            executor.map(lambda q: run_simple_request(q, n_to_retrieve), eval_questions)
        )
    return score(hits)


k_to_retrieve = [5, 10, 20]
scores = pd.DataFrame([score_simple_search(n) for n in k_to_retrieve])
scores["n_retrieved"] = k_to_retrieve
scores

   precision    recall  n_retrieved
0   0.088333  0.441667            5
1   0.065000  0.650000           10
2   0.042500  0.850000           20
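
To make the scoring concrete, here is a small worked example on hand-written hit lists. The `toy_hits` data is invented for illustration, and `score` is repeated so the snippet runs on its own. It also shows why precision falls as more results are retrieved: with one relevant chunk per question and exactly `k` results per request, precision equals recall divided by `k`, which matches the table above (for example, 0.441667 / 5 = 0.088333).

```python
def score(hits):
    # Assumes exactly one relevant chunk per question
    n_retrieval_requests = len(hits)
    total_retrievals = sum(len(sublist) for sublist in hits)
    true_positives = sum(sum(sublist) for sublist in hits)
    precision = true_positives / total_retrievals if total_retrievals > 0 else 0
    recall = true_positives / n_retrieval_requests if n_retrieval_requests > 0 else 0
    return {"precision": precision, "recall": recall}


# Four hypothetical questions, three results retrieved for each (k = 3)
toy_hits = [
    [False, True, False],   # target chunk found at rank 2
    [False, False, False],  # target chunk missed
    [True, False, False],   # target chunk found at rank 1
    [False, False, True],   # target chunk found at rank 3
]

toy_scores = score(toy_hits)
print(toy_scores)  # precision = 3 / 12, recall = 3 / 4
```

Three of the four questions retrieve their chunk, so recall is 0.75, while only 3 of the 12 retrieved results are relevant, so precision is 0.25.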


If you have Cohere set up, you can see if a reranker improves results (we'll talk more about rerankers in the coming weeks).

In [6]:
try:
    import cohere
    from diskcache import Cache
    cohere_api_key = os.environ["COHERE_API_KEY"]

    # Use diskcache to reduce re-running in case of error (or addition of new data)
    cache = Cache("./cohere_cache")
    
    def run_reranked_request(q: EvalQuestion, n_return_vals=5, n_to_rerank=40) -> List[bool]:
        # Check the cache first so a cached question skips both the search and the rerank call.
        # n_to_rerank is part of the key because it changes the candidate pool.
        cache_key = f"{q.question}_{n_return_vals}_{n_to_rerank}".replace("?", "")
        cached_result = cache.get(cache_key)
        if cached_result is not None:
            return cached_result

        # First, get more results than we need
        initial_results = (
            reviews_table.search(q.question)
            .select(["id", "review"])
            .limit(n_to_rerank)
            .to_list()
        )

        # Prepare texts for reranking
        texts = [r["review"] for r in initial_results]
        
        # Rerank using Cohere
        co = cohere.Client(cohere_api_key)
        reranked = co.rerank(
            query=q.question,
            documents=texts,
            top_n=n_return_vals
        )
        
        # Map reranked results back to original IDs
        reranked_ids = [initial_results[r.index]["id"] for r in reranked.results]
        result = [str(q.chunk_id) == str(r) for r in reranked_ids]
        cache.set(cache_key, result)
        return result

    def score_reranked_search(n_to_retrieve: int, n_to_rerank: int = 40) -> Dict[str, float]:
        with ThreadPoolExecutor() as executor:
            hits = list(executor.map(
                lambda q: run_reranked_request(q, n_to_retrieve, n_to_rerank), 
                eval_questions
            ))
        return score(hits)

    k_to_retrieve = [5, 10, 20]
    reranked_scores = pd.DataFrame([score_reranked_search(n) for n in k_to_retrieve])
    reranked_scores["n_retrieved"] = k_to_retrieve
    print(reranked_scores)
except Exception as e:
    print(f"Could not run reranker.\n{e}")
    print("Ensure the COHERE_API_KEY environment variable is set and the cohere and diskcache libraries are installed.")
    print("Connection reset by peer is likely rate limiting from Cohere")

   precision    recall  n_retrieved
0   0.125198  0.625991            5
1   0.081806  0.818062           10
2   0.046960  0.939207           20
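
To compare the two runs directly, the sketch below copies the recall values from the two output tables above into plain dictionaries and computes the gain at each retrieval size. The variable names are illustrative; in the notebook itself you could compute the same thing from the `scores` and `reranked_scores` DataFrames.

```python
# Recall values copied from the output tables above, keyed by n_retrieved
baseline_recall = {5: 0.441667, 10: 0.65, 20: 0.85}
reranked_recall = {5: 0.625991, 10: 0.818062, 20: 0.939207}

# Absolute recall gain from reranking at each retrieval size
recall_gain = {
    k: round(reranked_recall[k] - baseline_recall[k], 6) for k in baseline_recall
}

for k, gain in recall_gain.items():
    print(
        f"k={k}: recall {baseline_recall[k]:.3f} -> "
        f"{reranked_recall[k]:.3f} (+{gain:.3f})"
    )
```

In these results, reranking raises recall at every retrieval size. The gain is largest at 5 results (about 0.18) and shrinks to about 0.09 at 20 results, where the baseline already retrieves most target chunks.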
