# Evaluate AI Search Quality with Exa & Quotient

<a target="_blank" href="https://colab.research.google.com/github/quotient-ai/quotient-cookbooks/blob/main/cookbooks/search/exa/exa-quotient-detections.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This cookbook demonstrates how to monitor AI search results from [Exa](https://exa.ai/) for hallucinations or retrieval issues using [Quotient AI](https://www.quotientai.co/).

We'll cover:
- Performing AI-powered search using Exa's Python SDK
- Logging search results in Quotient
- Automatically detecting hallucinations and irrelevant results
- Understanding common failure cases and how to fix them


In [1]:
# Install dependencies
! pip install -qU quotientai exa-py tqdm


Note: you may need to restart the kernel to use updated packages.


## Step 0: Grab your API keys

We'll use API keys from:
 - [Exa](https://exa.ai/) — get your API key from the [Exa dashboard](https://dashboard.exa.ai/)
 - [Quotient AI](https://www.quotientai.co) — get your API key from the [Quotient AI app](https://app.quotientai.co)
 
Both Exa and Quotient offer generous free tiers to get started; you can check out their pricing [here](https://exa.ai/pricing) and [here](https://www.quotientai.co/pricing).


In [1]:
import os
# Set API keys:
os.environ['EXA_API_KEY'] = "exa_api_key_here"
os.environ['QUOTIENT_API_KEY'] = "quotient_api_key_here"
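
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, assuming the keys were exported in the shell before the notebook was launched, a small check can fail early when a key is missing or still holds the placeholder text. `require_env` is a hypothetical helper written for this example; it is not part of the Exa or Quotient SDKs.

```python
import os


def require_env(name: str, placeholder_suffix: str = "_here") -> str:
    """Return an environment variable's value, or raise if it is unset or a placeholder.

    Hypothetical helper for illustration; the "_here" suffix matches the
    placeholder strings used in this notebook ("exa_api_key_here", ...).
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    if value.endswith(placeholder_suffix):
        raise RuntimeError(f"{name} still holds a placeholder value")
    return value
```

Calling `require_env("EXA_API_KEY")` and `require_env("QUOTIENT_API_KEY")` before creating the clients surfaces a configuration problem immediately rather than at the first API request.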

## Step 1: Connect to Exa Search and Quotient monitoring

We'll use Exa's Python SDK to retrieve content from the web and get AI-generated answers for each query. The `answer()` method performs an Exa search and uses an LLM to generate either:

1. A direct answer for specific queries (e.g., "What is the capital of France?" would return "Paris")
2. A detailed summary with citations for open-ended queries (e.g., "What is the state of AI in healthcare?" would return a summary with citations to relevant sources)


In [2]:
from exa_py import Exa

# Initialize Exa client
exa = Exa(os.getenv("EXA_API_KEY"))

Quotient is an intelligent observability platform designed for retrieval-augmented and search-augmented AI systems.

Quotient performs automated detections on two key fronts each time you send it a log:

- **Hallucination:** Identifies statements in the model output that are unsupported by the retrieved documents or that contradict them. Flagging is done at the sentence level, and Quotient returns a boolean indicating whether any part of the answer contains a hallucination.

- **Document Relevance:** Evaluates each retrieved document to determine whether it meaningfully contributed to grounding the answer. Quotient returns relevance labels for all documents, helping gauge retrieval and search quality.
  
These capabilities are enabled automatically when `hallucination_detection=True` is set during logger initialization.
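
To make the two outputs concrete, here is a small sketch of how sentence-level flags could roll up into one boolean per answer, and how per-document labels could become a relevance fraction. The `SentenceCheck` type, both functions, and the sample values are assumptions made for illustration; they do not describe how Quotient implements its detections or the shape of its API responses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SentenceCheck:
    """Hypothetical record: one sentence of the answer and whether the documents support it."""
    sentence: str
    supported: bool


def has_hallucination(checks: List[SentenceCheck]) -> bool:
    # One unsupported sentence is enough to flag the whole answer.
    return any(not check.supported for check in checks)


def relevance_rate(labels: List[bool]) -> Optional[float]:
    # Fraction of retrieved documents labeled relevant; None when nothing was retrieved.
    if not labels:
        return None
    return sum(labels) / len(labels)


# Made-up labels, for illustration only.
checks = [
    SentenceCheck("First sentence of the answer.", supported=True),
    SentenceCheck("Second sentence of the answer.", supported=False),
]
print(has_hallucination(checks))
print(relevance_rate([True, False, False, True]))
```

Returning `None` for an empty document list keeps "no documents retrieved" distinct from "no relevant documents", which matters when averaging across logs.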

Below, we'll set up the Quotient logger, send each AI-search result for automatic evaluation, and retrieve structured logs and detections:

In [5]:
from quotientai import QuotientAI

# Initialize Quotient SDK

quotient = QuotientAI()

quotient.logger.init(
    # Name your application or project
    app_name="exa-answer-eval",
    # Set the environment (e.g., "dev", "prod", "staging")
    environment="test",
    # Set the sample rate for logging (0-1.0)
    sample_rate=1.0,
    # Enable hallucination detection
    hallucination_detection=True,
    # Set the sample rate for detections (0-1.0)
    hallucination_detection_sample_rate=1.0,
)

<quotientai.client.QuotientLogger at 0x106add540>

## Step 2: Get a set of example queries

We'll evaluate on a set of realistic user queries covering a diverse set of topics. From each sample, we will use the `question` attribute to run a fresh search and compare the generated answer against retrieved documents.


In [3]:
import json

# Load queries from file
with open("search_queries.jsonl") as f:
    queries = [json.loads(line)["question"] for line in f]
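
The loader above expects one JSON object per line, each with a `question` field. The following sketch writes a tiny file in that layout and reads it back with a slightly more defensive loader. The file name `sample_queries.jsonl` and the function `load_questions` are hypothetical, and the real `search_queries.jsonl` may carry additional fields.

```python
import json
import tempfile
from pathlib import Path


def load_questions(path):
    """Read one JSON object per line and return each object's "question" field."""
    questions = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, such as a trailing newline
            record = json.loads(line)
            if "question" not in record:
                raise KeyError(f"line {line_number} has no 'question' field")
            questions.append(record["question"])
    return questions


# Illustrative file contents, reusing the two example queries from Step 1.
sample = Path(tempfile.mkdtemp()) / "sample_queries.jsonl"
sample.write_text(
    '{"question": "What is the capital of France?"}\n'
    "\n"
    '{"question": "What is the state of AI in healthcare?"}\n',
    encoding="utf-8",
)
sample_questions = load_questions(sample)
print(sample_questions)
```

Skipping blank lines avoids a `json.JSONDecodeError` on files that end with an empty line, and the explicit `KeyError` message points to the offending line.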

Alternatively, you can connect Quotient to a live development or production environment and run detections automatically as data comes in. Beyond the initial integration, which takes only a few lines of code, no manual setup is required.

## Step 3: Query Exa for each example query and log your results in Quotient

Let's run fresh searches for a subset of examples using Exa's Python SDK.


In [7]:
exa_results = []
log_ids = []

num_results = 10

for query in queries[:num_results]:
    # Get answer from Exa
    response = exa.answer(
        query,
        text=True
    )
    
    answer = response.answer
    documents = [citation.__dict__ for citation in response.citations]
    
    print(f"\n🧠 {query}")
    print(f"➡️ {answer}")
    print(f"📚 Found {len(documents)} citations")
    
    # Format documents for Quotient (using citation text and metadata)
    documents = [
        {
            "page_content": document.get('text', ''),
            "metadata": {
                "source": document.get('url', ''),
                "title": document.get('title', ''),
                "author": document.get('author', ''),
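                # Accept either key spelling, since the citation attribute name may vary by SDK version
                "published_date": document.get('published_date') or document.get('publishedDate', '')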
            }
        }
        for document in documents
    ]
    
    # Log to Quotient
    log_id = quotient.log(
        user_query=query,
        model_output=answer,
        documents=documents,
    )
    
    print(f"📝 Logged to Quotient with log_id: {log_id}")
    
    exa_results.append(response)
    log_ids.append(log_id)



🧠 What is the top emerging technology in 2025 according to the article '25 New Technology Trends for 2025'?
➡️ Agentic AI is the top emerging technology in 2025. ([Gartner](https://www.gartner.com/en/articles/top-technology-trends-2025), [Simplilearn.com](https://www.simplilearn.com/top-technology-trends-and-jobs-article))

📚 Found 8 citations
📝 Logged to Quotient with log_id: 7c7b57f4-78c4-4753-9cd9-a12bd7162b6a

🧠 What is the name of the 105-qubit quantum processor unveiled by Alphabet?
➡️ Willow is the name of the 105-qubit quantum processor unveiled by Alphabet. ([PostQuantum.com](https://postquantum.com/industry-news/google-willow-quantum-chip), [Google Blog](https://blog.google/technology/research/google-willow-quantum-chip), [quantumcomputingreport.com](https://quantumcomputingreport.com/google-unveils-the-105-qubit-willow-chip-and-demonstrates-new-levels-of-rcs-benchmark-performance-and-quantum-error-correction-below-the-threshold), [livescience.com](https://www.livescience.co
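
The document-formatting step inside the loop above can be pulled into a small function, which makes it easy to check on plain dictionaries without calling either API. This is a sketch under stated assumptions: `to_quotient_document` is a hypothetical helper, the input is assumed to be a citation's `__dict__`, and the date key is looked up under both `published_date` and `publishedDate` because the exact attribute name may depend on the SDK version.

```python
def to_quotient_document(citation: dict) -> dict:
    """Map one citation dict onto the page_content/metadata layout used in the loop above."""

    def text_or_empty(*keys: str) -> str:
        # Return the first non-None value among the keys, or "" when none is usable.
        # dict.get(key, "") alone would still return None for a key stored as None.
        for key in keys:
            value = citation.get(key)
            if value is not None:
                return str(value)
        return ""

    return {
        "page_content": text_or_empty("text"),
        "metadata": {
            "source": text_or_empty("url"),
            "title": text_or_empty("title"),
            "author": text_or_empty("author"),
            # Assumption: the key spelling may be snake_case or camelCase.
            "published_date": text_or_empty("published_date", "publishedDate"),
        },
    }


# Made-up citation, for illustration only.
example = to_quotient_document(
    {"url": "https://example.com/a", "title": None, "text": "Body text"}
)
print(example)
```

With this helper, the list comprehension in the loop reduces to `[to_quotient_document(c.__dict__) for c in response.citations]`.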

### How It Works

When `.log()` is called:

1. **Data ingestion:** The query, model output, and all retrieved document contents are logged to Quotient.

2. **Async detection pipeline:** Quotient runs:
   - **Hallucination detection**, labeling the output as hallucinated or not.
   - **Document relevance scoring**, marking which retrieved documents helped ground the output.

3. **Result retrieval:** You can poll or fetch detections linked to your `log_id`.

4. **Monitor and troubleshoot in the Quotient app:** Access the [Quotient dashboard](https://app.quotientai.co) to:
   - Monitor your AI system over time.
   - Review flagged hallucinated sentences.
   - See which documents were irrelevant.
   - Compare across tags or environments for deeper insights.

For full implementation details, visit the Quotient [docs](https://docs.quotientai.co/).
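
Because detections run asynchronously, a result may not exist yet when you first ask for it. The sketch below shows the general poll-until-ready pattern with a timeout. It is a generic illustration, not how the Quotient SDK implements `poll_for_detection`; `poll_until_ready` and its parameters are hypothetical names.

```python
import time


def poll_until_ready(fetch, timeout=30.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns something other than None, or raise after `timeout` seconds.

    `sleep` and `clock` are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(interval)
```

In practice, `fetch` would be a zero-argument callable that looks up the detection for one `log_id` and returns `None` while the detection is still pending.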

## Step 4: Review detections in Quotient

You can now view your logs and detections in the [Quotient dashboard](https://app.quotientai.co), where you can also filter them by tags and environments to identify common failure patterns.

![Quotient AI Dashboard](Quotient_Dashboard.png "Quotient AI Dashboard")


## What You’ve Built

A lightweight search and monitoring pipeline that:
- Runs live AI search queries
- Automatically checks if answers are grounded in retrieved evidence
- Flags hallucinations and irrelevant retrievals

You can scale this to monitor production traffic, benchmark retrieval and search performance, or compare different models side by side.

## How to interpret the results
- Well-grounded systems typically show **< 5% hallucination rate**. If yours is higher, it’s often a signal that your data ingestion, retrieval pipeline, or prompting needs improvement.

- High-performing systems typically show **> 75% document relevance**. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data.
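
The two rules of thumb above can be expressed as a simple check. This is a minimal sketch: the thresholds come from the guidance above, the function name `interpret` is hypothetical, and the example inputs are the rates printed by this notebook's run in the optional section (30.00% hallucinations, 43.75% relevant documents).

```python
HALLUCINATION_RATE_MAX = 0.05  # "< 5% hallucination rate"
DOCUMENT_RELEVANCE_MIN = 0.75  # "> 75% document relevance"


def interpret(hallucination_rate: float, document_relevance: float) -> list:
    """Return a list of findings; an empty list means both targets are met."""
    findings = []
    if hallucination_rate >= HALLUCINATION_RATE_MAX:
        findings.append(
            "Hallucination rate is above target: review data ingestion, "
            "the retrieval pipeline, or prompting."
        )
    if document_relevance <= DOCUMENT_RELEVANCE_MIN:
        findings.append(
            "Document relevance is below target: check for ambiguous queries, "
            "incorrect retrieval, or noisy source data."
        )
    return findings


for finding in interpret(hallucination_rate=0.30, document_relevance=0.4375):
    print(finding)
```

With only 10 queries, each flagged answer moves the hallucination rate by 10 percentage points, so treat results from a run this small as a rough signal rather than a precise measurement.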


## (Optional) Grab the detection results from Quotient

Quotient's detections are now available to fetch via the Quotient SDK using the `log_id` you received earlier:

In [9]:
from tqdm import tqdm

hallucination_detections = []
doc_relevancy_detections = []

for log_id in tqdm(log_ids):
    try:
        detection = quotient.poll_for_detection(log_id=log_id)
        # Record whether the answer was flagged as containing a hallucination
        has_hallucination = detection.has_hallucination
        docs = detection.log_documents
    except Exception as exc:
        # Skip logs whose detections could not be fetched, but report them
        tqdm.write(f"Skipping {log_id}: {exc}")
        continue
    hallucination_detections.append(has_hallucination)
    # Record the fraction of relevant documents; logs without documents are left out of the average
    if docs:
        relevant = sum(1 for doc in docs if doc.get('is_relevant') is True)
        doc_relevancy_detections.append(relevant / len(docs))

print(f"Number of results: {len(log_ids)}")
# Guard against empty lists so a failed run does not end in a ZeroDivisionError
if hallucination_detections:
    print(f"Percentage of hallucinations: {sum(hallucination_detections)/len(hallucination_detections)*100:.2f}%")
if doc_relevancy_detections:
    print(f"Average percentage of relevant documents: {sum(doc_relevancy_detections)/len(doc_relevancy_detections)*100:.2f}%")

100%|██████████| 10/10 [00:21<00:00,  2.17s/it]

Number of results: 10
Percentage of hallucinations: 30.00%
Average percentage of relevant documents: 43.75%



