# Build a RAG Pipeline with Exa Search & OpenAI

<a target="_blank" href="https://colab.research.google.com/github/quotient-ai/quotient-cookbooks/blob/main/cookbooks/search/exa/exa-oai-quotient-detections.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This cookbook demonstrates how to build a Retrieval Augmented Generation (RAG) pipeline using [Exa](https://exa.ai/) for AI web search, [OpenAI](https://openai.com) for generating answers from the retrieved content, and [Quotient AI](https://www.quotientai.co/) for monitoring search quality and detecting hallucinations in answers.

We'll cover:
- Setting up Exa for AI web search
- Using OpenAI to generate answers from search results
- Logging search results and answers in Quotient
- Automatically detecting hallucinations and irrelevant results
- Understanding common failure cases and how to fix them


In [1]:
# Install dependencies
! pip install -qU quotientai exa-py openai tqdm

Note: you may need to restart the kernel to use updated packages.


## Step 0: Grab your API keys

We'll use API keys from:
 - [OpenAI](https://openai.com): get your API key from the [OpenAI API platform](https://platform.openai.com/login)
 - [Exa](https://exa.ai/): get your API key from the [Exa dashboard](https://dashboard.exa.ai/)
 - [Quotient AI](https://www.quotientai.co): get your API key from the [Quotient AI app](https://app.quotientai.co)
 
Both Exa and Quotient offer generous free tiers to get started; see the [Exa pricing page](https://exa.ai/pricing) and the [Quotient pricing page](https://www.quotientai.co/pricing) for details.


In [1]:
import os
# Set API keys:
os.environ['EXA_API_KEY'] = "exa_api_key_here"
os.environ['QUOTIENT_API_KEY'] ="quotient_api_key_here"
os.environ['OPENAI_API_KEY'] = "openai_api_key_here"
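
Before initializing any clients, it can help to confirm that all three keys are actually set. The helper below is a minimal sketch and not part of any of the SDKs: the name `missing_api_keys` and the placeholder check (values ending in `_api_key_here`, as in the cell above) are assumptions made for illustration.

```python
import os

REQUIRED_KEYS = ("EXA_API_KEY", "QUOTIENT_API_KEY", "OPENAI_API_KEY")

def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "")
        # Assumption: placeholders follow the "..._api_key_here" pattern used in this notebook
        if not value or value.endswith("_api_key_here"):
            missing.append(name)
    return missing

# Demo with an explicit dict instead of the real environment
print(missing_api_keys({"EXA_API_KEY": "abc", "OPENAI_API_KEY": "openai_api_key_here"}))
# ['QUOTIENT_API_KEY', 'OPENAI_API_KEY']
```

Calling `missing_api_keys()` with no arguments checks `os.environ`; an empty list means all three keys look usable.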

## Step 1: Set up Exa, OpenAI and Quotient clients

We'll use:
- Exa's `search_and_contents` API to retrieve relevant web content
- OpenAI `gpt-4o` to generate answers based on the retrieved content
- Quotient to monitor the quality of our search and generation pipeline


In [2]:
from exa_py import Exa
from openai import OpenAI

# Initialize Exa client
exa = Exa(os.getenv("EXA_API_KEY"))

# Initialize OpenAI client
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Quotient is an intelligent observability platform designed for retrieval-augmented and search-augmented AI systems.

Quotient performs automated detections on two key fronts each time you send it a log:

- **Hallucination:** Identifies statements in the model output that are unsupported by the retrieved documents or that contradict them. Flagging is done at the sentence level, and Quotient returns a boolean indicating whether any part of the answer contains a hallucination.

- **Document Relevance:** Evaluates each retrieved document to determine whether it meaningfully contributed to grounding the answer. Quotient returns relevance labels for all documents, helping gauge retrieval and search quality.
  
These capabilities are enabled by passing `detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY]` during logger initialization.
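
To make the two outputs concrete, here is a toy illustration of how sentence-level flags roll up into one boolean per answer, and how per-document labels roll up into a relevance fraction. The input shapes and the function names `answer_has_hallucination` and `relevant_fraction` are assumptions for illustration only; this is not how Quotient computes its detections internally.

```python
def answer_has_hallucination(sentence_flags):
    """An answer is flagged if any one of its sentences is flagged."""
    return any(sentence_flags)

def relevant_fraction(doc_labels):
    """Share of retrieved documents labeled relevant, or None if nothing was retrieved."""
    if not doc_labels:
        return None
    return sum(1 for label in doc_labels if label) / len(doc_labels)

# Hypothetical labels for one logged answer with three sentences and four documents
print(answer_has_hallucination([False, True, False]))  # True
print(relevant_fraction([True, False, True, True]))    # 0.75
```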

Below, we'll set up the Quotient logger, send each AI-search result for automatic evaluation, and retrieve structured logs and detections:

In [3]:
from quotientai import QuotientAI, DetectionType

# Initialize Quotient SDK
quotient = QuotientAI()

logger = quotient.logger.init(
    # Name your application or project
    app_name="exa-search-eval",
    # Set the environment (e.g., "dev", "prod", "staging")
    environment="test",
    # Set the sample rate for logging (0-1.0)
    sample_rate=1.0,
    # Run hallucination and document relevance detection on 100% of your model outputs, checked against the documents you provide
    detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
    detection_sample_rate=1.0,
)

<quotientai.client.QuotientLogger at 0x1205f79d0>

## Step 2: Get a set of example queries

We'll evaluate on a set of realistic user queries covering a diverse set of topics. From each sample, we will use the `question` attribute to run a fresh search and compare the generated answer against retrieved documents.


In [4]:
import json

# Load queries from file
with open("search_queries.jsonl") as f:
    queries = [json.loads(line)["question"] for line in f]
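
The cell above expects a `search_queries.jsonl` file with one JSON object per line, each carrying a `question` field. If you do not have such a file yet, the sketch below writes a tiny sample in that assumed format and reads it back with a loader that tolerates blank lines. The helper name `load_questions` and the second sample question are placeholders, not part of the original dataset.

```python
import json
import os
import tempfile

def load_questions(path):
    """Read a JSONL file and return the "question" field of each non-blank line."""
    questions = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            questions.append(json.loads(line)["question"])
    return questions

# Write a small sample file in the assumed format, then load it back
sample_path = os.path.join(tempfile.mkdtemp(), "search_queries.jsonl")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"question": "What is the name of the 105-qubit quantum processor unveiled by Alphabet?"}) + "\n\n")
    f.write(json.dumps({"question": "Example question two?"}) + "\n")

print(load_questions(sample_path))
```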

Alternatively, you can connect Quotient to a live development or production environment and run detections automatically as data comes in — no manual setup required beyond the few-lines-of-code initial integration.

## Step 3: Run the RAG pipeline

Let's process our queries using:
1. Exa for searching and retrieving relevant content
2. OpenAI for generating answers from the retrieved content
3. Quotient for monitoring the quality of results


In [None]:
results = []
log_ids = []

num_queries = 10

for query in queries[:num_queries]:
    print(f"\n🧠 {query}")
    
    # Search with Exa
    search_response = exa.search_and_contents(
        query,
        text=True
    )
    
    # Extract relevant content from search results
    contexts = [result.text for result in search_response.results]
    
    # Format prompt for OpenAI
    prompt = f"""Answer the following question using ONLY the provided context. If the context doesn't contain enough information to fully answer the question, acknowledge what information is missing.

Context:
{contexts}

Question: {query}

Answer:"""

    # Generate answer with OpenAI
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0
    )
    
    answer = completion.choices[0].message.content
    print(f"➡️ {answer}")
    
    # Log to Quotient
    log_id = quotient.log(
        user_query=query,
        model_output=answer,
        documents=[str(doc) for doc in search_response.results],
    )
    print(f"📝 Logged to Quotient with log_id: {log_id}")
    
    results.append({"query": query, "answer": answer})
    log_ids.append(log_id)



🧠 What is the top emerging technology in 2025 according to the article '25 New Technology Trends for 2025'?
➡️ The top emerging technology in 2025 according to the article "25 New Technology Trends for 2025" is Generative AI.
📝 Logged to Quotient with log_id: 3c9c3df5-96c2-424d-b04d-aa038548681d

🧠 What is the name of the 105-qubit quantum processor unveiled by Alphabet?
➡️ The name of the 105-qubit quantum processor unveiled by Alphabet is "Willow."
📝 Logged to Quotient with log_id: b5ba8323-3499-44d7-9577-edad62650a9a

🧠 What is the projected market size of the global 5G market by 2026 according to the article 'Top 25 New Technology Trends in 2025' on GeeksforGeeks?
➡️ The context provided does not specify the projected market size of the global 5G market by 2026 according to the article 'Top 25 New Technology Trends in 2025' on GeeksforGeeks. Additional information from the article would be needed to answer this question.
📝 Logged to Quotient with log_id: cdffd0f9-fac4-496c-8997-67
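
Note that the loop above interpolates the Python list `contexts` directly into the prompt, so the model sees the list's string representation. One alternative is to number each document and cap its length before building the prompt. The helper below is a sketch of that idea; `format_contexts` and the `max_chars` default are illustrative choices, not part of the Exa or OpenAI APIs.

```python
def format_contexts(texts, max_chars=2000):
    """Join document texts into numbered blocks, truncating each to max_chars."""
    blocks = []
    for i, text in enumerate(texts, start=1):
        text = (text or "").strip()
        if not text:
            continue  # skip results that came back without text
        if len(text) > max_chars:
            text = text[:max_chars] + " [truncated]"
        blocks.append(f"[Document {i}]\n{text}")
    return "\n\n".join(blocks)

# The empty second entry is skipped; numbering keeps the original positions
print(format_contexts(["First page text.", None, "Second page text."], max_chars=50))
```

In the loop, `{contexts}` in the prompt could then become `{format_contexts(contexts)}`. Truncation trades completeness for a smaller prompt, so the right `max_chars` depends on the model's context window and on how long the retrieved pages are.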

### How It Works

When `.log()` is called:

1. **Data ingestion:** The query, model output, and all retrieved document contents are logged to Quotient.

2. **Async detection pipeline:** Quotient runs:
   - **Hallucination detection**, labeling the output as hallucinated or not.
   - **Document relevance scoring**, marking which retrieved documents helped ground the output.

3. **Result retrieval:** You can poll or fetch detections linked to your `log_id`.

4. **Monitor and troubleshoot in the Quotient app:** Access the [Quotient dashboard](https://app.quotientai.co) to:
   - Monitor your AI system over time.
   - Review flagged hallucinated sentences.
   - See which documents were irrelevant.
   - Compare across tags or environments for deeper insights.
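
Because detections run asynchronously, fetching a result generally means waiting until it is ready. The SDK call `quotient.poll_for_detection` used later in this notebook handles that for you; the sketch below only illustrates the general poll-with-timeout pattern. The names `poll_until_ready` and `fake_fetch`, and the timing defaults, are hypothetical and not part of the Quotient SDK.

```python
import time

def poll_until_ready(fetch, timeout=30.0, interval=1.0):
    """Call fetch() repeatedly until it returns a non-None value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Give up if another sleep would take us past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"no result within {timeout} seconds")
        time.sleep(interval)

# Stand-in for a remote call: returns None twice, then a fake detection payload
attempts = {"count": 0}

def fake_fetch():
    attempts["count"] += 1
    return {"has_hallucination": False} if attempts["count"] >= 3 else None

print(poll_until_ready(fake_fetch, timeout=5.0, interval=0.01))
```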

For full implementation details, visit the Quotient [docs](https://docs.quotientai.co/).

## Step 4: Review detections in Quotient

You can now view your logs and detections in the [Quotient dashboard](https://app.quotientai.co), where you can also filter them by tags and environments to identify common failure patterns.

![Quotient AI Dashboard](Quotient_Dashboard.png "Quotient AI Dashboard")


## What You’ve Built

A lightweight search and monitoring pipeline that:
- Runs live AI search queries
- Automatically checks if answers are grounded in retrieved evidence
- Flags hallucinations and irrelevant retrievals

You can scale this to monitor production traffic, benchmark retrieval and search performance, or compare different models side by side.

## How to interpret the results
- Well-grounded systems typically show **< 5% hallucination rate**. If yours is higher, it’s often a signal that your data ingestion, retrieval pipeline, or prompting needs improvement.

- High-performing systems typically show **> 75% document relevance**. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data.
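
These two rules of thumb can be turned into a small automated check. The sketch below is one possible way to do it; `check_pipeline_health` is a hypothetical helper, and the thresholds simply restate the guidance above rather than any official Quotient setting. The example call uses the rates from this notebook's sample run (10% hallucinations, 69% relevant documents).

```python
HALLUCINATION_RATE_MAX = 0.05  # "< 5% hallucination rate" from the guidance above
DOC_RELEVANCE_MIN = 0.75       # "> 75% document relevance" from the guidance above

def check_pipeline_health(hallucination_rate, doc_relevance):
    """Return a list of warnings for metrics outside the rule-of-thumb ranges."""
    warnings = []
    if hallucination_rate >= HALLUCINATION_RATE_MAX:
        warnings.append(
            f"hallucination rate {hallucination_rate:.0%} is not below {HALLUCINATION_RATE_MAX:.0%}: "
            "review data ingestion, retrieval, or prompting"
        )
    if doc_relevance <= DOC_RELEVANCE_MIN:
        warnings.append(
            f"document relevance {doc_relevance:.0%} is not above {DOC_RELEVANCE_MIN:.0%}: "
            "check for ambiguous queries, incorrect retrieval, or noisy sources"
        )
    return warnings

# Both metrics from the sample run fall outside the suggested ranges
for warning in check_pipeline_health(0.10, 0.69):
    print(warning)
```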


## (Optional) Grab the detection results from Quotient

Quotient's detections are now available to fetch via the Quotient SDK using the `log_id` you received earlier:

In [6]:
from tqdm import tqdm

hallucination_detections = []
doc_relevancy_detections = []

for log_id in tqdm(log_ids):
    try:
        detection = quotient.poll_for_detection(log_id=log_id)
        has_hallucination = detection.has_hallucination
        docs = detection.log_documents
    except Exception as exc:
        # Skip logs whose detections could not be fetched, but report them
        tqdm.write(f"Skipping {log_id}: {exc}")
        continue
    # Record whether the answer was flagged as containing a hallucination
    hallucination_detections.append(has_hallucination)
    # Record the fraction of retrieved documents labeled relevant (logs without documents are skipped)
    if docs:
        doc_relevancy_detections.append(sum(1 for doc in docs if doc.get('is_relevant') is True) / len(docs))

print(f"Number of results: {len(log_ids)}")
# Guard against empty lists so a run with no fetched detections does not raise ZeroDivisionError
if hallucination_detections:
    print(f"Percentage of hallucinations: {sum(hallucination_detections)/len(hallucination_detections)*100:.2f}%")
if doc_relevancy_detections:
    print(f"Average percentage of relevant documents: {sum(doc_relevancy_detections)/len(doc_relevancy_detections)*100:.2f}%")

100%|██████████| 10/10 [00:22<00:00,  2.26s/it]

Number of results: 10
Percentage of hallucinations: 10.00%
Average percentage of relevant documents: 69.00%



