# Build and Monitor a Web Research Agent with Tavily, OpenAI, LangChain & Quotient

<a target="_blank" href="https://colab.research.google.com/github/quotient-ai/quotient-cookbooks/blob/main/cookbooks/agents/research/tavily-quotient-agent.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This notebook shows how to build a LangChain-based **research assistant** powered by [Tavily](https://www.tavily.com/) and OpenAI. The agent answers real-world search queries using live web content via Tavily tools, and is monitored using [Quotient AI](https://www.quotientai.co/) to detect hallucinations, irrelevant retrievals, and other failure modes.

We’ll use API keys from:
 - [OpenAI](https://www.openai.com): get your API key from the [OpenAI API platform](https://platform.openai.com/login)
 - [Tavily](https://www.tavily.com/): get your API key from the [Tavily app](https://app.tavily.com)
 - [Quotient AI](https://www.quotientai.co): get your API key from the [Quotient AI app](https://app.quotientai.co)

Both Tavily and Quotient offer generous free tiers to get started; see the [Tavily pricing](https://www.tavily.com/#pricing) and [Quotient pricing](https://www.quotientai.co/pricing) pages.


In [1]:
import os
# Set API keys (replace the placeholder strings with your own keys):
os.environ['TAVILY_API_KEY'] = "tavily_api_key_here"
os.environ['QUOTIENT_API_KEY'] = "quotient_api_key_here"
os.environ['OPENAI_API_KEY'] = "openai_api_key_here"
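
Hardcoded keys are easy to commit by accident. As a minimal sketch, assuming you export the keys in your shell instead, the check below reports which of the three variables are still unset. The helper name `missing_keys` is hypothetical and not part of any SDK used in this notebook.

```python
import os

REQUIRED_KEYS = ("TAVILY_API_KEY", "QUOTIENT_API_KEY", "OPENAI_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Hypothetical usage: report which keys still need to be set
missing = missing_keys(os.environ)
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```

Passing the environment in as an argument keeps the helper easy to test with a plain dictionary.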

In [2]:
# !pip install -qU langchain langchain-openai langchain-tavily quotientai tqdm

## Step 1: Set Up a Research Agent with Tavily + OpenAI

In this step, we define a LangChain agent that uses both **TavilySearch** and **TavilyExtract** to gather and summarize live web content.

What’s happening here:

- `ChatOpenAI` initializes the OpenAI model (`gpt-4o`) as the core reasoning engine.
- `TavilySearch` allows the agent to perform real-time web searches.
- `TavilyExtract` lets the agent pull the full page content from specific URLs, such as those returned by a search.
- `ChatPromptTemplate` defines how the agent should behave. We provide:
  - A `system` instruction that frames the task as a research assistant
  - A placeholder for user messages (`messages`)
  - An `agent_scratchpad` where the agent stores and reasons through its tool calls

The agent is constructed using `create_openai_tools_agent`, which automatically maps available tools to OpenAI's function-calling format.

Finally, we wrap the agent with `AgentExecutor`, enabling it to execute multi-step reasoning. By setting `return_intermediate_steps=True`, we get full visibility into how the agent calls tools — essential for logging and evaluation later on.


In [3]:
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_tavily import TavilySearch, TavilyExtract
from langchain_core.messages import HumanMessage
import datetime

# Initialize LLM and tools
model = "gpt-4o"

llm = ChatOpenAI(model=model, temperature=0)
tools = [TavilySearch(max_results=5, topic="general"), TavilyExtract()]

# Set up prompt with 'agent_scratchpad'
today = datetime.datetime.today().strftime("%B %d, %Y")
prompt = ChatPromptTemplate.from_messages([
    ("system", f"""You are a helpful research assistant. You’ll be given a query and should search the web, extract relevant content, and summarize insights. Today is {today}."""),
    MessagesPlaceholder(variable_name="messages"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create agent + executor
agent = create_openai_tools_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
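
To make the effect of `return_intermediate_steps=True` concrete: the executor's response then carries an `intermediate_steps` list of `(action, observation)` pairs, where the action exposes the tool name and its input. The sketch below uses a stand-in class (`StubAction`, hypothetical) and a stubbed response so it runs without any API calls; the helper `tool_trace` is also illustrative.

```python
from dataclasses import dataclass


@dataclass
class StubAction:
    """Stand-in for LangChain's agent action; only the fields used here."""
    tool: str
    tool_input: dict


def tool_trace(response):
    """List the (tool name, tool input) pairs from an executor response."""
    return [
        (action.tool, action.tool_input)
        for action, _observation in response.get("intermediate_steps", [])
    ]


# Stubbed response with the same shape as the executor's output
stub_response = {
    "output": "...",
    "intermediate_steps": [
        (StubAction("tavily_search", {"query": "example"}), {"results": []}),
        (StubAction("tavily_extract", {"urls": ["https://example.com"]}), {"results": []}),
    ],
}
print(tool_trace(stub_response))
```

Step 3 walks the same structure to collect the documents each tool returned.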

## Step 2: Initialize the Quotient SDK to Monitor the Agent

In this step, we set up [Quotient AI](https://www.quotientai.co) to **monitor our LangChain agent** as it answers real-world queries.

Quotient allows us to:

- Log each query and model response
- Attach the retrieved documents for grounding checks
- Automatically detect hallucinations (i.e., unsupported claims) and irrelevant documents

The configuration below tells Quotient to log 100% of interactions and run hallucination and document relevancy detection on every one. This gives us full visibility into how our agent is performing and whether it stays grounded in the context it retrieves.

In [4]:
from quotientai import QuotientAI, DetectionType

# Initialize the Quotient SDK

quotient = QuotientAI()

quotient.logger.init(
    # Name your application or project
    app_name="tavily-agent",
    # Set the environment (e.g., "dev", "prod", "staging")
    environment="test",
    # Set the sample rate for logging (0-1)
    sample_rate=1.0,
    # Run hallucination and document relevancy detection on 100% of logged outputs, checked against the documents you provide
    detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
    detection_sample_rate=1.0,
)
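
As an illustration of what a sample rate between 0 and 1 means in practice, the sketch below keeps each interaction with probability `rate`. This only illustrates the idea and is not Quotient's implementation; the function name `sampled` is hypothetical.

```python
import random


def sampled(rate, rng):
    """Return True with probability `rate` (0.0 = never, 1.0 = always)."""
    return rng.random() < rate


rng = random.Random(0)  # seeded so the illustration is repeatable
kept = sum(sampled(0.25, rng) for _ in range(10_000))
print(f"rate=0.25 kept {kept / 10_000:.1%} of 10,000 interactions")
```

With both rates set to `1.0`, as in this notebook, nothing is dropped, which suits a small evaluation run; lower rates trade coverage for volume in high-traffic settings.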

## Step 3: Run Queries Through the Agent and Log Responses to Quotient

In this step, we simulate real user queries by reading from a `.jsonl` file and sending each query to the LangChain agent. For each query, we:

1. **Invoke the agent** using `agent_executor`, which triggers OpenAI + Tavily tool calls.
2. **Capture the final model response** (`response['output']`).
3. **Extract all documents** returned by the agent via:
   - `TavilyExtract`: full raw page content
   - `TavilySearch`: short snippet content
4. **Format documents** into a structured list of `{"page_content": ..., "metadata": ...}` to support downstream evaluation.
5. **Log the full interaction to Quotient**, including:
   - The original query
   - The model's answer
   - The retrieved documents for grounding checks
   - Metadata such as model version (`gpt-4o`) for traceability

Each interaction is logged with `quotient.log(...)`, enabling automatic hallucination detection and structured evaluation inside the Quotient platform.
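
The loader in this step expects `search_queries.jsonl` to hold one JSON object per line, each with a `question` field. As a minimal sketch of that format, the snippet below writes two placeholder questions (made up for illustration; substitute your own) to a temporary directory and reads them back. Unlike the loader in the next cell, this reader also skips blank lines.

```python
import json
import os
import tempfile

# Placeholder questions for illustration only; substitute your own
sample_questions = [
    "What are the latest developments in battery recycling?",
    "How do large language models use retrieval?",
]

path = os.path.join(tempfile.mkdtemp(), "search_queries.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for question in sample_questions:
        # One JSON object per line, each with a "question" field
        f.write(json.dumps({"question": question}) + "\n")

# Read the file back, skipping blank lines
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line)["question"] for line in f if line.strip()]
print(loaded)
```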


In [5]:
import json

# Load queries from file
with open("search_queries.jsonl") as f:
    queries = [json.loads(line)["question"] for line in f]

log_ids = []
num_queries = 10
# Run each query through the agent
for query in queries[:num_queries]:
    response = agent_executor.invoke({"messages": [HumanMessage(content=query)]})
    model_output = response['output']
    print(f"\n🧠 {query}\n➡️ {model_output}")

    # Extract documents from the response
    documents = []
    for step in response.get("intermediate_steps", []):
        tool_call, tool_output = step

        # Handle tavily_extract results: full raw page content
        if getattr(tool_call, "tool", "") == "tavily_extract":
            for result in tool_output['results']:
                doc = {
                    "page_content": result.get('raw_content', ''),
                    "metadata": {"source": result.get('url', '')}
                }
                documents.append(doc)
        
        # Handle tavily_search results: short snippets only
        elif getattr(tool_call, "tool", "") == "tavily_search":
            for result in tool_output['results']:
                doc = {
                    "page_content": result.get('content', ''),
                    "metadata": {"source": result.get('url', '')}
                }
                documents.append(doc)

    print(f"📄 Found {len(documents)} documents")

    # Log to Quotient
    log_id = quotient.log(
        user_query=query,
        model_output=model_output,
        documents=documents,
        tags={
            'model': model,
        },
    )
    
    print(f"📝 Logged to Quotient with log_id: {log_id}")

    log_ids.append(log_id)




> Entering new AgentExecutor chain...

Invoking: `tavily_search` with `{'query': '25 New Technology Trends for 2025', 'search_depth': 'advanced'}`


{'query': '25 New Technology Trends for 2025', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://www.geeksforgeeks.org/top-new-technology-trends/', 'title': 'Top 25 New Technology Trends in 2025 - GeeksforGeeks', 'content': '*   [1. Artificial Intelligence (AI) and Machine Learning (ML)](https://www.geeksforgeeks.org/top-new-technology-trends/#1-artificial-intelligence-ai-and-machine-learning-ml)\n    *   [2. 5G](https://www.geeksforgeeks.org/top-new-technology-trends/#2-5g)\n    *   [3. Internet of Things (IoT)](https://www.geeksforgeeks.org/top-new-technology-trends/#3-internet-of-things-iot)\n    *   [4. Blockchain Technology](https://www.geeksforgeeks.org/top-new-technology-trends/#4-blockchain-technology) [...] All these ****top technology trends**** in 2025

### How It Works

When `.log()` is called:

1. **Data ingestion:** The query, model output, and all retrieved document contents are logged to Quotient.

2. **Async detection pipeline:** Quotient runs:
   - **Hallucination detection**, labeling the output as hallucinated or not.
   - **Document relevance scoring**, marking which retrieved documents helped ground the output.

3. **Result retrieval:** You can poll or fetch detections linked to your `log_id`.

4. **Monitor and troubleshoot in the Quotient app:** Access the [Quotient dashboard](https://app.quotientai.co) to:
   - Monitor your AI system over time.
   - Review flagged hallucinated sentences.
   - See which documents were irrelevant.
   - Compare across tags or environments for deeper insights.

For full implementation details, visit the Quotient [docs](https://docs.quotientai.co/).
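
Because detection runs asynchronously, results may not be ready the moment `.log()` returns, which is why retrieval is described as polling. The sketch below shows the general pattern with a stub in place of a real lookup. It is a generic illustration, not how the Quotient SDK is implemented; `poll_until_ready` and `stub_fetch` are hypothetical names.

```python
import time


def poll_until_ready(fetch, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Call `fetch()` until it returns a non-None value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # give up; the caller decides how to handle a miss
        sleep(interval)


# Stub that becomes ready on the third call, standing in for a detection lookup
calls = {"n": 0}


def stub_fetch():
    calls["n"] += 1
    return {"has_hallucination": False} if calls["n"] >= 3 else None


print(poll_until_ready(stub_fetch, timeout=5.0, interval=0.01))
```

Returning `None` on timeout rather than raising lets a batch job skip slow logs and continue with the rest.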


## Review detections in Quotient

You can now view your logs and detections in the [Quotient dashboard](https://app.quotientai.co), where you can also filter them by tags and environments to identify common failure patterns.

![Quotient AI Dashboard](Agent_Monitoring.png "Quotient AI Dashboard")

### What You’ve Built

You’ve built a fully functional search agent with built-in monitoring that:

- Handles real-time web queries using Tavily + OpenAI
- Retrieves and extracts live documents to support grounded answers
- Automatically logs each response with Quotient for monitoring
- Flags hallucinations and irrelevant context

This setup can scale from notebook experiments to production pipelines, letting you benchmark different models, debug search performance, and monitor AI agents for critical issues.

> ### How to interpret the results
> - Well-grounded systems typically show **< 5% hallucination rate**. If yours is higher, it’s often a signal that your data ingestion, retrieval pipeline, or prompting needs improvement.
> - High-performing systems typically show **> 75% document relevance**. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data.
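
The two rules of thumb above can be applied mechanically once you have per-log detections. The sketch below is a minimal illustration with made-up inputs; the function name `summarize` and its parameters are hypothetical, and the thresholds are the 5% and 75% figures quoted above.

```python
def summarize(hallucination_flags, relevance_fractions,
              max_hallucination_rate=0.05, min_relevance=0.75):
    """Aggregate per-log detections and compare them to the rule-of-thumb thresholds."""
    # Ignore logs with no relevance score (e.g. no documents were retrieved)
    scored = [r for r in relevance_fractions if r is not None]
    if not hallucination_flags or not scored:
        raise ValueError("need at least one hallucination flag and one relevance score")
    hallucination_rate = sum(hallucination_flags) / len(hallucination_flags)
    mean_relevance = sum(scored) / len(scored)
    return {
        "hallucination_rate": hallucination_rate,
        "mean_relevance": mean_relevance,
        "hallucination_ok": hallucination_rate < max_hallucination_rate,
        "relevance_ok": mean_relevance > min_relevance,
    }


# Made-up inputs: 1 hallucinated answer out of 10, half the documents relevant
report = summarize([True] + [False] * 9, [0.5] * 9 + [None])
print(report)
```

In this made-up case both checks fail: a 10% hallucination rate exceeds the 5% guideline, and 50% relevance falls short of 75%.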

## Optional: Grab the detection results from Quotient

Quotient's detections are now available to fetch via the Quotient SDK using the `log_id` you received earlier:

In [6]:
from tqdm import tqdm

hallucination_detections = []
doc_relevancy_detections = []

for log_id in tqdm(log_ids):
    try:
        detection = quotient.poll_for_detection(log_id=log_id)
        # Record whether this output was flagged as hallucinated
        hallucination_detections.append(detection.has_hallucination)
        # Record the fraction of retrieved documents marked relevant;
        # skip logs without documents so the average below stays valid
        docs = detection.log_documents
        if docs:
            doc_relevancy_detections.append(sum(1 for doc in docs if doc.get('is_relevant') is True) / len(docs))
    except Exception as exc:
        # Skip logs whose detections could not be fetched or parsed
        print(f"Skipping log {log_id}: {exc}")
        continue

print(f"Number of results: {len(log_ids)}")
# Guard against empty lists to avoid dividing by zero
if hallucination_detections:
    print(f"Percentage of hallucinations: {sum(hallucination_detections)/len(hallucination_detections)*100:.2f}%")
if doc_relevancy_detections:
    print(f"Average percentage of relevant documents: {sum(doc_relevancy_detections)/len(doc_relevancy_detections)*100:.2f}%")

100%|██████████| 10/10 [00:37<00:00,  3.72s/it]

Number of results: 10
Percentage of hallucinations: 10.00%
Average percentage of relevant documents: 46.83%



