# Building a Live Web Documentation QA Agent with Qdrant, Tavily, LangChain and Quotient

This cookbook demonstrates a system that automatically processes a company's documentation to power a question-answering agent. It combines Tavily's web crawling, Qdrant's vector search, and Quotient's quality monitoring to build a documentation assistant.

## Architecture Overview
The system operates in two main phases. First, Tavily crawls your documentation website and extracts relevant content, which is then chunked, embedded, and stored in Qdrant for efficient similarity search. When users ask questions, the system retrieves the most relevant documentation snippets and generates contextual answers while Quotient monitors for hallucinations and retrieval quality.
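To make the two phases concrete, here is a toy, dependency-free sketch of the same data flow. It is not how this cookbook is implemented: word overlap stands in for OpenAI embeddings and Qdrant search, a dict of pages stands in for Tavily's crawl output, and `ToyIndex` is a hypothetical name.

```python
# Toy sketch of the two-phase flow (assumption: word overlap stands in for
# embeddings + Qdrant; nothing here calls Tavily, OpenAI, or Quotient).
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory index holding (source_url, chunk_text) pairs."""
    chunks: list = field(default_factory=list)

    # Phase 1: ingestion (crawled pages -> fixed-size word chunks -> stored)
    def ingest(self, pages: dict, words_per_chunk: int = 50) -> None:
        for url, text in pages.items():
            words = text.split()
            for start in range(0, len(words), words_per_chunk):
                chunk = " ".join(words[start:start + words_per_chunk])
                self.chunks.append((url, chunk))

    # Phase 2: query (rank chunks by how many words they share with the question)
    def retrieve(self, question: str, k: int = 2) -> list:
        q_words = set(question.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda c: len(q_words & set(c[1].lower().split())),
            reverse=True,
        )
        return ranked[:k]


index = ToyIndex()
index.ingest({
    "https://example.com/install": "install the python client with pip",
    "https://example.com/filters": "filters restrict search results by payload fields",
})
print(index.retrieve("how do I install the client?", k=1))
```

In the real system, the retrieved chunks are then passed to the LLM as context, and the question, answer, and chunks are logged to Quotient.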

We’ll use API keys from:
 - [OpenAI](https://www.openai.com): get your API key from the [OpenAI API platform](https://platform.openai.com/login)
 - [Tavily](https://www.tavily.com/): get your API key from the [Tavily app](https://app.tavily.com)
 - [Quotient AI](https://www.quotientai.co): get your API key from the [Quotient AI app](https://app.quotientai.co)
 
Both Tavily and Quotient offer free tiers to get started; see the [Tavily pricing](https://www.tavily.com/#pricing) and [Quotient pricing](https://www.quotientai.co/pricing) pages for details.

In [None]:
import os
# Set API keys:
os.environ['QUOTIENT_API_KEY'] = "quotient_api_key_here"
os.environ['OPENAI_API_KEY'] = "openai_api_key_here"
os.environ['TAVILY_API_KEY'] = "tavily_api_key_here"
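Hardcoded keys are easy to leave as placeholders or to share by accident. A small check like the following (the helper name `missing_keys` is hypothetical) fails fast if any key is unset or still holds its placeholder value; you could also read keys interactively with Python's standard `getpass` module instead of writing them into the notebook.

```python
import os

# The three keys this cookbook sets above.
REQUIRED_KEYS = ("QUOTIENT_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return required key names that are unset, empty, or still placeholders
    (placeholders here are assumed to end in "_api_key_here", as above)."""
    return [
        name for name in required
        if not env.get(name) or env[name].endswith("_api_key_here")
    ]


problems = missing_keys()
if problems:
    print("Set these before continuing:", ", ".join(problems))
```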

In [2]:
%pip install -qU qdrant-client langchain langchain-openai langchain-qdrant sentence-transformers quotientai tavily-python

Note: you may need to restart the kernel to use updated packages.


## Step 1: Setting up the core components

The foundation uses OpenAI's `text-embedding-3-small` for embeddings and OpenAI's `gpt-4o-mini` for generation, Qdrant as the vector database, and Quotient for monitoring. Quotient's logger is initialized with hallucination detection and document relevancy scoring enabled, and both `sample_rate` and `detection_sample_rate` are set to `1.0` so that every interaction is logged and evaluated during this demo.

In [3]:
"""Initialize core components for the RAG system including models, vector store, and monitoring."""
from datetime import datetime
from typing import List

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from quotientai import QuotientAI, DetectionType
from tavily import TavilyClient
from langchain_core.documents import Document

# Initialize models
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")

# Initialize Tavily client
tavily_client = TavilyClient(api_key=os.environ['TAVILY_API_KEY'])

# Initialize Quotient monitoring
quotient = QuotientAI()
quotient.logger.init(
    app_name="qdrant-rag",
    environment="test",
    sample_rate=1.0,
    detections=[DetectionType.HALLUCINATION, DetectionType.DOCUMENT_RELEVANCY],
    detection_sample_rate=1.0,
)



## Step 2: Define the document processing pipeline

Document processing turns raw web content into searchable chunks. Tavily crawls the documentation site, guided by natural-language instructions that focus it on relevant pages. `RecursiveCharacterTextSplitter` then splits each page into chunks of up to 700 tokens (counted with tiktoken), with a 50-token overlap to preserve context across chunk boundaries.
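The overlap arithmetic can be sketched as a fixed sliding window. This is a simplification under stated assumptions: `RecursiveCharacterTextSplitter` prefers to break on separators such as blank lines and newlines, so its real chunks are often shorter than 700 tokens, and here plain list items stand in for tiktoken tokens. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(tokens, chunk_size=700, chunk_overlap=50):
    """Split a token list into windows of chunk_size; consecutive windows
    share chunk_overlap tokens, so each window starts chunk_size - chunk_overlap later."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


tokens = [f"t{i}" for i in range(1500)]
chunks = sliding_window_chunks(tokens)
print([len(c) for c in chunks])
```

With 1,500 stand-in tokens this yields three windows, and the last 50 tokens of each window are the first 50 of the next.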

The chunks are embedded and stored in Qdrant with `QdrantVectorStore.from_documents()`. This demo uses an in-memory collection (`location=":memory:"`), which is discarded when the kernel stops; in production, you can point the vector store at a persistent Qdrant instance instead.
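Conceptually, similarity search scores stored vectors against the query vector and returns the top matches. The sketch below does this by brute force with cosine similarity, which we believe is the default distance `QdrantVectorStore.from_documents()` configures when it creates a collection; Qdrant itself typically uses an HNSW index to approximate this search on larger collections. The three-dimensional vectors and the `collection` and `search` names are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical collection: each point has a vector and a payload, like Qdrant points.
collection = [
    {"vector": [0.9, 0.1, 0.0], "payload": {"source": "quickstart"}},
    {"vector": [0.1, 0.9, 0.1], "payload": {"source": "filtering"}},
    {"vector": [0.0, 0.2, 0.9], "payload": {"source": "distance-metrics"}},
]


def search(query_vector, k=2):
    """Brute-force top-k by cosine similarity; returns (source, score) pairs."""
    ranked = sorted(collection, key=lambda p: cosine(query_vector, p["vector"]), reverse=True)
    return [(p["payload"]["source"], round(cosine(query_vector, p["vector"]), 3)) for p in ranked[:k]]


print(search([0.2, 0.8, 0.0]))
```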

In [4]:
def crawl_and_process_docs(url, instructions):
    """Crawl documentation and convert to Document objects"""
    response = tavily_client.crawl(url, instructions=instructions, timeout=120)
    
    documents = []
    for result in response["results"]:
        # Create Document object with raw content
        doc = Document(
            page_content=result["raw_content"],
            metadata={
                "source": result["url"],
                "base_url": response["base_url"]
            }
        )
        documents.append(doc)
    
    return documents
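`crawl_and_process_docs` assumes every result carries non-empty `raw_content`. If a page comes back empty or missing (we have not checked how often Tavily returns such results), `Document(page_content=None)` would likely fail validation, and blank pages would waste embedding calls. A defensive variant, sketched here with plain tuples instead of LangChain `Document` objects so it runs without dependencies, skips those results. The response dict and the `results_to_records` name are hypothetical; in the notebook you would build `Document(page_content=text, metadata=metadata)` from each record.

```python
def results_to_records(response):
    """Convert a crawl-response-shaped dict into (text, metadata) records,
    skipping results whose raw_content is missing or blank."""
    records = []
    for result in response.get("results", []):
        text = result.get("raw_content") or ""
        if not text.strip():
            continue  # nothing worth embedding
        metadata = {"source": result["url"], "base_url": response.get("base_url")}
        records.append((text, metadata))
    return records


# Hypothetical response with the same keys the cookbook reads.
fake_response = {
    "base_url": "https://docs.example.com/",
    "results": [
        {"url": "https://docs.example.com/quickstart", "raw_content": "Install the client..."},
        {"url": "https://docs.example.com/empty", "raw_content": None},
    ],
}
records = results_to_records(fake_response)
print(records)
```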

In [5]:
# Define the document processing pipeline
def preprocess_dataset(docs_list):
    """Split documents into smaller chunks for better retrieval"""
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=700,
        chunk_overlap=50,
        disallowed_special=()
    )
    doc_splits = text_splitter.split_documents(docs_list)
    return doc_splits

# Create a retriever from documents using Qdrant vector store
def create_retriever(collection_name, doc_splits):
    """Create a retriever from documents using Qdrant vector store"""
    vectorstore = QdrantVectorStore.from_documents(
        doc_splits,
        embeddings,  # reuse the embedding model initialized in Step 1
        location=":memory:",  # Use in-memory storage
        collection_name=collection_name,
    )
    return vectorstore.as_retriever()


## Step 3: Implement the RAG chain

The RAG chain orchestrates question answering and monitoring. For each question, the system retrieves relevant chunks by vector similarity, formats them as context, prompts the LLM to answer using only that documentation, and logs the interaction to Quotient.

In [6]:
# Define the answer prompt template
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer the question using ONLY the provided documentation context. 
    If you don't know the answer, say "I don't have enough information to answer that."
    Include relevant quotes from the documentation to support your answer.
    
    IMPORTANT: Give ONLY your answer, do not include any system or human messages in your response."""),
    ("human", "Question: {question}\n\nContext: {context}"),
])

# Function to format documents for context
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Create a simple RAG chain
def create_rag_chain(retriever):
    """Create a basic RAG chain with Quotient logging"""
    
    def get_context(question: str):
        """Get relevant documents for a question"""
        docs = retriever.invoke(question)  # Using invoke instead of deprecated get_relevant_documents
        return format_docs(docs), [doc.page_content for doc in docs]
    
    def generate_answer(question: str):
        """Generate an answer using RAG"""
        # Get context from vector store
        context, raw_docs = get_context(question)
        
        # Fill the shared answer_prompt template defined above and call the LLM
        messages = answer_prompt.format_messages(question=question, context=context)
        response = llm.invoke(messages).content
        
        # Log to Quotient
        quotient.log(
            user_query=question,
            model_output=response,
            documents=raw_docs,
            tags={"timestamp": datetime.now().isoformat()}
        )
        
        return response
    
    return generate_answer
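`format_docs` passes only chunk text to the model, so the quotes the prompt asks for cannot be traced back to a page. One option, sketched below with a hypothetical `format_docs_with_sources` helper, is to prefix each chunk with the `source` URL that `crawl_and_process_docs` stores in its metadata. `SimpleNamespace` objects stand in for LangChain `Document`s so the example runs on its own.

```python
from types import SimpleNamespace


def format_docs_with_sources(docs):
    """Like format_docs, but numbers each chunk and prefixes it with its source URL."""
    return "\n\n".join(
        f"[{i}] (source: {doc.metadata.get('source', 'unknown')})\n{doc.page_content}"
        for i, doc in enumerate(docs, start=1)
    )


# Stand-ins for retrieved Documents (hypothetical content and URL).
docs = [
    SimpleNamespace(page_content="pip install qdrant-client",
                    metadata={"source": "https://docs.example.com/quickstart"}),
    SimpleNamespace(page_content="Filters narrow results by payload.", metadata={}),
]
context = format_docs_with_sources(docs)
print(context)
```

Swapping this in for `format_docs` inside `get_context` would let the model cite chunk numbers or URLs alongside its quotes.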


## Step 4: Crawl and index the Qdrant documentation

The crawl targets specific documentation areas through Tavily's `instructions` parameter. For Qdrant, the instructions focus on Python client usage, vector search, and getting-started guides, which cover the information users most commonly need.

In [10]:
# Crawl Qdrant documentation
print("Crawling Qdrant documentation...")
docs = crawl_and_process_docs(
    "https://qdrant.tech/documentation/",
    instructions="Find all pages about Python client usage, vector search, and getting started guides"
)

# Split documents into chunks and create retriever
print(f"Processing {len(docs)} documents...")
doc_chunks = preprocess_dataset(docs)
retriever = create_retriever("demo_docs", doc_chunks)

print(f"Added {len(doc_chunks)} chunks to vector store")


Crawling Qdrant documentation...
Processing 40 documents...
Added 1553 chunks to vector store
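The crawl output gives a quick sanity check on chunk sizes. Under a fixed-window approximation (assumption: full 700-token windows advancing by 650 tokens), a page of n tokens yields about ceil((n - 50) / 650) chunks. Inverting that for 1553 chunks over 40 pages suggests an average page on the order of 25,000 tokens. Because the recursive splitter often emits shorter chunks, treat this as a rough estimate, not a measurement. `approx_chunk_count` is a hypothetical helper.

```python
import math


def approx_chunk_count(n_tokens, chunk_size=700, chunk_overlap=50):
    """Fixed-window estimate of how many chunks a document of n_tokens produces."""
    if n_tokens <= 0:
        return 0
    if n_tokens <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return math.ceil((n_tokens - chunk_overlap) / step)


docs_count, chunk_count = 40, 1553  # from the crawl output above
avg_chunks = chunk_count / docs_count
# Invert the estimate: n chunks cover roughly n * step + overlap tokens.
approx_tokens_per_page = avg_chunks * (700 - 50) + 50
print(f"{avg_chunks:.1f} chunks/page, roughly {approx_tokens_per_page:,.0f} tokens/page")
```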


## Step 5: Run the documentation QA agent with Quotient monitoring

With all components in place, we can run the agent on real questions about Qdrant's documentation. Because the prompt tells the model to reply "I don't have enough information to answer that." when the retrieved context does not cover a question, the agent should decline rather than guess when information is missing.

Every interaction is automatically logged to Quotient, which runs asynchronous detection pipelines to identify potential hallucinations and score document relevance. This monitoring helps maintain system reliability and provides insights into retrieval quality and answer accuracy over time.

In [13]:
# Test the RAG system with Qdrant documentation
answer_generator = create_rag_chain(retriever)

questions = [
    "How do I get started with Qdrant's Python client?",
    "What are the different filtering options in Qdrant?",
    "How does vector search work in Qdrant and what distance metrics are supported?"
]

print("Testing RAG system with Qdrant documentation:\n")
for question in questions:
    print(f"Q: {question}")
    print("-" * 40)
    response = answer_generator(question)
    print(f"{response}\n")

Testing RAG system with Qdrant documentation:

Q: How do I get started with Qdrant's Python client?
----------------------------------------
To get started with Qdrant's Python client, you can follow the "Quickstart" guide available in the documentation. The guide covers the installation process and provides examples to help you begin using the client effectively.

1. **Installation**: You can install the Qdrant client either with or without the `fastembed` option. The specific steps for installation are detailed in the Quickstart section.

For more information, you can visit the Quickstart page directly: [Quickstart](https://python-client.qdrant.tech/quickstart).

The documentation states, "We’ll cover the following here: Installation with fastembed, Installation without fastembed." This indicates that the Quickstart guide will provide the necessary steps to set up the client.

Q: What are the different filtering options in Qdrant?
----------------------------------------
The differen

## Step 6: Review detections in Quotient

You can now view your logs and detections in the [Quotient dashboard](https://app.quotientai.co), where you can filter them by tags and environment to identify common failure patterns.

![Quotient AI Dashboard](Quotient_Dashboard.png "Quotient AI Dashboard")
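Since the dashboard filters by tags, it can pay to log more than a timestamp. The sketch below builds a richer tag dict that could be passed as `tags=` to `quotient.log`, the same parameter `create_rag_chain` already uses. The tag names are hypothetical, values are kept as strings on the assumption that plain string tags are the safest choice, and collecting source URLs would require `get_context` to also return `doc.metadata["source"]` for each retrieved document.

```python
from datetime import datetime, timezone


def build_tags(question, source_urls, collection_name="demo_docs"):
    """Hypothetical tag set for quotient.log(tags=...); all values are strings."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "collection": collection_name,
        "num_docs": str(len(source_urls)),
        "top_source": source_urls[0] if source_urls else "none",
        "question_len": str(len(question.split())),
    }


tags = build_tags(
    "What are the different filtering options in Qdrant?",
    ["https://docs.example.com/filtering"],
)
print(tags)
```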

## What You've Built
A documentation QA system that:

- Crawls and processes live web documentation using Tavily
- Stores documents in Qdrant for efficient retrieval
- Generates contextual answers using retrieved documentation chunks
- Monitors answer quality with Quotient

## How to interpret the results

- Well-grounded systems typically show a **hallucination rate below 5%**. A higher rate often signals a problem in data ingestion, retrieval, or prompting.
- High-performing systems typically show **document relevance above 75%**. Lower scores may indicate ambiguous user queries, poor retrieval, or noisy source data.
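Once detections are available, checking them against these rules of thumb is simple arithmetic. The sketch assumes you have exported per-log results into records with hypothetical `hallucinated`, `relevant_docs`, and `total_docs` fields, and it treats document relevance as the share of all retrieved documents judged relevant (one plausible definition). Quotient's actual export format may differ, and the sample values are invented to show the computation, not results from this notebook.

```python
def summarize(detections, max_hallucination=0.05, min_relevance=0.75):
    """Compute hallucination rate and document relevance from per-log records
    (field names are hypothetical) and compare them with the thresholds above."""
    n = len(detections)
    if n == 0:
        raise ValueError("no detections to summarize")
    hallucination_rate = sum(d["hallucinated"] for d in detections) / n
    total_docs = sum(d["total_docs"] for d in detections)
    relevance = sum(d["relevant_docs"] for d in detections) / total_docs if total_docs else 0.0
    return {
        "hallucination_rate": hallucination_rate,
        "document_relevance": relevance,
        "hallucination_ok": hallucination_rate < max_hallucination,
        "relevance_ok": relevance > min_relevance,
    }


# Invented sample records, for illustration only.
sample = [
    {"hallucinated": False, "relevant_docs": 4, "total_docs": 4},
    {"hallucinated": False, "relevant_docs": 3, "total_docs": 4},
    {"hallucinated": True, "relevant_docs": 1, "total_docs": 4},
]
report = summarize(sample)
print(report)
```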