# 🤖 RAG System Evaluation for AI-Bay Customer Support

This notebook walks you through building and evaluating a **Retrieval-Augmented Generation (RAG)** system for customer support.

**What you'll learn:**
- 📚 Preparing and chunking FAQ data
- 🔍 Creating a hybrid vector database (dense + sparse)
- 💬 Building a conversational RAG assistant
- ✅ Evaluating your RAG system with Langfuse

---

## Part 1: Data Ingestion & Preprocessing 📥

First, we'll load the FAQ data and prepare it for our vector database.

In [1]:
from pathlib import Path
import json

# Load FAQ data
notebook_dir = Path.cwd()
data_path = notebook_dir / "data" / "faq_en.json"

with open(data_path, "r", encoding="utf-8") as f:
    faq_data = json.load(f)
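
Before building documents, it can help to confirm that every FAQ record carries the fields used later in this notebook (`faq_id`, `faq_title`, `faq_body`, `updated_at`). The sketch below is one possible check; `find_incomplete_records` and the sample records are hypothetical and not part of the original notebook.

```python
REQUIRED_KEYS = {"faq_id", "faq_title", "faq_body", "updated_at"}


def find_incomplete_records(records: list[dict]) -> list[tuple[int, set[str]]]:
    """Return (index, missing_keys) for every record lacking a required key."""
    problems = []
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical sample data standing in for faq_data
sample_records = [
    {"faq_id": 1, "faq_title": "Posting", "faq_body": "...", "updated_at": "2024-01-01"},
    {"faq_id": 2, "faq_title": "Messages"},  # missing body and timestamp
]
incomplete = find_incomplete_records(sample_records)
print(incomplete)
```

In the notebook you would call `find_incomplete_records(faq_data)` and expect an empty list.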

### Text Cleaning Functions

The FAQ bodies contain Markdown links, images, and emphasis markers. Stripping them keeps that syntax out of the embedded text.

In [2]:
import re


def remove_markdown_links_or_images(text: str):
    """Remove markdown links and images from text."""
    text = re.sub(r"!\[.*?\]\(.*?\)", "", text)  # Remove images
    text = re.sub(r"\[(.*?)\]\(.*?\)", r"\1", text)  # Keep link text only
    return text


def remove_asterisks(text: str):
    """Remove asterisks used for markdown emphasis."""
    return re.sub(r"\*", "", text)


def clean_text(text: str):
    """Apply all cleaning operations."""
    text = remove_markdown_links_or_images(text)
    text = remove_asterisks(text)
    return text
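
To see what the cleaning steps do in practice, here is a small standalone demo. It condenses the three regular expressions from the cell above into one function so that it runs on its own; the sample strings are made up. Note two side effects: removing an image can leave a double space, and stripping every asterisk also strips `*` list bullets.

```python
import re


def clean_text(text: str) -> str:
    """Same three steps as the cell above, condensed for a standalone demo."""
    text = re.sub(r"!\[.*?\]\(.*?\)", "", text)      # drop images
    text = re.sub(r"\[(.*?)\]\(.*?\)", r"\1", text)  # keep link text only
    return re.sub(r"\*", "", text)                   # drop emphasis markers


# Made-up sample with bold text, a link, and an image
sample = "See the **[help page](https://example.com/help)** ![icon](icon.png) for details."
cleaned = clean_text(sample)
print(repr(cleaned))

# Asterisk bullets lose their marker as well
bullets = clean_text("* first\n* second")
print(repr(bullets))
```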

---

## Part 2: Document Creation & Chunking ✂️

Convert FAQs into LangChain documents and split them into overlapping chunks sized for retrieval.

In [3]:
from langchain_core.documents import Document

# Create documents from FAQ data
documents = [
    Document(
        page_content=clean_text(faq["faq_body"]),  # apply the cleaning functions defined above
        metadata={
            "faq_id": faq["faq_id"],
            "faq_body": faq["faq_body"],
            "faq_title": faq["faq_title"],
            "updated_at": faq["updated_at"],
        },
    )
    for faq in faq_data
]

In [4]:
from langchain_text_splitters.markdown import MarkdownTextSplitter

# Chunking configuration
CHUNK_SIZE = 300
CHUNK_OVERLAP = 70
TITLE_KEY = "faq_title"


def approx_token_length(text: str) -> int:
    """Estimate token count (rough approximation: 1 token ≈ 4 chars)."""
    return len(text) // 4


def add_title_to_chunk(chunk: Document, title_key: str = TITLE_KEY):
    """Prepend FAQ title to chunk for better context."""
    chunk.page_content = chunk.metadata[title_key] + "\n\n" + chunk.page_content

In [5]:
# Split documents into chunks
splitter = MarkdownTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
    length_function=approx_token_length,
)

chunks = splitter.split_documents(documents)

# Add titles to chunks
for chunk in chunks:
    add_title_to_chunk(chunk)
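
With the 4-characters-per-token estimate, `CHUNK_SIZE = 300` corresponds to roughly 1,200 characters and `CHUNK_OVERLAP = 70` to roughly 280. Because titles are prepended after splitting, a final chunk can exceed `CHUNK_SIZE` by the length of its title. The sketch below shows one way to check the resulting sizes; `summarize_lengths` and the sample texts are hypothetical.

```python
from statistics import mean

CHUNK_SIZE = 300  # same value as the configuration cell


def approx_token_length(text: str) -> int:
    """Same heuristic as above: about 4 characters per token."""
    return len(text) // 4


def summarize_lengths(texts: list[str], limit: int = CHUNK_SIZE) -> dict:
    """Report min/mean/max estimated tokens and how many texts exceed the limit."""
    lengths = [approx_token_length(text) for text in texts]
    return {
        "count": len(lengths),
        "min": min(lengths),
        "mean": round(mean(lengths), 1),
        "max": max(lengths),
        "over_limit": sum(length > limit for length in lengths),
    }


# Hypothetical chunk texts; in the notebook you would pass
# [chunk.page_content for chunk in chunks] instead.
sample_texts = ["a" * 400, "b" * 1200, "c" * 1300]
summary = summarize_lengths(sample_texts)
print(summary)
```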

---

## Part 3: Vector Store Creation 🗄️

We'll use **hybrid search**, which combines dense-vector similarity with BM25 sparse retrieval so that both semantic matches and exact keyword matches can surface.

In [6]:
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode

load_dotenv()

# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

In [7]:
# Create persistent vector store
path_to_vector_store = notebook_dir / "vector_store"

vector_store = QdrantVectorStore.from_documents(
    chunks,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    retrieval_mode=RetrievalMode.HYBRID,
    vector_name="dense",
    sparse_vector_name="sparse",
    collection_name="faq_collection",
    path=str(path_to_vector_store),
)
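
Hybrid retrieval has to merge two rankings, one from dense similarity and one from BM25. A common way to do that is Reciprocal Rank Fusion (RRF). The sketch below is a generic illustration of RRF in plain Python, not Qdrant's internal implementation; the function name, the constant `k = 60`, and the sample IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up rankings from the two retrievers
dense_ranking = ["faq_3", "faq_1", "faq_7"]
sparse_ranking = ["faq_1", "faq_9", "faq_3"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that appear in both rankings accumulate two contributions, which is why they tend to rise above documents found by only one retriever.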

---

## Part 4: RAG Conversation System 💬

Build a conversational assistant that retrieves relevant FAQs and generates helpful responses.

In [8]:
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage

# System prompt for the RAG assistant
prompt_template = """You are a helpful assistant answering questions about customer care for AI-Bay.

Use the following context documents to answer the user's question. If the answer is not in the provided documents, say "I don't have that information in the provided documents."

Context Documents:
{context}

Conversation History:
{history}

User Question: {question}

Instructions:
1. Answer based ONLY on the provided documents
2. Be specific and cite which document(s) you used
3. If information is unclear or missing, say so
4. Keep answers concise but complete
5. Use a friendly, informative tone

Answer:"""


def format_docs_alternative(docs):
    """Format retrieved documents for the prompt."""
    formatted = [
        f"Document {i}:\n{doc.metadata['faq_body']}\nSource: {doc.metadata['faq_id']}"
        for i, doc in enumerate(docs, 1)
    ]
    return "\n\n".join(formatted)
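
Because the formatter reads the full `faq_body` from metadata rather than the chunk text, two retrieved chunks from the same FAQ would place the same body in the prompt twice. One option is to deduplicate by `faq_id` before formatting. The sketch below is an assumption about how that could look; `dedupe_by_faq_id` and `FakeDoc` are hypothetical names, and `FakeDoc` only stands in for a LangChain `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (only metadata is used)."""

    metadata: dict = field(default_factory=dict)


def dedupe_by_faq_id(docs):
    """Keep the first (highest-ranked) document per faq_id, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        faq_id = doc.metadata["faq_id"]
        if faq_id not in seen:
            seen.add(faq_id)
            unique.append(doc)
    return unique


# Made-up retrieval result: two chunks come from the same FAQ
retrieved = [
    FakeDoc({"faq_id": "faq_1", "faq_body": "Open the ad and click Message."}),
    FakeDoc({"faq_id": "faq_1", "faq_body": "Open the ad and click Message."}),
    FakeDoc({"faq_id": "faq_2", "faq_body": "Messages are in your inbox."}),
]
unique_docs = dedupe_by_faq_id(retrieved)
print([doc.metadata["faq_id"] for doc in unique_docs])
```

In the notebook, the result could be passed straight to `format_docs_alternative`.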

In [9]:
class RagConversation:
    """Manages RAG conversation with context retrieval and history."""

    def __init__(self, vector_store, llm, history=None):
        self.vector_store = vector_store
        self.llm = llm
        self.history = history if history is not None else []

    def add_message(self, message: BaseMessage):
        self.history.append(message)

    def history_to_string(self):
        return "\n".join(
            [f"{message.type}: {message.content}" for message in self.history]
        )

    def get_response(self, question):
        # Retrieve relevant context
        context = self.vector_store.similarity_search(question, k=4)
        context_str = format_docs_alternative(context)

        # Create prompt and get response
        prompt = prompt_template.format(
            context=context_str, history=self.history_to_string(), question=question
        )
        response = self.llm.invoke(prompt)

        # Update history
        self.history.append(HumanMessage(content=question))
        self.history.append(AIMessage(content=response.content))

        return response
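
`get_response` retrieves context, fills the prompt, calls the model, and appends both sides of the turn to the history, so every later prompt carries the earlier turns. The sketch below reproduces that flow offline with stand-ins, so it runs without an API key or a vector store. `FakeVectorStore`, `FakeLLM`, `run_turn`, and the shortened prompt are hypothetical and exist only for this illustration.

```python
from types import SimpleNamespace


class FakeVectorStore:
    """Hypothetical stand-in exposing the one method the conversation calls."""

    def similarity_search(self, question, k=4):
        doc = SimpleNamespace(
            metadata={"faq_id": "faq_1", "faq_body": "Click Message on the ad."}
        )
        return [doc][:k]


class FakeLLM:
    """Hypothetical stand-in that records prompts and returns canned answers."""

    def __init__(self):
        self.prompts = []

    def invoke(self, prompt):
        self.prompts.append(prompt)
        return SimpleNamespace(content=f"canned answer #{len(self.prompts)}")


def run_turn(store, llm, history, question):
    """Condensed get_response: retrieve, build the prompt, call, record the turn."""
    docs = store.similarity_search(question, k=4)
    context = "\n\n".join(doc.metadata["faq_body"] for doc in docs)
    past = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}"
    response = llm.invoke(prompt)
    history.append(("human", question))
    history.append(("ai", response.content))
    return response


store, llm, history = FakeVectorStore(), FakeLLM(), []
run_turn(store, llm, history, "How do I contact a seller?")
run_turn(store, llm, history, "And where do I see the reply?")
print(llm.prompts[1])  # the second prompt carries the first turn
```

The same two stand-ins should also work as arguments to `RagConversation`, since it only calls `similarity_search` and `invoke` and reads `faq_id`, `faq_body`, and `content`.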

### Initialize the LLM

In [None]:
from langchain_openai import ChatOpenAI

# Configuration
MODEL_NAME = "gpt-5-nano"
# One of "minimal" | "low" | "medium" | "high"; see https://platform.openai.com/docs/guides/latest-model
REASONING_EFFORT = "minimal"
TEMPERATURE = 0

llm = ChatOpenAI(
    model=MODEL_NAME, temperature=TEMPERATURE, reasoning_effort=REASONING_EFFORT
)

print(f"✅ LLM initialized: {MODEL_NAME}")

### Test the RAG System 🧪

In [None]:
# Create conversation instance
rag_conversation = RagConversation(vector_store, llm)

# Test with a sample question
response = rag_conversation.get_response("How do I contact a seller?")
print("🤖 Assistant:", response.content)

---

## Part 5: Interactive Chat Interface 🎨

Try out your RAG system with a beautiful Gradio interface!

In [None]:
import gradio as gr

# Create fresh conversation for the UI
rag_conversation = RagConversation(vector_store, llm)


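def rag_assistant_response(message, history):
    """Handle chat messages from the Gradio interface.

    Gradio's `history` argument is unused: the shared `rag_conversation`
    object keeps its own history.
    """
    return rag_conversation.get_response(message).content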


# Launch interactive chat
demo = gr.ChatInterface(
    fn=rag_assistant_response,
    title="🤖 AI-Bay Customer Support Assistant",
    description="Ask me anything about AI-Bay! I'll search our FAQ database to help you.",
    examples=[
        "How do I post an ad?",
        "How can I contact a seller?",
        "Where can I see my messages?",
    ],
)

demo.launch()

---

## Part 6: Evaluation with Langfuse 📊

Now it's time to evaluate how well your RAG system performs!

### Step 1: Create Evaluation Dataset

Run this command in your terminal:

```bash
uv run python 5_Evaluation/create_langfuse_dataset.py
```

### Step 2: Run Evaluation

Then evaluate your RAG system:

```bash
uv run python 5_Evaluation/run_evaluation.py
```

### What Could Be Evaluated?

- **Retrieval Quality**: Are we finding the right documents?
- **Answer Accuracy**: Are responses correct and relevant?
- **Hallucinations**: Is the model making things up?
- **Response Quality**: Is the answer helpful and well-formatted?

Check your Langfuse dashboard to see detailed metrics! 🎯
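
To make "retrieval quality" concrete, two common measures are hit rate at k (did the expected FAQ appear in the top k results?) and mean reciprocal rank (how high did it appear?). The sketch below computes both in plain Python. It is an illustration only and makes no claim about what `run_evaluation.py` measures; the function names and sample IDs are hypothetical.

```python
def hit_rate_at_k(retrieved: list[list[str]], expected: list[str], k: int = 4) -> float:
    """Fraction of questions whose expected FAQ ID appears in the top-k results."""
    hits = sum(exp in ids[:k] for ids, exp in zip(retrieved, expected))
    return hits / len(expected)


def mean_reciprocal_rank(retrieved: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the expected ID (0 when it was not retrieved)."""
    total = 0.0
    for ids, exp in zip(retrieved, expected):
        if exp in ids:
            total += 1.0 / (ids.index(exp) + 1)
    return total / len(expected)


# Made-up results for three questions: retrieved FAQ IDs in ranked order
retrieved_ids = [["faq_1", "faq_4"], ["faq_9", "faq_2"], ["faq_5", "faq_6"]]
expected_ids = ["faq_1", "faq_2", "faq_3"]

hit_rate = hit_rate_at_k(retrieved_ids, expected_ids, k=2)
mrr = mean_reciprocal_rank(retrieved_ids, expected_ids)
print(f"hit rate@2 = {hit_rate:.2f}, MRR = {mrr:.2f}")
```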

---

## 🎓 Summary

Congratulations! You've built a complete RAG system:

1. ✅ Loaded and preprocessed FAQ data
2. ✅ Created a hybrid vector database
3. ✅ Built a conversational RAG assistant
4. ✅ Tested it with an interactive UI
5. ✅ Learned how to evaluate with Langfuse

### Next Steps

- Experiment with different chunk sizes
- Try different embedding models
- Adjust the number of retrieved documents (k parameter)
- Fine-tune the prompt template
- Compare retrieval modes (dense vs sparse vs hybrid)

Happy experimenting! 🚀