# 🔍 RAG Chatbot with Langfuse Observability

This notebook demonstrates how to build a **Retrieval-Augmented Generation (RAG)** chatbot with **built-in observability** using Langfuse.

### What you'll learn:
- ✅ Build a conversational RAG system
- ✅ Track and monitor AI calls in real-time
- ✅ Debug your AI application effectively

---

## 📦 Step 1: Setup & Imports

First, let's import all the libraries we need and set up our configuration.

In [None]:
import os
from pathlib import Path

from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langfuse import Langfuse, observe, get_client
from qdrant_client import QdrantClient
from typing import Optional
import gradio as gr

## ⚙️ Step 2: Configuration

Define our model settings and prompt template.

In [None]:
# 🎯 Model Configuration
MODEL_NAME = "gpt-5-nano"
# One of "minimal" | "low" | "medium" | "high"; see https://platform.openai.com/docs/guides/latest-model
REASONING_EFFORT = "minimal"
TEMPERATURE = 0
K_RETRIEVAL = 4

# 📝 System Prompt Template
PROMPT_TEMPLATE = """You are a helpful assistant answering questions about customer care for AI-Bay.

Use the following context documents to answer the user's question. If the answer is not in the provided documents, say "I don't have that information in the provided documents."

Context Documents:
{context}

Conversation History:
{history}

User Question: {question}

Instructions:
1. Answer based ONLY on the provided documents
2. Be specific and cite which document(s) you used
3. If information is unclear or missing, say so
4. Keep answers concise but complete
5. Use a friendly, informative tone
Answer:"""
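
To see what the model actually receives, the sketch below fills a shortened stand-in template with `str.format`. `DEMO_TEMPLATE` and the sample values are illustrative assumptions, not the notebook's real prompt or data.

```python
# Shortened, hypothetical stand-in for PROMPT_TEMPLATE with the same three placeholders.
DEMO_TEMPLATE = """Context Documents:
{context}

Conversation History:
{history}

User Question: {question}
Answer:"""

# Made-up values purely for illustration.
context = "Document 1:\nUse {curly braces} freely in documents.\nSource: faq-001"
history = "human: Hi\nai: Hello! How can I help?"
question = "How do I post an ad?"

# Only placeholders in the template are substituted; braces inside the
# inserted values are left untouched.
prompt = DEMO_TEMPLATE.format(context=context, history=history, question=question)
print(prompt)
```

If you add literal braces to the template itself (for example a JSON snippet), they need to be doubled (`{{` and `}}`) so that `str.format` does not treat them as placeholders.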

## 🛠️ Step 3: Helper Functions

These functions handle Langfuse connection, vector store loading, and document formatting.

In [None]:
def create_langfuse_client() -> Langfuse:
    """Initialize Langfuse for observability tracking."""
    load_dotenv()
    return Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host=os.environ["LANGFUSE_BASE_URL"],
    )


def load_vector_store(path_to_vector_store: Optional[Path] = None):
    """Load the vector database containing FAQ documents."""
    sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
    load_dotenv()

    this_dir = Path.cwd().parent
    path_to_vector_store = path_to_vector_store or this_dir / "vector_store"

    print(f"📂 Loading vector store from: {path_to_vector_store}")

    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    # Remove lock file if exists
    lock_file = path_to_vector_store / ".lock"
    if lock_file.exists():
        os.remove(lock_file)

    client = QdrantClient(path=str(path_to_vector_store))
    return QdrantVectorStore(
        client=client,
        collection_name="faq_collection",
        embedding=embeddings,
        sparse_embedding=sparse_embeddings,
        vector_name="dense",
        sparse_vector_name="sparse",
        retrieval_mode=RetrievalMode.HYBRID,
    )


def format_docs_alternative(docs: list[Document]):
    """Format retrieved documents for the prompt."""
    formatted = []
    for i, doc in enumerate(docs, 1):
        formatted.append(
            f"Document {i}:\n{doc.metadata['faq_body']}\nSource: {doc.metadata['faq_id']}"
        )
    return "\n\n".join(formatted)
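
The formatter above raises a `KeyError` if a document lacks `faq_body` or `faq_id`. The sketch below shows the same output layout with a more tolerant lookup. `FakeDoc` and `format_docs_safe` are hypothetical names, and `FakeDoc` models only the `metadata` field of a LangChain `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (metadata only)."""
    metadata: dict = field(default_factory=dict)


def format_docs_safe(docs):
    """Same layout as the formatter above, but tolerant of missing metadata keys."""
    formatted = []
    for i, doc in enumerate(docs, 1):
        body = doc.metadata.get("faq_body", "<no body>")
        source = doc.metadata.get("faq_id", "<unknown>")
        formatted.append(f"Document {i}:\n{body}\nSource: {source}")
    return "\n\n".join(formatted)


# Made-up documents for illustration; the second one has no faq_id.
docs = [
    FakeDoc({"faq_body": "Click 'Post an ad' on the home page.", "faq_id": "faq-001"}),
    FakeDoc({"faq_body": "Open the listing and press 'Contact'."}),
]
formatted_docs = format_docs_safe(docs)
print(formatted_docs)
```

Whether a silent placeholder or a loud failure is preferable depends on how much you trust the vector store contents. Failing fast is often the better choice while building the index.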

## 🤖 Step 4: RAG Conversation Class

This is the heart of our chatbot! The `@observe` decorators enable **automatic tracking** in Langfuse.

### Key Features:
- 🔍 **Document Retrieval**: Finds relevant FAQs
- 💬 **Response Generation**: Creates AI responses
- 📊 **Auto-Logging**: Records every decorated call in Langfuse

In [None]:
load_dotenv()
langfuse = get_client()


class RagConversation:
    """Conversational RAG system with Langfuse observability."""

    def __init__(self, vector_store, llm, history=None):
        self.vector_store = vector_store
        self.llm = llm
        self.history = history if history else []

    def add_message(self, message: BaseMessage):
        """Add a message to conversation history."""
        self.history.append(message)

    @observe(name="retriever-call", as_type="retriever")
    def retrieve_documents(self, question, K=K_RETRIEVAL):
        """🔍 Retrieve relevant documents (tracked in Langfuse)."""
        docs_and_scores = self.vector_store.similarity_search_with_relevance_scores(
            question, k=K
        )
        return docs_and_scores

    @observe()
    def history_to_string(self):
        """Convert conversation history to string format."""
        # Attach the conversation history to the current trace as metadata so it
        # shows up in Langfuse next to the automatically captured inputs and outputs.
        # See https://langfuse.com/faq/all/empty-trace-input-and-output
        langfuse.update_current_trace(metadata={"history": self.history})
        return "\n".join(
            [f"{message.type}: {message.content}" for message in self.history]
        )

    @observe(name="llm-call", as_type="generation")
    def generate_response(self, question, docs):
        """🤖 Generate AI response (tracked in Langfuse)."""
        context_str = format_docs_alternative(docs)
        prompt = PROMPT_TEMPLATE.format(
            context=context_str, question=question, history=self.history_to_string()
        )
        response = self.llm.invoke(prompt)

        # Update conversation history
        self.history.append(HumanMessage(content=question))
        self.history.append(AIMessage(content=response.content))

        return response.content

    @observe
    def get_response(self, question):
        """📬 Main method: retrieve docs and generate response."""
        # Retrieve relevant documents
        docs_and_scores = self.retrieve_documents(question, K=K_RETRIEVAL)
        docs = [doc for doc, score in docs_and_scores]
        scores = [score for doc, score in docs_and_scores]

        # Generate response
        response = self.generate_response(question, docs)

        return response, docs, scores
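
To build intuition for what the decorators record, here is a toy tracing decorator written with the standard library only. It is not how Langfuse is implemented, and `trace`, `SPANS` and the demo functions are hypothetical names. It only illustrates the idea of capturing a name, a nesting depth and a duration for each nested call.

```python
import functools
import time

SPANS = []   # recorded spans, in the order the calls finish
_depth = 0   # current nesting level of traced calls


def trace(name):
    """Toy decorator: record name, nesting depth, and duration of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            global _depth
            depth = _depth
            _depth += 1
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                _depth -= 1
                SPANS.append(
                    {"name": name, "depth": depth, "seconds": time.perf_counter() - start}
                )
        return wrapper
    return decorator


@trace("retriever-call")
def retrieve(question):
    return ["doc about " + question]


@trace("llm-call")
def generate(question, docs):
    return f"Answer to {question!r} using {len(docs)} document(s)"


@trace("get_response")
def get_response(question):
    return generate(question, retrieve(question))


answer = get_response("posting an ad")
for span in SPANS:
    print("  " * span["depth"] + span["name"], f"{span['seconds']:.6f}s")
```

The two inner calls finish first and sit one level below `get_response`, which mirrors the nested view you would expect to see for a single question in a tracing UI.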

## 🚀 Step 5: Initialize the System

Load the vector store and initialize the language model.

In [None]:
load_dotenv()

# 📍 Set up paths
notebook_dir = Path.cwd()
PATH_TO_VECTOR_STORE = notebook_dir.parent / "5_Evaluation" / "vector_store"

# 📚 Load vector store
vector_store = load_vector_store(path_to_vector_store=PATH_TO_VECTOR_STORE)

# 🧠 Initialize LLM (REASONING_EFFORT from Step 2 is passed through here)
llm = ChatOpenAI(
    model=MODEL_NAME,
    temperature=TEMPERATURE,
    reasoning_effort=REASONING_EFFORT,
)

# 💬 Create conversation instance
rag_conversation = RagConversation(vector_store, llm)
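
Before running the cell above, it can help to check that the required settings exist. The sketch below is a hypothetical preflight helper: the Langfuse variable names are the ones used in this notebook, and `OPENAI_API_KEY` is the variable the OpenAI client reads by default. The demo values are made up.

```python
from pathlib import Path

REQUIRED_ENV_VARS = [
    "OPENAI_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_BASE_URL",
]


def preflight(env, vector_store_path):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [
        f"missing environment variable: {name}"
        for name in REQUIRED_ENV_VARS
        if not env.get(name)
    ]
    if not Path(vector_store_path).is_dir():
        problems.append(f"vector store directory not found: {vector_store_path}")
    return problems


# Demo with a made-up environment so the output does not depend on your machine.
demo_env = {"OPENAI_API_KEY": "sk-demo", "LANGFUSE_PUBLIC_KEY": "pk-demo"}
demo_problems = preflight(demo_env, Path.cwd() / "no_such_vector_store")
for problem in demo_problems:
    print(problem)
```

In the notebook you would pass `os.environ` and `PATH_TO_VECTOR_STORE` instead of the demo values, and stop early if the returned list is not empty.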

## 💬 Step 6: Launch the Chat Interface

### 🎉 Try it out!

Once you run this cell, a **Gradio interface** will appear below. You can:
- Ask questions about AI-Bay
- View responses in real-time
- Check the **Langfuse dashboard** to see all tracked calls!

**Tip**: Open your Langfuse dashboard in another tab to watch the magic happen! 🪄

In [None]:
import gradio as gr


def rag_assistant_response(message, history):
    """Handle chat messages from Gradio interface."""
    response, docs, scores = rag_conversation.get_response(message)
    return response


# Launch interactive chat
demo = gr.ChatInterface(
    fn=rag_assistant_response,
    title="🤖 AI-Bay Customer Support Assistant",
    description="Ask me anything about AI-Bay! I'll search our FAQ database to help you.",
    examples=[
        "How do I post an ad?",
        "How can I contact a seller?",
        "Where can I see my messages?",
    ],
)

demo.launch()
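
The callback above discards `docs` and `scores`. One option is to append the sources to the reply so users can see where an answer came from. The helper below is a hypothetical sketch that uses plain dictionaries in place of retrieved documents; the `faq_id` key mirrors the metadata used earlier, and the sample values are made up.

```python
def append_sources(response, doc_metadatas, scores):
    """Return the response followed by a list of source IDs and relevance scores."""
    if not doc_metadatas:
        return response
    lines = [
        f"- {meta.get('faq_id', 'unknown')} (score: {score:.2f})"
        for meta, score in zip(doc_metadatas, scores)
    ]
    return response + "\n\nSources:\n" + "\n".join(lines)


# Made-up values for illustration.
reply = append_sources(
    "Click 'Post an ad' on the home page.",
    [{"faq_id": "faq-001"}, {"faq_id": "faq-007"}],
    [0.91, 0.55],
)
print(reply)
```

Inside `rag_assistant_response` this could be called as `append_sources(response, [d.metadata for d in docs], scores)`.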

## 🎓 What's Happening Behind the Scenes?

Every time you ask a question:

1. **🔍 Retrieval**: The system searches the vector database for relevant FAQs
2. **📝 Context Building**: Selected documents are formatted into the prompt
3. **🤖 Generation**: The LLM generates a response based on the context
4. **📊 Logging**: All steps are automatically logged to Langfuse
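
The four steps above can be sketched end to end with toy stand-ins. Everything here is hypothetical: a keyword-overlap ranking replaces the hybrid vector search, `fake_llm` replaces the model call, and a plain list replaces Langfuse.

```python
import re

FAQS = {  # made-up FAQ corpus
    "faq-001": "To post an ad, click the post button on the home page.",
    "faq-002": "To contact a seller, open the ad and press contact.",
    "faq-003": "Your messages are in the inbox tab.",
}
LOG = []  # stand-in for an observability backend


def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, k=2):
    """Step 1: rank FAQs by the number of words shared with the question."""
    q = tokens(question)
    ranked = sorted(FAQS, key=lambda faq_id: (-len(q & tokens(FAQS[faq_id])), faq_id))
    return ranked[:k]


def build_context(faq_ids):
    """Step 2: format the selected FAQs for the prompt."""
    return "\n\n".join(
        f"Document {i}:\n{FAQS[faq_id]}\nSource: {faq_id}"
        for i, faq_id in enumerate(faq_ids, 1)
    )


def fake_llm(prompt):
    """Step 3: stand-in for the model call."""
    return f"(model reply based on a prompt of {len(prompt)} characters)"


def answer(question):
    faq_ids = retrieve(question)
    LOG.append(("retrieval", faq_ids))          # Step 4: log each step
    context = build_context(faq_ids)
    LOG.append(("context", len(context)))
    reply = fake_llm(f"{context}\n\nUser Question: {question}\nAnswer:")
    LOG.append(("generation", reply))
    return reply


reply = answer("How do I post an ad?")
for step, detail in LOG:
    print(step, "->", detail)
```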

### 🔬 Check Langfuse to see:
- Response times for each step
- Token usage
- Retrieved documents and their scores
- Full conversation history
- Cost tracking
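
Cost figures of this kind are generally derived from token counts multiplied by per-token prices. The sketch below shows the arithmetic only; the prices and token counts are placeholder assumptions, not real pricing for any model.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Estimated cost of one call, given prices per one million tokens."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder numbers for illustration only.
cost = estimate_cost(
    input_tokens=1_200,
    output_tokens=300,
    input_price_per_million=0.50,
    output_price_per_million=2.00,
)
print(f"{cost:.6f}")
```

Because the retrieved documents and the conversation history are both part of the prompt, input tokens tend to grow with `K_RETRIEVAL` and with the length of the chat.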

---

## 🎯 Next Steps

Try experimenting with:
- Different `K_RETRIEVAL` values (2-10)
- Different `TEMPERATURE` settings (0-1)
- Different models (gpt-4, gpt-4-turbo, etc.)
- Modifying the `PROMPT_TEMPLATE`
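
When comparing `K_RETRIEVAL` values, one simple measure is the hit rate: the share of questions whose expected FAQ appears in the top K results. The sketch below uses made-up rankings. In practice they would come from the retriever and a labeled question set, both of which are assumptions here.

```python
# Made-up data: for each question, the ranked FAQ IDs returned and the expected ID.
RANKINGS = [
    (["faq-001", "faq-004", "faq-002"], "faq-001"),
    (["faq-003", "faq-002", "faq-005"], "faq-002"),
    (["faq-006", "faq-001", "faq-007"], "faq-007"),
    (["faq-008", "faq-009", "faq-010"], "faq-011"),
]


def hit_rate_at_k(rankings, k):
    """Fraction of questions whose expected FAQ is among the first k results."""
    hits = sum(expected in ranked[:k] for ranked, expected in rankings)
    return hits / len(rankings)


results = {k: hit_rate_at_k(RANKINGS, k) for k in (1, 2, 3)}
for k, rate in results.items():
    print(f"K={k}: hit rate {rate:.2f}")
```

A larger K raises the chance that the right document is included, at the price of a longer prompt, so the hit rate is best read alongside latency and token usage.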

**Happy Learning! 🚀**