# 🎯 Prompt Management with Langfuse

## Why Centralized Prompt Management?

Instead of hardcoding prompts in your application, Langfuse lets you:

- ✅ **Edit prompts without redeploying** your app
- 📊 **Version control** your prompts
- 🔍 **Track which prompts** are used in production
- 👥 **Collaborate** with non-technical team members

**Trade-off**: Anyone with access can modify prompts (use versioning wisely!)

## 📦 Setup & Imports

In [1]:
import os
from pathlib import Path

from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langfuse import Langfuse, observe, get_client
from qdrant_client import QdrantClient
from typing import Optional
import gradio as gr
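
The cells below read the Langfuse credentials from environment variables (loaded from `.env` by `load_dotenv`), and `langchain_openai` expects `OPENAI_API_KEY`. A minimal sketch of a pre-flight check, assuming those four names are the ones your setup uses; `find_missing_env_vars` is a hypothetical helper, not part of any library:

```python
import os

# Names taken from the cells below, plus OPENAI_API_KEY for langchain_openai.
REQUIRED_ENV_VARS = (
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_BASE_URL",
    "OPENAI_API_KEY",
)


def find_missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = find_missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv()` surfaces a missing key before the first API call fails with a less obvious error.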

## ⚙️ Configuration

Set your model parameters here:

In [2]:
# 🎯 Model Configuration
MODEL_NAME = "gpt-5-nano"
REASONING_EFFORT = "minimal"  # "minimal" | "low" | "medium" | "high"
TEMPERATURE = 0
K_RETRIEVAL = 4

# 📝 Prompt Name (must match your Langfuse prompt name)
PROMPT_NAME = "rag_prompt"

## 📝 Example Prompt Template

**⚠️ ACTION REQUIRED**: Copy this template to Langfuse UI:

1. Go to [Langfuse Prompts](https://langfuse.com/docs/prompt-management/get-started)
2. Create a new prompt named `rag_prompt`
3. Paste the template below
4. Use variables: `{{question}}`, `{{context}}`, `{{history}}`

In [3]:
# 📋 Reference template (copy it to the Langfuse UI)
PROMPT_TEMPLATE = """You are a helpful assistant answering questions about customer care for AI-Bay.

Use the following context documents to answer the user's question. If the answer is not in the provided documents, say "I don't have that information in the provided documents."

Context Documents:
{{context}}

Conversation History:
{{history}}

User Question: {{question}}

Instructions:
1. Answer based ONLY on the provided documents
2. Be specific and cite which document(s) you used
3. If information is unclear or missing, say so
4. Keep answers concise but complete
5. Use a friendly, informative tone

Answer:"""

# NOTE: Langfuse variables use double curly braces: {{question}}, not {question}.
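
Langfuse fills `{{variable}}` placeholders when a prompt is compiled. The idea can be sketched in plain Python; `compile_template` below is a hypothetical stand-in for illustration only, not Langfuse's implementation:

```python
import re

# Matches {{name}}, allowing optional spaces inside the braces.
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(template: str, **variables: str) -> str:
    """Replace each {{name}} with its value; leave unknown placeholders untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return _VARIABLE.sub(substitute, template)


example = "Context:\n{{context}}\n\nUser Question: {{question}}"
print(compile_template(example, context="Document 1: ...", question="How do I post an ad?"))
```

Single-brace text such as `{question}` is not matched by the pattern, which is why the template stored in Langfuse needs the double-brace form.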

## 🔧 Helper Functions

In [4]:
def create_langfuse_client() -> Langfuse:
    """Initialize Langfuse for observability tracking."""
    load_dotenv()
    return Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host=os.environ["LANGFUSE_BASE_URL"],
    )


def load_vector_store(path_to_vector_store: Optional[Path] = None):
    """Load the vector database containing FAQ documents."""
    sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
    load_dotenv()

    this_dir = Path.cwd().parent
    path_to_vector_store = path_to_vector_store or this_dir / "vector_store"

    print(f"📂 Loading vector store from: {path_to_vector_store}")

    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    # Remove lock file if exists
    lock_file = path_to_vector_store / ".lock"
    if lock_file.exists():
        os.remove(lock_file)

    client = QdrantClient(path=path_to_vector_store)
    return QdrantVectorStore(
        client=client,
        collection_name="faq_collection",
        embedding=embeddings,
        sparse_embedding=sparse_embeddings,
        vector_name="dense",
        sparse_vector_name="sparse",
        retrieval_mode=RetrievalMode.HYBRID,
    )


def format_docs_alternative(docs: list[Document]):
    """Format retrieved documents for the prompt."""
    formatted = []
    for i, doc in enumerate(docs, 1):
        formatted.append(
            f"Document {i}:\n{doc.metadata['faq_body']}\nSource: {doc.metadata['faq_id']}"
        )
    return "\n\n".join(formatted)
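
`format_docs_alternative` raises a `KeyError` if a document lacks `faq_body` or `faq_id` in its metadata. A minimal sketch of a more forgiving variant follows; `StubDocument` and `format_docs_safe` are hypothetical names, and the stub only mimics the `page_content` and `metadata` attributes of a LangChain `Document` so the example runs on its own:

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str = ""
    metadata: dict = field(default_factory=dict)


def format_docs_safe(docs) -> str:
    """Like format_docs_alternative, but falls back when metadata keys are missing."""
    formatted = []
    for i, doc in enumerate(docs, 1):
        body = doc.metadata.get("faq_body", doc.page_content)
        source = doc.metadata.get("faq_id", "unknown")
        formatted.append(f"Document {i}:\n{body}\nSource: {source}")
    return "\n\n".join(formatted)


docs = [
    StubDocument(metadata={"faq_body": "Click 'Post an ad' on the home page.", "faq_id": "faq-1"}),
    StubDocument(page_content="Body stored only in page_content."),
]
print(format_docs_safe(docs))
```

Whether a silent fallback is preferable to a loud failure depends on how much you trust the ingestion step; during development the stricter original may be the better choice.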

## 🤖 RAG Conversation Class

This class handles:
- 🔍 **Document retrieval** from vector store
- 🎯 **Prompt fetching** from Langfuse (the magic happens here!)
- 💬 **Response generation** with conversation history
- 📊 **Automatic tracing** to Langfuse

In [6]:
load_dotenv()
langfuse = get_client()


class RagConversation:
    """Conversational RAG system with Langfuse observability."""

    def __init__(self, vector_store, llm, history=None):
        self.vector_store = vector_store
        self.llm = llm
        self.history = history if history else []

    def add_message(self, message: BaseMessage):
        """Add a message to conversation history."""
        self.history.append(message)

    @observe(name="retriever-call", as_type="retriever")
    def retrieve_documents(self, question, K=K_RETRIEVAL):
        """🔍 Retrieve relevant documents (tracked in Langfuse)."""
        docs_and_scores = self.vector_store.similarity_search_with_relevance_scores(
            question, k=K
        )
        return docs_and_scores

    def get_prompt(self, question, context, history):
        """Fetch the managed prompt from Langfuse and fill in its variables.

        See https://langfuse.com/docs/prompt-management/get-started
        """
        prompt = langfuse.get_prompt(PROMPT_NAME)

        # Insert variables into the prompt template
        compiled_prompt = prompt.compile(
            question=question, context=context, history=history
        )

        # Link the prompt to the current generation so the trace records which version was used:
        # https://langfuse.com/docs/prompt-management/get-started#link-with-langfuse-tracing-optional
        langfuse.update_current_generation(
            prompt=prompt,
        )
        return compiled_prompt

    @observe()
    def history_to_string(self):
        """Convert conversation history to string format."""
        langfuse.update_current_trace(metadata={"history": self.history})
        return "\n".join(
            [f"{message.type}: {message.content}" for message in self.history]
        )

    @observe(name="llm-call", as_type="generation")
    def generate_response(self, question, docs):
        """🤖 Generate AI response (tracked in Langfuse)."""
        context_str = format_docs_alternative(docs)
        prompt = self.get_prompt(question, context_str, self.history_to_string())
        response = self.llm.invoke(prompt)

        # Update conversation history
        self.history.append(HumanMessage(content=question))
        self.history.append(AIMessage(content=response.content))

        return response.content

    @observe
    def get_response(self, question):
        """🔬 Main method: retrieve docs and generate response."""
        # Retrieve relevant documents
        docs_and_scores = self.retrieve_documents(question, K=K_RETRIEVAL)
        docs = [doc for doc, score in docs_and_scores]
        scores = [score for doc, score in docs_and_scores]

        # Generate response
        response = self.generate_response(question, docs)

        return response, docs, scores
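
`get_prompt` above fetches the prompt by name only. A hedged sketch of a stricter fetch follows, assuming the installed Langfuse SDK's `get_prompt` accepts the `label` and `fallback` keyword arguments. `fetch_production_prompt`, `StubPrompt`, and `StubClient` are hypothetical names; the stubs exist only so the example runs without network access:

```python
def fetch_production_prompt(client, name: str, fallback_text: str):
    """Fetch the prompt labeled 'production'; use fallback_text if it cannot be fetched."""
    return client.get_prompt(name, label="production", fallback=fallback_text)


class StubPrompt:
    """Hypothetical stand-in for the object returned by get_prompt."""

    def __init__(self, text: str):
        self.text = text

    def compile(self, **variables) -> str:
        compiled = self.text
        for key, value in variables.items():
            compiled = compiled.replace("{{" + key + "}}", str(value))
        return compiled


class StubClient:
    """Hypothetical offline client that always serves the fallback text."""

    def get_prompt(self, name, label=None, fallback=None):
        return StubPrompt(fallback)


prompt = fetch_production_prompt(StubClient(), "rag_prompt", "Answer: {{question}}")
print(prompt.compile(question="How do I post an ad?"))
```

In the notebook you would pass the `langfuse` client returned by `get_client()` instead of `StubClient()`, and `PROMPT_TEMPLATE` is a natural candidate for the fallback text.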

## 🚀 Initialize the System

In [None]:
load_dotenv()

# 📂 Set up paths
notebook_dir = Path.cwd()
PATH_TO_VECTOR_STORE = notebook_dir.parent / "5_Evaluation" / "vector_store"

vector_store = load_vector_store(path_to_vector_store=PATH_TO_VECTOR_STORE)

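# Pass REASONING_EFFORT from the configuration cell so the setting actually takes effect
llm = ChatOpenAI(
    model=MODEL_NAME,
    temperature=TEMPERATURE,
    reasoning_effort=REASONING_EFFORT,
)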

rag_conversation = RagConversation(vector_store, llm)

📂 Loading vector store from: /Users/aniszakari/Documents/ai-training-for-devs/5_Evaluation/vector_store


## 💬 Interactive Chat Interface

**Try it out!** Ask questions about AI-Bay and see:
- How prompts are fetched from Langfuse
- Traces appearing in the Langfuse UI
- Which prompt version was used for each response

**Pro tip**: Go to Langfuse → Prompts → `rag_prompt` to see all traces using this prompt!

In [8]:
def rag_assistant_response(message, history):
    """Handle chat messages from Gradio interface."""
    response, docs, scores = rag_conversation.get_response(message)
    return response


# Launch interactive chat
demo = gr.ChatInterface(
    fn=rag_assistant_response,
    title="🤖 AI-Bay Customer Support Assistant",
    description="Ask me anything about AI-Bay! Powered by Langfuse prompt management.",
    examples=[
        "How do I post an ad?",
        "How can I contact a seller?",
        "Where can I see my messages?",
        "What are the posting guidelines?",
    ],
)

demo.launch()

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
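
The handler above discards the retrieved documents and their scores. If you want the chat to show where an answer came from, a footer can be appended to the response. This is a minimal sketch; `format_answer_with_sources` is a hypothetical helper and the IDs and scores are made-up sample values:

```python
def format_answer_with_sources(response: str, source_ids, scores) -> str:
    """Append a 'Sources' footer listing each document id with its relevance score."""
    if not source_ids:
        return response
    lines = [
        f"- {source_id} (score: {score:.2f})"
        for source_id, score in zip(source_ids, scores)
    ]
    return response + "\n\nSources:\n" + "\n".join(lines)


print(format_answer_with_sources("Use the 'Post an ad' button.", ["faq-1", "faq-7"], [0.91, 0.64]))
```

Inside `rag_assistant_response`, the IDs could come from `[doc.metadata["faq_id"] for doc in docs]`, matching the metadata key used by `format_docs_alternative`.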




## 🎓 Key Takeaways

### What you learned:

1. **Centralized Prompts**: Store prompts in Langfuse, not in code
2. **Version Control**: Track which prompt version is used in production
3. **No Redeployment**: Update prompts without touching your application
4. **Automatic Tracing**: Every generation linked to a prompt is logged in Langfuse together with the prompt version it used

### Next steps:

- 🔄 Try modifying the prompt in the Langfuse UI and see the change picked up on the next fetch (the SDK caches prompts for a short time)
- 📊 Explore the Langfuse traces to see prompt performance
- 🧪 Create multiple prompt versions and A/B test them
- 📈 Analyze which prompt version performs best
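
One way to approach the A/B test mentioned above is to give each user a stable variant and fetch the prompt by label. A minimal sketch, assuming you have created two labels on your prompt versions; the label names `prod-a` and `prod-b` and the helper `assign_prompt_label` are hypothetical:

```python
import hashlib

# Hypothetical label names; create matching labels on your prompt versions in Langfuse.
PROMPT_LABELS = ("prod-a", "prod-b")


def assign_prompt_label(user_id: str, labels=PROMPT_LABELS) -> str:
    """Map a user id to one label deterministically so each user sees one variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return labels[digest[0] % len(labels)]


label = assign_prompt_label("user-42")
print(label)
```

The chosen label could then be passed along as `langfuse.get_prompt(PROMPT_NAME, label=label)`. Because the fetched prompt is linked to the generation, the traces can later be compared per prompt version.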