# RAG system with feedback loop

In this notebook, we implement a RAG pipeline enhanced with a user feedback loop. The goal is to improve the accuracy, relevance, and personalization of answers over time by:
- Dynamically adjusting document relevance based on user feedback.
- Periodically fine-tuning the document index with high-quality interactions.

This kind of feedback-aware architecture is ideal for applications like education, customer support, and expert systems where relevance and quality of responses matter — and evolve — with use.

In [1]:
import os
from dotenv import load_dotenv
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.prompts import PromptTemplate
import json
from typing import List, Dict, Any
import pymupdf

# Load environment variables from a .env file
load_dotenv()

# Fail early with a clear message if the OpenAI API key was not loaded
if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY is not set; add it to your .env file")

### Load the PDF
The first step in any RAG system is to ingest source content. In our case, we are loading and reading a PDF file to build our knowledge base.

Here, we read the PDF content and convert it into a string using the `pymupdf` library. This will allow us to process the text further.

In [2]:
# Path to the PDF document
path = "Understanding_Climate_Change.pdf"

# Open the PDF document located at the specified path
doc = pymupdf.open(path)
content = ""
# Iterate over each page in the document
for page in doc:
    # Extract the text content from the current page and append it to the content string
    content += page.get_text()

The PDF file is opened using `pymupdf.open()`, and we iterate over all pages to extract the text. The extracted text is then concatenated into one large string. This will allow us to process it further and split it into manageable chunks.


### Chunking and embedding the text
Now that we have the text, we split it into overlapping chunks and convert those into semantic embeddings. These chunks are what the retriever will later search over.

In [3]:
# Define chunking parameters
chunk_size = 1000
chunk_overlap = 200

# Create the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    length_function=len,
    is_separator_regex=False
)

# Split the text into manageable overlapping segments
chunks = text_splitter.create_documents([content])

# Initialize metadata for scoring
for chunk in chunks:
    chunk.metadata['relevance_score'] = 1.0

# Generate embeddings and create the vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)

We divide the raw text into overlapping sections to preserve context during retrieval. Each chunk gets embedded into a vector space using OpenAI’s embedding model and stored in a FAISS index for fast similarity search.
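
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch that uses a plain sliding window. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, so its chunk boundaries will differ. The helper name `sliding_window_chunks` is hypothetical and not part of LangChain.

```python
from typing import List

def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in demo_chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, which is how a sentence that straddles a boundary stays retrievable from at least one chunk.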

### Setting up the retriever and QA chain
We will now define a retriever to fetch relevant chunks and connect it to an LLM using a `RetrievalQA` chain.

In [4]:
# Convert vector index into a retriever
retriever = vectorstore.as_retriever()

# Load the LLM
llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini-2024-07-18", max_tokens=4000)

# Wrap everything in a RAG chain
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)

This QA system works by retrieving relevant chunks of text from the vector index and passing them to the language model to generate contextually relevant answers. We are using LangChain's built-in `RetrievalQA` class, which is a wrapper that combines:
- A retriever — this pulls in the most relevant text chunks (in this case, from our FAISS vectorstore).
- An LLM — which then uses those chunks to generate a response to the user’s query.
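
The two-stage flow described above can be sketched with toy stand-ins. This is not how `RetrievalQA` is implemented: word overlap replaces embedding similarity, and a stub function replaces the LLM. All names here (`toy_retrieve`, `toy_qa`, `echo_llm`) are hypothetical.

```python
from typing import Callable, List

def toy_retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by the number of words shared with the query (stand-in for vector similarity)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(chunk.lower().split())), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

def toy_qa(query: str, chunks: List[str], generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve the top-k chunks, place them in a prompt, and pass the prompt to the generator."""
    context = "\n".join(toy_retrieve(query, chunks, k))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

def echo_llm(prompt: str) -> str:
    """Stub generator that returns the prompt, so we can inspect what the LLM would receive."""
    return prompt

corpus = [
    "greenhouse gases trap heat in the atmosphere",
    "sea levels rise as ice sheets melt",
    "the greenhouse effect warms the planet surface",
]
top_chunks = toy_retrieve("what is the greenhouse effect", corpus)
answer_prompt = toy_qa("what is the greenhouse effect", corpus, echo_llm)
print(answer_prompt)
```

Only the two chunks that mention the greenhouse effect reach the prompt; the sea-level chunk is filtered out before the generator ever sees it.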

### Function to format user feedback in a dictionary
Once a response is generated, we can collect user feedback on its relevance and quality, and persist that information locally.

In [5]:
# Create structured feedback
def get_user_feedback(query, response, relevance, quality, comments=""):
    return {
        "query": query,
        "response": response,
        "relevance": int(relevance),
        "quality": int(quality),
        "comments": comments
    }

### Function to store the feedback in a JSON Lines file
Feedback is appended to a file in JSON Lines format (one JSON object per line), which makes incremental writes and later reads simple.

In [6]:
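# Save feedback to disk, one JSON object per line
def store_feedback(feedback):
    # Create the data directory if it does not exist yet
    os.makedirs("data", exist_ok=True)
    with open("data/feedback_data.json", "a") as f:
        json.dump(feedback, f)
        f.write("\n")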

### Function to load past feedback file
We will need to access past feedback to learn from it and adjust our document ranking accordingly.

In [7]:
# Load all stored feedback entries (one JSON object per line)
def load_feedback_data():
    feedback_data = []
    try:
        with open("data/feedback_data.json", "r") as f:
            for line in f:
                line = line.strip()
                if line:  # Skip blank lines instead of failing on them
                    feedback_data.append(json.loads(line))
    except FileNotFoundError:
        print("No feedback data file found. Starting with empty feedback.")
    return feedback_data

This function safely loads all prior feedback entries from the local file, making them available for relevance adjustments. This ensures that the system can leverage cumulative user knowledge to evolve over time.
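
As a quick illustration of the append-then-read pattern, here is a self-contained round trip that writes to a temporary directory rather than `data/`. The helper names `append_jsonl` and `read_jsonl` are hypothetical stand-ins for the two functions above.

```python
import json
import os
import tempfile

def append_jsonl(path: str, record: dict) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def read_jsonl(path: str) -> list:
    """Read all records; return an empty list if the file does not exist yet."""
    records = []
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # ignore blank lines
                    records.append(json.loads(line))
    except FileNotFoundError:
        pass
    return records

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "feedback_demo.jsonl")
    missing = read_jsonl(demo_path)  # file not created yet
    append_jsonl(demo_path, {"query": "q1", "relevance": 5, "quality": 4})
    append_jsonl(demo_path, {"query": "q2", "relevance": 2, "quality": 3})
    loaded = read_jsonl(demo_path)

print(missing)
print([record["query"] for record in loaded])
```

Because each record occupies its own line, new feedback can be appended without rewriting or re-parsing what is already stored.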


### Function to adjust document relevance scores based on stored feedback
To make the system adaptive, we use the LLM to judge whether a past feedback entry is relevant to the current query and document content. This lets us reweight the scores of the retrieved documents and re-rank them.

In [8]:
# Define the expected structured response from the LLM
class Response(BaseModel):
    answer: str = Field(..., description="The answer to the question. The options can be only 'Yes' or 'No'")

# Function to adjust relevance scores using prior user feedback
def adjust_relevance_scores(query: str, docs: List[Any], feedback_data: List[Dict[str, Any]]) -> List[Any]:
    # Create a prompt template for relevance checking
    relevance_prompt = PromptTemplate(
        input_variables=["query", "feedback_query", "doc_content", "feedback_response"],
        template="""
        Determine if the following feedback response is relevant to the current query and document content.
        You are also provided with the Feedback original query that was used to generate the feedback response.
        Current query: {query}
        Feedback query: {feedback_query}
        Document content: {doc_content}
        Feedback response: {feedback_response}

        Is this feedback relevant? Respond with only 'Yes' or 'No'.
        """
    )
    # Initialize the LLM to use for reasoning
    llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini-2024-07-18", max_tokens=4000)

    # Build the relevance-checking chain: the prompt piped into the LLM with structured output (expects 'Yes' or 'No')
    relevance_chain = relevance_prompt | llm.with_structured_output(Response)

    # Loop through each retrieved document
    for doc in docs:
        relevant_feedback = []  # Holds feedback relevant to this doc+query pair

        # Loop through all historical feedback entries
        for feedback in feedback_data:
            # Use LLM to check relevance
            input_data = {
                "query": query,  # Current question
                "feedback_query": feedback['query'],  # Feedback's original question
                "doc_content": doc.page_content[:1000],  # Truncated doc content for LLM context
                "feedback_response": feedback['response']  # The response the user rated
            }
            # Let the LLM judge if the feedback applies to this document and query
            result = relevance_chain.invoke(input_data).answer

            # If the LLM says "Yes" (in any casing), we treat this feedback as relevant
            if result.strip().lower() == 'yes':
                relevant_feedback.append(feedback)

        # If we found relevant feedback, adjust the relevance score based on feedback
        if relevant_feedback:
            # Compute average relevance rating from user
            avg_relevance = sum(f['relevance'] for f in relevant_feedback) / len(relevant_feedback)
            # Adjust document's score: scale it relative to a neutral value (3)
            doc.metadata['relevance_score'] *= (avg_relevance / 3)  # Assuming a 1-5 scale, 3 is neutral

    # Finally, sort the documents by adjusted score in descending order
    return sorted(docs, key=lambda x: x.metadata['relevance_score'], reverse=True)

This method dynamically adjusts the relevance score of each document using both user feedback and reasoning from the LLM. It ensures that more trusted documents are prioritized in future retrievals.

We take each document that was retrieved for a new query and ask the LLM, for every stored feedback entry, whether that entry applies. The LLM gets the new question, the current document (trimmed), the feedback question, and the feedback's rated response. It evaluates these and replies either “Yes” or “No”. If “Yes”, we treat that feedback as if it applies to this situation too.

Once we have gathered relevant feedback for a document, we compute the average relevance score from those feedback items, and use that to multiply the document's score, with 3 as our neutral baseline.

By the end, documents are re-ranked based on this adjusted score, meaning the ones more validated by historical feedback rise to the top.
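
The scoring and re-ranking arithmetic can be checked in isolation. The sketch below assumes the same 1-5 rating scale with 3 as neutral; the document names and ratings are made up for illustration, and `rescale` is a hypothetical helper that mirrors the multiplication step only.

```python
from typing import Dict, List

NEUTRAL_RATING = 3  # midpoint of the assumed 1-5 scale

def rescale(score: float, ratings: List[int]) -> float:
    """Multiply score by (average rating / neutral rating); leave it unchanged without ratings."""
    if not ratings:
        return score
    return score * (sum(ratings) / len(ratings) / NEUTRAL_RATING)

# Hypothetical documents, each starting at the default score of 1.0,
# paired with the relevance ratings of the feedback judged applicable to them
ratings_per_doc: Dict[str, List[int]] = {
    "doc_a": [5, 4],  # average 4.5 -> multiplier 1.5
    "doc_b": [],      # no applicable feedback -> score unchanged
    "doc_c": [1, 2],  # average 1.5 -> multiplier 0.5
}
scores = {name: rescale(1.0, ratings) for name, ratings in ratings_per_doc.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranking)
```

Note that the adjustment is multiplicative: if the same document's stored score is rescaled again on a later call, the factors compound (for example, 1.5 twice gives 2.25), so a production version may want to cap or normalize scores.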

### Function to fine-tune the vector index with queries and answers that received good feedback
Periodically, we can also rebuild the vector store, adding high-scoring user interactions to the index to reinforce strong answers.

In [9]:
def fine_tune_index(feedback_data: List[Dict[str, Any]], texts: str) -> Any:
    # Filter high-quality responses
    good_responses = [f for f in feedback_data if f['relevance'] >= 4 and f['quality'] >= 4]

    # Extract queries and responses, and create new documents
    additional_texts = []
    for f in good_responses:
        # Merge question and answer into a single context block
        combined_text = f['query'] + " " + f['response']
        additional_texts.append(combined_text)

    # Join the Q&A blocks into a single string
    additional_texts = " ".join(additional_texts)

    # Combine the original text with the high-quality Q&A text,
    # separated by a space so the two do not run together
    all_texts = texts + " " + additional_texts

    # Define chunking parameters
    chunk_size = 1000
    chunk_overlap = 200

    # Split the text into manageable overlapping segments
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        is_separator_regex=False
    )

    chunks = text_splitter.create_documents([all_texts])

    # Initialize metadata for scoring
    for chunk in chunks:
        chunk.metadata['relevance_score'] = 1.0

    # Generate embeddings and create the vector store
    embeddings = OpenAIEmbeddings()
    new_vectorstore = FAISS.from_documents(chunks, embeddings)

    return new_vectorstore

This block expands the document base with reliable user-evaluated interactions, strengthening the system’s contextual grounding and personalization over time.

First, we go through all the user feedback and pick only the ones that were rated highly — so we are not polluting our knowledge base with weak or misleading data. Then we treat these high-quality Q&A examples just like source content — we stitch the question and its answer together and toss it into our data pool.

We combine everything, old documents and new Q&A, into one big string and slice it into manageable chunks, like we did with the original PDF. Each chunk gets embedded into a vector and inserted into a new FAISS index. From this point on, our retriever is not just searching the original document; it can also surface these strong answers when relevant.

The beauty here is that we are not retraining a language model. We are only evolving the retrieval layer with user input, which keeps updates cheap and leaves the model itself unchanged.
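
The filtering and stitching step can be tried without any embedding calls. In this sketch the feedback records are invented examples, and `select_good_interactions` is a hypothetical helper that mirrors only the selection logic of `fine_tune_index`.

```python
from typing import Any, Dict, List

def select_good_interactions(feedback_data: List[Dict[str, Any]], min_score: int = 4) -> List[str]:
    """Keep entries rated at least min_score on both relevance and quality, as 'question answer' text."""
    return [
        f"{entry['query']} {entry['response']}"
        for entry in feedback_data
        if entry["relevance"] >= min_score and entry["quality"] >= min_score
    ]

demo_feedback = [
    {"query": "What is CO2?", "response": "A greenhouse gas.", "relevance": 5, "quality": 5},
    {"query": "What is albedo?", "response": "Unsure.", "relevance": 2, "quality": 5},
    {"query": "What is methane?", "response": "A potent greenhouse gas.", "relevance": 4, "quality": 4},
]
selected = select_good_interactions(demo_feedback)

# Combine the original text with the selected Q&A blocks before re-chunking
corpus_text = "\n\n".join(["Original document text."] + selected)
print(corpus_text)
```

The albedo entry is dropped because its relevance rating is below the threshold, even though its quality rating is high; both ratings must pass for an interaction to enter the index.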

### Full system demo: Query → Feedback → Adjustment
Here’s how the full pipeline works from query to fine-tuning.

In [10]:
# Ask a question
query = "What is the greenhouse effect?"

# Step 1: Get response from RAG system
response = qa_chain.invoke(query)["result"]

# Step 2: Assume the user provides high scores
relevance = 5
quality = 5
# Collect feedback
feedback = get_user_feedback(query, response, relevance, quality)

# Step 3: Store feedback
store_feedback(feedback)

# Step 4: Adjust relevance scores for future retrievals
docs = retriever.invoke(query)
adjusted_docs = adjust_relevance_scores(query, docs, load_feedback_data())

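# Update the retriever with adjusted docs
retriever.search_kwargs['k'] = len(adjusted_docs)  # Set k to number of adjusted docs
# Caution: 'docs' is not a documented FAISS search argument and is likely ignored at search time;
# use `adjusted_docs` directly wherever the re-ranked order matters
retriever.search_kwargs['docs'] = adjusted_docs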

This executes a full loop: generate → collect feedback → adapt ranking → prepare for improved next retrieval.

A query is asked, a response is generated, and the user gives feedback. But instead of just logging that feedback and moving on, we immediately put it to use.

We fetch the documents again, and now we use our adjustment logic: does any of the previous feedback apply to these documents? If so, we change how important those documents are. The ones supported by high-quality past feedback are bumped up in ranking, the others stay the same or get nudged down.

Then, instead of re-indexing the vector store (which is heavier), we do something much lighter: we hand the re-ranked list back to the retriever. The intent is that the next retrieval for this query uses these documents in their feedback-adjusted order. Nothing is retrained; the system adapts purely by reordering what it retrieves.
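
One hypothetical way to make this handoff explicit is a small per-query cache that sits in front of the retriever. The class below is a sketch under that assumption, not a LangChain API: `PreRankedRetriever` and `fake_base_retrieve` are invented names, and plain strings stand in for documents.

```python
from typing import Callable, Dict, List

class PreRankedRetriever:
    """Serve a stored, re-ranked document list for queries seen before; otherwise fall back."""

    def __init__(self, base_retrieve: Callable[[str], List[str]]):
        self._base_retrieve = base_retrieve
        self._pre_ranked: Dict[str, List[str]] = {}

    def remember(self, query: str, ranked_docs: List[str]) -> None:
        """Store the feedback-adjusted ordering for this exact query string."""
        self._pre_ranked[query] = list(ranked_docs)

    def retrieve(self, query: str) -> List[str]:
        """Return the stored ordering if one exists, else the base retriever's result."""
        if query in self._pre_ranked:
            return list(self._pre_ranked[query])
        return self._base_retrieve(query)

def fake_base_retrieve(query: str) -> List[str]:
    """Stand-in for a vector store lookup; always returns the same order."""
    return ["doc_a", "doc_b", "doc_c"]

cached = PreRankedRetriever(fake_base_retrieve)
before = cached.retrieve("What is the greenhouse effect?")
cached.remember("What is the greenhouse effect?", ["doc_c", "doc_a", "doc_b"])
after = cached.retrieve("What is the greenhouse effect?")
other = cached.retrieve("A different question")
print(before, after, other)
```

Matching on the exact query string is the simplest possible design choice; it only helps when the same question is asked again, so a real system would likely need a fuzzier notion of "the same query".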





### Fine-tune the vector store periodically
Schedule this step to run periodically (e.g., daily) so that the index keeps up with new feedback.

In [11]:
# Periodically (e.g., daily or weekly), fine-tune the index
feedback_data = load_feedback_data()
new_vectorstore = fine_tune_index(feedback_data, content)
retriever = new_vectorstore.as_retriever()

This ensures the index keeps learning from its best interactions.
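
How the rebuild gets triggered is left open above. As a minimal sketch, assuming the caller records when the index was last rebuilt, a time-based check could look like this; `rebuild_is_due` is a hypothetical helper, and the one-day interval is only an example.

```python
from datetime import datetime, timedelta

def rebuild_is_due(last_rebuild: datetime, now: datetime,
                   interval: timedelta = timedelta(days=1)) -> bool:
    """Return True once at least `interval` has passed since the last rebuild."""
    return now - last_rebuild >= interval

last = datetime(2024, 1, 1, 3, 0)
too_early = rebuild_is_due(last, datetime(2024, 1, 1, 20, 0))  # 17 hours later
on_time = rebuild_is_due(last, datetime(2024, 1, 2, 3, 0))     # exactly 24 hours later
print(too_early, on_time)
```

When the check returns `True`, the cell above (load feedback, call `fine_tune_index`, replace the retriever) would run and the caller would update the stored timestamp.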