# Student RAG Project - Guided Implementation

## Welcome!

In this project, you'll build a **RAG (Retrieval-Augmented Generation)** system that can answer questions about your documents.

### What You'll Learn:
- ✅ File I/O (reading documents)
- ✅ String manipulation (text chunking)
- ✅ Functions and parameters
- ✅ Lists and dictionaries
- ✅ Loops and conditionals
- ✅ Basic calculations and statistics

### What's Provided for You:
- ✅ Embedding model (converts text to numbers)
- ✅ Vector database (stores and searches embeddings)
- ✅ LLM connection (generates answers)

### Your Tasks:
You'll complete **TODO sections** marked with `# TODO:` comments.

Let's get started! 🚀

---
## Setup: Import Libraries

In [256]:
# Import the pre-built helper module
from rag_helpers import (
    EmbeddingModel,
    VectorDatabase,
    LLM,
    Timer,
    print_separator,
    print_search_results,
    print_rag_answer,
    check_setup
)

# Import standard Python libraries you'll use
from pathlib import Path
from typing import List, Dict
import json

# Check if everything is installed correctly
check_setup()

Checking setup...
✓ chromadb is installed
✓ sentence_transformers is installed
✓ requests is installed

✓ All required packages are installed!
You're ready to start!


True

---
## Configuration

Set up the basic settings for your RAG system.

In [257]:
# TODO: Change this to point to YOUR documents folder
DOCS_FOLDER = "./my_docs"

# Chunking settings (you can experiment with these!)
CHUNK_SIZE = 500  # How many characters per chunk
OVERLAP = 50      # How many characters overlap between chunks

# How many results to retrieve for each query
TOP_K = 3

print("Configuration:")
print(f"  Documents folder: {DOCS_FOLDER}")
print(f"  Chunk size: {CHUNK_SIZE} characters")
print(f"  Overlap: {OVERLAP} characters")
print(f"  Top-K results: {TOP_K}")

Configuration:
  Documents folder: ./my_docs
  Chunk size: 500 characters
  Overlap: 50 characters
  Top-K results: 3


---
## TODO #1: Document Loading

**Your Task:** Write a function to load all text files from a folder.

**What to do:**
1. Loop through all `.txt` files in the folder
2. Read each file's content
3. Store the content and filename in a dictionary
4. Return a list of these dictionaries

**Python concepts:** File I/O, loops, dictionaries, lists

In [258]:
def load_documents(folder_path: str) -> List[Dict[str, str]]:
    """
    Load all text documents from a folder.

    Args:
        folder_path: Path to folder containing .txt files

    Returns:
        List of dictionaries, each containing:
        - 'content': the text content of the file
        - 'filename': the name of the file

    Example:
        [
            {'content': 'This is doc 1...', 'filename': 'doc1.txt'},
            {'content': 'This is doc 2...', 'filename': 'doc2.txt'}
        ]
    """

    # TODO: Implement this function!
    # HINTS:
    # 1. Create an empty list to store documents
    # 2. Use Path(folder_path).glob("*.txt") to find all .txt files
    # 3. For each file:
    #    - Open it with open(file_path, 'r', encoding='utf-8')
    #    - Read the content with .read()
    #    - Create a dictionary with 'content' and 'filename'
    #    - Append to your list
    # 4. Return the list

    documents = []  # Start with empty list

    # Your code here:
    folder = Path(folder_path)

    # sorted() makes the load order (and therefore the chunk order) reproducible
    for file_path in sorted(folder.glob("*.txt")):
        # Open and read the file
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()

        # Create a dictionary for this document
        doc = {
            'content': content,
            'filename': file_path.name
        }

        # Add to documents list
        documents.append(doc)

    print(f"✓ Loaded {len(documents)} documents")
    return documents


# Test your function
documents = load_documents(DOCS_FOLDER)

# Display first document (if any were loaded)
if documents:
    print(f"\nFirst document: {documents[0]['filename']}")
    print(f"Content preview: {documents[0]['content'][:200]}...")
else:
    print("⚠️  No documents loaded! Check your folder path.")

✓ Loaded 8 documents

First document: 1_Dunning-Kruger_Facts.txt
Content preview: Title: The Dunning–Kruger Effect — Why Incompetence Feels Like Confidence

People often assume confidence means competence, but research shows otherwise. The Dunning–Kruger Effect describes how people...


---
## TODO #2: Text Chunking Function

**Your Task:** Write a function to split long text into smaller chunks with overlap.

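**Why?** Embedding models accept only a limited amount of text per input (anything longer is truncated), and smaller chunks let retrieval return just the relevant passage instead of a whole document.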

**What to do:**
1. Start at the beginning of the text
2. Take a chunk of `chunk_size` characters
3. Move forward by `chunk_size - overlap` characters
4. Repeat until you reach the end

**Python concepts:** String slicing, loops, lists
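The position arithmetic in steps 2-4 can be checked without any text at all. `chunk_starts` below is a hypothetical helper (not part of the project) that lists where each chunk would begin. The chunk count works out to ceil(text length / (chunk_size - overlap)), assuming no chunk is skipped as whitespace-only.

```python
import math


def chunk_starts(text_length: int, chunk_size: int, overlap: int) -> list[int]:
    """Start index of every chunk a sliding-window chunker would produce."""
    step = chunk_size - overlap  # how far the window moves each time
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return list(range(0, text_length, step))


starts = chunk_starts(800, chunk_size=100, overlap=20)
print(starts)                        # [0, 80, 160, 240, 320, 400, 480, 560, 640, 720]
print(len(starts))                   # 10
print(math.ceil(800 / (100 - 20)))   # 10, the same count via ceil(length / step)
```

These numbers match the 800-character test below: 10 chunks, each starting 80 characters after the previous one.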

In [259]:
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """
    Split text into overlapping chunks.

    Args:
        text: The text to split
        chunk_size: Maximum characters per chunk
        overlap: How many characters to overlap between chunks

    Returns:
        List of text chunks

    Example:
        chunk_text("abcdefgh", chunk_size=4, overlap=1)
        # Returns: ['abcd', 'defg', 'gh']
    """

    # TODO: Implement this function!
    # HINTS:
    # 1. Create an empty list to store chunks
    # 2. Start with position = 0
    # 3. While position < len(text):
    #    - Extract chunk from position to position+chunk_size
    #    - Add chunk to list (if not empty)
    #    - Move position forward by (chunk_size - overlap)
    # 4. Return the list of chunks

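    # A step of zero or less would never move forward (infinite loop)
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    chunks = []  # Start with empty list
    position = 0  # Start at beginning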

    # Your code here:
    while position < len(text):
        # Extract a chunk
        chunk = text[position : position + chunk_size]

        # Add to chunks list
        if chunk.strip():
            chunks.append(chunk)

        # Move position forward
        position += (chunk_size - overlap)

    return chunks


# Test your chunking function
test_text = "This is a test. " * 50  # Create a long test string
test_chunks = chunk_text(test_text, chunk_size=100, overlap=20)

print(f"Test text length: {len(test_text)} characters")
print(f"Number of chunks: {len(test_chunks)}")
print(f"\nFirst chunk: {test_chunks[0]}")
if len(test_chunks) > 1:
    print(f"Second chunk: {test_chunks[1]}")

Test text length: 800 characters
Number of chunks: 10

First chunk: This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This
Second chunk: This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This
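A quick sanity check for the output above: with a sliding window, the last `overlap` characters of each chunk should reappear as the first `overlap` characters of the next. `check_overlap` is a hypothetical helper; it assumes no whitespace-only chunk was skipped and that the final chunk is at least `overlap` characters long (both true for the 800-character test string).

```python
def check_overlap(chunks: list[str], overlap: int) -> bool:
    """True if each chunk begins with the last `overlap` characters of the previous chunk."""
    if overlap == 0:
        return True  # nothing to compare; note that s[-0:] would be the whole string
    return all(nxt[:overlap] == prev[-overlap:] for prev, nxt in zip(chunks, chunks[1:]))


# Rebuild the notebook's test chunks with plain slicing so this cell runs on its own
text = "This is a test. " * 50
step = 100 - 20
demo_chunks = [text[p:p + 100] for p in range(0, len(text), step)]
print(len(demo_chunks), check_overlap(demo_chunks, 20))  # 10 True
```

In the notebook, `check_overlap(test_chunks, 20)` applies the same check to your own `chunk_text` output.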


---
## TODO #3: Process All Documents into Chunks

**Your Task:** Use your chunking function to split ALL documents into chunks and create metadata.

**What to do:**
1. Loop through each document
2. Chunk the document's content
3. For each chunk, create metadata (which file it came from, which chunk number)
4. Store everything in a list

**Python concepts:** Nested loops, dictionaries, enumerate

In [260]:
def process_documents(documents: List[Dict[str, str]],
                      chunk_size: int = 500,
                      overlap: int = 50) -> tuple:
    """
    Process all documents into chunks with metadata.

    Args:
        documents: List of document dictionaries
        chunk_size: Size of each chunk
        overlap: Overlap between chunks

    Returns:
        Tuple of (chunk_texts, chunk_metadatas)
        - chunk_texts: List of chunk strings
        - chunk_metadatas: List of metadata dictionaries
    """

    # TODO: Implement this function!
    # HINTS:
    # 1. Create two empty lists: chunk_texts and chunk_metadatas
    # 2. For each document:
    #    - Get the document's content and filename
    #    - Use your chunk_text() function to split it
    #    - For each chunk (use enumerate to get index):
    #      - Add chunk text to chunk_texts
    #      - Create metadata dict with 'source' and 'chunk_id'
    #      - Add metadata to chunk_metadatas
    # 3. Return both lists as a tuple

    chunk_texts = []
    chunk_metadatas = []

    # Your code here:
    for doc in documents:
        # Get the document's content and filename
        filename = doc["filename"]
        content = doc["content"]

        # Chunk the document
        chunks = chunk_text(content, chunk_size=chunk_size, overlap=overlap)

        # For each chunk, store its text and a matching metadata entry
        for i, chunk in enumerate(chunks):
            chunk_texts.append(chunk)

            metadata = {
                "source": filename,
                "chunk_id": i
            }
            chunk_metadatas.append(metadata)

    print(f"✓ Created {len(chunk_texts)} chunks from {len(documents)} documents")
    return chunk_texts, chunk_metadatas


# Process all documents
chunk_texts, chunk_metadatas = process_documents(documents, CHUNK_SIZE, OVERLAP)

# Display example
if chunk_texts:
    print(f"\nExample chunk:")
    print(f"  Source: {chunk_metadatas[0]['source']}")
    print(f"  Chunk ID: {chunk_metadatas[0]['chunk_id']}")
    print(f"  Text: {chunk_texts[0][:200]}...")

✓ Created 24 chunks from 8 documents

Example chunk:
  Source: 1_Dunning-Kruger_Facts.txt
  Chunk ID: 0
  Text: Title: The Dunning–Kruger Effect — Why Incompetence Feels Like Confidence

People often assume confidence means competence, but research shows otherwise. The Dunning–Kruger Effect describes how people...
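The 24 chunks come from 8 documents, about 3 per file on average. A per-file count shows whether one long file dominates the index, which may crowd other files out of the top-k results. This is a small sketch using `collections.Counter`; `chunks_per_source` is a hypothetical name.

```python
from collections import Counter


def chunks_per_source(chunk_metadatas: list[dict]) -> dict[str, int]:
    """Count how many chunks came from each source file."""
    return dict(Counter(m["source"] for m in chunk_metadatas))


example = [
    {"source": "a.txt", "chunk_id": 0},
    {"source": "a.txt", "chunk_id": 1},
    {"source": "b.txt", "chunk_id": 0},
]
print(chunks_per_source(example))  # {'a.txt': 2, 'b.txt': 1}
```

In the notebook, `chunks_per_source(chunk_metadatas)` gives the breakdown for your own documents.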


---
## Pre-Built: Create Embeddings and Store in Database

This part uses the pre-built helpers. Just run these cells; no coding needed! ✨

In [261]:
# Initialize the embedding model (pre-built)
print("Initializing embedding model...")
embedder = EmbeddingModel()

# Create embeddings for all chunks (pre-built)
print("\nCreating embeddings...")
embeddings = embedder.embed_multiple(chunk_texts)
print(f"✓ Created {len(embeddings)} embeddings")

Initializing embedding model...
Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
✓ Model loaded!

Creating embeddings...
Embedding 24 texts...
✓ Complete!
✓ Created 24 embeddings


In [262]:
# Initialize vector database (pre-built)
print("Initializing vector database...")
vector_db = VectorDatabase()

# Add chunks to database (pre-built)
print("\nAdding chunks to database...")
vector_db.add_chunks(chunk_texts, embeddings, chunk_metadatas)

Initializing vector database...
✓ Vector database initialized
  Collection: student_rag
  Current documents: 24

Adding chunks to database...
✓ Added 24 chunks to database
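The output reports "Current documents: 24" before any chunks were added, which suggests the collection persisted from an earlier run. Whether `add_chunks` then stores duplicates depends on how `VectorDatabase` assigns IDs, which is not shown here. One common safeguard is a stable ID per chunk, built from its metadata. The sketch below is an assumption, and `make_chunk_ids` is a hypothetical helper.

```python
def make_chunk_ids(chunk_metadatas: list[dict]) -> list[str]:
    """Build one stable ID per chunk from its source filename and chunk number."""
    ids = [f"{m['source']}::{m['chunk_id']}" for m in chunk_metadatas]
    if len(ids) != len(set(ids)):
        raise ValueError("two chunks share the same (source, chunk_id) pair")
    return ids


metadatas = [
    {"source": "1_Dunning-Kruger_Facts.txt", "chunk_id": 0},
    {"source": "1_Dunning-Kruger_Facts.txt", "chunk_id": 1},
]
print(make_chunk_ids(metadatas))
# ['1_Dunning-Kruger_Facts.txt::0', '1_Dunning-Kruger_Facts.txt::1']
```

With plain ChromaDB, passing such IDs to `collection.upsert(ids=..., documents=..., embeddings=..., metadatas=...)` updates existing entries instead of adding new copies on every re-run.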


In [263]:
# Initialize LLM connection (pre-built)
print("Connecting to Ollama LLM...")
llm = LLM(model="gemma3:1b-it-qat")

# Test the connection
print("\nTesting LLM connection...")
if llm.test_connection():
    print("✓ LLM is working!")
else:
    print("⚠️  LLM connection failed! Make sure the Docker container is running.")

Connecting to Ollama LLM...
✓ LLM initialized: gemma3:1b-it-qat at http://127.0.0.1:11434

Testing LLM connection...
✓ LLM is working!


---
## TODO #4: RAG Query Function

**Your Task:** Write the main RAG function that ties everything together!

**What to do:**
1. Embed the user's question
2. Search the database for similar chunks
3. Build a prompt with the retrieved context
4. Ask the LLM to answer based on the context
5. Return the answer and metadata

**Python concepts:** Functions, string formatting, dictionaries
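To see what steps 1-3 do without the helpers, here is a toy version of retrieval. It replaces the neural embedding model with simple word counts (a bag of words) and the vector database with a list ranked by cosine similarity. `toy_embed`, `cosine` and `toy_retrieve` are illustrative names, not part of `rag_helpers`.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors (0.0 if either is empty)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Steps 1-2: embed the question, rank every chunk by similarity, keep the top k
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "Overprecision means being too certain that your estimates are correct.",
    "The premortem technique imagines a plan has already failed.",
    "Repetition makes statements feel more true.",
]
question = "What does overprecision mean?"
best = toy_retrieve(question, chunks, top_k=1)

# Steps 3-5: join the retrieved chunks into a context and fill the prompt template
context = "\n\n".join(best)
prompt = f"Answer the question based on the context below.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"
print(best[0])
```

Here "mean" and "means" count as different words, so only "overprecision" links the question to the first chunk. A learned sentence embedding can also match paraphrases that share no words at all.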

In [264]:
def rag_query(question: str, top_k: int = TOP_K) -> Dict:
    """
    Answer a question using RAG (Retrieval-Augmented Generation).

    Args:
        question: The user's question
        top_k: How many chunks to retrieve

    Returns:
        Dictionary with:
        - 'question': the original question
        - 'answer': the LLM's answer
        - 'sources': list of source filenames
        - 'contexts': list of retrieved chunks
        - 'time': how long it took
    """

    # Start timer
    timer = Timer()
    timer.start()

    # TODO: Implement the RAG pipeline!
    # HINTS:
    # 1. Embed the question using: embedder.embed_text(question)
    # 2. Search database using: vector_db.search(query_embedding, top_k)
    # 3. Extract retrieved chunks and metadata from search results:
    #    - retrieved_chunks = results['documents'][0]
    #    - retrieved_metadata = results['metadatas'][0]
    # 4. Build context by joining chunks with newlines
    # 5. Create prompt (template below)
    # 6. Generate answer using: llm.generate_answer(prompt)
    # 7. Extract source filenames from metadata
    # 8. Return everything in a dictionary

    # Step 1: Embed question
    query_embedding = embedder.embed_text(question)

    # Step 2: Search database
    results = vector_db.search(query_embedding, top_k=top_k)

    # Step 3: Extract results
    retrieved_chunks = results["documents"][0]
    retrieved_metadata = results["metadatas"][0]

    # Step 4: Build context
    context = "\n\n".join(retrieved_chunks)

    # Step 5: Create prompt (use this template)
    prompt = f"""Answer the question based on the context below.

Context:
{context}

Question: {question}

Answer:"""

    # Step 6: Generate answer
    answer = llm.generate_answer(prompt)

    # Step 7: Extract sources (unique filenames, in retrieval order)
    seen = set()
    sources = []
    for m in retrieved_metadata:
        src = m.get("source", "unknown")
        if src not in seen:
            seen.add(src)
            sources.append(src)

    # Stop timer
    elapsed_time = timer.stop()

    # Step 8: Return results
    return {
        'question': question,
        'answer': answer,
        'sources': sources,
        'contexts': retrieved_chunks,
        'time': elapsed_time
    }


print("✓ RAG query function defined!")

✓ RAG query function defined!
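One possible variation on steps 4-5 (an assumption, not part of the provided template) labels each retrieved chunk with its source and chunk number. It also tells the model what to do when the context lacks the answer. `build_labeled_prompt` is a hypothetical helper that takes the same `retrieved_chunks` and `retrieved_metadata` lists used in `rag_query`.

```python
def build_labeled_prompt(question: str, chunks: list[str], metadatas: list[dict]) -> str:
    """Prefix each retrieved chunk with its source so the answer can be traced back."""
    blocks = [
        f"[{m.get('source', 'unknown')}, chunk {m.get('chunk_id', '?')}]\n{chunk}"
        for chunk, m in zip(chunks, metadatas)
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer the question based only on the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


print(build_labeled_prompt("What is overprecision?",
                           ["Overprecision is excessive certainty."],
                           [{"source": "2_Overconfidence_Types_Moore_Healy.txt", "chunk_id": 0}]))
```

Inside `rag_query`, `prompt = build_labeled_prompt(question, retrieved_chunks, retrieved_metadata)` would replace the f-string template.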


---
## Test Your RAG System!

Let's try asking some questions!

In [265]:
# Test question 1
result = rag_query("What are the attendance rules?")

# Pretty print the result
print_rag_answer(
    result['question'],
    result['answer'],
    result['sources'],
    result['time']
)


QUESTION: What are the attendance rules?

ANSWER:
The provided text does not discuss attendance rules. It focuses on the cognitive blind spot related to overconfidence and the impact of it on judgment.

SOURCES: 6_Premortem_Technique_and_Plan_Confidence.txt, 1_Dunning-Kruger_Facts.txt, 2_Overconfidence_Types_Moore_Healy.txt
TIME: 3.92 seconds


In [266]:
# Try your own question!
my_question = "What happens if you cheat?"  # Change this!

result = rag_query(my_question)
print_rag_answer(
    result['question'],
    result['answer'],
    result['sources'],
    result['time']
)


QUESTION: What happens if you cheat?

ANSWER:
The context doesn’t address what happens if someone cheats. It focuses on the dangers of overconfidence and the different types of overconfidence.

SOURCES: 5_Overconfidence_and_Leadership_Selection_Risks.txt, 2_Overconfidence_Types_Moore_Healy.txt
TIME: 4.03 seconds
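Both off-topic questions still returned sources, because a top-k search always returns the nearest chunks, however distant they are. A distance cutoff is one common remedy. The sketch below assumes the search result is a ChromaDB-style dictionary with a `distances` entry alongside `documents` and `metadatas`. Whether `vector_db.search` includes it is not shown here. The cutoff of 1.0 is an arbitrary placeholder to tune on your own data; smaller distances mean more similar chunks.

```python
def filter_by_distance(results: dict, max_distance: float) -> tuple[list[str], list[dict]]:
    """Keep only retrieved chunks whose distance is at most max_distance.

    Assumes Chroma-style parallel lists nested one level deep (one inner list per query).
    """
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    kept = [(d, m) for d, m, dist in zip(docs, metas, dists) if dist <= max_distance]
    return [d for d, _ in kept], [m for _, m in kept]


fake_results = {
    "documents": [["close chunk", "far chunk"]],
    "metadatas": [[{"source": "a.txt"}, {"source": "b.txt"}]],
    "distances": [[0.4, 1.3]],
}
chunks, metas = filter_by_distance(fake_results, max_distance=1.0)
print(chunks)  # ['close chunk']
```

If nothing survives the cutoff, `rag_query` could skip the LLM call and report that no relevant context was found. That would also give `count_successful_retrievals` something real to measure.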


---
## TODO #5: Create Test Dataset

**Your Task:** Create a list of test questions to evaluate your RAG system.

**What to do:**
1. Think of 10 questions your documents can answer
2. For each question, write the expected answer
3. Store them in a structured format

**Python concepts:** Lists, dictionaries, data structures
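Before running a long evaluation, a quick structural check can catch test items with a missing or empty field. `find_incomplete` and `REQUIRED_KEYS` are hypothetical; adjust the keys to whatever your own records should contain.

```python
REQUIRED_KEYS = ("question", "expected_answer", "category")


def find_incomplete(test_questions: list[dict]) -> list[int]:
    """Return the indices of test items missing any required key, or with an empty value."""
    return [
        i for i, item in enumerate(test_questions)
        if any(not item.get(key) for key in REQUIRED_KEYS)
    ]


sample = [
    {"question": "Q1", "expected_answer": "A1", "category": "factual"},
    {"question": "Q2", "expected_answer": "True"},
]
print(find_incomplete(sample))  # [1]
```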

In [267]:
test_questions = [
    {
        'question': 'What is the Dunning–Kruger Effect?',
        'expected_answer': 'People with low skill overestimate their ability because they lack insight into their own incompetence',
        'category': 'factual'
    },
    {
        'question': 'What are the three main types of overconfidence described by Moore and Healy?',
        'expected_answer': 'Overestimation, overplacement, and overprecision',
        'category': 'factual'
    },
    {
        'question': 'How is overprecision defined in the readings?',
        'expected_answer': 'Being too certain that your beliefs or estimates are correct, with intervals that are too narrow',
        'category': 'factual'
    },
    {
        'question': 'How does overestimation differ from overplacement?',
        'expected_answer': 'Overestimation is overrating your own performance in absolute terms, while overplacement is thinking you rank higher than others',
        'category': 'conceptual'
    },
    {
        'question': 'Why can overconfidence lead to bad decisions according to the texts?',
        'expected_answer': 'It can cause people to take excessive risks, ignore feedback, or fail to prepare because they think they are already right',
        'category': 'explanatory'
    },
    {
        'question': 'What role does feedback play in correcting overconfidence?',
        'expected_answer': 'Accurate, timely feedback can help people recalibrate their beliefs and reduce overconfidence',
        'category': 'inferential'
    },
    {
        'question': 'According to the readings, in what kinds of tasks is overconfidence especially common?',
        'expected_answer': 'Difficult or ambiguous tasks where people cannot easily see their own mistakes',
        'category': 'factual'
    },
    {
        'question': 'How does experience or expertise affect overconfidence?',
        'expected_answer': 'More experience can reduce some forms of overconfidence, but experts can still show overprecision',
        'category': 'inferential'
    },
    {
        'question': 'What is miscalibration in the context of confidence judgments?',
        'expected_answer': 'A mismatch between stated confidence levels and actual accuracy, such as being 90% confident but only right 60% of the time',
        'category': 'factual'
    },
    {
        'question': 'What strategies do the readings suggest for reducing overconfidence?',
        'expected_answer': 'Considering alternative explanations, seeking disconfirming evidence, or using more structured forecasting methods',
        'category': 'application'
    },
    {
        'question': 'True or false, does overprecision mean being too sure that your answer is correct?',
        'expected_answer': 'True',
        'category': 'true_false'
    },
    {
        'question': 'True or false, does hearing the same statement many times make it seem true?',
        'expected_answer': 'True',
        'category': 'true_false'
    },
    {
        'question': 'True or false, people judge ideas only by evidence, not by how confidently they’re presented?',
        'expected_answer': 'False',
        'category': 'true_false'
    },
    {
        'question': 'True or false, high performers are usually the most confident because they fully recognize how much better they are than others?',
        'expected_answer': 'False',
        'category': 'true_false'
    },
    {
        'question': 'Is the following statement true or false: Misjudging someone’s confidence as competence is rare in group settings because people usually focus on evidence over presentation style.',
        'expected_answer': 'False',
        'category': 'true_false'
    },
    {
        'question': 'Select a text source at random and give me its title and a short summary.',
        'expected_answer': 'The Dunning–Kruger Effect — Why Incompetence Feels Like Confidence;',
        'category': 'open_ended'
    }
]

print(f"✓ Created {len(test_questions)} test questions")
print(f"\nExample question:")
print(f"  Q: {test_questions[0]['question']}")
print(f"  Expected: {test_questions[0]['expected_answer']}")


✓ Created 16 test questions

Example question:
  Q: What is the Dunning–Kruger Effect?
  Expected: People with low skill overestimate their ability because they lack insight into their own incompetence


---
## TODO #6: Calculate Evaluation Metrics

**Your Task:** Write functions to measure how well your RAG system performs.

**Python concepts:** Functions, calculations, statistics

In [268]:
    
def calculate_average_latency(results: List[Dict]) -> float:
    """
    Calculate average response time.

    Args:
        results: List of result dictionaries (each has 'time' field)

    Returns:
        Average time in seconds (0.0 if results is empty)
    """
    # TODO: Implement this function!
    # HINTS:
    # 1. Extract all 'time' values from results
    # 2. Sum them up
    # 3. Divide by the number of results
    # 4. Return the average

    # Your code here:
    if not results:
        return 0.0

    total_time = 0.0
    for r in results:
        total_time += r.get('time', 0.0)

    avg_time = total_time / len(results)
    return avg_time



def count_successful_retrievals(results: List[Dict]) -> int:
    """
    Count how many queries successfully retrieved context.

    Args:
        results: List of result dictionaries

    Returns:
        Number of results whose 'contexts' list is non-empty
    """

    # TODO: Implement this function!
    # HINTS:
    # 1. Start with count = 0
    # 2. For each result:
    #    - Check if 'contexts' is not empty
    #    - If yes, increment count
    # 3. Return count

    # Your code here:
    count = 0
    for r in results:
        contexts = r.get('contexts', [])
        if contexts:  # non-empty list means we retrieved something
            count += 1
    return count



def get_all_sources(results: List[Dict]) -> List[str]:
    """
    Get unique list of all sources used.

    Args:
        results: List of result dictionaries

    Returns:
        List of unique source filenames (order is not guaranteed)
    """
    # Collect all sources into a set to drop duplicates
    all_sources = set()

    for r in results:
        sources = r.get('sources', [])
        for s in sources:
            all_sources.add(s)

    return list(all_sources)


print("✓ Evaluation functions defined!")

✓ Evaluation functions defined!
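`count_successful_retrievals` only checks that some context came back. A top-k vector search returns chunks for every query whenever the database is non-empty, so this metric will sit at or near 100% regardless of relevance. Keyword recall gives a rough answer-quality signal: the share of content words in `expected_answer` that also appear in the generated answer. This is a crude, hypothetical metric (`keyword_recall` and its tiny stopword list are illustrative). It ignores negation, so "not true" still matches an expected "True".

```python
import re

# A deliberately tiny stopword list; extend it as needed
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "it", "that"}


def content_words(text: str) -> set[str]:
    """Lowercased words and numbers in `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_recall(answer: str, expected: str) -> float:
    """Fraction of the expected answer's content words found in the answer (0.0 to 1.0)."""
    expected_words = content_words(expected)
    if not expected_words:
        return 0.0
    return len(expected_words & content_words(answer)) / len(expected_words)


print(keyword_recall("Overestimation, overplacement, and overprecision.",
                     "Overestimation, overplacement, and overprecision"))  # 1.0
```

Averaged over a run: `sum(keyword_recall(r['answer'], r['expected_answer']) for r in all_results) / len(all_results)`.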


---
## TODO #7: Run Complete Evaluation

**Your Task:** Test your RAG system with all test questions and calculate metrics.

**Python concepts:** Loops, function calls, data aggregation
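`run_evaluation` makes one LLM call per question and returns only at the end. An exception partway through (for example, the LLM server becoming unreachable) therefore discards every result collected so far. The sketch below records failures and keeps going. `run_evaluation_safely` and the `fake_query` stand-in are hypothetical; in the notebook you would pass `rag_query` as `query_fn`.

```python
from typing import Callable


def run_evaluation_safely(
    test_questions: list[dict], query_fn: Callable[[str], dict]
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Run every question; return (results, failures) instead of stopping at the first error."""
    results, failures = [], []
    for test in test_questions:
        question = test.get("question", "")
        try:
            result = query_fn(question)
        except Exception as exc:  # e.g. a connection error from the LLM
            failures.append((question, repr(exc)))
            continue
        result["expected_answer"] = test.get("expected_answer", "")
        result["category"] = test.get("category", "")
        results.append(result)
    return results, failures


def fake_query(question: str) -> dict:
    # Stand-in for rag_query that fails on demand
    if "fail" in question:
        raise ConnectionError("LLM not reachable")
    return {"question": question, "answer": "ok", "sources": [], "contexts": ["c"], "time": 0.1}


ok, failed = run_evaluation_safely(
    [{"question": "works", "expected_answer": "ok"}, {"question": "fail here"}],
    fake_query,
)
print(len(ok), len(failed))  # 1 1
```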

In [269]:
def run_evaluation(test_questions: List[Dict]) -> List[Dict]:
    """
    Run RAG system on all test questions.

    Args:
        test_questions: List of test question dictionaries

    Returns:
        List of result dictionaries
    """
    results = []

    for test in test_questions:
        # Get the question text
        question = test.get('question', '')

        # Run RAG query
        result = rag_query(question)

        # Attach expected answer and category for reference
        result['expected_answer'] = test.get('expected_answer', '')
        result['category'] = test.get('category', '')

        # Store the result
        results.append(result)

    return results



# Run evaluation on all test questions
print("Running evaluation on all test questions...\n")
all_results = run_evaluation(test_questions)

print(f"\n✓ Completed {len(all_results)} tests")


Running evaluation on all test questions...


✓ Completed 16 tests


---
## Display Results

Show the evaluation metrics and results.

In [270]:
# Calculate metrics using your functions
avg_latency = calculate_average_latency(all_results)
successful_retrievals = count_successful_retrievals(all_results)
all_sources_used = get_all_sources(all_results)
hit_rate = successful_retrievals / len(all_results) if all_results else 0

# Display metrics
print_separator("EVALUATION RESULTS")
print(f"\nTotal Questions Tested: {len(all_results)}")
print(f"Successful Retrievals: {successful_retrievals}")
print(f"Hit Rate: {hit_rate:.2%}")
print(f"Average Latency: {avg_latency:.2f} seconds")
print(f"\nSources Used: {', '.join(all_sources_used)}")
print_separator()


Total Questions Tested: 16
Successful Retrievals: 16
Hit Rate: 100.00%
Average Latency: 4.11 seconds

Sources Used: 4_Dominance_Signals_and_Perceived_Competence.txt, 6_Premortem_Technique_and_Plan_Confidence.txt, 7_Calibration_Training_and_Forecast_Accuracy.txt, 5_Overconfidence_and_Leadership_Selection_Risks.txt, 1_Dunning-Kruger_Facts.txt, 3_Illusory_Truth_Effect_Evidence.txt, 2_Overconfidence_Types_Moore_Healy.txt
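With only a handful of queries, one slow call can pull the mean latency up. Reporting the median and range alongside it takes Python's `statistics` module and a few lines. `latency_summary` is a hypothetical helper, and the three sample times are the first three individual results listed below.

```python
import statistics


def latency_summary(times: list[float]) -> dict[str, float]:
    """Mean, median, min and max of per-query latencies, in seconds."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
    }


print(latency_summary([4.86, 2.93, 3.65]))
```

For a full run: `latency_summary([r['time'] for r in all_results])`.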


In [271]:
# Display individual results
print("\nIndividual Test Results:\n")

for i, result in enumerate(all_results, 1):
    print(f"[Test {i}]")
    print(f"Question: {result['question']}")
    print(f"Answer: {result['answer'][:500]}")
    print(f"Sources: {', '.join(set(result['sources']))}")
    print(f"Time: {result['time']:.2f}s")
    print("-" * 60)
    print()


Individual Test Results:

[Test 1]
Question: What is the Dunning–Kruger Effect?
Answer: The Dunning‚ÄìKruger Effect describes how people with low ability in a skill tend to overestimate their knowledge or performance.
Sources: 1_Dunning-Kruger_Facts.txt
Time: 4.86s
------------------------------------------------------------

[Test 2]
Question: What are the three main types of overconfidence described by Moore and Healy?
Answer: Overestimation, overplacement, and overprecision.
Sources: 2_Overconfidence_Types_Moore_Healy.txt
Time: 2.93s
------------------------------------------------------------

[Test 3]
Question: How is overprecision defined in the readings?
Answer: Overprecision happens when people are too certain about their beliefs or predictions, leaving too little room for error.
Sources: 7_Calibration_Training_and_Forecast_Accuracy.txt, 2_Overconfidence_Types_Moore_Healy.txt
Time: 3.65s
------------------------------------------------------------

[Test 4]
Question: How doe

---
## Save Your Results

Save your test results to a JSON file for your report.

In [272]:
# Save results to JSON file
results_summary = {
    'metrics': {
        'total_questions': len(all_results),
        'successful_retrievals': successful_retrievals,
        'hit_rate': hit_rate,
        'average_latency': avg_latency
    },
    'results': all_results
}

with open('evaluation_results.json', 'w') as f:
    json.dump(results_summary, f, indent=2)

print("✓ Results saved to 'evaluation_results.json'")

✓ Results saved to 'evaluation_results.json'


---
## Congratulations! 🎉

You've successfully built a RAG system!

### What You Accomplished:
✅ Loaded documents from files  
✅ Chunked text with overlap  
✅ Created a RAG query pipeline  
✅ Built a test dataset  
✅ Calculated evaluation metrics  
✅ Generated a results report  

### Next Steps:
- Try different chunk sizes and overlaps
- Add more test questions
- Experiment with different values for `top_k`
- Analyze which questions work best
- Write up your findings in a report
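For the first suggestion, a small grid search keeps the experiments organized. `sweep` and `toy_score` are hypothetical. In practice, `score_fn` would rebuild the chunks and database with the given settings, rerun the evaluation, and return whatever quality number you choose to compare. Pairs where `overlap >= chunk_size` are skipped because the chunker's window would never advance.

```python
from itertools import product
from typing import Callable


def sweep(chunk_sizes: list[int], overlaps: list[int],
          score_fn: Callable[[int, int], float]) -> list[tuple[int, int, float]]:
    """Score every valid (chunk_size, overlap) pair, best first."""
    rows = [
        (size, ov, score_fn(size, ov))
        for size, ov in product(chunk_sizes, overlaps)
        if ov < size  # skip settings where the window would never advance
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def toy_score(size: int, ov: int) -> float:
    # Placeholder only: replace with a real quality measure from a full evaluation run
    return -abs(size - 400) - ov / 100


for row in sweep([200, 400, 800], [0, 50], toy_score):
    print(row)
```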

### For Your Report:
1. Describe your document collection
2. Explain your chunking strategy
3. Present your evaluation metrics
4. Show examples of good and bad answers
5. Discuss what you learned

Great job! 🚀