# RAG Chatbot Evaluation Framework

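This notebook provides an evaluation framework for the Internal Knowledge Base RAG chatbot. It starts with a manual checklist covering retrieval quality, answer quality, source attribution, and edge cases.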

In [4]:
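import json
import sys
from datetime import datetime
from typing import Dict, List

import pandas as pd

# Add the parent directory to the import path so the `src` package resolves.
sys.path.append('..')

# Project modules under evaluation.
from src.retrieval.rag_pipeline import RAGPipeline
from src.retrieval.retriever import Retriever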

## Manual Evaluation Checklist

### 1. Retrieval Quality
- [ ] Retrieved documents are relevant to the query
- [ ] Top-ranked document is the most relevant
- [ ] No relevant documents are missing from the top-k results
- [ ] Retrieved chunks contain sufficient context
- [ ] Source metadata is accurate
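
The first three retrieval checks can be backed by numbers once a reviewer has labeled which documents are relevant to a query. The following is a minimal sketch, not part of the project's code: the function names are hypothetical, and it assumes each retrieved document can be reduced to a string ID, ordered from best to worst.

```python
from typing import List, Set


def precision_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


def recall_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the labeled relevant documents that appear in the top-k."""
    if not relevant_ids:
        return 0.0
    found = set(retrieved_ids[:k]) & relevant_ids
    return len(found) / len(relevant_ids)


def reciprocal_rank(retrieved_ids: List[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0
```

Precision at k speaks to "retrieved documents are relevant", recall at k to "no relevant documents are missing", and a reciprocal rank of 1.0 means the top-ranked document was relevant. None of these replace the manual judgment of whether a chunk carries enough context.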

### 2. Answer Quality
- [ ] Answer is factually correct
- [ ] Answer is based on retrieved context
- [ ] Answer is complete and addresses the question
- [ ] Answer is concise and well-structured
- [ ] No hallucinations or made-up information
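
To prioritize which answers to review first for grounding, a crude lexical overlap score can help. The sketch below is an illustrative heuristic only, with hypothetical function names: it measures what share of the answer's tokens also occur in the retrieved chunks. A low score suggests the answer may not be based on the context; a high score does not prove factual correctness.

```python
import re
from typing import List, Set


def token_set(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_overlap(answer: str, context_chunks: List[str]) -> float:
    """Share of distinct answer tokens that also appear in the retrieved context."""
    answer_tokens = token_set(answer)
    if not answer_tokens:
        return 0.0
    context_tokens: Set[str] = set()
    for chunk in context_chunks:
        context_tokens |= token_set(chunk)
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

Answers with a low overlap are good candidates for a closer manual look at the "based on retrieved context" and "no hallucinations" items.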

### 3. Source Attribution
- [ ] Sources are correctly cited
- [ ] Source relevance scores are reasonable
- [ ] Sources actually support the answer
- [ ] File paths and metadata are correct
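
Two of the attribution checks are mechanical and can be scripted before the manual pass. The sketch below assumes, hypothetically, that each retrieved document is a dict with a `source` path and a numeric `score` where higher means more relevant; the pipeline's real output shape may differ, so adapt the keys accordingly.

```python
from typing import Dict, List


def audit_citations(cited_sources: List[str], retrieved: List[Dict]) -> Dict[str, object]:
    """Flag cited sources that were never retrieved and check score ordering.

    Assumes each retrieved item looks like {"source": str, "score": float}
    (a hypothetical shape) and that higher scores mean higher relevance.
    """
    retrieved_sources = {doc["source"] for doc in retrieved}
    # Citations that do not correspond to any retrieved document.
    unsupported = [src for src in cited_sources if src not in retrieved_sources]
    # Results should arrive sorted from highest to lowest score.
    scores = [doc["score"] for doc in retrieved]
    ranked_by_score = all(a >= b for a, b in zip(scores, scores[1:]))
    return {
        "cited_but_not_retrieved": unsupported,
        "ranked_by_score": ranked_by_score,
    }
```

Whether a cited source actually supports the answer still requires reading the source; this only catches citations that cannot be traced back to the retrieval step.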

### 4. Edge Cases
- [ ] Handles questions with no relevant context
- [ ] Responds appropriately to ambiguous queries
- [ ] Handles multi-part questions
- [ ] Maintains context across conversation turns
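
One way to make the edge-case review repeatable is to keep a small fixed suite of probes and collect the chatbot's responses for manual grading. The sketch below is an assumption-laden illustration: `EdgeCase`, `EDGE_CASES`, and `collect_responses` are hypothetical names, the queries are placeholders to be replaced with ones suited to the knowledge base, and `ask` stands for any callable you write that sends one message to the chatbot and returns its reply as a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EdgeCase:
    category: str            # which checklist item this probe targets
    turns: List[str]         # user messages, sent in order
    expected_behavior: str   # what a reviewer should look for


# Placeholder probes; replace with queries that fit your knowledge base.
EDGE_CASES = [
    EdgeCase("no_relevant_context", ["What is the boiling point of tungsten?"],
             "Says the knowledge base does not cover this"),
    EdgeCase("ambiguous", ["How do I reset it?"],
             "Asks for clarification or states its assumption"),
    EdgeCase("multi_part", ["How do I request VPN access, and who approves it?"],
             "Addresses both parts of the question"),
    EdgeCase("conversation_context", ["How do I request VPN access?", "Who approves it?"],
             "Resolves 'it' to the VPN request from the first turn"),
]


def collect_responses(ask: Callable[[str], str], cases: List[EdgeCase]) -> List[Dict]:
    """Send every turn of every case through `ask` and record the replies."""
    rows = []
    for case in cases:
        responses = [ask(turn) for turn in case.turns]
        rows.append({
            "category": case.category,
            "turns": case.turns,
            "expected_behavior": case.expected_behavior,
            "responses": responses,
            "passed": None,  # filled in by the human reviewer
        })
    return rows
```

For the multi-turn case to be meaningful, `ask` must keep conversation state between calls within a case and reset it between cases; that wiring depends on the pipeline and is left out here. The returned rows can be loaded into a `pandas` DataFrame or written to JSON for review.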