# Week 3 Assignment: RAG-Based Question Answering System with Mistral

**Course:** IST402 - AI Agents & RAG Systems  
**Student:** [Your Name]  
**Date:** [Date]  
**Submission:** [Link to completed notebook]

---

## Assignment Objective

Design and implement a **Retrieval-Augmented Generation (RAG)** system using:
- Mistral-7B-Instruct-v0.3
- FAISS vector database
- Custom business data

---

## Table of Contents

1. [Install Required Libraries](#install-libraries)
2. [Import Libraries](#import-libraries)
3. [Task 1: Create System Prompt](#task-1)
4. [Task 2: Generate Business Database](#task-2)
5. [Task 3: Implement FAISS Vector Database](#task-3)
6. [Task 4: Create Test Questions](#task-4)
7. [Task 5: Test Questions](#task-5)
8. [Task 6: Model Experimentation & Ranking](#task-6)
9. [Reflection & Analysis](#reflection)



## 1. Install Required Libraries {#install-libraries}

Install all necessary packages for the RAG system.


In [None]:
# Install all required libraries for the RAG system
# Each library serves a specific purpose:

%pip install transformers          # For pre-trained AI models (BERT, DistilBERT, Mistral, etc.)
%pip install langchain             # Framework for building applications with language models
%pip install langchain-community   # Community extensions for LangChain
%pip install sentence-transformers # For creating text embeddings (converting text to numbers)
%pip install torch                 # PyTorch - deep learning framework (backend for transformers)
%pip install faiss-cpu            # Facebook AI Similarity Search - for fast similarity searches
%pip install sentencepiece         # Required for Mistral tokenizer
%pip install accelerate            # For efficient model loading and inference
%pip install sentencepiece         # For 


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated pa

## 2. Import Libraries and Setup {#import-libraries}

Import all necessary libraries for building the RAG system.


In [5]:
# Import all the libraries we need for our RAG system

# Import pipeline from transformers - this gives us easy access to pre-trained models
from transformers import pipeline

# Import FAISS for creating a searchable database of text
from langchain_community.vectorstores import FAISS

# Import embeddings to convert text into numerical vectors for similarity search
from langchain_community.embeddings import HuggingFaceEmbeddings

# Import Document class to structure our knowledge data
from langchain_core.documents import Document

# Import Mistral model for generating content
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print("‚úÖ All libraries imported successfully!")


‚úÖ All libraries imported successfully!


## Task 1: Create an Assistant System Prompt {#task-1}

**Objective:** Design a system prompt that gives Mistral-7B-Instruct a specific role and business context.

---

### What is a System Prompt?

A **system prompt** is a set of instructions that define the AI's role, behavior, and context before it processes user inputs. Think of it as giving the AI a "job description" that shapes how it responds.

**Key Characteristics:**
- **Role Definition**: Tells the AI what role it should play (e.g., "You are a marketing expert")
- **Context Setting**: Provides background information about the business/organization
- **Behavior Guidance**: Sets expectations for tone, style, and response format
- **Constraint Setting**: Defines boundaries and limitations

**Example:**
```
"You are a customer service representative for an e-commerce platform. 
You are friendly, professional, and knowledgeable about our products and policies. 
Always provide accurate information based on our company guidelines."
```

**Why System Prompts Matter:**
- They shape the AI's personality and expertise level
- They provide context that persists throughout the conversation
- They help prevent hallucinations by grounding responses in defined roles
- They enable consistent, domain-specific outputs

---

### What is Mistral-7B-Instruct-v0.3?

**Mistral-7B-Instruct-v0.3** is a large language model developed by Mistral AI, specifically optimized for following instructions and generating structured outputs.

**Key Features:**
- **Model Size**: 7 billion parameters (relatively compact but powerful)
- **Type**: Instruction-tuned model (designed to follow prompts and instructions)
- **Open Source**: Available on Hugging Face for free use
- **Capabilities**: 
  - Text generation
  - Question answering
  - Content creation
  - Following complex instructions
  - Generating structured outputs (like Q&A pairs)

**Why Use Mistral-7B-Instruct:**
- **Instruction Following**: Specifically trained to follow system prompts and instructions
- **Quality Output**: Produces coherent, contextually appropriate responses
- **Efficiency**: Smaller than models like GPT-4 but still very capable
- **Accessibility**: Free to use via Hugging Face, no API costs
- **Flexibility**: Can be fine-tuned for specific tasks

**Model Card**: Available at `https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3`

---

### Instructions:

- Use `mistralai/Mistral-7B-Instruct-v0.3` to generate your content
- Define a specific role (e.g., "You are a marketing expert for a tech startup")
- Choose a business/organization context to use throughout the assignment

---

### 1.1: Choose Your Business Context

**My Business Context:** [Describe your chosen business/organization here]


In [None]:
# Define your business context
# Example Use Case: Building an FAQ RAG system for IST402 course

# Simple student-friendly example: IST402 Course FAQ System
BUSINESS_CONTEXT = "IST402 - AI Agents & RAG Systems Course"
BUSINESS_ROLE = "Week 3 Assignment FAQ Assistant and Concept Explainer"

# Alternative examples you can use:
# BUSINESS_CONTEXT = "Tech Startup - AI Consultant"
# BUSINESS_ROLE = "AI Consultant"
# 
# BUSINESS_CONTEXT = "E-commerce Platform"
# BUSINESS_ROLE = "Customer Service Representative"
#
# BUSINESS_CONTEXT = "Healthcare Organization"
# BUSINESS_ROLE = "Medical Information Specialist"

print(f"Business Context: {BUSINESS_CONTEXT}")
print(f"Business Role: {BUSINESS_ROLE}")
print("\nüí° This example will help you build an FAQ system for the IST402 course!")
print("   You can answer questions about assignments, technologies, and course content.")


Business Context: IST402 - AI Agents & RAG Systems Course
Business Role: Week 3 Assignment FAQ Assistant and Concept Explainer

üí° This example will help you build an FAQ system for the IST402 course!
   You can answer questions about assignments, technologies, and course content.


### 1.2: Design System Prompt

Create a system prompt that defines the AI's role and context.


In [None]:
# Design your system prompt
# This prompt will be used to guide Mistral-7B in generating Q&A pairs

# System prompt for IST402 Week 3 Assignment FAQ
SYSTEM_PROMPT = f"""
You are {BUSINESS_ROLE} for {BUSINESS_CONTEXT}.

Your task is to create comprehensive question-answer pairs that would be useful for 
students working on the Week 3 RAG assignment. Focus on questions about:

- Week 3 assignment requirements and instructions
- RAG (Retrieval-Augmented Generation) concepts
- FAISS vector database implementation
- System prompts and their design
- Mistral-7B-Instruct model usage
- Embeddings and vector similarity search
- How to complete specific assignment tasks
- Troubleshooting common issues
- Understanding key technologies (LangChain, FAISS, sentence-transformers)

Guidelines:
- Create clear, specific questions that students might ask
- Provide accurate, detailed answers that help students learn
- Cover different aspects: concepts, implementation, troubleshooting
- Use clear, educational language that explains concepts well
- Make answers practical and actionable for completing the assignment
- Focus on Week 3 assignment-specific content

Format each Q&A pair as:
Q: [Question]
A: [Answer]
"""

print("System Prompt Created:")
print("=" * 50)
print(SYSTEM_PROMPT)
print("=" * 50)


System Prompt Created:

You are Week 3 Assignment FAQ Assistant and Concept Explainer for IST402 - AI Agents & RAG Systems Course.

Your task is to create comprehensive question-answer pairs that would be useful for 
students working on the Week 3 RAG assignment. Focus on questions about:

- Week 3 assignment requirements and instructions
- RAG (Retrieval-Augmented Generation) concepts
- FAISS vector database implementation
- System prompts and their design
- Mistral-7B-Instruct model usage
- Embeddings and vector similarity search
- How to complete specific assignment tasks
- Troubleshooting common issues
- Understanding key technologies (LangChain, FAISS, sentence-transformers)

Guidelines:
- Create clear, specific questions that students might ask
- Provide accurate, detailed answers that help students learn
- Cover different aspects: concepts, implementation, troubleshooting
- Use clear, educational language that explains concepts well
- Make answers practical and actionable for co

## Task 2: Generate Business Database Content {#task-2}

**Objective:** Use Mistral-7B-Instruct to generate 10-15 Q&A pairs for your business context.

**Instructions:**
- Use `mistralai/Mistral-7B-Instruct-v0.3`
- Generate minimum 10-15 question-answer pairs
- Cover different aspects of the business
- **Add clear comments showing your generated Q&A pairs**

---

### 2.1: Load Mistral-7B-Instruct Model

**‚ö†Ô∏è CRITICAL: Before running this cell:**

1. **Run the installation cell (Cell 2) above** to install all packages including `sentencepiece`
2. **RESTART THE KERNEL** (Kernel ‚Üí Restart Kernel)
3. **Then run this cell**

**Why?** The Mistral tokenizer requires `sentencepiece`, and Python needs to reload after installation.

**Note:** Loading the model may take several minutes on first run as it downloads ~14GB of model files.


In [8]:
# Load Mistral-7B-Instruct model for generating Q&A pairs
# Note: This may take a few minutes to download on first run

# IMPORTANT: Check if sentencepiece is installed
# If not installed, we'll try to install it automatically
try:
    import sentencepiece
    print("‚úÖ sentencepiece is installed and ready")
except ImportError:
    print("=" * 70)
    print("‚ö†Ô∏è sentencepiece is NOT installed. Installing now...")
    print("=" * 70)
    
    # Try to install using pip in the notebook
    try:
        # Use get_ipython() to run %pip install (works in Jupyter/Colab)
        try:
            get_ipython().run_line_magic('pip', 'install sentencepiece')
            print("‚úÖ sentencepiece installed!")
        except:
            # Fallback: use subprocess
            import subprocess
            import sys
            subprocess.check_call([sys.executable, "-m", "pip", "install", "sentencepiece"])
            print("‚úÖ sentencepiece installed!")
        
        # Try importing again
        try:
            import sentencepiece
            print("‚úÖ sentencepiece imported successfully!")
        except ImportError:
            print("\n‚ö†Ô∏è WARNING: sentencepiece was installed but cannot be imported yet.")
            print("   This usually means you need to RESTART THE KERNEL.")
            print("\nüìã SOLUTION:")
            print("   1. RESTART THE KERNEL:")
            print("      - Jupyter: Kernel ‚Üí Restart Kernel")
            print("      - VS Code: Click 'Restart' in kernel toolbar")
            print("   2. Run this cell again")
            print("\n" + "=" * 70)
            raise ImportError(
                "sentencepiece installed but requires kernel restart. "
                "Please RESTART THE KERNEL and run this cell again."
            )
    except Exception as e:
        print(f"\n‚ùå Failed to install sentencepiece automatically: {e}")
        print("\nüìã MANUAL INSTALLATION:")
        print("   1. Run Cell 2 (Install Required Libraries) above")
        print("   2. RESTART THE KERNEL")
        print("   3. Run this cell again")
        print("\n" + "=" * 70)
        raise ImportError(
            "Could not install sentencepiece. Please install manually using Cell 2, "
            "then RESTART THE KERNEL."
        )

model_name = "mistralai/Mistral-7B-Instruct-v0.3"

print(f"\nLoading {model_name}...")
print("This may take several minutes on first run (downloading ~14GB)...")

# Load tokenizer - Mistral uses SentencePiece tokenizer
print("\nStep 1: Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)
print("‚úÖ Tokenizer loaded successfully")

# Set padding token if not set (required for batch processing)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    print("‚úÖ Padding token configured")

# Load model
print("\nStep 2: Loading model (this may take a while)...")
try:
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,  # Use float16 for faster inference and less memory
        device_map="auto",  # Automatically use GPU if available
        trust_remote_code=True
    )
    print("‚úÖ Model loaded successfully (float16)")
except Exception as e:
    print(f"‚ö†Ô∏è Error loading with float16: {e}")
    print("Trying with float32 (will use more memory)...")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float32,
        device_map="auto",
        trust_remote_code=True
    )
    print("‚úÖ Model loaded successfully (float32)")

print(f"\n{'='*60}")
print(f"‚úÖ {model_name} fully loaded and ready to use!")
print(f"{'='*60}")
print(f"Model device: {next(model.parameters()).device}")
print(f"Model dtype: {next(model.parameters()).dtype}")
print(f"Tokenizer vocab size: {len(tokenizer)}")


‚úÖ sentencepiece is installed and ready

Loading mistralai/Mistral-7B-Instruct-v0.3...
This may take several minutes on first run (downloading ~14GB)...

Step 1: Loading tokenizer...


`torch_dtype` is deprecated! Use `dtype` instead!


‚úÖ Tokenizer loaded successfully
‚úÖ Padding token configured

Step 2: Loading model (this may take a while)...


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the disk and cpu.


‚úÖ Model loaded successfully (float16)

‚úÖ mistralai/Mistral-7B-Instruct-v0.3 fully loaded and ready to use!
Model device: cpu
Model dtype: torch.float16
Tokenizer vocab size: 32768


### 2.2: Generate Q&A Pairs

Generate 10-15 Q&A pairs using Mistral-7B-Instruct with your system prompt.


In [None]:
# Function to generate Q&A pairs using Mistral-7B-Instruct
def generate_qa_pairs(prompt, num_pairs=15, model=None, tokenizer=None):
    """
    Generate Q&A pairs using Mistral-7B-Instruct
    
    Args:
        prompt: System prompt with business context
        num_pairs: Number of Q&A pairs to generate
        model: The loaded Mistral model (uses global model if None)
        tokenizer: The loaded tokenizer (uses global tokenizer if None)
    
    Returns:
        List of (question, answer) tuples
    """
    import re
    
    # Use global model and tokenizer if not provided
    if model is None:
        model = globals().get('model')
    if tokenizer is None:
        tokenizer = globals().get('tokenizer')
    
    if model is None or tokenizer is None:
        raise ValueError("Model and tokenizer must be loaded first. Run the model loading cell above.")
    
    # Create the generation prompt
    generation_prompt = f"""{prompt}

Please generate exactly {num_pairs} question-answer pairs for this context.

Format each pair as:
Q: [Question]
A: [Answer]

Generate the Q&A pairs now:"""

    # Format the conversation for Mistral using chat template
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": f"Please generate exactly {num_pairs} question-answer pairs. Format each as:\nQ: [Question]\nA: [Answer]"}
    ]
    
    # Apply chat template and convert to tensors
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=False,
        return_tensors="pt"
    ).to(model.device)
    
    # Store input length to extract only new tokens later
    input_length = inputs.shape[1]
    
    print(f"Generating {num_pairs} Q&A pairs...")
    
    # Performance note: Generation speed depends on:
    # - Device (GPU is 10-50x faster than CPU)
    # - Model size (Mistral-7B is large)
    # - Number of tokens to generate
    device_info = "GPU" if next(model.parameters()).is_cuda else "CPU"
    print(f"Running on: {device_info}")
    if device_info == "CPU":
        print("‚ö†Ô∏è Running on CPU - this will be slower. Consider using GPU for faster generation.")
    print("This may take 2-5 minutes on CPU, or 30-60 seconds on GPU...")
    
    # Calculate reasonable max tokens: ~80-100 tokens per Q&A pair
    # This prevents generating too much unnecessary text
    estimated_tokens = num_pairs * 100
    max_tokens = min(estimated_tokens, 1500)  # Cap at 1500 to avoid excessive generation
    
    # Generate text
    with torch.no_grad():  # Disable gradient computation for inference
        outputs = model.generate(
            inputs,
            max_new_tokens=max_tokens,  # Optimized: ~100 tokens per Q&A pair
            temperature=0.7,  # Controls randomness (lower = more focused)
            top_p=0.9,  # Nucleus sampling
            do_sample=True,  # Enable sampling
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Extract only the newly generated tokens (skip the input tokens)
    # outputs[0] contains the full sequence (input + generated), we only want the generated part
    generated_tokens = outputs[0][input_length:]
    
    # Decode only the newly generated text
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    
    print("‚úÖ Text generated, parsing Q&A pairs...")
    
    # Parse Q&A pairs from the generated text
    qa_pairs = []
    
    # Pattern to match Q: ... A: ... format
    pattern = r'Q:\s*(.+?)(?=\nA:|\nQ:|$)'
    answer_pattern = r'A:\s*(.+?)(?=\nQ:|$)'
    
    # Split by Q: markers
    qa_blocks = re.split(r'\n\s*Q:\s*', generated_text, flags=re.IGNORECASE)
    
    for block in qa_blocks:
        if not block.strip():
            continue
            
        # Extract question (first line or until A:)
        question_match = re.match(r'^(.+?)(?=\n\s*A:|\nQ:|$)', block, re.DOTALL)
        if question_match:
            question = question_match.group(1).strip()
            
            # Extract answer (after A:)
            answer_match = re.search(r'\n\s*A:\s*(.+?)(?=\n\s*Q:|$)', block, re.DOTALL)
            if answer_match:
                answer = answer_match.group(1).strip()
                
                # Clean up the question and answer
                question = question.strip().strip('Q:').strip()
                answer = answer.strip().strip('A:').strip()
                
                if question and answer and len(question) > 5 and len(answer) > 10:
                    qa_pairs.append((question, answer))
    
    # If regex parsing didn't work well, try simpler approach
    if len(qa_pairs) < num_pairs // 2:
        print("‚ö†Ô∏è Regex parsing found fewer pairs. Trying alternative parsing...")
        # Alternative: split by lines and look for Q: and A: patterns
        lines = generated_text.split('\n')
        current_q = None
        current_a = None
        
        for line in lines:
            line = line.strip()
            if line.startswith('Q:') or line.startswith('q:'):
                if current_q and current_a:
                    qa_pairs.append((current_q, current_a))
                current_q = line.replace('Q:', '').replace('q:', '').strip()
                current_a = None
            elif line.startswith('A:') or line.startswith('a:'):
                current_a = line.replace('A:', '').replace('a:', '').strip()
            elif current_a:
                current_a += ' ' + line
            elif current_q and not current_a:
                current_q += ' ' + line
        
        if current_q and current_a:
            qa_pairs.append((current_q, current_a))
    
    print(f"‚úÖ Parsed {len(qa_pairs)} Q&A pairs from generated text")
    
    # If we still don't have enough, generate more
    if len(qa_pairs) < num_pairs:
        print(f"‚ö†Ô∏è Only found {len(qa_pairs)} pairs, need {num_pairs}. Generating additional pairs...")
        # Could call recursively or generate more, but for now return what we have
        # In practice, you might want to adjust the prompt or generate in batches
    
    return qa_pairs[:num_pairs]  # Return up to num_pairs

# Generate Q&A pairs using the loaded model and tokenizer
# Make sure you've run the model loading cell (Cell 10) first!
print("=" * 70)
print("GENERATING Q&A PAIRS WITH MISTRAL-7B-INSTRUCT")
print("=" * 70)

qa_pairs = generate_qa_pairs(SYSTEM_PROMPT, num_pairs=15)

print(f"\n‚úÖ Successfully generated {len(qa_pairs)} Q&A pairs")
print("\n" + "=" * 70)
print("GENERATED Q&A PAIRS:")
print("=" * 70)

for i, (q, a) in enumerate(qa_pairs, 1):
    print(f"\n{i}. Q: {q}")
    print(f"   A: {a}")


GENERATING Q&A PAIRS WITH MISTRAL-7B-INSTRUCT


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Generating 15 Q&A pairs...
Running on: CPU
‚ö†Ô∏è Running on CPU - this will be slower. Consider using GPU for faster generation.
This may take 2-5 minutes on CPU, or 30-60 seconds on GPU...


### 2.3: Display All Generated Q&A Pairs

This section displays all the Q&A pairs that were generated using Mistral-7B-Instruct in the previous step. These pairs will be used to build your knowledge base for the RAG system.


In [None]:
# Display all generated Q&A pairs from the previous cell
# The qa_pairs variable was created in Cell 12 (Task 2.2)

# Check if qa_pairs exists (from previous cell)
if 'qa_pairs' not in globals() or not qa_pairs:
    print("‚ö†Ô∏è WARNING: No Q&A pairs found!")
    print("Please run Cell 12 (Task 2.2) first to generate Q&A pairs.")
    print("Creating empty list for now...")
    faq_data = []
else:
    # Use the generated Q&A pairs
    faq_data = qa_pairs.copy()
    print("=" * 70)
    print(f"GENERATED Q&A DATABASE FOR {BUSINESS_CONTEXT}")
    print("=" * 70)
    print(f"\n‚úÖ Q&A Database created with {len(faq_data)} pairs")
    print(f"Business Role: {BUSINESS_ROLE}")
    print("\n" + "=" * 70)
    print("ALL GENERATED Q&A PAIRS:")
    print("=" * 70)
    
    # Display all Q&A pairs in a clear format
    for i, (q, a) in enumerate(faq_data, 1):
        print(f"\n{i}. Q: {q}")
        print(f"   A: {a}")
        print("-" * 70)
    
    print(f"\n‚úÖ Total: {len(faq_data)} Q&A pairs ready for vector database")
    print("=" * 70)


## Task 3: Implement FAISS Vector Database {#task-3}

**Objective:** Convert Q&A pairs into embeddings and store in FAISS index.

**Instructions:**
- Convert Q&A pairs to embeddings
- Store in FAISS index
- **Use comments to demonstrate the implementation process**

---

### 3.1: Convert Q&A Pairs to LangChain Documents


In [None]:
# Convert Q&A pairs into LangChain Document objects
# Each document contains both question and answer as searchable content

# Combine question and answer for each pair to create comprehensive documents
documents = [Document(page_content=qa[0] + " " + qa[1]) for qa in faq_data]

print(f"‚úÖ Created {len(documents)} LangChain documents")
print(f"\nSample document:")
print(f"Content: {documents[0].page_content[:100]}...")


### 3.2: Create Embeddings Model


In [None]:
# Create embeddings model to convert text into numerical vectors
# We use a pre-trained model that's good at understanding sentence meanings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

print("‚úÖ Embeddings model loaded")
print(f"Model: sentence-transformers/all-MiniLM-L6-v2")
print(f"Vector dimensions: 384")

# Optional: Test embedding generation
sample_text = "What is your return policy?"
sample_embedding = embeddings.embed_query(sample_text)
print(f"\nSample embedding shape: {len(sample_embedding)} dimensions")


### 3.3: Build FAISS Vector Database

**Implementation Process:**
1. Convert all documents to embeddings
2. Create FAISS index for efficient similarity search
3. Store the index for fast retrieval


In [None]:
# Build FAISS vector database from documents
# This creates an optimized index for fast similarity search

# Step 1: Convert documents to embeddings and create FAISS index
db = FAISS.from_documents(documents, embeddings)

print("‚úÖ FAISS vector database created successfully!")
print(f"Number of documents indexed: {len(documents)}")
print(f"Index type: FAISS")

# Test similarity search
test_query = "What is your return policy?"
test_results = db.similarity_search(test_query, k=2)
print(f"\nTest query: '{test_query}'")
print(f"Retrieved {len(test_results)} similar documents")
print(f"\nMost similar document:")
print(f"Content: {test_results[0].page_content[:200]}...")


## Task 4: Create Test Questions {#task-4}

**Objective:** Generate two types of questions using Mistral-7B-Instruct:
- **Answerable questions** (5+): Can be answered from your database
- **Unanswerable questions** (5+): Require information not in your database

---

### 4.1: Generate Answerable Questions

Questions that can be directly answered from your Q&A database.


In [None]:
# TODO: Generate 5+ answerable questions using Mistral-7B-Instruct
# These questions should be answerable from your Q&A database

# Example: Use Mistral to generate questions based on your database topics
answerable_questions = [
    # TODO: Add your generated answerable questions here
    # Example: "What is your return policy?",
    # Example: "Do you ship internationally?",
]

print(f"‚úÖ Generated {len(answerable_questions)} answerable questions")
print("\nAnswerable Questions:")
for i, q in enumerate(answerable_questions, 1):
    print(f"{i}. {q}")


In [None]:
# TODO: Generate 5+ unanswerable questions using Mistral-7B-Instruct
# These questions should NOT be answerable from your Q&A database
# They test whether the system correctly identifies its limitations

unanswerable_questions = [
    # TODO: Add your generated unanswerable questions here
    # Example: "What is your company's stock price?",
    # Example: "Do you offer services in Antarctica?",
]

print(f"‚úÖ Generated {len(unanswerable_questions)} unanswerable questions")
print("\nUnanswerable Questions:")
for i, q in enumerate(unanswerable_questions, 1):
    print(f"{i}. {q}")


## Task 5: Implement and Test Questions {#task-5}

**Objective:** Run both question types through your RAG system and analyze results.

**Instructions:**
- Test answerable questions (should get good answers)
- Test unanswerable questions (should get "I don't know" or low confidence)
- **Use clear comments to differentiate between question types**

---

### 5.1: Load QA Model

Load a question-answering model to test the RAG system.


In [None]:
# Load a pre-trained question-answering model
# We'll start with DistilBERT as a baseline, then test other models in Task 6

qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

print("‚úÖ QA model loaded successfully!")
print("Model: distilbert-base-uncased-distilled-squad")


### 5.2: Implement RAG Pipeline Function

Create a function that implements the complete RAG pipeline:
1. Retrieve relevant context from FAISS
2. Augment query with context
3. Generate answer using QA model
4. Apply confidence threshold


In [None]:
def rag_qa_system(question, db, qa_pipeline, k=2, confidence_threshold=0.2):
    """
    Complete RAG pipeline for question answering
    
    Args:
        question: User's question
        db: FAISS vector database
        qa_pipeline: Question-answering model pipeline
        k: Number of documents to retrieve
        confidence_threshold: Minimum confidence score for accepting answer
    
    Returns:
        dict with 'answer', 'confidence', 'context_retrieved', 'is_answerable'
    """
    # STEP 1: RETRIEVE - Find relevant documents from FAISS
    docs = db.similarity_search(question, k=k)
    
    # STEP 2: AUGMENT - Combine retrieved documents into context
    context = " ".join([d.page_content for d in docs])
    
    # STEP 3: GENERATE - Use QA model to generate answer
    result = qa_pipeline({"question": question, "context": context})
    
    # STEP 4: EVALUATE - Check confidence and apply threshold
    answer = result["answer"] if result.get("score", 0) > confidence_threshold else "I don't know."
    confidence = result.get("score", 0)
    
    return {
        "answer": answer,
        "confidence": confidence,
        "context_retrieved": context[:200] + "..." if len(context) > 200 else context,
        "is_answerable": confidence > confidence_threshold
    }

print("‚úÖ RAG pipeline function created")


### 5.3: Test Answerable Questions

**Expected Result:** System should provide accurate answers with good confidence scores.


In [None]:
# Test answerable questions
# These should retrieve relevant context and provide good answers

print("=" * 70)
print("TESTING ANSWERABLE QUESTIONS")
print("=" * 70)
print("Expected: Good answers with high confidence scores\n")

for i, question in enumerate(answerable_questions, 1):
    print(f"\n{'='*70}")
    print(f"Question {i}: {question}")
    print(f"{'='*70}")
    
    result = rag_qa_system(question, db, qa_pipeline)
    
    print(f"Answer: {result['answer']}")
    print(f"Confidence: {result['confidence']:.3f}")
    print(f"Retrieved Context: {result['context_retrieved']}")
    print(f"Status: {'‚úÖ Answerable' if result['is_answerable'] else '‚ùå Low Confidence'}")


### 5.4: Test Unanswerable Questions

**Expected Result:** System should identify limitations and respond with "I don't know" or low confidence.


In [None]:
# Test unanswerable questions
# These should NOT find relevant context and should respond appropriately

print("=" * 70)
print("TESTING UNANSWERABLE QUESTIONS")
print("=" * 70)
print("Expected: 'I don't know' or low confidence scores\n")

for i, question in enumerate(unanswerable_questions, 1):
    print(f"\n{'='*70}")
    print(f"Question {i}: {question}")
    print(f"{'='*70}")
    
    result = rag_qa_system(question, db, qa_pipeline)
    
    print(f"Answer: {result['answer']}")
    print(f"Confidence: {result['confidence']:.3f}")
    print(f"Retrieved Context: {result['context_retrieved']}")
    print(f"Status: {'‚ö†Ô∏è Attempted Answer (Low Confidence)' if not result['is_answerable'] else '‚ùå Should be unanswerable'}")


## Task 6: Model Experimentation & Ranking {#task-6}

**Objective:** Test 6 different QA models and rank them by performance.

**Required Models:**
1. `consciousAI/question-answering-generative-t5-v1-base-s-q-c`
2. `deepset/roberta-base-squad2`
3. `google-bert/bert-large-cased-whole-word-masking-finetuned-squad`
4. `gasolsun/DynamicRAG-8B`
5. **[Your Choice 1]**
6. **[Your Choice 2]**

**Evaluation Criteria:**
- Accuracy on answerable questions
- Appropriate handling of unanswerable questions
- Response quality
- Speed (latency)
- Robustness

---

### 6.1: Define Models to Test


In [None]:
# Define all models to test
models_to_test = {
    "Model 1 - T5 Generative": "consciousAI/question-answering-generative-t5-v1-base-s-q-c",
    "Model 2 - RoBERTa": "deepset/roberta-base-squad2",
    "Model 3 - BERT Large": "google-bert/bert-large-cased-whole-word-masking-finetuned-squad",
    "Model 4 - DynamicRAG": "gasolsun/DynamicRAG-8B",
    "Model 5 - [Your Choice 1]": "[YOUR_MODEL_1_HERE]",  # TODO: Replace with your choice
    "Model 6 - [Your Choice 2]": "[YOUR_MODEL_2_HERE]",  # TODO: Replace with your choice
}

print("Models to test:")
for name, model in models_to_test.items():
    print(f"  - {name}: {model}")


### 6.2: Test Each Model

Test all models on both answerable and unanswerable questions.


In [None]:
import time
import pandas as pd

# Store results for all models
all_results = []

# Test each model
for model_name, model_path in models_to_test.items():
    print(f"\n{'='*70}")
    print(f"Testing: {model_name}")
    print(f"Model: {model_path}")
    print(f"{'='*70}")
    
    try:
        # Load model
        qa_pipeline = pipeline("question-answering", model=model_path)
        
        # Test on answerable questions
        answerable_results = []
        for q in answerable_questions:
            start_time = time.time()
            result = rag_qa_system(q, db, qa_pipeline)
            elapsed = time.time() - start_time
            
            answerable_results.append({
                "question": q,
                "answer": result["answer"],
                "confidence": result["confidence"],
                "time": elapsed,
                "type": "answerable"
            })
        
        # Test on unanswerable questions
        unanswerable_results = []
        for q in unanswerable_questions:
            start_time = time.time()
            result = rag_qa_system(q, db, qa_pipeline)
            elapsed = time.time() - start_time
            
            unanswerable_results.append({
                "question": q,
                "answer": result["answer"],
                "confidence": result["confidence"],
                "time": elapsed,
                "type": "unanswerable"
            })
        
        # Calculate metrics
        avg_confidence_answerable = sum(r["confidence"] for r in answerable_results) / len(answerable_results)
        avg_confidence_unanswerable = sum(r["confidence"] for r in unanswerable_results) / len(unanswerable_results)
        avg_time = sum(r["time"] for r in answerable_results + unanswerable_results) / (len(answerable_results) + len(unanswerable_results))
        
        all_results.append({
            "model_name": model_name,
            "model_path": model_path,
            "avg_confidence_answerable": avg_confidence_answerable,
            "avg_confidence_unanswerable": avg_confidence_unanswerable,
            "avg_time": avg_time,
            "answerable_results": answerable_results,
            "unanswerable_results": unanswerable_results
        })
        
        print(f"‚úÖ Completed testing {model_name}")
        print(f"   Avg Confidence (Answerable): {avg_confidence_answerable:.3f}")
        print(f"   Avg Confidence (Unanswerable): {avg_confidence_unanswerable:.3f}")
        print(f"   Avg Time: {avg_time:.3f}s")
        
    except Exception as e:
        print(f"‚ùå Error testing {model_name}: {str(e)}")
        continue

print(f"\n‚úÖ Completed testing {len(all_results)} models")


In [None]:
# Create comparison DataFrame
comparison_data = []
for result in all_results:
    comparison_data.append({
        "Model": result["model_name"],
        "Avg Confidence (Answerable)": result["avg_confidence_answerable"],
        "Avg Confidence (Unanswerable)": result["avg_confidence_unanswerable"],
        "Avg Time (seconds)": result["avg_time"],
        "Confidence Gap": result["avg_confidence_answerable"] - result["avg_confidence_unanswerable"]
    })

df_comparison = pd.DataFrame(comparison_data)

# Sort by overall performance (you can adjust sorting criteria)
df_comparison = df_comparison.sort_values("Avg Confidence (Answerable)", ascending=False)

print("=" * 70)
print("MODEL COMPARISON TABLE")
print("=" * 70)
print(df_comparison.to_string(index=False))


### 6.4: Rank Models and Provide Justification

Rank models from best to worst and explain your reasoning.


In [None]:
# TODO: Rank models and provide justification
# Consider: Accuracy, Speed, Confidence Handling, Response Quality, Robustness

print("=" * 70)
print("MODEL RANKING (Best to Worst)")
print("=" * 70)

# Example ranking structure (customize based on your results)
rankings = [
    {
        "Rank": 1,
        "Model": "[Best Model Name]",
        "Justification": "[Explain why this model performed best]"
    },
    # TODO: Add rankings for all 6 models
]

for ranking in rankings:
    print(f"\n{ranking['Rank']}. {ranking['Model']}")
    print(f"   Justification: {ranking['Justification']}")

# Display detailed analysis
print("\n" + "=" * 70)
print("DETAILED ANALYSIS")
print("=" * 70)

# TODO: Add your detailed analysis here
# - Which models provide confidence scores?
# - Which models handle unanswerable questions best?
# - Speed vs. accuracy trade-offs
# - Recommendations for different use cases


## Reflection & Analysis {#reflection}

**Objective:** Reflect on the assignment, analyze strengths/weaknesses, and discuss real-world applications.

---

### Reflection Questions

1. **What worked well?**
2. **What were the main challenges?**
3. **How could the system be improved?**
4. **What are the real-world applications?**
5. **What did you learn?**


### Strengths of the System

**TODO:** Document the strengths of your RAG system implementation.

- 
- 
- 


### Weaknesses and Limitations

**TODO:** Document the weaknesses and limitations you identified.

- 
- 
- 


### Real-World Applications

**TODO:** Discuss how this RAG system could be used in real-world scenarios.

- 
- 
- 


### Key Learnings

**TODO:** Summarize what you learned from this assignment.

- 
- 
- 


---

## Assignment Complete! ‚úÖ

**Submission Checklist:**
- [ ] All 6 tasks completed
- [ ] 10-15 Q&A pairs generated and documented
- [ ] FAISS vector database implemented
- [ ] 5+ answerable and 5+ unanswerable questions created
- [ ] All 6 models tested and compared
- [ ] Models ranked with justifications
- [ ] Reflection completed
- [ ] Code is well-commented
- [ ] Notebook is well-formatted and organized

**Next Steps:**
1. Review your notebook for completeness
2. Ensure all code runs without errors
3. Add any additional analysis or insights
4. Submit the link to your completed notebook

---

**Good luck with your submission!** üöÄ


### 4.2: Generate Unanswerable Questions

Questions that require information NOT present in your database.
