<a href="https://colab.research.google.com/github/oviya-raja/ist-402-assignments/blob/main/assignments/W3/exercises/W3__Prompt_Engineering_Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prompt Engineering Basics

**IST402 - AI Agents & RAG Systems**

---

## 📋 What You Need

- **HuggingFace Token**: Get from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- **Google Colab** (recommended) or local Python environment

---

## 🚀 Quick Start

### Step 1: Open in Colab
[Click here to open in Colab](https://colab.research.google.com/github/oviya-raja/ist-402-assignments/blob/main/assignments/W3/exercises/W3__Prompt_Engineering_Basics.ipynb)

**Or manually:**
1. Go to [Google Colab](https://colab.research.google.com/)
2. **File** → **Open notebook** → **GitHub** tab
3. Enter: `oviya-raja/ist-402-assignments`
4. Navigate to: `assignments/W3/exercises/W3__Prompt_Engineering_Basics.ipynb`

### Step 2: Enable GPU (Recommended)
1. **Runtime** → **Change runtime type** → Select **GPU** → **Save**
2. **Runtime** → **Restart runtime**

### Step 3: Set Up Token
**In Colab:**
1. Add your token as a Colab secret named `HUGGINGFACE_HUB_TOKEN` (or set it as an environment variable)
2. Run **Cell 3** (token setup cell), which reads the secret with `userdata.get('HUGGINGFACE_HUB_TOKEN')`

**Locally:**
1. Create `.env` file: `HUGGINGFACE_HUB_TOKEN=your_token_here`
2. Run the token setup cell

---

## ▶️ Getting Started

1. Run **Cell 1**: Check environment
2. Run **Cell 2**: Install packages
3. Run **Cell 3**: Set up token
4. Continue with remaining cells in order

---

## 📖 What You'll Learn

- **Prompt Engineering**: Creating effective system prompts and user messages
- **Pipeline vs Direct Model**: Two ways to interact with AI models
- **Device Optimization**: Automatic CPU/GPU configuration
- **Class Exercises**: Build business-specific AI assistants

---

**Ready? Start with Cell 1! 🎉**


In [1]:
# Google Colab Setup Verification
# Run this cell FIRST to check if everything is set up correctly

import sys
print("🔍 Checking Google Colab environment...")
print(f"   Python version: {sys.version.split()[0]}")

# Check if running in Colab
try:
    import google.colab
    IN_COLAB = True
    print("   ✅ Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("   ⚠️  Not running in Google Colab (local environment)")

# Check GPU availability
try:
    import torch
    if torch.cuda.is_available():
        print(f"   ✅ GPU Available: {torch.cuda.get_device_name(0)}")
        print(f"   ✅ CUDA Version: {torch.version.cuda}")
    else:
        print("   ⚠️  GPU NOT detected")
        if IN_COLAB:
            print("   💡 TIP: Go to Runtime → Change runtime type → Select GPU → Save")
            print("   💡 Then: Runtime → Restart runtime")
except ImportError:
    print("   ⚠️  PyTorch not installed yet (will be installed in next cell)")

print("\n📋 Next Steps:")
print("   1. If GPU not detected in Colab: Enable GPU runtime and restart")
print("   2. Run Cell 2: Install packages")
print("   3. Run Cell 3: Set up Hugging Face token")
print("   4. Continue with remaining cells")


🔍 Checking Google Colab environment...
   Python version: 3.12.12
   ✅ Running in Google Colab
   ✅ GPU Available: NVIDIA A100-SXM4-80GB
   ✅ CUDA Version: 12.6

📋 Next Steps:
   1. If GPU not detected in Colab: Enable GPU runtime and restart
   2. Run Cell 2: Install packages
   3. Run Cell 3: Set up Hugging Face token
   4. Continue with remaining cells


In [2]:
# Install required packages - run this cell after Cell 1

# Core packages (always needed)
%pip install transformers torch sentence-transformers datasets python-dotenv

# FAISS (CPU build) for the vector search used in the class exercises
%pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.7 kB)
Downloading faiss_cpu-1.13.0-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.6 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.0


In [3]:
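# Load the token from Colab secrets, or from a local .env file when not in Colab
import os

try:
    from google.colab import userdata
    hf_token = userdata.get('HUGGINGFACE_HUB_TOKEN')
except ImportError:
    from dotenv import load_dotenv
    load_dotenv()  # reads HUGGINGFACE_HUB_TOKEN from .env into the environment
    hf_token = os.environ['HUGGINGFACE_HUB_TOKEN']

print("✅ Hugging Face token loaded successfully!")
print(f"   Token preview: {hf_token[:10]}...{hf_token[-4:] if len(hf_token) > 14 else '****'}")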


✅ Hugging Face token loaded successfully!
   Token preview: hf_ThdSIol...ustv


## 📖 Learning: Two Ways to Use AI Models

As a student, it's important to understand both approaches:

---

### 📦 METHOD 1: Pipeline Approach (The Easy Way)

**Think of it like: Using a vending machine**
- You put in your request (message)
- The machine does everything automatically
- You get your result (response)

**✅ Pros:** Simple, fast to code, less error-prone  
**❌ Cons:** Less control, can't customize much  
**🎯 Best for:** Learning, quick tests, simple projects

---

### 🔧 METHOD 2: Direct Model Approach (The Detailed Way)

**Think of it like: Cooking from scratch**
- You prepare ingredients (tokenize text)
- You cook step by step (run model)
- You plate the food (decode response)

**✅ Pros:** Full control, can customize everything  
**❌ Cons:** More code, more things that can go wrong  
**🎯 Best for:** Advanced projects, research, custom needs

---

### 💡 Key Takeaway:

**Pipeline** = Easy but less control (like using a library function)  
**Direct** = More work but full control (like writing your own function)

---

**In the cells below, we'll demonstrate both approaches so you can see the difference!**


In [4]:
# Import libraries we need
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
import torch
import json
import numpy as np
import faiss
import time

print("All libraries imported successfully!")

All libraries imported successfully!


---

## EXAMPLE 1: Pipeline Approach (The Easy Way)

In this example, we'll use the **Pipeline** method - the simplest way to interact with AI models.

**What you'll see:**
- We set up prompts (system role + user question)
- The pipeline handles everything automatically
- We get a clean response

**📝 Prompts we're using:**
- **System Prompt:** "You are Tom and I am Jerry" (sets the AI's role)
- **User Prompt:** "Who are you?" (the question we're asking)

**💡 Remember:** Pipeline does all the work automatically - tokenization, model inference, and decoding!


---

## EXAMPLE 2: Direct Model Approach (The Detailed Way)

In this example, we'll use the **Direct Model** method - where we control each step manually.

**What you'll see:**
- Step 1: Convert text to tokens (numbers)
- Step 2: Run the model to generate tokens
- Step 3: Convert tokens back to readable text

**📝 Prompt we're using:**
- **User Prompt:** "What's the weather like in Paris?" (no system prompt in this example)

**💡 Remember:** We're doing each step manually for full control!
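
The first and last of those steps can be pictured with a toy word-level tokenizer. This is only an illustration: `VOCAB`, `encode`, and `decode` are hypothetical names, and Mistral's real tokenizer works on subword pieces and special tokens rather than whole words. Step 2 (generation) is left out because it needs the real model.

```python
# Toy illustration of tokenize -> decode.
# VOCAB, encode, and decode are hypothetical; they are not part of transformers.
VOCAB = ["<unk>", "what's", "the", "weather", "like", "in", "paris", "?"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}


def encode(text: str) -> list[int]:
    """Step 1: map each lowercase word (and '?') to an integer ID."""
    words = text.lower().replace("?", " ?").split()
    return [TOKEN_TO_ID.get(word, 0) for word in words]  # 0 = unknown word


def decode(ids: list[int]) -> str:
    """Step 3: map IDs back to words and join them with spaces."""
    return " ".join(VOCAB[i] for i in ids)


prompt_ids = encode("What's the weather like in Paris?")
print(prompt_ids)          # [1, 2, 3, 4, 5, 6, 7]
print(decode(prompt_ids))  # what's the weather like in paris ?
```

The model only ever sees the integer IDs, which is why the raw output of `model.generate` is a tensor of numbers until it is decoded.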


In [5]:
# Automatically detect and configure device (CPU or GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cuda


---

## ✅ Summary: What You Learned

Congratulations! You've now seen both approaches to using AI models:

### 📦 Pipeline Approach (Example 1)
- **What it does:** Everything automatically in one function call
- **When to use:** Quick prototyping, learning, simple projects
- **Key takeaway:** Simple but less control

### 🔧 Direct Model Approach (Example 2)
- **What it does:** Manual control over each step (tokenize → generate → decode)
- **When to use:** Advanced projects, custom needs, research
- **Key takeaway:** More work but full control

### 💡 Remember:
- Both approaches use the **same model** - just different ways to interact with it
- Pipeline = Easy but less control (like using a library function)
- Direct = More work but full control (like writing your own function)

**Now you're ready to try the class exercises below!** 🎉


In [6]:
# Specify which Mistral model to use from Hugging Face
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

# ⚠️ PERFORMANCE INFO:
# Mistral-7B is a LARGE model (7 billion parameters, ~14GB)
# Settings are automatically optimized based on device (CPU/GPU) detected above
# The code automatically switches between CPU and GPU optimizations

print(f"\n⏳ Loading Mistral-7B model...")
if device == "cpu":
    print(f"   ⏱️  Expected load time: 5-15 minutes")
    print(f"   ⏱️  Expected generation: 30-60 seconds per response")

    device_info = "Intel/AMD CPU"
    torch_dtype = torch.float32     # safest for CPUs

    max_new_tokens = 256            # reduce memory usage on CPU

else:  # GPU
    print(f"   ⏱️  Expected load time: 1-2 minutes")
    print(f"   ⏱️  Expected generation: 2-5 seconds per response")

    device_info = torch.cuda.get_device_name(0)
    torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

    max_new_tokens = 512

print(f"   Device: {device} ({device_info})")
print(f"   Torch: {device} ({torch_dtype})")
print(f"   📦 Model size: ~14GB (will download on first run)")

# Create a conversation with system prompt and user message
# System prompt defines the AI's role/personality
# User message is what the person is asking
messages = [
    {"role": "system", "content": "You are Tom and I am Jerry"},
    {"role": "user", "content": "Who are you?"},
]


# Set up the text generation pipeline with device-optimized parameters
# Settings automatically adapt based on the device (CPU/GPU) detected in Cell 5
chatbot = pipeline(
    "text-generation",                              # Task type: generating text
    model=model_id,                                 # Which model to use
    token=hf_token,                                 # Authentication token for Hugging Face
    dtype=torch_dtype,                              # Set above: bfloat16 or float16 (GPU), float32 (CPU)
    device_map="auto",                              # Automatically use GPU if available
    max_new_tokens=max_new_tokens,                  # Set above: 512 (GPU) or 256 (CPU)
    do_sample=True,                                 # Use random sampling for more creative responses
    top_k=10,                                       # Sample only from the 10 most likely next tokens
    num_return_sequences=1,                         # Generate only 1 response
    eos_token_id=2,                                 # Token ID that signals end of response
)


print("\n✅ Model loaded! Generating response...")
if device == "cpu":
    print("   ⏱️  This may take 30-60 seconds on CPU...")
else:
    print("   ⏱️  This should take 2-5 seconds on GPU...")

# Generate response using the pipeline and print the result
import time
start_time = time.time()
result = chatbot(messages)
generation_time = time.time() - start_time

print(f"\n✅ Response generated in {generation_time:.2f} seconds")
print("\n" + "="*60)
print(result)
print("="*60)


⏳ Loading Mistral-7B model...
   ⏱️  Expected load time: 1-2 minutes
   ⏱️  Expected generation: 2-5 seconds per response
   Device: cuda (NVIDIA A100-SXM4-80GB)
   Torch: cuda (torch.bfloat16)
   📦 Model size: ~14GB (will download on first run)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



✅ Model loaded! Generating response...
   ⏱️  This should take 2-5 seconds on GPU...

✅ Response generated in 3.45 seconds

[{'generated_text': [{'role': 'system', 'content': 'You are Tom and I am Jerry'}, {'role': 'user', 'content': 'Who are you?'}, {'role': 'assistant', 'content': " In this scenario, I am the assistant, and you are the user. I don't have a personality or identity like characters from cartoons. I'm here to help answer questions, provide information, and engage in conversation with you."}]}]


In [7]:
# Generate the response and store the full result
result = chatbot(messages)

# Extract just the assistant's response from the complex output structure
# result[0] gets the first (and only) generated sequence
# ["generated_text"] gets the conversation history with the new response
# [-1] gets the last message in the conversation (the assistant's reply)
# ["content"] gets just the text content without the role information
assistant_reply = result[0]["generated_text"][-1]["content"]

# Print only the clean assistant response (without all the extra structure)
print(assistant_reply)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 In this context, I am a helpful and friendly AI assistant, not Tom or Jerry. However, I can certainly play along with our cartoon characters for a bit of fun! So, let's say I am your friendly, intelligent, and somewhat mischievous assistant, Tom. How may I assist you today, Jerry? 😉
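
The indexing chain used in Cell 7 can be wrapped in a small helper so later cells do not repeat it. The sketch below is one possible shape, not part of the transformers API: `extract_reply` is a hypothetical name, and `sample_result` is a hand-written stand-in with the same structure as the pipeline output printed earlier (content shortened).

```python
def extract_reply(result: list[dict]) -> str:
    """Return the last assistant message from a chat-style pipeline result."""
    conversation = result[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            # The raw reply starts with a space, so strip surrounding whitespace.
            return message["content"].strip()
    raise ValueError("No assistant message found in the pipeline result")


# Hand-written stand-in with the same shape as the real pipeline output.
sample_result = [{
    "generated_text": [
        {"role": "system", "content": "You are Tom and I am Jerry"},
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": " In this scenario, I am the assistant."},
    ]
}]

print(extract_reply(sample_result))  # In this scenario, I am the assistant.
```

Searching for the assistant role, rather than always taking `[-1]`, is a small safety margin in case the conversation list ever ends with a different role.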


In [8]:
# Load the tokenizer (converts text to numbers that the model understands)
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)

# Load the actual model with device-optimized settings
# torch_dtype was set in Cell 6: bfloat16 or float16 (GPU), float32 (CPU)
model = AutoModelForCausalLM.from_pretrained(
    model_id,                    # Which model to load
    token=hf_token,             # Authentication token
    dtype=torch_dtype,          # Precision chosen for the detected device
    device_map="auto"           # Automatically use GPU if available
)

# Create a simple conversation (just user input, no system prompt this time)
conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]

# Convert the conversation into the format the model expects
# This applies the model's chat template and converts to tensors
inputs = tokenizer.apply_chat_template(
    conversation,                # The conversation to format
    add_generation_prompt=True,  # Add prompt to signal the model should respond
    return_dict=True,           # Return as dictionary
    return_tensors="pt",        # Return as PyTorch tensors
).to(model.device)             # Move to same device as model (GPU/CPU)

# Generate the response using the model directly
outputs = model.generate(
    **inputs,                           # Pass all the formatted inputs
    max_new_tokens=1000,               # Maximum length of response
    pad_token_id=tokenizer.eos_token_id # Token to use for padding
)


In [None]:
# Print the raw model output tensor (this shows token IDs/numbers, not readable text yet)
print(outputs)

tensor([[    1,     3,  2592, 29510, 29481,  1040,  8854,  1505,  1065,  6233,
         29572,     4,  1083,  1717, 29510, 29475,  1274,  2121, 29501,  2304,
         17353, 29493,  1347,  1083,  1309, 29510, 29475,  3852,  1040,  2636,
          8854,  1065,  6233, 29491,  3761, 29493,  1083,  1309,  2680,  1136,
          1137,  6233, 29493,  1505,  1956,  1070, 13495,  5611, 29493,  1427,
          1032,  5794,  1148, 14761,  1062, 12027, 29491,  1183,  8854,  1117,
         17351,  1163,  5160, 28408,  5942,  6241,  1040,  1647, 29491,  1183,
          6868,  1142,  4138,  1228,  4980, 29493,  5166, 29493,  1072,  4396,
         29493,  1163, 18759, 14131,  4822,  2169, 29473, 29518, 29502, 29501,
         29518, 29550, 29670, 29511,  1093, 29552, 29551, 29501, 29555, 29555,
         29670, 29533,  1377,  1183,  6024,  1142,  4138,  1228,  5693, 29493,
          5392, 29493,  1072,  6121, 29493,  1163, 18759, 14131,  4822,  2169,
         29473, 29538, 29501, 29551, 29670, 29511,  

In [None]:
# Convert the token IDs back to readable text and print the result
# outputs[0] gets the first generated sequence, skip_special_tokens removes formatting tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

What's the weather like in Paris? I don't have real-time capabilities, so I can't provide the current weather in Paris. However, I can tell you that Paris, like much of northern France, has a temperate oceanic climate. The weather is mild with regular precipitation throughout the year. The warmest months are June, July, and August, with temperatures averaging around 20-25°C (68-77°F). The coldest months are December, January, and February, with temperatures averaging around 3-8°C (37-46°F). It's always a good idea to check a reliable weather forecast before planning a trip.
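
The decoded text starts by repeating the question because `outputs[0]` holds the prompt tokens followed by the newly generated ones. A common way to print only the reply is to slice off the prompt before decoding. The sketch below shows the idea with plain Python lists and made-up token IDs; `strip_prompt` is a hypothetical helper. In the notebook the equivalent slice would likely be `outputs[0][inputs["input_ids"].shape[-1]:]`, assuming `inputs` is the dictionary returned by `apply_chat_template`.

```python
def strip_prompt(output_ids: list[int], prompt_length: int) -> list[int]:
    """Return only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


# Made-up IDs for illustration: the first four stand in for the prompt.
prompt_ids = [1, 3, 2592, 4]
output_ids = prompt_ids + [1083, 1717, 2]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)  # [1083, 1717, 2]
```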


# Class Exercise

## Step 1: Create an Agentic/Assistant System Prompt

Choose a specific business context and create a system prompt that gives Mistral a professional role. This system prompt will define how the AI behaves and what expertise it has.

**Instructions:**
- Pick a realistic business or organization
- Choose a specific role/expertise for the AI (marketing expert, technical consultant, etc.)
- Create a system prompt that defines the AI's personality and knowledge area
- This will be used throughout the assignment for generating content
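
As a starting point, the system prompt can be assembled from the business name and role and placed in the same message format used in Example 1. This is a minimal sketch under assumptions: `build_messages` is a hypothetical helper, and the wording of the prompt is only one possibility that you should adapt to your own business.

```python
def build_messages(business: str, role: str, user_question: str) -> list[dict]:
    """Build a chat message list with a business-specific system prompt."""
    system_prompt = (
        f"You are the {role} for {business}. "
        "Answer customer questions accurately and professionally, "
        "and say so plainly when you do not know the answer."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ]


messages = build_messages(
    business="TechStart Solutions - AI Consulting Firm",
    role="AI Solutions Consultant",
    user_question="What services do you offer?",
)
print(messages[0]["content"])
```

The returned list can be passed directly to the `chatbot` pipeline, in the same way as the Tom and Jerry messages.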


In [None]:
# TODO: Choose your business and role
# Examples:
# - "TechStart Solutions - AI Consulting Firm" with role "AI Solutions Consultant"
# - "Green Energy Corp - Solar Installation Company" with role "Solar Energy Expert"
# - "HealthTech Plus - Medical Software Company" with role "Healthcare IT Specialist"


# Begin writing Python code here

## Step 2: Generate Business Database Content


Use Mistral to create a comprehensive Q&A database for your chosen business. You'll prompt Mistral to generate realistic question-answer pairs that customers might ask about your services, pricing, processes, and expertise.

**Instructions:**
- Use your system prompt from Step 1 to give Mistral the business context
- Create a prompt asking Mistral to generate 10-15 Q&A pairs for your business
- Ask for questions covering different topics: services, pricing, processes, technical details, contact info
- Format should be clear (Q: question, A: answer)
- Parse the generated text into a usable list of dictionaries
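
For the parsing step, a regular expression is one simple option when the model follows the `Q:` / `A:` format. The sketch below assumes exactly that format; `parse_qa_pairs` and the sample text are hypothetical, and real model output may need extra cleanup (numbering, bold markers, or a stray `Q:` inside an answer would confuse this pattern).

```python
import re

# Match "Q: ... A: ..." and stop each answer at the next "Q:" or end of text.
QA_PATTERN = re.compile(
    r"Q:\s*(?P<question>.+?)\s*A:\s*(?P<answer>.+?)(?=\s*Q:|\Z)",
    re.DOTALL,
)


def parse_qa_pairs(generated_text: str) -> list[dict]:
    """Split 'Q: ... A: ...' text into question/answer dictionaries."""
    return [
        {"question": m.group("question").strip(), "answer": m.group("answer").strip()}
        for m in QA_PATTERN.finditer(generated_text)
    ]


# Hand-written sample standing in for generated text.
sample_text = """
Q: What services do you offer?
A: We provide AI strategy consulting and model deployment.

Q: How can I contact you?
A: Use the contact form on our website.
"""

for pair in parse_qa_pairs(sample_text):
    print(pair)
```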

In [None]:
# TODO: Generate Q&A database using Mistral
# You need to:
# 1. Set up the Mistral model (use the pipeline approach from Example 1)
# 2. Create a function to get clean responses from Mistral
# 3. Write a prompt asking Mistral to generate business Q&A pairs
# 4. Parse the generated text into a list of dictionaries with 'question' and 'answer' keys
# 5. Display your generated Q&A pairs clearly


# Begin writing Python code here

## Step 3: Implement FAISS Vector Database

Convert your Q&A database into embeddings (numerical vectors) and store them in a FAISS index for fast similarity search. This allows users to ask questions and quickly find the most relevant information from your knowledge base.

**Instructions:**
- Install and import sentence-transformers for creating embeddings
- Convert all your questions into numerical vectors using an embedding model
- Create a FAISS index to store these vectors for fast similarity search
- Implement a search function that can find similar questions based on user input
- Test your search functionality with a sample query
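
One way to picture the search step is as a nearest-neighbor lookup over unit-length vectors. The sketch below is a NumPy brute-force stand-in, not FAISS itself: `normalize` and `search` are hypothetical helpers and the three-dimensional vectors are toy values. In the notebook, `faiss.IndexFlatIP` with `index.add` and `index.search` would typically take the place of `search`, and the vectors would come from `SentenceTransformer.encode` rather than being typed in by hand.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def search(index_vectors: np.ndarray, query: np.ndarray, k: int = 2):
    """Return (scores, indices) of the k most similar rows, best first."""
    scores = normalize(index_vectors) @ normalize(query.reshape(1, -1))[0]
    top = np.argsort(-scores)[:k]
    return scores[top], top


# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
question_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_vector = np.array([0.9, 0.1, 0.0])

scores, indices = search(question_vectors, query_vector, k=2)
print(indices, scores)
```

Normalizing first means the scores fall between -1 and 1, which makes a fixed similarity threshold easier to interpret later.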



In [None]:

# TODO: Implement FAISS Vector Database
# You need to:
# 1. Install sentence-transformers: !pip install sentence-transformers faiss-cpu
# 2. Import SentenceTransformer and faiss
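# 3. Load an embedding model (e.g., 'sentence-transformers/all-MiniLM-L6-v2'; 'distilbert-base-uncased-distilled-squad' also loads but was fine-tuned for extractive QA rather than sentence similarity)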
# 4. Extract questions and answers from your Q&A database
# 5. Convert questions to embeddings using the model
# 6. Create a FAISS index and add the embeddings
# 7. Create a search function that takes a user question and returns similar Q&A pairs
# 8. Test the search function with a sample query

# Begin writing Python code here

## Step 4: Create Test Questions

Generate two types of questions to test your RAG system: questions that CAN be answered from your database (answerable) and questions that CANNOT be answered (unanswerable). This tests how well your system knows its limitations.

**Instructions:**
- Use Mistral to generate 5 questions that your business CAN answer (about your services, pricing, processes, etc.)
- Use Mistral to generate 5 questions that your business CANNOT answer (competitor info, unrelated topics, personal details, etc.)
- Extract the questions from the generated text into clean lists
- These will test whether your RAG system correctly identifies when it can and cannot provide good answers
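
Generated question lists usually arrive as numbered or bulleted lines mixed with introductory text. The sketch below shows one way to pull out clean questions, assuming each question sits on its own line and ends with a question mark; `parse_question_list` and the sample text are hypothetical.

```python
import re


def parse_question_list(generated_text: str) -> list[str]:
    """Extract questions from lines such as '1. ...', '2) ...', or '- ...'."""
    questions = []
    for line in generated_text.splitlines():
        # Drop a leading number or bullet, then surrounding whitespace.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned.endswith("?"):
            questions.append(cleaned)
    return questions


# Hand-written sample standing in for generated text.
sample_text = """Here are some questions:
1. What services do you offer?
2) How much does a consultation cost?
- Who won the 2010 World Cup?
"""
print(parse_question_list(sample_text))
```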

In [None]:
# TODO: Create Test Questions
# You need to:
# 1. Generate ANSWERABLE questions using Mistral (questions your business can answer)
# 2. Generate UNANSWERABLE questions using Mistral (questions outside your expertise)
# 3. Parse both sets of questions into clean lists
# 4. Display both types of questions clearly
# 5. Make sure you have at least 5 questions of each type

# Begin writing Python code here

## Step 5: Implement and Test Questions


Run both types of questions through your RAG system and analyze how well it distinguishes between questions it can answer well versus questions it cannot answer reliably.

**Instructions:**
- Test your answerable questions - they should get high similarity scores with your database
- Test your unanswerable questions - they should get low similarity scores
- Set a similarity threshold to determine "can answer" vs "cannot answer"
- Analyze the performance: did answerable questions score high? Did unanswerable questions score low?
- Calculate accuracy rates for both question types
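
The threshold test and the accuracy calculation can be sketched as follows. The scores are placeholders, not real results, and `evaluate` is a hypothetical helper; the 0.7 threshold is only a starting value and will usually need tuning for your embedding model.

```python
def evaluate(scores: list[float], expected_answerable: bool, threshold: float = 0.7) -> float:
    """Return the fraction of questions classified as expected."""
    if not scores:
        return 0.0
    # A question counts as "answerable" when its best match meets the threshold.
    correct = sum((score >= threshold) == expected_answerable for score in scores)
    return correct / len(scores)


# Placeholder similarity scores, not real results.
answerable_scores = [0.91, 0.84, 0.66, 0.78, 0.88]
unanswerable_scores = [0.31, 0.45, 0.72, 0.12, 0.50]

print(f"Answerable accuracy:   {evaluate(answerable_scores, True):.0%}")     # 80%
print(f"Unanswerable accuracy: {evaluate(unanswerable_scores, False):.0%}")  # 80%
```

With these placeholder numbers, one answerable question falls below the threshold and one unanswerable question rises above it, which is the kind of borderline case worth showing in your analysis.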

In [None]:
# TODO: Test Your RAG System
# You need to:
# 1. Create a testing function that searches your database for each question
# 2. Set a similarity threshold (e.g., 0.7) to determine good vs poor matches
# 3. Test all answerable questions and count how many are correctly identified as answerable
# 4. Test all unanswerable questions and count how many are correctly identified as unanswerable
# 5. Calculate and display performance statistics
# 6. Show examples of good and poor matches

## Step 6: Model Experimentation and Ranking

Test multiple Q&A models from Hugging Face and rank them based on performance, speed, and confidence scores.

**Instructions:**
- Test the 4 required models plus 2 additional models of your choice
- Evaluate each model on speed, confidence scores, and answer quality
- Rank models from best to worst with clear explanations
- Identify which models provide good confidence scores while maintaining reasonable output
- Compare performance across different question types
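
A composite score is one way to turn several measurements into a single ranking. The sketch below is an assumption about how that could look, not a required formula: `rank_models`, the 0.3 speed weight, and the model names and numbers are all placeholders to be replaced with your own measurements.

```python
def rank_models(results: dict[str, dict[str, float]], speed_weight: float = 0.3) -> list[tuple[str, float]]:
    """Rank models by a weighted mix of confidence and relative speed."""
    fastest = min(r["seconds"] for r in results.values())
    ranked = []
    for name, r in results.items():
        speed_score = fastest / r["seconds"]  # 1.0 for the fastest model
        composite = (1 - speed_weight) * r["confidence"] + speed_weight * speed_score
        ranked.append((name, round(composite, 3)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Placeholder measurements for hypothetical models, not real benchmark results.
measurements = {
    "model-a": {"confidence": 0.80, "seconds": 0.5},
    "model-b": {"confidence": 0.90, "seconds": 1.0},
    "model-c": {"confidence": 0.40, "seconds": 0.25},
}

for name, score in rank_models(measurements):
    print(f"{name}: {score}")
```

Answer quality is not captured by this score, so the written explanation should still discuss whether the top-ranked answers are actually correct.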

In [None]:
# TODO: Test and Rank QA Models
# Required models to test:
# - "consciousAI/question-answering-generative-t5-v1-base-s-q-c"
# - "deepset/roberta-base-squad2"
# - "google-bert/bert-large-cased-whole-word-masking-finetuned-squad"
# - "gasolsun/DynamicRAG-8B"
# Plus 2 additional QA models of your choice
#
# You need to:
# 1. Set up QA pipelines for each model
# 2. Test them with your questions and retrieved contexts
# 3. Measure response time and confidence scores
# 4. Rank models based on composite performance
# 5. Identify models with good confidence handling
# 6. Explain why each model ranked where it did





"""
Write your explanation here:


"""