# Prompt Engineering Activity

---

## üìã Prerequisites

Before starting, make sure you have:
- **HuggingFace Token**: Get from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- **Google Colab Account** (Free tier works!) - *Optional, can run locally too*

---

## üîó Quick Links

**üîó Open in Colab**: [Click here](https://colab.research.google.com/github/oviya-raja/ist-402-assignments/blob/main/IST402/assignments/W3/reference/W3__Prompt_Engineering_Basics.ipynb)

**üìÇ View on GitHub**: [Click here](https://github.com/oviya-raja/ist-402-assignments/blob/main/IST402/assignments/W3/reference/W3__Prompt_Engineering_Basics.ipynb)

> **‚ö†Ô∏è Note**: Colab link requires the repository to be public on GitHub. If you get a 404 error, see troubleshooting below.

---

## üöÄ Setup Instructions

### Option 1: Google Colab (Recommended for GPU)

#### Step 1: Open Notebook
- **Method A**: Click the "Open in Colab" link above
- **Method B**:
  1. Go to [Google Colab](https://colab.research.google.com/)
  2. Click **File** ‚Üí **Open notebook** ‚Üí **GitHub** tab
  3. Enter: `oviya-raja/ist-402-assignments`
  4. Navigate to: `IST402/assignments/W3/reference/W3__Prompt_Engineering_Basics.ipynb`

> **‚ÑπÔ∏è Security Warning**: When opening from GitHub, Colab will show a warning dialog: *"This notebook was not authored by Google"*. This is **normal and safe** - it's Colab's security feature for external notebooks. Simply click **"Run anyway"** to proceed. The notebook is safe to use.

#### Step 2: Enable GPU (Recommended)
1. Go to **Runtime** ‚Üí **Change runtime type**
2. Select **GPU** ‚Üí **Save**
3. **Runtime** ‚Üí **Restart runtime**

#### Step 3: Set Up Token (See Token Setup section below)

---

### Option 2: Local Environment

1. **Install dependencies**: Run Cell 2 (Install packages)
2. **Set up token**: See Token Setup section below
3. **Run cells in order**: Start from Cell 1

---

## üîê Token Setup

### For Google Colab Users

**Recommended Method: Using .env file**
1. Run **Cell 4** ‚Üí It will automatically create a `.env` file
2. Click the **folder icon (üìÅ)** in the left sidebar
3. Find and click `.env` file
4. Replace `your_token_here` with your actual token
5. Save (Ctrl+S or Cmd+S)
6. Re-run **Cell 4** ‚Üí Token loaded! ‚úÖ

**Quick Method: Direct environment variable**
Run this in a new cell before Cell 4:
```python
import os
os.environ["HUGGINGFACE_HUB_TOKEN"] = "your_actual_token_here"
```

### For Local Users

**Create `.env` file manually:**
1. Create a file named `.env` in the same directory as this notebook
2. Add this line: `HUGGINGFACE_HUB_TOKEN=your_actual_token_here`
3. No spaces around the `=` sign!
4. Run **Cell 4** ‚Üí Token loaded! ‚úÖ

**Get your token**: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

> **üìñ Learn more**: See `ENV_IN_COLAB.md` for detailed explanation of how `.env` files work in Colab

---

## üõ†Ô∏è Troubleshooting

### 404 Error When Opening from GitHub

**Possible causes:**
- Repository doesn't exist yet ‚Üí Use **Option 2** (Upload to Colab)
- Repository is private ‚Üí Make it public or use **Option 2**
- Wrong branch ‚Üí Try changing `main` to `master` in the link

**Solution**: Upload the notebook directly to Colab:
1. Download this notebook
2. Go to [Google Colab](https://colab.research.google.com/)
3. Click **File** ‚Üí **Upload notebook**
4. Select the downloaded file

### GPU Not Detected in Colab

1. Go to **Runtime** ‚Üí **Change runtime type**
2. Select **GPU** ‚Üí **Save**
3. **Runtime** ‚Üí **Restart runtime**
4. Re-run Cell 1 to verify

### Token Not Loading

- Check that `.env` file exists and has correct format: `HUGGINGFACE_HUB_TOKEN=token` (no spaces)
- Make sure you re-ran Cell 4 after creating/editing `.env`
- Verify token is valid at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

---

## ‚ñ∂Ô∏è Getting Started

1. **Run Cell 1**: Verify environment setup
2. **Run Cell 2**: Install required packages
3. **Run Cell 3**: Set up FAISS (CPU/GPU)
4. **Run Cell 4**: Set up Hugging Face token
5. **Continue**: Run remaining cells in order

**Happy coding! üéâ**

---

## üìñ About This Notebook

### What This Notebook Does

This notebook provides **fundamental prompt engineering exercises** using Mistral-7B-Instruct. It covers:

- **Basic Prompt Engineering**: Learn how to create effective system prompts and user messages
- **Model Interaction**: Understand how to use Hugging Face Transformers with Mistral-7B
- **Pipeline vs Direct Model Loading**: Compare different approaches to using language models
- **Device Optimization**: Automatic CPU/GPU detection and configuration
- **Class Exercises**: Step-by-step exercises for creating business-specific AI assistants

### Related Notebooks

- **üìò [W3__Prompt_Engineering w_QA Applications-2.ipynb](https://github.com/oviya-raja/ist-402-assignments/blob/main/IST402/assignments/W3/reference/W3__Prompt_Engineering%20w_QA%20Applications-2.ipynb)**: Advanced RAG (Retrieval-Augmented Generation) system with FAISS vector database and QA model comparison
  - [Open in Colab](https://colab.research.google.com/github/oviya-raja/ist-402-assignments/blob/main/IST402/assignments/W3/reference/W3__Prompt_Engineering%20w_QA%20Applications-2.ipynb)

### Learning Path

1. **Start here** ‚Üí Learn prompt engineering basics with this notebook
2. **Then** ‚Üí Move to the QA Applications notebook for RAG system implementation
3. **Finally** ‚Üí Complete the RAG Assignment for full system design

---

**üìö This notebook is part of the IST402 - AI Agents & RAG Systems course materials.**


In [1]:
# Google Colab Setup Verification
# Run this cell FIRST to check if everything is set up correctly

import sys
print("üîç Checking Google Colab environment...")
print(f"   Python version: {sys.version.split()[0]}")

# Check if running in Colab
try:
    import google.colab
    IN_COLAB = True
    print("   ‚úÖ Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("   ‚ö†Ô∏è  Not running in Google Colab (local environment)")

# Check GPU availability
try:
    import torch
    if torch.cuda.is_available():
        print(f"   ‚úÖ GPU Available: {torch.cuda.get_device_name(0)}")
        print(f"   ‚úÖ CUDA Version: {torch.version.cuda}")
    else:
        print("   ‚ö†Ô∏è  GPU NOT detected")
        if IN_COLAB:
            print("   üí° TIP: Go to Runtime ‚Üí Change runtime type ‚Üí Select GPU ‚Üí Save")
            print("   üí° Then: Runtime ‚Üí Restart runtime")
except ImportError:
    print("   ‚ö†Ô∏è  PyTorch not installed yet (will be installed in next cell)")

print("\nüìã Next Steps:")
print("   1. If GPU not detected in Colab: Enable GPU runtime and restart")
print("   2. Run Cell 2: Install packages")
print("   3. Run Cell 3: Set up Hugging Face token")
print("   4. Continue with remaining cells")


üîç Checking Google Colab environment...
   Python version: 3.12.12
   ‚úÖ Running in Google Colab
   ‚úÖ GPU Available: NVIDIA A100-SXM4-80GB
   ‚úÖ CUDA Version: 12.6

üìã Next Steps:
   1. If GPU not detected in Colab: Enable GPU runtime and restart
   2. Run Cell 2: Install packages
   3. Run Cell 3: Set up Hugging Face token
   4. Continue with remaining cells


In [2]:
# Install required packages - run this cell first
# Note: FAISS package will be installed conditionally based on GPU availability in Cell 3

# Core packages (always needed)
%pip install transformers torch sentence-transformers datasets python-dotenv

# FAISS will be installed conditionally in Cell 3 based on device (CPU/GPU)



In [3]:
# This cell automatically handles both Colab and local environments

from google.colab import userdata
hf_token = userdata.get('HUGGINGFACE_HUB_TOKEN')

print("‚úÖ Hugging Face token loaded successfully!")
print(f"   Token preview: {hf_token[:10]}...{hf_token[-4:] if len(hf_token) > 14 else '****'}")


üîç Detected: Google Colab environment

üìù .env file not found in Google Colab
   Creating .env file template...
‚úÖ .env file created!
‚ö†Ô∏è  IMPORTANT: Edit the .env file in the left sidebar
   1. Click on .env file in the file browser
   2. Replace 'your_token_here' with your actual token
   3. Get token from: https://huggingface.co/settings/tokens
   4. Re-run this cell after editing

   OR use Option B (direct method) below:
   # Quick setup - run this in a new cell:
   import os
   os.environ['HUGGINGFACE_HUB_TOKEN'] = 'your_actual_token_here'
   # Then re-run this cell

‚ö†Ô∏è  HUGGINGFACE_HUB_TOKEN not found or not set!

üìã Setup Instructions:

   For Local Environment:
   1. Create a .env file in the same directory as this notebook
   2. Add this line: HUGGINGFACE_HUB_TOKEN=your_actual_token_here
   3. No spaces around the = sign!
   4. Re-run this cell

   For Google Colab:
   1. Get token from: https://huggingface.co/settings/tokens
   2. Run this in a new cell:
      

ValueError: HUGGINGFACE_HUB_TOKEN not configured. Please follow instructions above.

In [None]:
# Import libraries we need
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
import torch
import json
import numpy as np
import faiss
import time

print("All libraries imported successfully!")

All libraries imported successfully!


In [None]:
# Automatically detect and configure device (CPU or GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cpu


In [None]:
# Specify which Mistral model to use from Hugging Face
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

# ‚ö†Ô∏è PERFORMANCE INFO:
# Mistral-7B is a LARGE model (7 billion parameters, ~14GB)
# Settings are automatically optimized based on device (CPU/GPU) detected above
# The code automatically switches between CPU and GPU optimizations

print(f"\n‚è≥ Loading Mistral-7B model...")
print(f"   Device: {device} ({device_info})")
if device == "cpu":
    print(f"   ‚è±Ô∏è  Expected load time: 5-15 minutes")
    print(f"   ‚è±Ô∏è  Expected generation: 30-60 seconds per response")
else:
    print(f"   ‚è±Ô∏è  Expected load time: 1-2 minutes")
    print(f"   ‚è±Ô∏è  Expected generation: 2-5 seconds per response")
print(f"   üì¶ Model size: ~14GB (will download on first run)")

# Create a conversation with system prompt and user message
# System prompt defines the AI's role/personality
# User message is what the person is asking
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Set up the text generation pipeline with device-optimized parameters
# Settings automatically adapt based on device (CPU/GPU) detected in Cell 4
chatbot = pipeline(
    "text-generation",                              # Task type: generating text
    model="mistralai/Mistral-7B-Instruct-v0.3",   # Which model to use
    token=hf_token,                                 # Authentication token for Hugging Face
    dtype=torch_dtype,                              # Automatically set: bfloat16 (GPU) or float32 (CPU)
    device_map="auto",                              # Automatically use GPU if available
    max_new_tokens=max_new_tokens,                  # Automatically set: 512 (GPU) or 256 (CPU)
    do_sample=True,                                 # Use random sampling for more creative responses
    top_k=10,                                       # Consider top 10 most likely next words
    num_return_sequences=1,                         # Generate only 1 response
    eos_token_id=2,                                 # Token ID that signals end of response
)

print("\n‚úÖ Model loaded! Generating response...")
if device == "cpu":
    print("   ‚è±Ô∏è  This may take 30-60 seconds on CPU...")
else:
    print("   ‚è±Ô∏è  This should take 2-5 seconds on GPU...")

# Generate response using the pipeline and print the result
import time
start_time = time.time()
result = chatbot(messages)
generation_time = time.time() - start_time

print(f"\n‚úÖ Response generated in {generation_time:.2f} seconds")
print("\n" + "="*60)
print(result)
print("="*60)


‚è≥ Loading Mistral-7B model...


NameError: name 'device_info' is not defined

In [None]:
# Generate the response and store the full result
result = chatbot(messages)

# Extract just the assistant's response from the complex output structure
# result[0] gets the first (and only) generated sequence
# ["generated_text"] gets the conversation history with the new response
# [-1] gets the last message in the conversation (the assistant's reply)
# ["content"] gets just the text content without the role information
assistant_reply = result[0]["generated_text"][-1]["content"]

# Print only the clean assistant response (without all the extra structure)
print(assistant_reply)

In [None]:
# Your Hugging Face authentication token (replace with your actual token)
hf_token = "YOUR_HUGGINGFACE_TOKEN_HERE"

# Specify the Mistral model we want to use
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

# Load the tokenizer (converts text to numbers that the model understands)
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)

# Load the actual model with device-optimized settings
# torch_dtype is automatically set in Cell 4: bfloat16 (GPU) or float32 (CPU)
model = AutoModelForCausalLM.from_pretrained(
    model_id,                    # Which model to load
    token=hf_token,             # Authentication token
    dtype=torch.bfloat16,       # Use 16-bit precision for faster processing
    device_map="auto"           # Automatically use GPU if available
)

# Create a simple conversation (just user input, no system prompt this time)
conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]

# Convert the conversation into the format the model expects
# This applies the model's chat template and converts to tensors
inputs = tokenizer.apply_chat_template(
    conversation,                # The conversation to format
    add_generation_prompt=True,  # Add prompt to signal the model should respond
    return_dict=True,           # Return as dictionary
    return_tensors="pt",        # Return as PyTorch tensors
).to(model.device)             # Move to same device as model (GPU/CPU)

# Generate the response using the model directly
outputs = model.generate(
    **inputs,                           # Pass all the formatted inputs
    max_new_tokens=1000,               # Maximum length of response
    pad_token_id=tokenizer.eos_token_id # Token to use for padding
)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the disk and cpu.


In [None]:
# Print the raw model output tensor (this shows token IDs/numbers, not readable text yet)
print(outputs)

In [None]:
# Convert the token IDs back to readable text and print the result
# outputs[0] gets the first generated sequence, skip_special_tokens removes formatting tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Class Exercise

## Step 1: Create an Agentic/Assistant System Prompt

Choose a specific business context and create a system prompt that gives Mistral a professional role. This system prompt will define how the AI behaves and what expertise it has.

**Instructions:**
- Pick a realistic business or organization
- Choose a specific role/expertise for the AI (marketing expert, technical consultant, etc.)
- Create a system prompt that defines the AI's personality and knowledge area
- This will be used throughout the assignment for generating content


In [None]:
# TODO: Choose your business and role
# Examples:
# - "TechStart Solutions - AI Consulting Firm" with role "AI Solutions Consultant"
# - "Green Energy Corp - Solar Installation Company" with role "Solar Energy Expert"
# - "HealthTech Plus - Medical Software Company" with role "Healthcare IT Specialist"


# Begin writing Python codes here

## Step 2: Generate Business Database Content


Use Mistral to create a comprehensive Q&A database for your chosen business. You'll prompt Mistral to generate realistic question-answer pairs that customers might ask about your services, pricing, processes, and expertise.

**Instructions:**
- Use your system prompt from Step 1 to give Mistral the business context
- Create a prompt asking Mistral to generate 10-15 Q&A pairs for your business
- Ask for questions covering different topics: services, pricing, processes, technical details, contact info
- Format should be clear (Q: question, A: answer)
- Parse the generated text into a usable list of dictionaries

In [None]:
# TODO: Generate Q&A database using Mistral
# You need to:
# 1. Set up the Mistral model (use the pipeline approach from the original notebook)
# 2. Create a function to get clean responses from Mistral
# 3. Write a prompt asking Mistral to generate business Q&A pairs
# 4. Parse the generated text into a list of dictionaries with 'question' and 'answer' keys
# 5. Display your generated Q&A pairs clearly


# Begin writing Python codes here

## Step 3: Implement FAISS Vector Database

Convert your Q&A database into embeddings (numerical vectors) and store them in a FAISS index for fast similarity search. This allows users to ask questions and quickly find the most relevant information from your knowledge base.

**Instructions:**
- Install and import sentence-transformers for creating embeddings
- Convert all your questions into numerical vectors using an embedding model
- Create a FAISS index to store these vectors for fast similarity search
- Implement a search function that can find similar questions based on user input
- Test your search functionality with a sample query



In [None]:

# TODO: Implement FAISS Vector Database
# You need to:
# 1. Install sentence-transformers: !pip install sentence-transformers faiss-cpu
# 2. Import SentenceTransformer and faiss
# 3. Load an embedding model (e.g., 'distilbert-base-uncased-distilled-squad')
# 4. Extract questions and answers from your Q&A database
# 5. Convert questions to embeddings using the model
# 6. Create a FAISS index and add the embeddings
# 7. Create a search function that takes a user question and returns similar Q&A pairs
# 8. Test the search function with a sample query

# Begin writing Python codes here

## Step 4: Create Test Questions

Generate two types of questions to test your RAG system: questions that CAN be answered from your database (answerable) and questions that CANNOT be answered (unanswerable). This tests how well your system knows its limitations.

**Instructions:**
- Use Mistral to generate 5 questions that your business CAN answer (about your services, pricing, processes, etc.)
- Use Mistral to generate 5 questions that your business CANNOT answer (competitor info, unrelated topics, personal details, etc.)
- Extract the questions from the generated text into clean lists
- These will test whether your RAG system correctly identifies when it can and cannot provide good answers

In [None]:
# TODO: Create Test Questions
# You need to:
# 1. Generate ANSWERABLE questions using Mistral (questions your business can answer)
# 2. Generate UNANSWERABLE questions using Mistral (questions outside your expertise)
# 3. Parse both sets of questions into clean lists
# 4. Display both types of questions clearly
# 5. Make sure you have at least 5 questions of each type

# Begin writing Python codes here

## Step 5: Implement and Test Questions


Run both types of questions through your RAG system and analyze how well it distinguishes between questions it can answer well versus questions it cannot answer reliably.

**Instructions:**
- Test your answerable questions - they should get high similarity scores with your database
- Test your unanswerable questions - they should get low similarity scores
- Set a similarity threshold to determine "can answer" vs "cannot answer"
- Analyze the performance: did answerable questions score high? Did unanswerable questions score low?
- Calculate accuracy rates for both question types

In [None]:
# TODO: Test Your RAG System
# You need to:
# 1. Create a testing function that searches your database for each question
# 2. Set a similarity threshold (e.g., 0.7) to determine good vs poor matches
# 3. Test all answerable questions and count how many are correctly identified as answerable
# 4. Test all unanswerable questions and count how many are correctly identified as unanswerable
# 5. Calculate and display performance statistics
# 6. Show examples of good and poor matches

## Step 6: Model Experimentation and Ranking

Test multiple Q&A models from Hugging Face and rank them based on performance, speed, and confidence scores.

**Instructions:**
- Test the 4 required models plus 2 additional models of your choice
- Evaluate each model on speed, confidence scores, and answer quality
- Rank models from best to worst with clear explanations
- Identify which models provide good confidence scores while maintaining reasonable output
- Compare performance across different question types

In [None]:
# TODO: Test and Rank QA Models
# Required models to test:
# - "consciousAI/question-answering-generative-t5-v1-base-s-q-c"
# - "deepset/roberta-base-squad2"
# - "google-bert/bert-large-cased-whole-word-masking-finetuned-squad"
# - "gasolsun/DynamicRAG-8B"
# Plus 2 additional QA models of your choice
#
# You need to:
# 1. Set up QA pipelines for each model
# 2. Test them with your questions and retrieved contexts
# 3. Measure response time and confidence scores
# 4. Rank models based on composite performance
# 5. Identify models with good confidence handling
# 6. Explain why each model ranked where it did





"""
Write your explanation here:


"""