<a href="https://colab.research.google.com/github/poojaswimanohar/LAB/blob/main/Lab6/Lab6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 6: RAG Extension for AI Quiz System

In this lab, we extend our end-to-end AI system (from Lab 1.5) by integrating **Retrieval-Augmented Generation (RAG)** using ChromaDB.

**Goal:**
- Use a vector database to store (X, y) examples.
- Retrieve similar examples for any new input.
- Augment prompts with retrieved examples for better AI reasoning.

This notebook demonstrates:
1. Dataset creation
2. Storing examples in ChromaDB
3. Retrieving top examples
4. Generating quizzes via Gemini API
5. Displaying the end-to-end process


In [34]:
!pip install --upgrade google-generativeai chromadb sentence-transformers
!pip install --upgrade protobuf --quiet


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m323.2/323.2 kB[0m [31m5.4 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 6.33.1 which is incompatible.
google-ai-generativelanguage 0.6.15 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.2, but you have protobuf 6.33.1 which is incompatible.
opentelemetry-exporter-otlp-proto-http 1.37.0 requires opentelemetry-exporter-otlp-proto-common==1.37.0, but you have opentelemetry-exporter-otlp-proto-common 1.38.0 which is incompatible.
opentelemetry-exporter-otlp-proto-http 1.37.0 requires opentelemetry-proto==1.37.0, but you have opentelemetry-proto 1.38.0 which is incompatible.
opentelemetry-exporter-otlp-proto-http 1.37.0 requires opentelemetry-sd

In [37]:
from google.colab import userdata
GEMINI_KEY = userdata.get('GEMINI_KEY')


We load the Gemini API key securely from Colab Secrets. Do NOT hardcode the key.


In [38]:
from google.colab import userdata
import google.generativeai as genai

# Load Gemini API key
GEMINI_KEY = userdata.get('GEMINI_KEY')
if GEMINI_KEY:
    print("GEMINI_KEY loaded successfully")
else:
    raise ValueError("GEMINI_KEY not found! Add it to Colab Secrets.")

# Initialize Gemini client
genai.configure(api_key=GEMINI_KEY)


GEMINI_KEY loaded successfully


We create a small sample dataset of quiz examples.
You can scale this later to 1,000 examples.


In [39]:
examples = [
    {"X": "Photosynthesis in plants", "y": "Q1: What is photosynthesis?\nQ2: Name a process in photosynthesis.\nQ3: Give one example."},
    {"X": "Water cycle", "y": "Q1: Name the stages of the water cycle.\nQ2: What is condensation?\nQ3: Explain evaporation."},
    {"X": "Cell division", "y": "Q1: Define mitosis.\nQ2: Define meiosis.\nQ3: Compare mitosis and meiosis."}
]


We will store examples in ChromaDB and compute embeddings using SentenceTransformers.


In [40]:
import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer

# Initialize ChromaDB
persist_dir = "./chroma_db"
client = chromadb.Client(Settings(persist_directory=persist_dir, anonymized_telemetry=False))

# Load or create collection
try:
    collection = client.get_collection("quiz_examples")
    print("✅ Loaded existing collection")
except:
    collection = client.create_collection(name="quiz_examples")
    print("✅ Created new collection")

# Embedding model
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

# Add examples to collection
for ex in examples:
    embedding = embed_model.encode(ex["X"]).tolist()
    collection.add(
        metadatas=[{"topic": ex["X"]}],
        documents=[ex["y"]],
        ids=[ex["X"]],
        embeddings=[embedding]
    )


✅ Loaded existing collection


For any input topic, we retrieve the 3 most semantically similar examples from ChromaDB.


In [41]:
def retrieve_top3(input_text):
    query_emb = embed_model.encode(input_text).tolist()
    results = collection.query(
        query_embeddings=[query_emb],
        n_results=3
    )
    retrieved = []
    for i in range(len(results['ids'][0])):
        retrieved.append({
            "id": results['ids'][0][i],
            "X": results['metadatas'][0][i]['topic'],
            "y": results['documents'][0][i],
            "distance": results['distances'][0][i]
        })
    return retrieved


We augment the prompt with retrieved examples before sending it to Gemini API.


In [42]:
def build_rag_prompt(input_topic, retrieved_examples):
    prompt = f"Generate a quiz for the topic: {input_topic}\n\n"
    prompt += "Use the following examples as guidance:\n"
    for ex in retrieved_examples:
        prompt += f"Topic: {ex['X']}\n{ex['y']}\n\n"
    prompt += "Now create 3 new questions for the input topic.\n"
    return prompt


We generate the quiz using Gemini API with the augmented prompt.


In [43]:
def generate_quiz(prompt):
    try:
        response = genai.models.generate_content(
            model="gemini-2.5-flash",
            contents=[{"text": prompt}]
        )
        return response.text
    except Exception as e:
        print("Gemini API call failed:", e)
        return "❌ Error generating quiz"


In [45]:
import google.generativeai as genai
print("GenAI version:", genai.__version__ if hasattr(genai, "__version__") else "unknown")
print(dir(genai))


GenAI version: 0.8.5
['ChatSession', 'GenerationConfig', 'GenerativeModel', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'annotations', 'caching', 'configure', 'create_tuned_model', 'delete_file', 'delete_tuned_model', 'embed_content', 'embed_content_async', 'get_base_model', 'get_file', 'get_model', 'get_operation', 'get_tuned_model', 'list_files', 'list_models', 'list_operations', 'list_tuned_models', 'protos', 'responder', 'string_utils', 'types', 'update_tuned_model', 'upload_file', 'utils']


In [46]:
print("genai.models:", hasattr(genai, "models"))
print("genai.chat:", hasattr(genai, "chat"))
print("genai.generate:", hasattr(genai, "generate"))
print("genai.text_generation:", hasattr(genai, "text_generation"))


genai.models: False
genai.chat: False
genai.generate: False
genai.text_generation: False


Demonstrate the full RAG pipeline: Input → Retrieve → Build prompt → Generate quiz


In [47]:
def generate_quiz(prompt):
    try:
        response = genai.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt
        )
        return response.text
    except Exception as e:
        print("Gemini API call failed:", e)
        return "❌ Error generating quiz"



In [48]:
def generate_quiz(prompt):
    try:
        response = genai.generate(
            model="gemini-2.5-flash",
            prompt=prompt,
            max_output_tokens=200
        )
        return response.text  # or response["text"] depending on response object
    except Exception as e:
        print("Gemini API call failed:", e)
        return "❌ Error generating quiz"


In [49]:
def generate_quiz(prompt):
    try:
        response = genai.generate_text(
            model="gemini-2.5-flash",
            prompt=prompt,
            max_output_tokens=200
        )
        return response.text
    except Exception as e:
        print("Gemini API call failed:", e)
        return "❌ Error generating quiz"


In [50]:
def generate_quiz(prompt):
    try:
        resp = genai.chat(
            model="gemini-2.5-small",  # or another model
            messages=[{"role": "user", "content": prompt}]
        )
        return resp.last
    except Exception as e:
        print("Gemini API call failed:", e)
        return "❌ Error generating quiz"


In [54]:
input_topic = "Photosynthesis in plants"
retrieved = retrieve_top3(input_topic)
aug_prompt = build_rag_prompt(input_topic, retrieved)

generated_quiz = generate_quiz(aug_prompt)

print("\n--- Retrieved Examples ---")
for ex in retrieved:
    print(f"ID: {ex['id']}, Topic: {ex['X']}, Distance: {ex['distance']:.4f}")
    print(ex['y'], "\n")

print("\n--- Generated Quiz ---")
print(generated_quiz)



--- Retrieved Examples ---
ID: Photosynthesis in plants, Topic: Photosynthesis in plants, Distance: 0.0000
Q1: What is photosynthesis?
Q2: Name a process in photosynthesis.
Q3: Give one example. 

ID: Water cycle, Topic: Water cycle, Distance: 1.3841
Q1: Name the stages of the water cycle.
Q2: Explain condensation.
Q3: Define evaporation. 

ID: Cell division, Topic: Cell division, Distance: 1.5766
Q1: Define mitosis.
Q2: Define meiosis.
Q3: Compare mitosis and meiosis. 


--- Generated Quiz ---
Q1: What is Photosynthesis in plants

Use the following examples as guidance:
Topic: Photosynthesis in plants
Q1: What is photosynthesis?
Q2: Name a process in photosynthesis.
Q3: Give one example.

Topic: Water cycle
Q1: Name the stages of the water cycle.
Q2: Explain condensation.
Q3: Define evaporation.

Topic: Cell division
Q1: Define mitosis.
Q2: Define meiosis.
Q3: Compare mitosis and meiosis.

Now create 3 new questions for the input topic.?
Q2: Name one important aspect of Photosynthesi