# Comprehensive Homework: Build and Test a Mini RAG System from Scratch 🧠

> **🎯 Today's Goal**: Combine the knowledge from the first three lessons (Embeddings, Retrieval, Generation) to build a functional Retrieval-Augmented Generation (RAG) system from scratch. Then, test it with a self-assessment!

In [None]:
!pip install sentence-transformers transformers torch



## ⚙️ Part 1: The Retriever - Finding the Right Knowledge

First, we'll set up our Retriever. Its job is to take a question and find the most relevant piece of text from our knowledge base.

1.  **Load the Embedding Model** (`all-MiniLM-L6-v2`)
2.  **Create our Knowledge Base**
3.  **Encode Everything into Embeddings**
4.  **Calculate Similarity** to find the best match
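
Step 4 can be illustrated without loading any model. The sketch below is a minimal, hand-rolled cosine similarity over made-up 3-dimensional vectors; the names `toy_documents` and `toy_question` are hypothetical, and real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumption: values invented for illustration).
toy_documents = {
    "mountain fact": [0.9, 0.1, 0.0],
    "plant fact": [0.0, 0.2, 0.9],
}
toy_question = [0.8, 0.3, 0.1]

# Score every document against the question and keep the highest score.
scores = {name: cosine_similarity(toy_question, vec) for name, vec in toy_documents.items()}
best_match = max(scores, key=scores.get)

print(scores)
print("Best match:", best_match)
```

The retriever in this notebook does the same thing at a larger scale: one score per document, then the index of the highest score.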

In [None]:
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

print("✅ Libraries imported successfully!")

# 1. Load our embedding model
retriever_model = SentenceTransformer('all-MiniLM-L6-v2')

# 2. Create a simple knowledge base
knowledge_base = [
    "The capital of France is Paris, a city famous for the Eiffel Tower and the Louvre museum.",
    "The Amazon rainforest is the world's largest tropical rainforest, known for its incredible biodiversity.",
    "Mount Everest is the highest mountain on Earth, located in the Himalayas.",
    "The Great Wall of China is a series of fortifications stretching over 13,000 miles.",
    "Photosynthesis is the process used by plants to convert light energy into chemical energy."
]

# 3. Encode our knowledge base into embeddings
knowledge_embeddings = retriever_model.encode(knowledge_base, convert_to_tensor=True)

print(f"✅ Retriever model loaded and knowledge base encoded with {len(knowledge_base)} documents.")

✅ Libraries imported successfully!


✅ Retriever model loaded and knowledge base encoded with 5 documents.


## ✍️ Part 2: The Generator - Extracting the Answer

Now we set up our Generator. In this notebook it is an extractive question-answering model: it takes the question and the context found by the retriever and pulls the answer span out of that context rather than writing new text.
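
Extractive question-answering models of the kind loaded in the next cell typically score every context token as a possible answer start and as a possible answer end, then return the best-scoring span. The sketch below is a simplified illustration of that span selection only; the function `best_span`, the `max_answer_len` parameter, and all scores are hypothetical, and the real pipeline also handles tokenization and score normalization.

```python
def best_span(start_scores, end_scores, max_answer_len=5):
    """Return the (start, end) pair with the highest combined score, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider spans that begin at s and are at most max_answer_len tokens long.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

context_tokens = ["Mount", "Everest", "is", "the", "highest", "mountain", "on", "Earth"]

# Hypothetical per-token scores (assumption: invented for illustration).
start_scores = [4.0, 1.0, -2.0, -1.0, 0.5, 0.0, -3.0, 0.2]
end_scores = [0.5, 4.5, -2.0, -3.0, -1.0, 1.0, -3.0, 0.8]

start, end = best_span(start_scores, end_scores)
answer = " ".join(context_tokens[start:end + 1])
print(answer)
```

Because the answer is always a slice of the context, the generator can only be as good as the passage the retriever hands it.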

In [None]:
# Load our question-answering (generator) model
generator = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

print("✅ Generator (QA) model loaded.")


✅ Generator (QA) model loaded.


## 🚀 Part 3: Testing our RAG System

Time to put it all together! The function below will simulate a full RAG pipeline and grade itself against a predefined set of questions and answers.

It will test two key things:
1.  **Retrieval Accuracy**: Did we find the right document?
2.  **Generation Accuracy**: Did we extract the correct answer from that document?
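
Both checks boil down to substring tests. The sketch below isolates them as two small helpers so their behavior is easy to inspect; the helper names `retrieval_correct` and `generation_correct` are hypothetical, and case-insensitive matching is assumed for both.

```python
def retrieval_correct(expected_keyword, retrieved_context):
    """One point if the expected keyword appears anywhere in the retrieved context."""
    return expected_keyword.lower() in retrieved_context.lower()

def generation_correct(expected_answer, generated_answer):
    """One point if the expected answer appears inside the generated answer."""
    return expected_answer.lower() in generated_answer.lower()

context = "Mount Everest is the highest mountain on Earth, located in the Himalayas."

checks = {
    "retrieval": retrieval_correct("Everest", context),
    "generation_exact": generation_correct("Mount Everest", "Mount Everest"),
    # A shorter but arguably right answer fails, because the expected
    # string must be fully contained in the generated one.
    "generation_short": generation_correct("Mount Everest", "Everest"),
}
points = int(checks["retrieval"]) + int(checks["generation_exact"])

print(checks)
print("Points for this question:", points)
```

Substring grading is simple but strict: a correct answer phrased differently from `expected_answer` is marked as a failure, so it helps to write expected answers as text that literally occurs in the knowledge base.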

In [None]:
def run_rag_assessment():
    """Runs a self-assessment of the RAG pipeline with multiple questions."""

    # Define our questions, expected context keywords, and expected answers
    test_questions = [
        {
            "question": "What is the highest mountain?",
            "expected_keyword": "Everest",
            "expected_answer": "Mount Everest"
        },
        {
            "question": "Which city is home to the Louvre museum?",
            "expected_keyword": "France",
            "expected_answer": "Paris"
        },
        {
            "question": "What process do plants use for energy?",
            "expected_keyword": "Photosynthesis",
            "expected_answer": "Photosynthesis"
        }
    ]

    score = 0
    total = len(test_questions) * 2 # 2 points per question (1 for retrieval, 1 for generation)

    print("--- 🚀 Starting RAG System Assessment ---\n")

    for i, test in enumerate(test_questions):
        question = test["question"]
        print(f"\n--- Question {i+1}: '{question}' ---")

        # --- 1. Retrieval Step ---
        question_embedding = retriever_model.encode(question, convert_to_tensor=True)
        cos_scores = util.pytorch_cos_sim(question_embedding, knowledge_embeddings)[0]
        top_result_index = torch.argmax(cos_scores).item()  # convert the 0-d tensor to a plain int
        retrieved_context = knowledge_base[top_result_index]

        print(f"🔎  Retrieved Context: '{retrieved_context}'")

        # Check if the retrieval was correct
        if test["expected_keyword"] in retrieved_context:
            print("✅  Retrieval Correct!")
            score += 1
        else:
            print(f"❌  Retrieval Failed. Expected context with keyword: '{test['expected_keyword']}'")

        # --- 2. Generation Step ---
        qa_result = generator(question=question, context=retrieved_context)
        generated_answer = qa_result['answer']

        print(f"✍️  Generated Answer: '{generated_answer}'")

        # Check if the generation was correct
        if test["expected_answer"].lower() in generated_answer.lower():
            print("✅  Generation Correct!")
            score += 1
        else:
            print(f"❌  Generation Failed. Expected answer: '{test['expected_answer']}'")

    # --- Final Score ---
    print(f"\n--- 🏁 Assessment Complete ---")
    print(f"🎯 Final Score: {score} / {total}")
    if score == total:
        print("🎉🎉🎉 Perfect! Your RAG system is working as expected!")
    elif score >= total / 2:
        print("👍 Good job! The system is mostly correct.")
    else:
        print("🔧 The system ran into some issues. Review the steps and check the logic.")

# Run the assessment!
run_rag_assessment()

--- 🚀 Starting RAG System Assessment ---


--- Question 1: 'What is the highest mountain?' ---
🔎  Retrieved Context: 'Mount Everest is the highest mountain on Earth, located in the Himalayas.'
✅  Retrieval Correct!
✍️  Generated Answer: 'Mount Everest'
✅  Generation Correct!

--- Question 2: 'Which city is home to the Louvre museum?' ---
🔎  Retrieved Context: 'The capital of France is Paris, a city famous for the Eiffel Tower and the Louvre museum.'
✅  Retrieval Correct!
✍️  Generated Answer: 'Paris'
✅  Generation Correct!

--- Question 3: 'What process do plants use for energy?' ---
🔎  Retrieved Context: 'Photosynthesis is the process used by plants to convert light energy into chemical energy.'
✅  Retrieval Correct!
✍️  Generated Answer: 'Photosynthesis'
✅  Generation Correct!

--- 🏁 Assessment Complete ---
🎯 Final Score: 6 / 6
🎉🎉🎉 Perfect! Your RAG system is working as expected!


# STUDENT TASKS 🧑‍💻



Now it's your turn to be the AI engineer. Your tasks are to run, analyze, and extend the RAG system you've just built.

### Task 1: Execute and Understand

Your first task is to simply run all the cells above and carefully read the output of the final self-assessment.

* **Observe the Score:** Did the system get a perfect score (6/6)?
* **Analyze Each Step:** For each question, look at the "Retrieved Context" and the "Generated Answer."
    * Did the retriever find the correct piece of knowledge?
    * Did the generator extract the right answer from that context?

### Task 2 (Challenge): Add a New Question

Your second task is to test the system with a new question about the **existing knowledge**.

**Instructions:**
1.  Copy the code from the cell below. It's the same assessment function as before, but with a new test question added.
2.  Run the cell and see if the system can answer correctly. The score should now be out of 8.

In [4]:
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import torch

retriever_model = SentenceTransformer('all-MiniLM-L6-v2')

qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

knowledge_base = [
    "Jerusalem is the capital of Palestine.",
    "The Dome of the Rock is located in the Old City of Jerusalem.",
    "The Palestinian flag has three horizontal stripes: black, white, and green, with a red triangle.",
    "The State of Palestine was declared on November 15, 1988.",
    "Mount Everest is the highest mountain on Earth, located in the Himalayas.",
    "The Eiffel Tower is located in Paris, France, and is one of the most iconic landmarks in the world.",
    "Photosynthesis is the process used by plants to convert light energy into chemical energy.",
    "The Great Wall of China is the longest wall in the world, stretching over 13,000 miles."
]

knowledge_embeddings = retriever_model.encode(knowledge_base, convert_to_tensor=True)

test_questions = [
    {
        "question": "What is the highest mountain?",
        "expected_keyword": "Everest",
        "expected_answer": "Mount Everest"
    },
    {
        "question": "Which city is home to the Louvre museum?",
        "expected_keyword": "France",
        "expected_answer": "Paris"
    },
    {
        "question": "What process do plants use for energy?",
        "expected_keyword": "Photosynthesis",
        "expected_answer": "Photosynthesis"
    },
    {
        "question": "Which wall stretches over 13,000 miles?",
        "expected_keyword": "Wall of China",
        "expected_answer": "Great Wall of China"
    }
]

def run_rag_assessment_task_2():
    score = 0
    total = len(test_questions) * 2

    print("--- 🚀 Starting RAG System Assessment (Task 2) ---\n")

    for i, test in enumerate(test_questions):
        question = test["question"]
        print(f"\n--- Question {i+1}: '{question}' ---")

        # Encode the question
        question_embedding = retriever_model.encode(question, convert_to_tensor=True)

        # Compute similarity scores between the question and the knowledge base
        cos_scores = torch.nn.functional.cosine_similarity(question_embedding, knowledge_embeddings)
        print(f"Similarity Scores: {cos_scores}")  # Print the similarity scores

        # Find the best match based on the highest similarity score
        top_result_index = torch.argmax(cos_scores).item()
        retrieved_context = knowledge_base[top_result_index]
        print(f"🔎  Retrieved Context: '{retrieved_context}'")

        # Check retrieval accuracy
        if test["expected_keyword"].lower() in retrieved_context.lower():
            print("✅  Retrieval Correct!")
            score += 1
        else:
            print(f"❌  Retrieval Failed. Expected context with keyword: '{test['expected_keyword']}'")

        qa_result = qa_pipeline(question=question, context=retrieved_context)
        generated_answer = qa_result['answer']
        print(f"✍️  Generated Answer: '{generated_answer}'")

        if test["expected_answer"].lower() in generated_answer.lower():
            print("✅  Generation Correct!")
            score += 1
        else:
            print(f"❌  Generation Failed. Expected answer: '{test['expected_answer']}'")

    print(f"\n--- 🏁 Assessment Complete ---")
    print(f"🎯 Final Score: {score} / {total}")
    if score == total:
        print("🎉🎉🎉 Perfect! Your RAG system handled the new question!")

# Run the updated assessment
run_rag_assessment_task_2()


--- 🚀 Starting RAG System Assessment (Task 2) ---


--- Question 1: 'What is the highest mountain?' ---
Similarity Scores: tensor([0.2568, 0.1951, 0.1615, 0.1433, 0.6936, 0.1979, 0.0061, 0.2055])
🔎  Retrieved Context: 'Mount Everest is the highest mountain on Earth, located in the Himalayas.'
✅  Retrieval Correct!
✍️  Generated Answer: 'Mount Everest'
✅  Generation Correct!

--- Question 2: 'Which city is home to the Louvre museum?' ---
Similarity Scores: tensor([0.1749, 0.2484, 0.0556, 0.0393, 0.0520, 0.4469, 0.0476, 0.1554])
🔎  Retrieved Context: 'The Eiffel Tower is located in Paris, France, and is one of the most iconic landmarks in the world.'
✅  Retrieval Correct!
✍️  Generated Answer: 'Paris'
✅  Generation Correct!

--- Question 3: 'What process do plants use for energy?' ---
Similarity Scores: tensor([ 1.8602e-02, -1.1077e-04, -4.4219e-02, -3.9428e-02, -4.9936e-02,
        -2.3701e-03,  7.4201e-01, -4.3736e-02])
🔎  Retrieved Context: 'Photosynthesis is the process used by plant

### Task 3 (Advanced Challenge): Add New Knowledge & Test It

Your final and most important task is to **expand the RAG system's knowledge base** and then test it.

**Instructions:**
1.  **Add a new fact** to the `knowledge_base` in the code cell below.
2.  **You must re-run this cell** to update the `knowledge_embeddings`! The system won't know about the new fact until you do.
3.  Finally, run the last code cell, which has a new test question about the knowledge you just added.
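
Step 2 matters because the embeddings are computed once and stored; appending to the list does not update them. The sketch below shows the mismatch with a toy stand-in encoder. `toy_encode` is hypothetical and only mimics the "one vector per text" shape of `retriever_model.encode`.

```python
def toy_encode(texts):
    """Hypothetical stand-in for retriever_model.encode: one small vector per text."""
    return [[len(text), text.count(" ")] for text in texts]

knowledge_base = ["Mount Everest is the highest mountain on Earth, located in the Himalayas."]
knowledge_embeddings = toy_encode(knowledge_base)

# Add a new fact WITHOUT re-encoding: the embeddings are now stale.
knowledge_base.append("Tokyo is the capital of Japan.")
stale = len(knowledge_embeddings) != len(knowledge_base)
print("Embeddings stale after append:", stale)

# Re-encode so every document has a matching embedding row again.
knowledge_embeddings = toy_encode(knowledge_base)
in_sync = len(knowledge_embeddings) == len(knowledge_base)
print("Embeddings in sync after re-encoding:", in_sync)
```

With stale embeddings the new fact has no row to be scored against, so the retriever can never return it.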

In [10]:
knowledge_base_task_3 = [
    "Jerusalem is the capital of Palestine.",
    "The Dome of the Rock is located in the Old City of Jerusalem.",
    "The Palestinian flag has three horizontal stripes: black, white, and green, with a red triangle.",
    "The State of Palestine was declared on November 15, 1988.",
    "The Great Wall of China is the longest wall in the world, stretching over 13,000 miles."
]

knowledge_embeddings_task_3 = retriever_model.encode(knowledge_base_task_3, convert_to_tensor=True)

print(f"✅ Knowledge base updated and re-encoded with {len(knowledge_base_task_3)} documents.")


✅ Knowledge base updated and re-encoded with 5 documents.
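
The next cell replaces the cosine-similarity search with a FAISS `IndexFlatL2` index, which ranks documents by squared Euclidean distance instead. For unit-length vectors the two rankings agree, because ‖a − b‖² = 2 − 2·cos(a, b). The sketch below checks that identity on made-up vectors; it assumes the embeddings are normalized to unit length, which is worth verifying for your model (for example by passing `normalize_embeddings=True` to `encode`).

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy vectors (assumption: values invented for illustration).
query = normalize([0.8, 0.3, 0.1])
docs = [normalize(v) for v in ([0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.5, 0.5, 0.5])]

by_cosine = max(range(len(docs)), key=lambda i: cosine(query, docs[i]))
by_l2 = min(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))
identity_holds = all(
    math.isclose(squared_l2(query, d), 2 - 2 * cosine(query, d), abs_tol=1e-9)
    for d in docs
)

print("Top document by cosine:", by_cosine)
print("Top document by L2:", by_l2)
print("Identity holds:", identity_holds)
```

If the embeddings were not unit length, the two rankings could differ, and an inner-product index over normalized vectors would be the closer match to the cosine search used earlier.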


In [12]:
!pip install faiss-cpu
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import torch
import faiss
import numpy as np

retriever_model = SentenceTransformer('all-MiniLM-L6-v2')


qa_pipeline = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

knowledge_base = [
    "Jerusalem is the capital of Palestine.",
    "The Dome of the Rock is located in the Old City of Jerusalem.",
    "The Palestinian flag has three horizontal stripes: black, white, and green, with a red triangle.",
    "The State of Palestine was declared on November 15, 1988.",
    "The Palestinian flag consists of three horizontal stripes: black on top, white in the middle, and green at the bottom, with a red triangle on the left side.",
    "The Great Wall of China is the longest wall in the world, stretching over 13,000 miles.",
    "Tokyo is the capital of Japan.",
    "Mount Everest is the highest mountain on Earth, located in the Himalayas.",
    "The Eiffel Tower is located in Paris, France, and is one of the most iconic landmarks in the world.",
    "Photosynthesis is the process used by plants to convert light energy into chemical energy.",
    "The Nile River is the longest river in the world, running through northeastern Africa.",
    "Mars is known as the Red Planet due to its reddish appearance.",
    "The Pacific Ocean is the largest ocean on Earth.",
    "Alexander Graham Bell invented the telephone in 1876.",
    "Burj Khalifa in Dubai is the tallest building in the world, standing at 828 meters.",
    # Additional facts added to the knowledge base
    "The Amazon River is the second longest river in the world.",
    "The moon is Earth's only natural satellite.",
    "The Sahara Desert is the largest hot desert in the world.",
    "The Nile flows through ten countries.",
    "The Pacific Ocean covers more than one-third of the Earth's surface.",
    "Albert Einstein developed the theory of relativity.",
    "The Eiffel Tower was constructed in 1889.",
    "The United States declared independence in 1776.",
    "The capital of Canada is Ottawa.",
    "Venus is the second planet from the Sun.",
    "The Great Barrier Reef is located in Australia.",
    "The human body contains 206 bones.",
]

knowledge_embeddings = retriever_model.encode(knowledge_base, convert_to_tensor=True)

dimension = knowledge_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(knowledge_embeddings.cpu().numpy())  # FAISS expects a float32 NumPy array on the CPU

test_questions = [
    {"question": "What is the capital of Palestine?", "expected_keyword": "Jerusalem", "expected_answer": "Jerusalem"},
    {"question": "Where is the Dome of the Rock located?", "expected_keyword": "Jerusalem", "expected_answer": "Old City of Jerusalem"},
    {"question": "What colors are in the Palestinian flag?", "expected_keyword": "black", "expected_answer": "Black, white, green, and a red triangle"},
    {"question": "When was the State of Palestine declared?", "expected_keyword": "November 15, 1988", "expected_answer": "November 15, 1988"},
    {"question": "How long is the Great Wall of China?", "expected_keyword": "13,000 miles", "expected_answer": "13,000 miles"},
    {"question": "What is the capital of Japan?", "expected_keyword": "Tokyo", "expected_answer": "Tokyo"},
    {"question": "What is the longest river in the world?", "expected_keyword": "Nile", "expected_answer": "Nile River"},
    {"question": "Which planet is known as the Red Planet?", "expected_keyword": "Mars", "expected_answer": "Mars"},
    {"question": "What is the largest ocean on Earth?", "expected_keyword": "Pacific", "expected_answer": "Pacific Ocean"},
    {"question": "Who invented the telephone?", "expected_keyword": "Bell", "expected_answer": "Alexander Graham Bell"},
    {"question": "What is the tallest building in the world?", "expected_keyword": "Burj Khalifa", "expected_answer": "Burj Khalifa"}
]


def calculate_accuracy(total_questions, correct_answers):
    return (correct_answers / (total_questions * 2)) * 100


def run_rag_assessment_task_2():
    score = 0
    total = len(test_questions) * 2

    print("--- Starting RAG System Assessment (Task 2) ---\n")

    for i, test in enumerate(test_questions):
        question = test["question"]
        print(f"\n--- Question {i+1}: '{question}' ---")


        question_embedding = retriever_model.encode(question, convert_to_tensor=True)


        # FAISS expects numpy arrays for search
        D, I = index.search(np.array([question_embedding.cpu().numpy()]), 1)
        top_result_index = I[0][0]
        retrieved_context = knowledge_base[top_result_index]
        print(f"Retrieved Context: '{retrieved_context}'")


        if test["expected_keyword"].lower() in retrieved_context.lower():
            print("Retrieval Correct!")
            score += 1
        else:
            print(f"Retrieval Failed. Expected context with keyword: '{test['expected_keyword']}'")


        qa_result = qa_pipeline(question=question, context=retrieved_context)
        generated_answer = qa_result['answer']
        print(f"Generated Answer: '{generated_answer}'")


        if test["expected_answer"].lower() in generated_answer.lower():
            print("Generation Correct!")
            score += 1
        else:
            print(f"Generation Failed. Expected answer: '{test['expected_answer']}'")

    accuracy = calculate_accuracy(len(test_questions), score)

    print(f"\n--- Assessment Complete ---")
    print(f"Final Score: {score} / {total}")
    print(f"Accuracy: {accuracy:.1f}%")
    if score == total:
        print("Success! Your RAG system handled the new questions!")

# Run the final assessment
run_rag_assessment_task_2()




--- Starting RAG System Assessment (Task 2) ---


--- Question 1: 'What is the capital of Palestine?' ---
Retrieved Context: 'Jerusalem is the capital of Palestine.'
Retrieval Correct!
Generated Answer: 'Jerusalem'
Generation Correct!

--- Question 2: 'Where is the Dome of the Rock located?' ---
Retrieved Context: 'The Dome of the Rock is located in the Old City of Jerusalem.'
Retrieval Correct!
Generated Answer: 'Old City of Jerusalem'
Generation Correct!

--- Question 3: 'What colors are in the Palestinian flag?' ---
Retrieved Context: 'The Palestinian flag consists of three horizontal stripes: black on top, white in the middle, and green at the bottom, with a red triangle on the left side.'
Retrieval Correct!
Generated Answer: 'black on top, white in the middle, and green'
Generation Failed. Expected answer: 'Black, white, green, and a red triangle'

--- Question 4: 'When was the State of Palestine declared?' ---
Retrieved Context: 'The State of Palestine was declared on November 15