# Mini-RAG model

In this exercise, we will create a small knowledge base and use it to set up our own retrieval-augmented generation (RAG) architecture.

First we import the required libraries:

*   random: Python's standard module for selecting random values (imported here, but not used in the code below).
*   TfidfVectorizer: converts text into numerical vectors using TF-IDF (Term Frequency-Inverse Document Frequency), which helps in ranking word importance within documents.
*   cosine_similarity: measures how similar two pieces of text are based on their TF-IDF vectors.
*   pipeline (from Hugging Face Transformers): loads pre-trained models for text-based tasks like summarization, question answering, and more.
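
To make the TF-IDF and cosine-similarity ideas above more concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not scikit-learn's implementation: the helper names (`tokenize`, `fit_idf`, `vectorize`, `cosine`) and the toy documents are assumptions made for this example, and scikit-learn's defaults apply extra smoothing and normalization, so its exact scores may differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and strip basic punctuation.
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def fit_idf(docs):
    # idf(term) = ln(N / df) + 1, where df counts the documents containing the term.
    tokenized = [set(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*tokenized))
    n_docs = len(docs)
    return {t: math.log(n_docs / sum(t in doc for doc in tokenized)) + 1.0 for t in vocab}


def vectorize(text, idf):
    # Raw term count times idf; terms outside the fitted vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[t] * weight for t, weight in idf.items()]


def cosine(u, v):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["Cats chase mice.", "Dogs chase cats.", "Birds sing."]
idf = fit_idf(docs)
doc_vectors = [vectorize(d, idf) for d in docs]
query_vector = vectorize("cats and mice", idf)
scores = [cosine(query_vector, v) for v in doc_vectors]
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The first document shares two terms with the query and therefore scores highest, while the third shares none and scores zero.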

We also define a hard-coded knowledge base (a real system would typically load its documents from an external source, such as a CSV file or a database).

In [None]:
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Hardcoded knowledge base; a book-chapter excerpt hosted in a Git repository could serve as the knowledge base instead
knowledge_base = {
    "doc1": "AI systems are composed of models, data pipelines, and deployment mechanisms.",
    "doc2": "Retrieval-augmented generation enhances AI outputs by integrating external knowledge.",
    "doc3": "Reinforcement learning is used to train agents to make decisions in dynamic environments.",
    "doc4": "Language models like GPT are used for tasks such as summarization and question answering.",
}
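
As a rough illustration of how the same dictionary could come from a CSV table instead of being hard-coded, the sketch below reads rows into a `{doc_id: text}` mapping. The column names `doc_id` and `text`, the helper name `load_knowledge_base`, and the in-memory sample are all assumptions for this example; a real file would be opened with `open(path, newline="")`.

```python
import csv
import io


def load_knowledge_base(csv_file):
    """Read rows with 'doc_id' and 'text' columns into a {doc_id: text} dict."""
    reader = csv.DictReader(csv_file)
    return {row["doc_id"]: row["text"] for row in reader}


# In-memory stand-in for a real CSV file, so the example runs anywhere.
sample_csv = io.StringIO(
    "doc_id,text\n"
    'doc1,"AI systems are composed of models, data pipelines, and deployment mechanisms."\n'
    "doc2,Retrieval-augmented generation enhances AI outputs by integrating external knowledge.\n"
)
kb_from_csv = load_knowledge_base(sample_csv)
print(kb_from_csv)
```

The resulting dictionary has the same shape as `knowledge_base`, so it could be passed to the functions in this notebook unchanged.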

Next, we define a function that retrieves the most relevant document from the knowledge base using TF-IDF and cosine similarity.
We also load a pre-trained language model (Flan-T5, here the google/flan-t5-small checkpoint) for generating responses.

In [None]:
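# Define a Retrieval Function
def retrieve_document(query, knowledge_base):
    """Return the (doc_id, text) pair whose TF-IDF vector is closest to the query."""
    doc_ids = list(knowledge_base.keys())
    documents = list(knowledge_base.values())
    # Fit TF-IDF on the documents, then project the query into the same vector space
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    # One similarity score per document; argmax picks the best match
    similarities = cosine_similarity(query_vector, tfidf_matrix)
    most_similar_idx = similarities.argmax()
    return doc_ids[most_similar_idx], documents[most_similar_idx]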

# Load a Pretrained Language Model
qa_model = pipeline("text2text-generation", model="google/flan-t5-small")
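
One property of the retrieval step worth noting: `argmax` always returns an index, so a document is returned even when the query shares no terms with the knowledge base (all similarities are zero, and the first document wins). A possible guard is sketched below. The helper `pick_best` and the `min_score` cutoff of 0.1 are hypothetical and not part of the original notebook; the cutoff is an arbitrary value that would need tuning.

```python
def pick_best(similarities, min_score=0.1):
    """Return the index of the highest score, or None if no score reaches min_score."""
    if not similarities:
        return None
    best_idx = max(range(len(similarities)), key=lambda i: similarities[i])
    if similarities[best_idx] < min_score:
        return None  # nothing in the knowledge base is a convincing match
    return best_idx


print(pick_best([0.0, 0.42, 0.1]))
print(pick_best([0.0, 0.0, 0.0]))
```

To combine it with `retrieve_document`, one could pass the first row of the `cosine_similarity` result (converted to a list) and skip generation, or answer without context, when `None` comes back.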

The answer_query() function takes a user query and searches the knowledge base for the most relevant document by calling retrieve_document() (which uses TF-IDF and cosine similarity).
It prints the document ID (doc_id) and text (retrieved_doc), formats the retrieved document and the query into a structured prompt for Flan-T5, and returns the generated answer.

In [None]:
# Integration Function
def answer_query(query, knowledge_base):
    doc_id, retrieved_doc = retrieve_document(query, knowledge_base)
    print(f"Retrieved Document ({doc_id}): {retrieved_doc}\n")
    prompt = f"Context: {retrieved_doc}\n\nQuestion: {query}\nAnswer:"
    response = qa_model(prompt, max_length=50, num_return_sequences=1)
    return response[0]["generated_text"]
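
To inspect the prompt format without downloading a model, the prompt construction can be separated from the generation call, as in the sketch below. The names `build_prompt`, `answer_with`, and `fake_generate` are hypothetical helpers introduced for illustration; `fake_generate` only mimics the list-of-dicts shape that `answer_query` reads from `qa_model`.

```python
def build_prompt(context, query):
    # Same layout as the prompt string used in answer_query.
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"


def answer_with(generate, context, query):
    # `generate` is any callable returning pipeline-style output:
    # a list of dicts with a "generated_text" key.
    response = generate(build_prompt(context, query))
    return response[0]["generated_text"]


def fake_generate(prompt):
    # Hypothetical stand-in for qa_model; it only reports the prompt length.
    return [{"generated_text": f"stub answer ({len(prompt)} chars in prompt)"}]


context = "Retrieval-augmented generation enhances AI outputs by integrating external knowledge."
prompt = build_prompt(context, "What does RAG do?")
stub_answer = answer_with(fake_generate, context, "What does RAG do?")
print(prompt)
print(stub_answer)
```

Passing the generator in as an argument keeps the prompt logic testable on its own; in the notebook, `qa_model` would take the place of `fake_generate`.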

Sample output from a run of the demo (the input prompts and typed queries are not captured):

Welcome to the AI QA System!
Retrieved Document (doc3): Reinforcement learning is used to train agents to make decisions in dynamic environments.

AI Answer: reinforcement learning
We'll work on improving the system!
Retrieved Document (doc3): Reinforcement learning is used to train agents to make decisions in dynamic environments.

AI Answer: a former president


We then create a function with an interactive loop where a user can input queries, receive AI-generated answers, and provide feedback. The loop runs until the user types "exit", making it a simple retrieval-augmented AI chatbot. Note that the feedback is only acknowledged with a message; it is not stored or used to update the system.

In [None]:
# Simulate a User Interaction Loop
def user_interaction(knowledge_base):
    print("Welcome to the AI QA System!")
    while True:
        query = input("\nEnter your query (or 'exit' to quit): ")
        if query.lower() == "exit":
            print("Goodbye!")
            break
        answer = answer_query(query, knowledge_base)
        print(f"AI Answer: {answer}")

        # Simulate a feedback loop
        feedback = input("Was this answer helpful? (yes/no): ")
        if feedback.lower() == "yes":
            print("Great! Thank you for your feedback.")
        else:
            print("We'll work on improving the system!")
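
The loop above acknowledges feedback but does not keep it. As one possible extension, the sketch below replays scripted (query, feedback) pairs and records them in a list, which also makes the loop testable without `input()`. The names `run_session` and `echo_answer` are hypothetical, and `echo_answer` is a placeholder standing in for a call such as `answer_query(query, knowledge_base)`.

```python
def run_session(queries_and_feedback, answer_fn):
    """Replay (query, feedback) pairs and return a log of the results."""
    log = []
    for query, feedback in queries_and_feedback:
        answer = answer_fn(query)
        log.append({
            "query": query,
            "answer": answer,
            # Same rule as the interactive loop: only "yes" counts as helpful.
            "helpful": feedback.strip().lower() == "yes",
        })
    return log


def echo_answer(query):
    # Placeholder answer function used only for this example.
    return f"(answer to: {query})"


feedback_log = run_session(
    [("What is RAG?", "yes"), ("Who is the president?", "No")],
    echo_answer,
)
unhelpful = [entry["query"] for entry in feedback_log if not entry["helpful"]]
print(unhelpful)
```

A log like this could later show which queries the knowledge base fails to cover.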

We then run the demo!

In [None]:
# Run the demo
user_interaction(knowledge_base)