# Retrieval-Augmented Generation (RAG) Chatbot over Course Materials

## Project Overview
This project implements a **Retrieval-Augmented Generation (RAG)** application that lets users ask questions about university course materials (PDF, TXT, and Markdown files).

The system retrieves relevant document chunks using semantic search and generates answers using an open-source Hugging Face language model.
If the answer is not present in the retrieved documents, the model is instructed to respond **"I don't know"** to limit hallucinations.


##  System Architecture

The RAG system follows a standard pipeline:

1. Loading PDF and text documents
2. Text chunking
3. Embedding with Sentence-Transformers
4. Vector storage using Chroma
5. Retrieval of relevant chunks
6. Answer generation using a Hugging Face LLM

This architecture separates **retrieval** from **generation**, so the model answers from the retrieved course text rather than from its own parametric memory.
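The separation between the two stages can be sketched as a function that takes a retriever and a generator as interchangeable parts. This is a minimal illustration, not the notebook's implementation: `answer_question`, `keyword_retrieve`, `echo_generate`, and `CORPUS` are hypothetical names, the retriever is a toy keyword match, and the refusal on empty retrieval is an assumption added for the sketch (a vector store normally returns its top `k` chunks regardless).

```python
from typing import Callable, List


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run retrieval first, then generate from the retrieved chunks only."""
    chunks = retrieve(question)
    if not chunks:
        # Assumption for this sketch: refuse when nothing was retrieved.
        return "I don't know"
    return generate(question, chunks)


# Toy stand-ins (hypothetical) so the sketch runs without any model.
CORPUS = [
    "An agent perceives its environment and acts on it.",
    "Chroma is a vector database.",
]


def _words(text: str) -> set:
    return {w.strip("?.,").lower() for w in text.split()}


def keyword_retrieve(question: str) -> List[str]:
    # Keep every chunk that shares at least one word with the question.
    return [c for c in CORPUS if _words(question) & _words(c)]


def echo_generate(question: str, chunks: List[str]) -> str:
    # A real generator would call an LLM; this one returns the first chunk.
    return chunks[0]


print(answer_question("What is an agent?", keyword_retrieve, echo_generate))
print(answer_question("Quantum tunnelling?", keyword_retrieve, echo_generate))
```

Because the generator only ever sees what the retriever returns, either part can be swapped (a different embedding model, a different LLM) without touching the other.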


In [1]:
!pip install -q \
  langchain==0.1.16 \
  langchain-community==0.0.36 \
  langchain-core==0.1.48 \
  langchain-text-splitters==0.0.1 \
  chromadb sentence-transformers transformers pypdf accelerate gradio



In [1]:
from pathlib import Path
from google.colab import drive

from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_community.llms import HuggingFacePipeline

from langchain.chains import RetrievalQA

print("RetrievalQA imported successfully ✅")


RetrievalQA imported successfully ✅


## 📂 Document Collection

Course materials are stored in Google Drive and loaded automatically.


In [2]:
drive.mount("/content/drive")

DATA_DIR = "/content/drive/MyDrive/Course_Materials_RAG"
CHROMA_DIR = "/content/chroma_db"

Path(DATA_DIR).mkdir(parents=True, exist_ok=True)
Path(CHROMA_DIR).mkdir(parents=True, exist_ok=True)

Mounted at /content/drive


In [3]:
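# Document loading
def load_documents(folder):
    """Recursively load PDF, TXT, and Markdown files from `folder`."""
    docs = []
    for file in Path(folder).glob("**/*"):
        if file.suffix.lower() == ".pdf":
            # PyPDFLoader returns one Document per PDF page
            docs.extend(PyPDFLoader(str(file)).load())
        elif file.suffix.lower() in [".txt", ".md"]:
            # TextLoader returns one Document per file
            docs.extend(TextLoader(str(file), encoding="utf-8").load())
    return docs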

documents = load_documents(DATA_DIR)
print(f"Loaded {len(documents)} documents")


Loaded 448 documents
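Since `PyPDFLoader` yields one entry per PDF page, the count above reflects pages plus text files rather than the number of files. A small sketch of how entries could be grouped by source file follows. It uses plain dictionaries as stand-ins for loaded documents (the file names are made up); with real LangChain documents the same idea would read `d.metadata["source"]` instead.

```python
from collections import Counter


def pages_per_source(docs):
    """Count how many loaded entries come from each source file."""
    return Counter(d["metadata"]["source"] for d in docs)


# Hypothetical stand-ins for loaded documents.
fake_docs = [
    {"metadata": {"source": "lecture1.pdf", "page": 0}},
    {"metadata": {"source": "lecture1.pdf", "page": 1}},
    {"metadata": {"source": "notes.txt"}},
]

counts = pages_per_source(fake_docs)
print(len(fake_docs), "entries from", len(counts), "files")
```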


##  Text Chunking

Documents are split into overlapping chunks to preserve semantic continuity.
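To make the role of the two parameters concrete, here is a simplified fixed-window chunker. It is only an illustration of size and overlap: `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, so its chunk boundaries will differ. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 150) -> List[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

With an overlap, a sentence that straddles a boundary appears in full in at least one of the two neighbouring chunks, provided it is shorter than the overlap.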


In [4]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150
)

splits = text_splitter.split_documents(documents)
print(f"Created {len(splits)} chunks")

Created 463 chunks


## 🔎 Embeddings and Vector Database

Chunks are embedded using a Sentence-Transformer model and stored in Chroma.
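Conceptually, the vector store answers a query by comparing the query embedding with every stored chunk embedding and returning the closest ones. The sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. The vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration: real embeddings have hundreds of dimensions, and the distance metric actually used depends on how the Chroma collection is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store, k=3):
    """Return the texts of the k stored items most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy store: (chunk text, embedding) pairs with made-up vectors.
store = [
    ("agents", [1.0, 0.0, 0.0]),
    ("databases", [0.0, 1.0, 0.0]),
    ("mixed", [0.7, 0.7, 0.0]),
]

print(top_k([0.9, 0.1, 0.0], store, k=2))
```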

In [5]:
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)


In [6]:
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=CHROMA_DIR
)

retriever = vectordb.as_retriever(search_kwargs={"k": 3})

##  Language Model

An open-source instruction-tuned Hugging Face model is used for generation.


In [7]:
LLM_NAME = "google/flan-t5-large"

tokenizer = AutoTokenizer.from_pretrained(LLM_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(LLM_NAME)

pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False
)

llm = HuggingFacePipeline(pipeline=pipe)


Device set to use cuda:0


## Hallucination Control

The prompt instructs the model to answer **only from the retrieved context** and to reply "I don't know" otherwise.
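A prompt is an instruction rather than a guarantee, so the model may phrase its refusal slightly differently (extra quotes, a trailing period, or an empty string). One possible complement, sketched here under that assumption, is a small post-processing step that maps such variants to a single canonical fallback. `normalize_answer` and `REFUSAL` are hypothetical names and are not part of the notebook.

```python
REFUSAL = "I don't know"


def normalize_answer(raw: str) -> str:
    """Map empty output and refusal variants to one canonical fallback string."""
    cleaned = raw.strip().strip('"').strip()
    if not cleaned or cleaned.lower().rstrip(".!") == REFUSAL.lower():
        return REFUSAL
    return cleaned


print(normalize_answer('  "I don\'t know."  '))
print(normalize_answer(" Agents act autonomously. "))
```

This only standardises the refusal text. It does not verify that a non-refusal answer is actually supported by the context.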


In [8]:
from langchain.prompts import PromptTemplate

RAG_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template="""
You are an academic assistant.
Answer the QUESTION using ONLY the CONTEXT below.

Rules:
- Answer in the same language as the context.
- Do NOT use external knowledge.
- Do NOT invent information.
- If the answer is NOT explicitly contained in the CONTEXT, reply exactly:
  "I don't know"

CONTEXT:
{context}

QUESTION:
{question}

ANSWER:
"""
)


In [9]:
# RAG Chain (CORE)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs={"prompt": RAG_PROMPT},
    return_source_documents=False
)
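The `"stuff"` chain type places all retrieved chunks into a single prompt. The sketch below illustrates that assembly step in plain Python. `build_stuff_prompt` and the shortened `TEMPLATE` are hypothetical, and joining chunks with a blank line is an assumption made here for readability rather than a statement about the chain's internals.

```python
TEMPLATE = "CONTEXT:\n{context}\n\nQUESTION:\n{question}\n\nANSWER:\n"


def build_stuff_prompt(question, chunks, template=TEMPLATE):
    """Concatenate all retrieved chunks into one context and fill the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


prompt = build_stuff_prompt(
    "What is an agent?",
    ["Chunk A about agents.", "Chunk B about environments."],
)
print(prompt)
```

Because every chunk ends up in one prompt, the total length grows with `k` and with the chunk size, which is worth keeping in mind for models with a short input window.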

##  System Evaluation


In [10]:
def ask_rag(question):
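    # invoke() replaces the deprecated run(); RetrievalQA returns the answer under "result"
    return qa_chain.invoke({"query": question})["result"]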

In [11]:
print(ask_rag("C'est quoi un agent intelligent ?"))



Les agents intelligents sont des entités logiciels qui réalisent des opérations à la place d'un utilisateur ou d'un autre programme, avec une sorte d'autonomie, et pour faire cela ils utilise une sorte de connaissance ou de représentation des buts ou des désirs de l'utilisateur.
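The single query above could be extended into a small check list that mixes in-scope questions (which should be answered) with out-of-scope ones (which should be refused). The harness below is a sketch under stated assumptions: `evaluate`, `fake_ask`, and the two test questions are hypothetical, and `fake_ask` stands in for `ask_rag` so the example runs without a model. It checks only whether the system answered or refused, not whether the answer is correct.

```python
def evaluate(ask, cases, refusal="I don't know"):
    """Check, for each (question, should_answer) pair, whether the system
    answered or refused as expected. Returns (passed, total, details)."""
    details = []
    for question, should_answer in cases:
        answered = ask(question).strip() != refusal
        details.append((question, answered == should_answer))
    passed = sum(ok for _, ok in details)
    return passed, len(details), details


def fake_ask(question):
    # Hypothetical stand-in for ask_rag.
    if "agent" in question.lower():
        return "An agent acts autonomously."
    return "I don't know"


cases = [
    ("What is an intelligent agent?", True),   # expected to be answered
    ("Who won the 2018 World Cup?", False),    # expected to be refused
]

passed, total, details = evaluate(fake_ask, cases)
print(f"{passed}/{total} cases behaved as expected")
```

In the notebook, `evaluate(ask_rag, cases)` would run the same checks against the real chain.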


In [12]:
import gradio as gr

def chat_rag(user_message, history):
    """
    Simple RAG chat:
    - Question
    - Retrieval
    - Strict answer from documents
    """
    try:
        # Reuse the helper defined above so the UI and the tests share one code path
        answer = ask_rag(user_message)
    except Exception as e:
        # Show the error in the chat instead of crashing the UI
        answer = f"Error: {e}"

    # Append the new (user, bot) turn and clear the input box
    history = history + [(user_message, answer)]
    return history, ""

with gr.Blocks(title="📘 Course RAG Assistant") as demo:

    gr.Markdown("""
    # 📘 Course RAG Assistant
    """)

    chatbot = gr.Chatbot(height=400)

    with gr.Row():
        msg = gr.Textbox(
            placeholder="Ask a question about the course...",
            show_label=False
        )

    with gr.Row():
        send = gr.Button("Send")
        clear = gr.Button("Clear")

    # Both the button and the Enter key trigger the same handler
    send.click(
        chat_rag,
        inputs=[msg, chatbot],
        outputs=[chatbot, msg]
    )

    msg.submit(
        chat_rag,
        inputs=[msg, chatbot],
        outputs=[chatbot, msg]
    )

    clear.click(lambda: ([], ""), outputs=[chatbot, msg])

demo.launch()



It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://b92338868761cb056e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




##  Conclusion

This project demonstrates a complete vanilla RAG pipeline built with open-source tools.
By combining semantic retrieval with a restrictive prompt, the system aims to keep its answers grounded in the course materials.

**Limitations:**  
The system depends on the quality and coverage of the provided documents; for questions outside this scope, the model is instructed to reply "I don't know".

