# RAG Chatbot for Technical Documentation

## Introduction

In this notebook, we demonstrate how to build a RAG (Retrieval-Augmented Generation) chatbot for technical documentation. We use the *Artificial Intelligence Act* (https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf) as the technical documentation for this demonstration.

Our approach follows the steps below:
1. Split the document into chunks of text
2. Generate and store the embeddings for each chunk
3. Create a retriever model to find the most relevant chunks for a given question
4. Initialize the LLM and prompt template
5. Define the RAG chain
6. Invoke the RAG chain
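
As a rough illustration of how these steps fit together, the sketch below runs a toy version of the flow using only the Python standard library. It is not the notebook's implementation: the word-count `embed` function stands in for the embedding model, the LLM call is omitted entirely, and all function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter

def split_into_chunks(text, chunk_size=8):
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"Document {i}:::\n{c}" for i, c in enumerate(chunks))
    return f"Context:\n{context}\n---\nQuestion: {query}"

# Made-up sample text, used only to exercise the functions above.
text = (
    "The installer needs Python and pip to run. "
    "The retriever ranks stored chunks by cosine similarity. "
    "The reader model writes the final answer text."
)
query = "Which component ranks chunks by similarity?"

chunks = split_into_chunks(text)                      # split
index = [(chunk, embed(chunk)) for chunk in chunks]   # embed and store
top_chunks = retrieve(query, index, k=1)              # retrieve
prompt = build_prompt(query, top_chunks)              # prompt for the LLM
print(prompt)
```

In the notebook itself, each of these toy functions is replaced by a library component: a tokenizer-aware text splitter, a sentence-embedding model, a FAISS index, and a chat-formatted prompt passed to the reader model.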

## Pre-requisites

To reproduce this notebook, install the following packages:
```bash
pip install transformers
pip install torch
pip install faiss-cpu
pip install langchain
pip install langchain-community
pip install langchain_huggingface
pip install pypdf
```

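Additionally, make sure the PDF document is saved at `../data/raw/TA-9-2024-0138_EN.pdf` relative to the notebook, which is the path assigned to `PDF_FILE_PATH` in the configuration cell.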

---

In [1]:
import logging
import os
import pickle
from transformers import AutoModelForCausalLM, pipeline, AutoTokenizer
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy
import faiss 
import transformers

In [2]:
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

EMBEDDING_MODEL_NAME = "thenlper/gte-small"
READER_MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
FAISS_INDEX_PATH = "../embeddings/knowledge_vector_database.faiss"
PDF_FILE_PATH = "../data/raw/TA-9-2024-0138_EN.pdf"

In [3]:
# Function to split the document into chunks
def split_document_into_chunks(file_path: str, chunk_size: int, tokenizer_name: str = EMBEDDING_MODEL_NAME):
    """
    Load a document and split it into smaller chunks for processing.

    Args:
        file_path (str): Path to the document file.
        chunk_size (int): The maximum size of each chunk (number of tokens).
        tokenizer_name (str): The name of the tokenizer to use for splitting the document.

    Returns:
        List of split document chunks.
    """
    # Check if the document file exists
    if not os.path.isfile(file_path):
        logging.error(f"The file '{file_path}' does not exist.")
        return None

    # Load the document using PyPDFLoader
    loader = PyPDFLoader(file_path)
    pages = loader.load()
    logging.info(f"The document has been loaded successfully. Total number of pages: {len(pages)}.")

    # Initialize a text splitter
    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
        AutoTokenizer.from_pretrained(tokenizer_name),
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size * 0.1),  
        add_start_index=True, 
        strip_whitespace=True
    )

    chunks = text_splitter.split_documents(pages)
    logging.info(f"The document has been split into {len(chunks)} chunks.")

    return chunks

In [4]:
chunks = split_document_into_chunks(PDF_FILE_PATH, 256)
print(chunks[0])

2024-11-03 09:58:52,541 - INFO - The document has been loaded successfully. Total number of pages: 459.
2024-11-03 09:58:54,117 - INFO - The document has been split into 814 chunks.


page_content='European Parliament
2019-2024
TEXTS ADOPTED
P9_TA(2024)0138
Artificial Intelligence Act
European Parliament legislative resolution of 13 March 2024 on the proposal for a 
regulation of the European Parliament and of the Council on laying down harmonised 
rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain Union 
Legislative Acts (COM(2021)0206 – C9-0146/2021 – 2021/0106(COD))
(Ordinary legislative procedure: first reading)
The European Parliament,
– having regard to the Commission proposal to Parliament and the Council 
(COM(2021)0206),
– having regard to Article 294(2) and Articles 16 and 114 of the Treaty on the 
Functioning of the European Union, pursuant to which the Commission submitted the 
proposal to Parliament (C9-0146/2021),
– having regard to Article 294(3) of the Treaty on the Functioning of the European Union,' metadata={'source': '../data/raw/TA-9-2024-0138_EN.pdf', 'page': 0, 'start_index': 0}


In [5]:
# Function to generate embeddings for the document chunks
def generate_embeddings(chunks: list):
    """
    Generate embeddings for the given document chunks and store them using FAISS (uses the nearest neighbor search algorithm).
    
    Args:
        chunks (list): List of document chunks to generate embeddings for.
        
    Returns:
        FAISS object containing the document embeddings.
    """
    # Initialize the embedding model
    embedding_model = HuggingFaceEmbeddings(
        model_name=EMBEDDING_MODEL_NAME,
        multi_process=True,
        model_kwargs={"device": "cpu"},  # Use CPU for embeddings
        encode_kwargs={"normalize_embeddings": True}
    )
    logging.info(f"Embedding model '{EMBEDDING_MODEL_NAME}' loaded successfully.")
    # Generate embeddings for the document chunks 
    KNOWLEDGE_VECTOR_DATABASE = FAISS.from_documents(
        chunks, embedding_model, distance_strategy=DistanceStrategy.COSINE
    )
    logging.info("Embeddings generated successfully.")

    return KNOWLEDGE_VECTOR_DATABASE

In [6]:
# Function to save the FAISS object containing the document embeddings to a file
def save_knowledge_vector_database(knowledge_vector_database, file_path):
    """
    Save the FAISS object containing the document embeddings to a file.
    Args:
        knowledge_vector_database (FAISS): FAISS object containing the document embeddings.
        file_path (str): Path to save the knowledge vector database.
    """
    with open(file_path, 'wb') as f:
        pickle.dump(knowledge_vector_database, f)
    logging.info(f"Knowledge vector database saved to {file_path}")

## Preprocessing

We use the functions implemented above to split the document into chunks, generate embeddings for each chunk, and store them. We also ensure that the directory for saving the knowledge vector database exists.
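
To make the chunking step more concrete, the sketch below applies a plain sliding window to a token list, using the same 10% overlap ratio as `split_document_into_chunks`. This is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries, so its chunks will not match these windows exactly. The function name `sliding_window_chunks` and the integer tokens are hypothetical stand-ins.

```python
def sliding_window_chunks(tokens, chunk_size, overlap_ratio=0.1):
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens."""
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("chunk_size must be larger than the overlap")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks

tokens = list(range(600))  # stand-in for 600 token ids
chunks = sliding_window_chunks(tokens, chunk_size=256)
for chunk in chunks:
    # Print the first token, last token, and length of each window
    print(chunk[0], chunk[-1], len(chunk))
```

The overlap means that a sentence cut at a chunk boundary still appears in full, or nearly so, in one of the two neighbouring chunks, at the cost of embedding some tokens twice.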


In [7]:
# Ensure the directory for saving the knowledge vector database exists
os.makedirs(os.path.dirname(FAISS_INDEX_PATH), exist_ok=True)

chunks = split_document_into_chunks(PDF_FILE_PATH, chunk_size=256)
if chunks is not None:
    # Generate embeddings for the document chunks
    knowledge_vector_database = generate_embeddings(chunks)
    # Save the entire knowledge vector database to a file
    save_knowledge_vector_database(knowledge_vector_database, FAISS_INDEX_PATH)
else:
    logging.error("Failed to split the document into chunks.")

2024-11-03 09:58:55,538 - INFO - The document has been loaded successfully. Total number of pages: 459.
2024-11-03 09:58:56,931 - INFO - The document has been split into 814 chunks.
2024-11-03 09:58:57,007 - INFO - Load pretrained SentenceTransformer: thenlper/gte-small
2024-11-03 09:58:59,063 - INFO - Embedding model 'thenlper/gte-small' loaded successfully.
2024-11-03 09:58:59,065 - INFO - CUDA/NPU is not available. Starting 4 CPU workers
2024-11-03 09:58:59,065 - INFO - Start multi-process pool on devices: cpu, cpu, cpu, cpu
2024-11-03 09:59:54,209 - INFO - Embeddings generated successfully.
2024-11-03 09:59:54,586 - INFO - Knowledge vector database saved to ../embeddings/knowledge_vector_database.faiss


In [8]:
# Function to initialize the reader model
def initialize_reader_model(model_name: str = READER_MODEL_NAME):
    """
    Initialize the LLM model for text generation.
    
    Args:
        model_name (str): The name of the model to use for the LLM.
    
    Returns:
        A HuggingFace pipeline for text generation and the tokenizer.
    """
    # Load the tokenizer and model named by the argument (not the global constant)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

    # Initialize the reader LLM model
    reader_llm = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.2,
        return_full_text=False,
    )
    logging.info(f"Reader LLM model '{model_name}' initialized successfully.")
    return reader_llm, tokenizer

In [9]:
# Function to load the entire knowledge vector database from a file
def load_knowledge_vector_database(file_path):
    """
    Load the entire FAISS object from a file.

    Args:
        file_path (str): Path to the knowledge vector database file.

    Returns:
        The loaded FAISS object.
    """
    if os.path.exists(file_path):
        with open(file_path, 'rb') as f:
            knowledge_vector_database = pickle.load(f)
        logging.info(f"Knowledge vector database loaded from {file_path}")
        return knowledge_vector_database
    else:
        logging.error(f"Knowledge vector database file {file_path} does not exist.")
        return None

In [10]:
# Function to retrieve relevant documents from the knowledge base
def retrieve_relevant_docs(query: str, knowledge_vector_database, k: int = 5):
    """
    Retrieve the most relevant documents from the FAISS knowledge base.
    
    Args:
        query (str): The user query.
        knowledge_vector_database: The FAISS knowledge base for retrieval.
        k (int): The number of top documents to retrieve.
    
    Returns:
        A tuple containing the retrieved documents, their combined text, and their metadata.
    """
    logging.info(f"Starting retrieval for query: {query}")
    retrieved_docs = knowledge_vector_database.similarity_search(query=query, k=k)

    retrieved_docs_text = [doc.page_content for doc in retrieved_docs]
    retrieved_docs_metadata = [doc.metadata for doc in retrieved_docs]
    context = "\nExtracted documents:\n"
    context += "".join([f"Document {i}:::\n{doc}\n" for i, doc in enumerate(retrieved_docs_text)])

    return retrieved_docs, context, retrieved_docs_metadata

In [None]:
# Function to generate the final answer using the retrieved documents and LLM
def generate_answer_from_docs(query: str, context: str, reader_llm, tokenizer, retrieved_docs_metadata, max_new_tokens=500):
    """
    Generate an answer using the LLM based on the retrieved documents.

    Args:
        query (str): The user query.
        context (str): The text of the retrieved documents.
        reader_llm: The text generation pipeline (LLM).
        tokenizer: The tokenizer for formatting the chat-based prompt.
        retrieved_docs_metadata (list): Metadata of the retrieved documents.
        max_new_tokens (int): Maximum number of tokens for the generated answer.

    Returns:
        The generated answer from the LLM, including the page numbers of the retrieved chunks.
    """
    # Chat-style prompt for the model - prompt template

    prompt_in_chat_format = [
        {
            "role": "system",
            "content": """Using the information contained in the context,
give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Avoid referencing specific document names, such as "According to Document 0". 
If the answer cannot be deduced from the context, do not give an answer.""",
        },
        {
            "role": "user",
            "content": f"""Context:
            {context}
            ---
            Now here is the question you need to answer:

            Question: {query}"""
        },
    ]

    # Apply the chat template to the prompt
    rag_prompt_template = tokenizer.apply_chat_template(
        prompt_in_chat_format, tokenize=False, add_generation_prompt=True
    )

    # Generate the final answer
    generated_text = reader_llm(rag_prompt_template, truncation=True, max_new_tokens=max_new_tokens)

    # Process the generated text
    answer = generated_text[0]['generated_text']

    # Extract page numbers from metadata
    page_numbers = sorted(set([metadata['page'] for metadata in retrieved_docs_metadata]))
    page_numbers_str = ", ".join(map(str, page_numbers))

    # Append page numbers to the answer
    answer += f"\n\nPages retrieved from the document and included in the context for generating the answer: {page_numbers_str}"

    return answer

## Question-Answering Pipeline

In this section, we load the precomputed **FAISS object** (knowledge vector database) to enable the retrieval of relevant information from the document. If loading fails, an error is logged and execution stops.

Once the knowledge base is loaded successfully, we initialize the **Reader LLM model**, which will generate relevant answers based on the retrieved document segments. 

For demonstration, we define a sample query: *"How are general-purpose AI models defined in the AI Act?"*

Using this query, the following steps are performed:
1. **Retrieve Relevant Documents**: We query the knowledge vector database to retrieve the most relevant chunks related to the question, aggregating them into a contextualized format.
2. **Generate Answer**: The reader model processes the retrieved context to produce a concise, relevant answer to the query, which is then displayed.
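
The retrieval step benefits from the embeddings being normalized to unit length (`normalize_embeddings=True`): for unit-length vectors, the dot product equals the cosine similarity. The sketch below shows that ranking on small hand-made vectors in pure Python. It illustrates the idea only and says nothing about how FAISS performs the search internally; `normalize`, `top_k`, and the example vectors are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k highest cosine similarities."""
    q = normalize(query_vec)
    scores = [
        (i, sum(a * b for a, b in zip(q, normalize(d))))
        for i, d in enumerate(doc_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Four made-up 2-D "embeddings" and one query vector
doc_vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
results = top_k([2.0, 0.1], doc_vecs, k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

Because both sides are normalized, the length of the query vector does not affect the ranking; only its direction does.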

This setup provides a simple pipeline for asking questions and getting answers grounded in the retrieved passages of the document.
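
The page numbers appended to the answer come from each chunk's `page` metadata, which starts at 0 for the first page of the PDF (see the metadata printed for the first chunk). If 1-based page positions are preferred, a small helper can convert them. The sketch below assumes 0-based `page` values; the function name `format_page_citation` and the metadata dictionaries are made-up examples.

```python
def format_page_citation(metadata_list, one_based=True):
    """Return a sorted, de-duplicated, comma-separated list of pages."""
    offset = 1 if one_based else 0
    pages = sorted({meta["page"] + offset for meta in metadata_list})
    return ", ".join(str(p) for p in pages)

# Hypothetical metadata, shaped like the `metadata` dicts of retrieved chunks
example_metadata = [{"page": 90}, {"page": 179}, {"page": 90}, {"page": 0}]
print(format_page_citation(example_metadata))
```

Passing `one_based=False` reproduces the 0-based numbering that `generate_answer_from_docs` appends.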

In [12]:
# Load the knowledge vector database
knowledge_vector_database = load_knowledge_vector_database(FAISS_INDEX_PATH)
if knowledge_vector_database is None:
    logging.error("Failed to load the knowledge vector database.")
    raise SystemExit(1)  # stop here; the built-in exit() is intended for interactive shells

# Initialize the reader model
reader_llm, tokenizer = initialize_reader_model()

# Define the query
query = "How are general-purpose AI models defined in the AI Act?"

# Retrieve relevant documents
retrieved_docs, context, retrieved_docs_metadata = retrieve_relevant_docs(query, knowledge_vector_database)

# Generate the answer
answer = generate_answer_from_docs(query, context, reader_llm, tokenizer, retrieved_docs_metadata)
print("Answer:", answer)

2024-11-03 09:59:54,906 - INFO - Knowledge vector database loaded from ../embeddings/knowledge_vector_database.faiss
2024-11-03 09:59:56,738 - INFO - Reader LLM model 'Qwen/Qwen2.5-1.5B-Instruct' initialized successfully.
2024-11-03 09:59:56,741 - INFO - Starting retrieval for query: How are general-purpose AI models defined in the AI Act?
2024-11-03 09:59:56,742 - INFO - CUDA/NPU is not available. Starting 4 CPU workers
2024-11-03 09:59:56,745 - INFO - Start multi-process pool on devices: cpu, cpu, cpu, cpu
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


Answer: General-purpose AI models are defined according to Article 51 of Chapter V's Section 1 within the AI Act. They must meet certain key functional characteristics, specifically being capable of performing widely diverse tasks across different domains. Additionally, there are evaluation standards outlined under Article 51 regarding impacts and capacities related to systematic risks, requiring assessment using appropriate technical tools and methodologies along with reference to specified guidelines detailed in Annex XIII.

Pages retrieved from the document and included in the context for generating the answer: 90, 179, 285, 380
