# Simple RAG pipeline allowing you to "talk" to your documentation

This notebook contains a simple application that uses retrieval augmented generation (RAG) to "ask questions" of a PDF, using the `langchain` package. In this case, we're going to use a PDF of the PyCharm documentation, but `langchain` supports a [wide variety of input formats](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/), giving you significant flexibility over your input data source.

In this pipeline, we'll need to do the following:
* Load in (for local models) or connect to the API of (for remote models) our LLM;
* Load in the PDF that we want to "chat" with;
* Split the PDF into chunks, since we can't pass the whole document into a model at once (it's almost 2000 pages!);
* Convert these chunks into embeddings and store them in a vector database, so that we don't need to pass every individual chunk through the LLM to find the information relevant to a question. At query time, the question is also converted into an embedding, and the chunks most similar to the question are retrieved and passed to the LLM as context.
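
To make the split, embed, and retrieve steps above concrete, here is a minimal, self-contained sketch in plain Python. It is not what `langchain` does internally: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and all function names are made up for this illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


def embed(text):
    """Toy 'embedding': word counts (an assumption standing in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Embed every chunk once, up front (the role the vector database plays)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question, index, k):
    """Embed the question and return the k chunks most similar to it."""
    question_vector = embed(question)
    scored = [(cosine_similarity(question_vector, vector), chunk) for chunk, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Character windows with an overlap of one character
pieces = split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)

# Hand-written chunks so the retrieval step is easy to follow
chunks = [
    "Place breakpoints in the gutter to start debugging.",
    "Install PyCharm using the Toolbox App.",
    "Create a virtual environment for your project.",
]
index = build_index(chunks)
top = retrieve("How do I start debugging with breakpoints?", index, k=1)
```

Only the chunks returned by `retrieve` would be handed to the LLM, which is why the quality of the chunking and the embedding matters so much for the final answer.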

In [2]:
from dotenv import load_dotenv
from langchain import chains, document_loaders, vectorstores
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
import re

In [3]:
class PdfQA:
    """
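    Question-answering pipeline over a PDF document, built on retrieval augmented generation.

    Initializing the class loads the model, loads and splits the PDF, embeds the chunks
    into a vector store, and builds the retriever and QA chain.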

    :param model: The name or path of the model to be loaded.
    :param pdf_document: The path to the PDF document to be loaded.
    :param chunk_size: The desired size of each chunk.
    :param chunk_overlap: The specified overlap between chunks.
    :param search_type: The type of search to be performed.
    :param n_documents: The number of documents to be retrieved.
    :param chain_type: The type of chain to create.
    """

    def __init__(self, model, pdf_document, chunk_size, chunk_overlap,
                 search_type, n_documents, chain_type):
        load_dotenv()
        self.init_chat_model(model)
        self.load_documents(pdf_document)
        self.split_documents(chunk_size, chunk_overlap)
        self.select_embedding = OpenAIEmbeddings()
        self.create_vectorstore()
        self.create_retriever(search_type, n_documents)
        self.chain = self.create_chain(chain_type)

    def init_chat_model(self, model):
        """
        Initialize the chat model.

        :param model: The name or path of the model to be loaded.
        :return: None

        """
        print("Loading model")
        self.llm = ChatOpenAI(model_name=model, temperature=0)

    def load_documents(self, pdf_document):
        """
        Load documents from a PDF file and convert to a format that can be ingested by the langchain
        document splitter.

        :param pdf_document: The path to the PDF document to be loaded.
        :return: None
        """
        print("Loading PDFs")
        pdf_loader = document_loaders.PyPDFLoader(pdf_document)
        self.documents = pdf_loader.load()

    def split_documents(self, chunk_size, chunk_overlap):
        """
        Split the documents into chunks of a given size with a specified overlap.

        :param chunk_size: The desired size of each chunk.
        :type chunk_size: int
        :param chunk_overlap: The specified overlap between chunks.
        :type chunk_overlap: int
        :return: None
        """
        print("Splitting documents")
        text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        self.texts = text_splitter.split_documents(self.documents)

    def create_vectorstore(self):
        """
        Create Vector Store.

        This method embeds the given texts with the selected embedding model and stores the resulting vectors in a Chroma vector store.

        :return: None
        """
        print("Creating document embeddings")
        self.db = vectorstores.Chroma.from_documents(self.texts, self.select_embedding)

    def create_retriever(self, search_type, n_documents):
        """
        Generate a chunk retriever for the given search type and number of documents.

        :param search_type: The type of search to be performed.
        :param n_documents: The number of documents to be retrieved.
        :return: None
        """
        print("Generating chunk retriever")
        self.retriever = self.db.as_retriever(search_type=search_type, search_kwargs={"k": n_documents})

    def create_chain(self, chain_type):
        """
        Create a retrieval QA chain that answers from the retrieved chunks and also returns them.

        :param chain_type: The type of chain to create.
        :return: The created chain.
        """
        qa = chains.RetrievalQA.from_chain_type(llm=self.llm,
                                                chain_type=chain_type,
                                                retriever=self.retriever,
                                                return_source_documents=True)
        return qa

    def query_chain(self):
        """
        Returns the chain of the object.

        :return: The chain of the object.
        """
        return self.chain

## Levers in the RAG pipeline
RAG is quite tricky to get right, especially if you need it to be efficient. There are many levers we can pull in our pipeline, which influence the following things:
* How fast we can get our answers;
* How relevant our answers are (and, relatedly, how likely we are to get a hallucination);
* How complete our answers are.

Let's instantiate our PDF questioner with the following values:
* `model`: the LLM used to generate answers using information from the document. In this case, `gpt-3.5-turbo`.
* `pdf_document`: the PDF we want to "chat with". In our case, we've selected our PDF containing almost all of the PyCharm documentation.
* `chunk_size`: the maximum number of characters to include in each chunk (`CharacterTextSplitter` measures length in characters by default, not tokens). We've selected 1000.
* `chunk_overlap`: the number of characters that should overlap between adjacent chunks. We've selected 0, so no overlap.
* `search_type`: the strategy by which chunks are selected. In this case, we've selected "similarity", which returns the chunks with the highest (cosine) similarity to the question we're asking. However, you can also use "mmr" (if supported by your document store), which tries to balance relevance against diversity of results.
* `n_documents`: the maximum number of chunks to use to generate the answer. In this case, we've used 5.
* `chain_type`: this controls how the retrieved chunks are passed into the LLM. With "stuff", all retrieved chunks are passed into the context window at once. Other options are "refine", which feeds in the chunks one at a time along with the answer generated so far, and "map_rerank", which answers the question from each chunk separately, scores each answer, and returns the highest-scoring one.
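
The difference between "similarity" and "mmr" search is easier to see with numbers. Below is a minimal sketch of maximal marginal relevance over made-up 2-D vectors. It is an illustration of the idea only, not `langchain`'s implementation, and the function names and the `lambda_mult` weight are assumptions chosen for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k_similarity(query_vec, doc_vecs, k):
    """Plain similarity search: indices of the k vectors closest to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedy MMR: reward closeness to the query, penalise closeness to earlier picks."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # very relevant
    [1.0, 0.12],   # very relevant, but a near-duplicate of the first
    [0.6, -0.80],  # less relevant, but covers something different
]

by_similarity = top_k_similarity(query, docs, k=2)
by_mmr = mmr(query, docs, k=2)
```

With these toy vectors, similarity search returns the two near-duplicates, while MMR keeps the most relevant one and swaps the duplicate for the more distinct third vector. Raising `lambda_mult` towards 1 moves MMR back towards plain similarity.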

Other levers I've chosen not to expose as arguments in this class are the embedding model (here, `OpenAIEmbeddings`) and the vector database used to store the document embeddings (here, `Chroma`).
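
Returning to the `chain_type` lever, the practical difference between "stuff" and "refine" is mostly about how many LLM calls are made and what each call sees. The sketch below shows this with a stub model that just records its prompts. The prompt wording, the function names and the `CountingLLM` class are all hypothetical, and `langchain`'s real prompts differ.

```python
def stuff(llm, question, chunks):
    """One call: every retrieved chunk goes into a single prompt."""
    context = "\n\n".join(chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def refine(llm, question, chunks):
    """One call per chunk: each call sees the next chunk plus the answer so far."""
    answer = ""
    for chunk in chunks:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n\n"
            f"Question: {question}"
        )
    return answer


class CountingLLM:
    """Stub standing in for a real model: records every prompt it receives."""

    def __init__(self):
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        return f"answer after {len(self.prompts)} call(s)"


chunks = ["chunk one", "chunk two", "chunk three"]

stuff_llm = CountingLLM()
stuff_answer = stuff(stuff_llm, "How do I debug?", chunks)

refine_llm = CountingLLM()
refine_answer = refine(refine_llm, "How do I debug?", chunks)
```

In this sketch "stuff" costs one call but needs all chunks to fit in the context window, while "refine" costs one call per chunk but never needs more than one chunk in the prompt at a time.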

In [4]:
pdf_qa = PdfQA("gpt-3.5-turbo", "../materials/pycharm-documentation.pdf", 1000, 0, "similarity", 5,
               "stuff")
pdf_qa_chain = pdf_qa.query_chain()

Loading model
Loading PDFs
Splitting documents
Creating document embeddings
Generating chunk retriever


Let's try it out by asking how we can debug in PyCharm.

In [5]:
answer1 = pdf_qa_chain.invoke({"query": "What are the options for debugging with PyCharm?"})

In [6]:
answer1["result"]

'The options for debugging with PyCharm include placing breakpoints at specific lines of code, stepping through the code line by line, evaluating expressions, adding watches, and manually setting variable values. You can start debugging by pressing a specific key, and then navigate through the program execution using the available options in the Run menu or the Debug tool window. The Debug tool window consists of panes for frames, variables, watches, and a Console tab for input and output information.'

We can see the answer is very comprehensive. Let's have a look at the information it was based on from the documentation.

In [7]:
# enumerate gives the position directly and stays correct even if two chunks are identical
for index_n, document in enumerate(answer1["source_documents"]):
    print(f"\nDOCUMENT {index_n + 1}")
    print(re.sub(r"\s+", " ", document.page_content.strip()))


DOCUMENT 1
Debug Does your application stumble on a runtime error? To find out what’s causing it, you will have to do some debugging. PyCharm supports the debugger on all platforms. Debugging starts with placing breakpoints at which program execution will be suspended, so you can explore program data. Just click the gutter of the line where you want the breakpoint to appear. To start debugging your application, press . Then go through the program execution step by step (see the available options in the Run menu or the Debug tool window), evaluate any arbitrary expression, add watches, and manually set values for the variables. For more information, refer to Debugging. Test It is a good idea to test your applications, and PyCharm helps doing it as simple as possible. With PyCharm, you can: ⌃Ctrl D Create tests • Create special testing run/debug configurations. • Run and debug tests right from the IDE, using the testing run/debug configurations. •

DOCUMENT 2
Launch it and observe resul

We can see that the first few chunks are the most relevant, while the later ones don't really add that much to the answer.
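
When judging which chunks were useful, it can help to see where in the PDF each one came from. The sketch below assumes each source document carries a `page` entry in its `metadata` dictionary, which I believe `PyPDFLoader` sets (zero-based), but treat that as an assumption and check your own output. `FakeDocument` and `summarise_sources` are hypothetical names used only so the example runs without loading a PDF.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for the objects in answer1["source_documents"]."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarise_sources(source_documents, preview_chars=80):
    """Return one line per chunk: its position, its page (if known), and a short preview."""
    lines = []
    for position, document in enumerate(source_documents, start=1):
        page = document.metadata.get("page", "unknown")
        text = re.sub(r"\s+", " ", document.page_content.strip())
        lines.append(f"DOCUMENT {position} (page {page}): {text[:preview_chars]}")
    return lines


sources = [
    FakeDocument("Debug  your\napplication", {"page": 12}),
    FakeDocument("No page recorded"),
]
summary = summarise_sources(sources)
```

With the real chain, the same function could be called as `summarise_sources(answer1["source_documents"])`.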

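If we'd like, we can go a bit deeper with our answer by asking a follow-up question and passing the previous question and answer along as chat history. Note that `RetrievalQA` only reads the `query` key, so the `chat_history` entry below is most likely ignored; a chain designed for conversation, such as `ConversationalRetrievalChain`, is needed for the history to actually influence the answer. In this case, as a data scientist, I'd like to know about how to debug in Jupyter notebooks.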

In [8]:
chat_history1 = [(answer1["query"], answer1["result"])]
answer2 = pdf_qa_chain.invoke({"query": "Have you left out any other types of debugging?",
                               "chat_history": chat_history1})

In [9]:
answer2["result"]

'Yes, there are other types of debugging mentioned in the context provided:\n\n1. Debugging JavaScript: This is mentioned as the next step intended for Professional edition users.\n2. Debugging Django templates: It is mentioned that you have learned how to step through your template, evaluate expressions, and add watches in the context of a Django project.\n3. Working in the Threads and Variables tab: It is mentioned that you can observe the variables used in the application by stepping through all the set breakpoints.\n4. Working in the Console tab: It is mentioned that you can use the Console tab to see error messages or perform calculations not related to the current application.\n\nThese are the additional types of debugging mentioned in the context provided.'
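
A simple way to make sure earlier turns reach the model, whichever chain is used, is to fold them into the query text itself. The helper below is a minimal sketch of that idea; `fold_history_into_query` and the prompt wording are made up for this example and are not part of `langchain`.

```python
def fold_history_into_query(question, chat_history):
    """Prefix the question with earlier (question, answer) turns as plain text."""
    if not chat_history:
        return question
    turns = "\n".join(f"Q: {q}\nA: {a}" for q, a in chat_history)
    return f"Previous conversation:\n{turns}\n\nFollow-up question: {question}"


history = [("What are the options for debugging with PyCharm?",
            "Breakpoints, stepping, watches.")]
follow_up = fold_history_into_query("Have you left out any other types of debugging?", history)
```

The resulting string could then be passed as `{"query": follow_up}`. One trade-off to be aware of: the whole string, history included, is also what gets embedded for retrieval, so long histories may pull in chunks related to earlier turns instead of the new question.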

If our model is capable of it, we can even enter queries in a different language to the source documentation, and get relevant answers back in this language. Here we question our English-language documentation in German ...

In [10]:
answer3 = pdf_qa_chain.invoke({"query": "Wie kann man PyCharm installieren?"})

... and get a relevant answer in German!

In [11]:
answer3["result"]

'PyCharm kann über die Toolbox oder eigenständige Installationen installiert werden. Wenn du Hilfe bei der Installation benötigst, sieh dir die Installationsanweisungen an. Es gibt auch eine stille Installationsoption für Netzwerkadministratoren, um PyCharm auf mehreren Maschinen zu installieren, ohne andere Benutzer zu unterbrechen. Es gibt auch eine separate ARM64-Installationsdatei. Es wird empfohlen, die Integrität des Installationsprogramms mit dem SHA-Prüfsummenlink von der Downloadseite zu überprüfen.'

In [12]:
# enumerate gives the position directly and stays correct even if two chunks are identical
for index_n, document in enumerate(answer3["source_documents"]):
    print(f"\nDOCUMENT {index_n + 1}")
    print(re.sub(r"\s+", " ", document.page_content.strip()))


DOCUMENT 1
PyCharm 2024.1 Getting started/Installation guide Last modified: 06 May 2024 PyCharm is a cross-platform IDE that provides consistent experience on the Windows, macOS, and Linux operating systems. PyCharm is available in two editions: Professional, and Community. The Community edition is an open-source project, and it's free, but it has fewer features. The Professional edition is commercial, and provides an outstanding set of tools and features. For more information, refer to the editions comparison matrix ↗.Install PyCharm

DOCUMENT 2
You can install PyCharm using Toolbox or standalone installations. If you need assistance installing PyCharm, see the installation instructions: Install PyCharmRequirement Minimum Recommended Operating systemOfficially released versions of the following: Pre-release versions are not supported.The latest versions of the following: Microsoft Windows 10 1809 64-bit or later Windows Server 2019 64- bit or later• macOS 12.0 or later • Ubuntu Linux