Here we will use a PDF file that contains information about the Llama series of models. The PDF was compiled from the Wikipedia page on Llama and is available in the GitHub repository for this blog.

In [1]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def create_chunks_from_pdf(data_path, chunk_size, chunk_overlap):

   '''
   This function takes a directory of PDF files and creates chunks of text from each file.
   The text is split into chunks of at most `chunk_size` characters with an overlap of up to `chunk_overlap` characters.
   Each chunk is returned as a LangChain Document object.

   Args:
      data_path (str): The path to the directory containing the PDF files.
      chunk_size (int): The size of each chunk.
      chunk_overlap (int): The overlap between each chunk.

   Returns:
      docs (list): A list of langchain Document objects, each containing a chunk of text.
   '''

   # Load the documents from the directory
   loader = DirectoryLoader(data_path, loader_cls=PyPDFLoader)

   # Split the documents into chunks
   text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=chunk_size,
      chunk_overlap=chunk_overlap,
      length_function=len,
      is_separator_regex=False,
   )
   docs = loader.load_and_split(text_splitter=text_splitter)
   return docs

data_path = '../data'
chunk_size = 500
chunk_overlap = 50

docs = create_chunks_from_pdf(data_path, chunk_size, chunk_overlap)

We start by loading the documents from the directory. We then split them into chunks of at most `chunk_size` characters, where consecutive chunks can share up to `chunk_overlap` characters, and return each chunk as a LangChain Document.
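
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window splitter in pure Python. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally, since the real splitter first tries to break on separators such as paragraph and line breaks. The function name `sliding_window_chunks` and the sample string are hypothetical and chosen only for illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Simplified illustration only; not the LangChain implementation.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # hypothetical sample text
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

The last three characters of each chunk reappear at the start of the next one, which is why a sentence that straddles a chunk boundary can still be retrieved with some of its surrounding context.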

The next step is to index the documents in the Qdrant database. Before that, we need to load the embeddings model that will convert the documents into embeddings. To build the simple RAG model, we start with the 'BAAI/bge-large-en' embeddings model. In the evaluation section, we will also try other embeddings models to see how the performance changes.

In [2]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_models = ['BAAI/bge-large-en']

# Load the embeddings model
embeddings = HuggingFaceEmbeddings(model_name=embedding_models[0])


Great! Now that we have created the chunked documents and loaded the embeddings model, we can index the documents using Qdrant.

In [3]:
from langchain_qdrant import Qdrant

def index_documents_and_retrieve(docs, embeddings):

    '''
    This function uses Qdrant to index the chunked documents with the given embeddings model.
    To keep the example simple, we use in-memory storage only.

    Args:
    docs: List of chunked LangChain Document objects returned by the document loader
    embeddings: Embeddings model object used to embed the documents and the queries

    Returns:
    retriever: Qdrant retriever object which can be used to retrieve the relevant documents
    '''

    qdrant = Qdrant.from_documents(
        docs,
        embeddings,
        location=":memory:",  # Local mode with in-memory storage only
        collection_name="my_documents",
    )

    retriever = qdrant.as_retriever()

    return retriever

retriever = index_documents_and_retrieve(docs, embeddings)

To keep this blog simple, we have indexed the documents in Qdrant using in-memory mode, so the index is lost when the process exits. In a production environment, you can use persistent mode to store the indexed documents on disk. If you want to know more about these approaches, you can visit my other blogs on Qdrant.

Now that we have the retriever object, we can use it to retrieve the relevant documents based on the query and finally generate the answer using the LLM model.
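
To give a feel for what the retriever does with a query, here is a toy sketch of similarity search in pure Python. It assumes cosine similarity as the distance measure and uses tiny hand-made vectors in place of real embeddings. The names `cosine_similarity`, `top_k`, `toy_index` and the placeholder texts are hypothetical and not part of Qdrant or LangChain.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the texts of the k indexed entries most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hand-made 3-dimensional "embeddings"; a real model produces far larger vectors
toy_index = [
    ("chunk A", [0.9, 0.1, 0.0]),
    ("chunk B", [0.0, 0.2, 0.9]),
    ("chunk C", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]

results = top_k(toy_query, toy_index, k=2)
print(results)
```

In the real pipeline the query is embedded by the same embeddings model as the documents, and the vector database performs the nearest-neighbour search instead of a linear scan like this one.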

In [5]:
from langchain_community.chat_models import ChatOllama

model_id = "llama3:instruct"

# Load the Llama 3 model through Ollama
llm = ChatOllama(model=model_id)

Now let's build the simple RAG chain from the retriever and the LLM model.

In [6]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

def build_rag_chain(llm, retriever):

    '''
    This function builds the RAG chain using the LLM model and the retriever object. 
    The RAG chain is built using the following steps:
    1. Retrieve the relevant documents using the retriever object
    2. Pass the retrieved documents to the LLM model along with prompt generated using the context and question
    3. Parse the output of the LLM model

    Args:
    llm: LLM model object
    retriever: Qdrant retriever object

    Returns:
    rag_chain: RAG chain object which can be used to answer the questions based on the context
    '''
    
    template = """
        Answer the question based only on the following context:
        
        {context}
        
        Question: {question}
        """
    
    prompt = PromptTemplate(
        template=template,
        input_variables=["context","question"]
        )
    
    rag_chain = (
        {"context": retriever,  "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    
    return rag_chain

rag_chain = build_rag_chain(llm, retriever)
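
In the chain above, the retriever output is passed to the prompt as a list of Document objects, so the `{context}` slot receives their string representation, metadata included. A common refinement is to join only the text of the retrieved chunks before filling the template. The sketch below shows that idea in pure Python. `ToyDocument` is a hypothetical stand-in for a LangChain Document, and `format_docs` and `build_prompt` are illustrative helper names, not library functions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of the retrieved chunks, dropping their metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = (
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n"
    "Question: {question}"
)


def build_prompt(docs, question):
    """Fill the prompt template with the formatted context and the question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


retrieved = [
    ToyDocument("First retrieved chunk.", {"page": 0}),
    ToyDocument("Second retrieved chunk.", {"page": 1}),
]
prompt_text = build_prompt(retrieved, "What is this document about?")
print(prompt_text)
```

If you want the same behaviour in the actual chain, one option is to pipe the retriever into such a function, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. Treat this as a suggestion to try, since the chain in this blog was run without it.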

Now that we have the RAG chain, we can use it to generate answers to queries. Let's test it out.

In [8]:
rag_chain.invoke('What is this document about?')

'Based on the provided context, this document appears to be about LLaMA, a language model that was initially announced in February 2023. The document discusses the release of the model, its training and architecture, as well as reactions to its leak and accessibility. It also mentions issues related to unauthorized distribution, DMCA takedown requests, and subsequent releases of updated versions under different licenses.'

There you go! You have successfully built a RAG app using LangChain, Qdrant, HuggingFace, Ollama and Llama 3. You can now use it to answer questions based on the content of your documents.

Now let's move on to the next section, where we will learn how to evaluate the RAG app using the evaluation metrics discussed in the previous sections.