# RAG-Based Solution Implementation

## Author
**Kevin Joy D'Souza**

## Purpose
This notebook demonstrates a Retrieval-Augmented Generation (RAG) pipeline built with LangChain and models served through the Hugging Face Inference API. It shows how retrieving relevant passages from a PDF lets a language model answer questions grounded in that document.

## Overview
The pipeline loads a PDF, splits it into overlapping chunks, embeds the chunks into a Chroma vectorstore, and passes the chunks most similar to the user's question to a Mixtral instruct model, which generates a concise answer from that context.


## Import Required Libraries

In [1]:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import PromptTemplate

from keys import INFERENCE_API_KEY  # Your Hugging Face API key should be stored here
from prompt import TEMPLATE  # Your custom prompt template





## Register and Get a Hugging Face Key for Inference

In [2]:
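# Only needed if you do not keep the key in keys.py. Otherwise skip this cell,
# because it overwrites the value imported above.
INFERENCE_API_KEY = "insert your huggingface api key"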
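
Hardcoding the key in a notebook makes it easy to commit by accident. A minimal sketch of reading it from the environment instead is shown below. `load_api_key` is a hypothetical helper, and the variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption; use whatever name your setup defines.

```python
import os

def load_api_key(var_name="HUGGINGFACEHUB_API_TOKEN"):
    """Return the API key stored in the environment variable var_name.

    Both this helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the notebook."
        )
    return key
```

With the variable exported in the shell that launches Jupyter, the cell above could become `INFERENCE_API_KEY = load_api_key()`.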

## Function to Load and Preprocess the PDF Document

In [3]:
def load_pdf_text(file):
    """Load a PDF and return its pages together with the total word count."""
    loader = PyPDFLoader(file)
    docs = loader.load()  # one Document per PDF page

    # Count whitespace-separated words across all pages
    doc_length = sum(len(doc.page_content.split()) for doc in docs)
    return docs, doc_length

file = "/home/kevin/PycharmProjects/RAG-Q&A-Q/QnA-PDF-RAG-LangChain/sample-biology-pdf/Concepts of Biology-chap1-chap2.pdf"
docs, doc_length = load_pdf_text(file)
print(f"Document length: {doc_length} words")
# Display the first two documents to verify
docs[:2]

Document length: 30473 words


[Document(metadata={'source': '/home/kevin/PycharmProjects/RAG-Q&A-Q/QnA-PDF-RAG-LangChain/sample-biology-pdf/Concepts of Biology-chap1-chap2.pdf', 'page': 0}, page_content='INTR ODUCT IONCHAP TER 1\nIntroduction t o Biolog y\n1.1Themes and Conc epts of Biolog y\n1.2The P rocess of Scienc e\nViewed fr om spac e, Ear th (Figure 1.1 ) offers f ew clues about the div ersity o f life\nforms that r eside ther e. The firs t forms o f life on Ear th ar e thought t o ha ve been micr oorganisms\nthat e xisted for bil lions o f years bef ore plants and animals appear ed. The mammals , birds, and\nflowers so familiar t o us ar e all relativ ely recent, originating 130 t o 200 mil lion y ears ag o. Humans\nhave inhabit ed this planet f or onl y the las t 2.5 mil lion y ears , and onl y in the las t 300,000 y ears\nhave humans s tarted looking lik e we do t oday.\n1.1Themes and C oncepts of Biology\nLEARNING OB JECTIVE S\nBy the end o f this section, y ou wil l be able t o:\n•Identif y and describe

## Determine Optimal Chunk Size

In [4]:
# Choose the chunk size and overlap for splitting from the document length.
# Note: doc_length is a word count, while the splitter used below measures
# chunk_size and chunk_overlap in characters.
def determine_optimal_chunk_size(doc_length):
    if doc_length < 5000:
        return 500, 100
    elif doc_length < 20000:
        return 1000, 250
    else:
        return 2000, 500

chunk_size, chunk_overlap = determine_optimal_chunk_size(doc_length)
print(f"Chunk Size: {chunk_size}, Chunk Overlap: {chunk_overlap}")

Chunk Size: 2000, Chunk Overlap: 500
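
To make the interaction between chunk size and overlap concrete, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also prefers to break on separators such as paragraphs and newlines), and `sliding_chunks` is a hypothetical helper introduced only for illustration.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last `chunk_overlap` characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence cut at a chunk boundary retrievable from at least one chunk.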


## Chunk and Store Documents in Vectorstore

In [5]:
# Split the documents into chunks and store them in a Chroma vectorstore using the Hugging Face embeddings.
def chunk_and_store_in_vector_store(docs, chunk_size, chunk_overlap):
    embeddings = HuggingFaceInferenceAPIEmbeddings(
        api_key=INFERENCE_API_KEY, 
        model_name="sentence-transformers/all-MiniLM-l6-v2"
    )

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
    return vectorstore

vectorstore = chunk_and_store_in_vector_store(docs, chunk_size, chunk_overlap)
print("Vectorstore created and documents stored.")

Vectorstore created and documents stored.
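
Conceptually, the vectorstore keeps one embedding vector per chunk and answers a query by returning the chunks whose vectors are closest to the query's vector. The sketch below illustrates that idea with hand-written three-dimensional toy vectors and exact cosine similarity. It is not how Chroma is implemented, and `top_k`, `toy_store`, and the vectors are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings": real ones come from the embedding model and have hundreds of dimensions.
toy_store = [
    ("cells are the basic unit of life", [0.9, 0.1, 0.0]),
    ("atoms make up all matter", [0.1, 0.9, 0.0]),
    ("the scientific method tests hypotheses", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
results = top_k(query, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The `k` argument plays the same role as the `k` passed in `search_kwargs` when a retriever is created from the vectorstore.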


## Utility Functions to Process User Input and Return the Query Response

In [6]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import HuggingFaceEndpoint

TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

{context}

Question: {question}

Answer:"""

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

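def substring_after(s, delim):
    # Return the text after delim, or s unchanged when delim is absent
    # (the model does not always echo the marker).
    _, sep, after = s.partition(delim)
    return after if sep else s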

def process_user_input(user_query, vectorstore):
    retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
    print("Retriever set up.")

    llm = HuggingFaceEndpoint(
        huggingfacehub_api_token=INFERENCE_API_KEY,
        repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
        task="text-generation",
        max_new_tokens=512,
        top_k=50,
        top_p=0.8,
        temperature=0.1,
        repetition_penalty=1
    )
    print("LLM set up successfully.")

    custom_rag_prompt = PromptTemplate.from_template(TEMPLATE)

    rag_chain_from_docs = (
        RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
        | custom_rag_prompt
        | llm
        | StrOutputParser()
    )

    rag_chain_with_source = RunnableParallel(
        {"context": retriever, "question": RunnablePassthrough()}
    ).assign(answer=rag_chain_from_docs)

    llm_response = rag_chain_with_source.invoke(user_query)
    print("LLM Response: ", llm_response)

    if 'answer' not in llm_response:
        print("No 'answer' in LLM response.")
        return "LLM response did not return an answer."

    final_output = substring_after(llm_response['answer'], "Helpful Answer:")
    print(f"Final extracted output: {final_output}")

    return final_output.strip()
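
Before the model is called, the chain essentially performs string formatting: the retrieved chunks are joined and substituted into the template together with the question. A dependency-free sketch of that step follows, using a hypothetical `FakeDoc` stand-in for LangChain's `Document`, a shortened template, and a hypothetical `build_prompt` helper.

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing only the page_content attribute used here."""
    page_content: str

# Shortened version of the notebook's template, for illustration only
SHORT_TEMPLATE = "{context}\n\nQuestion: {question}\n\nAnswer:"

def format_docs(docs):
    # Same joining logic as the notebook's format_docs
    return "\n\n".join(doc.page_content for doc in docs)

def build_prompt(docs, question):
    """Fill the template with the joined chunks and the user's question."""
    return SHORT_TEMPLATE.format(context=format_docs(docs), question=question)

retrieved = [FakeDoc("First retrieved chunk."), FakeDoc("Second retrieved chunk.")]
prompt = build_prompt(retrieved, "What is the smallest unit of life?")
print(prompt)
```

Printing the assembled prompt like this is a quick way to check what the model actually sees when an answer looks wrong.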




## Q & A

In [8]:
user_input = "What is the smallest unit of life?"  # Example input
response = process_user_input(user_input, vectorstore)
#print("Response: ", response)


Retriever set up.
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /home/kevin/.cache/huggingface/token
Login successful
LLM set up successfully.
LLM Response:  {'context': [Document(metadata={'page': 19, 'source': '/home/kevin/PycharmProjects/RAG-Q&A-Q/QnA-PDF-RAG-LangChain/sample-biology-pdf/Concepts of Biology-chap1-chap2.pdf'}, page_content='Key Terms\napplied scienc ea form o f scienc e that sol ves real-\nworld pr oblems\natom a basic unit o f mat ter that cannot be br oken\ndown b y normal chemical r eactions\nbasic scienc escienc e that seek s to expand\nknowledg e regardles s of the shor t-term application\nof that kno wledg e\nbiolog ythe s tudy o f life\nbiospher ea collection o f all ecosystems on Ear th\ncellthe smal lest fundam