# **Question-Answering with PDF Files in LangChain**

## **RAG program with [LangChain](https://deepsense.ai/langchain-announces-partnership-with-deepsense-ai/) library**

We will use LangChain to build a basic RAG (retrieval-augmented generation) program that answers questions about a PDF file.

### **1. Install required packages**  

In [19]:
!pip install -q transformers==4.41.2
!pip install -q bitsandbytes==0.43.1
!pip install -q accelerate==0.31.0
!pip install -q langchain==0.2.5
!pip install -q langchainhub==0.1.20
!pip install -q langchain-chroma==0.1.1
!pip install -q langchain-community==0.2.5
!pip install -q langchain_huggingface==0.0.3
!pip install -q python-dotenv==1.0.1
!pip install -q pypdf==4.2.0
!pip install -q numpy==1.24.4

### **2. Build vector database**

#### **2.1 Import libraries**

In [2]:
import torch

from transformers import BitsAndBytesConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_huggingface.llms import HuggingFacePipeline

from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain.chains import ConversationalRetrievalChain

from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain import hub

#### **2.2 Read PDF file**

In [3]:
Loader = PyPDFLoader
FILE_PATH = "./YOLOv10_Tutorials.pdf"
loader = Loader(FILE_PATH)
documents = loader.load()

#### **2.3 Initialize text splitter**

In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size =1000, chunk_overlap =100)

In [5]:
docs = text_splitter.split_documents(documents)

In [None]:
print("Number of sub-documents: ", len(docs))
print(docs[0])
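
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification written for illustration: `split_with_overlap` is a hypothetical helper, and it cuts on raw character offsets, whereas `RecursiveCharacterTextSplitter` tries to break on separators such as paragraphs and lines first.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: 2500 characters of repeating digits (illustrative data only).
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks, at the cost of storing some text twice.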

#### **2.4 Initialize the embedding model**

In [None]:
embedding = HuggingFaceEmbeddings()

#### **2.5 Initialize vector database**

In [8]:
vector_db = Chroma.from_documents(documents=docs, embedding=embedding)
retriever = vector_db.as_retriever()

In [9]:
result = retriever.invoke("What is YOLO?")
print("Number of relevant documents: ", len(result))

Number of relevant documents:  4
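
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below shows that idea with plain Python lists. It assumes cosine similarity and uses made-up two-dimensional vectors; the distance metric and index that Chroma actually uses are configurable and are not reproduced here, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)),
        reverse=True,
    )
    return [idx for _, idx in scored[:k]]


# Toy embeddings (illustrative values, not real model output).
toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_query = [1.0, 0.0]
top_indices = top_k(toy_query, toy_docs, k=4)
print(top_indices)
```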


### **3. Create LLMs: [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/)**

#### **3.1 Declare parameters**

In [12]:
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
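
The reason for loading the model in 4-bit precision is memory. A rough back-of-the-envelope estimate, assuming a nominal 7 billion parameters and counting only the weights (it ignores quantization constants, activations, and the KV cache, so real usage is higher):

```python
import math


def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_7b = 7e9  # nominal parameter count, an assumption for this estimate
fp16_gb = weight_memory_gb(params_7b, 16)
nf4_gb = weight_memory_gb(params_7b, 4)
print(f"16-bit weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of 16-bit weights, which is what makes a 7B model fit on a single modest GPU.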

#### **3.2 Initialize model and tokenizer**

In [None]:
MODEL_NAME = "lmsys/vicuna-7b-v1.5"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=nf4_config,
    low_cpu_mem_usage=True
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

#### **3.3 Integrate tokenizer and model into one pipeline**

In [15]:
model_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    pad_token_id = tokenizer.eos_token_id,
    device_map="auto"
)

llm = HuggingFacePipeline(pipeline=model_pipeline)

### **4. Run program**

In [None]:
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

user_question = "What is YOLOv10?"
output = rag_chain.invoke(user_question)
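# Keep only the text after the last "Answer:" marker; [-1] avoids an IndexError if the marker is missing.
answer = output.split("Answer:")[-1].strip()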

In [None]:
print(answer)
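
The data flow of the chain can be sketched without any model. In the stand-in below, everything is hypothetical and for illustration only: `FakeDoc`, `fake_retriever`, `fake_llm`, and `PROMPT_TEMPLATE` are placeholders (the template is not the text of `rlm/rag-prompt`), and the fake LLM echoes the prompt before its completion, which is the behavior the split on `"Answer:"` above appears to assume.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is needed here."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Placeholder template, NOT the real hub prompt.
PROMPT_TEMPLATE = "Context: {context}\nQuestion: {question}\nAnswer:"


def fake_retriever(question):
    # A real retriever would search the vector database with the question.
    return [FakeDoc("first placeholder chunk"), FakeDoc("second placeholder chunk")]


def fake_llm(prompt_text):
    # Echoes the prompt, then appends a completion.
    return prompt_text + " placeholder answer."


def extract_answer(output, marker="Answer:"):
    return output.split(marker)[-1].strip()


def run_chain(question):
    # The question feeds both branches: the retriever (for context) and the prompt itself.
    context = format_docs(fake_retriever(question))
    prompt_text = PROMPT_TEMPLATE.format(context=context, question=question)
    return extract_answer(fake_llm(prompt_text))


sketch_answer = run_chain("What is YOLOv10?")
print(sketch_answer)
```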

## **Build Chat interface**

In this part, we will use [Chainlit](https://docs.chainlit.io/get-started/overview) to build a chat interface on top of the RAG program.

### **1. Install required packages**

In [None]:
!pip install -q transformers==4.41.2
!pip install -q bitsandbytes==0.43.1
!pip install -q accelerate==0.31.0
!pip install -q langchain==0.2.5
!pip install -q langchainhub==0.1.20
!pip install -q langchain-chroma==0.1.1
!pip install -q langchain-community==0.2.5
!pip install -q langchain-openai==0.1.9
!pip install -q langchain_huggingface==0.0.3
!pip install -q chainlit==1.1.304
!pip install -q python-dotenv==1.0.1
!pip install -q pypdf==4.2.0
!npm install -g localtunnel
!pip install -q numpy==1.24.4

### **2. Import libraries**

In [2]:
import chainlit as cl
import torch

from chainlit.types import AskFileResponse

from transformers import BitsAndBytesConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface.llms import HuggingFacePipeline

from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory

from langchain.chains import ConversationalRetrievalChain

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain import hub

### **3. Re-create the text splitter and embedding model from the previous part**

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)

embedding = HuggingFaceEmbeddings()

### **4. Build initialization function to handle input data**

In [4]:
def process_file(file: AskFileResponse):
    if file.type == "text/plain":
        Loader = TextLoader
    elif file.type == "application/pdf":
        Loader = PyPDFLoader
    else:
        raise ValueError("Unsupported file type")

    loader = Loader(file.path)
    documents = loader.load()
    docs = text_splitter.split_documents(documents)
    for i, doc in enumerate(docs):
        doc.metadata["source"] = f"source_{i}"
    return docs

### **5. Build Chroma database initialization function**

In [5]:
def get_vector_db(file: AskFileResponse):
    docs = process_file(file)
    cl.user_session.set("docs", docs)
    vector_db = Chroma.from_documents(
        documents=docs,
        embedding=embedding
    )
    return vector_db

### **6. Initialize Large Language Model**

In [None]:
def get_huggingface_llm(model_name: str="lmsys/vicuna-7b-v1.5", max_new_token: int=512):
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=nf4_config,
        low_cpu_mem_usage=True
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    model_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=max_new_token,
        pad_token_id=tokenizer.eos_token_id,
        device_map="auto"
    )

    llm = HuggingFacePipeline(pipeline=model_pipeline)
    return llm

LLM = get_huggingface_llm()

### **7. Initialize welcome message**


In [7]:
welcome_message = """Welcome to the PDF QA! To get started:
1. Upload a PDF or text file
2. Ask a question about the file
"""

### **8. Initialize on_chat_start function**

In [8]:
@cl.on_chat_start
async def on_chat_start():
    files = None
    while files is None:
        files = await cl.AskFileMessage(
            content=welcome_message,
            accept=["text/plain", "application/pdf"],
            max_size_mb=40,
            timeout=180,
        ).send()
    file_data = files[0]

    msg = cl.Message(content=f"Processing '{file_data.name}'...", disable_feedback=True)

    await msg.send()

    vector_db = await cl.make_async(get_vector_db)(file_data)

    message_history = ChatMessageHistory()
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        output_key="answer",
        chat_memory=message_history,
        return_messages=True
    )

    retriever = vector_db.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 3}
    )

    chain = ConversationalRetrievalChain.from_llm(
        llm=LLM,
        chain_type="stuff",
        retriever=retriever,
        memory=memory,
        return_source_documents=True
    )

    msg.content = f"'{file_data.name}' processed. You can now ask questions!"
    await msg.update()

    cl.user_session.set("chain", chain)

### **9. Initialize on_message function**

In [9]:
@cl.on_message
async def on_message(message: cl.Message):
    chain = cl.user_session.get("chain")
    cb = cl.AsyncLangchainCallbackHandler()
    res = await chain.ainvoke(message.content, callbacks=[cb])
    answer = res["answer"]
    source_documents = res["source_documents"]
    text_elements = []

    if source_documents:
        # Attach each retrieved chunk as a text element and list the names under the answer.
        for source_idx, source_doc in enumerate(source_documents):
            source_name = f"source_{source_idx}"
            text_elements.append(
                cl.Text(content=source_doc.page_content, name=source_name)
            )
        source_names = [text_el.name for text_el in text_elements]
        answer += f"\nSources: {', '.join(source_names)}"
    else:
        # Reached only when the chain returned no source documents.
        answer += "\nNo sources found."

    await cl.Message(content=answer, elements=text_elements).send()
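
Each call to the chain in `on_message` reads the chat history held by the memory object created in `on_chat_start` and then extends it with the new turn. The sketch below illustrates that idea only: `BufferMemory` and `build_condense_prompt` are hypothetical stand-ins, not the LangChain classes, and the prompt wording is invented for the example.

```python
class BufferMemory:
    """Hypothetical stand-in for a conversation buffer: keeps every turn verbatim."""

    def __init__(self):
        self.turns = []  # list of (question, answer) pairs

    def save(self, question, answer):
        self.turns.append((question, answer))

    def render(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_condense_prompt(memory, follow_up):
    """Combine the history and a follow-up so a model could rewrite it as a standalone question."""
    return (
        f"Chat history:\n{memory.render()}\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


memory_sketch = BufferMemory()
memory_sketch.save("What is YOLOv10?", "placeholder answer")
condense_prompt = build_condense_prompt(memory_sketch, "How is it trained?")
print(condense_prompt)
```

Without the history, a follow-up such as "How is it trained?" gives the retriever nothing to search for, which is why the earlier turns are passed along with it.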

### **10. Run chainlit app**

To run the app, export the code above to `app.py`:

In [12]:
%%writefile app.py
import chainlit as cl
import torch

from chainlit.types import AskFileResponse

from transformers import BitsAndBytesConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface.llms import HuggingFacePipeline

from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory

from langchain.chains import ConversationalRetrievalChain

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain import hub


# Text splitter and embedding model shared by all chat sessions
TEXT_SPLITTER = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)

embedding = HuggingFaceEmbeddings()


# Load an uploaded text or PDF file and split it into chunks
def process_file(file: AskFileResponse):
    if file.type == "text/plain":
        Loader = TextLoader
    elif file.type == "application/pdf":
        Loader = PyPDFLoader
    else:
        raise ValueError("Unsupported file type")

    loader = Loader(file.path)
    documents = loader.load()
    docs = TEXT_SPLITTER.split_documents(documents)
    for i, doc in enumerate(docs):
        doc.metadata["source"] = f"source_{i}"
    return docs


# Build a Chroma vector database from the uploaded file
def get_vector_db(file: AskFileResponse):
    docs = process_file(file)
    cl.user_session.set("docs", docs)
    vector_db = Chroma.from_documents(
        documents=docs,
        embedding=embedding
    )
    return vector_db


# Load the 4-bit quantized model and wrap it as a LangChain LLM
def get_huggingface_llm(model_name: str="lmsys/vicuna-7b-v1.5", max_new_token: int=512):
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=nf4_config,
        low_cpu_mem_usage=True
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    model_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=max_new_token,
        pad_token_id=tokenizer.eos_token_id,
        device_map="auto"
    )

    llm = HuggingFacePipeline(pipeline=model_pipeline)
    return llm

LLM = get_huggingface_llm()


WELCOME_MESSAGE = """Welcome to the PDF QA! To get started:
1. Upload a PDF or text file
2. Ask a question about the file
"""


@cl.on_chat_start
async def on_chat_start():
    files = None
    while files is None:
        files = await cl.AskFileMessage(
            content=WELCOME_MESSAGE,
            accept=["text/plain", "application/pdf"],
            max_size_mb=40,
            timeout=180,
        ).send()
    file_data = files[0]

    msg = cl.Message(content=f"Processing '{file_data.name}'...", disable_feedback=True)

    await msg.send()

    vector_db = await cl.make_async(get_vector_db)(file_data)

    message_history = ChatMessageHistory()
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        output_key="answer",
        chat_memory=message_history,
        return_messages=True
    )

    retriever = vector_db.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 3}
    )

    chain = ConversationalRetrievalChain.from_llm(
        llm=LLM,
        chain_type="stuff",
        retriever=retriever,
        memory=memory,
        return_source_documents=True
    )

    msg.content = f"'{file_data.name}' processed. You can now ask questions!"
    await msg.update()

    cl.user_session.set("chain", chain)


@cl.on_message
async def on_message(message: cl.Message):
    chain = cl.user_session.get("chain")
    cb = cl.AsyncLangchainCallbackHandler()
    res = await chain.ainvoke(message.content, callbacks=[cb])
    answer = res["answer"]
    source_documents = res["source_documents"]
    text_elements = []

    if source_documents:
        # Attach each retrieved chunk as a text element and list the names under the answer.
        for source_idx, source_doc in enumerate(source_documents):
            source_name = f"source_{source_idx}"
            text_elements.append(
                cl.Text(content=source_doc.page_content, name=source_name)
            )
        source_names = [text_el.name for text_el in text_elements]
        answer += f"\nSources: {', '.join(source_names)}"
    else:
        # Reached only when the chain returned no source documents.
        answer += "\nNo sources found."

    await cl.Message(content=answer, elements=text_elements).send()



# if __name__ == "__main__":
#     cl.run()

Overwriting app.py


Run the app with Chainlit in the background, writing its logs to `/content/logs.txt`:

In [15]:
!chainlit run app.py --host 0.0.0.0 --port 8000 &>/content/logs.txt &

### **11. Expose localhost to a public URL with localtunnel**

In [None]:
import urllib.request

# localtunnel asks for this machine's public IP as the tunnel password.
print("Password/Endpoint IP for localtunnel is:", urllib.request.urlopen("https://ipv4.icanhazip.com").read().decode("utf8").strip("\n"))

!lt --port 8000 --subdomain aivn-simple-rag