# Conversational RAG agent

## How to use

Upload your PDF documents using the file browser on the left, then add your Hugging Face token to the notebook secrets. The variable name must be `HF_TOKEN`.

Then run all cells, wait for them to finish, and scroll to the bottom to start chatting.
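
If a cell needs to read the token explicitly, one possible approach is sketched below. It assumes a Google Colab runtime, where `google.colab.userdata.get` returns a secret by name, and falls back to an environment variable elsewhere. The helper name `load_hf_token` is hypothetical and not part of this notebook.

```python
import os


def load_hf_token(name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from Colab secrets or the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata

        token = userdata.get(name)
    except Exception:
        # Outside Colab, or the secret is missing or not shared with the notebook.
        token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"Set the {name} secret or environment variable first.")
    # Hugging Face libraries read the token from this environment variable.
    os.environ[name] = token
    return token
```

Calling `load_hf_token()` once near the top of the notebook would then fail early with a clear message if the secret is missing.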

In [None]:
!apt-get update
!apt-get install -y tesseract-ocr poppler-utils

Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease

In [None]:
%pip install --quiet --upgrade \
  bitsandbytes \
  torch \
  transformers \
  langchain \
  langchain_community \
  langchain_huggingface \
  "unstructured[pdf]" \
  sentence_transformers \
  faiss-gpu \
  pdf2image \
  pytesseract \
  langgraph \
  nltk

In [None]:
import os

# Set the NLTK data path before importing nltk, which reads NLTK_DATA at import time
os.environ['NLTK_DATA'] = '/root/nltk_data'

import nltk

# Download the 'punkt' tokenizer data
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)
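
To get a feel for why 4-bit loading matters on a small GPU, here is a rough back-of-the-envelope estimate. It assumes a model with about 7 billion parameters and counts the weights only; quantization constants, activations, and the generation cache are ignored, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


n_params = 7e9  # assumption: roughly 7 billion parameters
fp16_gib = weight_memory_gib(n_params, 16)
nf4_gib = weight_memory_gib(n_params, 4)
print(f"float16: {fp16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

The 4-bit weights take about a quarter of the float16 footprint under these assumptions.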

In [None]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
        return_full_text=False,
    ),
    model_kwargs={"quantization_config": quantization_config},
)

chat_model = ChatHuggingFace(llm=llm)

`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]



In [None]:
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

ai_msg = chat_model.invoke(messages)

In [None]:
ai_msg

AIMessage(content='According to a popular philosophical paradox, when an unstoppable force meets an immovable object, it is impossible to determine which will prevail because both are defined as being incapable of being stopped or moved, respectively. This paradox raises questions about the nature of force and motion and challenges our understanding of cause and effect. However, in reality, such a scenario is hypothetical and cannot occur in the physical world as both concepts are theoretical extremes that cannot exist simultaneously in the real world.', additional_kwargs={}, response_metadata={}, id='run-4c02c1f2-8a0f-4c74-9695-6c708960c75a-0')

In [None]:
!ls /root/nltk_data/tokenizers/punkt

czech.pickle	 finnish.pickle  malayalam.pickle   PY3_tab	    swedish.pickle
danish.pickle	 french.pickle	 norwegian.pickle   README	    turkish.pickle
dutch.pickle	 german.pickle	 polish.pickle	    russian.pickle
english.pickle	 greek.pickle	 portuguese.pickle  slovene.pickle
estonian.pickle  italian.pickle  PY3		    spanish.pickle


In [None]:
!cp -R /root/nltk_data/tokenizers/punkt/PY3 /root/nltk_data/tokenizers/punkt/PY3_tab

In [None]:
import typing as ty

from langchain_core.documents import Document
from langchain_community.document_loaders.directory import DirectoryLoader

# Load every PDF under the current directory, including subfolders
loader = DirectoryLoader(
    path='.',
    glob="*.pdf",
    recursive=True,
)

In [None]:
docs: ty.List[Document] = loader.load()

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)
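
The idea behind `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. The sketch below cuts fixed-size character windows that share a few characters with their neighbor. It is a simplification: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, so its chunks vary in length. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 30) -> List[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears partly in the next chunk, which tends to help retrieval.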

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores.faiss import FAISS

# For all model names, see: https://www.sbert.net/docs/pretrained_models.html
embedding = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
db = FAISS.from_documents(chunked_docs, embedding=embedding)

In [None]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})
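
Conceptually, a similarity search with `k=4` embeds the query and returns the four stored chunks whose vectors are closest to it. The brute-force sketch below shows that idea on toy 2D vectors. Ranking by Euclidean distance is an assumption made for illustration, and a FAISS index does this far more efficiently.

```python
import math
from typing import List, Sequence, Tuple


def top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int = 4
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


toy_vectors = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (0.5, 0.5), (-1.0, 0.0)]
nearest = top_k((1.0, 0.0), toy_vectors, k=2)
print(nearest)
```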

In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
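
The control flow of a history-aware retriever can be summarized in plain Python: with no chat history the question goes straight to the retriever, otherwise a model first rewrites it into a standalone question. The sketch below uses stub functions in place of the real retriever and LLM; all names in it are hypothetical.

```python
from typing import Callable, Dict, List


def history_aware_retrieve(
    inputs: Dict[str, object],
    retrieve: Callable[[str], List[str]],
    reformulate: Callable[[List[str], str], str],
) -> List[str]:
    """Retrieve with the raw question, or with a rewritten one if history exists."""
    history = inputs.get("chat_history") or []
    question = str(inputs["input"])
    if not history:
        return retrieve(question)
    return retrieve(reformulate(list(history), question))


# Stubs standing in for the vector store retriever and the LLM rewrite step.
def fake_retrieve(question: str) -> List[str]:
    return [f"docs for: {question}"]


def fake_reformulate(history: List[str], question: str) -> str:
    return "What are common ways of doing task decomposition?"


first = history_aware_retrieve({"input": "What is Task Decomposition?"}, fake_retrieve, fake_reformulate)
follow_up = history_aware_retrieve(
    {"input": "What are common ways of doing it?", "chat_history": ["What is Task Decomposition?"]},
    fake_retrieve,
    fake_reformulate,
)
```

Note that the input is a dict with an `input` key and an optional `chat_history` key, not a bare string.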

In [None]:
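from langchain.tools.retriever import create_retriever_tool
from langchain_core.runnables import RunnableLambda

# The tool passes a plain query string, while the history-aware retriever
# expects a dict with an "input" key, so adapt the input first.
tool_retriever = RunnableLambda(lambda query: {"input": query}) | history_aware_retriever

# Build retriever tool
tool = create_retriever_tool(
    tool_retriever,
    name="document_retriever",
    description="Searches and returns excerpts from the local database of documents.",
)
tools = [tool]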

In [None]:
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
agent_executor = create_react_agent(chat_model, tools, checkpointer=memory)
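
The checkpointer is what lets separate calls share a conversation: runs that use the same `thread_id` in their configuration see the same saved state. The toy class below only illustrates that keying idea with a dict of message lists. It is not how `MemorySaver` is implemented, and the class name `ThreadMemory` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


class ThreadMemory:
    """Toy in-memory store that keeps one message list per thread id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Message]] = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads[thread_id].append((role, content))

    def history(self, thread_id: str) -> List[Message]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._threads[thread_id])


store = ThreadMemory()
store.append("abc123", "human", "What is Task Decomposition?")
store.append("abc123", "ai", "Breaking a task into smaller subtasks.")
store.append("other", "human", "Unrelated question")
```

Using a new `thread_id` therefore starts a fresh conversation, while reusing one continues it.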

In [None]:
config = {"configurable": {"thread_id": "abc123"}}

for event in agent_executor.stream(
    {"messages": [HumanMessage(content="What is Task Decomposition?")]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


What is Task Decomposition?

Task decomposition is a process in project management and systems engineering that involves breaking down a complex task or project into smaller, more manageable subtasks or components. This approach helps to clarify the scope of the project, identify dependencies between tasks, and allocate resources more effectively. By decomposing tasks into smaller pieces, it becomes easier to estimate time, cost, and resource requirements, as well as to monitor progress and identify potential issues or risks. Task decomposition is an essential part of project planning and execution, as it enables teams to develop detailed work plans, schedules, and budgets, and to ensure that all necessary steps are taken to achieve the desired outcome.


In [None]:
query = "What are common ways of doing it?"

for event in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


What are common ways of doing it?

There are several common ways to decompose tasks, depending on the nature of the project and the preferences of the project manager or systems engineer. Here are some common approaches:

1. Top-down decomposition: This is a hierarchical approach where the project is broken down into larger, more general tasks, which are then further decomposed into smaller, more specific tasks. This approach helps to ensure that all critical aspects of the project are identified and addressed.

2. Bottom-up decomposition: This is a more detailed approach where the project is broken down into its smallest possible components, and then these components are grouped together to form larger tasks. This approach helps to ensure that all necessary details are captured and that the project is executed with a high degree of precision.

3. Cross-functional decomposition: This approach involves breaking down tasks across multiple functional areas or disciplines, such as enginee

In [None]:
event

{'messages': [HumanMessage(content='What is Task Decomposition?', additional_kwargs={}, response_metadata={}, id='9d7fbf5c-fe3e-49ad-a037-64fa90dfcee3'),
  AIMessage(content='Task decomposition is a process in project management and systems engineering that involves breaking down a complex task or project into smaller, more manageable subtasks or components. This approach helps to clarify the scope of the project, identify dependencies between tasks, and allocate resources more effectively. By decomposing tasks into smaller pieces, it becomes easier to estimate time, cost, and resource requirements, as well as to monitor progress and identify potential issues or risks. Task decomposition is an essential part of project planning and execution, as it enables teams to develop detailed work plans, schedules, and budgets, and to ensure that all necessary steps are taken to achieve the desired outcome.', additional_kwargs={}, response_metadata={}, id='run-293a1781-b943-4568-b2ff-417efc1aee45

## Chat

In [None]:
# Type "stop" to end the chat
while True:
    query = input("Ask your question: ")
    if query.lower().strip() == "stop":
        break
    for event in agent_executor.stream(
        {"messages": [HumanMessage(content=query)]},
        config=config,
        stream_mode="values",
    ):
        event["messages"][-1].pretty_print()

Ask your question: is today a good day for you?

is today a good day for you?

As an artificial intelligence language model, I don't experience days or have personal feelings. I'm always available to assist you with any questions or requests you may have, regardless of the time or day. So, you can feel free to reach out to me anytime you need help.
Ask your question: what did i just ask?

what did i just ask?

I'm not privy to your previous conversations or thoughts. Please provide more context so I can assist you better. If you're asking this question because you've forgotten what you asked me earlier, please let me know what you remember about the conversation, and I'll do my best to help you recall what was discussed.
Ask your question: did i ask about the day?

did i ask about the day?

I'm not aware of your previous conversations or messages. However, based on our previous interactions, I can tell you that we've discussed various topics, but I don't recall if you asked about the d

KeyboardInterrupt: Interrupted by user