### **Conversational RAG** Application with LangChain & OpenAI LLM

In [2]:
import os
from dotenv import load_dotenv

load_dotenv("../.env")

True

#### Initialize OpenAI LLM

In [3]:
from langchain_openai import ChatOpenAI

# Read the OpenAI API key from the environment (ChatOpenAI also picks up OPENAI_API_KEY on its own)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Initialize the ChatOpenAI model
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0
)

#### Initialize Embedding Model

In [4]:
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

#### Load PDF Document

In [5]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF document
loader = PyPDFLoader(r".\storytelling with data.pdf")  # raw string avoids the invalid "\s" escape
doc = loader.load()

In [6]:
len(doc)

75

In [7]:
doc[0]

Document(metadata={'producer': 'PyPDF', 'creator': 'Google', 'creationdate': '', 'moddate': '2025-03-14T14:53:05+05:30', 'source': '.\\storytelling with data.pdf', 'total_pages': 75, 'page': 0, 'page_label': '1'}, page_content='Storytelling \nwitǝ DatƝ\nVisual Analytics and User Experience Design\n(IT4031)')

#### Split document into chunks

In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=20)

# Split the document into chunks
splits = text_splitter.split_documents(doc)

In [9]:
len(splits)

75

In [10]:
splits[20].page_content

'The most powerful person in \nthe world is the storyteller.\nThe storyteller sets the \nvision, values, and agenda of \nan entire generation that is to \ncome.\n- Steve Jobs\n“\n“'
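The splitter settings above cap each chunk at 250 characters and repeat 20 characters between neighbouring chunks. The sketch below illustrates only that size/overlap arithmetic with a plain sliding window. It is a simplification, not the actual `RecursiveCharacterTextSplitter` algorithm (which prefers to break on separators such as paragraphs and newlines), and `chunk_text` is a hypothetical helper. Since `split_documents` handles each page separately, getting 75 chunks from 75 pages suggests that most slides already fit within 250 characters.

```python
def chunk_text(text: str, chunk_size: int = 250, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(600))  # 600 characters of dummy text
chunks = chunk_text(sample)

print([len(c) for c in chunks])            # chunk lengths
print(chunks[0][-20:] == chunks[1][:20])   # neighbouring chunks share the overlap
```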

#### Create Vector Store and Retriever

In [11]:
from langchain_chroma import Chroma

# create a vector store from the document chunks
vector_store = Chroma.from_documents(documents=splits, embedding=embedding_model)

In [12]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever()
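Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The toy example below shows that ranking step with hand-written 2-D vectors and cosine similarity. The vectors, `cosine_similarity`, and `top_k` are illustrative assumptions; real embeddings have far more dimensions, and Chroma's index and configured distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k vectors most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy "embeddings" (made up for illustration)
toy_vectors = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [0.7, 0.7],
    "chunk_c": [0.0, 1.0],
}
query = [0.9, 0.1]

print(top_k(query, toy_vectors, k=2))
```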

#### Define Prompt Template

In [13]:
from langchain_core.prompts import ChatPromptTemplate

# Define the system prompt
system_prompt = (
    "You are an intelligent chat bot. Use the following context to answer the question. If you don't know the answer, just say you don't know." 
    "\n\n" 
    "{context}"
)

# create the prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}")
    ]
)

In [14]:
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an intelligent chat bot. Use the following context to answer the question. If you don't know the answer, just say you don't know.\n\n{context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

#### Create Retrieval Augmented Generation (RAG) Chain

In [15]:
from langchain_core.runnables import RunnablePassthrough

# Join the page content of the retrieved documents into one context string
def join_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

# RAG chain: retrieve context, fill the prompt, then call the LLM
rag_chain = (
    {
        "context": retriever | join_docs,
        "input": RunnablePassthrough(),
    }
    | prompt
    | llm
)
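The dictionary at the head of the chain runs both entries on the same input string: one branch retrieves and joins context, the other passes the question through unchanged. A rough plain-Python equivalent of that mapping is sketched here; `fake_retriever`, `join_chunks`, and `build_prompt_inputs` are hypothetical stand-ins, and the stub returns plain strings rather than `Document` objects.

```python
def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in: a real retriever would return Document objects
    return ["chunk one", "chunk two"]


def join_chunks(chunks: list[str]) -> str:
    # Same idea as join_docs, but for plain strings
    return "\n\n".join(chunks)


def build_prompt_inputs(question: str) -> dict[str, str]:
    return {
        "context": join_chunks(fake_retriever(question)),  # mirrors: retriever | join_docs
        "input": question,                                  # mirrors: RunnablePassthrough()
    }


inputs = build_prompt_inputs("What is storytelling with data?")
print(inputs)
```

Nothing but the current question enters this mapping, so a follow-up question reaches the retriever without any of the earlier turns.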

#### Invoke RAG chain with example question

In [16]:
response = rag_chain.invoke("What is the meaning of storytelling with data?")

In [18]:
response = rag_chain.invoke("what are things we should learn under storytelling with data")

In [20]:
response = rag_chain.invoke("can you list down?")

In [21]:
response.content

"I'm sorry, I don't have access to specific lists or information about how bad things can get. If you have a specific question or topic in mind, feel free to ask!"

### Add Chat History

#### Create History Aware Retriever

In [41]:
# from langchain_core.prompts import MessagesPlaceholder
# from langchain_core.output_parsers import StrOutputParser

# # Define contextualize system prompt
# contextualize_system_prompt = (
#     "Given a chat history and the latest user question, "
#     "reformulate the question if it refers to context in the chat history. "
#     "Otherwise, return the question as is. "
#     "ONLY return the reformulated question, nothing else."
# )

# contextualize_prompt = ChatPromptTemplate.from_messages(
#     [
#         ("system", contextualize_system_prompt),
#         MessagesPlaceholder("chat_history"),
#         ("human", "{input}")
#     ]
# )

# # Create a chain to convert chat history + question into a standalone question
# standalone_question_chain = contextualize_prompt | llm | StrOutputParser()

#### Create History Aware RAG Chain

In [42]:
# # Define the main system prompt
# system_prompt = (
#     "You are an intelligent chatbot. Use the following context to answer the question. "
#     "If you don't know the answer based on the context, just say you don't know. "
#     "Use the chat history to understand the context of the conversation.\n\n"
#     "Context:\n{context}"
# )

# # Create the main prompt template
# prompt = ChatPromptTemplate.from_messages(
#     [
#         ("system", system_prompt),
#         MessagesPlaceholder("chat_history"),
#         ("human", "{input}")
#     ]
# )

# prompt

In [None]:
# from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# def format_docs(docs):
#     return "\n\n".join(doc.page_content for doc in docs)

# # Create the complete RAG chain
# rag_chain = (
#     {
#         # Get chat history
#         "chat_history": lambda x: x.get("chat_history", []),
#         # Create standalone question from history and input
#         "standalone_question": lambda x: standalone_question_chain.invoke({
#             "input": x["input"], 
#             "chat_history": x.get("chat_history", [])
#         }) if x.get("chat_history") else x["input"],
#         # Keep original input
#         "input": lambda x: x["input"]
#     }
#     | RunnableLambda(lambda x: {
#         # Retrieve documents using standalone question
#         "context": retriever.get_relevant_documents(x["standalone_question"]),
#         "chat_history": x["chat_history"],
#         "input": x["input"]
#     })
#     | RunnableLambda(lambda x: {
#         # Format the documents
#         "context": format_docs(x["context"]),
#         "chat_history": x["chat_history"],
#         "input": x["input"]
#     })
#     | prompt
#     | llm
#     | StrOutputParser()
# )

#### Manage Chat Session History

In [43]:
# from langchain_core.chat_history import BaseChatMessageHistory
# from langchain_community.chat_message_histories import ChatMessageHistory
# from langchain_core.runnables.history import RunnableWithMessageHistory

# # Initialize the store for session history
# store = {}

# # Function to get the session history for a given session ID
# def get_session_history(session_id: str) -> BaseChatMessageHistory:
#     if session_id not in store:
#         store[session_id] = ChatMessageHistory()
#     return store[session_id]

# conversational_rag_chain = RunnableWithMessageHistory(
#     rag_chain,
#     get_session_history,
#     input_messages_key="input",
#     history_messages_key="chat_history",
#     output_messages_key="answer"
# )

In [44]:
# # Test the chain
# response = conversational_rag_chain.invoke(
#     {"input": "What is the meaning of storytelling with data?"},
#     config={"configurable": {"session_id": "101"}}
# )
# print(response)

In [45]:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.output_parsers import StrOutputParser

# System prompt for rewriting a follow-up into a standalone question
contextualize_system_prompt = (
    "Given a chat history and the latest user question, "
    "reformulate the question if it refers to context in the chat history. "
    "Otherwise, return the question as is. "
    "ONLY return the reformulated question, nothing else."
)

contextualize_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

# Create a chain to convert chat history + question into a standalone question
standalone_question_chain = contextualize_prompt | llm | StrOutputParser()

# Function to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Define the main system prompt
system_prompt = (
    "You are an intelligent chatbot. Use the following context to answer the question. "
    "If you don't know the answer based on the context, just say you don't know. "
    "Use the chat history to understand the context of the conversation.\n\n"
    "Context:\n{context}"
)

# Create the main prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

# Create the complete RAG chain
rag_chain = (
    {
        # Get chat history
        "chat_history": lambda x: x.get("chat_history", []),
        # Create standalone question from history and input
        "standalone_question": lambda x: standalone_question_chain.invoke({
            "input": x["input"], 
            "chat_history": x.get("chat_history", [])
        }) if x.get("chat_history") else x["input"],
        # Keep original input
        "input": lambda x: x["input"]
    }
    | RunnableLambda(lambda x: {
        # Retrieve documents using the standalone question
        "context": retriever.invoke(x["standalone_question"]),  # invoke() replaces the deprecated get_relevant_documents()
        "chat_history": x["chat_history"],
        "input": x["input"]
    })
    | RunnableLambda(lambda x: {
        # Format the documents
        "context": format_docs(x["context"]),
        "chat_history": x["chat_history"],
        "input": x["input"]
    })
    | prompt
    | llm
    | StrOutputParser()
)

# Initialize the store for session history
store = {}

# Function to get the session history for a given session ID
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Wrap the RAG chain so each session keeps its own chat history
conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="output"
)

# Test the chain
response = conversational_rag_chain.invoke(
    {"input": "What is the meaning of storytelling with data?"},
    config={"configurable": {"session_id": "101"}}
)
print(response)

Storytelling with data refers to the practice of presenting data in a meaningful and engaging manner by weaving it into a narrative. It involves using data visualization techniques to effectively communicate insights, trends, and key findings to an audience in a way that is easy to understand and remember. This approach helps make data more impactful and actionable for decision-making purposes.
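On this first turn the session history is still empty, so the `standalone_question` entry skips the rewriting chain and hands the input straight to the retriever. A minimal sketch of that branching follows; `pick_search_query` is a hypothetical name, and `toy_rewrite` is a made-up stand-in for the LLM-backed `standalone_question_chain`.

```python
from typing import Callable


def pick_search_query(
    user_input: str,
    chat_history: list[str],
    rewrite: Callable[[str, list[str]], str],
) -> str:
    """Use the raw input on the first turn, a rewritten question afterwards."""
    if not chat_history:
        return user_input
    return rewrite(user_input, chat_history)


def toy_rewrite(user_input: str, chat_history: list[str]) -> str:
    # Hypothetical: a real rewrite would come from the LLM, not string formatting
    return f"{user_input} (about: {chat_history[-1]})"


first = pick_search_query("What is storytelling with data?", [], toy_rewrite)
follow_up = pick_search_query(
    "can you give me a short paragraph",
    ["What is storytelling with data?"],
    toy_rewrite,
)

print(first)
print(follow_up)
```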


In [46]:
response = conversational_rag_chain.invoke(
    {"input": "what are things we should learn under storytelling with data"},
    config={"configurable": {"session_id": "101"}}
)
print(response)

Under storytelling with data, there are several key things to learn and master:

1. Data Visualization: Understanding how to create effective charts, graphs, and other visual representations of data to convey information clearly and efficiently.

2. Narrative Building: Learning how to structure a compelling story around the data, including setting the context, presenting the main points, and drawing conclusions.

3. Audience Understanding: Knowing your audience and tailoring your data storytelling approach to their needs, preferences, and level of understanding.

4. Data Interpretation: Developing the ability to analyze data accurately and draw meaningful insights that can be communicated effectively.

5. Communication Skills: Enhancing your communication skills to present data in a clear, concise, and engaging manner, whether through written reports, presentations, or data visualizations.

By mastering these aspects of storytelling with data, you can effectively convey the significanc

In [47]:
response = conversational_rag_chain.invoke(
    {"input": "can you give me a short paragraph"},
    config={"configurable": {"session_id": "101"}}
)
print(response)

Data storytelling is a powerful technique that involves presenting data in a compelling narrative to convey insights and drive decision-making. By mastering data visualization, narrative building, audience understanding, data interpretation, and communication skills, you can effectively engage your audience and make data more impactful and actionable. Through clear and engaging storytelling with data, you can bring your insights to life and inspire informed decisions within your organization.
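All three calls reuse `session_id` "101", which is why the last answer can build on the earlier turns. The sketch below mimics the per-session bookkeeping with plain lists of `(role, text)` tuples. `toy_store` and `record_turn` are hypothetical stand-ins for `store` and for what `RunnableWithMessageHistory` records, and the answers are placeholders rather than real model output.

```python
from collections import defaultdict

# Maps a session ID to its ordered list of (role, text) messages
toy_store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def record_turn(session_id: str, question: str, answer: str) -> None:
    """Append one human/AI exchange to the history of the given session."""
    toy_store[session_id].append(("human", question))
    toy_store[session_id].append(("ai", answer))


record_turn("101", "What is the meaning of storytelling with data?", "placeholder answer 1")
record_turn("101", "can you give me a short paragraph", "placeholder answer 2")
record_turn("202", "hello", "placeholder answer 3")

print(len(toy_store["101"]))  # messages accumulated in session 101
print(len(toy_store["202"]))  # session 202 is tracked separately
```

A new `session_id` starts with an empty history, so the first question in that session would again be sent to the retriever unchanged.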
