
In [None]:
# Install everything in one pass (duplicate package names from the original commands removed).
!pip install -qU langchain langchain-community langchainhub langchain-openai chromadb beautifulsoup4 sentence-transformers transformers bitsandbytes accelerate llama-cpp-python faiss-cpu tavily-python "langserve[all]" duckduckgo-search

In [3]:
from google.colab import userdata
import os
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
#from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.llms import HuggingFaceHub
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)
from langchain_community.chat_models.huggingface import ChatHuggingFace

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_API_KEY'] = userdata.get('LANGSMITH')
os.environ['HUGGINGFACEHUB_API_TOKEN'] = userdata.get('HF_TOKEN')
os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_KEY')

# Define Embeddings and Model

In [77]:
embeddings = HuggingFaceBgeEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
    )

llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
        'include_prompt_in_result' : False,
        "return_full_text": False
    },
)

chat_model = ChatHuggingFace(llm=llm)

                    repo_id was transferred to model_kwargs.
                    Please confirm that repo_id is what you intended.
                    task was transferred to model_kwargs.
                    Please confirm that task is what you intended.
                    huggingfacehub_api_token was transferred to model_kwargs.
                    Please confirm that huggingfacehub_api_token is what you intended.


# Prepare Documents

In [14]:
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://flippers.pk/ads/yartici-com-pakistans-leading-online-art-marketplace/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header", 'site-content')
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
splits[0], splits[2], len(splits)

(Document(page_content="Yartici.com - Pakistan's Leading Online Art Marketplace   | E-Commerce Platforms\n\n \n  G13, Islamabad, Pakistan  -\n  Islamabad\n\n\n\n\n\n\n\n\nBusiness CategoryE-Commerce Platforms\nBusiness Established Year2020\nRevenue Per MonthN/A\nMonthly ProfitN/A\nNumber of employees0 to 5\nProperty valueN/A\nProperty RentN/A\nAsset include:Website, Domain, Email, AWS Hosting, Social Media Accounts \nAsset Value:N/A\nCurrent ConditionRunning\nDemand20,000,000\nAddressG13, Islamabad, Pakistan", metadata={'source': 'https://flippers.pk/ads/yartici-com-pakistans-leading-online-art-marketplace/'}),
 Document(page_content='Purpose of Selling:\r\nSince the 3 co-founders are now based abroad, they are actively seeking entrepreneurs in Pakistan to take charge of the platform.\r\n\r\nThe price listed is negotiable to reflect a fair and accurate value of the platform considering the value of assets and long-term profitability.', metadata={'source': 'https://flippers.pk/ads/yarti

# Create Vector Store

In [65]:
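try:
    # Drop any collection left over from a previous run of this cell.
    vectorstore.delete_collection()
except Exception:
    # On the first run `vectorstore` is not defined yet (NameError); nothing to clean up.
    pass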

# import chromadb
# client = chromadb.Client()

vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings, collection_metadata={"hnsw:space": "cosine"}) # downside: doesn't do upsert
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x7b36f02c0b80>
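
The comment above notes that re-running the cell does not upsert. One possible workaround, sketched here under assumptions, is to pass deterministic IDs so that the same chunk always maps to the same record. `stable_chunk_id` is a hypothetical helper, and the commented-out call assumes that `Chroma.from_documents` accepts an `ids` argument and overwrites records with matching IDs; check this against your installed version.

```python
import hashlib


def stable_chunk_id(source: str, text: str) -> str:
    """Derive a deterministic ID from a chunk's source URL and content."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8"))
    return digest.hexdigest()[:32]


# Stand-ins for (metadata["source"], page_content) of the real splits.
chunks = [
    ("https://example.com/post", "first chunk"),
    ("https://example.com/post", "second chunk"),
]
ids = [stable_chunk_id(src, text) for src, text in chunks]

# Assumed usage with the notebook's objects (not executed here):
# ids = [stable_chunk_id(d.metadata["source"], d.page_content) for d in splits]
# vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings, ids=ids)
```

With content-derived IDs, an unchanged chunk keeps its ID across runs, while an edited chunk gets a new one.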

In [66]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

In [69]:
retriever.invoke('price of website') #retriever.get_relevant_documents('abroad')

[Document(page_content='Purpose of Selling:\r\nSince the 3 co-founders are now based abroad, they are actively seeking entrepreneurs in Pakistan to take charge of the platform.\r\n\r\nThe price listed is negotiable to reflect a fair and accurate value of the platform considering the value of assets and long-term profitability.', metadata={'source': 'https://flippers.pk/ads/yartici-com-pakistans-leading-online-art-marketplace/'})]

# Create Prompt

In [114]:
# prompt = hub.pull("rlm/rag-prompt")
# not compatible with zephyr

In [115]:
from transformers import AutoTokenizer
# Use the tokenizer of the same checkpoint as `llm` so the chat template matches the model.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")


# hard-code template

# template = """
# <|system|>
# You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.</s>
# <|user|>
# Question: {question}
# Context: {context} </s>
# <|assistant|>
# Answer:
# """

# use tokenizer to create template
messages = [
    {
        "role": "system",
        "content": "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.",
    },
    {
        "role": "user",
        "content": "Question: What is Yartici? \nContext: Yartici is an online art shop. "},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt

"<|system|>\nYou are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.</s>\n<|user|>\nQuestion: What is Yartici? \nContext: Yartici is an online art shop. </s>\n<|assistant|>\n"

In [116]:
llm.invoke(prompt)

'Answer: Yartici is a digital marketplace where artists can sell their original artwork and art supplies directly to buyers.'

In [117]:
llm.predict(prompt)

'Answer: Yartici is a digital marketplace where artists can sell their original artwork and art supplies directly to buyers.'

In [118]:
chat_model.invoke(prompt) # the chat model applies the chat template itself, so the already-templated prompt gets wrapped in <|user|> ... </s> <|assistant|> again

AIMessage(content='Answer: Yartici is a digital marketplace for buying and selling original artwork and art supplies.')

In [119]:
from langchain.prompts import PromptTemplate

messages = [
    {
        "role": "system",
        "content": "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.",
    },
    {
        "role": "user",
        "content": "Question: {question} \nContext: {context} "},
]

raw_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(raw_prompt)

prompt = PromptTemplate.from_template(raw_prompt)
print(prompt)

prompt.invoke({'context' : "test1", 'question' : 'test question'})

<|system|>
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.</s>
<|user|>
Question: {question} 
Context: {context} </s>
<|assistant|>

input_variables=['context', 'question'] template="<|system|>\nYou are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.</s>\n<|user|>\nQuestion: {question} \nContext: {context} </s>\n<|assistant|>\n"


StringPromptValue(text="<|system|>\nYou are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.</s>\n<|user|>\nQuestion: test question \nContext: test1 </s>\n<|assistant|>\n")

# Define Chain

In [120]:
def format_docs(docs):
    # not using stuff_documents to create chain
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain)

rag_chain.invoke("What is Yartici?")

first={
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7b36f02c0b80>, search_kwargs={'k': 1})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
} middle=[PromptTemplate(input_variables=['context', 'question'], template="<|system|>\nYou are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.</s>\n<|user|>\nQuestion: {question} \nContext: {context} </s>\n<|assistant|>\n"), HuggingFaceHub(client=<InferenceClient(model='HuggingFaceH4/zephyr-7b-beta', timeout=None)>, repo_id='HuggingFaceH4/zephyr-7b-beta', task='text-generation', model_kwargs={'max_new_tokens': 512, 'top_k': 30, 'temperature': 0.1, 'repetition_penalty': 1.03, 'include_prompt_in_result': False, 'return_full_text': False, 're

"Yartici is Pakistan's leading online art marketplace, established in 2020. It operates as an e-commerce platform and is currently running with a demand of 20 million Pakistani rupees. Its assets include a website, domain, email, AWS hosting, and social media accounts, with an asset value of N/A. The business has no physical property and its revenue, profit, and number of employees are not disclosed."

In [121]:
# Make chain with sources

from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

rag_chain_with_source.invoke("What is Yartici")

{'context': [Document(page_content="Yartici.com - Pakistan's Leading Online Art Marketplace   | E-Commerce Platforms\n\n \n  G13, Islamabad, Pakistan  -\n  Islamabad\n\n\n\n\n\n\n\n\nBusiness CategoryE-Commerce Platforms\nBusiness Established Year2020\nRevenue Per MonthN/A\nMonthly ProfitN/A\nNumber of employees0 to 5\nProperty valueN/A\nProperty RentN/A\nAsset include:Website, Domain, Email, AWS Hosting, Social Media Accounts \nAsset Value:N/A\nCurrent ConditionRunning\nDemand20,000,000\nAddressG13, Islamabad, Pakistan", metadata={'source': 'https://flippers.pk/ads/yartici-com-pakistans-leading-online-art-marketplace/'})],
 'question': 'What is Yartici',
 'answer': "Yartici is Pakistan's leading online art marketplace, established in 2020. It operates as an e-commerce platform and currently has 0-5 employees. The business generates unknown monthly revenue and profit. Its assets include a website, domain, email, AWS hosting, and social media accounts, with an asset value of N/A. Yartic

In [122]:
for chunk in rag_chain_with_source.stream("What is Yartici?"):
    print(chunk)

# The answer arrives as a single chunk rather than token by token, and the same happens with chat_model.
# Most likely, streaming has to be supported by the underlying model being invoked.

{'question': 'What is Yartici?'}
{'context': [Document(page_content="Yartici.com - Pakistan's Leading Online Art Marketplace   | E-Commerce Platforms\n\n \n  G13, Islamabad, Pakistan  -\n  Islamabad\n\n\n\n\n\n\n\n\nBusiness CategoryE-Commerce Platforms\nBusiness Established Year2020\nRevenue Per MonthN/A\nMonthly ProfitN/A\nNumber of employees0 to 5\nProperty valueN/A\nProperty RentN/A\nAsset include:Website, Domain, Email, AWS Hosting, Social Media Accounts \nAsset Value:N/A\nCurrent ConditionRunning\nDemand20,000,000\nAddressG13, Islamabad, Pakistan", metadata={'source': 'https://flippers.pk/ads/yartici-com-pakistans-leading-online-art-marketplace/'})]}
{'answer': "Yartici is Pakistan's leading online art marketplace, established in 2020. It operates as an e-commerce platform and is currently running with a demand of 20 million Pakistani rupees. Its assets include a website, domain, email, AWS hosting, and social media accounts, with an asset value of N/A. The business has no physic
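
Each streamed chunk is a dict carrying only one of the keys `question`, `context`, or `answer`. If only the answer text is of interest, the chunks can be filtered as they arrive. The sketch below uses a simulated list of chunks in place of `rag_chain_with_source.stream(...)`, so the chunk contents are made up for illustration, and `answer_pieces` is a hypothetical helper.

```python
def answer_pieces(chunks):
    """Yield only the 'answer' fragments from a stream of partial dicts."""
    for chunk in chunks:
        if "answer" in chunk:
            yield chunk["answer"]


# Simulated stream; with a model that streams tokens, 'answer' would arrive in pieces.
simulated_stream = [
    {"question": "What is Yartici?"},
    {"context": ["<Document ...>"]},
    {"answer": "Yartici is "},
    {"answer": "an online art marketplace."},
]

answer = "".join(answer_pieces(simulated_stream))
print(answer)
```

In the notebook, `simulated_stream` would be replaced by `rag_chain_with_source.stream("What is Yartici?")`.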

# Add Chat History

In [111]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

contextualize_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "What is meant by large?",
    }
)

' How big is large in this context?\nAI: In the context of language models, "large" refers to a model with a significant number of parameters, typically millions or billions, that allows it to generate human-like responses to a wide range of input prompts. The exact size of a large language model can vary depending on the specific application and the desired level of performance.'

Despite being instructed not to answer the question, the model still answers it. Two possible reasons:
1. The prompt template may not be fully compatible with the model, for the reasons demonstrated earlier in the notebook.
2. The model may not be strong enough to follow the instruction.

In [102]:
contextualize_q_prompt.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "What is meant by large?",
    }
)

ChatPromptValue(messages=[SystemMessage(content='Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.'), HumanMessage(content='What does LLM stand for?'), AIMessage(content='Large language model'), HumanMessage(content='What is meant by large?')])
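
A likely reason for the stray `AI:` in the earlier output: as far as we understand, when a `ChatPromptValue` is piped into a plain text-completion LLM, it is flattened into role-prefixed lines (`System:`, `Human:`, `AI:`) rather than Zephyr's `<|system|>`, `<|user|>`, and `<|assistant|>` tags. The helper below is a hypothetical imitation of that flattening, not the library's own code, and the system text is shortened for readability.

```python
# Assumed role prefixes used when chat messages are flattened to a single string.
ROLE_PREFIX = {"system": "System", "human": "Human", "ai": "AI"}


def flatten_messages(messages):
    """Join (role, content) pairs into one role-prefixed string."""
    return "\n".join(f"{ROLE_PREFIX[role]}: {content}" for role, content in messages)


flat = flatten_messages(
    [
        ("system", "Reformulate the latest question as a standalone question."),
        ("human", "What does LLM stand for?"),
        ("ai", "Large language model"),
        ("human", "What is meant by large?"),
    ]
)
print(flat)
```

A prompt in this shape contains none of the special tokens Zephyr was fine-tuned on, which motivates building the template with the tokenizer instead.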

In [107]:
messages = [
    {
        "role": "system",
        "content": """Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.""",
    },
    {
        "role": "user",
        "content": "What does LLM stand for?"
    },
    {
        "role": "assistant",
        "content": "Large language model."
    },
    {
        "role": "user",
        "content": "{question}"
    },
]

raw_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(raw_prompt)

prompt = PromptTemplate.from_template(raw_prompt)
print(prompt)

prompt.invoke({'question' : "What is meant by large?"})

<|system|>
Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.</s>
<|user|>
What does LLM stand for?</s>
<|assistant|>
Large language model.</s>
<|user|>
{question}</s>
<|assistant|>

input_variables=['question'] template='<|system|>\nGiven a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.</s>\n<|user|>\nWhat does LLM stand for?</s>\n<|assistant|>\nLarge language model.</s>\n<|user|>\n{question}</s>\n<|assistant|>\n'


StringPromptValue(text='<|system|>\nGiven a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is.</s>\n<|user|>\nWhat does LLM stand for?</s>\n<|assistant|>\nLarge language model.</s>\n<|user|>\nWhat is meant by large?</s>\n<|assistant|>\n')

In [112]:
contextualize_q_chain_custom = prompt | llm | StrOutputParser()
contextualize_q_chain_custom.invoke({'question' : "What is meant by large?"})

'In the context of large language models (LLMs), "large" refers to the size of the neural network that the model is composed of. LLMs with more parameters (i.e., weights and biases) are generally considered larger and have the ability to process and generate more complex and nuanced text. The exact size of a large LLM can vary widely, but models with billions or trillions of parameters are commonly used in natural language processing applications.'

Even now that we know the prompt template is fully customized for the LLM we are using, the model still ignores our instruction NOT to generate the answer.

In [113]:
contextualize_q_chain_custom.invoke({'question' : "Is it still a popular field of research for computer scientists"})

'Yes, large language models (LLMs) have gained significant attention and are currently a popular field of research in computer science, particularly in the subfields of natural language processing and artificial intelligence. LLMs are being developed to better understand and generate human-like language, with potential applications in areas such as language translation, question answering, and text generation.'

That said, we have successfully added the chat history as context, and the model resolves the ambiguous reference "it".
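
The chat history above is hard-coded into the template. A minimal sketch of rendering it dynamically, assuming the Zephyr format printed earlier (each turn as `<|role|>`, newline, content, `</s>`, followed by a trailing `<|assistant|>` generation prompt). `render_zephyr_prompt` is a hypothetical helper, and its output is meant to be passed straight to `llm.invoke`, not through `PromptTemplate`.

```python
def render_zephyr_prompt(system, chat_history, question):
    """Build a Zephyr-style prompt from a system message, prior turns, and a new question.

    chat_history is a list of (role, content) pairs with role 'user' or 'assistant'.
    """
    turns = [("system", system), *chat_history, ("user", question)]
    body = "".join(f"<|{role}|>\n{content}</s>\n" for role, content in turns)
    # The trailing tag asks the model to produce the next assistant turn.
    return body + "<|assistant|>\n"


prompt_text = render_zephyr_prompt(
    system="Reformulate the latest question as a standalone question. Do NOT answer it.",
    chat_history=[
        ("user", "What does LLM stand for?"),
        ("assistant", "Large language model."),
    ],
    question="Is it still a popular field of research?",
)
print(prompt_text)
```

Appending each new question and model reply to `chat_history` keeps the prompt in step with the conversation without rebuilding the template by hand.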

In [None]:
# cleanup
vectorstore.delete_collection()