# Create a chatbot that works on your documents

LangChain has a number of components designed to help build question-answering applications, and RAG applications more generally. To familiarize ourselves with these, we'll build a simple Q&A application over a text data source. Along the way we'll go over a typical Q&A architecture, discuss the relevant LangChain components, and highlight additional resources for more advanced Q&A techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity.

## Architecture
We'll create a typical RAG application as outlined in the [Q&A introduction](/docs/use_cases/question_answering/), which has two main components:

**Indexing**: a pipeline for ingesting data from a source and indexing it. *This usually happens offline.*

**Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

The full sequence from raw data to answer will look like:

#### Indexing
1. **Load**: First we need to load our data. We'll use [DocumentLoaders](/docs/modules/data_connection/document_loaders/) for this.
2. **Split**: [Text splitters](/docs/modules/data_connection/document_transformers/) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.
3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/modules/data_connection/vectorstores/) and [Embeddings](/docs/modules/data_connection/text_embedding/) model.
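
The Split step can be pictured with a small stand-alone sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraphs and sentences); it only shows what `chunk_size` and `chunk_overlap` mean, using a hypothetical helper `split_text` that cuts fixed-size character windows.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into character windows of at most chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Small numbers so the overlap is easy to see.
chunks = split_text("abcdefghij" * 3, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary is still fully present in at least one chunk.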

#### Retrieval and generation
4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a [Retriever](/docs/modules/data_connection/retrievers/).
5. **Generate**: A [ChatModel](/docs/modules/model_io/chat/) / [LLM](/docs/modules/model_io/llms/) produces an answer using a prompt that includes the question and the retrieved data.
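
As a rough illustration of the Retrieve and Generate steps, the sketch below ranks a few chunks against a question and builds a prompt from the best match. Everything in it is an assumption made for the example: `embed` is a toy word-count "embedding" standing in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helpers, not LangChain APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, splits: list[str], k: int = 2) -> list[str]:
    """Return the k splits most similar to the query."""
    q = embed(query)
    return sorted(splits, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Place the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


splits = [
    "Text splitters break large documents into smaller chunks.",
    "A vector store indexes embeddings so they can be searched.",
    "The retriever returns the chunks most similar to the query.",
]
top = retrieve("Which component returns similar chunks for a query?", splits, k=1)
prompt_text = build_prompt("Which component returns similar chunks?", top)
print(prompt_text)
```

In the real application the embedding model and vector store replace `embed` and the sorting step, and the prompt string is sent to the LLM instead of being printed.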

## Setup

### Dependencies

We'll use a Llama 2 chat model, Hugging Face embeddings, and a Chroma vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/modules/model_io/chat/) or [LLM](/docs/modules/model_io/llms/), [Embeddings](/docs/modules/data_connection/text_embedding/), and [VectorStore](/docs/modules/data_connection/vectorstores/) or [Retriever](/docs/modules/data_connection/retrievers/).

We'll use the following packages:

In [1]:
%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai chromadb bs4

In [3]:
# GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python



In [4]:
# For downloading the models
!pip install huggingface_hub



In [5]:
from huggingface_hub import hf_hub_download

model_name_or_path = "TheBloke/Llama-2-7B-Chat-GGUF"
model_basename = "llama-2-7b-chat.Q6_K.gguf"

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [6]:
from huggingface_hub import notebook_login

notebook_login()


In [8]:
from langchain import hub

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

In [9]:
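# Load the LLM (a local GGUF build of Llama 2 served through llama.cpp)
llm = LlamaCpp(
    model_path=model_path,
    max_tokens=4096,  # generation cap; prompt plus output must still fit in n_ctx
    n_threads=2,
    n_gpu_layers=40,  # layers offloaded to the GPU
    n_batch=16,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),  # stream tokens to stdout
    n_ctx=2048,  # context window
    verbose=True,
    temperature=0.0,
    repeat_penalty=1.2,
)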

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


# 1. Chat with YouTube Transcript

In this guide I build a QA app over the transcript of the BBC YouTube video [Taiwan election angers China: BBC News Review](https://www.youtube.com/watch?v=1MWX-3ZqcMk).

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.embeddings import HuggingFaceEmbeddings

In [11]:
!pip install --upgrade --quiet  youtube-transcript-api
!pip install --upgrade --quiet  pytube

In [12]:
from langchain_community.document_loaders import YoutubeLoader

In [13]:
%pip install --upgrade --quiet  langchain sentence_transformers

In [14]:
# Load, chunk and index the video transcript.
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=1MWX-3ZqcMk",
    add_video_info=True,
    language=["en-GB"],
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=HuggingFaceEmbeddings())

# Retrieve and generate using the relevant snippets of the transcript.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
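
To make the data flow of a chain like `rag_chain` easier to follow, here is a plain-Python sketch of the same four stages. The names `Doc`, `fake_retriever`, `fake_prompt`, `fake_llm`, `assemble_inputs` and `run_chain` are hypothetical stand-ins, not LangChain objects; only `format_docs` mirrors the helper used in this notebook. The point is that the dict step sends the question down two branches (retrieve-and-format, and pass-through) before the prompt is filled in.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a document with a page_content field."""
    page_content: str


def fake_retriever(question: str) -> list[Doc]:
    # A real retriever would search the vector store; this returns fixed chunks.
    return [Doc("first chunk"), Doc("second chunk")]


def format_docs(docs: list[Doc]) -> str:
    return "\n\n".join(doc.page_content for doc in docs)


def assemble_inputs(question: str) -> dict:
    # Equivalent of the dict step: one branch retrieves and formats context,
    # the other passes the question through unchanged.
    return {"context": format_docs(fake_retriever(question)), "question": question}


def fake_prompt(inputs: dict) -> str:
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt_text: str) -> str:
    # Placeholder for the model call.
    return f"(model output for a {len(prompt_text)}-character prompt)"


def run_chain(question: str) -> str:
    return fake_llm(fake_prompt(assemble_inputs(question)))


print(assemble_inputs("What is the news about?"))
print(run_chain("What is the news about?"))
```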

In [15]:
answer_1 = rag_chain.invoke("What is the News about?")

 Taiwan has kept the status quo but the victory hides a transformed political landscape. Beijing is fuming over the election results and their commitment to Taiwan is rock solid.

In [16]:
answer_1

' Taiwan has kept the status quo but the victory hides a transformed political landscape. Beijing is fuming over the election results and their commitment to Taiwan is rock solid.'

In [17]:
answer_2 = rag_chain.invoke("What makes China angry with the Taiwan new president?")

Llama.generate: prefix-match hit


 Taiwan has kept the status quo by re-electing its current government. China is angry about this because it wants a peaceful reunification, but has also not ruled out the use of force.

In [18]:
answer_2

' Taiwan has kept the status quo by re-electing its current government. China is angry about this because it wants a peaceful reunification, but has also not ruled out the use of force.'

In [19]:
# cleanup
vectorstore.delete_collection()

# 2. Chat with PDF file

In [20]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [21]:
! pip install pypdf



In [22]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "/content/drive/My Drive/Colab Notebooks/langchain_chat_with_your_data/MachineLearning-Lecture01.pdf"
loader = PyPDFLoader(file_path)
pages = loader.load()

In [23]:
pages[1]

Document(page_content="many biologers are there here? Wow, just a few, not many. I'm surprised. Anyone from \nstatistics? Okay, a few. So where are the rest of you from?  \nStudent : iCME.  \nInstructor (Andrew Ng) : Say again?  \nStudent : iCME.  \nInstructor (Andrew Ng) : iCME. Cool.  \nStudent : [Inaudible].  \nInstructor (Andrew Ng) : Civi and what else?  \nStudent : [Inaudible]  \nInstructor (Andrew Ng) : Synthesis, [inaudible] systems. Yeah, cool.  \nStudent : Chemi.  \nInstructor (Andrew Ng) : Chemi. Cool.  \nStudent : [Inaudible].  \nInstructor (Andrew Ng) : Aero/astro. Yes, right. Yeah, okay, cool. Anyone else?  \nStudent : [Inaudible].  \nInstructor (Andrew Ng) : Pardon? MSNE. All ri ght. Cool. Yeah.  \nStudent : [Inaudible].  \nInstructor (Andrew Ng) : Pardon?  \nStudent : [Inaudible].  \nInstructor (Andrew Ng) : Endo —  \nStudent : [Inaudible].  \nInstructor (Andrew Ng) : Oh, I see, industry. Okay. Cool. Great, great. So as you can \ntell from a cross-section of th is class

In [24]:
# Load, chunk and index the contents of the PDF.
file_path = "/content/drive/My Drive/Colab Notebooks/langchain_chat_with_your_data/MachineLearning-Lecture01.pdf"
loader = PyPDFLoader(file_path)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = None
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=HuggingFaceEmbeddings())

# Retrieve and generate using the relevant snippets of the PDF.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [25]:
answer_1 = rag_chain.invoke("What is the PDF file about?")

Llama.generate: prefix-match hit


 Based on the context provided, it appears that the PDF file is related to a machine learning course taught by Andrew Ng. The instructor mentions handouts with guidelines for an end-of-semester project and discusses topics such as online resources, big O notation, data structures, and clustering algorithms.

In [26]:
answer_1

' Based on the context provided, it appears that the PDF file is related to a machine learning course taught by Andrew Ng. The instructor mentions handouts with guidelines for an end-of-semester project and discusses topics such as online resources, big O notation, data structures, and clustering algorithms.'

In [27]:
answer_2 = rag_chain.invoke("How can I separate the machine learning model?")

Llama.generate: prefix-match hit


 To separate a machine learning model, one can try to identify the underlying patterns and relationships in the data used for training. This may involve visualizing the data, performing exploratory data analysis (EDA), or using techniques such as feature engineering to extract relevant features from the dataset. Additionally, it is important to understand the assumptions made by the machine learning algorithm being used, as well as any potential biases in the data or model. By understanding these factors and taking appropriate steps to address them, one can improve the performance of the machine learning model and better separate its predictions.

In [28]:
answer_2

' To separate a machine learning model, one can try to identify the underlying patterns and relationships in the data used for training. This may involve visualizing the data, performing exploratory data analysis (EDA), or using techniques such as feature engineering to extract relevant features from the dataset. Additionally, it is important to understand the assumptions made by the machine learning algorithm being used, as well as any potential biases in the data or model. By understanding these factors and taking appropriate steps to address them, one can improve the performance of the machine learning model and better separate its predictions.'

In [60]:
# cleanup
vectorstore.delete_collection()

# 3. Chat with URL

In this guide I build a QA app over the blog post [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/).

In [61]:
from langchain_community.document_loaders import WebBaseLoader

In [62]:
# Load, chunk and index the contents of the blog.
file_path = "https://jalammar.github.io/illustrated-transformer/"
loader = WebBaseLoader(file_path)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = None
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=HuggingFaceEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [63]:
docs[0]

Document(page_content='\n\n\nThe Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nJay Alammar\nVisualizing machine learning one concept at a time.@JayAlammar on Twitter. YouTube Channel\n\n\nBlog\nAbout\n\n\n\n\n\n\nThe Illustrated Transformer\n\nDiscussions:\nHacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments)\n\n\nTranslations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese\n\nWatch: MIT’s Deep Learning State of the Art lecture referencing this post\n\nFeatured in courses at Stanford, Harvard, MIT, Princeton, CMU and others\nIn the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In thi

In [64]:
answer_1 = rag_chain.invoke("What is the content about?")

Llama.generate: prefix-match hit


 The content of this passage is about attention mechanisms used in deep learning models, specifically in the Transformer architecture.

In [65]:
answer_1

' The content of this passage is about attention mechanisms used in deep learning models, specifically in the Transformer architecture.'

In [66]:
answer_2 = rag_chain.invoke("Please explain the concept of the Transformer architecture as easy as possible")

Llama.generate: prefix-match hit


 The Transformer architecture is an attention-based model that processes input sequences of variable length by parallelizing the computation across all positions in the sequence. It does this by using self-attention mechanisms, which allow different parts of the input to attend to one another and form dependencies between them. Unlike traditional recurrent neural networks (RNNs), which process the input sequentially and have recurrence connections that allow information from previous time steps to influence current step, Transformer models use feedforward neural networks (FNNs) instead, which do not have any recurrence connections and process each position in the sequence independently. This allows for faster computation and more parallelization opportunities, making it particularly well-suited for long input sequences.

In [67]:
answer_2

' The Transformer architecture is an attention-based model that processes input sequences of variable length by parallelizing the computation across all positions in the sequence. It does this by using self-attention mechanisms, which allow different parts of the input to attend to one another and form dependencies between them. Unlike traditional recurrent neural networks (RNNs), which process the input sequentially and have recurrence connections that allow information from previous time steps to influence current step, Transformer models use feedforward neural networks (FNNs) instead, which do not have any recurrence connections and process each position in the sequence independently. This allows for faster computation and more parallelization opportunities, making it particularly well-suited for long input sequences.'

In [68]:
answer_3 = rag_chain.invoke("How can Transformer model calculate the attention?")

Llama.generate: prefix-match hit


 The Transformer model calculates attention by first creating three vectors from each of the encoder's input vectors, called Query (Q), Key (K), and Value (V). These vectors are created by multiplying the embedding by three matrices that were trained during the training process. Then, the model computes the attention score for each element in the sequence by taking the dot product of the Q and K vectors and applying a softmax function. The scores are then used to compute a weighted sum of the Value vector, resulting in the final output of the self-attention layer.

In [70]:
answer_3

" The Transformer model calculates attention by first creating three vectors from each of the encoder's input vectors, called Query (Q), Key (K), and Value (V). These vectors are created by multiplying the embedding by three matrices that were trained during the training process. Then, the model computes the attention score for each element in the sequence by taking the dot product of the Q and K vectors and applying a softmax function. The scores are then used to compute a weighted sum of the Value vector, resulting in the final output of the self-attention layer."

In [71]:
answer_4 = rag_chain.invoke("How can Transformer model catch the sequence of the words by using FFN?")

Llama.generate: prefix-match hit


 The Transformer model uses FFN to catch the sequence of words by processing each word in parallel while flowing through the feed-forward layer. This allows for efficient computation and parallelization of the attention mechanism across different positions in the input sequence, enabling the model to capture long-range dependencies between words.

In [72]:
answer_4

' The Transformer model uses FFN to catch the sequence of words by processing each word in parallel while flowing through the feed-forward layer. This allows for efficient computation and parallelization of the attention mechanism across different positions in the input sequence, enabling the model to capture long-range dependencies between words.'

# 4. Chat with URL (using Beautiful Soup)

In this guide I build a QA app over the CNN article [Number of foreign visitors to China in 2023 down more than 60% from pre-pandemic levels](https://edition.cnn.com/travel/china-foreign-visitor-number-2023-intl-hnk/index.html).

In [37]:
from langchain_community.document_loaders import WebBaseLoader
import bs4

In [51]:
# Load, chunk and index the contents of the article, keeping only the headline, timestamp and body.
file_path = "https://edition.cnn.com/travel/china-foreign-visitor-number-2023-intl-hnk/index.html"
loader = WebBaseLoader(
    file_path,
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("headline__text inline-placeholder", "timestamp", "article__content")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = None
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=HuggingFaceEmbeddings())

# Retrieve and generate using the relevant snippets of the article.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
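
The `parse_only` filter above keeps only the parts of the page whose CSS class matches, so navigation, ads and footers never reach the splitter. The sketch below imitates that idea with the standard library's `HTMLParser` rather than Beautiful Soup. It is a simplified analogue under stated assumptions: `ClassFilter` is a hypothetical class, it matches individual class names, and it may not match `SoupStrainer` behaviour in every detail.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilter(HTMLParser):
    """Collect text found inside elements that carry one of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.stack = []  # one bool per open element: are we inside a wanted element?
        self.parts = []  # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        inside = bool(self.stack and self.stack[-1]) or bool(classes & self.wanted)
        self.stack.append(inside)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.parts.append(data.strip())


html = (
    '<html><body><nav class="menu">Home</nav>'
    '<h1 class="headline__text">Title</h1>'
    '<div class="article__content"><p>Body <b>text</b>.</p><br></div>'
    '<footer>Footer</footer></body></html>'
)
parser = ClassFilter({"headline__text", "article__content"})
parser.feed(html)
parser.close()
print(parser.parts)
```

Filtering before splitting keeps boilerplate out of the index, which tends to make the retrieved chunks more relevant to questions about the article itself.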

In [46]:
docs[0]

Document(page_content='\n      Number of foreign visitors to China in 2023 down more than 60% from pre-pandemic levels\n    \n  Published\n        11:03 PM EST, Thu January 18, 2024\n    \n      It has been a year since China reopened its borders, but despite loosening its stringent Covid-19 restrictions, foreign travelers have been slow to return to the country with numbers down more than 60% from pre-pandemic levels.\n  \n      China’s border authorities recorded 35.5 million entries and exits by foreign nationals in 2023, according to the National Immigration Administration. That’s nearly seven times more than the number from 2022, when the country was deep in its three-year self-imposed Covid isolation.\n  \n      The 2023 figure is just 36% of the 97.7 million border entries and exits by foreign nationals recorded in 2019, suggesting a long road to full recovery, though momentum picked up toward the end of the year.\n  \n      More than half of the border crossings made by f

In [52]:
answer_1 = rag_chain.invoke("What is the content about?")

Llama.generate: prefix-match hit


 The content is about China's efforts to make it easier for international travelers to visit the country following a slow return of visitors in earlier months.

In [53]:
answer_1

" The content is about China's efforts to make it easier for international travelers to visit the country following a slow return of visitors in earlier months."

In [54]:
answer_2 = rag_chain.invoke("What happened to the Chinese tourism industry?")

Llama.generate: prefix-match hit


 According to the article, China has made efforts to ease travel restrictions for international visitors since reopening its borders a year ago. Despite these efforts, foreign visitor numbers are still down by more than 60% from pre-pandemic levels. The country recorded 35.5 million entries and exits by foreign nationals in 2023, which is nearly seven times the number from 2022 but only 36% of the 97.7 million recorded in 2019.

In [55]:
answer_2

' According to the article, China has made efforts to ease travel restrictions for international visitors since reopening its borders a year ago. Despite these efforts, foreign visitor numbers are still down by more than 60% from pre-pandemic levels. The country recorded 35.5 million entries and exits by foreign nationals in 2023, which is nearly seven times the number from 2022 but only 36% of the 97.7 million recorded in 2019.'

In [56]:
answer_3 = rag_chain.invoke("Please describe in detail the efforts the Chinese government made.")

Llama.generate: prefix-match hit


 The Chinese government has made efforts to boost weak consumption and business ties by offering visa-free travel to European and Asian countries. In November, China announced a trial program allowing visitors from six nations to enter without a visa for 15 days, which started in December and lasts until the end of November this year. According to the National Immigration Administration, more than half of the border crossings made by foreign travelers in 2023 were recorded in the last three months of the year. The number of foreign nationals residing in China has rebounded to 85% of the level it was at the end of 2019, and Chinese authorities issued a total of 711,000 residency permits of various types to foreign nationals living in the country in 2023. The latest figures come amid attempts by Beijing to lure back foreign tourists and visitors as it seeks ways to boost a sluggish economy.

In [57]:
answer_3

' The Chinese government has made efforts to boost weak consumption and business ties by offering visa-free travel to European and Asian countries. In November, China announced a trial program allowing visitors from six nations to enter without a visa for 15 days, which started in December and lasts until the end of November this year. According to the National Immigration Administration, more than half of the border crossings made by foreign travelers in 2023 were recorded in the last three months of the year. The number of foreign nationals residing in China has rebounded to 85% of the level it was at the end of 2019, and Chinese authorities issued a total of 711,000 residency permits of various types to foreign nationals living in the country in 2023. The latest figures come amid attempts by Beijing to lure back foreign tourists and visitors as it seeks ways to boost a sluggish economy.'

In [58]:
answer_4 = rag_chain.invoke("What do you think about the efforts the Chinese government made?")

Llama.generate: prefix-match hit


 The Chinese government has made efforts to boost weak consumption and business ties by offering visa-free treatments to European and Asian countries. According to the National Immigration Administration, in December, 118,000 travelers from those six nations entered China without a visa under the new policy. While the number of foreign visitors to China in 2023 is down more than 60% from pre-pandemic levels, Chinese authorities issued a total of 711,000 residency permits of various types to foreign nationals living in the country in 2023. Despite these efforts, it has been a year since China reopened its borders and foreign travelers have been slow to return to the country with numbers down more than 60% from pre-pandemic levels.

In [59]:
answer_4

' The Chinese government has made efforts to boost weak consumption and business ties by offering visa-free treatments to European and Asian countries. According to the National Immigration Administration, in December, 118,000 travelers from those six nations entered China without a visa under the new policy. While the number of foreign visitors to China in 2023 is down more than 60% from pre-pandemic levels, Chinese authorities issued a total of 711,000 residency permits of various types to foreign nationals living in the country in 2023. Despite these efforts, it has been a year since China reopened its borders and foreign travelers have been slow to return to the country with numbers down more than 60% from pre-pandemic levels.'

In [None]:
# cleanup
vectorstore.delete_collection()