<a href="https://colab.research.google.com/github/quantranvr/all-in-one/blob/main/QA_with_RAG_series_part_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

Tutorial @ https://python.langchain.com/docs/use_cases/question_answering/quickstart

This notebook contains 2 core parts:
1. **Reproduce** [tutorial](https://python.langchain.com/docs/use_cases/question_answering/quickstart)'s example(s)
2. **Apply** knowledge learned to solve a new (but similar) problem


# Installation

In [2]:
!pip install --upgrade --quiet langchain langchain-community langchainhub langchain-openai chromadb bs4


# API KEYS

In [4]:
from getpass import getpass

# Never hard-code a secret in a notebook; prompt for it at run time instead
openai_api_key = getpass("OpenAI API key: ")

# 2024.01.22
# Thank you for signing up! LangSmith is still in closed beta and we're slowly
# rolling access to more users. You are on a waitlist and we will get back to
# you when we roll out more invites.
langsmith_api_key = ""

# Part 1: **Reproduce** [tutorial](https://python.langchain.com/docs/use_cases/question_answering/quickstart)'s example

*In this guide we’ll build a QA app over the LLM Powered Autonomous Agents blog post by Lilian Weng, which allows us to ask questions about the contents of the post.* ([Reference](https://python.langchain.com/docs/use_cases/question_answering/quickstart#preview))

## Set up environment variable `OPENAI_API_KEY`

In [5]:
import os

os.environ["OPENAI_API_KEY"] = openai_api_key

## Import

In [6]:
# for loading the blog post content
from langchain_community.document_loaders import WebBaseLoader
# for parsing HTML to text
import bs4
# for recursively splitting the document using common separators
# until each chunk is the appropriate size
from langchain.text_splitter import RecursiveCharacterTextSplitter
# for pulling prompt from langchain hub
from langchain import hub
# for vector store
from langchain_community.vectorstores import Chroma
# for parsing output
from langchain_core.output_parsers import StrOutputParser
# for passing the user's question through the chain unchanged
from langchain_core.runnables import RunnablePassthrough
# for setting up llm and embeddings
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

## Load, chunk and index the contents of the [blog](https://lilianweng.github.io/posts/2023-06-23-agent/)

Notes:
1. In `RecursiveCharacterTextSplitter`, `add_start_index` is set to `True` so that the character index at which each split Document starts within the initial Document is preserved as the metadata attribute `start_index`.
2. `TextSplitter` is a subclass of `BaseDocumentTransformer`, an object that performs a transformation on a list of documents.
3. *When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors)*
4. **Embeddings** is a wrapper around a text embedding model, used for converting text to embeddings
5. **VectorStore** is a wrapper around a vector database, used for storing and querying embeddings
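
To make note 3 concrete, here is a minimal sketch of ranking stored chunks by cosine similarity. The 3-dimensional vectors and the chunk names are made up for illustration; real embeddings come from the embedding model and have many more dimensions, and the vector store may use a different distance measure internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (made-up values, for illustration only)
query = [1.0, 0.0, 1.0]
stored = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "chunk_c": [1.0, 1.0, 0.0],  # partially aligned with the query
}

# Rank the stored chunks from most to least similar to the query
ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(query, stored[name]),
    reverse=True,
)
print(ranked)
```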

Go **deeper** at:
1. [Document loaders](https://python.langchain.com/docs/integrations/document_loaders/) to choose from 160+ integrations
2. [Document transformers](https://python.langchain.com/docs/integrations/document_transformers/) to select the appropriate integration
3. [Embeddings](https://python.langchain.com/docs/integrations/text_embedding/) to select from 30+ integrations
4. [VectorStore](https://python.langchain.com/docs/integrations/vectorstores/) to select from 40+ integrations

In [23]:
# Setup document loader
bs4_strainer = bs4.SoupStrainer(class_ = ("post-content", "post-title", "post-header"))
loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = {"parse_only": bs4_strainer}
)

# Load the blog post content
docs = loader.load()

# Setup text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 200,
    add_start_index = True,
)

# Split the docs
splits = text_splitter.split_documents(docs)

# Embed the contents of each document split and
# insert these embeddings into a vector database (or vector store)
vectorstore = Chroma.from_documents(
    documents = splits,
    embedding = OpenAIEmbeddings()
)
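
To see what `chunk_size`, `chunk_overlap` and `add_start_index` mean, the sketch below splits a string into fixed, overlapping windows and records where each one starts. This is a simplified stand-in, not the real algorithm: `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph breaks, so its chunks will differ. The helper name `fixed_window_chunks` is hypothetical.

```python
def fixed_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into overlapping windows and record each start index.

    Simplified illustration only: the real splitter cuts at separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += stride
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
chunks = fixed_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk["start_index"], chunk["text"])
```

Each window repeats the last 4 characters of the previous one, which is the role `chunk_overlap` plays: context that straddles a boundary still appears whole in at least one chunk.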

## Retrieve and generate using the relevant snippets of the blog

Notes:
1. *We’ll use the [LCEL Runnable](https://python.langchain.com/docs/expression_language/) protocol to define the chain, allowing us to pipe together components and functions in a transparent way*

Go **deeper** at:
1. [Chat models](https://python.langchain.com/docs/integrations/chat/) to choose from 25+ integrations
2. [LLM](https://python.langchain.com/docs/integrations/llms) to choose from 75+ integrations
3. [Customized prompt](https://python.langchain.com/docs/integrations/llms) to learn more about customizing prompt

In [21]:
# Turn vector store into a retriever
retriever = vectorstore.as_retriever()

# Use a prompt for RAG that is checked into the LangChain prompt hub
# https://smith.langchain.com/hub/rlm/rag-prompt?organizationId=1648f50c-3d41-5454-a345-8a3645232d42
prompt = hub.pull("rlm/rag-prompt")

# Setup LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# format retrieved info
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Pipe together components and functions
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
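
The chain above relies on two ideas: `|` feeds the output of one step into the next, and the leading dict runs each of its branches on the same input (LCEL coerces such a dict into a parallel step). The toy sketch below mimics both in plain Python. `Step`, `parallel` and `fake_retriever` are hypothetical names for illustration and are not LangChain's implementation.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # self | other: run self first, then feed its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Toy version of the dict step: run every branch on the same input."""
    return Step(lambda value: {key: step.invoke(value) for key, step in branches.items()})


passthrough = Step(lambda value: value)              # stands in for RunnablePassthrough
fake_retriever = Step(lambda q: [f"doc about {q}"])  # hypothetical retriever
join_docs = Step(lambda docs: "\n\n".join(docs))     # plays the role of format_docs
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")

toy_chain = parallel(context=fake_retriever | join_docs, question=passthrough) | fill_prompt
print(toy_chain.invoke("LCEL"))
```

The question reaches the prompt twice: once unchanged through the passthrough branch, and once transformed into context through the retriever branch.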

In [28]:
# See what a prompt for RAG looks like
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
print(f"RAG Prompt example:\n\"\"\"\n{example_messages[0].content}\n\"\"\"")

RAG Prompt example:
"""
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:
"""


In [22]:
# Invoke the chain to answer user question
rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought or Tree of Thoughts, which guide the model to think step by step and explore multiple reasoning possibilities. Task decomposition can also involve task-specific instructions or human inputs.'

In [29]:
# cleanup
vectorstore.delete_collection()

# Part 2: **Apply** knowledge learned to a similar problem

Problem:

A LangChain learner wants to understand certain concepts of the LangChain Expression Language (LCEL).

Retrieve information from the [LCEL docs](https://python.langchain.com/docs/expression_language/) to answer the learner's questions.

In [None]:
# for loading the docs page content
from langchain_community.document_loaders import WebBaseLoader
# for parsing HTML to text
import bs4
# for recursively splitting the document using common separators
# until each chunk is the appropriate size
from langchain.text_splitter import RecursiveCharacterTextSplitter
# for pulling prompt from langchain hub
from langchain import hub
# for vector store
from langchain_community.vectorstores import Chroma
# for parsing output
from langchain_core.output_parsers import StrOutputParser
# for passing the user's question through the chain unchanged
from langchain_core.runnables import RunnablePassthrough
# for setting up llm and embeddings
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [37]:
# load document
bs4_strainer = bs4.SoupStrainer(class_="docMainContainer_gTbr")
loader = WebBaseLoader(
    web_paths = ("https://python.langchain.com/docs/expression_language/",),
    bs_kwargs = {"parse_only": bs4_strainer}
)

docs = loader.load()

In [42]:
print(f"Document length = {len(docs[0].page_content)} characters")

Document length = 2743 characters


In [47]:
# split document into chunks of text
splitter = RecursiveCharacterTextSplitter(
    chunk_size = 800,
    chunk_overlap = 100,
    add_start_index = True
)

splits = splitter.split_documents(docs)

In [52]:
# store chunk of texts into vector store
vectorstore = Chroma.from_documents(
    documents = splits,
    embedding = OpenAIEmbeddings(),
)

In [54]:
# setup retriever
retriever = vectorstore.as_retriever()

# setup prompt
prompt = hub.pull("rlm/rag-prompt")

# setup llm
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# format retrieved info
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# define the chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [59]:
# answer the question
user_questions = [
    "What are the advantages of using LangChain Expression Language?",
    "What are the disadvantages of using LangChain Expression Language?",
    "Why should I use LCEL?",
    "What does Async mean?",
    "How LCEL supports Async?"
]

rag_chain.invoke(user_questions[-1])

'LCEL supports async by allowing chains to be called with both synchronous and asynchronous APIs. This enables the use of the same code for prototypes and production, with the ability to handle concurrent requests. LCEL also optimizes parallel execution for chains with steps that can be executed in parallel, reducing latency.'
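
The cell above only answers the last question. A small helper can run the whole list; the sketch below assumes nothing about the chain except that it exposes `.invoke`. `answer_all` and `EchoChain` are hypothetical names, and `EchoChain` is a stand-in so the snippet runs without an API key.

```python
def answer_all(chain, questions):
    """Invoke the chain once per question and return (question, answer) pairs."""
    return [(question, chain.invoke(question)) for question in questions]


class EchoChain:
    """Hypothetical stand-in for rag_chain; it only echoes the question."""

    def invoke(self, question):
        return f"(answer to: {question})"


demo_questions = ["Why should I use LCEL?", "What does Async mean?"]
for question, answer in answer_all(EchoChain(), demo_questions):
    print(f"Q: {question}\nA: {answer}\n")
```

In this notebook the call would be `answer_all(rag_chain, user_questions)`, which makes one LLM request per question.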