In [1]:
import getpass
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

In [2]:
os.environ["MISTRAL_API_KEY"] = getpass.getpass()

## Indexing
### Loading Data / documents

- LangChain provides a variety of document loaders
- Here we use WebBaseLoader to load the contents of a blog post
- BeautifulSoup filters the HTML so that only the required content is loaded


In [3]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

len(docs[0].page_content)

USER_AGENT environment variable not set, consider setting it to identify your requests.


43131

We have loaded the blog post as a single Document whose page_content is 43131 characters long.  
Note that docs is a list of Document objects (here it contains one Document).  
  
LangChain supports more than 160 [integrations (various types of loaders)](https://python.langchain.com/v0.2/docs/integrations/document_loaders/)
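
The `USER_AGENT` warning in the output above can usually be avoided by setting that environment variable before the loader is imported. The snippet below is a minimal sketch; the identifier string is an arbitrary, hypothetical value, and it assumes the check runs when `langchain_community` is imported.

```python
import os

# Assumption: "rag-tutorial-notebook/0.1" is a made-up identifier; use any string
# that identifies your requests. setdefault keeps a value that is already set.
os.environ.setdefault("USER_AGENT", "rag-tutorial-notebook/0.1")
```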

## Splitting
Each document contains a large chunk of text, e.g. the document loaded above contains about 43k characters.  
We need to split the text into multiple smaller documents.  
Let's split it into chunks of up to 1000 characters each, with an overlap of 200 characters between consecutive chunks.  

### Why split?
A 43k-character document would be too long to fit into the context window of many models  

We will use RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

We set add_start_index=True so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute “start_index”.

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, 
    add_start_index=True  #index at which each split starts
)
all_splits = text_splitter.split_documents(docs)

len(all_splits)

66

In [6]:
len(all_splits[0].page_content)

969

As we can see above, each Document is now at most 1000 characters long (the first one has 969)  

In [9]:
all_splits[0].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 8}
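
To make `chunk_size`, `chunk_overlap` and `start_index` concrete, here is a simplified sketch of a fixed-size splitter in plain Python. It is not the algorithm RecursiveCharacterTextSplitter uses: it cuts at fixed character offsets and ignores separators. The function name `split_with_overlap` and the stand-in text are hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Naive splitter: a new chunk starts every (chunk_size - chunk_overlap) characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(
            {
                "page_content": text[start:start + chunk_size],
                "metadata": {"start_index": start},  # offset within the original text
            }
        )
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


# Hypothetical stand-in text with the same length as the blog post above.
toy_text = "".join(chr(97 + i % 26) for i in range(43131))
toy_chunks = split_with_overlap(toy_text)

print(len(toy_chunks))            # 54
print(toy_chunks[1]["metadata"])  # {'start_index': 800}
```

The naive version yields 54 chunks for 43131 characters, while the notebook produced 66. That difference is expected: the recursive splitter prefers to cut at separators such as new lines, so its chunks are usually somewhat shorter than `chunk_size`.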

## Store
- We need to index our documents so that they can be searched over  
- First step would be to create embeddings of each Document
- Store embeddings in a vector database

In [11]:
os.environ['HF_TOKEN'] = getpass.getpass()

In [14]:
from langchain_chroma import Chroma
from langchain_mistralai import MistralAIEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=MistralAIEmbeddings())
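
Conceptually, the cell above embeds every split and stores the vectors so that they can be compared with a query vector later. The sketch below imitates that idea in plain Python. It is only an illustration: `toy_embed` replaces the Mistral embedding model with simple word counts, `ToyVectorStore` is a hypothetical class, and cosine similarity is just one common choice of metric (the real store may use a different one).

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (vector, text) pairs in memory and ranks them against a query."""

    def __init__(self, texts):
        self.entries = [(toy_embed(t), t) for t in texts]

    def similarity_search(self, query, k=2):
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


toy_store = ToyVectorStore(
    [
        "task decomposition breaks a big task into steps",
        "memory lets an agent recall past information",
        "tool use lets an agent call external apis",
    ]
)
toy_top = toy_store.similarity_search("what is task decomposition", k=1)
print(toy_top)
```

A real embedding model maps texts with similar meaning to nearby vectors even when they share no words, which word counts cannot do.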

## Retrieval and Generation
Application logic
```mermaid
flowchart LR
A[Take user question] --> B(search vector database)
B --> C(Find relevant context vectors)
C --> D(Pass to LLM as context\n along with query)
D --> E(LLM outputs Answer)
```
We will use this logic to create a simple application

### Retrieve
- Search the database by similarity score and retrieve the matching documents
- LangChain's Retriever interface wraps the index for search. It supports many search techniques, including similarity search
- VectorStore.as_retriever() returns a retriever

In [15]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

len(retrieved_docs)

6

In [16]:
print(retrieved_docs[0].page_content)

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.


### Generate
Let’s put it all together into a chain: 

```mermaid
flowchart LR
A[question] -->B(retrieves relevant documents)
B --> C(constructs a prompt)
C --> D(passes that to a model)
D --> E(parses the output)
```


In [17]:
from langchain_mistralai import ChatMistralAI

llm = ChatMistralAI(model="mistral-large-latest")

In [None]:
# !pip install langchainhub

In [20]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()

example_messages

[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]

In [21]:
print(example_messages[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:


In [23]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task Decomposition is a process where a complex task is broken down into smaller, more manageable tasks. This is often done by Large Language Models (LLMs) using techniques like Chain of Thought (CoT), where the model is instructed to "think step by step" to utilize more test-time computation. This process transforms hard tasks into simpler steps, making them easier to understand and execute. The decomposition can be done by LLMs with simple prompting, using task-specific instructions, or with human inputs.

retriever, prompt and llm are instances of Runnable, i.e. they implement the methods invoke, stream and batch. This is how they can be connected together  
in a RunnableSequence  
- The operator for connecting them is '|'  
- You may wonder how format_docs, a plain Python function that does not implement the Runnable interface, can be part of the chain. LangChain automatically casts the function into a RunnableLambda, so it works inside a chain
- The dict with keys "context" and "question" is cast into a RunnableParallel

That might sound complicated. Just remember that each object in the chain is a Runnable  
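
The coercion rules above can be imitated with a toy model in plain Python. This is a sketch of the idea only, not LangChain's implementation; `ToyRunnable`, `coerce` and the `toy_*` objects are hypothetical names.

```python
class ToyRunnable:
    """Minimal stand-in for a Runnable: wraps a function and supports '|'."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # self | other: run self first, then feed its output to other.
        following = coerce(other)
        return ToyRunnable(lambda value: following.invoke(self.invoke(value)))

    def __ror__(self, other):
        # other | self, where other is a plain function or a dict.
        return coerce(other) | self


def coerce(obj):
    """Turn functions and dicts into ToyRunnable objects."""
    if isinstance(obj, ToyRunnable):
        return obj
    if callable(obj):
        return ToyRunnable(obj)  # plays the role of RunnableLambda
    if isinstance(obj, dict):
        branches = {key: coerce(value) for key, value in obj.items()}
        # Plays the role of RunnableParallel: every branch gets the same input.
        return ToyRunnable(
            lambda value: {key: branch.invoke(value) for key, branch in branches.items()}
        )
    raise TypeError(f"cannot use {type(obj).__name__} in a chain")


# Hypothetical stand-ins for the retriever, RunnablePassthrough and the prompt.
toy_retriever = ToyRunnable(lambda question: [f"doc about {question}"])
toy_passthrough = ToyRunnable(lambda value: value)
toy_prompt = ToyRunnable(
    lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}"
)


def toy_format_docs(docs):
    return "\n\n".join(docs)


toy_chain = {"context": toy_retriever | toy_format_docs, "question": toy_passthrough} | toy_prompt
print(toy_chain.invoke("agents"))
```

The dict on the left of `|` has no idea how to combine with a `ToyRunnable`, so Python falls back to `__ror__` on the right-hand object, which is where the dict gets wrapped.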

As we've seen above, the input to prompt is expected to be a dict with keys "context" and "question".  
So the first element of this chain builds runnables that will calculate both of these from the input question:

- retriever | format_docs passes the question through the retriever, generating Document objects, and then through format_docs, which joins them into a single string;  
- RunnablePassthrough() passes through the input question unchanged.

That is, if you constructed

In [26]:
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
)

chain.invoke("Why is task decomposition required?")

ChatPromptValue(messages=[HumanMessage(content='You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don\'t know the answer, just say that you don\'t know. Use three sentences maximum and keep the answer concise.\nQuestion: Why is task decomposition required? \nContext: Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n\nHere are a sample conversation for task 

Then chain.invoke(question) would build a formatted prompt, ready for inference. (Note: when developing with LCEL, it can be practical to test with sub-chains like this.)

### Built-in chains
If preferred, LangChain includes convenience functions that implement the above LCEL. We compose two functions:

- **create_stuff_documents_chain** specifies how retrieved context is fed into a prompt and LLM. In this case, we will "stuff" the contents into the prompt, i.e., we will include all retrieved context without any summarization or other processing. It largely implements our above rag_chain, with input keys context and input: it generates an answer using the retrieved context and the query.
- **create_retrieval_chain** adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It has input key input, and includes input, context, and answer in its output.

In [28]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
print(response["answer"])

Task decomposition is the process of breaking down complex tasks into smaller, more manageable subtasks. This technique is often used in AI systems to improve performance on intricate tasks. The model is instructed to "think step by step", decomposing harder tasks into simpler steps. This not only makes the task more manageable but also provides insight into the model's thinking process. For example, in the context of writing a Super Mario game in Python, task decomposition might involve breaking the task into subtasks like designing the game's graphics, programming the game's physics, and implementing keyboard controls.
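
To show how the two helpers fit together and which keys end up in the response, here is a plain-Python imitation. It is a sketch under stated assumptions, not the library code: `toy_stuff_documents_chain`, `toy_retrieval_chain`, `fake_retriever` and `fake_llm` are hypothetical stand-ins, and documents are represented as plain strings.

```python
def toy_stuff_documents_chain(llm, template):
    """Join all retrieved texts into one context string and call the model once."""

    def run(inputs):
        context = "\n\n".join(inputs["context"])
        return llm(template.format(context=context, input=inputs["input"]))

    return run


def toy_retrieval_chain(retriever, combine_docs_chain):
    """Retrieve first, then return the input, the context and the answer together."""

    def run(inputs):
        docs = retriever(inputs["input"])
        answer = combine_docs_chain({**inputs, "context": docs})
        return {**inputs, "context": docs, "answer": answer}

    return run


# Hypothetical stand-ins so the sketch runs without a vector store or an API key.
def fake_retriever(question):
    return ["Task decomposition splits a task into steps."]


def fake_llm(prompt_text):
    return f"(model reply to a {len(prompt_text)}-character prompt)"


toy_rag_chain = toy_retrieval_chain(
    fake_retriever,
    toy_stuff_documents_chain(fake_llm, "Context:\n{context}\n\nQuestion: {input}"),
)
toy_response = toy_rag_chain({"input": "What is Task Decomposition?"})
print(sorted(toy_response))  # ['answer', 'context', 'input']
```

With the real chain, `response["context"]` should likewise hold the retrieved documents, which is useful when you want to show the sources next to the answer.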


#### Customizing the prompt
As shown above, we can load prompts (e.g., this RAG prompt) from the prompt hub. The prompt can also be easily customized:

In [29]:
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

"Task Decomposition is a process where a complex task is broken down into smaller, more manageable tasks. This is often done by Large Language Models (LLMs) using techniques like Chain of Thought (CoT) prompting, where the model is instructed to think step-by-step. The result is a series of simpler tasks that are easier to execute and understand, shedding light on the model's thinking process. Thanks for asking!"