### RAG Q&A Application

To monitor and trace our application, we use LangSmith.

In [1]:
# monitoring & tracing 
import os 

monitoring = False  # set to True to trace runs in LangSmith (LANGCHAIN_API_KEY must also be set, e.g. in .env)

if monitoring:
    os.environ['LANGCHAIN_TRACING_V2'] = "true"
    os.environ['LANGCHAIN_PROJECT'] = "Rag_App"

Load all required environment variables.

In [2]:
from dotenv import load_dotenv

# loading all the environment variables
load_dotenv()

True

### RAG Pipeline - Indexing + Retrieval + Generation 

LLM: Chat Model Interface

In [3]:
from langchain_openai import ChatOpenAI

chat_llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7,
    max_tokens=None,
)

#### 1. Indexing - Loading Documents

We first need to load the blog post contents. We can use `DocumentLoaders` for this: objects that load data from a source and return a list of `Document` objects.

* A `Document` is an object with some `page_content` (str) and `metadata` (dict). 
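To make that shape concrete, here is a hypothetical stand-in `Document` written as a plain dataclass. It is only a sketch for illustration, not LangChain's own class (the real one lives in `langchain_core.documents`); it mirrors just the two fields described above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Document:
    """Toy stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# A loader returns a list of these; metadata records where the text came from.
docs = [
    Document(
        page_content="LLM Powered Autonomous Agents ...",
        metadata={"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"},
    )
]
print(docs[0].metadata["source"])
```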

In this case we'll use the `WebBaseLoader`, which uses `urllib` to load HTML from web URLs and `BeautifulSoup` to parse it to text.

- We can customize the HTML -> text parsing by passing parameters to the `BeautifulSoup` parser via `bs_kwargs`. In this case only HTML tags with class `post-content`, `post-title`, or `post-header` are relevant, so we'll remove all others.
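Conceptually, `SoupStrainer` tells the parser to keep only the matching elements. The sketch below imitates that class-based filtering with the standard library's `html.parser`, so it runs without BeautifulSoup. `ClassFilterParser` and `extract_text` are hypothetical names; this is not how `WebBaseLoader` works internally, and it assumes reasonably well-formed HTML (every non-void tag is closed).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterParser(HTMLParser):
    """Collect text only from inside elements carrying one of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0      # > 0 while we are inside a kept element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Enter a kept element, or go one level deeper inside one.
        if self.depth > 0 or classes & self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, wanted_classes=("post-title", "post-header", "post-content")):
    parser = ClassFilterParser(wanted_classes)
    parser.feed(html)
    parser.close()
    # Join the kept text fragments with single spaces.
    return " ".join(part.strip() for part in parser.chunks if part.strip())


sample_html = (
    '<nav>Menu</nav><h1 class="post-title">Agents</h1>'
    '<div class="post-content"><p>Body <b>text</b></p><br></div>'
    '<footer>Footer</footer>'
)
print(extract_text(sample_html))
```

Navigation and footer text are dropped because they sit outside every element with a wanted class.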


In [4]:
!pip install beautifulsoup4 -qqq

### Loading webpage
Extract only the `post-title`, `post-header`, and `post-content` elements.

Website: https://lilianweng.github.io/posts/2023-06-23-agent/

In [5]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# only keep post content, title and header from the full HTML
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))

# web loader 
loader = WebBaseLoader(
    web_path="https://lilianweng.github.io/posts/2023-06-23-agent/", 
    bs_kwargs={"parse_only": bs4_strainer},
)

In [6]:
# loading docs 
docs = loader.load()

len(docs[0].page_content)

43131

The page content is 43,131 characters long (`len` counts characters, not words).

In [7]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


#### 2. Indexing - Splitting 

Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for models that could fit the full post in their context window, models can struggle to find information in very long inputs.

To handle this we'll split the `Document` into `chunks` for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time. 

In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the `RecursiveCharacterTextSplitter`, which recursively splits the documents using common separators like new lines until each chunk is the appropriate size. According to the LangChain docs, this is the recommended text splitter for generic text use cases.

* We set `add_start_index=True` so that the character index at which each split Document starts within the initial Document is preserved as the metadata attribute `start_index`.
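The sketch below shows only the sliding-window idea behind `chunk_size`, `chunk_overlap` and `start_index`, using a plain fixed-size character window. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which first tries to cut on separators such as paragraph breaks and newlines, so its chunk boundaries (and the 66 chunks below) will not match. `split_with_overlap` is a hypothetical helper.

```python
from typing import List, Tuple


def split_with_overlap(text: str, chunk_size: int = 1000,
                       chunk_overlap: int = 200) -> List[Tuple[int, str]]:
    """Return (start_index, chunk) pairs; consecutive chunks share `chunk_overlap` chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window slides each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


pieces = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(pieces)  # [(0, 'abcd'), (2, 'cdef'), (4, 'efgh'), (6, 'ghij')]
```

Each stored `start_index` points back into the original text, which is what lets you locate a retrieved chunk in the source document.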

In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter 

# text splitter 
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)

# splitting the text
all_splits = text_splitter.split_documents(docs)

len(all_splits)

66

In [9]:
len(all_splits[0].page_content)

969

In [10]:
all_splits[50].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 33888}

#### 3. Indexing - Store

Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (also called a vector store).

When we want to search over our splits, we take a text search query, embed it and perform some sort of `similarity` search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is `cosine` similarity - we measure the cosine of the angle between each pair of embeddings. 
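As a worked illustration of that measure, the sketch below computes cosine similarity, (a · b) / (|a| |b|), in plain Python and ranks a few stored vectors against a query. The 3-dimensional "embeddings" are made-up toy numbers (real OpenAI embeddings have many more dimensions), and `top_k` is a hypothetical helper. Which metric a given vector store uses by default is configurable and may differ; this only illustrates cosine.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], stored: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored embeddings by cosine similarity to the query, highest first."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings standing in for the vectors of three document splits.
stored = {
    "chunk about planning": [0.9, 0.1, 0.0],
    "chunk about memory": [0.1, 0.9, 0.1],
    "chunk about tools": [0.0, 0.2, 0.9],
}
result = top_k([1.0, 0.2, 0.0], stored, k=2)
print(result)
```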

In [89]:
!pip install langchain_chroma -qqq

In [11]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings 

# storing our documents in a vector store
vector_store = Chroma.from_documents(
    documents=all_splits,
    embedding=OpenAIEmbeddings(),
)

This completes the `Indexing` portion of the pipeline. At this point we have a query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question. 

## Similarity Search

In [12]:
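def similarity_search(query: str, top_k: int = 4) -> list:
    # returns (Document, score) pairs; top_k was previously ignored, now passed as `k`
    # note: Chroma's score is a distance, so a lower score means more similar
    results = vector_store.similarity_search_with_score(
        query=query,
        k=top_k,
    )
    return results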

In [32]:
similarity_search("What is the best way to train a reinforcement learning agent?", top_k=5)

[(Document(page_content='Fig. 5. After fine-tuning with CoH, the model can follow instructions to produce outputs with incremental improvement in a sequence. (Image source: Liu et al. 2023)\nThe idea of CoH is to present a history of sequentially improved outputs  in context and train the model to take on the trend to produce better outputs. Algorithm Distillation (AD; Laskin et al. 2023) applies the same idea to cross-episode trajectories in reinforcement learning tasks, where an algorithm is encapsulated in a long history-conditioned policy. Considering that an agent interacts with the environment many times and in each episode the agent gets a little better, AD concatenates this learning history and feeds that into the model. Hence we should expect the next predicted action to lead to better performance than previous trials. The goal is to learn the process of RL instead of training a task-specific policy itself.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-a

In [33]:
def pretty_print_doc_score(docs):
    for i,doc in enumerate(docs): 
        print(f"Document: {i+1} \n")
        print("Content: \n")
        print(doc[0].page_content)
        print("\n")
        print("Score: \n")
        print(doc[1])
        print("\n")
        print("Metadata: \n")
        print(doc[0].metadata)
        print("\n\n")

In [34]:
pretty_print_doc_score(similarity_search("What is the agent?", 5))

Document: 1 

Content: 

They also discussed the risks, especially with illicit drugs and bioweapons. They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to obtain a synthesis solution and the agent attempted to consult documentation to execute the procedure. 7 out of 11 were rejected and among these 7 rejected cases, 5 happened after a Web search while 2 were rejected based on prompt only.
Generative Agents Simulation#
Generative Agents (Park, et al. 2023) is super fun experiment where 25 virtual characters, each controlled by a LLM-powered agent, are living and interacting in a sandbox environment, inspired by The Sims. Generative agents create believable simulacra of human behavior for interactive applications.
The design of generative agents combines LLM with memory, planning and reflection mechanisms to enable agents to behave conditioned on past experience, as well as to inter

#### 4. Retrieval and Generation: Retrieve

Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and the initial question to a model, and returns an answer.

First we need to define our logic for searching over documents. LangChain defines a `Retriever` interface which wraps an index that can return relevant `Documents` given a string query.

The most common type of `Retriever` is the `VectorStoreRetriever`, which uses the similarity search capabilities of a vector store to facilitate retrieval. 

* Any `VectorStore` can easily be turned into a `Retriever` with `VectorStore.as_retriever()`:
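To show what "wrapping" a vector store means, here is a hypothetical toy store with an `as_retriever` method. `ToyVectorStore`, `ToyRetriever` and the word-count `embed` function are illustrative assumptions: word counts stand in for a real embedding model such as `OpenAIEmbeddings`, and none of this is LangChain's implementation. The retriever only fixes `k` and exposes `invoke(query)`.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Stand-in 'embedding': lowercase word counts (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: List[str]):
        # Indexing step: embed every chunk once and keep (text, vector) pairs.
        self.rows = [(t, embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def as_retriever(self, k: int = 4) -> "ToyRetriever":
        return ToyRetriever(self, k)


class ToyRetriever:
    """Wraps a store so callers only see `invoke(query) -> documents`."""

    def __init__(self, store: ToyVectorStore, k: int):
        self.store, self.k = store, k

    def invoke(self, query: str) -> List[str]:
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    "Task decomposition splits a task into subgoals.",
    "Memory stores past observations.",
    "Tool use lets agents call APIs.",
])
retriever = store.as_retriever(k=1)
print(retriever.invoke("How does task decomposition work?"))
```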

In [35]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

# retrieved documents
retrieved_docs = retriever.invoke("what are the approaches to Task Decomposition?")

len(retrieved_docs)

5

In [36]:
retrieved_docs

[Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}),
 Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decompositi

In [37]:
print(retrieved_docs[0].page_content)

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.


In [38]:
print(retrieved_docs[1].metadata)

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}


In [39]:
def pretty_print_doc(docs):
    for i,doc in enumerate(docs): 
        print(f"Document: {i+1} \n")
        print("Content: \n")
        print(doc.page_content)
        print("\n")
        print("Metadata: \n")
        print(doc.metadata)
        print("\n\n")

In [35]:
pretty_print_doc(retrieved_docs)

Document: 1 

Content: 

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.


Metadata: 

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}



Document: 2 

Content: 

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposit

#### 5. Retrieval and Generation: Generate

Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.

In [40]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

We'll use a prompt for RAG that is checked into the LangChain prompt hub.

In [95]:
!pip install langchainhub -qqq

In [41]:
from langchain import hub 

# pulling RAG prompt from the hub
prompt = hub.pull('rlm/rag-prompt')


print(prompt)


input_variables=['context', 'question'] metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]


In [42]:
print(prompt.messages[0].prompt.template)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:


In [43]:
example_message = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()

print(example_message)

[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]


In [44]:
print(example_message[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:


We'll use the LangChain Expression Language (LCEL) `Runnable` protocol to define the chain, allowing us to:

* Pipe together components and functions in a transparent way. 
* Automatically trace our chain in LangSmith. 
* Get streaming, async, and batched calling out of the box.

In [45]:
# formatting the retrieved documents 
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

sample_context = format_docs(all_splits[30:40])

sample_context

'API-Bank (Li et al. 2023) is a benchmark for evaluating the performance of tool-augmented LLMs. It contains 53 commonly used API tools, a complete tool-augmented LLM workflow, and 264 annotated dialogues that involve 568 API calls. The selection of APIs is quite diverse, including search engines, calculator, calendar queries, smart home control, schedule management, health data management, account authentication workflow and more. Because there are a large number of APIs, LLM first has access to API search engine to find the right API to call and then uses the corresponding documentation to make a call.\n\nFig. 12. Pseudo code of how LLM makes an API call in API-Bank. (Image source: Li et al. 2023)\nIn the API-Bank workflow, LLMs need to make a couple of decisions and at each step we can evaluate how accurate that decision is. Decisions include:\n\nWhether an API call is needed.\nIdentify the right API to call: if not good enough, LLMs need to iteratively modify the API inputs (e.g. d

In [102]:
print(sample_context)

API-Bank (Li et al. 2023) is a benchmark for evaluating the performance of tool-augmented LLMs. It contains 53 commonly used API tools, a complete tool-augmented LLM workflow, and 264 annotated dialogues that involve 568 API calls. The selection of APIs is quite diverse, including search engines, calculator, calendar queries, smart home control, schedule management, health data management, account authentication workflow and more. Because there are a large number of APIs, LLM first has access to API search engine to find the right API to call and then uses the corresponding documentation to make a call.

Fig. 12. Pseudo code of how LLM makes an API call in API-Bank. (Image source: Li et al. 2023)
In the API-Bank workflow, LLMs need to make a couple of decisions and at each step we can evaluate how accurate that decision is. Decisions include:

Whether an API call is needed.
Identify the right API to call: if not good enough, LLMs need to iteratively modify the API inputs (e.g. deciding

In [46]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

output_parser = StrOutputParser()

# Rag chain 
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | output_parser
)

In [47]:
# normal response
answer = rag_chain.invoke("what is Task Decomposition?")

print(answer)

Task Decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable. This process can be done through techniques like Chain of Thought and Tree of Thoughts to guide an agent in planning and executing tasks effectively. Task decomposition can be facilitated by using simple prompts, task-specific instructions, or human inputs to ensure successful completion of the overall task.


In [48]:
# streamed response
for chunk in rag_chain.stream("what is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task Decomposition is a technique where complex tasks are broken down into smaller, more manageable steps. Models like Chain of Thought and Tree of Thoughts help in decomposing hard tasks into simpler steps, enhancing performance. Task decomposition can be done using LLM with simple prompting, task-specific instructions, or human inputs.

In [49]:
# same chain without the output parser: the result is an AIMessage, not a plain string
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Rag chain 
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()} 
    | prompt 
    | llm
)

In [50]:
response = rag_chain.invoke("what is Task Decomposition?")

In [51]:
response.content

'Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps. This approach helps agents plan and execute tasks more efficiently by transforming big tasks into manageable components. Task decomposition can be done through prompting techniques, task-specific instructions, or human inputs.'

In [52]:
response.response_metadata

{'token_usage': {'completion_tokens': 53,
  'prompt_tokens': 778,
  'total_tokens': 831},
 'model_name': 'gpt-3.5-turbo',
 'system_fingerprint': None,
 'finish_reason': 'stop',
 'logprobs': None}

Let's dissect the LCEL chain to understand what's going on.

First: each of these components (`retriever`, `prompt`, `llm`, etc.) is an instance of `Runnable`. This means they implement the same methods, such as sync and async `.invoke`, `.stream`, and `.batch`, which makes them easy to connect together.

* They can be connected into a `RunnableSequence` (itself a Runnable) via the `|` operator.

* LangChain automatically casts certain objects to runnables when they meet the `|` operator. Here, `format_docs` is cast to a `RunnableLambda`, and the dict with `"context"` and `"question"` is cast to a `RunnableParallel`. The details matter less than the bigger point: each object is a `Runnable`.

Let's trace how the input question flows through the above runnables. 

As we've seen above, the input to `prompt` is expected to be a dict with keys `"context"` and `"question"`. So the first element of this chain builds runnables that compute both of these from the input question:

* `retriever | format_docs` passes the question through the retriever, generating the `Document` objects, and then to `format_docs` to generate strings;

* `RunnablePassthrough()` passes the input question through unchanged.
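To see why a plain dict and a plain function can sit in a `|` chain, here is a toy re-creation of the pattern. `Runnable`, `coerce` and the stub components are hypothetical names for illustration; LangChain's real classes also handle streaming, async, configuration and tracing. This sketch mirrors only the composition and casting described above: a dict becomes a parallel step that feeds the same input to every branch, and a function becomes a single step. Because `dict | runnable` is not a dict union, Python falls back to the right operand's `__ror__`, which is where the dict gets coerced.

```python
from typing import Any, Callable, List


class Runnable:
    """Hypothetical toy Runnable: wraps a one-argument function and composes with `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        return [self.invoke(v) for v in values]

    def __or__(self, other: Any) -> "Runnable":
        # self | other: run self, then feed its output to other
        nxt = coerce(other)
        return Runnable(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Runnable":
        # other | self, where `other` is a plain dict or function on the left
        prev = coerce(other)
        return Runnable(lambda value: self.invoke(prev.invoke(value)))


def coerce(obj: Any) -> Runnable:
    """Cast dicts and functions to Runnables, mirroring the casting described above."""
    if isinstance(obj, Runnable):
        return obj
    if isinstance(obj, dict):
        # RunnableParallel-like: every branch receives the SAME input
        branches = {key: coerce(value) for key, value in obj.items()}
        return Runnable(lambda value: {key: r.invoke(value) for key, r in branches.items()})
    if callable(obj):
        return Runnable(obj)  # RunnableLambda-like
    raise TypeError(f"cannot turn {obj!r} into a Runnable")


# Stubs standing in for the retriever, format_docs, RunnablePassthrough and the prompt.
def format_docs(docs: List[str]) -> str:
    return "\n\n".join(docs)

fake_retriever = Runnable(lambda q: [f"doc about {q}"])
passthrough = Runnable(lambda x: x)
fake_prompt = Runnable(lambda d: f"Question: {d['question']}\nContext: {d['context']}")

chain = {"context": fake_retriever | format_docs, "question": passthrough} | fake_prompt
print(chain.invoke("task decomposition"))
```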

That is, if you constructed:

In [53]:
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
)

Then `chain.invoke(question)` would build a formatted prompt, ready for inference. (Note: when developing with LCEL, it can be practical to test with sub-chains like this.)

The last steps of the chain are `llm`, which runs the inference, and `StrOutputParser()`, which just plucks the string content out of the LLM's output message.

* You can inspect the individual steps of this chain in its LangSmith trace (this requires tracing to be enabled, i.e. `monitoring = True` above).

In [54]:
constructed_prompt = chain.invoke("what is Task Decomposition?")

In [31]:
print(constructed_prompt.messages[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: what is Task Decomposition? 
Context: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step.

### Built-in chains

If preferred, LangChain includes convenience functions that implement the above LCEL. We compose two functions:

* `create_stuff_documents_chain`: specifies how retrieved context is fed into a prompt and LLM. In this case, we will "stuff" the contents into the prompt, i.e., we will include all retrieved context without any summarization or other processing. It largely implements our `rag_chain` above, with input keys `context` and `input`, and generates an answer from the retrieved context and the query.

* `create_retrieval_chain`: adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It has input key `input`, and includes `input`, `context`, and `answer` in its output.
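Here is a plain-Python sketch of the data flow these two helpers combine, using stub retrieve and answer functions in place of the retriever and the LLM. `stuff_documents` and `make_retrieval_chain` are hypothetical names, not LangChain's implementation; the point is the shape of the result, which keeps `input` and adds `context` and `answer`, the same keys that appear in the response printed in the next cell.

```python
from typing import Callable, Dict, List


def stuff_documents(docs: List[str]) -> str:
    """'Stuff' strategy: paste every retrieved chunk into one context string, unsummarized."""
    return "\n\n".join(docs)


def make_retrieval_chain(
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, str], str],
) -> Callable[[Dict[str, str]], Dict[str, object]]:
    """Hypothetical sketch of the data flow: {'input'} -> adds 'context' -> adds 'answer'."""

    def invoke(inputs: Dict[str, str]) -> Dict[str, object]:
        question = inputs["input"]
        docs = retrieve(question)                    # retrieval step
        result: Dict[str, object] = dict(inputs)     # keep the original input
        result["context"] = docs                     # propagate the retrieved docs
        result["answer"] = answer(stuff_documents(docs), question)  # stuff + generate
        return result

    return invoke


# Stubs in place of the real retriever and LLM call.
chain = make_retrieval_chain(
    retrieve=lambda q: ["chunk A", "chunk B"],
    answer=lambda context, q: f"stub answer using {len(context)} characters of context",
)
out = chain({"input": "what is Task Decomposition?"})
print(out)
```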

In [55]:
from langchain.chains.retrieval import create_retrieval_chain 
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# system prompt 
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)


prompt = ChatPromptTemplate.from_messages(
    messages=[
        ("system", system_prompt),
        ("human", "{input}")
    ]
)


# stuff documents chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)

# rag chain 
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# response 
response = rag_chain.invoke({"input":"what is Task Decomposition?"})

print(response)

print('Answer: ',response['answer'],"\n\n")

{'input': 'what is Task Decomposition?', 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}), Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple

In [56]:
for doc in response['context']:
    print(doc)
    print()

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be 

In [57]:
from langchain.chains import create_retrieval_chain 
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

template = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. 
Context: {context} 
Question: {input}
Answer:
"""

# prompt template
prompt = ChatPromptTemplate.from_template(template)

question_answer_chain = create_stuff_documents_chain(llm, prompt)

rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# response 
response = rag_chain.invoke({"input":"what is Task Decomposition?"})

print(response)

print('Answer: ',response['answer'],"\n\n")

{'input': 'what is Task Decomposition?', 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}), Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple

In [35]:
from langchain.chains import create_retrieval_chain 
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt = hub.pull('rlm/rag-prompt')

print(prompt)

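# NOTE: this only renames the entry in the outer `input_variables` list. As the second
# print shows, the inner PromptTemplate still expects `question`, so this does not
# actually adapt the hub prompt to the `input` key used by create_retrieval_chain.
prompt.input_variables[1] = "input"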

print(prompt)

# question_answer_chain = create_stuff_documents_chain(llm, prompt)

# rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# # response 
# response = rag_chain.invoke({"input":"what is Task Decomposition?"})

# print(response)

# print('Answer: ',response['answer'],"\n\n")

input_variables=['context', 'question'] metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]
input_variables=['context', 'input'] metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answ

In [58]:
question = "what is Task Decomposition?"

result = rag_chain.invoke({"input": question})

In [59]:
result

{'input': 'what is Task Decomposition?',
 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}),
  Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multi

In [60]:
def extract_source(result):
    # collect the `source` metadata of every retrieved Document in the chain output
    return [doc.metadata['source'] for doc in result['context']]

Extracted sources

In [61]:
extract_source(result)

['https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/']