In [1]:
# Adapted from the LangChain question-answering (RAG) tutorial:
# https://python.langchain.com/docs/use_cases/question_answering/

In [2]:
import os

# Dummy key; requests go to the local OpenAI-compatible endpoint set below.
os.environ['OPENAI_API_KEY']='sk-111111111111111111111111111111111111111111111111'
os.environ['OPENAI_API_BASE']='http://127.0.0.1:5000/v1'

In [3]:
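import requests

# Ask the local server which model is currently loaded.
model_info_url = 'http://127.0.0.1:5000/v1/internal/model/info'
resp = requests.get(model_info_url)
resp.raise_for_status()  # fail early on an HTTP error response
model = resp.json()['model_name']

print(model)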

Mistral-7B-Instruct-v0.2-8.0bpw-h8-exl2-2


In [4]:
import bs4
import pickle

from langchain.vectorstores import Chroma
from langchain.document_loaders import WebBaseLoader

In [5]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

In [6]:
# docs = loader.load()  # skipped: a cached copy is loaded from disk in the next cell

In [7]:
with open('data/test-docs.pkl','rb') as fh:
    docs = pickle.load(fh)

In [8]:
len(docs[0].page_content)

42824

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [10]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

In [11]:
len(all_splits)

66

In [12]:
len(all_splits[0].page_content)

969

In [13]:
all_splits[10].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'start_index': 7056}
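
The `start_index` value in the metadata above is the character offset of the chunk within the source document. A minimal sketch of how overlapping chunks and their offsets relate, assuming a plain fixed-size window: `RecursiveCharacterTextSplitter` instead splits on separators such as paragraph and line breaks, so its real offsets will differ, and `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(
            {"start_index": start, "page_content": text[start:start + chunk_size]}
        )
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500  # toy document, not the blog post used in this notebook
chunks = split_with_overlap(sample)
print([(c["start_index"], len(c["page_content"])) for c in chunks])
```

With these settings each chunk begins 800 characters after the previous one, which is why consecutive chunks repeat part of their neighbour's text.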

In [14]:
from langchain.embeddings import HuggingFaceBgeEmbeddings

In [15]:
vectorstore = Chroma.from_documents(
    documents=all_splits,
    embedding=HuggingFaceBgeEmbeddings(
        model_name="BAAI/bge-small-en-v1.5",
        model_kwargs={"device": "cuda:1"},
    ),
)

In [16]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [17]:
retrieved_docs = retriever.get_relevant_documents(
    "What are the approaches to Task Decomposition?"
)

In [18]:
len(retrieved_docs)

6

In [19]:
print(retrieved_docs[0].page_content)

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
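
The retriever above returns the `k` chunks whose embeddings are closest to the query embedding. A rough sketch of that ranking step on toy two-dimensional vectors, assuming cosine similarity as the score: the metric Chroma actually applies depends on how the collection is configured, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=6):
    """Return the indices of the k vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings; real bge-small vectors have many more dimensions.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], toy_docs, k=2))
```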


In [20]:
# Older import path: from langchain.chat_models import ChatOpenAI
from langchain_openai import ChatOpenAI

# Picks up OPENAI_API_KEY and OPENAI_API_BASE from the environment set earlier.
llm = ChatOpenAI(max_tokens=500)

In [21]:
# External call: pulls the prompt from LangChain Hub, so this cell needs network access.
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

In [22]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [23]:
print(
    prompt.invoke(
        {"context": "filler context", "question": "filler question"}
    ).to_string()
)

Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:


In [24]:
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

In [25]:
def format_docs(docs):
    """Join the retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)
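
To see what ends up in the `{context}` slot of the prompt, here is a small sketch that runs the same joining logic on stand-in documents and fills a shortened template with `str.format`. `StubDoc` and `SHORT_TEMPLATE` are hypothetical; the real chain uses LangChain `Document` objects and the full hub prompt.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Stand-in for a LangChain Document; only page_content is needed here."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Shortened version of the RAG prompt, for illustration only.
SHORT_TEMPLATE = "Question: {question} \nContext: {context} \nAnswer:"

stub_docs = [StubDoc("First chunk."), StubDoc("Second chunk.")]
filled = SHORT_TEMPLATE.format(
    question="What is X?", context=format_docs(stub_docs)
)
print(filled)
```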

In [26]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [27]:
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task decomposition is the process of breaking down a complex task into smaller, manageable sub-tasks. This can be done using language model prompts, task-specific instructions, or human inputs. For example, in the context of a Super Mario game in Python, tasks might include setting up the game environment, defining game functions, and implementing user input for keyboard control. Task decomposition allows agents to plan ahead and better understand their thought process. LLMs, such as CoT, utilize this technique to enhance model performance on complex tasks.

In [28]:
from langchain.prompts import PromptTemplate

In [29]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

In [30]:
rag_prompt_custom = PromptTemplate.from_template(template)

In [31]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt_custom
    | llm
    | StrOutputParser()
)

In [32]:
rag_chain.invoke("What is Task Decomposition?")

"Task Decomposition is the process of breaking down complex tasks into smaller, manageable steps. It is a common technique used in AI systems to enhance performance and provide insight into the model's thought process. This can be achieved through LLM with simple prompts, task-specific instructions, or human inputs. (thanks for asking!)"

In [33]:
from operator import itemgetter

In [34]:
from langchain.schema.runnable import RunnableParallel

In [35]:
rag_chain_from_docs = (
    {
        "context": lambda input: format_docs(input["documents"]),
        "question": itemgetter("question"),
    }
    | rag_prompt_custom
    | llm
    | StrOutputParser()
)

In [36]:
rag_chain_with_source = RunnableParallel(
    {"documents": retriever, "question": RunnablePassthrough()}
) | {
    "documents": lambda input: [doc.metadata for doc in input["documents"]],
    "answer": rag_chain_from_docs,
}
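
The chain above retrieves once and then fans out: one branch keeps the metadata of the retrieved documents, the other generates the answer from the same documents. A plain-Python sketch of that data flow, with hypothetical stand-ins (`fake_retrieve`, `fake_generate`) in place of the retriever and the prompt, model and parser:

```python
def answer_with_sources(question, retrieve, generate):
    """Retrieve once, then return both the source metadata and the answer."""
    documents = retrieve(question)
    return {
        "documents": [doc["metadata"] for doc in documents],
        "answer": generate({"documents": documents, "question": question}),
    }


# Hypothetical stand-ins; the notebook uses the Chroma retriever and the LLM chain.
def fake_retrieve(question):
    return [
        {"page_content": "chunk A", "metadata": {"start_index": 0}},
        {"page_content": "chunk B", "metadata": {"start_index": 800}},
    ]


def fake_generate(inputs):
    return f"answered {inputs['question']!r} from {len(inputs['documents'])} chunks"


result = answer_with_sources(
    "What is Task Decomposition?", fake_retrieve, fake_generate
)
print(result)
```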

In [37]:
rag_chain_with_source.invoke("What is Task Decomposition")

{'documents': [{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 2192},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 1585},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 17804},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 39221},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 30952},
  {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
   'start_index': 4317}],
 'answer': 'Task decomposition is the process of breaking down a complex task into smaller, manageable steps. It is a common technique used in artificial intelligence and machine learning models to enhance performance and provide insight into the model\'s thinking process. CoT (Chain of Thought) is a popular prompting technique for task decomposition, instructing the model to "think step by step" to decompose big ta

In [38]:
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

In [39]:
condense_q_system_prompt = """Given a chat history and the latest user question \
which might reference the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

In [40]:
condense_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", condense_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)

In [41]:
condense_q_chain = condense_q_prompt | llm | StrOutputParser()

In [42]:
from langchain.schema.messages import AIMessage, HumanMessage

In [43]:
condense_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "What is meant by large",
    }
)

'In the context of language models, "large" refers to models that have been trained on a significant amount of data and have a large number of parameters. These models are able to generate more accurate and contextually relevant responses compared to smaller models. They are also able to handle a wider range of topics and have a deeper understanding of language.'

In [44]:
condense_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "How do transformers work",
    }
)

'Transformer models are a type of neural network architecture introduced in a paper called "Attention is All You Need" by Vaswani et al. in 2017. The transformer model is designed for sequence-to-sequence tasks, such as machine translation, but it has also been applied to other tasks like text summarization and language modeling.\n\nThe key innovation of the transformer model is the self-attention mechanism, which allows the model to focus on different parts of the input sequence when producing each output token. This is in contrast to traditional recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, which process the input sequence one token at a time and rely on explicit state representations to keep track of context.\n\nThe self-attention mechanism works by computing a weighted sum of the input embeddings for each output token, where the weights are determined by the similarity of the input embeddings to each other. This similarity is calculated using a dot pr

In [45]:
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""

In [46]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)

In [47]:
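def condense_question(input: dict):
    # With prior turns, return the condensing chain: when a function used inside a
    # chain returns a runnable, LangChain invokes that runnable with the same input.
    if input.get("chat_history"):
        return condense_q_chain
    # First turn: nothing to condense, so search with the question as asked.
    else:
        return input["question"]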
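
The routing decision can be sketched without LangChain: rewrite the question only when there is history to resolve references against. Here the condensing step is called directly, whereas the notebook function returns the chain and lets LangChain invoke it. `choose_search_query` and `fake_condense` are hypothetical names used for illustration.

```python
def choose_search_query(inputs, condense):
    """Return the text that should be sent to the retriever."""
    if inputs.get("chat_history"):
        return condense(inputs)  # follow-up: rewrite into a standalone question
    return inputs["question"]    # first turn: use the question unchanged


# Hypothetical stand-in for the LLM-based condensing chain.
def fake_condense(inputs):
    last_turn = inputs["chat_history"][-1]
    return f"{inputs['question']} (in the context of: {last_turn})"


first = choose_search_query(
    {"question": "What is Task Decomposition?", "chat_history": []},
    fake_condense,
)
follow_up = choose_search_query(
    {
        "question": "What are common ways of doing it?",
        "chat_history": ["What is Task Decomposition?"],
    },
    fake_condense,
)
print(first)
print(follow_up)
```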

In [48]:
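# No StrOutputParser here: the chain returns an AIMessage that can be
# appended to chat_history as-is.
rag_chain = (
    RunnablePassthrough.assign(context=condense_question | retriever | format_docs)
    | qa_prompt
    | llm
)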

In [49]:
chat_history = []

In [50]:
question = "What is Task Decomposition?"
ai_msg = rag_chain.invoke({"question": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question), ai_msg])

In [51]:
second_question = "What are common ways of doing it?"
rag_chain.invoke({"question": second_question, "chat_history": chat_history})

AIMessage(content='There are several common ways to perform task decomposition, depending on the specific context and the nature of the project or problem being addressed. Here are some common approaches:\n\n1. Top-down decomposition: In this approach, the overall goal or objective is defined first, and then the project is broken down into progressively smaller and more detailed sub-tasks or components. This method is commonly used in software development projects, where the system architecture is defined first, and then the various components and modules are designed and developed.\n2. Bottom-up decomposition: In this approach, the focus is on the individual components or tasks, and the project is built up from the ground level. This method is commonly used in data analysis or data processing projects, where large datasets are broken down into smaller pieces and processed in parallel.\n3. Divide and Conquer: In this approach, the project is divided into smaller sub-problems, each of w

In [52]:
chat_history

[HumanMessage(content='What is Task Decomposition?'),
 AIMessage(content='Task decomposition is a technique used in project management, computer science, and other fields to break down a complex project or task into smaller, manageable sub-tasks or components. This process helps simplify the overall goal into more manageable pieces, making it easier to plan, execute, and manage the work involved.\n\nIn software engineering, task decomposition is often used in designing algorithms or developing software. A complex problem is broken down into smaller, more manageable functions or procedures. Each function or procedure then performs a specific task, contributing to the overall solution.\n\nOne common method for task decomposition is the use of recursion, where a problem is solved by reducing it to a smaller instance of itself. For example, in a tree-traversal algorithm, a large data structure is broken down into smaller sub-trees, which are then processed recursively.\n\nAnother approach 