<a href="https://colab.research.google.com/github/zc277584121/bootcamp/blob/advanced_rag/bootcamp/RAG/advanced_rag/query_routing_with_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Query routing

## Google Colab preparation [optional]
This step is only needed if you want to run this notebook on Google Colab.

In [None]:
! git clone -b advanced_rag --single-branch https://github.com/zc277584121/bootcamp.git

In [None]:
import shutil
src_dir = "./bootcamp/bootcamp/RAG/advanced_rag/rag_utils"
dst_dir = "./rag_utils"
shutil.copytree(src_dir, dst_dir)
src_dir = "./bootcamp/bootcamp/RAG/advanced_rag/imgs"
dst_dir = "./imgs"
shutil.copytree(src_dir, dst_dir)

In [None]:
! pip install git+https://github.com/zc277584121/langchain.git@zc_milvus#subdirectory=libs/partners/milvus&egg=langchain_milvus

In [None]:
! pip install --upgrade langchain langchain-community langchain-openai bs4

Please set your [OPENAI_API_KEY](https://openai.com/index/openai-api/) in your environment variables.
![](imgs/colab_api_key1.png)

### If you are running this notebook on Google Colab, restart the session by pressing `Cmd/Ctrl + M`, then `.`, so that the newly installed packages take effect.

In [None]:
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

----
## Get started
![](imgs/query_routing.png)

## Prepare the data

We use the LangChain `WebBaseLoader` to load a document from a [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/) and split it into chunks with the `RecursiveCharacterTextSplitter`.

In [1]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from rag_utils.vanilla import vectorstore

# Create a WebBaseLoader instance to load documents from web sources
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
# Load documents from web sources using the loader
documents = loader.load()
# Initialize a RecursiveCharacterTextSplitter for splitting text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# Split the documents into chunks using the text_splitter
docs = text_splitter.split_documents(documents)
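
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of plain fixed-window splitting using only the standard library. It is a simplification and not the `RecursiveCharacterTextSplitter` algorithm, which additionally tries to break on separators such as paragraphs and sentences. The helper name `split_fixed` and the sample text are assumptions made for illustration.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Simplified illustration only: the real splitter also prefers
    natural boundaries (paragraphs, lines, words).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 250  # hypothetical 2500-character document
chunks = split_fixed(sample, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

With `chunk_overlap=0`, as in the cell above, the chunks do not share any text, so concatenating them reproduces the input exactly.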

## Build the chain

We load the chunks into a Milvus vector store and build a Milvus retriever from it.

In [2]:
vectorstore.add_documents(docs)
retriever = vectorstore.as_retriever()

Build a router chain and invoke it once as a check. It returns a string that classifies the query as decomposable or not, followed by a short reason.

In [3]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from rag_utils.vanilla import llm
from rag_utils.route import ROUTER_PROMPT

router_chain = (
    {"question": RunnablePassthrough()}
    | PromptTemplate.from_template(ROUTER_PROMPT)
    | llm
    | StrOutputParser()
)

router_chain.invoke("How can I use Milvus and what is the zilliz")

'Decomposable\nReason: The question can be decomposed into two sub-questions: "How can I use Milvus?" and "What is Zilliz?".'
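
The router output above puts the category on the first line and the reason on the second. A minimal sketch of how such a string could be reduced to a category label is shown below. The function `parse_category_sketch` is hypothetical, and the actual `parse_router_output` helper in `rag_utils.route` may work differently. The two labels, `Decomposable` and `Independent`, are the ones this notebook uses.

```python
KNOWN_CATEGORIES = ("Decomposable", "Independent")


def parse_category_sketch(raw: str, default: str = "Independent") -> str:
    """Return the category named on the first line of the router output.

    Hypothetical helper: falls back to `default` when the first line
    does not start with a known category.
    """
    lines = raw.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    for category in KNOWN_CATEGORIES:
        if first_line.lower().startswith(category.lower()):
            return category
    return default


raw = (
    "Decomposable\n"
    "Reason: The question can be decomposed into two sub-questions."
)
print(parse_category_sketch(raw))  # Decomposable
```

Falling back to the non-decomposed path on unexpected output is one possible design choice: it keeps the pipeline working even if the LLM ignores the requested format.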

Define the vanilla RAG chain.

In [4]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from rag_utils.vanilla import format_docs, rag_prompt, llm

vanilla_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
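
The dict at the head of the chain sends the same input string to both branches: one retrieves and formats the context, the other passes the question through unchanged. The sketch below mimics that step in plain Python. `Doc`, `format_docs_sketch`, and `build_prompt_inputs` are hypothetical stand-ins, and the real `format_docs` in `rag_utils.vanilla` may join documents differently.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs_sketch(docs: list[Doc]) -> str:
    # Assumption: retrieved chunks are joined with blank lines.
    return "\n\n".join(d.page_content for d in docs)


def build_prompt_inputs(question: str, retrieve) -> dict:
    """Mimic the {"context": ..., "question": ...} step of the chain."""
    return {
        "context": format_docs_sketch(retrieve(question)),
        "question": question,  # passed through unchanged
    }


inputs = build_prompt_inputs(
    "What is short-term memory?",
    lambda q: [Doc("chunk one"), Doc("chunk two")],  # toy retriever
)
print(inputs["context"])
```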

Define the sub query chain.

In [5]:
from rag_utils.sub_query import SubQueryRetriever
from langchain_core.runnables import RunnablePassthrough

sub_query_retriever = SubQueryRetriever.from_vectorstore(vectorstore)

sub_query_chain = (
    {"context": sub_query_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
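
The idea behind sub-query retrieval can be sketched without any LLM or vector store: retrieve separately for each sub-question, then merge the results while dropping duplicates. Everything below is an assumption made for illustration (the toy corpus, the word-overlap scoring, and the function names); the actual `SubQueryRetriever` in `rag_utils.sub_query` generates the sub-queries with an LLM and searches Milvus.

```python
import re

# Toy passages standing in for the indexed chunks.
CORPUS = [
    "Sensory memory retains impressions of sensory information for a few seconds.",
    "Short-term memory holds about 7 items for 20-30 seconds.",
    "ANN algorithms return approximately the top k nearest neighbors.",
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query: str, k: int = 1) -> list[str]:
    """Rank passages by word overlap with the query (stand-in for vector search)."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]


def retrieve_for_sub_queries(sub_queries: list[str], k: int = 1) -> list[str]:
    """Retrieve per sub-query and merge, keeping first-seen order without duplicates."""
    seen, merged = set(), []
    for sub_query in sub_queries:
        for doc in toy_retrieve(sub_query, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


sub_queries = [
    "What are the different types of memory?",
    "What are the different types of ANN algorithms?",
]
merged = retrieve_for_sub_queries(sub_queries, k=1)
print(merged)
```

In this toy setup each sub-query pulls in the passage for its own topic, so the merged context covers both memory and ANN algorithms even with `k=1`.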

Define a route function.

In [6]:
from langchain_core.runnables import RunnableLambda
from rag_utils.route import parse_router_output


def route(info):
    if parse_router_output(info["category"]) == "Decomposable":
        print("invoke sub_query_chain...")
        return RunnableLambda(lambda x: x["question"]) | sub_query_chain
    else:  # Independent
        print("invoke vanilla_rag_chain...")
        return RunnableLambda(lambda x: x["question"]) | vanilla_rag_chain

Let's define the full chain.

In [7]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

full_chain = {
    "category": router_chain,
    "question": RunnablePassthrough(),
} | RunnableLambda(route)
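
The control flow of the full chain can be sketched in plain Python: first compute the category and keep the question, then let a routing function pick which handler receives the question. In LangChain, when the function wrapped by `RunnableLambda` returns a runnable, that runnable is invoked with the same input, which is why `route` prepends a step that extracts `x["question"]`. All names below (`fake_router`, the two handlers, `route_sketch`, `full_chain_sketch`) are hypothetical stand-ins, and the `" and "` heuristic replaces the LLM-based router purely for illustration.

```python
from typing import Callable


def fake_router(question: str) -> str:
    """Hypothetical stand-in for router_chain (the real one asks an LLM)."""
    return "Decomposable" if " and " in question else "Independent"


def sub_query_handler(question: str) -> str:
    return f"[sub_query] {question}"


def vanilla_handler(question: str) -> str:
    return f"[vanilla] {question}"


def route_sketch(info: dict) -> Callable[[str], str]:
    """Pick a handler from the category, mirroring route()."""
    if info["category"] == "Decomposable":
        return sub_query_handler
    return vanilla_handler  # Independent


def full_chain_sketch(question: str) -> str:
    # Step 1: build the dict, like {"category": ..., "question": ...}.
    info = {"category": fake_router(question), "question": question}
    # Step 2: route, then call the chosen handler with the question only.
    return route_sketch(info)(info["question"])


print(full_chain_sketch("Which are the different types of memory?"))
```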

## Test the chain


In [8]:
query1 = "Which are the different types of memory and different types ANN algorithms?"

print("\n\n", full_chain.invoke(query1))

invoke sub_query_chain...
sub_queries: ['What are the different types of memory?', 'What are the different types of ANN algorithms?']


 The different types of memory include:

1. Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information after the original stimuli have ended. It typically lasts for up to a few seconds. Subcategories include iconic memory (visual), echoic memory (auditory), and haptic memory (touch).

2. Short-Term Memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning. Short-term memory is believed to have the capacity of about 7 items and lasts for 20-30 seconds.

3. Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:
   - Explicit / declarative me

In [9]:
print(vanilla_rag_chain.invoke(query1))

The different types of memory include:

1. Sensory Memory: This is the earliest stage of memory, retaining impressions of sensory information for up to a few seconds. It includes iconic (visual), echoic (auditory), and haptic (touch) memory.

2. Short-Term Memory (STM) or Working Memory: This stores information that we are currently aware of and needed for complex cognitive tasks. It has a capacity of about 7 items and lasts for 20-30 seconds.

3. Long-Term Memory (LTM): This can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. It includes explicit/declarative memory (facts and events that can be consciously recalled) and implicit/procedural memory (skills and routines performed automatically).

As for ANN algorithms for fast Maximum Inner Product Search (MIPS), the common choice is the approximate nearest neighbors (ANN) algorithm. This algorithm returns approximately the top k nearest neighbors, trading 

In [10]:
query2 = "Which are the different types of memory?"

print("\n\n", full_chain.invoke(query2))

invoke vanilla_rag_chain...


 The different types of memory include:

1. Sensory Memory: This is the earliest stage of memory, retaining impressions of sensory information for up to a few seconds. It includes iconic (visual), echoic (auditory), and haptic (touch) memory.

2. Short-Term Memory (STM) or Working Memory: This type of memory stores information that we are currently aware of and is needed for complex cognitive tasks. It has a capacity of about 7 items and lasts for 20-30 seconds.

3. Long-Term Memory (LTM): This type of memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. It has two subtypes:
   - Explicit / declarative memory: This is memory of facts and events that can be consciously recalled. It includes episodic memory (events and experiences) and semantic memory (facts and concepts).
   - Implicit / procedural memory: This type of memory is unconscious and involves skills and routin

In [11]:
print(vanilla_rag_chain.invoke(query2))

The different types of memory include:

1. Sensory Memory: This is the earliest stage of memory, retaining impressions of sensory information for up to a few seconds. It includes iconic memory (visual), echoic memory (auditory), and haptic memory (touch).

2. Short-Term Memory (STM) or Working Memory: This type of memory stores information that we are currently aware of and is needed for complex cognitive tasks. It has a capacity of about 7 items and lasts for 20-30 seconds.

3. Long-Term Memory (LTM): This type of memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. It has two subtypes:
   - Explicit / declarative memory: This is memory of facts and events that can be consciously recalled. It includes episodic memory (events and experiences) and semantic memory (facts and concepts).
   - Implicit / procedural memory: This type of memory is unconscious and involves skills and routines that are perfo