# RAG From Scratch: Routing

## Purpose

This notebook explains routing. Routing is the process of directing a query or task to the most appropriate component, resource, or subsystem in a pipeline.

- Logical routing and semantic routing
    - Logical routing: let the LLM choose a datasource based on the question.
    - Semantic routing: embed the question and choose a prompt based on similarity.
    - [Routing to multiple indexes](https://python.langchain.com/docs/use_cases/query_analysis/techniques/routing#routing-to-multiple-indexes)
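
As a minimal, library-free sketch of the idea, routing can be viewed as a function that inspects a query and returns the name of the component that should handle it. The keyword table `ROUTE_KEYWORDS` and the function `route` below are hypothetical; they only stand in for the LLM-based and embedding-based routers built later in this notebook.

```python
# Hypothetical keyword table: each route name maps to words that suggest it.
ROUTE_KEYWORDS = {
    "python_docs": ["python", "pip", "def "],
    "js_docs": ["javascript", "npm", "const "],
}


def route(query: str, default: str = "python_docs") -> str:
    """Return the first route whose keywords appear in the query."""
    lowered = query.lower()
    for name, keywords in ROUTE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return name
    # No keyword matched: fall back to the default route.
    return default


print(route("How do I install a package with npm?"))  # js_docs
```

Keyword matching is brittle, which is one reason to hand the decision to an LLM or to embedding similarity instead.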

## Environment

`(1) Packages`

In [1]:
!pip install langchain_community langchain_openai langchainhub langchain -q
!pip install tiktoken chromadb -q

`(2) LangSmith`

In [2]:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

In [None]:
api_key = os.getenv("LANGCHAIN_API_KEY")
if api_key:
    os.environ["LANGCHAIN_API_KEY"] = api_key
else:
    api_key = input("Enter your API key: ")
    os.environ["LANGCHAIN_API_KEY"] = api_key

`(3) API Keys`

In [None]:
api_key = os.getenv("OPENAI_API_KEY")
if api_key:
    os.environ["OPENAI_API_KEY"] = api_key
else:
    # Only prompt when the key is not already set in the environment
    api_key = input("Enter your API key: ")
    os.environ["OPENAI_API_KEY"] = api_key

## Logical and Semantic Routing

Here we will use function-calling for classification.
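
Function calling constrains the model to return one value from a fixed set of labels. The sketch below illustrates only that constraint, without any LLM call. `parse_route` is a hypothetical helper, and the three labels are the ones used in the data model in this section.

```python
from typing import Literal, get_args

# The closed set of labels the classifier is allowed to return.
Datasource = Literal["python_docs", "js_docs", "golang_docs"]


def parse_route(raw: str) -> str:
    """Normalize a raw label and check it against the allowed datasources."""
    label = raw.strip().lower()
    if label not in get_args(Datasource):
        raise ValueError(f"Unknown datasource: {raw!r}")
    return label


print(parse_route(" Python_Docs "))  # python_docs
```

With structured output, this kind of validation is handled by the schema rather than by hand-written parsing.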

In [5]:
from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI




In [6]:
# Data model
class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""
    datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(
        ...,
        description="Given a user question choose which datasource would be most \
        relevant for answering their question"
    )

# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(RouteQuery)

# Prompt
system = """You are an expert at routing a user question to the appropriate
data source. Based on the programming language the question is referring to,
route it to the relevant data source.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}")
    ]
)

# Define router
router = prompt | structured_llm

**Note:** Function calling is used here to produce structured output.

In [7]:
question = """Why doesn't the following code work:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["human", "speak in {language}"])
prompt.invoke("french")
"""

result = router.invoke(
    {
        "question": question
    }
)
result

RouteQuery(datasource='python_docs')

In [8]:
result.datasource

'python_docs'

Once we have this, it is straightforward to define a branch that uses `result.datasource`.

In [9]:
def choose_route(result):
    if "python_docs" in result.datasource.lower():
        # Logic here
        return "chain for python docs"
    elif "js_docs" in result.datasource.lower():
        # Logic here
        return "chain for js docs"
    elif "golang_docs" in result.datasource.lower():
        # Logic here
        return "chain for golang docs"

from langchain_core.runnables import RunnableLambda

full_chain = (
    router
    | RunnableLambda(choose_route)
)

full_chain.invoke(
    {
        "question": question
    }
)

'chain for python docs'
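
The `if`/`elif` branch can also be written as a dictionary lookup, which makes the unmatched case explicit. This is only a sketch: `FakeResult` is a hypothetical stand-in for `RouteQuery`, the returned strings are placeholders for real chains, and the lookup uses exact matching instead of the substring test above.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for the RouteQuery object returned by the router."""
    datasource: str


# Placeholder strings; in practice each value would be a chain or retriever.
ROUTES = {
    "python_docs": "chain for python docs",
    "js_docs": "chain for js docs",
    "golang_docs": "chain for golang docs",
}


def choose_route_by_lookup(result: FakeResult) -> str:
    """Look up the target for a datasource and fail loudly on unknown labels."""
    key = result.datasource.lower()
    if key not in ROUTES:
        raise KeyError(f"No route defined for datasource {key!r}")
    return ROUTES[key]


print(choose_route_by_lookup(FakeResult("python_docs")))  # chain for python docs
```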

### Semantic Routing

In [10]:
from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


In [11]:
## Two prompts
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to
understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}
"""

math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}
"""

# Embed prompts
embeddings = OpenAIEmbeddings()
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)

In [12]:
# Route question to prompt
def prompt_router(input):
    # Embed question
    query_embedding = embeddings.embed_query(input["query"])
    # Compute similarity
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
    # Return index of most similar prompt
    most_similar = prompt_templates[similarity.argmax()]
    # Chosen prompt
    print("Using MATH" if most_similar == math_template else "Using PHYSICS")
    return PromptTemplate(template=most_similar)


chain = (
    {
        "query": RunnablePassthrough()
    }
    | RunnableLambda(prompt_router)
    | ChatOpenAI()
    | StrOutputParser()
)

chain.invoke(
    "What is the speed of light?"
)

Using PHYSICS


'The speed of light in a vacuum is approximately 299,792 kilometers per second (or about 186,282 miles per second).'

In [13]:
chain.invoke(
    "What is cosine similarity?"
)

Using MATH


'Cosine similarity is a measure of similarity between two vectors in a multi-dimensional space. It is often used in machine learning and data mining to compare the similarity between two documents or items based on their content or features.\n\nIn mathematical terms, the cosine similarity between two vectors A and B is calculated by taking the dot product of the two vectors and dividing it by the product of their magnitudes. The formula for cosine similarity is:\n\ncosine similarity = (A • B) / (||A|| * ||B||)\n\nWhere A • B is the dot product of vectors A and B, and ||A|| and ||B|| are the magnitudes of vectors A and B, respectively.\n\nThe cosine similarity value ranges from -1 to 1, with 1 indicating that the two vectors are identical, 0 indicating no similarity, and -1 indicating complete dissimilarity. A higher cosine similarity value indicates a higher degree of similarity between the two vectors.'

In [14]:
chain.invoke(
    "What is the difference between cosine similarity and dot product?"
)

Using PHYSICS


'Cosine similarity and dot product are both mathematical operations used in linear algebra, but they serve slightly different purposes.\n\nThe dot product is a scalar value that measures the similarity between two vectors by taking the cosine of the angle between them and multiplying their magnitudes. It essentially quantifies how much two vectors point in the same direction.\n\nCosine similarity, on the other hand, is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. It is a normalized form of the dot product, where the result is scaled to a value between -1 and 1, with 1 indicating perfect similarity and -1 indicating perfect dissimilarity.\n\nIn summary, while the dot product quantifies the directional similarity between two vectors in terms of their magnitudes, cosine similarity measures the angle between them to determine their similarity in a normalized manner.'
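
The semantic router above picks the prompt whose embedding has the highest cosine similarity to the question. The sketch below reproduces that selection step in plain Python with made-up 3-dimensional vectors. The vectors and the names `cosine` and `pick_prompt` are assumptions for illustration; real embeddings come from an embedding model and have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for the two prompts.
prompt_vectors = {
    "physics": [0.9, 0.1, 0.0],
    "math": [0.1, 0.9, 0.0],
}


def pick_prompt(query_vector):
    """Return the name of the prompt most similar to the query vector."""
    return max(
        prompt_vectors,
        key=lambda name: cosine(query_vector, prompt_vectors[name]),
    )


print(pick_prompt([0.2, 0.8, 0.1]))  # math
```

When the scores for two prompts are close, the choice can go either way, which may be why the question comparing cosine similarity and the dot product was routed to the physics prompt.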

## Query structuring for metadata filters

Many vectorstores contain metadata fields. This makes it possible to filter for specific chunks based on metadata. Let's look at some example metadata we might see in a database of YouTube transcripts.
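
To make the idea of metadata filtering concrete, here is a small sketch that does not use a vectorstore at all. The `chunks` list and the `filter_chunks` helper are hypothetical; a real vectorstore applies an equivalent filter alongside the similarity search.

```python
# Hypothetical chunks: each pairs some text with a metadata dictionary.
chunks = [
    {"text": "intro to routing", "metadata": {"author": "LangChain", "length": 1058}},
    {"text": "unrelated vlog", "metadata": {"author": "Someone Else", "length": 240}},
]


def filter_chunks(items, **required):
    """Keep items whose metadata equals every required key/value pair."""
    return [
        item
        for item in items
        if all(item["metadata"].get(key) == value for key, value in required.items())
    ]


matches = filter_chunks(chunks, author="LangChain")
print([item["text"] for item in matches])  # ['intro to routing']
```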

In [15]:
!pip install youtube-transcript-api pytube -q

In [17]:
!pip install --upgrade pytube -q

In [27]:
from langchain_community.document_loaders import YoutubeLoader

docs = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=pbAd8O1Lvm4",
    add_video_info=False,
).load()

docs[0].metadata

{'source': 'pbAd8O1Lvm4'}

In [29]:
docs[0].metadata = {
    'source': 'pbAd8O1Lvm4',
    'title': 'Self-reflective RAG with LangGraph: Self-RAG and CRAG',
    'description': 'Unknown',
    'view_count': 11922,
    'thumbnail_url': 'https://i.ytimg.com/vi/pbAd8O1Lvm4/hq720.jpg',
    'publish_date': '2024-02-07 00:00:00',
    'length': 1058,
    'author': 'LangChain'
}

In [30]:
docs[0].metadata

{'source': 'pbAd8O1Lvm4',
 'title': 'Self-reflective RAG with LangGraph: Self-RAG and CRAG',
 'description': 'Unknown',
 'view_count': 11922,
 'thumbnail_url': 'https://i.ytimg.com/vi/pbAd8O1Lvm4/hq720.jpg',
 'publish_date': '2024-02-07 00:00:00',
 'length': 1058,
 'author': 'LangChain'}

In [31]:
import datetime
from typing import Literal, Optional, Tuple
from langchain_core.pydantic_v1 import BaseModel, Field

class TutorialSearch(BaseModel):
    """Search over a database of tutorial videos about a software library."""

    content_search: str = Field(
        ...,
        description="Similarity search query applied to video transcripts"
    )
    title_search: str = Field(
        ...,
        description=(
            "Alternate version of the content search query to apply to video title. "
            "Should be succinct and only include key words that could be in a video "
            "title."
        )
    )
    min_view_count: Optional[int] = Field(
        None,
        description="Minimum view count filter, inclusive. Only use if explicitly specified"
    )
    max_view_count: Optional[int] = Field(
        None,
        description="Maximum view count filter, exclusive. Only use if explicitly specified"
    )
    earliest_publish_data: Optional[datetime.date] = Field(
        None,
        description="Earliest publish date filter, inclusive. Only use if explicitly specified"
    )
    latest_publish_data: Optional[datetime.date] = Field(
        None,
        description="Latest publish date filter, exclusive. Only use if explicitly specified"
    )
    min_length_sec: Optional[int] = Field(
        None,
        description="Minimum length filter, inclusive. Only use if explicitly specified"
    )
    max_length_sec: Optional[int] = Field(
        None,
        description="Maximum length filter, exclusive. Only use if explicitly specified"
    )

    def pretty_print(self) -> None:
        for field in self.__fields__:
            if getattr(self, field) is not None and getattr(self, field) != getattr(
                self.__fields__[field], "default", None
            ):
                print(f"{field}: {getattr(self, field)}")

In [33]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries.
You have access to a database of tutorial videos about a software library building
LLM-powered applications. \n
Given a question, return a database query optimized to retrieve the most relevant
results.

If there are acronyms or words you are not familiar with, do not try to rephrase
them.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}")
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
structured_llm = llm.with_structured_output(TutorialSearch)
query_analyzer = prompt | structured_llm

In [34]:
query_analyzer.invoke(
    {
        "question": "rag from scratch"
    }
).pretty_print()

content_search: rag from scratch
title_search: rag from scratch


In [35]:
query_analyzer.invoke(
    {"question": "videos on chat langchain published in 2023"}
).pretty_print()

content_search: chat langchain
title_search: 2023
earliest_publish_data: 2023-01-01
latest_publish_data: 2024-01-01


In [37]:
query_analyzer.invoke(
    {
        "question": "videos that are focused on the topic of chat langchain \
        that are published before 2024"
    }
).pretty_print()

content_search: chat langchain
title_search: chat langchain
latest_publish_data: 2024-01-01


In [38]:
query_analyzer.invoke(
    {
        "question": "how to use multi-modal models in an agent, only \
        videos under 5 minutes"
    }
).pretty_print()

content_search: multi-modal models agent
title_search: multi-modal models agent
max_length_sec: 300
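
Once the query has been structured, the bounds still have to be applied to the stored metadata. The sketch below shows one way to do that for the length fields, following the field descriptions (minimum inclusive, maximum exclusive). `LengthFilter` and `matches` are hypothetical names that cover only a subset of `TutorialSearch`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LengthFilter:
    """Hypothetical subset of TutorialSearch: only the length bounds."""
    min_length_sec: Optional[int] = None  # inclusive lower bound
    max_length_sec: Optional[int] = None  # exclusive upper bound


def matches(length: int, query: LengthFilter) -> bool:
    """Return True when the video length satisfies every bound that is set."""
    if query.min_length_sec is not None and length < query.min_length_sec:
        return False
    if query.max_length_sec is not None and length >= query.max_length_sec:
        return False
    return True


under_five_minutes = LengthFilter(max_length_sec=300)
print(matches(240, under_five_minutes))   # True
print(matches(300, under_five_minutes))   # False, the maximum is exclusive
print(matches(1058, under_five_minutes))  # False
```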


## Applying this to various vectorstores

In [39]:
!pip install --upgrade --quiet  lark langchain-chroma


In [40]:
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings


docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "thriller",
            "rating": 9.9,
        },
    ),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

### Creating our Self-querying retriever

In [41]:
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]

document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
)

In [42]:
retriever.invoke("I want to watch a movie rated higher than 8.5")

[Document(id='a813d2c4-b750-4164-aeae-c5e4870b3a81', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year': 1979}, page_content='Three men walk into the Zone, three men walk out of the Zone'),
 Document(id='3d1ef6c9-93b6-491f-a97d-71f4617d0261', metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea')]

In [43]:
retriever.invoke("What's a highly rated (above 8.5) science fiction film?")

[Document(id='3d1ef6c9-93b6-491f-a97d-71f4617d0261', metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),
 Document(id='a813d2c4-b750-4164-aeae-c5e4870b3a81', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year': 1979}, page_content='Three men walk into the Zone, three men walk out of the Zone')]

In [44]:
retriever.invoke(
    "What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)

[Document(id='50ae8bb8-ff56-479e-a6dc-b0fed09045fe', metadata={'genre': 'animated', 'year': 1995}, page_content='Toys come alive and have a blast doing so')]
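
A self-querying retriever turns a natural-language request into a semantic query plus a structured metadata filter. The sketch below imitates only the filter step in plain Python, using three of the movies above. The `(comparator, attribute, value)` tuple format and the `COMPARATORS` table are hypothetical and are not the retriever's internal representation.

```python
import operator

# Hypothetical comparator table; the names are illustrative only.
COMPARATORS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}

movies = [
    {"summary": "Toys come alive and have a blast doing so",
     "year": 1995, "genre": "animated"},
    {"summary": "Three men walk into the Zone, three men walk out of the Zone",
     "year": 1979, "genre": "thriller", "rating": 9.9},
    {"summary": "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
     "year": 1993, "genre": "science fiction", "rating": 7.7},
]


def apply_filter(items, conditions):
    """Keep items that satisfy every (comparator, attribute, value) condition."""
    kept = []
    for item in items:
        satisfied = all(
            attribute in item and COMPARATORS[comparator](item[attribute], value)
            for comparator, attribute, value in conditions
        )
        if satisfied:
            kept.append(item)
    return kept


# "after 1990 but before 2005 ... preferably animated"
conditions = [("gt", "year", 1990), ("lt", "year", 2005), ("eq", "genre", "animated")]
print([movie["summary"] for movie in apply_filter(movies, conditions)])
# ['Toys come alive and have a blast doing so']
```

Items that lack a filtered attribute are dropped in this sketch; a real vectorstore may treat missing metadata differently.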