# Router Query Engine with Mistral AI and LlamaIndex

A `VectorStoreIndex` is designed for questions about specific details: it retrieves only the chunks most similar to the query. A `SummaryIndex` is better suited to summarization queries because it passes every chunk to the LLM. In real-world use, users ask both kinds of questions, so the system must route each query to the appropriate index to return a relevant answer.

In this notebook, we will utilize the `RouterQueryEngine` to direct user queries to the appropriate index based on the query type.
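Conceptually, a router is a selector plus a set of tools. The selector reads each tool's description and picks one. The router then forwards the query to that tool's engine. The sketch below mimics this single-selection flow in plain Python. `Tool`, `keyword_select`, and `route` are hypothetical stand-ins for `QueryEngineTool`, `LLMSingleSelector`, and `RouterQueryEngine`. The real selector asks an LLM to choose instead of matching keywords.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """Hypothetical stand-in for QueryEngineTool: a name, a description, and an engine."""
    name: str
    description: str
    engine: Callable[[str], str]


def keyword_select(query: str, tools: List[Tool]) -> int:
    """Hypothetical selector: pick the summary tool when the query asks for a summary.

    LLMSingleSelector instead asks an LLM to choose based on the tool descriptions.
    """
    wants_summary = any(w in query.lower() for w in ("summary", "summarize", "overview"))
    for index, tool in enumerate(tools):
        if wants_summary == ("summary" in tool.description.lower()):
            return index
    return 0  # fall back to the first tool


def route(query: str, tools: List[Tool], select: Callable[[str, List[Tool]], int]) -> str:
    """Select exactly one tool, then forward the query to its engine."""
    index = select(query, tools)
    return tools[index].engine(query)


tools = [
    Tool("vector_engine", "Specific facts about Uber financials for 2021.",
         lambda q: f"vector answer to: {q}"),
    Tool("summary_engine", "Summary of Uber financials for 2021.",
         lambda q: f"summary answer to: {q}"),
]

print(route("What is the summary of the Uber Financials in 2021?", tools, keyword_select))
print(route("What is the revenue of Uber in 2021?", tools, keyword_select))
```

Because the selection step is passed in as a function, swapping the keyword rule for an LLM call leaves `route` unchanged. `RouterQueryEngine` separates the selector from the tools in the same way.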

### Installation

```python
!pip install llama-index
!pip install llama-index-llms-mistralai
!pip install llama-index-embeddings-mistralai
```

### Set Up the API Key

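```python
import os

# Replace the placeholder with your key; the Mistral LLM and embedding classes read MISTRAL_API_KEY from the environment.
os.environ['MISTRAL_API_KEY'] = 'YOUR MISTRAL API KEY'
```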
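Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. The sketch below reads the key from the environment and falls back to a hidden prompt. It assumes you prefer not to paste the key into the code. `get_mistral_api_key` is a hypothetical helper, not part of the Mistral or LlamaIndex packages.

```python
import getpass
import os
from typing import Callable, Mapping

PLACEHOLDER = "YOUR MISTRAL API KEY"


def get_mistral_api_key(
    environ: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return MISTRAL_API_KEY from the environment; prompt without echo if unset or a placeholder."""
    key = environ.get("MISTRAL_API_KEY", "")
    if not key or key == PLACEHOLDER:
        key = prompt("Mistral API key: ")
    return key
```

In the notebook you would then run `os.environ['MISTRAL_API_KEY'] = get_mistral_api_key()` instead of writing the key into the cell.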

### Set LLM and Embedding Model

```python
import nest_asyncio

# Jupyter/Colab already run an asyncio event loop; allow nested loops inside it.
nest_asyncio.apply()
```
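Jupyter and Colab already run an asyncio event loop. Inside a running loop, `asyncio.run()` raises an error instead of starting a second one. `nest_asyncio.apply()` patches asyncio so that such nested calls work. This is presumably why the notebook applies it: library code may drive async work from synchronous methods. The sketch reproduces the error without `nest_asyncio`. It assumes you run it as a standalone Python script; inside a notebook where `nest_asyncio` has already been applied, it would not raise. `fetch_answer` and `notebook_cell` are hypothetical names.

```python
import asyncio


async def fetch_answer() -> str:
    # Stand-in for an async call such as a batched LLM or embedding request.
    return "answer"


async def notebook_cell() -> str:
    # Simulates synchronous code calling asyncio.run() while a loop is already running,
    # which is the situation inside a Jupyter cell.
    coro = fetch_answer()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


result = asyncio.run(notebook_cell())
print(result)
```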

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.core import Settings

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors.llm_selectors import LLMSingleSelector
```

```python
llm = MistralAI(model='mistral-large')
embed_model = MistralAIEmbedding()

# Make these the defaults for every LlamaIndex component created below.
Settings.llm = llm
Settings.embed_model = embed_model
```

### Download Data

We will use Uber's 2021 Form 10-K filing with the SEC.

```python
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
```

```
--2024-03-31 00:24:17--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘data/10k/uber_2021.pdf’

2024-03-31 00:24:17 (38.5 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]
```
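The `!mkdir` and `!wget` shell escapes only work inside a notebook. The sketch below performs the same step in plain Python, assuming you run the code as a script. `download_if_missing` is a hypothetical helper that also skips the download when the file is already present. The URL is the one used above.

```python
from pathlib import Path
from urllib.request import urlretrieve

UBER_10K_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/10k/uber_2021.pdf"
)


def download_if_missing(url: str, dest: str) -> bool:
    """Download url to dest unless a non-empty file is already there; return True if downloaded."""
    path = Path(dest)
    if path.exists() and path.stat().st_size > 0:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    urlretrieve(url, str(path))
    return True
```

Usage: `download_if_missing(UBER_10K_URL, "data/10k/uber_2021.pdf")`. Re-running the notebook then avoids fetching the 1.8 MB file again.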



### Load Data

```python
# Load the 10-K PDF into LlamaIndex Document objects.
uber_docs = SimpleDirectoryReader(input_files=["./data/10k/uber_2021.pdf"]).load_data()
```

### Index and Query Engine Creation

1. `VectorStoreIndex`: specific-context queries
2. `SummaryIndex`: summarization queries
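The two indexes differ mainly in retrieval. A vector index embeds each chunk and returns only the `similarity_top_k` chunks closest to the query embedding. A summary index returns every chunk so the LLM can read the whole document. In the toy sketch below, word-count vectors and cosine similarity stand in for Mistral embeddings. The chunks and helper names are made up for illustration.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real indexes use a model such as MistralAIEmbedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_retrieve(query: str, chunks: List[str], top_k: int) -> List[str]:
    """Vector-index style: rank chunks by similarity to the query and keep the top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def summary_retrieve(query: str, chunks: List[str]) -> List[str]:
    """Summary-index style: the query is ignored and every chunk is passed on."""
    return list(chunks)


chunks = [
    "revenue grew in 2021",
    "gross bookings increased for delivery",
    "net loss narrowed compared with 2020",
]
print(vector_retrieve("what was revenue in 2021", chunks, top_k=1))
print(len(summary_retrieve("summarize the year", chunks)))
```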

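```python
from llama_index.core import SummaryIndex

# Vector index: embeds each chunk and retrieves the most similar ones for a query.
uber_vector_index = VectorStoreIndex.from_documents(uber_docs)

# Summary index: keeps all chunks so summarization queries can use the whole document.
uber_summary_index = SummaryIndex.from_documents(uber_docs)
```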

```python
# Retrieve the 5 chunks most similar to a specific-context query.
uber_vector_query_engine = uber_vector_index.as_query_engine(similarity_top_k=5)
uber_summary_query_engine = uber_summary_index.as_query_engine()
```
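Because a summary index hands every chunk to the LLM, a long filing may not fit in a single prompt. One option LlamaIndex provides is the `tree_summarize` response mode, selectable with `as_query_engine(response_mode="tree_summarize")`. It summarizes chunks in groups, then summarizes those summaries, until one answer remains. The sketch below shows that bottom-up reduction in plain Python. `summarize` is a hypothetical stand-in for an LLM call that keeps the first word of each input. The group size is an arbitrary assumption.

```python
from typing import Callable, List


def summarize(texts: List[str]) -> str:
    # Hypothetical stand-in for an LLM call: keep the first word of each input.
    return " ".join(t.split()[0] for t in texts if t.split())


def tree_summarize(chunks: List[str], group_size: int,
                   summarize_fn: Callable[[List[str]], str] = summarize) -> str:
    """Summarize chunks in groups, then summarize the summaries, until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        level = [summarize_fn(level[i:i + group_size])
                 for i in range(0, len(level), group_size)]
        if len(level) == 1:
            return level[0]


chunks = ["alpha one", "beta two", "gamma three", "delta four", "epsilon five"]
print(tree_summarize(chunks, group_size=2))
```

With five chunks and groups of two, the levels shrink from 5 to 3, then 2, then 1. Each summarization call therefore sees at most `group_size` inputs.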

### Create Tools

```python
# Each tool wraps a query engine; the router chooses between them using the descriptions.
query_engine_tools = [
    QueryEngineTool(
        query_engine=uber_vector_query_engine,
        metadata=ToolMetadata(
            name="vector_engine",
            description=(
                "Provides information about Uber's financials for the year 2021."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_summary_query_engine,
        metadata=ToolMetadata(
            name="summary_engine",
            description=(
                "Provides a summary of Uber's financials for the year 2021."
            ),
        ),
    ),
]
```
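The LLM selector decides from the tool descriptions, so their wording drives the routing. The verbose output below reports the pick as a 0-based position in this list: query engine 0 is `vector_engine` and query engine 1 is `summary_engine`. The sketch below shows how a single-choice selector could present descriptions as a numbered list and map the reply back to a tool index. The prompt text and the JSON reply format are assumptions for illustration, not `LLMSingleSelector`'s actual prompt. The reply string is made up.

```python
import json
from typing import List, Tuple


def build_choice_prompt(query: str, descriptions: List[str]) -> str:
    """Number the tool descriptions (1-based) so an LLM can answer with a choice (hypothetical format)."""
    lines = [f"({i + 1}) {d}" for i, d in enumerate(descriptions)]
    return (
        "Some choices are given below as a numbered list:\n"
        + "\n".join(lines)
        + f"\n\nReturn JSON with 'choice' (a number) and 'reason' for the question: {query}"
    )


def parse_choice(reply: str, num_choices: int) -> Tuple[int, str]:
    """Convert the 1-based choice in a JSON reply to a 0-based tool index."""
    data = json.loads(reply)
    index = int(data["choice"]) - 1
    if not 0 <= index < num_choices:
        raise ValueError(f"choice {data['choice']} is out of range")
    return index, data.get("reason", "")


descriptions = [
    "Provides information about Uber's financials for the year 2021.",
    "Provides a summary of Uber's financials for the year 2021.",
]
print(build_choice_prompt("What is the summary of the Uber Financials in 2021?", descriptions))
# A made-up reply standing in for the LLM's answer:
print(parse_choice('{"choice": 2, "reason": "mentions a summary"}', len(descriptions)))
```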

### Create Router Query Engine

```python
# LLMSingleSelector uses the default LLM (Settings.llm) to pick exactly one tool per query.
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=query_engine_tools,
    verbose=True,
)
```

### Querying

#### Summarization Query

The router selects query engine 1, the summary engine backed by the `SummaryIndex`, to answer the summarization query.

```python
response = query_engine.query("What is the summary of the Uber Financials in 2021?")
print(response)
```

```
Selecting query engine 1: This choice specifically mentions a 'summary' of Uber's financials for the year 2021, which directly aligns with the question asked..
In 2021, Uber's Gross Bookings increased by $32.5 billion, a 56% increase compared to 2020. This growth was driven by a 66% increase in Delivery Gross Bookings due to higher demand for food delivery and larger order sizes, as well as expansion in U.S. and international markets. Mobility Gross Bookings also grew by 36% due to increased trip volumes as the business recovered from COVID-19 impacts.

Uber's revenue for the year was $17.5 billion, a 57% increase from the previous year. This growth was attributed to the overall expansion of the Delivery business and an increase in Freight revenue due to the acquisition of Transplace in the fourth quarter of 2021.

The net loss attributable to Uber Technologies, Inc. was $496 million, a 93% improvement from the previous year. This improvement was driven by a $1.6 bil
```

#### Specific Context Query

Here the router selects query engine 0, the vector engine backed by the `VectorStoreIndex`, to answer a query about a specific fact.

```python
response = query_engine.query("What is the revenue of Uber in 2021?")
print(response)
```

```
Selecting query engine 0: This choice is more likely to contain detailed financial information about Uber in 2021, including revenue..
The revenue of Uber in 2021 was $17,455 million.
```
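The two engines report revenue in different units: the vector engine in millions, the summary engine in billions. Once the units are aligned, the figures agree. The check below uses only the two figures printed in the responses.

```python
# Figures copied from the two responses above.
revenue_from_vector_engine_musd = 17_455  # "$17,455 million"
revenue_from_summary_engine_busd = 17.5   # "$17.5 billion"

# Convert millions to billions and round to the summary's precision (one decimal place).
converted_busd = round(revenue_from_vector_engine_musd / 1_000, 1)
print(converted_busd, converted_busd == revenue_from_summary_engine_busd)
```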
