# RAG Pipeline with Ollama, Mistral and LlamaIndex

In this notebook, we will demonstrate how to build a RAG pipeline using Ollama, Mistral models, and LlamaIndex. The following topics will be covered:

1.	Integrating Mistral with Ollama and LlamaIndex.
2.	Implementing RAG with Ollama and LlamaIndex using the Mistral model.
3.	Routing queries with RouterQueryEngine.
4.	Handling complex queries with SubQuestionQueryEngine.

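Before running this notebook, install Ollama, make sure the Ollama server is running, and pull the `mistral:instruct` model (for example, with `ollama pull mistral:instruct`). Setup instructions are available [here](https://ollama.com/library/mistral:instruct).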
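Before moving on, it can help to confirm that the Ollama server is reachable and that the model has been pulled. The following is a minimal sketch that uses only the standard library. It assumes Ollama's default address `http://localhost:11434` and assumes that its `/api/tags` endpoint returns JSON shaped like `{"models": [{"name": "mistral:instruct", ...}, ...]}`. Adjust both if your installation differs.

```python
import json
import urllib.request

# Assumption: Ollama serves its HTTP API on the default port 11434.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    # Assumed response shape: {"models": [{"name": "mistral:instruct", ...}, ...]}
    return any(m.get("name") == model for m in tags.get("models", []))


def check_ollama(model: str = "mistral:instruct", url: str = OLLAMA_TAGS_URL) -> bool:
    """Ask the local Ollama server whether `model` is available locally."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            tags = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError (server not running/unreachable);
        # ValueError covers a response that is not valid JSON.
        return False
    return model_is_pulled(tags, model)
```

Calling `check_ollama()` should return `True` once the server is running and `ollama pull mistral:instruct` has finished. A `False` result means either the server could not be reached or the model name was not listed.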

In [1]:
import nest_asyncio

nest_asyncio.apply()

from IPython.display import display, HTML

## Setup LLM

In [None]:
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral:instruct", request_timeout=60.0)


### Querying

In [3]:

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is the capital city of France?"),
]
response = llm.chat(messages)

In [4]:
display(HTML(f'<p style="font-size:20px">{response.message.content}</p>'))

## Setup Embedding Model

In [5]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

In [6]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

## Download Data

We will use the Uber and Lyft 2021 10-K SEC filings for the demonstration.

In [7]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './lyft_2021.pdf'

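If `wget` is not installed on your system, you can fetch the same files with Python's standard library. The following is a small sketch. The helper name `download_if_missing` is hypothetical, and it skips files that already exist so that re-running the cell does not download them again.

```python
import os
import urllib.request

# Same source directory as the wget commands above.
BASE_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/"
    "main/docs/docs/examples/data/10k/"
)


def download_if_missing(url: str, dest: str) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if os.path.exists(dest):
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

Usage: `for name in ("uber_2021.pdf", "lyft_2021.pdf"): download_if_missing(BASE_URL + name, "./" + name)`.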


## Load Data

In [8]:
from llama_index.core import SimpleDirectoryReader

uber_docs = SimpleDirectoryReader(input_files=["./uber_2021.pdf"]).load_data()
lyft_docs = SimpleDirectoryReader(input_files=["./lyft_2021.pdf"]).load_data()

## Create Index and Query Engines

In [9]:
from llama_index.core import VectorStoreIndex

uber_vector_index = VectorStoreIndex.from_documents(uber_docs)
uber_vector_query_engine = uber_vector_index.as_query_engine(similarity_top_k=2)

lyft_vector_index = VectorStoreIndex.from_documents(lyft_docs)
lyft_vector_query_engine = lyft_vector_index.as_query_engine(similarity_top_k=2)
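With `similarity_top_k=2`, each query engine retrieves the two chunks whose embeddings score highest against the query embedding. The LLM then answers from only those chunks. The following sketch illustrates that retrieval step with cosine similarity in pure Python. The vectors and chunk names are made-up toy values, not real `BAAI/bge-small-en-v1.5` embeddings. LlamaIndex's own vector store implementation differs in its details.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Toy 3-d vectors (made up) standing in for real chunk embeddings.
chunks = {
    "revenue_table": [0.9, 0.1, 0.0],
    "risk_factors": [0.1, 0.9, 0.2],
    "revenue_discussion": [0.8, 0.3, 0.1],
    "board_members": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, chunks, k=2))  # ['revenue_table', 'revenue_discussion']
```

Only the top-ranked chunks reach the LLM. If the answer lives in a chunk ranked third or lower, the query engine will not see it. In that case, raising `similarity_top_k` is one option.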


### Querying

In [10]:
response = uber_vector_query_engine.query("What is the revenue of uber in 2021 in millions?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [11]:
response = lyft_vector_query_engine.query("What is the revenue of lyft in 2021 in millions?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

## RouterQueryEngine

We will use the `RouterQueryEngine` to send each user query to the appropriate index, depending on whether the query is about Uber or Lyft.
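`LLMSingleSelector` asks the LLM to choose exactly one tool based on each tool's `ToolMetadata` description. The router then forwards the query to that tool's query engine. The tools below are listed Lyft first and then Uber, so in the verbose log "query engine 0" is Lyft and "query engine 1" is Uber. The following is a rough stand-in for the selection step. It assumes plain keyword overlap in place of the LLM call, so it is not how LlamaIndex actually selects.

```python
import re

# Tool descriptions in the same order as query_engine_tools (index 0 = Lyft, 1 = Uber).
TOOL_DESCRIPTIONS = [
    "Provides information about Lyft financials for year 2021",
    "Provides information about Uber financials for year 2021",
]


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_single(query: str, descriptions: list[str]) -> int:
    """Return the index of the description sharing the most words with the query.

    Keyword overlap is only a stand-in for the LLM call LLMSingleSelector makes;
    ties go to the lowest index.
    """
    query_words = tokens(query)
    scores = [len(query_words & tokens(d)) for d in descriptions]
    return max(range(len(descriptions)), key=lambda i: scores[i])


print(select_single("What are the investments made by Uber?", TOOL_DESCRIPTIONS))  # 1
```

Because routing depends entirely on the descriptions, specific and distinguishable descriptions matter more than their length.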

### Create QueryEngine tools

In [12]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors.llm_selectors import LLMSingleSelector

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_vector_query_engine,
        metadata=ToolMetadata(
            name="vector_lyft_10k",
            description="Provides information about Lyft financials for year 2021",
        ),
    ),
    QueryEngineTool(
        query_engine=uber_vector_query_engine,
        metadata=ToolMetadata(
            name="vector_uber_10k",
            description="Provides information about Uber financials for year 2021",
        ),
    ),
]

### Create `RouterQueryEngine`

In [13]:

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=query_engine_tools,
    verbose=True,
)

### Querying

In [14]:
response = query_engine.query("What are the investments made by Uber?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Selecting query engine 1: The provided choices are summaries of financial reports for the year 2021. Investments made by a company are not typically included in a financial report. However, the financial report may provide information about investments through capital expenditures or acquisition costs. As such, to find out about Uber's investments, one would need to look at a separate report or section that discusses these topics..

In [15]:
response = query_engine.query("What are the investments made by the Lyft in 2021?")

Selecting query engine 0: The given choices do not provide information about investments made by Lyft in 2021. They only provide information about the financials of Lyft and Uber for the year 2021..

In [16]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

## SubQuestionQueryEngine

We will explore how the `SubQuestionQueryEngine` handles complex queries. It breaks a query into sub-questions, sends each one to the relevant query engine tool, and combines the sub-answers into a final response.
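The three steps are decompose, route, and synthesize. They can be sketched in plain Python. The engine functions and the string-matching decomposition are hypothetical stand-ins. The real `SubQuestionQueryEngine` uses the LLM both to write the sub-questions and to synthesize the final answer, and it calls the actual Uber and Lyft query engines.

```python
from typing import Callable


# Hypothetical stand-ins for the Uber and Lyft vector query engines.
def uber_engine(question: str) -> str:
    return f"(answer from the Uber 10-K to: {question})"


def lyft_engine(question: str) -> str:
    return f"(answer from the Lyft 10-K to: {question})"


# Tool name -> engine, mirroring the names used in query_engine_tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "vector_uber_10k": uber_engine,
    "vector_lyft_10k": lyft_engine,
}
COMPANY_TO_TOOL = {"Uber": "vector_uber_10k", "Lyft": "vector_lyft_10k"}


def generate_sub_questions(query: str) -> list[tuple[str, str]]:
    """Split a query into (tool_name, sub_question) pairs, one per company mentioned.

    The real engine asks the LLM to write the sub-questions; this string match is a stand-in.
    """
    pairs = []
    for company, tool in COMPANY_TO_TOOL.items():
        if company.lower() in query.lower():
            pairs.append((tool, f"For {company} only: {query}"))
    return pairs


def answer(query: str) -> str:
    """Decompose, route each sub-question to its tool, then combine the sub-answers."""
    sub_answers = [(tool, TOOLS[tool](q)) for tool, q in generate_sub_questions(query)]
    # The real engine synthesizes a final answer with the LLM; here we just join them.
    return "\n".join(f"[{tool}] {text}" for tool, text in sub_answers)


print(answer("Compare the revenues of Uber and Lyft in 2021?"))
```

A comparison question that names both companies produces one sub-question per tool. This mirrors the "Generated 2 sub questions." lines in the logs.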

### Create `SubQuestionQueryEngine`

In [17]:
from llama_index.core.query_engine import SubQuestionQueryEngine

sub_question_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    verbose=True,
)

### Querying

In [22]:
response = sub_question_query_engine.query("Compare the revenues of Uber and Lyft in 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Generated 2 sub questions.
[vector_uber_10k] Q: What is the revenue of Uber for year 2021
[vector_lyft_10k] Q: What is the revenue of Lyft for year 2021
[vector_lyft_10k] A: In the provided context, the revenue for Lyft in the year 2021 is $3,208,323 thousand. This can be found on page 79 (file_path: lyft_2021.pdf).
[vector_uber_10k] A: The revenue for Uber in the year 2021, as per the provided context, was $17,455 million.
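The two sub-answers report revenue in different units: thousands for Lyft and millions for Uber. Both figures must be on a common scale before a comparison is meaningful. The following is a small arithmetic check. It uses only the figures quoted in the sub-answers above, which are the model's extracted answers rather than independently verified values.

```python
# Figures as reported in the sub-answers above (note the different units).
lyft_revenue_thousands = 3_208_323  # "$3,208,323 thousand"
uber_revenue_millions = 17_455      # "$17,455 million"

# Convert Lyft's figure from thousands to millions.
lyft_revenue_millions = lyft_revenue_thousands / 1_000
ratio = uber_revenue_millions / lyft_revenue_millions

print(f"Lyft 2021 revenue: ${lyft_revenue_millions:,.1f} million")  # $3,208.3 million
print(f"Uber 2021 revenue: ${uber_revenue_millions:,} million")     # $17,455 million
print(f"Uber/Lyft revenue ratio: {ratio:.2f}")                      # 5.44
```

Checking units like this is worthwhile whenever a synthesized comparison combines numbers pulled from different documents.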

In [19]:
response = sub_question_query_engine.query("What are the investments made by Uber and Lyft in 2021?")
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

Generated 2 sub questions.
[vector_uber_10k] Q: What investments were made by Uber in year 2021
[vector_lyft_10k] Q: What investments were made by Lyft in year 2021
[vector_uber_10k] A: In year 2021, Uber made purchases of non-marketable equity securities for 982 million dollars as per the provided cash flow statement. However, it is important to note that the information does not specify the details or names of the specific investments made.
[vector_lyft_10k] A: Based on the provided context, it appears that in the year 2021, Lyft made significant investments in several areas. These include developing and launching new offerings and platform features, expanding in existing and new markets, investing in their platform and customer engagement, investing in environmental programs such as their commitment to 100% EVs on their platform by the end of 2030, and expanding support services for