# Data Agents with LlamaIndex

In LlamaIndex v0.10, the agent infrastructure is centered on the idea of a Data Agent, which the LlamaIndex team describes as an “LLM-powered knowledge worker.” While the query engines at the heart of LlamaIndex read from vector databases, data agents can dynamically work with data from vector databases and a host of external tools.

In this event, we take a deep dive into how LlamaIndex does agents, and how agents interact with the core constructs of nodes and query engines. We build a more complex RAG application capable of answering questions by reasoning over quantitative (structured) and qualitative (unstructured) information.

### A note on terminology:

You'll notice quite a few similarities between LangChain and LlamaIndex. In some ways the two libraries cover much of the same ground, but LlamaIndex moved some of the language around. Let's spend a few moments disambiguating the language.

- `QueryEngine` -> `LCEL Chain`:
  - A `QueryEngine` is LlamaIndex's name for an LLM "chain" on top of a retrieval system.
- `OpenAIAgent` vs. `Agent`:
  - The two agents share the same fundamental pattern: decide which of a list of tools to use to answer a user's query.
  - `OpenAIAgent` (LlamaIndex's primary agent) does not need to rely on an agent executor because it leverages OpenAI's [function calling API](https://openai.com/blog/function-calling-and-other-api-updates), which lets the agent interface "directly" with the tools instead of operating through an intermediary application process.
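The shared pattern (pick a tool, call it with structured arguments) can be sketched without any library at all. This is a minimal, offline illustration and not how either framework is implemented: the tool functions, `fake_model`, and `run_agent_step` are hypothetical names, and the keyword check stands in for the decision a real model makes from the tool descriptions.

```python
import json
from typing import Callable, Dict


def get_cast(title: str) -> str:
    """Hypothetical tool: pretend to look up a film's cast."""
    return f"cast list for {title}"


def get_average_rating(title: str) -> str:
    """Hypothetical tool: pretend to compute an average review score."""
    return f"average rating for {title}"


# The agent is handed a registry of tools it may call by name.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_cast": get_cast,
    "get_average_rating": get_average_rating,
}


def fake_model(user_message: str) -> dict:
    """Stand-in for the LLM: returns a structured function call.

    A real model chooses the tool from the tool descriptions; this stub
    uses a keyword check purely so the example runs offline.
    """
    name = "get_average_rating" if "rating" in user_message.lower() else "get_cast"
    return {"name": name, "arguments": json.dumps({"title": "Dune (2021 film)"})}


def run_agent_step(user_message: str) -> str:
    """Ask the model for a function call, then execute the chosen tool."""
    call = fake_model(user_message)
    tool = TOOLS[call["name"]]
    kwargs = json.loads(call["arguments"])
    return tool(**kwargs)


print(run_agent_step("Who starred in the 2021 film?"))
```

The point of the structured call is that the application only has to look up a name and unpack JSON arguments; no free-text parsing of the model's reply is needed.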

There is, however, a much larger terminological difference when it comes to discussing data.

##### Nodes vs. Documents

As you know from the previous weeks' assignments, NLP has the idea of `documents`: text objects that exist within a corpus.

LlamaIndex takes this a step further and reclassifies `documents` as `nodes`. Confusingly, it refers to the `Source Document` as simply `Documents`.

The `Document` -> `node` structure is, almost exactly, equivalent to the `Source Document` -> `Document` structure found in LangChain - but the new terminology comes with some clarity about different structure-indices.

We won't be leveraging those structured indices today, but we will be leveraging a default benefit of the `node` structure in LlamaIndex: the ability to quickly filter nodes based on their metadata.

![image](https://i.imgur.com/B1QDjs5.png)
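To make the `Document` -> `node` relationship concrete, here is a small library-free sketch. The `Document` and `Node` dataclasses and both helper functions are illustrative stand-ins, not LlamaIndex's own classes, and the sentence split is deliberately naive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """A whole source text, e.g. one Wikipedia page."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    """A chunk of a Document that carries its own metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_into_nodes(doc: Document, sentences_per_node: int = 1) -> List[Node]:
    """Naively split on '. ' and copy the parent's metadata onto each node."""
    sentences = [s.strip() for s in doc.text.split(". ") if s.strip()]
    nodes = []
    for i in range(0, len(sentences), sentences_per_node):
        chunk = ". ".join(sentences[i:i + sentences_per_node])
        nodes.append(Node(text=chunk, metadata=dict(doc.metadata)))
    return nodes


def filter_nodes(nodes: List[Node], key: str, value: str) -> List[Node]:
    """Keep only nodes whose metadata value matches exactly."""
    return [n for n in nodes if n.metadata.get(key) == value]


docs = [
    Document("Paul travels to Arrakis. Sandworms roam the desert",
             {"title": "Dune (2021 film)"}),
    Document("Bilbo leaves the Shire. Stone giants fight in the mountains",
             {"title": "The Hobbit: An Unexpected Journey"}),
]
all_nodes = [node for doc in docs for node in split_into_nodes(doc)]
dune_nodes = filter_nodes(all_nodes, "title", "Dune (2021 film)")
print(len(all_nodes), [n.text for n in dune_nodes])
```

Because every node carries its own copy of the metadata, filtering can happen at the node level before any similarity search is run.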

## BOILERPLATE

This is only relevant when running the code in a Jupyter Notebook.

In [1]:
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Load Dependencies

Let's grab our core `llama-index` library, as well as OpenAI's Python SDK.

We'll be leveraging OpenAI's suite of APIs to power our RAG pipelines today.

> NOTE: You can safely ignore any pip errors that occur during the running of these cells.

In [2]:
!pip install -qU llama-index openai

We'll be collecting our semantic data from Wikipedia - and so will need the [Wikipedia Reader](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-wikipedia)!

In [3]:
!pip install -qU wikipedia llama-index-readers-wikipedia



Our vector database today will be powered by [Qdrant](https://qdrant.tech/), so we'll need that package as well!

In [4]:
!pip install -qU llama-index-vector-stores-qdrant qdrant-client

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 4.25.3 which is incompatible.

Finally, we'll need to grab a few dependencies related to our quantitative data!

In [5]:
!pip install -q -U sqlalchemy pandas

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas==2.0.3, but you have pandas 2.2.2 which is incompatible.

### Optional Dependency

We can use [Weights and Biases](https://docs.wandb.ai/guides/prompts) (WandB) as a visibility platform, as well as for storing our index!

In [6]:
!pip install -qU wandb llama-index-callbacks-wandb

In [7]:
import os
import getpass

os.environ["WANDB_API_KEY"] = getpass.getpass("WandB API Key: ")

WandB API Key: ··········


We'll also need to set a global callback handler so that LlamaIndex events are streamed to WandB as traces!

In [8]:
import llama_index
from llama_index.core import set_global_handler

set_global_handler("wandb", run_args={"project": "data-agents-demo"})
wandb_callback = llama_index.core.global_handler

wandb: Streaming LlamaIndex events to W&B at https://wandb.ai/philmui/data-agents-demo/runs/iznglq7u
wandb: `WandbCallbackHandler` is currently in beta.
wandb: Please report any issues to https://github.com/wandb/wandb/issues with the tag `llamaindex`.


## Task 2: Set Env Variables and Set Up WandB Callback

Let's set our OpenAI API key! (The WandB key and callback were already configured above.)

In [9]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

OpenAI API Key: ··········


### Task 3: Settings

LlamaIndex lets us set global settings which we can use to influence the default behaviour of our components.

Let's set our LLM and our Embedding Model!

In [10]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

## `Index` Creation

In order for us to perform RAG in the traditional sense - we need an `Index`.

So what is an `Index`? Well - let's see how LlamaIndex defines it:

> In LlamaIndex terms, an `Index` is a data structure composed of Document objects, designed to enable querying by an LLM. Your Index is designed to be complementary to your querying strategy.

Okay, so we know that we have a boatload of Wikipedia content - and we know that we want to be able to query the `Index` and receive documents that are related to our query - so let's use an `Index` built on the idea of embedding-vectors.

Introducing: `VectorStoreIndex`!

Again, let's see how LlamaIndex defines this:

> A `VectorStoreIndex` is by far the most frequent type of `Index` you'll encounter. The Vector Store Index takes your Documents and splits them up into Nodes. It then creates `vector` embeddings of the text of every node, ready to be queried by an LLM.

Alright, that sounds awesome - let's make one!
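Before building the real one, here is a toy picture of what a vector index does: embed each node, embed the query, and return the nearest nodes. This sketch swaps in a bag-of-words "embedding" so it runs offline; a real `VectorStoreIndex` uses a learned embedding model and a vector database. `ToyVectorIndex`, `embed`, and `cosine` are hypothetical names for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: lower-cased word counts (a real index uses a learned model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Stores (text, vector) pairs and ranks them against a query vector."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Dict[str, float]]] = []

    def insert(self, node_text: str) -> None:
        self._entries.append((node_text, embed(node_text)))

    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


toy_index = ToyVectorIndex()
for chunk in [
    "giant sandworms roam the desert of Arrakis",
    "a young wizard attends a school of magic",
    "hobbits carry a ring across Middle-earth",
]:
    toy_index.insert(chunk)

print(toy_index.retrieve("which giant creatures roam the desert", top_k=1))
```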

### Data Collection

We're just going to be pulling information straight from Wikipedia using the built-in `WikipediaReader`.

> NOTE: Setting `auto_suggest=False` ensures we run into fewer auto-correct based errors.

In [11]:
from llama_index.readers.wikipedia import WikipediaReader

movie_list = [
    "Dune (2021 film)",
    "Dune: Part Two",
    "Harry Potter and the Philosopher's Stone (film)",
    "Harry Potter and the Chamber of Secrets (film)",
    "The Lord of the Rings: The Fellowship of the Ring",
    "The Lord of the Rings: The Two Towers",
    "The Hobbit: An Unexpected Journey",
    "The Hobbit: The Desolation of Smaug"
]

wiki_docs = WikipediaReader().load_data(pages=movie_list, auto_suggest=False)

### Initializing our `VectorStoreIndex` with Qdrant

Qdrant is a locally hostable and open-source vector database solution.

It offers powerful features like metadata filtering out of the box, and will suit our needs well today!

We'll start by creating our local `:memory:` client (in-memory and not meant for production use-cases) and our collection. The vector size of 1536 matches the default output dimension of `text-embedding-3-small`, the embedding model we set above.

In [12]:
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient, models

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="movie_wikis",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

True

Then we'll create our `VectorStore` and `StorageContext` which will allow us to create an empty `VectorStoreIndex` which we will be able to add nodes to later!

In [13]:
from llama_index.core import VectorStoreIndex
from llama_index.core import StorageContext

vector_store = QdrantVectorStore(client=client, collection_name="movie_wikis")

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    [],
    storage_context=storage_context,
)

wandb: Logged trace tree to W&B.


### Node Construction

Now we will loop through our documents and metadata and construct nodes.

We'll make sure to explicitly associate our nodes with their respective movie so we can filter by the movie title in the upcoming cells.
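The two ideas in play here, splitting a document into overlapping chunks and stamping each chunk with its movie title, can be pictured with a short dependency-free sketch. It assumes whitespace-separated words as "tokens" and made-up chunk sizes; a real token splitter counts model tokens, and `split_tokens` and `build_nodes` are hypothetical helpers.

```python
from typing import Dict, List


def split_tokens(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split whitespace tokens into overlapping windows.

    Real token splitters count model tokens; whitespace words are used here
    only to keep the example dependency-free.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def build_nodes(title: str, text: str) -> List[Dict[str, object]]:
    """Attach the same title metadata to every chunk of one document."""
    return [
        {"text": chunk, "metadata": {"title": title}}
        for chunk in split_tokens(text, chunk_size=4, chunk_overlap=1)
    ]


toy_nodes = build_nodes(
    "Dune (2021 film)", "one two three four five six seven eight nine ten"
)
for toy_node in toy_nodes:
    print(toy_node)
```

The overlap means the last word of one chunk reappears as the first word of the next, which helps a sentence that straddles a boundary remain retrievable.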

You might be thinking to yourself - wait, we never indicated which embedding model this should use - but remember that we set `Settings.embed_model` globally above, so the index picks it up by default.

In [14]:
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import TokenTextSplitter

# Split each document into token-based chunks (nodes).
pipeline = IngestionPipeline(transformations=[TokenTextSplitter()])

for movie, wiki_doc in zip(movie_list, wiki_docs):
    nodes = pipeline.run(documents=[wiki_doc])
    # Tag every node with its movie title so we can filter on it later.
    for node in nodes:
        node.metadata = {"title": movie}
    # Nodes are embedded with `Settings.embed_model` as they are inserted.
    index.insert_nodes(nodes)

wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.
wandb: Logged trace tree to W&B.


## Simple RAG - QueryEngine

Now that we've created our `VectorStoreIndex`, powered by a Qdrant vector store, we can wrap it in a simple `QueryEngine` using the `as_query_engine()` method, which connects a retriever and a response synthesizer together for us:

In [15]:
simple_rag = index.as_query_engine()

Before we test this out - let's see what we can find out about our new `QueryEngine`, starting with its prompts!

In [16]:
for k, v in simple_rag.get_prompts().items():
  print(v.get_template())
  print("\n~~~~~~~~~~~~~~~~~~\n")

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 

~~~~~~~~~~~~~~~~~~

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 

~~~~~~~~~~~~~~~~~~
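The two templates printed above suggest a create-and-refine pattern: answer from the first chunk of context, then offer each further chunk as a chance to refine. The sketch below fills the same templates with `str.format` and a stub model. It is an assumption-laden illustration, not LlamaIndex's actual synthesizer (which also handles packing chunks into the context window); `create_and_refine` and `recording_llm` are hypothetical names.

```python
from typing import Callable, List

QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

REFINE_TEMPLATE = (
    "The original query is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer (only if needed) "
    "with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better answer the query. "
    "If the context isn't useful, return the original answer.\n"
    "Refined Answer: "
)


def create_and_refine(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer from the first chunk, then offer each later chunk as a refinement."""
    if not chunks:
        # Mirrors the "Empty Response" string seen when nothing is retrieved.
        return "Empty Response"
    answer = llm(QA_TEMPLATE.format(context_str=chunks[0], query_str=query))
    for chunk in chunks[1:]:
        answer = llm(
            REFINE_TEMPLATE.format(
                query_str=query, existing_answer=answer, context_msg=chunk
            )
        )
    return answer


prompts_seen: List[str] = []


def recording_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: records the prompt, returns a canned reply."""
    prompts_seen.append(prompt)
    return f"answer v{len(prompts_seen)}"


final = create_and_refine("Who is the evil wizard?", ["chunk A", "chunk B"], recording_llm)
print(final)
```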



Let's see how it does!

In [17]:
response = simple_rag.query("Who is the evil Wizard in the story?")

wandb: Logged trace tree to W&B.


In [18]:
response.response

'Lord Voldemort'

That makes sense!

Let's ask a question that's slightly more...ambiguous.

In [19]:
response = simple_rag.query("Who are the giant beings that roam across the world?")

wandb: Logged trace tree to W&B.


In [20]:
response.response

'Stone Giants are the giant beings that roam across the world in the context provided.'

We can check the source nodes to see which movies we retrieved.

In [21]:
print([x.metadata["title"] for x in response.source_nodes])

['Dune (2021 film)', 'The Hobbit: An Unexpected Journey']


Okay, so in this case - we've gone with "Stone Giants" from the Hobbit.

But there's also the sandworms from Dune, and the Ents from Lord of the Rings, and it looks like we got documents from Dune as well.

Let's see if there's a way we can use the title metadata we added to filter the results we get!

## Auto Retriever Functional Tool

This tool will leverage OpenAI's function calling endpoint to select the correct metadata filter and query the filtered index - only looking at nodes with the desired metadata.

A simplified diagram: ![image](https://i.imgur.com/AICDPav.png)

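First, we need to create our `VectorStoreInfo` object, which holds the metadata description the model needs for each filterable field (in this case, the movie title).

Notice that the valid titles are written out as a list inside the `description` string, so the model knows which exact values it can filter on.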

In [22]:
from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores.types import (
    VectorStoreInfo,
    MetadataInfo,
    ExactMatchFilter,
    MetadataFilters,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

from typing import List, Tuple, Any
from pydantic import BaseModel, Field

top_k = 3

vector_store_info = VectorStoreInfo(
    content_info="information about movies",
    metadata_info=[MetadataInfo(
        name="title",
        type="str",
        description='title of the movie, ["Dune (2021 film)", "Harry Potter and the Philosopher\'s Stone (film)", "The Lord of the Rings: The Fellowship of the Ring"]'
        )]
)

Now we'll create a base Pydantic model that describes the arguments of our function. This ensures compatibility with our application layer: the arguments returned by the OpenAI endpoint must conform to this schema.

In [23]:
class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(
        ..., description="List of metadata filter field names"
    )
    filter_value_list: List[str] = Field(
        ...,
        description=(
            "List of metadata filter field values (corresponding to names specified in filter_key_list)"
        )
    )

Now we can build the function that the agent will call through OpenAI's function calling endpoint.

In [24]:
def auto_retrieve_fn(
    query: str, filter_key_list: List[str], filter_value_list: List[str]
):
    """Auto retrieval function.

    Performs auto-retrieval from a vector database, and then applies a set of filters.

    """
    query = query or "Query"

    exact_match_filters = [
        ExactMatchFilter(key=k, value=v)
        for k, v in zip(filter_key_list, filter_value_list)
    ]
    # `similarity_top_k` is the retriever's keyword for how many nodes to return.
    retriever = VectorIndexRetriever(
        index,
        filters=MetadataFilters(filters=exact_match_filters),
        similarity_top_k=top_k,
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)

    response = query_engine.query(query)
    return str(response)

Now we need to wrap our system in a tool in order to integrate it into the larger application.

Source Code Here:
- [`FunctionTool`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/tools/function_tool.py#L21)

In [25]:
description = f"""\
Use this tool to look up non-review based information about films.
The schema is given below:
{vector_store_info.json()}
"""

auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="semantic-film-info",
    description=description,
    fn_schema=AutoRetrieveModel
)

All that's left to do is attach the tool to an OpenAIAgent and let it rip!

Source Code Here:
- [`OpenAIAgent`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/agent/openai_agent.py#L361)

In [26]:
from llama_index.agent.openai import OpenAIAgent

agent = OpenAIAgent.from_tools(
    tools=[auto_retrieve_tool],
    verbose=True,
)

In [27]:
response = agent.chat("Who starred in the 2021 film?")
print(str(response))

Added user message to memory: Who starred in the 2021 film?
=== Calling Function ===
Calling function: semantic-film-info with args: {"query":"cast of Dune (2021 film)","filter_key_list":["title"],"filter_value_list":["Dune (2021 film)"]}
Got output: The cast of Dune (2021 film) includes Timothée Chalamet as Paul Atreides, Rebecca Ferguson as Lady Jessica, Oscar Isaac as Duke Leto Atreides, Josh Brolin as Gurney Halleck, Stellan Skarsgård as Baron Vladimir Harkonnen, Dave Bautista as Glossu Rabban, Sharon Duncan-Brewster as Dr. Liet Kynes, Stephen McKinley Henderson as Thufir Hawat, Zendaya as Chani, Chang Chen as Dr. Wellington Yueh, Charlotte Rampling as Reverend Mother Mohiam, Jason Momoa as Duncan Idaho, Javier Bardem as Stilgar, David Dastmalchian as Piter De Vries, Babs Olusanmokun as Jamis, Golda Rosheuvel as Shadout Mapes, and Roger Yuan as Lieutenant Lanville.



wandb: Logged trace tree to W&B.


The cast of the 2021 film "Dune" includes Timothée Chalamet, Rebecca Ferguson, Oscar Isaac, Josh Brolin, Stellan Skarsgård, Dave Bautista, Sharon Duncan-Brewster, Stephen McKinley Henderson, Zendaya, Chang Chen, Charlotte Rampling, Jason Momoa, Javier Bardem, David Dastmalchian, Babs Olusanmokun, Golda Rosheuvel, and Roger Yuan.
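The trace above shows the arguments the model produced for the tool call. The sketch below takes that same JSON string and shows, in plain Python, how the two parallel lists become key/value filters with exact-match semantics. `to_filters` and `matches` are hypothetical helpers, not LlamaIndex functions; unlike a bare `zip`, `to_filters` rejects lists of different lengths instead of silently dropping the extras.

```python
import json
from typing import Dict, List

# Arguments copied from the function-call trace above.
raw_args = (
    '{"query":"cast of Dune (2021 film)",'
    '"filter_key_list":["title"],'
    '"filter_value_list":["Dune (2021 film)"]}'
)


def to_filters(keys: List[str], values: List[str]) -> Dict[str, str]:
    """Pair the two parallel lists; refuse silently truncated pairs."""
    if len(keys) != len(values):
        raise ValueError("filter_key_list and filter_value_list differ in length")
    return dict(zip(keys, values))


def matches(metadata: Dict[str, str], filters: Dict[str, str]) -> bool:
    """Exact-match semantics: every filter key must equal the node's value."""
    return all(metadata.get(k) == v for k, v in filters.items())


args = json.loads(raw_args)
filters = to_filters(args["filter_key_list"], args["filter_value_list"])

print(filters)
print(matches({"title": "Dune (2021 film)"}, filters))
print(matches({"title": "Dune: Part Two"}, filters))
```

Exact matching is strict: a node tagged `Dune: Part Two` does not pass a filter for `Dune (2021 film)`, and a filter value such as plain `Dune` would pass neither.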


In [28]:
response = agent.chat("Who are those giant guys from Lord of the Rings that roam around the forest?")
print(str(response))

Added user message to memory: Who are those giant guys from Lord of the Rings that roam around the forest?
=== Calling Function ===
Calling function: semantic-film-info with args: {"query":"characters from The Lord of the Rings: The Fellowship of the Ring","filter_key_list":["title"],"filter_value_list":["The Lord of the Rings: The Fellowship of the Ring"]}
Got output: Some characters from "The Lord of the Rings: The Fellowship of the Ring" include Frodo, Sam, Merry, Pippin, Gandalf, Aragorn (Strider), Arwen, Boromir, Legolas, Gimli, Saruman, Galadriel, Elrond, Bilbo, Gollum, Sauron, Celeborn, Lurtz, Haldir, Ereinion Gil-galad, Elendil, and Isildur.



wandb: Logged trace tree to W&B.


The giant guys from "The Lord of the Rings: The Fellowship of the Ring" that roam around the forest are likely the Ents. Ents are tree-like creatures in J.R.R. Tolkien's Middle-earth, known for their immense size and connection to the forests.


## Quantitative RAG Pipeline with NL2SQL Tooling

We'll walk through the steps of creating a natural language to SQL system in the following section.

> NOTICE: This does not parse the inputs or intermediary calls to ensure that the generated SQL queries are safe. Use caution, and add specific guardrails on both sides of the application before using this in a production environment.

The next few steps should be largely straightforward. We'll want to:

1. Read in our `.csv` files into `pd.DataFrame` objects
2. Create an in-memory `sqlite` powered `sqlalchemy` engine
3. Cast our `pd.DataFrame` objects to the SQL engine
4. Create an `SQLDatabase` object through LlamaIndex
5. Use that to create a `QueryEngineTool` that we can interact with through the `NLSQLTableQueryEngine`!

If you get stuck, please consult the documentation.
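To see the shape of the pipeline before wiring up the real components, here is a standard-library sketch. It swaps in `sqlite3` for the `pandas` + `sqlalchemy` steps and a stub for the LLM, and it adds one very conservative guardrail of the kind the notice above asks for. The column names (`review`, `rating`), the sample rows, and the helpers `fake_nl2sql`, `is_read_only_select`, and `answer` are all assumptions for illustration; the real CSV columns may differ.

```python
import re
import sqlite3

# Hypothetical rows standing in for one reviews CSV; real column names may differ.
rows = [("Great film", 9.0), ("Too long", 6.0), ("A classic", 9.0)]

conn = sqlite3.connect(":memory:")
# Table names that contain spaces must be double-quoted in SQL.
conn.execute('CREATE TABLE "Harry Potter" (review TEXT, rating REAL)')
conn.executemany('INSERT INTO "Harry Potter" VALUES (?, ?)', rows)

_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|pragma)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql: str) -> bool:
    """Accept a single SELECT statement with no write/DDL keywords.

    Deliberately conservative: it may reject harmless queries, and it is
    not a complete defense on its own.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement
        return False
    if not statement.lower().startswith("select"):
        return False
    return _WRITE_KEYWORDS.search(statement) is None


def fake_nl2sql(question: str) -> str:
    """Stand-in for the LLM step that turns a question into SQL."""
    return 'SELECT AVG(rating) FROM "Harry Potter"'


def answer(question: str) -> float:
    """Generate SQL, check it, then run it against the in-memory database."""
    sql = fake_nl2sql(question)
    if not is_read_only_select(sql):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    (value,) = conn.execute(sql).fetchone()
    return value


print(answer("What is the average rating of the Harry Potter movie?"))
```

A keyword check like this is only a first line of defense; connecting with a read-only database user is a sturdier complement.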

In [29]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/dune1.csv

--2024-04-17 17:30:25--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/dune1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 133391 (130K) [text/plain]
Saving to: ‘dune1.csv’


2024-04-17 17:30:26 (1.68 MB/s) - ‘dune1.csv’ saved [133391/133391]



In [30]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/hp_ss.csv

--2024-04-17 17:30:29--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/hp_ss.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 80384 (78K) [text/plain]
Saving to: ‘hp_ss.csv’


2024-04-17 17:30:30 (1.80 MB/s) - ‘hp_ss.csv’ saved [80384/80384]



In [31]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/lotr_fotr.csv

--2024-04-17 17:30:31--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/lotr_fotr.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 172855 (169K) [text/plain]
Saving to: ‘lotr_fotr.csv’


2024-04-17 17:30:32 (1.88 MB/s) - ‘lotr_fotr.csv’ saved [172855/172855]



#### Read `.csv` Into Pandas

In [32]:
import pandas as pd

dune1 = pd.read_csv("./dune1.csv")
hp_ss = pd.read_csv("./hp_ss.csv")
lotr_fotr = pd.read_csv("./lotr_fotr.csv")

#### Create SQLAlchemy engine with SQLite

In [33]:
from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///:memory:")

#### Convert `pd.DataFrame` to SQL tables

In [34]:
dune1.to_sql(
  "Dune",
  engine
)

274

In [35]:
hp_ss.to_sql(
  "Harry Potter",
  engine
)

125

In [36]:
lotr_fotr.to_sql(
  "The Lord of the Rings",
  engine
)

250

#### Construct a `SQLDatabase` index

Source Code Here:
- [`SQLDatabase`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/langchain_helpers/sql_wrapper.py#L9)

In [37]:
from llama_index.core import SQLDatabase

short_movie_list=[
    "Dune",
    "Harry Potter",
    "The Lord of the Rings"
]

sql_database = SQLDatabase(
    engine=engine,
    include_tables=short_movie_list
)

#### Create the NLSQLTableQueryEngine interface for all added SQL tables

Source Code Here:
- [`NLSQLTableQueryEngine`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/indices/struct_store/sql_query.py#L75C1-L75C1)

In [38]:
from llama_index.core.indices.struct_store.sql_query import NLSQLTableQueryEngine

sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=short_movie_list,
)

#### Wrap It All Up in a `QueryEngineTool`

You'll want to ensure you have a descriptive...description!

This is what will help the LLM decide which table to use when querying!

Source Code Here:

- [`QueryEngineTool`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/tools/query_engine.py#L13)

In [39]:
DESCRIPTION = """This tool should be used to convert natural language to SQL queries that query tables that contain review information about:
'Dune'
'Harry Potter'
'The Lord of the Rings'
"""

In [40]:
from llama_index.core.tools.query_engine import QueryEngineTool

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    name="sql-query",
    description=DESCRIPTION,
)

In [41]:
agent = OpenAIAgent.from_tools(
    tools=[sql_tool],
    verbose=True
)

In [42]:
response = agent.chat("What is the average rating of the Harry Potter movie?")

Added user message to memory: What is the average rating of the Harry Potter movie?
=== Calling Function ===
Calling function: sql-query with args: {"input":"average rating of the Harry Potter movie"}
Got output: The average rating of the Harry Potter movie is approximately 7.66 out of 10.



wandb: Logged trace tree to W&B.


In [43]:
print(str(response))

The average rating of the Harry Potter movie is approximately 7.66 out of 10.


In [44]:
response = agent.chat("What movie series has better reviews, Lord of the Rings or Dune?")

Added user message to memory: What movie series has better reviews, Lord of the Rings or Dune?
=== Calling Function ===
Calling function: sql-query with args: {"input": "average rating of The Lord of the Rings"}
Got output: The average rating of The Lord of the Rings is approximately 9.87 out of 10. This indicates that the movie is highly regarded by viewers.

=== Calling Function ===
Calling function: sql-query with args: {"input": "average rating of Dune"}
Got output: The average rating of Dune is approximately 8.34.



wandb: Logged trace tree to W&B.


In [45]:
print(str(response))

The average rating of The Lord of the Rings is approximately 9.87 out of 10, indicating that the movie series has better reviews compared to Dune, which has an average rating of approximately 8.34.


### Multi-Tool Agent Using Query Planning Tool

We're going to be leveraging the Query Planning Tool today to help our agent "come up with a plan" that it can execute to best answer our questions.

Let's start by defining the `QueryPlanTool`!

In [46]:
from llama_index.core.tools import QueryPlanTool
from llama_index.core import get_response_synthesizer

response_synthesizer = get_response_synthesizer()
query_plan_tool = QueryPlanTool.from_defaults(
    query_engine_tools=[auto_retrieve_tool, sql_tool],
    response_synthesizer=response_synthesizer,
)

Let's check out how this tool looks!

In [47]:
query_plan_tool.metadata.to_openai_tool()

{'type': 'function',
 'function': {'name': 'query_plan_tool',
  'description': '        This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n        Tool Name: semantic-film-info\nTool Description: Use this tool to look up non-review based information about films.\nThe schema is given below:\n{"metadata_info": [{"name": "title", "type": "str", "description": "title of the movie, [\\"Dune (2021 film)\\", \\"Harry Potter and the Philosopher\'s Stone (film)\\", \\"The Lord of the Rings: The Fellowship of the Ring\\"]"}], "content_info": "information about movies"}\n \n\nTool Name: sql-query\nTool Description: This tool should be used to convert natural language to SQL queries that query tables that contain revie
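The tool description above says a query plan is a DAG of query nodes. As a rough mental model only, and not the library's implementation, the sketch below runs a tiny plan so that each node executes after the nodes it depends on, passing their answers along as extra context. The plan format, `run_plan`, and the stub tools are hypothetical.

```python
from typing import Callable, Dict, List


def run_plan(plan: Dict[str, dict], tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Execute plan nodes so that every node runs after its dependencies."""
    results: Dict[str, str] = {}
    in_progress: set = set()

    def run(node_id: str) -> str:
        if node_id in results:
            return results[node_id]
        if node_id in in_progress:
            raise ValueError(f"cycle detected at {node_id!r}")
        in_progress.add(node_id)
        node = plan[node_id]
        # Dependency answers are appended to the query as extra context.
        context = [run(dep) for dep in node["deps"]]
        prompt = " | ".join([node["query"], *context])
        results[node_id] = tools[node["tool"]](prompt)
        in_progress.discard(node_id)
        return results[node_id]

    for node_id in plan:
        run(node_id)
    return results


calls: List[str] = []


def make_stub(name: str) -> Callable[[str], str]:
    """Build a fake tool that records its invocation order."""
    def tool(prompt: str) -> str:
        calls.append(name)
        return f"{name} result"
    return tool


tools = {name: make_stub(name) for name in ("semantic-film-info", "sql-query")}
plan = {
    "rating": {"tool": "sql-query", "query": "average rating of that film", "deps": ["film"]},
    "film": {"tool": "semantic-film-info", "query": "which film is about a chamber", "deps": []},
}
plan_results = run_plan(plan, tools)
print(plan_results)
print(calls)
```

Even though `rating` is listed first, the film lookup runs before it because the rating node declares it as a dependency.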

In [48]:
combined_tool_agent = OpenAIAgent.from_tools(
    tools=[auto_retrieve_tool, sql_tool],
    verbose=True
)

Note that this agent is given the two underlying tools (`auto_retrieve_tool` and `sql_tool`) directly rather than `query_plan_tool`, so the traces that follow show direct tool calls rather than an executed query plan.

In [49]:
response = combined_tool_agent.chat("Which movie is about a chamber, and what is the average rating of the movie?")

Added user message to memory: Which movie is about a chamber, and what is the average rating of the movie?
=== Calling Function ===
Calling function: semantic-film-info with args: {"query":"movie about a chamber","filter_key_list":["title"],"filter_value_list":["Chamber"]}
Got output: Empty Response



wandb: Logged trace tree to W&B.


In [None]:
print(str(response))

In [51]:
response = combined_tool_agent.chat("What worlds do the LoTR, and Dune movies take place in?")

Added user message to memory: What worlds do the LoTR, and Dune movies take place in?
=== Calling Function ===
Calling function: semantic-film-info with args: {"query": "The Lord of the Rings", "filter_key_list": ["title"], "filter_value_list": ["The Lord of the Rings"]}
Got output: Empty Response

=== Calling Function ===
Calling function: semantic-film-info with args: {"query": "Dune", "filter_key_list": ["title"], "filter_value_list": ["Dune"]}
Got output: Empty Response



wandb: Logged trace tree to W&B.


In [52]:
print(str(response))

I couldn't retrieve the specific information about the worlds where "The Lord of the Rings" and "Dune" movies take place. Would you like me to try searching for this information again?


In [53]:
response = combined_tool_agent.chat("Which of the following movie series is considered the 'best': Harry Potter, Dune, Lord of the Rings, or The Hobbit? Based your answer on both reviews, and non-review information.")

Added user message to memory: Which of the following movie series is considered the 'best': Harry Potter, Dune, Lord of the Rings, or The Hobbit? Based your answer on both reviews, and non-review information.
=== Calling Function ===
Calling function: semantic-film-info with args: {"query": "Harry Potter", "filter_key_list": ["title"], "filter_value_list": ["Harry Potter"]}
Got output: Empty Response

=== Calling Function ===
Calling function: semantic-film-info with args: {"query": "Dune", "filter_key_list": ["title"], "filter_value_list": ["Dune"]}
Got output: Empty Response

=== Calling Function ===
Calling function: semantic-film-info with args: {"query": "The Lord of the Rings", "filter_key_list": ["title"], "filter_value_list": ["The Lord of the Rings"]}
Got output: Empty Response

=== Calling Function ===
Calling function: semantic-film-info with args: {"query": "The Hobbit", "filter_key_list": ["title"], "filter_value_list": ["The Hobbit"]}
Got output: Empty Response



wandb: Logged trace tree to W&B.


In [55]:
print(str(response))

I couldn't retrieve information about the reviews and non-review details for the movie series "Harry Potter," "Dune," "The Lord of the Rings," and "The Hobbit." Would you like me to try searching for this information again or provide information based on other criteria?
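A likely explanation for the `Empty Response` results above is the exact-match filter: the agent passed values such as `Dune` or `Chamber`, while the nodes are tagged with full titles such as `Dune (2021 film)`. One possible mitigation, sketched here as an assumption rather than as part of the original pipeline, is to map a loose title onto the exact stored titles before filtering. `candidate_titles` is a hypothetical helper built on the standard library.

```python
import difflib
from typing import List

# The exact titles used as node metadata earlier in the notebook.
KNOWN_TITLES = [
    "Dune (2021 film)",
    "Dune: Part Two",
    "Harry Potter and the Philosopher's Stone (film)",
    "Harry Potter and the Chamber of Secrets (film)",
    "The Lord of the Rings: The Fellowship of the Ring",
    "The Lord of the Rings: The Two Towers",
    "The Hobbit: An Unexpected Journey",
    "The Hobbit: The Desolation of Smaug",
]


def candidate_titles(partial: str, known: List[str] = KNOWN_TITLES) -> List[str]:
    """Map a loose title onto the exact titles stored as node metadata."""
    needle = partial.strip().lower()
    if not needle:
        return []
    # First try a case-insensitive substring match.
    contained = [t for t in known if needle in t.lower()]
    if contained:
        return contained
    # Fall back to fuzzy matching for misspellings.
    return difflib.get_close_matches(partial, known, n=3, cutoff=0.6)


print(candidate_titles("Chamber"))
print(candidate_titles("Dune"))
```

A loose value can map to more than one title (plain `Dune` matches both Dune films), so the caller still has to decide whether to query each candidate or ask the user to disambiguate.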


In [56]:
wandb_callback.finish()