# Weaviate Query Agent

Welcome to this demo notebook! Here, we'll walk you through a comprehensive example showcasing the **Weaviate Query Agent** functionality.  
The Query Agent is an intelligent layer that sits on top of your Weaviate vector database, using generative AI to optimize natural language queries and automatically determine the best search strategy.

### How the Query Agent Works

The Query Agent intelligently handles complex queries by:
- **Query Optimization**: Uses generative AI to transform natural language into optimized vector database queries
- **Collection Selection**: Automatically determines which collection(s) to search based on your question
- **Smart Filtering**: Decides when filtering is needed to narrow down results
- **Aggregation Logic**: Determines if aggregation operations should be performed (counts, grouping, etc.)
- **Execution**: Runs the optimized queries against your Weaviate instance

### Dataset Overview

This notebook demonstrates the Query Agent using three diverse collections:
- **Books**: 10,000 books with titles, authors, descriptions, and genres
- **Brands**: 104 clothing brands with descriptions, ratings, and company information  
- **Ecommerce**: 448 fashion items with prices, categories, reviews, and brand associations

All datasets come pre-vectorized with **Snowflake Arctic Embed v2.0** embeddings via Weaviate's embedding service.

### Further Reading

- Learn more about Query Agents in the [official documentation](https://docs.weaviate.io/agents/query)
- Explore the technical implementation in our [Query Agent tutorial](https://docs.weaviate.io/agents/query/tutorial-ecommerce)
- Understand vector databases in the [Weaviate developer docs](https://weaviate.io/developers/weaviate)

To ensure smooth execution and prevent potential conflicts with your global Python environment, we recommend running the code in a virtual environment. Later in this notebook, we'll guide you through setting up this environment and installing the necessary dependencies.

## Libraries/packages used 

The following libraries are used in this notebook:

* [<code style="color:blue;">weaviate-client[agents]:</code>](https://weaviate.io/developers/weaviate/client-libraries/python) The Python client for the Weaviate vector database, installed with the `agents` extra that provides the Query Agent
* [<code style="color:blue;">datasets:</code>](https://huggingface.co/docs/datasets/) Hugging Face datasets library for loading pre-vectorized data
* [<code style="color:blue;">os:</code>](https://docs.python.org/3/library/os.html) Used for environment variable management
* [<code style="color:blue;">dotenv:</code>](https://pypi.org/project/python-dotenv/) Loads environment variables from `.env` files (installed as `python-dotenv`)

The packages mentioned above are already installed on the learning portal to run this exercise. If you would like to run this code on your local machine, use the following command:

* `pip install "weaviate-client[agents]" datasets python-dotenv`

<a id='TOC'></a>  
## Table of contents  

1. <a href="#Dependencies">Dependencies</a><br>
2. <a href="#Connecting">Connecting to your Weaviate cluster</a><br>
3. <a href="#Collections">Setting up collections and loading data</a><br>
   3.1 <a href="#BrandsCollection">Create and populate Brands collection</a><br>
   3.2 <a href="#EcommerceCollection">Create and populate Ecommerce collection</a><br>
   3.3 <a href="#BooksCollection">Create and populate Books collection</a><br>
4. <a href="#QueryAgent">Creating the Query Agent</a><br>
5. <a href="#ExampleQueries">Example Query Agent interactions</a><br>
   5.1 <a href="#ClassicRAG">Classic RAG search</a><br>
   5.2 <a href="#DatabaseFiltering">Generated database filtering</a><br>
   5.3 <a href="#StatisticalQueries">Statistical and aggregation queries</a><br>
   5.4 <a href="#MultipleQueries">Querying multiple databases</a><br>
   5.5 <a href="#MissingInfo">Identifying missing information</a><br>

<a id='Dependencies'></a>  
## 1. Dependencies
[Back to table of contents](#TOC)

This section initializes the necessary dependencies and sets environment variables for connecting to your Weaviate Cloud instance.

In [92]:
import os

# set environment variables  
os.environ['WEAVIATE_URL'] = ''  # weaviate instance url  
os.environ['WEAVIATE_API_KEY'] = ''  # admin api key  

In [93]:
import dotenv

dotenv.load_dotenv(override=True)

True
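
Because the first cell sets both variables to empty strings as placeholders, a value that was never filled in only surfaces later as a connection error. The following is a minimal sketch of an earlier check. The helper name `require_env` is hypothetical and is not part of this notebook or of any library.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    values = {name: os.environ.get(name, "").strip() for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

Calling `require_env("WEAVIATE_URL", "WEAVIATE_API_KEY")` before connecting would stop the notebook at this point with a readable message instead of failing inside the client.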

<a id='Connecting'></a>  
## 2. Connecting to your Weaviate cluster  
[Back to table of contents](#TOC)  

To interact with our Weaviate cluster, we'll initialize a client object and verify the connection is successful. This connection will be used throughout the notebook for all data operations and Query Agent interactions.

In [94]:
import weaviate

# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY"))
)

# check if the connection is successful  
client.is_ready()

True

<a id='Collections'></a>  
## 3. Setting up collections and loading data  
[Back to table of contents](#TOC)  

In this section, we'll create three collections (Brands, Ecommerce, and Books) and populate them with pre-vectorized data from the [Weaviate Agents dataset on Hugging Face](https://huggingface.co/datasets/weaviate/agents). Each collection uses the Snowflake Arctic Embed v2.0 model for consistent vector embeddings.

<a id='BrandsCollection'></a>  
### 3.1 Create and populate Brands collection  
[Back to table of contents](#TOC)  

We'll start by creating the Brands collection with properties for clothing brand information, then populate it with 104 brand records from the Weaviate agents dataset.

In [95]:
# import required module for configuration
import weaviate.classes.config as wc

# delete collection if it exists
if client.collections.exists("Brands"):
    client.collections.delete("Brands")

client.collections.create(
    name="Brands",  # collection name

    # Configure the text2vec vectorizer with Weaviate's embedding model
    vector_config=wc.Configure.Vectors.text2vec_weaviate(
        model="Snowflake/snowflake-arctic-embed-l-v2.0",
        source_properties=["name", "description", "child_brands", "parent_brand"],
    ),
)

print("Successfully created collection: Brands with corrected schema.")

Successfully created collection: Brands with corrected schema.

In [96]:
# Upload brands data using the Weaviate-recommended streaming approach
from datasets import load_dataset

# Load fresh streaming dataset for upload
brands_dataset = load_dataset("weaviate/agents", "query-agent-brands", split="train", streaming=True)

brands_collection = client.collections.get("Brands")
with brands_collection.batch.fixed_size(batch_size=100) as batch:
    for item in brands_dataset:
        batch.add_object(
            properties=item["properties"],
            vector=item["vector"]
        )

print("Successfully uploaded brands data using Weaviate streaming method!")

Successfully uploaded brands data using Weaviate streaming method!
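
The upload loop above is repeated almost verbatim for each collection, and it prints a success message without checking whether any object was rejected. As a sketch only, the loop could be wrapped in a helper that also inspects the batch's failure list. The name `upload_items` is hypothetical, and the use of `collection.batch.failed_objects` after the batch context exits is an assumption about the Python client that should be checked against the installed version.

```python
from typing import Any, Iterable


def upload_items(collection: Any, items: Iterable[dict], batch_size: int = 100) -> int:
    """Upload pre-vectorized items in fixed-size batches and return how many were queued.

    `collection` is expected to behave like the object returned by
    client.collections.get(...) in the cell above.
    """
    queued = 0
    with collection.batch.fixed_size(batch_size=batch_size) as batch:
        for item in items:
            batch.add_object(properties=item["properties"], vector=item["vector"])
            queued += 1

    # Assumption: objects that failed to import are listed on
    # collection.batch.failed_objects once the batch context has exited.
    failed = getattr(collection.batch, "failed_objects", [])
    if failed:
        raise RuntimeError(f"{len(failed)} of {queued} objects failed to import")
    return queued
```

With this helper, the Brands upload would read `upload_items(brands_collection, brands_dataset)`, and the same call shape would apply to the other two collections.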


<a id='EcommerceCollection'></a>  
### 3.2 Create and populate Ecommerce collection  
[Back to table of contents](#TOC)  

Next, we'll create the Ecommerce collection for fashion items with properties including prices, categories, reviews, and brand associations, then load 448 product records.

In [97]:
# import required module for configuration
import weaviate.classes.config as wc

# delete collection if it exists
if client.collections.exists("Ecommerce"):
    client.collections.delete("Ecommerce")

client.collections.create(
    name="Ecommerce",  # collection name

    # Configure the text2vec vectorizer with Weaviate's embedding model
    vector_config=wc.Configure.Vectors.text2vec_weaviate(
        model="Snowflake/snowflake-arctic-embed-l-v2.0",
        source_properties=["name", "description", "collection", "category", "subcategory", "brand", "colors", "tags", "reviews"],
    ),
)

print("Successfully created collection: Ecommerce with corrected schema.")

Successfully created collection: Ecommerce with corrected schema.

In [98]:
# Load fresh streaming dataset for upload
ecommerce_dataset = load_dataset("weaviate/agents", "query-agent-ecommerce", split="train", streaming=True)

ecommerce_collection = client.collections.get("Ecommerce")
with ecommerce_collection.batch.fixed_size(batch_size=100) as batch:
    for item in ecommerce_dataset:
        batch.add_object(
            properties=item["properties"],
            vector=item["vector"]
        )

print("Successfully uploaded ecommerce data using Weaviate streaming method!")

Successfully uploaded ecommerce data using Weaviate streaming method!


<a id='BooksCollection'></a>  
### 3.3 Create and populate Books collection  
[Back to table of contents](#TOC)  

Finally, we'll create the Books collection with properties for titles, authors, descriptions, and genres, then populate it with 10,000 book records from various genres including mystery, fiction, and non-fiction.

In [99]:
# import required module for configuration
import weaviate.classes.config as wc

# delete collection if it exists
if client.collections.exists("Books"):
    client.collections.delete("Books")

client.collections.create(
    name="Books",  # collection name

    # Configure the text2vec vectorizer with Weaviate's embedding model
    vector_config=wc.Configure.Vectors.text2vec_weaviate(
        model="Snowflake/snowflake-arctic-embed-l-v2.0",
        source_properties=["title", "author", "description", "genres"],
    ),
)

print("Successfully created collection: Books with corrected schema.")

Successfully created collection: Books with corrected schema.

In [100]:
# Load fresh streaming dataset for upload
books_dataset = load_dataset("weaviate/agents", "query-agent-books", split="train", streaming=True)

books_collection = client.collections.get("Books")
with books_collection.batch.fixed_size(batch_size=100) as batch:
    for item in books_dataset:
        batch.add_object(
            properties=item["properties"],
            vector=item["vector"]
        )

print("Successfully uploaded books data using Weaviate streaming method!")

Successfully uploaded books data using Weaviate streaming method!


<a id='QueryAgent'></a>  
## 4. Creating the Query Agent  
[Back to table of contents](#TOC)  

Now we'll create our Query Agent instance, which will intelligently route queries across our three collections (Brands, Ecommerce, Books) and automatically determine the optimal search strategy.

The output from the Query Agent includes the original query, the generated answer, the searches performed, the collections searched, the filters applied, the aggregations completed, the source objects retrieved from the database that contributed to the generated answer, and any missing information if the generated answer is incomplete.

In [101]:
from weaviate_agents.query import QueryAgent

# Instantiate a new agent object, and specify the collections to query
qa = QueryAgent(
    client=client, 
    collections=["Brands", "Ecommerce", "Books"],  # names of the collections the agent may search
)


<a id='ExampleQueries'></a>  
## 5. Example Query Agent interactions  
[Back to table of contents](#TOC)  

Let's explore the Query Agent's capabilities through various types of queries. Notice how the agent automatically determines which collections to search, applies filters, and performs aggregations based on the natural language input.

<a id='ClassicRAG'></a>  
### 5.1 Classic RAG search  
[Back to table of contents](#TOC)  

The Query Agent can act as a classic RAG system. It first runs a generative step that rewrites the original question into a query optimized for the vector database, then generates an answer from the objects it retrieves.



In [102]:
# Perform a query
response = qa.run(
    "Are there any books about King Arthur or Knights that I should read?"
)

# Print the response
# The output will include: the original query, the generated answer to the query, the searches performed,
# the collections searched, filters applied, aggregates completed and source objects pulled from the database.
response.display()
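
`response.display()` prints everything at once. When only a few fields are needed, for example for logging, they can be read off the response object directly. The sketch below is hedged: the attribute names (`final_answer`, `collection_names`, `searches`, `aggregations`, `sources`) are assumptions about the response object and should be checked against the installed `weaviate-agents` version, and `summarize_response` is a hypothetical helper. `getattr` with a default keeps the function from failing if a name differs.

```python
from typing import Any


def summarize_response(response: Any) -> dict[str, Any]:
    """Collect a few headline fields from a Query Agent response into a plain dict."""
    return {
        # Assumed attribute names; a missing attribute falls back to a default.
        "answer": getattr(response, "final_answer", None),
        "collections": list(getattr(response, "collection_names", []) or []),
        "num_searches": len(getattr(response, "searches", []) or []),
        "num_aggregations": len(getattr(response, "aggregations", []) or []),
        "num_sources": len(getattr(response, "sources", []) or []),
    }
```

Calling `summarize_response(response)` after `qa.run(...)` would give a compact record of which collections were used and how many searches and sources fed the answer.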





<a id='DatabaseFiltering'></a>  
### 5.2 Generated database filtering   
[Back to table of contents](#TOC)  

The Query Agent can determine which filters are appropriate and apply them to the objects returned by the vector database query. This enables dynamic filtering in RAG systems, a step that is normally done by hand.


In [103]:
# Perform a query
response = qa.run(
    "I am in the mood for a good mystery book, what should I read?"
)

# Print the response
response.display()





<a id='StatisticalQueries'></a>  
### 5.3 Statistical and aggregation queries  
[Back to table of contents](#TOC)  

The Query Agent automatically recognizes when aggregation is needed and performs complex statistical operations like counting and grouping data.

In [104]:
# Perform a query
response = qa.run(
    "Which author has the most books in my collection?"
)

# Print the response
response.display()
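
To make concrete what "counting and grouping" means for a question like this one, the pure-Python sketch below computes the same kind of answer over a few records. The sample rows and the helper name `top_author` are invented for illustration and are not taken from the Books dataset. This is not how the agent is implemented, since it issues its aggregation against the Weaviate instance rather than in client code.

```python
from collections import Counter


def top_author(books: list[dict]) -> tuple[str, int]:
    """Group records by author and return the author with the highest count."""
    counts = Counter(book["author"] for book in books)
    author, count = counts.most_common(1)[0]
    return author, count


# Hypothetical sample rows, not taken from the Books dataset.
sample_books = [
    {"title": "Book A", "author": "Author X"},
    {"title": "Book B", "author": "Author Y"},
    {"title": "Book C", "author": "Author X"},
]
print(top_author(sample_books))  # prints ('Author X', 2)
```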





<a id='MultipleQueries'></a>  
### 5.4 Querying multiple databases  
[Back to table of contents](#TOC)  

The Query Agent inspects collection descriptions and property data to decide which collection or collections to query, so that the generated response draws on the right context.

In [105]:
# Perform a query
response = qa.run(
    """I am looking to buy a cool new jean jacket or light coat at or below $150. 
    I want a trendy but also timeless look but can also hold up and last me a long time. 
    If possible, I want to purchase something from a new company that might get me a 
    sponsorship for my social media account."""
)
# Print the response
response.display()





<a id='MissingInfo'></a>  
### 5.5 Identifying missing information  
[Back to table of contents](#TOC)  

When the Query Agent can only partially answer a query, it reports which requested information was not found in the database.

In [106]:
# Perform a query
response = qa.run(
    """I want to buy a new leather jacket but I am vegan and only buy from socially responsible brands."""
)
# Print the response
response.display()
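
In an application, a partial answer usually needs to be handled explicitly rather than read off the display output. The sketch below shows one way to do that. It assumes the response exposes `is_partial_answer` (a boolean) and `missing_information` (a list of strings); both names are assumptions to verify against the installed version, and `missing_info_report` is a hypothetical helper.

```python
from typing import Any


def missing_info_report(response: Any) -> str:
    """Return a short note describing whether the answer was complete."""
    # Assumed attribute names: is_partial_answer and missing_information.
    partial = bool(getattr(response, "is_partial_answer", False))
    gaps = list(getattr(response, "missing_information", []) or [])
    if not partial and not gaps:
        return "Answer is complete."
    lines = ["Answer is partial. Missing information:"]
    lines.extend(f"- {gap}" for gap in gaps)
    return "\n".join(lines)
```

For the query above, `print(missing_info_report(response))` would list whatever the agent could not find, which is what the query is designed to provoke.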





In [107]:
# close the Weaviate client to free up resources
client.close()
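
A final `client.close()` cell only runs if every earlier cell succeeds. Outside a notebook, the standard-library `contextlib.closing` wrapper guarantees that `close()` is called even when the body raises. The sketch below demonstrates the pattern with `DemoClient`, a hypothetical stand-in class, so that it runs without a Weaviate connection.

```python
from contextlib import closing


class DemoClient:
    """Hypothetical stand-in for a client object that exposes close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


demo = DemoClient()
try:
    with closing(demo) as client_like:
        raise ValueError("simulated failure in a query cell")
except ValueError:
    pass

print(demo.closed)  # prints True: close() ran even though the body raised
```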