## Lesson 2: Building Agents to Simplify Vector Database Query Tasks

<a target="_blank" href="https://colab.research.google.com/github/saskinosie/Building-AI-Agents-for-Vector-Database-Search/blob/main/2-building-agents-complete.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Now that we understand vector databases and how to query them, let's build some helpers!

## Install Pydantic AI

We will use Pydantic AI to build agents that help us with our vector searches. First, though, we will build a few simple agents with straightforward tasks.

In [7]:
#!pip install pydantic-ai

## Create an agent that likes to keep the conversation going

### This simple agent will feign interest in your conversation and ask follow-up questions to keep it going.

In [8]:
from pydantic_ai import Agent

agent = Agent(
    model="openai:gpt-4o-mini",
    instructions="""
    You are a helpful assistant happily answering user questions.
    You enjoy the conversation so much, you always want to leave them with another relevant question.
    """,
)

In [9]:
result = await agent.run("What is the best video gaming system ever made?")
print(result.output)

The title of "best video gaming system ever made" is highly subjective and often depends on personal preferences, gaming style, and nostalgia. Some popular contenders include:

1. **Super Nintendo Entertainment System (SNES)** - Celebrated for its classic games and strong library, including titles like "Super Mario World" and "The Legend of Zelda: A Link to the Past."
2. **Sony PlayStation 2** - Known for its vast library of games and DVD playback feature, it remains one of the best-selling consoles of all time.
3. **Xbox 360** - Praised for its online service, Xbox Live, and a great selection of games like "Halo" and "Gears of War."
4. **Nintendo Switch** - It offers unique hybrid gaming experiences and has a strong library of first-party titles like "The Legend of Zelda: Breath of the Wild" and "Animal Crossing: New Horizons."
5. **PC Gaming** - While not a console, many consider high-end gaming PCs to be the ultimate gaming platform due to their versatility and performance.

Ultimat

### As it stands, though, this agent does not remember what we were just talking about.

In [10]:
result = await agent.run("I do not have any prior experience so I was wondering which you thought was the best. Can you make a recommendation?")
print(result.output)

Sure! To give the best recommendation, it would be helpful to know what area you're interested in. Are you looking for a job, a hobby to pick up, or perhaps a course or skill to learn? Let me know what you're thinking!


## Create an agent with memory

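### Let's help our agent out and give it some memory.

We first ask the opening question again, then pass that run's messages back in through `message_history` on the follow-up.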

In [11]:
result = await agent.run("What is the best video gaming system ever made?")
print(result.output)

The title of "best video gaming system ever made" can be quite subjective and varies based on personal preferences, nostalgia, and the types of games someone enjoys. However, some popular contenders often mentioned include:

1. **Super Nintendo Entertainment System (SNES)**: Renowned for its classic games like "Super Mario World," "The Legend of Zelda: A Link to the Past," and "Super Metroid."
2. **PlayStation 2**: Widely praised for its extensive library and backward compatibility with PlayStation 1 games.
3. **Nintendo Switch**: Known for its versatility, allowing for handheld and docked gaming, plus a strong lineup of exclusive titles.
4. **Xbox 360**: Highly regarded for its online services and influential titles like "Halo 3" and "Gears of War."
5. **PC Gaming**: While not a traditional console, many argue that gaming on a PC offers the best graphics, performance, and customization.

Ultimately, the best system can depend on the types of games you enjoy, whether you prioritize exc

In [12]:
result = await agent.run("I do not have any prior experience so I was wondering which you thought was the best. Can you make a recommendation?", 
                            message_history=result.all_messages())
print(result.output)

If you're new to gaming and looking for a recommendation, the **Nintendo Switch** is often a great choice for beginners. Here's why:

1. **Versatility**: You can play it at home on your TV or take it with you as a handheld device, making it very user-friendly.
2. **Family-Friendly Games**: The Switch has a wide range of games suitable for all ages, including popular titles like "Animal Crossing: New Horizons," "Super Mario Odyssey," and "The Legend of Zelda: Breath of the Wild."
3. **Casual Gaming**: Many games on the platform are easy to pick up and play, perfect for newcomers.
4. **Social Features**: It has fun multiplayer games and encourages local co-op play, so you can enjoy gaming with friends or family.

If you're considering getting started with gaming, the Switch might be the best way to ease into it! Do you have any specific genres or types of games that interest you?


### Great! You have now created your first agent with memory!
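
To carry memory across more than two turns, one option is to wrap the agent in a small session object that always passes the latest `all_messages()` back in as `message_history`. The sketch below is an illustration, not part of the original notebook: `ChatSession`, `StubAgent`, and `StubResult` are hypothetical names, and the stub only mimics the `run(...)`, `.output`, and `.all_messages()` shape used above so the example runs without an API key.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubResult:
    """Hypothetical stand-in for the result object returned by agent.run()."""
    output: str
    messages: list

    def all_messages(self):
        return self.messages


class StubAgent:
    """Offline stand-in for an agent, so the sketch runs without an API key."""

    async def run(self, prompt, message_history=None):
        history = list(message_history or [])
        reply = f"(reply #{len(history) // 2 + 1}) you said: {prompt}"
        # A real agent would return model messages; here we just append strings.
        return StubResult(output=reply, messages=history + [prompt, reply])


@dataclass
class ChatSession:
    """Carries the message history forward so every turn sees the earlier ones."""
    agent: object
    history: list = field(default_factory=list)

    async def ask(self, prompt):
        # Pass None on the first turn, the accumulated messages afterwards.
        result = await self.agent.run(prompt, message_history=self.history or None)
        self.history = result.all_messages()
        return result.output


async def demo():
    session = ChatSession(StubAgent())
    await session.ask("What is the best video gaming system ever made?")
    await session.ask("Which one would you recommend for a beginner?")
    return session


session = asyncio.run(demo())
print(len(session.history))
```

In the notebook you would pass the real `agent` instead of `StubAgent()` and call `await session.ask(...)` directly, since Jupyter already runs an event loop.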

### Now let's pull in the collection we created in the last notebook and create some agents that will help us with vector database queries.

## Reconnecting to Weaviate

In [26]:
import os
from dotenv import load_dotenv

load_dotenv()

WEAVIATE_URL = os.getenv("WEAVIATE_URL")
WEAVIATE_KEY = os.getenv("WEAVIATE_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


print("Weaviate URL:", WEAVIATE_URL)
print("Weaviate API Key:", WEAVIATE_KEY[:10])
print("OpenAI API Key:", OPENAI_API_KEY[:10])

Weaviate URL: rwxzavyuspepzg2fkhjag.c0.us-west3.gcp.weaviate.cloud
Weaviate API Key: 7Tdl1PKHIc
OpenAI API Key: sk-proj-iu
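
If any of these variables is missing from the `.env` file, `os.getenv` returns `None` and the slicing in the print statements above raises a `TypeError`. A minimal sketch of a friendlier check follows; the helper names `missing_env_vars` and `mask` are hypothetical and not part of the original notebook.

```python
import os

# The three variables this notebook reads from the environment.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_KEY", "OPENAI_API_KEY")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    if not secret:
        return "<not set>"
    return secret[:visible] + "..."


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("OpenAI API Key:", mask(os.getenv("OPENAI_API_KEY")))
```

Checking up front gives a readable message instead of a traceback, and masking keeps most of each key out of saved notebook output.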


In [27]:
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WEAVIATE_URL,
    auth_credentials=Auth.api_key(WEAVIATE_KEY),
    headers = {
        "X-OpenAI-Api-Key": OPENAI_API_KEY
    },
)

print("Client ready:", client.is_ready())

Client ready: True


In [28]:
# Check our collection from before
contracts = client.collections.use("FinancialContract")
contracts_config = contracts.config.get()

print(contracts_config)

_CollectionConfig(name='FinancialContract', description='A collection of financial contracts with terms, conditions, and legal clauses', generative_config=_GenerativeConfig(generative=<GenerativeSearches.OPENAI: 'generative-openai'>, model={'model': 'gpt-4o-mini'}), inverted_index_config=_InvertedIndexConfig(bm25=_BM25Config(b=0.75, k1=1.2), cleanup_interval_seconds=60, index_null_state=False, index_property_length=False, index_timestamps=False, stopwords=_StopwordsConfig(preset=<StopwordsPreset.EN: 'en'>, additions=None, removals=None)), multi_tenancy_config=_MultiTenancyConfig(enabled=False, auto_tenant_creation=False, auto_tenant_activation=False), properties=[_Property(name='author', description="This property was generated by Weaviate's auto-schema feature on Mon Nov  3 18:23:07 2025", data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, 

### Let's start by creating an agent that generates optimized vector database queries from our natural language queries.

In [16]:
from pydantic_ai import Agent

# Simple query optimizer and search agent
search_agent = Agent(
    model="openai:gpt-4o-mini",
    # Tell the agent exactly what we want it to do, using concise instructions.
    instructions="You optimize queries for contract search, then search and return results.",
)

async def search_(user_query):
    # Optimize: send our query to the agent and get back an optimized query.
    result = await search_agent.run(f"Optimize this query for contract search: {user_query}")
    optimized_query = result.output

    # Search: query our collection using the optimized query the agent created.
    response = contracts.query.near_text(query=optimized_query, limit=3)

    # Show results
    for i, contract in enumerate(response.objects):
        print(f"Contract {i+1}: {contract.properties['contract_text']}")

    return response.objects



In [17]:
# Test it
results = await search_("I need job contract info")

Contract 1: EMPLOYMENT CONTRACT

This Employment Contract ("Contract") is made and entered into as of the 15th day of March, 2023, by and between Weaviate, located at 123 Innovation Drive, Tech City, and Mark Robson, residing at 456 Elm Street, Hometown.

1. POSITION
Mark Robson is hereby employed as a Software Engineer and will report directly to the Head of Engineering.

2. TERM
The term of this Contract shall commence on March 15, 2023, and shall continue for a period of two (2) years unless terminated earlier in accordance with the provisions of this Contract.

3. SALARY
As compensation for services rendered, the Employee shall receive an annual salary of $372,000, payable in monthly installments.

4. BENEFITS
The Employee is entitled to benefits including health insurance, paid time off, and a yearly bonus based on performance evaluations. The annual bonus may vary, but can range up to 8.08% of the employee's salary based on company performance.

5. CONFIDENTIALITY
The Employee ag

### It is that easy! We now have an agent that takes our natural language query, creates a query optimized for the vector database, sends that query to the database, and returns the results.

### Let's take it to the next level and build an agent that "looks" at our collection properties, decides from our natural language query which properties to filter on, and builds the filter(s) into our query.

In [18]:
from weaviate.classes.query import Filter
from datetime import datetime, timezone

# Get collection properties so we can pass them to the agent
collection_config = contracts.config.get()
properties = {prop.name: str(prop.data_type) for prop in collection_config.properties}

# Smart filtering agent that creates filters automatically
# We instruct the agent to generate filters for us and give it examples
smart_filter_agent = Agent(
    model="openai:gpt-4o-mini",
    instructions=f"""
    You analyze queries and automatically create Weaviate filters.
    Collection properties: {properties}
    
    Based on the user query, generate Python code that creates Filter objects.
    Only return the filter code, nothing else.
    
    Examples:
    - For "Jane Doe contracts": Filter.by_property("author").equal("Jane Doe")
    - For "recent employment contracts": Filter.by_property("contract_type").equal("employment contract") & 
Filter.by_property("date").greater_than(datetime(2023, 6, 1, tzinfo=timezone.utc))
    - For "short contracts": Filter.by_property("contract_length").less_than(2)
    
    If no filters needed, return: None
    """,
)

async def auto_filtered_search(user_query):
    # Get filter code from the agent: we pass it the query and get back the filters for it
    filter_result = await smart_filter_agent.run(f"Create filters for: {user_query}")
    filter_code = filter_result.output.strip()

    print(f"Query: {user_query}")
    print(f"Generated filter: {filter_code}")

    # Execute the filter code.
    # Note: eval() runs whatever the model returned, so only use it with trusted input.
    query_filters = None
    if filter_code != "None":
        try:
            query_filters = eval(filter_code)
        except Exception:
            print("Filter creation failed, searching without filters")

    # Search with auto-generated filters
    # Now we pass in the agent created filters using the filter parameter
    response = contracts.query.near_text(
        query=user_query,
        filters=query_filters,
        limit=3
    )

    print(f"\nFound {len(response.objects)} contracts:")
    for i, contract in enumerate(response.objects):
        print(f"Contract {i+1}: {contract.properties['contract_type']} by {contract.properties['author']} on {contract.properties['date']}")

    return response.objects

In [19]:
results = await auto_filtered_search("Show me Jane Doe's recent employment contracts after 2022")

Query: Show me Jane Doe's recent employment contracts after 2022
Generated filter: Filter.by_property("author").equal("Jane Doe") & 
Filter.by_property("contract_type").equal("employment contract") & 
Filter.by_property("date").greater_than(datetime(2023, 1, 1, tzinfo=timezone.utc))
Filter creation failed, searching without filters

Found 3 contracts:
Contract 1: employment contract by Jane Doe on 2023-03-15 10:30:00+00:00
Contract 2: employment contract by Jane Doe on 2023-03-15 09:00:00+00:00
Contract 3: service agreement by Jane Doe on 2023-03-15 10:45:00+00:00
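
Notice that the run above reports "Filter creation failed" and the third result is a service agreement, so the search ran unfiltered. The generated expression is spread over several lines (mirroring the line-wrapped example in the prompt), and `eval` rejects a line that ends in `&` outside parentheses with a `SyntaxError`. One possible fix, sketched here with hypothetical helper names `normalize_filter_code` and `is_valid_expression`, is to join the lines before evaluating. Only the parsing step is shown, so the sketch runs without a Weaviate connection.

```python
def normalize_filter_code(code: str) -> str:
    """Join a multi-line filter expression into one line so eval() can parse it."""
    lines = [line.strip() for line in code.strip().splitlines()]
    return " ".join(line for line in lines if line)


def is_valid_expression(code: str) -> bool:
    """Check whether the text parses as a single Python expression (nothing is executed)."""
    try:
        compile(code, "<filter>", "eval")
        return True
    except SyntaxError:
        return False


# Same shape as the agent output shown above: a line break after each "&".
generated = (
    'Filter.by_property("author").equal("Jane Doe") & \n'
    'Filter.by_property("contract_type").equal("employment contract")'
)

print(is_valid_expression(generated))
print(is_valid_expression(normalize_filter_code(generated)))
```

Inside `auto_filtered_search`, this would mean calling `eval(normalize_filter_code(filter_code))` instead of `eval(filter_code)`. Keeping each example in the prompt on a single line should also make multi-line output less likely.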


## Combining Query Optimization with Dynamic Filtering

Now let's build an agent that combines both techniques: optimizing the search query AND dynamically generating filters. This is where agents really shine - they can handle complex, natural language requests and translate them into optimal database queries.

In [20]:
from weaviate.classes.query import MetadataQuery

# Combined agent that does BOTH query optimization AND dynamic filtering
combined_agent = Agent(
    model="openai:gpt-4o-mini",
    instructions=f"""
    You are an intelligent database query assistant that performs TWO tasks:
    
    1. OPTIMIZE the user's natural language query for better vector search results
    2. GENERATE appropriate Weaviate filters based on the query context
    
    Collection properties available: {properties}
    
    Return your response in this exact format:
    OPTIMIZED_QUERY: <the optimized search query>
    FILTERS: <Python filter code or None>
    
    Filter Examples:
    - Author filter: Filter.by_property("author").equal("Jane Doe")
    - Date range: Filter.by_property("date").greater_than(datetime(2023, 6, 1, tzinfo=timezone.utc))
    - Contract type: Filter.by_property("contract_type").equal("employment contract")
    - Combined: Filter.by_property("author").equal("Edward Elric") & Filter.by_property("date").greater_than(datetime(2023, 1, 1, tzinfo=timezone.utc))
    """,
)

async def smart_search_with_optimization(user_query):
    """
    This function demonstrates the power of combining multiple AI capabilities:
    - Query optimization for better semantic search
    - Dynamic filter generation for precise results
    """
    
    print(f"🔍 Original user query: '{user_query}'")
    print("\n" + "="*70)
    
    # Step 1: Get both optimized query AND filters from the agent
    agent_result = await combined_agent.run(f"Process this query: {user_query}")
    agent_output = agent_result.output
    
    # Step 2: Parse the agent's response
    lines = agent_output.strip().split('\n')
    optimized_query = None
    filter_code = None
    
    for line in lines:
        if line.startswith('OPTIMIZED_QUERY:'):
            optimized_query = line.replace('OPTIMIZED_QUERY:', '').strip()
        elif line.startswith('FILTERS:'):
            filter_code = line.replace('FILTERS:', '').strip()
    
    print(f"✨ Optimized query: '{optimized_query}'")
    print(f"🎯 Generated filters: {filter_code}")
    print("="*70 + "\n")
    
    # Step 3: Execute the filter code
    query_filters = None
    if filter_code and filter_code != "None":
        try:
            query_filters = eval(filter_code)
        except Exception as e:
            print(f"⚠️  Filter generation failed: {e}")
            print("Proceeding with optimized query only...\n")
    
    # Step 4: Execute the search with BOTH optimizations
    response = contracts.query.near_text(
        query=optimized_query if optimized_query else user_query,
        filters=query_filters,
        limit=3,
        return_metadata=MetadataQuery(distance=True)
    )
    
    # Step 5: Display results
    print(f"📊 Found {len(response.objects)} contracts:\n")
    for i, contract in enumerate(response.objects, 1):
        print(f"Contract {i}:")
        print(f"  Type: {contract.properties['contract_type']}")
        print(f"  Author: {contract.properties['author']}")
        print(f"  Date: {contract.properties['date']}")
        print(f"  Relevance Score: {1 - contract.metadata.distance:.4f}")
        print(f"  Preview: {contract.properties['contract_text'][:150]}...")
        print()
    
    return response.objects
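
The line-by-line parsing in Step 2 keeps only the text on the same line as each label, so a filter expression that wraps onto a second line would be cut short. A more tolerant parser could look like the sketch below; `parse_agent_output` is a hypothetical helper, not part of the original notebook, and it assumes the agent sticks to the two labels from the prompt.

```python
LABELS = ("OPTIMIZED_QUERY", "FILTERS")


def parse_agent_output(text):
    """Extract the two labelled fields, tolerating values that wrap across lines."""
    fields = {label: [] for label in LABELS}
    current = None
    for raw_line in text.strip().splitlines():
        line = raw_line.strip()
        # A line that starts with a label switches the field we are collecting.
        for label in LABELS:
            prefix = label + ":"
            if line.startswith(prefix):
                current = label
                line = line[len(prefix):].strip()
                break
        # Any other non-empty line is treated as a continuation of the current field.
        if current is not None and line:
            fields[current].append(line)

    optimized_query = " ".join(fields["OPTIMIZED_QUERY"]) or None
    filter_code = " ".join(fields["FILTERS"]) or None
    if filter_code == "None":
        filter_code = None
    return optimized_query, filter_code


sample = (
    "OPTIMIZED_QUERY: employment contracts authored by Jane Doe\n"
    'FILTERS: Filter.by_property("author").equal("Jane Doe") &\n'
    '    Filter.by_property("contract_type").equal("employment contract")\n'
)
query, filters = parse_agent_output(sample)
print(query)
print(filters)
```

Continuation lines are appended to whichever field was seen last, so a wrapped filter arrives at `eval` as one single-line expression.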

### Example 1: Complex natural language query

Let's test with a messy, conversational query that needs both optimization and filtering:

In [29]:
# Example 1: Messy query that mentions author, timeframe, and topic
results = await smart_search_with_optimization(
    "I want to find stuff that Jane Doe wrote about jobs and employment, but only the recent ones"
)

🔍 Original user query: 'I want to find stuff that Jane Doe wrote about jobs and employment, but only the recent ones'

✨ Optimized query: 'find documents authored by Jane Doe related to jobs and employment, focusing on recent content'
🎯 Generated filters: Filter.by_property("author").equal("Jane Doe") & Filter.by_property("contract_type").equal("employment") & Filter.by_property("date").greater_than(datetime(2023, 1, 1, tzinfo=timezone.utc))

📊 Found 2 contracts:

Contract 1:
  Type: employment contract
  Author: Jane Doe
  Date: 2023-03-15 10:30:00+00:00
  Relevance Score: 0.4211
  Preview: EMPLOYMENT CONTRACT

This Employment Contract ("Contract") is entered into as of the 15th day of March, 2023, by and between Weaviate, a corporation o...

Contract 2:
  Type: employment contract
  Author: Jane Doe
  Date: 2023-03-15 09:00:00+00:00
  Relevance Score: 0.3936
  Preview: EMPLOYMENT CONTRACT

This Employment Contract ("Contract") is made and entered into as of the 15th day of

### Example 2: Query with implicit filtering needs

Notice how the agent can infer filters even when they're not explicitly stated:

In [22]:
# Example 2: Implicit filter requirements
results = await smart_search_with_optimization(
    "Show me short contracts about partnerships written by Edward Elric"
)

🔍 Original user query: 'Show me short contracts about partnerships written by Edward Elric'

✨ Optimized query: 'Find short partnership contracts authored by Edward Elric'
🎯 Generated filters: Filter.by_property("author").equal("Edward Elric") & Filter.by_property("contract_type").equal("partnership") & Filter.by_property("contract_length").less_than(30)

📊 Found 2 contracts:

Contract 1:
  Type: partnership agreement
  Author: Edward Elric
  Date: 2023-03-15 10:30:00+00:00
  Relevance Score: 0.4752
  Preview: Partnership Agreement

This Partnership Agreement ("Agreement") is made and entered into on March 15, 2023, by and between Weaviate, a corporation dul...

Contract 2:
  Type: partnership agreement
  Author: Edward Elric
  Date: 2023-03-15 14:30:00+00:00
  Relevance Score: 0.4253
  Preview: PARTNERSHIP AGREEMENT

This Partnership Agreement is made and entered into as of March 15, 2023, by and between Weaviate, a corporation organized unde...



## Key Takeaways

**What we've demonstrated:**
1. ✅ **Agents simplify complexity** - A single natural language query handles multiple operations
2. ✅ **Combining techniques is powerful** - Query optimization + dynamic filtering = better results
3. ✅ **Agents are not magic** - They're structured prompts + smart parsing + API calls
4. ✅ **Building blocks are simple** - Each component is straightforward, but together they're powerful

**The agent advantage:**
- **Before agents**: You'd need to manually write filter code, optimize queries, and chain operations
- **With agents**: Natural language → optimized database query with filters automatically

This is the foundation for building production RAG systems that actually work!
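
One caveat before taking this pattern toward production: both search functions pass model-generated text to `eval`, which will run any Python the model happens to return. A possible safeguard, sketched below under the assumption that filters only ever use `Filter`, `datetime`, and `timezone`, is to check the expression against an allow-list with the standard `ast` module first. `is_safe_filter_code` and the allow-lists are hypothetical and deliberately conservative, so some legitimate expressions (negative numbers, for example) are rejected too.

```python
import ast

# Names the generated filter code is allowed to reference (an assumption for this sketch).
ALLOWED_NAMES = {"Filter", "datetime", "timezone"}

# Syntax node types that can appear in a simple chained filter expression.
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Attribute, ast.Name, ast.Load,
    ast.Constant, ast.BinOp, ast.BitAnd, ast.BitOr, ast.keyword,
    ast.List, ast.Tuple,
)


def is_safe_filter_code(code: str) -> bool:
    """Return True only if the code is an expression built from allow-listed pieces."""
    try:
        tree = ast.parse(code, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False
        # Block dunder and private attribute access such as .__class__
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True


good = (
    'Filter.by_property("author").equal("Jane Doe") & '
    'Filter.by_property("date").greater_than(datetime(2023, 1, 1, tzinfo=timezone.utc))'
)
bad = '__import__("os").system("echo unsafe")'

print(is_safe_filter_code(good))
print(is_safe_filter_code(bad))
```

A caller would evaluate the filter only when the check passes, ideally with a restricted namespace such as `eval(code, {"__builtins__": {}, "Filter": Filter, "datetime": datetime, "timezone": timezone})`. This narrows what generated code can do but is not a complete sandbox.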

In [24]:
client.close()