In [1]:
# To run in Google Colab, uncomment the next line
# !pip install weaviate-client datasets python-dotenv openai


## Lesson 1:  Creating a Vector Database and Exploring Queries

<a target="_blank" href="https://colab.research.google.com/github/saskinosie/dsd-building-ai-agents-with-vector-db/blob/main/1-vector-database-queries.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Before we create an agent that can help us with our vector database queries, let's figure out what we might need help with.

## Get keys and URLs to connect to the Weaviate Client

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

WEAVIATE_URL = os.getenv("WEAVIATE_URL")
WEAVIATE_KEY = os.getenv("WEAVIATE_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


print("Weaviate URL:", WEAVIATE_URL)
print("Weaviate API Key:", WEAVIATE_KEY[:10])
print("OpenAI API Key:", OPENAI_API_KEY[:10])

Weaviate URL: rwxzavyuspepzg2fkhjag.c0.us-west3.gcp.weaviate.cloud
Weaviate API Key: 7Tdl1PKHIc
OpenAI API Key: sk-proj-iu
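
If one of these variables is missing from your `.env` file, `os.getenv` returns `None` and the slice `[:10]` raises a `TypeError`. Here is a minimal sketch of a safer preview, assuming a hypothetical helper named `preview_secret` (it is not part of the notebook or of any library):

```python
import os


def preview_secret(value, visible=4):
    """Return a short preview of a secret, or a placeholder if it is unset."""
    if not value:
        return "<not set>"
    return value[:visible] + "..."


# os.getenv returns None when the variable is missing, which the helper tolerates
print("Weaviate API Key:", preview_secret(os.getenv("WEAVIATE_KEY")))
print("OpenAI API Key:", preview_secret(os.getenv("OPENAI_API_KEY")))
```

Showing fewer characters also reduces how much of a key ends up in saved notebook output.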


## Connect to Weaviate

Pass in your Weaviate Cloud URL and API key. The OpenAI key goes in the request headers.

In [3]:
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WEAVIATE_URL,
    auth_credentials=Auth.api_key(WEAVIATE_KEY),
    headers = {
        "X-OpenAI-Api-Key": OPENAI_API_KEY
    },
)

print("Client ready:", client.is_ready())

Client ready: True


## Load the financial contracts dataset

Let's load the pre-vectorized financial contracts dataset from [HuggingFace](https://huggingface.co/datasets/weaviate/agents/viewer/query-agent-financial-contracts). This is the data set we will load into our Weaviate collection.

This data set comes with vectors already created by the Snowflake/snowflake-arctic-embed-l-v2.0 embedding model. When we upload data to Weaviate, the embeddings are created for us by default, but since we already have them, we will upload them along with the original data to save time.

In [4]:
from datasets import load_dataset

# Load the financial contracts dataset
dataset = load_dataset(
    "weaviate/agents", 
    "query-agent-financial-contracts", 
    split="train", 
    streaming=True
)

# Let's examine the first few items
print("Dataset loaded successfully!")
print("\n--- Sample contract data ---")

for i, item in enumerate(dataset):
    if i >= 2:  # Just show 2 examples
        break
    print(f"\nContract {i+1}:")
    print("Properties:", item["properties"])
    print("Vector length:", len(item["vector"]) if item["vector"] else "No vector")

Dataset loaded successfully!

--- Sample contract data ---

Contract 1:
Properties: {'date': '2023-03-15T14:30:00+00:00', 'contract_type': 'partnership agreement', 'author': 'Arthur Penndragon', 'contract_length': 3, 'doc_id': 53, 'contract_text': 'PARTNERSHIP AGREEMENT\n\nThis Partnership Agreement ("Agreement") is made and entered into as of the 15th day of March, 2023, by and between Weaviate, a company registered in the State of California, and OpenAI, a research organization based in San Francisco, California.\n\n1. Purpose\nThe parties agree to establish a partnership to collaborate on artificial intelligence research and development, sharing resources and expertise.\n\n2. Contributions\nWeaviate shall contribute technology resources valued at $112.85 and staff time equivalent to a monetary value of $550.09. OpenAI shall contribute its research expertise and a project management team valued at $98.14.\n\n3. Profit Sharing\nThe net profits generated from joint projects shall be di

## Create a collection for contracts

In [5]:
from weaviate.classes.config import Configure

# Delete collection if it exists
if client.collections.exists("FinancialContract"):
    client.collections.delete("FinancialContract")

# Create the collection with a description for our agent
contracts = client.collections.create(
    name="FinancialContract",
    description="A collection of financial contracts with terms, conditions, and legal clauses",
    vector_config=Configure.Vectors.text2vec_weaviate(
        model="Snowflake/snowflake-arctic-embed-l-v2.0",
        source_properties=["contract_text"]
    ),
)

print("Collection 'FinancialContract' created successfully!")

Collection 'FinancialContract' created successfully!


## Load data into Weaviate

### Now we'll stream the data from HuggingFace directly into our Weaviate collection.

In [6]:
# Reload the dataset for importing
dataset = load_dataset(
    "weaviate/agents", 
    "query-agent-financial-contracts", 
    split="train", 
    streaming=True
)

# Get the collection
contracts = client.collections.get("FinancialContract")

# Import data with batch processing
with contracts.batch.fixed_size(batch_size=100) as batch:
    for item in dataset:
        # Add the object with pre-computed vector
        batch.add_object(
            properties=item["properties"],
            vector=item["vector"]
        )

print("Data import completed!")
print(f"Total contracts in collection: {len(contracts)}")

Data import completed!
Total contracts in collection: 100
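
To build some intuition for what `fixed_size(batch_size=100)` does with a streamed dataset, here is a plain-Python sketch of fixed-size batching. It only illustrates the idea and is not the client's actual implementation. The name `fixed_size_batches` is hypothetical.

```python
from itertools import islice


def fixed_size_batches(items, batch_size):
    """Yield lists of up to batch_size items from any iterable, including a stream."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


# 250 toy objects grouped into batches of 100; the last batch is smaller
sizes = [len(chunk) for chunk in fixed_size_batches(range(250), 100)]
print(sizes)
```

Because the batcher sends objects in groups, a single bad object does not raise an exception in the loop. As far as I know, the v4 Python client collects such failures in `contracts.batch.failed_objects`, which is worth checking after the `with` block exits.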


## Basic contract exploration

### Let's explore what's in our contract collection.

In [7]:
# Get some basic stats about our collection
print("=== Collection Stats ===")
print(f"Total contracts: {len(contracts)}")

# Sample some contracts to understand the data structure
response = contracts.query.fetch_objects(limit=3)

print("\n=== Sample Contracts ===")
for i, contract in enumerate(response.objects):
    print(f"\nContract {i+1}:")
    for prop, value in contract.properties.items():
        # Truncate long text for readability
        if isinstance(value, str) and len(value) > 200:
            print(f"  {prop}: {value[:200]}...")
        else:
            print(f"  {prop}: {value}")

=== Collection Stats ===
Total contracts: 100

=== Sample Contracts ===

Contract 1:
  date: 2023-09-15 10:30:00+00:00
  contract_type: service agreement
  contract_length: 2.0
  contract_text: SERVICE AGREEMENT

This Service Agreement ("Agreement") is made and entered into as of September 15, 2023, by and between Weaviate ("Client"), located at 123 Innovation Drive, Tech City, and Kaladin S...
  doc_id: 45.0
  author: Kaladin Stormblessed

Contract 2:
  date: 2022-03-15 09:30:00+00:00
  contract_type: employment contract
  author: John Williams
  contract_length: 2.0
  doc_id: 65.0
  contract_text: EMPLOYMENT CONTRACT

This Employment Contract ("Contract") is entered into as of the 15th day of March, 2022, by and between Weaviate, a corporation with its principal place of business at 123 Tech Dr...

Contract 3:
  date: 2023-03-15 10:30:00+00:00
  contract_type: sales agreement
  contract_length: 2.0
  contract_text: SALES AGREEMENT

This Sales Agreement (“Agreement”) is made and enter

## Vector search

### Now we will write a basic vector search to find contracts by meaning.

In [8]:
# This is a simple function to make our outputs a little prettier
import json
def print_properties(item):
    print(
        json.dumps(
            item.properties,
            indent=2, sort_keys=True, default=str
        )
    )

In [9]:
from weaviate.classes.query import MetadataQuery

# Search for employment contracts with roles, salaries and benefits
response = contracts.query.near_text(
    query="Employment contracts with job roles, salaries, and employee benefits",
    limit=3,
    return_metadata=MetadataQuery(distance=True)
)


for item in response.objects:
    print_properties(item)
    print(item.metadata.distance)


{
  "author": "Jane Doe",
  "contract_length": 2.0,
  "contract_text": "EMPLOYMENT CONTRACT\n\nThis Employment Contract (\"Contract\") is made and entered into as of the 15th day of March, 2023, by and between Weaviate, located at 123 Innovation Drive, Tech City, and Mark Robson, residing at 456 Elm Street, Hometown.\n\n1. POSITION\nMark Robson is hereby employed as a Software Engineer and will report directly to the Head of Engineering.\n\n2. TERM\nThe term of this Contract shall commence on March 15, 2023, and shall continue for a period of two (2) years unless terminated earlier in accordance with the provisions of this Contract.\n\n3. SALARY\nAs compensation for services rendered, the Employee shall receive an annual salary of $372,000, payable in monthly installments.\n\n4. BENEFITS\nThe Employee is entitled to benefits including health insurance, paid time off, and a yearly bonus based on performance evaluations. The annual bonus may vary, but can range up to 8.08% of the employe
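
The `distance` value printed after each result tells us how far the contract's vector is from the query vector, so smaller means more similar. The collection above does not set a distance metric, so this sketch assumes Weaviate's default of cosine distance. The vectors are made-up two-dimensional toys, not real embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same meaning, different magnitude
orthogonal = [0.0, 3.0]      # unrelated
opposite = [-1.0, 0.0]       # opposite direction

for name, vec in [("same", same_direction), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(name, round(cosine_distance(query, vec), 3))
```

Under this metric the distance ranges from 0 (same direction) to 2 (opposite direction), and vector length does not matter.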

### But what if we need something more recent from an author we trust?

## Vector search with filters

### Let's add some filters to narrow our search


In [10]:
from weaviate.classes.query import Filter
from datetime import datetime, timezone

# Search for employment contracts
response = contracts.query.near_text(
    query="I need contracts that have good info on what I need to be looking for when signing a new contract for a job I am going to be getting",
    limit=3,
    filters=(
        Filter.by_property("author").equal("Edward Elric")
        & Filter.by_property("date").greater_than(datetime(2023, 1, 1, tzinfo=timezone.utc))
    ),
    return_metadata=MetadataQuery(distance=True)
)


for item in response.objects:
    print_properties(item)
    print(item.metadata.distance)


{
  "author": "Edward Elric",
  "contract_length": 2.0,
  "contract_text": "EMPLOYMENT CONTRACT\n\nThis Employment Contract (\"Contract\") is made effective as of 2023-11-15, by and between Weaviate, a corporation organized under the laws of the State of California, with its principal office located at 123 Tech Avenue, San Francisco, CA 94105 (\"Employer\"), and Mark Robson, residing at 456 Elm Street, Los Angeles, CA 90001 (\"Employee\").\n\n1. POSITION\nThe Employer hereby employs the Employee as a Software Engineer. The Employee agrees to perform the duties and responsibilities as outlined by the Employer.\n\n2. TERM\nThe term of this Contract shall commence on November 15, 2023, and shall continue for a period of two (2) years, ending on November 15, 2025, unless earlier terminated in accordance with this Contract.\n\n3. SALARY\nThe Employee shall receive a salary of $306.59 per week, payable in accordance with the Employer's standard payroll practices. Additionally, the Employee m
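
The two conditions in the query are joined with `&`, so an object must satisfy both to be considered at all. The sketch below mimics that logic on a few made-up records in plain Python. The helper name `matches` and the toy data are assumptions for illustration only.

```python
from datetime import datetime, timezone


def matches(props, author, after):
    """True only if BOTH conditions hold, like combining two filters with &."""
    return props["author"] == author and props["date"] > after


# Made-up records, not rows from the real dataset
toy_contracts = [
    {"doc_id": 1, "author": "Edward Elric", "date": datetime(2023, 11, 15, tzinfo=timezone.utc)},
    {"doc_id": 2, "author": "Edward Elric", "date": datetime(2022, 6, 1, tzinfo=timezone.utc)},
    {"doc_id": 3, "author": "Jane Doe", "date": datetime(2023, 3, 15, tzinfo=timezone.utc)},
]

cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
kept = [c["doc_id"] for c in toy_contracts if matches(c, "Edward Elric", cutoff)]
print(kept)
```

Note that the cutoff is timezone-aware, just like the `datetime` in the query. Comparing an aware and a naive `datetime` in Python raises a `TypeError`.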

### Can our queries improve? Or is this as good as it gets?

## Query optimization with an LLM

### Let's leverage an LLM to help us improve our queries

In [11]:
import openai
# Initialize OpenAI client
openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)

# Send a query to OpenAI
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": """I'm searching a vector database of 
  contracts. 
      
  My current query is: "I need contracts that have good info on what I need to be looking for when signing a new contract for a job I am going to be getting. I want to make lot sof money and not be taken advantage of by the man"

  Can you suggest 2-3 better ways to phrase this that 
  would find more relevant results? 
  Just give me the improved queries, nothing else."""}
    ]
)

print(response.choices[0].message.content)

1. "Contracts that outline key considerations and negotiation tips for employment agreements to maximize earnings and protect my rights."

2. "Guidelines and examples of employment contracts that ensure fair compensation and safeguard against unfavorable terms."

3. "Resources on important clauses to review in job contracts to secure a lucrative position and avoid exploitation."
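
The model returns its suggestions as one numbered block of text. If we want to pass them to a search programmatically rather than copy and paste, we need to split them up first. Here is a minimal sketch, assuming the reply keeps the `1. "..."` shape shown above (an LLM may format its answer differently on another run). The name `parse_numbered_queries` is hypothetical, and the sample text is shortened.

```python
import re


def parse_numbered_queries(text):
    """Extract the text of each numbered line, dropping surrounding double quotes."""
    queries = []
    for line in text.splitlines():
        match = re.match(r"^\s*\d+[.)]\s*(.+?)\s*$", line)
        if match:
            queries.append(match.group(1).strip('"'))
    return queries


# Shortened stand-in for response.choices[0].message.content
sample_reply = '''1. "Contracts that outline key considerations."

2. "Guidelines and examples of employment contracts."'''

suggested = parse_numbered_queries(sample_reply)
for query in suggested:
    print(query)
```

Each string in `suggested` could then be used as the `query` argument of `contracts.query.near_text`.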


### Now let's use one of the optimized queries for our vector search

In [12]:
# Search for employment contracts
response = contracts.query.near_text(
    query="Seeking resources or contracts that detail important factors to review when accepting a new job offer.",
    limit=3,
    return_metadata=MetadataQuery(distance=True)
)


for item in response.objects:
    print_properties(item)
    print(item.metadata.distance)

{
  "author": "Jane Doe",
  "contract_length": 2.0,
  "contract_text": "EMPLOYMENT CONTRACT\n\nThis Employment Contract (\"Contract\") is made and entered into as of the 15th day of March, 2023, by and between Weaviate, located at 123 Innovation Drive, Tech City, and Mark Robson, residing at 456 Elm Street, Hometown.\n\n1. POSITION\nMark Robson is hereby employed as a Software Engineer and will report directly to the Head of Engineering.\n\n2. TERM\nThe term of this Contract shall commence on March 15, 2023, and shall continue for a period of two (2) years unless terminated earlier in accordance with the provisions of this Contract.\n\n3. SALARY\nAs compensation for services rendered, the Employee shall receive an annual salary of $372,000, payable in monthly installments.\n\n4. BENEFITS\nThe Employee is entitled to benefits including health insurance, paid time off, and a yearly bonus based on performance evaluations. The annual bonus may vary, but can range up to 8.08% of the employe

## Generative search - Ask questions about contracts

### Now let's use generative search to get explanations about contracts.

In [13]:
from weaviate.classes.config import Reconfigure

financialcontract = client.collections.use("FinancialContract")

financialcontract.config.update(
    generative_config=Reconfigure.Generative.openai(
        model="gpt-4o-mini"  # Update the generative model
    )
)

### We can obtain a separate response for each object we return from the database

In [14]:
# Ask about contract risks using the collection's configured generative model
response = contracts.generate.near_text(
    query="contract risks liability issues problems",
    limit=3,
    single_prompt="Based on this contract content: {contract_type} {contract_text}, what are the main risks or potential issues a business should be aware of? Provide 3 key concerns.",
)

print("\nSource contracts and generated outputs:")
for i, contract in enumerate(response.objects):
    print(f"Contract {i+1}: {list(contract.properties.keys())}")
    print(f"Generated output: {contract.generative.text}")  


Source contracts and generated outputs:
Contract 1: ['date', 'contract_type', 'author', 'contract_length', 'doc_id', 'contract_text']
Generated output: When entering into a service agreement like the one outlined, businesses should be aware of several potential risks or issues. Here are three key concerns:

1. **Termination and Transition Risks**:
   - The agreement allows either party to terminate the contract with a 30-day notice. This short notice period may not provide adequate time for the Client to transition to another service provider or to wrap up ongoing projects. If the Service Provider terminates, the Client might find themselves without essential services, potentially disrupting operations or project timelines. Proper planning for succession or contingency measures should be established to mitigate this risk.

2. **Payment Obligations and Cash Flow**:
   - The Client is obligated to pay a total of $960.08 in monthly installments. If there are cash flow issues or unexpecte
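
With `single_prompt`, the placeholders in curly braces are filled with each returned object's property values, and the model is called once per object. The sketch below imitates that substitution with Python's `str.format` on made-up objects. It shows the idea only and is not how Weaviate builds the prompt internally. `fill_single_prompt` is a hypothetical name.

```python
def fill_single_prompt(template, properties):
    """Fill {property_name} placeholders with one object's property values."""
    return template.format(**properties)


template = "Based on this contract content: {contract_type} {contract_text}, list 3 key concerns."

# Made-up objects standing in for search results
toy_results = [
    {"contract_type": "service agreement", "contract_text": "SERVICE AGREEMENT ...", "author": "A"},
    {"contract_type": "employment contract", "contract_text": "EMPLOYMENT CONTRACT ...", "author": "B"},
]

# One prompt (and therefore one generated answer) per object
prompts = [fill_single_prompt(template, obj) for obj in toy_results]
for prompt in prompts:
    print(prompt)
```

Properties that are not named in the template, such as `author` here, never reach the model.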

### Or, we can get one unified response from all of our returned objects (this is most likely what we want in the wild)

In [15]:
response = contracts.generate.near_text(
    query="contract risks liability issues problems",
    limit=3,
    grouped_task="Based on this contract content, what are the main risks or potential issues a business should be aware of? Provide 3 key concerns.",
    grouped_properties=["contract_type", "contract_text"]  # Optional, to limit prompt length
)

# Print the generated output for the group
print("Generated output for all contracts:")
print(response.generative.text)



Generated output for all contracts:
Based on the provided contract content, here are three key concerns or potential risks a business should be aware of:

1. **Termination Conditions and Notice Period**:
   - The termination clauses in both the service and employment contracts allow for termination with relatively short notice (30 days for the service agreement and 2 weeks for the employment contract). This could lead to sudden disruptions in service or staffing which could adversely impact business operations. The business may want to consider a more extended notice period or specific grounds for termination to provide security in personnel changes.

2. **Payment and Compensation Clarity**:
   - The service agreement outlines a total fee and an installment payment plan. However, clarity on services rendered and potential additional costs (e.g., for out-of-scope work) is necessary to prevent disputes. Additionally, the employment contract outlines a relatively low weekly salary which c
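
With `grouped_task`, all returned objects go to the model together in a single call, and `grouped_properties` limits which properties are included. The sketch below shows that idea on made-up objects. The exact prompt layout Weaviate uses is internal, so the JSON layout and the name `build_grouped_prompt` are assumptions.

```python
import json


def build_grouped_prompt(task, objects, grouped_properties):
    """Combine the task with the selected properties of every object into one prompt."""
    trimmed = [
        {key: obj[key] for key in grouped_properties if key in obj}
        for obj in objects
    ]
    return task + "\n\n" + json.dumps(trimmed, indent=2)


# Made-up objects standing in for search results
toy_results = [
    {"contract_type": "service agreement", "contract_text": "SERVICE AGREEMENT ...", "doc_id": 45},
    {"contract_type": "employment contract", "contract_text": "EMPLOYMENT CONTRACT ...", "doc_id": 65},
]

grouped_prompt = build_grouped_prompt(
    "What are the main risks? Provide 3 key concerns.",
    toy_results,
    ["contract_type", "contract_text"],
)
print(grouped_prompt)
```

Dropping unneeded properties keeps the combined prompt shorter, which matters more as `limit` grows.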

In [16]:
# Clean up
client.close()