# Advanced Weaviate Vector Database Tutorial

This notebook demonstrates how to work with Weaviate through the custom `WeaviateCloudClient` wrapper. We'll cover:

1. Connecting with the custom `WeaviateCloudClient` wrapper
2. Listing and creating collections
3. Inserting data
4. Querying and searching
5. Error handling and fallbacks
6. Cleaning up

## Prerequisites

- A Weaviate Cloud account
- API keys for Weaviate and OpenRouter/OpenAI
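- A `.env` file defining `WEAVIATE_URL` and `WEAVIATE_API_KEY` (required) and, optionally, `OPENROUTER_API_KEY` or `OPENAI_API_KEY`
- The custom `weaviate_client_v4` module that provides `WeaviateCloudClient`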

```python
# Install required packages
%pip install weaviate-client python-dotenv requests
```

```text
Note: you may need to restart the kernel to use updated packages.
```


## 1. Setting Up Environment and Imports

```python
import os
import time

from dotenv import load_dotenv

# Import the custom WeaviateCloudClient wrapper (from the weaviate_client_v4 module)
from weaviate_client_v4 import WeaviateCloudClient
```

```python
# Load environment variables from .env file
load_dotenv()

# Read configuration from the environment
WEAVIATE_URL = os.getenv("WEAVIATE_URL")
WEAVIATE_API_KEY = os.getenv("WEAVIATE_API_KEY")
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Report only whether each variable is set; never print the secret values
print(f"WEAVIATE_URL: {'Set' if WEAVIATE_URL else 'Not set'}")
print(f"WEAVIATE_API_KEY: {'Set' if WEAVIATE_API_KEY else 'Not set'}")
print(f"OPENROUTER_API_KEY: {'Set' if OPENROUTER_API_KEY else 'Not set'}")
print(f"OPENAI_API_KEY: {'Set' if OPENAI_API_KEY else 'Not set'}")

# Verify required variables
if not all([WEAVIATE_URL, WEAVIATE_API_KEY]):
    print("⚠️ Warning: Required environment variables are missing. Check your .env file.")
else:
    print("✅ Required environment variables loaded.")
```

```text
WEAVIATE_URL: Set
WEAVIATE_API_KEY: Set
OPENROUTER_API_KEY: Set
OPENAI_API_KEY: Not set
✅ Required environment variables loaded.
```
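
The check above only prints a warning and lets the notebook carry on. Outside a notebook it is usually better to fail fast. This minimal sketch returns the names of missing variables (never their values) so the caller can decide what to do; `missing_env_vars` is a hypothetical helper, and the variable names are the ones read above.

```python
import os
from typing import Mapping, Sequence


def missing_env_vars(required: Sequence[str], env: Mapping[str, str] = os.environ) -> list:
    """Return the names in `required` that are unset or empty in `env`.

    Only names are returned, never values, so the result is safe to log.
    """
    return [name for name in required if not env.get(name)]


# Hypothetical usage: stop before constructing the client if anything is missing
missing = missing_env_vars(["WEAVIATE_URL", "WEAVIATE_API_KEY"])
if missing:
    print(f"Missing required environment variables: {', '.join(missing)}")
```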


## 2. Connecting to Weaviate using WeaviateCloudClient

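The `WeaviateCloudClient` is a custom wrapper around the Weaviate Python client (v4). It reads its connection settings from the environment and adds fallbacks and error handling on top of the client; section 10 summarizes these features.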

```python
# Create a client instance
print("Creating WeaviateCloudClient instance...")
client = WeaviateCloudClient()

# Connect to Weaviate
print("\nConnecting to Weaviate...")
try:
    client.connect()
    print("✅ Connected to Weaviate")

    # Get meta information
    meta_info = client.get_meta_info()
    print("\nMeta Information:")
    print(f"  Version: {meta_info.get('version', 'N/A')}")
    print(f"  Hostname: {meta_info.get('hostname', 'N/A')}")

except Exception as e:
    print(f"❌ Failed to connect: {e}")
```

```text
2025-05-12 22:13:49,606 - weaviate_client_v4 - INFO - Initialized Weaviate Cloud client for URL: nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud
Creating WeaviateCloudClient instance...

Connecting to Weaviate...
2025-05-12 22:13:50,655 - httpx - INFO - HTTP Request: GET https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/meta "HTTP/1.1 200 OK"
2025-05-12 22:13:50,714 - weaviate_client_v4 - INFO - Connected to Weaviate Cloud at nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud
2025-05-12 22:13:50,904 - httpx - INFO - HTTP Request: GET https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/meta "HTTP/1.1 200 OK"
✅ Connected to Weaviate

Meta Information:
  Version: 1.30.1
  Hostname: http://[::]:8080
```


## 3. Listing Existing Collections

Let's see what collections already exist in our Weaviate instance.

```python
try:
    collections = client.list_collections()
    print("Existing collections:")
    if collections:
        for i, collection in enumerate(collections):
            print(f"  {i+1}. {collection}")
    else:
        print("  No collections found.")
except Exception as e:
    print(f"❌ Failed to list collections: {e}")
```

```text
2025-05-12 22:13:51,102 - httpx - INFO - HTTP Request: GET https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/schema "HTTP/1.1 200 OK"
Existing collections:
  No collections found.
```


## 4. Creating a Collection

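Let's create a new collection for storing articles. The cell first deletes any existing `Articles` collection so the notebook can be re-run from a clean state. No vectorizer is configured, so the objects are stored without vectors.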

```python
# Define collection name
collection_name = "Articles"

# Delete the collection if it already exists, so the notebook can be re-run
print(f"Checking if {collection_name} exists...")
collections = client.list_collections()
if collection_name in collections:
    print(f"  {collection_name} exists, deleting it...")
    client.delete_collection(collection_name)
    print(f"✅ Deleted {collection_name}")
    time.sleep(2)  # Give it time to delete
else:
    print(f"  {collection_name} does not exist")

# Define properties for the collection
properties = [
    {
        "name": "title",
        "dataType": ["text"],
        "description": "Title of the article",
    },
    {
        "name": "content",
        "dataType": ["text"],
        "description": "Content of the article",
    },
    {
        "name": "category",
        "dataType": ["text"],
        "description": "Category of the article",
    },
    {
        "name": "publishDate",
        "dataType": ["date"],
        "description": "Publication date",
    }
]

# Create the collection
print(f"\nCreating {collection_name} collection...")
try:
    client.create_collection(collection_name, properties)
    print(f"✅ Created {collection_name} collection")
except Exception as e:
    print(f"❌ Failed to create collection: {e}")
```

```text
Checking if Articles exists...
2025-05-12 22:13:51,766 - httpx - INFO - HTTP Request: GET https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/schema "HTTP/1.1 200 OK"
  Articles does not exist

Creating Articles collection...
2025-05-12 22:13:52,333 - weaviate_client_v4 - INFO - Response status: 200
2025-05-12 22:13:52,333 - weaviate_client_v4 - INFO - Response content: {"class":"Articles","invertedIndexConfig":{"bm25":{"b":0.75,"k1":1.2},"cleanupIntervalSeconds":60,"stopwords":{"additions":null,"preset":"en","removals":null},"usingBlockMaxWAND":true},"multiTenancyConfig":{"autoTenantActivation":false,"autoTenantCreation":false,"enabled":false},"properties":[{"dataType":["text"],"description":"Title of the article","indexFilterable":true,"indexRangeFilters":false,"indexSearchable":true,"name":"title","tokenization":"word"},{"dataType":["text"],"description":"Content of the article","indexFilterable":true,"indexRangeFilters":false,"indexSearchable":true,"name":"content","tokenization":"word"},{"dataType":["text"],"description":"Category of the article","indexFilterable":true,"indexRangeFilters":false,"indexSearchable":true,"name":"category","tokenization":"word"},{"dataType":["date"],"description":"Publicati
✅ Created Articles collection
```
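
The `Articles` collection was created without a vectorizer. Choosing one is a single field in the collection definition. Judging by the `Response content` log above, the wrapper works with REST-style class definitions (the `/v1/schema` format), so the sketch below builds such a payload with an explicit `vectorizer` field. `build_collection_payload` is a hypothetical helper, and this notebook does not show whether `create_collection` accepts a vectorizer argument. Note that `text2vec-openai` also needs an OpenAI API key, normally sent as the `X-OpenAI-Api-Key` header, and `OPENAI_API_KEY` is not set in this environment.

```python
import json


def build_collection_payload(name: str, properties: list, vectorizer: str = "none") -> dict:
    """Build a REST-style Weaviate class definition (the shape POSTed to /v1/schema).

    `vectorizer` is a module name, e.g. "none" (no automatic embeddings)
    or "text2vec-openai" (Weaviate embeds text on insert and at query time).
    """
    return {
        "class": name,
        "vectorizer": vectorizer,
        "properties": properties,
    }


# Hypothetical property list (named differently so it doesn't overwrite `properties` above)
example_properties = [
    {"name": "title", "dataType": ["text"]},
    {"name": "content", "dataType": ["text"]},
]

# Same properties, two different vectorizer choices
plain = build_collection_payload("Articles", example_properties)
semantic = build_collection_payload("ArticlesOpenAI", example_properties, vectorizer="text2vec-openai")
print(json.dumps(semantic, indent=2))
```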


## 5. Inserting Data

Now let's insert some sample data into our collection.

```python
# Sample data to insert
articles = [
    {
        "title": "Understanding Vector Databases",
        "content": "Vector databases store data as high-dimensional vectors, making them ideal for semantic search and AI applications.",
        "category": "Technology",
        "publishDate": "2023-01-15T00:00:00Z"
    },
    {
        "title": "Machine Learning Basics",
        "content": "Machine learning is a subset of artificial intelligence that enables systems to learn from data without explicit programming.",
        "category": "Technology",
        "publishDate": "2023-02-20T00:00:00Z"
    },
    {
        "title": "The Future of AI",
        "content": "Artificial intelligence continues to evolve, with new breakthroughs in natural language processing and computer vision.",
        "category": "Technology",
        "publishDate": "2023-03-10T00:00:00Z"
    },
    {
        "title": "Data Science Trends",
        "content": "Data science is rapidly evolving with new tools and techniques for analyzing and visualizing large datasets.",
        "category": "Data Science",
        "publishDate": "2023-04-05T00:00:00Z"
    }
]

# Insert articles
print("\nInserting data...")
for article in articles:
    try:
        object_id = client.insert_object(collection_name, article)
        print(f"✅ Inserted article: '{article['title']}' with ID: {object_id}")
    except Exception as e:
        print(f"❌ Failed to insert article '{article['title']}': {e}")

# Wait for indexing to complete
print("\nWaiting for indexing to complete...")
time.sleep(3)
```

```text
2025-05-12 22:13:52,483 - httpx - INFO - HTTP Request: POST https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/objects "HTTP/1.1 200 OK"
2025-05-12 22:13:52,512 - weaviate_client_v4 - INFO - Inserted object into Articles with ID: 99721c38-4d08-4c66-b353-362e72860c26

Inserting data...
✅ Inserted article: 'Understanding Vector Databases' with ID: 99721c38-4d08-4c66-b353-362e72860c26
2025-05-12 22:13:52,653 - httpx - INFO - HTTP Request: POST https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/objects "HTTP/1.1 200 OK"
2025-05-12 22:13:52,653 - weaviate_client_v4 - INFO - Inserted object into Articles with ID: 85ded774-b751-4ff1-8c85-59999ef93894
✅ Inserted article: 'Machine Learning Basics' with ID: 85ded774-b751-4ff1-8c85-59999ef93894
2025-05-12 22:13:52,784 - httpx - INFO - HTTP Request: POST https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/objects "HTTP/1.1 200 OK"
2025-05-12 22:13:52,839 - weaviate_client_v4 - INFO - Inserted object into Articles with ID: 898231ca-345f-41a8-8e19-fc5e68cc7ffe
2025-05-12 22:13:52,952 - httpx - INFO - HTTP Request: POST https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/objects "HTTP/1.1 200 OK"
2025-05-12 22:13:52,952 - weaviate_client_v4 - INFO - Inserted object into Articles with ID: 6b52194c-84af-4272-9847-7d3e80418550
✅ Inserted article: 'The Future of AI' with ID: 898231ca-345f-41a8-8e19-fc5e68cc7ffe
✅ Inserted article: 'Data Science Trends' with ID: 6b52194c-84af-4272-9847-7d3e80418550

Waiting for indexing to complete...
```
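
The loop above sends one request per article (four `POST /v1/objects` calls in the log). For larger imports, the native Weaviate v4 Python client can batch objects instead. The sketch below is written against the v4 `collection.batch.dynamic()` context manager and `collection.batch.failed_objects`; it expects a native v4 client (for example one returned by `weaviate.connect_to_weaviate_cloud`), not the `WeaviateCloudClient` wrapper, and this notebook does not show whether the wrapper exposes its underlying client. `batch_import` is a hypothetical helper name.

```python
from typing import Any, Iterable


def batch_import(native_client: Any, collection_name: str, objects: Iterable[dict]) -> list:
    """Import objects with the Weaviate v4 client's dynamic batching.

    Returns the objects the server rejected (an empty list on full success).
    """
    collection = native_client.collections.get(collection_name)
    # dynamic() sizes batches automatically and flushes when the block exits
    with collection.batch.dynamic() as batch:
        for properties in objects:
            batch.add_object(properties=properties)
    # Batch errors are collected rather than raised; inspect them after the block
    return list(collection.batch.failed_objects)
```

With a native client in hand, `failed = batch_import(native_client, "Articles", articles)` followed by a check of `len(failed)` would replace the per-object loop.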


## 6. Querying Data

Let's query the data we've inserted.

```python
# Query all objects
print("\nQuerying all objects...")
try:
    objects = client.query_objects(collection_name, ["title", "category", "publishDate"], 10)
    print(f"✅ Retrieved {len(objects)} objects:")
    for obj in objects:
        print(f"  - {obj.get('title', 'N/A')} ({obj.get('category', 'N/A')})")
except Exception as e:
    print(f"❌ Failed to query objects: {e}")
```


```text
Querying all objects...
2025-05-12 22:13:56,436 - weaviate_client_v4 - INFO - Retrieved 4 objects from Articles
✅ Retrieved 4 objects:
  - Data Science Trends (Data Science)
  - Machine Learning Basics (Technology)
  - The Future of AI (Technology)
  - Understanding Vector Databases (Technology)
```
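
The objects come back in a different order from the one they were inserted in, and the query requests no particular ordering. If order matters, one option is to sort the returned results client-side. This sketch assumes each result is a plain dict (as the `obj.get(...)` calls above suggest) whose `publishDate` is an RFC 3339 string like the ones inserted; `parse_rfc3339` and `sort_by_date` are hypothetical helpers.

```python
from datetime import datetime
from typing import Union


def parse_rfc3339(value: Union[str, datetime]) -> datetime:
    """Parse an RFC 3339 timestamp such as "2023-01-15T00:00:00Z"."""
    if isinstance(value, datetime):
        return value  # some clients may already return datetime objects
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_by_date(objects: list, key: str = "publishDate", newest_first: bool = True) -> list:
    """Return a new list of result dicts ordered by a date property."""
    return sorted(objects, key=lambda o: parse_rfc3339(o[key]), reverse=newest_first)


sample_results = [
    {"title": "Machine Learning Basics", "publishDate": "2023-02-20T00:00:00Z"},
    {"title": "Data Science Trends", "publishDate": "2023-04-05T00:00:00Z"},
    {"title": "Understanding Vector Databases", "publishDate": "2023-01-15T00:00:00Z"},
]
for item in sort_by_date(sample_results):
    print(item["publishDate"][:10], item["title"])
```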


## 7. Searching Data

Next, let's search for articles with a text query. Semantic (vector) search requires a vectorizer, and `Articles` was created without one, so don't expect meaningful results here.

```python
# Search for articles
print("\nSearching for articles about 'artificial intelligence'...")
try:
    search_results = client.search_objects(collection_name, "artificial intelligence", ["title", "content", "category"], 5)
    print(f"✅ Found {len(search_results)} results:")
    for result in search_results:
        print(f"  - {result.get('title', 'N/A')} ({result.get('category', 'N/A')})")
        print(f"    {result.get('content', 'N/A')}")
except Exception as e:
    print(f"❌ Search failed: {e}")
    print("   Note: Semantic search requires a vectorizer like text2vec-openai.")
    print("   The 'none' vectorizer used in this example doesn't support semantic search.")
```


```text
Searching for articles about 'artificial intelligence'...
2025-05-12 22:13:56,881 - weaviate_client_v4 - INFO - Search for 'artificial intelligence' returned 0 objects from Articles
✅ Found 0 results:
```
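
The search returned 0 results without raising, so the explanatory note in the `except` branch never printed. As that note says, semantic search needs a vectorizer, and this collection uses the 'none' vectorizer. Keyword search does not need vectors: the schema log in section 4 shows a BM25 inverted index configured with `k1=1.2` and `b=0.75`. The sketch below is a simplified, self-contained Okapi BM25 scorer over the four sample articles, meant to illustrate what such an index ranks on. It is not Weaviate's implementation (Weaviate also applies its `en` stopword preset and its own tokenizer), and `tokenize` and `bm25_rank` are hypothetical names. With the native v4 client the server-side equivalent is `collection.query.bm25(query=..., limit=...)`; whether the wrapper exposes it is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lowercase alphanumeric tokens, roughly approximating "word" tokenization
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict, k1: float = 1.2, b: float = 0.75) -> list:
    """Score each document against `query` with Okapi BM25; return (id, score) best-first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized via b
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "Understanding Vector Databases": "Vector databases store data as high-dimensional vectors, making them ideal for semantic search and AI applications.",
    "Machine Learning Basics": "Machine learning is a subset of artificial intelligence that enables systems to learn from data without explicit programming.",
    "The Future of AI": "Artificial intelligence continues to evolve, with new breakthroughs in natural language processing and computer vision.",
    "Data Science Trends": "Data science is rapidly evolving with new tools and techniques for analyzing and visualizing large datasets.",
}
for title, score in bm25_rank("artificial intelligence", docs):
    print(f"{score:.3f}  {title}")
```

Only the two articles that contain the query terms get a non-zero score; the shorter one ranks higher because of the `b` length normalization.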


## 8. Error Handling and Fallbacks

The `WeaviateCloudClient` includes built-in error handling and fallbacks. Let's see how it behaves when we query a collection that does not exist.

```python
# Try to access a non-existent collection
print("\nTrying to query a non-existent collection...")
try:
    objects = client.query_objects("NonExistentCollection", ["title"], 10)
    print(f"Retrieved {len(objects)} objects")
except Exception as e:
    print(f"❌ Expected error: {e}")
    print("   The client attempted to handle this error gracefully.")
```


```text
Trying to query a non-existent collection...
2025-05-12 22:13:57,333 - weaviate_client_v4 - INFO - Retrieved 0 objects from NonExistentCollection
Retrieved 0 objects
```
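
The query against `NonExistentCollection` did not raise: the wrapper logged "Retrieved 0 objects" and returned an empty list, so the `except` branch never ran. A missing collection is therefore indistinguishable from an empty one. Below is a minimal sketch of a guard that checks `list_collections()` first (section 4 uses its result with `in`, so it is assumed to contain collection names). `query_existing` is a hypothetical helper, and raising `LookupError` is an illustrative choice.

```python
from typing import Any, Sequence


def query_existing(client: Any, name: str, properties: Sequence[str], limit: int = 10) -> list:
    """Query `name` through the wrapper, but raise if the collection does not exist."""
    if name not in client.list_collections():
        raise LookupError(f"Collection {name!r} does not exist")
    return client.query_objects(name, list(properties), limit)
```

In this notebook you would call `query_existing(client, "NonExistentCollection", ["title"])` inside the same `try` block, and the `except` branch would then report the missing collection.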


## 9. Cleaning Up

Finally, let's clean up by deleting the collection we created.

```python
# Clean up
print("\nCleaning up...")
try:
    client.delete_collection(collection_name)
    print(f"✅ Deleted {collection_name} collection")
except Exception as e:
    print(f"❌ Failed to delete collection: {e}")

# Close the client
client.close()
print("\n=== Example Complete ===")
```


```text
Cleaning up...
2025-05-12 22:13:57,647 - httpx - INFO - HTTP Request: DELETE https://nrdv2vfqrjogir9kivhsg.c0.asia-southeast1.gcp.weaviate.cloud/v1/schema/Articles "HTTP/1.1 200 OK"
2025-05-12 22:13:57,647 - weaviate_client_v4 - INFO - Deleted collection: Articles
2025-05-12 22:13:57,647 - weaviate_client_v4 - INFO - Closed connection to Weaviate Cloud
✅ Deleted Articles collection

=== Example Complete ===
```
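
In the notebook, `client.close()` runs only if the cells are executed through to the end. When the same steps run as a script, the standard library's `contextlib.closing` guarantees `close()` is called even if something raises, and it works with any object that has a `close()` method, such as the wrapper. In the sketch below, `_DemoClient` is a hypothetical stand-in with the wrapper's `connect()`/`close()` shape; substitute `WeaviateCloudClient()` in real code.

```python
from contextlib import closing


class _DemoClient:
    """Hypothetical stand-in exposing the same connect()/close() methods as the wrapper."""

    def __init__(self) -> None:
        self.closed = False

    def connect(self) -> None:
        print("connected")

    def close(self) -> None:
        self.closed = True
        print("closed")


demo = _DemoClient()
try:
    with closing(demo) as c:
        c.connect()
        raise RuntimeError("simulated failure mid-workflow")
except RuntimeError as e:
    print(f"caught: {e}")

# close() ran even though the block raised
print(demo.closed)
```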


## 10. Understanding the WeaviateCloudClient Implementation

The `WeaviateCloudClient` class provides a simplified interface to Weaviate with robust error handling. Here's a summary of its key features:

1. **Automatic Environment Loading**: Loads credentials from environment variables
2. **Robust Connection Handling**: Tries multiple connection methods if the primary one fails
3. **REST API Fallbacks**: Falls back to REST API calls when gRPC operations fail
4. **Simplified Methods**: Provides easy-to-use methods for common operations
5. **Comprehensive Error Handling**: Catches and logs errors at multiple levels

The client is designed to work with Weaviate Cloud instances and handles the complexities of the v4 API.
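
Items 2 and 3 describe a fallback pattern: try the preferred path (for example a gRPC-backed call) and, if it raises, try the next one (for example a plain REST request). The wrapper's own code is not shown in this notebook, so the following is only a generic sketch of the pattern; `first_successful`, `grpc_path`, and `rest_path` are hypothetical names, and the two paths simulate their results.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("fallback_demo")


def first_successful(attempts: Sequence[Tuple[str, Callable[[], T]]]) -> T:
    """Call each (label, fn) in order and return the first result that doesn't raise.

    Each failure is logged; if every attempt fails, a RuntimeError chained to the
    last error is raised, so callers see a real exception instead of an empty result.
    """
    last_error: Optional[Exception] = None
    for label, fn in attempts:
        try:
            return fn()
        except Exception as exc:  # broad on purpose: any failure moves on to the next path
            logger.warning("%s failed: %s", label, exc)
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error


def grpc_path() -> list:
    raise ConnectionError("gRPC port unreachable")  # simulated failure


def rest_path() -> list:
    return ["Articles"]  # simulated REST response


print(first_successful([("gRPC", grpc_path), ("REST", rest_path)]))
```

Raising once every path has failed, rather than returning an empty list, avoids the ambiguity seen in section 8, where a missing collection looked the same as an empty one.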

## Conclusion

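In this notebook, we've used the custom `WeaviateCloudClient` wrapper to connect to Weaviate Cloud, create a collection, insert, query, and search objects, and clean up. The wrapper offers a simplified interface with built-in error handling, but it can also hide failures: querying a missing collection returned an empty list instead of an error. Check results explicitly before relying on it in production.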

Key takeaways:
- The `WeaviateCloudClient` simplifies common Weaviate operations
- It provides fallback mechanisms when primary operations fail
- It handles errors gracefully and provides detailed logging
- It's designed specifically for Weaviate Cloud instances

For more advanced use cases, you may want to use the native Weaviate v4 client directly, but the `WeaviateCloudClient` is a good starting point for many applications.