# Getting Started with VectorDB

This notebook introduces VectorDB, a high-performance vector database for similarity search, by walking through a complete workflow: creating a database, adding vectors, searching, and cleaning up.

## What you'll learn

1. Installing and setting up VectorDB
2. Creating a database (in-memory or persistent) and a collection
3. Adding vectors with metadata
4. Performing single and batch similarity searches
5. Filtering search results by metadata
6. Retrieving, updating, and deleting vectors
7. Managing collections

## 1. Installation

Install VectorDB using pip:

In [None]:
# Install VectorDB (uncomment to run)
# !pip install vectordb

# Or install from source
# !pip install -e /path/to/vectordb

## 2. Import and Setup

In [None]:
import numpy as np
from vectordb import VectorDatabase

# Set random seed for reproducibility
np.random.seed(42)

print("VectorDB imported successfully!")

## 3. Creating a Database

VectorDB can operate in two modes:
- **In-memory**: Fast, but data is lost when the program ends
- **Persistent**: Data is saved to disk and survives restarts

In [None]:
# Create an in-memory database
db = VectorDatabase()
print("Created in-memory database")

# Or create a persistent database
# db = VectorDatabase(storage_path="./my_vectordb")
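
To make the difference between the two modes concrete, here is a toy sketch that uses only the standard library. `ToyStore` and its JSON file are hypothetical stand-ins for illustration. They say nothing about how VectorDB actually stores data on disk.

```python
import json
import os
import tempfile


class ToyStore:
    """Illustrative key-value store; NOT VectorDB's implementation."""

    def __init__(self, storage_path=None):
        self.storage_path = storage_path
        self.data = {}
        # Persistent mode: reload whatever a previous session saved.
        if storage_path and os.path.exists(storage_path):
            with open(storage_path) as f:
                self.data = json.load(f)

    def close(self):
        # In-memory mode has no path, so nothing is written.
        if self.storage_path:
            with open(self.storage_path, "w") as f:
                json.dump(self.data, f)


path = os.path.join(tempfile.mkdtemp(), "toy_store.json")

# Persistent: the data survives closing and reopening
persistent = ToyStore(storage_path=path)
persistent.data["v1"] = [0.1, 0.2]
persistent.close()
reopened = ToyStore(storage_path=path)

# In-memory: a new instance starts empty
in_memory = ToyStore()
in_memory.data["v1"] = [0.1, 0.2]
in_memory.close()
fresh = ToyStore()

print(reopened.data)
print(fresh.data)
```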

## 4. Creating a Collection

A **collection** is a group of vectors with the same dimensionality. Think of it like a table in a traditional database.

In [None]:
# Create a collection for 128-dimensional vectors
collection = db.create_collection(
    name="my_first_collection",
    dimension=128,
    metric="cosine",  # Distance metric: 'cosine', 'euclidean', or 'dot'
    description="My first vector collection"
)

print(f"Created collection: {collection.name}")
print(f"Dimension: {collection.dimension}")
print(f"Metric: {collection.metric}")
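
As a rough guide to what the three metric names usually mean, the sketch below computes each one with NumPy using the common textbook definitions. The helper functions are illustrative only. How VectorDB turns each metric into the `distance` it reports is not specified here, so treat the exact values as an assumption.

```python
import numpy as np


def cosine_distance(a, b):
    # 1 - cosine similarity; depends on direction only, not length
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    # Straight-line distance; sensitive to vector length
    return float(np.linalg.norm(a - b))


def dot_score(a, b):
    # Inner product; larger means more similar
    return float(np.dot(a, b))


a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction as a, twice as long
c = np.array([0.0, 1.0])  # orthogonal to a

for name, other in [("a vs b", b), ("a vs c", c)]:
    print(name,
          cosine_distance(a, other),
          euclidean_distance(a, other),
          dot_score(a, other))
```

Note that `a` and `b` are identical under cosine distance but not under Euclidean distance, which is why the choice of metric matters when vector lengths vary.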

## 5. Adding Vectors

Let's add some vectors to our collection. Vectors can have:
- **IDs**: Unique identifiers (auto-generated if not provided)
- **Metadata**: Additional information stored with each vector

In [None]:
# Generate some sample vectors
n_vectors = 1000
dimension = 128
vectors = np.random.randn(n_vectors, dimension).astype(np.float32)

# Normalize vectors (recommended for cosine similarity)
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

print(f"Generated {n_vectors} vectors of dimension {dimension}")
print(f"Vector shape: {vectors.shape}")
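
One reason normalization is recommended: once every vector has unit length, cosine similarity reduces to a plain dot product, and squared Euclidean distance becomes `2 - 2 * dot`, so all three metrics rank neighbors in the same order. The small standalone check below uses NumPy only and does not touch the collection.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)

# Same row-wise normalization as in the cell above
unit = x / np.linalg.norm(x, axis=1, keepdims=True)
norms = np.linalg.norm(unit, axis=1)

a, b = unit[0], unit[1]
cosine_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
dot = float(np.dot(a, b))
sq_euclidean = float(np.sum((a - b) ** 2))

print(norms)                      # every norm is close to 1
print(cosine_sim, dot)            # nearly equal for unit vectors
print(sq_euclidean, 2 - 2 * dot)  # nearly equal for unit vectors
```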

In [None]:
# Create metadata for each vector
categories = ["tech", "science", "art", "sports", "music"]
metadata = [
    {
        "category": categories[i % len(categories)],
        "score": float(i) / n_vectors,
        "active": i % 2 == 0,
        "index": i
    }
    for i in range(n_vectors)
]

print(f"Sample metadata: {metadata[0]}")

In [None]:
# Add vectors to the collection
ids = collection.add(vectors, metadata=metadata)

print(f"Added {len(ids)} vectors")
print(f"Sample IDs: {ids[:5]}")
print(f"Collection size: {collection.count()}")

## 6. Similarity Search

Now let's search for vectors similar to a query vector.

In [None]:
# Create a query vector
query = np.random.randn(dimension).astype(np.float32)
query = query / np.linalg.norm(query)  # Normalize

# Search for the 5 most similar vectors
results = collection.search(query, k=5)

print("Search Results:")
print("-" * 50)
for i, result in enumerate(results):
    print(f"{i+1}. ID: {result['id']}")
    print(f"   Distance: {result['distance']:.4f}")
    print(f"   Category: {result['metadata']['category']}")
    print()

In [None]:
# Search for a known vector (should return itself as the closest match)
known_vector = vectors[42]
results = collection.search(known_vector, k=3)

print("Searching for vector at index 42:")
for result in results:
    print(f"  ID: {result['id']}, Distance: {result['distance']:.6f}, Index: {result['metadata']['index']}")
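
Conceptually, a similarity search scores the query against the stored vectors and keeps the `k` closest. The brute-force sketch below shows that idea in NumPy for unit-length vectors. `brute_force_search` is a hypothetical helper written for this example. VectorDB may use an index rather than an exhaustive scan, so this is not a description of its internals.

```python
import numpy as np


def brute_force_search(data, query, k):
    """Return (row_index, cosine_distance) pairs for the k nearest rows.

    Assumes every row of `data` and `query` is already unit-length.
    """
    distances = 1.0 - data @ query       # cosine distance for unit vectors
    nearest = np.argsort(distances)[:k]  # indices in ascending distance order
    return [(int(i), float(distances[i])) for i in nearest]


rng = np.random.default_rng(42)
data = rng.standard_normal((1000, 128)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)

# Querying with a stored vector returns that vector first, at distance ~0
hits = brute_force_search(data, data[42], k=3)
for index, distance in hits:
    print(index, round(distance, 6))
```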

## 7. Filtered Search

Pass a `filter` argument to `search` to restrict results to vectors whose metadata matches the given conditions.

In [None]:
# Search only within "tech" category
results = collection.search(
    query,
    k=5,
    filter={"category": "tech"}
)

print("Results filtered by category='tech':")
for result in results:
    print(f"  {result['id']}: category={result['metadata']['category']}, distance={result['distance']:.4f}")

In [None]:
# Search with multiple filter conditions
results = collection.search(
    query,
    k=5,
    filter={
        "$and": [
            {"category": {"$in": ["tech", "science"]}},
            {"score": {"$gte": 0.5}},
            {"active": True}
        ]
    }
)

print("Results with compound filter:")
for result in results:
    meta = result['metadata']
    print(f"  {result['id']}: category={meta['category']}, score={meta['score']:.2f}, active={meta['active']}")
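
To show how a filter like the ones above can be read, here is a minimal pure-Python evaluator for just the forms used in this section: plain equality, `$in`, `$gte`, and `$and`. The `matches` function is a hypothetical illustration. Which other operators VectorDB supports, and whether it filters before or after the vector search, is not covered here.

```python
def matches(metadata, condition):
    """Return True if `metadata` satisfies the filter `condition`."""
    for key, expected in condition.items():
        if key == "$and":
            # Every sub-condition must hold
            if not all(matches(metadata, sub) for sub in expected):
                return False
        elif isinstance(expected, dict):
            # Operator form, e.g. {"score": {"$gte": 0.5}}
            value = metadata.get(key)
            for op, operand in expected.items():
                if op == "$in":
                    if value not in operand:
                        return False
                elif op == "$gte":
                    if value is None or value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif metadata.get(key) != expected:
            # Plain equality, e.g. {"category": "tech"}
            return False
    return True


compound = {
    "$and": [
        {"category": {"$in": ["tech", "science"]}},
        {"score": {"$gte": 0.5}},
        {"active": True},
    ]
}

print(matches({"category": "tech", "score": 0.6, "active": True}, compound))     # True
print(matches({"category": "art", "score": 0.9, "active": True}, compound))      # False
print(matches({"category": "science", "score": 0.2, "active": True}, compound))  # False
```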

## 8. Batch Search

For efficiency, `search_batch` accepts several query vectors at once and returns one list of results per query.

In [None]:
# Create multiple query vectors
n_queries = 5
queries = np.random.randn(n_queries, dimension).astype(np.float32)
queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)

# Batch search
batch_results = collection.search_batch(queries, k=3)

print(f"Batch search with {n_queries} queries:")
for i, results in enumerate(batch_results):
    print(f"\nQuery {i+1} results:")
    for result in results:
        print(f"  {result['id']}: distance={result['distance']:.4f}")
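
One common reason batching helps is that all query-to-vector distances can be computed in a single matrix product instead of one pass per query. The NumPy sketch below illustrates that idea for unit-length vectors. It is an assumption about why batching is faster in general, not a statement about how `search_batch` is implemented.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.standard_normal((1000, 128)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)
queries = rng.standard_normal((5, 128)).astype(np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

k = 3

# One matrix product yields every query-to-vector distance: shape (5, 1000)
distances = 1.0 - queries @ data.T

# Sort each row independently and keep the k nearest indices per query
top_k = np.argsort(distances, axis=1)[:, :k]

# The batched distances agree with computing one query at a time
looped = np.stack([1.0 - data @ q for q in queries])

print(distances.shape, top_k.shape)
print(np.allclose(distances, looped, atol=1e-5))
```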

## 9. Retrieving and Updating Vectors

In [None]:
# Get a vector by ID
vector_id = ids[0]
result = collection.get(vector_id)

print(f"Retrieved vector {vector_id}:")
print(f"  Vector shape: {np.array(result['vector']).shape}")
print(f"  Metadata: {result['metadata']}")

In [None]:
# Update vector metadata
collection.update(
    vector_id,
    metadata={"category": "updated", "note": "This vector was updated"}
)

# Verify the update
result = collection.get(vector_id)
print(f"Updated metadata: {result['metadata']}")

## 10. Deleting Vectors

In [None]:
# Check current count
print(f"Before deletion: {collection.count()} vectors")

# Delete a single vector
collection.delete(ids[0])
print(f"After deleting 1 vector: {collection.count()} vectors")

# Delete multiple vectors
collection.delete(ids[1:6])
print(f"After deleting 5 more: {collection.count()} vectors")

## 11. Collection Management

In [None]:
# List all collections
print("Collections:", db.list_collections())

# Get collection info
info = collection.info()
print("\nCollection info:")
for key, value in info.items():
    print(f"  {key}: {value}")

In [None]:
# Create another collection
collection2 = db.create_collection(
    name="high_dim_collection",
    dimension=768,  # Common embedding dimension
    metric="cosine"
)

print("Collections:", db.list_collections())

In [None]:
# Delete a collection
db.delete_collection("high_dim_collection")
print("After deletion:", db.list_collections())

## 12. Cleanup

In [None]:
# Close the database connection
db.close()
print("Database closed.")

## Summary

In this notebook, you learned how to:

1. ✅ Create a VectorDB database (in-memory or persistent)
2. ✅ Create collections with specific dimensions and metrics
3. ✅ Add vectors with metadata
4. ✅ Perform similarity searches
5. ✅ Filter search results using metadata
6. ✅ Use batch operations for efficiency
7. ✅ Retrieve, update, and delete vectors
8. ✅ Manage collections

## Next Steps

- **02_index_comparison.ipynb**: Learn about different index types and when to use them
- **03_performance_tuning.ipynb**: Optimize VectorDB for your specific use case