mihirahuja1/vectorwrap

vectorwrap 0.9.0

Universal vector search wrapper for Postgres, MySQL, SQLite, DuckDB, ClickHouse (pgvector, HeatWave, sqlite-vss, DuckDB VSS, ClickHouse ANN).

Switch between PostgreSQL, MySQL, SQLite, DuckDB, and ClickHouse vector backends with a single line of code. Perfect for prototyping, testing, and production deployments.

Stable API - Core methods follow semantic versioning with backward compatibility guarantees.

Quick Start

# Core install (PostgreSQL + MySQL support)
pip install vectorwrap

# Add SQLite support (requires system SQLite with extension support)
pip install "vectorwrap[sqlite]"

# Add DuckDB support (includes VSS extension)
pip install "vectorwrap[duckdb]"

# Add ClickHouse support (includes clickhouse-connect)
pip install "vectorwrap[clickhouse]"

# Install all backends for development
pip install "vectorwrap[sqlite,duckdb,clickhouse]"

from vectorwrap import VectorDB

# Your embedding function (use OpenAI, Hugging Face, etc.)
def embed(text: str) -> list[float]:
    # Return your 1536-dim embeddings here
    return [0.1, 0.2, ...] 

# Connect to any supported database
db = VectorDB("postgresql://user:pass@host/db")  # or mysql://... or sqlite:///path.db or duckdb:///path.db or clickhouse://...
db.create_collection("products", dim=1536)

# Insert vectors with metadata
db.upsert("products", 1, embed("Apple iPhone 15 Pro"), {"category": "phone", "price": 999})
db.upsert("products", 2, embed("Samsung Galaxy S24"), {"category": "phone", "price": 899})

# Semantic search with filtering
results = db.query(
    collection="products",
    query_vector=embed("latest smartphone"),
    top_k=5,
    filter={"category": "phone"}
)
print(results)  # → [(1, 0.023), (2, 0.087)]

Supported Backends

| Database | Vector Type | Indexing | Installation | Notes |
|---|---|---|---|---|
| PostgreSQL 16+ + pgvector | VECTOR(n) | HNSW | CREATE EXTENSION vector; | Production ready |
| MySQL 8.2+ HeatWave | VECTOR(n) | Automatic | Built-in | Native vector support |
| MySQL ≤8.0 (legacy) | JSON arrays | None | Built-in | Slower, Python distance |
| MariaDB 11.8+ GA LTS | VECTOR(n) | HNSW | Built-in | Native vectors, 10M+ users |
| MariaDB <11.8 (legacy) | JSON arrays | None | Built-in | Auto-fallback, Python distance |
| SQLite + sqlite-vss | Virtual table | HNSW | pip install "vectorwrap[sqlite]" | Great for prototyping |
| DuckDB + VSS | FLOAT[] arrays | HNSW | pip install "vectorwrap[duckdb]" | Analytics + vectors |
| ClickHouse | Array(Float32) | HNSW | pip install "vectorwrap[clickhouse]" | High-performance analytics |
| Redis + RediSearch | Binary vectors | HNSW/FLAT | pip install "vectorwrap[redis]" | Ultra-fast in-memory search |
| RethinkDB | JSON arrays | In-memory HNSW | pip install "vectorwrap[rethinkdb]" | WORLD'S FIRST: Real-time changefeeds |

Examples

Complete Example with OpenAI Embeddings

from openai import OpenAI
from vectorwrap import VectorDB

client = OpenAI()

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Use any database - just change the connection string!
db = VectorDB("postgresql://user:pass@localhost/vectors")
db.create_collection("documents", dim=1536)

# Add some documents
documents = [
    ("Python is a programming language", {"topic": "programming"}),
    ("Machine learning uses neural networks", {"topic": "ai"}),
    ("Databases store structured data", {"topic": "data"}),
]

for i, (text, metadata) in enumerate(documents):
    db.upsert("documents", i, embed(text), metadata)

# Search for similar content
query = "What is artificial intelligence?"
results = db.query("documents", embed(query), top_k=2)

for doc_id, distance in results:
    print(f"Document {doc_id}: distance={distance:.3f}")
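
Because query returns only (id, distance) pairs, the caller has to map ids back to the original text. The following is a minimal sketch of that step, assuming the ids are the enumerate indexes used above. The attach_text helper is hypothetical (not part of vectorwrap), and the sample distances are made up for illustration rather than real output.

```python
# Hypothetical helper: join (id, distance) pairs back to the source documents.
documents = [
    ("Python is a programming language", {"topic": "programming"}),
    ("Machine learning uses neural networks", {"topic": "ai"}),
    ("Databases store structured data", {"topic": "data"}),
]


def attach_text(results, documents):
    """Return (text, metadata, distance) for each (id, distance) pair."""
    return [
        (documents[doc_id][0], documents[doc_id][1], distance)
        for doc_id, distance in results
    ]


# Made-up distances, standing in for the return value of db.query(...).
sample_results = [(1, 0.21), (0, 0.48)]
for text, metadata, distance in attach_text(sample_results, documents):
    print(f"{distance:.2f}  {text}  {metadata}")
```

This only works while ids double as list indexes; with arbitrary ids, a dict keyed by id would be the safer lookup.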

Database-Specific Connection Strings

# PostgreSQL with pgvector
db = VectorDB("postgresql://user:password@localhost:5432/mydb")

# MySQL (8.2+ with native vectors or legacy JSON mode)  
db = VectorDB("mysql://user:password@localhost:3306/mydb")

# SQLite (local file or in-memory)
db = VectorDB("sqlite:///./vectors.db")
db = VectorDB("sqlite:///:memory:")

# DuckDB (local file or in-memory)
db = VectorDB("duckdb:///./vectors.db")
db = VectorDB("duckdb:///:memory:")

# ClickHouse (local or remote)
db = VectorDB("clickhouse://default@localhost:8123/default")
db = VectorDB("clickhouse://user:password@host:port/database")
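
To make the anatomy of these connection strings concrete, here is a small sketch that splits them with the standard library. It is illustrative only: the describe function is a hypothetical helper, and nothing here is claimed to match how vectorwrap parses URLs internally.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Break a connection string into its parts (illustrative only)."""
    parts = urlsplit(url)
    return {
        "backend": parts.scheme,              # e.g. "postgresql", "sqlite"
        "user": parts.username,               # None when no credentials are given
        "host": parts.hostname,               # None for file-based backends
        "port": parts.port,                   # None when omitted
        "database": parts.path.lstrip("/"),   # database name or file path
    }


print(describe("postgresql://user:password@localhost:5432/mydb"))
print(describe("sqlite:///:memory:"))
```

The scheme is the part that changes when switching backends; for the file-based backends, the path after the third slash carries the file location or :memory:.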

API Reference

VectorDB(connection_string: str) - Stable

Create a vector database connection.

create_collection(name: str, dim: int) - Stable

Create a new collection for vectors of dimension dim.

upsert(collection: str, id: int, vector: list[float], metadata: dict = None) - Stable

Insert or update a vector with optional metadata.

query(collection: str, query_vector: list[float], top_k: int = 5, filter: dict = None) - Stable

Find the top_k most similar vectors. Returns a list of (id, distance) tuples.

Filtering Support:

  • PostgreSQL & MySQL: Native SQL filtering
  • SQLite: Adaptive oversampling (fetches more results, then filters)
  • DuckDB: Native JSON filtering with SQL predicates
  • ClickHouse: Native JSON filtering with JSONExtract functions
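
The oversampling idea mentioned for SQLite can be sketched in a few lines. This is a generic illustration of the technique under stated assumptions, not vectorwrap's actual implementation: filtered_top_k, the search callback, the doubling schedule, and the max_fetch cap are all hypothetical choices.

```python
def filtered_top_k(search, top_k, matches, max_fetch=1000):
    """Over-fetch nearest rows, filter in Python, and widen the fetch if needed.

    search(n) must return the n nearest rows as (id, distance, metadata),
    nearest first. matches(metadata) decides whether a row passes the filter.
    """
    fetch = top_k * 2
    while True:
        rows = search(fetch)
        hits = [(row_id, dist) for row_id, dist, meta in rows if matches(meta)]
        # Stop when enough hits were found, the index ran out of rows,
        # or the fetch size reached the cap.
        if len(hits) >= top_k or len(rows) < fetch or fetch >= max_fetch:
            return hits[:top_k]
        fetch = min(fetch * 2, max_fetch)


# Toy index: 20 rows already sorted by distance; every fifth row is a phone.
rows = [
    (i, float(i), {"category": "phone" if i % 5 == 0 else "laptop"})
    for i in range(20)
]


def search(n):
    return rows[:n]


print(filtered_top_k(search, top_k=3, matches=lambda m: m["category"] == "phone"))
```

The trade-off is that selective filters force larger fetches, which is why backends with native SQL filtering avoid this step.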

API Stability

vectorwrap follows semantic versioning and maintains API stability:

Stable APIs (No breaking changes in minor versions)

  • Core Interface: VectorDB() constructor and connection string format
  • Collection Management: create_collection(name, dim)
  • Data Operations: upsert(collection, id, vector, metadata) and query(collection, query_vector, top_k, filter)
  • Return Formats: Query results as [(id, distance), ...] tuples

Evolving APIs (May change in minor versions with deprecation warnings)

  • Backend-specific optimizations: Index configuration, distance metrics
  • Advanced filtering: Complex filter syntax beyond simple key-value pairs
  • Batch operations: Bulk insert/update methods (planned)

Experimental (May change without notice)

  • New backends: Recently added database support may have API refinements
  • Extension methods: Database-specific functionality not in core API

Version Compatibility Promise

  • Patch versions (0.3.1 → 0.3.2): Only bug fixes, no API changes
  • Minor versions (0.3.x → 0.4.0): New features, deprecated APIs get warnings
  • Major versions (0.x → 1.0): Breaking changes allowed, migration guide provided
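
As a small illustration of the promise above, the sketch below classifies an upgrade as patch, minor, or major from two version strings. The bump_kind helper is hypothetical and assumes plain three-part numeric versions such as 0.3.1.

```python
def bump_kind(old: str, new: str) -> str:
    """Classify the change between two MAJOR.MINOR.PATCH version strings."""
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if new_parts[0] != old_parts[0]:
        return "major"   # breaking changes allowed
    if new_parts[1] != old_parts[1]:
        return "minor"   # new features, deprecation warnings
    return "patch"       # bug fixes only


for old, new in [("0.3.1", "0.3.2"), ("0.3.2", "0.4.0"), ("0.4.0", "1.0.0")]:
    print(old, "->", new, bump_kind(old, new))
```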

Current Status: v0.4.0 - Stable release with API backward compatibility guarantees

Installation Notes

SQLite Setup

SQLite support requires loadable extensions. On some systems you may need:

# macOS with Homebrew
brew install sqlite
export LDFLAGS="-L$(brew --prefix sqlite)/lib"
export CPPFLAGS="-I$(brew --prefix sqlite)/include"
pip install "vectorwrap[sqlite]"

# Or use system package manager
# Ubuntu: apt install libsqlite3-dev
# CentOS: yum install sqlite-devel

PostgreSQL Setup

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

MySQL Setup

MySQL 8.2+ has native VECTOR type support. For older versions, vectorwrap automatically falls back to JSON storage with Python-based distance calculations.
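
To show why the JSON fallback is slower, here is a minimal sketch of a Python-side distance scan. It is not vectorwrap's code: the stored layout, the nearest helper, and the choice of Euclidean distance are assumptions made for illustration.

```python
import json
import math

# Vectors kept as JSON text, as a legacy JSON column might hold them (assumed layout).
stored = {
    1: json.dumps([0.0, 0.0]),
    2: json.dumps([3.0, 4.0]),
    3: json.dumps([1.0, 1.0]),
}


def nearest(query, stored, top_k=5):
    """Decode every stored vector and rank rows by Euclidean distance."""
    scored = []
    for row_id, raw in stored.items():
        vector = json.loads(raw)
        scored.append((row_id, math.dist(query, vector)))
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


print(nearest([0.0, 0.0], stored, top_k=2))
```

Every query decodes and scores every row, so cost grows linearly with collection size, whereas a native index can skip most rows.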

MariaDB Setup

MariaDB 11.8 GA LTS introduced a native VECTOR data type with HNSW indexing, similar to pgvector. For older versions, vectorwrap automatically falls back to JSON storage.

# MariaDB 11.8+ (native VECTOR support)
db = VectorDB("mariadb://user:pass@localhost:3306/vectordb")
db.create_collection("embeddings", dim=1536)  # Uses VECTOR(1536) with HNSW

# Older versions automatically use JSON fallback
# No code changes needed - version detection is automatic

DuckDB Setup

DuckDB includes the VSS extension by default since v0.10.2. The extension provides HNSW indexing for fast vector similarity search:

# Works out of the box with vectorwrap[duckdb]
db = VectorDB("duckdb:///analytics.db")
db.create_collection("embeddings", dim=1536)  # Auto-creates HNSW index

ClickHouse Setup

ClickHouse provides native support for vector similarity search using ANN indexes:

# Works with vectorwrap[clickhouse]
db = VectorDB("clickhouse://default@localhost:8123/default")
db.create_collection("embeddings", dim=1536)  # Auto-creates HNSW index

Note: ClickHouse vector similarity indexes require ClickHouse version 25.8+ with the experimental feature enabled. The backend automatically handles this configuration.

Use Cases

  • Prototyping: Start with SQLite or DuckDB, scale to PostgreSQL or ClickHouse
  • Testing: Use in-memory databases (SQLite/DuckDB) for fast tests
  • Analytics: DuckDB or ClickHouse for combining vector search with analytical queries
  • Multi-tenant: Different customers on different database backends
  • Migration: Move vector data between database systems seamlessly
  • Hybrid deployments: PostgreSQL for production, DuckDB/ClickHouse for analytics
  • High-performance: ClickHouse for large-scale vector search workloads
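
The multi-tenant case can be sketched as a lookup from tenant name to connection string with one cached connection per tenant. TenantRouter is a hypothetical helper, not part of vectorwrap. The connect argument is injected so the sketch runs without any database; in real use it would presumably be VectorDB.

```python
from typing import Callable, Dict, Generic, TypeVar

Conn = TypeVar("Conn")


class TenantRouter(Generic[Conn]):
    """Open one connection per tenant on first use and reuse it afterwards."""

    def __init__(self, urls: Dict[str, str], connect: Callable[[str], Conn]) -> None:
        self._urls = urls
        self._connect = connect
        self._cache: Dict[str, Conn] = {}

    def for_tenant(self, tenant: str) -> Conn:
        if tenant not in self._cache:
            self._cache[tenant] = self._connect(self._urls[tenant])
        return self._cache[tenant]


urls = {
    "acme": "postgresql://user:pass@host/acme",
    "demo": "sqlite:///:memory:",
}

opened = []


def fake_connect(url):
    # Stand-in for VectorDB(url); records which URLs were opened.
    opened.append(url)
    return {"url": url}


router = TenantRouter(urls, fake_connect)
router.for_tenant("demo")
router.for_tenant("demo")  # served from the cache, no second connection
print(opened)
```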

Integrations

vectorwrap integrates with popular AI frameworks and platforms:

  • Appwrite: Add AI/vector capabilities to Appwrite apps (uses MariaDB backend) - Testing Guide
  • LangChain: Drop-in VectorStore adapter for RAG pipelines
  • LlamaIndex: VectorStore wrapper for data frameworks
  • Supabase: Managed PostgreSQL + pgvector helper
  • Milvus: Enterprise vector database adapter
  • Qdrant: Cloud-native vector search integration
  • Weaviate: Production-scale vector database integration

# Install with integrations
pip install "vectorwrap[langchain]"
pip install "vectorwrap[llamaindex]"
pip install "vectorwrap[milvus]"
pip install "vectorwrap[qdrant]"
pip install "vectorwrap[weaviate]"

Example with Appwrite (No External Vector DB Needed):

from vectorwrap.integrations.appwrite import AppwriteVectorStore

# Add vector search to your Appwrite app
vector_store = AppwriteVectorStore.from_connection_string(
    connection_url="mariadb://appwrite:password@localhost:3306/appwrite",
    collection_name="embeddings",
    dimension=1536
)

# Store and search vectors in Appwrite's MariaDB
# (embed_fn is a placeholder for your own embedding function)
vector_store.add_documents([
    {"text": "Hello world", "metadata": {"source": "doc1"}}
], embedding_function=embed_fn)

results = vector_store.search("greeting", embed_fn, top_k=5)

Example with LangChain:

from langchain_openai import OpenAIEmbeddings  # requires the langchain-openai package
from vectorwrap.integrations.langchain import VectorwrapStore

embeddings = OpenAIEmbeddings()
vectorstore = VectorwrapStore(
    connection_url="postgresql://user:pass@localhost/db",
    collection_name="documents",
    embedding_function=embeddings
)

vectorstore.add_texts(["Hello world", "LangChain + vectorwrap"])
results = vectorstore.similarity_search("greeting", k=5)

Example with Weaviate:

from vectorwrap.integrations.weaviate import WeaviateBackend

# Connect to Weaviate (local or cloud)
db = WeaviateBackend(url="http://localhost:8080")
# Or cloud: WeaviateBackend(url="https://xxx.weaviate.network", api_key="your-key")

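# Create collection and insert vectors
# (embedding_vector and query_vector are placeholders for 1536-dim lists from your embedding function)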
db.create_collection("documents", dim=1536)
db.upsert("documents", 1, embedding_vector, {"source": "doc1"})

# Query with metadata filters
results = db.query("documents", query_vector, top_k=10, filter={"source": "doc1"})

See docs/INTEGRATIONS.md for the complete integration guide.

Benchmarks

Comprehensive performance benchmarks are available in the bench/ directory.

Quick benchmark:

pip install "vectorwrap[all]" matplotlib
python bench/benchmark.py
python bench/visualize.py benchmark_results.json

See bench/README.md for the detailed benchmarking guide.

Roadmap

v1.0 Stable Release

  • API Freeze: Lock stable APIs with full backward compatibility
  • Production Testing: Comprehensive benchmarks across all backends [DONE]
  • Documentation: Complete API docs and migration guides

Future Features

  • Elasticsearch with dense vector fields
  • Batch operations for bulk inserts
  • Index configuration options
  • Distance metrics: Cosine, dot product, custom functions

License

MIT © 2025 Mihir Ahuja


If vectorwrap saved you time, please star the repo – it helps others discover it!
