Skip to content

mihirahuja1/vectorwrap

Repository files navigation

vectorwrap 0.8.0

PyPI GitHub Stars CI Coverage

SQLite→Postgres swap demo

Universal vector search wrapper for Postgres, MySQL, SQLite, DuckDB, ClickHouse (pgvector, HeatWave, sqlite-vss, DuckDB VSS, ClickHouse ANN).

Switch between PostgreSQL, MySQL, SQLite, DuckDB, and ClickHouse vector backends with a single line of code. Perfect for prototyping, testing, and production deployments.

Stable API - Core methods follow semantic versioning with backward compatibility guarantees.

Quick Start

Open in Colab

# Core install (PostgreSQL + MySQL support)
pip install vectorwrap

# Add SQLite support (requires system SQLite with extension support)
pip install "vectorwrap[sqlite]"

# Add DuckDB support (includes VSS extension)
pip install "vectorwrap[duckdb]"

# Add ClickHouse support (includes clickhouse-connect)
pip install "vectorwrap[clickhouse]"

# Install all backends for development
pip install "vectorwrap[sqlite,duckdb,clickhouse]"
from vectorwrap import VectorDB

# Your embedding function (use OpenAI, Hugging Face, etc.)
def embed(text: str) -> list[float]:
    # Return your 1536-dim embeddings here
    return [0.1, 0.2, ...] 

# Connect to any supported database
db = VectorDB("postgresql://user:pass@host/db")  # or mysql://... or sqlite:///path.db or duckdb:///path.db or clickhouse://...
db.create_collection("products", dim=1536)

# Insert vectors with metadata
db.upsert("products", 1, embed("Apple iPhone 15 Pro"), {"category": "phone", "price": 999})
db.upsert("products", 2, embed("Samsung Galaxy S24"), {"category": "phone", "price": 899})

# Semantic search with filtering
results = db.query(
    collection="products",
    query_vector=embed("latest smartphone"),
    top_k=5,
    filter={"category": "phone"}
)
print(results)  # → [(1, 0.023), (2, 0.087)]

Supported Backends

Database Vector Type Indexing Installation Notes
PostgreSQL 16+ + pgvector VECTOR(n) HNSW CREATE EXTENSION vector; Production ready
MySQL 8.2+ HeatWave VECTOR(n) Automatic Built-in Native vector support
MySQL ≤8.0 (legacy) JSON arrays None Built-in Slower, Python distance
MariaDB 11.8+ GA LTS VECTOR(n) HNSW Built-in Native vectors, 10M+ users
MariaDB <11.8 (legacy) JSON arrays None Built-in Auto-fallback, Python distance
SQLite + sqlite-vss Virtual table HNSW pip install "vectorwrap[sqlite]" Great for prototyping
DuckDB + VSS FLOAT[] arrays HNSW pip install "vectorwrap[duckdb]" Analytics + vectors
ClickHouse Array(Float32) HNSW pip install "vectorwrap[clickhouse]" High-performance analytics
Redis + RediSearch Binary vectors HNSW/FLAT pip install "vectorwrap[redis]" Ultra-fast in-memory search
RethinkDB JSON arrays In-memory HNSW pip install "vectorwrap[rethinkdb]" WORLD'S FIRST: Real-time changefeeds

Examples

Complete Example with OpenAI Embeddings

from openai import OpenAI
from vectorwrap import VectorDB

client = OpenAI()

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Use any database - just change the connection string!
db = VectorDB("postgresql://user:pass@localhost/vectors")
db.create_collection("documents", dim=1536)

# Add some documents
documents = [
    ("Python is a programming language", {"topic": "programming"}),
    ("Machine learning uses neural networks", {"topic": "ai"}),
    ("Databases store structured data", {"topic": "data"}),
]

for i, (text, metadata) in enumerate(documents):
    db.upsert("documents", i, embed(text), metadata)

# Search for similar content
query = "What is artificial intelligence?"
results = db.query("documents", embed(query), top_k=2)

for doc_id, distance in results:
    print(f"Document {doc_id}: distance={distance:.3f}")

Database-Specific Connection Strings

# PostgreSQL with pgvector
db = VectorDB("postgresql://user:password@localhost:5432/mydb")

# MySQL (8.2+ with native vectors or legacy JSON mode)  
db = VectorDB("mysql://user:password@localhost:3306/mydb")

# SQLite (local file or in-memory)
db = VectorDB("sqlite:///./vectors.db")
db = VectorDB("sqlite:///:memory:")

# DuckDB (local file or in-memory)
db = VectorDB("duckdb:///./vectors.db")
db = VectorDB("duckdb:///:memory:")

# ClickHouse (local or remote)
db = VectorDB("clickhouse://default@localhost:8123/default")
db = VectorDB("clickhouse://user:password@host:port/database")

API Reference

VectorDB(connection_string: str) - Stable

Create a vector database connection.

create_collection(name: str, dim: int) - Stable

Create a new collection for vectors of dimension dim.

upsert(collection: str, id: int, vector: list[float], metadata: dict = None) - Stable

Insert or update a vector with optional metadata.

query(collection: str, query_vector: list[float], top_k: int = 5, filter: dict = None) - Stable

Find the top_k most similar vectors. Returns list of (id, distance) tuples.

Filtering Support:

  • PostgreSQL & MySQL: Native SQL filtering
  • SQLite: Adaptive oversampling (fetches more results, then filters)
  • DuckDB: Native JSON filtering with SQL predicates
  • ClickHouse: Native JSON filtering with JSONExtract functions

API Stability

vectorwrap follows semantic versioning and maintains API stability:

Stable APIs (No breaking changes in minor versions)

  • Core Interface: VectorDB() constructor and connection string format
  • Collection Management: create_collection(name, dim)
  • Data Operations: upsert(collection, id, vector, metadata) and query(collection, query_vector, top_k, filter)
  • Return Formats: Query results as [(id, distance), ...] tuples

Evolving APIs (May change in minor versions with deprecation warnings)

  • Backend-specific optimizations: Index configuration, distance metrics
  • Advanced filtering: Complex filter syntax beyond simple key-value pairs
  • Batch operations: Bulk insert/update methods (planned)

Experimental (May change without notice)

  • New backends: Recently added database support may have API refinements
  • Extension methods: Database-specific functionality not in core API

Version Compatibility Promise

  • Patch versions (0.3.1 → 0.3.2): Only bug fixes, no API changes
  • Minor versions (0.3.x → 0.4.0): New features, deprecated APIs get warnings
  • Major versions (0.x → 1.0): Breaking changes allowed, migration guide provided

Current Status: v0.4.0 - Stable release with API backward compatibility guarantees

Installation Notes

SQLite Setup

SQLite support requires loadable extensions. On some systems you may need:

# macOS with Homebrew
brew install sqlite
export LDFLAGS="-L$(brew --prefix sqlite)/lib"
export CPPFLAGS="-I$(brew --prefix sqlite)/include"
pip install "vectorwrap[sqlite]"

# Or use system package manager
# Ubuntu: apt install libsqlite3-dev
# CentOS: yum install sqlite-devel

PostgreSQL Setup

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

MySQL Setup

MySQL 8.2+ has native VECTOR type support. For older versions, vectorwrap automatically falls back to JSON storage with Python-based distance calculations.

MariaDB Setup

MariaDB 11.8 GA LTS introduced native VECTOR data type with HNSW indexing, similar to pgvector. For older versions, vectorwrap automatically falls back to JSON storage.

# MariaDB 11.8+ (native VECTOR support)
db = VectorDB("mariadb://user:pass@localhost:3306/vectordb")
db.create_collection("embeddings", dim=1536)  # Uses VECTOR(1536) with HNSW

# Older versions automatically use JSON fallback
# No code changes needed - version detection is automatic

DuckDB Setup

DuckDB includes the VSS extension by default since v0.10.2. The extension provides HNSW indexing for fast vector similarity search:

# Works out of the box with vectorwrap[duckdb]
db = VectorDB("duckdb:///analytics.db")
db.create_collection("embeddings", dim=1536)  # Auto-creates HNSW index

ClickHouse Setup

ClickHouse provides native support for vector similarity search using ANN indexes:

# Works with vectorwrap[clickhouse]
db = VectorDB("clickhouse://default@localhost:8123/default")
db.create_collection("embeddings", dim=1536)  # Auto-creates HNSW index

Note: ClickHouse vector similarity indexes require ClickHouse version 25.8+ with the experimental feature enabled. The backend automatically handles this configuration.

Use Cases

  • Prototyping: Start with SQLite or DuckDB, scale to PostgreSQL or ClickHouse
  • Testing: Use in-memory databases (SQLite/DuckDB) for fast tests
  • Analytics: DuckDB or ClickHouse for combining vector search with analytical queries
  • Multi-tenant: Different customers on different database backends
  • Migration: Move vector data between database systems seamlessly
  • Hybrid deployments: PostgreSQL for production, DuckDB/ClickHouse for analytics
  • High-performance: ClickHouse for large-scale vector search workloads

Integrations

vectorwrap integrates with popular AI frameworks and platforms:

  • Appwrite: Add AI/vector capabilities to Appwrite apps (uses MariaDB backend) - Testing Guide
  • LangChain: Drop-in VectorStore adapter for RAG pipelines
  • LlamaIndex: VectorStore wrapper for data frameworks
  • Supabase: Managed PostgreSQL + pgvector helper
  • Milvus: Enterprise vector database adapter
  • Qdrant: Cloud-native vector search integration
# Install with integrations
pip install "vectorwrap[langchain]"
pip install "vectorwrap[llamaindex]"
pip install "vectorwrap[milvus]"
pip install "vectorwrap[qdrant]"

Example with Appwrite (No External Vector DB Needed):

from vectorwrap.integrations.appwrite import AppwriteVectorStore

# Add vector search to your Appwrite app
vector_store = AppwriteVectorStore.from_connection_string(
    connection_url="mariadb://appwrite:password@localhost:3306/appwrite",
    collection_name="embeddings",
    dimension=1536
)

# Store and search vectors in Appwrite's MariaDB
vector_store.add_documents([
    {"text": "Hello world", "metadata": {"source": "doc1"}}
], embedding_function=embed_fn)

results = vector_store.search("greeting", embed_fn, top_k=5)

Example with LangChain:

from langchain.embeddings import OpenAIEmbeddings
from vectorwrap.integrations.langchain import VectorwrapStore

embeddings = OpenAIEmbeddings()
vectorstore = VectorwrapStore(
    connection_url="postgresql://user:pass@localhost/db",
    collection_name="documents",
    embedding_function=embeddings
)

vectorstore.add_texts(["Hello world", "LangChain + vectorwrap"])
results = vectorstore.similarity_search("greeting", k=5)

See docs/INTEGRATIONS.md for complete integration guide.

Benchmarks

Comprehensive performance benchmarks are available in the bench/ directory.

Quick benchmark:

pip install "vectorwrap[all]" matplotlib
python bench/benchmark.py
python bench/visualize.py benchmark_results.json

See bench/README.md for detailed benchmarking guide.

Roadmap

v1.0 Stable Release

  • API Freeze: Lock stable APIs with full backward compatibility
  • Production Testing: Comprehensive benchmarks across all backends [DONE]
  • Documentation: Complete API docs and migration guides

Future Features

  • Redis with RediSearch
  • Elasticsearch with dense vector fields
  • Qdrant and Weaviate support
  • Batch operations for bulk inserts
  • Index configuration options
  • Distance metrics: Cosine, dot product, custom functions

License

MIT © 2025 Mihir Ahuja


If vectorwrap saved you time, please star the repo – it helps others discover it!

PyPI PackageGitHub RepositoryReport Issues

About

Universal vector search wrapper for Postgres, MySQL, SQLite (pgvector, Vector Store, sqlite‑vss).

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published