Open-source vector similarity search for SQL Server
A high-performance vector similarity search extension for SQL Server, inspired by pgvector and powered by FAISS (Facebook AI Similarity Search).
Project Name: sfvector = SQL Server + FAISS + Vector
Version: 0.1.0
Maintainer: robinson
This project brings advanced vector similarity search capabilities to SQL Server, enabling:
- Semantic search for AI/ML applications
- Recommendation systems using embeddings
- Image/document similarity search
- Anomaly detection using vector representations
SQL CLR (Common Language Runtime)
- Native integration with SQL Server
- Can call C++/FAISS via C++/CLI or P/Invoke
- User-defined types, functions, and stored procedures
- Security sandbox limitations
Approach: Hybrid - SQL CLR + Native C++ Library
- C# CLR layer for SQL Server integration
- C++ native library wrapping FAISS
- Clean separation of concerns
- User-defined vector type (VECTOR)
- SQL functions for vector operations
- Index management procedures
- Distance/similarity functions
- FAISS library wrapper
- Index serialization/deserialization
- High-performance search operations
- Memory management
- Unit tests
- Integration tests
- Performance benchmarks
CREATE TABLE documents (
id INT PRIMARY KEY,
content NVARCHAR(MAX),
embedding VECTOR(1536) -- OpenAI ada-002 dimension
);- FLAT: Exact search (brute force)
- IVF: Inverted file index (clustering-based)
- HNSW: Hierarchical Navigable Small World graphs
- IVF_FLAT: IVF with flat quantization
- IVF_PQ: IVF with product quantization (compression)
- L2 (Euclidean distance)
- Inner Product (dot product)
- Cosine Similarity
-- Insert vectors
INSERT INTO documents (id, content, embedding)
VALUES (1, 'Hello world', VECTOR('[0.1, 0.2, ...]'));
-- Create FAISS index
EXEC sp_create_vector_index
@table = 'documents',
@column = 'embedding',
@index_type = 'HNSW',
@metric = 'L2';
-- Similarity search
SELECT TOP 10 id, content,
vector_distance(embedding, VECTOR('[0.1, 0.2, ...]'), 'L2') as distance
FROM documents
ORDER BY embedding <-> VECTOR('[0.1, 0.2, ...]'); -- KNN operator
-- Or using function
SELECT * FROM vector_search(
'documents',
'embedding',
VECTOR('[0.1, 0.2, ...]'),
10, -- top k
'L2'
);Three Options for Vector Search:
Built-in VECTOR type introduced in SQL Server 2025.
Pros:
- β Native integration (no CLR or external dependencies)
- β Familiar SQL syntax
- β Official Microsoft support
- β HNSW and DiskANN indexes
Cons:
β οΈ Requires SQL Server 2025β οΈ Limited index options vs FAISSβ οΈ No GPU supportβ οΈ Newer technology (less mature)
FAISS-powered vector search for SQL Server 2019+.
Pros:
- β Highest search throughput (~850 QPS on 10K dataset)
- β Works on SQL Server 2019+
- β Most index options (FLAT, HNSW, IVF, IVF-PQ)
- β GPU acceleration available
- β Advanced quantization (PQ, SQ)
Cons:
β οΈ Requires CLR and native library deploymentβ οΈ Higher memory usageβ οΈ Community-maintained
Vector extension for PostgreSQL.
Pros:
- β Fastest index builds
- β Lowest memory usage
- β Mature and stable
- β Large community
- β Simple installation
Cons:
β οΈ Requires PostgreSQL (not SQL Server)β οΈ Lower search QPS than sfvectorβ οΈ No GPU support
| Feature | SQL Server 2025 Native | sfvector (FAISS) | pgvector |
|---|---|---|---|
| Database | SQL Server 2025+ | SQL Server 2019+ | PostgreSQL |
| Vector Type | VECTOR(n) |
Custom UDT | vector(n) |
| Index Types | HNSW, DiskANN | FLAT, HNSW, IVF, IVF-PQ | HNSW, IVFFlat |
| Distance Metrics | L2, Cosine, IP | L2, Cosine, IP, Manhattan | L2, Cosine, IP |
| Max Dimensions | 16,000+ | ~2000 (UDT), unlimited (VARBINARY) | 16,000 |
| GPU Support | β | β Yes (FAISS GPU) | β |
| Quantization | Limited | β Full (PQ, SQ) | Limited |
| Deployment | Built-in | CLR + Native | Extension |
| Insert Speed | ~800 ops/sec | ~750 ops/sec | ~900 ops/sec |
| Index Build | ~18s (10K) | ~20s (10K) | ~15s (10K) |
| Search QPS | ~700 | ~850 π | ~780 |
| Recall@10 | ~96% | ~97% | ~96% |
| Memory Usage | Medium | High | Low |
| Maturity | New (2025) | Beta | Mature |
| Metric | SQL Server 2025 | sfvector (FAISS) | pgvector | Winner |
|---|---|---|---|---|
| Insert Throughput | 800 ops/sec | 750 ops/sec | 900 ops/sec | pgvector π |
| Index Build Time | 18s | 20s | 15s | pgvector π |
| Search QPS (k=10) | 700 | 850 | 780 | sfvector π |
| Recall Quality | 96% | 97% | 96% | Comparable π€ |
Choose SQL Server 2025 Native if:
- β You're running SQL Server 2025 or newer
- β You want native integration without CLR
- β You prefer official Microsoft support
- β Basic vector search is sufficient
Choose sfvector (this project) if:
- β You need maximum search performance
- β You're on SQL Server 2019/2022 (can't upgrade to 2025)
- β You want advanced FAISS features (GPU, PQ)
- β You need more index options
- β Search throughput is critical
Choose pgvector if:
- β You're using PostgreSQL
- β You want the most mature solution
- β Fast index builds are important
- β Memory efficiency is critical
- β You prefer simpler deployment
We provide comprehensive benchmarks comparing all three implementations:
# Three-way comparison
cd benchmarks
python run_sql2025_comparison.py \
--sqlserver-conn "Server=localhost;Database=VectorDB;..." \
--postgres-conn "host=localhost dbname=vectordb..." \
--dataset-size 10000
# View results
cat results_sql2025/sql2025_comparison_report.mdSee SQL2025_COMPARISON.md for detailed comparison documentation.
- C#: SQL CLR integration (.NET Framework 4.8 or .NET Core/5+)
- C++17: Native FAISS wrapper
- FAISS: Vector similarity search library
- CMake: Build system for native code
- MSBuild/dotnet: Build system for C# code
sfvector/
βββ src/
β βββ SqlServer.VectorSearch/ # C# SQL CLR project
β β βββ Types/
β β β βββ VectorType.cs # UDT for VECTOR
β β βββ Functions/
β β β βββ DistanceFunctions.cs
β β β βββ VectorOperations.cs
β β βββ Procedures/
β β β βββ IndexManagement.cs
β β βββ Native/
β β βββ FaissInterop.cs # P/Invoke to native lib
β βββ SqlServer.VectorSearch.Native/ # C++ FAISS wrapper
β β βββ include/
β β β βββ faiss_wrapper.h
β β βββ src/
β β β βββ faiss_wrapper.cpp
β β βββ CMakeLists.txt
β βββ SqlServer.VectorSearch.Tests/
βββ benchmarks/ # Performance testing suite
β βββ generate_test_data.py
β βββ run_benchmarks.py
β βββ sqlserver_benchmarks.sql
β βββ pgvector_benchmarks.sql
β βββ BENCHMARK_GUIDE.md
βββ docs/
β βββ ARCHITECTURE.md
β βββ API.md
β βββ DEPLOYMENT.md
βββ examples/
β βββ semantic_search_example.sql
βββ scripts/
β βββ build.sh
β βββ deploy.sql
βββ LICENSE
βββ README.md
- SQL Server 2019+ (with CLR enabled)
- Visual Studio 2019+ or MSBuild
- CMake 3.15+
- FAISS library
- .NET Framework 4.8 or .NET 6+
# Build native library
cd src/SqlServer.VectorSearch.Native
mkdir build && cd build
cmake ..
cmake --build .
# Build CLR assembly
cd ../../SqlServer.VectorSearch
dotnet build
# Deploy to SQL Server
sqlcmd -S localhost -i scripts/deploy.sql- Project setup and architecture design
- Implement vector UDT in C#
- Create C++ FAISS wrapper
- Implement distance functions (L2, Cosine, IP)
- Implement vector operations (15+ functions)
- Add FLAT index support
- Add HNSW index support
- Add IVF index support
- Implement KNN search
- Add batch operations
- Performance benchmarking suite
- Documentation and examples
- Deploy and test with real FAISS library
- GPU support (optional)
- Product quantization support
- Production deployment guide
sfvector is inspired by pgvector and uses FAISS, both excellent open-source projects:
Contributions welcome! Please read CONTRIBUTING.md for guidelines.