MSVBASE

What's NEW

New Research Paper SPFresh: Incremental In-Place Update for Billion-Scale Vector Search - published in SOSP 2023
New Research Paper VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity - published in OSDI 2023

What's MSVBASE

MSVBASE is a new system capable of efficiently serving complex online queries that involve both approximate similarity search and relational operators on scalar and vector data-sets. MSVBASE builds a unified query execution engine to support a wide range of queries both on scalar and vector data, and shows superior query performance and accuracy. It integrates high-dimensional vector indices into PostgreSQL with minimal code modifications.

Quickstart

Clone and Patch

git clone https://github.com/microsoft/MSVBASE.git
cd MSVBASE
git submodule update --init --recursive
./scripts/patch.sh

Build

./scripts/dockerbuild.sh

Run

./scripts/dockerrun.sh

Features and SQL syntax

It is compatible with PostgreSQL syntax and protocol, supporting vector distance calculations for L2 and Inner product. It also supports hnsw and sptag indices. Soon, we are going to integrate spann and more indices.

Command Line

docker exec -it --privileged --user=root vbase_open_source bash
psql -U vectordb

SQL syntax

It preserves all the features of PostgreSQL while extending query support on vector data.

Currently, 'float array' is used to store vectors.

create table t_table(id int, price int, vector_1 float8[10], vector_2 float8[10]);

Insert or Import data.

insert into t_table values(1, 10, '{1,2,3,4,5,6,7,8,9,0}', '{5,6,7,1,2,3,4,8,9,1}');
copy t_table from 'your_data_path.tsv' DELIMITER E'\t' csv quote e'\x01';

When creating a vector index, it is necessary to specify the algorithm and the distance calculation method to be used.

create index vector_index_1 on t_table using hnsw(vector_1) with(dimension=10,distmethod=l2_distance);
create index vector_index_2 on t_table using sptag(vector_2 vector_inner_product_ops) with(distmethod=inner_product);

TopK. When calculating distances, the '<->' operator represents the L2 distance, while '<*>' represents the inner product distance.

select id from t_table order by vector_1 <-> '{5,9,8,6,2,1,1,0,4,3}' limit 10;
select id from t_table order by vector_2 <*> '{5,9,8,6,2,1,1,0,4,3}' limit 5;

TopK + Filter.

select id from t_table where price > 15 order by vector_1 <-> '{5,9,8,6,2,1,1,0,4,3}' limit 10;
select id from t_table where price > 15 order by vector_2 <*> '{5,9,8,6,2,1,1,0,4,3}' limit 5;

Distance Range Filter. It also supports distance threshold-based filtering queries. The query will retrieve vector data that is within the distance threshold. In the query, the '<<->>' operator represents the L2 distance, while '<<*>>' represents the inner product distance. The first element of the array represents the distance threshold.

select id from t_table where price > 15 and vector_1 <<->> '{30,5,9,8,6,2,1,1,0,4,3}';

Multi-vector Column Query.

select id from t_table
order by approximate_sum('0.5 * vector_1<->{5,9,8,6,2,1,1,0,4,3} + vector_2<*>{5,9,8,6,2,1,1,0,4,3}' ) limit 5;

Join. Join on vector similarity with threshold

select t_table.id as tid, d_table.id as did
from t_table join d_table
on t_table.vector_2 <<*>> array_cat(ARRAY[cast(10 as float8)], d_table.vector_2);

Example

create database test;
\c test;
create extension vectordb;
create table t_table(id int, price int, vector_1 float8[10]);
insert into t_table values(1, 10, '{1,2,3,4,5,6,7,8,9,0}');
insert into t_table values(2, 20, '{5,6,7,1,2,3,4,8,9,1}');
insert into t_table values(3, 30, '{9,8,7,6,5,4,3,2,1,0}');
create index t4_index on t_table using hnsw(vector_1) with(dimension=10,distmethod=l2_distance);
set enable_seqscan=false;
select id from t_table where price > 15 order by vector_1 <-> '{5,9,8,6,2,1,1,0,4,3}' limit 1;
insert into t_table values(4, 40, '{19,18,17,16,15,14,13,12,11,10}');
delete from t_table where id = 2;

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
evaluation		evaluation
patch		patch
scripts		scripts
sql		sql
src		src
thirdparty		thirdparty
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
SUPPORT.md		SUPPORT.md
azure-pipelines.yml		azure-pipelines.yml
vectordb.control.in		vectordb.control.in

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MSVBASE

What's NEW

What's MSVBASE

Quickstart

Clone and Patch

Build

Run

Features and SQL syntax

Command Line

SQL syntax

Contributing

Trademarks

About

Releases

Packages

Contributors 2

Languages

License

microsoft/MSVBASE

Folders and files

Latest commit

History

Repository files navigation

MSVBASE

What's NEW

What's MSVBASE

Quickstart

Clone and Patch

Build

Run

Features and SQL syntax

Command Line

SQL syntax

Contributing

Trademarks

About

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages