Add Milvus integration for vector create and search (#1269)
Integrated the Milvus vector store into EvaDB. Added a `MilvusVectorStore`
class and a `MILVUS` vector store type for query parsing and execution.
The Milvus index uses the following configuration values:

* `MILVUS_URI` is the URI of the Milvus instance (`http://localhost:19530`
when running locally). **This value is required.**
* `MILVUS_USER` is the name of the user for the Milvus instance.
* `MILVUS_PASSWORD` is the password of the user for the Milvus instance.
* `MILVUS_DB_NAME` is the name of the database to use. Defaults to the
`default` database if not provided.
* `MILVUS_TOKEN` is the authorization token for the Milvus instance.
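The lookup order these values follow in `MilvusVectorStore.__init__` can be sketched on its own. The helper `resolve_milvus_params` below is hypothetical and not part of EvaDB. It assumes that a non-empty value passed in from the configuration catalog wins and that the environment variable of the same name is the fallback. Where the real class uses an `assert` for a missing URI, the sketch raises `ValueError` instead.

```python
import os
from typing import Dict, Mapping, Optional

MILVUS_KEYS = [
    "MILVUS_URI",
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_DB_NAME",
    "MILVUS_TOKEN",
]


def resolve_milvus_params(
    configured: Mapping[str, Optional[str]],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    # Hypothetical helper mirroring MilvusVectorStore.__init__:
    # a non-empty configured value wins, otherwise fall back to the environment.
    env = os.environ if environ is None else environ
    resolved: Dict[str, str] = {}
    for key in MILVUS_KEYS:
        value = configured.get(key)
        if not value:  # None or "" falls through to the environment
            value = env.get(key, "")
        resolved[key] = value
    if not resolved["MILVUS_URI"]:
        # The real class asserts here; this sketch raises instead.
        raise ValueError("MILVUS_URI must be set via SET or the environment")
    return resolved


params = resolve_milvus_params(
    {"MILVUS_URI": "", "MILVUS_DB_NAME": "default"},
    environ={"MILVUS_URI": "http://localhost:19530"},
)
print(params["MILVUS_URI"])  # http://localhost:19530
```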

---------

Co-authored-by: Andy Xu <xzdandy@gmail.com>
RichardZhangRZ and xzdandy committed Oct 27, 2023
1 parent af696d6 commit 71b9aca
Showing 15 changed files with 304 additions and 5 deletions.
1 change: 1 addition & 0 deletions docs/_toc.yml
@@ -81,6 +81,7 @@ parts:
- file: source/reference/vector_databases/qdrant
- file: source/reference/vector_databases/pgvector
- file: source/reference/vector_databases/pinecone
- file: source/reference/vector_databases/milvus

- file: source/reference/ai/index
title: AI Engines
37 changes: 37 additions & 0 deletions docs/source/reference/vector_databases/milvus.rst
@@ -0,0 +1,37 @@
Milvus
==========

Milvus is an open-source, distributed vector database designed for similarity search and analytics on large-scale vector data.
The connection to Milvus is based on the `pymilvus <https://pymilvus.readthedocs.io/en/latest>`_ library.

Dependency
----------

* pymilvus

Parameters
----------

To use Milvus you must have a URI to a running Milvus instance. Here are the `instructions to spin up a local instance <https://milvus.io/docs/install_standalone-docker.md>`_.
When running locally, the Milvus instance is reachable at ``http://localhost:19530`` by default. Make sure the Milvus version is 2.3.0 or later. The Milvus integration uses the following values:

* ``MILVUS_URI`` is the URI of the Milvus instance (``http://localhost:19530`` when running locally). **This value is required.**
* ``MILVUS_USER`` is the name of the user for the Milvus instance.
* ``MILVUS_PASSWORD`` is the password of the user for the Milvus instance.
* ``MILVUS_DB_NAME`` is the name of the database to use. Defaults to the ``default`` database if not provided.
* ``MILVUS_TOKEN`` is the authorization token for the Milvus instance.

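These values can be set either with the ``SET`` statement or through operating system environment variables of the same names (``MILVUS_URI``, ``MILVUS_USER``, ``MILVUS_PASSWORD``, ``MILVUS_DB_NAME``, and ``MILVUS_TOKEN``). A non-empty value set with ``SET`` takes precedence over the corresponding environment variable. For example: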


.. code-block:: sql

   SET MILVUS_URI = 'http://localhost:19530';

Create Index
-----------------

.. code-block:: sql

   CREATE INDEX index_name ON table_name (data) USING MILVUS;
1 change: 1 addition & 0 deletions evadb/catalog/catalog_type.py
@@ -117,6 +117,7 @@ class VectorStoreType(EvaDBEnum):
    PINECONE  # noqa: F821
    PGVECTOR  # noqa: F821
    CHROMADB  # noqa: F821
    MILVUS  # noqa: F821


class VideoColumnName(EvaDBEnum):
5 changes: 5 additions & 0 deletions evadb/evadb_config.py
@@ -36,4 +36,9 @@
    "OPENAI_API_KEY": "",
    "PINECONE_API_KEY": "",
    "PINECONE_ENV": "",
    "MILVUS_URI": "",
    "MILVUS_USER": "",
    "MILVUS_PASSWORD": "",
    "MILVUS_DB_NAME": "",
    "MILVUS_TOKEN": "",
}
12 changes: 12 additions & 0 deletions evadb/executor/executor_utils.py
@@ -185,6 +185,18 @@ def handle_vector_store_params(
            ),
            "PINECONE_ENV": catalog().get_configuration_catalog_value("PINECONE_ENV"),
        }
    elif vector_store_type == VectorStoreType.MILVUS:
        return {
            "MILVUS_URI": catalog().get_configuration_catalog_value("MILVUS_URI"),
            "MILVUS_USER": catalog().get_configuration_catalog_value("MILVUS_USER"),
            "MILVUS_PASSWORD": catalog().get_configuration_catalog_value(
                "MILVUS_PASSWORD"
            ),
            "MILVUS_DB_NAME": catalog().get_configuration_catalog_value(
                "MILVUS_DB_NAME"
            ),
            "MILVUS_TOKEN": catalog().get_configuration_catalog_value("MILVUS_TOKEN"),
        }
    else:
        raise ValueError("Unsupported vector store type: {}".format(vector_store_type))
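The `handle_vector_store_params` branch for Milvus repeats the same lookup for every key. A minimal table-driven sketch of the same idea follows. It assumes a local stand-in `VectorStoreType` enum and a plain dict in place of `catalog().get_configuration_catalog_value`. The names `VECTOR_STORE_CONFIG_KEYS` and `vector_store_params` are hypothetical, and the table covers only the key-only stores shown in this hunk. Stores that need other arguments would need separate handling.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class VectorStoreType(Enum):  # local stand-in for evadb's enum
    PINECONE = auto()
    MILVUS = auto()


# Hypothetical table: configuration keys each vector store reads.
VECTOR_STORE_CONFIG_KEYS: Dict[VectorStoreType, List[str]] = {
    VectorStoreType.PINECONE: ["PINECONE_API_KEY", "PINECONE_ENV"],
    VectorStoreType.MILVUS: [
        "MILVUS_URI",
        "MILVUS_USER",
        "MILVUS_PASSWORD",
        "MILVUS_DB_NAME",
        "MILVUS_TOKEN",
    ],
}


def vector_store_params(
    vector_store_type: VectorStoreType, lookup: Callable[[str], str]
) -> Dict[str, str]:
    keys = VECTOR_STORE_CONFIG_KEYS.get(vector_store_type)
    if keys is None:
        raise ValueError(f"Unsupported vector store type: {vector_store_type}")
    # One lookup per key, preserving the table's key order.
    return {key: lookup(key) for key in keys}


# A plain dict stands in for the configuration catalog.
fake_catalog = {"MILVUS_URI": "http://localhost:19530"}
params = vector_store_params(VectorStoreType.MILVUS, lambda k: fake_catalog.get(k, ""))
```

Adding a store then means adding one table entry instead of another `elif` branch.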

2 changes: 1 addition & 1 deletion evadb/interfaces/relational/db.py
@@ -248,7 +248,7 @@ def create_vector_index(
            index_name (str): Name of the index.
            table_name (str): Name of the table.
            expr (str): Expression used to build the vector index.
            using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB`.
            using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB` or `MILVUS`.
        Returns:
            EvaDBCursor: The EvaDBCursor object.
3 changes: 2 additions & 1 deletion evadb/parser/evadb.lark
@@ -53,7 +53,7 @@ function_metadata_key: uid

function_metadata_value: constant

vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB)
vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB | MILVUS)

index_elem: ("(" uid_list ")"
| "(" function_call ")")
@@ -424,6 +424,7 @@ QDRANT: "QDRANT"i
PINECONE: "PINECONE"i
PGVECTOR: "PGVECTOR"i
CHROMADB: "CHROMADB"i
MILVUS: "MILVUS"i

// Computer vision tasks
2 changes: 2 additions & 0 deletions evadb/parser/lark_visitor/_create_statements.py
@@ -299,6 +299,8 @@ def vector_store_type(self, tree):
            vector_store_type = VectorStoreType.PGVECTOR
        elif str.upper(token) == "CHROMADB":
            vector_store_type = VectorStoreType.CHROMADB
        elif str.upper(token) == "MILVUS":
            vector_store_type = VectorStoreType.MILVUS
        return vector_store_type
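The grammar declares `MILVUS: "MILVUS"i`, so the keyword is case-insensitive and the visitor normalizes the token with `str.upper` before comparing. The sketch below shows the same mapping done as a single enum lookup by member name. It assumes a local stdlib `Enum` stand-in, since EvaDB's `EvaDBEnum` is its own class, and the helper `parse_vector_store_type` is hypothetical. Unlike the visitor, which falls through to `None`, it raises on an unknown token.

```python
from enum import Enum


class VectorStoreType(Enum):  # local stand-in for evadb.catalog.catalog_type
    FAISS = "FAISS"
    QDRANT = "QDRANT"
    PINECONE = "PINECONE"
    PGVECTOR = "PGVECTOR"
    CHROMADB = "CHROMADB"
    MILVUS = "MILVUS"


def parse_vector_store_type(token: str) -> VectorStoreType:
    # Keywords are case-insensitive in the grammar, so normalize first,
    # then look the member up by name.
    try:
        return VectorStoreType[token.upper()]
    except KeyError:
        raise ValueError(f"Unknown vector store: {token}") from None


print(parse_vector_store_type("milvus"))  # VectorStoreType.MILVUS
```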


146 changes: 146 additions & 0 deletions evadb/third_party/vector_stores/milvus.py
@@ -0,0 +1,146 @@
# coding=utf-8
# Copyright 2018-2023 EvaDB
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import List

from evadb.third_party.vector_stores.types import (
    FeaturePayload,
    VectorIndexQuery,
    VectorIndexQueryResult,
    VectorStore,
)
from evadb.utils.generic_utils import try_to_import_milvus_client

allowed_params = [
    "MILVUS_URI",
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_DB_NAME",
    "MILVUS_TOKEN",
]
required_params = []
_milvus_client_instance = None


def get_milvus_client(
    milvus_uri: str,
    milvus_user: str,
    milvus_password: str,
    milvus_db_name: str,
    milvus_token: str,
):
    global _milvus_client_instance
    # The client is created once per process; later calls reuse it and
    # ignore their arguments.
    if _milvus_client_instance is None:
        try_to_import_milvus_client()
        import pymilvus

        _milvus_client_instance = pymilvus.MilvusClient(
            uri=milvus_uri,
            user=milvus_user,
            password=milvus_password,
            db_name=milvus_db_name,
            token=milvus_token,
        )

    return _milvus_client_instance


class MilvusVectorStore(VectorStore):
    def __init__(self, index_name: str, **kwargs) -> None:
        # MILVUS_URI is the only required parameter
        self._milvus_uri = kwargs.get("MILVUS_URI")

        if not self._milvus_uri:
            self._milvus_uri = os.environ.get("MILVUS_URI")

        assert (
            self._milvus_uri
        ), "Please set your Milvus URI in evadb.yml file (third_party, MILVUS_URI) or environment variable (MILVUS_URI)."

        # Check the optional Milvus parameters for additional customization
        self._milvus_user = kwargs.get("MILVUS_USER")

        if not self._milvus_user:
            self._milvus_user = os.environ.get("MILVUS_USER", "")

        self._milvus_password = kwargs.get("MILVUS_PASSWORD")

        if not self._milvus_password:
            self._milvus_password = os.environ.get("MILVUS_PASSWORD", "")

        self._milvus_db_name = kwargs.get("MILVUS_DB_NAME")

        if not self._milvus_db_name:
            self._milvus_db_name = os.environ.get("MILVUS_DB_NAME", "")

        self._milvus_token = kwargs.get("MILVUS_TOKEN")

        if not self._milvus_token:
            self._milvus_token = os.environ.get("MILVUS_TOKEN", "")

        self._client = get_milvus_client(
            milvus_uri=self._milvus_uri,
            milvus_user=self._milvus_user,
            milvus_password=self._milvus_password,
            milvus_db_name=self._milvus_db_name,
            milvus_token=self._milvus_token,
        )
        self._collection_name = index_name

    def create(self, vector_dim: int):
        # Recreate the collection from scratch if it already exists
        if self._collection_name in self._client.list_collections():
            self._client.drop_collection(self._collection_name)
        self._client.create_collection(
            collection_name=self._collection_name,
            dimension=vector_dim,
            metric_type="COSINE",
        )

    def add(self, payload: List[FeaturePayload]):
        milvus_data = [
            {
                "id": feature_payload.id,
                "vector": feature_payload.embedding.reshape(-1).tolist(),
            }
            for feature_payload in payload
        ]
        ids = [feature_payload.id for feature_payload in payload]

        # The Milvus client does not provide an upsert operation, so emulate
        # it with a delete followed by an insert
        self._client.delete(collection_name=self._collection_name, pks=ids)

        self._client.insert(collection_name=self._collection_name, data=milvus_data)

    def persist(self):
        self._client.flush(self._collection_name)

    def delete(self) -> None:
        self._client.drop_collection(
            collection_name=self._collection_name,
        )

    def query(self, query: VectorIndexQuery) -> VectorIndexQueryResult:
        # search() takes a batch of query vectors; take the hits for the first
        response = self._client.search(
            collection_name=self._collection_name,
            data=[query.embedding.reshape(-1).tolist()],
            limit=query.top_k,
        )[0]

        distances, ids = [], []
        for result in response:
            distances.append(result["distance"])
            ids.append(result["id"])

        return VectorIndexQueryResult(distances, ids)
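The `add` and `query` methods rely on two behaviors: emulating upsert with delete-then-insert, and unpacking search hits (dicts with `"id"` and `"distance"`) into parallel lists. The self-contained sketch below runs both against a hypothetical in-memory `ToyCollection`, which is not the pymilvus API. It stores rows in an append-only list, so without the delete step a re-added id would appear twice. It ranks by plain cosine similarity as a stand-in for the collection's `COSINE` metric.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Assumes neither vector is all zeros (that would divide by zero).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class ToyCollection:
    """Hypothetical in-memory stand-in for a vector collection (not pymilvus)."""

    def __init__(self) -> None:
        # Rows are appended like a log, so the same id can be stored twice.
        self.rows: List[dict] = []

    def insert(self, data: List[dict]) -> None:
        self.rows.extend(dict(row) for row in data)

    def delete(self, pks: List[int]) -> None:
        doomed = set(pks)
        self.rows = [row for row in self.rows if row["id"] not in doomed]

    def search(self, vector: List[float], limit: int) -> List[dict]:
        hits = [
            {"id": row["id"], "distance": cosine_similarity(vector, row["vector"])}
            for row in self.rows
        ]
        # Higher cosine similarity means a closer match, so sort descending.
        hits.sort(key=lambda hit: hit["distance"], reverse=True)
        return hits[:limit]


def add(collection: ToyCollection, payload: List[dict]) -> None:
    # Upsert emulation, as in MilvusVectorStore.add: drop old rows, then insert.
    collection.delete([row["id"] for row in payload])
    collection.insert(payload)


def query(
    collection: ToyCollection, vector: List[float], top_k: int
) -> Tuple[List[float], List[int]]:
    # Unpack hit dicts into parallel lists, as MilvusVectorStore.query does.
    distances, ids = [], []
    for hit in collection.search(vector, top_k):
        distances.append(hit["distance"])
        ids.append(hit["id"])
    return distances, ids


col = ToyCollection()
add(col, [{"id": 1, "vector": [1.0, 0.0]}, {"id": 2, "vector": [0.0, 1.0]}])
add(col, [{"id": 1, "vector": [0.6, 0.8]}])  # re-adding id 1 replaces its row
distances, ids = query(col, [0.0, 1.0], top_k=2)
print(len(col.rows), ids)  # 2 [2, 1]
```

The delete-then-insert pair is not atomic. A reader can briefly observe the ids as missing between the two calls.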
9 changes: 9 additions & 0 deletions evadb/third_party/vector_stores/utils.py
@@ -15,6 +15,7 @@
from evadb.catalog.catalog_type import VectorStoreType
from evadb.third_party.vector_stores.chromadb import ChromaDBVectorStore
from evadb.third_party.vector_stores.faiss import FaissVectorStore
from evadb.third_party.vector_stores.milvus import MilvusVectorStore
from evadb.third_party.vector_stores.pinecone import PineconeVectorStore
from evadb.third_party.vector_stores.qdrant import QdrantVectorStore
from evadb.utils.generic_utils import validate_kwargs
@@ -50,5 +51,13 @@ def init_vector_store(
        validate_kwargs(kwargs, required_params, required_params)
        return ChromaDBVectorStore(index_name, **kwargs)

    elif vector_store_type == VectorStoreType.MILVUS:
        from evadb.third_party.vector_stores.milvus import (
            allowed_params,
            required_params,
        )

        validate_kwargs(kwargs, allowed_params, required_params)
        return MilvusVectorStore(index_name, **kwargs)
    else:
        raise Exception(f"Vector store {vector_store_type} not supported")
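`validate_kwargs` is called with the Milvus module's `allowed_params` and `required_params`. Its body is not part of this diff, so `validate_kwargs_sketch` below is a hypothetical re-implementation of what such a check plausibly does. It rejects keys outside the allowed list and reports any required key that is absent. Because `required_params` is empty for Milvus, a missing `MILVUS_URI` is not caught at this step. It is caught later by the assertion in `MilvusVectorStore.__init__`, which also consults the environment.

```python
from typing import Dict, List


def validate_kwargs_sketch(
    kwargs: Dict[str, object], allowed: List[str], required: List[str]
) -> None:
    # Reject keys the vector store does not know about.
    unexpected = sorted(set(kwargs) - set(allowed))
    if unexpected:
        raise TypeError(f"Unexpected keyword argument(s): {unexpected}")
    # Report required keys that were not supplied at all.
    missing = [key for key in required if key not in kwargs]
    if missing:
        raise ValueError(f"Missing required argument(s): {missing}")


MILVUS_ALLOWED = [
    "MILVUS_URI",
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_DB_NAME",
    "MILVUS_TOKEN",
]
MILVUS_REQUIRED: List[str] = []

# Passes: only known keys are present, and nothing is required.
validate_kwargs_sketch(
    {"MILVUS_URI": "http://localhost:19530"}, MILVUS_ALLOWED, MILVUS_REQUIRED
)
```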
18 changes: 18 additions & 0 deletions evadb/utils/generic_utils.py
@@ -581,6 +581,16 @@ def try_to_import_chromadb_client():
        )


def try_to_import_milvus_client():
    try:
        import pymilvus  # noqa: F401
    except ImportError:
        raise ValueError(
            """Could not import pymilvus python package.
                Please install it with `pip install pymilvus`."""
        )


def is_qdrant_available() -> bool:
    try:
        try_to_import_qdrant_client()
@@ -605,6 +615,14 @@ def is_chromadb_available() -> bool:
        return False


def is_milvus_available() -> bool:
    try:
        try_to_import_milvus_client()
        return True
    except ValueError:
        return False
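`try_to_import_milvus_client` detects the optional dependency by importing it. An alternative sketch uses `importlib.util.find_spec`, which locates a top-level module without executing it. The helper name `is_package_available` is hypothetical. One trade-off: a package that is installed but fails during import is reported as available here, whereas the import-based check would report it as unavailable.

```python
import importlib.util


def is_package_available(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found.
    # It does not import the module, so import-time errors go unnoticed.
    return importlib.util.find_spec(module_name) is not None


print(is_package_available("json"))  # True (standard library)
```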


##############################
## UTILS
##############################
8 changes: 8 additions & 0 deletions script/formatting/spelling.txt
@@ -551,6 +551,11 @@ MaxPool
MemeImages
MemoTest
MetaData
MILVUS
Milvus
Milvus's
MilvusClient
MilvusVectorStore
MindsDB
MiniLM
MnistImageClassifier
@@ -1402,6 +1407,7 @@ memeimages
metaclass
metafile
metainfo
milvus
mindsdb
miniconda
mins
@@ -1526,6 +1532,7 @@ ptype
pushdown
px
py
pymilvus
pymupdfs
pypi
pypirc
@@ -1655,6 +1662,7 @@ testCreateIndexName
testDeleteOne
testFaissIndexImageDataset
testFaissIndexScanRewrite
testMilvusIndexImageDataset
testIndex
testIndexAutoUpdate
testOpenTable
7 changes: 5 additions & 2 deletions setup.py
@@ -112,6 +112,8 @@ def read(path, encoding="utf-8"):

chromadb_libs = ["chromadb"]

milvus_libs = ["pymilvus>=2.3.0"]

postgres_libs = [
    "psycopg2",
]
@@ -123,8 +125,8 @@ def read(path, encoding="utf-8"):
xgboost_libs = ["flaml[automl]"]

forecasting_libs = [
    "statsforecast",  # MODEL TRAIN AND FINE TUNING
    "neuralforecast"  # MODEL TRAIN AND FINE TUNING
    "statsforecast",  # MODEL TRAIN AND FINE TUNING
    "neuralforecast",  # MODEL TRAIN AND FINE TUNING
]

imagegen_libs = [
@@ -168,6 +170,7 @@ def read(path, encoding="utf-8"):
    "qdrant": qdrant_libs,
    "pinecone": pinecone_libs,
    "chromadb": chromadb_libs,
    "milvus": milvus_libs,
    "postgres": postgres_libs,
    "ludwig": ludwig_libs,
    "sklearn": sklearn_libs,
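The `milvus` extra pins `pymilvus>=2.3.0`. The following is a hedged sketch of checking that floor at runtime with `importlib.metadata.version`. `parse_release` and `meets_floor` are hypothetical helpers, and the parsing is deliberately simplified. It keeps only the leading numeric components, so a pre-release such as `2.3.0rc1` counts as `2.3.0`, unlike PEP 440 ordering. When the `packaging` library is available, its `Version` class is the robust choice.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(text: str) -> Tuple[int, ...]:
    # Simplified: keep leading numeric components only, e.g. "2.3.1rc5" -> (2, 3, 1).
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # a suffix like "rc5" ends the numeric release segment
    return tuple(parts)


def meets_floor(installed: str, minimum: Tuple[int, ...] = (2, 3, 0)) -> bool:
    release = parse_release(installed)
    # Pad with zeros so that "2.3" compares equal to (2, 3, 0).
    width = max(len(release), len(minimum))
    padded = release + (0,) * (width - len(release))
    floor = minimum + (0,) * (width - len(minimum))
    return padded >= floor


def pymilvus_meets_floor() -> bool:
    try:
        return meets_floor(version("pymilvus"))
    except PackageNotFoundError:
        return False  # pymilvus is not installed
```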