Add Milvus integration for vector create and search #1269

Merged 27 commits on Oct 27, 2023
Commits
9fa1266
Added Milvus integration for vector create and search
RichardZhangRZ Oct 9, 2023
2793799
Merge branch 'staging' into milvus-integration
RichardZhangRZ Oct 9, 2023
0459817
Added docs and switched to using provided values for Milvus
RichardZhangRZ Oct 10, 2023
f1c6e6c
Fixed merge conflicts
RichardZhangRZ Oct 10, 2023
02d6d24
Skip Milvus integration test
RichardZhangRZ Oct 10, 2023
cc72ac5
Removed unnecessary required param
RichardZhangRZ Oct 10, 2023
f6d3202
Fixed linter errors
RichardZhangRZ Oct 10, 2023
11d9bc8
Fixed doc issues
RichardZhangRZ Oct 10, 2023
400aed6
Some quick changes
RichardZhangRZ Oct 10, 2023
8d69263
Removed value
RichardZhangRZ Oct 10, 2023
8965a8e
Merge branch 'staging' into milvus-integration
RichardZhangRZ Oct 16, 2023
3444f17
Added linting suppression
RichardZhangRZ Oct 17, 2023
a12c4ee
test commit
RichardZhangRZ Oct 17, 2023
1c17276
Added more words to diciontary
RichardZhangRZ Oct 17, 2023
4815828
Temp change
RichardZhangRZ Oct 17, 2023
0e38d3b
Formatting
RichardZhangRZ Oct 17, 2023
d0ef3e7
Revert "Temp change"
RichardZhangRZ Oct 17, 2023
410361b
Skip Milvus installation for testing:
RichardZhangRZ Oct 18, 2023
a71dd5d
Resolved merge conflicts
RichardZhangRZ Oct 23, 2023
790dc02
adopted to configuration management changes
RichardZhangRZ Oct 24, 2023
b7b5cb6
Add skip marker
RichardZhangRZ Oct 24, 2023
3e77b2e
Add temp change
RichardZhangRZ Oct 24, 2023
d67371a
Revert "Add temp change"
RichardZhangRZ Oct 24, 2023
e578ed0
formatting
RichardZhangRZ Oct 24, 2023
c5ea4e4
Merge branch 'staging' into milvus-integration
xzdandy Oct 27, 2023
c0206b1
Remove milvus from circle ci installation
xzdandy Oct 27, 2023
15b4791
Update the documentation to use the new `SET` statement
xzdandy Oct 27, 2023
1 change: 1 addition & 0 deletions docs/_toc.yml
@@ -81,6 +81,7 @@ parts:
- file: source/reference/vector_databases/qdrant
- file: source/reference/vector_databases/pgvector
- file: source/reference/vector_databases/pinecone
- file: source/reference/vector_databases/milvus

- file: source/reference/ai/index
title: AI Engines
37 changes: 37 additions & 0 deletions docs/source/reference/vector_databases/milvus.rst
@@ -0,0 +1,37 @@
Milvus
==========

Milvus is an open-source, distributed vector database designed for similarity search and analytics on large-scale vector data.
The connection to Milvus is based on the `pymilvus <https://pymilvus.readthedocs.io/en/latest>`_ library.

Dependency
----------

* pymilvus

Parameters
----------

To use Milvus you must have a URI to a running Milvus instance. Here are the `instructions to spin up a local instance <https://milvus.io/docs/install_standalone-docker.md>`_.
If you are running it locally, the Milvus instance should be running on ``http://localhost:19530``. Please be sure that the Milvus version is >= 2.3.0. Below are values that the Milvus integration uses:

* ``MILVUS_URI`` is the URI of the Milvus instance (which would be ``http://localhost:19530`` when running locally). **This value is required.**
* ``MILVUS_USER`` is the name of the user for the Milvus instance.
* ``MILVUS_PASSWORD`` is the password of the user for the Milvus instance.
* ``MILVUS_DB_NAME`` is the name of the database to be used. This will default to the ``default`` database if not provided.
* ``MILVUS_TOKEN`` is the authorization token for the Milvus instance.

The above values can be set either via the ``SET`` statement or as the environment variables ``MILVUS_URI``, ``MILVUS_USER``, ``MILVUS_PASSWORD``, ``MILVUS_DB_NAME``, and ``MILVUS_TOKEN``.


.. code-block:: sql

   SET MILVUS_URI = 'http://localhost:19530';


Create Index
-----------------

.. code-block:: sql

   CREATE INDEX index_name ON table_name (data) USING MILVUS;
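
The lookup order described in the Parameters section (a value supplied through ``SET`` first, then the matching environment variable, then an empty default) can be sketched as below. This is only an illustration, not the code merged in this PR: `resolve_milvus_params` and `MILVUS_KEYS` are hypothetical names, and the sketch raises `ValueError` for a missing URI where the merged class uses an `assert`.

```python
import os
from typing import Dict, Mapping, Optional

# The five keys documented for the Milvus integration.
MILVUS_KEYS = (
    "MILVUS_URI",
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_DB_NAME",
    "MILVUS_TOKEN",
)


def resolve_milvus_params(
    configured: Mapping[str, Optional[str]],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: prefer configured values, fall back to the environment."""
    env = os.environ if environ is None else environ
    resolved = {}
    for key in MILVUS_KEYS:
        # An empty or missing configured value falls through to the environment.
        resolved[key] = configured.get(key) or env.get(key, "")
    if not resolved["MILVUS_URI"]:
        # Only the URI is mandatory; the other values stay as empty strings.
        raise ValueError("MILVUS_URI must be set via SET or the environment")
    return resolved
```

Calling `resolve_milvus_params({"MILVUS_URI": "http://localhost:19530"}, environ={})` returns the URI unchanged and empty strings for the four optional keys.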
1 change: 1 addition & 0 deletions evadb/catalog/catalog_type.py
@@ -117,6 +117,7 @@ class VectorStoreType(EvaDBEnum):
PINECONE # noqa: F821
PGVECTOR # noqa: F821
CHROMADB # noqa: F821
MILVUS # noqa: F821


class VideoColumnName(EvaDBEnum):
5 changes: 5 additions & 0 deletions evadb/evadb_config.py
@@ -36,4 +36,9 @@
"OPENAI_API_KEY": "",
"PINECONE_API_KEY": "",
"PINECONE_ENV": "",
"MILVUS_URI": "",
"MILVUS_USER": "",
"MILVUS_PASSWORD": "",
"MILVUS_DB_NAME": "",
"MILVUS_TOKEN": "",
}
12 changes: 12 additions & 0 deletions evadb/executor/executor_utils.py
@@ -185,6 +185,18 @@ def handle_vector_store_params(
),
"PINECONE_ENV": catalog().get_configuration_catalog_value("PINECONE_ENV"),
}
elif vector_store_type == VectorStoreType.MILVUS:
return {
"MILVUS_URI": catalog().get_configuration_catalog_value("MILVUS_URI"),
"MILVUS_USER": catalog().get_configuration_catalog_value("MILVUS_USER"),
"MILVUS_PASSWORD": catalog().get_configuration_catalog_value(
"MILVUS_PASSWORD"
),
"MILVUS_DB_NAME": catalog().get_configuration_catalog_value(
"MILVUS_DB_NAME"
),
"MILVUS_TOKEN": catalog().get_configuration_catalog_value("MILVUS_TOKEN"),
}
else:
raise ValueError("Unsupported vector store type: {}".format(vector_store_type))

2 changes: 1 addition & 1 deletion evadb/interfaces/relational/db.py
@@ -248,7 +248,7 @@ def create_vector_index(
index_name (str): Name of the index.
table_name (str): Name of the table.
expr (str): Expression used to build the vector index.
using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB`.
using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB` or `MILVUS`.

Returns:
EvaDBCursor: The EvaDBCursor object.
3 changes: 2 additions & 1 deletion evadb/parser/evadb.lark
@@ -53,7 +53,7 @@ function_metadata_key: uid

function_metadata_value: constant

vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB)
vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB | MILVUS)

index_elem: ("(" uid_list ")"
| "(" function_call ")")
@@ -424,6 +424,7 @@ QDRANT: "QDRANT"i
PINECONE: "PINECONE"i
PGVECTOR: "PGVECTOR"i
CHROMADB: "CHROMADB"i
MILVUS: "MILVUS"i

// Computer vision tasks

2 changes: 2 additions & 0 deletions evadb/parser/lark_visitor/_create_statements.py
@@ -299,6 +299,8 @@ def vector_store_type(self, tree):
vector_store_type = VectorStoreType.PGVECTOR
elif str.upper(token) == "CHROMADB":
vector_store_type = VectorStoreType.CHROMADB
elif str.upper(token) == "MILVUS":
vector_store_type = VectorStoreType.MILVUS
return vector_store_type


146 changes: 146 additions & 0 deletions evadb/third_party/vector_stores/milvus.py
@@ -0,0 +1,146 @@
# coding=utf-8
# Copyright 2018-2023 EvaDB
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import List

from evadb.third_party.vector_stores.types import (
    FeaturePayload,
    VectorIndexQuery,
    VectorIndexQueryResult,
    VectorStore,
)
from evadb.utils.generic_utils import try_to_import_milvus_client

allowed_params = [
    "MILVUS_URI",
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_DB_NAME",
    "MILVUS_TOKEN",
]
required_params = []
_milvus_client_instance = None


def get_milvus_client(
    milvus_uri: str,
    milvus_user: str,
    milvus_password: str,
    milvus_db_name: str,
    milvus_token: str,
):
    global _milvus_client_instance
    if _milvus_client_instance is None:
        try_to_import_milvus_client()
        import pymilvus

        _milvus_client_instance = pymilvus.MilvusClient(
            uri=milvus_uri,
            user=milvus_user,
            password=milvus_password,
            db_name=milvus_db_name,
            token=milvus_token,
        )

    return _milvus_client_instance


class MilvusVectorStore(VectorStore):
    def __init__(self, index_name: str, **kwargs) -> None:
        # The Milvus URI is the only required parameter
        self._milvus_uri = kwargs.get("MILVUS_URI")

        if not self._milvus_uri:
            self._milvus_uri = os.environ.get("MILVUS_URI")

        assert (
            self._milvus_uri
        ), "Please set your Milvus URI in evadb.yml file (third_party, MILVUS_URI) or environment variable (MILVUS_URI)."

        # Check other Milvus variables for additional customization
        self._milvus_user = kwargs.get("MILVUS_USER")

        if not self._milvus_user:
            self._milvus_user = os.environ.get("MILVUS_USER", "")

        self._milvus_password = kwargs.get("MILVUS_PASSWORD")

        if not self._milvus_password:
            self._milvus_password = os.environ.get("MILVUS_PASSWORD", "")

        self._milvus_db_name = kwargs.get("MILVUS_DB_NAME")

        if not self._milvus_db_name:
            self._milvus_db_name = os.environ.get("MILVUS_DB_NAME", "")

        self._milvus_token = kwargs.get("MILVUS_TOKEN")

        if not self._milvus_token:
            self._milvus_token = os.environ.get("MILVUS_TOKEN", "")

        self._client = get_milvus_client(
            milvus_uri=self._milvus_uri,
            milvus_user=self._milvus_user,
            milvus_password=self._milvus_password,
            milvus_db_name=self._milvus_db_name,
            milvus_token=self._milvus_token,
        )
        self._collection_name = index_name

    def create(self, vector_dim: int):
        if self._collection_name in self._client.list_collections():
            self._client.drop_collection(self._collection_name)
        self._client.create_collection(
            collection_name=self._collection_name,
            dimension=vector_dim,
            metric_type="COSINE",
        )

    def add(self, payload: List[FeaturePayload]):
        milvus_data = [
            {
                "id": feature_payload.id,
                "vector": feature_payload.embedding.reshape(-1).tolist(),
            }
            for feature_payload in payload
        ]
        ids = [feature_payload.id for feature_payload in payload]

        # Milvus Client does not have upsert operation, perform delete + insert to emulate it
        self._client.delete(collection_name=self._collection_name, pks=ids)

        self._client.insert(collection_name=self._collection_name, data=milvus_data)

    def persist(self):
        self._client.flush(self._collection_name)

    def delete(self) -> None:
        self._client.drop_collection(
            collection_name=self._collection_name,
        )

    def query(self, query: VectorIndexQuery) -> VectorIndexQueryResult:
        response = self._client.search(
            collection_name=self._collection_name,
            data=[query.embedding.reshape(-1).tolist()],
            limit=query.top_k,
        )[0]

        distances, ids = [], []
        for result in response:
            distances.append(result["distance"])
            ids.append(result["id"])

        return VectorIndexQueryResult(distances, ids)
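
The `add` method above emulates an upsert by deleting the incoming ids and then inserting the new rows. The pattern can be tried without a running Milvus instance using the sketch below. `InMemoryClient` and `emulate_upsert` are hypothetical stand-ins written for this illustration; only the `delete(collection_name=..., pks=...)` and `insert(collection_name=..., data=...)` call shapes are taken from the diff.

```python
from typing import Dict, List


class InMemoryClient:
    """Hypothetical stand-in that mimics the two client calls used by add()."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[int, List[float]]] = {}

    def delete(self, collection_name: str, pks: List[int]) -> None:
        rows = self.collections.setdefault(collection_name, {})
        for pk in pks:
            # Deleting an id that is absent is a no-op in this stand-in.
            rows.pop(pk, None)

    def insert(self, collection_name: str, data: List[dict]) -> None:
        rows = self.collections.setdefault(collection_name, {})
        for row in data:
            if row["id"] in rows:
                raise KeyError(f"duplicate primary key: {row['id']}")
            rows[row["id"]] = list(row["vector"])


def emulate_upsert(client: InMemoryClient, collection_name: str, rows: List[dict]) -> None:
    """Delete any existing rows with the same ids, then insert the new rows."""
    ids = [row["id"] for row in rows]
    client.delete(collection_name=collection_name, pks=ids)
    client.insert(collection_name=collection_name, data=rows)
```

Because the delete and the insert are two separate calls, the pair is not atomic: if the insert fails after the delete has succeeded, the affected rows are missing until the next successful `add`.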
9 changes: 9 additions & 0 deletions evadb/third_party/vector_stores/utils.py
@@ -15,6 +15,7 @@
from evadb.catalog.catalog_type import VectorStoreType
from evadb.third_party.vector_stores.chromadb import ChromaDBVectorStore
from evadb.third_party.vector_stores.faiss import FaissVectorStore
from evadb.third_party.vector_stores.milvus import MilvusVectorStore
from evadb.third_party.vector_stores.pinecone import PineconeVectorStore
from evadb.third_party.vector_stores.qdrant import QdrantVectorStore
from evadb.utils.generic_utils import validate_kwargs
@@ -50,5 +51,13 @@ def init_vector_store(
validate_kwargs(kwargs, required_params, required_params)
return ChromaDBVectorStore(index_name, **kwargs)

elif vector_store_type == VectorStoreType.MILVUS:
from evadb.third_party.vector_stores.milvus import (
allowed_params,
required_params,
)

validate_kwargs(kwargs, allowed_params, required_params)
return MilvusVectorStore(index_name, **kwargs)
else:
raise Exception(f"Vector store {vector_store_type} not supported")
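
The Milvus branch passes `allowed_params` and an empty `required_params` list to `validate_kwargs`, so the required URI is enforced later by the `assert` in `MilvusVectorStore.__init__` rather than at this point. The body of `validate_kwargs` is not part of this diff; the sketch below is a hypothetical stand-in (`check_params`) that assumes it rejects unknown keys and missing required keys.

```python
from typing import Iterable, Mapping


def check_params(
    kwargs: Mapping[str, object],
    allowed: Iterable[str],
    required: Iterable[str],
) -> None:
    """Hypothetical stand-in for validate_kwargs; the real behavior may differ."""
    allowed_set = set(allowed)
    unknown = sorted(key for key in kwargs if key not in allowed_set)
    if unknown:
        raise ValueError(f"Unexpected parameters: {unknown}")
    missing = sorted(key for key in required if key not in kwargs)
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
```

With `required` left empty, as in the Milvus module, `check_params({}, ["MILVUS_URI"], [])` passes even though no URI was supplied.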
18 changes: 18 additions & 0 deletions evadb/utils/generic_utils.py
@@ -581,6 +581,16 @@ def try_to_import_chromadb_client():
)


def try_to_import_milvus_client():
    try:
        import pymilvus  # noqa: F401
    except ImportError:
        raise ValueError(
            """Could not import pymilvus python package.
                Please install it with 'pip install pymilvus'."""
        )


def is_qdrant_available() -> bool:
try:
try_to_import_qdrant_client()
@@ -605,6 +615,14 @@ def is_chromadb_available() -> bool:
return False


def is_milvus_available() -> bool:
    try:
        try_to_import_milvus_client()
        return True
    except ValueError:
        return False


##############################
## UTILS
##############################
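
The `try_to_import_*` / `is_*_available` pair above detects an optional dependency by attempting the import. A possible alternative, not used in this PR, is to ask the import system whether the package can be found without importing it. `is_package_available` is a hypothetical helper name; `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(name) is not None
```

This check only says that the package is installed. It does not catch a package that is present but fails during import, which the try/except approach in the diff does.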
8 changes: 8 additions & 0 deletions script/formatting/spelling.txt
@@ -551,6 +551,11 @@ MaxPool
MemeImages
MemoTest
MetaData
MILVUS
Milvus
Milvus's
MilvusClient
MilvusVectorStore
MindsDB
MiniLM
MnistImageClassifier
@@ -1402,6 +1407,7 @@ memeimages
metaclass
metafile
metainfo
milvus
mindsdb
miniconda
mins
@@ -1526,6 +1532,7 @@ ptype
pushdown
px
py
pymilvus
pymupdfs
pypi
pypirc
@@ -1655,6 +1662,7 @@ testCreateIndexName
testDeleteOne
testFaissIndexImageDataset
testFaissIndexScanRewrite
testMilvusIndexImageDataset
testIndex
testIndexAutoUpdate
testOpenTable
7 changes: 5 additions & 2 deletions setup.py
@@ -112,6 +112,8 @@ def read(path, encoding="utf-8"):

chromadb_libs = ["chromadb"]

milvus_libs = ["pymilvus>=2.3.0"]

postgres_libs = [
"psycopg2",
]
@@ -123,8 +125,8 @@ def read(path, encoding="utf-8"):
xgboost_libs = ["flaml[automl]"]

forecasting_libs = [
"statsforecast", # MODEL TRAIN AND FINE TUNING
"neuralforecast" # MODEL TRAIN AND FINE TUNING
"statsforecast", # MODEL TRAIN AND FINE TUNING
"neuralforecast", # MODEL TRAIN AND FINE TUNING
]

imagegen_libs = [
@@ -168,6 +170,7 @@ def read(path, encoding="utf-8"):
"qdrant": qdrant_libs,
"pinecone": pinecone_libs,
"chromadb": chromadb_libs,
"milvus": milvus_libs,
"postgres": postgres_libs,
"ludwig": ludwig_libs,
"sklearn": sklearn_libs,