Add Milvus integration for vector create and search #1269

Merged 27 commits on Oct 27, 2023
Changes from 17 commits
Commits
9fa1266
Added Milvus integration for vector create and search
RichardZhangRZ Oct 9, 2023
2793799
Merge branch 'staging' into milvus-integration
RichardZhangRZ Oct 9, 2023
0459817
Added docs and switched to using provided values for Milvus
RichardZhangRZ Oct 10, 2023
f1c6e6c
Fixed merge conflicts
RichardZhangRZ Oct 10, 2023
02d6d24
Skip Milvus integration test
RichardZhangRZ Oct 10, 2023
cc72ac5
Removed unnecessary required param
RichardZhangRZ Oct 10, 2023
f6d3202
Fixed linter errors
RichardZhangRZ Oct 10, 2023
11d9bc8
Fixed doc issues
RichardZhangRZ Oct 10, 2023
400aed6
Some quick changes
RichardZhangRZ Oct 10, 2023
8d69263
Removed value
RichardZhangRZ Oct 10, 2023
8965a8e
Merge branch 'staging' into milvus-integration
RichardZhangRZ Oct 16, 2023
3444f17
Added linting suppression
RichardZhangRZ Oct 17, 2023
a12c4ee
test commit
RichardZhangRZ Oct 17, 2023
1c17276
Added more words to diciontary
RichardZhangRZ Oct 17, 2023
4815828
Temp change
RichardZhangRZ Oct 17, 2023
0e38d3b
Formatting
RichardZhangRZ Oct 17, 2023
d0ef3e7
Revert "Temp change"
RichardZhangRZ Oct 17, 2023
410361b
Skip Milvus installation for testing:
RichardZhangRZ Oct 18, 2023
a71dd5d
Resolved merge conflicts
RichardZhangRZ Oct 23, 2023
790dc02
adopted to configuration management changes
RichardZhangRZ Oct 24, 2023
b7b5cb6
Add skip marker
RichardZhangRZ Oct 24, 2023
3e77b2e
Add temp change
RichardZhangRZ Oct 24, 2023
d67371a
Revert "Add temp change"
RichardZhangRZ Oct 24, 2023
e578ed0
formatting
RichardZhangRZ Oct 24, 2023
c5ea4e4
Merge branch 'staging' into milvus-integration
xzdandy Oct 27, 2023
c0206b1
Remove milvus from circle ci installation
xzdandy Oct 27, 2023
15b4791
Update the documentation to use the new `SET` statement
xzdandy Oct 27, 2023
10 changes: 5 additions & 5 deletions .circleci/config.yml
@@ -230,16 +230,16 @@ jobs:
pip install --upgrade pip
if [ $RAY = "ENABLED" ]; then
if [ $PY_VERSION != "3.11" ]; then
pip install ".[dev,ray,qdrant,pinecone,chromadb]"
pip install ".[dev,ray,qdrant,pinecone,chromadb,milvus]"
else
pip install ".[dev,pinecone,chromadb]" # ray < 2.5.0 does not work with python 3.11 ray-project/ray#33864
pip install ".[dev,pinecone,chromadb,milvus]" # ray < 2.5.0 does not work with python 3.11 ray-project/ray#33864
fi
python -c "import yaml;f = open('evadb/evadb.yml', 'r+');config_obj = yaml.load(f, Loader=yaml.FullLoader);config_obj['experimental']['ray'] = True;f.seek(0);f.write(yaml.dump(config_obj));f.truncate();"
else
if [ $PY_VERSION != "3.11" ]; then
pip install ".[dev,ludwig,qdrant,pinecone,chromadb]"
pip install ".[dev,ludwig,qdrant,pinecone,chromadb,milvus]"
else
pip install ".[dev,pinecone,chromadb]" # ray < 2.5.0 does not work with python 3.11 ray-project/ray#33864
pip install ".[dev,pinecone,chromadb,milvus]" # ray < 2.5.0 does not work with python 3.11 ray-project/ray#33864
Review comment by @xzdandy (Collaborator), Oct 24, 2023:

Remove milvus since we are not running any test.

fi
fi

@@ -486,7 +486,7 @@ jobs:
source test_evadb/bin/activate
pip install --upgrade pip
pip debug --verbose
pip install ".[dev,ludwig,qdrant,forecasting,pinecone,chromadb]"
pip install ".[dev,ludwig,qdrant,forecasting,pinecone,chromadb,milvus]"
xzdandy marked this conversation as resolved.
source test_evadb/bin/activate
bash script/test/test.sh -m "<< parameters.mode >>"

1 change: 1 addition & 0 deletions docs/_toc.yml
@@ -80,6 +80,7 @@ parts:
- file: source/reference/vector_databases/qdrant
- file: source/reference/vector_databases/pgvector
- file: source/reference/vector_databases/pinecone
- file: source/reference/vector_databases/milvus

- file: source/reference/ai/index
title: AI Engines
31 changes: 31 additions & 0 deletions docs/source/reference/vector_databases/milvus.rst
@@ -0,0 +1,31 @@
Milvus
==========

Milvus is an open-source, distributed vector database designed for similarity search and analytics on large-scale vector data.
The connection to Milvus is based on the `pymilvus <https://pymilvus.readthedocs.io/en/latest>`_ library.

Dependency
----------

* pymilvus

Parameters
----------

To use Milvus you must have a URI to a running Milvus instance. Here are the `instructions to spin up a local instance <https://milvus.io/docs/install_standalone-docker.md>`_.
If you are running it locally, the Milvus instance should be running on ``http://localhost:19530``. Please be sure that the Milvus version is >= 2.3.0. Below are values that the Milvus integration uses:

* `MILVUS_URI` is the URI of the Milvus instance (which would be ``http://localhost:19530`` when running locally). **This value is required**
* `MILVUS_USER` is the name of the user for the Milvus instance.
* `MILVUS_PASSWORD` is the password of the user for the Milvus instance.
* `MILVUS_DB_NAME` is the name of the database to be used. This will default to the `default` database if not provided.
* `MILVUS_TOKEN` is the authorization token for the Milvus instance.

The above values can be set either in the evadb.yml config file or in the OS environment variables "MILVUS_URI", "MILVUS_USER", "MILVUS_PASSWORD", "MILVUS_DB_NAME", and "MILVUS_TOKEN".
Review comment (Collaborator):

Using the SET command is the way to do it now. We no longer have an evadb.yml file. The OS environment is not recommended, since it does not work in a client-server setup. Example: SET MILVUS_URI = 'http://localhost:19530';


Create Index
-----------------

.. code-block:: text

   CREATE INDEX index_name ON table_name (data) USING MILVUS;
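
A minimal sketch of how the reviewer's `SET` suggestion and the `CREATE INDEX` statement above might be issued together from Python. The helper name `milvus_setup_statements` is hypothetical and not part of EvaDB; it only builds the query strings and does not contact EvaDB or Milvus.

```python
from typing import List


def milvus_setup_statements(
    uri: str, index_name: str, table_name: str, column: str
) -> List[str]:
    """Return, in order, the statements that configure Milvus and build an index.

    Hypothetical helper for illustration only.
    """
    return [
        # Configure the connection first, as suggested in the review.
        f"SET MILVUS_URI = '{uri}';",
        # Then build the vector index backed by Milvus.
        f"CREATE INDEX {index_name} ON {table_name} ({column}) USING MILVUS;",
    ]


statements = milvus_setup_statements(
    "http://localhost:19530", "index_name", "table_name", "data"
)
for statement in statements:
    print(statement)
```

Each string could then be passed to an EvaDB cursor, assuming the usual `cursor.query(statement).df()` interface.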
1 change: 1 addition & 0 deletions evadb/catalog/catalog_type.py
@@ -117,6 +117,7 @@ class VectorStoreType(EvaDBEnum):
PINECONE # noqa: F821
PGVECTOR # noqa: F821
CHROMADB # noqa: F821
MILVUS # noqa: F821


class VideoColumnName(EvaDBEnum):
7 changes: 6 additions & 1 deletion evadb/evadb.yml
@@ -27,4 +27,9 @@ third_party:
OPENAI_KEY: ""
PINECONE_API_KEY: ""
PINECONE_ENV: ""
REPLICATE_API_TOKEN: ""
MILVUS_URI: ""
MILVUS_USER: ""
MILVUS_PASSWORD: ""
MILVUS_DB_NAME: ""
MILVUS_TOKEN: ""
REPLICATE_API_TOKEN: ""
2 changes: 2 additions & 0 deletions evadb/executor/executor_utils.py
@@ -179,6 +179,8 @@ def handle_vector_store_params(
        return {"index_path": str(Path(index_path).parent)}
    elif vector_store_type == VectorStoreType.PINECONE:
        return {}
    elif vector_store_type == VectorStoreType.MILVUS:
        return {}
    else:
        raise ValueError("Unsupported vector store type: {}".format(vector_store_type))

2 changes: 1 addition & 1 deletion evadb/interfaces/relational/db.py
@@ -248,7 +248,7 @@ def create_vector_index(
index_name (str): Name of the index.
table_name (str): Name of the table.
expr (str): Expression used to build the vector index.
using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB`.
using (str): Method used for indexing, can be `FAISS` or `QDRANT` or `PINECONE` or `CHROMADB` or `MILVUS`.

Returns:
EvaDBCursor: The EvaDBCursor object.
3 changes: 2 additions & 1 deletion evadb/parser/evadb.lark
@@ -53,7 +53,7 @@ function_metadata_key: uid

function_metadata_value: string_literal | decimal_literal

vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB)
vector_store_type: USING (FAISS | QDRANT | PINECONE | PGVECTOR | CHROMADB | MILVUS)

index_elem: ("(" uid_list ")"
| "(" function_call ")")
@@ -423,6 +423,7 @@ QDRANT: "QDRANT"i
PINECONE: "PINECONE"i
PGVECTOR: "PGVECTOR"i
CHROMADB: "CHROMADB"i
MILVUS: "MILVUS"i

// Computer vision tasks

2 changes: 2 additions & 0 deletions evadb/parser/lark_visitor/_create_statements.py
@@ -299,6 +299,8 @@ def vector_store_type(self, tree):
            vector_store_type = VectorStoreType.PGVECTOR
        elif str.upper(token) == "CHROMADB":
            vector_store_type = VectorStoreType.CHROMADB
        elif str.upper(token) == "MILVUS":
            vector_store_type = VectorStoreType.MILVUS
        return vector_store_type
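
The `elif` chain above grows by one branch per vector store. As a hedged alternative sketch, the same token-to-member mapping can be done with a case-insensitive lookup by member name. The `VectorStoreType` below is a standard-library `Enum` stand-in, not EvaDB's `EvaDBEnum`, and `parse_vector_store_type` is a hypothetical name.

```python
from enum import Enum, auto
from typing import Optional


class VectorStoreType(Enum):
    # Stand-in for EvaDB's enum; only the member names matter here.
    FAISS = auto()
    QDRANT = auto()
    PINECONE = auto()
    PGVECTOR = auto()
    CHROMADB = auto()
    MILVUS = auto()


def parse_vector_store_type(token: str) -> Optional[VectorStoreType]:
    """Map a case-insensitive token to its enum member, or None if unknown."""
    return VectorStoreType.__members__.get(token.upper())


print(parse_vector_store_type("milvus"))
```

With this shape, adding a new store only requires a new enum member.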


148 changes: 148 additions & 0 deletions evadb/third_party/vector_stores/milvus.py
@@ -0,0 +1,148 @@
# coding=utf-8
# Copyright 2018-2023 EvaDB
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import List

from evadb.configuration.configuration_manager import ConfigurationManager
from evadb.third_party.vector_stores.types import (
    FeaturePayload,
    VectorIndexQuery,
    VectorIndexQueryResult,
    VectorStore,
)
from evadb.utils.generic_utils import try_to_import_milvus_client

required_params = []
_milvus_client_instance = None


def get_milvus_client(
    milvus_uri: str,
    milvus_user: str,
    milvus_password: str,
    milvus_db_name: str,
    milvus_token: str,
):
    global _milvus_client_instance
    if _milvus_client_instance is None:
        try_to_import_milvus_client()
        import pymilvus

        _milvus_client_instance = pymilvus.MilvusClient(
            uri=milvus_uri,
            user=milvus_user,
            password=milvus_password,
            db_name=milvus_db_name,
            token=milvus_token,
        )

    return _milvus_client_instance
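
Because `get_milvus_client` stores a single module-level instance, later calls with different arguments return the client created by the first call. A minimal sketch of one alternative, keying the cache on the arguments with `functools.lru_cache`. `FakeClient` and `get_client` are hypothetical stand-ins so the example runs without pymilvus or a Milvus server.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeClient:
    # Hypothetical stand-in for pymilvus.MilvusClient.
    uri: str
    db_name: str


@lru_cache(maxsize=None)
def get_client(uri: str, db_name: str = "") -> FakeClient:
    # lru_cache returns the same object for identical arguments and
    # builds a new one when any argument differs.
    return FakeClient(uri=uri, db_name=db_name)


first = get_client("http://localhost:19530")
again = get_client("http://localhost:19530")
other = get_client("http://example.invalid:19530")
print(first is again, first is other)
```

Whether one client per process or one client per connection setting is preferable depends on how the store is used; the sketch only shows the mechanics.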


class MilvusVectorStore(VectorStore):
    def __init__(self, index_name: str) -> None:
        # Milvus URI is the only required value
        self._milvus_uri = ConfigurationManager().get_value("third_party", "MILVUS_URI")

        if not self._milvus_uri:
            self._milvus_uri = os.environ.get("MILVUS_URI")

        assert (
            self._milvus_uri
        ), "Please set your Milvus URI in evadb.yml file (third_party, MILVUS_URI) or environment variable (MILVUS_URI)."

        # Check other Milvus variables for additional customization
        self._milvus_user = ConfigurationManager().get_value(
            "third_party", "MILVUS_USER"
        )

        if not self._milvus_user:
            self._milvus_user = os.environ.get("MILVUS_USER", "")

        self._milvus_password = ConfigurationManager().get_value(
            "third_party", "MILVUS_PASSWORD"
        )

        if not self._milvus_password:
            self._milvus_password = os.environ.get("MILVUS_PASSWORD", "")

        self._milvus_db_name = ConfigurationManager().get_value(
            "third_party", "MILVUS_DB_NAME"
        )

        if not self._milvus_db_name:
            self._milvus_db_name = os.environ.get("MILVUS_DB_NAME", "")

        self._milvus_token = ConfigurationManager().get_value(
            "third_party", "MILVUS_TOKEN"
        )

        if not self._milvus_token:
            self._milvus_token = os.environ.get("MILVUS_TOKEN", "")

        self._client = get_milvus_client(
            milvus_uri=self._milvus_uri,
            milvus_user=self._milvus_user,
            milvus_password=self._milvus_password,
            milvus_db_name=self._milvus_db_name,
            milvus_token=self._milvus_token,
        )
        self._collection_name = index_name

    def create(self, vector_dim: int):
        if self._collection_name in self._client.list_collections():
            self._client.drop_collection(self._collection_name)
        self._client.create_collection(
            collection_name=self._collection_name,
            dimension=vector_dim,
            metric_type="COSINE",
        )

    def add(self, payload: List[FeaturePayload]):
        milvus_data = [
            {
                "id": feature_payload.id,
                "vector": feature_payload.embedding.reshape(-1).tolist(),
            }
            for feature_payload in payload
        ]
        ids = [feature_payload.id for feature_payload in payload]

        # Milvus Client does not have upsert operation, perform delete + insert to emulate it
        self._client.delete(collection_name=self._collection_name, pks=ids)

        self._client.insert(collection_name=self._collection_name, data=milvus_data)
xzdandy marked this conversation as resolved.
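
The delete-then-insert pattern used in `add` can be illustrated with an in-memory stand-in. This is a sketch only: `InMemoryCollection` and `upsert` are hypothetical and mimic just the two calls used above, not the real `MilvusClient` API.

```python
from typing import Dict, List


class InMemoryCollection:
    """Hypothetical stand-in exposing only delete and insert."""

    def __init__(self) -> None:
        self.rows: Dict[int, List[float]] = {}

    def delete(self, pks: List[int]) -> None:
        # Deleting a missing key is a no-op.
        for pk in pks:
            self.rows.pop(pk, None)

    def insert(self, data: List[dict]) -> None:
        for row in data:
            if row["id"] in self.rows:
                raise KeyError(f"duplicate primary key {row['id']}")
            self.rows[row["id"]] = row["vector"]


def upsert(collection: InMemoryCollection, data: List[dict]) -> None:
    # Emulate upsert: remove any existing rows with the same ids, then insert.
    collection.delete([row["id"] for row in data])
    collection.insert(data)


collection = InMemoryCollection()
upsert(collection, [{"id": 1, "vector": [0.1, 0.2]}])
upsert(collection, [{"id": 1, "vector": [0.3, 0.4]}, {"id": 2, "vector": [0.5, 0.6]}])
print(collection.rows)
```

In this sketch the two steps are not atomic, so a failure between them would leave the old rows deleted without replacements.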

    def persist(self):
        self._client.flush(self._collection_name)
xzdandy marked this conversation as resolved.

    def delete(self) -> None:
        self._client.drop_collection(
            collection_name=self._collection_name,
        )

    def query(self, query: VectorIndexQuery) -> VectorIndexQueryResult:
        response = self._client.search(
            collection_name=self._collection_name,
            data=[query.embedding.reshape(-1).tolist()],
            limit=query.top_k,
        )[0]

        distances, ids = [], []
        for result in response:
            distances.append(result["distance"])
            ids.append(result["id"])

        return VectorIndexQueryResult(distances, ids)
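
The constructor in this file repeats the same lookup five times: read the configuration value, then fall back to the environment variable. A minimal sketch of how that order could be factored into one helper. `resolve_setting` is a hypothetical name, and plain dictionaries stand in for `ConfigurationManager` and `os.environ`.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    config: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
    default: Optional[str] = "",
) -> Optional[str]:
    """Return the config value if set, else the environment value, else default."""
    return config.get(key) or environ.get(key) or default


# Stand-ins for the configuration store and the process environment.
config = {"MILVUS_URI": "http://localhost:19530", "MILVUS_USER": ""}
environ = {"MILVUS_USER": "alice"}

print(resolve_setting("MILVUS_URI", config, environ))
print(resolve_setting("MILVUS_USER", config, environ))
print(resolve_setting("MILVUS_TOKEN", config, environ))
```

Empty strings are treated as unset, which matches the `if not value` checks in the constructor.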
6 changes: 6 additions & 0 deletions evadb/third_party/vector_stores/utils.py
@@ -15,6 +15,7 @@
from evadb.catalog.catalog_type import VectorStoreType
from evadb.third_party.vector_stores.chromadb import ChromaDBVectorStore
from evadb.third_party.vector_stores.faiss import FaissVectorStore
from evadb.third_party.vector_stores.milvus import MilvusVectorStore
from evadb.third_party.vector_stores.pinecone import PineconeVectorStore
from evadb.third_party.vector_stores.qdrant import QdrantVectorStore
from evadb.utils.generic_utils import validate_kwargs
@@ -49,5 +50,10 @@ def init_vector_store(
        validate_kwargs(kwargs, required_params, required_params)
        return ChromaDBVectorStore(index_name, **kwargs)

    elif vector_store_type == VectorStoreType.MILVUS:
        from evadb.third_party.vector_stores.milvus import required_params

        validate_kwargs(kwargs, required_params, required_params)
        return MilvusVectorStore(index_name, **kwargs)
    else:
        raise Exception(f"Vector store {vector_store_type} not supported")
18 changes: 18 additions & 0 deletions evadb/utils/generic_utils.py
@@ -562,6 +562,16 @@ def try_to_import_chromadb_client():
        )


def try_to_import_milvus_client():
    try:
        import pymilvus  # noqa: F401
    except ImportError:
        raise ValueError(
            """Could not import pymilvus python package.
                Please install it with `pip install pymilvus`."""
        )


def is_qdrant_available() -> bool:
    try:
        try_to_import_qdrant_client()
@@ -586,6 +596,14 @@ def is_chromadb_available() -> bool:
        return False


def is_milvus_available() -> bool:
    try:
        try_to_import_milvus_client()
        return True
    except ValueError:
        return False
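
The availability check above works by attempting the import and catching the error. As a hedged alternative sketch, the standard library's `importlib.util.find_spec` can report whether a top-level package is installed without importing it. `is_package_available` is a hypothetical name, not an EvaDB function.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


print(is_package_available("json"))
print(is_package_available("surely_not_installed_pkg_xyz"))
```

A call such as `is_package_available("pymilvus")` would then play the role of `is_milvus_available()`, without the side effects of importing the package.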


##############################
## UTILS
##############################
8 changes: 8 additions & 0 deletions script/formatting/spelling.txt
@@ -551,6 +551,11 @@ MaxPool
MemeImages
MemoTest
MetaData
MILVUS
Milvus
Milvus's
MilvusClient
MilvusVectorStore
MindsDB
MiniLM
MnistImageClassifier
@@ -1402,6 +1407,7 @@ memeimages
metaclass
metafile
metainfo
milvus
mindsdb
miniconda
mins
@@ -1526,6 +1532,7 @@ ptype
pushdown
px
py
pymilvus
pymupdfs
pypi
pypirc
@@ -1655,6 +1662,7 @@ testCreateIndexName
testDeleteOne
testFaissIndexImageDataset
testFaissIndexScanRewrite
testMilvusIndexImageDataset
testIndex
testIndexAutoUpdate
testOpenTable
7 changes: 5 additions & 2 deletions setup.py
@@ -112,6 +112,8 @@ def read(path, encoding="utf-8"):

chromadb_libs = ["chromadb"]

milvus_libs = ["pymilvus>=2.3.0"]
Chitti-Ankith marked this conversation as resolved.

postgres_libs = [
"psycopg2",
]
@@ -121,8 +123,8 @@ def read(path, encoding="utf-8"):
sklearn_libs = ["scikit-learn"]

forecasting_libs = [
"statsforecast", # MODEL TRAIN AND FINE TUNING
"neuralforecast" # MODEL TRAIN AND FINE TUNING
"statsforecast", # MODEL TRAIN AND FINE TUNING
"neuralforecast", # MODEL TRAIN AND FINE TUNING
]

imagegen_libs = [
@@ -166,6 +168,7 @@ def read(path, encoding="utf-8"):
"qdrant": qdrant_libs,
"pinecone": pinecone_libs,
"chromadb": chromadb_libs,
"milvus": milvus_libs,
"postgres": postgres_libs,
"ludwig": ludwig_libs,
"sklearn": sklearn_libs,