[Feature]: Add Sparse Float Vector support #29419

Open
1 task done
zhengbuqian opened this issue Dec 22, 2023 · 5 comments

@zhengbuqian
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Milvus currently supports only dense vectors and lacks the ability to store, index, and search sparse vectors (vectors with up to a million dimensions, of which only a handful are non-zero). We wish to add sparse float vector support to Milvus so users can insert, index, and search them with ease.
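
To make the dense/sparse distinction concrete, here is a minimal sketch in plain Python. The helper names `to_sparse` and `to_dense` are hypothetical and only for illustration; they are not part of Milvus or any SDK. The sketch stores just the non-zero entries of a million-dimensional vector as an index-to-value map.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a dense vector as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def to_dense(sparse, dim):
    """Expand an {index: value} map back into a dense list of length dim."""
    dense = [0.0] * dim
    for i, v in sparse.items():
        dense[i] = v
    return dense


# A vector with one million dimensions, only three of them non-zero.
dim = 1_000_000
dense = [0.0] * dim
dense[22], dense[30], dense[78] = 0.66, 0.34, 0.11

sparse = to_sparse(dense)  # three entries instead of one million
```

The sparse form holds three entries while carrying the same information as the dense list.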

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

@zhengbuqian zhengbuqian added the kind/feature Issues related to feature request from users label Dec 22, 2023
@zhengbuqian
Collaborator Author

zhengbuqian commented Dec 22, 2023

This comment is to track the implementation progress of sparse vector support in Milvus.

Pending:

  • Do we need sparse vector support in row-based messages?
  • How are internal/core/src/storage/Util.cpp and parquet_c.h used, and how should they handle sparse vectors?
  • Mmap support for sparse field data

@zhengbuqian
Collaborator Author

Knowhere tracking issue: zilliztech/knowhere#193

@xiaofan-luan
Contributor

Excited about the new feature!

sre-ci-robot pushed a commit that referenced this issue Feb 2, 2024
…30400)

issue: #29419

This PR only adds the proto definition. Sparse float vector support will
follow in subsequent PRs.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Mar 11, 2024
This commit adds sparse float vector support to segcore with the
following:

1. Adds data type enum declarations.
2. Adds corresponding data structures for handling sparse float vectors
in various scenarios, including:
* FieldData as a bridge between the binlog and the in-memory data
structures;
* mmap::Column as the in-memory representation of a sparse float vector
column of a sealed segment;
* ConcurrentVector as the in-memory representation of a sparse float
vector of a growing segment, which supports inserts.
3. Adds logic in the payload reader/writer to serialize to and
deserialize from the binlog.
4. Allows the index node to build a sparse float vector index.
5. Allows the query node to build a growing index for a growing segment
and a temp index for a sealed segment that has no index built.

This commit also includes some code cleanup, comment improvements, and
unit tests for sparse vectors.

#29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
xiaofan-luan pushed a commit that referenced this issue Mar 12, 2024
…nd get raw vector by id (#30629)

This PR adds the ability to search and get sparse float vectors in
segcore, and adds unit tests by converting many existing tests into
parameterized ones.

#29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
xiaofan-luan pushed a commit that referenced this issue Mar 13, 2024
…nents (#30630)

Adds sparse float vector support to different Milvus components,
including proxy and data node to receive and write sparse float vectors
to binlog, query node to handle search requests, index node to build
indexes for sparse float columns, etc.

#29419

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@zhengbuqian
Collaborator Author

Basic sparse support has been added to the master branch with the merge of #30357, #30629, #30630 and pymilvus #1920.

congqixia added a commit to congqixia/milvus-sdk-go that referenced this issue Mar 20, 2024
See also milvus-io/milvus#29419

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sre-ci-robot pushed a commit to milvus-io/milvus-sdk-go that referenced this issue Mar 22, 2024
See also milvus-io/milvus#29419

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
@zhengbuqian
Collaborator Author

zhengbuqian commented Mar 28, 2024

For SDK owners:

We also need to support sparse float vectors in the C#/NodeJs/Java/Go SDKs.

The accepted sparse input format:

  • If your language community already has a widely used representation of sparse float vectors, we should accept that. For example, PyMilvus accepts the scipy.sparse representations as input.
  • If not, support a map representation like {30: 0.34, 78: 0.11, 22: 0.66}.

When sending requests to Milvus (both insert and search), use one proto bytes value to represent a single sparse vector, and encode it as densely packed bytes: idx, val, idx, val, .... Indices in the packed bytes must be within the uint32 range and sorted in ascending order (the user input can be unordered, though). Duplicate indices are not allowed.
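
As a rough illustration of the packing rule above, here is a minimal Python sketch that turns a map input into the idx, val, idx, val, ... byte layout and back. The function names `encode_sparse` and `decode_sparse` are hypothetical. The little-endian byte order and the 4-byte float32 width for values are assumptions of this sketch (the comment above only fixes the uint32 range for indices), so check them against the server implementation before relying on them.

```python
import struct

UINT32_MAX = 2**32 - 1


def encode_sparse(vec):
    """Pack an {index: value} map into idx, val, idx, val, ... bytes.

    Assumptions of this sketch: little-endian uint32 indices and
    little-endian float32 values (8 bytes per entry).
    """
    for idx in vec:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(idx, bool) or not isinstance(idx, int):
            raise ValueError(f"index must be an integer, got {idx!r}")
        if not 0 <= idx <= UINT32_MAX:
            raise ValueError(f"index {idx} is outside the uint32 range")
    out = bytearray()
    # Dict keys are unique, so sorting them yields strictly ascending
    # indices with no duplicates, even if the input was unordered.
    for idx in sorted(vec):
        out += struct.pack("<If", idx, vec[idx])
    return bytes(out)


def decode_sparse(buf):
    """Unpack bytes produced by encode_sparse back into {index: value}."""
    if len(buf) % 8 != 0:
        raise ValueError("buffer length must be a multiple of 8 bytes")
    return {idx: val for idx, val in struct.iter_unpack("<If", buf)}


packed = encode_sparse({30: 0.34, 78: 0.11, 22: 0.66})
unpacked = decode_sparse(packed)
```

Because values are narrowed to float32 in this sketch, the decoded values match the inputs only approximately.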

Note that support in those SDKs is not a must-have for the formal Milvus 2.4 release. We'll be adding more features for sparse vectors and announcing GA in the next major release (2.5 or 2.6). I'll keep updating the issue as necessary.

Thanks a lot for the efforts!

sre-ci-robot pushed a commit that referenced this issue Apr 29, 2024
…e search (#32635)

issue: #29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue May 23, 2024
issue: #29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue May 26, 2024
issue: #29419
Also re-enabled an e2e test using the RESTful API, which was previously
disabled due to #32214.

In the RESTful API, the accepted JSON formats for a sparse float vector are:

* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* `{"1": 0.1, "100": 0.2, "1000": 0.3}`

For the accepted index and value ranges, see
https://milvus.io/docs/sparse_vector.md#FAQ
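
The two JSON shapes listed above carry the same information. As a hedged sketch of how a client might normalize either one into a single index-to-value map, consider the following Python; `parse_sparse_json` is a hypothetical helper, not the server's parser, and it does not enforce the index and value ranges described in the FAQ.

```python
import json


def parse_sparse_json(text):
    """Normalize either accepted JSON shape into an {int: float} map."""
    obj = json.loads(text)
    if "indices" in obj and "values" in obj:
        indices, values = obj["indices"], obj["values"]
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        pairs = zip(indices, values)
    else:
        # Map shape: JSON object keys are strings, so convert them to ints.
        pairs = obj.items()
    result = {}
    for raw_idx, raw_val in pairs:
        idx = int(raw_idx)
        if idx in result:
            raise ValueError(f"duplicate index {idx}")
        result[idx] = float(raw_val)
    return result


a = parse_sparse_json('{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}')
b = parse_sparse_json('{"1": 0.1, "100": 0.2, "1000": 0.3}')
```

Both calls produce the same map, which is why an SDK can accept either shape on input.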

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit to milvus-io/pymilvus that referenced this issue May 27, 2024
issue: milvus-io/milvus#29419 

Range search support has been added to the sparse index.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Jun 3, 2024
issue: #29419
pr: #33231

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Jun 6, 2024
issue: #29419
pr: #33209 

codecov will fail because the newly added unit test in test_sealed.cpp
is skipped due to #33210

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Jun 6, 2024
issue: #29419
pr: #33656

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Co-authored-by: Buqian Zheng <zhengbuqian@gmail.com>
czs007 pushed a commit that referenced this issue Jun 7, 2024
issue: #29419
pr: #33713

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
buqian-zilliz pushed a commit to zhengbuqian/milvus that referenced this issue Jun 11, 2024
issue: milvus-io#29419
pr: milvus-io#33713

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Jun 12, 2024
issue: #29419

* sparse float vector to support raw data mmap

For getting a vector from the chunk cache, I added a unit test but marked
it as skipped due to a known issue. I have tested it locally.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Jun 12, 2024
issue: #29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
buqian-zilliz pushed a commit to zhengbuqian/milvus that referenced this issue Jun 25, 2024
issue: milvus-io#29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
buqian-zilliz pushed a commit to zhengbuqian/milvus that referenced this issue Jun 27, 2024
issue: milvus-io#29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
issue: milvus-io#29419
Also re-enabled an e2e test using the RESTful API, which was previously
disabled due to milvus-io#32214.

In the RESTful API, the accepted JSON formats for a sparse float vector are:

* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* `{"1": 0.1, "100": 0.2, "1000": 0.3}`

For the accepted index and value ranges, see
https://milvus.io/docs/sparse_vector.md#FAQ

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
issue: milvus-io#29419

* sparse float vector to support raw data mmap

For getting a vector from the chunk cache, I added a unit test but marked
it as skipped due to a known issue. I have tested it locally.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
issue: milvus-io#29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Jul 2, 2024
issue: #29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>