Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance: add sparse float vector support to restful v2 #33231

Merged
merged 1 commit into from
May 26, 2024

Conversation

zhengbuqian
Copy link
Collaborator

@zhengbuqian zhengbuqian commented May 21, 2024

issue: #29419
also re-enabled an e2e test using restful api, which is previously disabled due to #32214.

In restful api, the accepted json formats of sparse float vector are:

  • {"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}
  • {"1": 0.1, "100": 0.2, "1000": 0.3}

for accepted indice and value range, see https://milvus.io/docs/sparse_vector.md#FAQ

@sre-ci-robot sre-ci-robot added size/L Denotes a PR that changes 100-499 lines. area/test sig/testing labels May 21, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels May 21, 2024
Copy link
Contributor

mergify bot commented May 21, 2024

@zhengbuqian ut workflow job failed, comment rerun ut can trigger the job again.

Copy link
Contributor

mergify bot commented May 21, 2024

@zhengbuqian E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link

codecov bot commented May 21, 2024

Codecov Report

Attention: Patch coverage is 86.17886% with 17 lines in your changes are missing coverage. Please review.

Project coverage is 82.03%. Comparing base (32d3e22) to head (1799be6).
Report is 23 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #33231      +/-   ##
==========================================
- Coverage   82.07%   82.03%   -0.04%     
==========================================
  Files        1003     1012       +9     
  Lines      128993   128936      -57     
==========================================
- Hits       105869   105773      -96     
- Misses      19138    19171      +33     
- Partials     3986     3992       +6     
Files Coverage Δ
...nternal/distributed/proxy/httpserver/handler_v2.go 89.16% <100.00%> (+0.01%) ⬆️
...ernal/distributed/proxy/httpserver/wrap_request.go 96.55% <92.59%> (-0.53%) ⬇️
internal/distributed/proxy/httpserver/utils.go 87.76% <90.69%> (+0.14%) ⬆️
pkg/util/typeutil/schema.go 85.94% <78.00%> (-0.27%) ⬇️

... and 215 files with indirect coverage changes

@buqian-zilliz buqian-zilliz force-pushed the sparse-restful branch 2 times, most recently from adae63c to f7a11e6 Compare May 22, 2024 10:14
@mergify mergify bot added the ci-passed label May 22, 2024
@@ -654,6 +667,7 @@ func anyToColumns(rows []map[string]interface{}, sch *schemapb.CollectionSchema)
}

dynamicCol := make([][]byte, 0, rowsLen)
sparseDim := int64(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a collection can only has one sparse vector?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, updated

@@ -980,6 +1026,9 @@ func convertVectors2Placeholder(body string, dataType schemapb.DataType, dimensi
case schemapb.DataType_BFloat16Vector:
valueType = commonpb.PlaceholderType_BFloat16Vector
values, err = serializeByteVectors(gjson.Get(body, HTTPRequestData).Raw, dataType, dimension, dimension*2)
case schemapb.DataType_SparseFloatVector:
valueType = commonpb.PlaceholderType_SparseFloatVector
values, err = serializeSparseFloatVectors(gjson.Get(body, HTTPRequestData).Array(), dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether #20415 will happen again when sparse vector contains int64?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the parsing function to reject out of range numbers

@sre-ci-robot sre-ci-robot added size/XL Denotes a PR that changes 500-999 lines. and removed size/L Denotes a PR that changes 100-499 lines. labels May 23, 2024
@mergify mergify bot removed the ci-passed label May 23, 2024
@zhengbuqian zhengbuqian added the PR | need cherry-pick need cherry pick to other branches label May 23, 2024
Copy link
Contributor

mergify bot commented May 23, 2024

@zhengbuqian ut workflow job failed, comment rerun ut can trigger the job again.

…using restful api

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
if err != nil {
return nil, err
}
val, err := getValue(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is format2 forgotten? the idx(string) can only be parse to uint 32.
  2. what abort format1? the idx( "indices": [?, ?, ?] ), please check the input and the result after decode
func CreateSparseFloatRowFromJSON(input []byte) ([]byte, error) {
	var vec map[string]interface{}
	decoder := json.NewDecoder(bytes.NewReader(input))
	decoder.DisallowUnknownFields()
	err := decoder.Decode(&vec)
	if err != nil {
		return nil, err
	}
	return CreateSparseFloatRowFromMap(vec)
}

Copy link
Contributor

@PowderLi PowderLi May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the doc described

The vector dimensions must be of Python int or numpy.integer type, and the values must be of Python float or numpy.floating type.

valid cases

  • "sparseVectorFieldName": {"indices": [9223372036854775807], "values": [0.1]}}
  • "sparseVectorFieldName": {"9223372036854775807": 0.1]}}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline and summarizing here:

the referenced doc is only to describe that we support both native Python int and numpy integer type. the accepted value range is described in https://milvus.io/docs/sparse_vector.md#FAQ. so the above 2 cases are actually invalid as the index is out of range.

@mergify mergify bot added the ci-passed label May 24, 2024
@PowderLi
Copy link
Contributor

/lgtm

@PowderLi PowderLi removed their assignment May 24, 2024
@czs007
Copy link
Contributor

czs007 commented May 26, 2024

/approve

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 1b67cec into milvus-io:master May 26, 2024
15 checks passed
@zhengbuqian zhengbuqian deleted the sparse-restful branch May 27, 2024 02:20
sre-ci-robot pushed a commit that referenced this pull request Jun 3, 2024
issue: #29419
pr: #33231

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@czs007 czs007 removed the PR | need cherry-pick need cherry pick to other branches label Jun 25, 2024
yellow-shine pushed a commit to yellow-shine/milvus that referenced this pull request Jul 2, 2024
issue: milvus-io#29419
also re-enabled an e2e test using restful api, which is previously
disabled due to milvus-io#32214.

In restful api, the accepted json formats of sparse float vector are:

* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* {"1": 0.1, "100": 0.2, "1000": 0.3}

for accepted indice and value range, see
https://milvus.io/docs/sparse_vector.md#FAQ

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved area/test ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm sig/testing size/XL Denotes a PR that changes 500-999 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants