enhance: add sparse float vector support to restful v2 #33231
@zhengbuqian ut workflow job failed, comment |
@zhengbuqian E2e jenkins job failed, comment |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #33231      +/-   ##
==========================================
- Coverage   82.07%   82.03%   -0.04%
==========================================
  Files        1003     1012       +9
  Lines      128993   128936      -57
==========================================
- Hits       105869   105773      -96
- Misses      19138    19171      +33
- Partials     3986     3992       +6
adae63c to f7a11e6
@@ -654,6 +667,7 @@ func anyToColumns(rows []map[string]interface{}, sch *schemapb.CollectionSchema)
}

dynamicCol := make([][]byte, 0, rowsLen)
sparseDim := int64(0)
Can a collection have only one sparse vector field?
good catch, updated
@@ -980,6 +1026,9 @@ func convertVectors2Placeholder(body string, dataType schemapb.DataType, dimensi
case schemapb.DataType_BFloat16Vector:
valueType = commonpb.PlaceholderType_BFloat16Vector
values, err = serializeByteVectors(gjson.Get(body, HTTPRequestData).Raw, dataType, dimension, dimension*2)
case schemapb.DataType_SparseFloatVector:
valueType = commonpb.PlaceholderType_SparseFloatVector
values, err = serializeSparseFloatVectors(gjson.Get(body, HTTPRequestData).Array(), dataType)
I'm not sure whether #20415 could happen again when a sparse vector contains int64 values.
Updated the parsing function to reject out-of-range numbers.
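A minimal sketch of what rejecting out-of-range indices can look like in Go. This is not the parsing function from this PR: `parseSparseIndex` and `parseIndices` are hypothetical names, and the sketch assumes only that an index must fit in a uint32 (the authoritative range is in the Milvus sparse vector FAQ). Indices are decoded as `json.Number` so that large integers are checked as text and never pass through float64, where they could lose precision.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// parseSparseIndex is a hypothetical helper. It accepts only base-10
// integers that fit in 32 unsigned bits; negatives, fractions, exponent
// notation, and larger values are rejected by strconv.ParseUint.
func parseSparseIndex(n json.Number) (uint32, error) {
	idx, err := strconv.ParseUint(n.String(), 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid sparse index %q: %w", n.String(), err)
	}
	return uint32(idx), nil
}

// parseIndices decodes a JSON array of numbers without converting them
// to float64 first, then validates each element.
func parseIndices(raw []byte) ([]uint32, error) {
	var nums []json.Number
	if err := json.Unmarshal(raw, &nums); err != nil {
		return nil, err
	}
	out := make([]uint32, 0, len(nums))
	for _, n := range nums {
		idx, err := parseSparseIndex(n)
		if err != nil {
			return nil, err
		}
		out = append(out, idx)
	}
	return out, nil
}
```

Under these assumptions, `parseIndices([]byte("[1, 100, 1000]"))` succeeds, while an array containing `9223372036854775807`, `-1`, or `1.5` returns an error.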
f7a11e6 to ab36f79
@zhengbuqian ut workflow job failed, comment |
…using restful api Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
ab36f79 to 1799be6
if err != nil {
return nil, err
}
val, err := getValue(v)
- Is format 2 forgotten? The idx (string) can only be parsed to uint32.
- What about format 1, the idx in "indices": [?, ?, ?]? Please check the input and the result after decoding.
func CreateSparseFloatRowFromJSON(input []byte) ([]byte, error) {
	var vec map[string]interface{}
	decoder := json.NewDecoder(bytes.NewReader(input))
	decoder.DisallowUnknownFields()
	err := decoder.Decode(&vec)
	if err != nil {
		return nil, err
	}
	return CreateSparseFloatRowFromMap(vec)
}
As the doc describes:
The vector dimensions must be of Python int or numpy.integer type, and the values must be of Python float or numpy.floating type.
So these should be valid cases:
- "sparseVectorFieldName": {"indices": [9223372036854775807], "values": [0.1]}
- "sparseVectorFieldName": {"9223372036854775807": 0.1}
Discussed offline; summarizing here:
The referenced doc only states that we support both the native Python int and numpy integer types. The accepted value range is described in https://milvus.io/docs/sparse_vector.md#FAQ, so the two cases above are actually invalid because the index is out of range.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: czs007, zhengbuqian
The full list of commands accepted by this bot can be found here. The pull request process is described here
issue: milvus-io#29419

Also re-enabled an e2e test using the RESTful API, which was previously disabled due to milvus-io#32214.

In the RESTful API, the accepted JSON formats for a sparse float vector are:

* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* `{"1": 0.1, "100": 0.2, "1000": 0.3}`

For the accepted index and value ranges, see https://milvus.io/docs/sparse_vector.md#FAQ

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #29419
Also re-enabled an e2e test using the RESTful API, which was previously disabled due to #32214.
In the RESTful API, the accepted JSON formats for a sparse float vector are:
{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}
{"1": 0.1, "100": 0.2, "1000": 0.3}
For the accepted index and value ranges, see https://milvus.io/docs/sparse_vector.md#FAQ
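The two JSON formats accepted by this PR (an object with `indices` and `values` arrays, or an object mapping index strings to values) can be normalized into the same list of index/value pairs. The sketch below illustrates that idea only. It is not the code from this PR: `sparseEntry` and `parseSparseRow` are hypothetical names, and it assumes indices must fit in a uint32 and values in a float32. It does not check for duplicate indices.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// sparseEntry is a hypothetical index/value pair.
type sparseEntry struct {
	Index uint32
	Value float32
}

// parseSparseRow accepts either {"indices": [...], "values": [...]}
// or {"<index>": <value>, ...} and returns entries sorted by index.
func parseSparseRow(input []byte) ([]sparseEntry, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(input, &raw); err != nil {
		return nil, err
	}
	entries := make([]sparseEntry, 0, len(raw))
	idxRaw, hasIdx := raw["indices"]
	valRaw, hasVal := raw["values"]
	if hasIdx && hasVal && len(raw) == 2 {
		var indices []json.Number // kept as text so large integers are not rounded
		var values []float32
		if err := json.Unmarshal(idxRaw, &indices); err != nil {
			return nil, err
		}
		if err := json.Unmarshal(valRaw, &values); err != nil {
			return nil, err
		}
		if len(indices) != len(values) {
			return nil, fmt.Errorf("indices and values differ in length: %d vs %d", len(indices), len(values))
		}
		for i, n := range indices {
			idx, err := strconv.ParseUint(n.String(), 10, 32)
			if err != nil {
				return nil, fmt.Errorf("invalid index %q: %w", n.String(), err)
			}
			entries = append(entries, sparseEntry{uint32(idx), values[i]})
		}
	} else {
		// Map format: every key must itself be a valid uint32 index.
		for k, v := range raw {
			idx, err := strconv.ParseUint(k, 10, 32)
			if err != nil {
				return nil, fmt.Errorf("invalid index %q: %w", k, err)
			}
			var val float32
			if err := json.Unmarshal(v, &val); err != nil {
				return nil, err
			}
			entries = append(entries, sparseEntry{uint32(idx), val})
		}
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].Index < entries[j].Index })
	return entries, nil
}
```

In this sketch both example payloads from the description produce the same three entries, and a key such as `"9223372036854775807"` is rejected because it does not fit in a uint32.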