
[Bug]: [string_match] The query results for all the string match are empty when the string field is primary key with INVERTED index created after flush #30728

Closed
binbinlv opened this issue Feb 21, 2024 · 6 comments
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

@binbinlv
Contributor

binbinlv commented Feb 21, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240218-99297ab8
- Deployment mode (standalone or cluster): both
- MQ type (rocksmq, pulsar or kafka): all
- SDK version (e.g. pymilvus v2.0.0rc2): 2.4.0rc35
- OS (Ubuntu or CentOS):
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

All string match (LIKE) queries return empty results after a flush when the VARCHAR primary key field has an INVERTED index.

>>> res = collection.query("string like '%0'")
>>> len(res)
200
>>> res = collection.query("string like '%0%'")
>>> len(res)
452
>>> collection.flush()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> res
[{'string': '0'}]
>>> collection.flush()
>>>
>>> res = collection.query("string like '0%'")
>>> len(res)
0
>>> res = collection.query("string like '%0'")
>>> len(res)
0
>>> res = collection.query("string like '%0%'")
>>> len(res)
0
>>> res
[]

Expected Behavior

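The queries succeed and return the correct primary keys: 1 match for '0%', 200 for '%0', and 452 for '%0%', the same counts returned before the flush.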
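
The correct counts for the three patterns can be derived by hand from the inserted keys. These keys are the strings str(i) for i in range(2000), none of which has a leading zero. The derivation treats % as matching any (possibly empty) substring:

```latex
\begin{aligned}
N(\texttt{0\%})   &= 1 && \text{only the key } \texttt{0} \text{ starts with 0} \\
N(\texttt{\%0})   &= 2000 / 10 = 200 && \text{multiples of 10 in } [0, 2000) \\
N(\texttt{\%0\%}) &= \underbrace{1}_{0}
  + \underbrace{(90 - 9^2)}_{10 \text{ to } 99}
  + \underbrace{(900 - 9^3)}_{100 \text{ to } 999}
  + \underbrace{(1000 - 9^3)}_{1000 \text{ to } 1999} \\
  &= 1 + 9 + 171 + 271 = 452
\end{aligned}
```

Each term subtracts the keys that contain no zero digit from all keys of that length. For 1000 to 1999, the leading digit is fixed at 1, so only the last three digits can be zero-free, which gives 9^3 keys.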

Steps To Reproduce

import random

import numpy as np
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

connections.connect()  # default connection: localhost:19530

dim = 128
nb = 2000

# The VARCHAR field "string" is the primary key and later gets an INVERTED index
string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=65535, is_primary=True)
int64_field = FieldSchema(name="int64", dtype=DataType.INT64)
float_field = FieldSchema(name="float", dtype=DataType.FLOAT)
bool_field = FieldSchema(name="bool", dtype=DataType.BOOL)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
schema = CollectionSchema(fields=[string_field, float_field, bool_field, int64_field, float_vector])
collection = Collection("test_search_collection_binbin_tmp_0", schema=schema)

# Insert columns in schema order: string, float, bool, int64, float_vector
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
collection.insert([[str(i) for i in range(nb)], [np.float32(i) for i in range(nb)], [np.bool_(i) for i in range(nb)], [i for i in range(nb)], vectors])

index_param = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 100}}
collection.create_index("float_vector", index_param, index_name="a")
index_param = {"index_type": "INVERTED"}
collection.create_index("string", index_param, index_name="b")
collection.load()


def count(expr):
    # Number of entities matching the filter expression
    return len(collection.query(expr))


# Before flush
print(count("string like '0%'"), count("string like '%0'"), count("string like '%0%'"))

collection.flush()
print(count("string like '0%'"))

collection.flush()
# After the second flush: on master-20240218-99297ab8 all three returned 0
print(count("string like '0%'"), count("string like '%0'"), count("string like '%0%'"))
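
To catch this regression automatically, one option is to compare the count Milvus returns for each pattern with a brute-force count over the inserted primary keys. The helpers like_to_regex, reference_count and check_like_counts are hypothetical and not part of pymilvus. The sketch assumes % is the only wildcard in the patterns and does not handle _ or escape sequences.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern that uses only '%' into a regex.

    Assumption: '%' matches any (possibly empty) substring; '_' and escape
    sequences are deliberately not handled in this sketch.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def reference_count(keys: Iterable[str], pattern: str) -> int:
    """Brute-force count of keys that fully match the LIKE pattern."""
    regex = like_to_regex(pattern)
    return sum(1 for key in keys if regex.fullmatch(key))


def check_like_counts(
    count_fn: Callable[[str], int],
    keys: List[str],
    patterns: List[str],
    field: str = "string",
) -> Dict[str, Tuple[int, int]]:
    """Return {pattern: (got, expected)} for every pattern whose count is wrong."""
    mismatches: Dict[str, Tuple[int, int]] = {}
    for pattern in patterns:
        expected = reference_count(keys, pattern)
        got = count_fn(f"{field} like '{pattern}'")
        if got != expected:
            mismatches[pattern] = (got, expected)
    return mismatches


# Primary keys as inserted in the reproduction script
keys = [str(i) for i in range(2000)]
patterns = ["0%", "%0", "%0%"]
print({p: reference_count(keys, p) for p in patterns})  # {'0%': 1, '%0': 200, '%0%': 452}
```

Against a live collection, count_fn could be lambda expr: len(collection.query(expr)). Call it once before the first collection.flush() and again after each subsequent flush. An empty dict means every count matched the reference.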

Milvus Log

No response

Anything else?

No response

@binbinlv binbinlv added the kind/bug and triage/accepted labels Feb 21, 2024
@binbinlv binbinlv added this to the 2.4.0 milestone Feb 21, 2024
@longjiquan
Contributor

Same as #30687, which is already fixed by #30764.
/assign @binbinlv

@binbinlv
Contributor Author

binbinlv commented Feb 23, 2024

It is not fixed; the root cause may be different.

milvus:master-20240222-16b4c9a7
pymilvus: 2.4.0rc35

results:

>>> dim = 128
>>> string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=65535, is_primary=True)
>>> int64_field = FieldSchema(name="int64", dtype=DataType.INT64)
>>> float_field = FieldSchema(name="float", dtype=DataType.FLOAT)
>>> bool_field = FieldSchema(name="bool", dtype=DataType.BOOL)
>>> float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
>>> schema = CollectionSchema(fields=[string_field, float_field, bool_field, int64_field, float_vector])
>>> import numpy as np
>>> collection = Collection("test_search_collection_30728", schema=schema)
>>> nb = 2000
>>> import random
>>> default_search_params = {"metric_type": "L2", "params": {"nprobe": 100}}
>>> vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
>>> res = collection.insert([[str(i) for i in range(nb)], [np.float32(i) for i in range(nb)], [np.bool_(i) for i in range(nb)], [i for i in range(nb)], vectors])
>>> index_param = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 100}}
>>> collection.create_index("float_vector", index_param, index_name="a")
Status(code=0, message=)
>>> index_param = {"index_type": "INVERTED"}
>>> collection.create_index("string", index_param, index_name="b")
Status(code=0, message=)
>>> collection.load()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> res = collection.query("string like '%0'")
>>> len(res)
200
>>> res = collection.query("string like '%0%'")
>>> len(res)
452
>>> collection.flush()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> collection.flush()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> res = collection.query("string like '%0'")
>>> len(res)
0
>>> res = collection.query("string like '%0%'")
>>> len(res)
0

@binbinlv
Contributor Author

/unassign

@binbinlv binbinlv changed the title [Bug]: The query results for all the string match are empty when the string field is primary key with INVERTED index created after flush [Bug]: [string_match] The query results for all the string match are empty when the string field is primary key with INVERTED index created after flush Feb 23, 2024
@longjiquan
Contributor

Should be fixed by #30848.

@longjiquan
Contributor

/assign @binbinlv

sre-ci-robot pushed a commit that referenced this issue Feb 28, 2024
#30728

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
@binbinlv
Contributor Author

Verified as fixed:

milvus: master-20240228-095cdbed-amd64
pymilvus: 2.4.0rc37
results:

>>> len(res)
1
>>> res = collection.query("string like '%0'")
>>> len(res)
200
>>> res = collection.query("string like '%0%'")
>>> len(res)
452
>>> collection.flush()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> collection.flush()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> res = collection.query("string like '%0'")
>>> len(res)
200
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> res = collection.query("string like '%0%'")
>>> len(res)
452
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> collection.flush()
>>> res = collection.query("string like '0%'")
>>> len(res)
1
>>> res = collection.query("string like '%0'")
>>> len(res)
200
>>> res = collection.query("string like '%0%'")
>>> len(res)
452
