[Bug]: Query iterator failed to get entire result set #29406

Closed
1 task done
MrPresent-Han opened this issue Dec 21, 2023 · 3 comments
Labels: kind/bug, triage/accepted


MrPresent-Han commented Dec 21, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.3.3
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Using the query iterator to retrieve all matched results fails: only a small part of the result set is returned.

Expected Behavior

No response

Steps To Reproduce

import pymilvus
from pymilvus import connections, Collection

print(pymilvus.__version__)

collection_name = 'prod_embed'

# Connect to the Milvus server
conn = connections.connect(uri=mappings[collection_name]['uri'], token=mappings[collection_name]['token'])

# Load the collection
zilliz_collection = Collection(collection_name)
zilliz_collection.load()

results = zilliz_collection.query(expr='unique_id != ""',
                                  output_fields=['unique_id', 'timestamp'])

# outputs 955037
print(len(results))


# Create the query iterator
query_iterator = zilliz_collection.query_iterator(
        batch_size=10000,
        expr="unique_id != '' ",
        output_fields=['unique_id', 'timestamp'])

count = 0
# Iterate through results
while True:
    res = query_iterator.next()
    if len(res) == 0:
        print(f"Query iteration finished, close: {count}")
        query_iterator.close()
        break

    count += len(res)
    print(count)

# finally outputs only 132343
print(count)
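
A minimal sketch of how the two counts above could be compared more carefully: it drains any object that exposes `next()` and `close()` and reports both the number of rows returned and the number of distinct primary keys, so a shortfall can be told apart from duplicates. `drain_iterator` and `FakeIterator` are hypothetical names introduced here for illustration; the fake stands in for the real iterator so the snippet runs without a Milvus server.

```python
def drain_iterator(iterator, pk_field):
    """Consume an iterator exposing next()/close(); return (rows, distinct keys)."""
    total = 0
    seen = set()
    try:
        while True:
            batch = iterator.next()
            if not batch:  # an empty batch signals the end of iteration
                break
            total += len(batch)
            seen.update(row[pk_field] for row in batch)
    finally:
        iterator.close()  # always release the iterator, even on error
    return total, len(seen)


class FakeIterator:
    """Hypothetical stand-in for a query iterator, used only for this demo."""

    def __init__(self, rows, batch_size):
        self._rows = rows
        self._batch_size = batch_size
        self._pos = 0
        self.closed = False

    def next(self):
        batch = self._rows[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch

    def close(self):
        self.closed = True


rows = [{"unique_id": f"id-{i}"} for i in range(25)]
fake = FakeIterator(rows, batch_size=10)
total, unique = drain_iterator(fake, "unique_id")
print(total, unique)
```

Against a real collection, the same function could presumably be passed the object returned by `query_iterator(...)`, with the result compared to `len(results)` from the plain `query` call.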

Milvus Log

No response

Anything else?

No response

MrPresent-Han added the kind/bug and needs-triage labels Dec 21, 2023
MrPresent-Han added a commit to MrPresent-Han/milvus that referenced this issue Dec 21, 2023
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
yanliang567 commented

/assign @MrPresent-Han

/assign @NicoYuan1986
@NicoYuan1986 please check whether we need a new test for this scenario.

yanliang567 added the triage/accepted label and removed the needs-triage label Dec 22, 2023
yanliang567 added this to the 2.3.4 milestone Dec 22, 2023
yanliang567 removed their assignment Dec 22, 2023
MrPresent-Han added a commit to MrPresent-Han/milvus that referenced this issue Dec 25, 2023
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Dec 25, 2023
related: #29406

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
NicoYuan1986 commented

strange.
milvus: master-20231226-033456ea
pymilvus: 2.4.0rc9

>>> expression = "pk != -1"
>>> query_iterator = collection.query_iterator(expr=expression, batch_size=1000)
>>> 
>>> page_idx = 0
>>> while True:
...     res = query_iterator.next()  # fetch the next page
...     if len(res) == 0:
...         print("query iteration finished, close")
...         query_iterator.close()  # no more results: paging is done, so close the iterator
...         break
...     page_idx += 1
...     print(f"page{page_idx}-------------------------{len(res)}")
... 
page1-------------------------500
page2-------------------------500
page3-------------------------334
page4-------------------------334
page5-------------------------334
page6-------------------------334
page7-------------------------334
page8-------------------------334
page9-------------------------334
page10-------------------------334
page11-------------------------334
page12-------------------------334
page13-------------------------334
page14-------------------------348
page15-------------------------978
query iteration finished, close
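
To make the transcript above easier to read, the page sizes can be tallied. This is only arithmetic on the printed numbers; the thread does not state how many entities the collection held, so the total is not checked against anything.

```python
# Page sizes copied from the transcript: pages 1-2, pages 3-13, then pages 14-15.
batch_size = 1000
page_sizes = [500, 500] + [334] * 11 + [348, 978]

total = sum(page_sizes)
# 1-based indices of pages that came back smaller than the requested batch size.
short_pages = [i + 1 for i, n in enumerate(page_sizes) if n < batch_size]

print(len(page_sizes), total, len(short_pages))  # prints: 15 6000 15
```

Every one of the 15 pages is smaller than the requested `batch_size` of 1000, which appears to be what prompted the "strange" remark.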

sre-ci-robot pushed a commit that referenced this issue Dec 27, 2023
related: #29406
pr: #29451

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
yanliang567 commented

Verified: not reproducible on 2.3-20231227-eb11b1a5.

  1. Insert 1M entities with PKs 1 to 1M and wait a few hours to make sure all segments have been compacted.
  2. Insert another 1M entities with the same PKs (1 to 1M) and wait a few hours to make sure all compaction is done.
  3. Check that num_entities = 2M.
  4. Check that query(count(*)) = 2M.
  5. The query iterator returns 200 pages of 5,000 entities each, which means all entities (after dedup) were retrieved successfully.
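
The arithmetic behind the verification steps above can be sketched at a reduced scale. This is a plain-Python model and makes no claim about how Milvus deduplicates internally: it assumes dedup keeps one row per PK, and the sizes (1,000 PKs, pages of 5) are scaled-down stand-ins chosen so the page count matches the 200 pages reported.

```python
def paginate(items, page_size):
    """Yield consecutive slices of at most page_size items."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


num_pks = 1_000   # scaled down from 1M
page_size = 5     # scaled down from 5,000

first_insert = list(range(1, num_pks + 1))
second_insert = list(range(1, num_pks + 1))  # same PKs inserted again

raw_rows = first_insert + second_insert      # 2 x num_pks rows before dedup
deduped = sorted(set(raw_rows))              # assumption: one row kept per PK
pages = list(paginate(deduped, page_size))

print(len(raw_rows), len(deduped), len(pages))  # prints: 2000 1000 200
```

With the real figures, 200 pages of 5,000 entities give 1M rows, which equals the number of distinct PKs even though 2M rows were inserted.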
