[Bug]: Five out of seven flush requests time out after 120s #33954

Open
ThreadDao opened this issue Jun 18, 2024 · 2 comments
Labels
kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ThreadDao
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4-20240618-79546a6c-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Create Milvus with the following config:

  config:
    dataCoord:
      segment:
        sealProportion: 1.52e-05
    log:
      level: debug
    quotaAndLimits:
      flushRate:
        enabled: true
        max: 0.1 
    trace:
      exporter: jaeger
      jaeger:
        url: http://tempo-distributor.tempo:14268/api/traces
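For context on the `flushRate` setting above: assuming, as the comments in the default milvus.yaml suggest, that `quotaAndLimits.flushRate.max` is expressed in qps, `max: 0.1` allows roughly one flush every 10 seconds. The sketch below illustrates that arithmetic with a generic token bucket (capacity 1, refilled at `max` tokens per second) for requests that all arrive at once. The bucket model and the helper `admission_times` are hypothetical illustrations, not Milvus's actual rate limiter, which may reject over-limit requests rather than delay them.

```python
from typing import List


def admission_times(rate_qps: float, n_requests: int, burst: int = 1) -> List[float]:
    """Earliest admission time, in seconds, for n_requests that all arrive at t=0.

    Hypothetical token-bucket model (not Milvus's implementation): the bucket
    starts full with `burst` tokens and refills at `rate_qps` tokens per second;
    requests are served in arrival order.
    """
    if rate_qps <= 0 or burst < 1:
        raise ValueError("rate_qps must be positive and burst at least 1")
    interval = 1.0 / rate_qps  # seconds between token refills (10.0 for 0.1 qps)
    times = []
    for i in range(n_requests):
        # The first `burst` requests use the initial tokens immediately;
        # each later request waits for one additional refill.
        refills_needed = max(0, i - burst + 1)
        times.append(refills_needed * interval)
    return times


if __name__ == "__main__":
    # 7 simultaneous flushes against flushRate.max = 0.1
    print(admission_times(0.1, 7))
```

Under this model, seven simultaneous flushes would be admitted at 0, 10, 20, ..., 60 seconds.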

Test steps:

  1. Create a collection with 1024 partitions (partition key) and 1 shard.
  2. Create an index.
  3. Insert 10M rows of 128-dim data, then flush.
  4. Build the index, then load the collection.
  5. Run concurrent requests: search + upsert + flush.
  6. 5 of the 7 flush requests timed out after 120s:
[2024-06-18 10:44:06,589 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-18 10:44:06,589 -  INFO - fouram]: grpc     flush                                                                              7    5(71.43%) | 361353  249863  503063 367000 |    0.00        0.00 (stats.py:789)
[2024-06-18 10:44:06,590 -  INFO - fouram]: grpc     search                                                                           301    10(3.32%) |  63813   15424  120006  60000 |    0.17        0.01 (stats.py:789)
[2024-06-18 10:44:06,590 -  INFO - fouram]: grpc     upsert                                                                           160     0(0.00%) | 180105     588  301469 175000 |    0.09        0.00 (stats.py:789)
[2024-06-18 10:44:06,590 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-18 10:44:06,590 -  INFO - fouram]:          Aggregated                                                                       468    15(3.21%) | 108021     588  503063  81000 |    0.27        0.01 (stats.py:789)
[2024-06-18 10:44:06,590 -  INFO - fouram]:  (stats.py:790)
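The failure percentages in the summary can be re-derived from the request and failure counts, which also confirms that the Aggregated row is the sum of the three request types. A minimal sketch, assuming the two integer columns after each request name are the total request count and the failure count (the table's header row is not part of the log above):

```python
# (request type, total requests, failed requests), copied from the summary above
STATS = [
    ("flush", 7, 5),
    ("search", 301, 10),
    ("upsert", 160, 0),
]


def failure_pct(requests: int, failures: int) -> float:
    """Failure rate as a percentage, rounded to two decimals like the log."""
    return round(100.0 * failures / requests, 2)


def aggregate(stats):
    """Sum requests and failures across all request types."""
    total_reqs = sum(reqs for _, reqs, _ in stats)
    total_fails = sum(fails for _, _, fails in stats)
    return total_reqs, total_fails


if __name__ == "__main__":
    for name, reqs, fails in STATS:
        print(f"{name}: {fails}/{reqs} = {failure_pct(reqs, fails)}%")
    total_reqs, total_fails = aggregate(STATS)
    print(f"Aggregated: {total_fails}/{total_reqs} = {failure_pct(total_reqs, total_fails)}%")
```

This reproduces 71.43% for flush, 3.32% for search, 0% for upsert, and 15/468 = 3.21% aggregated, matching the logged values.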

Expected Behavior

No response

Steps To Reproduce

https://argo-workflows.zilliz.cc/archived-workflows/qa/88b56c6a-eb3d-4862-95a6-b0c64434efde?nodeId=compact-opt-1024-with-flush-2

Milvus Log

pods:

compact-opt-flush2-milvus-datanode-5898b9d778-sshqx               1/1     Running     0                82m     10.104.5.70     4am-node12   <none>           <none>
compact-opt-flush2-milvus-indexnode-8c577d9d6-9tnms               1/1     Running     0                82m     10.104.17.163   4am-node23   <none>           <none>
compact-opt-flush2-milvus-indexnode-8c577d9d6-9wl8n               1/1     Running     0                82m     10.104.6.58     4am-node13   <none>           <none>
compact-opt-flush2-milvus-indexnode-8c577d9d6-qq9c4               1/1     Running     0                82m     10.104.20.226   4am-node22   <none>           <none>
compact-opt-flush2-milvus-mixcoord-5b9f79b984-zwfn2               1/1     Running     0                82m     10.104.4.88     4am-node11   <none>           <none>
compact-opt-flush2-milvus-proxy-b55c6db47-vnzc2                   1/1     Running     0                82m     10.104.13.204   4am-node16   <none>           <none>
compact-opt-flush2-milvus-querynode-0-786c99d5cc-k4bcz            1/1     Running     0                82m     10.104.18.196   4am-node25   <none>           <none>
compact-opt-flush2-milvus-querynode-0-786c99d5cc-q5znj            1/1     Running     0                82m     10.104.13.205   4am-node16   <none>           <none>

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 18, 2024
@ThreadDao ThreadDao added this to the 2.4.5 milestone Jun 18, 2024
@ThreadDao ThreadDao added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 18, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 19, 2024
@yanliang567
Contributor

/unassign

@bigsheeper
Contributor

The logs were lost. Please let me know if this issue reproduces again, thanks @ThreadDao

Development

No branches or pull requests

4 participants