
[Bug]: error="IO error: While open a file for random read: /data/milvus/rdb_data/030164.sst: Too many open files #25798

Closed
wangyongfei5558 opened this issue Jul 20, 2023 · 5 comments
Labels: kind/bug (Issues or changes related to a bug), stale (Indicates no updates for 30 days), triage/needs-information (Indicates an issue needs more information in order to work on it.)

Comments

@wangyongfei5558

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.1.4
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka): rocksmq
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: 4vCPU / 16GB
- GPU:
- Others:

Current Behavior

Milvus stops working after running for about two days, with the following errors:

[2023/07/20 14:06:33.401 +08:00] [ERROR] [rootcoord/dml_channels.go:235] ["Broadcast failed"] [error="IO error: While open a file for random read: /data/milvus/rdb_data/030164.sst: Too many open files"] [chanName=by-dev-rootcoord-dml_5] [stack="github.com/milvus-io/milvus/internal/rootcoord.(*dmlChannels).broadcast\n\t/root/milvusrpm-2.1/milvus/internal/rootcoord/dml_channels.go:235\ngithub.com/milvus-io/milvus/internal/rootcoord.(*timetickSync).sendTimeTickToChannel\n\t/root/milvusrpm-2.1/milvus/internal/rootcoord/timeticksync.go:351\ngithub.com/milvus-io/milvus/internal/rootcoord.(*timetickSync).startWatch.func1\n\t/root/milvusrpm-2.1/milvus/internal/rootcoord/timeticksync.go:312"]
[2023/07/20 14:06:33.401 +08:00] [WARN] [rootcoord/timeticksync.go:313] ["SendTimeTickToChannel fail"] [error="IO error: While open a file for random read: /data/milvus/rdb_data/030164.sst: Too many open files"]
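For context, "Too many open files" is the Linux message for EMFILE: the process has reached its per-process open-file limit (the soft RLIMIT_NOFILE), here while opening a `.sst` file under `rdb_data`, which belongs to the RocksDB store behind rocksmq. The following is a minimal, Linux-only sketch (not part of Milvus) for checking how close a running process is to that limit. It counts the entries in `/proc/<pid>/fd` and reads the soft "Max open files" value from `/proc/<pid>/limits`. The helper names `openFDCount` and `softNoFileLimit` are hypothetical, and inspecting another user's process generally requires running as that user or as root.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// openFDCount counts the entries in /proc/<pid>/fd, i.e. the file
// descriptors the process currently holds (Linux only).
// Hypothetical helper for illustration.
func openFDCount(pid string) (int, error) {
	entries, err := os.ReadDir("/proc/" + pid + "/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// softNoFileLimit reads the soft "Max open files" value from
// /proc/<pid>/limits (Linux only). Hypothetical helper for illustration.
func softNoFileLimit(pid string) (int, error) {
	f, err := os.Open("/proc/" + pid + "/limits")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Max open files") {
			// Expected fields: Max open files <soft> <hard> files
			fields := strings.Fields(line)
			if len(fields) < 5 {
				return 0, fmt.Errorf("unexpected limits line: %q", line)
			}
			return strconv.Atoi(fields[3])
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no \"Max open files\" line for pid %s", pid)
}

func main() {
	// Inspect this process by default; pass a PID (e.g. Milvus) as argument 1.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	n, err := openFDCount(pid)
	if err != nil {
		fmt.Println("cannot list fds:", err)
		return
	}
	soft, err := softNoFileLimit(pid)
	if err != nil {
		fmt.Println("cannot read limits:", err)
		return
	}
	fmt.Printf("pid %s: %d open fds, soft limit %d (%.1f%% used)\n",
		pid, n, soft, 100*float64(n)/float64(soft))
}
```

Run it with the PID of the Milvus process as the only argument; with no argument it inspects itself. Sampling it periodically shows whether the descriptor count climbs steadily toward the limit over the two days before the failure.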

Expected Behavior

No response

Steps To Reproduce

No response

Anything else?

When this error occurs, Milvus stops responding.

@wangyongfei5558 added the kind/bug and needs-triage labels on Jul 20, 2023
@yanliang567
Contributor

@wangyongfei5558 2.1.4 is quite old. Could you please retry on the latest v2.2.11, in which we have fixed a lot of issues?
/assign @wangyongfei5558
/unassign

@yanliang567 added the triage/needs-information label and removed the needs-triage label on Jul 20, 2023
@xiaofan-luan
Contributor

You can raise the ulimit (open-file limit) on your system.

But as yanliang said, once the system has recovered, you will need to upgrade.
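Raising the limit, as suggested above, means raising the soft RLIMIT_NOFILE that the Milvus process starts with; in practice that is done where Milvus is launched, for example with `ulimit -n` in the launching shell or `LimitNOFILE=` in a systemd unit. As a minimal, Linux-oriented sketch of the underlying mechanism only (it changes the limit of the process that runs it, not of Milvus), an unprivileged process can lift its own soft limit up to its hard limit with getrlimit/setrlimit. The function name `raiseNoFileLimit` is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseNoFileLimit raises this process's soft RLIMIT_NOFILE to its hard
// limit and returns the limits the kernel reports afterwards. Raising the
// soft limit up to the hard limit needs no privileges; raising the hard
// limit itself does. Hypothetical helper for illustration (Linux).
func raiseNoFileLimit() (syscall.Rlimit, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return lim, err
	}
	if lim.Cur < lim.Max {
		lim.Cur = lim.Max
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
			return lim, err
		}
	}
	// Re-read so the caller sees what was actually applied.
	err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim)
	return lim, err
}

func main() {
	lim, err := raiseNoFileLimit()
	if err != nil {
		fmt.Println("could not adjust RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("open-file limit: soft=%d hard=%d\n", lim.Cur, lim.Max)
}
```

If the hard limit is itself too low for the workload, it has to be raised by an administrator before any process, Milvus included, can use a higher soft limit.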

stale bot commented Aug 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale bot added the stale label (indicates no updates for 30 days) on Aug 21, 2023
@stale bot closed this as completed on Sep 7, 2023
@swapnil-potnis

We are still facing this issue with v2.1.4.

@xiaofan-luan
Contributor

  1. Please upgrade to 2.3.20 or above as soon as possible; 2.1.x is far too old a version to use in production.
  2. Changing the OS ulimit parameter might help with this.
