fio test causes chunkservers to go offline #165

Closed
brucen1030 opened this issue Nov 18, 2020 · 1 comment
Labels
question Further information is requested

Comments

brucen1030 commented Nov 18, 2020

Version

https://github.com/opencurve/curve/releases/tag/v1.0.0

Steps

Before the fio test, curve_ops_tool status showed no chunkserver, mds, or etcd offline.
fio -direct=1 -iodepth=64 -thread -rw=randwrite -bs=4k -numjobs=4 -runtime=30 -group_reporting -name=test-curve -filename=/dev/nbd0 -ioengine=libaio -io_limit=400000G
There was a small amount of I/O on the data disks.
Afterwards, curve_ops_tool status showed chunkservers offline:

cluster is not healthy
total copysets: 300, unhealthy copysets: 110, unhealthy_ratio: 36.6667%
...
chunkserver: total num = 36, online = 32, offline = 4(recoveringout = 0, chunkserverlist: [])
left size: min = 687GB, max = 688GB, average = 687.29GB, range = 1GB, variance = 0.21
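As a quick sanity check on the numbers above, the summary lines can be parsed and the unhealthy ratio recomputed. This is only a sketch: `parse_status` is a hypothetical helper, and it assumes the output keeps the exact wording shown here, which may differ between curve_ops_tool versions.

```python
import re


def parse_status(text):
    """Extract copyset and chunkserver counts from the status summary text."""
    copysets = re.search(r"total copysets: (\d+), unhealthy copysets: (\d+)", text)
    servers = re.search(r"total num = (\d+), online = (\d+), offline = (\d+)", text)
    if copysets is None or servers is None:
        raise ValueError("status text does not match the expected layout")
    total, unhealthy = map(int, copysets.groups())
    num, online, offline = map(int, servers.groups())
    # Recompute the ratio instead of trusting the printed value.
    ratio = round(100 * unhealthy / total, 4) if total else 0.0
    return {
        "copysets_total": total,
        "copysets_unhealthy": unhealthy,
        "unhealthy_ratio_pct": ratio,
        "chunkservers_total": num,
        "chunkservers_online": online,
        "chunkservers_offline": offline,
    }


status_text = (
    "total copysets: 300, unhealthy copysets: 110, unhealthy_ratio: 36.6667%\n"
    "chunkserver: total num = 36, online = 32, offline = 4"
    "(recoveringout = 0, chunkserverlist: [])\n"
)
status = parse_status(status_text)
print(status)
```

With 4 of 36 chunkservers offline, 110 of 300 copysets are reported unhealthy, which is consistent with each offline chunkserver hosting replicas of many copysets.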

The logs of the offline chunkservers look like this:

I 2020-11-18T02:09:45-0500 49594 chunkfile_pool.cpp:306] get chunk success! now pool size = 44017
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/30235
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/30235
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/30235, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/30235
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/35831
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/35831
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/35831, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/35831
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/42129
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/42129
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/42129, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/42129
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/21533
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/21533
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/21533, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/21533
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/4215
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/4215
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/4215, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/4215
E 2020-11-18T02:09:45-0500 49589 chunkserver_chunkfile.cpp:195] Error occured when create file. filepath = /data/chunkserver0/copysets/4294967448/data/chunk_57670
W 2020-11-18T02:09:45-0500 49589 chunkserver_datastore.cpp:197] Create chunk file failed.ChunkID = 57670, ErrorCode = 1
F 2020-11-18T02:09:45-0500 49589 op_request.cpp:479] write failed:  logic pool id: 1 copyset id: 152 chunkid: 57670 data size: 4096 data store return: 1
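The pattern in this log is five failed attempts to open a pooled chunk file with "Too many open files", followed by a fatal write failure. A minimal sketch for summarizing such a log is shown below. The helper `summarize` and the regular expressions are illustrative assumptions based only on the line layout visible above, not part of Curve.

```python
import re
from collections import Counter

# Layout seen in the log above:
# <severity> <timestamp> <thread id> <source file>:<line>] <message>
LOG_LINE = re.compile(
    r"^(?P<sev>[IWEF]) (?P<ts>\S+) (?P<tid>\d+) (?P<src>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
EMFILE_MSG = re.compile(
    r"open failed: Too many open files, file path = (?P<path>\S+)"
)


def summarize(lines):
    """Count severities, collect paths that hit EMFILE, and keep fatal messages."""
    severities = Counter()
    emfile_paths = []
    fatals = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m is None:
            continue  # skip anything that is not in the expected layout
        severities[m["sev"]] += 1
        hit = EMFILE_MSG.search(m["msg"])
        if hit:
            emfile_paths.append(hit["path"])
        if m["sev"] == "F":
            fatals.append(m["msg"])
    return severities, emfile_paths, fatals


sample = [
    "I 2020-11-18T02:09:45-0500 49594 chunkfile_pool.cpp:306] get chunk success! now pool size = 44017",
    "W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/30235",
    "E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/30235",
    "F 2020-11-18T02:09:45-0500 49589 op_request.cpp:479] write failed:  logic pool id: 1 copyset id: 152 chunkid: 57670 data size: 4096 data store return: 1",
]
severities, emfile_paths, fatals = summarize(sample)
print(dict(severities))
print(emfile_paths)
print(fatals)
```

Run against a full chunkserver log, the count of EMFILE paths gives a rough idea of how often the process hit its open-file limit before the fatal line.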

The affected chunkservers could not be brought back up manually, so I tried restarting the cluster:

ansible-playbook -i server.ini stop_curve.yml 
ansible-playbook -i server.ini start_curve.yml

After that, some chunkservers started while others went offline. The logs are similar to the ones above:

I 2020-11-18T02:36:46-0500 103320 chunkfile_pool.cpp:306] get chunk success! now pool size = 44013
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/13316
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/13316
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/13316, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/13316
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/20387
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/20387
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/20387, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/20387
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/25475
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/25475
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/25475, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/25475
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/34096
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/34096
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/34096, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/34096
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/734
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/734
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/734, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/734
E 2020-11-18T02:36:46-0500 103315 chunkserver_chunkfile.cpp:195] Error occured when create file. filepath = /data/chunkserver4/copysets/4294967520/data/chunk_91042
W 2020-11-18T02:36:46-0500 103315 chunkserver_datastore.cpp:197] Create chunk file failed.ChunkID = 91042, ErrorCode = 1
F 2020-11-18T02:36:46-0500 103315 op_request.cpp:532] write failed:  logic pool id: 1 copyset id: 224 chunkid: 91042 data size: 4096 data store return: 1

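Both fio runs hit the same problem, so I cannot measure performance. If there is another way to test, please share it.
Also, is ansible-playbook -i server.ini clean_curve.yml the right way to clean up the cluster? Sometimes redeploying after running it still fails, and I don't know which files were left undeleted.
The ansible configuration files are attached: config.zip

They are mostly copied from https://github.com/opencurve/curve/blob/master/docs/cn/deploy.md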

@brucen1030 brucen1030 added the question Further information is requested label Nov 18, 2020
brucen1030 (Author) commented

Modifying /etc/sysctl.conf and /etc/sysctl.conf resolves this.
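The comment does not say which settings were changed. Since the logs report "Too many open files", the sketch below assumes the relevant setting is the per-process open-file limit. It shows one way to check, on Linux, what limit a running process actually has and how many descriptors it currently holds. The helper names are hypothetical, and reading another user's process under /proc usually requires root.

```python
import os


def open_file_limits(pid):
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits.

    Values are returned as strings because the kernel may print 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft, hard = line[len("Max open files"):].split()[:2]
                return soft, hard
    raise LookupError("no 'Max open files' row found")


def open_fd_count(pid):
    """Count the file descriptors the process currently has open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Assumption: the current process stands in for a chunkserver here;
# substitute the PID of the chunkserver being checked.
pid = os.getpid()
soft, hard = open_file_limits(pid)
print(f"pid {pid}: {open_fd_count(pid)} open fds, soft limit {soft}, hard limit {hard}")
```

If the open descriptor count sits close to the soft limit while fio is running, the limit is still too low for the workload.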

ilixiaocui pushed a commit to ilixiaocui/curve that referenced this issue Feb 6, 2023