[bug:1807738] [GlusterFS-3.12.15] The brick went down after the FOP. #883

Closed
gluster-ant opened this issue Mar 12, 2020 · 2 comments
Labels
Migrated Type:Bug wontfix Managed by stale[bot]

Comments

@gluster-ant
Collaborator

URL: https://bugzilla.redhat.com/1807738
Creator: knjeong at growthsoft.co.kr
Time: 20200227T05:45:15

Description of problem:

A brick suddenly went down during normal operation. We collected the error logs shown below, but unfortunately no core dump was generated.

< GlusterFS(3.12.15) Volume Info >
Volume Name: nas
Type: Distributed-Replicate
Volume ID: 5a8cc386-85dc-4255-b287-352a499a28d5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: NAS01:/gluster1/nas
Brick2: NAS02:/gluster1/nas
Brick3: NAS03:/gluster1/nas
Brick4: NAS01:/gluster2/nas ===> down brick(10.10.10.121:49154)
Brick5: NAS02:/gluster2/nas
Brick6: NAS03:/gluster2/nas
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.self-heal-window-size: 16
cluster.background-self-heal-count: 64
cluster.shd-wait-qlength: 32768
performance.cache-size: 4GB
performance.write-behind-window-size: 50MB
disperse.background-heals: 16
disperse.heal-wait-qlength: 2048
disperse.self-heal-window-size: 16
disperse.shd-wait-qlength: 16384
features.cache-invalidation: off
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: off
performance.md-cache-timeout: 1
network.inode-lru-limit: 200000
performance.nl-cache-limit: 10MB
performance.nl-cache-positive-entry: FALSE
performance.nl-cache: off
cluster.use-compound-fops: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 32
server.event-threads: 16
performance.low-prio-threads: 64
performance.normal-prio-threads: 64
performance.high-prio-threads: 64
performance.least-prio-threads: 64
performance.io-thread-count: 64
cluster.eager-lock: on
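
To make it easier to match the brick list against the client-side log names, here is a minimal sketch that pairs each brick with a client translator name and a replica set. It assumes that client translators are named `<volname>-client-<N>` with `N` counted from zero in listing order (this agrees with the client log in this report, where `nas-client-3` talks to 10.10.10.121:49154, the down Brick4), and that consecutive bricks form a replica set, as "2 x 3 = 6" suggests. The function name `map_bricks` is hypothetical.

```python
import re

def map_bricks(volume_info, volname, replica):
    """Pair each 'BrickN: host:/path' line with its assumed client xlator name."""
    bricks = re.findall(r"^Brick(\d+):\s*(\S+)", volume_info, flags=re.MULTILINE)
    return [
        {
            "brick": f"Brick{num}",
            "path": path,
            # Assumption: zero-based index, in the order the bricks are listed.
            "client": f"{volname}-client-{idx}",
            # Assumption: every `replica` consecutive bricks form one replica set.
            "replica_set": idx // replica,
        }
        for idx, (num, path) in enumerate(bricks)
    ]

VOLUME_INFO = """\
Number of Bricks: 2 x 3 = 6
Bricks:
Brick1: NAS01:/gluster1/nas
Brick2: NAS02:/gluster1/nas
Brick3: NAS03:/gluster1/nas
Brick4: NAS01:/gluster2/nas ===> down brick(10.10.10.121:49154)
Brick5: NAS02:/gluster2/nas
Brick6: NAS03:/gluster2/nas
"""

for row in map_bricks(VOLUME_INFO, "nas", replica=3):
    print(row)
```

Under these assumptions the down brick shares a replica set with Brick5 and Brick6, so those two bricks would be the ones still serving the affected files.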

< Down Brick Log - 10.10.10.121:49154 >
[2020-02-25 18:39:47.375248] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor]
[2020-02-25 18:39:47.375658] E [MSGID: 115073] [server-rpc-fops.c:1833:server_fxattrop_cbk] 0-nas-server: 17031439: FXATTROP 2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor]
[2020-02-25 18:39:47.375701] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x103e10f, Program: GlusterFS 3.3, ProgVers: 330, Proc: 34) to rpc-transport (tcp.nas-server)
[2020-02-25 18:39:47.375750] E [server.c:195:server_submit_reply] (-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bfa6) [0x7f2b6f552fa6] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bba3) [0x7f2b6f552ba3] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x94a6) [0x7f2b6f5404a6] ) 0-: Reply submission failed
[2020-02-25 18:39:47.376339] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor]
[2020-02-25 18:39:47.376365] E [MSGID: 115090] [server-rpc-fops.c:2171:server_compound_cbk] 0-nas-server: 17031440: COMPOUND2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor]
pending frames:
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(33)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(52)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-02-25 18:39:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.15
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f2b842db4e0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2b842e5414]
/lib64/libc.so.6(+0x36280)[0x7f2b8293b280]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x242eb)[0x7f2b6f55b2eb]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x2674d)[0x7f2b6f55d74d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc8e9)[0x7f2b6f5438e9]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc98d)[0x7f2b6f54398d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd3a5)[0x7f2b6f5443a5]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ce)[0x7f2b6f5439ce]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd23e)[0x7f2b6f54423e]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd356)[0x7f2b6f544356]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ae)[0x7f2b6f5439ae]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd464)[0x7f2b6f544464]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x27195)[0x7f2b6f55e195]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f2b8409d246]
/lib64/libpthread.so.0(+0x7dd5)[0x7f2b8313add5]
/lib64/libc.so.6(clone+0x6d)[0x7f2b82a02ead]
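
Since there is no core dump, the crash dump above is the main evidence. The following is a rough sketch for summarizing it: it tallies the pending frames by (type, op) and collects the offsets of the unsymbolized `server.so` frames. The helper names `tally_frames` and `backtrace_offsets` are hypothetical, and the regular expressions only assume the line layout seen in this log.

```python
import re
from collections import Counter

# Matches lines such as: frame : type(0) op(52)
FRAME_RE = re.compile(r"frame : type\((\d+)\) op\((\d+)\)")
# Matches lines such as: /path/server.so(+0x242eb)[0x7f2b6f55b2eb]
BT_RE = re.compile(
    r"^(\S+?)\((\S*?)\+0x([0-9a-f]+)\)\[0x[0-9a-f]+\]$", re.MULTILINE
)

def tally_frames(log):
    """Count pending frames per (type, op) pair."""
    return Counter((int(t), int(op)) for t, op in FRAME_RE.findall(log))

def backtrace_offsets(log, module_suffix):
    """Return offsets of frames in the given module that carry no symbol name."""
    return [
        "0x" + offset
        for path, symbol, offset in BT_RE.findall(log)
        if path.endswith(module_suffix) and not symbol
    ]

SAMPLE = """\
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
signal received: 11
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2b842e5414]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x242eb)[0x7f2b6f55b2eb]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x2674d)[0x7f2b6f55d74d]
"""

print(tally_frames(SAMPLE))
print(backtrace_offsets(SAMPLE, "server.so"))
```

If debug symbols matching the installed 3.12.15 package are available, the collected offsets could then be resolved with a symbolizer, for example `addr2line -f -e` pointed at `server.so`. Whether such symbols exist for this build is not known from the report.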

< CLIENT01 gluster LOG >
[2020-02-25 14:18:28.617641] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0xff906d sent = 2020-02-25 13:48:27.301196. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:18:28.627224] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.961143] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x10001f7 sent = 2020-02-25 14:18:28.627356. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:48:28.961402] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.970718] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235752053: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 15:19:59.238515] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1007af2 sent = 2020-02-25 14:49:55.891752. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:19:59.238736] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.570378] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x100ef9f sent = 2020-02-25 15:19:59.238815. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:49:59.570438] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.615705] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235897960: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 16:29:29.884503] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1017eda sent = 2020-02-25 15:59:22.576432. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:29:29.884547] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.151714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x101f311 sent = 2020-02-25 16:29:29.884632. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:59:30.151759] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.163008] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236055074: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 17:39:20.481714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1028daa sent = 2020-02-25 17:09:17.597659. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 17:39:20.481784] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.783690] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x102fefb sent = 2020-02-25 17:39:20.481881. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 18:09:20.783739] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.864368] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236216079: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 18:39:47.372960] C [rpc-clnt.c:449:rpc_clnt_fill_request_info] 5-nas-client-3: cannot lookup the saved frame corresponding to xid (16748653)
[2020-02-25 18:39:47.373034] W [socket.c:1956:__socket_read_reply] 5-nas-client-3: notify for event MAP_XID failed for 10.10.10.121:49154
[2020-02-25 18:39:47.373060] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 5-nas-client-3: disconnected from nas-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2020-02-25 18:39:47.373332] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(XATTROP(33)) called at 2020-02-25 18:19:23.681712 (xid=0x10396ec)
[2020-02-25 18:39:47.373366] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:39:47.373670] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2020-02-25 18:39:47.372139 (xid=0x103e10f)
[2020-02-25 18:39:47.373704] W [MSGID: 114031] [client-rpc-fops.c:1782:client3_3_fxattrop_cbk] 5-nas-client-3: remote operation failed
[2020-02-25 18:39:47.373723] E [MSGID: 114031] [client-rpc-fops.c:1557:client3_3_finodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373927] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373942] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(COMPOUND(49)) called at 2020-02-25 18:39:47.372211 (xid=0x103e110)
[2020-02-25 18:39:47.373981] W [MSGID: 114031] [client-rpc-fops.c:3138:client3_3_compound_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.374029] W [MSGID: 114061] [client-common.c:438:client_pre_flush] 5-nas-client-3: (a79e63ee-5610-41bb-bd59-72bf7164354d) remote_fd is -1. EBADFD [File descriptor in bad state]
[2020-02-25 18:39:47.374330] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236374556: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 18:39:47.380497] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419406: TRUNCATE() /Image/655/148418_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.464163] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419417: TRUNCATE() /Image/655/152411_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.466615] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419422: TRUNCATE() /Image/655/152411.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.812181] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419427: TRUNCATE() /Image/655/151868_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:52.409176] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419532: TRUNCATE() /Image/655/147289_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:58.037141] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 5-nas-client-3: changing port to 49154 (from 0)
[2020-02-25 18:39:58.040347] E [socket.c:2376:socket_connect_finish] 5-nas-client-3: connection to 10.10.10.121:49154 failed (Connection refused); disconnecting socket
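
The `call_bail` entries in the client log show XATTROP requests to 10.10.10.121:49154 being abandoned after the 1800-second timeout, starting with a request sent at 13:48:27, roughly five hours before the crash at 18:39:47. A small sketch for extracting these entries and checking how long each request waited is shown here. The function name `parse_bails` is hypothetical, and the pattern only assumes the line layout seen in this log.

```python
import re
from datetime import datetime

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
BAIL_RE = re.compile(
    rf"^\[(?P<logged>{TS})\] E \[rpc-clnt\.c:\d+:call_bail\].*?"
    rf"op\((?P<op>\w+)\((?P<num>\d+)\)\) xid = (?P<xid>0x[0-9a-f]+) "
    rf"sent = (?P<sent>{TS})\. timeout = (?P<timeout>\d+) for (?P<peer>\S+)",
    re.MULTILINE,
)

def parse_bails(log):
    """Return one record per call_bail line, with the seconds the request waited."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    records = []
    for m in BAIL_RE.finditer(log):
        logged = datetime.strptime(m["logged"], fmt)
        sent = datetime.strptime(m["sent"], fmt)
        records.append({
            "op": m["op"],
            "xid": m["xid"],
            "peer": m["peer"],
            "timeout": int(m["timeout"]),
            "waited": (logged - sent).total_seconds(),
        })
    return records

SAMPLE_LINE = (
    "[2020-02-25 14:18:28.617641] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: "
    "bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0xff906d "
    "sent = 2020-02-25 13:48:27.301196. timeout = 1800 for 10.10.10.121:49154"
)

for rec in parse_bails(SAMPLE_LINE):
    print(rec)
```

Each record's `waited` value can be compared with `timeout` to confirm that the request expired on the timer and was not failed for another reason. This only describes the client's view and does not by itself explain why the brick stopped answering.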

In addition, we recently reported increased CPU load on this same Gluster server
(https://bugzilla.redhat.com/show_bug.cgi?id=1806244).

Do you know why the brick went down?

Please tell me if you need more information.

Version-Release number of selected component (if applicable): 3.12.15

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

@stale

stale bot commented Oct 8, 2020

Thank you for your contributions.
This issue has had no activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Oct 8, 2020
@stale

stale bot commented Oct 23, 2020

Closing this issue because there has been no update since my last comment. If the issue is still valid, feel free to reopen it.

@stale stale bot closed this as completed Oct 23, 2020