[bug:1773532] Gluster brick randomly segfaults #861

Closed
gluster-ant opened this issue Mar 12, 2020 · 32 comments
Labels
Migrated Type:Bug wontfix Managed by stale[bot]

Comments

@gluster-ant
Collaborator

URL: https://bugzilla.redhat.com/1773532
Creator: ddrazyk at gmail
Time: 20191118T11:33:57

Created attachment 1637263
Compressed logs from journald, glusterd and vdsm.

Description of problem:
I am running a 3-node oVirt cluster with a GlusterFS storage domain. Gluster is configured with LVM cache in writeback mode on hardware RAID (one virtual device is an SSD and the second is an HDD, both on an LSI controller), backed by XFS. The cluster serves two volumes: wiosna-vmstore, which is the Data storage domain, and wiosna-iso, which is the ISO domain. Both have sharding turned on. Management is on a separate physical machine.
I randomly get glusterd segfaults that take a brick down (it's either iso or vmstore, never both). When two nodes hit a segfault, all VMs end up in the Paused state. All hosts run a

Version-Release number of selected component (if applicable):
vdsm-gluster-4.30.33-1.el7.x86_64
glusterfs-6.6-1.el7.x86_64

How reproducible:
Don't know. Occurs randomly.

Steps to Reproduce:
N/A

Actual results:
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: patchset: git://git.gluster.org/glusterfs.git
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: signal received: 11
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: time of crash:
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: 2019-11-18 00:53:27
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: configuration details:
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: argp 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: backtrace 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: dlfcn 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: libpthread 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: llistxattr 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: setfsid 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: spinlock 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: epoll.h 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: xattr.h 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: st_atim.tv_nsec 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: package-string: glusterfs 6.5
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: ---------

Expected results:
Normal operation.

Additional info:

@gluster-ant
Collaborator Author

Time: 20191118T12:16:07
ykaul at redhat commented:
Is this an updat

@gluster-ant
Collaborator Author

Time: 20191118T12:16:36
ykaul at redhat commented:
Can you please share complete logs?

@gluster-ant
Collaborator Author

Time: 20191118T12:19:17
ddrazyk at gmail commented:
Hi Yaniv,
which ones do you need? Journald?

@gluster-ant
Collaborator Author

Time: 20191118T12:20:53
ykaul at redhat commented:
(In reply to Dominik Drazyk from comment #3)

Hi Yaniv,
which ones do you need? Journald?

  1. If the version you are using is 4.1, please upgrade promptly.
  2. Gluster logs.

@gluster-ant
Collaborator Author

Time: 20191118T12:30:17
ddrazyk at gmail commented:
Created attachment 1637268
Gluster logs

Gluster logs from node01

@gluster-ant
Collaborator Author

Time: 20191118T12:30:54
ddrazyk at gmail commented:
Created attachment 1637269
Gluster logs node02

Gluster logs from node02

@gluster-ant
Collaborator Author

Time: 20191118T12:31:20
ddrazyk at gmail commented:
Created attachment 1637270
Gluster logs node03

Gluster logs for node03

@gluster-ant
Collaborator Author

Time: 20191118T12:35:14
ddrazyk at gmail commented:
I use oVirt 4.3, the newest version. I updated to it this morning; before the update I had packages from 12.10.2019. Glusterd is likewise at the newest version that comes from the oVirt 4.3 repository for CentOS 7.

@gluster-ant
Collaborator Author

Time: 20191118T13:34:41
ykaul at redhat commented:
I can see the crash:
[2019-11-18 00:53:04.287021] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-wiosna-vmstore-posix: write failed: offset 0, [Invalid argument]
[2019-11-18 00:53:04.287072] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-wiosna-vmstore-server: 26723901: WRITEV 5 (5bb969ab-4ed1-4afa-b136-78bf7ead800d), client: CTX_ID:2cd653a2-1bae-4df7-adbe-9b8151275b15-GRAPH_ID:0-PID:14082-HOST:node03.wiosna.org.pl-PC_NAME:wiosna-vmstore-client-0-RECON_NO:-0, error-xlator: wiosna-vmstore-posix [Invalid argument]
[2019-11-18 00:53:26.572901] E [socket.c:1303:socket_event_poll_err] (-->/lib64/libglusterfs.so.0(+0x8b806) [0x7f6fd1543806] -->/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa48a) [0x7f6fc58a148a] -->/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0x81fc) [0x7f6fc589f1fc] ) 0-socket: invalid argument: this->private [Invalid argument]
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-11-18 00:53:27
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.5
/lib64/libglusterfs.so.0(+0x27130)[0x7f6fd14df130]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f6fd14e9b34]
/lib64/libc.so.6(+0x363b0)[0x7f6fcfb1c3b0]
/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa4cc)[0x7f6fc58a14cc]
/lib64/libglusterfs.so.0(+0x8b806)[0x7f6fd1543806]
/lib64/libpthread.so.0(+0x7e65)[0x7f6fd031ee65]
/lib64/libc.so.6(clone+0x6d)[0x7f6fcfbe488d]

But I'm just as concerned about the write failures that flood the log:
[2019-11-17 02:20:47.554644] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-wiosna-vmstore-posix: write failed: offset 0, [Invalid argument]
[2019-11-17 02:20:47.554694] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-wiosna-vmstore-server: 12021489: WRITEV 2 (82c44424-d251-42cf-ad23-13f0f9ea5ed7), client: CTX_ID:c51b6193-1da1-4e46-9d30-637f419dcae1-GRAPH_ID:0-PID:15471-HOST:node01.wiosna.org.pl-PC_NAME:wiosna-vmstore-client-0-RECON_NO:-0, error-xlator: wiosna-vmstore-posix [Invalid argument]

Gobinda, can you take a look?
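
The raw frames in the crash log above have the form `module(symbol+offset)[address]`. As a minimal sketch (not part of Gluster; the function and field names are made up for illustration), they can be parsed so that the module-relative offset of the faulting frame is easy to read off. The sketch assumes the usual glibc backtrace convention: when the symbol name is empty, the offset is relative to the module's load address.

```python
import re

# Matches frames as printed in the crash log, e.g.
#   /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f6fd14e9b34]
#   /usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa4cc)[0x7f6fc58a14cc]
FRAME_RE = re.compile(
    r"^(?P<module>\S+?)\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one frame, or None if the line is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    offset = int(match.group("offset"), 16)
    address = int(match.group("address"), 16)
    symbol = match.group("symbol") or None
    # Without a symbol name the offset is taken relative to the module's load
    # address, so the load base can be recovered. With a symbol name the
    # offset is relative to that symbol and the base stays unknown.
    base = address - offset if symbol is None else None
    return {
        "module": match.group("module"),
        "symbol": symbol,
        "offset": offset,
        "address": address,
        "base": base,
    }


frame = parse_frame(
    "/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa4cc)[0x7f6fc58a14cc]"
)
print(frame["module"], hex(frame["offset"]), hex(frame["base"]))
```

For the faulting frame this gives offset 0xa4cc in socket.so with a load base of 0x7f6fc5897000. With the matching debuginfo installed, a module-relative offset like this could be handed to a tool such as addr2line to look up the source line; whether that resolves cleanly here is not confirmed in this thread.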

@gluster-ant
Collaborator Author

Time: 20191119T09:02:55
godas at redhat commented:
In the VDSM log I can see that the volumes were created with a block size of 4096.
2019-11-17 10:21:39,885+0200 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Return 'GlusterVolume.status' in bridge with {'volumeStatus': {'bricks': [{'hostuuid': '08181378-2670-46d6-8784-7cf5df121f34', 'blockSize': '4096', 'sizeFree': '72913.555', 'sizeTotal': '102346.004', 'mntOptions': 'rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=1024,noquota', 'device': '/dev/mapper/gluster_vg_nvme-gluster_lv_data_fast2', 'brick': 'gluster1:/gluster_bricks/data_fast2/data_fast2', 'fsName': 'xfs'}, {'hostuuid': 'ea6dc070-7bf7-4258-87c8-38183f49805d', 'blockSize': '4096', 'sizeFree': '72909.633', 'sizeTotal': '102346.004', 'mntOptions': 'rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=1024,noquota', 'device': '/dev/mapper/gluster_vg_nvme-gluster_lv_data_fast2', 'brick': 'gluster2:/gluster_bricks/data_fast2/data_fast2', 'fsName': 'xfs'}, {'hostuuid': 'ad1547fe-7469-4f10-b9cf-1dd30317ce2c', 'blockSize': '4096', 'sizeFree': '15292.875', 'sizeTotal': '15350.000', 'mntOptions': 'rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota', 'device': '/dev/mapper/gluster_vg_sda3-gluster_lv_data_fast2', 'brick': 'ovirt3:/gluster_bricks/data_fast2/data_fast2', 'fsName': 'xfs'}], 'volumeStatsInfo': {'sizeTotal': '107317563392', 'sizeUsed': '31939444736', 'sizeFree': '75378118656'}, 'name': 'data_fast2'}} (init:356)

@dominik Are you using a 4kN drive? If not, I wonder whether the 4kN code is causing trouble. The [Invalid argument] errors could be because of 4kN.

@gluster-ant
Collaborator Author

Time: 20191119T09:32:25
ddrazyk at gmail commented:
Hi Gobinda,

fdisk reports as follows:
Sector size (logical/physical): 512 bytes / 4096 bytes (that's SSD virtual drive used for caching)
Sector size (logical/physical): 512 bytes / 4096 bytes (that's HDD virtual drive used for data)
LSI tools return the same - logical sector size as 512 and physical as 4096.
Both are RAID1 on LSI controller. I used default options for creating LVM pools and XFS. Mount options are as below:
/dev/storage/hdd /opt/data xfs rw,inode64,noatime,nouuid,nodiratime 0 0

@gluster-ant
Collaborator Author

Time: 20191120T07:04:07
godas at redhat commented:
Hi Dominik,
Thanks for the info, but fdisk always reports 4096 even though it's 512.
Can you please check the output of "blockdev --getss /dev/" ?
The [Invalid argument] errors in the log suggest a mismatch between the requested size and the actual size.
But the crash may not be caused by this; it could be a bug in the socket code.
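
One way such a size mismatch can produce [Invalid argument] is direct I/O: on Linux, an O_DIRECT write whose offset or length is not a multiple of the device's logical block size can fail with EINVAL. The sketch below only illustrates that alignment rule. It assumes the bricks use direct I/O, which this thread does not confirm, and the function name is made up.

```python
def direct_io_misalignment(offset, length, logical_block_size):
    """Return which parts of an O_DIRECT request are misaligned.

    Only offset and length are checked; the alignment of the memory buffer,
    which O_DIRECT also constrains, is not modelled here.
    """
    if logical_block_size <= 0:
        raise ValueError("logical_block_size must be positive")
    problems = []
    if offset % logical_block_size:
        problems.append("offset")
    if length % logical_block_size:
        problems.append("length")
    return problems


# A 512-byte write at offset 0, checked against the two candidate
# logical block sizes discussed in this thread.
for block_size in (512, 4096):
    print(block_size, direct_io_misalignment(0, 512, block_size))
```

A 512-byte request is aligned for a 512-byte logical block size but not for 4096, which is why the value reported by `blockdev --getss` matters for this hypothesis.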

@gluster-ant
Collaborator Author

Time: 20191120T07:37:31
ddrazyk at gmail commented:
Both are 512:
[dominik@node01 ~]$ sudo blockdev --getss /dev/sdb
512
[dominik@node01 ~]$ sudo blockdev --getss /dev/sda
512

@gluster-ant
Collaborator Author

Time: 20191121T06:01:23
godas at redhat commented:
Hi Raghavendra,
Can you please help me find the reason for the crash? I suspect there is an issue in the socket code.

@gluster-ant
Collaborator Author

Time: 20191125T07:02:18
srakonde at redhat commented:
Hi,

Can you please provide the output of "bt" and "t a a bt" from the core file? That would help us investigate this issue faster. If possible, please share the core file as well.

Thanks,
Sanju
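
The two gdb commands requested above can also be collected non-interactively, which avoids pager prompts in the pasted output. A minimal sketch, assuming gdb is installed; `/usr/sbin/glusterfsd` is the brick binary that gdb names later in this thread, and the core file path is a placeholder.

```python
import shlex


def gdb_backtrace_command(executable, core_file):
    """Build an argv list that prints the requested backtraces and exits."""
    return [
        "gdb", "-batch",               # run the -ex commands, then exit
        "-ex", "set pagination off",   # explicit, so no pager prompts interrupt the output
        "-ex", "bt",
        "-ex", "thread apply all bt",  # long form of "t a a bt"
        executable, core_file,
    ]


# Placeholder core path; substitute the real location on the affected node.
argv = gdb_backtrace_command("/usr/sbin/glusterfsd", "/path/to/coredump")
print(shlex.join(argv))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(argv, capture_output=True, text=True)` to capture the output into a file for attaching to the bug.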

@gluster-ant
Collaborator Author

Time: 20191126T07:57:21
ddrazyk at gmail commented:
Hi Sanju,
where can I find the core file?

Kind regards,
Dominik

@gluster-ant
Collaborator Author

Time: 20191126T08:06:01
srakonde at redhat commented:
It should be in its default location / directory, unless you have customized kernel.core_pattern in /etc/sysctl.conf.

In my case, I have something like below. So, my core files will be stored under /root/cores/

[root@localhost glusterfs]# cat /etc/sysctl.conf

# Own core file pattern...

kernel.core_pattern=/root/cores/core.%e.%p.%h.%t
[root@localhost glusterfs]#

HTH,
Sanju
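
To make the `kernel.core_pattern` setting above concrete, here is a small sketch of how such a pattern can be interpreted. It relies on two documented kernel behaviours: a pattern starting with `|` is piped to a handler program instead of being written as a file, and `%e`, `%p`, `%h` and `%t` expand to the executable name, PID, hostname and dump time (seconds since the epoch). The function names and sample values are made up for illustration, and only those four specifiers are handled.

```python
def describe_core_pattern(pattern):
    """Classify a core_pattern value as ('pipe', command) or ('file', template)."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return ("pipe", pattern[1:].strip())
    return ("file", pattern)


def expand_core_pattern(pattern, exe, pid, host, timestamp):
    """Expand %e, %p, %h, %t and %% in a file-style core_pattern."""
    values = {"e": exe, "p": str(pid), "h": host, "t": str(timestamp), "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern) and pattern[i + 1] in values:
            out.append(values[pattern[i + 1]])
            i += 2
        else:
            # Unknown specifiers and ordinary characters are copied unchanged.
            out.append(pattern[i])
            i += 1
    return "".join(out)


# Hypothetical values: process name, PID, host and timestamp are examples only.
print(describe_core_pattern("/root/cores/core.%e.%p.%h.%t"))
print(expand_core_pattern("/root/cores/core.%e.%p.%h.%t",
                          "glusterfsd", 15340, "node01", 1574038407))
```

On a live system the current value can be read from `/proc/sys/kernel/core_pattern`. If it turns out to be a pipe, the core file is stored wherever the handler program decides rather than at a path given by the pattern.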

@gluster-ant
Collaborator Author

Time: 20191126T08:25:22
ddrazyk at gmail commented:
Ok I found it.

(gdb) bt
#0 0x00007f7d0c38564c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1 0x00007f7d18028ae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

(gdb) t a a bt

Thread 40 (Thread 0x7f7d184bf4c0 (LWP 138985)):
#0 0x00007f7d16e03fd7 in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f7d18027cd8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2 0x000055f596714723 in main ()

Thread 39 (Thread 0x7f7d000ac700 (LWP 140549)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f7cf46b3700 (LWP 140758)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 37 (Thread 0x7f7cf4431700 (LWP 140760)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 36 (Thread 0x7f7d0d59b700 (LWP 138991)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d18002ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x00007f7d18003da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 35 (Thread 0x7f7cf4472700 (LWP 140759)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 34 (Thread 0x7f7d0ed9e700 (LWP 138988)):
#0 0x00007f7d16e0a381 in sigwait () from /lib64/libpthread.so.0
#1 0x000055f5967181ab in glusterfs_sigwaiter ()
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 33 (Thread 0x7f7d0dd9c700 (LWP 138990)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d18002ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x00007f7d18003da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 32 (Thread 0x7f7d0cd9a700 (LWP 138992)):
#0 0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1 0x00007f7d18043044 in runner () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 31 (Thread 0x7f7d000ed700 (LWP 140548)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7f7d0016f700 (LWP 139188)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7f7d008f8700 (LWP 139011)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d02f69d24 in index_worker () from /usr/lib64/glusterfs/6.6/xlator/features/index.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f7d0e59d700 (LWP 138989)):
#0 0x00007f7d1668f80d in nanosleep () from /lib64/libc.so.6
#1 0x00007f7d1668f6a4 in sleep () from /lib64/libc.so.6
#2 0x00007f7d17fef678 in pool_sweeper () from /lib64/libglusterfs.so.0
#3 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f7d0012e700 (LWP 139189)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f7cedffb700 (LWP 139134)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f7d0f59f700 (LWP 138987)):
#0 0x00007f7d16e09e5d in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f7d17fd2396 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7f7cef7fe700 (LWP 139114)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f7d001f1700 (LWP 139118)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f7cf46f4700 (LWP 140757)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f7cf47b7700 (LWP 140552)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f7cf4735700 (LWP 140756)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f7cf4776700 (LWP 140755)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f7cf47f8700 (LWP 140551)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f7d0006b700 (LWP 140550)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f7d001b0700 (LWP 139149)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f7cee7fc700 (LWP 139126)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f7ceeffd700 (LWP 139117)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f7cf7fff700 (LWP 139013)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d08c17e23 in changelog_ev_connector () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f7ceffff700 (LWP 139020)):
#0 0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d096812db in posix_fsyncer_pick () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2 0x00007f7d09681565 in posix_fsyncer () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f7cf4ff9700 (LWP 139019)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d0967b663 in posix_ctx_janitor_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f7cf5ffb700 (LWP 139017)):
#0 0x00007f7d1668f80d in nanosleep () from /lib64/libc.so.6
#1 0x00007f7d1668f6a4 in sleep () from /lib64/libc.so.6
#2 0x00007f7d096810b0 in posix_disk_space_check_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f7d080f7700 (LWP 139012)):
#0 0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f7cf67fc700 (LWP 139016)):
#0 0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1 0x00007f7d08c1808a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f7cf57fa700 (LWP 139018)):
#0 0x00007f7d1668f80d in nanosleep () from /lib64/libc.so.6
#1 0x00007f7d1668f6a4 in sleep () from /lib64/libc.so.6
#2 0x00007f7d096808da in posix_health_check_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f7cf6ffd700 (LWP 139015)):
#0 0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1 0x00007f7d08c1808a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f7cf77fe700 (LWP 139014)):
#0 0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1 0x00007f7d08c1808a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f7d0a0c8700 (LWP 138998)):
#0 0x00007f7d166c8e63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f7d180288c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f7d01e01700 (LWP 139010)):
#0 0x00007f7d166c8e63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f7d180288c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f7d02602700 (LWP 139009)):
#0 0x00007f7d166c8e63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f7d180288c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f7d0a8c9700 (LWP 138997)):
#0 0x00007f7d0c38564c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1 0x00007f7d18028ae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f7d166c888d in clone () from /lib64/libc.so.6

I can share the core file as a link (it's more than 700MB). Is that fine with bugzilla's policy?
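
A dump of this length is easier to scan when reduced to the innermost function of each thread. The following is a minimal sketch that parses `t a a bt` output in the format shown above; the list of "idle" function prefixes is an assumption chosen for this dump, not something defined by gdb or Gluster.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME0_RE = re.compile(r"^#0\s+0x[0-9a-f]+ in (\S+) \(")

# Assumed set of calls in which a thread is merely waiting or sleeping.
IDLE_PREFIXES = ("pthread_cond_", "pthread_join", "epoll_wait",
                 "select", "nanosleep", "sigwait")


def innermost_frames(dump):
    """Map gdb thread number -> function name found in that thread's frame #0."""
    result = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME0_RE.match(line)
        # Frame #0 lines seen before any thread header (plain "bt") are skipped.
        if frame and current is not None:
            result[current] = frame.group(1)
            current = None
    return result


def busy_threads(dump):
    """Return only the threads whose innermost frame is not an idle wait."""
    return {tid: fn for tid, fn in innermost_frames(dump).items()
            if not fn.startswith(IDLE_PREFIXES)}
```

Feeding the dump above to `busy_threads` leaves only thread 1, whose innermost frame is `socket_event_handler` in socket.so; every other thread is parked in a wait, sleep, select or epoll call.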

@gluster-ant
Collaborator Author

Time: 20191128T09:39:55
ddrazyk at gmail commented:
Hi Sanju,
any progress with this bug?

Kind regards,
Dominik

@gluster-ant
Collaborator Author

Time: 20191202T06:34:35
srakonde at redhat commented:
Hi,

I doubt that you have provided the right core dump, because all the backtraces look normal. I'm unable to make anything out of this.

Thanks,
Sanju

@gluster-ant
Collaborator Author

Time: 20191204T09:55:06
ddrazyk at gmail commented:
Hi Sanju,
I can install debuginfo packages if that might help with debugging. Do I need to restart glusterd on every node after that?

Kind regards,
Dominik

@gluster-ant
Collaborator Author

Time: 20191205T08:57:19
godas at redhat commented:
Forwarding needinfo on Sanju.

@gluster-ant
Collaborator Author

Time: 20191205T10:07:16
srakonde at redhat commented:
(In reply to Dominik Drazyk from comment #21)

Hi Sanju,
I can install debuginfo packages if that might help with debugging. Do I
need to restart glusterd on every node after that?

Kind regards,
Dominik

Looking at the backtrace you have provided, I can say that you have already installed debug-info packages. I suspect you have provided the backtrace from the wrong core file, as none of the backtraces shows a crashing process. Can you please cross-check?

Thanks,
Sanju

@gluster-ant
Collaborator Author

Time: 20191206T07:43:49
ddrazyk at gmail commented:
Hi Sanju,
below is the newest backtrace.

journalctl -u glusterd:

gru 06 02:54:38 node02 opt-data-vmstore[22826]: pending frames:
gru 06 02:54:38 node02 opt-data-vmstore[22826]: patchset: git://git.gluster.org/glusterfs.g
gru 06 02:54:38 node02 opt-data-vmstore[22826]: signal received: 11
gru 06 02:54:38 node02 opt-data-vmstore[22826]: time of crash:
gru 06 02:54:38 node02 opt-data-vmstore[22826]: 2019-12-06 01:54:38
gru 06 02:54:38 node02 opt-data-vmstore[22826]: configuration details:
gru 06 02:54:38 node02 opt-data-vmstore[22826]: argp 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: backtrace 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: dlfcn 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: libpthread 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: llistxattr 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: setfsid 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: spinlock 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: epoll.h 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: xattr.h 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: st_atim.tv_nsec 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: package-string: glusterfs 6.6
gru 06 02:54:38 node02 opt-data-vmstore[22826]: ---------

ls -l /var/tmp/abrt/
total 12
drwxr-x---. 2 root abrt 4096 12-06 02:56 ccpp-2019-12-06-02:54:38-22833
-rw-------. 1 root root 20 12-06 02:54 last-ccpp
-rw-------. 1 root root 23 08-29 00:12 last-via-server

ls -l coredump
-rw-r-----. 1 root abrt 514240512 12-06 02:54 coredump

So this should be the correct coredump. The brick on node02 was down since 02:54.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.20.99.202 --volfile-id wiosna-vmstore.10.20.99.202.o'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-6.6-1.el7.x86_64
(gdb)
(gdb) bt
#0 0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1 0x00007f82f45dcae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

(gdb) t a a bt

Thread 41 (Thread 0x7f82e934e700 (LWP 22832)):
#0 0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1 0x00007f82f45f7044 in runner () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 40 (Thread 0x7f82d026d700 (LWP 23162)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 39 (Thread 0x7f82d02ef700 (LWP 23160)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f82c6ffd700 (LWP 22876)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 37 (Thread 0x7f82dc0a1700 (LWP 22873)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 36 (Thread 0x7f82c67fc700 (LWP 22880)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 35 (Thread 0x7f82d0571700 (LWP 23084)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 34 (Thread 0x7f82d06f7700 (LWP 22883)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 33 (Thread 0x7f82d06b6700 (LWP 22884)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 32 (Thread 0x7f82c77fe700 (LWP 22851)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 31 (Thread 0x7f82d05b2700 (LWP 23083)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7f82dc060700 (LWP 22882)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7f82d02ae700 (LWP 23161)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f82d17fa700 (LWP 22845)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82e5c2f663 in posix_ctx_janitor_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f82d05f3700 (LWP 23082)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f82d022c700 (LWP 23163)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f82d0634700 (LWP 23081)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7f82c7fff700 (LWP 22850)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f82d0330700 (LWP 23159)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f82d0675700 (LWP 22885)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f82d01eb700 (LWP 23164)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f82ea350700 (LWP 22830)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f45b6ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x00007f82f45b7da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f82eab51700 (LWP 22829)):
#0 0x00007f82f2c4380d in nanosleep () from /lib64/libc.so.6
#1 0x00007f82f2c436a4 in sleep () from /lib64/libc.so.6
#2 0x00007f82f45a3678 in pool_sweeper () from /lib64/libglusterfs.so.0
#3 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f82eb352700 (LWP 22828)):
#0 0x00007f82f33be381 in sigwait () from /lib64/libpthread.so.0
#1 0x0000558a11f461ab in glusterfs_sigwaiter ()
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f82dcf39700 (LWP 22837)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82df5aad24 in index_worker () from /usr/lib64/glusterfs/6.6/xlator/features/index.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f82f4a734c0 (LWP 22826)):
#0 0x00007f82f33b7fd7 in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f82f45dbcd8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2 0x0000558a11f42723 in main ()

Thread 15 (Thread 0x7f82d0ff9700 (LWP 22846)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82e5c352db in posix_fsyncer_pick () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2 0x00007f82e5c35565 in posix_fsyncer () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f82e9b4f700 (LWP 22831)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f45b6ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x00007f82f45b7da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f82dca34700 (LWP 22839)):
#0 0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82e51cbe23 in changelog_ev_connector () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f82ebb53700 (LWP 22827)):
#0 0x00007f82f33bde5d in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f82f4586396 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f82f48a74c0 (LWP 730)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82f37d186c in handle_fildes_io () from /lib64/librt.so.1
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f82d27fc700 (LWP 22843)):
#0 0x00007f82f2c4380d in nanosleep () from /lib64/libc.so.6
#1 0x00007f82f2c436a4 in sleep () from /lib64/libc.so.6
#2 0x00007f82e5c350b0 in posix_disk_space_check_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f82d1ffb700 (LWP 22844)):
#0 0x00007f82f2c4380d in nanosleep () from /lib64/libc.so.6
#1 0x00007f82f2c436a4 in sleep () from /lib64/libc.so.6
#2 0x00007f82e5c34422 in posix_fs_health_check () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3 0x00007f82e5c348f2 in posix_health_check_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#4 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f82d2ffd700 (LWP 22842)):
#0 0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1 0x00007f82e51cc08a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f82d37fe700 (LWP 22841)):
#0 0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1 0x00007f82e51cc08a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f82d3fff700 (LWP 22840)):
#0 0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1 0x00007f82e51cc08a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f82e667c700 (LWP 22834)):
#0 0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f82e406a700 (LWP 22838)):
#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f82de442700 (LWP 22836)):
#0 0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f82dec43700 (LWP 22835)):
#0 0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f82e6e7d700 (LWP 22833)):
#0 0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1 0x00007f82f45dcae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2 0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

I guess that's all I can extract from the coredump. I can attach a link to the coredump file if that's helpful.

Kind regards,
Dominik
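
For readers skimming the dump above, here is a minimal sketch of how the per-thread backtrace could be triaged automatically. It is not part of the original report: the function names `top_frames` and `busy_threads` and the `WAIT_CALLS` set are hypothetical, and the set is simply the wait/sleep primitives that appear as frame #0 in this particular dump. The idea is to list only the threads whose innermost frame is not one of those primitives.

```python
import re

# "Thread 40 (Thread 0x7f82d026d700 (LWP 23162)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
# "#0 0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([^\s(]+)\s*\(")

# Assumption: calls that mean "this thread is parked", taken from this dump only.
WAIT_CALLS = {
    "pthread_cond_timedwait", "pthread_cond_wait", "pthread_join",
    "nanosleep", "sleep", "select", "epoll_wait", "sigwait",
}


def top_frames(bt_text):
    """Return {thread number: function name of frame #0}."""
    frames = {}
    current = None
    for raw in bt_text.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1) == "0":
            # Drop the symbol version suffix, e.g. "@@GLIBC_2.3.2".
            frames[current] = frame.group(2).split("@@")[0]
    return frames


def busy_threads(bt_text):
    """Return only the threads whose frame #0 is not a known wait call."""
    return {tid: fn for tid, fn in top_frames(bt_text).items()
            if fn not in WAIT_CALLS}


# Two threads copied from the dump above, used as sample input.
SAMPLE = """\
Thread 2 (Thread 0x7f82dec43700 (LWP 22835)):
#0 0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0

Thread 1 (Thread 0x7f82e6e7d700 (LWP 22833)):
#0 0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1 0x00007f82f45dcae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
"""

if __name__ == "__main__":
    print(busy_threads(SAMPLE))  # {1: 'socket_event_handler'}
```

On the two sample threads, only Thread 1 (in `socket_event_handler`) is reported. Lines that match neither pattern, such as wrapped `from ...` continuations or pager prompts, are ignored.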

@gluster-ant
Collaborator Author

Time: 20200107T14:21:40
srakonde at redhat commented:
Changing the component to core, as the crash is in the glusterfsd process.

@gluster-ant
Collaborator Author

Time: 20200114T10:28:18
ddrazyk at gmail commented:
Hi Sanju,
I deployed another setup based on oVirt 4.3.7 and GlusterFS with the same hardware (LSI 3108 controller, Intel Silver 41xx CPUs) and had similar issues. However, no segfaults have occurred so far (I've been testing it for 3 days).
In the new setup, I removed features.shard from the Gluster volume and connected via nfs-ganesha. There are no errors in the logs. Using the native oVirt Gluster connector, however, produces the errors below in the brick log (similar to the original issue):
[2020-01-14 09:59:19.615459] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-ssd-posix: write failed: offset 0, [Invalid argument]
[2020-01-14 09:59:19.615497] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ssd-server: 36: WRITEV 0 (0d4c3c9f-08bf-4b1c-926c-a1abb0e9898a), client: CTX_ID:4bda4abd-49d1-42e8-bcfe-e3353a418932-GRAPH_ID:0-PID:77040-HOST:node02-PC_NAME:ssd-client-0-RECON_NO:-0, error-xlator: ssd-posix [Invalid argument]

Does that help a bit with troubleshooting?

Kind regards,
Dominik

@gluster-ant
Collaborator Author

Time: 20200211T14:13:18
jahernan at redhat commented:
This seems to be the same as bug #1782495, which should be fixed in versions 6.7 and 7.1. Your initial report was on Gluster 6.6. Can you check whether it has been upgraded to 6.7? That would explain why it doesn't crash anymore.

@gluster-ant
Collaborator Author

Time: 20200219T14:07:04
moagrawa at redhat commented:
Hi Dominik,

Please share if you have any updates.

Thanks,
Mohit Agrawal

@gluster-ant
Collaborator Author

Time: 20200225T10:59:27
ddrazyk at gmail commented:
Hi Mohit,
I can try to upgrade Gluster to 7.1, but I need confirmation that it's compatible with oVirt 4.3.

Kind regards,
Dominik

@xhernandez
Contributor

Dominik, I think you don't need to upgrade, at least for now. If I understood correctly, you said that after installing another cluster with oVirt 4.3.7 the problem disappeared. I would like you to check which version of Gluster is being used in this new cluster. If it's 6.7, that would explain why the problem disappeared, because the issue seems to be the same as this one, which was fixed in 6.7.
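
As a rough illustration of the check being asked for, the sketch below takes a package or version string (for example the output of `rpm -q glusterfs`) and reports whether it is at or past the fix versions named in this thread (6.7 on the 6.x branch, 7.1 on the 7.x branch). The helper names `parse_version` and `has_fix` are hypothetical, and treating every branch newer than 7.x as fixed is an assumption, not something confirmed here.

```python
import re

# Fix version per release branch, as stated in this thread: 6.7 and 7.1.
FIXED_IN = {6: 7, 7: 1}


def parse_version(text):
    """Extract the first "major.minor" pair from a version or package string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def has_fix(text):
    """Return True if the version is at or past the fix on its branch."""
    major, minor = parse_version(text)
    if major in FIXED_IN:
        return minor >= FIXED_IN[major]
    # Assumption: branches newer than 7.x carry the fix, older ones do not.
    return major > max(FIXED_IN)


if __name__ == "__main__":
    # Package name from the original report, plus the first fixed 6.x release.
    for pkg in ("glusterfs-6.6-1.el7.x86_64", "glusterfs-6.7-1.el7.x86_64"):
        print(pkg, has_fix(pkg))
```

With the package from the original report, `glusterfs-6.6-1.el7.x86_64`, the check returns `False`; a 6.7 package returns `True`.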

@stale

stale bot commented Oct 9, 2020

Thank you for your contributions.
We noticed that this issue has not had any activity in the last ~6 months! We are marking this issue as stale because it has not had recent activity.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Oct 9, 2020
@stale

stale bot commented Oct 24, 2020

Closing this issue as there has been no update since my last comment. If the issue is still valid, feel free to reopen it.

@stale stale bot closed this as completed Oct 24, 2020