
[bug:1787463] Glusterd process is periodically crashing with a segmentation fault #1106

Closed
gluster-ant opened this issue Mar 17, 2020 · 7 comments
Labels: Migrated, Type:Bug, wontfix (Managed by stale[bot])

Comments

@gluster-ant
Collaborator

URL: https://bugzilla.redhat.com/1787463
Creator: awingerter at opentext
Time: 20200102T23:25:51

Description of problem: The glusterd process is periodically crashing with a segmentation fault. This happens occasionally on some of our nodes, and I've been unable to determine the cause.

Dec 18 18:13:53 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 18 19:02:49 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 19 18:24:15 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 21 05:45:39 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
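
An equivalent check against the systemd journal confirms the crash frequency per node (illustrative command only; adjust the --since date as needed):

# journalctl -u glusterd.service --since "2019-12-18" | grep "status=11/SEGV"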

Version-Release number of selected component (if applicable):

[root@ch1c7ocvgl01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

[root@ch1c7ocvgl01 /]# rpm -qa | grep gluster
glusterfs-libs-6.1-1.el7.x86_64
glusterfs-server-6.1-1.el7.x86_64
tendrl-gluster-integration-1.6.3-10.el7.noarch
centos-release-gluster6-1.0-1.el7.centos.noarch
python2-gluster-6.1-1.el7.x86_64
centos-release-gluster5-1.0-1.el7.centos.noarch
glusterfs-api-6.1-1.el7.x86_64
nfs-ganesha-gluster-2.8.2-1.el7.x86_64
glusterfs-client-xlators-6.1-1.el7.x86_64
glusterfs-cli-6.1-1.el7.x86_64
glusterfs-6.1-1.el7.x86_64
glusterfs-fuse-6.1-1.el7.x86_64
glusterfs-events-6.1-1.el7.x86_64

How reproducible:

Unable to reproduce at this time. Issue occurs periodically with an indeterminate cause.

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:

glusterd should not crash with a segmentation fault.

Additional info:

Several core dumps are available at the link below; they are too large to attach.

https://nextcloud.anthonywingerter.net/index.php/s/3n5sSE3SNxfyeyj

Please let me know what further info I can provide.

[root@ch1c7ocvgl01 ~]# gluster volume info

Volume Name: autosfx-prd
Type: Distributed-Replicate
Volume ID: 25e6b3a9-f339-4439-b41e-6084c7527320
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/autosfx/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/autosfx/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/autosfx/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/autosfx/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/autosfx/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/autosfx/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/autosfx/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/autosfx/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/autosfx/brick09 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
cluster.enable-shared-storage: enable

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 50e7c3e8-adb9-427f-ae56-c327829a7d34
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl02.covisint.net:/var/lib/glusterd/ss_brick
Brick2: ch1c7ocvgl03.covisint.net:/var/lib/glusterd/ss_brick
Brick3: ch1c7ocvgl01.covisint.net:/var/lib/glusterd/ss_brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.enable-shared-storage: enable

Volume Name: hc-pstore-prd
Type: Distributed-Replicate
Volume ID: 1947247c-b3e0-4bd9-b808-011273e45195
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/hc-pstore-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/hc-pstore-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/hc-pstore-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/hc-pstore-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/hc-pstore-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/hc-pstore-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick09 (arbiter)
Options Reconfigured:
auth.allow: exlap1354.covisint.net,exlap1355.covisint.net
performance.write-behind-window-size: 8MB
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 64
performance.io-thread-count: 16
cluster.eager-lock: on
network.ping-timeout: 15
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.quorum-type: auto
cluster.heal-timeout: 500
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.rebalance-stats: on
cluster.background-self-heal-count: 256
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
features.ctime: on
performance.cache-size: 2GB
storage.fips-mode-rchecksum: on
performance.read-ahead: on
performance.cache-invalidation: on
client.event-threads: 8
server.event-threads: 8
performance.stat-prefetch: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

Volume Name: plink-prd
Type: Distributed-Replicate
Volume ID: f146a391-c92e-4965-9026-09f16d2d1c53
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/plink/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/plink/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/plink/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/plink/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/plink/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/plink/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/plink/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/plink/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/plink/brick09 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 3800MB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
cluster.enable-shared-storage: enable

Volume Name: pstore-prd
Type: Distributed-Replicate
Volume ID: d77c45ef-19ca-4add-9dac-1bc401244395
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/pstore-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/pstore-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/pstore-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/pstore-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/pstore-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/pstore-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick09 (arbiter)
Options Reconfigured:
cluster.min-free-disk: 1GB
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
auth.allow: exlap779.covisint.net,exlap780.covisint.net
cluster.enable-shared-storage: enable

Volume Name: rvsshare-prd
Type: Distributed-Replicate
Volume ID: bee2d0f7-9215-4be8-9fc6-302fd568d5ed
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/rvsshare-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/rvsshare-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/rvsshare-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/rvsshare-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/rvsshare-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/rvsshare-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick09 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: off
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
auth.allow: exlap825.covisint.net,exlap826.covisint.net
cluster.enable-shared-storage: enable

Volume Name: test
Type: Distributed-Replicate
Volume ID: 07c36821-382d-45bd-9f17-e7e48811d2a2
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/test/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/test/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/test/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/test/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/test/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/test/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/test/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/test/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/test/brick09 (arbiter)
Options Reconfigured:
performance.write-behind-window-size: 8MB
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 64
performance.io-thread-count: 16
cluster.eager-lock: on
network.ping-timeout: 15
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.quorum-type: auto
cluster.heal-timeout: 500
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.rebalance-stats: on
cluster.background-self-heal-count: 256
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
performance.cache-size: 2GB
storage.fips-mode-rchecksum: on
performance.read-ahead: on
performance.cache-invalidation: on
client.event-threads: 16
server.event-threads: 16
performance.stat-prefetch: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

[root@ch1c7ocvgl01 ~]# gluster volume status
Status of volume: autosfx-prd
Gluster process TCP Port RDMA Port Online Pid

Brick ch1c7ocvgl01:/covisint/gluster/autosfx/brick01    49152  0    Y  8316
Brick ch1c7ocvgl02:/covisint/gluster/autosfx/brick02    49152  0    Y  8310
Brick ch1c7ocvga11:/covisint/gluster/autosfx/brick03    49152  0    Y  8688
Brick ch1c7ocvgl03:/covisint/gluster/autosfx/brick04    49152  0    Y  8388
Brick ch1c7ocvgl04:/covisint/gluster/autosfx/brick05    49152  0    Y  7705
Brick ch1c7ocvga11:/covisint/gluster/autosfx/brick06    49153  0    Y  8689
Brick ch1c7ocvgl05:/covisint/gluster/autosfx/brick07    49152  0    Y  8128
Brick ch1c7ocvgl06:/covisint/gluster/autosfx/brick08    49152  0    Y  7811
Brick ch1c7ocvga11:/covisint/gluster/autosfx/brick09    49154  0    Y  8690
Self-heal Daemon on localhost                           N/A    N/A  Y  15133
Self-heal Daemon on ch1c7ocvgl05.covisint.net           N/A    N/A  Y  13966
Self-heal Daemon on ch1c7ocvgl04.covisint.net           N/A    N/A  Y  25439
Self-heal Daemon on ch1c7ocvgl03.covisint.net           N/A    N/A  Y  27470
Self-heal Daemon on ch1c7ocvga11.covisint.net           N/A    N/A  Y  4772
Self-heal Daemon on ch1c7ocvgl02                        N/A    N/A  Y  30524
Self-heal Daemon on ch1c7ocvgl06.covisint.net           N/A    N/A  Y  10152

Task Status of Volume autosfx-prd

There are no active volume tasks

Status of volume: gluster_shared_storage
Gluster process TCP Port RDMA Port Online Pid

Brick ch1c7ocvgl02.covisint.net:/var/lib/glusterd/ss_brick    49153  0    Y  8319
Brick ch1c7ocvgl03.covisint.net:/var/lib/glusterd/ss_brick    49153  0    Y  8381
Brick ch1c7ocvgl01.covisint.net:/var/lib/glusterd/ss_brick    49153  0    Y  8332
Self-heal Daemon on localhost                                 N/A    N/A  Y  15133
Self-heal Daemon on ch1c7ocvgl05.covisint.net                 N/A    N/A  Y  13966
Self-heal Daemon on ch1c7ocvga11.covisint.net                 N/A    N/A  Y  4772
Self-heal Daemon on ch1c7ocvgl04.covisint.net                 N/A    N/A  Y  25439
Self-heal Daemon on ch1c7ocvgl03.covisint.net                 N/A    N/A  Y  27470
Self-heal Daemon on ch1c7ocvgl02                              N/A    N/A  Y  30524
Self-heal Daemon on ch1c7ocvgl06.covisint.net                 N/A    N/A  Y  10152

Task Status of Volume gluster_shared_storage

There are no active volume tasks

Status of volume: hc-pstore-prd
Gluster process TCP Port RDMA Port Online Pid

Brick ch1c7ocvgl01:/covisint/gluster/hc-pstore-prd/brick01    49156  0    Y  15244
Brick ch1c7ocvgl02:/covisint/gluster/hc-pstore-prd/brick02    49155  0    Y  30807
Brick ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick03    49155  0    Y  8755
Brick ch1c7ocvgl03:/covisint/gluster/hc-pstore-prd/brick04    49156  0    Y  14874
Brick ch1c7ocvgl04:/covisint/gluster/hc-pstore-prd/brick05    49154  0    Y  21306
Brick ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick06    49156  0    Y  8734
Brick ch1c7ocvgl05:/covisint/gluster/hc-pstore-prd/brick07    49156  0    Y  7865
Brick ch1c7ocvgl06:/covisint/gluster/hc-pstore-prd/brick08    49154  0    Y  5401
Brick ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick09    49157  0    Y  8744
Self-heal Daemon on localhost                                 N/A    N/A  Y  15133
Self-heal Daemon on ch1c7ocvgl05.covisint.net                 N/A    N/A  Y  13966
Self-heal Daemon on ch1c7ocvgl03.covisint.net                 N/A    N/A  Y  27470
Self-heal Daemon on ch1c7ocvga11.covisint.net                 N/A    N/A  Y  4772
Self-heal Daemon on ch1c7ocvgl02                              N/A    N/A  Y  30524
Self-heal Daemon on ch1c7ocvgl04.covisint.net                 N/A    N/A  Y  25439
Self-heal Daemon on ch1c7ocvgl06.covisint.net                 N/A    N/A  Y  10152

Task Status of Volume hc-pstore-prd

There are no active volume tasks

Another transaction is in progress for plink-prd. Please try again after some time.

Status of volume: pstore-prd
Gluster process TCP Port RDMA Port Online Pid

Brick ch1c7ocvgl01:/covisint/gluster/pstore-prd/brick01    49155  0    Y  23221
Brick ch1c7ocvgl02:/covisint/gluster/pstore-prd/brick02    49156  0    Y  7888
Brick ch1c7ocvga11:/covisint/gluster/pstore-prd/brick03    49161  0    Y  8835
Brick ch1c7ocvgl03:/covisint/gluster/pstore-prd/brick04    49155  0    Y  18838
Brick ch1c7ocvgl04:/covisint/gluster/pstore-prd/brick05    49155  0    Y  18114
Brick ch1c7ocvga11:/covisint/gluster/pstore-prd/brick06    49162  0    Y  8848
Brick ch1c7ocvgl05:/covisint/gluster/pstore-prd/brick07    49155  0    Y  24013
Brick ch1c7ocvgl06:/covisint/gluster/pstore-prd/brick08    49155  0    Y  9192
Brick ch1c7ocvga11:/covisint/gluster/pstore-prd/brick09    49163  0    Y  8859
Self-heal Daemon on localhost                              N/A    N/A  Y  15133
Self-heal Daemon on ch1c7ocvga11.covisint.net              N/A    N/A  Y  4772
Self-heal Daemon on ch1c7ocvgl03.covisint.net              N/A    N/A  Y  27470
Self-heal Daemon on ch1c7ocvgl05.covisint.net              N/A    N/A  Y  13966
Self-heal Daemon on ch1c7ocvgl04.covisint.net              N/A    N/A  Y  25439
Self-heal Daemon on ch1c7ocvgl06.covisint.net              N/A    N/A  Y  10152
Self-heal Daemon on ch1c7ocvgl02                           N/A    N/A  Y  30524

Task Status of Volume pstore-prd

There are no active volume tasks

Another transaction is in progress for rvsshare-prd. Please try again after some time.

Status of volume: test
Gluster process TCP Port RDMA Port Online Pid

Brick ch1c7ocvgl01:/covisint/gluster/test/brick01    49158  0    Y  20468
Brick ch1c7ocvgl02:/covisint/gluster/test/brick02    49158  0    Y  30442
Brick ch1c7ocvga11:/covisint/gluster/test/brick03    49167  0    Y  8966
Brick ch1c7ocvgl03:/covisint/gluster/test/brick04    49158  0    Y  27364
Brick ch1c7ocvgl04:/covisint/gluster/test/brick05    49156  0    Y  19154
Brick ch1c7ocvga11:/covisint/gluster/test/brick06    49168  0    Y  8980
Brick ch1c7ocvgl05:/covisint/gluster/test/brick07    49157  0    Y  13820
Brick ch1c7ocvgl06:/covisint/gluster/test/brick08    49157  0    Y  10030
Brick ch1c7ocvga11:/covisint/gluster/test/brick09    49169  0    Y  9015
Self-heal Daemon on localhost                        N/A    N/A  Y  15133
Self-heal Daemon on ch1c7ocvgl03.covisint.net        N/A    N/A  Y  27470
Self-heal Daemon on ch1c7ocvgl05.covisint.net        N/A    N/A  Y  13966
Self-heal Daemon on ch1c7ocvgl04.covisint.net        N/A    N/A  Y  25439
Self-heal Daemon on ch1c7ocvga11.covisint.net        N/A    N/A  Y  4772
Self-heal Daemon on ch1c7ocvgl02                     N/A    N/A  Y  30524
Self-heal Daemon on ch1c7ocvgl06.covisint.net        N/A    N/A  Y  10152

Task Status of Volume test

There are no active volume tasks

@gluster-ant
Collaborator Author

Time: 20200106T11:07:47
srakonde at redhat commented:
I tried to look at the backtraces from the cores. Even though I installed release 6.1, I can't find any debug symbols.

It looks like:
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f442f3244a7 in ?? ()
[Current thread is 1 (LWP 21520)]
(gdb) bt
#0 0x00007f442f3244a7 in ?? ()
#1 0x4cce3ca800000001 in ?? ()
#2 0x0000000000018b1e in ?? ()
#3 0x00007f442f41faa8 in ?? ()
#4 0x00007f4400000000 in ?? ()
#5 0x00007f440c0174f0 in ?? ()
#6 0x00007f442f7b1b20 in ?? ()
#7 0x00007f441c4030c0 in ?? ()
#8 0x00007f442f7b1b90 in ?? ()
#9 0x00007f441c4030dc in ?? ()
#10 0x0000000000000007 in ?? ()
#11 0x0000562c75ecd4e0 in ?? ()
#12 0x00007f442f324db7 in ?? ()
#13 0x00007f4400000000 in ?? ()
#14 0x0000000000000000 in ?? ()

Can you please share the output of "t a a bt"?

Thanks,
Sanju

@gluster-ant
Collaborator Author

Time: 20200109T15:53:47
awingerter at opentext commented:
Sanju,

Thank you for the response.

I am very unfamiliar with using gdb and collecting backtraces from the cores.

Would it be possible for you to detail the configuration / collection steps needed?

Thanks and best regards,
-Anthony-

@gluster-ant
Collaborator Author

Time: 20200110T05:37:18
srakonde at redhat commented:
Hi Anthony,

  1. Load the core into gdb:
    gdb glusterd <path-to-core>
  2. The "bt" command gives you the backtrace of thread 1, while "t a a bt" (thread apply all backtrace) gives you the backtrace of all threads. Run "t a a bt" at the gdb prompt and collect the output (see the example below).
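
For reference, a minimal session along these lines might look like the following (the core file path and the log file name are placeholders; the "set logging" commands just copy everything gdb prints into the named file so the full backtrace is easy to attach here):

# gdb /usr/sbin/glusterd /path/to/core.<pid>
(gdb) set logging file glusterd-backtrace.txt
(gdb) set logging on
(gdb) thread apply all bt
(gdb) set logging off
(gdb) quit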

Hope that helps,
Sanju

@gluster-ant
Collaborator Author

Time: 20200120T15:49:46
awingerter at opentext commented:
Sanju,

Thank you for the response.
I apologize for getting back to you so late.

Here is some data from one of the cores where glusterd crashed.

[root@ch1c7ocvgl04 /]# gdb glusterd /core.7525
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 7657]
[New LWP 7526]
[New LWP 7529]
[New LWP 7525]
[New LWP 7527]
[New LWP 7528]
[New LWP 7531]
[New LWP 7530]
[New LWP 7656]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fbac6a094a7 in glusterd_op_ac_brick_op_failed () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-6.1-1.el7.x86_64
(gdb) t a a bt

Thread 9 (Thread 0x7fbac3a77700 (LWP 7656)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007fbac6aafddb in hooks_worker () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#2 0x00007fbad16fedd5 in start_thread (arg=0x7fbac3a77700) at pthread_create.c:307
#3 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 8 (Thread 0x7fbac7e99700 (LWP 7530)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007fbad28ff810 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x00007fbad29006c0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3 0x00007fbad16fedd5 in start_thread (arg=0x7fbac7e99700) at pthread_create.c:307
#4 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7fbac7698700 (LWP 7531)):
#0 0x00007fbad0fbcf73 in select () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fbad293e7e4 in runner () from /lib64/libglusterfs.so.0
#2 0x00007fbad16fedd5 in start_thread (arg=0x7fbac7698700) at pthread_create.c:307
#3 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7fbac8e9b700 (LWP 7528)):
#0 0x00007fbad0f8ce2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fbad0f8ccc4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x00007fbad28eb54d in pool_sweeper () from /lib64/libglusterfs.so.0
#3 0x00007fbad16fedd5 in start_thread (arg=0x7fbac8e9b700) at pthread_create.c:307
#4 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7fbac969c700 (LWP 7527)):
#0 0x00007fbad1706361 in do_sigwait (sig=0x7fbac969be1c, set=) at ../sysdeps/unix/sysv/linux/sigwait.c:60
#1 __sigwait (set=0x7fbac969be20, sig=0x7fbac969be1c) at ../sysdeps/unix/sysv/linux/sigwait.c:95
#2 0x000055b5e9cda1bb in glusterfs_sigwaiter ()
#3 0x00007fbad16fedd5 in start_thread (arg=0x7fbac969c700) at pthread_create.c:307
#4 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7fbad2dbe780 (LWP 7525)):
#0 0x00007fbad16fff47 in pthread_join (threadid=140440114784000, thread_return=0x0) at pthread_join.c:90
#1 0x00007fbad2923478 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2 0x000055b5e9cd6735 in main ()

Thread 3 (Thread 0x7fbac869a700 (LWP 7529)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007fbad28ff810 in syncenv_task () from /lib64/libglusterfs.so.0
#2 0x00007fbad29006c0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3 0x00007fbad16fedd5 in start_thread (arg=0x7fbac869a700) at pthread_create.c:307
#4 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

---Type <return> to continue, or q <return> to quit---
Thread 2 (Thread 0x7fbac9e9d700 (LWP 7526)):
#0 0x00007fbad1705e3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fbad28cdf76 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2 0x00007fbad16fedd5 in start_thread (arg=0x7fbac9e9d700) at pthread_create.c:307
#3 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7fbac3276700 (LWP 7657)):
#0 0x00007fbac6a094a7 in glusterd_op_ac_brick_op_failed () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#1 0x00007fbac6a09db7 in glusterd_op_sm () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#2 0x00007fbac6a419dc in glusterd_mgmt_v3_lock_peers_cbk_fn () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#3 0x00007fbac6a40faa in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#4 0x00007fbad2669021 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#5 0x00007fbad2669387 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#6 0x00007fbad26659f3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#7 0x00007fbac5c0b875 in socket_event_handler () from /usr/lib64/glusterfs/6.1/rpc-transport/socket.so
#8 0x00007fbad2924286 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#9 0x00007fbad16fedd5 in start_thread (arg=0x7fbac3276700) at pthread_create.c:307
#10 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@gluster-ant
Collaborator Author

Time: 20200224T07:01:28
srakonde at redhat commented:
Hi Anthony,

Sorry for the delayed response on this bug. Can you please install the debuginfo packages for glusterfs and then provide the backtrace?

Thanks,
Sanju
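
For reference, gdb already pointed at the exact package in the session above ("Missing separate debuginfos, use: debuginfo-install glusterfs-server-6.1-1.el7.x86_64"). On CentOS 7 the steps would look roughly like this (debuginfo-install ships in yum-utils, and the glusterfs debuginfo packages may require the matching debuginfo repository to be enabled):

# yum install -y yum-utils
# debuginfo-install -y glusterfs-server-6.1-1.el7.x86_64
# gdb /usr/sbin/glusterd /core.7525
(gdb) thread apply all bt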

@stale

stale bot commented Oct 13, 2020

Thank you for your contributions.
We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

stale bot added the wontfix (Managed by stale[bot]) label on Oct 13, 2020
@schaffung
Member

Looking into the issue, it seems the requested information has still not been provided. Closing this issue for now; it can be reopened if required.
