One brick offline with signal received: 11 #1699

Closed
Bockeman opened this issue Oct 23, 2020 · 55 comments · Fixed by #1730

Comments

@Bockeman

Description of problem:
One brick on one server is offline and all attempts to bring it back online have failed.
The corresponding brick on the other server (this is a replica 2 setup) is OK, as are all other bricks.

The following do not clear the problem:

  • reboot
  • gluster volume start gluvol1
  • gluster volume reset-brick gluvol1 verijolt:/srv/brick06 verijolt:/srv/brick06 commit force

The problem appears to be similar to #1531
but the cause is different, and the number of volumes and bricks is different.
(I note the comment in that issue regarding "replica 2" and split-brain, but the cost (time/effort) of recovering
from split-brain is manageable, and split-brain here is usually due to external causes, such as a power cut.)

My urgent need is to find a way out of the current situation and bring brick06 on the second server back online.
Less urgent is the need for gluster to handle this condition gracefully and report to the user/admin
the real cause of the problem and how to fix it (if it cannot be fixed automatically).

The exact command to reproduce the issue:
I am not sure what actually caused this situation to arise, but activity at the time was:

  • Multiple clients, all active, but with minimal activity.
  • Intense activity from one client (actually one of the two gluster servers): a scripted "chown" over more than a million files, which had been running for over 5 hours and was 83% complete.
  • An edit or "sed -i" on a 500 MB script file (which should not have tipped over the 22 GB Mem + 8 GB Swap); see the sketch below.

The full output of the command that failed:

gluster volume status gluvol1 \
2>&1 | awk '{print "  " $0}'; date +\ \ %F\ %T%n
  Status of volume: gluvol1
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Brick veriicon:/srv/brick06                 49162     0          Y       2065834
  Brick verijolt:/srv/brick06                 N/A       N/A        N       N/A  
  Brick veriicon:/srv/brick05                 49163     0          Y       2065859
  Brick verijolt:/srv/brick05                 49161     0          Y       4775 
  Brick veriicon:/srv/brick07                 49164     0          Y       2065887
  Brick verijolt:/srv/brick07                 49162     0          Y       4797 
  Self-heal Daemon on localhost               N/A       N/A        Y       1969 
  Quota Daemon on localhost                   N/A       N/A        Y       4867 
  Bitrot Daemon on localhost                  N/A       N/A        Y       4882 
  Scrubber Daemon on localhost                N/A       N/A        Y       4938 
  Self-heal Daemon on veriicon                N/A       N/A        Y       2063499
  Quota Daemon on veriicon                    N/A       N/A        Y       2304107
  Bitrot Daemon on veriicon                   N/A       N/A        Y       2304118
  Scrubber Daemon on veriicon                 N/A       N/A        Y       2304144
   
  Task Status of Volume gluvol1
  ------------------------------------------------------------------------------
  There are no active volume tasks
   
  2020-10-22 23:35:11

Expected results:
Some way to bring that brick back online.

- The output of the gluster volume info command:

gluster volume info gluvol1 \
2>&1 | awk '{print "  " $0}'; date +\ \ %F\ %T%n
  Volume Name: gluvol1
  Type: Distributed-Replicate
  Volume ID: 5af1e1c9-afbd-493d-a567-d7989cf3b9ea
  Status: Started
  Snapshot Count: 0
  Number of Bricks: 3 x 2 = 6
  Transport-type: tcp
  Bricks:
  Brick1: veriicon:/srv/brick06
  Brick2: verijolt:/srv/brick06
  Brick3: veriicon:/srv/brick05
  Brick4: verijolt:/srv/brick05
  Brick5: veriicon:/srv/brick07
  Brick6: verijolt:/srv/brick07
  Options Reconfigured:
  storage.max-hardlinks: 10000000
  features.quota-deem-statfs: on
  features.scrub-freq: monthly
  features.inode-quota: on
  features.quota: on
  features.scrub: Active
  features.bitrot: on
  transport.address-family: inet
  nfs.disable: on
  performance.client-io-threads: off
  cluster.self-heal-daemon: on
  cluster.min-free-inodes: 5%
  2020-10-22 23:34:01

- The operating system / glusterfs version:
Fedora F32
Linux veriicon 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux verijolt 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
glusterfs 7.8

Additional info:

on veriicon (1st server)
systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2020-10-18 11:20:39 BST; 4 days ago
       Docs: man:glusterd(8)
    Process: 833 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 838 (glusterd)
      Tasks: 314 (limit: 23818)
     Memory: 11.9G
        CPU: 2d 13h 26min 1.990s
     CGroup: /system.slice/glusterd.service
                 838 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
             2063406 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick04 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick>
             2063429 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick00 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick>
             2063452 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick01 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick>
             2063499 /usr/sbin/glusterfs -s localhost --volfile-id shd/gluvol0 -p /var/run/gluster/shd/gluvol0/gluvol0-shd.pid -l /var/log/glusterf>
             2065834 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick06 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick>
             2065859 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick05 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick>
             2065887 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick07 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick>
             2304107 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quota>
             2304118 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S>
             2304144 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.lo>

on verijolt (2nd server)

systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
      Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
      Active: active (running) since Thu 2020-10-22 18:00:21 BST; 21min ago
        Docs: man:glusterd(8)
     Process: 823 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
    Main PID: 847 (glusterd)
       Tasks: 262 (limit: 26239)
      Memory: 3.4G
         CPU: 49min 14.474s
      CGroup: /system.slice/glusterd.service
               847 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
              1701 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/gluvol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *dht.readdir-opti>
              1969 /usr/sbin/glusterfs -s localhost --volfile-id shd/gluvol0 -p /var/run/gluster/shd/gluvol0/gluvol0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/361019a1040168a1.socket --xlator-option *replicate*.node-uu>
              4575 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick04 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick04.pid -S /var/run/gluster/106ca420dff7312a.socket --brick-name /srv/brick04 -l /var/log/gluster>
              4598 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick00 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick00.pid -S /var/run/gluster/9978a7a4ac3bc9dd.socket --brick-name /srv/brick00 -l /var/log/gluster>
              4621 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick01 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick01.pid -S /var/run/gluster/c64e4259882d3f4d.socket --brick-name /srv/brick01 -l /var/log/gluster>
              4775 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick05 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick05.pid -S /var/run/gluster/0135293616f7a351.socket --brick-name /srv/brick05 -l /var/log/gluster>
              4797 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick07 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick07.pid -S /var/run/gluster/b7d8c6bd3c8b2992.socket --brick-name /srv/brick07 -l /var/log/gluster>
              4867 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/739fa467313dc700.socket --process-name quotad
              4882 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/a646064795a52ac2.socket --global-timer-wheel
              4938 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/903362075a5b27cc.socket --global-timer-wheel
 
 Oct 22 18:04:34 verijolt srv-brick06[4753]: dlfcn 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: libpthread 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: llistxattr 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: setfsid 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: spinlock 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: epoll.h 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: xattr.h 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: st_atim.tv_nsec 1
 Oct 22 18:04:34 verijolt srv-brick06[4753]: package-string: glusterfs 7.8
 Oct 22 18:04:34 verijolt srv-brick06[4753]: ---------

There's a problem with brick06 on server verijolt.

snippet from /var/log/messages

  Oct 22 18:03:13 verijolt systemd[2266]: Finished Mark boot as successful.
  Oct 22 18:04:34 verijolt srv-brick06[4753]: pending frames:
  Oct 22 18:04:34 verijolt audit[4753]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=4753 comm="glfs_iotwr002" exe="/usr/sbin/glusterfsd" sig=11 res=1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(0) op(0)
  Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(0) op(0)
  Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(GETXATTR)
  Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP)
  Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP)
  Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP)
  Oct 22 18:04:34 verijolt srv-brick06[4753]: patchset: git://git.gluster.org/glusterfs.git
  Oct 22 18:04:34 verijolt srv-brick06[4753]: signal received: 11
  Oct 22 18:04:34 verijolt srv-brick06[4753]: time of crash:
  Oct 22 18:04:34 verijolt srv-brick06[4753]: 2020-10-22 17:04:34
  Oct 22 18:04:34 verijolt srv-brick06[4753]: configuration details:
  Oct 22 18:04:34 verijolt srv-brick06[4753]: argp 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: backtrace 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: dlfcn 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: libpthread 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: llistxattr 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: setfsid 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: spinlock 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: epoll.h 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: xattr.h 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: st_atim.tv_nsec 1
  Oct 22 18:04:34 verijolt srv-brick06[4753]: package-string: glusterfs 7.8
  Oct 22 18:04:34 verijolt srv-brick06[4753]: ---------
  Oct 22 18:04:34 verijolt systemd[1]: Created slice system-systemd\x2dcoredump.slice.
  Oct 22 18:04:34 verijolt audit: BPF prog-id=45 op=LOAD
  Oct 22 18:04:34 verijolt audit: BPF prog-id=46 op=LOAD
  Oct 22 18:04:34 verijolt audit: BPF prog-id=47 op=LOAD
  Oct 22 18:04:34 verijolt systemd[1]: Started Process Core Dump (PID 4906/UID 0).
  Oct 22 18:04:34 verijolt audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4906-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
  Oct 22 18:04:34 verijolt systemd-coredump[4907]: Process 4753 (glusterfsd) of user 0 dumped core.#012#012Stack trace of thread 4845:#012#0  0x00007f812ba19729 posix_get_ancestry_non_directory (posix.so + 0x31729)#012#1  0x00007f812ba19c9f posix_get_ancestry (posix.so + 0x31c9f)#012#2  0x00007f812ba22c20 posix_readdirp (posix.so + 0x3ac20)#012#3  0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#4  0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#5  0x00007f812b92f929 br_stub_readdirp (bitrot-stub.so + 0x9929)#012#6  0x00007f812b9187d2 posix_acl_readdirp (access-control.so + 0x77d2)#012#7  0x00007f812b8d76d0 pl_readdirp (locks.so + 0xd6d0)#012#8  0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#9  0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#10 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#11 0x00007f812b8861f1 up_readdirp (upcall.so + 0xd1f1)#012#12 0x00007f8130cc61bd default_readdirp_resume (libglusterfs.so.0 + 0xd41bd)#012#13 0x00007f8130c44035 call_resume (libglusterfs.so.0 + 0x52035)#012#14 0x00007f812b86e128 iot_worker (io-threads.so + 0x7128)#012#15 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#16 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4755:#012#0  0x00007f81309ab962 __sigtimedwait (libc.so.6 + 0x3d962)#012#1  0x00007f8130b4e5bc sigwait (libpthread.so.0 + 0x145bc)#012#2  0x000055ca5c22081b glusterfs_sigwaiter (glusterfsd + 0x981b)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4759:#012#0  0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1  0x00007f8130c95b79 runner (libglusterfs.so.0 + 0xa3b79)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4756:#012#0  0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1  0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2  0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3  0x00007f8130c470e5 pool_sweeper (libglusterfs.so.0 + 0x550e5)#012#4  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4760:#012#0  0x00007f8130a6fc5e epoll_wait (libc.so.6 + 0x101c5e)#012#1  0x00007f8130c816a2 event_dispatch_epoll_worker (libglusterfs.so.0 + 0x8f6a2)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4761:#012#0  0x00007f8130a6fc5e epoll_wait (libc.so.6 + 0x101c5e)#012#1  0x00007f8130c816a2 event_dispatch_epoll_worker (libglusterfs.so.0 + 0x8f6a2)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4770:#012#0  0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1  0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2  0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3  0x00007f812b9f57b0 posix_disk_space_check_thread_proc (posix.so + 0xd7b0)#012#4  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4771:#012#0  0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1  0x00007f8130a3c157 __nanosleep (libc.so.6 + 
0xce157)#012#2  0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3  0x00007f812b9f5156 posix_health_check_thread_proc (posix.so + 0xd156)#012#4  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4767:#012#0  0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1  0x00007f812b97d402 changelog_ev_dispatch (changelog.so + 0x1c402)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4842:#012#0  0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1  0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2  0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4772:#012#0  0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1  0x00007f812b9f0114 posix_ctx_janitor_thread_proc (posix.so + 0x8114)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4850:#012#0  0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1  0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2  0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4852:#012#0  0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1  0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2  0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4773:#012#0  0x00007f8130b49e92 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe92)#012#1  0x00007f812b9f59bb posix_fsyncer_pick (posix.so + 0xd9bb)#012#2  0x00007f812b9f5c25 posix_fsyncer (posix.so + 0xdc25)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4858:#012#0  0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1  0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2  0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4860:#012#0  0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1  0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2  0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4857:#012#0  0x00007f8130b45da3 __pthread_mutex_lock (libpthread.so.0 + 0xbda3)#012#1  0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4757:#012#0  0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1  0x00007f8130c5d3d9 syncenv_task (libglusterfs.so.0 + 
0x6b3d9)#012#2  0x00007f8130c5e1a5 syncenv_processor (libglusterfs.so.0 + 0x6c1a5)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4758:#012#0  0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1  0x00007f8130c5d3d9 syncenv_task (libglusterfs.so.0 + 0x6b3d9)#012#2  0x00007f8130c5e1a5 syncenv_processor (libglusterfs.so.0 + 0x6c1a5)#012#3  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4762:#012#0  0x00007f8130b49e92 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe92)#012#1  0x00007f812b81d994 index_worker (index.so + 0x7994)#012#2  0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3  0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4768:#012#0  0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1  0x00007f812b97d402 changelog_ev_dispatch (changelog.so + 0x1c402
  Oct 22 18:04:34 verijolt audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4906-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
  Oct 22 18:04:34 verijolt audit: BPF prog-id=47 op=UNLOAD
  Oct 22 18:04:34 verijolt audit: BPF prog-id=46 op=UNLOAD
  Oct 22 18:04:34 verijolt audit: BPF prog-id=45 op=UNLOAD
  Oct 22 18:04:34 verijolt systemd[1]: systemd-coredump@0-4906-0.service: Succeeded.
  Oct 22 18:04:37 verijolt abrt-server[4960]: Deleting problem directory ccpp-2020-10-22-18:04:35.4544-4753 (dup of ccpp-2020-10-22-14:23:50.21866-346125)
  Oct 22 18:04:38 verijolt abrt-notification[5020]: Process 346125 (glusterfsd) crashed in posix_get_ancestry_non_directory()

snippet from /var/log/glusterfs/bricks/srv-brick06.log

  [2020-10-22 17:04:34.150577] E [inodelk.c:506:__inode_unlock_lock] 0-gluvol1-locks:  Matching lock not found for unlock 0-9223372036854775807, by 181b0088357f0000 on 0x7f8124011df0
  [2020-10-22 17:04:34.150614] E [MSGID: 115053] [server-rpc-fops_v2.c:271:server4_inodelk_cbk] 0-gluvol1-server: 55: INODELK <gfid:73a42b0e-64d1-4606-8337-fce48490a10b> (73a42b0e-64d1-4606-8337-fce48490a10b), client: CTX_ID:2784eca2-176b-4f8a-b0b8-0a81d4e811b3-GRAPH_ID:0-PID:1701-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0, error-xlator: gluvol1-locks [Invalid argument]
  pending frames:
  frame : type(0) op(0)
  frame : type(0) op(0)
  frame : type(1) op(GETXATTR)
  frame : type(1) op(LOOKUP)
  frame : type(1) op(LOOKUP)
  frame : type(1) op(LOOKUP)
  patchset: git://git.gluster.org/glusterfs.git
  signal received: 11
  time of crash: 
  2020-10-22 17:04:34
  configuration details:
  argp 1
  backtrace 1
  dlfcn 1
  libpthread 1
  llistxattr 1
  setfsid 1
  spinlock 1
  epoll.h 1
  xattr.h 1
  st_atim.tv_nsec 1
  package-string: glusterfs 7.8
  /lib64/libglusterfs.so.0(+0x2afa4)[0x7f8130c1cfa4]
  /lib64/libglusterfs.so.0(gf_print_trace+0x333)[0x7f8130c27c93]
  /lib64/libc.so.6(+0x3ca70)[0x7f81309aaa70]
  /usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x31729)[0x7f812ba19729]
  /usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x31c9f)[0x7f812ba19c9f]
  /usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x3ac20)[0x7f812ba22c20]
  /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
  /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
  /usr/lib64/glusterfs/7.8/xlator/features/bitrot-stub.so(+0x9929)[0x7f812b92f929]
  /usr/lib64/glusterfs/7.8/xlator/features/access-control.so(+0x77d2)[0x7f812b9187d2]
  /usr/lib64/glusterfs/7.8/xlator/features/locks.so(+0xd6d0)[0x7f812b8d76d0]
  /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
  /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
  /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
  /usr/lib64/glusterfs/7.8/xlator/features/upcall.so(+0xd1f1)[0x7f812b8861f1]
  /lib64/libglusterfs.so.0(default_readdirp_resume+0x21d)[0x7f8130cc61bd]
  /lib64/libglusterfs.so.0(call_resume+0x75)[0x7f8130c44035]
  /usr/lib64/glusterfs/7.8/xlator/performance/io-threads.so(+0x7128)[0x7f812b86e128]
  /lib64/libpthread.so.0(+0x9432)[0x7f8130b43432]
  /lib64/libc.so.6(clone+0x43)[0x7f8130a6f913]
  ---------
@Bockeman
Author

Urgent problem no longer an issue
I repeated the volume stop/start again today, and, voila, brick06 on server verijolt came back online.

export start_date=`date +%F\ %T`; \
gluster volume stop gluvol1; \
echo "  ${start_date}" && date +\ \ %F\ %T%n
  Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
  y
  volume stop: gluvol1: success
  2020-10-23 12:43:21
  2020-10-23 12:43:44

export start_date=`date +%F\ %T`; \
gluster volume start gluvol1 \
2>&1 | awk '{print "  " $0;fflush()}'; echo "  ${start_date}" && date +\ \ %F\ %T%n
  volume start: gluvol1: success
  2020-10-23 12:44:20
  2020-10-23 12:44:34

export start_date=`date +%F\ %T`; \
gluster volume status \
2>&1 | awk '{print "  " $0;fflush()}'; echo "  ${start_date}" && date +\ \ %F\ %T%n
  Status of volume: gluvol0
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Brick veriicon:/srv/brick04                 49156     0          Y       2063406
  Brick verijolt:/srv/brick04                 49157     0          Y       4575 
  Brick veriicon:/srv/brick00                 49157     0          Y       2063429
  Brick verijolt:/srv/brick00                 49158     0          Y       4598 
  Brick veriicon:/srv/brick01                 49158     0          Y       2063452
  Brick verijolt:/srv/brick01                 49159     0          Y       4621 
  Self-heal Daemon on localhost               N/A       N/A        Y       1969 
  Quota Daemon on localhost                   N/A       N/A        Y       53485
  Bitrot Daemon on localhost                  N/A       N/A        Y       53517
  Scrubber Daemon on localhost                N/A       N/A        Y       53546
  Self-heal Daemon on veriicon                N/A       N/A        Y       2063499
  Quota Daemon on veriicon                    N/A       N/A        Y       2355307
  Bitrot Daemon on veriicon                   N/A       N/A        Y       2355350
  Scrubber Daemon on veriicon                 N/A       N/A        Y       2355380
   
  Task Status of Volume gluvol0
  ------------------------------------------------------------------------------
  Task                 : Rebalance           
  ID                   : c683596c-82b0-41b9-9d7e-10d4208bb440
  Status               : completed           
   
  Status of volume: gluvol1
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Brick veriicon:/srv/brick06                 49159     0          Y       2355232
  Brick verijolt:/srv/brick06                 49160     0          Y       53399
  Brick veriicon:/srv/brick05                 49160     0          Y       2355254
  Brick verijolt:/srv/brick05                 49163     0          Y       53421
  Brick veriicon:/srv/brick07                 49161     0          Y       2355276
  Brick verijolt:/srv/brick07                 49164     0          Y       53443
  Self-heal Daemon on localhost               N/A       N/A        Y       1969 
  Quota Daemon on localhost                   N/A       N/A        Y       53485
  Bitrot Daemon on localhost                  N/A       N/A        Y       53517
  Scrubber Daemon on localhost                N/A       N/A        Y       53546
  Self-heal Daemon on veriicon                N/A       N/A        Y       2063499
  Quota Daemon on veriicon                    N/A       N/A        Y       2355307
  Bitrot Daemon on veriicon                   N/A       N/A        Y       2355350
  Scrubber Daemon on veriicon                 N/A       N/A        Y       2355380
   
  Task Status of Volume gluvol1
  ------------------------------------------------------------------------------
  There are no active volume tasks
   
  2020-10-23 12:46:34
  2020-10-23 12:46:34

I don't understand this. My history shows that I did the stop, on this server, yesterday:

history | egrep 'stop|start'
  866  2020-10-22 14:22:08 gluster volume stop gluvol0
  867  2020-10-22 14:22:39 gluster volume stop gluvol1
  868  2020-10-22 14:23:24 gluster volume start gluvol0
  869  2020-10-22 14:23:41 gluster volume start gluvol1

So what got fixed? Why did the repeat today change things when yesterday it had no effect?

There's still something wrong
Of course, now that it is no longer so urgent: do we have any idea what caused this one brick to go offline?
Is there more information I might be able to supply?

@mohit84
Contributor

mohit84 commented Oct 23, 2020

As you mentioned, brick06 crashed; that is why the gluster CLI was showing its status as N/A.
After starting the volume, it is expected that the CLI shows the brick status correctly.
Can you please attach the brick core with gdb and share the output of "thread apply all bt full"?

@Bockeman
Author

Some more info
Just to see if the immediate problem had been fixed by the volume stop/start, I looked at the brick log again.
The tail of /var/log/glusterfs/bricks/srv-brick06.log is shown below. Notice that the timestamp is after the start (as logged in the prior comment) and that the log appears to have been truncated.

[2020-10-23 11:44:22.347133] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.21"
[2020-10-23 11:44:22.347177] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:dce4fcdc-10f5-45de-8000-f4030618b984-GRAPH_ID:4-PID:1013-HOST:verilate-PC_NAME:gluvol1-client-5-RECON_NO:-6 (version: 6.9) with subvol /srv/brick06 
[2020-10-23 11:44:22.350754] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 0
[2020-10-23 11:44:23.384978] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.29"
[2020-10-23 11:44:23.384997] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cead90c8-cf17-4f64-864e-d5fb99e0d21f-GRAPH_ID:4-PID:1025-HOST:veritosh-PC_NAME:gluvol1-client-5-RECON_NO:-9 (version: 6.9) with subvol /srv/brick06 
[2020-10-23 11:44:23.386722] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 1
[2020-10-23 11:44:23.388107] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.27"
[2020-10-23 11:44:23.388124] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:c741e6ef-5725-4f23-b4c3-9b7f94be1a01-GRAPH_ID:4-PID:1519-HOST:verirack-PC_NAME:gluvol1-client-5-RECON_NO:-7 (version: 6.9) with subvol /srv/brick06 
[2020-10-23 11:44:23.401629] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:23.401649] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:23.401665] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:33c6affe-ce70-4122-9658-289a4ebf0420-GRAPH_ID:0-PID:53485-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 
[2020-10-23 11:44:23.666142] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.18"
[2020-10-23 11:44:23.666179] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:23.666194] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:76a4bc50-328e-45dd-9cdf-99329aec9ad2-GRAPH_ID:0-PID:1277-HOST:veriicon-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.8) with subvol /srv/brick06 
[2020-10-23 11:44:23.669918] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:23.669932] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:e0928c38-c13e-4ab5-8d43-b26e397967ee-GRAPH_ID:0-PID:1303-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-3 (version: 7.8) with subvol /srv/brick06 
[2020-10-23 11:44:23.686523] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.10"
[2020-10-23 11:44:23.686538] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:fd9a184d-61a0-41ae-bfc5-83c67a7a0497-GRAPH_ID:4-PID:1044-HOST:verialto-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06 
[2020-10-23 11:44:23.704923] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.12"
[2020-10-23 11:44:23.704939] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:36ddaf3e-f7e2-4224-a9b8-5b1a6de94a8e-GRAPH_ID:0-PID:1282-HOST:vericalm-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.8) with subvol /srv/brick06 
[2020-10-23 11:44:23.705368] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.17"
[2020-10-23 11:44:23.705385] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:7bee7605-6be2-41b3-8fc4-1fc289bade00-GRAPH_ID:4-PID:1030-HOST:veriheat-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06 
[2020-10-23 11:44:23.777276] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.11"
[2020-10-23 11:44:23.777291] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:eeda6a67-5c53-44f7-ba1a-b366adaf97fe-GRAPH_ID:4-PID:1240-HOST:veriblob-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.7) with subvol /srv/brick06 
[2020-10-23 11:44:25.426054] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:25.426068] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:25.426076] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:21d955c2-e8fd-4a41-847a-38729647bd8e-GRAPH_ID:4-PID:1969-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 
[2020-10-23 11:44:25.434554] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:25.434571] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:25.434579] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cff7c9fe-fbdc-445d-b1c8-c068493de96b-GRAPH_ID:0-PID:53517-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 
[2020-10-23 11:44:27.440119] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:27.440211] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:27.440221] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:35535c81-dd54-497c-bfe1-a5784f1c0e65-GRAPH_ID:0-PID:53546-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 
pending frames:
frame : type(1) op(FSETXATTR)
frame : type(1) op(LOOKUP)
frame : type(1) op(OPEN)
frame : type(1) op(READ)
frame : type(0) op(0)
frame : type(1) op(READ)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11

@mohit84
Contributor

mohit84 commented Oct 23, 2020

It seems the brick process has crashed again. Can you attach the core with gdb and share the output of "thread apply all bt full"?

@Bockeman
Author

@mohit84 thanks for picking this up.

I understand that the CLI status for brick06 on server verijolt is showing "N" under "online" when the brick has crashed.

At the moment, brick06 on verijolt is showing "Y" despite the truncated brick log showing a potential crash (signal received: 11).
However, verijolt is now working flat out (100% on 4 CPUs), attempting to recover, I assume.

The tail of /var/log/glusterfs/glfsheal-gluvol1.log shows

[2020-10-23 12:36:16.652483] I [MSGID: 108031] [afr-common.c:2531:afr_local_discovery_cbk] 0-gluvol1-replicate-1: selecting local read_child gluvol1-client-7 
[2020-10-23 12:36:16.653235] I [MSGID: 108031] [afr-common.c:2531:afr_local_discovery_cbk] 0-gluvol1-replicate-2: selecting local read_child gluvol1-client-9 
[2020-10-23 12:36:16.653396] I [MSGID: 104041] [glfs-resolve.c:974:__glfs_active_subvol] 0-gluvol1: switched to graph 76657269-6a6f-6c74-2d35-363332312d32 (0) 
[2020-10-23 12:36:48.660675] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-gluvol1-client-5: server 192.168.0.19:49160 has not responded in the last 42 seconds, disconnecting.
[2020-10-23 12:36:48.660744] I [MSGID: 114018] [client.c:2341:client_rpc_notify] 0-gluvol1-client-5: disconnected from gluvol1-client-5. Client process will keep trying to connect to glusterd until brick's port is available 
[2020-10-23 12:36:48.661038] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2020-10-23 12:36:06.660617 (xid=0x5)
[2020-10-23 12:36:48.661051] W [MSGID: 114032] [client-handshake.c:1552:client_dump_version_cbk] 0-gluvol1-client-5: received RPC status error [Transport endpoint is not connected]
[2020-10-23 12:36:48.661111] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2020-10-23 12:36:06.660626 (xid=0x6)
[2020-10-23 12:36:48.661120] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-gluvol1-client-5: socket disconnected
[2020-10-23 12:36:51.663231] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-gluvol1-client-5: changing port to 49160 (from 0)
[2020-10-23 12:37:33.667582] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-gluvol1-client-5: server 192.168.0.19:49160 has not responded in the last 42 seconds, disconnecting.
[2020-10-23 12:37:33.667663] I [MSGID: 114018] [client.c:2341:client_rpc_notify] 0-gluvol1-client-5: disconnected from gluvol1-client-5. Client process will keep trying to connect to glusterd until brick's port is available 
[2020-10-23 12:37:33.667862] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2020-10-23 12:36:51.667518 (xid=0xa)
[2020-10-23 12:37:33.667891] W [MSGID: 114032] [client-handshake.c:1552:client_dump_version_cbk] 0-gluvol1-client-5: received RPC status error [Transport endpoint is not connected]
[2020-10-23 12:37:33.667969] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2020-10-23 12:36:51.667528 (xid=0xb)
[2020-10-23 12:37:33.667980] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-gluvol1-client-5: socket disconnected
[2020-10-23 12:37:36.670101] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-gluvol1-client-5: changing port to 49160 (from 0)
[2020-10-23 12:38:18.675590] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-gluvol1-client-5: server 192.168.0.19:49160 has not responded in the last 42 seconds, disconnecting.
[2020-10-23 12:38:18.675664] I [MSGID: 114018] [client.c:2341:client_rpc_notify] 0-gluvol1-client-5: disconnected from gluvol1-client-5. Client process will keep trying to connect to glusterd until brick's port is available 
[2020-10-23 12:38:18.675764] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2020-10-23 12:37:36.675526 (xid=0xf)
[2020-10-23 12:38:18.675775] W [MSGID: 114032] [client-handshake.c:1552:client_dump_version_cbk] 0-gluvol1-client-5: received RPC status error [Transport endpoint is not connected]
[2020-10-23 12:38:18.675833] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2020-10-23 12:37:36.675538 (xid=0x10)
[2020-10-23 12:38:18.675843] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-gluvol1-client-5: socket disconnected
[2020-10-23 12:38:21.677950] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-gluvol1-client-5: changing port to 49160 (from 0)

@mohit84
Contributor

mohit84 commented Oct 23, 2020

Is the CLI showing a different pid than the brick pid that crashed?
Can you check the pid status with ps -aef | grep <brick_pid_showing_by_CLI>?
Please share the dump of /var/log/glusterfs from all the nodes.
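
A minimal sketch of one way to package those logs on each node for attaching here (the archive path and name are only examples):

  # Run on each gluster node; compresses the whole /var/log/glusterfs directory.
  tar czf /tmp/glusterfs-logs-$(hostname).tar.gz -C /var/log glusterfs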

@Bockeman
Author

@mohit84 Please could you give me a bit more help in running the debugger etc.? It's been a while since I last did that.

Can you please attach the brick core with gdb and share the output of "thread apply all bt full"?

Could you tell me the commands to use?

@Bockeman
Author

ps -aef | grep 53399
root       53399       1 99 12:44 ?        04:53:46 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick06.pid -S /var/run/gluster/e81ffbefdc82

@mohit84
Contributor

mohit84 commented Oct 23, 2020

It means the brick is running; the pid (53399) is the same one shown by the CLI, right?

Before attaching the core with gdb, please install the debug package of glusterfs, then:

  1. gdb /usr/sbin/glusterfsd -c <brick_core>
  2. Run thread apply all bt full and share the output (a sketch of this workflow follows).
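
A minimal sketch of that workflow, assuming the core has already been extracted to a file (core.glusterfsd is only an example name) and the debuginfo packages are installed; gdb's logging commands are used here to capture the long backtrace to a file:

  gdb /usr/sbin/glusterfsd -c core.glusterfsd
  (gdb) set logging file brick06-backtrace.txt
  (gdb) set logging on
  (gdb) thread apply all bt full
  (gdb) set logging off
  (gdb) quit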

@Bockeman
Author

Please share the dump of /var/log/glusterfs from all the nodes.

I'm not sure what you mean by "dump" and "all the nodes". This directory is large (~11 GB), as this du shows:

export start_date=`date +%F\ %T`; \
du -xsmc /var/log/glusterfs/* \
2>&1 | sort -n | awk '{printf("  %8d\t%s\n",$1,substr($0,index($0,$2)))}'; \
echo "  ${start_date}" && date +\ \ %F\ %T
         1      /var/log/glusterfs/cli.log
         1      /var/log/glusterfs/cmd_history.log
         1      /var/log/glusterfs/gfproxy
         1      /var/log/glusterfs/gluvol0-rebalance.log
         1      /var/log/glusterfs/gluvol0-replace-brick-mount.log
         1      /var/log/glusterfs/gluvol1-replace-brick-mount.log
         1      /var/log/glusterfs/quota_crawl
         1      /var/log/glusterfs/snaps
         2      /var/log/glusterfs/glfsheal-gluvol0.log
         2      /var/log/glusterfs/glfsheal-gluvol1.log
         2      /var/log/glusterfs/srv-gluvol0.log
         3      /var/log/glusterfs/glusterd.log
         4      /var/log/glusterfs/scrub.log
         7      /var/log/glusterfs/quotad.log
        36      /var/log/glusterfs/gluvol1-rebalance.log_201004_0005.gz
        64      /var/log/glusterfs/srv-gluvol1.log
       134      /var/log/glusterfs/glustershd.log
       303      /var/log/glusterfs/gluvol1-rebalance.log
       409      /var/log/glusterfs/bitd.log_201004_0005.gz
       473      /var/log/glusterfs/bricks
      9762      /var/log/glusterfs/bitd.log
     11195      total
  2020-10-23 14:09:43
  2020-10-23 14:09:43

@mohit84
Contributor

mohit84 commented Oct 23, 2020

For now, you can share glusterd.log and the brick logs from all the nodes.

@Bockeman
Author

It means the brick is running; the pid (53399) is the same one shown by the CLI, right?

Agreed, it looks as though the brick process is running, but I don't think this brick is in a working state.
It should be healing, but the pending count is not going down.

  # Healing
  server        brick:  00  01  02  03  04  05  06  07
  veriicon    pending:   0   0   0   0   0   0  101653   0
  verijolt    pending:   0   0   0   0   0   0   0   0
  veriicon      split:   0   0   0   0   0   0   0   0
  verijolt      split:   0   0   0   0   0   0   0   0
  veriicon    healing:   0   0   0   0   0   0   0   0
  verijolt    healing:   0   0   0   0   0   0   0   0
  2020-10-23 14:26:07

Script used to collate healing status (see also the note after the script)

echo "  # Healing"; \
gluster volume heal gluvol0 info summary > gluster_heal_summary.out; \
gluster volume heal gluvol1 info summary >> gluster_heal_summary.out; \
awk '/Brick/{brick=substr($2,length($2)-1)+0; server=substr($2,1,index($2,":")-1); \
if((server in server_a)==0){server_a[server]=++servers;server_n[servers]=server}} \
/pending/{if(brick>bricks)bricks=brick; pending[server_a[server]"_"brick]=$NF} \
/split/{if(brick>bricks)bricks=brick; splitbrain[server_a[server]"_"brick]=$NF} \
/healing/{if(brick>bricks)bricks=brick; healing[server_a[server]"_"brick]=$NF} \
END{printf("  %-10s %8s:","server","brick");for(i=0;i<=bricks;i++)printf("  %02d",i);printf("\n"); \
for(j=1;j<=servers;j++){printf("  %-10s %8s:",server_n[j],"pending");for(i=0;i<=bricks;i++) \
printf("  %2d",pending[server_a[server_n[j]]"_"i]);printf("\n");} \
for(j=1;j<=servers;j++){printf("  %-10s %8s:",server_n[j],"split");for(i=0;i<=bricks;i++) \
printf("  %2d",splitbrain[server_a[server_n[j]]"_"i]);printf("\n");} \
for(j=1;j<=servers;j++){printf("  %-10s %8s:",server_n[j],"healing");for(i=0;i<=bricks;i++) \
printf("  %2d",healing[server_a[server_n[j]]"_"i]);printf("\n");} \
}' gluster_heal_summary.out; date +\ \ %F\ %T; \
echo ""

@Bockeman
Author

Before attaching the core with gdb, please install the debug package of glusterfs.

gdb /usr/sbin/glusterfsd -c <brick_core>
Run thread apply all bt full and share the output.

On Fedora, I installed the -devel versions, assuming that is what you meant by debug package:
dnf install -y glusterfs-api-devel glusterfs-devel

Where is the <brick_core> located?

@mohit84
Contributor

mohit84 commented Oct 23, 2020

The core is saved at the location configured in /proc/sys/kernel/core_pattern; you need to install the glusterfs-debuginfo package.

@Bockeman
Author

glusterd.log

@Bockeman
Author

The core is saved at the location configured in /proc/sys/kernel/core_pattern; you need to install the glusterfs-debuginfo package.

ls -l /proc/sys/kernel/core_pattern
-rw-r--r-- 1 root root 0 2020-10-23 14:50 /proc/sys/kernel/core_pattern
dnf info glusterfs-debuginfo 
Last metadata expiration check: 0:37:38 ago on Fri 23 Oct 2020 14:14:36 BST.
Error: No matching Packages to list

@mohit84
Contributor

mohit84 commented Oct 23, 2020

cat /proc/sys/kernel/core_pattern

@Bockeman
Author

The full logs exceed the 10MB limit, so I pruned them a bit.

srv-brick05_201023.log
srv-brick06_201023.log
srv-brick07_201023.log

@Bockeman
Author

cat /proc/sys/kernel/core_pattern

cat /proc/sys/kernel/core_pattern 
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
ls -l /usr/lib/systemd/systemd-coredump
-rwxr-xr-x 1 root root 61776 2020-09-21 08:47 /usr/lib/systemd/systemd-coredump

The date on this file does not correspond to yesterday's brick crash.
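
As an aside, the date on the systemd-coredump binary only reflects when that package was installed, not when cores were captured. Since core_pattern pipes cores to systemd-coredump, any recorded crash should be listable and extractable with coredumpctl; a sketch, where <PID> stands for the crashed brick pid (e.g. 4753 from the earlier ANOM_ABEND line):

  coredumpctl list glusterfsd
  coredumpctl dump <PID> --output=core.glusterfsd
  # or open it in gdb directly:
  coredumpctl gdb <PID>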

@mohit84
Contributor

mohit84 commented Oct 23, 2020

Below are the latest brick logs for brick06; I am not seeing any issue in the brick logs.

[2020-10-23 11:44:20.574310] I [MSGID: 100030] [glusterfsd.c:2865:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 7.8 (args: /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick06.pid -S /var/run/gluster/e81ffbefdc824bb9.socket --brick-name /srv/brick06 -l /var/log/glusterfs/bricks/srv-brick06.log --xlator-option -posix.glusterd-uuid=04eb8fdd-ebb8-44c9-9064-5578f43e55b8 --process-name brick --brick-port 49160 --xlator-option gluvol1-server.listen-port=49160)
[2020-10-23 11:44:20.598225] I [glusterfsd.c:2593:daemonize] 0-glusterfs: Pid of current running process is 53399
[2020-10-23 11:44:20.602226] I [socket.c:957:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2020-10-23 11:44:20.604864] I [MSGID: 101190] [event-epoll.c:679:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2020-10-23 11:44:20.604898] I [MSGID: 101190] [event-epoll.c:679:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2020-10-23 11:44:20.721821] I [rpcsvc.c:2689:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2020-10-23 11:44:20.722754] W [socket.c:4161:reconfigure] 0-gluvol1-quota: disabling non-blocking IO
[2020-10-23 11:44:20.762663] I [socket.c:957:__socket_server_bind] 0-socket.gluvol1-changelog: closing (AF_UNIX) reuse check socket 15
[2020-10-23 11:44:20.763042] I [trash.c:2449:init] 0-gluvol1-trash: no option specified for 'eliminate', using NULL
[2020-10-23 11:44:22.347133] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.21"
[2020-10-23 11:44:22.347177] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:dce4fcdc-10f5-45de-8000-f4030618b984-GRAPH_ID:4-PID:1013-HOST:verilate-PC_NAME:gluvol1-client-5-RECON_NO:-6 (version: 6.9) with subvol /srv/brick06
[2020-10-23 11:44:22.350754] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 0
[2020-10-23 11:44:23.384978] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.29"
[2020-10-23 11:44:23.384997] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cead90c8-cf17-4f64-864e-d5fb99e0d21f-GRAPH_ID:4-PID:1025-HOST:veritosh-PC_NAME:gluvol1-client-5-RECON_NO:-9 (version: 6.9) with subvol /srv/brick06
[2020-10-23 11:44:23.386722] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 1
[2020-10-23 11:44:23.388107] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.27"
[2020-10-23 11:44:23.388124] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:c741e6ef-5725-4f23-b4c3-9b7f94be1a01-GRAPH_ID:4-PID:1519-HOST:verirack-PC_NAME:gluvol1-client-5-RECON_NO:-7 (version: 6.9) with subvol /srv/brick06
[2020-10-23 11:44:23.401629] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:23.401649] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:23.401665] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:33c6affe-ce70-4122-9658-289a4ebf0420-GRAPH_ID:0-PID:53485-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06
[2020-10-23 11:44:23.666142] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.18"
[2020-10-23 11:44:23.666179] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:23.666194] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:76a4bc50-328e-45dd-9cdf-99329aec9ad2-GRAPH_ID:0-PID:1277-HOST:veriicon-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.8) with subvol /srv/brick06
[2020-10-23 11:44:23.669918] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:23.669932] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:e0928c38-c13e-4ab5-8d43-b26e397967ee-GRAPH_ID:0-PID:1303-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-3 (version: 7.8) with subvol /srv/brick06
[2020-10-23 11:44:23.686523] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.10"
[2020-10-23 11:44:23.686538] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:fd9a184d-61a0-41ae-bfc5-83c67a7a0497-GRAPH_ID:4-PID:1044-HOST:verialto-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06
[2020-10-23 11:44:23.704923] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.12"
[2020-10-23 11:44:23.704939] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:36ddaf3e-f7e2-4224-a9b8-5b1a6de94a8e-GRAPH_ID:0-PID:1282-HOST:vericalm-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.8) with subvol /srv/brick06
[2020-10-23 11:44:23.705368] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.17"
[2020-10-23 11:44:23.705385] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:7bee7605-6be2-41b3-8fc4-1fc289bade00-GRAPH_ID:4-PID:1030-HOST:veriheat-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06
[2020-10-23 11:44:23.777276] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.11"
[2020-10-23 11:44:23.777291] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:eeda6a67-5c53-44f7-ba1a-b366adaf97fe-GRAPH_ID:4-PID:1240-HOST:veriblob-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.7) with subvol /srv/brick06
[2020-10-23 11:44:25.426054] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:25.426068] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:25.426076] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:21d955c2-e8fd-4a41-847a-38729647bd8e-GRAPH_ID:4-PID:1969-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06
[2020-10-23 11:44:25.434554] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:25.434571] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:25.434579] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cff7c9fe-fbdc-445d-b1c8-c068493de96b-GRAPH_ID:0-PID:53517-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06
[2020-10-23 11:44:27.440119] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19"
[2020-10-23 11:44:27.440211] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
[2020-10-23 11:44:27.440221] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:35535c81-dd54-497c-bfe1-a5784f1c0e65-GRAPH_ID:0-PID:53546-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06

For heal-specific issues we need to check glustershd.log and the heal log glfsheal-gluvol1.log.

@Bockeman
Author

@mohit84 thanks for continuing to try and help me out here.
There are two things:

  1. understanding what has happened and getting information to identify the bug
  2. getting my gluster system back to a working state (it is holding up at the moment using brick06 (10TB) on veriicon, but I have no resilience; I do not have a full backup for all this data -- I had been relying on gluster to avoid that).

Is there anything you can suggest that I might try for the latter?

@Bockeman
Author

Below are the latest brick logs for brick06; I am not seeing any issue in the brick logs.

My fault. These log files exceed the 10MB limit for pasting into this ticket. I stripped them by date, not realising the crash dumps are not prefixed by a date.

  [2020-10-23 11:44:27.440211] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
  [2020-10-23 11:44:27.440221] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:35535c81-dd54-497c-bfe1-a5784f1c0e65-GRAPH_ID:0-PID:53546-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 
  pending frames:
  frame : type(1) op(FSETXATTR)
  frame : type(1) op(LOOKUP)
  frame : type(1) op(OPEN)
  frame : type(1) op(READ)
  frame : type(0) op(0)
  frame : type(1) op(READ)
  patchset: git://git.gluster.org/glusterfs.git
  signal received: 11

I'll see if I can attach a more complete tail of that log file.

@Bockeman
Author

srv-brick06_201023.log

Can you explain why brick06 appears to be online, yet this log shows it has crashed?
Any suggestions for moving this to a working state?

@Bockeman
Author

For heal-specific issues we need to check glustershd.log and the heal log glfsheal-gluvol1.log.

glfsheal-gluvol1.log
glustershd_201023.log

@xhernandez
Contributor

xhernandez commented Oct 23, 2020

dnf info glusterfs-debuginfo 
Last metadata expiration check: 0:37:38 ago on Fri 23 Oct 2020 14:14:36 BST.
Error: No matching Packages to list

You need to run this to install the debug information (assuming you are running the latest version):

dnf debuginfo-install glusterfs-server-7.8-1.fc32.x86_64

@Bockeman
Author

Before attaching a core with gdb, please install the debug package of glusterfs.

  1. gdb /usr/sbin/glusterfsd -c <brick_core>
  2. Run thread apply all bt full and share the output.

I had to do a load of installs:

dnf debuginfo-install glusterfs-server
dnf install -y gdb
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/23/63b517080071fe0d9871c7c1534df99fd7f970.debug
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/39/2de5e09ed27bf2fe1722c0198295777db75ef5.debug
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/63/debfea3b4768cdcfb082e38cd754688642b1ec.debug
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/99/1df1f4a01379a1fd494b9a8fc104c0f02e2a5e.debug
dnf debuginfo-install glibc-2.31-4.fc32.x86_64 keyutils-libs-1.6-4.fc32.x86_64 krb5-libs-1.18.2-22.fc32.x86_64 libacl-2.2.53-5.fc32.x86_64 libaio-0.3.111-7.fc32.x86_64 libattr-2.4.48-8.fc32.x86_64 libcom_err-1.45.5-3.fc32.x86_64 libgcc-10.2.1-5.fc32.x86_64 libselinux-3.0-5.fc32.x86_64 libtirpc-1.2.6-1.rc4.fc32.x86_64 libuuid-2.35.2-1.fc32.x86_64 openssl-libs-1.1.1g-1.fc32.x86_64 pcre2-10.35-7.fc32.x86_64 sssd-client-2.4.0-1.fc32.x86_64 zlib-1.2.11-21.fc32.x86_64

I think I found the core dump:

```
ls -l /var/spool/abrt/ccpp-2020-10-22-14:23:50.21866-346125 \
2>&1 | awk '{print " " $0}'; date +\ \ %F\ %T%n
total 196960
-rw-r----- 1 root abrt         6 2020-10-22 14:23 abrt_version
-rw-r----- 1 root abrt        17 2020-10-22 14:23 analyzer
-rw-r----- 1 root abrt         6 2020-10-22 14:23 architecture
-rw-r----- 1 root abrt        34 2020-10-22 14:23 cgroup
-rw-r----- 1 root abrt       411 2020-10-22 14:23 cmdline
-rw-r----- 1 root abrt         9 2020-10-22 14:23 component
-rw-r----- 1 root abrt     40928 2020-10-22 14:23 core_backtrace
-rw-r----- 1 root abrt 201433088 2020-10-22 14:23 coredump
-rw-r----- 1 root abrt         2 2020-10-23 01:29 count
-rw-r----- 1 root abrt      2216 2020-10-22 14:23 cpuinfo
-rw-r----- 1 root abrt        32 2020-10-22 14:23 crash_function
-rw-r----- 1 root abrt      6348 2020-10-22 14:23 dso_list
-rw-r----- 1 root abrt       281 2020-10-22 14:23 environ
-rw-r----- 1 root abrt        20 2020-10-22 14:23 executable
-rw-r----- 1 root abrt        82 2020-10-22 14:23 exploitable
-rw-r----- 1 root abrt         8 2020-10-22 14:23 hostname
-rw-r----- 1 root abrt       126 2020-10-22 14:23 journald_cursor
-rw-r----- 1 root abrt        22 2020-10-22 14:23 kernel
-rw-r----- 1 root abrt        10 2020-10-23 01:29 last_occurrence
-rw-r----- 1 root abrt      1323 2020-10-22 14:23 limits
-rw-r----- 1 root abrt     35739 2020-10-22 14:23 maps
-rw-r----- 1 root abrt      3809 2020-10-22 14:23 mountinfo
-rw-r----- 1 root abrt      4003 2020-10-22 14:23 open_fds
-rw-r----- 1 root abrt       691 2020-10-22 14:23 os_info
-rw-r----- 1 root abrt        30 2020-10-22 14:23 os_release
-rw-r----- 1 root abrt        25 2020-10-22 14:23 package
-rw-r----- 1 root abrt         6 2020-10-22 14:23 pid
-rw-r----- 1 root abrt         6 2020-10-22 14:23 pkg_arch
-rw-r----- 1 root abrt         1 2020-10-22 14:23 pkg_epoch
-rw-r----- 1 root abrt        19 2020-10-22 14:23 pkg_fingerprint
-rw-r----- 1 root abrt        14 2020-10-22 14:23 pkg_name
-rw-r----- 1 root abrt         6 2020-10-22 14:23 pkg_release
-rw-r----- 1 root abrt        14 2020-10-22 14:23 pkg_vendor
-rw-r----- 1 root abrt         3 2020-10-22 14:23 pkg_version
-rw-r----- 1 root abrt      1326 2020-10-22 14:23 proc_pid_status
-rw-r----- 1 root abrt         1 2020-10-22 14:23 pwd
-rw-r----- 1 root abrt        28 2020-10-22 14:23 reason
-rw-r----- 1 root abrt         1 2020-10-22 14:23 rootdir
-rw-r----- 1 root abrt         4 2020-10-22 14:23 runlevel
-rw-r----- 1 root abrt        10 2020-10-22 14:23 time
-rw-r----- 1 root abrt         4 2020-10-22 14:23 type
-rw-r----- 1 root abrt         1 2020-10-22 14:23 uid
-rw-r----- 1 root abrt         5 2020-10-22 14:23 username
-rw-r----- 1 root abrt        40 2020-10-22 14:23 uuid
2020-10-23 16:29:37
```

I then ran

gdb /usr/sbin/glusterfsd -c /var/spool/abrt/ccpp-2020-10-22-14:23:50.21866-346125/coredump \
-ex "set logging file gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log" -ex "set logging on"

giving the following on-screen output

  GNU gdb (GDB) Fedora 9.1-6.fc32
  Copyright (C) 2020 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>.
  Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.
  
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from /usr/sbin/glusterfsd...
  Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd-7.8-1.fc32.x86_64.debug...
  [New LWP 346228]
  [New LWP 346125]
  [New LWP 346129]
  [New LWP 346128]
  [New LWP 346132]
  [New LWP 346138]
  [New LWP 346141]
  [New LWP 346139]
  [New LWP 346144]
  [New LWP 346227]
  [New LWP 346287]
  [New LWP 346134]
  [New LWP 346135]
  [New LWP 346127]
  [New LWP 346131]
  [New LWP 346137]
  [New LWP 346140]
  [New LWP 346142]
  [New LWP 346143]
  [New LWP 346145]
  [New LWP 346133]
  [New LWP 346136]
  [New LWP 346126]
  [New LWP 346130]
  
  warning: .dynamic section for "/lib64/libnss_sss.so.2" is not at the expected address (wrong library or version mismatch?)
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  Core was generated by `/usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x00007f8814458729 in posix_get_ancestry_non_directory (this=this@entry=0x7f8804008930, leaf_inode=<optimized out>, head=head@entry=0x7f880c682b50, path=path@entry=0x0, type=type@entry=2, op_errno=op_errno@entry=0x7f880c682b4c, 
      xdata=<optimized out>) at posix-inode-fd-ops.c:3218
  3218        list = alloca(size);
  [Current thread is 1 (Thread 0x7f880c684700 (LWP 346228))]
  Copying output to gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log.
  Copying debug output to gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log.
  (gdb) thread apply all bt full
  --Type <RET> for more, q to quit, c to continue without paging--c

where I entered
thread apply all bt full

giving
gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log

@Bockeman
Author

Thanks @mohit84 and @xhernandez
I hope someone can make sense of the coredump info from gdb.

Meanwhile, does anyone have any suggestions for nudging my brick06 back into life?
(The HDD itself looks fine, no SMART errors, and disk usage looks comparable with the working brick06 on veriicon:

```
export start_date=`date +%F\ %T`; \
du -smc /srv/brick06/* \
2>&1 | sort -n | awk '{printf(" %8d\t%s\n",$1,substr($0,index($0,$2)))}'; echo " ${start_date}" && date +\ \ %F\ %T
       0   /srv/brick06/snap
       1   /srv/brick06/ftp
       1   /srv/brick06/mounted.txt
    3587   /srv/brick06/root
   11851   /srv/brick06/var
 8091137   /srv/brick06/vault
 8106574   total
 2020-08-23 15:36:31
 2020-08-23 16:28:17
on veriicon
       0   /srv/brick06/snap
       1   /srv/brick06/ftp
       1   /srv/brick06/mounted.txt
    3587   /srv/brick06/root
   11851   /srv/brick06/var
 8091131   /srv/brick06/vault
 8106568   total
 2020-08-23 15:38:00
 2020-08-23 16:21:56
```
)

@mohit84
Contributor

mohit84 commented Oct 23, 2020

Hi,
Thanks for sharing the coredump. It seems the file has a huge amount of xattr data on the backend, which is why the size shows 263507 (257k). The function is called in an iot_worker thread, and the iot_worker thread stack size is 256k, which is why the brick process crashed.

Any idea why so many xattrs were created on the backend?

Thread 1 (Thread 0x7f880c684700 (LWP 346228)):
#0  0x00007f8814458729 in posix_get_ancestry_non_directory (this=this@entry=0x7f8804008930, leaf_inode=<optimized out>, head=head@entry=0x7f880c682b50, path=path@entry=0x0, type=type@entry=2, op_errno=op_errno@entry=0x7f880c682b4c, xdata=<optimized out>) at posix-inode-fd-ops.c:3218
        remaining_size = 0
        op_ret = <optimized out>
        pathlen = -1
        handle_size = 0
        pgfid = '\000' <repeats 15 times>
        nlink_samepgfid = 0
        stbuf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}
        list = 0x0
        list_offset = 0
        priv = <optimized out>
        size = 263507
        parent = 0x0
        loc = 0x7f87f407ed20
        leaf_path = <optimized out>
        key = '\000' <repeats 4095 times>
        dirpath = '\000' <repeats 4095 times>
        pgfidstr = '\000' <repeats 36 times>
        len = <optimized out>
        __FUNCTION__ = "posix_get_ancestry_non_directory"
#1  0x00007f8814458c9f in posix_get_ancestry (this=this@entry=0x7f8804008930, leaf_inode=<optimized out>, head=head@entry=0x7f880c682b50, path=path@entry=0x0, type=type@entry=2, op_errno=op_errno@entry=0x7f880c682b4c, xdata=0x7f87f4082778) at posix-inode-fd-ops.c:3316
        ret = -1
        priv = <optimized out>
#2  0x00007f8814461c20 in posix_readdirp (frame=0x7f87f407d0f8, this=0x7f8804008930, fd=0x7f87f4081ca8, size=140222300433232, off=0, dict=0x7f87f4082778) at posix-inode-fd-ops.c:5630
        entries = {{list = {next = 0x7f880c682b50, prev = 0x7f880c682b50}, {next = 0x7f880c682b50, prev = 0x7f880c682b50}}, d_ino = 2, d_off = 0, d_len = 3623899760, d_type = 32647, d_stat = {ia_flags = 140222300433296, ia_ino = 140222159465296, ia_dev = 140222518552358, ia_rdev = 206158430232, ia_size = 10846431729114805760, ia_nlink = 208153600, ia_uid = 32648, ia_gid = 3597268480, ia_blksize = 2525381680, ia_blocks = 140221891468232, ia_atime = 0, ia_mtime = 140221891408544, ia_ctime = 140222518550629, ia_btime = 25, ia_atime_nsec = 426807549, ia_mtime_nsec = 32648, ia_ctime_nsec = 4093651256, ia_btime_nsec = 32647, ia_attributes = 10846431729114805760, ia_attributes_mask = 140221890963032, ia_gfid = "8)\000\364\207\177\000\000\001\000\000\000\000\000\000", ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x7f8804008930, inode = 0x7f87f407d0f8, d_name = 0x7f880c682c20 "\250\034\b\364\207\177"}
        op_ret = -1
        op_errno = 0
        entry = 0x0
        __FUNCTION__ = "posix_readdirp"
#3  0x00007f88196ec90b in default_readdirp (frame=0x7f87f407d0f8, this=<optimized out>, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f880400c350
        next_xl = 0x7f8804008930
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#4  0x00007f88196ec90b in default_readdirp (frame=frame@entry=0x7f87f407d0f8, this=<optimized out>, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, off=off@entry=0, xdata=xdata@entry=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f880400e3f0
        next_xl = 0x7f880400c350
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#5  0x00007f881436e929 in br_stub_readdirp (frame=frame@entry=0x7f87f407dbc8, this=0x7f8804010c30, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, offset=offset@entry=0, dict=dict@entry=0x7f87f4082778) at bit-rot-stub.c:2898
        _new = 0x7f87f407d0f8
        old_THIS = 0x7f8804010c30
        next_xl_fn = 0x7f88196ec830 <default_readdirp>
        tmp_cbk = 0x7f88143730d0 <br_stub_readdirp_cbk>
        ret = <optimized out>
        op_errno = <optimized out>
        xref = <optimized out>
        priv = <optimized out>
        __FUNCTION__ = "br_stub_readdirp"
#6  0x00007f88143577d2 in posix_acl_readdirp (frame=frame@entry=0x7f87f407e698, this=0x7f8804012b50, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, offset=offset@entry=0, dict=dict@entry=0x7f87f4082778) at posix-acl.c:1648
        _new = 0x7f87f407dbc8
        old_THIS = 0x7f8804012b50
        next_xl_fn = 0x7f881436e530 <br_stub_readdirp>
        tmp_cbk = 0x7f881435af30 <posix_acl_readdirp_cbk>
        ret = <optimized out>
        alloc_dict = <optimized out>
        __FUNCTION__ = "posix_acl_readdirp"
#7  0x00007f88143166d0 in pl_readdirp (frame=0x7f87f407f168, this=0x7f8804014790, fd=0x7f87f4081ca8, size=0, offset=0, xdata=0x7f87f4082778) at posix.c:3046
        _new = 0x7f87f407e698
        old_THIS = <optimized out>
        next_xl_fn = 0x7f8814357560 <posix_acl_readdirp>
        tmp_cbk = 0x7f881431ef70 <pl_readdirp_cbk>
        __FUNCTION__ = "pl_readdirp"
#8  0x00007f88196ec90b in default_readdirp (frame=0x7f87f407f168, this=<optimized out>, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f88040163b0
        next_xl = 0x7f8804014790
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#9  0x00007f88196ec90b in default_readdirp (frame=0x7f87f407f168, this=<optimized out>, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f8804018490
        next_xl = 0x7f88040163b0
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#10 0x00007f88196ec90b in default_readdirp (frame=frame@entry=0x7f87f407f168, this=<optimized out>, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, off=off@entry=0, xdata=xdata@entry=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f880401a130
        next_xl = 0x7f8804018490
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#11 0x00007f88142c51f1 in up_readdirp (frame=frame@entry=0x7f87f4089e68, this=0x7f880401be00, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, off=off@entry=0, dict=dict@entry=0x7f87f4082778) at upcall.c:1324
        _new = 0x7f87f407f168
        old_THIS = 0x7f880401be00
        next_xl_fn = 0x7f88196ec830 <default_readdirp>
        tmp_cbk = 0x7f88142bc820 <up_readdirp_cbk>
        local = <optimized out>
        __FUNCTION__ = "up_readdirp"
#12 0x00007f88197051bd in default_readdirp_resume (frame=0x7f87f407fc38, this=0x7f880401da80, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2169
        _new = 0x7f87f4089e68
        old_THIS = <optimized out>
        next_xl_fn = 0x7f88142c4fc0 <up_readdirp>
        tmp_cbk = 0x7f88196e6760 <default_readdirp_cbk>
        __FUNCTION__ = "default_readdirp_resume"
#13 0x00007f8819683035 in call_resume (stub=0x7f87f409cea8) at call-stub.c:2555
        old_THIS = 0x7f880401da80
        __FUNCTION__ = "call_resume"
#14 0x00007f88142ad128 in iot_worker (data=0x7f8804056060) at io-threads.c:232
        conf = 0x7f8804056060
        this = 0x7f880401da80
        stub = 0x7f87f409cea8
        sleep_till = {tv_sec = 1603373148, tv_nsec = 770933280}
        ret = <optimized out>
        pri = 0
        bye = <optimized out>
        __FUNCTION__ = "iot_worker"
#15 0x00007f8819582432 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140222300440320, 3244609149766524837, 140222428397518, 140222428397519, 140222159741384, 140222300440320, -3308120404764074075, -3308148579319488603}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#16 0x00007f88194ae913 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

@Bockeman
Author

Any idea why so many xattrs were created on the backend?

A 12TB RAID5 box died (either/both controller and power supply), but the 5 HDDs are OK. I am painstakingly restoring the data from the HDDs onto a gluster volume. I am confident that I am getting this right because of good parity across the HDDs and consistent checksums on a file-by-file basis. The data on this box was an rsnapshot backup, so it contains a lot of hard links.
I have got to the stage where the data files, including all the hard links (millions of them), are restored OK, but the permissions, ownership and timestamps of the dirs and files restored onto gluster are incorrect. Hence I have scripted changing these.

It is conceivable that these scripts, full of chmod, chown and touch for each file in turn, place a burden on gluster. I stated in the first (submission) comment at the top that this was a possible cause.

If running such a script does "create many xattrs on the backend" then this is a likely cause.
(I thought xattrs were to do with the gluster management, not the normal file attributes for permissions, ownership and timestamps. And by backend do you mean the gluster server?)

Why has only one brick crashed? Why was it fine for 5 hours or so?

If this is the cause, then once my gluster volume is back to normal (brick06 on verijolt properly online), I can break up my restore into more manageable chunks. This is a one-off exercise; I will not and do not want to be doing this again!

Given you have a clue as to the cause, how would you suggest I bring brick06 on verijolt back to life?

@xhernandez
Contributor

xhernandez commented Oct 23, 2020

The fastest way I see to fix this is to identify the file that has so many extended attributes and remove/clear them.

To do that, inside gdb, can you execute this:

x/16bx loc->gfid

This will return 16 hexadecimal numbers, like this:

0x7f24df07b034: 0xXX    0xYY    0x--    0x--    0x--    0x--    0x--    0x--
0x7f24df07b03c: 0x--    0x--    0x--    0x--    0x--    0x--    0x--    0x--

You need to take the first two values and go to this directory in server verijolt:

/srv/brick06/.glusterfs/XX/YY

You should find a file there that has all the 16 numbers returned by gdb (with some '-' in the middle).

Once you identify the file, you need to execute this command:

# getfattr -m. -e hex -d <file>

Depending on what this returns, we'll decide how to proceed.
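
Putting those steps together, a minimal sketch (the leading bytes 3a and 7f below are purely hypothetical; substitute whatever gdb actually prints, and fill in the full gfid file name you find):

```
# assuming gdb printed 0x3a 0x7f as the first two bytes (hypothetical values)
ls /srv/brick06/.glusterfs/3a/7f                                    # the entry whose name contains all 16 bytes is the gfid file
getfattr -m. -e hex -d /srv/brick06/.glusterfs/3a/7f/<full-gfid>    # dump its extended attributes
```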

@mohit84
Contributor

mohit84 commented Oct 26, 2020

To prevent a recurrence of the issue you can configure the option "storage.max-hardlinks" to a low value so that a client won't be able to create a hardlink once the limit has been crossed.
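
For example, something like the following (volume name taken from this issue; the value 100 is just the shipped default, adjust to your own needs):

```
gluster volume set gluvol1 storage.max-hardlinks 100   # cap the number of hardlinks per inode
gluster volume get gluvol1 storage.max-hardlinks       # confirm the active value
```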

@amarts
Member

amarts commented Oct 26, 2020

To prevent a recurrence of the issue you can configure the option "storage.max-hardlinks" to a low value so that a client won't be able to create a hardlink once the limit has been crossed.

Should we have a default value for this option? Say 42 (i.e., a sane random value). That way, we can prevent the kind of bad experience Bockeman ran into by prompting an error to the application much earlier. After that, they can decide whether the value needs to be increased or not, depending on their use case.

My suggestion is, however, restrictive: we should keep default options that prevent borderline issues like this and make sure glusterfs provides good performance and stability. Users should alter the options only when they know what their use case is, and that should be allowed, as they will be responsible for that particular use case.

@pranithk
Member

To prevent a recurrence of the issue you can configure the option "storage.max-hardlinks" to a low value so that a client won't be able to create a hardlink once the limit has been crossed.

Won't that make the application unusable on glusterfs?

@xhernandez
Contributor

I don't see the value of this option. If we set it to a lower value, the application creating hardlinks will fail. If the user increases the value (because they are actually creating more hardlinks) we'll have a crash. And even worse, self-heal is unable to heal those files because the size of the xattrs is too big.

What we need to do in this case is to disable gfid2path and any other feature that requires per-hardlink data (besides fixing the crash, of course). Even if we fix the crash and make it possible to handle hundreds of xattrs, it will be very bad from a performance point of view.

@mohit84
Contributor

mohit84 commented Oct 26, 2020

The current default value is 100, and on XFS I tried to create 100 hardlinks but was not able to create more than 47; after the hardlink count reaches 47, setxattr throws the error "No space left on device".
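
A rough way to reproduce that observation (a sketch only: it assumes a FUSE mount of a volume whose bricks sit on XFS with gfid2path enabled; the mount point is a placeholder):

```
mnt=/mnt/testvol                     # hypothetical FUSE mount of the volume
touch "$mnt/hl-test"
for i in $(seq 1 100); do
    ln "$mnt/hl-test" "$mnt/hl-test.$i" || { echo "link $i failed"; break; }
done
```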

{.key = {"max-hardlinks"},
.type = GF_OPTION_TYPE_INT,
.min = 0,
.default_value = "100",
.op_version = {GD_OP_VERSION_4_0_0},
.flags = OPT_FLAG_SETTABLE | OPT_FLAG_DOC,
.tags = {"posix"},
.validate = GF_OPT_VALIDATE_MIN,
.description = "max number of hardlinks allowed on any one inode.\n"
"0 is unlimited, 1 prevents any hardlinking at all."},

I think we need to restrict the maximum value of max-hardlinks; I don't think that restricting/configuring max-hardlinks would make an application unable to use glusterfs.

@xhernandez
Contributor

xhernandez commented Oct 26, 2020

Would it be possible to summarise what went wrong that essentially made gluster unstable?

The segmentation fault happens because we use a stack-allocated buffer to store the contents of the xattrs. This is done in two steps: first we get the needed size, and then we allocate a buffer of that size to store the data. The problem happens because of the combination of 2 things:

  1. The needed size is not checked against a maximum value; we allow any size.
  2. The size of the stack is 256 KiB.

This causes a segmentation fault when trying to allocate more space than is available on the stack.

  • Is there a set of preventative measures that could be adopted to prevent recurrence?

In your particular case I would recommend disabling the gfid2path feature. You also seem to be using quota. Quota works on a per-directory basis, but given you have multiple hardlinks, I'm not sure it makes sense (to which directory should the quota be accounted?). If it's not strictly necessary, I would also disable quota.
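
A sketch of the corresponding commands (volume name taken from this issue; only run the quota part if you really don't need quota):

```
gluster volume set gluvol1 storage.gfid2path off   # stop recording a gfid2path xattr per hardlink
gluster volume quota gluvol1 disable               # optional: drop quota if it is not needed
```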

This doesn't fix the existing issues. Disabling gfid2path will prevent creating additional xattrs for newer files or new hardlinks, but it won't delete the existing ones. We should sweep all the files of each brick and remove them. However, standard tools (getfattr) don't seem to support big xattrs either, so I'm not sure how to do that unless btrfs has some specific tool (I don't know btrfs).

  • Are there more checks and safeguards that could/should be implemented in the code?

Sure. We'll need to improve that in the current code, at least to avoid a crash and return an error if the size is too big.

  • Please could you suggest some steps that could be taken to sanitise and discover things likely to cause failure, including

As I've just commented, we should disable some features and clean existing xattrs, but I don't know how to do that on btrfs if getfattr doesn't work.

  • files that exist on more than one brick

You are using a replica. It's expected to have the same file in more than one brick.

  • files with properties that break current limits (too many hardlinks, too many attributes, ...)

Probably any file that returns an error for getfattr will also have issues in Gluster.

  • dangling gfid files, where the named file has been deleted directly from a brick, but not the corresponding gfid file

You should never do this. It can cause more troubles.

To find them, this command should work:

find <brick root>/.glusterfs/ -type f -links 1

Any file returned inside /.glusterfs/<xx>/<yy>/ with a single link could be removed (be careful to not do this when the volume has load. Otherwise find could incorrectly detect files that are still being created but have not fully completed).

This won't find symbolic links that have been deleted.
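
A slightly more defensive variant of that search (just a sketch: the extra predicates restrict matching to the two-level gfid directories and skip files modified within the last hour, to reduce the risk of catching in-flight creates; review the list before deleting anything):

```
find /srv/brick06/.glusterfs -path '*/.glusterfs/??/??/*' -type f -links 1 -mmin +60
```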

@xhernandez
Contributor

The current default value is 100, and on XFS I tried to create 100 hardlinks but was not able to create more than 47; after the hardlink count reaches 47, setxattr throws the error "No space left on device".

This is caused because you are using XFS and it limits the xattr size to 64KiB. It's not a limitation on the number of hardlinks. XFS and Gluster can create many more. But when we also add an xattr for each hardlink, the limit becomes the xattr size.

Apparently btrfs doesn't have a 64 KiB limit. That's why the issue has happened without detecting any error.

I think we need to restrict the maximum value of max-hardlinks; I don't think that restricting/configuring max-hardlinks would make an application unable to use glusterfs.

If the application needs to create 100 hardlinks, it won't work on Gluster if we don't allow more than 47. So the application won't be usable on Gluster.

@mohit84
Contributor

mohit84 commented Oct 26, 2020

The current default value is 100, and on XFS I tried to create 100 hardlinks but was not able to create more than 47; after the hardlink count reaches 47, setxattr throws the error "No space left on device".

This is caused because you are using XFS and it limits the xattr size to 64KiB. It's not a limitation on the number of hardlinks. XFS and Gluster can create many more. But when we also add an xattr for each hardlink, the limit becomes the xattr size.

Apparently btrfs doesn't have a 64 KiB limit. That's why the issue has happened without detecting any error.

I think we need to restrict the maximum value of max-hardlinks; I don't think that restricting/configuring max-hardlinks would make an application unable to use glusterfs.

If the application needs to create 100 hardlinks, it won't work on Gluster if we don't allow more than 47. So the application won't be usable on Gluster.

Thanks for clarifying it more.

@Bockeman
Author

Much appreciate all comments above from @mohit84 @xhernandez @pranithk and @amarts.
I have been beset with issues, most of which are off-topic, including local disks filling up (e.g. with a 10G bitd.log).
I intend to digest and respond to all comments.

@Bockeman
Author

@mohit84 wrote

To prevent a recurrence of the issue you can configure the option "storage.max-hardlinks" to a low value so that a client won't be able to create a hardlink once the limit has been crossed.

More background info:
In addition to using gluster to provide HA SAN storage for a variety of purposes including cloud storage using Nextcloud, there was a requirement to move the contents from several years of rsnapshot onto one of the volumes.

  • 11TB of disk usage
  • 5,313,170 files
  • 7,556,12 inodes
  • maximum number of hardlinks: 9,774
    • why so many hardlinks?
    • All the data is from multiple users and multiple PCs running windows, rsync'd onto linux via Samba/cifs, so there are no hard links in the original data.
    • hardlinks are created by rsnapshot (rsync) for each run (daily.0, daily.1, ...)
    • additionally, because of a large amount of duplication of data across users/PCs, duplicate files were replaced by hardlinks

Prior to attempting to restore the rsnapshot

gluster volume get gluvol1 storage.max-hardlinks \
2>&1 | awk '{print "  " $0}'; date +\ \ %F\ %T
  Option                                  Value                                   
  ------                                  -----                                   
  storage.max-hardlinks                   10000000                                

I had set max-hardlinks to an aggressively high number, based on the number of files to be transferred (i.e. 10,000,000 being more than 5,313,170), because at the time I did not know the maximum number of hard links actually present (9,774).

Before the crash (brick06 offline, the topic of this issue) all the contents had been transferred onto gluster, apparently successfully because checksums matched the source. However, transferring this data took place over several weeks, not all in one go, and consequently not all of the hardlinks were preserved (not a requirement).

  • 11TB of disk usage
  • 5,475,759 files
  • 941,681 inodes
  • maximum number of hardlinks: 1,213

I am still checking that the data on the gluster target volume matches the source. So far I have not found any discrepancy (apart from differences in permissions, ownership and timestamps). So I am assuming that, in general, gluster is handling inodes with over 1,000 hard links. However, some operations, like healing one inode/file with 908 hardlinks, are stuck.

Am I right to assume that "storage.max-hardlinks" being too low is not the cause of the problem and that having a higher value does nothing to prevent recurrence of the issue?

@Bockeman
Author

@amarts wrote

To prevent a recurrence of the issue you can configure the option "storage.max-hardlinks" to a low value so that a client won't be able to create a hardlink once the limit has been crossed.

Should we have a default value for this option? Say 42 (i.e., a sane random value). That way, we can prevent the kind of bad experience Bockeman ran into by prompting an error to the application much earlier. After that, they can decide whether the value needs to be increased or not, depending on their use case.

My suggestion is, however, restrictive: we should keep default options that prevent borderline issues like this and make sure glusterfs provides good performance and stability. Users should alter the options only when they know what their use case is, and that should be allowed, as they will be responsible for that particular use case.

I agree, my data set is unusual, and I do not mind having to set an option to override any default.
However, I would like to be assured that, whatever limits there are, they are handled gracefully so that the user/admin can make adjustments before there is any risk of corruption.

@pranithk wrote

Won't that make the application unusable on glusterfs?

I'm not sure what "application" you are imagining. Gluster is providing "storage", and if some of that storage contains backup or snapshot data, any user can read it, restore it to their own user area, and run whatever application is desired.

My 11TB of data, some of which has more than 100 hardlinks per file/inode, appears to be usable.

@Bockeman
Author

@xhernandez Thanks for your detailed response.

In your particular case I would recommend disabling the gfid2path feature. You also seem to be using quota. Quota works on a per-directory basis, but given you have multiple hardlinks, I'm not sure it makes sense (to which directory should the quota be accounted?). If it's not strictly necessary, I would also disable quota.

Please could you tell me how to disable the gfid2path feature? I cannot find it in the documentation.

I disabled quota.

I deleted all 809 files that were causing the [Argument list too long] errors that self-healing could not handle.

    files that exist on more than one brick

You are using a replica. It's expected to have the same file in more than one brick.

I mean, files that exist on more than one brick for a given server. Cases like this result in

ls -l /srv/gluvol1/vault/bobw/files/home/.mozilla/firefox/6qw6a8eb.cockpit \
2>&1 | awk '/?/{print "  " $0}'; date +\ \ %F\ %T
  -????????? ? ?    ?             ?                ? broadcast-listeners.json
  -????????? ? ?    ?             ?                ? prefs.js
  2020-10-05 16:17:28
    dangling gfid files, where the named file has been deleted directly from a brick, but not the corresponding gfid file

You should never do this. It can cause more troubles.

Obviously, I do not intentionally delete files directly from bricks, but I have found this is often the only way to resolve certain issues (like split-brain); with manual intervention like this, though, it is always possible that I could make a mistake.

I will attempt to collect getfattr output and hardlink counts for all files on all bricks, but I will need to be careful how I do that. There's no point running getfattr for each file; it only needs to be done once per inode. Given that a "find" on an 11TB subset of all the data takes over 5 hours, this could take days.
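
Something along these lines is what I have in mind (a sketch, untested at this scale; the brick path and output file are placeholders, and the .glusterfs tree is skipped so each inode is visited via one real path):

```
find /srv/brick06 -path '*/.glusterfs' -prune -o -type f -printf '%i %n %p\n' |
    awk '!seen[$1]++' |                       # keep only one path per inode
    while read -r inode links path; do
        echo "== $links links, inode $inode: $path"
        getfattr -m. -e hex -d "$path" 2>&1
    done > /root/brick06_xattrs.txt           # hypothetical output location
```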

@mohit84
Contributor

mohit84 commented Oct 28, 2020

I don't believe the issue is specific only to hardlinks. Gluster populates an xattr on a per-hardlink basis; unless we know which xattrs are populated on the backend, it is difficult to find the reason. As Xavi asked earlier, we need to see the xattrs, but getfattr is failing with the error "Argument list too long", so that is difficult; I am not sure if btrfs provides some tool to fetch this info.

In gluster the default value of storage.max-hardlinks is 100; unless you have changed the value you can't create more than 100 hardlinks, so I am not sure the issue is specific only to hardlinks. As you said, quota is enabled; quota also populates some xattrs, but not too many. I am not sure whether an application has created many custom xattrs on the backend.

For the time being you can disable storage.gfid2path as shown below. After disabling it, an application can still create a hardlink but gluster won't populate a new xattr (gfid2path) for every hardlink. Disabling it only stops the gfid2path xattr; we can't restrict an application that is trying to populate custom xattrs on a file.
gluster v set <volname> storage.gfid2path off

Specific to the brick crash, we have to fix the code path: we need to call MALLOC instead of alloca in case the xattr size is greater than some limit like 64k/128k.

@Bockeman
Author

Bockeman commented Nov 1, 2020

Disaster struck, see #1729 and #1728.
I have now recovered lost data and am able to resume analysis on my gluster data.

I discovered a large number of files with silly permissions, and reported that on #1731

I am nervous about the integrity of my files, any suggestions welcome.

I am continuing with:

I will attempt to collect getfattr output and hardlink counts for all files on all bricks, but I will need to be careful how I do that. There's no point running getfattr for each file; it only needs to be done once per inode. Given that a "find" on an 11TB subset of all the data takes over 5 hours, this could take days.

@Bockeman
Author

Bockeman commented Nov 1, 2020

Perhaps someone could explain the hard-link count shown for each file residing under <brick_root>/.glusterfs/XX/YY/.

Also, why is there a mixture of hard-links and symbolic-links?

           1584                63673  1 drwx------ root     root     20-08-21 15:06:34.6387425020 brick07/.glusterfs/ff/ff
             59              4192010  1 lrwxrwxrwx root     root     20-10-10 20:48:09.9794764880 brick07/.glusterfs/ff/fe/fffefb07-795e-4a27-90da-6db78c897c92
             58              3507907  1 lrwxrwxrwx root     root     20-10-08 19:39:32.8287566040 brick07/.glusterfs/ff/fe/fffedcc4-7293-4fe9-ab34-714a6ba015e5
              0              3555211 17 ---------T bobw     warren   20-10-08 20:48:31.1710559470 brick07/.glusterfs/ff/fe/fffea855-c37e-4691-825f-46caf90e9e28
          47108              1488652  2 -rwxr--r-- bobw     warren   07-11-11 15:07:43.8593750000 brick07/.glusterfs/ff/fe/fffea714-5236-45ee-820d-722fa3332694
            987              1732599  2 -r--r--r-- bobw     warren   20-03-15 19:44:06.5220859100 brick07/.glusterfs/ff/fe/fffea64c-b721-46a7-8d44-5980b3b14f8e
             64               556533  1 lrwxrwxrwx root     root     20-08-21 21:14:42.9929821840 brick07/.glusterfs/ff/fe/fffea161-6b61-41b3-b5ab-1cc17e00d321
              0              1767388  2 -rwxrwxr-x bobw     warren   19-03-10 16:28:47.0000000000 brick07/.glusterfs/ff/fe/fffe8f40-2b58-4ca4-8284-fc869de5cb0c
          32768              1960787  2 -rw-r--r-- apache   apache   20-05-09 21:28:54.0000000000 brick07/.glusterfs/ff/fe/fffe800f-c3ff-40d5-8e37-a7cea0726d87
             71              3766279  1 lrwxrwxrwx root     root     20-10-09 02:17:39.5293480090 brick07/.glusterfs/ff/fe/fffe7192-53d1-4718-bad3-9f9ba764354f
             52              2726222  1 lrwxrwxrwx root     root     20-09-12 21:56:00.1328402300 brick07/.glusterfs/ff/fe/fffe526c-3980-4d1b-85b2-08952c60c80b
              0              3391565  9 ---------T bobw     warren   20-10-08 15:25:04.3622390120 brick07/.glusterfs/ff/fe/fffe3a45-439c-40b4-a29b-d317c4c16fd3
         396400              4500352  2 -rw-r--r-- apache   apache   20-10-12 12:44:13.3103199880 brick07/.glusterfs/ff/fe/fffe2d3b-66fe-459b-99ff-fabf8ef7301f
             57               392951  1 lrwxrwxrwx root     root     20-08-21 18:52:10.6153474860 brick07/.glusterfs/ff/fe/fffe1f2a-6094-43cf-a6d2-5ea4e5a21095
             71              4022210  1 lrwxrwxrwx root     root     20-10-09 09:49:12.9389294170 brick07/.glusterfs/ff/fe/fffe1e86-3b01-45ee-8efa-cd198175b6c7
            825              4783638  2 -rwxrwxrwx bobw     warren   20-10-28 21:30:43.9246942800 brick07/.glusterfs/ff/fe/fffe1327-f179-4cfb-b85b-567a17d7143a
           7553               682312  2 -r--r--r-- bobw     warren   19-08-28 10:40:50.2532174380 brick07/.glusterfs/ff/fe/fffe0b07-bf0c-45dd-8e33-c268fa426d8c
        6119845              1622319  2 -rwxr-xr-x bobw     warren   20-04-19 18:46:56.1955260000 brick07/.glusterfs/ff/fe/fffe084e-4c39-453e-9a2c-50de00a677d2

This means that finding "dangling" gfids (i.e. a gfid file with no corresponding actual file) is more difficult than @xhernandez suggests:

To find them, this command should work:

find <brick root>/.glusterfs/ -type f -links 1

Any file returned inside /.glusterfs/<xx>/<yy>/ with a single link could be removed (be careful to not do this when the volume has load. Otherwise find could incorrectly detect files that are still being created but have not fully completed).

@xhernandez
Contributor

Perhaps someone could explain the hard-link count shown for each file residing under <brick_root>/.glusterfs/XX/YY/.

Also, why is there a mixture of hard-links and symbolic-links?

           1584                63673  1 drwx------ root     root     20-08-21 15:06:34.6387425020 brick07/.glusterfs/ff/ff
             59              4192010  1 lrwxrwxrwx root     root     20-10-10 20:48:09.9794764880 brick07/.glusterfs/ff/fe/fffefb07-795e-4a27-90da-6db78c897c92
             58              3507907  1 lrwxrwxrwx root     root     20-10-08 19:39:32.8287566040 brick07/.glusterfs/ff/fe/fffedcc4-7293-4fe9-ab34-714a6ba015e5
              0              3555211 17 ---------T bobw     warren   20-10-08 20:48:31.1710559470 brick07/.glusterfs/ff/fe/fffea855-c37e-4691-825f-46caf90e9e28
          47108              1488652  2 -rwxr--r-- bobw     warren   07-11-11 15:07:43.8593750000 brick07/.glusterfs/ff/fe/fffea714-5236-45ee-820d-722fa3332694
            987              1732599  2 -r--r--r-- bobw     warren   20-03-15 19:44:06.5220859100 brick07/.glusterfs/ff/fe/fffea64c-b721-46a7-8d44-5980b3b14f8e
             64               556533  1 lrwxrwxrwx root     root     20-08-21 21:14:42.9929821840 brick07/.glusterfs/ff/fe/fffea161-6b61-41b3-b5ab-1cc17e00d321
              0              1767388  2 -rwxrwxr-x bobw     warren   19-03-10 16:28:47.0000000000 brick07/.glusterfs/ff/fe/fffe8f40-2b58-4ca4-8284-fc869de5cb0c
          32768              1960787  2 -rw-r--r-- apache   apache   20-05-09 21:28:54.0000000000 brick07/.glusterfs/ff/fe/fffe800f-c3ff-40d5-8e37-a7cea0726d87
             71              3766279  1 lrwxrwxrwx root     root     20-10-09 02:17:39.5293480090 brick07/.glusterfs/ff/fe/fffe7192-53d1-4718-bad3-9f9ba764354f
             52              2726222  1 lrwxrwxrwx root     root     20-09-12 21:56:00.1328402300 brick07/.glusterfs/ff/fe/fffe526c-3980-4d1b-85b2-08952c60c80b
              0              3391565  9 ---------T bobw     warren   20-10-08 15:25:04.3622390120 brick07/.glusterfs/ff/fe/fffe3a45-439c-40b4-a29b-d317c4c16fd3
         396400              4500352  2 -rw-r--r-- apache   apache   20-10-12 12:44:13.3103199880 brick07/.glusterfs/ff/fe/fffe2d3b-66fe-459b-99ff-fabf8ef7301f
             57               392951  1 lrwxrwxrwx root     root     20-08-21 18:52:10.6153474860 brick07/.glusterfs/ff/fe/fffe1f2a-6094-43cf-a6d2-5ea4e5a21095
             71              4022210  1 lrwxrwxrwx root     root     20-10-09 09:49:12.9389294170 brick07/.glusterfs/ff/fe/fffe1e86-3b01-45ee-8efa-cd198175b6c7
            825              4783638  2 -rwxrwxrwx bobw     warren   20-10-28 21:30:43.9246942800 brick07/.glusterfs/ff/fe/fffe1327-f179-4cfb-b85b-567a17d7143a
           7553               682312  2 -r--r--r-- bobw     warren   19-08-28 10:40:50.2532174380 brick07/.glusterfs/ff/fe/fffe0b07-bf0c-45dd-8e33-c268fa426d8c
        6119845              1622319  2 -rwxr-xr-x bobw     warren   20-04-19 18:46:56.1955260000 brick07/.glusterfs/ff/fe/fffe084e-4c39-453e-9a2c-50de00a677d2

This means that finding "dangling" gfids (i.e. a gfid file with no corresponding actual file) is more difficult than @xhernandez suggests:

To find them, this command should work:
find <brick root>/.glusterfs/ -type f -links 1
Any file returned inside /.glusterfs/<xx>/<yy>/ with a single link could be removed (be careful to not do this when the volume has load. Otherwise find could incorrectly detect files that are still being created but have not fully completed).

As I already said in my comment, this method won't work well if you also have symbolic links. The command only finds regular files with a single hardlink. Since gluster keeps a hardlink between the real file and the gfid in .glusterfs/xx/yy, any regular file inside .glusterfs with a single hardlink means that there is no real file associated with it.

The symbolic links inside .glusterfs may represent real symbolic-link files or directories. Differentiating them is more complex.
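
If it helps to tell them apart, one hedged trick is to look at where each symlink points: the handle for a directory normally resolves back inside .glusterfs to ../../<xx>/<yy>/<parent-gfid>/<basename>, while the handle for a real symlink file shows that symlink's own target. For example, using one of the entries listed above (assuming the brick root is /srv/brick07):

```
readlink /srv/brick07/.glusterfs/ff/fe/fffefb07-795e-4a27-90da-6db78c897c92
# a directory handle typically resolves to ../../<xx>/<yy>/<parent-gfid>/<dirname>
```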

mohit84 added a commit that referenced this issue Nov 4, 2020
#1730)

In case of file is having huge xattrs on backend a brick process is
crashed while alloca(size) limit has been crossed 256k because iot_worker
stack size is 256k.

Use MALLOC to allocate memory instead of using alloca
Fixes: #1699
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: I100468234f83329a7d65b43cbe4e10450c1ccecd
rkothiya pushed a commit that referenced this issue Nov 24, 2020
#1730) (#1828)

In case of file is having huge xattrs on backend a brick process is
crashed while alloca(size) limit has been crossed 256k because iot_worker
stack size is 256k.

Fixes: #1699
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: I100468234f83329a7d65b43cbe4e10450c1ccecd
rkothiya pushed a commit that referenced this issue Nov 27, 2020
#1730) (#1827)

In case of file is having huge xattrs on backend a brick process is
crashed while alloca(size) limit has been crossed 256k because iot_worker
stack size is 256k.

Fixes: #1699
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: I100468234f83329a7d65b43cbe4e10450c1ccecd
@rkothiya added the release 8 label Nov 27, 2020
pranithk pushed a commit to pranithk/glusterfs that referenced this issue May 25, 2021
gluster#1730) (gluster#1827)

In case of file is having huge xattrs on backend a brick process is
crashed while alloca(size) limit has been crossed 256k because iot_worker
stack size is 256k.

Fixes: gluster#1699
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: I100468234f83329a7d65b43cbe4e10450c1ccecd
csabahenk pushed a commit to csabahenk/glusterfs that referenced this issue Mar 7, 2023
gluster#1730)

In case of file is having huge xattrs on backend a brick process is
crashed while alloca(size) limit has been crossed 256k because iot_worker
stack size is 256k.

> Fixes: gluster#1699
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> Change-Id: I100468234f83329a7d65b43cbe4e10450c1ccecd
> (Cherry pick from commit fd666ca)
> (Reviewed on upstream link gluster#1828)

Change-Id: I100468234f83329a7d65b43cbe4e10450c1ccecd
Bug: 1903468
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/220872
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>