changelog: A brick process is getting crash due to SIGSEGV in changelog #3521

Closed
mohit84 opened this issue May 13, 2022 · 0 comments · Fixed by #3522

mohit84 commented May 13, 2022

A brick process crashes while the glusterfind tool is in use. glusterfind relies on the changelog xlator,
which has a race condition in its handling of the crpc object list, so the brick process crashes in the
ev_connector thread.

Below is the volume configuration

Volume Name: vol_test
Type: Replicate
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/bricks/brick1/brick
Brick2: host2:/bricks/brick2/brick
Brick3: host3:/bricks/brick3/brick
Options Reconfigured:
changelog.capture-del-path: on
changelog.changelog: on
storage.build-pgfid: on
performance.io-thread-count: 32
server.event-threads: 16
client.event-threads: 16
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
cluster.granular-entry-heal: enable

Below is the stack dump from the glusterfsd core

warning: .dynamic section for "/lib64/libm.so.6" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib64/libdl.so.2" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib64/libnss_files.so.2" is not at the expected address (wrong library or version mismatch?)

warning: Could not load shared library symbols for %0*Zx, 0x%0*Zx).
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s srv-rhel-gluster03.pjmt.local --volfile-id vol_dados_jc'.
Program terminated with signal 11, Segmentation fault.
#0  list_del (old=0x7ff6ec0009c0) at ../../../../libglusterfs/src/glusterfs/list.h:72
72	    old->prev->next = old->next;
(gdb) bt
#0  list_del (old=0x7ff6ec0009c0) at ../../../../libglusterfs/src/glusterfs/list.h:72
#1  list_move_tail (head=0x7ff760078408, list=0x7ff6ec0009c0) at ../../../../libglusterfs/src/glusterfs/list.h:99
#2  changelog_ev_connector (data=0x7ff760078338) at changelog-ev-handle.c:236
#3  0x00007ff7743b5ea5 in start_thread (arg=0x7ff73bfff700) at pthread_create.c:307
#4  0x00007ff773c7bb0d in __GI_epoll_pwait (epfd=0, events=0x0, maxevents=0, timeout=0, set=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:48
#5  0x0000000000000000 in ?? ()
(gdb) list
67	}
68	
69	static inline void
70	list_del(struct list_head *old)
71	{
72	    old->prev->next = old->next;
73	    old->next->prev = old->prev;
74	
75	    old->next = (void *)0xbabebabe;
76	    old->prev = (void *)0xcafecafe;
(gdb) p old
$1 = (struct list_head *) 0x7ff6ec0009c0
(gdb) p *old
$2 = {next = 0xbabebabe, prev = 0xcafecafe}
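
The poison values in `*old` suggest that this node had already been unlinked by an earlier `list_del` when `list_move_tail` ran on it again, so the second unlink dereferenced `0xcafecafe`. The sketch below is a standalone illustration, not GlusterFS code: it copies the `list_del` body from the listing above, and the helpers `list_init`, `list_add_tail`, `list_is_poisoned`, and `list_del_checked` are assumptions written for this example.

```c
#include <stdint.h>

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* An empty list is a head that points at itself. */
static inline void
list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static inline void
list_add_tail(struct list_head *node, struct list_head *head)
{
    node->next = head;
    node->prev = head->prev;
    head->prev->next = node;
    head->prev = node;
}

/* Same body as the list_del shown in the gdb listing above. */
static inline void
list_del(struct list_head *old)
{
    old->prev->next = old->next;
    old->next->prev = old->prev;

    old->next = (void *)0xbabebabe;
    old->prev = (void *)0xcafecafe;
}

/* Hypothetical helper: true if the node carries the poison markers. */
static inline int
list_is_poisoned(const struct list_head *node)
{
    return (uintptr_t)node->next == 0xbabebabe &&
           (uintptr_t)node->prev == 0xcafecafe;
}

/* Hypothetical helper: refuse to unlink a node that is already unlinked. */
static inline int
list_del_checked(struct list_head *node)
{
    if (list_is_poisoned(node))
        return -1;
    list_del(node);
    return 0;
}
```

After one `list_del`, `list_is_poisoned` reports the same `{next = 0xbabebabe, prev = 0xcafecafe}` state that gdb prints for `old`. A check like `list_del_checked` is only a debugging aid: without a lock, another thread can still unlink the node between the check and the delete.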

mohit84 added a commit to mohit84/glusterfs that referenced this issue May 13, 2022
A brick process crashes while the glusterfind tool is in use.
glusterfind relies on the changelog xlator, which has a race
condition in its handling of the crpc object list, so the brick
crashes in the ev_connector thread.

Solution: The xlator does not take the correct lock when syncing the
          list in the crpc object, so use crpc->lock to protect
          crpc->list.

Fixes: gluster#3521
Change-Id: I13ec8603dc06ecba4cd293cb48012a2ebef55749
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
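
The commit message describes a locking fix: every code path that links or unlinks a node has to hold the same mutex. The following is a minimal sketch of that discipline using pthreads. It is not the patch from #3522. The types `tracker` and `entry`, the `tracker_*` functions, and the choice of a single container-level lock are all assumptions made for illustration, whereas the actual fix uses `crpc->lock`.

```c
#include <pthread.h>
#include <stddef.h>

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

static inline void
list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static inline int
list_empty(const struct list_head *head)
{
    return head->next == head;
}

static inline void
list_add_tail(struct list_head *node, struct list_head *head)
{
    node->next = head;
    node->prev = head->prev;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink and leave the node self-linked, so a repeated unlink is harmless. */
static inline void
list_unlink(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    list_init(node);
}

/* Hypothetical container: one lock guards both lists. */
struct tracker {
    pthread_mutex_t lock;
    struct list_head pending;
    struct list_head active;
};

/* Hypothetical list member. */
struct entry {
    struct list_head list;
    int id;
};

static inline void
tracker_init(struct tracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    list_init(&t->pending);
    list_init(&t->active);
}

/* Queue an entry; the linkage is only touched under t->lock. */
static inline void
tracker_add(struct tracker *t, struct entry *e)
{
    pthread_mutex_lock(&t->lock);
    list_add_tail(&e->list, &t->pending);
    pthread_mutex_unlock(&t->lock);
}

/* Move every pending entry to the active list; returns how many moved. */
static inline int
tracker_activate_all(struct tracker *t)
{
    int moved = 0;

    pthread_mutex_lock(&t->lock);
    while (!list_empty(&t->pending)) {
        struct list_head *node = t->pending.next;
        list_unlink(node);
        list_add_tail(node, &t->active);
        moved++;
    }
    pthread_mutex_unlock(&t->lock);
    return moved;
}

/* Remove a previously added entry under the same lock as the mover. */
static inline void
tracker_drop(struct tracker *t, struct entry *e)
{
    pthread_mutex_lock(&t->lock);
    list_unlink(&e->list);
    pthread_mutex_unlock(&t->lock);
}
```

Because `tracker_activate_all` and `tracker_drop` serialize on `t->lock`, a mover thread can never observe a node that another thread is halfway through unlinking. `tracker_drop` must only be called on entries that were passed to `tracker_add` first, since that is what initializes their linkage.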
xhernandez pushed a commit that referenced this issue Jun 27, 2022