io-stat: xlator Segmentation fault #3902

mohit84 · 2022-11-13T15:26:27Z

The process crashes while calling ios_bump_stats in the cbk code path. After checking the code, it appears the crash occurs because the function ios_destroy_top_stats destroys the ios_stat_head list without taking the list mutex when a clear profile event is received from the client. If the list is accessed at the same time, the process can crash.

Change-Id: I1b4d56517fa405eb84da7fffca61e15530652204
Solution: Destroy the ios_stat under the list mutex.
Fixes: #3901
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
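The locking pattern described above can be illustrated with a minimal, self-contained sketch. This is not the io-stats code: `stat_node`, `stat_list`, `list_bump`, `list_destroy`, `list_add` and `run_demo` are hypothetical names chosen for illustration, and plain pthread mutexes stand in for whatever locking primitives the xlator actually uses. The point it shows is that the destroy path and the update path take the same mutex, so an updater can never walk nodes that are being freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the stats list; not the real GlusterFS types. */
struct stat_node {
    struct stat_node *next;
    unsigned long hits;
};

struct stat_list {
    pthread_mutex_t lock;   /* guards head and every node reachable from it */
    struct stat_node *head;
};

/* Update path (analogous to bumping stats in a callback). */
static void list_bump(struct stat_list *l)
{
    pthread_mutex_lock(&l->lock);
    for (struct stat_node *n = l->head; n != NULL; n = n->next)
        n->hits++;
    pthread_mutex_unlock(&l->lock);
}

/* Clear path: unlink and free every node while holding the same lock. */
static void list_destroy(struct stat_list *l)
{
    pthread_mutex_lock(&l->lock);
    struct stat_node *n = l->head;
    l->head = NULL;
    while (n != NULL) {
        struct stat_node *next = n->next;
        free(n);
        n = next;
    }
    pthread_mutex_unlock(&l->lock);
}

/* Push one zero-initialised node onto the list; returns 0 on success. */
static int list_add(struct stat_list *l)
{
    struct stat_node *n = calloc(1, sizeof(*n));
    if (n == NULL)
        return -1;
    pthread_mutex_lock(&l->lock);
    n->next = l->head;
    l->head = n;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

static void *bump_worker(void *arg)
{
    for (int i = 0; i < 100000; i++)
        list_bump(arg);
    return NULL;
}

/* Runs an updater thread concurrently with repeated add/destroy cycles.
 * Returns 0 if everything was set up and the list ends up empty. */
int run_demo(void)
{
    struct stat_list l = { .head = NULL };
    pthread_t t;

    if (pthread_mutex_init(&l.lock, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, bump_worker, &l) != 0) {
        pthread_mutex_destroy(&l.lock);
        return -1;
    }
    for (int i = 0; i < 1000; i++) {
        if (list_add(&l) != 0)
            break;
        list_destroy(&l);
    }
    pthread_join(t, NULL);
    list_destroy(&l);
    pthread_mutex_destroy(&l.lock);
    return l.head == NULL ? 0 : -1;
}
```

Calling `run_demo()` from a `main` function and building with `-pthread` exercises both paths concurrently. If the lock and unlock calls are removed from `list_destroy`, the updater may dereference freed nodes, which is the class of crash this pull request describes.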


mohit84 · 2022-11-13T15:26:37Z

/run regression

The process crashes while calling ios_bump_stats in the cbk code path. After checking the code, it appears the crash occurs because the function ios_destroy_top_stats destroys the ios_stat_head list without taking the list mutex when a clear profile event is received from the client. If the list is accessed at the same time, the process can crash.

Solution: Destroy the ios_stat under the list mutex.

> Change-Id: I1b4d56517fa405eb84da7fffca61e15530652204
> Fixes: gluster#3901
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> (Cherry picked from commit 3e874d0)
> (Reviewed on upstream link gluster#3902)

Change-Id: I1b4d56517fa405eb84da7fffca61e15530652204
Fixes: gluster#3901
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>


Changelog from upstream:

commit 1806d64310627e4f7c945a631a33d977114ce6fd
Author: Shwetha Acharya <sacharya@redhat.com>
Date: Thu Apr 6 14:12:25 2023 +0530

Add GlusterFS 10.4 release notes (#4101)

* Add GlusterFS 10.4 release notes

Updates: #4100
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@gmail.com>

commit e4f4a20684c7dd37e7359e87f5069c381f9ff67a
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Wed Apr 5 08:40:16 2023 +0200

snapview-server: mark the end of the directory (#4050)

Several Gluster components expect that op_errno is set to ENOENT when there are no more entries in a directory being read. Previously, snapview-server returned EINVAL in this case, causing an infinite loop in some cases.

Updates: #4029
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 34e1b5cc7090afc5802998fe4ca483c639129e61
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Tue Apr 4 16:52:25 2023 +0200

tests: update tests to match current devel branch (#4089)

Many fixes have been applied to many tests in devel branch. This patch backports all these fixes to release-10 branch.

Updates: #4020
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit e0f740ca974fc756db488c5d72b38db0ac5bd0ec
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Tue Apr 4 07:48:36 2023 +0200

snapview-server: make timestamps stable (#4075)

In the previous implementation, when the mtime, ctime and atime of an snapshot virtual directory was requested, the returned time was the current time. Apparently, the old versions of kernel's nfs client did ignore this change during a readdir operation. However, newer versions are checking it and retrying the whole readdir operation when these times differ from the previous request (I guess that it assumes that the directory contents have been changed and tries to get the new contents). This causes a long delay or even an infinite loop.

The optimal change would be to keep the time of modification and changes in the inode context of the virtual directories to always return stable and consistent data, but this requires a significant amount of changes. For now, just return a constant value for these specific entries.

Fixes: #4071
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 2e975916119ce4867fe49208030532118de32bb1
Author: mohit84 <moagrawa@redhat.com>
Date: Tue Apr 4 11:06:24 2023 +0530

core: fix potential deadlock in gf_print_trace (#3914)

It is unsafe when entering the signal handler gf_print_trace with setting logger as gf_logger_syslog. The fatal reason is that syslog will be called to print trace. However, non-reentrant function 'malloc' is involved in such a procedure.

Solution: Skip print when logger is set as gf_logger_syslog.

> Change-Id: Ica454d01c7aebaad5a1412e7b19c533567fe486c
> Fixes: #3882
> Signed-off-by: ChenJinhao chen.jinhao@zte.com.cn
> (Cherry picked from commit 0639931bfd265eda02970ea22282637da6ca80f8)
> (Reviewed on upstream link gluster/glusterfs#3898)

Change-Id: Ica454d01c7aebaad5a1412e7b19c533567fe486c
Fixes: #3882
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

commit 11129e0029500f5f76a973e33b7c49b3c3fde811
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Fri Mar 31 10:19:24 2023 +0200

glusterd: generate unique names in /tmp (#4077)

Glusterd generates some files inside /tmp. In general that shouldn't be a problem. However during the regression test run, some tests start several glusterd instances in the same machine. When this happens, there's a chance that two processes try to update the same file at the same time, causing errors and spurious test failures.

This patch forces that the filename generated in /tmp is different for each process, avoiding this problem.

Updates: #4020
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 3d507a2ca57cbad9f35b2fc4e5cc1303c812d8bc
Author: Shwetha Acharya <sacharya@redhat.com>
Date: Thu Mar 30 13:26:05 2023 +0530

test: ./tests/bugs/posix/bug-1651445.t is failing while running test suite (#3696) (#3771)

The ./tests/bugs/posix/bug-1651445.t is getting failed continuously while running test suite. The test case is failing after reaching a situation while brick is throwing an ENOSPC error and after cleanup, as the test case is trying to create a file it is failing. The file creation is failing because the flag (disk_space_full) is reset after every 5s by a thread posix_ctx_disk_thread_proc. The test case is failing also in centos-8 because LVM reserved more space in centos-8 as compare to centos-7

Solution:
1) After cleanup data wait for 5s to reset the flag. Earlier the test case did the same but it was changed by the patch(#3637).
2) Change the overwrite condition in posix_writev.
3) In case of centos-8 call 2nd dd command with low block size.

>Fixes: #3695
>Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: Ifa0310ba9266651557e29480f5ea476016726e41
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
Co-authored-by: mohit84 <moagrawa@redhat.com>

commit 0cbf51a9827af0e3a35f5cfa823bfa39740bbc58
Author: mohit84 <moagrawa@redhat.com>
Date: Thu Mar 30 13:02:19 2023 +0530

fuse: Resolve asan bug in during receive event notification (#4024)

The fuse xlator notify function tries to assign data object to graph object without checking an event. In case of upcall event data object represents upcall object so during access of graph object the process crashed for asan build.

Solution: Access the graph->id only while an event is associated specifically to fuse xlator

> Fixes: #3954
> Change-Id: I6b2869256b26d22163879737dcf163510d1cd8bf
> Signed-off-by: Mohit Agrawal moagrawa@redhat.com
> (Reviewed on upstream link #4019)

Fixes: #3954
Change-Id: I6b2869256b26d22163879737dcf163510d1cd8bf

commit 21381797d743b75aab47209f2500bb3007710f56
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Wed Mar 29 07:40:50 2023 +0200

posix-aio: fix iocb contents for writev (#4037)

The structure defined for iocb for user-space contains a union with a vector based substructure that seems created to pass iovec-based operations to the kernel. However this structure is not supported by the kernel and the library doesn't translate it.

In little-endian architectures, this structure is binary compatible with the one expected by the kernel, but this is not true for big-endian architectures.

To avoid this problem, instead of using the iovec-based substructure, the common structure is used to also pass the vectors.

Fixes: #4031
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 7d5f9eb132b08b0b32c94316803eb9cf9f32f900
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Tue Mar 28 14:14:20 2023 +0200

afr: fix size of stack-allocated array (#4054)

One array allocated using alloca() was not using the right size, corrupting adjacent memory when the array was used.

Also updated the related tests since they were not working correctly. The test tried to pass some data through a variable that was created in a child subshell, so the variable was empty in the parent.

To implement the same functionality but supporting passing data between subshells, a new test command has been created: TEST_WITHIN. It works like TEST but if the test is not successful, it waits for some time before marking the test as bad. Once the test succeeds, whatever data the test has returned during its execution will be available in the variable TEST_OUTPUT.

Fixes: #4042
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 1cd35434f22df2528040aae417c5823a48e750d7
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Mon Mar 27 15:38:14 2023 +0200

posix: fix directory gfid handle if a rename fails (#4052)

* posix: fix directory gfid handle if a rename fails

When a directory is renamed to a non-empty existing directory, the rename will fail. However, the gfid handle of the old directory was removed before attempting the rename, at it was not restored in case of failure.

This patch only removes the gfid handle once the rename has succeeded.

Fixes: #2752
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 87432778bb0f103c8ea1bdb358228e94e65ade72
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Mon Mar 27 15:36:45 2023 +0200

configure: force 'char' type to be signed (#4039)

On some systems, the 'char' type is interpreted as an unsigned char. This may cause some issues as Gluster code assumes that 'char' is signed.

This patch adds the '-fsigned-char' option during compilation to make sure it works as expected.

Updates: #1000
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit c267f1a6ea283fe49b4b9aa887dcc26314cad7eb
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Mon Mar 27 15:34:35 2023 +0200

nfs: fix ACL encoding for big-endian archs (#4041)

Encoding and decoding ACLs from binary data in gNFS was done without taking into account the endianess of the machine.

* Also Fix FreeBSD include issue

Updates: #4020
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 2222522e296523d071b2a37cfba735cb8b5d6c29
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Mon Mar 20 13:06:41 2023 +0100

bug-1650403.t: increase timemout

This tests creates 5 volums and enbales and disables self-heal 50 time for each one. Each enable/disable operation takes around 1 second, so the total time this will take is around 5 * 50 * 2 = 500 seconds.

The script timeout is set to 500 seconds, which is too close to the required time, causing a lot of spurious failures.

Updates: #4020
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit f060ba80a092d2d07af5ca4f298dcfb9561fd62b
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Sun Mar 19 15:47:00 2023 +0100

hashfn: fix inconsistencies on big-endian architectures (#4014)

The computation of the SuperFashHash function did assume that the code was run on a little-endian machine, causing a different result when it's run on a big-endian machine.

This patch explicitly accesses the memory using little-endian mode to keep backwards compatibility but to produce the same result on big-endian architectures.

Fixes: #3345
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 4f51c1c868115ce38b46c5f8393c7f04ca8df015
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Sun Mar 19 15:46:21 2023 +0100

xdr: fix stack overflow when processing glx_dir(p)list structures (#4013)

The glx_dirlist and glx_dirplist XDR structures are defined as a linked list of objects. When rpcgen generates code to encode and decode these structures, it does so by implementing a simple recursive function. When the list of entries to encode is large, the recursive call could cause a stack overflow.

The best way to fix this is by transforming the linked list into a variable length array in the XDR definition. However this would break backward compatibility, making this change impossible to backport to older releases.

This fix uses a hack by implementing custom encoding/decoding functions that don't use recursivity and ignores the ones generated by rpcgen.

Fixes: #3346
Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>

commit 2b13a69b36ff9ddcea2f40e93dc6e3cf3aff6920
Author: Xavi Hernandez <xhernandez@gmail.com>
Date: Tue Jan 24 07:48:31 2023 +0100

mem-pool: fix memory corruption in debug builds (#3923)

* mem-pool.c: improve invalidate() function

1. Reduce padding of the invalid struct (from 40 bytes to 32 bytes)
2. Reduce no. of calls to memcpy()

This changes slightly the way the memory is invalidated, but I hope now memcpy() calls are all aligned.

Change-Id: I236a478d61b0e41bb01cdd90bd94a155ae40ef19
Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* memory accounting - reduce code in non DEBUG build

- gf_mem_update_acct_info() is not needed when not in DEBUG mode
- re-order variables in the structure according to access pattern
- Turn xlator_mem_acct_unref() into xlator_mem_acct_destroy() and call it only when refcnt euqals 0 - which is quite rare.

Change-Id: I5fcae603da943320bfbe5596a9403b1d91dfccd2
Updates: #3855
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* caa_unlikely/likely for some common validations

While it would have been better to remove redundant validations, a start would be to decorate them with caa_likely/unlikely as a hint to the compiler (and the reader) about the probability of an 'if' statement.

Change-Id: Icc743d45ef5737665e5dffe008b525a168a8867b
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* mem-pool: fix memory corruption in debug builds

In debug builds, all dynamically allocated memory blocks of the same type are kept in a list to quickly identify them if necessary. This list is updated inside a critical region protected by a mutex.

When a memory block is resized using realloc(), there's a chance that the returned pointer has been moved to another memory address. If this happens, the list was updated to point to the new location.

However, the resize and the list update are not atomic, which means that a removal or modification of a memory block adjacent (in the list) to the resized one, will see the pointer to the old memory address, which is not valid anymore. This causes a use-after-free issue that corrupts memory.

This patch removes the memory block from the list before resizing it to avoid issues with concurrent list accesses. Once the resize is complete, the memory block is re-added to the list.

Fixes: #3659
Change-Id: I64730998414b9d3695947d73ba993fad340a6582
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Co-authored-by: Yaniv Kaul <ykaul@redhat.com>

commit 90af4d22cea721a0768c79ff0f1fcae3b50f8641
Author: mohit84 <moagrawa@redhat.com>
Date: Tue Nov 15 19:25:04 2022 +0530

io-stat: xlator Segmentation fault (#3904)

The process is getting crashed during call ios_bump_stats in cbk code path. After checked the code it seems the process is getting crashed because the ios_stat_head list is destroyed by the function ios_destroy_top_stats without taking a list mutex while receive a clear profile event from the client. If at the same time a process is trying to access the list it can be crash.

Solution: Destroy the ios_stat under the list mutex.

> Change-Id: I1b4d56517fa405eb84da7fffca61e15530652204
> Fixes: #3901
> Signed-off-by: Mohit Agrawal moagrawa@redhat.com
> (Cherry picked from commit 3e874d0b50f474f90861b58d391b17a0a7c6e343)
> (Reviewed on upstream link gluster/glusterfs#3902)

Change-Id: I1b4d56517fa405eb84da7fffca61e15530652204
Fixes: #3901
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

commit 1ed25f589657e70d392fa8760feacc821dbc5cf5
Author: Anoop C S <anoopcs@cryptolab.net>
Date: Mon Nov 7 11:20:15 2022 +0530

api: Use opendir for directories in glfs_open and glfs_h_open (#3894)

In addition to files glfs_open() and glfs_h_open() are capable of handling directory opens. But there are various other components like DHT and probably other client xlators which are tightly coupled to work with directory opens using just opendir. One such example is the case where fsetxattr() is called with a file descriptor opened for the directory using glfs_open() or glfs_h_open() resulting in EBADFD.

Therefore we make a differentiation within these APIs to correctly call syncop_open() or syncop_opendir() for file and directory entries respectively to avoid any possible file descriptor errors.

Credits: Xavi Hernandez <xhernandez@redhat.com>
Signed-off-by: Anoop C S <anoopcs@cryptolab.net>
Signed-off-by: Anoop C S <anoopcs@cryptolab.net>

commit 7c5316111a11249e259378b982fc5193b2daac92
Author: Shwetha Acharya <sacharya@redhat.com>
Date: Tue Sep 27 12:27:52 2022 +0530

posix: posix xlator does not respects storage.reserve value (#3637) (#3805)

* posix: posix xlator does not respects storage.reserve value

In a small storage environment (brick_root is < 100G) the POSIX xlator does not respect the storage.reserve value.The flag value is set after every 5s basis and so in that window if the client has generated the data the posix xlator does not validate storage.reserve spacee check and allow client to consume the brick space unless the flag has not been set by a posixctxres thread.

Solution: Before doing any writev for an external client check the current free storage space with writev buffer and if it has surpassed the limit return ENOSPC. The priv->write_value parameter has been updated during call unlink and truncate fop also to use the correct value.

>Fixes: #3636
>Change-Id: I7e174553c22893dd44438f48406e895e13b5db5e
>Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

* posix: Resolve reviewer comments

>Fixes: #3636
>Change-Id: I569b8e5d96f138204d25e9753a92cb19135bd584
>Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

* posix: Calculate file written size based on (pre|post)op block size difference to avoid overwrite cases.

>Fixes: #3636
>Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Change-Id: I87efee72e9cdbd1a20df30b07a6e2587ce0675a6
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
Co-authored-by: mohit84 <moagrawa@redhat.com>

Signed-off-by: Alexander Schreiber <als@thangorodrim.de>

amarts approved these changes Nov 14, 2022


mohit84 requested a review from xhernandez November 14, 2022 04:47

xhernandez approved these changes Nov 14, 2022


xhernandez merged commit 1ec7504 into gluster:devel Nov 14, 2022

mohit84 mentioned this pull request Nov 15, 2022

io-stat: xlator Segmentation fault #3904

Merged

mohit84 mentioned this pull request Nov 15, 2022

io-stat: xlator Segmentation fault #3905

Merged
