Missing condition when bricks are not already mounted leads to the gluster mount failing. #13
Closed
Conversation
Trigger the gluster client mount on the upstart event "local-filesystems" to make sure the bricks are already mounted
Please submit this patch through review.gluster.org. More details at http://www.gluster.org/community/documentation/index.php/Development_Work_Flow
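The patch's idea can be sketched as an Upstart job stanza. This is a hypothetical sketch, not the submitted patch: the job name, volume name, and mount point below are made up, and only the `start on local-filesystems` ordering reflects the commit.

```
# /etc/init/glusterfs-client-mount.conf  (hypothetical job name and paths)
# Wait for "local-filesystems" so the brick filesystems are mounted
# before the gluster client mount is attempted.
start on local-filesystems
task
exec mount -t glusterfs localhost:/myvol /mnt/gluster
```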
gluster-ant
pushed a commit
that referenced
this pull request
Aug 20, 2018
…re.t

Problem: Line #13 of the test case checks whether the file is present on the first 2 bricks. If it is not present on even one of the bricks, the loop breaks, and the test goes on to check for the dirty marking on the parent on the 3rd brick and for the file being absent from the 1st and 2nd bricks. The following scenario can happen in this case:
- The file gets created on the 1st and 3rd bricks
- In line #13 the loop sees the file missing from the 2nd brick and breaks
- In line #51 the test fails because the file is present on the 1st brick
- In line #53 the test fails because file creation did not fail on a quorum of bricks, so there is no dirty marking on the parent on the 3rd brick

Fix: Don't break out of the loop if the file is present on either brick 1 or brick 2.

Change-Id: I918068165e4b9124c1de86cfb373801b5b432bd9
fixes: bz#1612054
Signed-off-by: karthik-us <ksubrahm@redhat.com>
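The shape of the fix can be sketched in shell (the `.t` tests are shell scripts). This is a hedged stand-in, not the actual test: it counts the file's presence across the first two (simulated) bricks instead of breaking on the first miss, so a copy that landed only on brick 1 is still seen.

```shell
# Hypothetical brick directories standing in for the real brick paths.
brick1=$(mktemp -d)
brick2=$(mktemp -d)
touch "$brick1/afile"            # simulate: file created on brick 1 only

present=0
for b in "$brick1" "$brick2"; do
    # No 'break' on a miss: keep scanning the remaining bricks.
    [ -e "$b/afile" ] && present=$((present + 1))
done
echo "present on $present brick(s)"
rm -rf "$brick1" "$brick2"
```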
amarts
pushed a commit
to amarts/glusterfs_fork
that referenced
this pull request
Sep 11, 2018
…re.t

Problem: Line #13 of the test case checks whether the file is present on the first 2 bricks. If it is not present on even one of the bricks, the loop breaks, and the test goes on to check for the dirty marking on the parent on the 3rd brick and for the file being absent from the 1st and 2nd bricks. The following scenario can happen in this case:
- The file gets created on the 1st and 3rd bricks
- In line #13 the loop sees the file missing from the 2nd brick and breaks
- In line #51 the test fails because the file is present on the 1st brick
- In line #53 the test fails because file creation did not fail on a quorum of bricks, so there is no dirty marking on the parent on the 3rd brick

Fix: Don't break out of the loop if the file is present on either brick 1 or brick 2.

Change-Id: I918068165e4b9124c1de86cfb373801b5b432bd9
fixes: bz#1612054
Signed-off-by: karthik-us <ksubrahm@redhat.com>
gluster-ant
pushed a commit
that referenced
this pull request
Dec 17, 2018
Traceback:

Direct leak of 765 byte(s) in 9 object(s) allocated from:
    #0 0x7ffb9cad2c48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7ffb9c5f8949 in __gf_malloc ./libglusterfs/src/mem-pool.c:136
    #2 0x7ffb9c5f91bb in gf_vasprintf ./libglusterfs/src/mem-pool.c:236
    #3 0x7ffb9c5f938a in gf_asprintf ./libglusterfs/src/mem-pool.c:256
    #4 0x7ffb826714ab in afr_get_heal_info ./xlators/cluster/afr/src/afr-common.c:6204
    #5 0x7ffb825765e5 in afr_handle_heal_xattrs ./xlators/cluster/afr/src/afr-inode-read.c:1481
    #6 0x7ffb825765e5 in afr_getxattr ./xlators/cluster/afr/src/afr-inode-read.c:1571
    #7 0x7ffb9c635af7 in syncop_getxattr ./libglusterfs/src/syncop.c:1680
    #8 0x406c78 in glfsh_process_entries ./heal/src/glfs-heal.c:810
    #9 0x408555 in glfsh_crawl_directory ./heal/src/glfs-heal.c:898
    #10 0x408cc0 in glfsh_print_pending_heals_type ./heal/src/glfs-heal.c:970
    #11 0x408fc5 in glfsh_print_pending_heals ./heal/src/glfs-heal.c:1012
    #12 0x409546 in glfsh_gather_heal_info ./heal/src/glfs-heal.c:1154
    #13 0x403e96 in main ./heal/src/glfs-heal.c:1745
    #14 0x7ffb99bc411a in __libc_start_main ../csu/libc-start.c:308

The dictionary is referenced by the caller to print the status, so set the value as a dynstr; the last unref of the dictionary will then free it.

updates: bz#1633930
Change-Id: Ib5a7cb891e6f7d90560859aaf6239e52ff5477d0
Signed-off-by: Kotresh HR <khiremat@redhat.com>
gluster-ant
pushed a commit
that referenced
this pull request
Jul 24, 2019
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

The problem appears on sparse files, because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error, which is propagated to the parent write; the write is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying the ec-stripe.t file for Test #13 within it.

In this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks.

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1730715
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
gluster-ant
pushed a commit
that referenced
this pull request
Aug 22, 2019
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

The problem appears on sparse files, because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error, which is propagated to the parent write; the write is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying the ec-stripe.t file for Test #13 within it.

In this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks.

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1739427
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
gluster-ant
pushed a commit
that referenced
this pull request
Oct 24, 2019
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

The problem appears on sparse files, because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error, which is propagated to the parent write; the write is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying the ec-stripe.t file for Test #13 within it.

In this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks.

Backport of:
> Patch: https://review.gluster.org/#/c/glusterfs/+/23066/
> Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
> BUG: 1730715
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

(cherry picked from commit b01a435)

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1739451
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
guihecheng
pushed a commit
to guihecheng/glusterfs
that referenced
this pull request
Nov 13, 2019
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

The problem appears on sparse files, because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error, which is propagated to the parent write; the write is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying the ec-stripe.t file for Test #13 within it.

In this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks.

Upstream-patch: https://review.gluster.org/c/glusterfs/+/23066
Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
BUG: 1732779
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/176870
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>
gluster-ant
pushed a commit
that referenced
this pull request
Feb 25, 2020
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

The problem appears on sparse files, because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error, which is propagated to the parent write; the write is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying the ec-stripe.t file for Test #13 within it.

In this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks.

Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
Fixes: bz#1805053
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
This was referenced Mar 12, 2020
dmantipov
added a commit
to dmantipov/glusterfs
that referenced
this pull request
Aug 26, 2021
When handling the RPC_CLNT_DISCONNECT event, glustershd may already be disconnected and removed from the list of services, and an attempt to extract an entry from the empty list causes the following error:

==1364671==ERROR: AddressSanitizer: heap-buffer-overflow on address ...
READ of size 1 at 0x60d00001c48f thread T23
    #0 0x7ff1a5f6db8c in __interceptor_fopen64.part.0 (/lib64/libasan.so.6+0x53b8c)
    #1 0x7ff1a5c63717 in gf_is_service_running libglusterfs/src/common-utils.c:4180
    #2 0x7ff190178ad3 in glusterd_proc_is_running xlators/mgmt/glusterd/src/glusterd-proc-mgmt.c:157
    #3 0x7ff19017ce29 in glusterd_muxsvc_common_rpc_notify xlators/mgmt/glusterd/src/glusterd-svc-mgmt.c:440
    #4 0x7ff190176e75 in __glusterd_muxsvc_conn_common_notify xlators/mgmt/glusterd/src/glusterd-conn-mgmt.c:172
    #5 0x7ff18fee0940 in glusterd_big_locked_notify xlators/mgmt/glusterd/src/glusterd-handler.c:66
    #6 0x7ff190176ec7 in glusterd_muxsvc_conn_common_notify xlators/mgmt/glusterd/src/glusterd-conn-mgmt.c:183
    #7 0x7ff1a5b57b60 in rpc_clnt_handle_disconnect rpc/rpc-lib/src/rpc-clnt.c:821
    #8 0x7ff1a5b58082 in rpc_clnt_notify rpc/rpc-lib/src/rpc-clnt.c:882
    #9 0x7ff1a5b4da47 in rpc_transport_notify rpc/rpc-lib/src/rpc-transport.c:520
    #10 0x7ff18fba1d4f in socket_event_poll_err rpc/rpc-transport/socket/src/socket.c:1370
    #11 0x7ff18fbb223c in socket_event_handler rpc/rpc-transport/socket/src/socket.c:2971
    #12 0x7ff1a5d646ff in event_dispatch_epoll_handler libglusterfs/src/event-epoll.c:638
    #13 0x7ff1a5d6539c in event_dispatch_epoll_worker libglusterfs/src/event-epoll.c:749
    #14 0x7ff1a5917298 in start_thread /usr/src/debug/glibc-2.33-20.fc34.x86_64/nptl/pthread_create.c:481
    #15 0x7ff1a5551352 in clone (/lib64/libc.so.6+0x100352)

0x60d00001c48f is located 12 bytes to the right of 131-byte region [0x60d00001c400,0x60d00001c483)
freed by thread T19 here:
    #0 0x7ff1a5fc8647 in free (/lib64/libasan.so.6+0xae647)

Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: gluster#1000
dmantipov
added a commit
to dmantipov/glusterfs
that referenced
this pull request
Aug 26, 2021
When handling the RPC_CLNT_DISCONNECT event, glustershd may already be disconnected and removed from the list of services, and an attempt to extract an entry from the empty list causes the following error:

==1364671==ERROR: AddressSanitizer: heap-buffer-overflow on address ...
READ of size 1 at 0x60d00001c48f thread T23
    #0 0x7ff1a5f6db8c in __interceptor_fopen64.part.0 (/lib64/libasan.so.6+0x53b8c)
    #1 0x7ff1a5c63717 in gf_is_service_running libglusterfs/src/common-utils.c:4180
    #2 0x7ff190178ad3 in glusterd_proc_is_running xlators/mgmt/glusterd/src/glusterd-proc-mgmt.c:157
    #3 0x7ff19017ce29 in glusterd_muxsvc_common_rpc_notify xlators/mgmt/glusterd/src/glusterd-svc-mgmt.c:440
    #4 0x7ff190176e75 in __glusterd_muxsvc_conn_common_notify xlators/mgmt/glusterd/src/glusterd-conn-mgmt.c:172
    #5 0x7ff18fee0940 in glusterd_big_locked_notify xlators/mgmt/glusterd/src/glusterd-handler.c:66
    #6 0x7ff190176ec7 in glusterd_muxsvc_conn_common_notify xlators/mgmt/glusterd/src/glusterd-conn-mgmt.c:183
    #7 0x7ff1a5b57b60 in rpc_clnt_handle_disconnect rpc/rpc-lib/src/rpc-clnt.c:821
    #8 0x7ff1a5b58082 in rpc_clnt_notify rpc/rpc-lib/src/rpc-clnt.c:882
    #9 0x7ff1a5b4da47 in rpc_transport_notify rpc/rpc-lib/src/rpc-transport.c:520
    #10 0x7ff18fba1d4f in socket_event_poll_err rpc/rpc-transport/socket/src/socket.c:1370
    #11 0x7ff18fbb223c in socket_event_handler rpc/rpc-transport/socket/src/socket.c:2971
    #12 0x7ff1a5d646ff in event_dispatch_epoll_handler libglusterfs/src/event-epoll.c:638
    #13 0x7ff1a5d6539c in event_dispatch_epoll_worker libglusterfs/src/event-epoll.c:749
    #14 0x7ff1a5917298 in start_thread /usr/src/debug/glibc-2.33-20.fc34.x86_64/nptl/pthread_create.c:481
    #15 0x7ff1a5551352 in clone (/lib64/libc.so.6+0x100352)

0x60d00001c48f is located 12 bytes to the right of 131-byte region [0x60d00001c400,0x60d00001c483)
freed by thread T19 here:
    #0 0x7ff1a5fc8647 in free (/lib64/libasan.so.6+0xae647)

Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: gluster#1000
amarts
pushed a commit
that referenced
this pull request
Sep 4, 2021
When handling the RPC_CLNT_DISCONNECT event, glustershd may already be disconnected and removed from the list of services, and an attempt to extract an entry from the empty list causes the following error:

==1364671==ERROR: AddressSanitizer: heap-buffer-overflow on address ...
READ of size 1 at 0x60d00001c48f thread T23
    #0 0x7ff1a5f6db8c in __interceptor_fopen64.part.0 (/lib64/libasan.so.6+0x53b8c)
    #1 0x7ff1a5c63717 in gf_is_service_running libglusterfs/src/common-utils.c:4180
    #2 0x7ff190178ad3 in glusterd_proc_is_running xlators/mgmt/glusterd/src/glusterd-proc-mgmt.c:157
    #3 0x7ff19017ce29 in glusterd_muxsvc_common_rpc_notify xlators/mgmt/glusterd/src/glusterd-svc-mgmt.c:440
    #4 0x7ff190176e75 in __glusterd_muxsvc_conn_common_notify xlators/mgmt/glusterd/src/glusterd-conn-mgmt.c:172
    #5 0x7ff18fee0940 in glusterd_big_locked_notify xlators/mgmt/glusterd/src/glusterd-handler.c:66
    #6 0x7ff190176ec7 in glusterd_muxsvc_conn_common_notify xlators/mgmt/glusterd/src/glusterd-conn-mgmt.c:183
    #7 0x7ff1a5b57b60 in rpc_clnt_handle_disconnect rpc/rpc-lib/src/rpc-clnt.c:821
    #8 0x7ff1a5b58082 in rpc_clnt_notify rpc/rpc-lib/src/rpc-clnt.c:882
    #9 0x7ff1a5b4da47 in rpc_transport_notify rpc/rpc-lib/src/rpc-transport.c:520
    #10 0x7ff18fba1d4f in socket_event_poll_err rpc/rpc-transport/socket/src/socket.c:1370
    #11 0x7ff18fbb223c in socket_event_handler rpc/rpc-transport/socket/src/socket.c:2971
    #12 0x7ff1a5d646ff in event_dispatch_epoll_handler libglusterfs/src/event-epoll.c:638
    #13 0x7ff1a5d6539c in event_dispatch_epoll_worker libglusterfs/src/event-epoll.c:749
    #14 0x7ff1a5917298 in start_thread /usr/src/debug/glibc-2.33-20.fc34.x86_64/nptl/pthread_create.c:481
    #15 0x7ff1a5551352 in clone (/lib64/libc.so.6+0x100352)

0x60d00001c48f is located 12 bytes to the right of 131-byte region [0x60d00001c400,0x60d00001c483)
freed by thread T19 here:
    #0 0x7ff1a5fc8647 in free (/lib64/libasan.so.6+0xae647)

Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: #1000
dmantipov
added a commit
to dmantipov/glusterfs
that referenced
this pull request
Nov 29, 2021
Unconditionally free the serialized dict data in '__glusterd_send_svc_configure_req()'. Found with AddressSanitizer:

==273334==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 89 byte(s) in 1 object(s) allocated from:
    #0 0x7fc2ce2a293f in __interceptor_malloc (/lib64/libasan.so.6+0xae93f)
    #1 0x7fc2cdff9c6c in __gf_malloc libglusterfs/src/mem-pool.c:201
    #2 0x7fc2cdff9c6c in __gf_malloc libglusterfs/src/mem-pool.c:188
    #3 0x7fc2cdf86bde in dict_allocate_and_serialize libglusterfs/src/dict.c:3285
    #4 0x7fc2b8398843 in __glusterd_send_svc_configure_req xlators/mgmt/glusterd/src/glusterd-svc-helper.c:830
    #5 0x7fc2b8399238 in glusterd_attach_svc xlators/mgmt/glusterd/src/glusterd-svc-helper.c:932
    #6 0x7fc2b83a60f1 in glusterd_shdsvc_start xlators/mgmt/glusterd/src/glusterd-shd-svc.c:509
    #7 0x7fc2b83a5124 in glusterd_shdsvc_manager xlators/mgmt/glusterd/src/glusterd-shd-svc.c:335
    #8 0x7fc2b8395364 in glusterd_svcs_manager xlators/mgmt/glusterd/src/glusterd-svc-helper.c:143
    #9 0x7fc2b82e3a6c in glusterd_op_start_volume xlators/mgmt/glusterd/src/glusterd-volume-ops.c:2412
    #10 0x7fc2b835ec5a in gd_mgmt_v3_commit_fn xlators/mgmt/glusterd/src/glusterd-mgmt.c:329
    #11 0x7fc2b8365497 in glusterd_mgmt_v3_commit xlators/mgmt/glusterd/src/glusterd-mgmt.c:1639
    #12 0x7fc2b836ad30 in glusterd_mgmt_v3_initiate_all_phases xlators/mgmt/glusterd/src/glusterd-mgmt.c:2651
    #13 0x7fc2b82d504b in __glusterd_handle_cli_start_volume xlators/mgmt/glusterd/src/glusterd-volume-ops.c:364
    #14 0x7fc2b817465c in glusterd_big_locked_handler xlators/mgmt/glusterd/src/glusterd-handler.c:79
    #15 0x7fc2ce020ff9 in synctask_wrap libglusterfs/src/syncop.c:385
    #16 0x7fc2cd69184f (/lib64/libc.so.6+0x5784f)

Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Fixes: gluster#1000
xhernandez
pushed a commit
that referenced
this pull request
Mar 4, 2022
Unconditionally free the serialized dict data in '__glusterd_send_svc_configure_req()'. Found with AddressSanitizer:

==273334==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 89 byte(s) in 1 object(s) allocated from:
    #0 0x7fc2ce2a293f in __interceptor_malloc (/lib64/libasan.so.6+0xae93f)
    #1 0x7fc2cdff9c6c in __gf_malloc libglusterfs/src/mem-pool.c:201
    #2 0x7fc2cdff9c6c in __gf_malloc libglusterfs/src/mem-pool.c:188
    #3 0x7fc2cdf86bde in dict_allocate_and_serialize libglusterfs/src/dict.c:3285
    #4 0x7fc2b8398843 in __glusterd_send_svc_configure_req xlators/mgmt/glusterd/src/glusterd-svc-helper.c:830
    #5 0x7fc2b8399238 in glusterd_attach_svc xlators/mgmt/glusterd/src/glusterd-svc-helper.c:932
    #6 0x7fc2b83a60f1 in glusterd_shdsvc_start xlators/mgmt/glusterd/src/glusterd-shd-svc.c:509
    #7 0x7fc2b83a5124 in glusterd_shdsvc_manager xlators/mgmt/glusterd/src/glusterd-shd-svc.c:335
    #8 0x7fc2b8395364 in glusterd_svcs_manager xlators/mgmt/glusterd/src/glusterd-svc-helper.c:143
    #9 0x7fc2b82e3a6c in glusterd_op_start_volume xlators/mgmt/glusterd/src/glusterd-volume-ops.c:2412
    #10 0x7fc2b835ec5a in gd_mgmt_v3_commit_fn xlators/mgmt/glusterd/src/glusterd-mgmt.c:329
    #11 0x7fc2b8365497 in glusterd_mgmt_v3_commit xlators/mgmt/glusterd/src/glusterd-mgmt.c:1639
    #12 0x7fc2b836ad30 in glusterd_mgmt_v3_initiate_all_phases xlators/mgmt/glusterd/src/glusterd-mgmt.c:2651
    #13 0x7fc2b82d504b in __glusterd_handle_cli_start_volume xlators/mgmt/glusterd/src/glusterd-volume-ops.c:364
    #14 0x7fc2b817465c in glusterd_big_locked_handler xlators/mgmt/glusterd/src/glusterd-handler.c:79
    #15 0x7fc2ce020ff9 in synctask_wrap libglusterfs/src/syncop.c:385
    #16 0x7fc2cd69184f (/lib64/libc.so.6+0x5784f)

Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Fixes: #1000
csabahenk
pushed a commit
to csabahenk/glusterfs
that referenced
this pull request
Mar 7, 2023
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again.

The problem appears on sparse files, because a write to an offset implicitly creates data on offsets below it (so, in some way, they are overlapping). For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's).

So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that half of the bricks process the write before the read, and the other half do the read before the write. Some bricks will return 10 bytes of data while the others will return 0 bytes (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs more than half consistent answers to recover the data. So this read fails with an EIO error, which is propagated to the parent write; the write is aborted and EIO is returned to the application.

The issue happened because EC assumed that a write to a given offset implies that offsets below it exist. This fix prevents the read of the chunk from the bricks if the current size of the file is smaller than the read chunk offset. This size is correctly tracked, so this fixes the issue. Also modifying the ec-stripe.t file for Test #13 within it.

In this patch, if the file size is less than the offset we are writing to, we fill zeros in the head and tail and do not consider it a stripe cache miss. That actually makes sense, as we know what data that part holds and there is no need to read it from the bricks.

Upstream-patch: https://review.gluster.org/c/glusterfs/+/23066
Change-Id: Ic342e8c35c555b8534109e9314c9a0710b6225d6
BUG: 1732779
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/176870
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>
This was referenced Oct 16, 2023
mohit84
added a commit
to mohit84/glusterfs
that referenced
this pull request
Oct 20, 2023
The client throws the stacktrace below when ASan is enabled. The client hits the issue when an application calls removexattr on a 2x1 subvol while the non-mds subvol is down. As the stacktrace shows, dht_setxattr_mds_cbk calls dht_setxattr_non_mds_cbk, and dht_setxattr_non_mds_cbk wipes local because call_cnt is 0, but dht_setxattr_mds_cbk then tries to access frame->local, which is why it crashes.

x621000051c34 is located 1844 bytes inside of 4164-byte region [0x621000051500,0x621000052544)
freed by thread T7 here:
    #0 0x7f916ccb9388 in __interceptor_free.part.0 (/lib64/libasan.so.8+0xb9388)
    #1 0x7f91654af204 in dht_local_wipe /root/glusterfs_new/glusterfs/xlators/cluster/dht/src/dht-helper.c:713
    #2 0x7f91654af204 in dht_setxattr_non_mds_cbk /root/glusterfs_new/glusterfs/xlators/cluster/dht/src/dht-common.c:3900
    #3 0x7f91694c1f42 in client4_0_removexattr_cbk /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:1061
    #4 0x7f91694ba26f in client_submit_request /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client.c:288
    #5 0x7f91695021bd in client4_0_removexattr /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:4480
    #6 0x7f91694a5f56 in client_removexattr /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client.c:1439
    #7 0x7f91654a1161 in dht_setxattr_mds_cbk /root/glusterfs_new/glusterfs/xlators/cluster/dht/src/dht-common.c:3979
    #8 0x7f91694c1f42 in client4_0_removexattr_cbk /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:1061
    #9 0x7f916cbc4340 in rpc_clnt_handle_reply /root/glusterfs_new/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:723
    #10 0x7f916cbc4340 in rpc_clnt_notify /root/glusterfs_new/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:890
    #11 0x7f916cbb7ec5 in rpc_transport_notify /root/glusterfs_new/glusterfs/rpc/rpc-lib/src/rpc-transport.c:504
    #12 0x7f916a1aa5fa in socket_event_poll_in_async /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2358
    #13 0x7f916a1bd7c2 in gf_async ../../../../libglusterfs/src/glusterfs/async.h:187
    #14 0x7f916a1bd7c2 in socket_event_poll_in /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2399
    #15 0x7f916a1bd7c2 in socket_event_handler /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2790
    #16 0x7f916a1bd7c2 in socket_event_handler /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2710
    #17 0x7f916c946d22 in event_dispatch_epoll_handler /root/glusterfs_new/glusterfs/libglusterfs/src/event-epoll.c:614
    #18 0x7f916c946d22 in event_dispatch_epoll_worker /root/glusterfs_new/glusterfs/libglusterfs/src/event-epoll.c:725
    #19 0x7f916be8cdec in start_thread (/lib64/libc.so.6+0x8cdec)

Solution: Use a switch instead of an if statement to wind an operation; with a switch, the code will not try to access local after winding an operation for the last dht subvol.

Fixes: gluster#3732
Change-Id: I031bc814d6df98058430ef4de7040e3370d1c677
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
mohit84 added a commit that referenced this pull request on Oct 23, 2023.
With XFS bricks, there is a race condition at boot: if the brick filesystems are not yet mounted, glusterd fails to start up.
Adding the "local-filesystems" upstart event to the mounting script schedules the glusterfs mountpoint to be mounted only after all bricks are available.
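The change can be sketched as an upstart job stanza. This is a hypothetical fragment for illustration (the file name and the exact mount command in the actual patch may differ); the key part is the `start on local-filesystems` line, which defers the job until mountall has finished mounting all local filesystems, including the XFS bricks.

```
# Hypothetical /etc/init/mounting-glusterfs.conf fragment.
description "Mount glusterfs volumes after local filesystems (bricks) are up"

# "local-filesystems" is emitted once all local filesystems are mounted,
# so the client mount no longer races the brick mounts.
start on local-filesystems
task

script
    # Mount all fstab entries of type glusterfs.
    mount -a -t glusterfs
end script
```

Without the event condition, the job could run while the XFS bricks were still unmounted, and the gluster client mount would fail as described in the issue title.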