
dht: fix use after free issue in dht_setxattr_mds_cbk #3744

Closed

Conversation

@mohit84 (Contributor) commented Aug 25, 2022

This is a day-one bug from when the feature was first implemented. The issue was not caught earlier because the client crash depends on the environment. The client was crashing because the wind path did not pass a cookie, while the callback tried to access one.

Solution: Pass an xlator_t as the cookie via STACK_WIND_COOKIE to avoid the crash.

Fixes: #3732
Change-Id: I5fd75e90a94d852093dfa40fb37d4211b4e9fdfb
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
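
A minimal sketch of the described fix (illustrative; the argument list follows the xattrop wind shown in the review thread below, not necessarily the exact merged hunk):

    /* Sketch: pass an xlator_t explicitly as the cookie while winding, so
     * the callback can identify the subvolume from the cookie instead of
     * relying on an implicit child frame. */
    STACK_WIND_COOKIE(frame, dht_common_mds_xattrop_cbk, local->mds_subvol,
                      local->mds_subvol, local->mds_subvol->fops->xattrop,
                      &local->loc, GF_XATTROP_ADD_ARRAY, xattrop, NULL);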

@mohit84 (Contributor, Author) commented Aug 25, 2022

/run regression

@mohit84 (Contributor, Author) commented Aug 25, 2022

/run regression

@gluster-ant (Collaborator)

1 test(s) failed
./tests/basic/distribute/spare_file_rebalance.t

0 test(s) generated core

8 test(s) needed retry
./tests/000-flaky/basic_distribute_rebal-all-nodes-migrate.t
./tests/000-flaky/basic_mount-nfs-auth.t
./tests/000-flaky/bugs_distribute_bug-1117851.t
./tests/000-flaky/bugs_glusterd_bug-857330/normal.t
./tests/000-flaky/bugs_glusterd_bug-857330/xml.t
./tests/000-flaky/bugs_nfs_bug-1116503.t
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/basic/distribute/spare_file_rebalance.t

7 flaky test(s) marked as success even though they failed
./tests/000-flaky/basic_distribute_rebal-all-nodes-migrate.t
./tests/000-flaky/basic_mount-nfs-auth.t
./tests/000-flaky/bugs_distribute_bug-1117851.t
./tests/000-flaky/bugs_glusterd_bug-857330/normal.t
./tests/000-flaky/bugs_glusterd_bug-857330/xml.t
./tests/000-flaky/bugs_nfs_bug-1116503.t
./tests/000-flaky/glusterd-restart-shd-mux.t
https://build.gluster.org/job/gh_centos7-regression/2744/

@mohit84 (Contributor, Author) commented Aug 26, 2022

/run regression

mohit84 requested a review from amarts August 26, 2022 11:58
amarts previously approved these changes Aug 29, 2022
    -    STACK_WIND(frame, dht_common_mds_xattrop_cbk, local->mds_subvol,
    -               local->mds_subvol->fops->xattrop, &local->loc,
    -               GF_XATTROP_ADD_ARRAY, xattrop, NULL);
    +    STACK_WIND_COOKIE(frame, dht_common_mds_xattrop_cbk, frame,
Contributor:

Isn't this frame the same one that the callback receives as its first argument? Why do we need to pass it as a cookie? We could just change the callback to use the frame instead of the cookie.

However, it seems to me that the callbacks are using the cookie to get the name of the subvolume where the request was sent. In that case, not passing the cookie is the right thing to do, because when STACK_WIND() is called it assigns the new frame created for the subvolume to the cookie, so the callback receives the frame of the subvolume, not its own.

I'm not sure what problem you saw, but maybe it was caused because the frame was destroyed by the subvolume itself. I don't know.

Anyway, I don't like this implicit way of passing frames that are already completed and could potentially be destroyed. Why not just pass local->mds_subvol as the cookie? It should be enough for the purposes of the callback.
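
In code, that suggestion amounts to reading the cookie as an xlator in the callback. A sketch with the standard fop-cbk signature (result aggregation and the unwind are omitted):

    /* Callback side of the suggestion: the cookie is the target subvolume
     * itself, valid regardless of any frame's lifetime. */
    static int32_t
    dht_common_mds_xattrop_cbk(call_frame_t *frame, void *cookie,
                               xlator_t *this, int32_t op_ret,
                               int32_t op_errno, dict_t *dict, dict_t *xdata)
    {
        xlator_t *mds_subvol = cookie; /* the subvol passed at wind time */

        if (op_ret < 0)
            gf_msg_debug(this->name, op_errno, "xattrop failed on %s",
                         mds_subvol->name);

        /* result aggregation and the eventual unwind are omitted here */
        return 0;
    }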

Contributor (Author):

The problem was that the callback function was trying to access the volume name from the cookie, but we were calling the callback through STACK_WIND, not STACK_WIND_COOKIE. Yes, there are two ways to solve the issue: either use STACK_WIND_COOKIE or change the callback function. The callback uses the cookie only to print the mds subvol name, which we can also access through frame->local.
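
The frame->local alternative mentioned above would look roughly like this (a sketch; dht_local_t and its mds_subvol field come from dht-common.h, the log line is illustrative):

    /* Alternative: keep the plain STACK_WIND and let the callback fetch the
     * mds subvol from its own local state instead of the cookie. */
    dht_local_t *local = frame->local;

    if (op_ret < 0)
        gf_msg_debug(this->name, op_errno, "xattrop failed on mds subvol %s",
                     local->mds_subvol->name);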

Contributor:

STACK_WIND also initializes the cookie to point to the frame created for the subvolume, so the code wasn't entirely wrong:

    #define STACK_WIND_COMMON(frame, rfn, has_cookie, cky, obj, fn, params...)     \
        do {                                                                       \
            call_frame_t *_new = NULL;                                             \
            xlator_t *old_THIS = NULL;                                             \
            typeof(fn) next_xl_fn = fn;                                            \
                                                                                   \
            _new = mem_get0(frame->root->pool->frame_mem_pool);                    \
            if (caa_unlikely(!_new)) {                                             \
                break;                                                             \
            }                                                                      \
            typeof(fn##_cbk) tmp_cbk = rfn;                                        \
            _new->root = frame->root;                                              \
            _new->parent = frame;                                                  \
            LOCK_INIT(&_new->lock);                                                \
            /* (void *) is required for avoiding gcc warning */                    \
            _new->cookie = ((has_cookie == 1) ? (void *)(cky) : (void *)_new);     \
So what the code was doing was correct; the cookie had the correct value. I'm not sure what code path leads to a use-after-free, given that we always call callbacks synchronously from the child xlator. I'll check it.

However, in your code you pass frame as the cookie. frame is the current frame, not the frame created for the subvolume, so when cookie->this->name is used, the name will be dht's name, not the name of the subvolume, which was the intention.
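
For reference, the usual idiom for recovering the subvolume after a plain STACK_WIND (where, per the macro above, the cookie is the child frame) is roughly:

    /* With plain STACK_WIND the cookie is the child frame created by the
     * macro, so the subvolume that served the request is cookie->this
     * (sketch only; the log line is illustrative). */
    call_frame_t *prev = cookie;
    xlator_t *subvol = prev->this;

    gf_msg_debug(this->name, 0, "reply received from %s", subvol->name);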

Contributor:

After looking at the stack traces from the address sanitizer, I would say that the problem is that a removexattr request has been unwound, which will eventually destroy the entire stack, while a request sent to a subvolume was still pending.

However I haven't identified where this can happen yet.

Contributor (Author):

> After looking at the stack traces from the address sanitizer, I would say that the problem is that a removexattr request has been unwound, which will eventually destroy the entire stack, while a request sent to a subvolume was still pending.
>
> However I haven't identified where this can happen yet.

Thanks for identifying it. I will check it again and update.

@SoumyaWind

Hi @mohit84,

I am working on CVE-2022-48340, so I just want to check: is there any update on this, or any other commit that can fix this issue?

@rohitkuma1313

Any update on this? @mohit84

The client throws the below stacktrace when ASan is enabled. The client hits the issue when an application calls removexattr on a 2x1 subvol while the non-mds subvol is down. As the stacktrace shows, dht_setxattr_mds_cbk ends up invoking dht_setxattr_non_mds_cbk synchronously; dht_setxattr_non_mds_cbk wipes local because call_cnt is 0, and dht_setxattr_mds_cbk then accesses frame->local, which is why it crashed.

0x621000051c34 is located 1844 bytes inside of 4164-byte region [0x621000051500,0x621000052544)
freed by thread T7 here:
    #0 0x7f916ccb9388 in __interceptor_free.part.0 (/lib64/libasan.so.8+0xb9388)
    #1 0x7f91654af204 in dht_local_wipe /root/glusterfs_new/glusterfs/xlators/cluster/dht/src/dht-helper.c:713
    #2 0x7f91654af204 in dht_setxattr_non_mds_cbk /root/glusterfs_new/glusterfs/xlators/cluster/dht/src/dht-common.c:3900
    #3 0x7f91694c1f42 in client4_0_removexattr_cbk /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:1061
    #4 0x7f91694ba26f in client_submit_request /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client.c:288
    #5 0x7f91695021bd in client4_0_removexattr /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:4480
    #6 0x7f91694a5f56 in client_removexattr /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client.c:1439
    #7 0x7f91654a1161 in dht_setxattr_mds_cbk /root/glusterfs_new/glusterfs/xlators/cluster/dht/src/dht-common.c:3979
    #8 0x7f91694c1f42 in client4_0_removexattr_cbk /root/glusterfs_new/glusterfs/xlators/protocol/client/src/client-rpc-fops_v2.c:1061
    #9 0x7f916cbc4340 in rpc_clnt_handle_reply /root/glusterfs_new/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:723
    #10 0x7f916cbc4340 in rpc_clnt_notify /root/glusterfs_new/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:890
    #11 0x7f916cbb7ec5 in rpc_transport_notify /root/glusterfs_new/glusterfs/rpc/rpc-lib/src/rpc-transport.c:504
    #12 0x7f916a1aa5fa in socket_event_poll_in_async /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2358
    #13 0x7f916a1bd7c2 in gf_async ../../../../libglusterfs/src/glusterfs/async.h:187
    #14 0x7f916a1bd7c2 in socket_event_poll_in /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2399
    #15 0x7f916a1bd7c2 in socket_event_handler /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2790
    #16 0x7f916a1bd7c2 in socket_event_handler /root/glusterfs_new/glusterfs/rpc/rpc-transport/socket/src/socket.c:2710
    #17 0x7f916c946d22 in event_dispatch_epoll_handler /root/glusterfs_new/glusterfs/libglusterfs/src/event-epoll.c:614
    #18 0x7f916c946d22 in event_dispatch_epoll_worker /root/glusterfs_new/glusterfs/libglusterfs/src/event-epoll.c:725
    #19 0x7f916be8cdec in start_thread (/lib64/libc.so.6+0x8cdec)
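
The failing pattern, reconstructed from the trace and the description above (a sketch with simplified, partly hypothetical names; see dht-common.c for the real code):

    /* Inside dht_setxattr_mds_cbk the wind to a non-mds subvol is chosen by
     * a chain of if-statements. When the subvol is down the callback runs
     * synchronously, call_cnt drops to 0 and frame->local is wiped, so the
     * NEXT if-condition reads freed memory. */
    if (local->fop == GF_FOP_SETXATTR) {
        STACK_WIND(frame, dht_setxattr_non_mds_cbk, subvol,
                   subvol->fops->setxattr, &local->loc, local->xattr,
                   local->flags, local->xattr_req);
    }
    if (local->fop == GF_FOP_REMOVEXATTR) { /* use-after-free if wound above */
        STACK_WIND(frame, dht_setxattr_non_mds_cbk, subvol,
                   subvol->fops->removexattr, &local->loc, local->key,
                   local->xattr_req);
    }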

Solution: Use a switch statement instead of a chain of if statements to wind the operation; with a switch, the code does not access local again after winding the operation for the last dht subvol.
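
Under the same assumptions as the sketch above, the switch-based fix would look roughly like this: local->fop is evaluated once to select the case, and nothing owned by local is touched after the wind.

    /* Fixed pattern: select the fop before winding; after STACK_WIND there
     * are no further reads of local, so a synchronous wipe of frame->local
     * in the callback is harmless. */
    switch (local->fop) {
        case GF_FOP_SETXATTR:
            STACK_WIND(frame, dht_setxattr_non_mds_cbk, subvol,
                       subvol->fops->setxattr, &local->loc, local->xattr,
                       local->flags, local->xattr_req);
            break;
        case GF_FOP_REMOVEXATTR:
            STACK_WIND(frame, dht_setxattr_non_mds_cbk, subvol,
                       subvol->fops->removexattr, &local->loc, local->key,
                       local->xattr_req);
            break;
        default:
            break;
    }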

Fixes: #3732
Change-Id: I031bc814d6df98058430ef4de7040e3370d1c677
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
@gluster-ant (Collaborator)

!!! Couldn't read commit file !!!

Successfully merging this pull request may close these issues:

AddressSanitizer: heap-use-after-free