
[bug:1707195] VM stuck in shutdown because of a pending fuse request #1080

Closed
gluster-ant opened this issue Mar 12, 2020 · 4 comments
Labels: Migrated, Type:Bug, wontfix (Managed by stale[bot])

@gluster-ant (Collaborator)

URL: https://bugzilla.redhat.com/1707195
Creator: rgowdapp at redhat
Time: 20190507T02:48:00

The VM fails to shut down and gets stuck in 'Powering down' status. This is because its 'qemu-kvm' process goes into a zombie/defunct state:

more ps-Ll.txt
F S UID PID PPID LWP C PRI NI ADDR SZ WCHAN TTY TIME CMD
6 Z 107 20631 1 20631 0 80 0 - 0 do_exi ? 8:45 [qemu-kvm]
3 D 107 20631 1 20635 0 80 0 - 2386845 fuse_r ? 1:12 [qemu-kvm]

The customer has collected a crash dump of the affected VM and also statedumps from all the glusterfs processes running on this machine while the problem is present.

Thread ID 20635 is the one of interest:

crash> bt 20635
PID: 20635 TASK: ffff9ed3926eb0c0 CPU: 7 COMMAND: "IO iothread1"
#0 [ffff9ec8e351fa28] __schedule at ffffffff91967747
#1 [ffff9ec8e351fab0] schedule at ffffffff91967c49
#2 [ffff9ec8e351fac0] __fuse_request_send at ffffffffc09d24e5 [fuse]
#3 [ffff9ec8e351fb30] fuse_request_send at ffffffffc09d26e2 [fuse]
#4 [ffff9ec8e351fb40] fuse_send_write at ffffffffc09dbc76 [fuse]
#5 [ffff9ec8e351fb70] fuse_direct_io at ffffffffc09dc0d6 [fuse]
#6 [ffff9ec8e351fc58] __fuse_direct_write at ffffffffc09dc562 [fuse]
#7 [ffff9ec8e351fca8] fuse_direct_IO at ffffffffc09dd3ca [fuse]
#8 [ffff9ec8e351fd70] generic_file_direct_write at ffffffff913b8663
#9 [ffff9ec8e351fdc8] fuse_file_aio_write at ffffffffc09ddbd5 [fuse]
#10 [ffff9ec8e351fe60] do_io_submit at ffffffff91497a73
#11 [ffff9ec8e351ff40] sys_io_submit at ffffffff91497f40
#12 [ffff9ec8e351ff50] tracesys at ffffffff9197505b (via system_call)
RIP: 00007f9ff0758697 RSP: 00007f9db86814b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: ffffffffffffffff
RDX: 00007f9db86814d0 RSI: 0000000000000001 RDI: 00007f9ff268e000
RBP: 0000000000000080 R8: 0000000000000080 R9: 000000000000006a
R10: 0000000000000078 R11: 0000000000000246 R12: 00007f9db86814c0
R13: 0000560264b9b518 R14: 0000560264b9b4f0 R15: 00007f9db8681bb0
ORIG_RAX: 00000000000000d1 CS: 0033 SS: 002b

From the core, this is the file the above process is writing to:

crash> files -d 0xffff9ec8e8f9f740
DENTRY INODE SUPERBLK TYPE PATH
ffff9ec8e8f9f740 ffff9ed39e705700 ffff9ee009adc000 REG /rhev/data-center/mnt/glusterSD/172.16.20.21:_vmstore2/e5dd645f-88bb-491c-9145-38fa229cbc4d/images/8e84c1ed-48ba-4b82-9882-c96e6f260bab/29bba0a1-6c7b-4358-9ef2-f8080405778d

So in this case we're accessing the vmstore2 volume.

This is the glusterfs process:

root 4863 0.0 0.0 1909580 49316 ? S<sl Apr04 25:51 /usr/sbin/glusterfs --volfile-server=172.16.20.21 --volfile-server=172.16.20.22 --volfile-server=172.16.20.23 --volfile-id=/vmstore2 /rhev/data-center/mnt/glusterSD/172.16.20.21:_vmstore2

Backtrace of this process in the crash dump:

crash> bt 4863
PID: 4863 TASK: ffff9edfa9ff9040 CPU: 11 COMMAND: "glusterfs"
#0 [ffff9ed3a332fc28] __schedule at ffffffff91967747
#1 [ffff9ed3a332fcb0] schedule at ffffffff91967c49
#2 [ffff9ed3a332fcc0] futex_wait_queue_me at ffffffff9130cf76
#3 [ffff9ed3a332fd00] futex_wait at ffffffff9130dc5b
#4 [ffff9ed3a332fe48] do_futex at ffffffff9130f9a6
#5 [ffff9ed3a332fed8] sys_futex at ffffffff9130fec0
#6 [ffff9ed3a332ff50] system_call_fastpath at ffffffff91974ddb
RIP: 00007f6e5eeccf47 RSP: 00007ffdd311c7d0 RFLAGS: 00000246
RAX: 00000000000000ca RBX: 00007f6e59496700 RCX: ffffffffffffffff
RDX: 0000000000001308 RSI: 0000000000000000 RDI: 00007f6e594969d0
RBP: 00007f6e60552780 R8: 0000000000000000 R9: 00007f6e5e6e314d
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6e59496d28
R13: 0000000000000000 R14: 0000000000000006 R15: 00007ffdd311c920
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b

We have a few pending frames in this process. Reviewing the corresponding statedump:

grep complete=0 glusterdump.4863.dump.1556091368 -c
7

Looking for these pending frames in the statedump:


[global.callpool.stack.1]
stack=0x7f6e4007c828
uid=107
gid=107
pid=20635
unique=5518502
lk-owner=bd2351a6cc7fcb8b
op=WRITE
type=1
cnt=6

[global.callpool.stack.1.frame.1]
frame=0x7f6dec04de38
ref_count=0
translator=vmstore2-write-behind
complete=0
parent=vmstore2-open-behind
wind_from=default_writev_resume
wind_to=(this->children->xlator)->fops->writev
unwind_to=default_writev_cbk

[global.callpool.stack.1.frame.2]
frame=0x7f6dec0326f8
ref_count=1
translator=vmstore2-open-behind
complete=0
parent=vmstore2-md-cache
wind_from=mdc_writev
wind_to=(this->children->xlator)->fops->writev
unwind_to=mdc_writev_cbk

[global.callpool.stack.1.frame.3]
frame=0x7f6dec005bf8
ref_count=1
translator=vmstore2-md-cache
complete=0
parent=vmstore2-io-threads
wind_from=default_writev_resume
wind_to=(this->children->xlator)->fops->writev
unwind_to=default_writev_cbk

[global.callpool.stack.1.frame.4]
frame=0x7f6e400ab0f8
ref_count=1
translator=vmstore2-io-threads
complete=0
parent=vmstore2
wind_from=io_stats_writev
wind_to=(this->children->xlator)->fops->writev
unwind_to=io_stats_writev_cbk

[global.callpool.stack.1.frame.5]
frame=0x7f6e4007c6c8
ref_count=1
translator=vmstore2
complete=0
parent=fuse
wind_from=fuse_write_resume
wind_to=FIRST_CHILD(this)->fops->writev
unwind_to=fuse_writev_cbk

[global.callpool.stack.1.frame.6]
frame=0x7f6e4002cb98
ref_count=1
translator=fuse
complete=0

So I believe the request is pending in the 'write-behind' translator. I'd appreciate some help figuring out the cause of the hang.

Thank you,

Natalia

@gluster-ant (Collaborator, Author)

Time: 20190507T02:48:45
rgowdapp at redhat commented:
I do see a write request hung in write-behind. Details of the write request from the statedump:

[xlator.performance.write-behind.wb_inode]
path=/e5dd645f-88bb-491c-9145-38fa229cbc4d/images/8e84c1ed-48ba-4b82-9882-c96e6f260bab/29bba0a1-6c7b-4358-9ef2-f8080405778d
inode=0x7f6e40060888
gfid=6348d15d-7b17-4993-9da9-3f588c2ad5a8
window_conf=1048576
window_current=0
transit-size=0
dontsync=0

[.WRITE]
unique=5518502
refcount=1
wound=no
generation-number=0
req->op_ret=131072
req->op_errno=0
sync-attempts=0
sync-in-progress=no
size=131072
offset=4184756224
lied=0
append=0
fulfilled=0
go=0

I'll go through this and will try to come up with an RCA.

--- Additional comment from Raghavendra G on 2019-04-29 07:21:50 UTC ---

There is a race in the way O_DIRECT writes are handled. Assume two overlapping write requests w1 and w2.

  • w1 is issued and is in the wb_inode->wip queue, as the response from the bricks is still pending. Also, the wb_request_unref in wb_do_winds has not yet been invoked:

      list_for_each_entry_safe (req, tmp, tasks, winds) {
              list_del_init (&req->winds);

              if (req->op_ret == -1) {
                      call_unwind_error_keep_stub (req->stub, req->op_ret,
                                                   req->op_errno);
              } else {
                      call_resume_keep_stub (req->stub);
              }

              wb_request_unref (req);
      }
    
  • w2 is issued and wb_process_queue is invoked. w2 is not picked up for winding as w1 is still in wb_inode->wip. w2 is added to the todo list and wb_writev for w2 returns.

  • The response to w1 is received and invokes wb_request_unref. Assume the wb_request_unref in wb_do_winds (see point 1) has not been invoked yet. Since there is still one more reference, the wb_request_unref in wb_writev_cbk of w1 doesn't remove w1 from wip.

  • wb_process_queue is invoked as part of wb_writev_cbk of w1. But, it fails to wind w2 as w1 is still in wip.

  • wb_request_unref is invoked on w1 as part of wb_do_winds. w1 is removed from all queues, including wip.

  • After this point there is no invocation of wb_process_queue unless a new request is issued by the application, so w2 remains hung until the next request.

This bug is similar to bz 1626780 and bz 1379655. Though the issue is similar, the fixes for those two bzs won't fix the current bug, hence this bug is not a duplicate. This bug will require a new fix and I'll post a patch to gerrit shortly.
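
To make the sequence above concrete, here is a minimal, self-contained sketch (not the actual GlusterFS source: the single-slot wip/todo "queues", the function bodies and the fixed flag are illustrative only; only the names wb_request_unref, wb_process_queue, wb_writev_cbk, wb_do_winds, wip and todo mirror the ones discussed above). It models why the extra reference held by wb_do_winds keeps w1 in wip when the reply callback runs, and the kind of change the patch title describes: taking the request out of the wip list in wb_writev_cbk itself, so that the wb_process_queue it triggers can wind w2.

/* Illustrative sketch only (not GlusterFS source): one in-flight request
 * (w1, in "wip") and one queued request (w2, in "todo") show why the
 * ordering of the wb_request_unref calls matters. */
#include <stdio.h>
#include <stdlib.h>

struct wb_request {
        const char *name;
        int         refcount;
};

static struct wb_request *wip;   /* request wound to the brick, reply pending */
static struct wb_request *todo;  /* request waiting until wip drains          */

/* Drop a reference; only the last unref unlinks the request from wip. */
static void
wb_request_unref (struct wb_request *req)
{
        if (--req->refcount == 0) {
                if (wip == req)
                        wip = NULL;
                free (req);
        }
}

/* Wind the queued request, but only if nothing overlapping is still in wip. */
static void
wb_process_queue (void)
{
        if (wip != NULL) {
                printf ("process_queue: %s still in wip, cannot wind the "
                        "queued request\n", wip->name);
                return;
        }
        if (todo != NULL) {
                printf ("process_queue: winding %s\n", todo->name);
                todo = NULL;
        }
}

/* Reply for w1 arrives. With the fix, the callback itself unlinks w1 from
 * wip instead of relying on what may not be the final unref. */
static void
wb_writev_cbk (struct wb_request *req, int fixed)
{
        if (fixed && wip == req)
                wip = NULL;
        wb_request_unref (req);  /* not the last ref: wb_do_winds holds one */
        wb_process_queue ();     /* without the fix, w2 stays blocked here  */
}

int
main (void)
{
        int fixed = 1;           /* set to 0 to reproduce the hang */

        struct wb_request *w1 = malloc (sizeof (*w1));
        w1->name = "w1";
        w1->refcount = 2;        /* one ref for wip, one held by wb_do_winds */
        wip = w1;

        struct wb_request w2 = { "w2", 1 };
        todo = &w2;              /* w2 could not be wound while w1 was in wip */

        wb_writev_cbk (w1, fixed);  /* reply to w1 races with wb_do_winds */
        wb_request_unref (w1);      /* wb_do_winds drops its reference, but
                                       never re-runs wb_process_queue */

        printf ("w2 %s\n", todo ? "is stuck until the next application request"
                                : "was wound successfully");
        return 0;
}

With fixed = 0 the program reports that w2 stays queued even after w1 is finally unreferenced, mirroring the hang seen in the statedumps; with fixed = 1 the callback clears the wip entry first and w2 is wound. The real patch (https://review.gluster.org/22668) operates on the actual write-behind request lists, removing the request from the wip list in wb_writev_cbk as its title states.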

@gluster-ant (Collaborator, Author)

Time: 20190507T02:50:50
bugzilla-bot at gluster.org commented:
REVIEW: https://review.gluster.org/22668 (performance/write-behind: remove request from wip list in wb_writev_cbk) posted (#1) for review on release-6 by Raghavendra G

stale bot commented Oct 8, 2020

Thank you for your contributions.
We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

The stale bot added the wontfix label on Oct 8, 2020.

stale bot commented Oct 23, 2020

Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.

The stale bot closed this issue as completed on Oct 23, 2020.