io.FileIO hang all threads if fstat blocks on inaccessible NFS server #76367
Comments
Using io.FileIO can hang all threads when accessing an inaccessible NFS server.

To reproduce this, you need to open the file like this:

    fd = os.open(filename, ...)
    fio = io.FileIO(fd, "r+", closefd=True)

Inside fileio_init, there is a checkfd call, which calls fstat without releasing the GIL.

The expected behavior is to block only the thread that is blocked on the system call.

Here is the log showing this issue, created with the attached reproducer:

    # python fileio_nfs_test.py mnt/fileio.out dumbo.tlv.redhat.com

Everything hangs now! After some time I ran a command from another shell, and the script became unblocked and finished:

    2017-11-30 18:45:29,683 - (MainThread) - OK

We have a canary thread logging every second. Once we tried to open ...

And here is the backtrace of the hung process in the kernel:

    # cat /proc/3436/stack

You cannot attach to the process with gdb, since it is in D state, but once ...

    Thread 2 (Thread 0x7f97a2ea5700 (LWP 4799)):
    Thread 1 (Thread 0x7f97ac10d740 (LWP 4798)):

Looking at the Python code, there are two helpers in fileio.c that call fstat.
Both helpers are called from fileio_init (the implementation of io.FileIO()).

Reported by an RHV user, see https://bugzilla.redhat.com/1518676
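The canary-thread technique described in the report can be sketched roughly as follows. This is not the attached reproducer (which is not shown here); the names `open_with_canary` and `max_gap`, the `os.open()` flags, and the tick interval are all assumptions made for illustration. The idea is only that a background thread records timestamps while the main thread opens the file through `os.open()` and `io.FileIO()`, so a long pause between timestamps suggests that every thread was stalled.

```python
import io
import os
import threading
import time


def open_with_canary(filename, interval=0.05):
    """Open filename via os.open() + io.FileIO() while a canary thread ticks.

    Hypothetical helper: returns the canary timestamps recorded during the run.
    """
    ticks = []
    stop = threading.Event()

    def canary():
        # Record a timestamp every `interval` seconds. If another thread
        # blocks while holding the GIL, the timestamps show a gap.
        while not stop.is_set():
            ticks.append(time.monotonic())
            stop.wait(interval)

    thread = threading.Thread(target=canary, name="canary", daemon=True)
    thread.start()
    try:
        time.sleep(interval * 3)  # let the canary tick before the open
        # The flags are an assumption; the report elides them.
        fd = os.open(filename, os.O_RDWR | os.O_CREAT, 0o644)
        # With closefd=True, closing the FileIO object also closes fd.
        fio = io.FileIO(fd, "r+", closefd=True)
        fio.close()
        time.sleep(interval * 3)  # and keep ticking after it
    finally:
        stop.set()
        thread.join()
    return ticks


def max_gap(ticks):
    """Largest pause between consecutive canary ticks, in seconds."""
    return max((b - a for a, b in zip(ticks, ticks[1:])), default=0.0)
```

On a local file the largest gap should stay close to the tick interval. On an affected interpreter, with the file on an unreachable NFS mount, the gap would be expected to span the whole time fstat stays blocked, which matches the silence of the canary thread described above.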
Forgot to mention: this is reproducible with Python 2.7. Similar issues exist in Python 3, but I did not try to reproduce them there.

I posted patches for both 2.7 and master.
We already release the GIL when calling lseek() in fileio.c, in the portable_lseek() function, so it makes sense to also do it in _io_FileIO_readall_impl() in the same file. os.lseek() also releases the GIL.

I found 3 other functions which call lseek() without releasing the GIL. I'm not sure that these 3 functions should be modified. In case of doubt, I prefer not to touch the code.
The bug has been fixed in Python 2.7, 3.6 and the master branch. Thank you Nir Soffer for the bug report and the fix!

This has been fixed in all active branches (2.7, 3.6 and master) so I think we can close it as 'fixed'. Thanks, Nir!

(Oops, closing was the intent of my previous comment, but I forgot to do it.)