Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interrupts on Windows #29

Open
gaborcsardi opened this issue Feb 1, 2021 · 2 comments
Open

Interrupts on Windows #29

gaborcsardi opened this issue Feb 1, 2021 · 2 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@gaborcsardi
Copy link
Member

From Florian Balmer

Hello Gábor

Please apologize for contacting you directly via e-mail, but I don't
have a GitHub account to submit an issue, and your address was
immediately revealed by Googling.

While researching about how to acquire file locks, and abort after a
specified timeout on already-locked files, I've come across your
[r-lib/filelock] repository on GitHub.

First of all, thank you very much for sharing your code!

In my supplementary resources [0-4] besides MSDN [5-8], I found the
following pattern:

OVERLAPPED structure allocated on local function stack
Issue asynchronous request
...
if (ERROR_IO_PENDING)
{
Wait for IO completion, or timeout elapse
if (timeout elapsed)
{
CancelIo()
GetOverlappedResult() --or-- WaitForSingleObject(OVERLAPPED.hEvent)
...

The emphasize here is that it's necessary to wait for the canceled IO
operation to complete, before returning from the function, or the
completion routine may access the implicitly free'd stack-allocated
OVERLAPPED structure, and cause potential havoc.

The documentation for LockFileEx() gives the hint to call
GetOverlappedResult() in case of ERROR_IO_PENDING. However, the
documentation for GetOverlappedResult() does not mention this case
explicitly, just Read/WriteFile(), DeviceIoControl(), and a few
others. And I'm not sure if file locking goes down the
DeviceIoControl() route, or is implemented above/besides the file
system layer, so the documentation leaves some ambiguity, if I get
everything correct.

Anyway, I tend to believe it's a good idea to wait for IO completion
after calling CancelIo(), and to be overly correct, maybe only after
CancelIo() returned a nonzero value (but neither of the code samples
from the supplementary resources care about the CancelIo() return
value).

This is just meant to be an input, feel free to do whatever you like
with this feedback. I don't even expect you to reply to this message
(I can't check my e-mails daily, anyway, at the moment). But if you
have the knowledge, or know some expert to confirm that calling
GetOverlappedResult() is wrong after canceling an asynchronous lock
request, I'd appreciate a short note.

Best wishes
--Florian

[0] You can use a file as a synchronization object, too | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20140905-00/?p=63

[1] Ready... cancel... wait for it! (part 1) | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20110202-00/?p=11613

[2] Ready... cancel... wait for it! (part 2) | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20110203-00/?p=11603

[3] Ready... cancel... wait for it! (part 3) | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20110204-00/?p=11583

[4] Asynchronous I/O | Newcomer J. M.
http://flounder.com/asynchexplorer.htm

[5] LockFileEx function
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfileex

[6] CancelIo function | MSDN
https://docs.microsoft.com/en-us/windows/win32/fileio/cancelio

[7] GetOverlappedResult function | MSDN
https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresult

[8] DeviceIoControl function
https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol

@gaborcsardi
Copy link
Member Author

Thanks!

Why not create a GitHub account, and add this as an issue? You did a
great investigation, it is kind of a waste that very few people know
about this.

As for the issue, you are right, and freeing the cancelled handles can
lead to undefined behavior. I have never actually seen that with
filelock, because the handles are only freed somewhat later, but have
seen the same with processx, for cancelled reads on pipes. I did fix
this in processx:
r-lib/processx@a8f09d1
but forgot that I had the same issue in filelock.

Waiting for the cancellation is not great, btw. Whenever you try to
synchronously wait in async code, that's a code smell. What you can do
is to add the handle(s) to a free-list (thing to be freed later), and
then check on the handle later, or use IOCPs.

Anyway, if you don't want to register to GitHub, then can I add your
email to an issue in r-lib/filelock?

Thanks again,
Gabor

@gaborcsardi
Copy link
Member Author

Thanks!

Why not create a GitHub account, and add this as an issue? You did a
great investigation, it is kind of a waste that very few people know
about this.

Anyway, if you don't want to register to GitHub, then can I add your
email to an issue in r-lib/filelock?

Thanks for your reply! I'm sure I will create a GitHub account, sooner
or later, but I feel like I don't need one, right now. I'm managing my
own code with Fossil, on my own web server.

If it simplifies management and discussion of issues for you and your
contributors, please feel free to feed excerpts from my e-mails to the
GitHub issue tracker. I don't mind if you add my e-mail address, but
maybe with some minimal anti-spambot obfuscation to give me some
minimal feeling of privacy ;-)

As for the issue, you are right, and freeing the cancelled handles can
lead to undefined behavior. I have never actually seen that with
filelock, because the handles are only freed somewhat later, but have
seen the same with processx, for cancelled reads on pipes. I did fix
this in processx:
r-lib/processx@a8f09d1
but forgot that I had the same issue in filelock.

Another good reason to call GetOverlappedResult(), besides the
mentioned potential access-after-free to the OVERLAPPED structure on
the local function stack.

Indeed, in case of the race condition, i.e. the locking IO request
terminating in parallel to the CancelIo() call, one may also end up
with the lock acquired when CancelIo() returns. Querying
GetOverlappedResult() should catch this, and return
TRUE/ERROR_OPERATION_ABORTED if CancelIo() was quick enough to take
effect. If this were not the case, and there were no other errors, the
lock could be kept and used normally, instead of discarded implicitly
by calling CloseHandle(), even more so because MSDN recommends to
unlock files by explicit UnLockFileEx calls.

Waiting for the cancellation is not great, btw. Whenever you try to
synchronously wait in async code, that's a code smell. What you can do
is to add the handle(s) to a free-list (thing to be freed later), and
then check on the handle later, or use IOCPs.

Cancelled, but nonetheless successfully completed reads (for which
CancelIo() lost the race) could be left unprocessed in the IOCP, but
wouldn't immediately closing the file handles and the IOCP (without
waiting for the cancellation) lead to the same situation? Maybe in the
end, closing file handles with pending IO requests is a common
scenario that file system drivers must be able to handle gracefully.

Yes, all this doesn't look trivial. With short lock timeouts of
several 100 ms, an easier solution for my use case may be to repeat a
few sleep-and-try-synchronous-locking cycles, even more so if the goal
is to get a synchronous file handle opened without
FILE_FLAG_OVERLAPPED.

--Florian

@gaborcsardi gaborcsardi added the bug an unexpected problem or unintended behavior label Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

1 participant