Join GitHub today
Windows event notification #52
Windows has 3 incompatible families of event notifications APIs: IOCP,
(Actually, all 3 together still isn't sufficient, b/c there are some things that still require threads – like console IO. But never mind. Just remember that when someone tells you that Windows' I/O subsystem is great, that their statement isn't wrong but does require taking a certain narrow perspective...)
The WaitFor*Event family
IOCP is the crown jewel of Windows the I/O subsystem, and what you generally hear recommended. It follows a natively asynchronous model where you just go ahead and issue a read or write or whatever, and it runs in the background until eventually the kernel tells you it's done. It provides an O(1) notification mechanism. It's pretty slick. But... it's not as obvious a choice as everyone makes it sound. (Did you know the Chrome team has mostly given up on trying to make it work?)
On the other hand, IOCP is the only way to do a number of things like: non-blocking IO to the filesystem, or monitoring the filesystem for changes, or non-blocking IO on named pipes. (Named pipes are popular for talking to subprocesses – though it's also possible to use a socket if you set it up right.)
You can also use
Given all of the above, our current design is a hybrid that uses
The other attractive option would be if we could solve the issues with IOCP and switch to using it alone – this would be simpler and get rid of the O(n)
Working around IOCP's limitations
I'm not really sure what the best approach here is. One option is just to limit the number of outstanding UDP data to some fixed amount (maybe tunable through a "virtual" (i.e. implemented by us) sockopt), and drop packets or return errors if we exceed that. This is clearly solvable in principle, it's just a bit annoying to figure out the details.
Spurious extra latency in TCP receives
I think that using the
Checking readability / writability
It turns out that IOCP actually can check readability! It's not mentioned on MSDN at all, but there's a well-known bit of folklore about the "zero-byte read". If you issue a zero-byte read, it won't complete until there's data ready to read. ref1 (← official MS docs!), ref2, ref3.
That's for SOCK_STREAM sockets. What about SOCK_DGRAM? libuv does zero-byte reads with
What about writability? Empirically, if you have a non-blocking socket on windows with a full send buffer and you do a zero-byte send, it returns
For writability of SOCK_DGRAM I don't think there's any trick, but it's not clear how meaningful SOCK_DGRAM writability is anyway. If we do our own buffering than presumably we can implement it there.
Alternatively, there is a remarkable piece of undocumented sorcery, where you reach down directly to make syscalls, bypassing the Winsock userland, and apparently can get OVERLAPPED notifications when a socket is readable/writable: ref1, ref2, ref3, ref4, ref5. I guess this is how
Implementing all this junk
I actually got a ways into this. Then I ripped it out when I realized how many nasty issues there were beyond just typing in long and annoying API calls. But it could easily be resurrected; see 7e7a809 and its parent.
If we do want to switch to using IOCP in general, then the sequence would go something like:
This was referenced
Feb 12, 2017
Something to think about in this process: according to the docs on Windows, with overlapped I/o you can still get WSAEWOULDBLOCK ("too many outstanding overlapped requests"). No-one on the internet seems to have any idea when this actually occurs or why. Twisted has a FIXME b/c they don't handle it, just propagate the error out. Maybe we can do better? Or maybe not.
(I bet if you don't throttle UDP sends at all then that's one way to get this? Though libuv AFAICT doesn't throttle UDP sends at all and I guess they do ok... it'd be worth trying this just to see how Windows handles it.)
This was referenced
Mar 11, 2017
referenced this issue
Apr 10, 2017
This was referenced
Jun 22, 2017
referenced this issue
Jul 10, 2017
That was me, and afaict the IOCP 0-byte trick only works for readability.
According to https://blog.grijjy.com/2018/08/29/creating-high-performance-udp-servers-on-windows-and-linux/ the 0-byte read trick does with MSG_PEEK (even though the msdn docs say it only works for non-overlapped sockets?).
Looking at this again 18 months later, and with that information in mind... there's still no urgent need to change what we're doing, though if we move forward with some of the wilder ideas in #399 then that might make IOCP more attractive (in particular see #399 (comment)).
How bad would it be to make
I guess we could dig into the "undocumented sorcery" mentioned above and see how bad it is...
I got curious about this and did do a bit more digging.
So when you call
Normally, running on standard out-of-the-box install of Windows,
That part is actually fine. It's arcane and low-level, but documented well-enough in the source code of projects like libuv and wepoll, and basically sensible. You put together an array of structs listing the sockets you want to wait for and what events you want to wait for, and then submit it to the kernel, and get back a regular IOCP notification when it's ready.
However! This is not the only thing
Like most unstructured hooking APIs, the designers had high hopes that this would be used for all kinds of fancy things, but in practice it suffers from the deadly flaw of lack of composability:
And in fact,
Apparently it picks one of the hooks at random and forces it to handle all of the sockets. Which can't possibly work, but nonetheless:
Also, the LSP design means that you can inject arbitrary code into any process by modifying the registry keys that configure the LSPs. This tends to break sandboxes, and means that buggy LSPs cause crashes critical system processes like
So this whole design is fundamentally broken, and MS is actively trying to get rid of it: as of Windows 8 / Window Server 2012, it's officially deprecated, and to help nudge people away from it, Windows 8 "Metro" apps simply ignore any installed LSPs.
Nonetheless, it is apparently used by old firewalls and anti-virus programs and also malware. (I guess they must just blindly hook everything, to work around the composability problems?), so some percentage of Windows machines in the wild do still have LSPs installed.
Another fun aspect of LSPs: there are some functions that are not "officially" part of Winsock, but are rather "Microsoft extensions", like
Twisted and asyncio both get this wrong, btw – they call
You can detect whether an LSP has hooked any particular socket by using
You can also blatantly violate the whole LSP abstraction layer by passing
Given that libuv seems to get away with this in practice, and Windows 8 Metro apps do something like this, and that Twisted and asyncio are buggy in the presence of LSPs and seem to get away with it in practice, I think we can probably use
But... having used
On a Windows system, you can get a list of "providers" (both LSPs and base providers) by running
(This only has the service providers; the command also lists "name space providers", which is a different extension mechanism, but I left those out.)
So if we group by entries with the same GUID, there's one base provider that does IPv4 (TCP, UDP, and RAW), another that does IPv6 (TCP, UDP, and RAW), one that handles Hyper-V sockets, one that handles IrDA sockets, and one that handles "RSVP" (whatever that is).
The three special GUIDs that libuv thinks can work with the AFD magic seem to correspond to the IPv4 and IPv6 providers, and then the third one from some googling appears to MS's standard Bluetooth provider. (I guess this VM doesn't have the Bluetooth drivers installed.)
So: I think libuv on Windows always ends up using the AFD magic when it's working IPv4 or IPv6 or Bluetooth sockets. When it gets something else – meaning IrDA, Hyper-V, RSVP, or something even more exotic involving a third-party driver – it falls back on calling
But AFAIK, on Windows, Python's
I think we can use
I randomly stumbled upon this message. Since this is stuff is so arcane that it's actually kinda fun, some extra pointers:
This is actually kind of a bug in libuv -- this logic makes libuv use it's "slow" mode too eagerly (libuv has a fallback mechanism where it simply calls select() in a thread). If a socket is bound to an IFS LSP, it'll have a different GUID that's not one of those 3 on the whitelist, however the socket would still work with IOCTL_AFD_POLL. Since all base service providers are built into windows and they all use AFD, it's simply enough to check whether the protocol chain length (in the
Note that windows itself actually does this most of the time: since windows vista, select() cannot be intercepted by LSPs unless the LSP does all of the following:
I have yet to encounter the first LSP that actually does this.
If you wanted to be super pedantic about it, you could call both SIO_BASE_HANDLE and SIO_BSP_HANDLE_SELECT and check if they are the same.
There's really no need to do any of this grouping in practice, they all use the same code path in the AFD driver. I removed it from wepoll in piscisaureus/wepoll@e7e8385.
Also note that RSVP is deprecated and no longer supported, and that since the latest windows 10 update there is AF_UNIX which also works with IOCTL_AFD_POLL just fine.
@piscisaureus Oh hey, thanks for dropping by! I was wondering why wepoll didn't seem to check GUIDs, and was completely baffled by the docs for
What do you think of unconditionally calling
And do you happen to know what happens if you try to use the AFD poll stuff with a handle that isn't a MSAFD base provider handle? Does it give a sensible error, or...?
Libuv doesn't do that. It uses SIO_BASE_HANDLE to determine whether it can use some optimizations, but it doesn't attempt to bypass LSPs.
If you want to bypass LSPs outright, then you should do it when you create the socket. This is done by using
When you create an non-IFS-LSP-bound socket and then get and use the base handle everywhere, it would likely cause memory leaks and other weirdness (in particular if you also use the base handle when closing the socket with
Not too many people will care probably. The only LSP i've seen people use intentionally is Proxifier. That's an IFS LSP, so there's no need to bypass it, the socket is a valid AFD handle.
You'll get a STATUS_INVALID_HANDLE error.
Oh, you're right. I was looking at
It looks like the only place
I see, excellent points, thank you. And on further thought, it probably wouldn't have worked anyway, since for the sockets we control we can use IOCP
I guess the main reason this seemed attractive is that it would let us have a single code path, without treating LSP sockets differently. I'm kind of terrified of that, because I have no idea how to test the LSP code paths. Do you by chance happen to know any reliable way to set up LSPs for testing? ("OK, so to make sure we're testing our code under realistic conditions, we're going to install malware on all our Windows test systems...")
Though, it's also probably not a huge deal, because it sounds like we can have a single straight-line code path in all cases anyway, where we do the
Thanks for all the information, by the way, this is super helpful.
Indeed CancelIo/CancelIoEx doesn't work with non IFS sockets. IIRC libuv also works around issues with SetFileCompletionNotificationModes not working in the presence of non-IFS LSPs. Come to think of it, I'm kind of surprised that CreateIoCompletionPort does work, it seems unlikely that LSPs can trap that?
This was also the main reason we added uv_poll -- to integrate with c-ares (dns library).
Proxifier, as mentioned earlier, is an IFS LSP I have seen in the wild.
IIRC older versions of the Windows SDK also contained a reference implementation for different LSP types. You could compile and install it. I've never gotten around to doing that.
Looking again at Network Programming for Microsoft Windows, 2nd ed., I think the way LSPs handle IO completion ports is that when you call
That said, I have no idea why this doesn't work with
That is definitely what we're going to do in Trio, at least to start :-).
Oh beautiful, we are totally going to steal that wholesale. Thank you!
So it sounds like at this point we now know everything we need to to reimplement