Patch selectmodule.c to support WSAPoll on Windows #60711

tpn · 2012-11-18T22:32:54Z

BPO	16507
Nosy	@gvanrossum, @jcea, @pitrou, @giampaolo, @tpn
Files	wsapoll.patch miminal-wsapoll.patch runtime_wsapoll.patch runtime_wsapoll.patch runtime_wsapoll.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2013-06-20.13:53:30.295>
created_at = <Date 2012-11-18.22:32:54.167>
labels = ['type-feature']
title = 'Patch selectmodule.c to support WSAPoll on Windows'
updated_at = <Date 2013-06-20.13:53:30.283>
user = 'https://github.com/tpn'

bugs.python.org fields:

activity = <Date 2013-06-20.13:53:30.283>
actor = 'sbt'
assignee = 'none'
closed = True
closed_date = <Date 2013-06-20.13:53:30.295>
closer = 'sbt'
components = []
creation = <Date 2012-11-18.22:32:54.167>
creator = 'trent'
dependencies = []
files = ['28038', '28201', '28207', '28341', '28799']
hgrepos = []
issue_num = 16507
keywords = ['gsoc']
message_count = 32.0
messages = ['175927', '175929', '175948', '176864', '176917', '177109', '177634', '180256', '180309', '180315', '180317', '180318', '180322', '180325', '180327', '180328', '180345', '180349', '180350', '180353', '180358', '180360', '180386', '180393', '180396', '180397', '180406', '180407', '180410', '180412', '180422', '180424']
nosy_count = 7.0
nosy_names = ['gvanrossum', 'jcea', 'pitrou', 'giampaolo.rodola', 'trent', 'neologix', 'sbt']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue16507'
versions = ['Python 3.4']

tpn · 2012-11-18T22:32:52Z

Attached patch adds select.poll() support on Windows via WSAPoll.

It's hacky; I was curious to see whether or not it could be done, and whether or not tulip's pollster would work with it.

It compiles and works, but doesn't play very nicely with tulip. Also, just about every lick of code that tests poll() does so in a UNIX-specific way, so it's hard to test.

As with select, WSAPoll() will barf if you feed it anything other than SOCKETs (i.e. it doesn't work against non-socket file descriptors).

pitrou · 2012-11-18T23:19:20Z

Related post:
http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

tpn · 2012-11-19T08:12:39Z

On Sun, Nov 18, 2012 at 03:19:19PM -0800, Antoine Pitrou wrote:

> Antoine Pitrou added the comment:
>
> Related post:
> http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

Yeah, came across that yesterday.  A few other relevant links, for the
record:

    http://social.msdn.microsoft.com/Forums/en/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90 (has a code example of what doesn't work)

    http://www.codeproject.com/Articles/140533/The-Differences-Between-Network-Calls-in-Windows-a

    http://blogs.msdn.com/b/wndp/archive/2006/10/26/wsapoll.aspx

    http://curl.haxx.se/mail/lib-2012-10/0038.html

sbt · 2012-12-03T20:03:37Z

Attached is an alternative patch which only touches selectmodule.c. It still does not support WinXP.

Note that in this version register() and modify() do not ignore the POLLPRI flag if it was *explicitly* passed. But I am not sure how best to deal with POLLPRI.

sbt · 2012-12-04T15:01:37Z

Here is a version which loads WSAPoll at runtime. Still no tests or docs.

sbt · 2012-12-07T18:50:50Z

It seems that the return code of WSAPoll() does not include the count of array items with revents == POLLNVAL. In the case where all of them are POLLNVAL, instead of returning 0 (which usually indicates a timeout) it returns -1 and WSAGetLastError() == WSAENOTSOCK.

This does not match the MSDN documentation which claims that the return code is the number of descriptors for which revents is non-zero. But it arguably does agree with the FreeBSD and MacOSX man pages which say that it returns the number of descriptors that are "ready for I/O".

BTW, the implementation of select_poll() assumes that the return code of poll() (if non-negative) is equal to the number of non-zero revents fields. But select_have_broken_poll() considers a MacOSX poll() implementation to be good even in cases where this assumption is not true:

static int select_have_broken_poll(void)
{
    int poll_test;
    int filedes[2];
    struct pollfd poll_struct = { 0, POLLIN|POLLPRI|POLLOUT, 0 };
    if (pipe(filedes) < 0) {
        return 1;
    }
    poll_struct.fd = filedes[0];
    close(filedes[0]);
    close(filedes[1]);
    poll_test = poll(&poll_struct, 1, 0);
    if (poll_test < 0) {
        return 1;
    } else if (poll_test == 0 && poll_struct.revents != POLLNVAL) {
        return 1;
    }
    return 0;
}

Note that select_have_broken_poll() == FALSE if poll_test == 0 and poll_struct.revents == POLLNVAL.
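
For anyone who wants to run the same probe from Python rather than C, here is a minimal sketch. The helper name `closed_fd_reports_pollnval` is made up for illustration and is not part of CPython. It assumes a platform where `select.poll` exists (so not Windows), and it relies on the behaviour described above, namely that the result list is built from the count returned by the underlying poll().

```python
import os
import select


def closed_fd_reports_pollnval():
    """Return True if poll() reports a closed descriptor as POLLNVAL.

    Illustrative helper only (not part of CPython). It mirrors the probe
    in select_have_broken_poll(): poll a descriptor that was just closed.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    os.close(write_fd)

    poller = select.poll()
    poller.register(read_fd, select.POLLIN | select.POLLPRI | select.POLLOUT)

    # If the underlying poll() does not count POLLNVAL descriptors in its
    # return value, the entry for read_fd will be missing from this list.
    events = poller.poll(0)
    return any(fd == read_fd and event & select.POLLNVAL
               for fd, event in events)
```

A False result would indicate a poll() whose return count excludes POLLNVAL descriptors, which is the WSAPoll()-like behaviour discussed in this comment.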

sbt · 2012-12-16T22:41:29Z

Here is a new version with tests and docs.

Note that the docs do not mention the bug mentioned in

http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

Maybe they should?

Note that that bug makes it a bit difficult to use poll with tulip on Windows. (But one could restrict timeouts to one second and always check outstanding connect attempts using select() when poll() returns.)

gvanrossum · 2013-01-19T19:33:42Z

This works well enough (tested in old version of Tulip), right? What's holding it up?

gvanrossum · 2013-01-20T19:41:08Z

Oh, it needs a new patch -- the patch fails to apply in the 3.4
(default) branch.

gvanrossum · 2013-01-20T21:09:13Z

Here's a new version of the patch. (Will test on Windows next.)

gvanrossum · 2013-01-20T21:35:06Z

That compiles (after hacking the line endings). One Tulip test fails, PollEventLooptests.testSockClientFail. But that's probably because the PollSelector class hasn't been adjusted for Windows yet (need to dig this out of the Pollster code that was deleted when switching to neologix's Selector).

sbt · 2013-01-20T21:54:52Z

> That compiles (after hacking the line endings). One Tulip test fails,
> PollEventLooptests.testSockClientFail. But that's probably because the
> PollSelector class hasn't been adjusted for Windows yet (need to dig this
> out of the Pollster code that was deleted when switching to neologix's
> Selector).

Sorry I did not deal with this earlier. I can make the modifications to PollSelector tomorrow.

Just to describe the horrible hack: every time poll() needs to be called we first check if there are any registered async connects. If so then I first use select([], [], connectors) to detect any failed connections, and then use poll() normally.

This does mean that to detect failed connections we must never use too large a timeout with poll() when there are outstanding connects. Of course one must decide what is an acceptable maximum timeout -- too short and you might damage battery life, too long and you will not get prompt notification of failures.
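
A rough sketch of that workaround, to make the two steps concrete. The function name `poll_with_connect_check` and the one-second cap `MAX_TIMEOUT_MS` are assumptions for illustration; this is not the code from the attached patch or from tulip.

```python
import select

# Assumed cap (milliseconds) on the poll() timeout while connects are
# outstanding; the right value is a trade-off, as noted above.
MAX_TIMEOUT_MS = 1000


def poll_with_connect_check(poller, connectors, timeout_ms=None):
    """Check pending connects with select(), then poll() as usual.

    poller      -- a select.poll()-like object
    connectors  -- sockets with an outstanding non-blocking connect
    timeout_ms  -- requested poll() timeout in milliseconds, or None
    Returns (failed_connectors, poll_events).
    """
    failed = []
    if connectors:
        # Zero-timeout select(): on Windows a failed connect shows up in
        # the "exceptional" set, which is what WSAPoll() fails to report.
        _, _, failed = select.select([], [], list(connectors), 0)
        # Never sleep too long while connects are pending, otherwise a
        # failure would only be noticed at the next wakeup.
        if timeout_ms is None or timeout_ms > MAX_TIMEOUT_MS:
            timeout_ms = MAX_TIMEOUT_MS
    events = poller.poll(timeout_ms)
    return failed, events
```

The select() call is skipped when there are no connectors, since Windows select() rejects three empty sets.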

gvanrossum · 2013-01-20T22:05:01Z

Ow. How painful. I'll leave this for you to do. Note that this also
requires separating EVENT_WRITE from EVENT_CONNECT -- I am looking
into this now, but I am not sure how far I will get with this.

gvanrossum · 2013-01-20T22:42:35Z

(FWIW, I've got the EVENT_CONNECT separation done.)

neologix · 2013-01-20T22:45:57Z

Time for a stupid question from someone who doesn't know anything about Windows: if WSAPoll() is really terminally broken, is it really worth the hassle exposing it and warping the API?
AFAICT, FD_SETSIZE is already bumped to 512 on Windows, and Windows select() is limited by the fd_set size, not the maximum descriptor: so what exactly does WSAPoll() bring over select() on Windows?
(Especially if there are plans to support IOCP, wouldn't that make WSAPoll() obsolete?)

gvanrossum · 2013-01-21T17:38:50Z

This is a very good question to which I have no good answer. If it weren't for this, we could probably do away with the distinction between add_writer and add_connector, and a lot of code could be simpler. (Or is that distinction also needed for IOCP?)

sbt · 2013-01-21T18:40:31Z

On 21/01/2013 5:38pm, Guido van Rossum wrote:

> This is a very good question to which I have no good answer.
> If it weren't for this, we could probably do away with the distinction
> between add_writer and add_connector, and a lot of code could be
> simpler. (Or is that distinction also needed for IOCP?)

The distinction is not needed by IOCP. I am also not too sure that
running tulip on WSAPoll() is a good idea, even if the select module
provides it.

OFF-TOPIC: Although it is not the optimal way of running tulip with
IOCP, I have managed to implement IocpSelector and IocpSocket classes
well enough to pass tulip's unittests (except for the ssl one).

I did have to make some changes to the tests: selectors have a
wrap_socket() method which prepares a socket for use with the selector.
On Unix it just returns the socket unchanged, whereas for IocpSelector
it returns an IocpSocket wrapper.

I also had to make the unittests behave gracefully if there is a
"spurious wakeup", i.e. the socket is reported as readable, but trying
to read fails with BlockingIOError. (Spurious wakeups are possible but
very rare with select() etc.)

It would be possible to make IocpSelector deal with pipe handles too.

gvanrossum · 2013-01-21T19:00:18Z

Thanks -- I am now close to rejecting the WSAPoll() patch, and even
closer to rejecting its use for Tulip on Windows. That would in turn
mean that we should kill add/remove_connector() and also the
EVENT_CONNECT flag in selector.py. Anyone not in favor please speak
up!

Regarding your IOCP changes, that sounds pretty exciting. Richard,
could you check those into the Tulip as a branch? (Maybe a new branch
named 'iocp'.)

sbt · 2013-01-21T19:51:23Z

On 21/01/2013 7:00pm, Guido van Rossum wrote:

> Regarding your IOCP changes, that sounds pretty exciting. Richard,
> could you check those into the Tulip as a branch? (Maybe a new branch
> named 'iocp'.)

OK. It may take me a while to rebase them.

sbt · 2013-01-21T20:41:56Z

I have created an iocp branch.

neologix · 2013-01-21T20:55:57Z

> I have created an iocp branch.

You could probably carry the fixes for spurious notifications over to
the default branch.

sbt · 2013-01-22T13:31:10Z

It appears that Linux's "spurious readiness notifications" are a deliberate deviation from the POSIX standard. (They are mentioned in the BUGS section of the man page for select.)

Should I just apply the following patch to the default branch?

diff -r 3ef7f1fe286c tulip/events_test.py
--- a/tulip/events_test.py      Mon Jan 21 18:55:29 2013 -0800
+++ b/tulip/events_test.py      Tue Jan 22 12:09:21 2013 +0000
@@ -200,7 +200,12 @@
         r, w = unix_events.socketpair()
         bytes_read = []
         def reader():
-            data = r.recv(1024)
+            try:
+                data = r.recv(1024)
+            except BlockingIOError:
+                # Spurious readiness notifications are possible
+                # at least on Linux -- see man select.
+                return
             if data:
                 bytes_read.append(data)
             else:
@@ -218,7 +223,12 @@
         r, w = unix_events.socketpair()
         bytes_read = []
         def reader():
-            data = r.recv(1024)
+            try:
+                data = r.recv(1024)
+            except BlockingIOError:
+                # Spurious readiness notifications are possible
+                # at least on Linux -- see man select.
+                return
             if data:
                 bytes_read.append(data)
             else:

neologix · 2013-01-22T14:36:41Z

> It appears that Linux's "spurious readiness notifications" are a deliberate deviation from the POSIX standard. (They are mentioned in the BUGS section of the man page for select.)

I don't think it's a deliberate deviation, but really bugs/limitations
(I can remember at least one case where a UDP segment would
be received, which triggered a notification, but the segment was
subsequently discarded because of an invalid checksum). AFAICT kernel
developers tried to fix those spurious notifications, but some of them
were quite tricky (see e.g. http://lwn.net/Articles/318264/ for
epoll() patches, and
http://lists.schmorp.de/pipermail/libev/2009q1/000627.html for an
example spurious epoll() notification scenario).

That's something we have to live with (like pthread condition spurious
wakeups), select()/poll()/epoll() are mere hints that the FD is
readable/writable...

Also, in real code you have to be prepared to catch EAGAIN regardless
of spurious notifications: when a FD is reported as read ready, it
just means that there are some data to read. Depending on the
watermark, it could mean that only one byte is available.

So if you want to receive e.g. a large amount of data and the FD is
non-blocking, you can do something like:

"""
buffer = []
while True:
    try:
        data = s.recv(8096)
    except BlockingIOError:
        # Nothing more to read for now.
        break

    if not data:
        # Empty result: the peer closed the connection.
        break
    buffer.append(data)
"""

Otherwise, you'd have to read() only one byte at a time, and go back
to the select()/poll() syscall.

(For write ready, you can obviously have "spurious" notifications if
you try to write more than what is available in the output socket
buffer).

> Should I just apply the following patch to the default branch?

LGTM.

pitrou · 2013-01-22T14:56:23Z

> Also, in real code you have to be prepared to catch EAGAIN regardless
> of spurious notifications: when a FD is reported as read ready, it
> just means that there are some data to read. Depending on the
> watermark, it could mean that only one byte is available.

If only one byte is available, recv(4096) should simply return a partial result.

sbt · 2013-01-22T15:00:56Z

According to Alan Cox

> It's a design decision and a huge performance win. It's one of the areas
> where POSIX read in its strictest form cripples your performance.

See https://lkml.org/lkml/2011/6/18/103

> (For write ready, you can obviously have "spurious" notifications if
> you try to write more than what is available in the output socket
> buffer).

Wouldn't you just get a partial write (assuming an AF_INET, SOCK_STREAM socket)?

neologix · 2013-01-22T16:01:55Z

> If only one byte is available, recv(4096) should simply return a partial result.

Of course, but how do you know if there's data left to read without
calling select() again? It's much better to call read() until you get
EAGAIN than calling select() between each read()/write() call.

> Wouldn't you just get a partial write (assuming an AF_INET, SOCK_STREAM socket)?

For SOCK_STREAM, yes, not for SOCK_DGRAM (or for a pipe when trying to
write more than PIPE_BUF, although I guess any sensible implementation
doesn't report the pipe write ready if there's less than PIPE_BUF
space left).

> It's a design decision and a huge performance win. It's one of the areas
> where POSIX read in its strictest form cripples your performance.

Yes, he's referring to the fact that there are cases where you could
avoid some spurious notifications, but that would incur a performance
hit: that's exactly the same rationale behind condition variables
spurious wakeups: since the user-code must be prepared to handle
spurious notifications, let's take advantage of it.

But there have been various fixes in the past years to avoid spurious
notifications in epoll() for example, because while they allow certain
optimizations in the kernel, spurious wakeups can be costly for user-level
applications...

I'm 99% sure that Linux isn't the only OS allowing spurious wakeups,
since it's essentially an unsolvable issue (temporary shortage of
buffer, or the example given by Alan Cox of a pipe with two
readers...).

neologix · 2013-01-22T16:06:32Z

> For SOCK_STREAM, yes, not for SOCK_DGRAM (or for a pipe when trying to
> write more than PIPE_BUF, although I guess any sensible implementation
> doesn't report the pipe write ready if there's less than PIPE_BUF
> space left).

That should be of course "when trying to write LESS than PIPE_BUF",
since it's required to be atomic.

gvanrossum · 2013-01-22T16:27:37Z

Short reads/writes are orthogonal to EAGAIN. All the mainline code treats
readiness as a hint only, so tests should too.

--Guido van Rossum (sent from Android phone)
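
To illustrate the "readiness is only a hint" rule, here is a small sketch of a read callback that tolerates a spurious notification. The helper name `read_if_ready` is hypothetical and not taken from tulip; the spurious wakeup is simulated by simply calling it on a non-blocking socket that has nothing to read.

```python
import socket


def read_if_ready(sock, bufsize=1024):
    """Try one non-blocking read; return None if nothing was there.

    Hypothetical helper: a BlockingIOError after a "readable"
    notification is treated as a non-event rather than an error.
    """
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        return None


if __name__ == "__main__":
    r, w = socket.socketpair()
    r.setblocking(False)
    # Simulated spurious wakeup: nothing has been sent yet.
    print(read_if_ready(r))
    w.send(b"ping")
    print(read_if_ready(r))
    r.close()
    w.close()
```

A short read (fewer bytes than `bufsize`) is a separate matter and needs no special handling here: recv() just returns what is available.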

sbt · 2013-01-22T17:09:03Z

> For SOCK_STREAM, yes, not for SOCK_DGRAM

I thought SOCK_DGRAM messages just got truncated at the receiving end.

neologix · 2013-01-22T19:04:08Z

> I thought SOCK_DGRAM messages just got truncated at the receiving end.

You were referring to partial writes: for a datagram-oriented
protocol, if the datagram can't be sent atomically (in one
send()/write() call), the kernel will return EAGAIN. On the receiving
side, it will get truncated if the buffer is too small.
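
The receiving-side truncation is easy to see with a local datagram socket pair. This is only a sketch: the helper name `recv_truncated` is made up, and it assumes a Unix-like system where `socket.AF_UNIX` datagram socket pairs are available.

```python
import socket


def recv_truncated(payload, bufsize):
    """Send two datagrams locally and read both with a small buffer.

    Returns what the receiver got for each datagram. The second datagram
    (b"next") shows that the unread tail of the first one is discarded,
    not kept for the following recv().
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sender.send(payload)
        sender.send(b"next")
        first = receiver.recv(bufsize)
        second = receiver.recv(bufsize)
        return first, second
    finally:
        sender.close()
        receiver.close()
```

For example, `recv_truncated(b"0123456789", 4)` returns only the first four bytes of the first datagram, followed by the second datagram intact.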

Going back to the subject: so what do we say, let's just forget about
supporting WSAPoll at all (both in CPython and tulip)?

If we ever choose to export it, I think the least we should do would
be to not export it as select.poll(): since it has - not so subtle -
semantic differences with poll(), code previously using select() on
Windows may silently break when poll() is suddenly available: e.g.
asyncore with use_poll=True would probably deadlock in case of
unreachable host, if WSAPoll doesn't report connect() failures.

When I see the hoops Richard had to go through to make WSAPoll usable
in tulip, my gut feeling is that exposing it wouldn't be doing a
favor to poor unsuspecting Windows programmers :-\

gvanrossum · 2013-01-22T19:13:38Z

Agreed, it does not sound very useful to support WSAPoll(), neither in
selector.py (which is intended to eventually be turned into
stdlib/select.py) nor in PEP-3156. And then, what other use is there
for it, really?

sbt mannequin added the type-feature label (a feature request or enhancement) Dec 16, 2012

sbt mannequin closed this as completed Jun 20, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022
