Implementing Solaris "/dev/poll" in the "select" module #50646
Comments
In Python 2.6 we added support for Linux "epoll" and *BSD "kqueue" in the "select" module. What do you think? I volunteer to do the work, if you agree this is a feature we want to have.
Solaris 10 introduced "The Event Completion Framework". I am not familiar with the details: http://developers.sun.com/solaris/articles/event_completion.html It suggests that /dev/poll is not as performant, but I'm not sure I trust that claim.
I want to move this forward. Apparently, "/dev/poll" could actually be used transparently in Python's "select.poll()" implementation. The semantics seem to be the same, so we could choose between the "poll" syscall and "/dev/poll" statically at compile time, or dynamically at poll object creation time (try to open /dev/poll, and fall back to the "poll" syscall if that fails). Some details: http://developers.sun.com/solaris/articles/using_devpoll.html I agree that the Solaris 10 event framework would be nice to support too, but that would be another feature request.
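The "same semantics" claim above is about the register/unregister/poll API. A minimal sketch of that API using the portable `select.poll` (which a /dev/poll-backed object would have to mirror; this example uses a pipe purely for illustration and runs on any POSIX system):

```python
import os
import select

# A pipe gives us a file descriptor that becomes readable on demand.
r, w = os.pipe()

p = select.poll()
p.register(r, select.POLLIN)   # same register()/unregister()/poll()
                               # surface whether the backend is the
                               # poll(2) syscall or /dev/poll

os.write(w, b"x")              # make the read end readable
events = p.poll(1000)          # -> list of (fd, eventmask) tuples
print(events)

p.unregister(r)
os.close(r)
os.close(w)
```

The later decision in this thread was to expose `/dev/poll` as a separate `select.devpoll` object instead of hiding it behind `select.poll`, but the method surface stayed the same.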
First version of the patch. Review: 0ee4386d8f51.diff (http://bugs.python.org/file23526/0ee4386d8f51.diff)
With this, I am starting to think that providing a separate "select.devpoll" is actually a better idea. Opinions?
I have decided to segregate "select.devpoll" into a separate object, like "select.epoll".
Solved points 1, 3 and 4. Point 2 will be solved with the documentation. Points 5 and 6 are still pending.
Documentation added. That solves 2 and 5. I still have to solve 6.
Please, review. With the current code, each devpoll object has capacity for managing 256 fds by default. This is about 2048 bytes. The cost seems reasonable, since a normal program will have only a few devpoll objects around. I have considered an optional parameter to tune this, but the interaction with rlimit is messy. Even if we manage 65536 fds, the memory cost is about 512 kilobytes per devpoll object, and you can surely afford that if you are actually managing 65536 descriptors...

The code is not threadsafe. It doesn't crash, but concurrent use of a devpoll object has undefined results.

Please, review for integration.
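The memory figures above check out if each slot is a `struct pollfd` (an `int` fd plus two `short`s for events/revents), which is the structure written to /dev/poll:

```python
import struct

# Native layout of struct pollfd: int fd, short events, short revents.
POLLFD_SIZE = struct.calcsize("ihh")   # 4 + 2 + 2 = 8 bytes

default_fds = 256
print(default_fds * POLLFD_SIZE)       # 2048 bytes for the default capacity

max_fds = 65536
print(max_fds * POLLFD_SIZE // 1024)   # 512 KiB per devpoll object
```

So the per-object overhead quoted in the comment is just `capacity * sizeof(struct pollfd)`.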
Please, check the new changeset, with all your feedback incorporated. Thanks!
Another changeset. Hopefully the final one :-). Please, review.
+ increases this value, c:func:`devpoll` will return a possible

I think this should change to:

+ increases this value, c:func:`devpoll` will return a possibly

or, even better:

+ increases this value, c:func:`devpoll` may return an

Cheers
Thanks, Ross. Your suggestion has been committed to my branch. Waiting for more feedback.
Is write()ing to a devpoll fd a blocking operation in the kernel? Obviously, the ioctl() call *is* blocking :-)
Also, you can use Py_RETURN_NONE instead of the explicit "Py_INCREF(Py_None); return Py_None;" sequence.
These operations are potentially slow; there are no realtime guarantees. My machine is quite capable (16 CPUs), but let's see what a bit of DTrace scripting tells us.

First, let's time the open, probing syscall::open*:return (the DTrace script and its output table were lost in the migration). The result, times in nanoseconds: most "open"s finish in 4 microseconds, but we have a handful of "slow" openings of hundreds of microseconds. Anyway, arguing about the GIL here is a non-issue, since the "open" is only done at object creation, and a sane program will only create a handful of devpoll objects.

Now let's look at the write (again probing the syscall return; script lost in the migration). The results for a single registered descriptor: most writes are really fast, half a microsecond, but we have sporadic latencies of hundreds of microseconds. Re-registering 200 sockets per loop, most writes take around 8 microseconds.

So now the problem is to estimate how much time I need to release & reacquire the GIL. For that, and while we don't have DTrace integration in Python (my next project, I hope), I changed the source code to add a manual probe using the Solaris high-resolution timers. I get a median time of around 550 nanoseconds (half a microsecond), with a few spikes, when the GIL is not contested.

So, unlocking the GIL adds around half a microsecond to the syscalls. This is fast, but comparable to the actual duration of the syscall. Releasing the GIL doubles the time to register & unregister a socket (supposing we do a single socket per iteration, something realistic in real code). Being realistic, any work we do with the descriptors (like actually reading or writing them) is going to make this 0.5 microsecond cost pointless. By freeing the GIL, we are protected from any kernel contention, and playing by the rules :-). Anyway, I am open to feedback.

PS: I also checked with GIL contention, and the results are comparable. That is surprising; maybe it is related to the GIL improvements in 3.3. I haven't investigated the issue further.
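For readers without DTrace, a rough user-space analogue of the measurement above: time a cheap write() syscall with a nanosecond-resolution timer and look at the median. The numbers are illustrative only (they include Python overhead, unlike the kernel-side DTrace probes described in the comment):

```python
import os
import time

# Time 1000 one-byte writes to /dev/null and take the median latency.
fd = os.open(os.devnull, os.O_WRONLY)
samples = []
for _ in range(1000):
    t0 = time.perf_counter_ns()
    os.write(fd, b"x")
    samples.append(time.perf_counter_ns() - t0)
os.close(fd)

samples.sort()
median_ns = samples[len(samples) // 2]
print(f"median write() latency: {median_ns} ns")
```

The point of the comment stands either way: syscall latency and GIL release/reacquire cost are both in the sub-microsecond to microsecond range, so they are comparable.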
New changeset, after Ross's feedback. Thanks!
That was thorough :-) Seems OK, though.

+ if (n < size) {

If n < size, it's not a Python error, is it? I would say it's the OS's fault. Otherwise, it looks good... although I don't currently have access to a Solaris box to test it.
The timing for the GIL I am providing is for release & reacquire, that is, all the work. In fact, I am also accounting for the syscall inside the release/reacquire pair, to capture cache effects. That is, between the release and the reacquire there is a real syscall; of course, I am not timing the syscall itself. So, the timing I am providing is accurate.
No, but it's a Python error if Python wrongly assumes that write() won't do a partial write.
The problem with partial writes is that the data is not an unstructured stream of bytes, but a concrete binary dump. You cannot simply retry. My bet is that "/dev/poll" never does partial writes. If I am mistaken, I will ask the Illumos people to help me solve it. So far, this seems a theoretical problem; just in case it is not, we raise an exception and urge the programmer to report the issue to us. The other option would be to try to be clever and do things like retrying if the partial write is a multiple of the struct size, but what if it is not? Instead of guessing and playing games, I would rather know whether this is actually a problem in the wild. In my tests, with 65530 sockets, I never saw this "impossible" exception.
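To make the "retry only on a struct boundary" dilemma concrete, here is a hypothetical helper (not CPython code) showing why resuming a partial write of pollfd structures is only well-defined when the kernel consumed a whole number of them:

```python
import struct

# Native struct pollfd layout: int fd, short events, short revents.
POLLFD = struct.Struct("ihh")          # 8 bytes per entry

def writable_remainder(buf: bytes, n_written: int):
    """Return the unwritten tail if the partial write stopped exactly on
    a structure boundary; return None if it stopped mid-structure, in
    which case there is no safe way to resume."""
    if n_written % POLLFD.size == 0:
        return buf[n_written:]
    return None                        # mid-structure: cannot retry safely

buf = POLLFD.pack(3, 1, 0) + POLLFD.pack(4, 1, 0)   # two entries, 16 bytes
print(writable_remainder(buf, 8))      # clean boundary: second entry remains
print(writable_remainder(buf, 5))      # mid-structure: None
```

Even the "clean boundary" case is a guess about kernel behavior, which is exactly why the patch chose to raise a loud exception instead of implementing this heuristic.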
New changeset. The only change is: """
diff --git a/Modules/selectmodule.c b/Modules/selectmodule.c
--- a/Modules/selectmodule.c
+++ b/Modules/selectmodule.c
@@ -685,8 +685,16 @@
return -1;
}
if (n < size) {
- PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
- "Please, report in http://bugs.python.org/");
+ /*
+ ** Data written to /dev/poll is a binary data structure. It is not
+ ** clear what to do if a partial write occurs. For now, raise
+ ** an exception and see if we actually hit this problem in
+ ** the wild.
+ */
+ PyErr_Format(PyExc_IOError, "failed to write all pollfds. "
+ "Please, report at http://bugs.python.org/. "
+ "Data to report: Size tried: %d, actual size written: %d",
+ size, n);
return -1;
}
return 0;
""" If there are no disagreements, I will commit to default (3.3) soon (in a few hours/a day). Thanks!. |
New changeset 8f7ab4bf7ad9 by Jesus Cea in branch 'default': |