New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle EINTR in the stdlib #63085
Comments
As discussed in http://mail.python.org/pipermail/python-dev/2013-August/128204.html, I think that we shouldn't let EINTR leak to Python code: it should be handled properly by the C code, so that users (and the Python part of the stdlib) don't have to worry about this low-level historical nuisance. For code that doesn't release the GIL, we could simply use this glibc macro: Now, I'm not sure about how to best handle this for code that releases the GIL. Basically:
should become begin_handle_eintr: if (pid < 0 && errno == EINTR) {
if (PyErr_CheckSignals())
return NULL;
goto begin_handle_eintr;
} Should we do this with a macro? If yes, should it be a new one that should be placed around Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS (like BEGIN_SELECT_LOOP in selectmodule.c) or could we have a single macro that would do both (i.e. release the GIL / reacquire the GIL, and try again in case of EINTR, unless a signal handler raised an exception)? From a cursory look, the main files affected would be: |
FYI - use the changes made in http://bugs.python.org/issue12268 as a guide for how to deal with EINTR properly at the C level. See the _PyIO_trap_eintr() function for example. See also _eintr_retry_call() in Lib/subprocess.py. FWIW, there are times when we *want* the interrupted system call to return control to Python rather than retrying the call. If someone is making a Python equivalent of the low level system call such as select() or poll(), the EINTR should be exposed for Python code to handle. Things like time.sleep() are documented as sleeping for less time when a signal has arrived even though an exception may not be raised. People have written code which depends on this behavior so adding an EINTR retry for the remaining sleep time would break some programs. Getting an EINTR errno does *not* mean you can simply retry the system calls with the exact same arguments. ie: If you did that with the select() call within time.sleep it'd be trivial to make the process sleep forever by sending it signals with a frequency less than the sleep time. |
Gregory, thanks, that's what I was planning to do. But since the recent discussions (mainly on selectors), there are points I obviously don't - and won't - agree with (such as select() returning EINTR or returning early, same for sleep()), I'm not interested in this anymore. (BTW, as for applications relying on EINTR being returned, I'm positive *way more applications* will break because of the recent change making file descriptors close-on-exec by default). |
I'm a bit curious, do you know of any use cases?
As mentioned in another issue, you would use a special wakeup fd to
Indeed. That's already done in e.g. socketmodule.c : take a look at the |
On Sat, Aug 31, 2013 at 9:56 AM, Charles-François Natali
Whoa. Maybe you're overreacting a bit? I personally see a big divide
Again, I'd make a distinction: I agree for send(), recv() etc., but I |
I wrote too many words. In short: time.sleep()'s behavior should remain as it is today given how it is documented to behave. If you disagree, consider adding an optional interruptable=False parameter so that both behavior options exist. ALL IO calls and wait* should handle EINTR transparently for the user and never expose it to the Python application. select(), poll() and equivalents. If you want to transparently handle EINTR on these, just make sure you deal with the timeouts properly. While I suspect a few people wanted to see the signal interruption on those I agree: very uncommon and undesirable for most. If people need a specific signal interruption they should define a signal handler that raises. |
(replying to Guido's post in another thread)
I don't think that's a problem: the way I was planning to tackle signals is to call PyErr_CheckSignals() before retrying upon EINTR: this runs signal handlers, and returns a non 0 value if an exception occured (e.g. KeyboardInterrupt): if that's the case, then we simply break out of the loop, and let the exception bubble up. |
Alright, here's a first step: select/poll/epoll/etc now return empty |
Am I correct in thinking that you're simply replacing the OSError(EINTR) with returning empty lists? This is bound to subtly break code, e.g. the code that expects reasonably that a return value of three empty lists means the timeout really ran out (i.e. the version of the code that is already the most careful). Shouldn't you restart the poll with the remaining time until timeout? |
I wouldn't call that "being the most careful". I've always had an implicit understanding that calls with timeouts may, for whatever reason, return sooner than requested (or later!), and the most careful approach is to re-check the clock again. |
I've always had the implicit understanding that if I use an *infinite* timeout then the call will not timeout. |
Wow, that's a good point. select() and friends are not documented to If we don't want select() to silently retry on EINTR, then I think we Speaking of which, I see that SelectSelector.select() returns an empty |
exactly. at the system call level you can be interrupted. re-checking the clock is the right thing to do if the elapsed time actually matters.
We should go ahead and retry for the user for select/poll/epoll/kqueue. If they care about being able to break out of that low level call due to a signal, they should set a signal handler which raises an exception. I have *never* seen code intentionally get an EINTR exception from a select or poll call and have often seen code tripped up because it or a library it was using forgot to handle it. We're a high level language: Lets be sane by default and do the most desirable thing for the user. Retry the call internally with a safely adjusted timeout: |
We went through this whole discussion before. Returning immediately with three empty lists is better than raising InterruptedError. Retrying is not always better. |
Modules/socketmodule.c is using a simple style to implement socket timeouts using select(). If I were to naively copy this style over to pure Python, it would work in current Pythons; I'd get occasionally an OSError(EINTR), which I would have presumably been annoyed with and am now catching properly. Now if my working code was made to run with a select() modified as proposed, an EINTR would instead cause the program to fail more obscurely: its sockets occasionally -- and apparently without reason -- time out much earlier. In that situation I would have a hard time finding the reason, particularly if running on an OS where the system select() doesn't spuriously return early with a timeout ("man select" on Linux guarantees this, for example). Similarly, an existing program might rely on select() with an infinite timeout to only return when one of the descriptors is ready, particularly if called with only one or two descriptors. Overall, I would far prefer the status quo over a change in the logic from one slightly-subtle situation to another differently slightly-subtle one. I believe this would end up with programs that need to take special care about both kinds of subtlenesses just to run on two versions of Python. I may be wrong, in this case sorry to take your time. :-) |
Guido's point was that it is already a bug in code to not check the elapsed As for "presumably have been annoyed by the occasional OSError(EINTR) and That's what has gotten me on a kick to hide EINTR from python developers For the record: I am perfectly fine with select and friends returning an What I don't want is to ever see OSError(EINTR) in the future. |
Just for the record, I was initially in favor of recomputing the AFAICT, an early return for calls such as poll()/epoll() etc is
Well, I've always assumed that time.sleep(n) would sleep n seconds, but:
"""
static int
floatsleep(double secs)
[...]
Py_BEGIN_ALLOW_THREADS
err = select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t);
Py_END_ALLOW_THREADS
if (err != 0) {
#ifdef EINTR
if (errno == EINTR) {
if (PyErr_CheckSignals())
return -1;
}
else
#endif
{
PyErr_SetFromErrno(PyExc_IOError);
return -1;
}
}
[...]
""" So really, I'm like Gregory: I don't care which solution we chose, but |
I don't understand how it's a bug. You're assuming select() has |
On dim., 2013-12-01 at 08:14 +0000, Charles-François Natali wrote:
Well this is wishing thinking, since by returning an empty list you |
I know that returning an empty list changes the semantics: I just If you want to implement retry with timeout re-computation, I'm not (BTW, if we go this way, then time.sleep() should probably also be |
Or, since we now have the selectors module, we could let select() live By the way, it's already too late for 3.4, which is in feature freeze. |
I do not consider this a feature; that EINTR is exposed as an exception from the API is a bug. But Larry is the only one who can actually make that decision as the 3.4 release manager (+nosy'd).
The user now only has one thing to deal with instead of two: an empty list being returned; something they should already have been dealing with. Gone will be the OSError(EINTR) exception as a rare, often never tested for, alternate form of the same retry needed indication. I never see code intentionally wanting to receive and handle an OSError(EINTR) exception but I constantly run into code that is buggy due to some library it is using not getting this right... Where it isn't up to the code exhibiting the problem because the only place to fix it is within the library they use that is outside of that code's control. We've got the opportunity to fix this nit once and for all here, lets do it. |
select() currently works as specified; you are proposing a We're left with the fact that the API is inconvenient: but we now have (or we can retry on EINTR, which has the benefit of not creating new
Returning an empty list when no timeout has been passed has never been a |
I don't want this checked in to 3.4. (Congratulations, this is my first "no" as a release manager!) |
FYI Charles-François and me are working on a PEP to address this issue: the PEP-475. The PEP is not ready yet for a review. |
See also bpo-23285 for the PEP |
Yes, this issue was fully fixed by the implementation of the PEP-475 in Python 3.5. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: