Redundant warnings when using coro.with_timeout on IO operations #45

Open
ghost opened this issue Jul 2, 2013 · 6 comments

Comments

@ghost

ghost commented Jul 2, 2013

def with_timeout(coro.sock conn):
    try:
        coro.with_timeout(io_method, conn)
    except coro.TimeoutError, unused:
        cleanup()
    ...

Inside the cleanup method, I cannot do anything meaningful without getting a "notify_of_close ... unable to interrupt thread" message (this includes deferring cleanup to another coroutine).

Interestingly, if I don't trap coro.TimeoutError, coro.event_loop exits with a coro.DeadCoroutine exception.

What is the correct way to impose a timeout on an IO operation? On a coroutine?

@ehuss
Contributor

ehuss commented Jul 2, 2013

If I had to guess, you need to pass the method and args to the with_timeout function like this:

coro.with_timeout(io_method, conn)

@ghost
Author

ghost commented Jul 3, 2013

Thanks for the correction! What I find is that the "unable to interrupt thread" messages are harmless, but there is no way to get rid of them. When coro.TimeoutError is handled, the current coroutine is active, but it is also likely still waiting for IO in the poller. When the socket is closed, notify_of_close is invoked, and it finds that the coroutine waiting in the poller queue is also the active one. Therefore, "unable to interrupt thread" is produced. If only there were a way to dislodge the current coroutine from the poller when coro.TimeoutError is raised...

@ehuss
Contributor

ehuss commented Jul 6, 2013

"Unable to interrupt thread" is there (IIRC) to catch some bugs. That message often indicates a problem with the code. For example, if the thread is scheduled to run because of a read or write event, and you close the socket before it is allowed to run, it will attempt to read or write on a closed file descriptor which can be dangerous (if the file descriptor gets reused). Using multiple coro threads on the same socket is an advanced topic and requires great care.

Are you interrupting the other thread and then closing the socket? You could swap that order (so that closing successfully schedules it). Without knowing more about what you are doing, it's tough to say, but in general it's not good to ignore that message.

Sam may remember more about it since he dealt with it most.
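
The descriptor-reuse hazard mentioned above can be seen with plain operating-system calls. This is a minimal sketch in standard Python, not shrapnel code, and the helper name `demo_fd_reuse` is made up for the illustration. It assumes a POSIX system, where a new descriptor gets the lowest unused number, and that nothing else in the process opens a descriptor in between.

```python
import os


def demo_fd_reuse():
    """Close a descriptor, open a new one, and compare the numbers."""
    read_end, write_end = os.pipe()
    closed_number = read_end
    os.close(read_end)  # a stale reader might still remember this number

    # The next descriptor allocated takes the lowest free slot,
    # which is the number that was just closed.
    new_read_end, new_write_end = os.pipe()

    for fd in (write_end, new_read_end, new_write_end):
        os.close(fd)
    return closed_number, new_read_end


old_fd, new_fd = demo_fd_reuse()
print(old_fd, new_fd)
```

If the two numbers match, a thread that wakes up late and reads from the old number would be reading from an unrelated object, which is the situation the warning is meant to flag.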

@samrushing
Member

FYI: I see "notify_of_close: ..." errors all the time when I run coro on linux, so it's probably related to the linux epoller.

@ghost
Author

ghost commented Jul 12, 2013

Hi,

I'm trying to do some network IO with a timeout imposed. The problem, as I see it, is this:
Any live coroutine "exists" somewhere: it is scheduled to run, or waiting for a timer, or waiting for network IO, a condition variable, etc. Let's say our coroutine is trying to read, but the socket has nothing at this moment. The scheduler moves it from the active queue to the poller. The intention is that when data arrives on the socket, the poller will release and activate the coroutine.

In this situation, if the timeout exception is raised and intercepted, the coroutine receives control and becomes active. At the same time, it also stays with the poller, waiting for the IO. Let's say I want to close the socket and move on.

When I close the socket, notify_of_close is invoked. It checks with the poller whether any coroutine is waiting for IO events on the socket I am closing. It finds the very coroutine that is running now, the one which trapped the timeout exception and invoked socket.close(). The coroutine can't interrupt itself; the scheduler is smart enough to realize this and issues a warning. This warning is harmless, but it shows that there is a kind of split-brain situation that is not taken care of. There is contention between the timers and the poller which needs to be sorted out.
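
The sequence described above can be modelled in a few lines of plain Python. This is only a toy sketch of the behaviour as I understand it, not shrapnel's implementation: `ToyPoller`, `wait_for_read`, `timeout_then_close`, and the exact wording of the warning string are all made up for the illustration.

```python
class ToyPoller:
    """Hypothetical model of a poller; NOT shrapnel's implementation."""

    def __init__(self):
        self.waiters = {}    # fd -> name of the coroutine parked on that fd
        self.warnings = []   # messages a real scheduler would print

    def wait_for_read(self, fd, coroutine):
        # The coroutine has nothing to read, so it is parked here.
        self.waiters[fd] = coroutine

    def notify_of_close(self, fd, current):
        # Called when fd is closed: wake whoever is still parked on it.
        waiter = self.waiters.pop(fd, None)
        if waiter is None:
            return "nobody waiting"
        if waiter == current:
            # The parked coroutine is the one running right now,
            # so it cannot be interrupted; only a warning is possible.
            self.warnings.append("unable to interrupt thread: " + waiter)
            return "warned"
        return "interrupted " + waiter


def timeout_then_close():
    poller = ToyPoller()
    fd = 7  # arbitrary descriptor number for the illustration
    poller.wait_for_read(fd, "reader")
    # The timeout fires here: "reader" resumes in its except block,
    # but nothing removed its registration from the poller.
    return poller.notify_of_close(fd, current="reader"), poller.warnings


print(timeout_then_close())
```

In this model the warning appears only because the registration outlives the timeout; a close issued while the registration is gone, or by a different coroutine, takes one of the other two branches.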

@ghost
Author

ghost commented Jul 12, 2013

Here is the shortest example I can think of to illustrate the issue:

def test(coro.sock conn):
    try:
        # 3 seconds to read 10 bytes
        coro.with_timeout(3, conn.recv_exact, 10)
    except coro.TimeoutError, unused:
        conn.close()

The last line here can be changed to coro.spawn(conn.close) with some interesting results.

Perhaps what is missing is something like a notify_of_timeout method, or some other way to remove the coroutine from the poller when a timeout occurs. For this to work, an IO event key can be set as an attribute of the coroutine while the poller holds it. When the timeout timer fires, it can check for this key and, if it exists, use it to remove the coroutine from the poller.
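
To make the proposal concrete, here is a rough sketch in plain Python. It is a toy model and not a patch against coro: `ToyCoroutine`, `ToyPoller`, `park`, the `io_key` attribute, and `close_after_timeout` are hypothetical names, and `notify_of_timeout` is the method suggested above, which does not exist yet.

```python
class ToyCoroutine:
    def __init__(self, name):
        self.name = name
        self.io_key = None  # set only while the poller holds this coroutine


class ToyPoller:
    """Hypothetical model of the proposal; NOT shrapnel's implementation."""

    def __init__(self):
        self.waiters = {}  # IO event key -> parked coroutine

    def park(self, key, co):
        # Remember on the coroutine itself which key it is waiting on.
        self.waiters[key] = co
        co.io_key = key

    def notify_of_timeout(self, co):
        # Proposed hook: when the timer fires, use the stored key
        # to remove the coroutine from the poller.
        if co.io_key is not None:
            self.waiters.pop(co.io_key, None)
            co.io_key = None

    def notify_of_close(self, key, current):
        waiter = self.waiters.pop(key, None)
        if waiter is None:
            return "nobody waiting"
        waiter.io_key = None
        if waiter is current:
            return "warned"  # the case that triggers the message today
        return "interrupted"


def close_after_timeout(dislodge):
    poller = ToyPoller()
    reader = ToyCoroutine("reader")
    poller.park(7, reader)  # 7 is an arbitrary key for the illustration
    if dislodge:
        poller.notify_of_timeout(reader)
    return poller.notify_of_close(7, current=reader)


print(close_after_timeout(dislodge=False))
print(close_after_timeout(dislodge=True))
```

With the hook in place the close finds nobody waiting, so in this model there is nothing left to warn about.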
