Unix: Deregister descriptors that return ECONNRESET #149
580f9d6 to d6e0d08
@@ -500,6 +501,8 @@ impl OsIpcReceiverSet {
    }
    Err(err) if err.channel_is_closed() => {
        self.pollfds.remove(&evt_token).unwrap();
        let io = EventedFd(&poll_entry.fd);
        self.poll.deregister(&io).unwrap();
dlrobertson
Feb 15, 2017
Author
Collaborator
After I thought about it and looked at the code more, I realized the real problem is that we don't deregister the descriptor when we hit an error here. Theoretically just deregistering the descriptor here should resolve the issue.
antrik
Feb 15, 2017
Contributor
Well, we actually discussed this in the original PR IIRC. My assumption was that we don't need to remove it explicitly, since closing the FD should do so automatically -- but I wasn't aware of the duplicate FD special case back then :-( This makes total sense now: after the sender end is closed, and no more messages outstanding, recv() returns the channel_is_closed status (i.e. ECONNRESET), so we close the original FD and drop the token -- but a duplicate FD might still be open, in which case epoll_wait() will diligently keep reporting the hup status in following calls.
Explicitly removing the FD should indeed fix this for good. Have you tried whether this alone fixes the failures?
(BTW, is there any particular reason why you introduced the io temporary? It doesn't seem more readable to me than as a one-liner... Might be different perhaps if the temporary name was more informative -- but io doesn't really tell me anything here...)
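The duplicate-FD case described above can be sketched with a toy model. This is not real epoll or mio: `ToyEpoll`, `stale_events`, and the descriptor numbers are hypothetical names invented for illustration. The only rule it mimics is the one epoll(7) documents: a registration belongs to the open file description, so closing one descriptor drops it only when no duplicate is still open.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical toy model, not real epoll or mio. It only mimics the rule that
// a registration belongs to the open file description, which stays alive
// until every descriptor referring to it has been closed.
#[derive(Default)]
struct ToyEpoll {
    fd_to_desc: HashMap<i32, u32>, // open descriptor -> file description id
    registered: HashSet<u32>,      // file descriptions in the interest list
}

impl ToyEpoll {
    fn open(&mut self, fd: i32, desc: u32) {
        self.fd_to_desc.insert(fd, desc);
    }

    fn register(&mut self, fd: i32) {
        let desc = self.fd_to_desc[&fd];
        self.registered.insert(desc);
    }

    // Explicit removal, in the spirit of EPOLL_CTL_DEL.
    fn deregister(&mut self, fd: i32) {
        if let Some(desc) = self.fd_to_desc.get(&fd) {
            self.registered.remove(desc);
        }
    }

    // Closing drops the registration only if no duplicate is still open.
    fn close(&mut self, fd: i32) {
        if let Some(desc) = self.fd_to_desc.remove(&fd) {
            if !self.fd_to_desc.values().any(|d| *d == desc) {
                self.registered.remove(&desc);
            }
        }
    }

    // Assume every registered description has hung up and keeps reporting it.
    fn wait(&self) -> usize {
        self.registered.len()
    }
}

// Returns how many hang-up events the toy poller still reports after the
// receiver closes its original descriptor while a duplicate remains open.
fn stale_events(deregister_first: bool) -> usize {
    let mut ep = ToyEpoll::default();
    ep.open(3, 100); // original receiver fd
    ep.open(4, 100); // duplicate of the same file description
    ep.register(3);
    if deregister_first {
        ep.deregister(3);
    }
    ep.close(3);
    ep.wait()
}
```

In this model `stale_events(false)` returns 1 (the hang-up keeps being reported), while `stale_events(true)` returns 0, which matches the reasoning that explicit removal is what stops the repeated events.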
dlrobertson
Feb 15, 2017
Author
Collaborator
> This makes total sense now
I know right. After I saw this, it seemed plain as day.
> Explicitly removing the FD should indeed fix this for good. Have you tried whether this alone fixes the failures?
Yes. Just removing the fd causes the tests to pass.
> BTW, is there any particular reason why you introduced the io temporary?
gdb if I remember right. Had a breakpoint that printed its value. I can remove it.
@antrik Does that look ok to you? It made the try run succeed at least.
The later change looks good; the original one I have doubts about... See comments. BTW, any chance we could create an
panic!("Readable event for unknown token: {:?}", evt_token)
// Do not panic for readable events that are errors or hup. We
// have already closed the descriptor.
if !(evt_kind.is_error() || evt_kind.is_hup()) {
antrik
Feb 15, 2017
Contributor
I have some doubts about this. So this means it's possible for the event to be readable and at the same time hup or error? (Is this documented anywhere?) If that's the case, maybe we need to rethink the way we handle error and/or hup status in general?... With all that uncertainty around this, I'd rather defer such changes, unless it is strictly necessary to fix the issue at hand.
In either case, I still don't see though how we could ever get a readable event after dropping the token, since we only do that after we get informed that the sender closed the channel... So I'd rather we still panic if that should ever happen. That's what uncovered the actual bug above, after all...
This would be highly valuable, but I wouldn't mind if this lands later.
d6e0d08 to c2ca2a2
I removed the second part and only left the
I can try. I'll submit that as a separate PR so that I am no longer blocking the PR. @nox, can you try your PR with the updated changes? Sorry to keep asking you to do this.
After we return ECONNRESET from recv, we remove the descriptor's token from the pollfds HashMap. Ensure that we also deregister the descriptor when this occurs.
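The bookkeeping rule in this commit message can be sketched as follows. This is a minimal model under stated assumptions, not the actual ipc-channel code: `ToyReceiverSet`, `RecvOutcome`, `on_recv`, and `in_sync` are hypothetical names, and a plain `HashSet` stands in for the poller's registrations. It only shows the token map and the registrations being updated in the same step.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for illustration; not ipc-channel's or mio's real API.
enum RecvOutcome {
    Data(Vec<u8>),
    ChannelClosed, // models recv failing with ECONNRESET
}

#[derive(Default)]
struct ToyReceiverSet {
    pollfds: HashMap<u64, i32>, // token -> descriptor
    registered: HashSet<i32>,   // descriptors the toy poller still watches
}

impl ToyReceiverSet {
    fn add(&mut self, token: u64, fd: i32) {
        self.pollfds.insert(token, fd);
        self.registered.insert(fd);
    }

    // Handle the outcome of a receive on the descriptor behind `token`.
    fn on_recv(&mut self, token: u64, outcome: RecvOutcome) -> Option<Vec<u8>> {
        match outcome {
            RecvOutcome::Data(bytes) => Some(bytes),
            RecvOutcome::ChannelClosed => {
                // Drop the token and the registration in the same step so the
                // poller cannot report an event for a forgotten token.
                if let Some(fd) = self.pollfds.remove(&token) {
                    self.registered.remove(&fd);
                }
                None
            }
        }
    }

    // Invariant: the watched descriptors are exactly those that have a token.
    fn in_sync(&self) -> bool {
        let known: HashSet<i32> = self.pollfds.values().copied().collect();
        known == self.registered
    }
}
```

Keeping the two removals together means `in_sync()` holds after every closed channel, so a readable event for an unknown token would still indicate a genuine bug.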
c2ca2a2 to c16ac1d
c16ac1d to e0f660c
@bors-servo r+
@bors-servo r=antrik I Git bad.
@bors-servo retry I also Homu bad.
dlrobertson commented Feb 15, 2017 (edited)
Deregister descriptors that return ECONNRESET on recv.
Resolves: #133