Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on second getMessage() call after EAGAIN error #14

Closed
kdeme opened this issue Nov 30, 2018 · 5 comments
Closed

Segfault on second getMessage() call after EAGAIN error #14

kdeme opened this issue Nov 30, 2018 · 5 comments

Comments

@kdeme
Copy link
Contributor

kdeme commented Nov 30, 2018

The handler (with getMessage()) where this happens is: https://github.com/status-im/nim-eth-p2p/blob/master/eth_p2p/discovery.nim#L243

I think segfault will occur because of following flow:

  1. Error (EAGAIN in this case) occurs on recvfrom in readDatagramLoop and setReadError is ran.
  2. A raise transp.getError() is done in getMessage or peekMessage in the transport function (see handler above).
  3. Next time the handler is ran for this DatagramTransport the segfault will occur at getMessage.

Why EAGAIN happens, I don't know.

So far, I can only reproduce this (not 100% of the time) by running: https://github.com/status-im/nim-eth-p2p/blob/master/tests/tshh_connect.nim

This was tested only on a Linux x86_64 machine.

@cheatfate
Copy link
Collaborator

I need to understand when EAGAIN happens, at the end or in process?

@kdeme
Copy link
Contributor Author

kdeme commented Nov 30, 2018

in process, the err var here contains it:

And then it gets raised by the getMessage call in the handler. The segfault happens only later on a second getMessage.

@cheatfate
Copy link
Collaborator

But why you are trying to perform one more getMessage()?

@kdeme
Copy link
Contributor Author

kdeme commented Nov 30, 2018

I don't. It gets called automatically by the handler on next data received by same DatagramTransport.

@kdeme
Copy link
Contributor Author

kdeme commented Nov 30, 2018

I think I found the root cause of my EAGAIN error. It is due to a poll() in an async proc (which is not nice anyhow, but that was not the point).
If you look at the poll() code it makes sense that this can happen, although it requires some conditions.

A "normal" poll() which adds the async proc with the poll() inside to the callbacks and next also the readDatagramLoop handler on an Event.Read . processCallbacks must run first the async proc with poll(). Next in that poll again Event.Read for same socket, processCallbacks runs again, both callbacks are run ... et voila, EAGAIN

So I guess this can be closed, but perhaps some documentation could be added that this should not be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants