Nodes are not reconnecting to each other after OnTerminated due to long time without connection #462
Comments
Even in normal operation, connections are dropping all the time and calling We are going to double-check that. |
Please find out where the problem lies. |
I am trying, master. aehuaheuaea We noticed that this Timeout is kind of normal. |
An issue was also opened at Akka.Net |
Apparently, these are the lines, @erikzhang: Lines 33 to 35 in 60a02ee
By commenting out these lines, we can stop the problem reported here and in #463. Even with these lines commented out, everything seems to work normally. O.o When they are not commented out, we get:
[ERROR][11/15/2018 14:36:29][Thread 0003][akka://NeoSystem/user/$b] Network is unreachable
Cause: System.Net.Sockets.SocketException (0x80004005): Network is unreachable
at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
at System.Net.Sockets.Socket.SendTo(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, EndPoint remoteEP)
It cannot communicate and reports that the network is unreachable. Maybe it is related to the way it was terminated, because it works perfectly until we have a problem with the network adapter. |
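To illustrate one way such a send failure could be tolerated instead of being left to propagate, here is a minimal sketch in Python (not the project's C#). It assumes a datagram-style socket, which is what a SendTo call is commonly used with. The helper name `try_sendto` is hypothetical and this is not the code at the lines referenced above.

```python
import errno

# Error codes that mean "no route right now" rather than a programming bug.
UNREACHABLE = {errno.ENETUNREACH, errno.EHOSTUNREACH}


def try_sendto(sock, payload, address):
    """Send one datagram (hypothetical helper).

    Returns True on success and False when the network or host is
    unreachable, so the caller can skip or retry later. Any other
    OSError is re-raised unchanged.
    """
    try:
        sock.sendto(payload, address)
        return True
    except OSError as exc:
        if exc.errno in UNREACHABLE:
            return False
        raise
```

A caller could treat a False result as "adapter is down, try again later" rather than as a fatal error for the whole component.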
By the way, what is this Socket, Erik? aheuaheuahea Is it TCP or WS? |
@erikzhang and @jsolman, the strange thing is that when nodes are stuck like this while trying to sync, restarting is the best choice. I mean, after killing the application and restarting it, the consensus nodes sync very fast. The two options I thought of are:
What do you think, @erikzhang? |
I like the idea of destroying NeoSystem and initializing it again. Basically we would need to call method |
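The destroy-and-reinitialize idea could be sketched roughly as follows, in Python rather than the project's C#. Everything here is an assumption for illustration: `StallWatchdog`, the `factory` callable, and the `height` / `dispose()` members are hypothetical stand-ins, not NeoSystem's actual API.

```python
import time


class StallWatchdog:
    """Rebuilds a node object when its block height stops advancing.

    `factory` is a hypothetical callable that constructs the node. The
    object it returns only needs a `height` attribute and a `dispose()`
    method.
    """

    def __init__(self, factory, stall_seconds, clock=time.monotonic):
        self._factory = factory
        self._stall_seconds = stall_seconds
        self._clock = clock
        self.system = factory()
        self._last_height = self.system.height
        self._last_progress = clock()
        self.restarts = 0

    def check(self):
        """Call periodically; returns True if the system was rebuilt."""
        now = self._clock()
        if self.system.height > self._last_height:
            # Progress was made: remember it and reset the stall timer.
            self._last_height = self.system.height
            self._last_progress = now
            return False
        if now - self._last_progress < self._stall_seconds:
            return False
        # Stalled for too long: tear down and start from scratch.
        self.system.dispose()
        self.system = self._factory()
        self._last_height = self.system.height
        self._last_progress = now
        self.restarts += 1
        return True
```

This mirrors what killing and restarting the process does by hand, but only when the node has made no progress for `stall_seconds`.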
I believe this has been solved, but let's keep a reference of this on #620. |
@erikzhang, @jsolman
We are not experts in this part of the code, but we now have a dedicated team investigating it in several ways.
After we removed the network connection from a node, it terminated all of its connections after a couple of minutes, as expected:
However, the problem is that it is not able to fully reconnect to these nodes. This is one of the main causes of chain sync problems, both for Consensus Nodes and for normal Seed RPC nodes.
After this batch of OnTerminated calls, the actors seem to have trouble re-establishing the connections, which often report OnTerminated again after some time. But the node keeps reporting ConnectedPeers.Count close to the limit (10 by default).
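One possible reading of these symptoms is that terminated connections are not being released from the peer count, so no slots are free for new dials. That is a guess, not a confirmed diagnosis. The bookkeeping that would avoid it can be sketched as below, in Python rather than the project's C#. `PeerTable` and its methods are hypothetical names and do not reflect the actual LocalNode implementation.

```python
class PeerTable:
    """Tracks live peers and the endpoints waiting to be re-dialed.

    Hypothetical sketch: a terminated peer is removed from `connected`
    immediately, so the count only reflects live connections.
    """

    def __init__(self, max_peers=10):
        self.max_peers = max_peers
        self.connected = {}    # endpoint -> connection handle
        self.retry_queue = []  # endpoints to dial again, oldest first

    def on_connected(self, endpoint, conn):
        """Register a live connection; returns False when the table is full."""
        if endpoint not in self.connected and len(self.connected) >= self.max_peers:
            return False
        self.connected[endpoint] = conn
        if endpoint in self.retry_queue:
            self.retry_queue.remove(endpoint)
        return True

    def on_terminated(self, endpoint):
        """Free the slot and remember the endpoint for a later re-dial."""
        if self.connected.pop(endpoint, None) is not None:
            if endpoint not in self.retry_queue:
                self.retry_queue.append(endpoint)

    def next_to_dial(self):
        """Endpoints worth dialing now, limited by the number of free slots."""
        free = self.max_peers - len(self.connected)
        return list(self.retry_queue[:free])
```

With this shape, a burst of terminations lowers the count at once, and a periodic timer can call `next_to_dial()` to refill the free slots.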