
Nodes are not reconnecting to each other after OnTerminated due to long time without connection #462

Closed
vncoelho opened this issue Nov 11, 2018 · 9 comments
Labels
discussion Initial issue state - proposed but not yet accepted

Comments

@vncoelho (Member) commented Nov 11, 2018

@erikzhang, @jsolman

We are not experts in this part of the code, but we now have a dedicated team examining it from different angles.

After removing a node's connection, as expected, it closed all its connections after a couple of minutes:

OnTerminated bye bye  
 endPoint.Address=172.18.0.2
OnTerminated bye bye  
 endPoint.Address=172.18.0.6
OnTerminated bye bye  
 endPoint.Address=172.18.0.3
OnTerminated bye bye  
 endPoint.Address=172.18.0.9
OnTerminated bye bye  
 endPoint.Address=172.18.0.8
OnTerminated bye bye  
 endPoint.Address=172.18.0.7
OnTerminated bye bye  
 endPoint.Address=172.18.0.4

However, the problem is that the node is not able to fully reconnect to these peers afterwards. This is one of the main causes of chain-sync problems, both for consensus nodes and for normal seed/RPC nodes.

After this batch of OnTerminated calls, the actors seem to have trouble re-establishing the connections, which often report OnTerminated again after some time.
Yet ConnectedPeers.Count keeps reporting values close to the limit (10 by default).
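The behavior we would expect is roughly the following: the peer-manager actor watches each connection actor and, when one terminates, schedules a fresh connection attempt instead of only logging. This is a minimal hypothetical sketch in Akka.NET, not the actual neo code; `PeerManager`, `ReconnectTo`, and the 30-second delay are all assumptions:

```csharp
// Hypothetical sketch (NOT the actual neo code): an Akka.NET manager actor
// that reacts to Terminated by scheduling a delayed reconnect, so the peer
// count can recover after a batch of disconnections. Connection actors are
// assumed to have been registered via Context.Watch when they connected.
using System;
using Akka.Actor;

public class PeerManager : ReceiveActor
{
    public PeerManager()
    {
        Receive<Terminated>(t =>
        {
            Console.WriteLine($"OnTerminated bye bye {t.ActorRef.Path}");
            // Schedule a reconnect attempt instead of dropping the peer forever.
            Context.System.Scheduler.ScheduleTellOnce(
                TimeSpan.FromSeconds(30),                 // assumed backoff
                Self,
                new ReconnectTo(t.ActorRef.Path.Name),
                Self);
        });
        Receive<ReconnectTo>(r =>
        {
            // Dial the remote endpoint again here (omitted in this sketch).
        });
    }

    public sealed class ReconnectTo
    {
        public ReconnectTo(string endpoint) { Endpoint = endpoint; }
        public string Endpoint { get; }
    }
}
```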

@erikzhang erikzhang added the discussion Initial issue state - proposed but not yet accepted label Nov 11, 2018
@vncoelho (Member, Author) commented Nov 11, 2018

Even in normal operation, connections are dropping all the time and calling OnTerminated for some peers, even when those peers are stable; maybe it is due to a lack of communication between them.

We are going to double check that.

@erikzhang (Member) commented

Please find out where the problem lies.

@vncoelho (Member, Author) commented

I am trying, master. aehuaheuaea

We noticed that this timeout is more or less normal.
We need some more experiments. Let's keep in mind that there is a possible problem here; we are not 100% sure.

@vncoelho (Member, Author) commented

An issue was also opened at Akka.NET.

@vncoelho (Member, Author) commented Nov 15, 2018

Apparently, these are the offending lines, @erikzhang:

neo/neo/Network/UPnP.cs

Lines 33 to 35 in 60a02ee

s.SendTo(data, ipe);
s.SendTo(data, ipe);
s.SendTo(data, ipe);

By commenting out these lines we can stop the problem reported here and in #463. Even with these lines commented out, everything seems to work normally. O.o
At first I thought it was a HandShake message.

When not commented we have:

[ERROR][11/15/2018 14:36:29][Thread 0003][akka://NeoSystem/user/$b] Network is unreachable
Cause: System.Net.Sockets.SocketException (0x80004005): Network is unreachable
   at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.SendTo(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, EndPoint remoteEP)

It cannot communicate and reports the network as unreachable.
But the network is up and we can still ping that machine.

Maybe it is related to the way the socket was terminated, because everything works perfectly until there is a problem with the network adapter.
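One way to keep the UPnP discovery broadcast from throwing into the actor system when the adapter drops would be to catch the SocketException around the SendTo calls. This is a hedged sketch of that idea, not the actual fix and not neo's real UPnP code; the SSDP request string and endpoint are standard UPnP discovery values, everything else is an assumption:

```csharp
// Sketch only: wrap the UPnP discovery broadcast so a transient
// "Network is unreachable" SocketException does not propagate upward.
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

static class UpnpDiscoverySketch
{
    public static void TryDiscover()
    {
        string req = "M-SEARCH * HTTP/1.1\r\n" +
                     "HOST: 239.255.255.250:1900\r\n" +
                     "ST: upnp:rootdevice\r\n" +
                     "MAN: \"ssdp:discover\"\r\n" +
                     "MX: 3\r\n\r\n";
        byte[] data = Encoding.ASCII.GetBytes(req);
        var ipe = new IPEndPoint(IPAddress.Broadcast, 1900);
        using var s = new Socket(AddressFamily.InterNetwork,
                                 SocketType.Dgram, ProtocolType.Udp);
        s.SetSocketOption(SocketOptionLevel.Socket,
                          SocketOptionName.Broadcast, true);
        for (int i = 0; i < 3; i++)      // the three SendTo calls above
        {
            try
            {
                s.SendTo(data, ipe);
            }
            catch (SocketException)      // e.g. network is unreachable
            {
                return;                  // give up quietly; UPnP is optional
            }
        }
    }
}
```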

@vncoelho (Member, Author) commented Nov 15, 2018

By the way, what is this socket, Erik? aheuaheuahea

Is it TCP or WS?
I still haven't seen where WS is used.

@vncoelho (Member, Author) commented Nov 21, 2018

@erikzhang and @jsolman, the strange thing is that when nodes are stuck like this while trying to sync, restarting is the best option.

I mean, by killing the application and starting it again, the consensus nodes sync very fast.
In this sense, I think the problem might be related to priority in receiving messages (some are expiring and getting lost).

The two options I thought of are:

  • Create a method that kills NeoSystem and restarts everything inside neo
  • Stop all services and give total priority to just receiving blocks

What do you think, @erikzhang?

@vncoelho (Member, Author) commented Nov 21, 2018

I like the idea of destroying NeoSystem and initializing it again.
However, it does not track down and fix the root cause of the problem.
Still, I think it is a simple and good solution for now.

Basically, we would need to call a method like ReinitializeNeoSystem() whenever we detect that a node is falling behind.
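The restart idea could be sketched as a simple sync watchdog: if the local block height stops advancing for too long, tear down and rebuild the system. ReinitializeNeoSystem, SyncWatchdog, and the 5-minute threshold are all hypothetical names and values, not the actual neo API:

```csharp
// Hypothetical sketch of the restart-on-stall idea; none of these names
// exist in neo. A caller would feed in the observed block height
// periodically (e.g. from a timer).
using System;

public class SyncWatchdog
{
    private uint lastHeight;
    private DateTime lastAdvance = DateTime.UtcNow;

    public void OnHeightObserved(uint height)
    {
        if (height > lastHeight)
        {
            // The chain is advancing: record progress.
            lastHeight = height;
            lastAdvance = DateTime.UtcNow;
        }
        else if (DateTime.UtcNow - lastAdvance > TimeSpan.FromMinutes(5))
        {
            // No progress for too long: destroy and recreate NeoSystem.
            ReinitializeNeoSystem();
            lastAdvance = DateTime.UtcNow;
        }
    }

    private void ReinitializeNeoSystem()
    {
        // Dispose the current NeoSystem and start a fresh one (omitted).
    }
}
```

This keeps the workaround contained in one place, even though, as noted above, it does not address the underlying reconnection bug.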

@vncoelho (Member, Author) commented Apr 9, 2019

I believe this has been solved, but let's keep a reference to it in #620.
