Repeated adding/removing of neighbors breaks TCP connection #345
Comments
I will add a 2 Gi bounty for whoever fixes this issue.
Tried close to 20 times. Cannot reproduce it.
Hi @nayanmshah, apologies for my late reply, but I was swamped with work these days.
+1 reproducible by almost everyone testing Nelson at this moment. Does this affect UDP connections?
+1 Also saw this which I've never seen before:
@romansemko No, this only affects TCP connections.
By the way, the 2 Gi bounty is still available if this gets fixed.
Oh, maybe I should give it a shot then 👍
Would be awesome!
@bahamapascal Can you retry with 1.4.1.4? I can't reproduce this on 1.4.1.4 after 50+ remove/add neighbors.
I can try running Nelson with TCP. Will report if I see any difference.
@mlouielu Will do and let you know.
Tested with 1.4.1.4 again and same issue.
I'm not a Java developer, but I tried finding the problem and fixing it. I didn't commit anything, but here's the explanation of the problem plus a fix: The problem lies in the multithreading of ReplicatorSinkPool. The pool is currently limited to 32 threads (NUM_THREADS in network/Replicator.java). Every time you add a TCP neighbor, a new thread is created. Once the pool exceeds its 32 threads, the new threads are queued but never executed, because the ones added earlier are still executing. To reproduce this behavior, just set the limit in Replicator.java (NUM_THREADS) to 3 and try removing and adding a TCP neighbor 4 times. I fixed the problem after many tries with the following:
After that I added some lines to network/ReplicatorSinkPool.java:
What happens is the following: Reach out to me on Slack if I'm wrong: @Leibi133
I hope you get the bounty. :D
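The pool exhaustion described above can be reproduced in isolation. The following is a minimal sketch, not IRI's code: the class `PoolExhaustionDemo` and the method `startedSinks` are hypothetical names, and a task that blocks on a latch stands in for a sink thread that keeps running after its neighbor was removed. It assumes the pool behaves like a fixed-size `ExecutorService`, as the comment suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of IRI.
final class PoolExhaustionDemo {

    /**
     * Submits `tasks` never-ending "sink" tasks to a fixed pool of `poolSize`
     * threads and returns how many of them actually started running.
     */
    static int startedSinks(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger started = new AtomicInteger();
        CountDownLatch neverReleased = new CountDownLatch(1);
        CountDownLatch firstWave = new CountDownLatch(Math.min(poolSize, tasks));
        try {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> {
                    started.incrementAndGet();
                    firstWave.countDown();
                    try {
                        // Stands in for a sink loop that never exits on its own.
                        neverReleased.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Wait until every task that can get a thread has started,
            // then give any queued task a short chance to start as well.
            firstWave.await(2, TimeUnit.SECONDS);
            Thread.sleep(200);
            return started.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return started.get();
        } finally {
            // Interrupts the blocked tasks and discards the queued ones.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Mirrors the reproduction hint: limit of 3, neighbor added 4 times.
        System.out.println("started " + startedSinks(3, 4) + " of 4 sinks");
    }
}
```

With a limit of 3 and 4 submitted tasks, the fourth task stays in the queue for as long as the first three keep running, which matches the symptom of a re-added TCP neighbor that never connects.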
Sorry, you're surely totally right @alon-e. So please forget about my last fix. The correct fix is the following (without the old fix):
network/Node.java:
network/replicator/ReplicatorSinkProcessor.java:
Lines with + have been added, lines with - removed. All other lines weren't touched.
I have forked IRI and integrated the changes as @Leibi133 described: I have tested it now and it's working perfectly! A big thanks to @Leibi133 for the fix!
The fix is now integrated into the dev branch: 3049751 Thanks to @Leibi133 for the fix, as well as Alon-e and Paul H for integrating it into the dev branch.
When a TCP neighbor is repeatedly added and removed via API calls, it will eventually break the connection.
The behavior varies, but in general the bug is triggered after 10 to 20 cycles of removing and re-adding the neighbor.
Reestablishing the connection is then only possible by restarting the IRI application. Sometimes, if that doesn't help, the affected TCP neighbor also has to restart their node.
UDP neighbors are not affected by this.
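The remove/re-add cycle can be scripted instead of repeated by hand. This is only a sketch under assumptions: it assumes the node exposes IRI's HTTP API on the default `http://localhost:14265` and accepts the `addNeighbors` and `removeNeighbors` commands with the `X-IOTA-API-Version` header. The neighbor URI is a placeholder, and the class `NeighborChurn` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of IRI.
final class NeighborChurn {

    /** Builds the JSON body for a neighbor command (addNeighbors or removeNeighbors). */
    static String payload(String command, String neighborUri) {
        return "{\"command\": \"" + command + "\", \"uris\": [\"" + neighborUri + "\"]}";
    }

    /** POSTs one command to the node's API and returns the HTTP status code. */
    static int post(String apiUrl, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(apiUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("X-IOTA-API-Version", "1");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String api = "http://localhost:14265";            // assumed default API address
        String neighbor = "tcp://neighbor.example:15600"; // placeholder neighbor URI
        // 20 cycles, the upper end of the range reported in the issue.
        for (int cycle = 1; cycle <= 20; cycle++) {
            post(api, payload("removeNeighbors", neighbor));
            post(api, payload("addNeighbors", neighbor));
            System.out.println("cycle " + cycle + " done");
        }
    }
}
```

After the loop finishes, checking the neighbor list on the node should show whether the TCP neighbor is still exchanging transactions.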