Make TCPNeighbor send() synchronized to prevent queue overflows #940

Merged
merged 3 commits into iotaledger:dev on Sep 16, 2018

Conversation

@gjeee
Contributor

gjeee commented Aug 16, 2018

Description

    @Override
    public void send(DatagramPacket packet) {
        if (sendQueue.remainingCapacity() == 0) {
            sendQueue.poll();
        }
        byte[] bytes = packet.getData().clone();
        sendQueue.add(ByteBuffer.wrap(bytes));
    }

The sendQueue used in #TCPNeighbor.send() is an ArrayBlockingQueue with a capacity of 10. I think this method needs a lock so that the poll() and add() happen atomically: between one thread's remainingCapacity() check and its add(), another thread can fill the last free slot, making add() throw an IllegalStateException (queue full). Three threads call this send method:

  1. spawnReplyToRequestThread: replyToRequestFromQueue -> replyToRequest -> sendPacket -> neighbor.send(sendingPacket);
  2. spawnTipRequesterThread: neighbors.forEach(n -> n.send(tipRequestingPacket));
  3. spawnBroadcasterThread: sendPacket -> neighbor.send(sendingPacket);

Although #Node.sendPacket() also contains a lock, the spawnTipRequesterThread does not go through it. This would also explain why intermediate milestones are sometimes skipped (they are among the dropped packets).
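
A minimal sketch of the proposed change, guarding the whole check-then-add sequence with the queue's own monitor (which is what the diff in this PR does):

    @Override
    public void send(DatagramPacket packet) {
        // Lock on the queue itself so no other sender can slip in
        // between the capacity check and the add().
        synchronized (sendQueue) {
            if (sendQueue.remainingCapacity() == 0) {
                sendQueue.poll(); // drop the oldest packet to make room
            }
            byte[] bytes = packet.getData().clone();
            sendQueue.add(ByteBuffer.wrap(bytes));
        }
    }

An alternative would be ArrayBlockingQueue.offer(), which returns false instead of throwing when the queue is full, but that would silently drop the newest packet rather than the oldest.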

Fixes #841
Fixes #937
Fixes #837

It also might be useful to increase the size of this queue, so that fewer packets are dropped while nodes are syncing.

Type of change

  • Bug fix (a non-breaking change which fixes an issue)

How Has This Been Tested?

No.

Checklist:

Please delete items that are not relevant.

  • My code follows the style guidelines for this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
Update TCPNeighbor.java
easy fix to prevent queue overflows
Update TCPNeighbor.java
increased queue
@GalRogozinski
Member

GalRogozinski left a comment

Going for testing.

@@ -19,7 +19,7 @@
     private static final Logger log = LoggerFactory.getLogger(Neighbor.class);
     private int tcpPort;

-    private final ArrayBlockingQueue<ByteBuffer> sendQueue = new ArrayBlockingQueue<>(10);
+    private final ArrayBlockingQueue<ByteBuffer> sendQueue = new ArrayBlockingQueue<>(1000);


@GalRogozinski

GalRogozinski Aug 22, 2018

Member

This change has to be tested...
@gjeee do you see the queue filling up?

@@ -84,11 +84,14 @@ public void setSink(Socket sink) {
      */
     @Override
     public void send(DatagramPacket packet) {
-        if (sendQueue.remainingCapacity() == 0) {
-            sendQueue.poll();
+        synchronized (sendQueue) {
+            if (sendQueue.remainingCapacity() == 0) {
+                sendQueue.poll();
+            }


@GalRogozinski

GalRogozinski Aug 22, 2018

Member

That is the quickest solution. Kinda patchy but this is the best thing to do right now.
@akashgoswami is currently working on a new implementation of networking which will solve these issues

Update TCPNeighbor.java
added 2 log statements to check if queue fills up.
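
The exact statements aren't shown in this thread, but judging from the output reported below, the drop branch plausibly looked something like this (hypothetical sketch):

    if (sendQueue.remainingCapacity() == 0) {
        sendQueue.poll();
        // Produces the line seen in the test logs below.
        log.info("Sendqueue full...dropped 1 tx");
    }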
@gjeee


Contributor

gjeee commented Aug 22, 2018

OK, I set the capacity back to 10 and added some log statements to see how this queue behaves. I tested it and the maximum it reached was 6, and only just after start-up; after a while it is almost always 0. However, I only have 1 TCP neighbor and was almost completely synced when I restarted.

I guess it can start dropping messages if you have more TCP neighbors and start up with a node that is further out of sync.

@grittibaenz (he has 5 TCP neighbors) will test this patch as well and report on the findings.

@gjeee


Contributor

gjeee commented Aug 22, 2018

Well, I was too premature: in my setup it starts dropping (like crazy) after a while. It happened around

  • 14:41:46
  • 14:57:02
  • 15:21:38

Please check my logs: sendQueue.txt

This explains a lot of the weird syncing issues on nodes with TCP neighbors.

@grittibaenz



grittibaenz commented Aug 22, 2018

@gjeee asked me to run IRI 1.5.3 + 940.patch. I left the node running for 20 minutes. It produced 26038 messages with the line:

08/22 15:38:22.550 [pool-3-thread-1] INFO  com.iota.iri.network.Neighbor - Sendqueue full...dropped 1 tx
...
08/22 15:58:13.659 [pool-3-thread-1] INFO  com.iota.iri.network.Neighbor - Sendqueue full...dropped 1 tx

I have 8 neighbors, 5 with TCP and 3 UDP. Please let me know if you need more details.

@gjeee


Contributor

gjeee commented Aug 22, 2018

@GalRogozinski
After increasing the size to 100, the maximum utilization I have seen so far is 22, so I think 50 or 100 would be sufficient to minimize the risk of messages getting dropped. I can do a final commit where I set the correct capacity and only issue a warning in case of a dropped message.
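
A sketch of what that final commit could look like (the capacity value and exact log wording here are assumptions based on the discussion above):

    // Assumed capacity: 100 leaves ample headroom over the maximum
    // utilization of 22 reported above.
    private final ArrayBlockingQueue<ByteBuffer> sendQueue = new ArrayBlockingQueue<>(100);

    @Override
    public void send(DatagramPacket packet) {
        synchronized (sendQueue) {
            if (sendQueue.remainingCapacity() == 0) {
                sendQueue.poll();
                // warn instead of info, so a healthy node stays quiet
                log.warn("Sendqueue full...dropped 1 tx");
            }
            byte[] bytes = packet.getData().clone();
            sendQueue.add(ByteBuffer.wrap(bytes));
        }
    }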

@GalRogozinski GalRogozinski merged commit d8dbd54 into iotaledger:dev Sep 16, 2018

2 checks passed

Codacy/PR Quality Review Up to standards. A positive pull request.
continuous-integration/travis-ci/pr The Travis CI build passed
@gjeee


Contributor

gjeee commented Oct 17, 2018

I see this has been merged; however, the latest commit still contains superfluous log statements. Shall I open a new PR? @GalRogozinski
