Ghost message appear on server #150

Closed
ghost opened this issue Apr 7, 2015 · 8 comments

@ghost

ghost commented Apr 7, 2015

A simple ping-pong RPC with an increasing counter. The client sends a ping every n seconds, specifying a timeout on the context.

What happened:

  1. Client and server ping-pong smoothly.
  2. Turn off the network connection.
  3. The client fails with "context deadline exceeded". The client's main exits.
  4. Turn on the network connection.
  5. After some time, the server receives a ghost ping message with an updated counter.

Expected:
The server shouldn't receive the message, since the client process has already exited.

This seems like a low-level retransmission problem. I'm not familiar with the HTTP/2 protocol.

@iamqizhao
Contributor

This should not happen if the interval between step 2 and 4 is long enough.
Can you provide the timing of the above steps (especially steps 2, 3, 4, and 5)?

I am on vacation this week and will try to look into it sometime next week
if you can provide enough info to facilitate my debugging.

On Mon, Apr 6, 2015 at 7:22 PM, prazzt notifications@github.com wrote:

A simple ping pong rpc with increasing counter. Client send ping every n
seconds interval, specifying timeout on context.

What happened:

  1. Client-server ping pong smoothly.
  2. Turn off network connection.
  3. Client failed "context deadline exceeded". Client's main exited.
  4. Turn on network connection.
  5. After sometime, server received ghost ping message with updated counter.

Expected:
Server shouldn't receive message, since client process already exit.

Seems like low level retransmission problem. I'm not familiar with HTTP2
protocol, but if we can control it, I expect client to not retransmit once
the context is canceled.



@ghost
Author

ghost commented Apr 7, 2015

Ahh okay, just enjoy your holiday man .. :)

This should not happen if the interval between step 2 and 4 is long enough.

How long is long enough? Where is this number defined?
I have only tested a short interval between steps 2 and 4 (around 2 minutes); I will try a longer duration.

@ghost
Author

ghost commented Apr 7, 2015

I added a gist to help with debugging. In my test, if the network stays off for about 2.5 minutes, the message is not retransmitted.

Is this expected? How can I control this period?

@iamqizhao
Contributor

I had difficulty reproducing it. Are you sure the client's main exited before the network connection was turned back on?

@iamqizhao iamqizhao reopened this Apr 15, 2015
@ghost
Author

ghost commented Apr 16, 2015

Yes, I'm sure. I just reinstalled all dependencies (grpc, protobuf) just in case.

  • go version go1.4 linux/386
  • two machines, internal network

Steps:

  • Run client.go on machine1 and server.go on machine2.
  • Turn off the network connection on machine1; the client should exit automatically. Output:
    2015/04/16 06:04:52 Received 0 from server
    2015/04/16 06:04:55 Received 1 from server
    2015/04/16 06:04:58 Received 2 from server
    2015/04/16 06:05:01 Received 3 from server
    2015/04/16 06:05:09 stream error rpc error: code = 4 desc = "context deadline exceeded"
    2015/04/16 06:05:09 client shutting down
  • Machine2 stays on:
    2015/04/16 06:04:52 Received 0 from client
    2015/04/16 06:04:55 Received 1 from client
    2015/04/16 06:04:58 Received 2 from client
    2015/04/16 06:05:01 Received 3 from client
  • Turn the network back on on machine1.
  • After some time, the message with counter 4 appears on machine2:
    2015/04/16 06:05:58 Received 4 from client
    2015/04/16 06:05:58 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport: use of closed network connection"

@ghost
Author

ghost commented Apr 16, 2015

If it helps, netstat says FIN_WAIT1 on machine1 while connection's off.

@pires

pires commented May 15, 2015

According to this

All segments preceding and including FIN
will be retransmitted until acknowledged.  When the other TCP has
both acknowledged the FIN and sent a FIN of its own, the first TCP
can ACK this FIN.

Can you try turning the network back on only after the socket on the client side no longer exists? I believe the timeout on a standard Linux system is 2 minutes.

@iamqizhao
Contributor

On Thu, Apr 16, 2015 at 6:42 AM, prazzt notifications@github.com wrote:

If it helps, netstat says FIN_WAIT1 on machine1 while connection's off.

Okay, it seems you did not unplug the network cable. Instead, you disabled
the network connection, which caused machine1 to send a FIN. In this case,
the kernel on machine1 keeps retransmitting with exponential backoff, up to
tcp_orphan_retries (default 8) times. If you re-enable the network while it
is still retrying, the server can receive the 4th message. If you use
netstat to monitor the TCP socket on machine1 and wait until it disappears,
the server should no longer receive the 4th message.



@lock lock bot locked as resolved and limited conversation to collaborators Sep 26, 2018