Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handshake retransmissions #266

Merged
merged 8 commits into from
Feb 17, 2017
Merged

Add handshake retransmissions #266

merged 8 commits into from
Feb 17, 2017

Conversation

ianswett
Copy link
Contributor

@ianswett ianswett commented Feb 7, 2017

Add details about handshake retransmissions for issue #169

Copy link
Contributor

@martinduke martinduke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic is fine, though there are some fairly significant editorial comments.

I don't believe this fully addresses #169, although from Tokyo I gather that some of the things I was worried about are actually application-layer actions.

@@ -414,6 +445,15 @@ acknowledged. DetectLostPackets is called every time there is a new largest
packet or if the loss detection alarm fires the previous largest acked packet is
supplied.

### Handshake Packets

The receiver MUST not trust an unencryped ack for encrypted packets. The receiver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean to "trust" an ack? Perhaps " the receiver MUST ignore any acknowledgment of an encrypted packet that occurs in an ACK frame in an unencrypted packet"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinduke is right to ask, note that not just the acknowledgment, but the entire packet is now suspect. As we discussed elsewhere, I think that the entire packet needs to go away if it contains suspicious content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, will ignore the entire packet.

initial RTT value. If no previous RTT is available, the initial RTT defaults
to 200ms. Once an RTT measurement is taken, it MUST replace initial_rtt.

The client hello is retransmitted via a timer, similar to a tail loss probe, but
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues with the client and server paragraphs:

  • I think all the language about tail loss probe and delayed ack confuses it.
  • I'm afraid "client hello" is handshake specific
  • Separating out client and server behavior is also unnecessary, as the psuedocode is the same
  • This doesn't totally reflect the pseudo code.

So how about striking these two paragraphs and replacing with the following:
"Endpoints MUST retransmit handshake packets if not acknowledged(1) within a time limit. This time limit will be the largest of the rtt value and MinTLPTimeout. For each previous handshake timeout on this connection, the timeout doubles."

"Endpoints MUST(?) also retransmit handshake packets if they receive a packet from earlier in the handshake, indicating loss of the local packet."

(1) I don't remember, must we ack the client hello, or is the server hello sufficient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An ACK for the ClientHello isn't needed, but it does contain some useful information (if you don't care about provenance, that is).

@martinduke, your use of "timeout" in that proposed text is problematic I think. It could refer to the handshake timing out, but I think that you simply intended to retransmit on a doubling interval (i.e., 200ms, 400ms, 600ms, etc...).

I wonder if some amount of randomization on that retransmission is necessary to avoid synchronizing retransmissions across a large cohort of clients that are simultaneously disconnected, or whether variations in RTT will be enough to deal with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinthomson Yeah, maybe "For each previous handshake restransmission on this connection, the time limit doubles".

The way I wrote this actually implies that the timeout doubling doesn't actually reset, i.e. chlo times out twice, causing the timeout to double. Then the Finished is lost, and remains at the doubled length. I'm not sure if I meant to do that, but be aware of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'll remove the TLP/delayed ack wording. It was there for motivation, but seems to be confusing more than helping.
  • Changed client hello to initial flight
  • Client vs server does matter in the case of stateless rejects, but I'll word that in as a special case where the connection 'terminates immediately' once the reject is sent.

I don't believe any of the TCP specs worry about randomization of handshake timers, so I don't think it's a large issue in practice.

receipt of another client hello.

Version negotiation packets are always stateless, and MUST be sent once per
client hello, and may be sent in response to 0RTT packets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...per handshake packet that uses an unsupported QUIC version,...

s/may/MAY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks done.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really need to stop using the github editor for doing this stuff. It shows.

@@ -213,14 +213,17 @@ or negotiated in order to better suit a variety of environments.
a packet lost. In fraction of an RTT.

* kMinTLPTimeout: 10ms
Minimum time in the future a tail loss probe alarm may be set for.
Minimum time in the future a tail loss probe alarm may be set for.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Editorial: this might look OK in markdown, but it turns into a real mess in the text version. I think that you want something like the following:

kMinTLPTimeout (default 10ms):
 : Minimum time in the future [...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked it to say default as you said, but the extra : looked odd in markdown and text, so I didn't add it.

Is there an example formatting you were trying to achieve I should copy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra ':' is correct -- it's Markdown formatting for a definition list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the '*' at the start of the definition title line prevents kramdown from detecting it as a definition list. You will note its absence in my example.

You also need to reformat the whole list consistent or, as you say, it looks odd.

initial RTT value. If no previous RTT is available, the initial RTT defaults
to 200ms. Once an RTT measurement is taken, it MUST replace initial_rtt.

The client hello is retransmitted via a timer, similar to a tail loss probe, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An ACK for the ClientHello isn't needed, but it does contain some useful information (if you don't care about provenance, that is).

@martinduke, your use of "timeout" in that proposed text is problematic I think. It could refer to the handshake timing out, but I think that you simply intended to retransmit on a doubling interval (i.e., 200ms, 400ms, 600ms, etc...).

I wonder if some amount of randomization on that retransmission is necessary to avoid synchronizing retransmissions across a large cohort of clients that are simultaneously disconnected, or whether variations in RTT will be enough to deal with that.

initial RTT value. If no previous RTT is available, the initial RTT defaults
to 200ms. Once an RTT measurement is taken, it MUST replace initial_rtt.

The client hello is retransmitted via a timer, similar to a tail loss probe, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd to be talking about TLP when you don't have text on TLP yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -414,6 +445,15 @@ acknowledged. DetectLostPackets is called every time there is a new largest
packet or if the loss detection alarm fires the previous largest acked packet is
supplied.

### Handshake Packets

The receiver MUST not trust an unencryped ack for encrypted packets. The receiver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/unencryped/unencrypted

More generally, use "protected" rather than "encrypted' please. We have both confidentiality and integrity protection and both are relevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that'd be unprotected packets? Or is there a better term?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is right. "unprotected" isn't great, but I don't have anything better.

@@ -414,6 +445,15 @@ acknowledged. DetectLostPackets is called every time there is a new largest
packet or if the loss detection alarm fires the previous largest acked packet is
supplied.

### Handshake Packets

The receiver MUST not trust an unencryped ack for encrypted packets. The receiver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinduke is right to ask, note that not just the acknowledgment, but the entire packet is now suspect. As we discussed elsewhere, I think that the entire packet needs to go away if it contains suspicious content.

@@ -431,7 +471,7 @@ Pseudocode for DetectLostPackets follows:
acked_packet.packet_number - reordering_threshold)
lost_packets.insert(unacked_packet.packet_number);
return lost_packets;
~~~
~~~
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing space here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated with one follow up comment on protected terminology.

@@ -213,14 +213,17 @@ or negotiated in order to better suit a variety of environments.
a packet lost. In fraction of an RTT.

* kMinTLPTimeout: 10ms
Minimum time in the future a tail loss probe alarm may be set for.
Minimum time in the future a tail loss probe alarm may be set for.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tweaked it to say default as you said, but the extra : looked odd in markdown and text, so I didn't add it.

Is there an example formatting you were trying to achieve I should copy?

initial RTT value. If no previous RTT is available, the initial RTT defaults
to 200ms. Once an RTT measurement is taken, it MUST replace initial_rtt.

The client hello is retransmitted via a timer, similar to a tail loss probe, but
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'll remove the TLP/delayed ack wording. It was there for motivation, but seems to be confusing more than helping.
  • Changed client hello to initial flight
  • Client vs server does matter in the case of stateless rejects, but I'll word that in as a special case where the connection 'terminates immediately' once the reject is sent.

I don't believe any of the TCP specs worry about randomization of handshake timers, so I don't think it's a large issue in practice.

initial RTT value. If no previous RTT is available, the initial RTT defaults
to 200ms. Once an RTT measurement is taken, it MUST replace initial_rtt.

The client hello is retransmitted via a timer, similar to a tail loss probe, but
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

receipt of another client hello.

Version negotiation packets are always stateless, and MUST be sent once per
client hello, and may be sent in response to 0RTT packets.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks done.

@@ -414,6 +445,15 @@ acknowledged. DetectLostPackets is called every time there is a new largest
packet or if the loss detection alarm fires the previous largest acked packet is
supplied.

### Handshake Packets

The receiver MUST not trust an unencryped ack for encrypted packets. The receiver
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, will ignore the entire packet.

@@ -414,6 +445,15 @@ acknowledged. DetectLostPackets is called every time there is a new largest
packet or if the loss detection alarm fires the previous largest acked packet is
supplied.

### Handshake Packets

The receiver MUST not trust an unencryped ack for encrypted packets. The receiver
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that'd be unprotected packets? Or is there a better term?

@@ -431,7 +471,7 @@ Pseudocode for DetectLostPackets follows:
acked_packet.packet_number - reordering_threshold)
lost_packets.insert(unacked_packet.packet_number);
return lost_packets;
~~~
~~~
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@martinduke martinduke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, except for the one issue.

it is expected client hellos will be acked immediately or a handshake packet will
be sent immediately, so delayed acks do not need to be compensated for. As such,
it is always retransmitted at 2x the initial_rtt or smoothed_rtt.
Endpoints MUST retransmit handshake packets if not acknowledged(1) within a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (1) was my reference to a footnote in my comment. I think "not acknowledged" has to extend to be "not acknowledged and the next handshake message didn't arrive" but I'm not sure how to word that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I removed it and tweaked the text about exponential backoff again. Currently in the pseudocode(and the real code), all timers have their exponential backoff reset when an ack is received. The idea being, if you're getting some packets through, based the retransmission time on RTT. If you're not getting any data through, exponentially backoff because your RTT estimate may be wrong, and to avoid overloading the network.

Copy link
Contributor Author

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor update, PTAL.

it is expected client hellos will be acked immediately or a handshake packet will
be sent immediately, so delayed acks do not need to be compensated for. As such,
it is always retransmitted at 2x the initial_rtt or smoothed_rtt.
Endpoints MUST retransmit handshake packets if not acknowledged(1) within a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I removed it and tweaked the text about exponential backoff again. Currently in the pseudocode(and the real code), all timers have their exponential backoff reset when an ack is received. The idea being, if you're getting some packets through, based the retransmission time on RTT. If you're not getting any data through, exponentially backoff because your RTT estimate may be wrong, and to avoid overloading the network.

@martinduke
Copy link
Contributor

So I'm still concerned that, as written, this requires to retransmit the client hello because we got a server hello instead of an ack. Unless I've missed something, and we're requiring ACKs, which would make the protocol quite a bit cleaner.

Add some text on handshake state transitions cancelling handshake frames.
@ianswett
Copy link
Contributor Author

ianswett commented Feb 9, 2017

Good point, that's a bit awkward. You're right, it'd be best if we could just rely on ack frames. I added some text on crypto state transitions, but I'm not sure I like it. For now, we could just require they be properly acked, since we can include acks in crypto packets.

@MikeBishop
Copy link
Contributor

As Martin said, you need to find something better than the GitHub editor, or you'll keep having funky alignment and broken builds due to whitespace. I've recently switched to Atom, which seems to be working pretty well for me and has a pretty rich collection of plugins. You might give it a shot.

@ianswett
Copy link
Contributor Author

Are there any other web editors that are better? Alternately, can we just add a presubmit that strips whitespace and wraps lines?

Also, how are you seeing the trailing whitespace and long lines?

@martinthomson
Copy link
Member

@ianswett, when you look at the diff that you produce, it shows both. Long lines are harder to detect because github doesn't have column markers, but you can see really long lines.

@martinthomson
Copy link
Member

martinthomson commented Feb 17, 2017

@ianswett, I don't know what you are waiting for before merging this. This is only rotting sitting here like this. If it is a review from @janaiyengar, I'd suggest that you have already provided ample opportunity.

@martinthomson
Copy link
Member

martinthomson commented Feb 17, 2017

I've corrected the errors that were in this patch by folding a bunch of lines.

Ian, you are quite inconsistent with a few things in the diagrams

  • the use of semi-colons at the end of statements, which I think is unnecessary
  • x vs * for multiplication

That shouldn't block this PR, but it's probably worth fixing separately.

@ianswett
Copy link
Contributor Author

I was waiting for Jana, but yes, at this point, I'll merge and fix all the issues which have come up since then.

@ianswett ianswett merged commit e1002e5 into master Feb 17, 2017
@martinthomson martinthomson deleted the ianswett-patch-3 branch February 21, 2017 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants