
pto_count should be reset when dropping a packet number space #3272

Closed
marten-seemann opened this issue Nov 23, 2019 · 10 comments · Fixed by #3415
Labels: -recovery design has-consensus

marten-seemann (Contributor) commented Nov 23, 2019

pto_count should be reset to 0 when a packet number space is dropped. I'd argue that this is the consistent thing to do, because when we drop a packet number space, we implicitly acknowledge all outstanding packets in that space (and remove them from bytes_in_flight).
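The proposed behavior can be sketched roughly as follows. The class and member names (`LossRecovery`, `sent_packets`, `bytes_in_flight`) are loosely modeled on the recovery draft's pseudocode but are illustrative assumptions here, not the draft's normative text:

```python
class LossRecovery:
    def __init__(self):
        self.pto_count = 0
        self.bytes_in_flight = 0
        # Per-space map of packet number -> size of the in-flight packet.
        self.sent_packets = {"initial": {}, "handshake": {}, "application": {}}

    def on_packet_number_space_discarded(self, space):
        # Dropping a space implicitly acknowledges its outstanding packets,
        # so remove them from bytes_in_flight...
        for size in self.sent_packets[space].values():
            self.bytes_in_flight -= size
        self.sent_packets[space].clear()
        # ...and, per this proposal, also treat the drop as forward
        # progress and reset the PTO backoff.
        self.pto_count = 0
```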

janaiyengar (Contributor) commented Nov 23, 2019

ianswett (Contributor) commented Nov 23, 2019

I also tend to think this is OK. Importantly, if you don't get any ACKs, you also have no RTT samples, and in that case continuing to back off the PTO is very possibly the right thing to do.

kazuho (Member) commented Nov 25, 2019

Maybe I'm confused, but shouldn't pto_count be per-PN-space, now that PTO is per-PN-space (#3066)?

@mnot mnot added the design label Nov 25, 2019
@mnot mnot added this to Design Issues in Late Stage Processing Nov 25, 2019
ianswett (Contributor) commented Nov 27, 2019

I believe the simplest way to ensure QUIC isn't overly aggressive during the handshake and to encourage packet coalescing is to specify pto_count on a connection scope, similar to the rtt variables. I considered making pto_count per-PN space, but that doesn't ensure quite the same exponential backoff properties and I don't believe it's necessary.

kazuho (Member) commented Nov 28, 2019

I believe the simplest way to ensure QUIC isn't overly aggressive during the handshake and to encourage packet coalescing is to specify pto_count on a connection scope, similar to the rtt variables.

I think we agree that PTO during the handshake should be as aggressive as after the handshake. However, I am starting to wonder if the current spec actually works that way.

Consider the case where the server responds to a ClientHello: it would send Initial, Handshake, and 1-RTT packets at the same time. Then, when those QUIC packets get lost, the loss timer would be invoked three times, resulting in pto_count being set to 3. Is that the intended behavior? I think in such a case, pto_count should become one after the PTO timers of all those epochs have each been triggered once.

Therefore, to me, the correct logic seems to be:

  • have pto_count for each epoch
  • every time the per-epoch PTO timer is triggered, increment the per-epoch pto_count by 1
  • when an ACK is being received on any epoch, reset pto_count of all the epochs
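The three steps above can be sketched as follows. The epoch names and method names are illustrative assumptions for this sketch, not the draft's pseudocode:

```python
EPOCHS = ("initial", "handshake", "application")

class PerEpochPto:
    def __init__(self):
        # One pto_count per epoch.
        self.pto_count = {epoch: 0 for epoch in EPOCHS}

    def on_pto_expired(self, epoch):
        # Each epoch's PTO timer backs off independently.
        self.pto_count[epoch] += 1

    def on_ack_received(self):
        # An ACK received on any epoch is forward progress:
        # reset the pto_count of all epochs.
        for epoch in EPOCHS:
            self.pto_count[epoch] = 0
```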

PS. I acknowledge that the issue I'm discussing in this comment might be orthogonal to the original issue, and that it might deserve a separate issue.

janaiyengar (Contributor) commented Nov 28, 2019

@kazuho: you make a good point. I don't think this is orthogonal to the original issue, I think it's very much related. In addition to what you note, you would also need to use the per-epoch counter for determining how long to set the timer.

This is going to be imperfect or ugly or both. If we use a single pto_count, we'll have duplicate counts. If we use separate pto_counts, we'll have to reconcile different epochs having varying counts.

I think the behavior we're looking for is (i) use a separate pto_count per epoch, (ii) use the max of all pto_counts when setting the timer for any PN space, (iii) reset all pto_counts when an ACK is received for any epoch.

This isn't great, and it is imperfect -- you can end up being more aggressive than with a single count. But I think it's good enough.
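The max-of-counts timer calculation from (ii) might look like the sketch below. The function shape and the `smoothed_rtt`/`rttvar` inputs are assumptions loosely following the recovery draft's PTO formula, not its actual pseudocode:

```python
kGranularity = 0.001  # 1 ms timer granularity, as in the recovery draft

def pto_timeout(smoothed_rtt, rttvar, pto_counts):
    # (ii) use the max of all per-epoch pto_counts for the backoff,
    # so the slowest-progressing epoch governs the exponential backoff.
    backoff = 2 ** max(pto_counts.values())
    return (smoothed_rtt + max(4 * rttvar, kGranularity)) * backoff
```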

kazuho (Member) commented Nov 29, 2019

@janaiyengar

I think the behavior we're looking for is (i) use a separate pto_count per epoch, (ii) use the max of all pto_counts when setting the timer for any PN space, (iii) reset all pto_counts when an ACK is received for any epoch.

This isn't great, and it is imperfect -- you can end up being more aggressive than with a single count. But I think it's good enough.

The logic you describe sounds fine to me (apparently I missed (ii)); it would make the PTO behavior during the handshake as aggressive as after the handshake. I cannot see when it would be more aggressive than that, so I think it's better than "imperfect but good enough."

janaiyengar (Contributor) commented Dec 5, 2019

@kazuho : In theory, if we use the max of counts when setting the timer, it is possible for a single pto_count to increase without increasing the max, which means the sender can sometimes appear not to back off, making it potentially more aggressive than after the handshake.

I might have been a bit harsh earlier -- I think this is fine.

ianswett (Contributor) commented Feb 4, 2020

My plan is:

  1. Keep a single pto_count.
  2. Reset it when a packet number space is dropped, since that is an indication of forward progress.
  3. Re-arm the loss detection timer when dropping a packet number space in case it was armed for the space being dropped.
  4. Add some more explanatory text about how pto_count and multiple packet number spaces interact.
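Steps 1–3 of the plan can be sketched as below. The helper `set_loss_detection_timer` is a placeholder assumption here; in the draft's pseudocode the corresponding function picks the earliest deadline across the remaining spaces:

```python
class Recovery:
    def __init__(self):
        self.pto_count = 0  # step 1: a single, connection-level count
        self.sent_packets = {"initial": {}, "handshake": {}, "application": {}}
        self.loss_detection_timer = None

    def set_loss_detection_timer(self):
        # Placeholder: the real logic re-arms the timer based on the
        # spaces that still have packets in flight.
        self.loss_detection_timer = "rearmed"

    def on_packet_number_space_discarded(self, space):
        self.sent_packets[space].clear()
        # step 2: dropping a space indicates forward progress.
        self.pto_count = 0
        # step 3: re-arm, in case the timer was set for the dropped space.
        self.set_loss_detection_timer()
```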

@ianswett ianswett self-assigned this Feb 4, 2020
ianswett added a commit that referenced this issue Feb 4, 2020
Fixes #3272 

This doesn't add normative statements, but it could if people would like.
larseggert (Member) commented Feb 5, 2020

Discussed in ZRH. Proposed resolution is in #3415, which should include corresponding updates to the pseudocode.

@larseggert larseggert added the proposal-ready label Feb 5, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Feb 5, 2020
@LPardue LPardue moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Feb 19, 2020
@LPardue LPardue removed the proposal-ready label Feb 19, 2020
@LPardue LPardue added the call-issued label Feb 26, 2020
@LPardue LPardue added has-consensus and removed call-issued labels Mar 4, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Mar 4, 2020
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Mar 5, 2020