QUIC J. Iyengar, Ed.
Internet-Draft Fastly
Intended status: Standards Track I. Swett, Ed.
Expires: 13 November 2020 Google
12 May 2020
QUIC Loss Detection and Congestion Control
draft-ietf-quic-recovery-latest
Abstract
This document describes loss detection and congestion control
mechanisms for QUIC.
Note to Readers
Discussion of this draft takes place on the QUIC working group
mailing list (quic@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/search/?email_list=quic.
Working Group information can be found at https://github.com/quicwg;
source code and issues list for this draft can be found at
https://github.com/quicwg/base-drafts/labels/-recovery.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 13 November 2020.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
Iyengar & Swett Expires 13 November 2020 [Page 1]
Internet-Draft QUIC Loss Detection May 2020
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Conventions and Definitions
3. Design of the QUIC Transmission Machinery
3.1. Relevant Differences Between QUIC and TCP
3.1.1. Separate Packet Number Spaces
3.1.2. Monotonically Increasing Packet Numbers
3.1.3. Clearer Loss Epoch
3.1.4. No Reneging
3.1.5. More ACK Ranges
3.1.6. Explicit Correction For Delayed Acknowledgements
3.1.7. Probe Timeout Replaces RTO and TLP
3.1.8. The Minimum Congestion Window is Two Packets
4. Estimating the Round-Trip Time
4.1. Generating RTT samples
4.2. Estimating min_rtt
4.3. Estimating smoothed_rtt and rttvar
5. Loss Detection
5.1. Acknowledgement-based Detection
5.1.1. Packet Threshold
5.1.2. Time Threshold
5.2. Probe Timeout
5.2.1. Computing PTO
5.2.2. Handshakes and New Paths
5.2.3. Speeding Up Handshake Completion
5.2.4. Sending Probe Packets
5.2.5. Loss Detection
5.3. Handling Retry Packets
5.4. Discarding Keys and Packet State
6. Congestion Control
6.1. Explicit Congestion Notification
6.2. Initial and Minimum Congestion Window
6.3. Slow Start
6.4. Congestion Avoidance
6.5. Recovery Period
6.6. Ignoring Loss of Undecryptable Packets
6.7. Probe Timeout
6.8. Persistent Congestion
6.9. Pacing
6.10. Under-utilizing the Congestion Window
7. Security Considerations
7.1. Congestion Signals
7.2. Traffic Analysis
7.3. Misreporting ECN Markings
8. IANA Considerations
9. References
9.1. Normative References
9.2. Informative References
Appendix A. Loss Recovery Pseudocode
A.1. Tracking Sent Packets
A.1.1. Sent Packet Fields
A.2. Constants of Interest
A.3. Variables of interest
A.4. Initialization
A.5. On Sending a Packet
A.6. On Receiving an Acknowledgment
A.7. Setting the Loss Detection Timer
A.8. On Timeout
A.9. Detecting Lost Packets
Appendix B. Congestion Control Pseudocode
B.1. Constants of interest
B.2. Variables of interest
B.3. Initialization
B.4. On Packet Sent
B.5. On Packet Acknowledgement
B.6. On New Congestion Event
B.7. Process ECN Information
B.8. On Packets Lost
B.9. Upon dropping Initial or Handshake keys
Appendix C. Change Log
C.1. Since draft-ietf-quic-recovery-26
C.2. Since draft-ietf-quic-recovery-25
C.3. Since draft-ietf-quic-recovery-24
C.4. Since draft-ietf-quic-recovery-23
C.5. Since draft-ietf-quic-recovery-22
C.6. Since draft-ietf-quic-recovery-21
C.7. Since draft-ietf-quic-recovery-20
C.8. Since draft-ietf-quic-recovery-19
C.9. Since draft-ietf-quic-recovery-18
C.10. Since draft-ietf-quic-recovery-17
C.11. Since draft-ietf-quic-recovery-16
C.12. Since draft-ietf-quic-recovery-14
C.13. Since draft-ietf-quic-recovery-13
C.14. Since draft-ietf-quic-recovery-12
C.15. Since draft-ietf-quic-recovery-11
C.16. Since draft-ietf-quic-recovery-10
C.17. Since draft-ietf-quic-recovery-09
C.18. Since draft-ietf-quic-recovery-08
C.19. Since draft-ietf-quic-recovery-07
C.20. Since draft-ietf-quic-recovery-06
C.21. Since draft-ietf-quic-recovery-05
C.22. Since draft-ietf-quic-recovery-04
C.23. Since draft-ietf-quic-recovery-03
C.24. Since draft-ietf-quic-recovery-02
C.25. Since draft-ietf-quic-recovery-01
C.26. Since draft-ietf-quic-recovery-00
C.27. Since draft-iyengar-quic-loss-recovery-01
Appendix D. Contributors
Acknowledgments
Authors' Addresses
1. Introduction
QUIC is a new multiplexed and secure transport protocol atop UDP,
specified in [QUIC-TRANSPORT]. This document describes congestion
control and loss recovery for QUIC. Mechanisms described in this
document follow the spirit of existing TCP congestion control and
loss recovery mechanisms, described in RFCs, various Internet-drafts,
or academic papers, and also those prevalent in TCP implementations.
2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
Definitions of terms that are used in this document:
Ack-eliciting Frames: All frames other than ACK, PADDING, and
CONNECTION_CLOSE are considered ack-eliciting.
Ack-eliciting Packets: Packets that contain ack-eliciting frames
elicit an ACK from the receiver within the maximum ack delay and
are called ack-eliciting packets.
In-flight: Packets are considered in-flight when they are ack-
eliciting or contain a PADDING frame, and they have been sent but
are not acknowledged, declared lost, or abandoned along with old
keys.
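The frame-based part of these definitions can be sketched as follows. This is a minimal illustration, not part of the specification: frames are represented by their type names as plain strings, and the helper names is_ack_eliciting and counts_as_in_flight are hypothetical. The sketch covers only the frame contents of a packet; whether a packet is still in flight also depends on it having been sent and not yet acknowledged, declared lost, or abandoned.

```python
# Frame type names follow [QUIC-TRANSPORT]; helper names are hypothetical.
NON_ACK_ELICITING = {"ACK", "PADDING", "CONNECTION_CLOSE"}


def is_ack_eliciting(frames):
    """True if the packet carries any frame other than ACK, PADDING,
    or CONNECTION_CLOSE."""
    return any(frame not in NON_ACK_ELICITING for frame in frames)


def counts_as_in_flight(frames):
    """True if a sent packet counts toward bytes in flight: it is
    ack-eliciting or contains a PADDING frame."""
    return is_ack_eliciting(frames) or "PADDING" in frames
```

For example, a packet holding only ACK and PADDING frames is not ack-eliciting but still counts as in-flight, while a packet holding only an ACK frame is neither.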
3. Design of the QUIC Transmission Machinery
All transmissions in QUIC are sent with a packet-level header, which
indicates the encryption level and includes a packet sequence number
(referred to below as a packet number). The encryption level
indicates the packet number space, as described in [QUIC-TRANSPORT].
Packet numbers never repeat within a packet number space for the
lifetime of a connection. Packet numbers are sent in monotonically
increasing order within a space, preventing ambiguity.
This design obviates the need for disambiguating between
transmissions and retransmissions and eliminates significant
complexity from QUIC's interpretation of TCP loss detection
mechanisms.
QUIC packets can contain multiple frames of different types. The
recovery mechanisms ensure that data and frames that need reliable
delivery are acknowledged or declared lost and sent in new packets as
necessary. The types of frames contained in a packet affect recovery
and congestion control logic:
* All packets are acknowledged, though packets that contain no ack-
eliciting frames are only acknowledged along with ack-eliciting
packets.
* Long header packets that contain CRYPTO frames are critical to the
performance of the QUIC handshake and use shorter timers for
acknowledgement.
* Packets containing frames besides ACK or CONNECTION_CLOSE frames
count toward congestion control limits and are considered in-
flight.
* PADDING frames cause packets to contribute toward bytes in flight
without directly causing an acknowledgment to be sent.
3.1. Relevant Differences Between QUIC and TCP
Readers familiar with TCP's loss detection and congestion control
will find algorithms here that parallel well-known TCP ones.
Protocol differences between QUIC and TCP, however, contribute to
algorithmic differences. We briefly describe these protocol
differences below.
3.1.1. Separate Packet Number Spaces
QUIC uses separate packet number spaces for each encryption level,
except that 0-RTT and all generations of 1-RTT keys use the same
packet number space. Separate packet number spaces ensure that
acknowledgement of packets sent with one level of encryption will not
cause spurious retransmission of packets sent with a different
encryption level.
Congestion control and round-trip time (RTT) measurement are unified
across packet number spaces.
3.1.2. Monotonically Increasing Packet Numbers
TCP conflates transmission order at the sender with delivery order at
the receiver, which results in retransmissions of the same data
carrying the same sequence number, and consequently leads to
"retransmission ambiguity". QUIC separates the two. QUIC uses a
packet number to indicate transmission order. Application data is
sent in one or more streams and delivery order is determined by
stream offsets encoded within STREAM frames.
QUIC's packet number is strictly increasing within a packet number
space, and directly encodes transmission order. A higher packet
number signifies that the packet was sent later, and a lower packet
number signifies that the packet was sent earlier. When a packet
containing ack-eliciting frames is detected lost, QUIC rebundles
necessary frames in a new packet with a new packet number, removing
ambiguity about which packet is acknowledged when an ACK is received.
Consequently, more accurate RTT measurements can be made, spurious
retransmissions are trivially detected, and mechanisms such as Fast
Retransmit can be applied universally, based only on packet number.
This design point significantly simplifies loss detection mechanisms
for QUIC. Most TCP mechanisms implicitly attempt to infer
transmission ordering based on TCP sequence numbers - a non-trivial
task, especially when TCP timestamps are not available.
3.1.3. Clearer Loss Epoch
QUIC starts a loss epoch when a packet is lost and ends one when any
packet sent after the epoch starts is acknowledged. TCP waits for
the gap in the sequence number space to be filled, and so if a
segment is lost multiple times in a row, the loss epoch may not end
for several round trips. Because both should reduce their congestion
windows only once per epoch, QUIC will do it once for every round
trip that experiences loss, while TCP may only do it once across
multiple round trips.
3.1.4. No Reneging
QUIC ACKs contain information that is similar to TCP SACK, but QUIC
does not allow any acked packet to be reneged, greatly simplifying
implementations on both sides and reducing memory pressure on the
sender.
3.1.5. More ACK Ranges
QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In
high loss environments, this speeds recovery, reduces spurious
retransmits, and ensures forward progress without relying on
timeouts.
3.1.6. Explicit Correction For Delayed Acknowledgements
QUIC endpoints measure the delay incurred between when a packet is
received and when the corresponding acknowledgment is sent, allowing
a peer to maintain a more accurate round-trip time estimate; see
Section 13.2 of [QUIC-TRANSPORT].
3.1.7. Probe Timeout Replaces RTO and TLP
QUIC uses a probe timeout (see Section 5.2), with a timer based on
TCP's RTO computation. QUIC's PTO includes the peer's maximum
expected acknowledgement delay instead of using a fixed minimum
timeout. QUIC does not collapse the congestion window until
persistent congestion (Section 6.8) is declared, unlike TCP, which
collapses the congestion window upon expiry of an RTO. Instead of
collapsing the congestion window and declaring everything in-flight
lost, QUIC allows probe packets to temporarily exceed the congestion
window whenever the timer expires.
In doing this, QUIC avoids unnecessary congestion window reductions,
obviating the need for correcting mechanisms such as F-RTO [RFC5682].
Since QUIC does not collapse the congestion window on a PTO
expiration, a QUIC sender is not limited from sending more in-flight
packets after a PTO expiration if it still has available congestion
window. This occurs when a sender is application-limited and the PTO
timer expires. This is more aggressive than TCP's RTO mechanism when
application-limited, but identical when not application-limited.
A single packet loss at the tail does not indicate persistent
congestion, so QUIC specifies a time-based definition to ensure one
or more packets are sent prior to a dramatic decrease in congestion
window; see Section 6.8.
3.1.8. The Minimum Congestion Window is Two Packets
TCP uses a minimum congestion window of one packet. However, loss of
that single packet means that the sender needs to wait for a PTO
(Section 5.2) to recover, which can be much longer than a round-trip
time. Sending a single ack-eliciting packet also increases the
chances of incurring additional latency when a receiver delays its
acknowledgement.
QUIC therefore recommends that the minimum congestion window be two
packets. While this increases network load, it is considered safe,
since the sender will still reduce its sending rate exponentially
under persistent congestion (Section 5.2).
4. Estimating the Round-Trip Time
At a high level, an endpoint measures the time from when a packet was
sent to when it is acknowledged as a round-trip time (RTT) sample.
The endpoint uses RTT samples and peer-reported host delays (see
Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical
description of the network path's RTT. An endpoint computes the
following three values for each path: the minimum value observed over
the lifetime of the path (min_rtt), an exponentially-weighted moving
average (smoothed_rtt), and the mean deviation (referred to as
"variation" in the rest of this document) in the observed RTT samples
(rttvar).
4.1. Generating RTT samples
An endpoint generates an RTT sample on receiving an ACK frame that
meets the following two conditions:
* the largest acknowledged packet number is newly acknowledged, and
* at least one of the newly acknowledged packets was ack-eliciting.
The RTT sample, latest_rtt, is generated as the time elapsed since
the largest acknowledged packet was sent:
latest_rtt = ack_time - send_time_of_largest_acked
An RTT sample is generated using only the largest acknowledged packet
in the received ACK frame. This is because a peer reports ACK delays
for only the largest acknowledged packet in an ACK frame. While the
reported ACK delay is not used by the RTT sample measurement, it is
used to adjust the RTT sample in subsequent computations of
smoothed_rtt and rttvar (Section 4.3).
To avoid generating multiple RTT samples for a single packet, an ACK
frame SHOULD NOT be used to update RTT estimates if it does not newly
acknowledge the largest acknowledged packet.
An RTT sample MUST NOT be generated on receiving an ACK frame that
does not newly acknowledge at least one ack-eliciting packet. A peer
usually does not send an ACK frame when only non-ack-eliciting
packets are received. Therefore an ACK frame that contains
acknowledgements for only non-ack-eliciting packets could include an
arbitrarily large Ack Delay value. Ignoring such ACK frames avoids
complications in subsequent smoothed_rtt and rttvar computations.
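The two conditions for generating an RTT sample can be sketched as below. This is an illustrative sketch under stated assumptions: the SentPacket record, the rtt_sample_from_ack function, and the representation of an ACK frame as a flat list of acknowledged packet numbers are all hypothetical, and times are in milliseconds on the local clock.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SentPacket:
    time_sent: float      # milliseconds, local clock
    ack_eliciting: bool


def rtt_sample_from_ack(sent: Dict[int, SentPacket],
                        acked_numbers: List[int],
                        ack_time: float) -> Optional[float]:
    """Return latest_rtt for this ACK frame, or None if no sample is taken.

    `sent` holds only packets that are still unacknowledged, so a packet
    number found there is newly acknowledged by this frame.
    """
    newly_acked = [pn for pn in acked_numbers if pn in sent]
    if not newly_acked:
        return None
    largest_acked = max(acked_numbers)
    # Condition 1: the largest acknowledged packet is newly acknowledged.
    if largest_acked not in sent:
        return None
    # Condition 2: at least one newly acknowledged packet was ack-eliciting.
    if not any(sent[pn].ack_eliciting for pn in newly_acked):
        return None
    return ack_time - sent[largest_acked].time_sent
```

Note that the sample is measured from the largest acknowledged packet even when that packet itself was not ack-eliciting, as long as some other newly acknowledged packet was.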
A sender might generate multiple RTT samples per RTT when multiple
ACK frames are received within an RTT. As suggested in [RFC6298],
doing so might result in inadequate history in smoothed_rtt and
rttvar. Ensuring that RTT estimates retain sufficient history is an
open research question.
4.2. Estimating min_rtt
min_rtt is the minimum RTT observed for a given network path.
min_rtt is set to the latest_rtt on the first RTT sample, and to the
lesser of min_rtt and latest_rtt on subsequent samples. In this
document, min_rtt is used by loss detection to reject implausibly
small RTT samples.
An endpoint uses only locally observed times in computing the min_rtt
and does not adjust for ACK delays reported by the peer. Doing so
allows the endpoint to set a lower bound for the smoothed_rtt based
entirely on what it observes (see Section 4.3), and limits potential
underestimation due to erroneously-reported delays by the peer.
The RTT for a network path may change over time. If a path's actual
RTT decreases, the min_rtt will adapt immediately on the first low
sample. If the path's actual RTT increases, the min_rtt will not
adapt to it, allowing future RTT samples that are smaller than the
new RTT to be included in smoothed_rtt.
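A minimal sketch of the min_rtt update described in this section, assuming a hypothetical helper update_min_rtt that represents "no sample yet" as None and takes times in milliseconds:

```python
from typing import Optional


def update_min_rtt(min_rtt: Optional[float], latest_rtt: float) -> float:
    """Return the new min_rtt given the previous value (None before the
    first sample) and the latest locally measured RTT sample."""
    if min_rtt is None:
        # First RTT sample for the path.
        return latest_rtt
    return min(min_rtt, latest_rtt)
```

With samples of 50, 40, and 60 ms, min_rtt becomes 50, then 40, and then stays at 40: it follows a decrease at once and ignores the later increase.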
4.3. Estimating smoothed_rtt and rttvar
smoothed_rtt is an exponentially-weighted moving average of an
endpoint's RTT samples, and rttvar is the variation in the RTT
samples, estimated using a mean variation.
The calculation of smoothed_rtt uses path latency after adjusting RTT
samples for acknowledgement delays. These delays are computed using
the ACK Delay field of the ACK frame as described in Section 19.3 of
[QUIC-TRANSPORT]. For packets sent in the ApplicationData packet
number space, a peer limits any delay in sending an acknowledgement
for an ack-eliciting packet to no greater than the value it
advertised in the max_ack_delay transport parameter. Consequently,
when a peer reports an Ack Delay that is greater than its
max_ack_delay, the delay is attributed to reasons out of the peer's
control, such as scheduler latency at the peer or loss of previous
ACK frames. Any delays beyond the peer's max_ack_delay are therefore
considered effectively part of path delay and incorporated into the
smoothed_rtt estimate.
When adjusting an RTT sample using peer-reported acknowledgement
delays, an endpoint:
* MUST ignore the Ack Delay field of the ACK frame for packets sent
in the Initial and Handshake packet number spaces.
* MUST use the lesser of the value reported in the Ack Delay field of
the ACK frame and the peer's max_ack_delay transport parameter.
* MUST NOT apply the adjustment if the resulting RTT sample is
smaller than the min_rtt. This limits the underestimation that a
misreporting peer can cause to the smoothed_rtt.
smoothed_rtt and rttvar are computed as follows, similar to
[RFC6298].
When there are no samples for a network path, and on the first RTT
sample for the network path:
smoothed_rtt = rtt_sample
rttvar = rtt_sample / 2
Before any RTT samples are available, the initial RTT is used as
rtt_sample. On the first RTT sample for the network path, that
sample is used as rtt_sample. This ensures that the first
measurement erases the history of any persisted or default values.
On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:
ack_delay = min(Ack Delay in ACK Frame, max_ack_delay)
adjusted_rtt = latest_rtt
if (min_rtt + ack_delay < latest_rtt):
  adjusted_rtt = latest_rtt - ack_delay
smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
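The update rules above can be exercised with concrete numbers. The following sketch mirrors the pseudocode line by line; the RttState record and the function names are hypothetical, min_rtt is passed in by the caller, and all times are in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class RttState:
    smoothed_rtt: float
    rttvar: float


def first_rtt_sample(rtt_sample: float) -> RttState:
    """Initialize the estimate from the first RTT sample for a path."""
    return RttState(smoothed_rtt=rtt_sample, rttvar=rtt_sample / 2)


def update_rtt(state: RttState, latest_rtt: float, min_rtt: float,
               reported_ack_delay: float, max_ack_delay: float) -> RttState:
    """Apply one subsequent RTT sample, in the order given in the text."""
    ack_delay = min(reported_ack_delay, max_ack_delay)
    adjusted_rtt = latest_rtt
    # Subtract the ack delay only if the result stays above min_rtt.
    if min_rtt + ack_delay < latest_rtt:
        adjusted_rtt = latest_rtt - ack_delay
    smoothed_rtt = 7 / 8 * state.smoothed_rtt + 1 / 8 * adjusted_rtt
    rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
    rttvar = 3 / 4 * state.rttvar + 1 / 4 * rttvar_sample
    return RttState(smoothed_rtt, rttvar)
```

As a worked example, start from a first sample of 100 ms (smoothed_rtt = 100, rttvar = 50) with min_rtt = 100. A second sample of 140 ms with a reported Ack Delay of 30 ms and max_ack_delay of 25 ms gives ack_delay = 25 and adjusted_rtt = 115, so smoothed_rtt = 101.875 and rttvar = 40.78125. A second sample of 110 ms would instead be used unadjusted, because 100 + 25 is not less than 110.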
5. Loss Detection
QUIC senders use acknowledgements to detect lost packets, and a probe
timeout (see Section 5.2) to ensure acknowledgements are received.
This section provides a description of these algorithms.
If a packet is lost, the QUIC transport needs to recover from that
loss, such as by retransmitting the data, sending an updated frame,
or abandoning the frame. For more information, see Section 13.3 of
[QUIC-TRANSPORT].
5.1. Acknowledgement-based Detection
Acknowledgement-based loss detection implements the spirit of TCP's
Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK],
SACK loss recovery [RFC6675], and RACK [RACK]. This section provides
an overview of how these algorithms are implemented in QUIC.
A packet is declared lost if it meets all the following conditions:
* The packet is unacknowledged, in-flight, and was sent prior to an
acknowledged packet.
* Either its packet number is at least kPacketThreshold smaller than
that of an acknowledged packet (Section 5.1.1), or it was sent long
enough in the past (Section 5.1.2).
The acknowledgement indicates that a packet sent later was delivered,
and the packet and time thresholds provide some tolerance for packet
reordering.
Spuriously declaring packets as lost leads to unnecessary
retransmissions and may result in degraded performance due to the
actions of the congestion controller upon detecting loss.
Implementations can detect spurious retransmissions and increase the
reordering threshold in packets or time to reduce future spurious
retransmissions and loss events. Implementations with adaptive time
thresholds MAY choose to start with smaller initial reordering
thresholds to minimize recovery latency.
5.1.1. Packet Threshold
The RECOMMENDED initial value for the packet reordering threshold
(kPacketThreshold) is 3, based on best practices for TCP loss
detection [RFC5681] [RFC6675]. Implementations SHOULD NOT use a
packet threshold less than 3, to keep in line with TCP [RFC5681].
Some networks may exhibit higher degrees of reordering, causing a
sender to detect spurious losses. Implementers MAY use algorithms
developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
reordering resilience.
5.1.2. Time Threshold
Once a later packet within the same packet number space has been
acknowledged, an endpoint SHOULD declare an earlier packet lost if it
was sent a threshold amount of time in the past. To avoid declaring
packets as lost too early, this time threshold MUST be set to at
least the local timer granularity, as indicated by the kGranularity
constant. The time threshold is:
max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)
If packets sent prior to the largest acknowledged packet cannot yet
be declared lost, then a timer SHOULD be set for the remaining time.
Using max(smoothed_rtt, latest_rtt) protects from the two following
cases:
* the latest RTT sample is lower than the smoothed RTT, perhaps due
to reordering where the acknowledgement encountered a shorter
path;
* the latest RTT sample is higher than the smoothed RTT, perhaps due
to a sustained increase in the actual RTT, but the smoothed RTT
has not yet caught up.
The RECOMMENDED time threshold (kTimeThreshold), expressed as a
round-trip time multiplier, is 9/8. The RECOMMENDED value of the
timer granularity (kGranularity) is 1ms.
Implementations MAY experiment with absolute thresholds, thresholds
from previous connections, adaptive thresholds, or including RTT
variation. Smaller thresholds reduce reordering resilience and
increase spurious retransmissions, and larger thresholds increase
loss detection delay.
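The packet and time thresholds can be combined into a single pass over the unacknowledged packets of one packet number space. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the function name detect_lost and the dictionary mapping packet number to send time are hypothetical, only in-flight packets are assumed to be tracked, and times are in milliseconds.

```python
from typing import Dict, List, Optional, Tuple

K_PACKET_THRESHOLD = 3      # RECOMMENDED initial value
K_TIME_THRESHOLD = 9 / 8    # RECOMMENDED RTT multiplier
K_GRANULARITY = 1.0         # RECOMMENDED timer granularity, in ms


def detect_lost(unacked: Dict[int, float], largest_acked: int, now: float,
                smoothed_rtt: float, latest_rtt: float
                ) -> Tuple[List[int], Optional[float]]:
    """Return (lost packet numbers, time at which to re-check or None).

    `unacked` maps packet number -> send time for in-flight packets in a
    single packet number space.
    """
    loss_delay = max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt),
                     K_GRANULARITY)
    lost: List[int] = []
    next_loss_time: Optional[float] = None
    for pn, time_sent in sorted(unacked.items()):
        if pn > largest_acked:
            continue  # not sent prior to an acknowledged packet
        if (largest_acked - pn >= K_PACKET_THRESHOLD
                or now - time_sent >= loss_delay):
            lost.append(pn)
        else:
            # Cannot be declared lost yet; remember when it can be.
            deadline = time_sent + loss_delay
            if next_loss_time is None or deadline < next_loss_time:
                next_loss_time = deadline
    return lost, next_loss_time
```

With smoothed_rtt = 100 ms and latest_rtt = 80 ms the time threshold is 112.5 ms. If packet 7 is the largest acknowledged at time 1000 and packets 1, 5, and 6 (sent at 0, 950, and 990) are outstanding, packet 1 is lost by the packet threshold and a timer would be set for 1062.5, the point at which packet 5 crosses the time threshold.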
5.2. Probe Timeout
A Probe Timeout (PTO) triggers sending one or two probe datagrams
when ack-eliciting packets are not acknowledged within the expected
period of time or the handshake has not been completed. A PTO
enables a connection to recover from loss of tail packets or
acknowledgements.
As with loss detection, the probe timeout is per packet number space.
The PTO algorithm used in QUIC implements the reliability functions
of Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for
TCP [RFC5682]. The timeout computation is based on TCP's
retransmission timeout period [RFC6298].
5.2.1. Computing PTO
When an ack-eliciting packet is transmitted, the sender schedules a
timer for the PTO period as follows:
PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
The PTO period is the amount of time that a sender ought to wait for
an acknowledgement of a sent packet. This time period includes the
estimated network round-trip time (smoothed_rtt), the variation in the
estimate (4*rttvar), and max_ack_delay, to account for the maximum
time by which a receiver might delay sending an acknowledgement.
When the PTO is armed for Initial or Handshake packet number spaces,
the max_ack_delay is 0, as specified in Section 13.2.1 of
[QUIC-TRANSPORT].
The PTO value MUST be set to at least kGranularity, to avoid the
timer expiring immediately.
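A minimal sketch of the PTO computation described above is given below. The function name, the parameter names, and the use of milliseconds are assumptions chosen for illustration; only the formula itself comes from the text.

```python
# Illustrative sketch only; names and units (milliseconds) are assumptions.
K_GRANULARITY_MS = 1


def pto_period_ms(smoothed_rtt_ms, rttvar_ms, max_ack_delay_ms,
                  is_initial_or_handshake=False):
    """Compute the PTO period for one packet number space."""
    if is_initial_or_handshake:
        # max_ack_delay is treated as 0 in these packet number spaces.
        max_ack_delay_ms = 0
    pto = (smoothed_rtt_ms
           + max(4 * rttvar_ms, K_GRANULARITY_MS)
           + max_ack_delay_ms)
    # Never return less than the timer granularity.
    return max(pto, K_GRANULARITY_MS)


print(pto_period_ms(100, 10, 25))                                # 165
print(pto_period_ms(100, 10, 25, is_initial_or_handshake=True))  # 140
```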
A sender recomputes and may need to reset its PTO timer every time an
ack-eliciting packet is sent. When ack-eliciting packets are in-
flight in multiple packet number spaces, the timer MUST be set for
the packet number space with the earliest timeout, except for
ApplicationData, which MUST be ignored until the handshake completes;
see Section 4.1.1 of [QUIC-TLS]. Not arming the PTO for
ApplicationData prevents a client from retransmitting a 0-RTT packet
on a PTO expiration before confirming that the server is able to
decrypt 0-RTT packets, and prevents a server from sending a 1-RTT
packet on a PTO expiration before it has the keys to process an
acknowledgement.
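The selection rule above can be sketched as follows. The dictionary of per-space expiry times and the function name are hypothetical; the sketch only shows the choice of the earliest timeout while ignoring ApplicationData before the handshake completes.

```python
# Illustrative sketch only; the data layout and names are assumptions.
from typing import Dict, Optional, Tuple


def earliest_pto_space(timeouts: Dict[str, float],
                       handshake_complete: bool) -> Optional[Tuple[str, float]]:
    """Pick the packet number space whose PTO expires first.

    `timeouts` maps a packet number space with ack-eliciting packets in
    flight to its absolute PTO expiry time.
    """
    candidates = {
        space: expiry for space, expiry in timeouts.items()
        if handshake_complete or space != "ApplicationData"
    }
    if not candidates:
        return None
    space = min(candidates, key=candidates.get)
    return space, candidates[space]


in_flight = {"Handshake": 5.0, "ApplicationData": 3.0}
print(earliest_pto_space(in_flight, handshake_complete=False))
# ('Handshake', 5.0)
print(earliest_pto_space(in_flight, handshake_complete=True))
# ('ApplicationData', 3.0)
```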
When a PTO timer expires, the PTO backoff MUST be increased,
resulting in the PTO period being set to twice its current value.
The PTO period is set based on the latest RTT information after
receiving an acknowledgement. The PTO backoff is reset upon
receiving an acknowledgement, unless the endpoint is a client that is
not yet sure whether the server has validated its address. Not
resetting the backoff during peer address validation ensures the
client's anti-deadlock timer is not set too aggressively when the
server is slow in responding with handshake data.
This exponential reduction in the sender's rate is important because
consecutive PTOs might be caused by loss of packets or
acknowledgements due to severe congestion. Even when there are ack-
eliciting packets in-flight in multiple packet number spaces, the
exponential increase in probe timeout occurs across all spaces to
prevent excess load on the network. For example, a timeout in the
Initial packet number space doubles the length of the timeout in the
Handshake packet number space.
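The effect of the backoff can be illustrated with a short sketch. The counter name pto_count and the helper below are assumptions for the example; the counter is taken to be shared by all packet number spaces, matching the behavior described above.

```python
# Illustrative sketch only; pto_count is a hypothetical counter of
# consecutive PTO expirations shared by all packet number spaces.
def backed_off_pto_ms(base_pto_ms, pto_count):
    """Double the PTO period once for every consecutive expiration."""
    return base_pto_ms * (2 ** pto_count)


# With a base PTO of 165 ms, consecutive expirations yield:
print([backed_off_pto_ms(165, n) for n in range(4)])  # [165, 330, 660, 1320]
```

Because the counter is shared, an expiration in the Initial packet number space also doubles the period later computed for the Handshake packet number space.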
The life of a connection that is experiencing consecutive PTOs is
limited by the endpoint's idle timeout.
The probe timer MUST NOT be set if the time threshold loss detection
timer (Section 5.1.2) is set. The time threshold loss detection timer
is expected to both expire earlier than the PTO and be less likely to
spuriously retransmit data.
5.2.2. Handshakes and New Paths
Resumed connections over the same network MAY use the previous
connection's final smoothed RTT value as the resumed connection's
initial RTT. When no previous RTT is available, the initial RTT
SHOULD be set to 333ms, resulting in a 1 second initial timeout, as
recommended in [RFC6298].
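The 1 second figure can be checked against the PTO formula in Section 5.2.1. The sketch below assumes that, before any sample is taken, rttvar starts at half the initial RTT (the convention RFC 6298 applies to its first measurement), and that max_ack_delay is 0 because the timer is armed for the Initial packet number space. Variable names are illustrative.

```python
# Illustrative arithmetic only; the initial rttvar value is an assumption.
K_INITIAL_RTT_MS = 333
K_GRANULARITY_MS = 1

smoothed_rtt_ms = K_INITIAL_RTT_MS
rttvar_ms = K_INITIAL_RTT_MS / 2      # assumed initial variation: 166.5 ms
max_ack_delay_ms = 0                  # Initial packet number space

initial_pto_ms = (smoothed_rtt_ms
                  + max(4 * rttvar_ms, K_GRANULARITY_MS)
                  + max_ack_delay_ms)
print(initial_pto_ms)  # 999.0, i.e. approximately 1 second
```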
A connection MAY use the delay between sending a PATH_CHALLENGE and
receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
Appendix A.2) for a new path, but the delay SHOULD NOT be considered
an RTT sample.
Prior to handshake completion, when few or no RTT samples have been
generated, it is possible that the probe timer expiration is due to
an incorrect RTT estimate at the client. To allow the client to
improve its RTT estimate, the new packet that it sends MUST be ack-
eliciting.
Initial packets and Handshake packets might never be acknowledged,
but they are removed from bytes in flight when the Initial and
Handshake keys are discarded, as described below in Section 5.4.
When Initial or Handshake keys are discarded,
the PTO and loss detection timers MUST be reset, because discarding
keys indicates forward progress and the loss detection timer might
have been set for a now discarded packet number space.
5.2.2.1. Before Address Validation
Until the server has validated the client's address on the path, the
amount of data it can send is limited to three times the amount of
data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If
no additional data can be sent, the server's PTO alarm MUST NOT be
armed until datagrams have been received from the client, because
packets sent on PTO count against the anti-amplification limit. Note
that the server could fail to validate the client's address even if
0-RTT is accepted.
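As a rough sketch of the server-side rule above, the following check decides whether the PTO alarm can be armed. The function and parameter names are hypothetical, and byte counters stand in for whatever accounting an implementation keeps for the anti-amplification limit.

```python
# Illustrative sketch only; names and the byte-counter model are assumptions.
AMPLIFICATION_FACTOR = 3


def server_may_arm_pto(bytes_received: int, bytes_sent: int,
                       address_validated: bool) -> bool:
    """Return True if a probe sent on PTO would not exceed the limit."""
    if address_validated:
        return True
    # Before validation, the server can send at most three times the
    # amount of data it has received on the path.
    return bytes_sent < AMPLIFICATION_FACTOR * bytes_received


print(server_may_arm_pto(1200, 3600, address_validated=False))  # False
print(server_may_arm_pto(2400, 3600, address_validated=False))  # True
```

In the first call the server has exhausted its budget and leaves the alarm disarmed; receiving another datagram from the client, as in the second call, makes arming possible again.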
Since the server could be blocked until more packets are received
from the client, it is the client's responsibility to send packets to
unblock the server until it is certain that the server has finished
its address validation (see Section 8 of [QUIC-TRANSPORT]). That is,
the client MUST set the probe timer if the client has not received an
acknowledgement for one of its Handshake or 1-RTT packets, and has
not received a HANDSHAKE_DONE frame. If Handshake keys are available
to the client, it MUST send a Handshake packet, and otherwise it MUST
send an Initial packet in a UDP datagram of at least 1200 bytes.
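The client-side behavior above can be sketched as two small decisions: whether the probe timer has to be set, and which packet type to send when it fires. All names below are hypothetical and the return values are simplified placeholders for real packet construction.

```python
# Illustrative sketch only; names and return values are assumptions.
from typing import Optional, Tuple

MIN_INITIAL_DATAGRAM_SIZE = 1200  # bytes


def client_must_set_probe_timer(handshake_or_1rtt_acked: bool,
                                handshake_done_received: bool) -> bool:
    """The client keeps probing until it knows the server is unblocked."""
    return not handshake_or_1rtt_acked and not handshake_done_received


def client_probe_packet(has_handshake_keys: bool) -> Tuple[str, Optional[int]]:
    """Return the packet type to send and any minimum datagram size."""
    if has_handshake_keys:
        return ("Handshake", None)
    return ("Initial", MIN_INITIAL_DATAGRAM_SIZE)


print(client_must_set_probe_timer(False, False))  # True
print(client_probe_packet(has_handshake_keys=False))
# ('Initial', 1200)
```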
A client could have received and acknowledged a Handshake packet,
causing it to discard state for the Initial packet number space, but
not sent any ack-eliciting Handshake packets. In this case, the PTO
is set from the current time.
5.2.3. Speeding Up Handshake Completion
When a server receives an Initial packet containing duplicate CRYPTO
data, it can assume the client did not receive all of the server's
CRYPTO data sent in Initial packets, or the client's estimated RTT is
too small. When a client receives Handshake or 1-RTT packets prior
to obtaining Handshake keys, it can assume that some or all of the
server's Initial packets were lost.
To speed up handshake completion under these conditions, an endpoint
MAY send a packet containing unacknowledged CRYPTO data earlier than
the PTO expiry, subject to address validation limits; see Section 8.1
of [QUIC-TRANSPORT].
Peers can also use coalesced packets to ensure that each datagram
elicits at least one acknowledgement. For example, clients can
coalesce an Initial packet containing PING and PADDING frames with a
0-RTT data packet and a server can coalesce an Initial packet
containing a PING frame with one or more packets in its first flight.
5.2.4. Sending Probe Packets
When a PTO timer expires, a sender MUST send at least one ack-
eliciting packet in the packet number space as a probe, unless there
is no data available to send. An endpoint MAY send up to two full-
sized datagrams containing ack-eliciting packets, to avoid an
expensive consecutive PTO expiration due to a single lost datagram or
to transmit data from multiple packet number spaces. All probe packets
sent on a PTO MUST be ack-eliciting.
In addition to sending data in the packet number space for which the
timer expired, the sender SHOULD send ack-eliciting packets from
other packet number spaces with in-flight data, coalescing packets if
possible. This is particularly valuable when the server has both
Initial and Handshake data in-flight or the client has both Handshake
and ApplicationData in-flight, because the peer might only have
receive keys for one of the two packet number spaces.
If the sender wants to elicit a faster acknowledgement on PTO, it can
skip a packet number to eliminate the ack delay.
When the PTO timer expires, and there is new or previously sent
unacknowledged data, it MUST be sent. A probe packet SHOULD carry
new data when possible. A probe packet MAY carry retransmitted
unacknowledged data when new data is unavailable, when flow control
does not permit new data to be sent, or to opportunistically reduce
loss recovery delay. Implementations MAY use alternative strategies
for determining the content of probe packets, including sending new
or retransmitted data based on the application's priorities.
It is possible the sender has no new or previously-sent data to send.
As an example, consider the following sequence of events: new
application data is sent in a STREAM frame, deemed lost, then
retransmitted in a new packet, and then the original transmission is
acknowledged. When there is no data to send, the sender SHOULD send
a PING or other ack-eliciting frame in a single packet, re-arming the
PTO timer.
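One possible way to choose the content of a probe packet, following the preferences described above, is sketched below. This is only one strategy among the alternatives that implementations MAY use; the function name, its boolean inputs, and the returned labels are assumptions for illustration.

```python
# Illustrative sketch only; one strategy among several permitted ones.
def choose_probe_payload(new_data_available: bool,
                         flow_control_allows_new: bool,
                         unacked_data_available: bool) -> str:
    """Return a label for what the probe packet carries."""
    if new_data_available and flow_control_allows_new:
        return "new data"
    if unacked_data_available:
        return "retransmitted data"
    # Nothing to send: fall back to an ack-eliciting PING frame.
    return "PING"


print(choose_probe_payload(True, True, True))     # new data
print(choose_probe_payload(True, False, True))    # retransmitted data
print(choose_probe_payload(False, False, False))  # PING
```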
Alternatively, instead of sending an ack-eliciting packet, the sender
MAY mark any packets still in flight as lost. Doing so avoids
sending an additional packet, but increases the risk that loss is
declared too aggressively, resulting in an unnecessary rate reduction
by the congestion controller.
Consecutive PTO periods increase exponentially, and as a result,
connection recovery latency increases exponentially as packets
continue to be dropped in the network. Sending two packets on PTO
expiration increases resilience to packet drops, thus reducing the
probability of consecutive PTO events.
When the PTO timer expires multiple times and new data cannot be
sent, implementations must choose between sending the same payload
every time or sending different payloads. Sending the same payload
may be simpler and ensures the highest priority frames arrive first.
Sending different payloads each time reduces the chances of spurious
retransmission.
5.2.5. Loss Detection
Delivery or loss of packets in flight is established when an ACK
frame is received that newly acknowledges one or more packets.
A PTO timer expiration event does not indicate packet loss and MUST
NOT cause prior unacknowledged packets to be marked as lost. When an
acknowledgement is received that newly acknowledges packets, loss
detection proceeds as dictated by packet and time threshold
mechanisms; see Section 5.1.
5.3. Handling Retry Packets
A Retry packet causes a client to send another Initial packet,
effectively restarting the connection process. A Retry packet
indicates that the Initial was received, but not processed. A Retry
packet cannot be treated as an acknowledgement, because it does not
indicate that a packet was processed or specify the packet number.
Clients that receive a Retry packet reset congestion control and loss
recovery state, including resetting any pending timers. Other
connection state, in particular cryptographic handshake messages, is
retained; see Section 17.2.5 of [QUIC-TRANSPORT].
The client MAY compute an RTT estimate to the server as the time
period from when the first Initial was sent to when a Retry or a
Version Negotiation packet is received. The client MAY use this
value in place of its default for the initial RTT estimate.
5.4. Discarding Keys and Packet State
When packet protection keys are discarded (see Section 4.10 of
[QUIC-TLS]), all packets that were sent with those keys can no longer
be acknowledged because their acknowledgements cannot be processed
anymore. The sender MUST discard all recovery state associated with
those packets and MUST remove them from the count of bytes in flight.
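The bookkeeping implied by this requirement can be sketched as follows. The classes and method names are hypothetical, and the sketch tracks only packet sizes per packet number space; a real implementation would also reset its timers.

```python
# Illustrative sketch only; class and method names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SentPacket:
    size: int
    in_flight: bool = True


@dataclass
class RecoveryState:
    bytes_in_flight: int = 0
    # packet number space -> {packet number -> SentPacket}
    sent_packets: Dict[str, Dict[int, SentPacket]] = field(default_factory=dict)

    def on_packet_sent(self, space: str, packet_number: int, size: int) -> None:
        self.sent_packets.setdefault(space, {})[packet_number] = SentPacket(size)
        self.bytes_in_flight += size

    def on_keys_discarded(self, space: str) -> None:
        """Drop all recovery state for a space whose keys were discarded."""
        for packet in self.sent_packets.pop(space, {}).values():
            if packet.in_flight:
                self.bytes_in_flight -= packet.size


state = RecoveryState()
state.on_packet_sent("Initial", 0, 1200)
state.on_packet_sent("Handshake", 0, 800)
state.on_keys_discarded("Initial")
print(state.bytes_in_flight)  # 800
```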
Endpoints stop sending and receiving Initial packets once they start
exchanging Handshake packets; see Section 17.2.2.1 of
[QUIC-TRANSPORT]. At this point, recovery state for all in-flight
Initial packets is discarded.
When 0-RTT is rejected, recovery state for all in-flight 0-RTT
packets is discarded.
If a server accepts 0-RTT, but does not buffer 0-RTT packets that
arrive before Initial packets, early 0-RTT packets will be declared
lost, but that is expected to be infrequent.
It is expected that keys are discarded after packets encrypted with
them would be acknowledged or declared lost. Initial secrets,
however, might be destroyed sooner, as soon as handshake keys are
available; see Section 4.11.1 of [QUIC-TLS].
6. Congestion Control
This document specifies a congestion controller for QUIC similar to
TCP NewReno [RFC6582].
The signals QUIC provides for congestion control are generic and are
designed to support different algorithms. Endpoints can unilaterally
choose a different algorithm to use, such as Cubic [RFC8312].
If an endpoint uses a different controller than that specified in
this document, the chosen controller MUST conform to the congestion
control guidelines specified in Section 3.1 of [RFC8085].
Similar to TCP, packets containing only ACK frames do not count
towards bytes in flight and are not congestion controlled. Unlike
TCP, QUIC can detect the loss of these packets and MAY use that
information to adjust the congestion controller or the rate of ACK-
only packets being sent, but this document does not describe a
mechanism for doing so.