diff --git a/http/h1-refs/draft-ietf-quic-http.html b/http/h1-refs/draft-ietf-quic-http.html new file mode 100644 index 0000000000..8b73704f62 --- /dev/null +++ b/http/h1-refs/draft-ietf-quic-http.html @@ -0,0 +1,5389 @@ + + +
+ + + +Internet-Draft | +HTTP/3 | +January 2021 | +
Bishop | +Expires 19 July 2021 | +[Page] | +
The QUIC transport protocol has several features that are desirable in a +transport for HTTP, such as stream multiplexing, per-stream flow control, and +low-latency connection establishment. This document describes a mapping of HTTP +semantics over QUIC. This document also identifies HTTP/2 features that are +subsumed by QUIC, and describes how HTTP/2 extensions can be ported to HTTP/3.¶
+DO NOT DEPLOY THIS VERSION OF HTTP/3 UNTIL IT IS IN AN RFC. This version is +still a work in progress. For trial deployments, please use earlier versions.¶
+Discussion of this draft takes place on the QUIC working group mailing list +(quic@ietf.org), which is archived at +https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
+Working Group information can be found at https://github.com/quicwg; source +code and issues list for this draft can be found at +https://github.com/quicwg/base-drafts/labels/-http.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF). Note that other groups may also distribute working + documents as Internet-Drafts. The list of current Internet-Drafts is + at https://datatracker.ietf.org/drafts/current/.¶
++ Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress."¶
++ This Internet-Draft will expire on 19 July 2021.¶
++ Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Simplified BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Simplified BSD License.¶
+HTTP semantics ([SEMANTICS]) are used for a broad +range of services on the Internet. These semantics have most commonly been used +with HTTP/1.1, over a variety of transport and session layers, and with HTTP/2 +over TLS. HTTP/3 supports the same semantics over a new transport protocol, +QUIC.¶
+HTTP/1.1 ([HTTP11]) uses whitespace-delimited text +fields to convey HTTP messages. While these exchanges are human-readable, using +whitespace for message formatting leads to parsing complexity and excessive +tolerance of variant behavior. Because HTTP/1.x does not include a multiplexing +layer, multiple TCP connections are often used to service requests in parallel. +However, that has a negative impact on congestion control and network +efficiency, since TCP does not share congestion control across multiple +connections.¶
+HTTP/2 ([HTTP2]) introduced a binary framing and multiplexing layer +to improve latency without modifying the transport layer. However, because the +parallel nature of HTTP/2's multiplexing is not visible to TCP's loss recovery +mechanisms, a lost or reordered packet causes all active transactions to +experience a stall regardless of whether that transaction was directly impacted +by the lost packet.¶
+The QUIC transport protocol incorporates stream multiplexing and per-stream flow +control, similar to that provided by the HTTP/2 framing layer. By providing +reliability at the stream level and congestion control across the entire +connection, QUIC has the capability to improve the performance of HTTP compared +to a TCP mapping. QUIC also incorporates TLS 1.3 ([TLS13]) at the +transport layer, offering comparable confidentiality and integrity to running +TLS over TCP, with the improved connection setup latency of TCP Fast Open +([TFO]).¶
+This document defines a mapping of HTTP semantics over the QUIC transport +protocol, drawing heavily on the design of HTTP/2. While delegating stream +lifetime and flow control issues to QUIC, a similar binary framing is used on +each stream. Some HTTP/2 features are subsumed by QUIC, while other features are +implemented atop QUIC.¶
+QUIC is described in [QUIC-TRANSPORT]. For a full description of HTTP/2, see +[HTTP2].¶
+HTTP/3 provides a transport for HTTP semantics using the QUIC transport protocol +and an internal framing layer similar to HTTP/2.¶
+Once a client knows that an HTTP/3 server exists at a certain endpoint, it opens +a QUIC connection. QUIC provides protocol negotiation, stream-based +multiplexing, and flow control. Discovery of an HTTP/3 endpoint is described in +Section 3.1.¶
+Within each stream, the basic unit of HTTP/3 communication is a frame +(Section 7.2). Each frame type serves a different purpose. For example, HEADERS +and DATA frames form the basis of HTTP requests and responses +(Section 4.1).¶
+Multiplexing of requests is performed using the QUIC stream abstraction, +described in Section 2 of [QUIC-TRANSPORT]. Each request-response pair +consumes a single QUIC stream. Streams are independent of each other, so one +stream that is blocked or suffers packet loss does not prevent progress on other +streams.¶
+Server push is an interaction mode introduced in HTTP/2 ([HTTP2]) that +permits a server to push a request-response exchange to a client in anticipation +of the client making the indicated request. This trades off network usage +against a potential latency gain. Several HTTP/3 frames are used to manage +server push, such as PUSH_PROMISE, MAX_PUSH_ID, and CANCEL_PUSH.¶
+As in HTTP/2, request and response fields are compressed for transmission. +Because HPACK ([HPACK]) relies on in-order transmission of compressed +field sections (a guarantee not provided by QUIC), HTTP/3 replaces HPACK with +QPACK ([QPACK]). QPACK uses separate unidirectional streams to modify and track +field table state, while encoded field sections refer to the state of the table +without modifying it.¶
+The following sections provide a detailed overview of the lifecycle of an HTTP/3 +connection:¶
+The details of the wire protocol and interactions with the transport are +described in subsequent sections:¶
+Additional resources are provided in the final sections:¶
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", +"MAY", and "OPTIONAL" in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, and only when, they +appear in all capitals, as shown here.¶
+This document uses the variable-length integer encoding from +[QUIC-TRANSPORT].¶
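As a non-normative illustration, the variable-length integer encoding referenced above (Section 16 of [QUIC-TRANSPORT]) can be decoded in a few lines; the function name is illustrative, not part of the draft:

```python
def decode_varint(data: bytes) -> tuple[int, int]:
    """Decode a QUIC variable-length integer.

    The two most significant bits of the first byte encode the length
    of the integer: 0b00 -> 1 byte, 0b01 -> 2, 0b10 -> 4, 0b11 -> 8.
    The remaining bits, in network byte order, carry the value.
    Returns (value, bytes_consumed).
    """
    first = data[0]
    length = 1 << (first >> 6)      # 1, 2, 4, or 8 bytes
    value = first & 0x3F            # clear the two length bits
    for b in data[1:length]:
        value = (value << 8) | b
    return value, length
```

The test vectors in Section 16 of [QUIC-TRANSPORT] (e.g., 0x25 decodes to 37, 0xc2197c5eff14e88c decodes to 151288809941952652) exercise all four lengths.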
+The following terms are used:¶
+abort: An abrupt termination of a connection or stream, possibly due to an error +condition.¶
+client: The endpoint that initiates an HTTP/3 connection. Clients send HTTP requests +and receive HTTP responses.¶
+connection: A transport-layer connection between two endpoints, using QUIC as the +transport protocol.¶
+connection error: An error that affects the entire HTTP/3 connection.¶
+endpoint: Either the client or server of the connection.¶
+frame: The smallest unit of communication on a stream in HTTP/3, consisting of a +header and a variable-length sequence of bytes structured according to the +frame type.¶
+Protocol elements called "frames" exist in both this document and +[QUIC-TRANSPORT]. Where frames from [QUIC-TRANSPORT] are referenced, the +frame name will be prefaced with "QUIC." For example, "QUIC CONNECTION_CLOSE +frames." References without this preface refer to frames defined in +Section 7.2.¶
+HTTP/3 connection: A QUIC connection where the negotiated application protocol is HTTP/3.¶
+peer: An endpoint. When discussing a particular endpoint, "peer" refers to the +endpoint that is remote to the primary subject of discussion.¶
+receiver: An endpoint that is receiving frames.¶
+sender: An endpoint that is transmitting frames.¶
+server: The endpoint that accepts an HTTP/3 connection. Servers receive HTTP requests +and send HTTP responses.¶
+stream: A bidirectional or unidirectional bytestream provided by the QUIC transport. +All streams within an HTTP/3 connection can be considered "HTTP/3 streams," +but multiple stream types are defined within HTTP/3.¶
+stream error: An application-level error on the individual stream.¶
+The term "content" is defined in Section 6.4 of [SEMANTICS].¶
+Finally, the terms "resource", "message", "user agent", "origin server", +"gateway", "intermediary", "proxy", and "tunnel" are defined in Section 3 of +[SEMANTICS].¶
+Packet diagrams in this document use the format defined in Section 1.3 of +[QUIC-TRANSPORT] to illustrate the order and size of fields.¶
+HTTP relies on the notion of an authoritative response: a response that has been +determined to be the most appropriate response for that request given the state +of the target resource at the time of response message origination by (or at the +direction of) the origin server identified within the target URI. Locating an +authoritative server for an HTTP URL is discussed in Section 4.3 of +[SEMANTICS].¶
+The "https" scheme associates authority with possession of a certificate that +the client considers to be trustworthy for the host identified by the authority +component of the URL.¶
+If a server presents a valid certificate and proof that it controls the +corresponding private key, then a client will accept a secured TLS session with +that server as being authoritative for all origins with the "https" scheme and a +host identified in the certificate. The host must be listed either as the CN +field of the certificate subject or as a dNSName in the subjectAltName field of +the certificate; see [RFC6125]. For a host that is an IP address, the client +MUST verify that the address appears as an iPAddress in the subjectAltName field +of the certificate.¶
+If the hostname or address is not present in the certificate, the client MUST +NOT consider the server authoritative for origins containing that hostname or +address. See Section 4.3 of [SEMANTICS] for more detail on authoritative +access.¶
+A client MAY attempt access to a resource with an "https" URI by resolving the +host identifier to an IP address, establishing a QUIC connection to that address +on the indicated port, and sending an HTTP/3 request message targeting the URI +to the server over that secured connection. Unless some other mechanism is used +to select HTTP/3, the token "h3" is used in the Application Layer Protocol +Negotiation (ALPN; see [RFC7301]) extension during the TLS handshake.¶
+Connectivity problems (e.g., blocking UDP) can result in QUIC connection +establishment failure; clients SHOULD attempt to use TCP-based versions of HTTP +in this case.¶
+Servers MAY serve HTTP/3 on any UDP port; an alternative service advertisement +always includes an explicit port, and URLs contain either an explicit port or a +default port associated with the scheme.¶
+An HTTP origin advertises the availability of an equivalent HTTP/3 endpoint via +the Alt-Svc HTTP response header field or the HTTP/2 ALTSVC frame ([ALTSVC]), +using the "h3" ALPN token.¶
+For example, an origin could indicate in an HTTP response that HTTP/3 was +available on UDP port 50781 at the same hostname by including the following +header field:¶
++Alt-Svc: h3=":50781" +¶ +
On receipt of an Alt-Svc record indicating HTTP/3 support, a client MAY attempt +to establish a QUIC connection to the indicated host and port; if this +connection is successful, the client can send HTTP requests using the mapping +described in this document.¶
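The Alt-Svc processing described above can be sketched as follows; this is a rough, non-normative extractor for "h3" alternatives, not a full [ALTSVC] (RFC 7838) parser — parameters such as `ma=` and quoted-string escaping are deliberately ignored, and the function name is illustrative:

```python
def h3_alternatives(alt_svc: str) -> list[tuple[str, int]]:
    """Extract (host, port) pairs advertised with the "h3" ALPN token
    from an Alt-Svc field value.  An empty host means "same host as the
    origin".  Simplified sketch; not a complete RFC 7838 parser.
    """
    alts = []
    for entry in alt_svc.split(","):
        entry = entry.strip()
        if not entry.startswith("h3="):
            continue                      # some other ALPN token
        authority = entry.split(";")[0][len("h3="):].strip('"')
        host, _, port = authority.rpartition(":")
        alts.append((host, int(port)))
    return alts
```

For the example above, `h3_alternatives('h3=":50781"')` yields `[("", 50781)]`, directing the client to UDP port 50781 on the same hostname.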
+Although HTTP is independent of the transport protocol, the "http" scheme +associates authority with the ability to receive TCP connections on the +indicated port of whatever host is identified within the authority component. +Because HTTP/3 does not use TCP, HTTP/3 cannot be used for direct access to the +authoritative server for a resource identified by an "http" URI. However, +protocol extensions such as [ALTSVC] permit the authoritative server +to identify other services that are also authoritative and that might be +reachable over HTTP/3.¶
+Prior to making requests for an origin whose scheme is not "https", the client +MUST ensure the server is willing to serve that scheme. For origins whose scheme +is "http", an experimental method to accomplish this is described in +[RFC8164]. Other mechanisms might be defined for various schemes in the +future.¶
+HTTP/3 relies on QUIC version 1 as the underlying transport. The use of other +QUIC transport versions with HTTP/3 MAY be defined by future specifications.¶
+QUIC version 1 uses TLS version 1.3 or greater as its handshake protocol. +HTTP/3 clients MUST support a mechanism to indicate the target host to the +server during the TLS handshake. If the server is identified by a domain name +([DNS-TERMS]), clients MUST send the Server Name Indication (SNI; +[RFC6066]) TLS extension unless an alternative mechanism to indicate the +target host is used.¶
+QUIC connections are established as described in [QUIC-TRANSPORT]. During +connection establishment, HTTP/3 support is indicated by selecting the ALPN +token "h3" in the TLS handshake. Support for other application-layer protocols +MAY be offered in the same handshake.¶
+While connection-level options pertaining to the core QUIC protocol are set in +the initial crypto handshake, HTTP/3-specific settings are conveyed in the +SETTINGS frame. After the QUIC connection is established, a SETTINGS frame +(Section 7.2.4) MUST be sent by each endpoint as the initial frame of their +respective HTTP control stream; see Section 6.2.1.¶
+HTTP/3 connections are persistent across multiple requests. For best +performance, it is expected that clients will not close connections until it is +determined that no further communication with a server is necessary (for +example, when a user navigates away from a particular web page) or until the +server closes the connection.¶
+Once a connection exists to a server endpoint, this connection MAY be reused for +requests with multiple different URI authority components. Clients SHOULD NOT +open more than one HTTP/3 connection to a given host and port pair, where the +host is derived from a URI, a selected alternative service ([ALTSVC]), or a +configured proxy. A client MAY open multiple HTTP/3 connections to the same IP +address and UDP port using different transport or TLS configurations but SHOULD +avoid creating multiple connections with the same configuration.¶
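The reuse rule above amounts to keying a connection pool on the (host, port) pair rather than on the full origin. A minimal sketch, with illustrative names not taken from the draft:

```python
class ConnectionPool:
    """Reuse one HTTP/3 connection per (host, port) pair, as the draft
    recommends.  A real client would also key on transport/TLS
    configuration, since distinct configurations may justify distinct
    connections."""

    def __init__(self):
        self._conns = {}

    def get(self, host: str, port: int, open_fn):
        # Hostnames are case-insensitive, so normalize before keying.
        key = (host.lower(), port)
        if key not in self._conns:
            self._conns[key] = open_fn(host, port)
        return self._conns[key]
```

With this shape, requests for multiple authorities that resolve to the same host and port share a single connection until the server answers 421 (Misdirected Request).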
+Servers are encouraged to maintain open HTTP/3 connections for as long as +possible but are permitted to terminate idle connections if necessary. When +either endpoint chooses to close the HTTP/3 connection, the terminating endpoint +SHOULD first send a GOAWAY frame (Section 5.2) so that both +endpoints can reliably determine whether previously sent frames have been +processed and gracefully complete or terminate any necessary remaining tasks.¶
+A server that does not wish clients to reuse HTTP/3 connections for a particular +origin can indicate that it is not authoritative for a request by sending a 421 +(Misdirected Request) status code in response to the request; see Section 9.1.2 +of [HTTP2].¶
+A client sends an HTTP request on a request stream, which is a client-initiated +bidirectional QUIC stream; see Section 6.1. A client MUST send only a +single request on a given stream. A server sends zero or more interim HTTP +responses on the same stream as the request, followed by a single final HTTP +response, as detailed below. See Section 15 of [SEMANTICS] for a description +of interim and final HTTP responses.¶
+Pushed responses are sent on a server-initiated unidirectional QUIC stream; see +Section 6.2.2. A server sends zero or more interim HTTP responses, followed +by a single final HTTP response, in the same manner as a standard response. +Push is described in more detail in Section 4.4.¶
+On a given stream, receipt of multiple requests or receipt of an additional HTTP +response following a final HTTP response MUST be treated as malformed +(Section 4.1.3).¶
+An HTTP message (request or response) consists of:¶
+1. the header field section, sent as a single HEADERS frame,¶
+2. optionally, the content, if present, sent as a series of DATA frames, and¶
+3. optionally, the trailer field section, if present, sent as a single HEADERS frame.¶
+Header and trailer field sections are described in Sections 6.3 and 6.5 of +[SEMANTICS]; the content is described in Section 6.4 of +[SEMANTICS].¶
+Receipt of an invalid sequence of frames MUST be treated as a connection error +of type H3_FRAME_UNEXPECTED; see Section 8. In particular, a DATA frame before +any HEADERS frame, or a HEADERS or DATA frame after the trailing HEADERS frame +is considered invalid. Other frame types, especially unknown frame types, +might be permitted subject to their own rules; see Section 9.¶
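The frame-sequence rule above (HEADERS, then zero or more DATA frames, then an optional trailing HEADERS) can be expressed as a small state machine. This non-normative sketch returns a boolean; a real endpoint would instead signal a connection error of type H3_FRAME_UNEXPECTED, and would handle unknown frame types before reaching this check:

```python
def check_request_stream(frames: list[str]) -> bool:
    """Validate the sequence of known frames on a request stream:
    HEADERS, zero or more DATA, then an optional trailing HEADERS.
    Assumes unknown/reserved frame types were already filtered out.
    """
    state = "start"                 # start -> message -> trailers
    for f in frames:
        if state == "start":
            if f != "HEADERS":
                return False        # e.g., DATA before any HEADERS
            state = "message"
        elif state == "message":
            if f == "HEADERS":
                state = "trailers"  # trailing header section
            elif f != "DATA":
                return False
        else:
            return False            # nothing may follow the trailers
    return state != "start"
```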
+A server MAY send one or more PUSH_PROMISE frames (Section 7.2.5) +before, after, or interleaved with the frames of a response message. These +PUSH_PROMISE frames are not part of the response; see Section 4.4 for more +details. PUSH_PROMISE frames are not permitted on push streams; a pushed +response that includes PUSH_PROMISE frames MUST be treated as a connection error +of type H3_FRAME_UNEXPECTED; see Section 8.¶
+Frames of unknown types (Section 9), including reserved frames +(Section 7.2.8), MAY be sent on a request or push stream before, after, or +interleaved with other frames described in this section.¶
+The HEADERS and PUSH_PROMISE frames might reference updates to the QPACK dynamic +table. While these updates are not directly part of the message exchange, they +must be received and processed before the message can be consumed. See +Section 4.1.1 for more details.¶
+Transfer codings (see Section 6.1 of [HTTP11]) are not defined for HTTP/3; +the Transfer-Encoding header field MUST NOT be used.¶
+A response MAY consist of multiple messages when and only when one or more +interim responses (1xx; see Section 15.2 of [SEMANTICS]) precede a final +response to the same request. Interim responses do not contain content +or trailers.¶
+An HTTP request/response exchange fully consumes a client-initiated +bidirectional QUIC stream. After sending a request, a client MUST close the +stream for sending. Unless using the CONNECT method (see Section 4.2), clients +MUST NOT make stream closure dependent on receiving a response to their request. +After sending a final response, the server MUST close the stream for sending. At +this point, the QUIC stream is fully closed.¶
+When a stream is closed, this indicates the end of the final HTTP message. +Because some messages are large or unbounded, endpoints SHOULD begin processing +partial HTTP messages once enough of the message has been received to make +progress. If a client-initiated stream terminates without enough of the HTTP +message to provide a complete response, the server SHOULD abort its response +with the error code H3_REQUEST_INCOMPLETE; see Section 8.¶
+A server can send a complete response prior to the client sending an entire +request if the response does not depend on any portion of the request that has +not been sent and received. When the server does not need to receive the +remainder of the request, it MAY abort reading the request stream, send a +complete response, and cleanly close the sending part of the stream. The error +code H3_NO_ERROR SHOULD be used when requesting that the client stop sending on +the request stream. Clients MUST NOT discard complete responses as a result of +having their request terminated abruptly, though clients can always discard +responses at their discretion for other reasons. If the server sends a partial +or complete response but does not abort reading the request, clients SHOULD +continue sending the body of the request and close the stream normally.¶
+HTTP messages carry metadata as a series of key-value pairs called HTTP fields; +see Sections 6.3 and 6.5 of [SEMANTICS]. For a listing of registered HTTP +fields, see the "Hypertext Transfer Protocol (HTTP) Field Name Registry" +maintained at https://www.iana.org/assignments/http-fields/.¶
+As in previous versions of HTTP, field names are strings containing a subset of +ASCII characters that are compared in a case-insensitive fashion. Properties of +HTTP field names and values are discussed in more detail in Section 5.1 of +[SEMANTICS]. As in HTTP/2, characters in field names MUST be converted to +lowercase prior to their encoding. A request or response containing uppercase +characters in field names MUST be treated as malformed (Section 4.1.3).¶
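A sender can enforce the lowercase rule above with a simple character-set check. This sketch folds both requirements together: the name must consist of token characters (Section 5.1 of [SEMANTICS]) and, since uppercase letters are excluded from the allowed set, any uppercase name fails. The function name is illustrative:

```python
# Token characters from RFC 9110/[SEMANTICS], minus uppercase letters,
# which HTTP/3 (like HTTP/2) forbids in encoded field names.
_LOWER_TCHAR = set("!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyz")

def field_name_ok(name: str) -> bool:
    """True if `name` is a non-empty, all-lowercase HTTP token."""
    return len(name) > 0 and all(c in _LOWER_TCHAR for c in name)
```

A recipient applying the same check would treat a failing message as malformed (Section 4.1.3).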
+Like HTTP/2, HTTP/3 does not use the Connection header field to indicate +connection-specific fields; in this protocol, connection-specific metadata is +conveyed by other means. An endpoint MUST NOT generate an HTTP/3 field section +containing connection-specific fields; any message containing +connection-specific fields MUST be treated as malformed (Section 4.1.3).¶
+The only exception to this is the TE header field, which MAY be present in an +HTTP/3 request header; when it is, it MUST NOT contain any value other than +"trailers".¶
+This means that an intermediary transforming an HTTP/1.x message to HTTP/3 will +need to remove any fields nominated by the Connection field, along with the +Connection field itself. Such intermediaries SHOULD also remove other +connection-specific fields, such as Keep-Alive, Proxy-Connection, +Transfer-Encoding, and Upgrade, even if they are not nominated by the Connection +field.¶
+Like HTTP/2, HTTP/3 employs a series of pseudo-header fields where the field +name begins with the ':' character (ASCII 0x3a). These pseudo-header fields +convey the target URI, the method of the request, and the status code for the +response.¶
+Pseudo-header fields are not HTTP fields. Endpoints MUST NOT generate +pseudo-header fields other than those defined in this document; however, an +extension could negotiate a modification of this restriction; see +Section 9.¶
+Pseudo-header fields are only valid in the context in which they are defined. +Pseudo-header fields defined for requests MUST NOT appear in responses; +pseudo-header fields defined for responses MUST NOT appear in requests. +Pseudo-header fields MUST NOT appear in trailers. Endpoints MUST treat a +request or response that contains undefined or invalid pseudo-header fields as +malformed (Section 4.1.3).¶
+All pseudo-header fields MUST appear in the header field section before regular +header fields. Any request or response that contains a pseudo-header field that +appears in a header field section after a regular header field MUST be treated +as malformed (Section 4.1.3).¶
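The two ordering and validity rules above can be checked in one pass over the field lines. A non-normative sketch for request header sections (the set and function names are illustrative; an extension could enlarge the allowed set per Section 9):

```python
REQUEST_PSEUDO = {":method", ":scheme", ":authority", ":path"}

def pseudo_headers_ok(field_lines: list[tuple[str, str]]) -> bool:
    """True if only defined request pseudo-header fields appear and all
    of them precede the regular header fields; otherwise the message is
    malformed (Section 4.1.3)."""
    seen_regular = False
    for name, _value in field_lines:
        if name.startswith(":"):
            if seen_regular or name not in REQUEST_PSEUDO:
                return False
        else:
            seen_regular = True
    return True
```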
+The following pseudo-header fields are defined for requests:¶
+":scheme": Contains the scheme portion of the target URI (Section 3.1 of + [URI]).¶
+":scheme" is not restricted to "http" and "https" schemed URIs. A proxy or +gateway can translate requests for non-HTTP schemes, enabling the use of +HTTP to interact with non-HTTP services.¶
+":authority": Contains the authority portion of the target URI (Section 3.2 of +[URI]). The authority MUST NOT include the deprecated "userinfo" +subcomponent for "http" or "https" schemed URIs.¶
+To ensure that the HTTP/1.1 request line can be reproduced accurately, this +pseudo-header field MUST be omitted when translating from an HTTP/1.1 +request that has a request target in origin or asterisk form; see Section +7.1 of [SEMANTICS]. Clients that generate HTTP/3 requests directly +SHOULD use the ":authority" pseudo-header field instead of the Host field. +An intermediary that converts an HTTP/3 request to HTTP/1.1 MUST create a +Host field if one is not present in a request by copying the value of the +":authority" pseudo-header field.¶
+":path": Contains the path and query parts of the target URI (the "path-absolute" +production and optionally a '?' character followed by the "query" +production; see Sections 3.3 and 3.4 of [URI]). A request in +asterisk form includes the value '*' for the ":path" pseudo-header field.¶
+This pseudo-header field MUST NOT be empty for "http" or "https" URIs; +"http" or "https" URIs that do not contain a path component MUST include a +value of '/'. The exception to this rule is an OPTIONS request for an +"http" or "https" URI that does not include a path component; these MUST +include a ":path" pseudo-header field with a value of '*'; see Section 7.1 +of [SEMANTICS].¶
+All HTTP/3 requests MUST include exactly one value for the ":method", ":scheme", +and ":path" pseudo-header fields, unless the request is a CONNECT request; see +Section 4.2.¶
+If the ":scheme" pseudo-header field identifies a scheme that has a mandatory +authority component (including "http" and "https"), the request MUST contain +either an ":authority" pseudo-header field or a "Host" header field. If these +fields are present, they MUST NOT be empty. If both fields are present, they +MUST contain the same value. If the scheme does not have a mandatory authority +component and none is provided in the request target, the request MUST NOT +contain the ":authority" pseudo-header or "Host" header fields.¶
+An HTTP request that omits mandatory pseudo-header fields or contains invalid +values for those pseudo-header fields is malformed (Section 4.1.3).¶
+HTTP/3 does not define a way to carry the version identifier that is included in +the HTTP/1.1 request line.¶
+For responses, a single ":status" pseudo-header field is defined that carries +the HTTP status code; see Section 15 of [SEMANTICS]. This pseudo-header +field MUST be included in all responses; otherwise, the response is malformed +(Section 4.1.3).¶
+HTTP/3 does not define a way to carry the version or reason phrase that is +included in an HTTP/1.1 status line.¶
+HTTP/3 uses QPACK field compression as described in [QPACK], a variation of +HPACK that allows the flexibility to avoid compression-induced head-of-line +blocking. See that document for additional details.¶
+To allow for better compression efficiency, the "Cookie" field ([RFC6265]) +MAY be split into separate field lines, each with one or more cookie-pairs, +before compression. If a decompressed field section contains multiple cookie +field lines, these MUST be concatenated into a single byte string using the +two-byte delimiter of 0x3b, 0x20 (the ASCII string "; ") before being passed +into a context other than HTTP/2 or HTTP/3, such as an HTTP/1.1 connection, or a +generic HTTP server application.¶
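The recombination step above is a straightforward join on the two-byte delimiter. A non-normative sketch (function name illustrative) of what an intermediary would do before forwarding a decompressed field section into, say, an HTTP/1.1 connection:

```python
def recombine_cookies(field_lines: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Concatenate multiple "cookie" field lines with the two-byte
    delimiter 0x3b 0x20 ("; "), as required before passing the message
    to a context other than HTTP/2 or HTTP/3."""
    cookies = [v for n, v in field_lines if n == "cookie"]
    out = [(n, v) for n, v in field_lines if n != "cookie"]
    if cookies:
        out.append(("cookie", "; ".join(cookies)))
    return out
```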
+An HTTP/3 implementation MAY impose a limit on the maximum size of the message +header it will accept on an individual HTTP message. A server that receives a +larger header section than it is willing to handle can send an HTTP 431 (Request +Header Fields Too Large) status code ([RFC6585]). A client can discard +responses that it cannot process. The size of a field list is calculated based +on the uncompressed size of fields, including the length of the name and value +in bytes plus an overhead of 32 bytes for each field.¶
+If an implementation wishes to advise its peer of this limit, it can be conveyed +as a number of bytes in the SETTINGS_MAX_FIELD_SECTION_SIZE parameter. An +implementation that has received this parameter SHOULD NOT send an HTTP message +header that exceeds the indicated size, as the peer will likely refuse to +process it. However, an HTTP message can traverse one or more intermediaries +before reaching the origin server; see Section 3.6 of [SEMANTICS]. Because +this limit is applied separately by each implementation which processes the +message, messages below this limit are not guaranteed to be accepted.¶
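The size rule above is the same accounting as HTTP/2's SETTINGS_MAX_HEADER_LIST_SIZE: uncompressed name and value lengths plus a fixed 32-byte overhead per field. A one-line non-normative sketch:

```python
def field_section_size(field_lines: list[tuple[bytes, bytes]]) -> int:
    """Size of a field list as defined above: the uncompressed length
    of each name and value in bytes, plus 32 bytes of overhead per
    field.  Compared against SETTINGS_MAX_FIELD_SECTION_SIZE."""
    return sum(len(name) + len(value) + 32 for name, value in field_lines)
```

For example, the single field `:method: GET` counts as 7 + 3 + 32 = 42 bytes.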
+Once a request stream has been opened, the request MAY be cancelled by either +endpoint. Clients cancel requests if the response is no longer of interest; +servers cancel requests if they are unable to or choose not to respond. When +possible, it is RECOMMENDED that servers send an HTTP response with an +appropriate status code rather than cancelling a request they have already begun +processing.¶
+Implementations SHOULD cancel requests by abruptly terminating any +directions of a stream that are still open. This means resetting the +sending parts of streams and aborting reading on receiving parts of streams; +see Section 2.4 of [QUIC-TRANSPORT].¶
+When the server cancels a request without performing any application processing, +the request is considered "rejected." The server SHOULD abort its response +stream with the error code H3_REQUEST_REJECTED. In this context, "processed" +means that some data from the stream was passed to some higher layer of software +that might have taken some action as a result. The client can treat requests +rejected by the server as though they had never been sent at all, thereby +allowing them to be retried later.¶
+Servers MUST NOT use the H3_REQUEST_REJECTED error code for requests that were +partially or fully processed. When a server abandons a response after partial +processing, it SHOULD abort its response stream with the error code +H3_REQUEST_CANCELLED.¶
+Clients SHOULD use the error code H3_REQUEST_CANCELLED to cancel requests. Upon +receipt of this error code, a server MAY abruptly terminate the response using +the error code H3_REQUEST_REJECTED if no processing was performed. Clients MUST +NOT use the H3_REQUEST_REJECTED error code, except when a server has requested +closure of the request stream with this error code.¶
+If a stream is cancelled after receiving a complete response, the client MAY +ignore the cancellation and use the response. However, if a stream is cancelled +after receiving a partial response, the response SHOULD NOT be used. Only +idempotent actions such as GET, PUT, or DELETE can be safely retried; a client +SHOULD NOT automatically retry a request with a non-idempotent method unless it +has some means to know that the request semantics are idempotent +independent of the method or some means to detect that the original request was +never applied. See Section 9.2.2 of [SEMANTICS] for more details.¶
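The client-side decision described above reduces to three outcomes. A non-normative sketch (names illustrative; it deliberately omits the "rejected, so retry anyway" case from the preceding section and the "detect the request was never applied" escape hatch):

```python
# Idempotent methods per [SEMANTICS]; safe to retry automatically.
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def handle_cancelled_stream(method: str, response_complete: bool) -> str:
    """Decide what a client does when a request stream is cancelled:
    use a complete response, retry automatically only for idempotent
    methods, otherwise surface the failure to the application."""
    if response_complete:
        return "use-response"
    if method in IDEMPOTENT:
        return "retry"
    return "fail"
```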
+A malformed request or response is one that is an otherwise valid sequence of +frames but is invalid due to:¶
+* the presence of prohibited fields or pseudo-header fields,¶
+* the absence of mandatory pseudo-header fields,¶
+* invalid values for pseudo-header fields,¶
+* pseudo-header fields after fields,¶
+* an invalid sequence of HTTP messages,¶
+* the inclusion of uppercase field names, or¶
+* the inclusion of invalid characters in field names or values.¶
+A request or response that is defined as having content when it contains a +Content-Length header field (Section 6.4.1 of [SEMANTICS]) +is malformed if the value of the Content-Length header field does not equal the +sum of the DATA frame lengths received. A response that is defined as never +having content, even when a Content-Length is present, can have a non-zero +Content-Length header field even though no content is included in DATA frames.¶
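A recipient can verify the Content-Length rule above as it consumes DATA frames. A non-normative sketch (names illustrative) with the carve-out for messages defined as never having content, such as responses to HEAD:

```python
def content_length_ok(declared, data_frame_lengths, has_content=True):
    """Check the Content-Length consistency rule: when the message is
    defined as having content, the declared length must equal the sum
    of DATA frame lengths; a message defined as never having content
    may carry a non-zero Content-Length with no DATA frames at all.
    `declared` is None when no Content-Length field is present."""
    if declared is None:
        return True
    if not has_content:
        return sum(data_frame_lengths) == 0
    return declared == sum(data_frame_lengths)
```

A failing check is treated as malformed, i.e., a stream error of type H3_MESSAGE_ERROR per the intermediary rules below.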
+Intermediaries that process HTTP requests or responses (i.e., any intermediary +not acting as a tunnel) MUST NOT forward a malformed request or response. +Malformed requests or responses that are detected MUST be treated as a stream +error (Section 8) of type H3_MESSAGE_ERROR.¶
+For malformed requests, a server MAY send an HTTP response indicating the error +prior to closing or resetting the stream. Clients MUST NOT accept a malformed +response. Note that these requirements are intended to protect against several +types of common attacks against HTTP; they are deliberately strict because being +permissive can expose implementations to these vulnerabilities.¶
+The CONNECT method requests that the recipient establish a tunnel to the +destination origin server identified by the request-target; see Section 9.3.6 of +[SEMANTICS]. It is primarily used with HTTP proxies to establish a TLS +session with an origin server for the purposes of interacting with "https" +resources.¶
+In HTTP/1.x, CONNECT is used to convert an entire HTTP connection into a tunnel +to a remote host. In HTTP/2 and HTTP/3, the CONNECT method is used to establish +a tunnel over a single stream.¶
+A CONNECT request MUST be constructed as follows:¶
+* The ":method" pseudo-header field is set to "CONNECT".¶
+* The ":scheme" and ":path" pseudo-header fields are omitted.¶
+* The ":authority" pseudo-header field contains the host and port to connect to +(equivalent to the authority-form of the request-target of CONNECT requests; +see Section 7.1 of [SEMANTICS]).¶
+The request stream remains open at the end of the request to carry the data to +be transferred. A CONNECT request that does not conform to these restrictions +is malformed; see Section 4.1.3.¶
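The construction rules for a CONNECT request can be sketched as a check over its pseudo-header fields: the ":method" pseudo-header field is "CONNECT", the ":scheme" and ":path" pseudo-header fields are omitted, and the ":authority" pseudo-header field carries the host and port of the tunnel target. A minimal illustration, not a conforming implementation; the target `example.com:443` is hypothetical:

```python
# Header field section of a CONNECT request, as a dict of pseudo-headers.
connect_request = {
    ":method": "CONNECT",
    ":authority": "example.com:443",  # hypothetical tunnel target
}

def is_valid_connect(fields: dict) -> bool:
    """Check the CONNECT construction rules; a request that fails
    these checks is malformed (Section 4.1.3)."""
    return (
        fields.get(":method") == "CONNECT"
        and ":scheme" not in fields
        and ":path" not in fields
        and ":authority" in fields
    )
```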
+A proxy that supports CONNECT establishes a TCP connection ([RFC0793]) to the +server identified in the ":authority" pseudo-header field. Once this connection +is successfully established, the proxy sends a HEADERS frame containing a 2xx +series status code to the client, as defined in Section 15.3 of [SEMANTICS].¶
+All DATA frames on the stream correspond to data sent or received on the TCP +connection. The payload of any DATA frame sent by the client is transmitted by +the proxy to the TCP server; data received from the TCP server is packaged into +DATA frames by the proxy. Note that the size and number of TCP segments is not +guaranteed to map predictably to the size and number of HTTP DATA or QUIC STREAM +frames.¶
+Once the CONNECT method has completed, only DATA frames are permitted to be sent +on the stream. Extension frames MAY be used if specifically permitted by the +definition of the extension. Receipt of any other known frame type MUST be +treated as a connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
+The TCP connection can be closed by either peer. When the client ends the +request stream (that is, the receive stream at the proxy enters the "Data Recvd" +state), the proxy will set the FIN bit on its connection to the TCP server. When +the proxy receives a packet with the FIN bit set, it will close the send stream +that it sends to the client. TCP connections that remain half-closed in a +single direction are not invalid, but are often handled poorly by servers, so +clients SHOULD NOT close a stream for sending while they still expect to receive +data from the target of the CONNECT.¶
+A TCP connection error is signaled by abruptly terminating the stream. A proxy +treats any error in the TCP connection, which includes receiving a TCP segment +with the RST bit set, as a stream error of type H3_CONNECT_ERROR; see +Section 8. Correspondingly, if a proxy detects an error with the stream or the +QUIC connection, it MUST close the TCP connection. If the underlying TCP +implementation permits it, the proxy SHOULD send a TCP segment with the RST bit +set.¶
+HTTP/3 does not support the HTTP Upgrade mechanism (Section 7.8 of +[SEMANTICS]) or 101 (Switching Protocols) informational status code (Section +15.2.2 of [SEMANTICS]).¶
+Server push is an interaction mode that permits a server to push a +request-response exchange to a client in anticipation of the client making the +indicated request. This trades off network usage against a potential latency +gain. HTTP/3 server push is similar to what is described in Section 8.2 of +[HTTP2], but uses different mechanisms.¶
+Each server push is assigned a unique Push ID by the server. The Push ID is +used to refer to the push in various contexts throughout the lifetime of the +HTTP/3 connection.¶
+The Push ID space begins at zero, and ends at a maximum value set by the +MAX_PUSH_ID frame; see Section 7.2.7. In particular, a server is not +able to push until after the client sends a MAX_PUSH_ID frame. A client sends +MAX_PUSH_ID frames to control the number of pushes that a server can promise. A +server SHOULD use Push IDs sequentially, beginning from zero. A client MUST +treat receipt of a push stream as a connection error of type H3_ID_ERROR +(Section 8) when no MAX_PUSH_ID frame has been sent or when the stream +references a Push ID that is greater than the maximum Push ID.¶
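A client-side enforcement of these Push ID rules might look like the following sketch (names are illustrative, not from any real API):

```python
class PushIDState:
    """Client-side tracking of the Push ID space; a minimal sketch."""

    def __init__(self):
        self.max_push_id = None  # no MAX_PUSH_ID frame sent yet

    def send_max_push_id(self, value):
        """Advertise (or raise) the maximum Push ID the server may use."""
        self.max_push_id = value

    def on_push_stream(self, push_id):
        """A push stream received before any MAX_PUSH_ID frame has been
        sent, or referencing a Push ID greater than the advertised
        maximum, is a connection error of type H3_ID_ERROR."""
        if self.max_push_id is None or push_id > self.max_push_id:
            raise ConnectionError("H3_ID_ERROR")
```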
+The Push ID is used in one or more PUSH_PROMISE frames (Section 7.2.5) +that carry the header section of the request message. These frames are sent on +the request stream that generated the push. This allows the server push to be +associated with a client request. When the same Push ID is promised on multiple +request streams, the decompressed request field sections MUST contain the same +fields in the same order, and both the name and the value in each field MUST be +identical.¶
+The Push ID is then included with the push stream that ultimately fulfills +those promises; see Section 6.2.2. The push stream identifies the Push ID of +the promise that it fulfills, then contains a response to the promised request +as described in Section 4.1.¶
+Finally, the Push ID can be used in CANCEL_PUSH frames; see +Section 7.2.3. Clients use this frame to indicate they do not wish to +receive a promised resource. Servers use this frame to indicate they will not +be fulfilling a previous promise.¶
+Not all requests can be pushed. A server MAY push requests that have the +following properties:¶
+The server MUST include a value in the ":authority" pseudo-header field for +which the server is authoritative; see Section 3.3.¶
+Clients SHOULD send a CANCEL_PUSH frame upon receipt of a PUSH_PROMISE frame +carrying a request that is not cacheable, is not known to be safe, that +indicates the presence of a request body, or for which it does not consider the +server authoritative. Any corresponding responses MUST NOT be used or cached.¶
+Each pushed response is associated with one or more client requests. The push +is associated with the request stream on which the PUSH_PROMISE frame was +received. The same server push can be associated with additional client +requests using a PUSH_PROMISE frame with the same Push ID on multiple request +streams. These associations do not affect the operation of the protocol, but +MAY be considered by user agents when deciding how to use pushed resources.¶
+Ordering of a PUSH_PROMISE frame in relation to certain parts of the response is +important. The server SHOULD send PUSH_PROMISE frames prior to sending HEADERS +or DATA frames that reference the promised responses. This reduces the chance +that a client requests a resource that will be pushed by the server.¶
+Due to reordering, push stream data can arrive before the corresponding PUSH_PROMISE frame. When a client receives a new push stream with an as-yet-unknown Push ID, both the associated client request and the pushed request header fields are unknown. The client can buffer the stream data in expectation of the matching PUSH_PROMISE. The client can use stream flow control (see Section 4.1 of [QUIC-TRANSPORT]) to limit the amount of data a server may commit to the pushed stream.¶
+Push stream data can also arrive after a client has canceled a push. In this +case, the client can abort reading the stream with an error code of +H3_REQUEST_CANCELLED. This asks the server not to transfer additional data and +indicates that it will be discarded upon receipt.¶
+Pushed responses that are cacheable (see Section 3 of +[CACHING]) can be stored by the client, if it +implements an HTTP cache. Pushed responses are considered successfully +validated on the origin server (e.g., if the "no-cache" cache response directive +is present; see Section 5.2.2.3 of [CACHING]) at the time the pushed response +is received.¶
+Pushed responses that are not cacheable MUST NOT be stored by any HTTP cache. +They MAY be made available to the application separately.¶
+Once established, an HTTP/3 connection can be used for many requests and +responses over time until the connection is closed. Connection closure can +happen in any of several different ways.¶
+Each QUIC endpoint declares an idle timeout during the handshake. If the QUIC +connection remains idle (no packets received) for longer than this duration, the +peer will assume that the connection has been closed. HTTP/3 implementations +will need to open a new HTTP/3 connection for new requests if the existing +connection has been idle for longer than the idle timeout negotiated during the +QUIC handshake, and SHOULD do so if approaching the idle timeout; see Section +10.1 of [QUIC-TRANSPORT].¶
+HTTP clients are expected to request that the transport keep connections open +while there are responses outstanding for requests or server pushes, as +described in Section 10.1.2 of [QUIC-TRANSPORT]. If the client is not +expecting a response from the server, allowing an idle connection to time out is +preferred over expending effort maintaining a connection that might not be +needed. A gateway MAY maintain connections in anticipation of need rather than +incur the latency cost of connection establishment to servers. Servers SHOULD +NOT actively keep connections open.¶
+Even when a connection is not idle, either endpoint can decide to stop using the +connection and initiate a graceful connection close. Endpoints initiate the +graceful shutdown of an HTTP/3 connection by sending a GOAWAY frame +(Section 7.2.6). The GOAWAY frame contains an identifier that indicates to +the receiver the range of requests or pushes that were or might be processed in +this connection. The server sends a client-initiated bidirectional Stream ID; +the client sends a Push ID (Section 4.4). Requests or pushes with the +indicated identifier or greater are rejected (Section 4.1.2) by the +sender of the GOAWAY. This identifier MAY be zero if no requests or pushes were +processed.¶
+The information in the GOAWAY frame enables a client and server to agree on +which requests or pushes were accepted prior to the shutdown of the HTTP/3 +connection. Upon sending a GOAWAY frame, the endpoint SHOULD explicitly cancel +(see Section 4.1.2 and Section 7.2.3) any requests or pushes +that have identifiers greater than or equal to that indicated, in order to clean +up transport state for the affected streams. The endpoint SHOULD continue to do +so as more requests or pushes arrive.¶
+Endpoints MUST NOT initiate new requests or promise new pushes on the connection +after receipt of a GOAWAY frame from the peer. Clients MAY establish a new +connection to send additional requests.¶
+Some requests or pushes might already be in transit:¶
+Upon receipt of a GOAWAY frame, if the client has already sent requests with +a Stream ID greater than or equal to the identifier contained in the GOAWAY +frame, those requests will not be processed. Clients can safely retry +unprocessed requests on a different HTTP connection. A client that is +unable to retry requests loses all requests that are in flight when the +server closes the connection.¶
+Requests on Stream IDs less than the Stream ID in a GOAWAY frame from the server might have been processed; their status cannot be known until a response is received, the stream is reset individually, another GOAWAY is received, or the connection terminates.¶
+Servers MAY reject individual requests on streams below the indicated ID if these requests were not processed.¶
+Servers SHOULD send a GOAWAY frame when the closing of a connection is known +in advance, even if the advance notice is small, so that the remote peer can +know whether a request has been partially processed or not. For example, if an +HTTP client sends a POST at the same time that a server closes a QUIC +connection, the client cannot know if the server started to process that POST +request if the server does not send a GOAWAY frame to indicate what streams it +might have acted on.¶
+An endpoint MAY send multiple GOAWAY frames indicating different identifiers, +but the identifier in each frame MUST NOT be greater than the identifier in any +previous frame, since clients might already have retried unprocessed requests on +another HTTP connection. Receiving a GOAWAY containing a larger identifier than +previously received MUST be treated as a connection error of type H3_ID_ERROR; +see Section 8.¶
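The monotonicity rule can be sketched as a small validation step performed on receipt of each GOAWAY frame (illustrative only):

```python
def on_goaway(previous_id, new_id):
    """Each GOAWAY identifier must not exceed the previous one; the
    identifier is a Stream ID in GOAWAY frames from servers and a Push ID
    in GOAWAY frames from clients. A larger identifier than previously
    received is a connection error of type H3_ID_ERROR."""
    if previous_id is not None and new_id > previous_id:
        raise ConnectionError("H3_ID_ERROR")
    return new_id
```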
+An endpoint that is attempting to gracefully shut down a connection can send a +GOAWAY frame with a value set to the maximum possible value (2^62-4 for servers, +2^62-1 for clients). This ensures that the peer stops creating new requests or +pushes. After allowing time for any in-flight requests or pushes to arrive, the +endpoint can send another GOAWAY frame indicating which requests or pushes it +might accept before the end of the connection. This ensures that a connection +can be cleanly shut down without losing requests.¶
+A client has more flexibility in the value it chooses for the Push ID in a +GOAWAY that it sends. A value of 2^62 - 1 indicates that the server can +continue fulfilling pushes that have already been promised. A smaller value +indicates the client will reject pushes with Push IDs greater than or equal to +this value. Like the server, the client MAY send subsequent GOAWAY frames so +long as the specified Push ID is no greater than any previously sent value.¶
+Even when a GOAWAY indicates that a given request or push will not be processed +or accepted upon receipt, the underlying transport resources still exist. The +endpoint that initiated these requests can cancel them to clean up transport +state.¶
+Once all accepted requests and pushes have been processed, the endpoint can +permit the connection to become idle, or MAY initiate an immediate closure of +the connection. An endpoint that completes a graceful shutdown SHOULD use the +H3_NO_ERROR error code when closing the connection.¶
+If a client has consumed all available bidirectional stream IDs with requests, +the server need not send a GOAWAY frame, since the client is unable to make +further requests.¶
+An HTTP/3 implementation can immediately close the QUIC connection at any time. +This results in sending a QUIC CONNECTION_CLOSE frame to the peer indicating +that the application layer has terminated the connection. The application error +code in this frame indicates to the peer why the connection is being closed. +See Section 8 for error codes that can be used when closing a connection in +HTTP/3.¶
+Before closing the connection, a GOAWAY frame MAY be sent to allow the client to +retry some requests. Including the GOAWAY frame in the same packet as the QUIC +CONNECTION_CLOSE frame improves the chances of the frame being received by +clients.¶
+For various reasons, the QUIC transport could indicate to the application layer +that the connection has terminated. This might be due to an explicit closure +by the peer, a transport-level error, or a change in network topology that +interrupts connectivity.¶
+If a connection terminates without a GOAWAY frame, clients MUST assume that any +request that was sent, whether in whole or in part, might have been processed.¶
+A QUIC stream provides reliable in-order delivery of bytes, but makes no +guarantees about order of delivery with regard to bytes on other streams. On the +wire, data is framed into QUIC STREAM frames, but this framing is invisible to +the HTTP framing layer. The transport layer buffers and orders received QUIC +STREAM frames, exposing the data contained within as a reliable byte stream to +the application. Although QUIC permits out-of-order delivery within a stream, +HTTP/3 does not make use of this feature.¶
+QUIC streams can be either unidirectional, carrying data only from initiator to +receiver, or bidirectional. Streams can be initiated by either the client or +the server. For more detail on QUIC streams, see Section 2 of +[QUIC-TRANSPORT].¶
+When HTTP fields and data are sent over QUIC, the QUIC layer handles most of +the stream management. HTTP does not need to do any separate multiplexing when +using QUIC - data sent over a QUIC stream always maps to a particular HTTP +transaction or to the entire HTTP/3 connection context.¶
+All client-initiated bidirectional streams are used for HTTP requests and +responses. A bidirectional stream ensures that the response can be readily +correlated with the request. These streams are referred to as request streams.¶
+This means that the client's first request occurs on QUIC stream 0, with subsequent requests on streams 4, 8, and so on. In order to permit these streams to open, an HTTP/3 server SHOULD configure non-zero minimum values for the number of permitted streams and the initial stream flow control window. So as to not unnecessarily limit parallelism, at least 100 requests SHOULD be permitted at a time.¶
+HTTP/3 does not use server-initiated bidirectional streams, though an extension +could define a use for these streams. Clients MUST treat receipt of a +server-initiated bidirectional stream as a connection error of type +H3_STREAM_CREATION_ERROR (Section 8) unless such an extension has been +negotiated.¶
+Unidirectional streams, in either direction, are used for a range of purposes. +The purpose is indicated by a stream type, which is sent as a variable-length +integer at the start of the stream. The format and structure of data that +follows this integer is determined by the stream type.¶
+Two stream types are defined in this document: control streams +(Section 6.2.1) and push streams (Section 6.2.2). [QPACK] defines two +additional stream types. Other stream types can be defined by extensions to +HTTP/3; see Section 9 for more details. Some stream types are reserved +(Section 6.2.3).¶
+The performance of HTTP/3 connections in the early phase of their lifetime is +sensitive to the creation and exchange of data on unidirectional streams. +Endpoints that excessively restrict the number of streams or the flow control +window of these streams will increase the chance that the remote peer reaches +the limit early and becomes blocked. In particular, implementations should +consider that remote peers may wish to exercise reserved stream behavior +(Section 6.2.3) with some of the unidirectional streams they are permitted +to use. To avoid blocking, the transport parameters sent by both clients and +servers MUST allow the peer to create at least one unidirectional stream for the +HTTP control stream plus the number of unidirectional streams required by +mandatory extensions (three being the minimum number required for the base +HTTP/3 protocol and QPACK), and SHOULD provide at least 1,024 bytes of flow +control credit to each stream.¶
+Note that an endpoint is not required to grant additional credits to create more +unidirectional streams if its peer consumes all the initial credits before +creating the critical unidirectional streams. Endpoints SHOULD create the HTTP +control stream as well as the unidirectional streams required by mandatory +extensions (such as the QPACK encoder and decoder streams) first, and then +create additional streams as allowed by their peer.¶
+If the stream header indicates a stream type that is not supported by the +recipient, the remainder of the stream cannot be consumed as the semantics are +unknown. Recipients of unknown stream types MAY abort reading of the stream with +an error code of H3_STREAM_CREATION_ERROR or a reserved error code +(Section 8.1), but MUST NOT consider such streams to be a connection +error of any kind.¶
+Implementations MAY send stream types before knowing whether the peer supports +them. However, stream types that could modify the state or semantics of +existing protocol components, including QPACK or other extensions, MUST NOT be +sent until the peer is known to support them.¶
+A sender can close or reset a unidirectional stream unless otherwise specified. +A receiver MUST tolerate unidirectional streams being closed or reset prior to +the reception of the unidirectional stream header.¶
+A control stream is indicated by a stream type of 0x00. Data on this stream +consists of HTTP/3 frames, as defined in Section 7.2.¶
+Each side MUST initiate a single control stream at the beginning of the +connection and send its SETTINGS frame as the first frame on this stream. If +the first frame of the control stream is any other frame type, this MUST be +treated as a connection error of type H3_MISSING_SETTINGS. Only one control +stream per peer is permitted; receipt of a second stream claiming to be a +control stream MUST be treated as a connection error of type +H3_STREAM_CREATION_ERROR. The sender MUST NOT close the control stream, and the +receiver MUST NOT request that the sender close the control stream. If either +control stream is closed at any point, this MUST be treated as a connection +error of type H3_CLOSED_CRITICAL_STREAM. Connection errors are described in +Section 8.¶
+A pair of unidirectional streams is used rather than a single bidirectional +stream. This allows either peer to send data as soon as it is able. Depending +on whether 0-RTT is enabled on the QUIC connection, either client or server +might be able to send stream data first after the cryptographic handshake +completes.¶
+Server push is an optional feature introduced in HTTP/2 that allows a server to +initiate a response before a request has been made. See Section 4.4 for +more details.¶
+A push stream is indicated by a stream type of 0x01, followed by the Push ID +of the promise that it fulfills, encoded as a variable-length integer. The +remaining data on this stream consists of HTTP/3 frames, as defined in +Section 7.2, and fulfills a promised server push by zero or more interim HTTP +responses followed by a single final HTTP response, as defined in +Section 4.1. Server push and Push IDs are described in +Section 4.4.¶
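The push stream header can be serialized as two QUIC variable-length integers; a minimal sketch (the varint encoder follows Section 16 of [QUIC-TRANSPORT]):

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer: the two high bits of the first
    byte encode the total length (1, 2, 4, or 8 bytes)."""
    if v < 0x40:
        return v.to_bytes(1, "big")
    if v < 0x4000:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 0x40000000:
        return (v | 0x80000000).to_bytes(4, "big")
    if v < 0x4000000000000000:
        return (v | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value exceeds 2^62 - 1")

def push_stream_header(push_id: int) -> bytes:
    """Stream type 0x01 followed by the Push ID of the promise being
    fulfilled; HTTP/3 frames carrying the response follow this header."""
    return encode_varint(0x01) + encode_varint(push_id)
```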
+Only servers can push; if a server receives a client-initiated push stream, this +MUST be treated as a connection error of type H3_STREAM_CREATION_ERROR; see +Section 8.¶
+Each Push ID MUST only be used once in a push stream header. If a push stream +header includes a Push ID that was used in another push stream header, the +client MUST treat this as a connection error of type H3_ID_ERROR; see +Section 8.¶
+Stream types of the format 0x1f * N + 0x21
for non-negative integer values of
+N are reserved to exercise the requirement that unknown types be ignored. These
+streams have no semantics, and can be sent when application-layer padding is
+desired. They MAY also be sent on connections where no data is currently being
+transferred. Endpoints MUST NOT consider these streams to have any meaning upon
+receipt.¶
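The reserved values can be generated and recognized with simple arithmetic; a sketch:

```python
def reserved_stream_type(n: int) -> int:
    """Stream types of the format 0x1f * N + 0x21, for non-negative
    integer N, are reserved and carry no semantics."""
    return 0x1F * n + 0x21

def is_reserved_stream_type(t: int) -> bool:
    """True if t is of the format 0x1f * N + 0x21 for some N >= 0."""
    return t >= 0x21 and (t - 0x21) % 0x1F == 0

# The first few reserved values are 0x21, 0x40, 0x5f, ...
```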
The payload and length of the stream are selected in any manner the sending +implementation chooses. When sending a reserved stream type, the implementation +MAY either terminate the stream cleanly or reset it. When resetting the stream, +either the H3_NO_ERROR error code or a reserved error code +(Section 8.1) SHOULD be used.¶
+HTTP frames are carried on QUIC streams, as described in Section 6. +HTTP/3 defines three stream types: control stream, request stream, and push +stream. This section describes HTTP/3 frame formats and their permitted stream +types; see Table 1 for an overview. A comparison between +HTTP/2 and HTTP/3 frames is provided in Appendix A.2.¶
+Frame | +Control Stream | +Request Stream | +Push Stream | +Section | +
---|---|---|---|---|
DATA | +No | +Yes | +Yes | ++ Section 7.2.1 + | +
HEADERS | +No | +Yes | +Yes | ++ Section 7.2.2 + | +
CANCEL_PUSH | +Yes | +No | +No | ++ Section 7.2.3 + | +
SETTINGS | +Yes (1) | +No | +No | ++ Section 7.2.4 + | +
PUSH_PROMISE | +No | +Yes | +No | ++ Section 7.2.5 + | +
GOAWAY | +Yes | +No | +No | ++ Section 7.2.6 + | +
MAX_PUSH_ID | +Yes | +No | +No | ++ Section 7.2.7 + | +
Reserved | +Yes | +Yes | +Yes | ++ Section 7.2.8 + | +
Certain frames can only occur as the first frame of a particular stream type; +these are indicated in Table 1 with a (1). Specific guidance +is provided in the relevant section.¶
+Note that, unlike QUIC frames, HTTP/3 frames can span multiple packets.¶
+All frames have the following format:¶
+A frame includes the following fields:¶
+Type: A variable-length integer that identifies the frame type.¶
+Length: A variable-length integer that describes the length in bytes of the Frame Payload.¶
+Frame Payload: A payload, the semantics of which are determined by the Type field.¶
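This generic frame layout can be serialized and parsed with QUIC variable-length integers (Section 16 of [QUIC-TRANSPORT]); the following is a minimal sketch, not a complete codec:

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer: the two high bits of the first
    byte encode the total length (1, 2, 4, or 8 bytes)."""
    if v < 0x40:
        return v.to_bytes(1, "big")
    if v < 0x4000:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 0x40000000:
        return (v | 0x80000000).to_bytes(4, "big")
    if v < 0x4000000000000000:
        return (v | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value exceeds 2^62 - 1")

def decode_varint(buf: bytes, off: int = 0):
    """Return (value, next offset)."""
    length = 1 << (buf[off] >> 6)
    value = buf[off] & 0x3F
    for b in buf[off + 1 : off + length]:
        value = (value << 8) | b
    return value, off + length

def encode_frame(frame_type: int, payload: bytes) -> bytes:
    """Type and Length are varints; Length counts only the Frame
    Payload bytes that follow."""
    return encode_varint(frame_type) + encode_varint(len(payload)) + payload

# A DATA frame (type 0x0) carrying five bytes of content:
frame = encode_frame(0x0, b"hello")
```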
+Each frame's payload MUST contain exactly the fields identified in its +description. A frame payload that contains additional bytes after the +identified fields or a frame payload that terminates before the end of the +identified fields MUST be treated as a connection error of type +H3_FRAME_ERROR; see Section 8.¶
+When a stream terminates cleanly, if the last frame on the stream was truncated, +this MUST be treated as a connection error of type H3_FRAME_ERROR; see +Section 8. Streams that terminate abruptly may be reset at any point in a +frame.¶
+DATA frames (type=0x0) convey arbitrary, variable-length sequences of bytes +associated with HTTP request or response content.¶
+DATA frames MUST be associated with an HTTP request or response. If a DATA +frame is received on a control stream, the recipient MUST respond with a +connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
+The HEADERS frame (type=0x1) is used to carry an HTTP field section, encoded +using QPACK. See [QPACK] for more details.¶
+HEADERS frames can only be sent on request or push streams. If a HEADERS frame +is received on a control stream, the recipient MUST respond with a connection +error (Section 8) of type H3_FRAME_UNEXPECTED.¶
+The CANCEL_PUSH frame (type=0x3) is used to request cancellation of a server +push prior to the push stream being received. The CANCEL_PUSH frame identifies +a server push by Push ID (see Section 4.4), encoded as a variable-length +integer.¶
+When a client sends CANCEL_PUSH, it is indicating that it does not wish to +receive the promised resource. The server SHOULD abort sending the resource, +but the mechanism to do so depends on the state of the corresponding push +stream. If the server has not yet created a push stream, it does not create +one. If the push stream is open, the server SHOULD abruptly terminate that +stream. If the push stream has already ended, the server MAY still abruptly +terminate the stream or MAY take no action.¶
+A server sends CANCEL_PUSH to indicate that it will not be fulfilling a promise that was previously sent. The client cannot expect the corresponding promise to be fulfilled, unless it has already received and processed the promised response. Regardless of whether a push stream has been opened, a server SHOULD send a CANCEL_PUSH frame when it determines that the promise will not be fulfilled. If a stream has already been opened, the server can abort sending on the stream with an error code of H3_REQUEST_CANCELLED.¶
+Sending a CANCEL_PUSH frame has no direct effect on the state of existing push +streams. A client SHOULD NOT send a CANCEL_PUSH frame when it has already +received a corresponding push stream. A push stream could arrive after a client +has sent a CANCEL_PUSH frame, because a server might not have processed the +CANCEL_PUSH. The client SHOULD abort reading the stream with an error code of +H3_REQUEST_CANCELLED.¶
+A CANCEL_PUSH frame is sent on the control stream. Receiving a CANCEL_PUSH +frame on a stream other than the control stream MUST be treated as a connection +error of type H3_FRAME_UNEXPECTED.¶
+The CANCEL_PUSH frame carries a Push ID encoded as a variable-length integer. +The Push ID identifies the server push that is being cancelled; see +Section 4.4. If a CANCEL_PUSH frame is received that references a Push ID +greater than currently allowed on the connection, this MUST be treated as a +connection error of type H3_ID_ERROR.¶
+If the client receives a CANCEL_PUSH frame, that frame might identify a Push ID +that has not yet been mentioned by a PUSH_PROMISE frame due to reordering. If a +server receives a CANCEL_PUSH frame for a Push ID that has not yet been +mentioned by a PUSH_PROMISE frame, this MUST be treated as a connection error of +type H3_ID_ERROR.¶
+The SETTINGS frame (type=0x4) conveys configuration parameters that affect how +endpoints communicate, such as preferences and constraints on peer behavior. +Individually, a SETTINGS parameter can also be referred to as a "setting"; the +identifier and value of each setting parameter can be referred to as a "setting +identifier" and a "setting value".¶
+SETTINGS frames always apply to an entire HTTP/3 connection, never a single +stream. A SETTINGS frame MUST be sent as the first frame of each control stream +(see Section 6.2.1) by each peer, and MUST NOT be sent subsequently. If an +endpoint receives a second SETTINGS frame on the control stream, the endpoint +MUST respond with a connection error of type H3_FRAME_UNEXPECTED.¶
+SETTINGS frames MUST NOT be sent on any stream other than the control stream. +If an endpoint receives a SETTINGS frame on a different stream, the endpoint +MUST respond with a connection error of type H3_FRAME_UNEXPECTED.¶
+SETTINGS parameters are not negotiated; they describe characteristics of the +sending peer that can be used by the receiving peer. However, a negotiation +can be implied by the use of SETTINGS - each peer uses SETTINGS to advertise a +set of supported values. The definition of the setting would describe how each +peer combines the two sets to conclude which choice will be used. SETTINGS does +not provide a mechanism to identify when the choice takes effect.¶
+Different values for the same parameter can be advertised by each peer. For +example, a client might be willing to consume a very large response field +section, while servers are more cautious about request size.¶
+The same setting identifier MUST NOT occur more than once in the SETTINGS frame. +A receiver MAY treat the presence of duplicate setting identifiers as a +connection error of type H3_SETTINGS_ERROR.¶
+The payload of a SETTINGS frame consists of zero or more parameters. Each +parameter consists of a setting identifier and a value, both encoded as QUIC +variable-length integers.¶
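A parser for this payload might be sketched as follows; it treats duplicate identifiers as a connection error, which receivers MAY do per the preceding rule, and leaves unknown identifiers for the caller to ignore:

```python
def decode_varint(buf: bytes, off: int = 0):
    """QUIC variable-length integer; returns (value, next offset)."""
    length = 1 << (buf[off] >> 6)
    value = buf[off] & 0x3F
    for b in buf[off + 1 : off + length]:
        value = (value << 8) | b
    return value, off + length

def parse_settings(payload: bytes) -> dict:
    """Parse identifier/value pairs, both QUIC varints. This sketch
    exercises the MAY by rejecting duplicates as H3_SETTINGS_ERROR."""
    settings, off = {}, 0
    while off < len(payload):
        ident, off = decode_varint(payload, off)
        value, off = decode_varint(payload, off)
        if ident in settings:
            raise ConnectionError("H3_SETTINGS_ERROR")
        settings[ident] = value
    return settings
```

For example, a payload advertising SETTINGS_MAX_FIELD_SECTION_SIZE (0x6) of 1024 is the three bytes `06 44 00`: a 1-byte varint identifier followed by a 2-byte varint value.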
+An implementation MUST ignore the contents for any SETTINGS identifier it does +not understand.¶
+The following settings are defined in HTTP/3:¶
+SETTINGS_MAX_FIELD_SECTION_SIZE (0x6): The default value is unlimited. See Section 4.1.1.3 for usage.¶
+Setting identifiers of the format 0x1f * N + 0x21
for non-negative integer
+values of N are reserved to exercise the requirement that unknown identifiers be
+ignored. Such settings have no defined meaning. Endpoints SHOULD include at
+least one such setting in their SETTINGS frame. Endpoints MUST NOT consider such
+settings to have any meaning upon receipt.¶
Because the setting has no defined meaning, the value of the setting can be any +value the implementation selects.¶
+Setting identifiers that were used in HTTP/2 where there is no corresponding HTTP/3 setting have also been reserved (Section 11.2.2). These settings MUST NOT be sent, and their receipt MUST be treated as a connection error of type H3_SETTINGS_ERROR.¶
+Additional settings can be defined by extensions to HTTP/3; see Section 9 +for more details.¶
+An HTTP implementation MUST NOT send frames or requests that would be invalid +based on its current understanding of the peer's settings.¶
+All settings begin at an initial value. Each endpoint SHOULD use these initial +values to send messages before the peer's SETTINGS frame has arrived, as packets +carrying the settings can be lost or delayed. When the SETTINGS frame arrives, +any settings are changed to their new values.¶
+This removes the need to wait for the SETTINGS frame before sending messages. +Endpoints MUST NOT require any data to be received from the peer prior to +sending the SETTINGS frame; settings MUST be sent as soon as the transport is +ready to send data.¶
+For servers, the initial value of each client setting is the default value.¶
+For clients using a 1-RTT QUIC connection, the initial value of each server +setting is the default value. 1-RTT keys will always become available prior to +the packet containing SETTINGS being processed by QUIC, even if the server sends +SETTINGS immediately. Clients SHOULD NOT wait indefinitely for SETTINGS to +arrive before sending requests, but SHOULD process received datagrams in order +to increase the likelihood of processing SETTINGS before sending the first +request.¶
+When a 0-RTT QUIC connection is being used, the initial value of each server +setting is the value used in the previous session. Clients SHOULD store the +settings the server provided in the HTTP/3 connection where resumption +information was provided, but MAY opt not to store settings in certain cases +(e.g., if the session ticket is received before the SETTINGS frame). A client +MUST comply with stored settings -- or default values, if no values are stored +-- when attempting 0-RTT. Once a server has provided new settings, clients MUST +comply with those values.¶
+A server can remember the settings that it advertised, or store an +integrity-protected copy of the values in the ticket and recover the information +when accepting 0-RTT data. A server uses the HTTP/3 settings values in +determining whether to accept 0-RTT data. If the server cannot determine that +the settings remembered by a client are compatible with its current settings, it +MUST NOT accept 0-RTT data. Remembered settings are compatible if a client +complying with those settings would not violate the server's current settings.¶
+A server MAY accept 0-RTT and subsequently provide different settings in its +SETTINGS frame. If 0-RTT data is accepted by the server, its SETTINGS frame MUST +NOT reduce any limits or alter any values that might be violated by the client +with its 0-RTT data. The server MUST include all settings that differ from +their default values. If a server accepts 0-RTT but then sends settings that +are not compatible with the previously specified settings, this MUST be treated +as a connection error of type H3_SETTINGS_ERROR. If a server accepts 0-RTT but +then sends a SETTINGS frame that omits a setting value that the client +understands (apart from reserved setting identifiers) that was previously +specified to have a non-default value, this MUST be treated as a connection +error of type H3_SETTINGS_ERROR.¶
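As a non-normative sketch of the compatibility rule above: remembered settings are compatible if a client complying with them cannot violate the server's current settings. The function name and the treatment of every setting as a numeric limit are simplifying assumptions; a real implementation must apply each setting's own semantics.

```python
# Illustrative 0-RTT settings-compatibility check (names are assumptions).
# For a limit such as SETTINGS_MAX_FIELD_SECTION_SIZE, the current limit
# must be at least as permissive as the one the client remembered.

UNLIMITED = float("inf")  # the default for MAX_FIELD_SECTION_SIZE is unlimited

def compatible_for_0rtt(remembered: dict, current: dict) -> bool:
    """Return True if 0-RTT data sent under `remembered` settings is safe."""
    for setting, remembered_value in remembered.items():
        current_value = current.get(setting, UNLIMITED)
        # The client may already have sent data up to the remembered limit,
        # so a lower current limit makes the remembered settings incompatible.
        if current_value < remembered_value:
            return False
    return True

assert compatible_for_0rtt({"MAX_FIELD_SECTION_SIZE": 4096},
                           {"MAX_FIELD_SECTION_SIZE": 16384})
assert not compatible_for_0rtt({"MAX_FIELD_SECTION_SIZE": 16384},
                               {"MAX_FIELD_SECTION_SIZE": 4096})
```

A server that lowered a limit since issuing the ticket would reject 0-RTT under this rule.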
+The PUSH_PROMISE frame (type=0x5) is used to carry a promised request header +field section from server to client on a request stream, as in HTTP/2.¶
+The payload consists of:¶
+Push ID: A variable-length integer that identifies the server push operation. A Push ID is used in push stream headers (Section 4.4) and CANCEL_PUSH frames (Section 7.2.3).¶
+Encoded Field Section: QPACK-encoded request header fields for the promised response. See [QPACK] for more details.¶
+A server MUST NOT use a Push ID that is larger than the client has provided in a MAX_PUSH_ID frame (Section 7.2.7). A client MUST treat receipt of a PUSH_PROMISE frame that contains a larger Push ID than the client has advertised as a connection error of type H3_ID_ERROR.¶
+A server MAY use the same Push ID in multiple PUSH_PROMISE frames. If so, the decompressed request field sections MUST contain the same fields in the same order, and both the name and the value in each field MUST be exact matches. Clients SHOULD compare the request field sections for resources promised multiple times. If a client receives a Push ID that has already been promised and detects a mismatch, it MUST respond with a connection error of type H3_GENERAL_PROTOCOL_ERROR. If the decompressed field sections match exactly, the client SHOULD associate the pushed content with each stream on which a PUSH_PROMISE frame was received.¶
+Allowing duplicate references to the same Push ID is primarily to reduce +duplication caused by concurrent requests. A server SHOULD avoid reusing a Push +ID over a long period. Clients are likely to consume server push responses and +not retain them for reuse over time. Clients that see a PUSH_PROMISE frame that +uses a Push ID that they have already consumed and discarded are forced to +ignore the promise.¶
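The comparison required for duplicate promises can be sketched non-normatively; representing a decompressed field section as an ordered list of (name, value) pairs is an assumption of this illustration:

```python
# Illustrative duplicate-PUSH_PROMISE check: the two decompressed field
# sections must contain the same fields in the same order, with exact
# name and value matches. A mismatch is a connection error.

def same_promise(section_a, section_b) -> bool:
    """Field sections are lists of (name, value) tuples in received order."""
    return section_a == section_b  # order-sensitive, exact-match comparison

a = [(":method", "GET"), (":path", "/style.css")]
b = [(":path", "/style.css"), (":method", "GET")]
assert same_promise(a, list(a))
assert not same_promise(a, b)  # same fields, different order: mismatch
```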
+If a PUSH_PROMISE frame is received on the control stream, the client MUST +respond with a connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
+A client MUST NOT send a PUSH_PROMISE frame. A server MUST treat the receipt of +a PUSH_PROMISE frame as a connection error of type H3_FRAME_UNEXPECTED; see +Section 8.¶
+See Section 4.4 for a description of the overall server push mechanism.¶
+The GOAWAY frame (type=0x7) is used to initiate graceful shutdown of an HTTP/3 +connection by either endpoint. GOAWAY allows an endpoint to stop accepting new +requests or pushes while still finishing processing of previously received +requests and pushes. This enables administrative actions, like server +maintenance. GOAWAY by itself does not close a connection.¶
+The GOAWAY frame is always sent on the control stream. In the server to client +direction, it carries a QUIC Stream ID for a client-initiated bidirectional +stream encoded as a variable-length integer. A client MUST treat receipt of a +GOAWAY frame containing a Stream ID of any other type as a connection error of +type H3_ID_ERROR.¶
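A non-normative sketch of the identifier check above: in QUIC, the two least significant bits of a stream ID encode its type, and 0x0 denotes a client-initiated bidirectional stream (Section 2.1 of [QUIC-TRANSPORT]). The function name is illustrative.

```python
# Validate the Stream ID carried in a server-sent GOAWAY frame. Any ID
# whose low two bits are not 0x0 is not a client-initiated bidirectional
# stream and is a connection error of type H3_ID_ERROR.

H3_ID_ERROR = 0x0108

def validate_goaway_from_server(stream_id: int) -> None:
    if stream_id & 0x3 != 0x0:
        raise ConnectionError(f"H3_ID_ERROR (0x{H3_ID_ERROR:04x})")

validate_goaway_from_server(4)      # client-initiated bidirectional: accepted
try:
    validate_goaway_from_server(3)  # server-initiated unidirectional: rejected
    rejected = False
except ConnectionError:
    rejected = True
assert rejected
```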
+In the client to server direction, the GOAWAY frame carries a Push ID encoded as +a variable-length integer.¶
+The GOAWAY frame applies to the entire connection, not a specific stream. A +client MUST treat a GOAWAY frame on a stream other than the control stream as a +connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
+See Section 5.2 for more information on the use of the GOAWAY frame.¶
+The MAX_PUSH_ID frame (type=0xd) is used by clients to control the number of +server pushes that the server can initiate. This sets the maximum value for a +Push ID that the server can use in PUSH_PROMISE and CANCEL_PUSH frames. +Consequently, this also limits the number of push streams that the server can +initiate in addition to the limit maintained by the QUIC transport.¶
+The MAX_PUSH_ID frame is always sent on the control stream. Receipt of a +MAX_PUSH_ID frame on any other stream MUST be treated as a connection error of +type H3_FRAME_UNEXPECTED.¶
+A server MUST NOT send a MAX_PUSH_ID frame. A client MUST treat the receipt of +a MAX_PUSH_ID frame as a connection error of type H3_FRAME_UNEXPECTED.¶
+The maximum Push ID is unset when an HTTP/3 connection is created, meaning that +a server cannot push until it receives a MAX_PUSH_ID frame. A client that +wishes to manage the number of promised server pushes can increase the maximum +Push ID by sending MAX_PUSH_ID frames as the server fulfills or cancels server +pushes.¶
+The MAX_PUSH_ID frame carries a single variable-length integer that identifies +the maximum value for a Push ID that the server can use; see Section 4.4. A +MAX_PUSH_ID frame cannot reduce the maximum Push ID; receipt of a MAX_PUSH_ID +frame that contains a smaller value than previously received MUST be treated as +a connection error of type H3_ID_ERROR.¶
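The bookkeeping described above can be sketched as follows; the class and method names are illustrative, not part of this specification:

```python
# Hypothetical server-side state for MAX_PUSH_ID: the maximum Push ID
# starts unset (the server cannot push), can only grow, and bounds the
# Push IDs usable in PUSH_PROMISE and CANCEL_PUSH frames.

class PushIdLimit:
    def __init__(self):
        self.max_push_id = None  # unset until a MAX_PUSH_ID frame arrives

    def on_max_push_id(self, value: int) -> None:
        if self.max_push_id is not None and value < self.max_push_id:
            # A MAX_PUSH_ID frame cannot reduce the limit.
            raise ConnectionError("H3_ID_ERROR: MAX_PUSH_ID reduced")
        self.max_push_id = value

    def may_promise(self, push_id: int) -> bool:
        return self.max_push_id is not None and push_id <= self.max_push_id

limit = PushIdLimit()
assert not limit.may_promise(0)  # no MAX_PUSH_ID received yet
limit.on_max_push_id(7)
assert limit.may_promise(7)
assert not limit.may_promise(8)
```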
+Frame types of the format 0x1f * N + 0x21 for non-negative integer values of N are reserved to exercise the requirement that unknown types be ignored (Section 9). These frames have no semantics, and MAY be sent on any stream where frames are allowed to be sent. This enables their use for application-layer padding. Endpoints MUST NOT consider these frames to have any meaning upon receipt.¶
The payload and length of the frames are selected in any manner the implementation chooses.¶
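The reserved pattern can be illustrated with a short non-normative sketch for generating a reserved ("greased") frame type and recognizing one on receipt:

```python
# Reserved frame types follow 0x1f * N + 0x21 for non-negative N.
# A receiver can use a membership test like this to skip such frames.

def grease_frame_type(n: int) -> int:
    return 0x1f * n + 0x21

def is_grease_type(t: int) -> bool:
    return t >= 0x21 and (t - 0x21) % 0x1f == 0

assert grease_frame_type(0) == 0x21
assert grease_frame_type(1) == 0x40
assert is_grease_type(0x3ffffffffffffffe)  # largest reserved value
assert not is_grease_type(0x4)             # SETTINGS is a real frame type
```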
+Frame types that were used in HTTP/2 where there is no corresponding HTTP/3 +frame have also been reserved (Section 11.2.1). These frame types MUST NOT be +sent, and their receipt MUST be treated as a connection error of type +H3_FRAME_UNEXPECTED.¶
+When a stream cannot be completed successfully, QUIC allows the application to +abruptly terminate (reset) that stream and communicate a reason; see Section 2.4 +of [QUIC-TRANSPORT]. This is referred to as a "stream error." An HTTP/3 +implementation can decide to close a QUIC stream and communicate the type of +error. Wire encodings of error codes are defined in Section 8.1. +Stream errors are distinct from HTTP status codes which indicate error +conditions. Stream errors indicate that the sender did not transfer or consume +the full request or response, while HTTP status codes indicate the result of a +request that was successfully received.¶
+If an entire connection needs to be terminated, QUIC similarly provides +mechanisms to communicate a reason; see Section 5.3 of [QUIC-TRANSPORT]. This +is referred to as a "connection error." Similar to stream errors, an HTTP/3 +implementation can terminate a QUIC connection and communicate the reason using +an error code from Section 8.1.¶
+Although the reasons for closing streams and connections are called "errors," +these actions do not necessarily indicate a problem with the connection or +either implementation. For example, a stream can be reset if the requested +resource is no longer needed.¶
+An endpoint MAY choose to treat a stream error as a connection error under +certain circumstances, closing the entire connection in response to a condition +on a single stream. Implementations need to consider the impact on outstanding +requests before making this choice.¶
+Because new error codes can be defined without negotiation (see Section 9), +use of an error code in an unexpected context or receipt of an unknown error +code MUST be treated as equivalent to H3_NO_ERROR. However, closing a stream +can have other effects regardless of the error code; for example, see +Section 4.1.¶
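A non-normative sketch of this requirement, using the set of codes defined in Section 8.1 (0x0100 through 0x0110) as the known set:

```python
# Any unrecognized error code, or one used in an unexpected context,
# is treated as equivalent to H3_NO_ERROR.

H3_NO_ERROR = 0x0100
KNOWN_ERROR_CODES = set(range(0x0100, 0x0111))  # Section 8.1 codes

def effective_error_code(code: int) -> int:
    return code if code in KNOWN_ERROR_CODES else H3_NO_ERROR

assert effective_error_code(0x0105) == 0x0105     # H3_FRAME_UNEXPECTED
assert effective_error_code(0x21) == H3_NO_ERROR  # reserved/grease code
assert effective_error_code(0xdead) == H3_NO_ERROR
```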
+The following error codes are defined for use when abruptly terminating streams, +aborting reading of streams, or immediately closing HTTP/3 connections.¶
+H3_NO_ERROR (0x0100): No error. This is used when the connection or stream needs to be closed, but there is no error to signal.¶
+H3_GENERAL_PROTOCOL_ERROR (0x0101): Peer violated protocol requirements in a way that does not match a more specific error code, or endpoint declines to use the more specific error code.¶
+H3_INTERNAL_ERROR (0x0102): An internal error has occurred in the HTTP stack.¶
+H3_STREAM_CREATION_ERROR (0x0103): The endpoint detected that its peer created a stream that it will not accept.¶
+H3_CLOSED_CRITICAL_STREAM (0x0104): A stream required by the HTTP/3 connection was closed or reset.¶
+H3_FRAME_UNEXPECTED (0x0105): A frame was received that was not permitted in the current state or on the current stream.¶
+H3_FRAME_ERROR (0x0106): A frame that fails to satisfy layout requirements or with an invalid size was received.¶
+H3_EXCESSIVE_LOAD (0x0107): The endpoint detected that its peer is exhibiting a behavior that might be generating excessive load.¶
+H3_ID_ERROR (0x0108): A Stream ID or Push ID was used incorrectly, such as exceeding a limit, reducing a limit, or being reused.¶
+H3_SETTINGS_ERROR (0x0109): An endpoint detected an error in the payload of a SETTINGS frame.¶
+H3_MISSING_SETTINGS (0x010a): No SETTINGS frame was received at the beginning of the control stream.¶
+H3_REQUEST_REJECTED (0x010b): A server rejected a request without performing any application processing.¶
+H3_REQUEST_CANCELLED (0x010c): The request or its response (including pushed response) is cancelled.¶
+H3_REQUEST_INCOMPLETE (0x010d): The client's stream terminated without containing a fully-formed request.¶
+H3_MESSAGE_ERROR (0x010e): An HTTP message was malformed and cannot be processed.¶
+H3_CONNECT_ERROR (0x010f): The TCP connection established in response to a CONNECT request was reset or abnormally closed.¶
+H3_VERSION_FALLBACK (0x0110): The requested operation cannot be served over HTTP/3. The peer should retry over HTTP/1.1.¶
+Error codes of the format 0x1f * N + 0x21 for non-negative integer values of N are reserved to exercise the requirement that unknown error codes be treated as equivalent to H3_NO_ERROR (Section 9). Implementations SHOULD select an error code from this space with some probability when they would have sent H3_NO_ERROR.¶
HTTP/3 permits extension of the protocol. Within the limitations described in +this section, protocol extensions can be used to provide additional services or +alter any aspect of the protocol. Extensions are effective only within the +scope of a single HTTP/3 connection.¶
+This applies to the protocol elements defined in this document. This does not +affect the existing options for extending HTTP, such as defining new methods, +status codes, or fields.¶
+Extensions are permitted to use new frame types (Section 7.2), new settings +(Section 7.2.4.1), new error codes (Section 8), or new unidirectional +stream types (Section 6.2). Registries are established for +managing these extension points: frame types (Section 11.2.1), settings +(Section 11.2.2), error codes (Section 11.2.3), and stream types +(Section 11.2.4).¶
+Implementations MUST ignore unknown or unsupported values in all extensible +protocol elements. Implementations MUST discard frames and unidirectional +streams that have unknown or unsupported types. This means that any of these +extension points can be safely used by extensions without prior arrangement or +negotiation. However, where a known frame type is required to be in a specific +location, such as the SETTINGS frame as the first frame of the control stream +(see Section 6.2.1), an unknown frame type does not satisfy that +requirement and SHOULD be treated as an error.¶
+Extensions that could change the semantics of existing protocol components MUST +be negotiated before being used. For example, an extension that changes the +layout of the HEADERS frame cannot be used until the peer has given a positive +signal that this is acceptable. Coordinating when such a revised layout comes +into effect could prove complex. As such, allocating new identifiers for +new definitions of existing protocol elements is likely to be more effective.¶
+This document does not mandate a specific method for negotiating the use of an +extension but notes that a setting (Section 7.2.4.1) could be used for +that purpose. If both peers set a value that indicates willingness to use the +extension, then the extension can be used. If a setting is used for extension +negotiation, the default value MUST be defined in such a fashion that the +extension is disabled if the setting is omitted.¶
+The security considerations of HTTP/3 should be comparable to those of HTTP/2 +with TLS. However, many of the considerations from Section 10 of [HTTP2] +apply to [QUIC-TRANSPORT] and are discussed in that document.¶
+The use of ALPN in the TLS and QUIC handshakes establishes the target application protocol before application-layer bytes are processed. This ensures that endpoints have strong assurances that peers are using the same protocol.¶
+This does not guarantee protection from all cross-protocol attacks. Section +21.5 of [QUIC-TRANSPORT] describes some ways in which the plaintext of QUIC +packets can be used to perform request forgery against endpoints that don't use +authenticated transports.¶
+The HTTP/3 field encoding allows the expression of names that are not valid +field names in the syntax used by HTTP (Section 5.1 of [SEMANTICS]). +Requests or responses containing invalid field names MUST be treated as +malformed (Section 4.1.3). An intermediary therefore cannot translate an HTTP/3 +request or response containing an invalid field name into an HTTP/1.1 message.¶
+Similarly, HTTP/3 can transport field values that are not valid. While most +values that can be encoded will not alter field parsing, carriage return (CR, +ASCII 0xd), line feed (LF, ASCII 0xa), and the zero character (NUL, ASCII 0x0) +might be exploited by an attacker if they are translated verbatim. Any request +or response that contains a character not permitted in a field value MUST be +treated as malformed (Section 4.1.3). Valid characters are defined by the +"field-content" ABNF rule in Section 5.5 of [SEMANTICS].¶
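A minimal, non-normative screen for the octets named above can be sketched as follows; the full "field-content" ABNF permits more than this check enforces, so it only covers the three octets called out here:

```python
# Any field value containing CR (0x0d), LF (0x0a), or NUL (0x00) makes
# the message malformed and it must not be forwarded verbatim.

FORBIDDEN = {0x0d, 0x0a, 0x00}

def value_has_forbidden_octet(value: bytes) -> bool:
    return any(b in FORBIDDEN for b in value)

assert not value_has_forbidden_octet(b"text/html")
assert value_has_forbidden_octet(b"evil\r\nx-injected: 1")
```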
+Pushed responses do not have an explicit request from the client; the request is +provided by the server in the PUSH_PROMISE frame.¶
+Caching responses that are pushed is possible based on the guidance provided by +the origin server in the Cache-Control header field. However, this can cause +issues if a single server hosts more than one tenant. For example, a server +might offer multiple users each a small portion of its URI space.¶
+Where multiple tenants share space on the same server, that server MUST ensure +that tenants are not able to push representations of resources that they do not +have authority over. Failure to enforce this would allow a tenant to provide a +representation that would be served out of cache, overriding the actual +representation that the authoritative tenant provides.¶
+Clients are required to reject pushed responses for which an origin server is +not authoritative; see Section 4.4.¶
+An HTTP/3 connection can demand a greater commitment of resources to operate +than an HTTP/1.1 or HTTP/2 connection. The use of field compression and flow +control depend on a commitment of resources for storing a greater amount of +state. Settings for these features ensure that memory commitments for these +features are strictly bounded.¶
+The number of PUSH_PROMISE frames is constrained in a similar fashion. A client +that accepts server push SHOULD limit the number of Push IDs it issues at a +time.¶
+Processing capacity cannot be guarded as effectively as state capacity.¶
+The ability to send undefined protocol elements that the peer is required to +ignore can be abused to cause a peer to expend additional processing time. This +might be done by setting multiple undefined SETTINGS parameters, unknown frame +types, or unknown stream types. Note, however, that some uses are entirely +legitimate, such as optional-to-understand extensions and padding to increase +resistance to traffic analysis.¶
+Compression of field sections also offers some opportunities to waste processing +resources; see Section 7 of [QPACK] for more details on potential abuses.¶
+All these features -- i.e., server push, unknown protocol elements, field +compression -- have legitimate uses. These features become a burden only when +they are used unnecessarily or to excess.¶
+An endpoint that does not monitor this behavior exposes itself to a risk of +denial-of-service attack. Implementations SHOULD track the use of these +features and set limits on their use. An endpoint MAY treat activity that is +suspicious as a connection error of type H3_EXCESSIVE_LOAD (Section 8), but +false positives will result in disrupting valid connections and requests.¶
+A large field section (Section 4.1) can cause an implementation to +commit a large amount of state. Header fields that are critical for routing can +appear toward the end of a header field section, which prevents streaming of the +header field section to its ultimate destination. This ordering and other +reasons, such as ensuring cache correctness, mean that an endpoint likely needs +to buffer the entire header field section. Since there is no hard limit to the +size of a field section, some endpoints could be forced to commit a large amount +of available memory for header fields.¶
+An endpoint can use the SETTINGS_MAX_FIELD_SECTION_SIZE +(Section 4.1.1.3) setting to advise peers of limits that might apply +on the size of field sections. This setting is only advisory, so endpoints MAY +choose to send field sections that exceed this limit and risk having the request +or response being treated as malformed. This setting is specific to an HTTP/3 +connection, so any request or response could encounter a hop with a lower, +unknown limit. An intermediary can attempt to avoid this problem by passing on +values presented by different peers, but they are not obligated to do so.¶
+A server that receives a larger field section than it is willing to handle can +send an HTTP 431 (Request Header Fields Too Large) status code ([RFC6585]). +A client can discard responses that it cannot process.¶
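Assuming the field-section size calculation used for SETTINGS_MAX_FIELD_SECTION_SIZE (the sum of each field's name length and value length in bytes, plus 32 bytes of overhead per field), a receiver-side check might look like this non-normative sketch:

```python
# Compare a field section against an advertised
# SETTINGS_MAX_FIELD_SECTION_SIZE limit.

def field_section_size(fields) -> int:
    # name length + value length + 32 bytes of overhead per field
    return sum(len(name) + len(value) + 32 for name, value in fields)

def within_limit(fields, max_field_section_size: int) -> bool:
    return field_section_size(fields) <= max_field_section_size

fields = [(b":status", b"200"), (b"content-type", b"text/html")]
assert field_section_size(fields) == (7 + 3 + 32) + (12 + 9 + 32)
assert within_limit(fields, 200)
assert not within_limit(fields, 64)
```

A server failing this check could respond with 431 (Request Header Fields Too Large) rather than resetting the connection.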
+The CONNECT method can be used to create disproportionate load on a proxy, +since stream creation is relatively inexpensive when compared to the creation +and maintenance of a TCP connection. A proxy might also maintain some resources +for a TCP connection beyond the closing of the stream that carries the CONNECT +request, since the outgoing TCP connection remains in the TIME_WAIT state. +Therefore, a proxy cannot rely on QUIC stream limits alone to control the +resources consumed by CONNECT requests.¶
+Compression can allow an attacker to recover secret data when it is compressed +in the same context as data under attacker control. HTTP/3 enables compression +of fields (Section 4.1.1); the following concerns also apply to the use +of HTTP compressed content-codings; see Section 8.5.1 of [SEMANTICS].¶
+There are demonstrable attacks on compression that exploit the characteristics +of the web (e.g., [BREACH]). The attacker induces multiple requests +containing varying plaintext, observing the length of the resulting ciphertext +in each, which reveals a shorter length when a guess about the secret is +correct.¶
+Implementations communicating on a secure channel MUST NOT compress content that +includes both confidential and attacker-controlled data unless separate +compression contexts are used for each source of data. Compression MUST NOT be +used if the source of data cannot be reliably determined.¶
+Further considerations regarding the compression of field sections are described in [QPACK].¶
+Padding can be used to obscure the exact size of frame content and is provided +to mitigate specific attacks within HTTP, for example, attacks where compressed +content includes both attacker-controlled plaintext and secret data (e.g., +[BREACH]).¶
+Where HTTP/2 employs PADDING frames and Padding fields in other frames to make a +connection more resistant to traffic analysis, HTTP/3 can either rely on +transport-layer padding or employ the reserved frame and stream types discussed +in Section 7.2.8 and Section 6.2.3. These methods of padding produce +different results in terms of the granularity of padding, how padding is +arranged in relation to the information that is being protected, whether padding +is applied in the case of packet loss, and how an implementation might control +padding.¶
+Reserved stream types can be used to give the appearance of sending traffic even +when the connection is idle. Because HTTP traffic often occurs in bursts, +apparent traffic can be used to obscure the timing or duration of such bursts, +even to the point of appearing to send a constant stream of data. However, as +such traffic is still flow controlled by the receiver, a failure to promptly +drain such streams and provide additional flow control credit can limit the +sender's ability to send real traffic.¶
+To mitigate attacks that rely on compression, disabling or limiting compression +might be preferable to padding as a countermeasure.¶
+Use of padding can result in less protection than might seem immediately +obvious. Redundant padding could even be counterproductive. At best, padding +only makes it more difficult for an attacker to infer length information by +increasing the number of frames an attacker has to observe. Incorrectly +implemented padding schemes can be easily defeated. In particular, randomized +padding with a predictable distribution provides very little protection; +similarly, padding payloads to a fixed size exposes information as payload sizes +cross the fixed-sized boundary, which could be possible if an attacker can +control plaintext.¶
+Several protocol elements contain nested length elements, typically in the form +of frames with an explicit length containing variable-length integers. This +could pose a security risk to an incautious implementer. An implementation MUST +ensure that the length of a frame exactly matches the length of the fields it +contains.¶
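The hazard above can be illustrated with a non-normative sketch: QUIC variable-length integers encode their own length in the two most significant bits of the first byte (1, 2, 4, or 8 bytes; Section 16 of [QUIC-TRANSPORT]), and the parser must confirm that the fields exactly fill the frame's declared length. The frame shape here is a toy.

```python
# Parse a QUIC varint and enforce that a frame's fields exactly match
# its length; any mismatch is a frame error.

def read_varint(buf: bytes, offset: int = 0):
    prefix = buf[offset] >> 6          # top two bits select the length
    length = 1 << prefix               # 1, 2, 4, or 8 bytes
    value = buf[offset] & 0x3f
    for i in range(1, length):
        value = (value << 8) | buf[offset + i]
    return value, offset + length

def parse_single_varint_frame(payload: bytes) -> int:
    value, end = read_varint(payload)
    if end != len(payload):  # trailing or missing bytes: H3_FRAME_ERROR
        raise ValueError("frame length does not match its fields")
    return value

assert parse_single_varint_frame(b"\x25") == 0x25
assert parse_single_varint_frame(b"\x40\x25") == 0x25  # 2-byte encoding
try:
    parse_single_varint_frame(b"\x25\x00")  # one stray byte
    bad = False
except ValueError:
    bad = True
assert bad
```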
+The use of 0-RTT with HTTP/3 creates an exposure to replay attack. The +anti-replay mitigations in [HTTP-REPLAY] MUST be applied when using +HTTP/3 with 0-RTT.¶
+Certain HTTP implementations use the client address for logging or +access-control purposes. Since a QUIC client's address might change during a +connection (and future versions might support simultaneous use of multiple +addresses), such implementations will need to either actively retrieve the +client's current address or addresses when they are relevant or explicitly +accept that the original address might change.¶
+Several characteristics of HTTP/3 provide an observer an opportunity to +correlate actions of a single client or server over time. These include the +value of settings, the timing of reactions to stimulus, and the handling of any +features that are controlled by settings.¶
+As far as these create observable differences in behavior, they could be used as +a basis for fingerprinting a specific client.¶
+HTTP/3's preference for using a single QUIC connection allows correlation of a +user's activity on a site. Reusing connections for different origins allows +for correlation of activity across those origins.¶
+Several features of QUIC solicit immediate responses and can be used by an +endpoint to measure latency to their peer; this might have privacy implications +in certain scenarios.¶
+This document registers a new ALPN protocol ID (Section 11.1) and creates new +registries that manage the assignment of codepoints in HTTP/3.¶
+This document creates a new registration for the identification of +HTTP/3 in the "Application Layer Protocol Negotiation (ALPN) +Protocol IDs" registry established in [RFC7301].¶
+The "h3" string identifies HTTP/3:¶
+New registries created in this document operate under the QUIC registration policy documented in Section 22.1 of [QUIC-TRANSPORT]. These registries all include the common set of fields listed in Section 22.1.1 of [QUIC-TRANSPORT]. These registries are collected under a "Hypertext Transfer Protocol version 3 (HTTP/3) Parameters" heading.¶
+The initial allocations in these registries created in this document are all +assigned permanent status and list a change controller of the IETF and a contact +of the HTTP working group (ietf-http-wg@w3.org).¶
+This document establishes a registry for HTTP/3 frame type codes. The "HTTP/3 Frame Type" registry governs a 62-bit space. This registry follows the QUIC registry policy; see Section 11.2. Permanent registrations in this registry are assigned using the Specification Required policy ([RFC8126]), except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned using Standards Action or IESG Approval as defined in Sections 4.9 and 4.10 of [RFC8126].¶
+While this registry is separate from the "HTTP/2 Frame Type" registry defined in +[HTTP2], it is preferable that the assignments parallel each other where the +code spaces overlap. If an entry is present in only one registry, every effort +SHOULD be made to avoid assigning the corresponding value to an unrelated +operation.¶
+In addition to common fields as described in Section 11.2, permanent +registrations in this registry MUST include the following field:¶
+Frame Type: A name or label for the frame type.¶
+Specifications of frame types MUST include a description of the frame layout and +its semantics, including any parts of the frame that are conditionally present.¶
+The entries in Table 2 are registered by this document.¶
Frame Type | Value | Specification
---|---|---
DATA | 0x0 | Section 7.2.1
HEADERS | 0x1 | Section 7.2.2
Reserved | 0x2 | N/A
CANCEL_PUSH | 0x3 | Section 7.2.3
SETTINGS | 0x4 | Section 7.2.4
PUSH_PROMISE | 0x5 | Section 7.2.5
Reserved | 0x6 | N/A
GOAWAY | 0x7 | Section 7.2.6
Reserved | 0x8 | N/A
Reserved | 0x9 | N/A
MAX_PUSH_ID | 0xd | Section 7.2.7
Each code of the format 0x1f * N + 0x21 for non-negative integer values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by IANA and MUST NOT appear in the listing of assigned values.¶
This document establishes a registry for HTTP/3 settings. The "HTTP/3 Settings" registry governs a 62-bit space. This registry follows the QUIC registry policy; see Section 11.2. Permanent registrations in this registry are assigned using the Specification Required policy ([RFC8126]), except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned using Standards Action or IESG Approval as defined in Sections 4.9 and 4.10 of [RFC8126].¶
+While this registry is separate from the "HTTP/2 Settings" registry defined in +[HTTP2], it is preferable that the assignments parallel each other. If an +entry is present in only one registry, every effort SHOULD be made to avoid +assigning the corresponding value to an unrelated operation.¶
+In addition to common fields as described in Section 11.2, permanent +registrations in this registry MUST include the following fields:¶
+Setting Name: A symbolic name for the setting. Specifying a setting name is optional.¶
+Default: The value of the setting unless otherwise indicated. A default SHOULD be the most restrictive possible value.¶
+The entries in Table 3 are registered by this document.¶
Setting Name | Value | Specification | Default
---|---|---|---
Reserved | 0x2 | N/A | N/A
Reserved | 0x3 | N/A | N/A
Reserved | 0x4 | N/A | N/A
Reserved | 0x5 | N/A | N/A
MAX_FIELD_SECTION_SIZE | 0x6 | Section 7.2.4.1 | Unlimited
Each code of the format 0x1f * N + 0x21 for non-negative integer values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by IANA and MUST NOT appear in the listing of assigned values.¶
This document establishes a registry for HTTP/3 error codes. The "HTTP/3 Error Code" registry governs a 62-bit space. This registry follows the QUIC registry policy; see Section 11.2. Permanent registrations in this registry are assigned using the Specification Required policy ([RFC8126]), except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned using Standards Action or IESG Approval as defined in Sections 4.9 and 4.10 of [RFC8126].¶
+Registrations for error codes are required to include a description of the +error code. An expert reviewer is advised to examine new registrations for +possible duplication with existing error codes. Use of existing +registrations is to be encouraged, but not mandated. Use of values that +are registered in the "HTTP/2 Error Code" registry is discouraged.¶
+In addition to common fields as described in Section 11.2, this registry includes two additional fields. Permanent registrations in this registry MUST include the following fields:¶
+Name: A name for the error code.¶
+Description: A brief description of the error code semantics.¶
+The entries in Table 4 are registered by this document. These +error codes were selected from the range that operates on a Specification +Required policy to avoid collisions with HTTP/2 error codes.¶
Name | Value | Description | Specification
---|---|---|---
H3_NO_ERROR | 0x0100 | No error | Section 8.1
H3_GENERAL_PROTOCOL_ERROR | 0x0101 | General protocol error | Section 8.1
H3_INTERNAL_ERROR | 0x0102 | Internal error | Section 8.1
H3_STREAM_CREATION_ERROR | 0x0103 | Stream creation error | Section 8.1
H3_CLOSED_CRITICAL_STREAM | 0x0104 | Critical stream was closed | Section 8.1
H3_FRAME_UNEXPECTED | 0x0105 | Frame not permitted in the current state | Section 8.1
H3_FRAME_ERROR | 0x0106 | Frame violated layout or size rules | Section 8.1
H3_EXCESSIVE_LOAD | 0x0107 | Peer generating excessive load | Section 8.1
H3_ID_ERROR | 0x0108 | An identifier was used incorrectly | Section 8.1
H3_SETTINGS_ERROR | 0x0109 | SETTINGS frame contained invalid values | Section 8.1
H3_MISSING_SETTINGS | 0x010a | No SETTINGS frame received | Section 8.1
H3_REQUEST_REJECTED | 0x010b | Request not processed | Section 8.1
H3_REQUEST_CANCELLED | 0x010c | Data no longer needed | Section 8.1
H3_REQUEST_INCOMPLETE | 0x010d | Stream terminated early | Section 8.1
H3_MESSAGE_ERROR | 0x010e | Malformed message | Section 8.1
H3_CONNECT_ERROR | 0x010f | TCP reset or error on CONNECT request | Section 8.1
H3_VERSION_FALLBACK | 0x0110 | Retry over HTTP/1.1 | Section 8.1
Each code of the format 0x1f * N + 0x21 for non-negative integer values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by IANA and MUST NOT appear in the listing of assigned values.¶
This document establishes a registry for HTTP/3 unidirectional stream types. The "HTTP/3 Stream Type" registry governs a 62-bit space. This registry follows the QUIC registry policy; see Section 11.2. Permanent registrations in this registry are assigned using the Specification Required policy ([RFC8126]), except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned using Standards Action or IESG Approval as defined in Sections 4.9 and 4.10 of [RFC8126].¶
+In addition to common fields as described in Section 11.2, permanent +registrations in this registry MUST include the following fields:¶
+Stream Type: A name or label for the stream type.¶
+Sender: Which endpoint on an HTTP/3 connection may initiate a stream of this type. Values are "Client", "Server", or "Both".¶
+Specifications for permanent registrations MUST include a description of the +stream type, including the layout and semantics of the stream contents.¶
+The entries in the following table are registered by this document.¶
+Stream Type | +Value | +Specification | +Sender | +
---|---|---|---|
Control Stream | +0x00 | ++ Section 6.2.1 + | +Both | +
Push Stream | +0x01 | ++ Section 4.4 + | +Server | +
Each code of the format 0x1f * N + 0x21
for non-negative integer values of N
+(that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by
+IANA and MUST NOT appear in the listing of assigned values.¶
HTTP/3 is strongly informed by HTTP/2, and bears many similarities. This +section describes the approach taken to design HTTP/3, points out important +differences from HTTP/2, and describes how to map HTTP/2 extensions into HTTP/3.¶
+HTTP/3 begins from the premise that similarity to HTTP/2 is preferable, but not +a hard requirement. HTTP/3 departs from HTTP/2 where QUIC differs from TCP, +either to take advantage of QUIC features (like streams) or to accommodate +important shortcomings (such as a lack of total ordering). These differences +make HTTP/3 similar to HTTP/2 in key aspects, such as the relationship of +requests and responses to streams. However, the details of the HTTP/3 design are +substantially different from HTTP/2.¶
+These departures are noted in this section.¶
+HTTP/3 permits use of a larger number of streams (2^62-1) than HTTP/2. The same +considerations about exhaustion of stream identifier space apply, though the +space is significantly larger such that it is likely that other limits in QUIC +are reached first, such as the limit on the connection flow control window.¶
+In contrast to HTTP/2, stream concurrency in HTTP/3 is managed by QUIC. QUIC +considers a stream closed when all data has been received and sent data has been +acknowledged by the peer. HTTP/2 considers a stream closed when the frame +containing the END_STREAM bit has been committed to the transport. As a result, +the stream for an equivalent exchange could remain "active" for a longer period +of time. HTTP/3 servers might choose to permit a larger number of concurrent +client-initiated bidirectional streams to achieve equivalent concurrency to +HTTP/2, depending on the expected usage patterns.¶
+Due to the presence of other unidirectional stream types, HTTP/3 does not rely +exclusively on the number of concurrent unidirectional streams to control the +number of concurrent in-flight pushes. Instead, HTTP/3 clients use the +MAX_PUSH_ID frame to control the number of pushes received from an HTTP/3 +server.¶
+Many framing concepts from HTTP/2 can be elided on QUIC, because the transport +deals with them. Because frames are already on a stream, they can omit the +stream number. Because frames do not block multiplexing (QUIC's multiplexing +occurs below this layer), the support for variable-maximum-length packets can be +removed. Because stream termination is handled by QUIC, an END_STREAM flag is +not required. This permits the removal of the Flags field from the generic +frame layout.¶
+Frame payloads are largely drawn from [HTTP2]. However, QUIC includes many +features (e.g., flow control) that are also present in HTTP/2. In these cases, +the HTTP mapping does not re-implement them. As a result, several HTTP/2 frame +types are not required in HTTP/3. Where an HTTP/2-defined frame is no longer +used, the frame ID has been reserved in order to maximize portability between +HTTP/2 and HTTP/3 implementations. However, even equivalent frames between the +two mappings are not identical.¶
+Many of the differences arise from the fact that HTTP/2 provides an absolute +ordering between frames across all streams, while QUIC provides this guarantee +on each stream only. As a result, if a frame type makes assumptions that frames +from different streams will still be received in the order sent, HTTP/3 will +break them.¶
+Some examples of feature adaptations are described below, as well as general +guidance to extension frame implementors converting an HTTP/2 extension to +HTTP/3.¶
+HTTP/2 specifies priority assignments in PRIORITY frames and (optionally) in +HEADERS frames. HTTP/3 does not provide a means of signaling priority.¶
+Note that while there is no explicit signaling for priority, this does not mean +that prioritization is not important for achieving good performance.¶
+HPACK was designed with the assumption of in-order delivery. A sequence of +encoded field sections must arrive (and be decoded) at an endpoint in the same +order in which they were encoded. This ensures that the dynamic state at the two +endpoints remains in sync.¶
+Because this total ordering is not provided by QUIC, HTTP/3 uses a modified +version of HPACK, called QPACK. QPACK uses a single unidirectional stream to +make all modifications to the dynamic table, ensuring a total order of updates. +All frames that contain encoded fields merely reference the table state at a +given time without modifying it.¶
+ +HTTP/2 specifies a stream flow control mechanism. Although all HTTP/2 frames are +delivered on streams, only the DATA frame payload is subject to flow control. +QUIC provides flow control for stream data and all HTTP/3 frame types defined in +this document are sent on streams. Therefore, all frame headers and payload are +subject to flow control.¶
+Frame type definitions in HTTP/3 often use the QUIC variable-length integer +encoding. In particular, Stream IDs use this encoding, which allows for a +larger range of possible values than the encoding used in HTTP/2. Some frames +in HTTP/3 use an identifier rather than a Stream ID (e.g., Push +IDs). Redefinition of the encoding of extension frame types might be necessary +if the encoding includes a Stream ID.¶
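The QUIC variable-length integer encoding referenced here stores a 1-, 2-, 4-, or 8-byte value whose length is signaled by the two most significant bits of the first byte, leaving 6, 14, 30, or 62 usable bits. A minimal sketch, following that layout:

```python
def encode_varint(v: int) -> bytes:
    # The two most significant bits of the first byte encode the
    # total length of the integer: 00 -> 1 byte, 01 -> 2, 10 -> 4, 11 -> 8.
    if v < 0x40:
        return v.to_bytes(1, "big")
    if v < 0x4000:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 0x40000000:
        return (v | 0x80000000).to_bytes(4, "big")
    if v < 0x4000000000000000:
        return (v | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value exceeds 62 bits")

def decode_varint(data: bytes) -> tuple[int, int]:
    # Returns (value, bytes consumed).
    length = 1 << (data[0] >> 6)
    value = int.from_bytes(data[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```

For example, 0x3FFF is the largest value that fits in the 2-byte form, which is why settings or identifiers above 30 bits (see the discussion of setting values) require the full 8-byte encoding.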
+Because the Flags field is not present in generic HTTP/3 frames, those frames +that depend on the presence of flags need to allocate space for flags as part +of their frame payload.¶
+Other than these issues, HTTP/2 extensions that define frame types are typically +portable to QUIC simply by replacing Stream 0 in HTTP/2 with a control stream in +HTTP/3. HTTP/3 extensions will not assume ordering, but would not be harmed by +ordering, and would be portable to HTTP/2 in the same manner.¶
+Padding is not defined in HTTP/3 frames. See Section 7.2.1.¶
+The PRIORITY region of HEADERS is not defined in HTTP/3 frames. Padding is not +defined in HTTP/3 frames. See Section 7.2.2.¶
+As described in Appendix A.2.1, HTTP/3 does not provide a means of +signaling priority.¶
+RST_STREAM frames do not exist in HTTP/3, since QUIC provides stream lifecycle +management. The same code point is used for the CANCEL_PUSH frame +(Section 7.2.3).¶
+SETTINGS frames are sent only at the beginning of the connection. See +Section 7.2.4 and Appendix A.3.¶
+The PUSH_PROMISE frame does not reference a stream; instead the push stream +references the PUSH_PROMISE frame using a Push ID. See +Section 7.2.5.¶
+PING frames do not exist in HTTP/3, as QUIC provides equivalent +functionality.¶
+GOAWAY does not contain an error code. In the client to server direction, +it carries a Push ID instead of a server initiated stream ID. +See Section 7.2.6.¶
+WINDOW_UPDATE frames do not exist in HTTP/3, since QUIC provides flow control.¶
+CONTINUATION frames do not exist in HTTP/3; instead, larger +HEADERS/PUSH_PROMISE frames than HTTP/2 are permitted.¶
+Frame types defined by extensions to HTTP/2 need to be separately registered for +HTTP/3 if still applicable. The IDs of frames defined in [HTTP2] have been +reserved for simplicity. Note that the frame type space in HTTP/3 is +substantially larger (62 bits versus 8 bits), so many HTTP/3 frame types have no +equivalent HTTP/2 code points. See Section 11.2.1.¶
+An important difference from HTTP/2 is that settings are sent once, as the first +frame of the control stream, and thereafter cannot change. This eliminates many +corner cases around synchronization of changes.¶
+Some transport-level options that HTTP/2 specifies via the SETTINGS frame are +superseded by QUIC transport parameters in HTTP/3. The HTTP-level options that +are retained in HTTP/3 have the same value as in HTTP/2. The superseded +settings are reserved, and their receipt is an error. See +Section 7.2.4.1 for discussion of both the retained and reserved values.¶
+Below is a listing of how each HTTP/2 SETTINGS parameter is mapped:¶
+This is removed in favor of the MAX_PUSH_ID frame, which provides a more +granular control over server push. Specifying a setting with the identifier +0x2 (corresponding to the SETTINGS_ENABLE_PUSH parameter) in the HTTP/3 +SETTINGS frame is an error.¶
+QUIC controls the largest open Stream ID as part of its flow control logic. +Specifying a setting with the identifier 0x3 (corresponding to the +SETTINGS_MAX_CONCURRENT_STREAMS parameter) in the HTTP/3 SETTINGS frame is an +error.¶
+QUIC requires both stream and connection flow control window sizes to be +specified in the initial transport handshake. Specifying a setting with the +identifier 0x4 (corresponding to the SETTINGS_INITIAL_WINDOW_SIZE parameter) +in the HTTP/3 SETTINGS frame is an error.¶
+This setting has no equivalent in HTTP/3. Specifying a setting with the +identifier 0x5 (corresponding to the SETTINGS_MAX_FRAME_SIZE parameter) in the +HTTP/3 SETTINGS frame is an error.¶
+This setting identifier has been renamed SETTINGS_MAX_FIELD_SECTION_SIZE.¶
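A receiver's handling of the identifiers enumerated above can be sketched as a simple validation step. This is illustrative only and covers just the reserved HTTP/2 identifiers listed here; the function name is an assumption, not part of either protocol:

```python
# Identifiers of the HTTP/2 settings listed above that are reserved in
# HTTP/3; receiving any of them in an HTTP/3 SETTINGS frame is an error.
RESERVED_H2_SETTINGS = {
    0x2: "SETTINGS_ENABLE_PUSH",
    0x3: "SETTINGS_MAX_CONCURRENT_STREAMS",
    0x4: "SETTINGS_INITIAL_WINDOW_SIZE",
    0x5: "SETTINGS_MAX_FRAME_SIZE",
}

def check_setting(identifier: int) -> None:
    # Raise on a reserved identifier; treat anything else as acceptable
    # here (unknown settings are ignored, not rejected).
    if identifier in RESERVED_H2_SETTINGS:
        raise ValueError(
            f"H3_SETTINGS_ERROR: {RESERVED_H2_SETTINGS[identifier]} "
            f"(0x{identifier:x}) is reserved in HTTP/3")
```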
+In HTTP/3, setting values are variable-length integers (6, 14, 30, or 62 bits +long) rather than fixed-length 32-bit fields as in HTTP/2. This will often +produce a shorter encoding, but can produce a longer encoding for settings that +use the full 32-bit space. Settings ported from HTTP/2 might choose to redefine +their value to limit it to 30 bits for more efficient encoding, or to make use +of the 62-bit space if more than 30 bits are required.¶
+Settings need to be defined separately for HTTP/2 and HTTP/3. The IDs of +settings defined in [HTTP2] have been reserved for simplicity. Note that +the settings identifier space in HTTP/3 is substantially larger (62 bits versus +16 bits), so many HTTP/3 settings have no equivalent HTTP/2 code point. See +Section 11.2.2.¶
+As QUIC streams might arrive out of order, endpoints are advised not to wait for +the peers' settings to arrive before responding to other streams. See +Section 7.2.4.2.¶
+QUIC has the same concepts of "stream" and "connection" errors that HTTP/2 +provides. However, the differences between HTTP/2 and HTTP/3 mean that error +codes are not directly portable between versions.¶
+The HTTP/2 error codes defined in Section 7 of [HTTP2] logically map to +the HTTP/3 error codes as follows:¶
+H3_NO_ERROR in Section 8.1.¶
+This is mapped to H3_GENERAL_PROTOCOL_ERROR except in cases where more +specific error codes have been defined. Such cases include +H3_FRAME_UNEXPECTED, H3_MESSAGE_ERROR, and H3_CLOSED_CRITICAL_STREAM defined +in Section 8.1.¶
+H3_INTERNAL_ERROR in Section 8.1.¶
+Not applicable, since QUIC handles flow control.¶
+Not applicable, since no acknowledgment of SETTINGS is defined.¶
+Not applicable, since QUIC handles stream management.¶
+H3_FRAME_ERROR error code defined in Section 8.1.¶
+H3_REQUEST_REJECTED (in Section 8.1) is used to indicate that a +request was not processed. Otherwise, not applicable because QUIC handles +stream management.¶
+H3_REQUEST_CANCELLED in Section 8.1.¶
+H3_CONNECT_ERROR in Section 8.1.¶
+H3_EXCESSIVE_LOAD in Section 8.1.¶
+Not applicable, since QUIC is assumed to provide sufficient security on all +connections.¶
+H3_VERSION_FALLBACK in Section 8.1.¶
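The logical mapping above can be summarized as a lookup table. This sketch covers only the mappings enumerated here, with `None` marking HTTP/2 codes that have no HTTP/3 equivalent because QUIC subsumes the feature; the HTTP/2 names are taken from Section 7 of [HTTP2], and the table itself is illustrative, not normative:

```python
# HTTP/2 error code -> closest HTTP/3 error code, per the list above.
# None means "not applicable": the condition is handled by QUIC or
# does not exist in HTTP/3.
H2_TO_H3 = {
    "NO_ERROR":            "H3_NO_ERROR",
    "PROTOCOL_ERROR":      "H3_GENERAL_PROTOCOL_ERROR",  # unless a more
    # specific code (H3_FRAME_UNEXPECTED, H3_MESSAGE_ERROR,
    # H3_CLOSED_CRITICAL_STREAM) applies
    "INTERNAL_ERROR":      "H3_INTERNAL_ERROR",
    "FLOW_CONTROL_ERROR":  None,   # QUIC handles flow control
    "SETTINGS_TIMEOUT":    None,   # no SETTINGS acknowledgment in HTTP/3
    "STREAM_CLOSED":       None,   # QUIC handles stream management
    "FRAME_SIZE_ERROR":    "H3_FRAME_ERROR",
    "REFUSED_STREAM":      "H3_REQUEST_REJECTED",
    "CANCEL":              "H3_REQUEST_CANCELLED",
    "CONNECT_ERROR":       "H3_CONNECT_ERROR",
    "ENHANCE_YOUR_CALM":   "H3_EXCESSIVE_LOAD",
    "INADEQUATE_SECURITY": None,   # QUIC provides sufficient security
    "HTTP_1_1_REQUIRED":   "H3_VERSION_FALLBACK",
}
```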
+Error codes need to be defined for HTTP/2 and HTTP/3 separately. See +Section 11.2.3.¶
+An intermediary that converts between HTTP/2 and HTTP/3 may encounter error +conditions from either upstream. It is useful to communicate the occurrence of an +error to the downstream, but error codes largely reflect connection-local +problems that generally do not make sense to propagate.¶
+An intermediary that encounters an error from an upstream origin can indicate +this by sending an HTTP status code such as 502, which is suitable for a broad +class of errors.¶
+There are some rare cases where it is beneficial to propagate the error by +mapping it to the closest matching error type to the receiver. For example, an +intermediary that receives an HTTP/2 stream error of type REFUSED_STREAM from +the origin has a clear signal that the request was not processed and that the +request is safe to retry. Propagating this error condition to the client as an +HTTP/3 stream error of type H3_REQUEST_REJECTED allows the client to take the +action it deems most appropriate. In the reverse direction, the intermediary +might deem it beneficial to pass on client request cancellations that are +indicated by terminating a stream with H3_REQUEST_CANCELLED; see +Section 4.1.2.¶
+Conversion between errors is described in the logical mapping. The error codes +are defined in non-overlapping spaces in order to protect against accidental +conversion that could result in the use of inappropriate or unknown error codes +for the target version. An intermediary is permitted to promote stream errors to +connection errors, but it should be aware of the cost to the HTTP/3 connection +for what might be a temporary or intermittent error.¶
+Editorial changes only.¶
+Editorial changes only.¶
+Further changes to error codes (#2662,#2551):¶
+ +http-opportunistic
resource (RFC 8164) when scheme is
+http
(#2439,#2973)¶
+Changes to SETTINGS frames in 0-RTT (#2972,#2790,#2945):¶
+No changes¶
+Extensive changes to error codes and conditions of their sending¶
+Use variable-length integers throughout (#2437,#2233,#2253,#2275)¶
+ +Changes to PRIORITY frame (#1865, #2075)¶
+ +Substantial editorial reorganization; no technical changes.¶
+None.¶
+SETTINGS changes (#181):¶
+ +The original authors of this specification were Robbie Shade and Mike Warres.¶
+The IETF QUIC Working Group received an enormous amount of support from many +people. Among others, the following people provided substantial contributions to +this document:¶
+奥 一穂 (Kazuho Oku)¶
+A portion of Mike's contribution was supported by Microsoft during his +employment there.¶
+Internet-Draft | +QUIC Invariants | +January 2021 | +
Thomson | +Expires 19 July 2021 | +[Page] | +
This document defines the properties of the QUIC transport protocol that are +common to all versions of the protocol.¶
+Discussion of this draft takes place on the QUIC working group mailing list +(quic@ietf.org), which is archived at +https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
+Working Group information can be found at https://github.com/quicwg; source +code and issues list for this draft can be found at +https://github.com/quicwg/base-drafts/labels/-invariants.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF). Note that other groups may also distribute working + documents as Internet-Drafts. The list of current Internet-Drafts is + at https://datatracker.ietf.org/drafts/current/.¶
++ Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress."¶
++ This Internet-Draft will expire on 19 July 2021.¶
++ Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Simplified BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Simplified BSD License.¶
+QUIC is a connection-oriented protocol between two endpoints. Those endpoints +exchange UDP datagrams. These UDP datagrams contain QUIC packets. QUIC +endpoints use QUIC packets to establish a QUIC connection, which is shared +protocol state between those endpoints.¶
+In addition to providing secure, multiplexed transport, QUIC [QUIC-TRANSPORT] +allows for the option to negotiate a version. This allows the protocol to +change over time in response to new requirements. Many characteristics of the +protocol could change between versions.¶
+This document describes the subset of QUIC that is intended to remain stable as +new versions are developed and deployed. All of these invariants are +IP-version-independent.¶
+The primary goal of this document is to ensure that it is possible to deploy new +versions of QUIC. By documenting the properties that cannot change, this +document aims to preserve the ability for QUIC endpoints to negotiate changes to +any other aspect of the protocol. As a consequence, this also guarantees a +minimal amount of information that is made available to entities other than +endpoints. Unless specifically prohibited in this document, any aspect of the +protocol can change between different versions.¶
+Appendix A contains a non-exhaustive list of some incorrect assumptions +that might be made based on knowledge of QUIC version 1; these do not apply to +every version of QUIC.¶
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", +"MAY", and "OPTIONAL" in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, and only when, they +appear in all capitals, as shown here.¶
+This document defines requirements on future QUIC versions, even where normative +language is not used.¶
+This document uses terms and notational conventions from [QUIC-TRANSPORT].¶
+The format of packets is described using the notation defined in this section. +This notation is the same as that used in [QUIC-TRANSPORT].¶
+Complex fields are named and then followed by a list of fields surrounded by a +pair of matching braces. Each field in this list is separated by commas.¶
+Individual fields include length information, plus indications about fixed +value, optionality, or repetitions. Individual fields use the following +notational conventions, with all lengths in bits:¶
+Indicates that x is A bits long¶
+Indicates that x can be any length from A to B; A can be omitted to indicate +a minimum of zero bits and B can be omitted to indicate no set upper limit; +values in this format always end on a byte boundary¶
+Indicates that x, with a length described by L, has a fixed value of C¶
+Indicates that x is repeated zero or more times (and that each instance is +length L)¶
+This document uses network byte order (that is, big endian) values. Fields +are placed starting from the high-order bits of each byte.¶
+Figure 1 shows an example structure:¶
+QUIC endpoints exchange UDP datagrams that contain one or more QUIC packets. +This section describes the invariant characteristics of a QUIC packet. A +version of QUIC could permit multiple QUIC packets in a single UDP datagram, but +the invariant properties only describe the first packet in a datagram.¶
+QUIC defines two types of packet header: long and short. Packets with long +headers are identified by the most significant bit of the first byte being set; +packets with a short header have that bit cleared.¶
+QUIC packets might be integrity protected, including the header. However, QUIC +Version Negotiation packets are not integrity protected; see Section 6.¶
+Aside from the values described here, the payload of QUIC packets is +version-specific and of arbitrary length.¶
+Long headers take the form described in Figure 2.¶
+A QUIC packet with a long header has the high bit of the first byte set to 1. +All other bits in that byte are version specific.¶
+The next four bytes include a 32-bit Version field. Versions are described in +Section 5.4.¶
+The next byte contains the length in bytes of the Destination Connection ID +field that follows it. This length is encoded as an 8-bit unsigned integer. +The Destination Connection ID field follows the Destination Connection ID Length +field and is between 0 and 255 bytes in length. Connection IDs are described in +Section 5.3.¶
+The next byte contains the length in bytes of the Source Connection ID field +that follows it. This length is encoded as an 8-bit unsigned integer. The +Source Connection ID field follows the Source Connection ID Length field and is +between 0 and 255 bytes in length.¶
+The remainder of the packet contains version-specific content.¶
+Short headers take the form described in Figure 3.¶
+A QUIC packet with a short header has the high bit of the first byte set to 0.¶
+A QUIC packet with a short header includes a Destination Connection ID +immediately following the first byte. The short header does not include the +Connection ID Lengths, Source Connection ID, or Version fields. The length of +the Destination Connection ID is not encoded in packets with a short header +and is not constrained by this specification.¶
+The remainder of the packet has version-specific semantics.¶
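The invariant fields of the two header forms described above can be extracted with a short parser. This is a sketch under the invariants only: everything past these fields, including the short-header DCID length, is version-specific, and the function name and return shape are assumptions:

```python
def parse_invariant_header(packet: bytes) -> dict:
    # Parse only the version-invariant portion of a QUIC packet header.
    if packet[0] & 0x80:
        # Long header: Version, then length-prefixed Destination and
        # Source Connection IDs (each length is one byte, 0-255).
        version = int.from_bytes(packet[1:5], "big")
        dcid_len = packet[5]
        dcid = packet[6:6 + dcid_len]
        scid_len = packet[6 + dcid_len]
        scid_start = 7 + dcid_len
        scid = packet[scid_start:scid_start + scid_len]
        return {"form": "long", "version": version,
                "dcid": dcid, "scid": scid}
    # Short header: the DCID follows the first byte, but its length is
    # not encoded; it must be known out of band (e.g. because this
    # endpoint chose the connection ID).
    return {"form": "short"}
```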
+A connection ID is an opaque field of arbitrary length.¶
+The primary function of a connection ID is to ensure that changes in addressing +at lower protocol layers (UDP, IP, and below) do not cause packets for a QUIC +connection to be delivered to the wrong QUIC endpoint. The connection ID +is used by endpoints and the intermediaries that support them to ensure that +each QUIC packet can be delivered to the correct instance of an endpoint. At +the endpoint, the connection ID is used to identify the QUIC connection for +which the packet is intended.¶
+The connection ID is chosen by each endpoint using version-specific methods. +Packets for the same QUIC connection might use different connection ID values.¶
+The Version field contains a 4-byte identifier. This value can be used by +endpoints to identify a QUIC Version. A Version field with a value of +0x00000000 is reserved for version negotiation; see Section 6. All other values +are potentially valid.¶
+The properties described in this document apply to all versions of QUIC. A +protocol that does not conform to the properties described in this document is +not QUIC. Future documents might describe additional properties that apply to +a specific QUIC version, or to a range of QUIC versions.¶
+A QUIC endpoint that receives a packet with a long header and a version it +either does not understand or does not support might send a Version Negotiation +packet in response. Packets with a short header do not trigger version +negotiation.¶
+A Version Negotiation packet sets the high bit of the first byte, and thus it +conforms with the format of a packet with a long header as defined in +Section 5.1. A Version Negotiation packet is identifiable as such by the +Version field, which is set to 0x00000000.¶
+Only the most significant bit of the first byte of a Version Negotiation packet +has any defined value. The remaining 7 bits, labeled Unused, can be set to any +value when sending and MUST be ignored on receipt.¶
+After the Source Connection ID field, the Version Negotiation packet contains a +list of Supported Version fields, each identifying a version that the endpoint +sending the packet supports. A Version Negotiation packet contains no other +fields. An endpoint MUST ignore a packet that contains no Supported Version +fields, or a truncated Supported Version.¶
+Version Negotiation packets do not use integrity or confidentiality protection. +Specific QUIC versions might include protocol elements that allow endpoints to +detect modification or corruption in the set of supported versions.¶
+An endpoint MUST include the value from the Source Connection ID field of the +packet it receives in the Destination Connection ID field. The value for Source +Connection ID MUST be copied from the Destination Connection ID of the received +packet, which is initially randomly selected by a client. Echoing both +connection IDs gives clients some assurance that the server received the packet +and that the Version Negotiation packet was not generated by an attacker that is +unable to observe packets.¶
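The rules above (long-header form, Version 0x00000000, connection IDs echoed and swapped, then the supported versions) can be sketched as follows; the function name is hypothetical and the unused bits are simply left zero:

```python
def build_version_negotiation(received_dcid: bytes, received_scid: bytes,
                              supported: list[int]) -> bytes:
    pkt = bytes([0x80])                    # high bit set; other 7 bits unused
    pkt += (0).to_bytes(4, "big")          # Version 0 identifies the packet
    # Echo the peer's Source Connection ID as our Destination Connection ID,
    # and its Destination Connection ID as our Source Connection ID.
    pkt += bytes([len(received_scid)]) + received_scid
    pkt += bytes([len(received_dcid)]) + received_dcid
    for v in supported:                    # one 32-bit Supported Version each
        pkt += v.to_bytes(4, "big")
    return pkt
```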
+An endpoint that receives a Version Negotiation packet might change the version +that it decides to use for subsequent packets. The conditions under which an +endpoint changes QUIC version will depend on the version of QUIC that it +chooses.¶
+See [QUIC-TRANSPORT] for a more thorough description of how an endpoint that +supports QUIC version 1 generates and consumes a Version Negotiation packet.¶
+It is possible that middleboxes could observe traits of a specific version of +QUIC and assume that when other versions of QUIC exhibit similar traits the same +underlying semantic is being expressed. There are potentially many such traits; +see Appendix A. Some effort has been made to either eliminate or +obscure some observable traits in QUIC version 1, but many of these remain. +Other QUIC versions might make different design decisions and so exhibit +different traits.¶
+The QUIC version number does not appear in all QUIC packets, which means that +reliably extracting information from a flow based on version-specific traits +requires that middleboxes retain state for every connection ID they see.¶
+The Version Negotiation packet described in this document is not +integrity-protected; it only has modest protection against insertion by +attackers. An endpoint MUST authenticate the semantic content of a Version +Negotiation packet if it attempts a different QUIC version as a result.¶
+This document makes no request of IANA.¶
+There are several traits of QUIC version 1 [QUIC-TRANSPORT] that are not +protected from observation, but are nonetheless considered to be changeable when +a new version is deployed.¶
+This section lists a sampling of incorrect assumptions that might be made about +QUIC based on knowledge of QUIC version 1. Some of these statements are not +even true for QUIC version 1. This is not an exhaustive list; it is intended to +be illustrative only.¶
+Any and all of the following statements can be false for a given QUIC +version:¶
+Internet-Draft | +QPACK | +January 2021 | +
Krasic, et al. | +Expires 19 July 2021 | +[Page] | +
This specification defines QPACK, a compression format for efficiently +representing HTTP fields, to be used in HTTP/3. This is a variation of HPACK +compression that seeks to reduce head-of-line blocking.¶
+Discussion of this draft takes place on the QUIC working group mailing list +(quic@ietf.org), which is archived at +https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
+Working Group information can be found at https://github.com/quicwg; source +code and issues list for this draft can be found at +https://github.com/quicwg/base-drafts/labels/-qpack.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF). Note that other groups may also distribute working + documents as Internet-Drafts. The list of current Internet-Drafts is + at https://datatracker.ietf.org/drafts/current/.¶
++ Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress."¶
++ This Internet-Draft will expire on 19 July 2021.¶
++ Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Simplified BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Simplified BSD License.¶
+The QUIC transport protocol ([QUIC-TRANSPORT]) is designed to support HTTP +semantics, and its design subsumes many of the features of HTTP/2 +([RFC7540]). HTTP/2 uses HPACK ([RFC7541]) for compression of the header +and trailer sections. If HPACK were used for HTTP/3 ([HTTP3]), it would +induce head-of-line blocking for field sections due to built-in assumptions of a +total ordering across frames on all streams.¶
+QPACK reuses core concepts from HPACK, but is redesigned to allow correctness in +the presence of out-of-order delivery, with flexibility for implementations to +balance between resilience against head-of-line blocking and optimal compression +ratio. The design goals are to closely approach the compression ratio of HPACK +with substantially less head-of-line blocking under the same loss conditions.¶
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", +"MAY", and "OPTIONAL" in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, and only when, they +appear in all capitals, as shown here.¶
+Definitions of terms that are used in this document:¶
+Metadata sent as part of an HTTP message. The term encompasses both header +and trailer fields. Colloquially, the term "headers" has often been used to +refer to HTTP header fields and trailer fields; this document uses "fields" +for generality.¶
+A name-value pair sent as part of an HTTP field section. See Sections 6.3 +and 6.5 of [SEMANTICS].¶
+Data associated with a field name, composed from all field line values with +that field name in that section, concatenated together and separated with +commas.¶
+An ordered collection of HTTP field lines associated with an HTTP message. A +field section can contain multiple field lines with the same name. It can +also contain duplicate field lines. An HTTP message can include both header +field and trailer field sections.¶
+An instruction that represents a field line, possibly by reference to the +dynamic and static tables.¶
+An implementation that encodes field sections.¶
+An implementation that decodes encoded field sections.¶
+A unique index for each entry in the dynamic table.¶
+A reference point for relative and post-base indices. Representations that +reference dynamic table entries are relative to a Base.¶
+The total number of entries inserted in the dynamic table.¶
+QPACK is a name, not an acronym.¶
+Diagrams use the format described in Section 3.1 of [RFC2360], with the +following additional conventions:¶
+Indicates that x is A bits long¶
+Indicates that x uses the prefixed integer encoding defined in +Section 4.1.1, beginning with an A-bit prefix.¶
+Indicates that x is variable-length and extends to the end of the region.¶
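The prefixed integer encoding referenced in the notation above works as in HPACK: a value that fits in the N-bit prefix is stored there directly; otherwise the prefix is filled with ones and the remainder follows in 7-bit groups, least significant first, with the high bit of each byte marking continuation. A sketch (function names are illustrative):

```python
def encode_prefixed(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    # `flags` carries any instruction bits that share the first byte.
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = bytearray([flags | limit])       # prefix saturated
    value -= limit
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)

def decode_prefixed(data: bytes, prefix_bits: int) -> tuple[int, int]:
    # Returns (value, bytes consumed).
    limit = (1 << prefix_bits) - 1
    value = data[0] & limit
    if value < limit:
        return value, 1
    shift, i = 0, 1
    while True:
        byte = data[i]
        value += (byte & 0x7F) << shift
        shift += 7
        i += 1
        if not byte & 0x80:
            return value, i
```

For instance, encoding 1337 with a 5-bit prefix yields the three bytes 31, 154, 10, matching the worked example in the HPACK specification.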
+Like HPACK, QPACK uses two tables for associating field lines ("headers") to +indices. The static table (Section 3.1) is predefined and contains +common header field lines (some of them with an empty value). The dynamic table +(Section 3.2) is built up over the course of the connection and can +be used by the encoder to index both header and trailer field lines in the +encoded field sections.¶
+QPACK defines unidirectional streams for sending instructions from encoder to +decoder and vice versa.¶
+An encoder converts a header or trailer field section into a series of +representations by emitting either an indexed or a literal representation for +each field line in the list; see Section 4.5. Indexed +representations achieve high compression by replacing the literal name and +possibly the value with an index to either the static or dynamic table. +References to the static table and literal representations do not require any +dynamic state and never risk head-of-line blocking. References to the dynamic +table risk head-of-line blocking if the encoder has not received an +acknowledgment indicating the entry is available at the decoder.¶
+An encoder MAY insert any entry into the dynamic table it chooses; it is not +limited to field lines it is compressing.¶
+QPACK preserves the ordering of field lines within each field section. An +encoder MUST emit field representations in the order they appear in the input +field section.¶
+QPACK is designed to confine the more complex state tracking to the encoder, +while keeping the decoder relatively simple.¶
+Inserting entries into the dynamic table might not be possible if the table +contains entries that cannot be evicted.¶
+A dynamic table entry cannot be evicted immediately after insertion, even if it +has never been referenced. Once the insertion of a dynamic table entry has been +acknowledged and there are no outstanding references to the entry in +unacknowledged representations, the entry becomes evictable. Note that +references on the encoder stream never preclude the eviction of an entry, +because those references are guaranteed to be processed before the instruction +evicting the entry.¶
+If the dynamic table does not contain enough room for a new entry without +evicting other entries, and the entries that would be evicted are not +evictable, the encoder MUST NOT insert that entry into the dynamic table +(including duplicates of existing entries). In order to avoid this, an encoder +that uses the dynamic table has to keep track of each dynamic table entry +referenced by each field section until those representations are acknowledged by +the decoder; see Section 4.4.1.¶
+To ensure that the encoder is not prevented from adding new entries, the encoder +can avoid referencing entries that are close to eviction. Rather than +reference such an entry, the encoder can emit a Duplicate instruction +(Section 4.3.4), and reference the duplicate instead.¶
+Determining which entries are too close to eviction to reference is an encoder +preference. One heuristic is to target a fixed amount of available space in the +dynamic table: either unused space or space that can be reclaimed by evicting +non-blocking entries. To achieve this, the encoder can maintain a draining +index, which is the smallest absolute index (Section 3.2.4) in the dynamic table +that it will emit a reference for. As new entries are inserted, the encoder +increases the draining index to maintain the section of the table that it will +not reference. If the encoder does not create new references to entries with an +absolute index lower than the draining index, the number of unacknowledged +references to those entries will eventually become zero, allowing them to be +evicted.¶
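The draining-index heuristic above can be sketched as follows. This is a simplified illustration, not part of the specification; the function name and the offset-based indexing (position 0 is the oldest retained entry) are assumptions.¶

```python
def draining_index(entry_sizes, capacity, free_target):
    """Offset (from the oldest retained entry) of the first entry the
    encoder will still reference, keeping roughly free_target bytes of
    the table either unused or reclaimable by evicting old entries."""
    used = sum(entry_sizes)                    # current dynamic table size
    reclaim_goal = free_target - (capacity - used)
    offset, reclaimed = 0, 0
    # Walk from the oldest entry until enough space is reclaimable.
    while offset < len(entry_sizes) and reclaimed < reclaim_goal:
        reclaimed += entry_sizes[offset]
        offset += 1
    return offset
```

Entries before the returned offset are "draining": rather than referencing them, the encoder duplicates the ones it still needs, so their reference counts can fall to zero and they become evictable.¶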
+Because QUIC does not guarantee order between data on different streams, a +decoder might encounter a representation that references a dynamic table entry +that it has not yet received.¶
+Each encoded field section contains a Required Insert Count (Section 4.5.1), +the lowest possible value for the Insert Count with which the field section can +be decoded. For a field section encoded using references to the dynamic table, +the Required Insert Count is one larger than the largest absolute index of all +referenced dynamic table entries. For a field section encoded with no references +to the dynamic table, the Required Insert Count is zero.¶
+When the decoder receives an encoded field section with a Required Insert Count +greater than its own Insert Count, the stream cannot be processed immediately, +and is considered "blocked"; see Section 2.2.1.¶
+The decoder specifies an upper bound on the number of streams that can be +blocked using the SETTINGS_QPACK_BLOCKED_STREAMS setting; see Section 5. +An encoder MUST limit the number of streams that could become blocked to the +value of SETTINGS_QPACK_BLOCKED_STREAMS at all times. If a decoder encounters +more blocked streams than it promised to support, it MUST treat this as a +connection error of type QPACK_DECOMPRESSION_FAILED.¶
+Note that the decoder might not become blocked on every stream that risks +becoming blocked.¶
+An encoder can decide whether to risk having a stream become blocked. If +permitted by the value of SETTINGS_QPACK_BLOCKED_STREAMS, compression efficiency +can often be improved by referencing dynamic table entries that are still in +transit, but if there is loss or reordering the stream can become blocked at the +decoder. An encoder can avoid the risk of blocking by only referencing dynamic +table entries that have been acknowledged, but this could mean using literals. +Since literals make the encoded field section larger, this can result in the +encoder becoming blocked on congestion or flow control limits.¶
+Writing instructions on streams that are limited by flow control can produce +deadlocks.¶
+A decoder might stop issuing flow control credit on the stream that carries an +encoded field section until the necessary updates are received on the encoder +stream. If the granting of flow control credit on the encoder stream (or the +connection as a whole) depends on the consumption and release of data on the +stream carrying the encoded field section, a deadlock might result.¶
+More generally, a stream containing a large instruction can become deadlocked if +the decoder withholds flow control credit until the instruction is completely +received.¶
+To avoid these deadlocks, an encoder SHOULD avoid writing an instruction unless +sufficient stream and connection flow control credit is available for the entire +instruction.¶
+The Known Received Count is the total number of dynamic table insertions and +duplications acknowledged by the decoder. The encoder tracks the Known Received +Count in order to identify which dynamic table entries can be referenced without +potentially blocking a stream. The decoder tracks the Known Received Count in +order to be able to send Insert Count Increment instructions.¶
+A Section Acknowledgment instruction (Section 4.4.1) implies that +the decoder has received all dynamic table state necessary to decode the field +section. If the Required Insert Count of the acknowledged field section is +greater than the current Known Received Count, Known Received Count is updated +to the value of the Required Insert Count.¶
+An Insert Count Increment instruction (Section 4.4.3) increases the +Known Received Count by its Increment parameter. See Section 2.2.2.3 for +guidance.¶
+As in HPACK, the decoder processes a series of representations and emits the +corresponding field sections. It also processes instructions received on the +encoder stream that modify the dynamic table. Note that encoded field sections +and encoder stream instructions arrive on separate streams. This is unlike +HPACK, where encoded field sections (header blocks) can contain instructions +that modify the dynamic table, and there is no dedicated stream of HPACK +instructions.¶
+The decoder MUST emit field lines in the order their representations appear in +the encoded field section.¶
+Upon receipt of an encoded field section, the decoder examines the Required +Insert Count. When the Required Insert Count is less than or equal to the +decoder's Insert Count, the field section can be processed immediately. +Otherwise, the stream on which the field section was received becomes blocked.¶
+While blocked, encoded field section data SHOULD remain in the blocked stream's +flow control window. A stream becomes unblocked when the Insert Count becomes +greater than or equal to the Required Insert Count for all encoded field +sections the decoder has started reading from the stream.¶
+When processing encoded field sections, the decoder expects the Required Insert +Count to equal the lowest possible value for the Insert Count with which the +field section can be decoded, as prescribed in Section 2.1.2. If it +encounters a Required Insert Count smaller than expected, it MUST treat this as +a connection error of type QPACK_DECOMPRESSION_FAILED; see +Section 2.2.3. If it encounters a Required Insert Count larger than +expected, it MAY treat this as a connection error of type +QPACK_DECOMPRESSION_FAILED.¶
+The decoder signals the following events by emitting decoder instructions +(Section 4.4) on the decoder stream.¶
+After the decoder finishes decoding a field section encoded using +representations containing dynamic table references, it MUST emit a Section +Acknowledgment instruction (Section 4.4.1). A stream may carry +multiple field sections in the case of intermediate responses, trailers, and +pushed requests. The encoder interprets each Section Acknowledgment +instruction as acknowledging the earliest unacknowledged field section +containing dynamic table references sent on the given stream.¶
+When an endpoint receives a stream reset before the end of a stream or before +all encoded field sections are processed on that stream, or when it abandons +reading of a stream, it generates a Stream Cancellation instruction; see +Section 4.4.2. This signals to the encoder that all references to the +dynamic table on that stream are no longer outstanding. A decoder with a +maximum dynamic table capacity (Section 3.2.3) equal to +zero MAY omit sending Stream Cancellations, because the encoder cannot have any +dynamic table references. An encoder cannot infer from this instruction that +any updates to the dynamic table have been received.¶
+The Section Acknowledgment and Stream Cancellation instructions permit the +encoder to remove references to entries in the dynamic table. When an entry +with absolute index lower than the Known Received Count has zero references, +then it is considered evictable; see Section 2.1.1.¶
+After receiving new table entries on the encoder stream, the decoder chooses +when to emit Insert Count Increment instructions; see +Section 4.4.3. Emitting this instruction after adding each new +dynamic table entry will provide the timeliest feedback to the encoder, but +could be redundant with other decoder feedback. By delaying an Insert Count +Increment instruction, the decoder might be able to coalesce multiple Insert +Count Increment instructions, or replace them entirely with Section +Acknowledgments; see Section 4.4.1. However, delaying too long +may lead to compression inefficiencies if the encoder waits for an entry to be +acknowledged before using it.¶
+If the decoder encounters a reference in a field line representation to a +dynamic table entry that has already been evicted or that has an absolute +index greater than or equal to the declared Required Insert Count +(Section 4.5.1), it MUST treat this as a connection error of type +QPACK_DECOMPRESSION_FAILED.¶
+If the decoder encounters a reference in an encoder instruction to a dynamic +table entry that has already been evicted, it MUST treat this as a connection +error of type QPACK_ENCODER_STREAM_ERROR.¶
+Unlike in HPACK, entries in the QPACK static and dynamic tables are addressed +separately. The following sections describe how entries in each table are +addressed.¶
+The static table consists of a predefined list of field lines, each of which has +a fixed index over time. Its entries are defined in Appendix A.¶
+All entries in the static table have a name and a value. However, values can be +empty (that is, have a length of 0). Each entry is identified by a unique +index.¶
+Note that the QPACK static table is indexed from 0, whereas the HPACK static +table is indexed from 1.¶
+When the decoder encounters an invalid static table index in a field line +representation it MUST treat this as a connection error of type +QPACK_DECOMPRESSION_FAILED. If this index is received on the encoder stream, +this MUST be treated as a connection error of type QPACK_ENCODER_STREAM_ERROR.¶
+The dynamic table consists of a list of field lines maintained in first-in, +first-out order. Each HTTP/3 endpoint holds a dynamic table that is initially +empty. Entries are added by encoder instructions received on the encoder +stream; see Section 4.3.¶
+The dynamic table can contain duplicate entries (i.e., entries with the same +name and same value). Therefore, duplicate entries MUST NOT be treated as an +error by the decoder.¶
+Dynamic table entries can have empty values.¶
+The size of the dynamic table is the sum of the size of its entries.¶
+The size of an entry is the sum of its name's length in bytes, its value's +length in bytes, and 32. The size of an entry is calculated using the length of +its name and value without Huffman encoding applied.¶
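The size rule can be stated directly in code (a trivial sketch; the function name is illustrative):¶

```python
def entry_size(name: bytes, value: bytes) -> int:
    # Lengths are counted on the raw octets, before any Huffman
    # coding is applied, plus a fixed 32-byte per-entry overhead.
    return len(name) + len(value) + 32
```

An entry with an empty name and value still occupies 32 bytes, which is why the smallest possible entry size bounds the number of entries a table of a given capacity can hold.¶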
+The encoder sets the capacity of the dynamic table, which serves as the upper +limit on its size. The initial capacity of the dynamic table is zero. The +encoder sends a Set Dynamic Table Capacity instruction +(Section 4.3.1) with a non-zero capacity to begin using the dynamic +table.¶
+Before a new entry is added to the dynamic table, entries are evicted from the +end of the dynamic table until the size of the dynamic table is less than or +equal to (table capacity - size of new entry). The encoder MUST NOT cause a +dynamic table entry to be evicted unless that entry is evictable; see +Section 2.1.1. The new entry is then added to the table. It is an +error if the encoder attempts to add an entry that is larger than the dynamic +table capacity; the decoder MUST treat this as a connection error of type +QPACK_ENCODER_STREAM_ERROR.¶
+A new entry can reference an entry in the dynamic table that will be evicted +when adding this new entry into the dynamic table. Implementations are +cautioned to avoid deleting the referenced name or value if the referenced entry +is evicted from the dynamic table prior to inserting the new entry.¶
+Whenever the dynamic table capacity is reduced by the encoder +(Section 4.3.1), entries are evicted from the end of the dynamic +table until the size of the dynamic table is less than or equal to the new table +capacity. This mechanism can be used to completely clear entries from the +dynamic table by setting a capacity of 0, which can subsequently be restored.¶
+To bound the memory requirements of the decoder, the decoder limits the maximum +value the encoder is permitted to set for the dynamic table capacity. In +HTTP/3, this limit is determined by the value of +SETTINGS_QPACK_MAX_TABLE_CAPACITY sent by the decoder; see Section 5. +The encoder MUST NOT set a dynamic table capacity that exceeds this maximum, but +it can choose to use a lower dynamic table capacity; see +Section 4.3.1.¶
+For clients using 0-RTT data in HTTP/3, the server's maximum table capacity is +the remembered value of the setting, or zero if the value was not previously +sent. When the client's 0-RTT value of the SETTING is zero, the server MAY set +it to a non-zero value in its SETTINGS frame. If the remembered value is +non-zero, the server MUST send the same non-zero value in its SETTINGS frame. If +it specifies any other value, or omits SETTINGS_QPACK_MAX_TABLE_CAPACITY from +SETTINGS, the encoder MUST treat this as a connection error of type +QPACK_DECODER_STREAM_ERROR.¶
+For HTTP/3 servers and HTTP/3 clients when 0-RTT is not attempted or is +rejected, the maximum table capacity is 0 until the encoder processes a SETTINGS +frame with a non-zero value of SETTINGS_QPACK_MAX_TABLE_CAPACITY.¶
+When the maximum table capacity is zero, the encoder MUST NOT insert entries +into the dynamic table, and MUST NOT send any encoder instructions on the +encoder stream.¶
+Each entry possesses an absolute index that is fixed for the lifetime of that +entry. The first entry inserted has an absolute index of 0; indices increase +by one with each insertion.¶
+Relative indices begin at zero and increase in the opposite direction from the +absolute index. Determining which entry has a relative index of 0 depends on +the context of the reference.¶
+In encoder instructions (Section 4.3), a relative index of 0 +refers to the most recently inserted value in the dynamic table. Note that this +means the entry referenced by a given relative index will change while +interpreting instructions on the encoder stream.¶
+ +Unlike in encoder instructions, relative indices in field line representations +are relative to the Base at the beginning of the encoded field section; see +Section 4.5.1. This ensures that references are stable even if encoded field +sections and dynamic table updates are processed out of order.¶
+In a field line representation, a relative index of 0 refers to the entry with +absolute index equal to Base - 1.¶
+ +Post-Base indices are used in field line representations for entries with +absolute indices greater than or equal to Base, starting at 0 for the entry with +absolute index equal to Base, and increasing in the same direction as the +absolute index.¶
+Post-Base indices allow an encoder to process a field section in a single pass +and include references to entries added while processing this (or other) field +sections.¶
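The three index spaces above all resolve to absolute indices. A sketch of the conversions (helper names are illustrative):¶

```python
def abs_from_encoder_relative(insert_count: int, rel: int) -> int:
    # On the encoder stream, relative index 0 is the most recently
    # inserted entry, i.e. absolute index Insert Count - 1.
    return insert_count - 1 - rel

def abs_from_field_relative(base: int, rel: int) -> int:
    # In field line representations, relative index 0 is the entry
    # with absolute index Base - 1.
    return base - 1 - rel

def abs_from_post_base(base: int, post: int) -> int:
    # Post-Base index 0 is the entry with absolute index Base.
    return base + post
```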
+ +The prefixed integer from Section 5.1 of [RFC7541] is used heavily throughout +this document. The format from [RFC7541] is used unmodified. Note, however, +that QPACK uses some prefix sizes not actually used in HPACK.¶
+QPACK implementations MUST be able to decode integers up to and including 62 +bits long.¶
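As a sketch, the prefixed integer encoding of Section 5.1 of [RFC7541] can be implemented as follows (function names are illustrative). The second assertion below reproduces the value 1337 with a 5-bit prefix from Appendix C.1.2 of [RFC7541].¶

```python
def encode_prefix_int(value: int, n: int, high_bits: int = 0) -> bytes:
    # N-bit prefixed integer (RFC 7541, Section 5.1). high_bits
    # carries any pattern or flag bits sharing the first octet.
    limit = (1 << n) - 1
    if value < limit:
        return bytes([(high_bits << n) | value])
    out = [(high_bits << n) | limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)  # continuation bit set
        value //= 128
    out.append(value)
    return bytes(out)

def decode_prefix_int(data: bytes, n: int) -> tuple:
    # Returns (value, number of octets consumed).
    limit = (1 << n) - 1
    value = data[0] & limit
    if value < limit:
        return value, 1
    shift, i = 0, 1
    while True:
        octet = data[i]
        value += (octet & 0x7F) << shift
        shift += 7
        i += 1
        if not octet & 0x80:
            return value, i
```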
+The string literal defined by Section 5.2 of [RFC7541] is also used throughout. +This string format includes optional Huffman encoding.¶
+HPACK defines string literals to begin on a byte boundary. They begin with a +single bit flag, denoted as 'H' in this document (indicating whether the string +is Huffman-coded), followed by the Length encoded as a 7-bit prefix integer, and +finally Length bytes of data. When Huffman encoding is enabled, the Huffman +table from Appendix B of [RFC7541] is used without modification and Length +indicates the size of the string after encoding.¶
+This document expands the definition of string literals by permitting them to +begin other than on a byte boundary. An "N-bit prefix string literal" begins +mid-byte, with the first (8-N) bits allocated to a previous field. The string +uses one bit for the Huffman flag, followed by the Length encoded as an +(N-1)-bit prefix integer. The prefix size, N, can have a value between 2 and 8 +inclusive. The remainder of the string literal is unmodified.¶
+A string literal without a prefix length noted is an 8-bit prefix string literal +and follows the definitions in [RFC7541] without modification.¶
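A minimal sketch of the default 8-bit prefix string literal, with the Huffman flag left unset (H = 0); the function name is illustrative and Huffman coding is omitted:¶

```python
def encode_string8(s: bytes) -> bytes:
    # 8-bit prefix string literal: the H flag occupies the top bit
    # (0 here, meaning no Huffman coding), and the Length follows as
    # a 7-bit prefix integer, then Length raw octets.
    if len(s) < 127:
        return bytes([len(s)]) + s
    out, rest = [127], len(s) - 127   # 7-bit prefix saturated
    while rest >= 128:
        out.append((rest % 128) + 128)
        rest //= 128
    out.append(rest)
    return bytes(out) + s
```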
+QPACK defines two unidirectional stream types: the encoder stream, a +unidirectional stream of type 0x02 carrying an unframed sequence of encoder +instructions; and the decoder stream, a unidirectional stream of type 0x03 +carrying an unframed sequence of decoder instructions.¶
+HTTP/3 endpoints contain a QPACK encoder and decoder. Each endpoint MUST +initiate at most one encoder stream and at most one decoder stream. Receipt of a +second instance of either stream type MUST be treated as a connection error of +type H3_STREAM_CREATION_ERROR.¶
+These streams MUST NOT be closed. Closure of either unidirectional stream type +MUST be treated as a connection error of type H3_CLOSED_CRITICAL_STREAM.¶
+An endpoint MAY avoid creating an encoder stream if it will not be used (for +example if its encoder does not wish to use the dynamic table, or if the maximum +size of the dynamic table permitted by the peer is zero).¶
+An endpoint MAY avoid creating a decoder stream if its decoder sets the maximum +capacity of the dynamic table to zero.¶
+An endpoint MUST allow its peer to create an encoder stream and a decoder stream +even if the connection's settings prevent their use.¶
+An encoder sends encoder instructions on the encoder stream to set the capacity +of the dynamic table and add dynamic table entries. Instructions adding table +entries can use existing entries to avoid transmitting redundant information. +The name can be transmitted as a reference to an existing entry in the static or +the dynamic table or as a string literal. For entries that already exist in +the dynamic table, the full entry can also be used by reference, creating a +duplicate entry.¶
+An encoder informs the decoder of a change to the dynamic table capacity using +an instruction that starts with the '001' 3-bit pattern. This is followed +by the new dynamic table capacity represented as an integer with a 5-bit prefix; +see Section 4.1.1.¶
+The new capacity MUST be lower than or equal to the limit described in +Section 3.2.3. In HTTP/3, this limit is the value of the +SETTINGS_QPACK_MAX_TABLE_CAPACITY parameter (Section 5) received from +the decoder. The decoder MUST treat a new dynamic table capacity value that +exceeds this limit as a connection error of type QPACK_ENCODER_STREAM_ERROR.¶
+Reducing the dynamic table capacity can cause entries to be evicted; see +Section 3.2.2. This MUST NOT cause the eviction of entries that are not +evictable; see Section 2.1.1. Changing the capacity of the dynamic +table is not acknowledged as this instruction does not insert an entry.¶
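Putting the '001' pattern together with a 5-bit prefix integer, a Set Dynamic Table Capacity instruction could be serialized as follows (an illustrative sketch, not a reference encoder):¶

```python
def set_dynamic_table_capacity(capacity: int) -> bytes:
    # '001' in the top three bits, then the capacity as a 5-bit
    # prefix integer (RFC 7541, Section 5.1).
    if capacity < 31:
        return bytes([0x20 | capacity])
    out, rest = [0x20 | 31], capacity - 31
    while rest >= 128:
        out.append((rest % 128) + 128)  # continuation octets
        rest //= 128
    out.append(rest)
    return bytes(out)
```

For example, a capacity of 220 does not fit in the 5-bit prefix, so it saturates the prefix at 31 and continues with 189 in the first continuation octet.¶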
+An encoder adds an entry to the dynamic table where the field name matches the +field name of an entry stored in the static or the dynamic table using an +instruction that starts with the '1' 1-bit pattern. The second ('T') bit +indicates whether the reference is to the static or dynamic table. The 6-bit +prefix integer (Section 4.1.1) that follows is used to locate the table +entry for the field name. When T=1, the number represents the static table +index; when T=0, the number is the relative index of the entry in the dynamic +table.¶
+The field name reference is followed by the field value represented as a string +literal; see Section 4.1.2.¶
+ +An encoder adds an entry to the dynamic table where both the field name and the +field value are represented as string literals using an instruction that starts +with the '01' 2-bit pattern.¶
+This is followed by the name represented as a 6-bit prefix string literal, and +the value represented as an 8-bit prefix string literal; see +Section 4.1.2.¶
+ +An encoder duplicates an existing entry in the dynamic table using an +instruction that starts with the '000' 3-bit pattern. This is followed by +the relative index of the existing entry represented as an integer with a 5-bit +prefix; see Section 4.1.1.¶
+The existing entry is re-inserted into the dynamic table without resending +either the name or the value. This is useful to avoid adding a reference to an +older entry, which might block inserting new entries.¶
+A decoder sends decoder instructions on the decoder stream to inform the encoder +about the processing of field sections and table updates to ensure consistency +of the dynamic table.¶
+After processing an encoded field section whose declared Required Insert Count +is not zero, the decoder emits a Section Acknowledgment instruction. The +instruction starts with the '1' 1-bit pattern, followed by the field +section's associated stream ID encoded as a 7-bit prefix integer; see +Section 4.1.1.¶
+This instruction is used as described in Section 2.1.4 and +in Section 2.2.2.¶
+If an encoder receives a Section Acknowledgment instruction referring to a +stream on which every encoded field section with a non-zero Required Insert +Count has already been acknowledged, this MUST be treated as a connection error +of type QPACK_DECODER_STREAM_ERROR.¶
+The Section Acknowledgment instruction might increase the Known Received Count; +see Section 2.1.4.¶
+When a stream is reset or reading is abandoned, the decoder emits a Stream +Cancellation instruction. The instruction starts with the '01' 2-bit +pattern, followed by the stream ID of the affected stream encoded as a +6-bit prefix integer.¶
+This instruction is used as described in Section 2.2.2.¶
+The Insert Count Increment instruction starts with the '00' 2-bit pattern, +followed by the Increment encoded as a 6-bit prefix integer. This instruction +increases the Known Received Count (Section 2.1.4) by the value of +the Increment parameter. The decoder should send an Increment value that +increases the Known Received Count to the total number of dynamic table +insertions and duplications processed so far.¶
+An encoder that receives an Increment field equal to zero, or one that increases +the Known Received Count beyond what the encoder has sent, MUST treat this as a +connection error of type QPACK_DECODER_STREAM_ERROR.¶
+An encoded field section consists of a prefix and a possibly empty sequence of +representations defined in this section. Each representation corresponds to a +single field line. These representations reference the static table or the +dynamic table in a particular state, but do not modify that state.¶
+Encoded field sections are carried in frames on streams defined by the enclosing +protocol.¶
+Each encoded field section is prefixed with two integers. The Required Insert +Count is encoded as an integer with an 8-bit prefix using the encoding described +in Section 4.5.1.1. The Base is encoded as a sign bit ('S') and a Delta Base value +with a 7-bit prefix; see Section 4.5.1.2.¶
+Required Insert Count identifies the state of the dynamic table needed to +process the encoded field section. Blocking decoders use the Required Insert +Count to determine when it is safe to process the rest of the field section.¶
+The encoder transforms the Required Insert Count as follows before encoding:¶
++ if ReqInsertCount == 0: + EncInsertCount = 0 + else: + EncInsertCount = (ReqInsertCount mod (2 * MaxEntries)) + 1 +¶ +
Here MaxEntries is the maximum number of entries that the dynamic table can +have. The smallest entry has empty name and value strings and has the size of +32. Hence MaxEntries is calculated as¶
+ MaxEntries = floor( MaxTableCapacity / 32 ) +¶ +
MaxTableCapacity is the maximum capacity of the dynamic table as specified by +the decoder; see Section 3.2.3.¶
This encoding limits the length of the prefix on long-lived connections.¶
+The decoder can reconstruct the Required Insert Count using an algorithm such as +the following. If the decoder encounters a value of EncodedInsertCount that +could not have been produced by a conformant encoder, it MUST treat this as a +connection error of type QPACK_DECOMPRESSION_FAILED.¶
+TotalNumberOfInserts is the total number of inserts into the decoder's dynamic +table.¶
++ FullRange = 2 * MaxEntries + if EncodedInsertCount == 0: + ReqInsertCount = 0 + else: + if EncodedInsertCount > FullRange: + Error + MaxValue = TotalNumberOfInserts + MaxEntries + + # MaxWrapped is the largest possible value of + # ReqInsertCount that is 0 mod 2*MaxEntries + MaxWrapped = floor(MaxValue / FullRange) * FullRange + ReqInsertCount = MaxWrapped + EncodedInsertCount - 1 + + # If ReqInsertCount exceeds MaxValue, the Encoder's value + # must have wrapped one fewer time + if ReqInsertCount > MaxValue: + if ReqInsertCount <= FullRange: + Error + ReqInsertCount -= FullRange + + # Value of 0 must be encoded as 0. + if ReqInsertCount == 0: + Error +¶ +
For example, if the dynamic table is 100 bytes, then the Required Insert Count +will be encoded modulo 6. If a decoder has received 10 inserts, then an encoded +value of 4 indicates that the Required Insert Count is 9 for the field section.¶
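The encoder transform and the decoder reconstruction above can be transcribed directly into executable form (function and parameter names are illustrative). The worked example from the text, a 100-byte table with 10 inserts received and an encoded value of 4, decodes to a Required Insert Count of 9:¶

```python
def encode_required_insert_count(req_insert_count, max_table_capacity):
    # Encoder-side transform. Assumes a non-zero MaxTableCapacity
    # whenever req_insert_count > 0.
    max_entries = max_table_capacity // 32
    if req_insert_count == 0:
        return 0
    return req_insert_count % (2 * max_entries) + 1

def decode_required_insert_count(encoded, total_inserts, max_table_capacity):
    # Decoder-side reconstruction; raises on values that no
    # conformant encoder could have produced.
    max_entries = max_table_capacity // 32
    full_range = 2 * max_entries
    if encoded == 0:
        return 0
    if encoded > full_range:
        raise ValueError("QPACK_DECOMPRESSION_FAILED")
    max_value = total_inserts + max_entries
    # Largest multiple of full_range not exceeding max_value.
    max_wrapped = (max_value // full_range) * full_range
    req = max_wrapped + encoded - 1
    if req > max_value:
        # The encoder's value must have wrapped one fewer time.
        if req <= full_range:
            raise ValueError("QPACK_DECOMPRESSION_FAILED")
        req -= full_range
    if req == 0:
        raise ValueError("QPACK_DECOMPRESSION_FAILED")
    return req
```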
+The Base is used to resolve references in the dynamic table as described in +Section 3.2.5.¶
+To save space, the Base is encoded relative to the Required Insert Count using a +one-bit sign ('S') and the Delta Base value. A sign bit of 0 indicates that the +Base is greater than or equal to the value of the Required Insert Count; the +decoder adds the value of Delta Base to the Required Insert Count to determine +the value of the Base. A sign bit of 1 indicates that the Base is less than the +Required Insert Count; the decoder subtracts the value of Delta Base from the +Required Insert Count and also subtracts one to determine the value of the Base. +That is:¶
++ if S == 0: + Base = ReqInsertCount + DeltaBase + else: + Base = ReqInsertCount - DeltaBase - 1 +¶ +
A single-pass encoder determines the Base before encoding a field section. If +the encoder inserted entries in the dynamic table while encoding the field +section and is referencing them, Required Insert Count will be greater than the +Base, so the encoded difference is negative and the sign bit is set to 1. If +the field section was not encoded using representations that reference the most +recent entry in the table and did not insert any new entries, the Base will be +greater than the Required Insert Count, so the delta will be positive and the +sign bit is set to 0.¶
+An encoder that produces table updates before encoding a field section might set +Base to the value of Required Insert Count. In such case, both the sign bit and +the Delta Base will be set to zero.¶
+A field section that was encoded without references to the dynamic table can use +any value for the Base; setting Delta Base to zero is one of the most efficient +encodings.¶
+For example, with a Required Insert Count of 9, a decoder receives an S bit of 1 +and a Delta Base of 2. This sets the Base to 6 and enables post-base indexing +for three entries. In this example, a relative index of 1 refers to the 5th +entry that was added to the table; a post-base index of 1 refers to the 8th +entry.¶
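The Base reconstruction above, applied to this example (Required Insert Count 9, S bit 1, Delta Base 2, yielding Base 6), can be written as (the function name is illustrative):¶

```python
def decode_base(req_insert_count: int, sign: int, delta_base: int) -> int:
    # S = 0: Base is at or above the Required Insert Count.
    # S = 1: Base is below it; the extra minus one lets
    #        DeltaBase = 0 express Base = ReqInsertCount - 1.
    if sign == 0:
        return req_insert_count + delta_base
    return req_insert_count - delta_base - 1
```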
+An indexed field line representation identifies an entry in the static table, +or an entry in the dynamic table with an absolute index less than the value of +the Base.¶
+ +This representation starts with the '1' 1-bit pattern, followed by the 'T' bit +indicating whether the reference is into the static or dynamic table. The 6-bit +prefix integer (Section 4.1.1) that follows is used to locate the +table entry for the field line. When T=1, the number represents the static +table index; when T=0, the number is the relative index of the entry in the +dynamic table.¶
+An indexed field line with post-base index representation identifies an entry +in the dynamic table with an absolute index greater than or equal to the value +of the Base.¶
+ +This representation starts with the '0001' 4-bit pattern. This is followed +by the post-base index (Section 3.2.6) of the matching field line, represented +as an integer with a 4-bit prefix; see Section 4.1.1.¶
+A literal field line with name reference representation encodes a field line +where the field name matches the field name of an entry in the static table, or +the field name of an entry in the dynamic table with an absolute index less than +the value of the Base.¶
+ +This representation starts with the '01' 2-bit pattern. The following bit, +'N', indicates whether an intermediary is permitted to add this field line to +the dynamic table on subsequent hops. When the 'N' bit is set, the encoded field +line MUST always be encoded with a literal representation. In particular, when a +peer sends a field line that it received represented as a literal field line +with the 'N' bit set, it MUST use a literal representation to forward this field +line. This bit is intended for protecting field values that are not to be put +at risk by compressing them; see Section 7 for more details.¶
+The fourth ('T') bit indicates whether the reference is to the static or dynamic +table. The 4-bit prefix integer (Section 4.1.1) that follows is used to +locate the table entry for the field name. When T=1, the number represents the +static table index; when T=0, the number is the relative index of the entry in +the dynamic table.¶
+Only the field name is taken from the dynamic table entry; the field value is +encoded as an 8-bit prefix string literal; see Section 4.1.2.¶
+A literal field line with post-base name reference representation encodes a +field line where the field name matches the field name of a dynamic table entry +with an absolute index greater than or equal to the value of the Base.¶
+ +This representation starts with the '0000' 4-bit pattern. The fifth bit is +the 'N' bit as described in Section 4.5.4. This is followed by a +post-base index of the dynamic table entry (Section 3.2.6) encoded as an +integer with a 3-bit prefix; see Section 4.1.1.¶
+Only the field name is taken from the dynamic table entry; the field value is +encoded as an 8-bit prefix string literal; see Section 4.1.2.¶
+The literal field line with literal name representation encodes a +field name and a field value as string literals.¶
+ +This representation starts with the '001' 3-bit pattern. The fourth bit is +the 'N' bit as described in Section 4.5.4. The name follows, +represented as a 4-bit prefix string literal, then the value, represented as an +8-bit prefix string literal; see Section 4.1.2.¶
+QPACK defines two settings for the HTTP/3 SETTINGS frame:¶
+The default value is zero. See Section 3.2 for usage. This is +the equivalent of the SETTINGS_HEADER_TABLE_SIZE from HTTP/2.¶
+The default value is zero. See Section 2.1.2.¶
+The following error codes are defined for HTTP/3 to indicate failures of +QPACK that prevent the stream or connection from continuing:¶
+The decoder failed to interpret an encoded field section and is not able to +continue decoding that field section.¶
+The decoder failed to interpret an encoder instruction received on the +encoder stream.¶
+The encoder failed to interpret a decoder instruction received on the +decoder stream.¶
+This section describes potential areas of security concern with QPACK:¶
+QPACK reduces the encoded size of field sections by exploiting the redundancy +inherent in protocols like HTTP. The ultimate goal of this is to reduce the +amount of data that is required to send HTTP requests or responses.¶
+The compression context used to encode header and trailer fields can be probed +by an attacker who can both define fields to be encoded and transmitted and +observe the length of those fields once they are encoded. When an attacker can +do both, they can adaptively modify requests in order to confirm guesses about +the dynamic table state. If a guess is compressed into a shorter length, the +attacker can observe the encoded length and infer that the guess was correct.¶
+This is possible even over the Transport Layer Security Protocol (TLS, see +[TLS]), because while TLS provides confidentiality protection for +content, it only provides a limited amount of protection for the length of that +content.¶
+Padding schemes only provide limited protection against an attacker with these +capabilities, potentially only forcing an increased number of guesses to learn +the length associated with a given guess. Padding schemes also work directly +against compression by increasing the number of bits that are transmitted.¶
+Attacks like CRIME ([CRIME]) demonstrated the existence of these general +attacker capabilities. The specific attack exploited the fact that DEFLATE +([RFC1951]) removes redundancy based on prefix matching. This permitted the +attacker to confirm guesses a character at a time, reducing an exponential-time +attack into a linear-time attack.¶
+QPACK mitigates but does not completely prevent attacks modeled on CRIME +([CRIME]) by forcing a guess to match an entire field line, rather than +individual characters. An attacker can only learn whether a guess is correct or +not, so is reduced to a brute force guess for the field values associated with a +given field name.¶
+The viability of recovering specific field values therefore depends on the +entropy of values. As a result, values with high entropy are unlikely to be +recovered successfully. However, values with low entropy remain vulnerable.¶
+Attacks of this nature are possible any time that two mutually distrustful +entities control requests or responses that are placed onto a single HTTP/3 +connection. If the shared QPACK compressor permits one entity to add entries to +the dynamic table, and the other to access those entries to encode chosen field +lines, then the attacker can learn the state of the table by observing the +length of the encoded output.¶
+Having requests or responses from mutually distrustful entities occurs when an +intermediary either:¶
+Web browsers also need to assume that requests made on the same connection by +different web origins ([RFC6454]) are made by mutually distrustful entities.¶
+Users of HTTP that require confidentiality for header or trailer fields can use +values with entropy sufficient to make guessing infeasible. However, this is +impractical as a general solution because it forces all users of HTTP to take +steps to mitigate attacks. It would impose new constraints on how HTTP is used.¶
+Rather than impose constraints on users of HTTP, an implementation of QPACK can +instead constrain how compression is applied in order to limit the potential for +dynamic table probing.¶
+An ideal solution segregates access to the dynamic table based on the entity +that is constructing the message. Field values that are added to the table are +attributed to an entity, and only the entity that created a particular value can +extract that value.¶
+To improve compression performance of this option, certain entries might be +tagged as being public. For example, a web browser might make the values of the +Accept-Encoding header field available in all requests.¶
+An encoder without good knowledge of the provenance of field values might +instead introduce a penalty for many field lines with the same field name and +different values. This penalty could cause a large number of attempts to guess +a field value to result in the field not being compared to the dynamic table +entries in future messages, effectively preventing further guesses.¶
+Simply removing entries corresponding to the field from the dynamic table can +be ineffectual if the attacker has a reliable way of causing values to be +reinstalled. For example, a request to load an image in a web browser +typically includes the Cookie header field (a potentially highly valued target +for this sort of attack), and web sites can easily force an image to be +loaded, thereby refreshing the entry in the dynamic table.¶
+This response might be made inversely proportional to the length of the +field value. Disabling access to the dynamic table for a given field name might +occur for shorter values more quickly or with higher probability than for longer +values.¶
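A heuristic of this shape could be sketched as follows. This is purely illustrative: the class name, threshold, and length scaling are invented here and are not part of QPACK; an implementation would tune them to its own threat model.

```python
from collections import defaultdict

class GuessPenalty:
    """Hypothetical heuristic: stop using the dynamic table for a field
    name once too many distinct values have been seen for it, tolerating
    fewer distinct values for shorter (easier to brute-force) values."""

    def __init__(self, base_threshold=8):
        self.base_threshold = base_threshold
        self.values_seen = defaultdict(set)

    def may_use_dynamic_table(self, name, value):
        seen = self.values_seen[name]
        seen.add(value)
        # Scale the tolerance with value length: short values earn a
        # penalty sooner, per the inverse-proportionality idea above.
        threshold = self.base_threshold * max(1, len(value) // 8)
        return len(seen) <= threshold
```

Once may_use_dynamic_table returns False for a name, the encoder would fall back to literal representations for that field, preventing further length-based probing of the table.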
+Implementations can also choose to protect sensitive fields by not compressing +them and instead encoding their value as literals.¶
+Refusing to insert a field line into the dynamic table is only effective if +doing so is avoided on all hops. The never-indexed literal bit (see +Section 4.5.4) can be used to signal to intermediaries that a +particular value was intentionally sent as a literal.¶
+An intermediary MUST NOT re-encode a value that uses a literal representation +with the 'N' bit set with another representation that would index it. If QPACK +is used for re-encoding, a literal representation with the 'N' bit set MUST be +used. If HPACK is used for re-encoding, the never-indexed literal +representation (see Section 6.2.3 of [RFC7541]) MUST be used.¶
+The choice to mark that a field value should never be indexed depends on several +factors. Since QPACK does not protect against guessing an entire field value, +short or low-entropy values are more readily recovered by an adversary. +Therefore, an encoder might choose not to index values with low entropy.¶
+An encoder might also choose not to index values for fields that are considered +to be highly valuable or sensitive to recovery, such as the Cookie or +Authorization header fields.¶
Conversely, an encoder might prefer indexing values for fields that would have little or no value if exposed. For instance, a User-Agent header field does not commonly vary between requests and is sent to any server. In that case, confirmation that a particular User-Agent value has been used provides little value.¶
+Note that these criteria for deciding to use a never-indexed literal +representation will evolve over time as new attacks are discovered.¶
There is no currently known attack against a static Huffman encoding. A study showed that using a static Huffman encoding table creates information leakage; however, the same study concluded that an attacker could not take advantage of this leakage to recover any meaningful amount of information (see [PETAL]).¶
+An attacker can try to cause an endpoint to exhaust its memory. QPACK is +designed to limit both the peak and stable amounts of memory allocated by an +endpoint.¶
+The amount of memory used by the encoder is limited by the protocol using +QPACK through the definition of the maximum size of the dynamic table, and the +maximum number of blocking streams. In HTTP/3, these values are controlled by +the decoder through the settings parameters SETTINGS_QPACK_MAX_TABLE_CAPACITY +and SETTINGS_QPACK_BLOCKED_STREAMS, respectively (see +Section 3.2.3 and Section 2.1.2). The limit on the +size of the dynamic table takes into account the size of the data stored in the +dynamic table, plus a small allowance for overhead. The limit on the number of +blocked streams is only a proxy for the maximum amount of memory required by the +decoder. The actual maximum amount of memory will depend on how much memory the +decoder uses to track each blocked stream.¶
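The size accounting described above can be sketched in Python. The 32-byte per-entry allowance is the overhead defined for dynamic table entries (Section 3.2.1); the class and method names are invented for this sketch. Inserting the two entries from the examples later in this document (:authority=www.example.com and :path=/sample/path) yields a table size of 106, matching the Size=106 shown there.

```python
from collections import deque

ENTRY_OVERHEAD = 32  # per-entry overhead allowance (Section 3.2.1)

class DynamicTable:
    # Minimal sketch of capacity accounting: an entry's size is the sum
    # of its name length, value length, and a fixed 32-byte overhead;
    # the oldest entries are evicted until the new entry fits.
    def __init__(self, capacity):
        self.capacity = capacity
        self.size = 0
        self.entries = deque()  # oldest entry at the left

    def insert(self, name, value):
        entry_size = len(name) + len(value) + ENTRY_OVERHEAD
        if entry_size > self.capacity:
            return False  # can never fit, even with an empty table
        while self.size + entry_size > self.capacity:
            old_name, old_value = self.entries.popleft()
            self.size -= len(old_name) + len(old_value) + ENTRY_OVERHEAD
        self.entries.append((name, value))
        self.size += entry_size
        return True
```

A real encoder must additionally refuse to evict entries that still have outstanding references; that bookkeeping is omitted here.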
+A decoder can limit the amount of state memory used for the dynamic table by +setting an appropriate value for the maximum size of the dynamic table. In +HTTP/3, this is realized by setting an appropriate value for the +SETTINGS_QPACK_MAX_TABLE_CAPACITY parameter. An encoder can limit the amount of +state memory it uses by signaling a lower dynamic table size than the decoder +allows (see Section 3.2.2).¶
A decoder can limit the amount of state memory used for blocked streams by setting an appropriate value for the maximum number of blocked streams. In HTTP/3, this is realized by setting an appropriate value for the SETTINGS_QPACK_BLOCKED_STREAMS parameter. Streams that risk becoming blocked consume no additional state memory on the encoder.¶
+An encoder allocates memory to track all dynamic table references in +unacknowledged field sections. An implementation can directly limit the amount +of state memory by only using as many references to the dynamic table as it +wishes to track; no signaling to the decoder is required. However, limiting +references to the dynamic table will reduce compression effectiveness.¶
+The amount of temporary memory consumed by an encoder or decoder can be limited +by processing field lines sequentially. A decoder implementation does not need +to retain a complete list of field lines while decoding a field section. An +encoder implementation does not need to retain a complete list of field lines +while encoding a field section if it is using a single-pass algorithm. Note +that it might be necessary for an application to retain a complete list of field +lines for other reasons; even if QPACK does not force this to occur, application +constraints might make this necessary.¶
+While the negotiated limit on the dynamic table size accounts for much of the +memory that can be consumed by a QPACK implementation, data that cannot be +immediately sent due to flow control is not affected by this limit. +Implementations should limit the size of unsent data, especially on the decoder +stream where flexibility to choose what to send is limited. Possible responses +to an excess of unsent data might include limiting the ability of the peer to +open new streams, reading only from the encoder stream, or closing the +connection.¶
+An implementation of QPACK needs to ensure that large values for integers, long +encoding for integers, or long string literals do not create security +weaknesses.¶
+An implementation has to set a limit for the values it accepts for integers, as +well as for the encoded length; see Section 4.1.1. In the same way, it +has to set a limit to the length it accepts for string literals; see +Section 4.1.2.¶
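A decoder enforcing such limits might look like the following Python sketch (function name and the choice of default limit are illustrative; 2^62 - 1 is used here simply because it is the largest value representable in a QUIC variable-length integer):

```python
def decode_prefix_int(data, prefix_bits, max_value=2**62 - 1):
    # Decode a prefix integer (Section 4.1.1), rejecting values and
    # encodings that exceed a configured limit before they can be used
    # to exhaust memory or overflow fixed-size integers.
    limit = (1 << prefix_bits) - 1
    value = data[0] & limit
    if value < limit:
        return value, 1  # (decoded value, bytes consumed)
    shift = 0
    for pos, byte in enumerate(data[1:], start=1):
        value += (byte & 0x7F) << shift
        shift += 7
        if value > max_value:
            raise ValueError("integer exceeds acceptable limit")
        if byte & 0x80 == 0:
            return value, pos + 1
    raise ValueError("truncated integer")
```

The same pattern applies to string literals: check the decoded length against a limit before allocating a buffer for the string contents.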
+This document specifies two settings. The entries in the following table are +registered in the "HTTP/3 Settings" registry established in [HTTP3].¶
| Setting Name | Code | Specification | Default |
|---|---|---|---|
| QPACK_MAX_TABLE_CAPACITY | 0x1 | Section 5 | 0 |
| QPACK_BLOCKED_STREAMS | 0x7 | Section 5 | 0 |
This document specifies two stream types. The entries in the following table are +registered in the "HTTP/3 Stream Type" registry established in [HTTP3].¶
| Stream Type | Code | Specification | Sender |
|---|---|---|---|
| QPACK Encoder Stream | 0x02 | Section 4.2 | Both |
| QPACK Decoder Stream | 0x03 | Section 4.2 | Both |
This document specifies three error codes. The entries in the following table +are registered in the "HTTP/3 Error Code" registry established in [HTTP3].¶
| Name | Code | Description | Specification |
|---|---|---|---|
| QPACK_DECOMPRESSION_FAILED | 0x200 | Decoding of a field section failed | Section 6 |
| QPACK_ENCODER_STREAM_ERROR | 0x201 | Error on the encoder stream | Section 6 |
| QPACK_DECODER_STREAM_ERROR | 0x202 | Error on the decoder stream | Section 6 |
This table was generated by analyzing actual Internet traffic in 2018 and +including the most common header fields, after filtering out some unsupported +and non-standard values. Due to this methodology, some of the entries may be +inconsistent or appear multiple times with similar but not identical values. The +order of the entries is optimized to encode the most common header fields with +the smallest number of bytes.¶
| Index | Name | Value |
|---|---|---|
| 0 | :authority | |
| 1 | :path | / |
| 2 | age | 0 |
| 3 | content-disposition | |
| 4 | content-length | 0 |
| 5 | cookie | |
| 6 | date | |
| 7 | etag | |
| 8 | if-modified-since | |
| 9 | if-none-match | |
| 10 | last-modified | |
| 11 | link | |
| 12 | location | |
| 13 | referer | |
| 14 | set-cookie | |
| 15 | :method | CONNECT |
| 16 | :method | DELETE |
| 17 | :method | GET |
| 18 | :method | HEAD |
| 19 | :method | OPTIONS |
| 20 | :method | POST |
| 21 | :method | PUT |
| 22 | :scheme | http |
| 23 | :scheme | https |
| 24 | :status | 103 |
| 25 | :status | 200 |
| 26 | :status | 304 |
| 27 | :status | 404 |
| 28 | :status | 503 |
| 29 | accept | */* |
| 30 | accept | application/dns-message |
| 31 | accept-encoding | gzip, deflate, br |
| 32 | accept-ranges | bytes |
| 33 | access-control-allow-headers | cache-control |
| 34 | access-control-allow-headers | content-type |
| 35 | access-control-allow-origin | * |
| 36 | cache-control | max-age=0 |
| 37 | cache-control | max-age=2592000 |
| 38 | cache-control | max-age=604800 |
| 39 | cache-control | no-cache |
| 40 | cache-control | no-store |
| 41 | cache-control | public, max-age=31536000 |
| 42 | content-encoding | br |
| 43 | content-encoding | gzip |
| 44 | content-type | application/dns-message |
| 45 | content-type | application/javascript |
| 46 | content-type | application/json |
| 47 | content-type | application/x-www-form-urlencoded |
| 48 | content-type | image/gif |
| 49 | content-type | image/jpeg |
| 50 | content-type | image/png |
| 51 | content-type | text/css |
| 52 | content-type | text/html; charset=utf-8 |
| 53 | content-type | text/plain |
| 54 | content-type | text/plain;charset=utf-8 |
| 55 | range | bytes=0- |
| 56 | strict-transport-security | max-age=31536000 |
| 57 | strict-transport-security | max-age=31536000; includesubdomains |
| 58 | strict-transport-security | max-age=31536000; includesubdomains; preload |
| 59 | vary | accept-encoding |
| 60 | vary | origin |
| 61 | x-content-type-options | nosniff |
| 62 | x-xss-protection | 1; mode=block |
| 63 | :status | 100 |
| 64 | :status | 204 |
| 65 | :status | 206 |
| 66 | :status | 302 |
| 67 | :status | 400 |
| 68 | :status | 403 |
| 69 | :status | 421 |
| 70 | :status | 425 |
| 71 | :status | 500 |
| 72 | accept-language | |
| 73 | access-control-allow-credentials | FALSE |
| 74 | access-control-allow-credentials | TRUE |
| 75 | access-control-allow-headers | * |
| 76 | access-control-allow-methods | get |
| 77 | access-control-allow-methods | get, post, options |
| 78 | access-control-allow-methods | options |
| 79 | access-control-expose-headers | content-length |
| 80 | access-control-request-headers | content-type |
| 81 | access-control-request-method | get |
| 82 | access-control-request-method | post |
| 83 | alt-svc | clear |
| 84 | authorization | |
| 85 | content-security-policy | script-src 'none'; object-src 'none'; base-uri 'none' |
| 86 | early-data | 1 |
| 87 | expect-ct | |
| 88 | forwarded | |
| 89 | if-range | |
| 90 | origin | |
| 91 | purpose | prefetch |
| 92 | server | |
| 93 | timing-allow-origin | * |
| 94 | upgrade-insecure-requests | 1 |
| 95 | user-agent | |
| 96 | x-forwarded-for | |
| 97 | x-frame-options | deny |
| 98 | x-frame-options | sameorigin |
The following examples represent a series of exchanges between an encoder and a decoder. The exchanges are designed to exercise most QPACK instructions, and highlight potentially common patterns and their impact on dynamic table state. The encoder sends three encoded field sections containing one field line each, as well as two speculative inserts that are not referenced.¶
+The state of the encoder's dynamic table is shown, along with its +current size. Each entry is shown with the Absolute Index of the entry (Abs), +the current number of outstanding encoded field sections with references to that +entry (Ref), along with the name and value. Entries above the 'acknowledged' +line have been acknowledged by the decoder.¶
+The encoder sends an encoded field section containing a literal representation +of a field with a static name reference.¶
++Data | Interpretation + | Encoder's Dynamic Table + +Stream: 0 +0000 | Required Insert Count = 0, Base = 0 +510b 2f69 6e64 6578 | Literal Field Line with Name Reference +2e68 746d 6c | Static Table, Index=1 + | (:path=/index.html) + + Abs Ref Name Value + ^-- acknowledged --^ + Size=0 +¶ +
The encoder sets the dynamic table capacity, inserts a header with a dynamic +name reference, then sends a potentially blocking, encoded field section +referencing this new entry. The decoder acknowledges processing the encoded +field section, which implicitly acknowledges all dynamic table insertions up to +the Required Insert Count.¶
++Stream: Encoder +3fbd01 | Set Dynamic Table Capacity=220 +c00f 7777 772e 6578 | Insert With Name Reference +616d 706c 652e 636f | Static Table, Index=0 +6d | (:authority=www.example.com) +c10c 2f73 616d 706c | Insert With Name Reference +652f 7061 7468 | Static Table, Index=1 + | (:path=/sample/path) + + Abs Ref Name Value + ^-- acknowledged --^ + 1 0 :authority www.example.com + 2 0 :path /sample/path + Size=106 + +Stream: 4 +0381 | Required Insert Count = 2, Base = 0 +10 | Indexed Field Line With Post-Base Index + | Absolute Index = Base(0) + Index(0) + 1 = 1 + | (:authority=www.example.com) +11 | Indexed Field Line With Post-Base Index + | Absolute Index = Base(0) + Index(1) + 1 = 2 + | (:path=/sample/path) + + Abs Ref Name Value + ^-- acknowledged --^ + 1 1 :authority www.example.com + 2 1 :path /sample/path + Size=106 + +Stream: Decoder +84 | Section Acknowledgment (stream=4) + + Abs Ref Name Value + 1 0 :authority www.example.com + 2 0 :path /sample/path + ^-- acknowledged --^ + Size=106 +¶ +
The encoder inserts a header into the dynamic table with a literal name. +The decoder acknowledges receipt of the entry. The encoder does not send +any encoded field sections.¶
++Stream: Encoder +4a63 7573 746f 6d2d | Insert With Literal Name +6b65 790c 6375 7374 | (custom-key=custom-value) +6f6d 2d76 616c 7565 | + + Abs Ref Name Value + 1 0 :authority www.example.com + 2 0 :path /sample/path + ^-- acknowledged --^ + 3 0 custom-key custom-value + Size=160 + +Stream: Decoder +01 | Insert Count Increment (1) + + Abs Ref Name Value + 1 0 :authority www.example.com + 2 0 :path /sample/path + 3 0 custom-key custom-value + ^-- acknowledged --^ + Size=160 + +¶ +
The encoder duplicates an existing entry in the dynamic table, then sends an +encoded field section referencing the dynamic table entries including the +duplicated entry. The packet containing the encoder stream data is delayed. +Before the packet arrives, the decoder cancels the stream and notifies the +encoder that the encoded field section was not processed.¶
++Stream: Encoder +02 | Duplicate (Relative Index=2) + + Abs Ref Name Value + 1 0 :authority www.example.com + 2 0 :path /sample/path + 3 0 custom-key custom-value + ^-- acknowledged --^ + 4 0 :authority www.example.com + Size=217 + +Stream: 8 +0500 | Required Insert Count = 4, Base = 4 +80 | Indexed Field Line, Dynamic Table + | Absolute Index = Base(4) - Index(0) = 4 + | (:authority=www.example.com) +c1 | Indexed Field Line, Static Table Index = 1 + | (:path=/) +81 | Indexed Field Line, Dynamic Table + | Absolute Index = Base(4) - Index(1) = 3 + | (custom-key=custom-value) + + Abs Ref Name Value + 1 0 :authority www.example.com + 2 0 :path /sample/path + 3 1 custom-key custom-value + ^-- acknowledged --^ + 4 1 :authority www.example.com + Size=217 + +Stream: Decoder +48 | Stream Cancellation (Stream=8) + + Abs Ref Name Value + 1 0 :authority www.example.com + 2 0 :path /sample/path + 3 0 custom-key custom-value + ^-- acknowledged --^ + 4 0 :authority www.example.com + Size=215 + +¶ +
The encoder inserts another header into the dynamic table, which evicts the +oldest entry. The encoder does not send any encoded field sections.¶
++Stream: Encoder +810d 6375 7374 6f6d | Insert With Name Reference +2d76 616c 7565 32 | Dynamic Table, Absolute Index=2 + | (custom-key=custom-value2) + + Abs Ref Name Value + 2 0 :path /sample/path + 3 0 custom-key custom-value + ^-- acknowledged --^ + 4 0 :authority www.example.com + 5 0 custom-key custom-value2 + Size=215 +¶ +
Pseudo-code for single pass encoding, excluding handling of duplicates, +non-blocking mode, available encoder stream flow control and reference tracking.¶
base = dynamicTable.getInsertCount()
requiredInsertCount = 0
for line in field_lines:
  staticIndex = staticTable.findIndex(line)
  if staticIndex is not None:
    encodeIndexReference(streamBuffer, staticIndex)
    continue

  dynamicIndex = dynamicTable.findIndex(line)
  if dynamicIndex is None:
    # No matching entry. Either insert+index or encode literal
    dynamicNameIndex = None
    staticNameIndex = staticTable.findName(line.name)
    if staticNameIndex is None:
      dynamicNameIndex = dynamicTable.findName(line.name)

    if shouldIndex(line) and dynamicTable.canIndex(line):
      encodeInsert(encoderBuffer, staticNameIndex,
                   dynamicNameIndex, line)
      dynamicIndex = dynamicTable.add(line)

  if dynamicIndex is None:
    # Could not index it, literal
    if dynamicNameIndex is not None:
      # Encode literal with dynamic name, possibly above base
      encodeDynamicLiteral(streamBuffer, dynamicNameIndex,
                           base, line)
      requiredInsertCount = max(requiredInsertCount,
                                dynamicNameIndex)
    else:
      # Encode literal with a static name or literal name
      encodeLiteral(streamBuffer, staticNameIndex, line)
  else:
    # Dynamic index reference
    assert(dynamicIndex is not None)
    requiredInsertCount = max(requiredInsertCount, dynamicIndex)
    # Encode dynamicIndex, possibly above base
    encodeDynamicIndexReference(streamBuffer, dynamicIndex, base)

# encode the prefix
if requiredInsertCount == 0:
  encodeInteger(prefixBuffer, 0x00, 0, 8)
  encodeInteger(prefixBuffer, 0x00, 0, 7)
else:
  wireRIC = (
    requiredInsertCount
      % (2 * getMaxEntries(maxTableCapacity))
  ) + 1
  encodeInteger(prefixBuffer, 0x00, wireRIC, 8)
  if base >= requiredInsertCount:
    encodeInteger(prefixBuffer, 0, base - requiredInsertCount, 7)
  else:
    encodeInteger(prefixBuffer, 0x80,
                  requiredInsertCount - base - 1, 7)

return encoderBuffer, prefixBuffer + streamBuffer
¶
Editorial changes only¶
+Editorial changes only¶
+Editorial changes only¶
+Editorial changes only¶
+No changes¶
+Added security considerations¶
+No changes¶
+Editorial changes only¶
+Editorial changes only¶
+Editorial changes only¶
+The IETF QUIC Working Group received an enormous amount of support from many +people.¶
+The compression design team did substantial work exploring the problem space and +influencing the initial draft. The contributions of design team members Roberto +Peon, Martin Thomson, and Dmitri Tikhonov are gratefully acknowledged.¶
+The following people also provided substantial contributions to this document:¶
+奥 一穂 (Kazuho Oku)¶
+This draft draws heavily on the text of [RFC7541]. The indirect input of +those authors is also gratefully acknowledged.¶
+Buck's contribution was supported by Google during his employment there.¶
+A portion of Mike's contribution was supported by Microsoft during his +employment there.¶
+Internet-Draft | +QUIC Loss Detection | +January 2021 | +
Iyengar & Swett | +Expires 19 July 2021 | +[Page] | +
This document describes loss detection and congestion control mechanisms for +QUIC.¶
+Discussion of this draft takes place on the QUIC working group mailing list +(quic@ietf.org), which is archived at +https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
+Working Group information can be found at https://github.com/quicwg; source +code and issues list for this draft can be found at +https://github.com/quicwg/base-drafts/labels/-recovery.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF). Note that other groups may also distribute working + documents as Internet-Drafts. The list of current Internet-Drafts is + at https://datatracker.ietf.org/drafts/current/.¶
++ Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress."¶
++ This Internet-Draft will expire on 19 July 2021.¶
++ Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Simplified BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Simplified BSD License.¶
QUIC is a secure general-purpose transport protocol, described in [QUIC-TRANSPORT]. This document describes loss detection and congestion control mechanisms for QUIC.¶
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", +"MAY", and "OPTIONAL" in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, and only when, they +appear in all capitals, as shown here.¶
+Definitions of terms that are used in this document:¶
Ack-eliciting frames: All frames other than ACK, PADDING, and CONNECTION_CLOSE are considered ack-eliciting.¶
Ack-eliciting packets: Packets that contain ack-eliciting frames elicit an ACK from the receiver within the maximum acknowledgment delay and are called ack-eliciting packets.¶
In flight: Packets are considered in-flight when they are ack-eliciting or contain a PADDING frame, and they have been sent but are not acknowledged, declared lost, or discarded along with old keys.¶
+All transmissions in QUIC are sent with a packet-level header, which indicates +the encryption level and includes a packet sequence number (referred to below as +a packet number). The encryption level indicates the packet number space, as +described in Section 12.3 in [QUIC-TRANSPORT]. Packet numbers never repeat +within a packet number space for the lifetime of a connection. Packet numbers +are sent in monotonically increasing order within a space, preventing ambiguity. +It is permitted for some packet numbers to never be used, leaving intentional +gaps.¶
+This design obviates the need for disambiguating between transmissions and +retransmissions; this eliminates significant complexity from QUIC's +interpretation of TCP loss detection mechanisms.¶
+QUIC packets can contain multiple frames of different types. The recovery +mechanisms ensure that data and frames that need reliable delivery are +acknowledged or declared lost and sent in new packets as necessary. The types +of frames contained in a packet affect recovery and congestion control logic:¶
+Readers familiar with TCP's loss detection and congestion control will find +algorithms here that parallel well-known TCP ones. However, protocol differences +between QUIC and TCP contribute to algorithmic differences. These protocol +differences are briefly described below.¶
QUIC uses separate packet number spaces for each encryption level, except that 0-RTT and all generations of 1-RTT keys use the same packet number space. Separate packet number spaces ensure that acknowledgment of packets sent with one level of encryption will not cause spurious retransmission of packets sent with a different encryption level. Congestion control and round-trip time (RTT) measurement are unified across packet number spaces.¶
+TCP conflates transmission order at the sender with delivery order at the +receiver, resulting in the retransmission ambiguity problem +([RETRANSMISSION]). QUIC separates transmission order from delivery order: +packet numbers indicate transmission order, and delivery order is determined by +the stream offsets in STREAM frames.¶
+QUIC's packet number is strictly increasing within a packet number space, +and directly encodes transmission order. A higher packet number signifies +that the packet was sent later, and a lower packet number signifies that +the packet was sent earlier. When a packet containing ack-eliciting +frames is detected lost, QUIC includes necessary frames in a new packet +with a new packet number, removing ambiguity about which packet is +acknowledged when an ACK is received. Consequently, more accurate RTT +measurements can be made, spurious retransmissions are trivially detected, and +mechanisms such as Fast Retransmit can be applied universally, based only on +packet number.¶
+This design point significantly simplifies loss detection mechanisms for QUIC. +Most TCP mechanisms implicitly attempt to infer transmission ordering based on +TCP sequence numbers - a non-trivial task, especially when TCP timestamps are +not available.¶
+QUIC starts a loss epoch when a packet is lost. The loss epoch ends when any +packet sent after the start of the epoch is acknowledged. TCP waits for the gap +in the sequence number space to be filled, and so if a segment is lost multiple +times in a row, the loss epoch may not end for several round trips. Because both +should reduce their congestion windows only once per epoch, QUIC will do it once +for every round trip that experiences loss, while TCP may only do it once across +multiple round trips.¶
+QUIC ACK frames contain information similar to that in TCP Selective +Acknowledgements (SACKs, [RFC2018]). However, QUIC does not allow a packet +acknowledgement to be reneged, greatly simplifying implementations on both sides +and reducing memory pressure on the sender.¶
QUIC supports many ACK ranges, as opposed to TCP's three SACK ranges. In high-loss environments, this speeds recovery, reduces spurious retransmits, and ensures forward progress without relying on timeouts.¶
+QUIC endpoints measure the delay incurred between when a packet is received and +when the corresponding acknowledgment is sent, allowing a peer to maintain a +more accurate round-trip time estimate; see Section 13.2 of [QUIC-TRANSPORT].¶
QUIC uses a probe timeout (PTO; see Section 6.2), with a timer based on TCP's RTO computation; see [RFC6298]. QUIC's PTO includes the peer's maximum expected acknowledgment delay instead of using a fixed minimum timeout.¶
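A minimal sketch of this computation, under the assumption that the PTO formula from Section 6.2 takes the form smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay (the function name and 1 ms granularity constant here are illustrative):

```python
K_GRANULARITY = 0.001  # assumed timer granularity of 1 ms, in seconds

def probe_timeout(smoothed_rtt, rttvar, max_ack_delay):
    # PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay.
    # The max_ack_delay term is what replaces TCP's fixed minimum RTO:
    # the timer accounts for how long the peer may delay its ACK.
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay
```

For example, with smoothed_rtt = 100 ms, rttvar = 10 ms, and max_ack_delay = 25 ms, the PTO is 165 ms, whereas a classic RTO of smoothed_rtt + 4 * rttvar would be 140 ms.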
Similar to the RACK-TLP loss detection algorithm for TCP ([RACK]), QUIC does not collapse the congestion window when the PTO expires, since a single packet loss at the tail does not indicate persistent congestion. Instead, QUIC collapses the congestion window when persistent congestion is declared; see Section 7.6. In doing this, QUIC avoids unnecessary congestion window reductions, obviating the need for correcting mechanisms such as F-RTO ([RFC5682]). Since QUIC does not collapse the congestion window on a PTO expiration, a QUIC sender is not prevented from sending more in-flight packets after a PTO expiration if it still has available congestion window. This occurs when a sender is application-limited and the PTO timer expires. This is more aggressive than TCP's RTO mechanism when application-limited, but identical when not application-limited.¶
+QUIC allows probe packets to temporarily exceed the congestion window whenever +the timer expires.¶
+TCP uses a minimum congestion window of one packet. However, loss of
that single packet means that the sender needs to wait for a PTO
(Section 6.2) to recover, which can be much longer than a round-trip time.
Sending a single ack-eliciting packet also increases the chances of incurring
additional latency when a receiver delays its acknowledgment.¶
+QUIC therefore recommends that the minimum congestion window be two +packets. While this increases network load, it is considered safe, since the +sender will still reduce its sending rate exponentially under persistent +congestion (Section 6.2).¶
+At a high level, an endpoint measures the time from when a packet was sent to +when it is acknowledged as a round-trip time (RTT) sample. The endpoint uses +RTT samples and peer-reported host delays (see Section 13.2 of +[QUIC-TRANSPORT]) to generate a statistical description of the network +path's RTT. An endpoint computes the following three values for each path: +the minimum value over a period of time (min_rtt), an +exponentially-weighted moving average (smoothed_rtt), and the mean deviation +(referred to as "variation" in the rest of this document) in the observed RTT +samples (rttvar).¶
+An endpoint generates an RTT sample on receiving an ACK frame that meets the +following two conditions:¶
+The RTT sample, latest_rtt, is generated as the time elapsed since the largest +acknowledged packet was sent:¶
++latest_rtt = ack_time - send_time_of_largest_acked +¶ +
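As a minimal illustration (not part of the specification), the sample computation above can be expressed directly in code; times are assumed to be in seconds:¶

```python
def latest_rtt_sample(ack_time, send_time_of_largest_acked):
    """RTT sample: time elapsed since the largest newly
    acknowledged packet was sent (seconds)."""
    return ack_time - send_time_of_largest_acked

# A packet sent at t=1.4s and acknowledged at t=1.5s yields a ~100 ms sample.
sample = latest_rtt_sample(1.5, 1.4)
```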
An RTT sample is generated using only the largest acknowledged packet in the +received ACK frame. This is because a peer reports acknowledgment delays for +only the largest acknowledged packet in an ACK frame. While the reported +acknowledgment delay is not used by the RTT sample measurement, it is used to +adjust the RTT sample in subsequent computations of smoothed_rtt and rttvar +(Section 5.3).¶
+To avoid generating multiple RTT samples for a single packet, an ACK frame +SHOULD NOT be used to update RTT estimates if it does not newly acknowledge the +largest acknowledged packet.¶
+An RTT sample MUST NOT be generated on receiving an ACK frame that does not +newly acknowledge at least one ack-eliciting packet. A peer usually does not +send an ACK frame when only non-ack-eliciting packets are received. Therefore +an ACK frame that contains acknowledgments for only non-ack-eliciting packets +could include an arbitrarily large ACK Delay value. Ignoring +such ACK frames avoids complications in subsequent smoothed_rtt and rttvar +computations.¶
+A sender might generate multiple RTT samples per RTT when multiple ACK frames +are received within an RTT. As suggested in [RFC6298], doing so might result +in inadequate history in smoothed_rtt and rttvar. Ensuring that RTT estimates +retain sufficient history is an open research question.¶
+min_rtt is the sender's estimate of the minimum RTT observed for a given network
path over a period of time. In this document, min_rtt is used by loss detection
to reject implausibly small RTT samples.¶
+min_rtt MUST be set to the latest_rtt on the first RTT sample. min_rtt MUST be +set to the lesser of min_rtt and latest_rtt (Section 5.1) on all other +samples.¶
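The two rules above amount to a running minimum. A minimal sketch (the class name is illustrative, not from the specification):¶

```python
class MinRttTracker:
    """Tracks min_rtt per the rules above: the first sample sets it,
    and later samples can only lower it."""
    def __init__(self):
        self.min_rtt = None  # no RTT samples observed yet

    def on_rtt_sample(self, latest_rtt):
        if self.min_rtt is None:
            self.min_rtt = latest_rtt          # first sample
        else:
            self.min_rtt = min(self.min_rtt, latest_rtt)
        return self.min_rtt
```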
+An endpoint uses only locally observed times in computing the min_rtt and does +not adjust for acknowledgment delays reported by the peer. Doing so allows the +endpoint to set a lower bound for the smoothed_rtt based entirely on what it +observes (see Section 5.3), and limits potential underestimation due to +erroneously-reported delays by the peer.¶
+The RTT for a network path may change over time. If a path's actual RTT +decreases, the min_rtt will adapt immediately on the first low sample. If the +path's actual RTT increases however, the min_rtt will not adapt to it, allowing +future RTT samples that are smaller than the new RTT to be included in +smoothed_rtt.¶
+Endpoints SHOULD set the min_rtt to the newest RTT sample after persistent +congestion is established. This is to allow a connection to reset its estimate +of min_rtt and smoothed_rtt (Section 5.3) after a disruptive network event, +and because it is possible that an increase in path delay resulted in persistent +congestion being incorrectly declared.¶
+Endpoints MAY re-establish the min_rtt at other times in the connection, such as +when traffic volume is low and an acknowledgment is received with a low +acknowledgment delay. Implementations SHOULD NOT refresh the min_rtt +value too often, since the actual minimum RTT of the path is not +frequently observable.¶
+smoothed_rtt is an exponentially-weighted moving average of an endpoint's RTT +samples, and rttvar estimates the variation in the RTT samples using a mean +variation.¶
+The calculation of smoothed_rtt uses RTT samples after adjusting them for +acknowledgment delays. These delays are decoded from the ACK Delay field of +ACK frames as described in Section 19.3 of [QUIC-TRANSPORT].¶
+The peer might report acknowledgment delays that are larger than the peer's +max_ack_delay during the handshake (Section 13.2.1 of [QUIC-TRANSPORT]). To +account for this, the endpoint SHOULD ignore max_ack_delay until the handshake +is confirmed, as defined in Section 4.1.2 of [QUIC-TLS]. When they occur, +these large acknowledgment delays are likely to be non-repeating and limited to +the handshake. The endpoint can therefore use them without limiting them to the +max_ack_delay, avoiding unnecessary inflation of the RTT estimate.¶
+Note that a large acknowledgment delay can result in a substantially inflated +smoothed_rtt, if there is either an error in the peer's reporting of the +acknowledgment delay or in the endpoint's min_rtt estimate. Therefore, prior +to handshake confirmation, an endpoint MAY ignore RTT samples if adjusting +the RTT sample for acknowledgment delay causes the sample to be less than the +min_rtt.¶
+After the handshake is confirmed, any acknowledgment delays reported by the +peer that are greater than the peer's max_ack_delay are attributed to +unintentional but potentially repeating delays, such as scheduler latency at the +peer or loss of previous acknowledgments. Excess delays could also be due to +a non-compliant receiver. Therefore, these extra delays are considered +effectively part of path delay and incorporated into the RTT estimate.¶
+Therefore, when adjusting an RTT sample using peer-reported acknowledgment +delays, an endpoint:¶
+Additionally, an endpoint might postpone the processing of acknowledgments when +the corresponding decryption keys are not immediately available. For example, a +client might receive an acknowledgment for a 0-RTT packet that it cannot +decrypt because 1-RTT packet protection keys are not yet available to it. In +such cases, an endpoint SHOULD subtract such local delays from its RTT sample +until the handshake is confirmed.¶
+Similar to [RFC6298], smoothed_rtt and rttvar are computed as follows.¶
+An endpoint initializes the RTT estimator during connection establishment and +when the estimator is reset during connection migration; see Section 9.4 of +[QUIC-TRANSPORT]. Before any RTT samples are available for a new path or when +the estimator is reset, the estimator is initialized using the initial RTT; see +Section 6.2.2.¶
+smoothed_rtt and rttvar are initialized as follows, where kInitialRtt contains +the initial RTT value:¶
++smoothed_rtt = kInitialRtt +rttvar = kInitialRtt / 2 +¶ +
RTT samples for the network path are recorded in latest_rtt; see +Section 5.1. On the first RTT sample after initialization, the estimator is +reset using that sample. This ensures that the estimator retains no history of +past samples.¶
+On the first RTT sample after initialization, smoothed_rtt and rttvar are set as +follows:¶
++smoothed_rtt = latest_rtt +rttvar = latest_rtt / 2 +¶ +
On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:¶
++ack_delay = decoded acknowledgment delay from ACK frame +if (handshake confirmed): + ack_delay = min(ack_delay, max_ack_delay) +adjusted_rtt = latest_rtt +if (min_rtt + ack_delay < latest_rtt): + adjusted_rtt = latest_rtt - ack_delay +smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt +rttvar_sample = abs(smoothed_rtt - adjusted_rtt) +rttvar = 3/4 * rttvar + 1/4 * rttvar_sample +¶ +
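The initialization, first-sample reset, and subsequent-sample rules above can be transcribed into a single estimator. This is an illustrative sketch, not normative code; times are in seconds and the class name is hypothetical:¶

```python
class RttEstimator:
    """Direct transcription of the smoothed_rtt/rttvar pseudocode above."""
    def __init__(self, k_initial_rtt=0.333):
        # Before any samples, the estimator uses the initial RTT.
        self.smoothed_rtt = k_initial_rtt
        self.rttvar = k_initial_rtt / 2
        self.min_rtt = None
        self.first_sample = True

    def on_rtt_sample(self, latest_rtt, ack_delay, max_ack_delay,
                      handshake_confirmed):
        if self.first_sample:
            # First sample after initialization resets the estimator,
            # retaining no history of kInitialRtt.
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            self.first_sample = False
            return
        self.min_rtt = min(self.min_rtt, latest_rtt)
        if handshake_confirmed:
            # Limit ack_delay to the peer's max_ack_delay once confirmed.
            ack_delay = min(ack_delay, max_ack_delay)
        adjusted_rtt = latest_rtt
        # Subtract the ack delay only if it leaves a plausible RTT.
        if self.min_rtt + ack_delay < latest_rtt:
            adjusted_rtt = latest_rtt - ack_delay
        self.smoothed_rtt = 7/8 * self.smoothed_rtt + 1/8 * adjusted_rtt
        rttvar_sample = abs(self.smoothed_rtt - adjusted_rtt)
        self.rttvar = 3/4 * self.rttvar + 1/4 * rttvar_sample
```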
QUIC senders use acknowledgments to detect lost packets, and a probe
timeout (see Section 6.2) to ensure acknowledgments are received. This section
provides a description of these algorithms.¶
+If a packet is lost, the QUIC transport needs to recover from that loss, such +as by retransmitting the data, sending an updated frame, or discarding the +frame. For more information, see Section 13.3 of [QUIC-TRANSPORT].¶
+Loss detection is separate per packet number space, unlike RTT measurement and +congestion control, because RTT and congestion control are properties of the +path, whereas loss detection also relies upon key availability.¶
+Acknowledgment-based loss detection implements the spirit of TCP's Fast +Retransmit ([RFC5681]), Early Retransmit ([RFC5827]), FACK ([FACK]), +SACK loss recovery ([RFC6675]), and RACK-TLP ([RACK]). This +section provides an overview of how these algorithms are implemented in QUIC.¶
+A packet is declared lost if it meets all the following conditions:¶
+The acknowledgment indicates that a packet sent later was delivered, and the +packet and time thresholds provide some tolerance for packet reordering.¶
+Spuriously declaring packets as lost leads to unnecessary retransmissions and +may result in degraded performance due to the actions of the congestion +controller upon detecting loss. Implementations can detect spurious +retransmissions and increase the reordering threshold in packets or time to +reduce future spurious retransmissions and loss events. Implementations with +adaptive time thresholds MAY choose to start with smaller initial reordering +thresholds to minimize recovery latency.¶
+The RECOMMENDED initial value for the packet reordering threshold +(kPacketThreshold) is 3, based on best practices for TCP loss detection +([RFC5681], [RFC6675]). In order to remain similar to TCP, +implementations SHOULD NOT use a packet threshold less than 3; see [RFC5681].¶
+Some networks may exhibit higher degrees of packet reordering, causing a sender +to detect spurious losses. Additionally, packet reordering could be more common +with QUIC than TCP, because network elements that could observe and reorder +TCP packets cannot do that for QUIC, because QUIC packet numbers are encrypted. +Algorithms that increase the reordering threshold after spuriously detecting +losses, such as RACK [RACK], have proven to be useful in TCP and are +expected to be at least as useful in QUIC.¶
+Once a later packet within the same packet number space has been acknowledged, +an endpoint SHOULD declare an earlier packet lost if it was sent a threshold +amount of time in the past. To avoid declaring packets as lost too early, this +time threshold MUST be set to at least the local timer granularity, as +indicated by the kGranularity constant. The time threshold is:¶
++max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity) +¶ +
If packets sent prior to the largest acknowledged packet cannot yet be declared +lost, then a timer SHOULD be set for the remaining time.¶
+Using max(smoothed_rtt, latest_rtt) protects from the two following cases:¶
+The RECOMMENDED time threshold (kTimeThreshold), expressed as a round-trip time +multiplier, is 9/8. The RECOMMENDED value of the timer granularity +(kGranularity) is 1ms.¶
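Using the recommended constants, the time threshold can be sketched as follows (illustrative only; times in seconds):¶

```python
kTimeThreshold = 9/8   # RECOMMENDED round-trip time multiplier
kGranularity = 0.001   # RECOMMENDED timer granularity: 1 ms

def loss_time_threshold(smoothed_rtt, latest_rtt):
    """Delay after which a packet sent before the largest acknowledged
    packet is declared lost, floored at the timer granularity."""
    return max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)
```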
+TCP's RACK ([RACK]) specifies a slightly larger +threshold, equivalent to 5/4, for a similar purpose. Experience with QUIC shows +that 9/8 works well.¶
+Implementations MAY experiment with absolute thresholds, thresholds from +previous connections, adaptive thresholds, or including RTT variation. Smaller +thresholds reduce reordering resilience and increase spurious retransmissions, +and larger thresholds increase loss detection delay.¶
+A Probe Timeout (PTO) triggers sending one or two probe datagrams when
ack-eliciting packets are not acknowledged within the expected period of
time, or when the server may not have validated the client's address. A PTO
enables a connection to recover from loss of tail packets or acknowledgments.¶
+As with loss detection, the probe timeout is per packet number space. That is, a +PTO value is computed per packet number space.¶
+A PTO timer expiration event does not indicate packet loss and MUST NOT cause +prior unacknowledged packets to be marked as lost. When an acknowledgment +is received that newly acknowledges packets, loss detection proceeds as +dictated by packet and time threshold mechanisms; see Section 6.1.¶
+The PTO algorithm used in QUIC implements the reliability functions of +Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for +TCP [RFC5682]. The timeout computation is based on TCP's retransmission +timeout period [RFC6298].¶
+When an ack-eliciting packet is transmitted, the sender schedules a timer for +the PTO period as follows:¶
++PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay +¶ +
The PTO period is the amount of time that a sender ought to wait for an +acknowledgment of a sent packet. This time period includes the estimated +network roundtrip-time (smoothed_rtt), the variation in the estimate (4*rttvar), +and max_ack_delay, to account for the maximum time by which a receiver might +delay sending an acknowledgment.¶
+When the PTO is armed for the Initial or Handshake packet number spaces, the
max_ack_delay in the PTO period computation is set to 0, since the peer is
not expected to delay these packets intentionally; see Section 13.2.1 of
[QUIC-TRANSPORT].¶
+The PTO period MUST be at least kGranularity, to avoid the timer expiring +immediately.¶
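The PTO computation above can be sketched as follows (illustrative only; times in seconds). Note that the max(4*rttvar, kGranularity) term already guarantees the kGranularity floor:¶

```python
kGranularity = 0.001  # 1 ms

def pto_period(smoothed_rtt, rttvar, max_ack_delay):
    """PTO period: estimated RTT, plus four times its variation
    (floored at the timer granularity), plus max_ack_delay.
    For the Initial and Handshake spaces, pass max_ack_delay=0."""
    return smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay
```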
+When ack-eliciting packets in multiple packet number spaces are in flight, the +timer MUST be set to the earlier value of the Initial and Handshake packet +number spaces.¶
+An endpoint MUST NOT set its PTO timer for the application data packet number +space until the handshake is confirmed. Doing so prevents the endpoint from +retransmitting information in packets when either the peer does not yet have the +keys to process them or the endpoint does not yet have the keys to process their +acknowledgments. For example, this can happen when a client sends 0-RTT packets +to the server; it does so without knowing whether the server will be able to +decrypt them. Similarly, this can happen when a server sends 1-RTT packets +before confirming that the client has verified the server's certificate and can +therefore read these 1-RTT packets.¶
+A sender SHOULD restart its PTO timer every time an ack-eliciting packet is sent +or acknowledged, or when Initial or Handshake keys are discarded (Section 4.9 of +[QUIC-TLS]). This ensures the PTO is always set based on the latest estimate +of the round-trip time and for the correct packet across packet number spaces.¶
+When a PTO timer expires, the PTO backoff MUST be increased, resulting in the +PTO period being set to twice its current value. The PTO backoff factor is reset +when an acknowledgment is received, except in the following case. A server +might take longer to respond to packets during the handshake than otherwise. To +protect such a server from repeated client probes, the PTO backoff is not reset +at a client that is not yet certain that the server has finished validating the +client's address. That is, a client does not reset the PTO backoff factor on +receiving acknowledgments in Initial packets.¶
+This exponential reduction in the sender's rate is important because +consecutive PTOs might be caused by loss of packets or acknowledgments due to +severe congestion. Even when there are ack-eliciting packets in-flight in +multiple packet number spaces, the exponential increase in probe timeout +occurs across all spaces to prevent excess load on the network. For example, +a timeout in the Initial packet number space doubles the length of the timeout +in the Handshake packet number space.¶
+The total length of time over which consecutive PTOs expire is limited by the +idle timeout.¶
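The exponential backoff described above doubles the PTO period on each consecutive expiration until an acknowledgment resets it. A small illustrative sketch (function name is hypothetical):¶

```python
def consecutive_pto_periods(base_pto, expirations):
    """Periods armed after each consecutive PTO expiration:
    base, 2x, 4x, ... until an acknowledgment resets the backoff."""
    return [base_pto * (2 ** n) for n in range(expirations)]
```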
+The PTO timer MUST NOT be set if a timer is set for time threshold +loss detection; see Section 6.1.2. A timer that is set for time +threshold loss detection will expire earlier than the PTO timer +in most cases and is less likely to spuriously retransmit data.¶
+Resumed connections over the same network MAY use the previous connection's +final smoothed RTT value as the resumed connection's initial RTT. When no +previous RTT is available, the initial RTT SHOULD be set to 333ms. This +results in handshakes starting with a PTO of 1 second, as recommended +for TCP's initial retransmission timeout; see Section 2 of [RFC6298].¶
+A connection MAY use the delay between sending a PATH_CHALLENGE and receiving a +PATH_RESPONSE to set the initial RTT (see kInitialRtt in +Appendix A.2) for a new path, but the delay SHOULD NOT be +considered an RTT sample.¶
+Initial packets and Handshake packets might never be acknowledged, but they are
removed from bytes in flight when the Initial and Handshake keys are discarded,
as described below in Section 6.4. When Initial or Handshake keys are
discarded, the PTO and loss detection timers MUST be reset, because discarding
keys indicates forward progress and the loss detection timer might have been set
for a now-discarded packet number space.¶
+Until the server has validated the client's address on the path, the amount of +data it can send is limited to three times the amount of data received, +as specified in Section 8.1 of [QUIC-TRANSPORT]. If no additional data can be +sent, the server's PTO timer MUST NOT be armed until datagrams have been +received from the client, because packets sent on PTO count against the +anti-amplification limit. Note that the server could fail to validate the +client's address even if 0-RTT is accepted.¶
+Since the server could be blocked until more datagrams are received from the +client, it is the client's responsibility to send packets to unblock the server +until it is certain that the server has finished its address validation +(see Section 8 of [QUIC-TRANSPORT]). That is, the client MUST set the +probe timer if the client has not received an acknowledgment for any of its +Handshake packets and the handshake is not confirmed (see Section 4.1.2 of +[QUIC-TLS]), even if there are no packets in flight. When the PTO fires, +the client MUST send a Handshake packet if it has Handshake keys, otherwise it +MUST send an Initial packet in a UDP datagram with a payload of at least 1200 +bytes.¶
+When a server receives an Initial packet containing duplicate CRYPTO data, +it can assume the client did not receive all of the server's CRYPTO data sent +in Initial packets, or the client's estimated RTT is too small. When a +client receives Handshake or 1-RTT packets prior to obtaining Handshake keys, +it may assume some or all of the server's Initial packets were lost.¶
+To speed up handshake completion under these conditions, an endpoint MAY, for a +limited number of times per connection, send a packet containing +unacknowledged CRYPTO data earlier than the PTO expiry, subject to the address +validation limits in Section 8.1 of [QUIC-TRANSPORT]. Doing so at most once +for each connection is adequate to quickly recover from a single packet loss. +An endpoint that always retransmits packets in response to receiving packets +that it cannot process risks creating an infinite exchange of packets.¶
+Endpoints can also use coalesced packets (see Section 12.2 of +[QUIC-TRANSPORT]) to ensure that each datagram elicits at least one +acknowledgment. For example, a client can coalesce an Initial packet +containing PING and PADDING frames with a 0-RTT data packet and a server can +coalesce an Initial packet containing a PING frame with one or more packets in +its first flight.¶
+When a PTO timer expires, a sender MUST send at least one ack-eliciting packet +in the packet number space as a probe. An endpoint MAY send up to two +full-sized datagrams containing ack-eliciting packets, to avoid an expensive +consecutive PTO expiration due to a single lost datagram, or transmit data +from multiple packet number spaces. All probe packets sent on a PTO MUST be +ack-eliciting.¶
+In addition to sending data in the packet number space for which the timer +expired, the sender SHOULD send ack-eliciting packets from other packet +number spaces with in-flight data, coalescing packets if possible. This is +particularly valuable when the server has both Initial and Handshake data +in-flight or the client has both Handshake and Application Data in-flight, +because the peer might only have receive keys for one of the two packet number +spaces.¶
+If the sender wants to elicit a faster acknowledgment on PTO, it can skip a +packet number to eliminate the acknowledgment delay.¶
+An endpoint SHOULD include new data in packets that are sent on PTO expiration. +Previously sent data MAY be sent if no new data can be sent. Implementations +MAY use alternative strategies for determining the content of probe packets, +including sending new or retransmitted data based on the application's +priorities.¶
+It is possible the sender has no new or previously-sent data to send. +As an example, consider the following sequence of events: new application data +is sent in a STREAM frame, deemed lost, then retransmitted in a new packet, +and then the original transmission is acknowledged. When there is no data to +send, the sender SHOULD send a PING or other ack-eliciting frame in a single +packet, re-arming the PTO timer.¶
+Alternatively, instead of sending an ack-eliciting packet, the sender MAY mark +any packets still in flight as lost. Doing so avoids sending an additional +packet, but increases the risk that loss is declared too aggressively, resulting +in an unnecessary rate reduction by the congestion controller.¶
+Consecutive PTO periods increase exponentially, and as a result, connection +recovery latency increases exponentially as packets continue to be dropped in +the network. Sending two packets on PTO expiration increases resilience to +packet drops, thus reducing the probability of consecutive PTO events.¶
+When the PTO timer expires multiple times and new data cannot be sent, +implementations must choose between sending the same payload every time +or sending different payloads. Sending the same payload may be simpler +and ensures the highest priority frames arrive first. Sending different +payloads each time reduces the chances of spurious retransmission.¶
+A Retry packet causes a client to send another Initial packet, effectively +restarting the connection process. A Retry packet indicates that the Initial +was received, but not processed. A Retry packet cannot be treated as an +acknowledgment, because it does not indicate that a packet was processed or +specify the packet number.¶
+Clients that receive a Retry packet reset congestion control and loss recovery +state, including resetting any pending timers. Other connection state, in +particular cryptographic handshake messages, is retained; see Section 17.2.5 of +[QUIC-TRANSPORT].¶
+The client MAY compute an RTT estimate to the server as the time period from +when the first Initial was sent to when a Retry or a Version Negotiation packet +is received. The client MAY use this value in place of its default for the +initial RTT estimate.¶
+When Initial and Handshake packet protection keys are discarded +(see Section 4.9 of [QUIC-TLS]), all packets that were sent with those keys +can no longer be acknowledged because their acknowledgments cannot be processed. +The sender MUST discard all recovery state associated with those packets +and MUST remove them from the count of bytes in flight.¶
+Endpoints stop sending and receiving Initial packets once they start exchanging +Handshake packets; see Section 17.2.2.1 of [QUIC-TRANSPORT]. At this point, +recovery state for all in-flight Initial packets is discarded.¶
+When 0-RTT is rejected, recovery state for all in-flight 0-RTT packets is +discarded.¶
+If a server accepts 0-RTT, but does not buffer 0-RTT packets that arrive +before Initial packets, early 0-RTT packets will be declared lost, but that +is expected to be infrequent.¶
+It is expected that keys are discarded after packets encrypted with them would +be acknowledged or declared lost. However, Initial and Handshake secrets are +discarded as soon as handshake and 1-RTT keys are proven to be available to both +client and server; see Section 4.9.1 of [QUIC-TLS].¶
+This document specifies a sender-side congestion controller for QUIC similar to +TCP NewReno ([RFC6582]).¶
+The signals QUIC provides for congestion control are generic and are designed to +support different sender-side algorithms. A sender can unilaterally choose a +different algorithm to use, such as Cubic ([RFC8312]).¶
+If a sender uses a different controller than that specified in this document, +the chosen controller MUST conform to the congestion control guidelines +specified in Section 3.1 of [RFC8085].¶
+Similar to TCP, packets containing only ACK frames do not count towards bytes +in flight and are not congestion controlled. Unlike TCP, QUIC can detect the +loss of these packets and MAY use that information to adjust the congestion +controller or the rate of ACK-only packets being sent, but this document does +not describe a mechanism for doing so.¶
+The algorithm in this document specifies and uses the controller's congestion +window in bytes.¶
+An endpoint MUST NOT send a packet if it would cause bytes_in_flight (see +Appendix B.2) to be larger than the congestion window, unless the packet +is sent on a PTO timer expiration (see Section 6.2) or when entering recovery +(see Section 7.3.2).¶
+If a path has been validated to support ECN ([RFC3168], [RFC8311]), QUIC +treats a Congestion Experienced (CE) codepoint in the IP header as a signal of +congestion. This document specifies an endpoint's response when the +peer-reported ECN-CE count increases; see Section 13.4.2 of [QUIC-TRANSPORT].¶
+QUIC begins every connection in slow start with the congestion window set to +an initial value. Endpoints SHOULD use an initial congestion window of 10 times +the maximum datagram size (max_datagram_size), while limiting the window +to the larger of 14720 bytes or twice the maximum datagram size. This follows +the analysis and recommendations in [RFC6928], increasing the byte limit to +account for the smaller 8-byte overhead of UDP compared to the 20-byte overhead +for TCP.¶
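The initial window rule above, expressed as code (illustrative only; sizes in bytes):¶

```python
def initial_congestion_window(max_datagram_size):
    """10 x max_datagram_size, limited to the larger of 14720 bytes
    and twice the maximum datagram size."""
    return min(10 * max_datagram_size, max(14720, 2 * max_datagram_size))
```

For the common 1200-byte minimum datagram size this yields 12000 bytes; the 14720-byte cap only binds for datagram sizes between 1472 and 7360 bytes.¶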
+If the maximum datagram size changes during the connection, the initial +congestion window SHOULD be recalculated with the new size. If the maximum +datagram size is decreased in order to complete the handshake, the +congestion window SHOULD be set to the new initial congestion window.¶
+Prior to validating the client's address, the server can be further limited by +the anti-amplification limit as specified in Section 8.1 of [QUIC-TRANSPORT]. +Though the anti-amplification limit can prevent the congestion window from +being fully utilized and therefore slow down the increase in congestion window, +it does not directly affect the congestion window.¶
+The minimum congestion window is the smallest value the congestion window can +decrease to as a response to loss, increase in the peer-reported ECN-CE count, +or persistent congestion. The RECOMMENDED value is 2 * max_datagram_size.¶
+The NewReno congestion controller described in this document has three +distinct states, as shown in Figure 1.¶
+These states and the transitions between them are described in subsequent +sections.¶
+A NewReno sender is in slow start any time the congestion window is below the +slow start threshold. A sender begins in slow start because the slow start +threshold is initialized to an infinite value.¶
+While a sender is in slow start, the congestion window increases by the number +of bytes acknowledged when each acknowledgment is processed. This results in +exponential growth of the congestion window.¶
+The sender MUST exit slow start and enter a recovery period when a packet is +lost or when the ECN-CE count reported by its peer increases.¶
+A sender re-enters slow start any time the congestion window is less than the +slow start threshold, which only occurs after persistent congestion is +declared.¶
+A NewReno sender enters a recovery period when it detects the loss of a packet +or the ECN-CE count reported by its peer increases. A sender that is already in +a recovery period stays in it and does not re-enter it.¶
+On entering a recovery period, a sender MUST set the slow start threshold to +half the value of the congestion window when loss is detected. The congestion +window MUST be set to the reduced value of the slow start threshold before +exiting the recovery period.¶
+Implementations MAY reduce the congestion window immediately upon entering a +recovery period or use other mechanisms, such as Proportional Rate Reduction +([PRR]), to reduce the congestion window more gradually. If the +congestion window is reduced immediately, a single packet can be sent prior to +reduction. This speeds up loss recovery if the data in the lost packet is +retransmitted and is similar to TCP as described in Section 5 of [RFC6675].¶
+The recovery period aims to limit congestion window reduction to once per round +trip. Therefore during a recovery period, the congestion window does not change +in response to new losses or increases in the ECN-CE count.¶
+A recovery period ends and the sender enters congestion avoidance when a packet +sent during the recovery period is acknowledged. This is slightly different +from TCP's definition of recovery, which ends when the lost segment that +started recovery is acknowledged ([RFC5681]).¶
+A NewReno sender is in congestion avoidance any time the congestion window is +at or above the slow start threshold and not in a recovery period.¶
+A sender in congestion avoidance uses an Additive Increase Multiplicative +Decrease (AIMD) approach that MUST limit the increase to the congestion window +to at most one maximum datagram size for each congestion window that is +acknowledged.¶
+The sender exits congestion avoidance and enters a recovery period when a +packet is lost or when the ECN-CE count reported by its peer increases.¶
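The three states described above, and the transitions between them, can be modeled with a toy controller. This is a sketch under simplifying assumptions (no packet number spaces, no pacing, integer byte counts; the class name and halving factor of 1/2 follow NewReno convention), not a complete implementation:¶

```python
kLossReductionFactor = 0.5  # NewReno halves the window on a congestion event

class NewRenoSketch:
    """Toy model of slow start, recovery, and congestion avoidance."""
    def __init__(self, max_datagram_size=1200):
        self.max_datagram_size = max_datagram_size
        self.cwnd = min(10 * max_datagram_size,
                        max(14720, 2 * max_datagram_size))
        self.ssthresh = float('inf')   # starts infinite, so slow start first
        self.recovery_start_time = None  # None means not in recovery

    def in_slow_start(self):
        return self.cwnd < self.ssthresh

    def on_packet_acked(self, sent_time, acked_bytes):
        if self.recovery_start_time is not None:
            if sent_time <= self.recovery_start_time:
                return  # packet predates recovery: no window growth
            self.recovery_start_time = None  # acked a recovery-era packet
        if self.in_slow_start():
            self.cwnd += acked_bytes  # grows by bytes acked: exponential
        else:
            # AIMD: at most one datagram per congestion window acknowledged
            self.cwnd += (self.max_datagram_size * acked_bytes) // self.cwnd

    def on_congestion_event(self, event_time):
        if self.recovery_start_time is not None:
            return  # already in recovery: reduce once per round trip
        self.recovery_start_time = event_time
        self.ssthresh = self.cwnd * kLossReductionFactor
        # Never reduce below the minimum window of two datagrams.
        self.cwnd = max(self.ssthresh, 2 * self.max_datagram_size)
```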
+During the handshake, some packet protection keys might not be available when +a packet arrives and the receiver can choose to drop the packet. In particular, +Handshake and 0-RTT packets cannot be processed until the Initial packets +arrive and 1-RTT packets cannot be processed until the handshake completes. +Endpoints MAY ignore the loss of Handshake, 0-RTT, and 1-RTT packets that might +have arrived before the peer had packet protection keys to process those +packets. Endpoints MUST NOT ignore the loss of packets that were sent after +the earliest acknowledged packet in a given packet number space.¶
+Probe packets MUST NOT be blocked by the congestion controller. A sender MUST +however count these packets as being additionally in flight, since these packets +add network load without establishing packet loss. Note that sending probe +packets might cause the sender's bytes in flight to exceed the congestion window +until an acknowledgment is received that establishes loss or delivery of +packets.¶
+When a sender establishes loss of all packets sent over a long enough duration, +the network is considered to be experiencing persistent congestion.¶
+The persistent congestion duration is computed as follows:¶
++(smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay) * + kPersistentCongestionThreshold +¶ +
Unlike the PTO computation in Section 6.2, this duration includes the max_ack_delay +irrespective of the packet number spaces in which losses are established.¶
+This duration allows a sender to send as many packets before establishing +persistent congestion, including some in response to PTO expiration, as TCP does +with Tail Loss Probes ([RACK]) and a Retransmission Timeout ([RFC5681]).¶
+Larger values of kPersistentCongestionThreshold cause the sender to become less +responsive to persistent congestion in the network, which can result in +aggressive sending into a congested network. Too small a value can result in a +sender declaring persistent congestion unnecessarily, resulting in reduced +throughput for the sender.¶
+The RECOMMENDED value for kPersistentCongestionThreshold is 3, which results in +behavior that is approximately equivalent to a TCP sender declaring an RTO after +two TLPs.¶
+This design does not use consecutive PTO events to establish persistent +congestion, since application patterns impact PTO expirations. For example, a +sender that sends small amounts of data with silence periods between them +restarts the PTO timer every time it sends, potentially preventing the PTO timer +from expiring for a long period of time, even when no acknowledgments are being +received. The use of a duration enables a sender to establish persistent +congestion without depending on PTO expiration.¶
+A sender establishes persistent congestion after the receipt of an +acknowledgment if two packets that are ack-eliciting are declared lost, and:¶
+These two packets MUST be ack-eliciting, since a receiver is required to +acknowledge only ack-eliciting packets within its maximum ack delay; see Section +13.2 of [QUIC-TRANSPORT].¶
+The persistent congestion period SHOULD NOT start until there is at least one +RTT sample. Before the first RTT sample, a sender arms its PTO timer based on +the initial RTT (Section 6.2.2), which could be substantially larger than +the actual RTT. Requiring a prior RTT sample prevents a sender from establishing +persistent congestion with potentially too few probes.¶
+Since network congestion is not affected by packet number spaces, persistent +congestion SHOULD consider packets sent across packet number spaces. A sender +that does not have state for all packet number spaces or an implementation that +cannot compare send times across packet number spaces MAY use state for just the +packet number space that was acknowledged. This might result in erroneously +declaring persistent congestion, but it will not lead to a failure to detect +persistent congestion.¶
+When persistent congestion is declared, the sender's congestion window MUST be +reduced to the minimum congestion window (kMinimumWindow), similar to a TCP +sender's response on an RTO ([RFC5681]).¶
+The following example illustrates how a sender might establish persistent +congestion. Assume:¶
++smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay = 2 +kPersistentCongestionThreshold = 3 +¶ +
Consider the following sequence of events:¶
+Time | +Action | +
---|---|
t=0 | +Send packet #1 (app data) | +
t=1 | +Send packet #2 (app data) | +
t=1.2 | +Recv acknowledgment of #1 | +
t=2 | +Send packet #3 (app data) | +
t=3 | +Send packet #4 (app data) | +
t=4 | +Send packet #5 (app data) | +
t=5 | +Send packet #6 (app data) | +
t=6 | +Send packet #7 (app data) | +
t=8 | +Send packet #8 (PTO 1) | +
t=12 | +Send packet #9 (PTO 2) | +
t=12.2 | +Recv acknowledgment of #9 | +
Packets 2 through 8 are declared lost when the acknowledgment for packet 9 is +received at t = 12.2.¶
+The congestion period is calculated as the time between the oldest and newest +lost packets: 8 - 1 = 7. The persistent congestion duration is: 2 * 3 = 6. +Because the threshold was reached and because none of the packets between the +oldest and the newest lost packets were acknowledged, the network is considered +to have experienced persistent congestion.¶
+While this example shows PTO expirations, they are not required for persistent
+congestion to be established.¶
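The arithmetic in this example can be checked directly (values taken from the assumptions and table above):

```python
# Assumed values from the example above.
pto_components = 2   # smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
kPersistentCongestionThreshold = 3
persistent_congestion_duration = pto_components * kPersistentCongestionThreshold  # 6

# Send times of the oldest and newest lost packets.
oldest_lost_sent = 1   # packet #2, sent at t=1
newest_lost_sent = 8   # packet #8, sent at t=8
congestion_period = newest_lost_sent - oldest_lost_sent  # 7

# The period between the lost packets exceeds the duration, and no packet
# sent between them was acknowledged, so persistent congestion is declared.
persistent = congestion_period > persistent_congestion_duration
```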
+A sender SHOULD pace sending of all in-flight packets based on input from the +congestion controller.¶
+Sending multiple packets into the network without any delay between them creates +a packet burst that might cause short-term congestion and losses. Senders MUST +either use pacing or limit such bursts. Senders SHOULD limit bursts to the +initial congestion window; see Section 7.2. A sender with knowledge that +the network path to the receiver can absorb larger bursts MAY use a higher +limit.¶
+An implementation should take care to architect its congestion controller to +work well with a pacer. For instance, a pacer might wrap the congestion +controller and control the availability of the congestion window, or a pacer +might pace out packets handed to it by the congestion controller.¶
+Timely delivery of ACK frames is important for efficient loss recovery. Packets +containing only ACK frames SHOULD therefore not be paced, to avoid delaying +their delivery to the peer.¶
+Endpoints can implement pacing as they choose. A perfectly paced sender spreads +packets exactly evenly over time. For a window-based congestion controller, such +as the one in this document, that rate can be computed by averaging the +congestion window over the round-trip time. Expressed as a rate in units of +bytes per time, where congestion_window is in bytes:¶
++rate = N * congestion_window / smoothed_rtt +¶ +
Or, expressed as an inter-packet interval in units of time:¶
++interval = ( smoothed_rtt * packet_size / congestion_window ) / N +¶ +
Using a value for N
that is small, but at least 1 (for example, 1.25) ensures
+that variations in round-trip time do not result in under-utilization of the
+congestion window.¶
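With hypothetical numbers, the two expressions above are consistent: the rate multiplied by the inter-packet interval recovers the packet size.

```python
N = 1.25                     # gain factor, at least 1 (from the text)
congestion_window = 120_000  # bytes (hypothetical)
smoothed_rtt = 0.060         # seconds (hypothetical)
packet_size = 1_200          # bytes (hypothetical)

# Pacing rate in bytes per second.
rate = N * congestion_window / smoothed_rtt
# Equivalent inter-packet interval in seconds.
interval = (smoothed_rtt * packet_size / congestion_window) / N
```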
Practical considerations, such as packetization, scheduling delays, and +computational efficiency, can cause a sender to deviate from this rate over time +periods that are much shorter than a round-trip time.¶
+One possible implementation strategy for pacing uses a leaky bucket algorithm, +where the capacity of the "bucket" is limited to the maximum burst size and the +rate the "bucket" fills is determined by the above function.¶
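A minimal sketch of such a pacer, assuming a token-bucket formulation (the class name and API are hypothetical; time is passed explicitly for clarity):

```python
class TokenBucketPacer:
    """Bucket capacity is the maximum burst size in bytes; the bucket
    refills at the pacing rate computed from the congestion window and
    smoothed RTT, as in the function above."""

    def __init__(self, rate, max_burst, now=0.0):
        self.rate = rate            # bytes per second
        self.max_burst = max_burst  # bytes
        self.tokens = max_burst     # start with a full bucket
        self.last = now

    def try_send(self, nbytes, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.max_burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

Capping the bucket at the maximum burst size is what limits short-term bursts even after an idle period.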
+When bytes in flight is smaller than the congestion window and sending is not +pacing limited, the congestion window is under-utilized. When this occurs, +the congestion window SHOULD NOT be increased in either slow start or +congestion avoidance. This can happen due to insufficient application data +or flow control limits.¶
+A sender that paces packets (see Section 7.7) might delay sending packets +and not fully utilize the congestion window due to this delay. A sender +SHOULD NOT consider itself application limited if it would have fully +utilized the congestion window without pacing delay.¶
+A sender MAY implement alternative mechanisms to update its congestion window +after periods of under-utilization, such as those proposed for TCP in +[RFC7661].¶
+Loss detection and congestion control fundamentally involve consumption of
+signals, such as delay, loss, and ECN markings, from unauthenticated
+entities. An attacker can cause endpoints to reduce their sending rate by
+manipulating these signals: by dropping packets, by altering path delay
+strategically, or by changing ECN codepoints.¶
+Packets that carry only ACK frames can be heuristically identified by observing +packet size. Acknowledgment patterns may expose information about link +characteristics or application behavior. To reduce leaked information, +endpoints can bundle acknowledgments with other frames, or they can use PADDING +frames at a potential cost to performance.¶
+A receiver can misreport ECN markings to alter the congestion response of a
+sender. Suppressing reports of ECN-CE markings could cause a sender to
+increase its send rate. This increase could result in congestion and loss.¶
+A sender can detect suppression of reports by marking occasional packets that it +sends with an ECN-CE marking. If a packet sent with an ECN-CE marking is not +reported as having been CE marked when the packet is acknowledged, then the +sender can disable ECN for that path by not setting ECT codepoints in subsequent +packets sent on that path [RFC3168].¶
+Reporting additional ECN-CE markings will cause a sender to reduce its sending
+rate, which is similar in effect to advertising reduced connection flow control
+limits, so no advantage is gained by doing so.¶
+Endpoints choose the congestion controller that they use. Congestion controllers
+respond to reports of ECN-CE by reducing their rate, but the response may vary.
+Markings can be treated as equivalent to loss ([RFC3168]), but other responses
+can be specified, such as those in [RFC8511] or [RFC8311].¶
+This document has no IANA actions.¶
+We now describe an example implementation of the loss detection mechanisms +described in Section 6.¶
+The pseudocode segments in this section are licensed as Code Components; see the +copyright notice.¶
+To correctly implement congestion control, a QUIC sender tracks every +ack-eliciting packet until the packet is acknowledged or lost. +It is expected that implementations will be able to access this information by +packet number and crypto context and store the per-packet fields +(Appendix A.1.1) for loss recovery and congestion control.¶
+After a packet is declared lost, the endpoint can still maintain state for it +for an amount of time to allow for packet reordering; see Section 13.3 of +[QUIC-TRANSPORT]. This enables a sender to detect spurious retransmissions.¶
+Sent packets are tracked for each packet number space, and ACK +processing only applies to a single space.¶
+The packet number of the sent packet.¶
+A boolean that indicates whether a packet is ack-eliciting. +If true, it is expected that an acknowledgment will be received, +though the peer could delay sending the ACK frame containing it +by up to the max_ack_delay.¶
+A boolean that indicates whether the packet counts towards bytes in +flight.¶
+The number of bytes sent in the packet, not including UDP or IP +overhead, but including QUIC framing overhead.¶
+The time the packet was sent.¶
+Constants used in loss recovery are based on a combination of RFCs, papers, and +common practice.¶
+Maximum reordering in packets before packet threshold loss detection +considers a packet lost. The value recommended in Section 6.1.1 is 3.¶
+Maximum reordering in time before time threshold loss detection +considers a packet lost. Specified as an RTT multiplier. The value +recommended in Section 6.1.2 is 9/8.¶
+Timer granularity. This is a system-dependent value, and Section 6.1.2 +recommends a value of 1ms.¶
+The RTT used before an RTT sample is taken. The value recommended in +Section 6.2.2 is 333ms.¶
+An enum to enumerate the three packet number spaces.¶
++enum kPacketNumberSpace { + Initial, + Handshake, + ApplicationData, +} +¶ +
Variables required to implement the congestion control mechanisms +are described in this section.¶
+The most recent RTT measurement made when receiving an ack for +a previously unacked packet.¶
+The smoothed RTT of the connection, computed as described in +Section 5.3.¶
+The RTT variation, computed as described in Section 5.3.¶
+The minimum RTT seen over a period of time, ignoring acknowledgment delay, as +described in Section 5.2.¶
+The time that the first RTT sample was obtained.¶
+The maximum amount of time by which the receiver intends to delay +acknowledgments for packets in the Application Data packet number +space, as defined by the eponymous transport parameter (Section 18.2 +of [QUIC-TRANSPORT]). Note that the actual ack_delay in a received +ACK frame may be larger due to late timers, reordering, or loss.¶
+Multi-modal timer used for loss detection.¶
+The number of times a PTO has been sent without receiving an ack.¶
+The time the most recent ack-eliciting packet was sent.¶
+The largest packet number acknowledged in the packet number space so far.¶
+The time at which the next packet in that packet number space can be +considered lost based on exceeding the reordering window in time.¶
+An association of packet numbers in a packet number space to information +about them. Described in detail above in Appendix A.1.¶
+At the beginning of the connection, initialize the loss detection variables as +follows:¶
++loss_detection_timer.reset() +pto_count = 0 +latest_rtt = 0 +smoothed_rtt = kInitialRtt +rttvar = kInitialRtt / 2 +min_rtt = 0 +first_rtt_sample = 0 +for pn_space in [ Initial, Handshake, ApplicationData ]: + largest_acked_packet[pn_space] = infinite + time_of_last_ack_eliciting_packet[pn_space] = 0 + loss_time[pn_space] = 0 +¶ +
After a packet is sent, information about the packet is stored. The parameters +to OnPacketSent are described in detail above in Appendix A.1.1.¶
+Pseudocode for OnPacketSent follows:¶
++OnPacketSent(packet_number, pn_space, ack_eliciting, + in_flight, sent_bytes): + sent_packets[pn_space][packet_number].packet_number = + packet_number + sent_packets[pn_space][packet_number].time_sent = now() + sent_packets[pn_space][packet_number].ack_eliciting = + ack_eliciting + sent_packets[pn_space][packet_number].in_flight = in_flight + sent_packets[pn_space][packet_number].sent_bytes = sent_bytes + if (in_flight): + if (ack_eliciting): + time_of_last_ack_eliciting_packet[pn_space] = now() + OnPacketSentCC(sent_bytes) + SetLossDetectionTimer() +¶ +
When a server is blocked by anti-amplification limits, receiving +a datagram unblocks it, even if none of the packets in the +datagram are successfully processed. In such a case, the PTO +timer will need to be re-armed.¶
+Pseudocode for OnDatagramReceived follows:¶
++OnDatagramReceived(datagram): + // If this datagram unblocks the server, arm the + // PTO timer to avoid deadlock. + if (server was at anti-amplification limit): + SetLossDetectionTimer() +¶ +
When an ACK frame is received, it may newly acknowledge any number of packets.¶
+Pseudocode for OnAckReceived and UpdateRtt follow:¶
++IncludesAckEliciting(packets): + for packet in packets: + if (packet.ack_eliciting): + return true + return false + +OnAckReceived(ack, pn_space): + if (largest_acked_packet[pn_space] == infinite): + largest_acked_packet[pn_space] = ack.largest_acked + else: + largest_acked_packet[pn_space] = + max(largest_acked_packet[pn_space], ack.largest_acked) + + // DetectAndRemoveAckedPackets finds packets that are newly + // acknowledged and removes them from sent_packets. + newly_acked_packets = + DetectAndRemoveAckedPackets(ack, pn_space) + // Nothing to do if there are no newly acked packets. + if (newly_acked_packets.empty()): + return + + // Update the RTT if the largest acknowledged is newly acked + // and at least one ack-eliciting was newly acked. + if (newly_acked_packets.largest().packet_number == + ack.largest_acked && + IncludesAckEliciting(newly_acked_packets)): + latest_rtt = + now() - newly_acked_packets.largest().time_sent + UpdateRtt(ack.ack_delay) + + // Process ECN information if present. + if (ACK frame contains ECN information): + ProcessECN(ack, pn_space) + + lost_packets = DetectAndRemoveLostPackets(pn_space) + if (!lost_packets.empty()): + OnPacketsLost(lost_packets) + OnPacketsAcked(newly_acked_packets) + + // Reset pto_count unless the client is unsure if + // the server has validated the client's address. + if (PeerCompletedAddressValidation()): + pto_count = 0 + SetLossDetectionTimer() + + +UpdateRtt(ack_delay): + if (first_rtt_sample == 0): + min_rtt = latest_rtt + smoothed_rtt = latest_rtt + rttvar = latest_rtt / 2 + first_rtt_sample = now() + return + + // min_rtt ignores acknowledgment delay. + min_rtt = min(min_rtt, latest_rtt) + // Limit ack_delay by max_ack_delay after handshake + // confirmation. + if (handshake confirmed): + ack_delay = min(ack_delay, max_ack_delay) + + // Adjust for acknowledgment delay if plausible. 
+ adjusted_rtt = latest_rtt + if (latest_rtt > min_rtt + ack_delay): + adjusted_rtt = latest_rtt - ack_delay + + rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt) + smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt +¶ +
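An executable counterpart to the steady-state branch of UpdateRtt above (i.e., after the first RTT sample, with any max_ack_delay clamping assumed to have been applied already):

```python
def update_rtt(smoothed_rtt, rttvar, min_rtt, latest_rtt, ack_delay):
    """Runnable sketch of the UpdateRtt pseudocode above, steady state only."""
    # min_rtt ignores acknowledgment delay.
    min_rtt = min(min_rtt, latest_rtt)
    # Adjust for acknowledgment delay if plausible.
    adjusted_rtt = latest_rtt
    if latest_rtt > min_rtt + ack_delay:
        adjusted_rtt = latest_rtt - ack_delay
    # Exponentially weighted moving averages, as in the pseudocode.
    rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
    smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
    return smoothed_rtt, rttvar, min_rtt
```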
QUIC loss detection uses a single timer for all timeout loss detection. The +duration of the timer is based on the timer's mode, which is set in the packet +and timer events further below. The function SetLossDetectionTimer defined +below shows how the single timer is set.¶
+This algorithm may result in the timer being set in the past, particularly if +timers wake up late. Timers set in the past fire immediately.¶
+Pseudocode for SetLossDetectionTimer follows (where the "^" operator represents +exponentiation):¶
++GetLossTimeAndSpace(): + time = loss_time[Initial] + space = Initial + for pn_space in [ Handshake, ApplicationData ]: + if (time == 0 || loss_time[pn_space] < time): + time = loss_time[pn_space]; + space = pn_space + return time, space + +GetPtoTimeAndSpace(): + duration = (smoothed_rtt + max(4 * rttvar, kGranularity)) + * (2 ^ pto_count) + // Arm PTO from now when there are no inflight packets. + if (no in-flight packets): + assert(!PeerCompletedAddressValidation()) + if (has handshake keys): + return (now() + duration), Handshake + else: + return (now() + duration), Initial + pto_timeout = infinite + pto_space = Initial + for space in [ Initial, Handshake, ApplicationData ]: + if (no in-flight packets in space): + continue; + if (space == ApplicationData): + // Skip Application Data until handshake confirmed. + if (handshake is not confirmed): + return pto_timeout, pto_space + // Include max_ack_delay and backoff for Application Data. + duration += max_ack_delay * (2 ^ pto_count) + + t = time_of_last_ack_eliciting_packet[space] + duration + if (t < pto_timeout): + pto_timeout = t + pto_space = space + return pto_timeout, pto_space + +PeerCompletedAddressValidation(): + // Assume clients validate the server's address implicitly. + if (endpoint is server): + return true + // Servers complete address validation when a + // protected packet is received. + return has received Handshake ACK || + handshake confirmed + +SetLossDetectionTimer(): + earliest_loss_time, _ = GetLossTimeAndSpace() + if (earliest_loss_time != 0): + // Time threshold loss detection. + loss_detection_timer.update(earliest_loss_time) + return + + if (server is at anti-amplification limit): + // The server's timer is not set if nothing can be sent. + loss_detection_timer.cancel() + return + + if (no ack-eliciting packets in flight && + PeerCompletedAddressValidation()): + // There is nothing to detect lost, so no timer is set. 
+ // However, the client needs to arm the timer if the + // server might be blocked by the anti-amplification limit. + loss_detection_timer.cancel() + return + + timeout, _ = GetPtoTimeAndSpace() + loss_detection_timer.update(timeout) +¶ +
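The exponential backoff of the base PTO period in GetPtoTimeAndSpace can be illustrated numerically (the max_ack_delay term is omitted here; the values are hypothetical):

```python
def base_pto(smoothed_rtt, rttvar, k_granularity, pto_count):
    """Base PTO duration as computed in GetPtoTimeAndSpace above,
    doubling with each unanswered probe (2 ^ pto_count)."""
    return (smoothed_rtt + max(4 * rttvar, k_granularity)) * (2 ** pto_count)
```

With a 100 ms smoothed RTT and 25 ms rttvar, the first PTO fires after 200 ms, and after two unanswered probes the period has grown to 800 ms.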
When the loss detection timer expires, the timer's mode determines the action +to be performed.¶
+Pseudocode for OnLossDetectionTimeout follows:¶
++OnLossDetectionTimeout(): + earliest_loss_time, pn_space = GetLossTimeAndSpace() + if (earliest_loss_time != 0): + // Time threshold loss Detection + lost_packets = DetectAndRemoveLostPackets(pn_space) + assert(!lost_packets.empty()) + OnPacketsLost(lost_packets) + SetLossDetectionTimer() + return + + if (bytes_in_flight > 0): + // PTO. Send new data if available, else retransmit old data. + // If neither is available, send a single PING frame. + _, pn_space = GetPtoTimeAndSpace() + SendOneOrTwoAckElicitingPackets(pn_space) + else: + assert(!PeerCompletedAddressValidation()) + // Client sends an anti-deadlock packet: Initial is padded + // to earn more anti-amplification credit, + // a Handshake packet proves address ownership. + if (has Handshake keys): + SendOneAckElicitingHandshakePacket() + else: + SendOneAckElicitingPaddedInitialPacket() + + pto_count++ + SetLossDetectionTimer() +¶ +
DetectAndRemoveLostPackets is called every time an ACK is received or the time +threshold loss detection timer expires. This function operates on the +sent_packets for that packet number space and returns a list of packets newly +detected as lost.¶
+Pseudocode for DetectAndRemoveLostPackets follows:¶
++DetectAndRemoveLostPackets(pn_space): + assert(largest_acked_packet[pn_space] != infinite) + loss_time[pn_space] = 0 + lost_packets = [] + loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) + + // Minimum time of kGranularity before packets are deemed lost. + loss_delay = max(loss_delay, kGranularity) + + // Packets sent before this time are deemed lost. + lost_send_time = now() - loss_delay + + foreach unacked in sent_packets[pn_space]: + if (unacked.packet_number > largest_acked_packet[pn_space]): + continue + + // Mark packet as lost, or set time when it should be marked. + // Note: The use of kPacketThreshold here assumes that there + // were no sender-induced gaps in the packet number space. + if (unacked.time_sent <= lost_send_time || + largest_acked_packet[pn_space] >= + unacked.packet_number + kPacketThreshold): + sent_packets[pn_space].remove(unacked.packet_number) + lost_packets.insert(unacked) + else: + if (loss_time[pn_space] == 0): + loss_time[pn_space] = unacked.time_sent + loss_delay + else: + loss_time[pn_space] = min(loss_time[pn_space], + unacked.time_sent + loss_delay) + return lost_packets +¶ +
When Initial or Handshake keys are discarded, packets from the space +are discarded and loss detection state is updated.¶
+Pseudocode for OnPacketNumberSpaceDiscarded follows:¶
++OnPacketNumberSpaceDiscarded(pn_space): + assert(pn_space != ApplicationData) + RemoveFromBytesInFlight(sent_packets[pn_space]) + sent_packets[pn_space].clear() + // Reset the loss detection and PTO timer + time_of_last_ack_eliciting_packet[pn_space] = 0 + loss_time[pn_space] = 0 + pto_count = 0 + SetLossDetectionTimer() +¶ +
We now describe an example implementation of the congestion controller described +in Section 7.¶
+The pseudocode segments in this section are licensed as Code Components; see the +copyright notice.¶
+Constants used in congestion control are based on a combination of RFCs, papers, +and common practice.¶
+Default limit on the initial bytes in flight as described in Section 7.2.¶
+Minimum congestion window in bytes as described in Section 7.2.¶
+Scaling factor applied to reduce the congestion window when a new loss event
+is detected. Section 7 recommends a value of 0.5.¶
+Period of time for persistent congestion to be established, specified as a PTO +multiplier. Section 7.6 recommends a value of 3.¶
+Variables required to implement the congestion control mechanisms +are described in this section.¶
+The sender's current maximum payload size. Does not include UDP or IP +overhead. The max datagram size is used for congestion window +computations. An endpoint sets the value of this variable based on its Path +Maximum Transmission Unit (PMTU; see Section 14.2 of [QUIC-TRANSPORT]), with +a minimum value of 1200 bytes.¶
+The highest value reported for the ECN-CE counter in the packet number space +by the peer in an ACK frame. This value is used to detect increases in the +reported ECN-CE counter.¶
+The sum of the size in bytes of all sent packets that contain at least one +ack-eliciting or PADDING frame, and have not been acknowledged or declared +lost. The size does not include IP or UDP overhead, but does include the QUIC +header and AEAD overhead. Packets only containing ACK frames do not count +towards bytes_in_flight to ensure congestion control does not impede +congestion feedback.¶
+Maximum number of bytes allowed to be in flight.¶
+The time the current recovery period started due to the detection of loss +or ECN. When a packet sent after this time is acknowledged, QUIC exits +congestion recovery.¶
+Slow start threshold in bytes. When the congestion window is below ssthresh, +the mode is slow start and the window grows by the number of bytes +acknowledged.¶
+The congestion control pseudocode also accesses some of the variables from the +loss recovery pseudocode.¶
+At the beginning of the connection, initialize the congestion control +variables as follows:¶
++congestion_window = kInitialWindow +bytes_in_flight = 0 +congestion_recovery_start_time = 0 +ssthresh = infinite +for pn_space in [ Initial, Handshake, ApplicationData ]: + ecn_ce_counters[pn_space] = 0 +¶ +
Whenever a packet is sent, and it contains non-ACK frames, the packet +increases bytes_in_flight.¶
++OnPacketSentCC(sent_bytes): + bytes_in_flight += sent_bytes +¶ +
Invoked from loss detection's OnAckReceived and is supplied with the +newly acked_packets from sent_packets.¶
+In congestion avoidance, implementers that use an integer representation +for congestion_window should be careful with division, and can use +the alternative approach suggested in Section 2.1 of [RFC3465].¶
++InCongestionRecovery(sent_time): + return sent_time <= congestion_recovery_start_time + +OnPacketsAcked(acked_packets): + for acked_packet in acked_packets: + OnPacketAcked(acked_packet) + +OnPacketAcked(acked_packet): + if (!acked_packet.in_flight): + return; + // Remove from bytes_in_flight. + bytes_in_flight -= acked_packet.sent_bytes + // Do not increase congestion_window if application + // limited or flow control limited. + if (IsAppOrFlowControlLimited()) + return + // Do not increase congestion window in recovery period. + if (InCongestionRecovery(acked_packet.time_sent)): + return + if (congestion_window < ssthresh): + // Slow start. + congestion_window += acked_packet.sent_bytes + else: + // Congestion avoidance. + congestion_window += + max_datagram_size * acked_packet.sent_bytes + / congestion_window +¶ +
Invoked from ProcessECN and OnPacketsLost when a new congestion event is +detected. If not already in recovery, this starts a recovery period and +reduces the slow start threshold and congestion window immediately.¶
++OnCongestionEvent(sent_time): + // No reaction if already in a recovery period. + if (InCongestionRecovery(sent_time)): + return + + // Enter recovery period. + congestion_recovery_start_time = now() + ssthresh = congestion_window * kLossReductionFactor + congestion_window = max(ssthresh, kMinimumWindow) + // A packet can be sent to speed up loss recovery. + MaybeSendOnePacket() +¶ +
Invoked when an ACK frame with an ECN section is received from the peer.¶
++ProcessECN(ack, pn_space): + // If the ECN-CE counter reported by the peer has increased, + // this could be a new congestion event. + if (ack.ce_counter > ecn_ce_counters[pn_space]): + ecn_ce_counters[pn_space] = ack.ce_counter + sent_time = sent_packets[ack.largest_acked].time_sent + OnCongestionEvent(sent_time) +¶ +
Invoked when DetectAndRemoveLostPackets deems packets lost.¶
++OnPacketsLost(lost_packets): + sent_time_of_last_loss = 0 + // Remove lost packets from bytes_in_flight. + for lost_packet in lost_packets: + if lost_packet.in_flight: + bytes_in_flight -= lost_packet.sent_bytes + sent_time_of_last_loss = + max(sent_time_of_last_loss, lost_packet.time_sent) + // Congestion event if in-flight packets were lost + if (sent_time_of_last_loss != 0): + OnCongestionEvent(sent_time_of_last_loss) + + // Reset the congestion window if the loss of these + // packets indicates persistent congestion. + // Only consider packets sent after getting an RTT sample. + if (first_rtt_sample == 0): + return + pc_lost = [] + for lost in lost_packets: + if lost.time_sent > first_rtt_sample: + pc_lost.insert(lost) + if (InPersistentCongestion(pc_lost)): + congestion_window = kMinimumWindow + congestion_recovery_start_time = 0 +¶ +
When Initial or Handshake keys are discarded, packets sent in that space no +longer count toward bytes in flight.¶
+Pseudocode for RemoveFromBytesInFlight follows:¶
++RemoveFromBytesInFlight(discarded_packets):
+  // Remove any unacknowledged packets from flight.
+  foreach packet in discarded_packets:
+    if packet.in_flight:
+      bytes_in_flight -= packet.sent_bytes
+¶ +
Issue and pull request numbers are listed with a leading octothorp.¶
+Editorial changes only.¶
+No changes.¶
+No significant changes.¶
+No significant changes.¶
+No significant changes.¶
+No significant changes.¶
+No significant changes.¶
+No significant changes.¶
+The IETF QUIC Working Group received an enormous amount of support from many +people. The following people provided substantive contributions to this +document:¶
+ +Internet-Draft | +Using TLS to Secure QUIC | +January 2021 | +
Thomson & Turner | +Expires 19 July 2021 | +[Page] | +
This document describes how Transport Layer Security (TLS) is used to secure +QUIC.¶
+Discussion of this draft takes place on the QUIC working group mailing list +(quic@ietf.org), which is archived at +https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
+Working Group information can be found at https://github.com/quicwg; source +code and issues list for this draft can be found at +https://github.com/quicwg/base-drafts/labels/-tls.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ This Internet-Draft will expire on 19 July 2021.¶
++ Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Simplified BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Simplified BSD License.¶
+This document describes how QUIC [QUIC-TRANSPORT] is secured using TLS +[TLS13].¶
+TLS 1.3 provides critical latency improvements for connection establishment over +previous versions. Absent packet loss, most new connections can be established +and secured within a single round trip; on subsequent connections between the +same client and server, the client can often send application data immediately, +that is, using a zero round trip setup.¶
+This document describes how TLS acts as a security component of QUIC.¶
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", +"MAY", and "OPTIONAL" in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, and only when, they +appear in all capitals, as shown here.¶
+This document uses the terminology established in [QUIC-TRANSPORT].¶
+For brevity, the acronym TLS is used to refer to TLS 1.3, though a newer version +could be used; see Section 4.2.¶
+TLS provides two endpoints with a way to establish a means of communication over +an untrusted medium (for example, the Internet). TLS enables authentication of +peers and provides confidentiality and integrity protection for messages that +endpoints exchange.¶
+Internally, TLS is a layered protocol, with the structure shown in +Figure 1.¶
+Each Content layer message (e.g., Handshake, Alerts, and Application Data) is +carried as a series of typed TLS records by the Record layer. Records are +individually cryptographically protected and then transmitted over a reliable +transport (typically TCP), which provides sequencing and guaranteed delivery.¶
+The TLS authenticated key exchange occurs between two endpoints: client and
+server. The client initiates the exchange and the server responds. If the key
+exchange completes successfully, both client and server will agree on a secret.
+TLS supports both pre-shared key (PSK) and Diffie-Hellman over either finite
+fields or elliptic curves ((EC)DHE) key exchanges. PSK is the basis for Early
+Data (0-RTT); (EC)DHE provides forward secrecy (FS) when the (EC)DHE
+keys are destroyed. The two modes can also be combined to provide forward
+secrecy while using the PSK for authentication.¶
+After completing the TLS handshake, the client will have learned and +authenticated an identity for the server and the server is optionally able to +learn and authenticate an identity for the client. TLS supports X.509 +[RFC5280] certificate-based authentication for both server and client. +When PSK key exchange is used (as in resumption), knowledge of the PSK +serves to authenticate the peer.¶
+The TLS key exchange is resistant to tampering by attackers and it produces +shared secrets that cannot be controlled by either participating peer.¶
+TLS provides two basic handshake modes of interest to QUIC:¶
+A simplified TLS handshake with 0-RTT application data is shown in Figure 2.¶
+Figure 2 omits the EndOfEarlyData message, which is not used in QUIC; see +Section 8.3. Likewise, neither ChangeCipherSpec nor KeyUpdate messages are +used by QUIC. ChangeCipherSpec is redundant in TLS 1.3; see Section 8.4. +QUIC has its own key update mechanism; see Section 6.¶
+Data is protected using a number of encryption levels:¶
+Application Data may appear only in the Early Data and Application Data +levels. Handshake and Alert messages may appear in any level.¶
+The 0-RTT handshake can be used if the client and server have previously +communicated. In the 1-RTT handshake, the client is unable to send protected +Application Data until it has received all of the Handshake messages sent by the +server.¶
+QUIC [QUIC-TRANSPORT] assumes responsibility for the confidentiality and +integrity protection of packets. For this it uses keys derived from a TLS +handshake [TLS13], but instead of carrying TLS records over QUIC (as with +TCP), TLS Handshake and Alert messages are carried directly over the QUIC +transport, which takes over the responsibilities of the TLS record layer, as +shown in Figure 3.¶
+QUIC also relies on TLS for authentication and negotiation of parameters that +are critical to security and performance.¶
+Rather than a strict layering, these two protocols cooperate: QUIC uses the TLS +handshake; TLS uses the reliability, ordered delivery, and record layer provided +by QUIC.¶
+At a high level, there are two main interactions between the TLS and QUIC +components:¶
+Figure 4 shows these interactions in more detail, with the QUIC packet +protection being called out specially.¶
+Unlike TLS over TCP, QUIC applications that want to send data do not send it +through TLS "application_data" records. Rather, they send it as QUIC STREAM +frames or other frame types, which are then carried in QUIC packets.¶
+QUIC carries TLS handshake data in CRYPTO frames, each of which consists of a +contiguous block of handshake data identified by an offset and length. Those +frames are packaged into QUIC packets and encrypted under the current +encryption level. As with TLS over TCP, once TLS handshake data has been +delivered to QUIC, it is QUIC's responsibility to deliver it reliably. Each +chunk of data that is produced by TLS is associated with the set of keys that +TLS is currently using. If QUIC needs to retransmit that data, it MUST use the +same keys even if TLS has already updated to newer keys.¶
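+The binding between retransmitted CRYPTO data and its original keys can be sketched as follows. This is a minimal illustration, not an API: the CryptoChunk type, the level tags, and the helper name are invented for the example.¶

```python
from dataclasses import dataclass

# Illustrative encryption-level tags (names are assumptions, not wire values).
INITIAL, HANDSHAKE, ONE_RTT = "Initial", "Handshake", "1-RTT"

@dataclass
class CryptoChunk:
    offset: int
    data: bytes
    level: str  # encryption level in force when TLS produced this data

def level_for_retransmission(chunk: CryptoChunk, current_level: str) -> str:
    # Retransmitted CRYPTO data MUST use the keys of the level it was
    # originally sent at, even if TLS has since updated to newer keys.
    return chunk.level

chunk = CryptoChunk(offset=0, data=b"...ClientHello bytes...", level=INITIAL)
assert level_for_retransmission(chunk, current_level=HANDSHAKE) == INITIAL
```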
+Each encryption level corresponds to a packet number space. The packet number +space that is used determines the semantics of frames. Some frames are +prohibited in different packet number spaces; see Section 12.5 of +[QUIC-TRANSPORT].¶
+Because packets could be reordered on the wire, QUIC uses the packet type to +indicate which keys were used to protect a given packet, as shown in +Table 1. When packets of different types need to be sent, +endpoints SHOULD use coalesced packets to send them in the same UDP datagram.¶
Packet Type | Encryption Keys | PN Space
---|---|---
Initial | Initial secrets | Initial
0-RTT Protected | 0-RTT | Application data
Handshake | Handshake | Handshake
Retry | Retry | N/A
Version Negotiation | N/A | N/A
Short Header | 1-RTT | Application data
Section 17 of [QUIC-TRANSPORT] shows how packets at the various encryption +levels fit into the handshake process.¶
+As shown in Figure 4, the interface from QUIC to TLS consists of four +primary functions:¶
+Additional functions might be needed to configure TLS. In particular, QUIC and +TLS need to agree on which is responsible for validation of peer credentials, +such as certificate validation ([RFC5280]).¶
+In this document, the TLS handshake is considered complete when the TLS stack +has reported that the handshake is complete. This happens when the TLS stack +has both sent a Finished message and verified the peer's Finished message. +Verifying the peer's Finished provides the endpoints with an assurance that +previous handshake messages have not been modified. Note that the handshake +does not complete at both endpoints simultaneously. Consequently, any +requirement that is based on the completion of the handshake depends on the +perspective of the endpoint in question.¶
+In this document, the TLS handshake is considered confirmed at the server when +the handshake completes. The server MUST send a HANDSHAKE_DONE frame as soon as +the handshake is complete. At the client, the handshake is considered confirmed +when a HANDSHAKE_DONE frame is received.¶
+Additionally, a client MAY consider the handshake to be confirmed when it +receives an acknowledgment for a 1-RTT packet. This can be implemented by +recording the lowest packet number sent with 1-RTT keys, and comparing it to the +Largest Acknowledged field in any received 1-RTT ACK frame: once the latter is +greater than or equal to the former, the handshake is confirmed.¶
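+The client-side check described above reduces to a single comparison. The sketch below is illustrative, not part of any specified API; packet numbers are assumed to be plain integers, and None stands in for "nothing sent/received yet".¶

```python
def handshake_confirmed_by_ack(lowest_1rtt_pn_sent, largest_acked_1rtt):
    """Client-side confirmation check: once the Largest Acknowledged field
    of a received 1-RTT ACK frame is >= the lowest packet number the client
    sent with 1-RTT keys, the handshake is confirmed."""
    if lowest_1rtt_pn_sent is None or largest_acked_1rtt is None:
        return False  # no 1-RTT packet sent, or no 1-RTT ACK received yet
    return largest_acked_1rtt >= lowest_1rtt_pn_sent

assert handshake_confirmed_by_ack(5, 7) is True   # ACK covers the packet
assert handshake_confirmed_by_ack(5, 4) is False  # ACK is for earlier data
```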
+In order to drive the handshake, TLS depends on being able to send and receive +handshake messages. There are two basic functions on this interface: one where +QUIC requests handshake messages and one where QUIC provides bytes that comprise +handshake messages.¶
+Before starting the handshake QUIC provides TLS with the transport parameters +(see Section 8.2) that it wishes to carry.¶
+A QUIC client starts TLS by requesting TLS handshake bytes from TLS. The client +acquires handshake bytes before sending its first packet. A QUIC server starts +the process by providing TLS with the client's handshake bytes.¶
+At any time, the TLS stack at an endpoint will have a current sending +encryption level and receiving encryption level. TLS encryption levels determine +the QUIC packet type and keys that are used for protecting data.¶
+Each encryption level is associated with a different sequence of bytes, which is +reliably transmitted to the peer in CRYPTO frames. When TLS provides handshake +bytes to be sent, they are appended to the handshake bytes for the current +encryption level. The encryption level then determines the type of packet that +the resulting CRYPTO frame is carried in; see Table 1.¶
+Four encryption levels are used, producing keys for Initial, 0-RTT, Handshake, +and 1-RTT packets. CRYPTO frames are carried in just three of these levels, +omitting the 0-RTT level. These four levels correspond to three packet number +spaces: Initial and Handshake encrypted packets use their own separate spaces; +0-RTT and 1-RTT packets use the application data packet number space.¶
+QUIC takes the unprotected content of TLS handshake records as the content of +CRYPTO frames. TLS record protection is not used by QUIC. QUIC assembles +CRYPTO frames into QUIC packets, which are protected using QUIC packet +protection.¶
+QUIC CRYPTO frames only carry TLS handshake messages. TLS +alerts are turned into QUIC CONNECTION_CLOSE error codes; see Section 4.8. +TLS application data and other content types cannot be carried by QUIC at any +encryption level; it is an error if they are received from the TLS stack.¶
+When an endpoint receives a QUIC packet containing a CRYPTO frame from the +network, it proceeds as follows:¶
+Each time that TLS is provided with new data, new handshake bytes are requested +from TLS. TLS might not provide any bytes if the handshake messages it has +received are incomplete or it has no data to send.¶
+The content of CRYPTO frames might either be processed incrementally by TLS or +buffered until complete messages or flights are available. TLS is responsible +for buffering handshake bytes that have arrived in order. QUIC is responsible +for buffering handshake bytes that arrive out of order or for encryption levels +that are not yet ready. QUIC does not provide any means of flow control for +CRYPTO frames; see Section 7.5 of [QUIC-TRANSPORT].¶
+Once the TLS handshake is complete, this is indicated to QUIC along with any +final handshake bytes that TLS needs to send. At this stage, the transport +parameters that the peer advertised during the handshake are authenticated; +see Section 8.2.¶
+Once the handshake is complete, TLS becomes passive. TLS can still receive data +from its peer and respond in kind, but it will not need to send more data unless +specifically requested - either by an application or QUIC. One reason to send +data is that the server might wish to provide additional or updated session +tickets to a client.¶
+When the handshake is complete, QUIC only needs to provide TLS with any data +that arrives in CRYPTO streams. In the same manner that is used during the +handshake, new data is requested from TLS after providing received data.¶
+As keys at a given encryption level become available to TLS, TLS indicates to +QUIC that reading or writing keys at that encryption level are available.¶
+The availability of new keys is always a result of providing inputs to TLS. TLS +only provides new keys after being initialized (by a client) or when provided +with new handshake data.¶
+However, a TLS implementation could perform some of its processing +asynchronously. In particular, the process of validating a certificate can take +some time. While waiting for TLS processing to complete, an endpoint SHOULD +buffer received packets if they might be processed using keys that aren't yet +available. These packets can be processed once keys are provided by TLS. An +endpoint SHOULD continue to respond to packets that can be processed during this +time.¶
+After processing inputs, TLS might produce handshake bytes, keys for new +encryption levels, or both.¶
+TLS provides QUIC with three items as a new encryption level becomes available:¶
+These values are based on the values that TLS negotiates and are used by QUIC to +generate packet and header protection keys; see Section 5 and +Section 5.4.¶
+If 0-RTT is possible, it is ready after the client sends a TLS ClientHello +message or the server receives that message. After providing a QUIC client with +the first handshake bytes, the TLS stack might signal the change to 0-RTT +keys. On the server, after receiving handshake bytes that contain a ClientHello +message, a TLS server might signal that 0-RTT keys are available.¶
+Although TLS only uses one encryption level at a time, QUIC may use more than +one level. For instance, after sending its Finished message (using a CRYPTO +frame at the Handshake encryption level) an endpoint can send STREAM data (in +1-RTT encryption). If the Finished message is lost, the endpoint uses the +Handshake encryption level to retransmit the lost message. Reordering or loss +of packets can mean that QUIC will need to handle packets at multiple encryption +levels. During the handshake, this means potentially handling packets at higher +and lower encryption levels than the current encryption level used by TLS.¶
+In particular, server implementations need to be able to read packets at the +Handshake encryption level at the same time as the 0-RTT encryption level. A +client could interleave ACK frames that are protected with Handshake keys with +0-RTT data and the server needs to process those acknowledgments in order to +detect lost Handshake packets.¶
+QUIC also needs access to keys that might not ordinarily be available to a TLS +implementation. For instance, a client might need to acknowledge Handshake +packets before it is ready to send CRYPTO frames at that encryption level. TLS +therefore needs to provide keys to QUIC before it might produce them for its own +use.¶
+Figure 5 summarizes the exchange between QUIC and TLS for both +client and server. Solid arrows indicate packets that carry handshake data; +dashed arrows show where application data can be sent. Each arrow is tagged +with the encryption level used for that transmission.¶
+Figure 5 shows the multiple packets that form a single "flight" of +messages being processed individually, to show what incoming messages trigger +different actions. This shows multiple "Get Handshake" invocations to retrieve +handshake messages at different encryption levels. New handshake messages are +requested after incoming packets have been processed.¶
+Figure 5 shows one possible structure for a simple handshake +exchange. The exact process varies based on the structure of endpoint +implementations and the order in which packets arrive. Implementations could +use a different number of operations or execute them in other orders.¶
+This document describes how TLS 1.3 [TLS13] is used with QUIC.¶
+In practice, the TLS handshake will negotiate a version of TLS to use. This +could result in a newer version of TLS than 1.3 being negotiated if both +endpoints support that version. This is acceptable provided that the features +of TLS 1.3 that are used by QUIC are supported by the newer version.¶
+Clients MUST NOT offer TLS versions older than 1.3. A badly configured TLS +implementation could negotiate TLS 1.2 or another older version of TLS. An +endpoint MUST terminate the connection if a version of TLS older than 1.3 is +negotiated.¶
+The first Initial packet from a client contains the start or all of its first
cryptographic handshake message, which for TLS is the ClientHello. Servers
might need to parse the entire ClientHello (e.g., to access extensions such as
Server Name Indication (SNI) or Application Layer Protocol Negotiation
(ALPN)) in order to decide whether to accept the new incoming QUIC connection.
If the ClientHello spans multiple Initial packets, such servers would need to
buffer the first received fragments, which could consume excessive resources if
the client's address has not yet been validated. To avoid this, servers MAY
use the Retry feature (see Section 8.1 of [QUIC-TRANSPORT]) to only buffer
partial ClientHello messages from clients with a validated address.¶
+QUIC packet and framing add at least 36 bytes of overhead to the ClientHello +message. That overhead increases if the client chooses a source connection ID +longer than zero bytes. Overheads also do not include the token or a +destination connection ID longer than 8 bytes, both of which might be required +if a server sends a Retry packet.¶
+A typical TLS ClientHello can easily fit into a 1200-byte packet. However, in +addition to the overheads added by QUIC, there are several variables that could +cause this limit to be exceeded. Large session tickets, multiple or large key +shares, and long lists of supported ciphers, signature algorithms, versions, +QUIC transport parameters, and other negotiable parameters and extensions could +cause this message to grow.¶
+For servers, in addition to connection IDs and tokens, the size of TLS session +tickets can have an effect on a client's ability to connect efficiently. +Minimizing the size of these values increases the probability that clients can +use them and still fit their entire ClientHello message in their first Initial +packet.¶
+The TLS implementation does not need to ensure that the ClientHello is large +enough to meet the requirements for QUIC packets. QUIC PADDING frames are added +to increase the size of the packet as necessary; see Section 14.1 of +[QUIC-TRANSPORT].¶
+The requirements for authentication depend on the application protocol that is +in use. TLS provides server authentication and permits the server to request +client authentication.¶
+A client MUST authenticate the identity of the server. This typically involves +verification that the identity of the server is included in a certificate and +that the certificate is issued by a trusted entity (see for example +[RFC2818]).¶
+Where servers provide certificates for authentication, the size of +the certificate chain can consume a large number of bytes. Controlling the +size of certificate chains is critical to performance in QUIC as servers are +limited to sending 3 bytes for every byte received prior to validating the +client address; see Section 8.1 of [QUIC-TRANSPORT]. The size of a +certificate chain can be managed by limiting the number of names or +extensions; using keys with small public key representations, like ECDSA; or +by using certificate compression +[COMPRESS].¶
+A server MAY request that the client authenticate during the handshake. A server +MAY refuse a connection if the client is unable to authenticate when requested. +The requirements for client authentication vary based on application protocol +and deployment.¶
+A server MUST NOT use post-handshake client authentication (as defined in +Section 4.6.2 of [TLS13]), because the multiplexing offered by QUIC prevents +clients from correlating the certificate request with the application-level +event that triggered it (see [HTTP2-TLS13]). +More specifically, servers MUST NOT send post-handshake TLS CertificateRequest +messages and clients MUST treat receipt of such messages as a connection error +of type PROTOCOL_VIOLATION.¶
+QUIC can use the session resumption feature of TLS 1.3. It does this by +carrying NewSessionTicket messages in CRYPTO frames after the handshake is +complete. Session resumption can be used to provide 0-RTT, and can also be +used when 0-RTT is disabled.¶
+Endpoints that use session resumption might need to remember some information +about the current connection when creating a resumed connection. TLS requires +that some information be retained; see Section 4.6.1 of [TLS13]. QUIC itself +does not depend on any state being retained when resuming a connection, unless +0-RTT is also used; see Section 7.4.1 of [QUIC-TRANSPORT] and +Section 4.6.1. Application protocols could depend on state that is retained +between resumed connections.¶
+Clients can store any state required for resumption along with the session +ticket. Servers can use the session ticket to help carry state.¶
+Session resumption allows servers to link activity on the original connection +with the resumed connection, which might be a privacy issue for clients. +Clients can choose not to enable resumption to avoid creating this correlation. +Clients SHOULD NOT reuse tickets as that allows entities other than the server +to correlate connections; see Section C.4 of [TLS13].¶
+The 0-RTT feature in QUIC allows a client to send application data before the +handshake is complete. This is made possible by reusing negotiated parameters +from a previous connection. To enable this, 0-RTT depends on the client +remembering critical parameters and providing the server with a TLS session +ticket that allows the server to recover the same information.¶
+This information includes parameters that determine TLS state, as governed by +[TLS13], QUIC transport parameters, the chosen application protocol, and any +information the application protocol might need; see Section 4.6.3. This +information determines how 0-RTT packets and their contents are formed.¶
+To ensure that the same information is available to both endpoints, all +information used to establish 0-RTT comes from the same connection. Endpoints +cannot selectively disregard information that might alter the sending or +processing of 0-RTT.¶
+[TLS13] sets a limit of 7 days on the time between the original connection +and any attempt to use 0-RTT. There are other constraints on 0-RTT usage, +notably those caused by the potential exposure to replay attack; see Section 9.2.¶
+The TLS "early_data" extension in the NewSessionTicket message is defined +to convey (in the "max_early_data_size" parameter) the amount of TLS 0-RTT +data the server is willing to accept. QUIC does not use TLS 0-RTT data. +QUIC uses 0-RTT packets to carry early data. Accordingly, the +"max_early_data_size" parameter is repurposed to hold a sentinel value +0xffffffff to indicate that the server is willing to accept QUIC 0-RTT data; +to indicate that the server does not accept 0-RTT data, the "early_data" +extension is omitted from the NewSessionTicket. +The amount of data that the client can send in QUIC 0-RTT is +controlled by the initial_max_data transport parameter supplied by the server.¶
+Servers MUST NOT send the early_data extension with a max_early_data_size field +set to any value other than 0xffffffff. A client MUST treat receipt of a +NewSessionTicket that contains an early_data extension with any other value as +a connection error of type PROTOCOL_VIOLATION.¶
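+A client-side validation of the received early_data extension can be sketched as below. The helper name and return strings are invented for this sketch; only the 0xffffffff sentinel and the PROTOCOL_VIOLATION error code (0x0a) come from the specifications.¶

```python
PROTOCOL_VIOLATION = 0x0a  # QUIC transport error code

def check_ticket_early_data(max_early_data_size):
    """Validate the early_data extension of a received NewSessionTicket.

    Pass None when the extension is absent (server does not accept 0-RTT).
    Any value other than the 0xffffffff sentinel is a connection error of
    type PROTOCOL_VIOLATION."""
    if max_early_data_size is None:
        return "0-RTT not offered"
    if max_early_data_size == 0xffffffff:
        return "0-RTT offered"
    raise ValueError(f"PROTOCOL_VIOLATION (0x{PROTOCOL_VIOLATION:02x}): "
                     f"bad max_early_data_size {max_early_data_size:#x}")

assert check_ticket_early_data(None) == "0-RTT not offered"
assert check_ticket_early_data(0xffffffff) == "0-RTT offered"
```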
+A client that wishes to send 0-RTT packets uses the early_data extension in +the ClientHello message of a subsequent handshake; see Section 4.2.10 of +[TLS13]. It then sends application data in 0-RTT packets.¶
+A client that attempts 0-RTT might also provide an address validation token if +the server has sent a NEW_TOKEN frame; see Section 8.1 of [QUIC-TRANSPORT].¶
+A server accepts 0-RTT by sending an early_data extension in the +EncryptedExtensions; see Section 4.2.10 of [TLS13]. The server then +processes and acknowledges the 0-RTT packets that it receives.¶
+A server rejects 0-RTT by sending the EncryptedExtensions without an early_data +extension. A server will always reject 0-RTT if it sends a TLS +HelloRetryRequest. When rejecting 0-RTT, a server MUST NOT process any 0-RTT +packets, even if it could. When 0-RTT was rejected, a client SHOULD treat +receipt of an acknowledgment for a 0-RTT packet as a connection error of type +PROTOCOL_VIOLATION, if it is able to detect the condition.¶
+When 0-RTT is rejected, all connection characteristics that the client assumed +might be incorrect. This includes the choice of application protocol, transport +parameters, and any application configuration. The client therefore MUST reset +the state of all streams, including application state bound to those streams.¶
+A client MAY reattempt 0-RTT if it receives a Retry or Version Negotiation +packet. These packets do not signify rejection of 0-RTT.¶
+When a server receives a ClientHello with the early_data extension, it has to +decide whether to accept or reject early data from the client. Some of this +decision is made by the TLS stack (e.g., checking that the cipher suite being +resumed was included in the ClientHello; see Section 4.2.10 of [TLS13]). Even +when the TLS stack has no reason to reject early data, the QUIC stack or the +application protocol using QUIC might reject early data because the +configuration of the transport or application associated with the resumed +session is not compatible with the server's current configuration.¶
+QUIC requires additional transport state to be associated with a 0-RTT session +ticket. One common way to implement this is using stateless session tickets and +storing this state in the session ticket. Application protocols that use QUIC +might have similar requirements regarding associating or storing state. This +associated state is used for deciding whether early data must be rejected. For +example, HTTP/3 ([QUIC-HTTP]) settings determine how early data from the +client is interpreted. Other applications using QUIC could have different +requirements for determining whether to accept or reject early data.¶
+The HelloRetryRequest message (see Section 4.1.4 of [TLS13]) can be used to +request that a client provide new information, such as a key share, or to +validate some characteristic of the client. From the perspective of QUIC, +HelloRetryRequest is not differentiated from other cryptographic handshake +messages that are carried in Initial packets. Although it is in principle +possible to use this feature for address verification, QUIC implementations +SHOULD instead use the Retry feature; see Section 8.1 of [QUIC-TRANSPORT].¶
+If TLS experiences an error, it generates an appropriate alert as defined in +Section 6 of [TLS13].¶
+A TLS alert is converted into a QUIC connection error. The AlertDescription +value is +added to 0x100 to produce a QUIC error code from the range reserved for +CRYPTO_ERROR. The resulting value is sent in a QUIC CONNECTION_CLOSE frame of +type 0x1c.¶
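+The conversion above amounts to a one-line mapping; a minimal sketch (the function name is invented):¶

```python
CRYPTO_ERROR_BASE = 0x100  # start of the QUIC error range reserved for TLS alerts

def alert_to_quic_error(alert_description: int) -> int:
    # The TLS AlertDescription value is added to 0x100 to produce the
    # error code carried in a CONNECTION_CLOSE frame of type 0x1c.
    return CRYPTO_ERROR_BASE + alert_description

# handshake_failure has AlertDescription 40 (0x28) in TLS,
# so it maps to 0x128 in QUIC.
assert alert_to_quic_error(0x28) == 0x128
```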
+QUIC is only able to convey an alert level of "fatal". In TLS 1.3, the only +existing uses for the "warning" level are to signal connection close; see +Section 6.1 of [TLS13]. As QUIC provides alternative mechanisms for +connection termination and the TLS connection is only closed if an error is +encountered, a QUIC endpoint MUST treat any alert from TLS as if it were at the +"fatal" level.¶
+QUIC permits the use of a generic code in place of a specific error code; see +Section 11 of [QUIC-TRANSPORT]. For TLS alerts, this includes replacing any +alert with a generic alert, such as handshake_failure (0x128 in QUIC). +Endpoints MAY use a generic error code to avoid possibly exposing confidential +information.¶
+After QUIC has completed a move to a new encryption level, packet protection +keys for previous encryption levels can be discarded. This occurs several times +during the handshake, as well as when keys are updated; see Section 6.¶
+Packet protection keys are not discarded immediately when new keys are +available. If packets from a lower encryption level contain CRYPTO frames, +frames that retransmit that data MUST be sent at the same encryption level. +Similarly, an endpoint generates acknowledgments for packets at the same +encryption level as the packet being acknowledged. Thus, it is possible that +keys for a lower encryption level are needed for a short time after keys for a +newer encryption level are available.¶
+An endpoint cannot discard keys for a given encryption level unless it has +received all the cryptographic handshake messages from its peer at that +encryption level and its peer has done the same. Different methods for +determining this are provided for Initial keys (Section 4.9.1) and +Handshake keys (Section 4.9.2). These methods do not prevent packets +from being received or sent at that encryption level because a peer might not +have received all the acknowledgments necessary.¶
+Though an endpoint might retain older keys, new data MUST be sent at the highest +currently-available encryption level. Only ACK frames and retransmissions of +data in CRYPTO frames are sent at a previous encryption level. These packets +MAY also include PADDING frames.¶
+Packets protected with Initial secrets (Section 5.2) are not +authenticated, meaning that an attacker could spoof packets with the intent to +disrupt a connection. To limit these attacks, Initial packet protection keys +are discarded more aggressively than other keys.¶
+The successful use of Handshake packets indicates that no more Initial packets +need to be exchanged, as these keys can only be produced after receiving all +CRYPTO frames from Initial packets. Thus, a client MUST discard Initial keys +when it first sends a Handshake packet and a server MUST discard Initial keys +when it first successfully processes a Handshake packet. Endpoints MUST NOT +send Initial packets after this point.¶
+This results in abandoning loss recovery state for the Initial encryption level +and ignoring any outstanding Initial packets.¶
+An endpoint MUST discard its handshake keys when the TLS handshake is confirmed +(Section 4.1.2).¶
+0-RTT and 1-RTT packets share the same packet number space, and clients do not +send 0-RTT packets after sending a 1-RTT packet (Section 5.6).¶
+Therefore, a client SHOULD discard 0-RTT keys as soon as it installs 1-RTT +keys, since they have no use after that moment.¶
+Additionally, a server MAY discard 0-RTT keys as soon as it receives a 1-RTT +packet. However, due to packet reordering, a 0-RTT packet could arrive after +a 1-RTT packet. Servers MAY temporarily retain 0-RTT keys to allow decrypting +reordered packets without requiring their contents to be retransmitted with +1-RTT keys. After receiving a 1-RTT packet, servers MUST discard 0-RTT keys +within a short time; the RECOMMENDED time period is three times the Probe +Timeout (PTO, see [QUIC-RECOVERY]). A server MAY discard 0-RTT keys earlier +if it determines that it has received all 0-RTT packets, which can be done by +keeping track of missing packet numbers.¶
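+The RECOMMENDED retention period can be expressed as a simple deadline computation. This is an illustrative helper with invented names; times are assumed to be seconds as floats.¶

```python
def zero_rtt_key_discard_deadline(first_1rtt_rx_time: float, pto: float) -> float:
    # After the first 1-RTT packet arrives, retained 0-RTT keys should be
    # discarded within the RECOMMENDED period of three times the Probe
    # Timeout (PTO); a server MAY discard them earlier.
    return first_1rtt_rx_time + 3 * pto

assert zero_rtt_key_discard_deadline(10.0, 0.5) == 11.5
```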
+As with TLS over TCP, QUIC protects packets with keys derived from the TLS +handshake, using the AEAD algorithm [AEAD] negotiated by TLS.¶
+QUIC packets have varying protections depending on their type:¶
+This section describes how packet protection is applied to Handshake packets, +0-RTT packets, and 1-RTT packets. The same packet protection process is applied +to Initial packets. However, as it is trivial to determine the keys used for +Initial packets, these packets are not considered to have confidentiality or +integrity protection. Retry packets use a fixed key and so similarly lack +confidentiality and integrity protection.¶
+QUIC derives packet protection keys in the same way that TLS derives record +protection keys.¶
+Each encryption level has separate secret values for protection of packets sent +in each direction. These traffic secrets are derived by TLS (see Section 7.1 of +[TLS13]) and are used by QUIC for all encryption levels except the Initial +encryption level. The secrets for the Initial encryption level are computed +based on the client's initial Destination Connection ID, as described in +Section 5.2.¶
+The keys used for packet protection are computed from the TLS secrets using the +KDF provided by TLS. In TLS 1.3, the HKDF-Expand-Label function described in +Section 7.1 of [TLS13] is used, using the hash function from the negotiated +cipher suite. All uses of HKDF-Expand-Label in QUIC use a zero-length Context.¶
+Note that labels, which are described using strings, are encoded +as bytes using ASCII [ASCII] without quotes or any trailing NUL +byte.¶
+Other versions of TLS MUST provide a similar function in order to be +used with QUIC.¶
+The current encryption level secret and the label "quic key" are input to the +KDF to produce the AEAD key; the label "quic iv" is used to derive the +Initialization Vector (IV); see Section 5.3. The header protection key uses the +"quic hp" label; see Section 5.4. Using these labels provides key +separation between QUIC and TLS; see Section 9.6.¶
+Both "quic key" and "quic hp" are used to produce keys, so the Length provided +to HKDF-Expand-Label along with these labels is determined by the size of keys +in the AEAD or header protection algorithm. The Length provided with "quic iv" +is the minimum length of the AEAD nonce, or 8 bytes if that is larger; see +[AEAD].¶
+The KDF used for initial secrets is always the HKDF-Expand-Label function from +TLS 1.3; see Section 5.2.¶
+Initial packets apply the packet protection process, but use a secret derived +from the Destination Connection ID field from the client's first Initial +packet.¶
+This secret is determined by using HKDF-Extract (see Section 2.2 of
[HKDF]) with a salt of 0x38762cf7f55934b34d179ae6a4c80cadccbb7f0a
and the input keying material (IKM) of the Destination Connection ID field.
This produces an intermediate pseudorandom key (PRK) that is used to derive
two separate secrets for sending and receiving.¶
+The secret used by clients to construct Initial packets uses the PRK and the +label "client in" as input to the HKDF-Expand-Label function from TLS +[TLS13] to produce a 32-byte secret. Packets constructed by the server use +the same process with the label "server in". The hash function for HKDF when +deriving initial secrets and keys is SHA-256 +[SHA].¶
+This process in pseudocode is:¶
+initial_salt = 0x38762cf7f55934b34d179ae6a4c80cadccbb7f0a
+initial_secret = HKDF-Extract(initial_salt,
+                              client_dst_connection_id)
+
+client_initial_secret = HKDF-Expand-Label(initial_secret,
+                                          "client in", "",
+                                          Hash.length)
+server_initial_secret = HKDF-Expand-Label(initial_secret,
+                                          "server in", "",
+                                          Hash.length)
+¶
The connection ID used with HKDF-Expand-Label is the Destination Connection ID
in the Initial packet sent by the client. This will be a randomly selected
value unless the client creates the Initial packet after receiving a Retry
packet, in which case the Destination Connection ID is selected by the server.¶
+Future versions of QUIC SHOULD generate a new salt value, thus ensuring that +the keys are different for each version of QUIC. This prevents a middlebox that +recognizes only one version of QUIC from seeing or modifying the contents of +packets from future versions.¶
+The HKDF-Expand-Label function defined in TLS 1.3 MUST be used for Initial +packets even where the TLS versions offered do not include TLS 1.3.¶
+The secrets used for constructing subsequent Initial packets change when a +server sends a Retry packet, to use the connection ID value selected by the +server. The secrets do not change when a client changes the Destination +Connection ID it uses in response to an Initial packet from the server.¶
+The Destination Connection ID field could be any length up to 20 bytes, +including zero length if the server sends a Retry packet with a zero-length +Source Connection ID field. After a Retry, the Initial keys provide the client +no assurance that the server received its packet, so the client has to rely on +the exchange that included the Retry packet to validate the server address; +see Section 8.1 of [QUIC-TRANSPORT].¶
+Appendix A contains sample Initial packets.¶
+The Authenticated Encryption with Associated Data (AEAD; see [AEAD]) function +used for QUIC packet protection is the AEAD that is negotiated for use with the +TLS connection. For example, if TLS is using the TLS_AES_128_GCM_SHA256 cipher +suite, the AEAD_AES_128_GCM function is used.¶
+QUIC can use any of the cipher suites defined in [TLS13] with the exception +of TLS_AES_128_CCM_8_SHA256. A cipher suite MUST NOT be negotiated unless a +header protection scheme is defined for the cipher suite. This document defines +a header protection scheme for all cipher suites defined in [TLS13] aside +from TLS_AES_128_CCM_8_SHA256. These cipher suites have a 16-byte +authentication tag and produce an output 16 bytes larger than their input.¶
+An endpoint MUST NOT reject a ClientHello that offers a cipher suite that it +does not support, or it would be impossible to deploy a new cipher suite. +This also applies to TLS_AES_128_CCM_8_SHA256.¶
+When constructing packets, the AEAD function is applied prior to applying +header protection; see Section 5.4. The unprotected packet header is part +of the associated data (A). When processing packets, an endpoint first +removes the header protection.¶
+The key and IV for the packet are computed as described in Section 5.1. +The nonce, N, is formed by combining the packet protection IV with the packet +number. The 62 bits of the reconstructed QUIC packet number in network byte +order are left-padded with zeros to the size of the IV. The exclusive OR of the +padded packet number and the IV forms the AEAD nonce.¶
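+The nonce construction above amounts to a left-padded XOR; a minimal sketch,
using a hypothetical IV value for illustration:¶

```python
def aead_nonce(iv: bytes, packet_number: int) -> bytes:
    # Left-pad the reconstructed packet number to the IV length, then XOR.
    pn = packet_number.to_bytes(len(iv), "big")
    return bytes(i ^ p for i, p in zip(iv, pn))

iv = bytes.fromhex("fa044b2f42a3fd3b46fb255c")  # hypothetical 12-byte packet protection IV
nonce = aead_nonce(iv, 2)
```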
+The associated data, A, for the AEAD is the contents of the QUIC header, +starting from the first byte of either the short or long header, up to and +including the unprotected packet number.¶
+The input plaintext, P, for the AEAD is the payload of the QUIC packet, as +described in [QUIC-TRANSPORT].¶
+The output ciphertext, C, of the AEAD is transmitted in place of P.¶
+Some AEAD functions have limits for how many packets can be encrypted under the +same key and IV; see Section 6.6. This might be lower than the packet +number limit. An endpoint MUST initiate a key update (Section 6) prior to +exceeding any limit set for the AEAD that is in use.¶
+Parts of QUIC packet headers, in particular the Packet Number field, are +protected using a key that is derived separately from the packet protection key +and IV. The key derived using the "quic hp" label is used to provide +confidentiality protection for those fields that are not exposed to on-path +elements.¶
+This protection applies to the least-significant bits of the first byte, plus +the Packet Number field. The four least-significant bits of the first byte are +protected for packets with long headers; the five least significant bits of the +first byte are protected for packets with short headers. For both header forms, +this covers the reserved bits and the Packet Number Length field; the Key Phase +bit is also protected for packets with a short header.¶
+The same header protection key is used for the duration of the connection, with +the value not changing after a key update (see Section 6). This allows +header protection to be used to protect the key phase.¶
+This process does not apply to Retry or Version Negotiation packets, which do +not contain a protected payload or any of the fields that are protected by this +process.¶
+Header protection is applied after packet protection is applied (see Section 5.3). +The ciphertext of the packet is sampled and used as input to an encryption +algorithm. The algorithm used depends on the negotiated AEAD.¶
+The output of this algorithm is a 5-byte mask that is applied to the protected
header fields using exclusive OR. The least significant bits of the first byte
of the packet are masked by the least significant bits of the first mask byte,
and the packet number is masked with the remaining bytes. Any bytes of the
mask that are left over because of a shorter packet number encoding are
discarded.¶
+Figure 6 shows a sample algorithm for applying header protection. Removing +header protection only differs in the order in which the packet number length +(pn_length) is determined (here "^" is used to represent exclusive or).¶
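+As a rough sketch of the algorithm in Figure 6, the masking can be written in
Python; the only asymmetry is when the packet number length is read relative
to unmasking the first byte:¶

```python
def protect_header(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    # Protection: pn_length is read before the first byte is masked.
    pn_length = (packet[0] & 0x03) + 1
    packet[0] ^= (mask[0] & 0x0f) if packet[0] & 0x80 else (mask[0] & 0x1f)
    for i in range(pn_length):
        packet[pn_offset + i] ^= mask[1 + i]

def unprotect_header(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    # Removal: the first byte is unmasked first, then pn_length is read from it.
    packet[0] ^= (mask[0] & 0x0f) if packet[0] & 0x80 else (mask[0] & 0x1f)
    pn_length = (packet[0] & 0x03) + 1
    for i in range(pn_length):
        packet[pn_offset + i] ^= mask[1 + i]
```

+The high bit of the first byte, which distinguishes long from short headers,
is never masked, so either function can test it before or after masking.¶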
+Specific header protection functions are defined based on the selected cipher +suite; see Section 5.4.3 and Section 5.4.4.¶
+Figure 7 shows an example long header packet (Initial) and a short header +packet (1-RTT). Figure 7 shows the fields in each header that are covered +by header protection and the portion of the protected packet payload that is +sampled.¶
+Before a TLS cipher suite can be used with QUIC, a header protection algorithm +MUST be specified for the AEAD used with that cipher suite. This document +defines algorithms for AEAD_AES_128_GCM, AEAD_AES_128_CCM, AEAD_AES_256_GCM (all +these AES AEADs are defined in [AEAD]), and AEAD_CHACHA20_POLY1305 +(defined in [CHACHA]). Prior to TLS selecting a cipher suite, AES +header protection is used (Section 5.4.3), matching the AEAD_AES_128_GCM packet +protection.¶
+The header protection algorithm uses both the header protection key and a sample +of the ciphertext from the packet Payload field.¶
+The same number of bytes are always sampled, but an allowance needs to be made +for the endpoint removing protection, which will not know the length of the +Packet Number field. The sample of ciphertext is taken starting from an offset +of 4 bytes after the start of the Packet Number field. That is, in sampling +packet ciphertext for header protection, the Packet Number field is assumed to +be 4 bytes long (its maximum possible encoded length).¶
+An endpoint MUST discard packets that are not long enough to contain a complete +sample.¶
+To ensure that sufficient data is available for sampling, packets are padded so
that the combined lengths of the encoded packet number and protected payload are
at least 4 bytes longer than the sample required for header protection. The
cipher suites defined in [TLS13] - other than TLS_AES_128_CCM_8_SHA256, for
which a header protection scheme is not defined in this document - have 16-byte
expansions and 16-byte header protection samples. This results in needing at
least 3 bytes of frames in the unprotected payload if the packet number is
encoded on a single byte, or 2 bytes of frames for a 2-byte packet number
encoding.¶
+The sampled ciphertext can be determined by the following pseudocode:¶
++# pn_offset is the start of the Packet Number field. +sample_offset = pn_offset + 4 + +sample = packet[sample_offset..sample_offset+sample_length] +¶ +
where the packet number offset of a short header packet can be calculated as:¶
++pn_offset = 1 + len(connection_id) +¶ +
and the packet number offset of a long header packet can be calculated as:¶
++pn_offset = 7 + len(destination_connection_id) + + len(source_connection_id) + + len(payload_length) +if packet_type == Initial: + pn_offset += len(token_length) + + len(token) +¶ +
For example, for a packet with a short header, an 8-byte connection ID, and +protected with AEAD_AES_128_GCM, the sample takes bytes 13 to 28 inclusive +(using zero-based indexing).¶
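+The sampling arithmetic above can be checked with a small sketch; the returned
bounds follow Python's half-open slice convention, so bytes 13 to 28 inclusive
correspond to the slice [13:29]:¶

```python
def short_header_sample_slice(connection_id_len: int, sample_length: int = 16):
    # Short header: 1 flags byte, then the connection ID, then the Packet
    # Number field, which is assumed to be its maximum 4 bytes for sampling.
    pn_offset = 1 + connection_id_len
    sample_offset = pn_offset + 4
    return sample_offset, sample_offset + sample_length

start, end = short_header_sample_slice(8)  # 8-byte connection ID, AEAD_AES_128_GCM
```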
+Multiple QUIC packets might be included in the same UDP datagram. Each packet +is handled separately.¶
+This section defines the header protection algorithm for AEAD_AES_128_GCM,
AEAD_AES_128_CCM, and AEAD_AES_256_GCM. AEAD_AES_128_GCM and AEAD_AES_128_CCM
use 128-bit AES in Electronic Codebook (ECB) mode. AEAD_AES_256_GCM uses
256-bit AES in ECB mode. AES is defined in [AES].¶
+This algorithm samples 16 bytes from the packet ciphertext. This value is used +as the input to AES-ECB. In pseudocode, the header protection function is +defined as:¶
++header_protection(hp_key, sample): + mask = AES-ECB(hp_key, sample) +¶ +
When AEAD_CHACHA20_POLY1305 is in use, header protection uses the raw ChaCha20 +function as defined in Section 2.4 of [CHACHA]. This uses a 256-bit key and +16 bytes sampled from the packet protection output.¶
+The first 4 bytes of the sampled ciphertext are the block counter. A ChaCha20 +implementation could take a 32-bit integer in place of a byte sequence, in +which case the byte sequence is interpreted as a little-endian value.¶
+The remaining 12 bytes are used as the nonce. A ChaCha20 implementation might +take an array of three 32-bit integers in place of a byte sequence, in which +case the nonce bytes are interpreted as a sequence of 32-bit little-endian +integers.¶
+The encryption mask is produced by invoking ChaCha20 to protect 5 zero bytes. In +pseudocode, the header protection function is defined as:¶
++header_protection(hp_key, sample): + counter = sample[0..3] + nonce = sample[4..15] + mask = ChaCha20(hp_key, counter, nonce, {0,0,0,0,0}) +¶ +
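+The little-endian interpretation of the sampled bytes can be sketched as
follows; this only shows how the 16-byte sample is split into the ChaCha20
counter and nonce inputs, not the ChaCha20 function itself:¶

```python
import struct

def chacha20_hp_inputs(sample: bytes):
    # First 4 sampled bytes: little-endian block counter.
    counter, = struct.unpack("<I", sample[0:4])
    # Remaining 12 bytes: nonce, as three little-endian 32-bit words.
    nonce_words = struct.unpack("<3I", sample[4:16])
    return counter, nonce_words

counter, nonce_words = chacha20_hp_inputs(bytes(range(16)))
```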
Once an endpoint successfully receives a packet with a given packet number, it +MUST discard all packets in the same packet number space with higher packet +numbers if they cannot be successfully unprotected with either the same key, or +- if there is a key update - a subsequent packet protection key; see +Section 6. Similarly, a packet that appears to trigger a key update, but +cannot be unprotected successfully MUST be discarded.¶
+Failure to unprotect a packet does not necessarily indicate the existence of a +protocol error in a peer or an attack. The truncated packet number encoding +used in QUIC can cause packet numbers to be decoded incorrectly if they are +delayed significantly.¶
+If 0-RTT keys are available (see Section 4.6.1), the lack of replay protection +means that restrictions on their use are necessary to avoid replay attacks on +the protocol.¶
+Of the frames defined in [QUIC-TRANSPORT], the STREAM, RESET_STREAM, +STOP_SENDING, and CONNECTION_CLOSE frames are potentially unsafe for use with +0-RTT as they carry application data. Application data that is received in +0-RTT could cause an application at the server to process the data multiple +times rather than just once. Additional actions taken by a server as a result +of processing replayed application data could have unwanted consequences. A +client therefore MUST NOT use 0-RTT for application data unless specifically +requested by the application that is in use.¶
+An application protocol that uses QUIC MUST include a profile that defines +acceptable use of 0-RTT; otherwise, 0-RTT can only be used to carry QUIC frames +that do not carry application data. For example, a profile for HTTP is +described in [HTTP-REPLAY] and used for HTTP/3; see Section 10.9 of +[QUIC-HTTP].¶
+Though replaying packets might result in additional connection attempts, the +effect of processing replayed frames that do not carry application data is +limited to changing the state of the affected connection. A TLS handshake +cannot be successfully completed using replayed packets.¶
+A client MAY wish to apply additional restrictions on what data it sends prior +to the completion of the TLS handshake.¶
+A client otherwise treats 0-RTT keys as equivalent to 1-RTT keys, except that +it cannot send certain frames with 0-RTT keys; see Section 12.5 of +[QUIC-TRANSPORT].¶
+A client that receives an indication that its 0-RTT data has been accepted by a +server can send 0-RTT data until it receives all of the server's handshake +messages. A client SHOULD stop sending 0-RTT data if it receives an indication +that 0-RTT data has been rejected.¶
+A server MUST NOT use 0-RTT keys to protect packets; it uses 1-RTT keys to +protect acknowledgments of 0-RTT packets. A client MUST NOT attempt to +decrypt 0-RTT packets it receives and instead MUST discard them.¶
+Once a client has installed 1-RTT keys, it MUST NOT send any more 0-RTT +packets.¶
+0-RTT data can be acknowledged by the server as it receives it, but any +packets containing acknowledgments of 0-RTT data cannot have packet protection +removed by the client until the TLS handshake is complete. The 1-RTT keys +necessary to remove packet protection cannot be derived until the client +receives all server handshake messages.¶
+Due to reordering and loss, protected packets might be received by an endpoint +before the final TLS handshake messages are received. A client will be unable +to decrypt 1-RTT packets from the server, whereas a server will be able to +decrypt 1-RTT packets from the client. Endpoints in either role MUST NOT +decrypt 1-RTT packets from their peer prior to completing the handshake.¶
+Even though 1-RTT keys are available to a server after receiving the first +handshake messages from a client, it is missing assurances on the client state:¶
+Therefore, the server's use of 1-RTT keys before the handshake is complete is +limited to sending data. A server MUST NOT process incoming 1-RTT protected +packets before the TLS handshake is complete. Because sending acknowledgments +indicates that all frames in a packet have been processed, a server cannot send +acknowledgments for 1-RTT packets until the TLS handshake is complete. Received +packets protected with 1-RTT keys MAY be stored and later decrypted and used +once the handshake is complete.¶
+TLS implementations might provide all 1-RTT secrets prior to handshake +completion. Even where QUIC implementations have 1-RTT read keys, those keys +are not to be used prior to completing the handshake.¶
+The requirement for the server to wait for the client Finished message creates +a dependency on that message being delivered. A client can avoid the +potential for head-of-line blocking that this implies by sending its 1-RTT +packets coalesced with a Handshake packet containing a copy of the CRYPTO frame +that carries the Finished message, until one of the Handshake packets is +acknowledged. This enables immediate server processing for those packets.¶
+A server could receive packets protected with 0-RTT keys prior to receiving a +TLS ClientHello. The server MAY retain these packets for later decryption in +anticipation of receiving a ClientHello.¶
+A client generally receives 1-RTT keys at the same time as the handshake +completes. Even if it has 1-RTT secrets, a client MUST NOT process +incoming 1-RTT protected packets before the TLS handshake is complete.¶
+Retry packets (see the Retry Packet section of [QUIC-TRANSPORT]) carry a
Retry Integrity Tag that provides two properties: it allows the discarding of
packets that have been accidentally corrupted by the network, and it ensures
that only an entity that observes an Initial packet can send a valid Retry
packet.¶
+The Retry Integrity Tag is a 128-bit field that is computed as the output of +AEAD_AES_128_GCM ([AEAD]) used with the following inputs:¶
+The secret key and the nonce are values derived by calling HKDF-Expand-Label +using 0xd9c9943e6101fd200021506bcc02814c73030f25c79d71ce876eca876e6fca8e as the +secret, with labels being "quic key" and "quic iv" (Section 5.1).¶
+The Retry Pseudo-Packet is not sent over the wire. It is computed by taking +the transmitted Retry packet, removing the Retry Integrity Tag and prepending +the two following fields:¶
+The ODCID Length field contains the length in bytes of the Original +Destination Connection ID field that follows it, encoded as an 8-bit unsigned +integer.¶
+The Original Destination Connection ID contains the value of the Destination +Connection ID from the Initial packet that this Retry is in response to. The +length of this field is given in ODCID Length. The presence of this field +ensures that a valid Retry packet can only be sent by an entity that +observes the Initial packet.¶
+Once the handshake is confirmed (see Section 4.1.2), an endpoint MAY +initiate a key update.¶
+The Key Phase bit indicates which packet protection keys are used to protect the +packet. The Key Phase bit is initially set to 0 for the first set of 1-RTT +packets and toggled to signal each subsequent key update.¶
+The Key Phase bit allows a recipient to detect a change in keying material +without needing to receive the first packet that triggered the change. An +endpoint that notices a changed Key Phase bit updates keys and decrypts the +packet that contains the changed value.¶
+Initiating a key update results in both endpoints updating keys. This differs +from TLS where endpoints can update keys independently.¶
+This mechanism replaces the key update mechanism of TLS, which relies on +KeyUpdate messages sent using 1-RTT encryption keys. Endpoints MUST NOT send a +TLS KeyUpdate message. Endpoints MUST treat the receipt of a TLS KeyUpdate +message as a connection error of type 0x10a, equivalent to a +fatal TLS alert of unexpected_message; see Section 4.8.¶
+Figure 9 shows a key update process, where the initial set of keys used +(identified with @M) are replaced by updated keys (identified with @N). The +value of the Key Phase bit is indicated in brackets [].¶
+Endpoints maintain separate read and write secrets for packet protection. An +endpoint initiates a key update by updating its packet protection write secret +and using that to protect new packets. The endpoint creates a new write secret +from the existing write secret as performed in Section 7.2 of [TLS13]. This +uses the KDF function provided by TLS with a label of "quic ku". The +corresponding key and IV are created from that secret as defined in +Section 5.1. The header protection key is not updated.¶
+For example, to update write keys with TLS 1.3, HKDF-Expand-Label is used as:¶
++secret_<n+1> = HKDF-Expand-Label(secret_<n>, "quic ku", + "", Hash.length) +¶ +
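+A minimal sketch of this chained derivation, reusing an HKDF-Expand-Label
implementation built from the standard library; the all-zero starting secret
is a placeholder for illustration only:¶

```python
import hashlib
import hmac

def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    # HkdfLabel structure from TLS 1.3: labels carry a "tls13 " prefix.
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big") + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(secret, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def next_1rtt_secret(secret: bytes) -> bytes:
    # secret_<n+1> = HKDF-Expand-Label(secret_<n>, "quic ku", "", Hash.length)
    # Note: the header protection key is NOT re-derived at a key update.
    return hkdf_expand_label(secret, b"quic ku", b"", 32)

secret_0 = bytes(32)  # placeholder secret, for illustration only
secret_1 = next_1rtt_secret(secret_0)
secret_2 = next_1rtt_secret(secret_1)
```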
The endpoint toggles the value of the Key Phase bit and uses the updated key and +IV to protect all subsequent packets.¶
+An endpoint MUST NOT initiate a key update prior to having confirmed the +handshake (Section 4.1.2). An endpoint MUST NOT initiate a subsequent +key update unless it has received an acknowledgment for a packet that was sent +protected with keys from the current key phase. This ensures that keys are +available to both peers before another key update can be initiated. This can be +implemented by tracking the lowest packet number sent with each key phase, and +the highest acknowledged packet number in the 1-RTT space: once the latter is +higher than or equal to the former, another key update can be initiated.¶
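+The bookkeeping described above can be sketched as a small gate object; the
class and method names are hypothetical, chosen only to mirror the rule that a
further update is permitted once a packet from the current phase is
acknowledged:¶

```python
class KeyUpdateGate:
    """Tracks whether another key update may be initiated."""

    def __init__(self):
        self.lowest_pn_in_phase = None  # lowest packet number sent in this phase
        self.highest_acked = -1         # highest acknowledged 1-RTT packet number

    def on_send(self, pn: int) -> None:
        if self.lowest_pn_in_phase is None:
            self.lowest_pn_in_phase = pn

    def on_ack(self, pn: int) -> None:
        self.highest_acked = max(self.highest_acked, pn)

    def may_update(self) -> bool:
        # A packet sent in the current phase has been acknowledged.
        return (self.lowest_pn_in_phase is not None
                and self.highest_acked >= self.lowest_pn_in_phase)

    def on_key_update(self) -> None:
        self.lowest_pn_in_phase = None  # reset for the new phase
```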
+Keys of packets other than the 1-RTT packets are never updated; their keys are +derived solely from the TLS handshake state.¶
+The endpoint that initiates a key update also updates the keys that it uses for +receiving packets. These keys will be needed to process packets the peer sends +after updating.¶
+An endpoint MUST retain old keys until it has successfully unprotected a packet +sent using the new keys. An endpoint SHOULD retain old keys for some time +after unprotecting a packet sent using the new keys. Discarding old keys too +early can cause delayed packets to be discarded. Discarding packets will be +interpreted as packet loss by the peer and could adversely affect performance.¶
+A peer is permitted to initiate a key update after receiving an acknowledgment +of a packet in the current key phase. An endpoint detects a key update when +processing a packet with a key phase that differs from the value used to protect +the last packet it sent. To process this packet, the endpoint uses the next +packet protection key and IV. See Section 6.3 for considerations +about generating these keys.¶
+If a packet is successfully processed using the next key and IV, then the peer +has initiated a key update. The endpoint MUST update its send keys to the +corresponding key phase in response, as described in Section 6.1. +Sending keys MUST be updated before sending an acknowledgment for the packet +that was received with updated keys. By acknowledging the packet that triggered +the key update in a packet protected with the updated keys, the endpoint signals +that the key update is complete.¶
+An endpoint can defer sending the packet or acknowledgment according to its
normal packet sending behavior; it is not necessary to immediately generate a
packet in response to a key update. The next packet sent by the endpoint will
use the updated keys. The next packet that contains an acknowledgment will
cause the key update to be completed. If an endpoint detects a second update
before it has sent any packets with updated keys containing an
acknowledgment for the packet that initiated the key update, it indicates that
its peer has updated keys twice without awaiting confirmation. An endpoint MAY
treat such consecutive key updates as a connection error of type
KEY_UPDATE_ERROR.¶
+An endpoint that receives an acknowledgment that is carried in a packet +protected with old keys where any acknowledged packet was protected with newer +keys MAY treat that as a connection error of type KEY_UPDATE_ERROR. This +indicates that a peer has received and acknowledged a packet that initiates a +key update, but has not updated keys in response.¶
+Endpoints responding to an apparent key update MUST NOT generate a timing +side-channel signal that might indicate that the Key Phase bit was invalid (see +Section 9.4). Endpoints can use dummy packet protection keys in +place of discarded keys when key updates are not yet permitted. Using dummy +keys will generate no variation in the timing signal produced by attempting to +remove packet protection, and results in all packets with an invalid Key Phase +bit being rejected.¶
+The process of creating new packet protection keys for receiving packets could +reveal that a key update has occurred. An endpoint MAY generate new keys as +part of packet processing, but this creates a timing signal that could be used +by an attacker to learn when key updates happen and thus leak the value of the +Key Phase bit.¶
+Endpoints are generally expected to have current and next receive packet +protection keys available. For a short period after a key update completes, up +to the PTO, endpoints MAY defer generation of the next set of +receive packet protection keys. This allows endpoints +to retain only two sets of receive keys; see Section 6.5.¶
+Once generated, the next set of packet protection keys SHOULD be retained, even +if the packet that was received was subsequently discarded. Packets containing +apparent key updates are easy to forge and - while the process of key update +does not require significant effort - triggering this process could be used by +an attacker for DoS.¶
+For this reason, endpoints MUST be able to retain two sets of packet protection +keys for receiving packets: the current and the next. Retaining the previous +keys in addition to these might improve performance, but this is not essential.¶
+An endpoint never sends packets that are protected with old keys. Only the +current keys are used. Keys used for protecting packets can be discarded +immediately after switching to newer keys.¶
+Packets with higher packet numbers MUST be protected with either the same or +newer packet protection keys than packets with lower packet numbers. An +endpoint that successfully removes protection with old keys when newer keys were +used for packets with lower packet numbers MUST treat this as a connection error +of type KEY_UPDATE_ERROR.¶
+For receiving packets during a key update, packets protected with older keys +might arrive if they were delayed by the network. Retaining old packet +protection keys allows these packets to be successfully processed.¶
+As packets protected with keys from the next key phase use the same Key Phase +value as those protected with keys from the previous key phase, it is necessary +to distinguish between the two, if packets protected with old keys are to be +processed. This can be done using packet numbers. A recovered packet number +that is lower than any packet number from the current key phase uses the +previous packet protection keys; a recovered packet number that is higher than +any packet number from the current key phase requires the use of the next packet +protection keys.¶
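+The selection logic described above can be sketched as follows; the function
name and string return values are hypothetical, standing in for whatever key
state an implementation maintains:¶

```python
def select_phase_keys(recovered_pn: int, lowest_pn_in_current_phase: int,
                      packet_key_phase: int, current_key_phase: int) -> str:
    # A matching Key Phase bit selects the current keys. Otherwise the packet
    # number disambiguates previous from next, since both use the other bit.
    if packet_key_phase == current_key_phase:
        return "current"
    if recovered_pn < lowest_pn_in_current_phase:
        return "previous"
    return "next"
```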
+Some care is necessary to ensure that any process for selecting between +previous, current, and next packet protection keys does not expose a timing side +channel that might reveal which keys were used to remove packet protection. See +Section 9.5 for more information.¶
+Alternatively, endpoints can retain only two sets of packet protection keys, +swapping previous for next after enough time has passed to allow for reordering +in the network. In this case, the Key Phase bit alone can be used to select +keys.¶
+An endpoint MAY allow a period of approximately the Probe Timeout (PTO; see +[QUIC-RECOVERY]) after promoting the next set of receive keys to be current +before it creates the subsequent set of packet protection keys. These updated +keys MAY replace the previous keys at that time. With the caveat that PTO is a +subjective measure - that is, a peer could have a different view of the RTT - +this time is expected to be long enough that any reordered packets would be +declared lost by a peer even if they were acknowledged and short enough to +allow a peer to initiate further key updates.¶
+Endpoints need to allow for the possibility that a peer might not be able to +decrypt packets that initiate a key update during the period when the peer +retains old keys. Endpoints SHOULD wait three times the PTO before initiating a +key update after receiving an acknowledgment that confirms that the previous key +update was received. Failing to allow sufficient time could lead to packets +being discarded.¶
+An endpoint SHOULD retain old read keys for no more than three times the PTO +after having received a packet protected using the new keys. After this period, +old read keys and their corresponding secrets SHOULD be discarded.¶
+This document sets usage limits for AEAD algorithms to ensure that overuse does +not give an adversary a disproportionate advantage in attacking the +confidentiality and integrity of communications when using QUIC.¶
+The usage limits defined in TLS 1.3 exist for protection against attacks +on confidentiality and apply to successful applications of AEAD protection. The +integrity protections in authenticated encryption also depend on limiting the +number of attempts to forge packets. TLS achieves this by closing connections +after any record fails an authentication check. In comparison, QUIC ignores any +packet that cannot be authenticated, allowing multiple forgery attempts.¶
+QUIC accounts for AEAD confidentiality and integrity limits separately. The +confidentiality limit applies to the number of packets encrypted with a given +key. The integrity limit applies to the number of packets decrypted within a +given connection. Details on enforcing these limits for each AEAD algorithm +follow below.¶
+Endpoints MUST count the number of encrypted packets for each set of keys. If +the total number of encrypted packets with the same key exceeds the +confidentiality limit for the selected AEAD, the endpoint MUST stop using those +keys. Endpoints MUST initiate a key update before sending more protected packets +than the confidentiality limit for the selected AEAD permits. If a key update +is not possible or integrity limits are reached, the endpoint MUST stop using +the connection and only send stateless resets in response to receiving packets. +It is RECOMMENDED that endpoints immediately close the connection with a +connection error of type AEAD_LIMIT_REACHED before reaching a state where key +updates are not possible.¶
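+A minimal sketch of the per-key send counter this requires; the constant
matches the AEAD_AES_128_GCM and AEAD_AES_256_GCM confidentiality limit given
below, and the class name is hypothetical:¶

```python
AES_GCM_CONFIDENTIALITY_LIMIT = 2 ** 23  # packets per key; see Appendix B.1

class SendKeyUsage:
    """Counts packets encrypted under one set of keys."""

    def __init__(self, limit: int = AES_GCM_CONFIDENTIALITY_LIMIT):
        self.limit = limit
        self.encrypted = 0

    def note_encrypted(self) -> None:
        self.encrypted += 1

    def must_update_before_next_send(self) -> bool:
        # Initiate a key update before the count would exceed the limit.
        return self.encrypted >= self.limit
```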
+For AEAD_AES_128_GCM and AEAD_AES_256_GCM, the confidentiality limit is 2^23 +encrypted packets; see Appendix B.1. For AEAD_CHACHA20_POLY1305, the +confidentiality limit is greater than the number of possible packets (2^62) and +so can be disregarded. For AEAD_AES_128_CCM, the confidentiality limit is 2^21.5 +encrypted packets; see Appendix B.2. Applying a limit reduces the probability +that an attacker can distinguish the AEAD in use from a random permutation; see +[AEBounds], [ROBUST], and [GCM-MU].¶
+In addition to counting packets sent, endpoints MUST count the number of +received packets that fail authentication during the lifetime of a connection. +If the total number of received packets that fail authentication within the +connection, across all keys, exceeds the integrity limit for the selected AEAD, +the endpoint MUST immediately close the connection with a connection error of +type AEAD_LIMIT_REACHED and not process any more packets.¶
+For AEAD_AES_128_GCM and AEAD_AES_256_GCM, the integrity limit is 2^52 invalid +packets; see Appendix B.1. For AEAD_CHACHA20_POLY1305, the integrity limit is +2^36 invalid packets; see [AEBounds]. For AEAD_AES_128_CCM, the integrity +limit is 2^21.5 invalid packets; see Appendix B.2. Applying this limit reduces +the probability that an attacker can successfully forge a packet; see +[AEBounds], [ROBUST], and [GCM-MU].¶
+Endpoints that limit the size of packets MAY use higher confidentiality and +integrity limits; see Appendix B for details.¶
+Future analyses and specifications MAY relax confidentiality or integrity limits +for an AEAD.¶
+Any TLS cipher suite that is specified for use with QUIC MUST define limits on +the use of the associated AEAD function that preserves margins for +confidentiality and integrity. That is, limits MUST be specified for the number +of packets that can be authenticated and for the number of packets that can fail +authentication. Providing a reference to any analysis upon which values are +based - and any assumptions used in that analysis - allows limits to be adapted +to varying usage conditions.¶
+The KEY_UPDATE_ERROR error code (0xe) is used to signal errors related to key +updates.¶
+Initial packets are not protected with a secret key, so they are subject to +potential tampering by an attacker. QUIC provides protection against attackers +that cannot read packets, but does not attempt to provide additional protection +against attacks where the attacker can observe and inject packets. Some forms +of tampering -- such as modifying the TLS messages themselves -- are detectable, +but some -- such as modifying ACKs -- are not.¶
+For example, an attacker could inject a packet containing an ACK frame that +makes it appear that a packet had not been received or to create a false +impression of the state of the connection (e.g., by modifying the ACK Delay). +Note that such a packet could cause a legitimate packet to be dropped as a +duplicate. Implementations SHOULD use caution in relying on any data that is +contained in Initial packets that is not otherwise authenticated.¶
+It is also possible for the attacker to tamper with data that is carried in +Handshake packets, but because that tampering requires modifying TLS handshake +messages, that tampering will cause the TLS handshake to fail.¶
+Certain aspects of the TLS handshake are different when used with QUIC.¶
+QUIC also requires additional features from TLS. In addition to negotiation of +cryptographic parameters, the TLS handshake carries and authenticates values for +QUIC transport parameters.¶
+QUIC requires that the cryptographic handshake provide authenticated protocol +negotiation. TLS uses Application Layer Protocol Negotiation +([ALPN]) to select an application protocol. Unless another mechanism +is used for agreeing on an application protocol, endpoints MUST use ALPN for +this purpose.¶
+When using ALPN, endpoints MUST immediately close a connection (see Section +10.2 of [QUIC-TRANSPORT]) with a no_application_protocol TLS alert (QUIC error +code 0x178; see Section 4.8) if an application protocol is not negotiated. +While [ALPN] only specifies that servers use this alert, QUIC clients MUST +use error 0x178 to terminate a connection when ALPN negotiation fails.¶
+An application protocol MAY restrict the QUIC versions that it can operate over. +Servers MUST select an application protocol compatible with the QUIC version +that the client has selected. The server MUST treat the inability to select a +compatible application protocol as a connection error of type 0x178 +(no_application_protocol). Similarly, a client MUST treat the selection of an +incompatible application protocol by a server as a connection error of type +0x178.¶
+QUIC transport parameters are carried in a TLS extension. Different versions of +QUIC might define a different method for negotiating transport configuration.¶
+Including transport parameters in the TLS handshake provides integrity +protection for these values.¶
++ enum { + quic_transport_parameters(0x39), (65535) + } ExtensionType; +¶ +
The extension_data field of the quic_transport_parameters extension contains a +value that is defined by the version of QUIC that is in use.¶
+The quic_transport_parameters extension is carried in the ClientHello and the +EncryptedExtensions messages during the handshake. Endpoints MUST send the +quic_transport_parameters extension; endpoints that receive ClientHello or +EncryptedExtensions messages without the quic_transport_parameters extension +MUST close the connection with an error of type 0x16d (equivalent to a fatal TLS +missing_extension alert, see Section 4.8).¶
+Transport parameters become available prior to the completion of the handshake. +A server might use these values earlier than handshake completion. However, the +value of transport parameters is not authenticated until the handshake +completes, so any use of these parameters cannot depend on their authenticity. +Any tampering with transport parameters will cause the handshake to fail.¶
+Endpoints MUST NOT send this extension in a TLS connection that does not use +QUIC (such as the use of TLS with TCP defined in [TLS13]). A fatal +unsupported_extension alert MUST be sent by an implementation that supports this +extension if the extension is received when the transport is not QUIC.¶
+Negotiating the quic_transport_parameters extension causes the EndOfEarlyData to +be removed; see Section 8.3.¶
+The TLS EndOfEarlyData message is not used with QUIC. QUIC does not rely on +this message to mark the end of 0-RTT data or to signal the change to Handshake +keys.¶
+Clients MUST NOT send the EndOfEarlyData message. A server MUST treat receipt +of a CRYPTO frame in a 0-RTT packet as a connection error of type +PROTOCOL_VIOLATION.¶
+As a result, EndOfEarlyData does not appear in the TLS handshake transcript.¶
+Appendix D.4 of [TLS13] describes an alteration to the TLS 1.3 handshake as +a workaround for bugs in some middleboxes. The TLS 1.3 middlebox compatibility +mode involves setting the legacy_session_id field to a 32-byte value in the +ClientHello and ServerHello, then sending a change_cipher_spec record. Both +field and record carry no semantic content and are ignored.¶
+This mode has no use in QUIC as it only applies to middleboxes that interfere +with TLS over TCP. QUIC also provides no means to carry a change_cipher_spec +record. A client MUST NOT request the use of the TLS 1.3 compatibility mode. A +server SHOULD treat the receipt of a TLS ClientHello with a non-empty +legacy_session_id field as a connection error of type PROTOCOL_VIOLATION.¶
+All of the security considerations that apply to TLS also apply to the use of +TLS in QUIC. Reading all of [TLS13] and its appendices is the best way to +gain an understanding of the security properties of QUIC.¶
+This section summarizes some of the more important security aspects specific to +the TLS integration, though there are many security-relevant details in the +remainder of the document.¶
+Use of TLS session tickets allows servers and possibly other entities to +correlate connections made by the same client; see Section 4.5 for details.¶
+As described in Section 8 of [TLS13], use of TLS early data comes with an +exposure to replay attack. The use of 0-RTT in QUIC is similarly vulnerable to +replay attack.¶
+Endpoints MUST implement and use the replay protections described in [TLS13]; +however, it is recognized that these protections are imperfect. Therefore, +additional consideration of the risk of replay is needed.¶
+QUIC is not vulnerable to replay attack, except via the application protocol +information it might carry. The management of QUIC protocol state based on the +frame types defined in [QUIC-TRANSPORT] is not vulnerable to replay. +Processing of QUIC frames is idempotent and cannot result in invalid connection +states if frames are replayed, reordered or lost. QUIC connections do not +produce effects that last beyond the lifetime of the connection, except for +those produced by the application protocol that QUIC serves.¶
+TLS session tickets and address validation tokens are used to carry QUIC +configuration information between connections. Specifically, they enable a +server to efficiently recover state that is used in connection establishment +and address validation. These MUST NOT be used to communicate application +semantics between endpoints; clients MUST treat them as opaque values. The +potential for reuse of these tokens means that they require stronger +protections against replay.¶
+A server that accepts 0-RTT on a connection incurs a higher cost than accepting +a connection without 0-RTT. This includes higher processing and computation +costs. Servers need to consider the probability of replay and all associated +costs when accepting 0-RTT.¶
+Ultimately, the responsibility for managing the risks of replay attacks with +0-RTT lies with an application protocol. An application protocol that uses QUIC +MUST describe how the protocol uses 0-RTT and the measures that are employed to +protect against replay attack. An analysis of replay risk needs to consider +all QUIC protocol features that carry application semantics.¶
+Disabling 0-RTT entirely is the most effective defense against replay attack.¶
+QUIC extensions MUST describe how replay attacks affect their operation, or +prohibit their use in 0-RTT. Application protocols MUST either prohibit the use +of extensions that carry application semantics in 0-RTT or provide replay +mitigation strategies.¶
+A small ClientHello that results in a large block of handshake messages from a +server can be used in packet reflection attacks to amplify the traffic generated +by an attacker.¶
+QUIC includes three defenses against this attack. First, the packet containing a +ClientHello MUST be padded to a minimum size. Second, if responding to an +unverified source address, the server is forbidden to send more than three times +as many bytes as the number of bytes it has received (see Section 8.1 of +[QUIC-TRANSPORT]). Finally, because acknowledgments of Handshake packets are +authenticated, a blind attacker cannot forge them. Put together, these defenses +limit the level of amplification.¶
+[NAN] analyzes authenticated encryption algorithms that provide nonce privacy, +referred to as "Hide Nonce" (HN) transforms. The general header protection +construction in this document is one of those algorithms (HN1). Header +protection is applied after the packet protection AEAD, sampling a set of bytes +(sample) from the AEAD output and encrypting the header field using a +pseudorandom function (PRF) as follows:¶
+protected_field = field XOR PRF(hp_key, sample) +¶ +
The header protection variants in this document use a pseudorandom permutation +(PRP) in place of a generic PRF. However, since all PRPs are also PRFs [IMC], +these variants do not deviate from the HN1 construction.¶
+As hp_key is distinct from the packet protection key, it follows that header +protection achieves AE2 security as defined in [NAN] and therefore guarantees +privacy of field, the protected packet header. Future header protection +variants based on this construction MUST use a PRF to ensure equivalent +security guarantees.¶
Use of the same key and ciphertext sample more than once risks compromising +header protection. Protecting two different headers with the same key and +ciphertext sample reveals the exclusive OR of the protected fields. Assuming +that the AEAD acts as a PRF, if L bits are sampled, the odds of two ciphertext +samples being identical approach 2^(-L/2), that is, the birthday bound. For the +algorithms described in this document, that probability is one in 2^64.¶
+To prevent an attacker from modifying packet headers, the header is transitively +authenticated using packet protection; the entire packet header is part of the +authenticated additional data. Protected fields that are falsified or modified +can only be detected once the packet protection is removed.¶
+An attacker could guess values for packet numbers or Key Phase and have an +endpoint confirm guesses through timing side channels. Similarly, guesses for +the packet number length can be tried and exposed. If the recipient of a +packet discards packets with duplicate packet numbers without attempting to +remove packet protection they could reveal through timing side-channels that the +packet number matches a received packet. For authentication to be free from +side-channels, the entire process of header protection removal, packet number +recovery, and packet protection removal MUST be applied together without timing +and other side-channels.¶
+For the sending of packets, construction and protection of packet payloads and +packet numbers MUST be free from side-channels that would reveal the packet +number or its encoded size.¶
+During a key update, the time taken to generate new keys could reveal through +timing side-channels that a key update has occurred. Alternatively, where an +attacker injects packets this side-channel could reveal the value of the Key +Phase on injected packets. After receiving a key update, an endpoint SHOULD +generate and save the next set of receive packet protection keys, as described +in Section 6.3. By generating new keys before a key update is +received, receipt of packets will not create timing signals that leak the value +of the Key Phase.¶
+This depends on not performing this key generation during packet processing, +and it can require that endpoints maintain three sets of packet protection keys +for receiving: for the previous key phase, for the current key phase, and for +the next key phase. Endpoints can instead choose to defer generation of the +next receive packet protection keys until they discard old keys so that only +two sets of receive keys need to be retained at any point in time.¶
+In using TLS, the central key schedule of TLS is used. As a result of the TLS +handshake messages being integrated into the calculation of secrets, the +inclusion of the QUIC transport parameters extension ensures that handshake and +1-RTT keys are not the same as those that might be produced by a server running +TLS over TCP. To avoid the possibility of cross-protocol key synchronization, +additional measures are provided to improve key separation.¶
+The QUIC packet protection keys and IVs are derived using a different label than +the equivalent keys in TLS.¶
+To preserve this separation, a new version of QUIC SHOULD define new labels for +key derivation for packet protection key and IV, plus the header protection +keys. This version of QUIC uses the string "quic". Other versions can use a +version-specific label in place of that string.¶
+The initial secrets use a key that is specific to the negotiated QUIC version. +New QUIC versions SHOULD define a new salt value used in calculating initial +secrets.¶
+QUIC depends on endpoints being able to generate secure random numbers, both +directly for protocol values such as the connection ID, and transitively via +TLS. See [RFC4086] for guidance on secure random number generation.¶
+IANA has registered a codepoint of 57 (or 0x39) for the +quic_transport_parameters extension (defined in Section 8.2) in the TLS +ExtensionType Values Registry [TLS-REGISTRIES].¶
+The Recommended column for this extension is marked Yes. The TLS 1.3 Column +includes CH and EE.¶
+This section shows examples of packet protection so that implementations can be +verified incrementally. Samples of Initial packets from both client and server, +plus a Retry packet are defined. These packets use an 8-byte client-chosen +Destination Connection ID of 0x8394c8f03e515708. Some intermediate values are +included. All values are shown in hexadecimal.¶
+The labels generated during the execution of the HKDF-Expand-Label function +(that is, HkdfLabel.label), which form part of the value given to the +HKDF-Expand function in order to produce its output, are:¶
+00200f746c73313320636c69656e7420696e00¶
+00200f746c7331332073657276657220696e00¶
+00100e746c7331332071756963206b657900¶
+000c0d746c733133207175696320697600¶
+00100d746c733133207175696320687000¶
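As an illustration, these encodings can be reproduced with a minimal sketch of the HkdfLabel structure from Section 7.1 of [TLS13]; the helper name here is illustrative, not part of either specification:

```python
import struct

def hkdf_label(length, label, context=b""):
    # HkdfLabel (RFC 8446, Section 7.1): a two-byte output length,
    # the length-prefixed label with the "tls13 " prefix prepended,
    # and the length-prefixed context (empty for QUIC secrets).
    full = b"tls13 " + label
    return (struct.pack(">H", length)
            + bytes([len(full)]) + full
            + bytes([len(context)]) + context)

# "client in" with a 32-byte output matches the first value above.
assert hkdf_label(32, b"client in").hex() == (
    "00200f746c73313320636c69656e7420696e00")
```

The same helper reproduces the "quic key", "quic iv", and "quic hp" labels with their respective output lengths.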
+The initial secret is common:¶
++initial_secret = HKDF-Extract(initial_salt, cid) + = 7db5df06e7a69e432496adedb0085192 + 3595221596ae2ae9fb8115c1e9ed0a44 +¶ +
The secrets for protecting client packets are:¶
++client_initial_secret + = HKDF-Expand-Label(initial_secret, "client in", "", 32) + = c00cf151ca5be075ed0ebfb5c80323c4 + 2d6b7db67881289af4008f1f6c357aea + +key = HKDF-Expand-Label(client_initial_secret, "quic key", "", 16) + = 1f369613dd76d5467730efcbe3b1a22d + +iv = HKDF-Expand-Label(client_initial_secret, "quic iv", "", 12) + = fa044b2f42a3fd3b46fb255c + +hp = HKDF-Expand-Label(client_initial_secret, "quic hp", "", 16) + = 9f50449e04a0e810283a1e9933adedd2 +¶ +
The secrets for protecting server packets are:¶
++server_initial_secret + = HKDF-Expand-Label(initial_secret, "server in", "", 32) + = 3c199828fd139efd216c155ad844cc81 + fb82fa8d7446fa7d78be803acdda951b + +key = HKDF-Expand-Label(server_initial_secret, "quic key", "", 16) + = cf3a5331653c364c88f0f379b6067e37 + +iv = HKDF-Expand-Label(server_initial_secret, "quic iv", "", 12) + = 0ac1493ca1905853b0bba03e + +hp = HKDF-Expand-Label(server_initial_secret, "quic hp", "", 16) + = c206b8d9b9f0f37644430b490eeaa314 +¶ +
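The derivation above can be reproduced with a short sketch using only HMAC-SHA256; the helper names are illustrative, and the initial_salt is the QUIC version 1 value defined in Section 5.2 of this document:

```python
import hmac
import hashlib

def hkdf_extract(salt, ikm):
    # HKDF-Extract (RFC 5869) is a single HMAC invocation.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand_label(secret, label, length):
    # HkdfLabel (RFC 8446, Section 7.1) with an empty Context.
    full = b"tls13 " + label
    info = length.to_bytes(2, "big") + bytes([len(full)]) + full + b"\x00"
    # One HMAC block suffices because length <= 32 for SHA-256.
    return hmac.new(secret, info + b"\x01", hashlib.sha256).digest()[:length]

# QUIC version 1 initial_salt (Section 5.2) and the client-chosen
# Destination Connection ID from this appendix.
initial_salt = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")
cid = bytes.fromhex("8394c8f03e515708")

initial_secret = hkdf_extract(initial_salt, cid)
client_initial_secret = hkdf_expand_label(initial_secret, b"client in", 32)
key = hkdf_expand_label(client_initial_secret, b"quic key", 16)
iv = hkdf_expand_label(client_initial_secret, b"quic iv", 12)
hp = hkdf_expand_label(client_initial_secret, b"quic hp", 16)
```

Running this sketch yields the initial_secret and client key, IV, and header protection key shown above; substituting "server in" produces the server values.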
The client sends an Initial packet. The unprotected payload of this packet +contains the following CRYPTO frame, plus enough PADDING frames to make a +1162-byte payload:¶
++060040f1010000ed0303ebf8fa56f129 39b9584a3896472ec40bb863cfd3e868 +04fe3a47f06a2b69484c000004130113 02010000c000000010000e00000b6578 +616d706c652e636f6dff01000100000a 00080006001d00170018001000070005 +04616c706e0005000501000000000033 00260024001d00209370b2c9caa47fba +baf4559fedba753de171fa71f50f1ce1 5d43e994ec74d748002b000302030400 +0d0010000e0403050306030203080408 050806002d00020101001c0002400100 +3900320408ffffffffffffffff050480 00ffff07048000ffff08011001048000 +75300901100f088394c8f03e51570806 048000ffff +¶ +
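The leading bytes of this payload (060040f1) form the CRYPTO frame header. A sketch of the variable-length integer encoding from [QUIC-TRANSPORT] decodes it; the helper name is illustrative:

```python
def decode_varint(data):
    # QUIC variable-length integer: the two most significant bits of
    # the first byte give the encoded length (1, 2, 4, or 8 bytes);
    # the remaining bits are the most significant bits of the value.
    first = data[0]
    size = 1 << (first >> 6)
    value = first & 0x3F
    for b in data[1:size]:
        value = (value << 8) | b
    return value, size

frame = bytes.fromhex("060040f1")
assert frame[0] == 0x06                 # CRYPTO frame type
offset, n = decode_varint(frame[1:])    # single-byte varint: 0
length, _ = decode_varint(frame[1 + n:])  # two-byte varint 0x40f1
assert (offset, length) == (0, 241)
```

The 241 bytes of CRYPTO data that follow are the ClientHello (handshake type 0x01 with a 3-byte length of 0xed = 237, plus the 4-byte handshake header).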
The unprotected header indicates a length of 1182 bytes: the 4-byte packet +number, 1162 bytes of frames, and the 16-byte authentication tag. The header +includes the connection ID and a packet number of 2:¶
++c300000001088394c8f03e5157080000449e00000002 +¶ +
Protecting the payload produces output that is sampled for header protection. +Because the header uses a 4-byte packet number encoding, the first 16 bytes of +the protected payload is sampled, then applied to the header:¶
++sample = d1b1c98dd7689fb8ec11d242b123dc9b + +mask = AES-ECB(hp, sample)[0..4] + = 437b9aec36 + +header[0] ^= mask[0] & 0x0f + = c0 +header[18..21] ^= mask[1..4] + = 7b9aec34 +header = c000000001088394c8f03e5157080000449e7b9aec34 +¶ +
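The masking step above can be reproduced directly, assuming the mask has already been computed with AES-ECB as shown:

```python
mask = bytes.fromhex("437b9aec36")
header = bytearray.fromhex(
    "c300000001088394c8f03e5157080000449e00000002")

# For long headers, only the four least significant bits of the first
# byte are protected; the remaining mask bytes cover the packet number,
# which occupies the last four bytes of this header.
header[0] ^= mask[0] & 0x0F
pn_offset = len(header) - 4
for i in range(4):
    header[pn_offset + i] ^= mask[1 + i]

assert header.hex() == (
    "c000000001088394c8f03e5157080000449e7b9aec34")
```

This yields the protected header shown above: the first byte becomes 0xc0 and the packet number bytes become 7b9aec34.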
The resulting protected packet is:¶
++c000000001088394c8f03e5157080000 449e7b9aec34d1b1c98dd7689fb8ec11 +d242b123dc9bd8bab936b47d92ec356c 0bab7df5976d27cd449f63300099f399 +1c260ec4c60d17b31f8429157bb35a12 82a643a8d2262cad67500cadb8e7378c +8eb7539ec4d4905fed1bee1fc8aafba1 7c750e2c7ace01e6005f80fcb7df6212 +30c83711b39343fa028cea7f7fb5ff89 eac2308249a02252155e2347b63d58c5 +457afd84d05dfffdb20392844ae81215 4682e9cf012f9021a6f0be17ddd0c208 +4dce25ff9b06cde535d0f920a2db1bf3 62c23e596d11a4f5a6cf3948838a3aec +4e15daf8500a6ef69ec4e3feb6b1d98e 610ac8b7ec3faf6ad760b7bad1db4ba3 +485e8a94dc250ae3fdb41ed15fb6a8e5 eba0fc3dd60bc8e30c5c4287e53805db +059ae0648db2f64264ed5e39be2e20d8 2df566da8dd5998ccabdae053060ae6c +7b4378e846d29f37ed7b4ea9ec5d82e7 961b7f25a9323851f681d582363aa5f8 +9937f5a67258bf63ad6f1a0b1d96dbd4 faddfcefc5266ba6611722395c906556 +be52afe3f565636ad1b17d508b73d874 3eeb524be22b3dcbc2c7468d54119c74 +68449a13d8e3b95811a198f3491de3e7 fe942b330407abf82a4ed7c1b311663a +c69890f4157015853d91e923037c227a 33cdd5ec281ca3f79c44546b9d90ca00 +f064c99e3dd97911d39fe9c5d0b23a22 9a234cb36186c4819e8b9c5927726632 +291d6a418211cc2962e20fe47feb3edf 330f2c603a9d48c0fcb5699dbfe58964 +25c5bac4aee82e57a85aaf4e2513e4f0 5796b07ba2ee47d80506f8d2c25e50fd +14de71e6c418559302f939b0e1abd576 f279c4b2e0feb85c1f28ff18f58891ff +ef132eef2fa09346aee33c28eb130ff2 8f5b766953334113211996d20011a198 +e3fc433f9f2541010ae17c1bf202580f 6047472fb36857fe843b19f5984009dd +c324044e847a4f4a0ab34f719595de37 252d6235365e9b84392b061085349d73 +203a4a13e96f5432ec0fd4a1ee65accd d5e3904df54c1da510b0ff20dcc0c77f +cb2c0e0eb605cb0504db87632cf3d8b4 dae6e705769d1de354270123cb11450e +fc60ac47683d7b8d0f811365565fd98c 4c8eb936bcab8d069fc33bd801b03ade +a2e1fbc5aa463d08ca19896d2bf59a07 1b851e6c239052172f296bfb5e724047 +90a2181014f3b94a4e97d117b4381303 68cc39dbb2d198065ae3986547926cd2 +162f40a29f0c3c8745c0f50fba3852e5 66d44575c29d39a03f0cda721984b6f4 +40591f355e12d439ff150aab7613499d bd49adabc8676eef023b15b65bfc5ca0 +6948109f23f350db82123535eb8a7433 
bdabcb909271a6ecbcb58b936a88cd4e +8f2e6ff5800175f113253d8fa9ca8885 c2f552e657dc603f252e1a8e308f76f0 +be79e2fb8f5d5fbbe2e30ecadd220723 c8c0aea8078cdfcb3868263ff8f09400 +54da48781893a7e49ad5aff4af300cd8 04a6b6279ab3ff3afb64491c85194aab +760d58a606654f9f4400e8b38591356f bf6425aca26dc85244259ff2b19c41b9 +f96f3ca9ec1dde434da7d2d392b905dd f3d1f9af93d1af5950bd493f5aa731b4 +056df31bd267b6b90a079831aaf579be 0a39013137aac6d404f518cfd4684064 +7e78bfe706ca4cf5e9c5453e9f7cfd2b 8b4c8d169a44e55c88d4a9a7f9474241 +e221af44860018ab0856972e194cd934 +¶ +
The server sends the following payload in response, including an ACK frame, a +CRYPTO frame, and no PADDING frames:¶
++02000000000600405a020000560303ee fce7f7b37ba1d1632e96677825ddf739 +88cfc79825df566dc5430b9a045a1200 130100002e00330024001d00209d3c94 +0d89690b84d08a60993c144eca684d10 81287c834d5311bcf32bb9da1a002b00 +020304 +¶ +
The header from the server includes a new connection ID and a 2-byte packet +number encoding for a packet number of 1:¶
++c1000000010008f067a5502a4262b50040750001 +¶ +
As a result, after protection, the header protection sample is taken starting +from the third protected byte:¶
++sample = 2cd0991cd25b0aac406a5816b6394100 +mask = 2ec0d8356a +header = cf000000010008f067a5502a4262b5004075c0d9 +¶ +
The final protected packet is then:¶
++cf000000010008f067a5502a4262b500 4075c0d95a482cd0991cd25b0aac406a +5816b6394100f37a1c69797554780bb3 8cc5a99f5ede4cf73c3ec2493a1839b3 +dbcba3f6ea46c5b7684df3548e7ddeb9 c3bf9c73cc3f3bded74b562bfb19fb84 +022f8ef4cdd93795d77d06edbb7aaf2f 58891850abbdca3d20398c276456cbc4 +2158407dd074ee +¶ +
This shows a Retry packet that might be sent in response to the Initial packet +in Appendix A.2. The integrity check includes the client-chosen +connection ID value of 0x8394c8f03e515708, but that value is not +included in the final Retry packet:¶
++ff000000010008f067a5502a4262b574 6f6b656e04a265ba2eff4d829058fb3f +0f2496ba +¶ +
This example shows some of the steps required to protect a packet with +a short header. This example uses AEAD_CHACHA20_POLY1305.¶
+In this example, TLS produces an application write secret from which a server +uses HKDF-Expand-Label to produce four values: a key, an IV, a header +protection key, and the secret that will be used after keys are updated (this +last value is not used further in this example).¶
++secret + = 9ac312a7f877468ebe69422748ad00a1 + 5443f18203a07d6060f688f30f21632b + +key = HKDF-Expand-Label(secret, "quic key", "", 32) + = c6d98ff3441c3fe1b2182094f69caa2e + d4b716b65488960a7a984979fb23e1c8 + +iv = HKDF-Expand-Label(secret, "quic iv", "", 12) + = e0459b3474bdd0e44a41c144 + +hp = HKDF-Expand-Label(secret, "quic hp", "", 32) + = 25a282b9e82f06f21f488917a4fc8f1b + 73573685608597d0efcb076b0ab7a7a4 + +ku = HKDF-Expand-Label(secret, "quic ku", "", 32) + = 1223504755036d556342ee9361d25342 + 1a826c9ecdf3c7148684b36b714881f9 +¶ +
The following shows the steps involved in protecting a minimal packet with an +empty Destination Connection ID. This packet contains a single PING frame (that +is, a payload of just 0x01) and has a packet number of 654360564. In this +example, using a packet number of length 3 (that is, 49140 is encoded) avoids +having to pad the payload of the packet; PADDING frames would be needed if the +packet number is encoded on fewer bytes.¶
++pn = 654360564 (decimal) +nonce = e0459b3474bdd0e46d417eb0 +unprotected header = 4200bff4 +payload plaintext = 01 +payload ciphertext = 655e5cd55c41f69080575d7999c25a5bfb +¶ +
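The nonce above is formed by XORing the IV with the packet number, left-padded to the IV length; a minimal sketch using the values from this example:

```python
iv = bytes.fromhex("e0459b3474bdd0e44a41c144")
pn = 654360564

# Left-pad the packet number to the 12-byte IV length, then XOR.
pn_bytes = pn.to_bytes(len(iv), "big")
nonce = bytes(a ^ b for a, b in zip(iv, pn_bytes))

assert nonce.hex() == "e0459b3474bdd0e46d417eb0"
```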
The resulting ciphertext is the minimum size possible. One byte is skipped to +produce the sample for header protection.¶
++sample = 5e5cd55c41f69080575d7999c25a5bfb +mask = aefefe7d03 +header = 4cfe4189 +¶ +
The protected packet is the smallest possible packet size of 21 bytes.¶
++packet = 4cfe4189655e5cd55c41f69080575d7999c25a5bfb +¶ +
This section documents analyses used in deriving AEAD algorithm limits for +AEAD_AES_128_GCM, AEAD_AES_128_CCM, and AEAD_AES_256_GCM. The analyses that +follow use symbols for multiplication (*), division (/), and exponentiation (^), +plus parentheses for establishing precedence. The following symbols are also +used:¶
+The size of the authentication tag in bits. For these ciphers, t is 128.¶
+The size of the block function in bits. For these ciphers, n is 128.¶
+The size of the key in bits. This is 128 for AEAD_AES_128_GCM and +AEAD_AES_128_CCM; 256 for AEAD_AES_256_GCM.¶
+The number of blocks in each packet (see below).¶
+The number of genuine packets created and protected by endpoints. This value +is the bound on the number of packets that can be protected before updating +keys.¶
+The number of forged packets that endpoints will accept. This value is the +bound on the number of forged packets that an endpoint can reject before +updating keys.¶
+The amount of offline ideal cipher queries made by an adversary.¶
+The analyses that follow rely on a count of the number of block operations +involved in producing each message. This analysis is performed for packets of +size up to 2^11 (l = 2^7) and 2^16 (l = 2^12). A size of 2^11 is expected to be +a limit that matches common deployment patterns, whereas 2^16 is the maximum +possible size of a QUIC packet. Only endpoints that strictly limit packet size +can use the larger confidentiality and integrity limits that are derived using +the smaller packet size.¶
+For AEAD_AES_128_GCM and AEAD_AES_256_GCM, the message length (l) is the length +of the associated data in blocks plus the length of the plaintext in blocks.¶
+For AEAD_AES_128_CCM, the total number of block cipher operations is the sum +of: the length of the associated data in blocks, the length of the ciphertext +in blocks, the length of the plaintext in blocks, plus 1. In this analysis, +this is simplified to a value of twice the length of the packet in blocks (that +is, 2l = 2^8 for packets that are limited to 2^11 bytes, or 2l = 2^13 +otherwise). This simplification is based on the packet containing all of the +associated data and ciphertext. This results in a 1 to 3 block overestimation +of the number of operations per packet.¶
[GCM-MU] specifies concrete bounds for AEAD_AES_128_GCM and AEAD_AES_256_GCM as +used in TLS 1.3 and QUIC. This section documents this analysis using several +simplifying assumptions:¶
+The bounds in [GCM-MU] are tighter and more complete than those used in +[AEBounds], which allows for larger limits than those described in +[TLS13].¶
+For confidentiality, Theorem (4.3) in [GCM-MU] establishes that - for a +single user that does not repeat nonces - the dominant term in determining the +distinguishing advantage between a real and random AEAD algorithm gained by an +attacker is:¶
++2 * (q * l)^2 / 2^n +¶ +
For a target advantage of 2^-57, this results in the relation:¶
++q <= 2^35 / l +¶ +
Thus, endpoints that do not send packets larger than 2^11 bytes cannot protect +more than 2^28 packets in a single connection without causing an attacker to +gain a larger advantage than the target of 2^-57. The limit for endpoints that +allow for the packet size to be as large as 2^16 is instead 2^23.¶
+For integrity, Theorem (4.3) in [GCM-MU] establishes that an attacker gains +an advantage in successfully forging a packet of no more than:¶
++(1 / 2^(8 * n)) + ((2 * v) / 2^(2 * n)) + + ((2 * o * v) / 2^(k + n)) + (n * (v + (v * l)) / 2^k) +¶ +
The goal is to limit this advantage to 2^-57. For AEAD_AES_128_GCM, the fourth +term in this inequality dominates the rest, so the others can be removed without +significant effect on the result. This produces the following approximation:¶
++v <= 2^64 / l +¶ +
Endpoints that do not attempt to remove protection from packets larger than 2^11 +bytes can attempt to remove protection from at most 2^57 packets. Endpoints that +do not restrict the size of processed packets can attempt to remove protection +from at most 2^52 packets.¶
+For AEAD_AES_256_GCM, the same term dominates, but the larger value of k +produces the following approximation:¶
++v <= 2^192 / l +¶ +
This is substantially larger than the limit for AEAD_AES_128_GCM. However, this +document recommends that the same limit be applied to both functions as either +limit is acceptably large.¶
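The arithmetic behind these limits can be checked with a small sketch; the derivation comments restate the relations given above (illustrative only):

```python
# Confidentiality: 2 * (q * l)^2 / 2^n <= 2^-57 with n = 128 gives
# (q * l)^2 <= 2^70, i.e. q <= 2^35 / l.
# Integrity (AEAD_AES_128_GCM): the dominant term n * (v + v * l) / 2^k
# with n = k = 128 gives approximately v <= 2^64 / l.
for l, q_limit, v_limit in [(2**7, 2**28, 2**57),   # 2^11-byte packets
                            (2**12, 2**23, 2**52)]:  # 2^16-byte packets
    assert 2**35 // l == q_limit   # confidentiality limit
    assert 2**64 // l == v_limit   # integrity limit
```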
+TLS [TLS13] and [AEBounds] do not specify limits on usage +for AEAD_AES_128_CCM. However, any AEAD that is used with QUIC requires limits +on use that ensure that both confidentiality and integrity are preserved. This +section documents that analysis.¶
+[CCM-ANALYSIS] is used as the basis of this +analysis. The results of that analysis are used to derive usage limits that are +based on those chosen in [TLS13].¶
+For confidentiality, Theorem 2 in [CCM-ANALYSIS] establishes that an attacker +gains a distinguishing advantage over an ideal pseudorandom permutation (PRP) of +no more than:¶
++(2l * q)^2 / 2^n +¶ +
The integrity limit in Theorem 1 in [CCM-ANALYSIS] provides an attacker a +strictly higher advantage for the same number of messages. As the targets for +the confidentiality advantage and the integrity advantage are the same, only +Theorem 1 needs to be considered.¶
+Theorem 1 establishes that an attacker gains an advantage over an +ideal PRP of no more than:¶
++v / 2^t + (2l * (v + q))^2 / 2^n +¶ +
+As t and n are both 128, the first term is negligible relative to the second, +so it can be removed without a significant effect on the result.¶
This produces a relation that combines both encryption and decryption attempts +with the same limit as that produced by the theorem for confidentiality alone. +For a target advantage of 2^-57, this results in:¶
++v + q <= 2^34.5 / l +¶ +
By setting q = v, values for both the confidentiality and integrity limits can +be produced. Endpoints that limit packets to 2^11 bytes therefore have both +confidentiality and integrity limits of 2^26.5 packets. Endpoints that do not +restrict packet size have a limit of 2^21.5.¶
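The CCM limits can be checked the same way; the comments restate the bound derived above (illustrative only):

```python
import math

# Combined bound: (2l * (v + q))^2 / 2^n <= 2^-57 with n = 128 gives
# 2l * (v + q) <= 2^35.5, i.e. v + q <= 2^34.5 / l.
# Setting q = v makes each individual limit 2^33.5 / l.
for l, limit in [(2**7, 26.5),    # 2^11-byte packets -> 2^26.5
                 (2**12, 21.5)]:  # 2^16-byte packets -> 2^21.5
    assert math.isclose(math.log2(2**33.5 / l), limit)
```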
Issue and pull request numbers are listed with a leading octothorp.¶
+Changes to integration of the TLS handshake (#829, #1018, #1094, #1165, #1190, +#1233, #1242, #1252, #1450)¶
+No significant changes.¶
+No significant changes.¶
+The IETF QUIC Working Group received an enormous amount of support from many +people. The following people provided substantive contributions to this +document:¶
+奥 一穂 (Kazuho Oku)¶
+Mikkel Fahnøe Jørgensen¶
+Internet-Draft | +QUIC Transport Protocol | +January 2021 | +
Iyengar & Thomson | +Expires 19 July 2021 | +[Page] | +
This document defines the core of the QUIC transport protocol. QUIC provides +applications with flow-controlled streams for structured communication, +low-latency connection establishment, and network path migration. QUIC includes +security measures that ensure confidentiality, integrity, and availability in a +range of deployment circumstances. Accompanying documents describe the +integration of TLS for key negotiation, loss detection, and an exemplary +congestion control algorithm.¶
+DO NOT DEPLOY THIS VERSION OF QUIC UNTIL IT IS IN AN RFC. This version is still +a work in progress. For trial deployments, please use earlier versions.¶
+Discussion of this draft takes place on the QUIC working group mailing list +(quic@ietf.org), which is archived at +https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
+Working Group information can be found at https://github.com/quicwg; source +code and issues list for this draft can be found at +https://github.com/quicwg/base-drafts/labels/-transport.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF). Note that other groups may also distribute working + documents as Internet-Drafts. The list of current Internet-Drafts is + at https://datatracker.ietf.org/drafts/current/.¶
++ Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress."¶
++ This Internet-Draft will expire on 19 July 2021.¶
++ Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Simplified BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Simplified BSD License.¶
+QUIC is a secure general-purpose transport protocol. This +document defines version 1 of QUIC, which conforms to the version-independent +properties of QUIC defined in [QUIC-INVARIANTS].¶
+QUIC is a connection-oriented protocol that creates a stateful interaction +between a client and server.¶
+The QUIC handshake combines negotiation of cryptographic and transport +parameters. QUIC integrates the TLS ([TLS13]) handshake, although using a +customized framing for protecting packets. The integration of TLS and QUIC is +described in more detail in [QUIC-TLS]. The handshake is structured to permit +the exchange of application data as soon as possible. This includes an option +for clients to send data immediately (0-RTT), which requires some form of prior +communication or configuration to enable.¶
+Endpoints communicate in QUIC by exchanging QUIC packets. Most packets contain +frames, which carry control information and application data between endpoints. +QUIC authenticates the entirety of each packet and encrypts as much of each +packet as is practical. QUIC packets are carried in UDP datagrams +([UDP]) to better facilitate deployment in existing systems and +networks.¶
+Application protocols exchange information over a QUIC connection via streams, +which are ordered sequences of bytes. Two types of stream can be created: +bidirectional streams, which allow both endpoints to send data; and +unidirectional streams, which allow a single endpoint to send data. A +credit-based scheme is used to limit stream creation and to bound the amount of +data that can be sent.¶
+QUIC provides the necessary feedback to implement reliable delivery and +congestion control. An algorithm for detecting and recovering from loss of +data is described in [QUIC-RECOVERY]. QUIC depends on congestion control +to avoid network congestion. An exemplary congestion control algorithm is +also described in [QUIC-RECOVERY].¶
+QUIC connections are not strictly bound to a single network path. Connection +migration uses connection identifiers to allow connections to transfer to a new +network path. Only clients are able to migrate in this version of QUIC. This +design also allows connections to continue after changes in network topology or +address mappings, such as might be caused by NAT rebinding.¶
+Once established, multiple options are provided for connection termination. +Applications can manage a graceful shutdown, endpoints can negotiate a timeout +period, errors can cause immediate connection teardown, and a stateless +mechanism provides for termination of connections after one endpoint has lost +state.¶
+This document describes the core QUIC protocol and is structured as follows:¶
+Streams are the basic service abstraction that QUIC provides.¶
+ +Connections are the context in which QUIC endpoints communicate.¶
+Packets and frames are the basic unit used by QUIC to communicate.¶
+Finally, encoding details of QUIC protocol elements are described in:¶
+Accompanying documents describe QUIC's loss detection and congestion control +[QUIC-RECOVERY], and the use of TLS and other cryptographic mechanisms +[QUIC-TLS].¶
+This document defines QUIC version 1, which conforms to the protocol invariants +in [QUIC-INVARIANTS].¶
+To refer to QUIC version 1, cite this document. References to the limited +set of version-independent properties of QUIC can cite [QUIC-INVARIANTS].¶
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", +"MAY", and "OPTIONAL" in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, and only when, they +appear in all capitals, as shown here.¶
+Commonly used terms in the document are described below.¶
+The transport protocol described by this document. QUIC is a name, not an +acronym.¶
+An entity that can participate in a QUIC connection by generating, receiving, +and processing QUIC packets. There are only two types of endpoint in QUIC: +client and server.¶
+The endpoint that initiates a QUIC connection.¶
+The endpoint that accepts a QUIC connection.¶
+A complete processable unit of QUIC that can be encapsulated in a UDP +datagram. One or more QUIC packets can be encapsulated in a single UDP +datagram.¶
+A QUIC packet that contains frames other than ACK, PADDING, and +CONNECTION_CLOSE. These cause a recipient to send an acknowledgment; see +Section 13.2.1.¶
+A unit of structured protocol information. There are multiple frame types, +each of which carries different information. Frames are contained in QUIC +packets.¶
+When used without qualification, the tuple of IP version, IP address, and UDP +port number that represents one end of a network path.¶
+An identifier that is used to identify a QUIC connection at an endpoint. +Each endpoint selects one or more Connection IDs for its peer to include in +packets sent towards the endpoint. This value is opaque to the peer.¶
+A unidirectional or bidirectional channel of ordered bytes within a QUIC +connection. A QUIC connection can carry multiple simultaneous streams.¶
+An entity that uses QUIC to send and receive data.¶
+This document uses the terms "QUIC packets", "UDP datagrams", and "IP packets" +to refer to the units of the respective protocols. That is, one or more QUIC +packets can be encapsulated in a UDP datagram, which is in turn encapsulated in +an IP packet.¶
+Packet and frame diagrams in this document use a custom format. The purpose of +this format is to summarize, not define, protocol elements. Prose defines the +complete semantics and details of structures.¶
+Complex fields are named and then followed by a list of fields surrounded by a +pair of matching braces. Each field in this list is separated by commas.¶
+Individual fields include length information, plus indications about fixed +value, optionality, or repetitions. Individual fields use the following +notational conventions, with all lengths in bits:¶
+Indicates that x is A bits long¶
+Indicates that x holds an integer value using the variable-length encoding in +Section 16¶
+Indicates that x can be any length from A to B; A can be omitted to indicate +a minimum of zero bits and B can be omitted to indicate no set upper limit; +values in this format always end on a byte boundary¶
+Indicates that x has a fixed value of C with the length described by +L, which can use any of the three length forms above¶
+Indicates that x has a value in the range from C to D, inclusive, +with the length described by L, as above¶
+Indicates that x is optional (and has length of L)¶
+Indicates that zero or more instances of x are present (and that each +instance is length L)¶
+This document uses network byte order (that is, big endian) values. Fields +are placed starting from the high-order bits of each byte.¶
+By convention, individual fields reference a complex field by using the name of +the complex field.¶
+For example:¶
+When a single-bit field is referenced in prose, the position of that field can +be clarified by using the value of the byte that carries the field with the +field's value set. For example, the value 0x80 could be used to refer to the +single-bit field in the most significant bit of the byte, such as One-bit Field +in Figure 1.¶
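The notation above refers to the variable-length integer encoding of Section 16. As an informal illustration (not part of the protocol definition), the scheme can be sketched as follows: the two most significant bits of the first byte select a length of 1, 2, 4, or 8 bytes, leaving 6, 14, 30, or 62 bits for the value, encoded in network byte order.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer: the two high
    bits of the first byte give the length (00=1, 01=2, 10=4, 11=8 bytes)."""
    for prefix, length in ((0b00, 1), (0b01, 2), (0b10, 4), (0b11, 8)):
        if value < 1 << (8 * length - 2):
            data = value.to_bytes(length, "big")
            return bytes([data[0] | (prefix << 6)]) + data[1:]
    raise ValueError("value exceeds 2^62 - 1")

def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    length = 1 << (data[0] >> 6)
    value = int.from_bytes(data[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```

For example, the single byte 0x25 decodes to 37, and the two-byte sequence 0x7bbd decodes to 15293.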
+Streams in QUIC provide a lightweight, ordered byte-stream abstraction to an +application. Streams can be unidirectional or bidirectional.¶
+Streams can be created by sending data. Other processes associated with stream +management - ending, cancelling, and managing flow control - are all designed to +impose minimal overheads. For instance, a single STREAM frame (Section 19.8) +can open, carry data for, and close a stream. Streams can also be long-lived and +can last the entire duration of a connection.¶
+Streams can be created by either endpoint, can concurrently send data +interleaved with other streams, and can be cancelled. QUIC does not provide any +means of ensuring ordering between bytes on different streams.¶
+QUIC allows for an arbitrary number of streams to operate concurrently and for +an arbitrary amount of data to be sent on any stream, subject to flow control +constraints and stream limits; see Section 4.¶
+Streams can be unidirectional or bidirectional. Unidirectional streams carry +data in one direction: from the initiator of the stream to its peer. +Bidirectional streams allow for data to be sent in both directions.¶
+Streams are identified within a connection by a numeric value, referred to as +the stream ID. A stream ID is a 62-bit integer (0 to 2^62-1) that is unique for +all streams on a connection. Stream IDs are encoded as variable-length +integers; see Section 16. A QUIC endpoint MUST NOT reuse a stream ID +within a connection.¶
+The least significant bit (0x1) of the stream ID identifies the initiator of the +stream. Client-initiated streams have even-numbered stream IDs (with the bit +set to 0), and server-initiated streams have odd-numbered stream IDs (with the +bit set to 1).¶
+The second least significant bit (0x2) of the stream ID distinguishes between +bidirectional streams (with the bit set to 0) and unidirectional streams (with +the bit set to 1).¶
+The two least significant bits from a stream ID therefore identify a stream as +one of four types, as summarized in Table 1.¶
+Bits | +Stream Type | +
---|---|
0x0 | +Client-Initiated, Bidirectional | +
0x1 | +Server-Initiated, Bidirectional | +
0x2 | +Client-Initiated, Unidirectional | +
0x3 | +Server-Initiated, Unidirectional | +
The stream space for each type begins at the minimum value (0x0 through 0x3 +respectively); successive streams of each type are created with numerically +increasing stream IDs. A stream ID that is used out of order results in all +streams of that type with lower-numbered stream IDs also being opened.¶
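The two-bit classification in Table 1 and the per-type numbering can be sketched directly (an illustration only; the function names are not from this document):

```python
def stream_type(stream_id: int) -> str:
    """Derive initiator and directionality from the two low bits (Table 1):
    bit 0x1 identifies the initiator, bit 0x2 the directionality."""
    initiator = "Client" if stream_id & 0x1 == 0 else "Server"
    direction = "Bidirectional" if stream_id & 0x2 == 0 else "Unidirectional"
    return f"{initiator}-Initiated, {direction}"

def nth_stream_id(n: int, type_bits: int) -> int:
    """Successive streams of one type are numbered type_bits, type_bits + 4,
    type_bits + 8, and so on."""
    return 4 * n + type_bits
```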
+STREAM frames (Section 19.8) encapsulate data sent by an application. An +endpoint uses the Stream ID and Offset fields in STREAM frames to place data in +order.¶
+Endpoints MUST be able to deliver stream data to an application as an ordered +byte-stream. Delivering an ordered byte-stream requires that an endpoint buffer +any data that is received out of order, up to the advertised flow control limit.¶
+QUIC makes no specific allowances for delivery of stream data out of +order. However, implementations MAY choose to offer the ability to deliver data +out of order to a receiving application.¶
+An endpoint could receive data for a stream at the same stream offset multiple +times. Data that has already been received can be discarded. The data at a +given offset MUST NOT change if it is sent multiple times; an endpoint MAY treat +receipt of different data at the same offset within a stream as a connection +error of type PROTOCOL_VIOLATION.¶
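A minimal sketch of a receiver that buffers out-of-order data, discards exact duplicates, and treats differing data at the same offset as an error, as the rules above allow (the class and its methods are illustrative, not an API defined by this document):

```python
class StreamReassembler:
    """Buffer STREAM data by offset for in-order delivery; duplicates are
    discarded, and changed data at a known offset is treated as
    PROTOCOL_VIOLATION."""
    def __init__(self):
        self.buffer = {}     # offset -> byte (simple, not space-efficient)
        self.delivered = 0   # next offset to hand to the application

    def on_stream_frame(self, offset: int, data: bytes):
        for i, b in enumerate(data):
            pos = offset + i
            if pos < self.delivered:
                continue     # already delivered; a duplicate is discarded
            if pos in self.buffer and self.buffer[pos] != b:
                raise ValueError("PROTOCOL_VIOLATION: data changed at offset %d" % pos)
            self.buffer[pos] = b

    def readable(self) -> bytes:
        """Return the contiguous bytes now deliverable in order."""
        out = bytearray()
        while self.delivered in self.buffer:
            out.append(self.buffer.pop(self.delivered))
            self.delivered += 1
        return bytes(out)
```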
+Streams are an ordered byte-stream abstraction with no other structure visible +to QUIC. STREAM frame boundaries are not expected to be preserved when +data is transmitted, retransmitted after packet loss, or delivered to the +application at a receiver.¶
+An endpoint MUST NOT send data on any stream without ensuring that it is within +the flow control limits set by its peer. Flow control is described in detail in +Section 4.¶
+Stream multiplexing can have a significant effect on application performance if +resources allocated to streams are correctly prioritized.¶
+QUIC does not provide a mechanism for exchanging prioritization information. +Instead, it relies on receiving priority information from the application.¶
+A QUIC implementation SHOULD provide ways in which an application can indicate +the relative priority of streams. An implementation uses information provided +by the application to determine how to allocate resources to active streams.¶
+This document does not define an API for QUIC, but instead defines a set of +functions on streams that application protocols can rely upon. An application +protocol can assume that a QUIC implementation provides an interface that +includes the operations described in this section. An implementation designed +for use with a specific application protocol might provide only those operations +that are used by that protocol.¶
+On the sending part of a stream, an application protocol can:¶
+On the receiving part of a stream, an application protocol can:¶
+An application protocol can also request to be informed of state changes on +streams, including when the peer has opened or reset a stream, when a peer +aborts reading on a stream, when new data is available, and when data can or +cannot be written to the stream due to flow control.¶
+This section describes streams in terms of their send or receive components. +Two state machines are described: one for the streams on which an endpoint +transmits data (Section 3.1), and another for streams on which an +endpoint receives data (Section 3.2).¶
+Unidirectional streams use either the sending or receiving state machine +depending on the stream type and endpoint role. Bidirectional streams use both +state machines at both endpoints. For the most part, the use of these state +machines is the same whether the stream is unidirectional or bidirectional. The +conditions for opening a stream are slightly more complex for a bidirectional +stream because the opening of either the send or receive side causes the stream +to open in both directions.¶
+The state machines shown in this section are largely informative. This +document uses stream states to describe rules for when and how different types +of frames can be sent and the reactions that are expected when different types +of frames are received. Though these state machines are intended to be useful +in implementing QUIC, these states are not intended to constrain +implementations. An implementation can define a different state machine as long +as its behavior is consistent with an implementation that implements these +states.¶
+In some cases, a single event or action can cause a transition through +multiple states. For instance, sending STREAM with a FIN bit set can cause +two state transitions for a sending stream: from the Ready state to the Send +state, and from the Send state to the Data Sent state.¶
+Figure 2 shows the states for the part of a stream that sends +data to a peer.¶
+The sending part of a stream that the endpoint initiates (types 0 +and 2 for clients, 1 and 3 for servers) is opened by the application. The +"Ready" state represents a newly created stream that is able to accept data from +the application. Stream data might be buffered in this state in preparation for +sending.¶
+Sending the first STREAM or STREAM_DATA_BLOCKED frame causes a sending part of a +stream to enter the "Send" state. An implementation might choose to defer +allocating a stream ID to a stream until it sends the first STREAM frame and +enters this state, which can allow for better stream prioritization.¶
+The sending part of a bidirectional stream initiated by a peer (type 0 for a +server, type 1 for a client) starts in the "Ready" state when the receiving part +is created.¶
+In the "Send" state, an endpoint transmits - and retransmits as necessary - +stream data in STREAM frames. The endpoint respects the flow control limits set +by its peer, and continues to accept and process MAX_STREAM_DATA frames. An +endpoint in the "Send" state generates STREAM_DATA_BLOCKED frames if it is +blocked from sending by stream flow control limits (Section 4.1).¶
+After the application indicates that all stream data has been sent and a STREAM +frame containing the FIN bit is sent, the sending part of the stream enters the +"Data Sent" state. From this state, the endpoint only retransmits stream data +as necessary. The endpoint does not need to check flow control limits or send +STREAM_DATA_BLOCKED frames for a stream in this state. MAX_STREAM_DATA frames +might be received until the peer receives the final stream offset. The endpoint +can safely ignore any MAX_STREAM_DATA frames it receives from its peer for a +stream in this state.¶
+Once all stream data has been successfully acknowledged, the sending part of the +stream enters the "Data Recvd" state, which is a terminal state.¶
+From any of the "Ready", "Send", or "Data Sent" states, an application can +signal that it wishes to abandon transmission of stream data. Alternatively, an +endpoint might receive a STOP_SENDING frame from its peer. In either case, the +endpoint sends a RESET_STREAM frame, which causes the stream to enter the "Reset +Sent" state.¶
+An endpoint MAY send a RESET_STREAM as the first frame that mentions a stream; +this causes the sending part of that stream to open and then immediately +transition to the "Reset Sent" state.¶
+Once a packet containing a RESET_STREAM has been acknowledged, the sending part +of the stream enters the "Reset Recvd" state, which is a terminal state.¶
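The sending-part transitions described above can be summarized as a small table-driven state machine. This is one informal rendering of Figure 2 (event names are invented labels, not protocol elements):

```python
# Transitions for the sending part of a stream, per the prose above.
SEND_TRANSITIONS = {
    ("Ready", "send STREAM / STREAM_DATA_BLOCKED"): "Send",
    ("Send", "send STREAM + FIN"): "Data Sent",
    ("Data Sent", "all data acked"): "Data Recvd",
    ("Ready", "send RESET_STREAM"): "Reset Sent",
    ("Send", "send RESET_STREAM"): "Reset Sent",
    ("Data Sent", "send RESET_STREAM"): "Reset Sent",
    ("Reset Sent", "RESET_STREAM acked"): "Reset Recvd",
}

class SendingPart:
    TERMINAL = {"Data Recvd", "Reset Recvd"}

    def __init__(self):
        self.state = "Ready"

    def on_event(self, event: str) -> str:
        # Raises KeyError for an event not valid in the current state.
        self.state = SEND_TRANSITIONS[(self.state, event)]
        return self.state
```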
+Figure 3 shows the states for the part of a stream that +receives data from a peer. The states for a receiving part of a stream mirror +only some of the states of the sending part of the stream at the peer. The +receiving part of a stream does not track states on the sending part that cannot +be observed, such as the "Ready" state. Instead, the receiving part of a stream +tracks the delivery of data to the application, some of which cannot be observed +by the sender.¶
+The receiving part of a stream initiated by a peer (types 1 and 3 for a client, +or 0 and 2 for a server) is created when the first STREAM, STREAM_DATA_BLOCKED, +or RESET_STREAM frame is received for that stream. For bidirectional streams +initiated by a peer, receipt of a MAX_STREAM_DATA or STOP_SENDING frame for the +sending part of the stream also creates the receiving part. The initial state +for the receiving part of a stream is "Recv".¶
+For a bidirectional stream, the receiving part enters the "Recv" state when the +sending part initiated by the endpoint (type 0 for a client, type +1 for a server) enters the "Ready" state.¶
+An endpoint opens a bidirectional stream when a MAX_STREAM_DATA or STOP_SENDING +frame is received from the peer for that stream. Receiving a MAX_STREAM_DATA +frame for an unopened stream indicates that the remote peer has opened the +stream and is providing flow control credit. Receiving a STOP_SENDING frame for +an unopened stream indicates that the remote peer no longer wishes to receive +data on this stream. Either frame might arrive before a STREAM or +STREAM_DATA_BLOCKED frame if packets are lost or reordered.¶
+Before a stream is created, all streams of the same type with lower-numbered +stream IDs MUST be created. This ensures that the creation order for streams is +consistent on both endpoints.¶
+In the "Recv" state, the endpoint receives STREAM and STREAM_DATA_BLOCKED +frames. Incoming data is buffered and can be reassembled into the correct order +for delivery to the application. As data is consumed by the application and +buffer space becomes available, the endpoint sends MAX_STREAM_DATA frames to +allow the peer to send more data.¶
+When a STREAM frame with a FIN bit is received, the final size of the stream is +known; see Section 4.5. The receiving part of the stream then enters the +"Size Known" state. In this state, the endpoint no longer needs to send +MAX_STREAM_DATA frames; it only receives any retransmissions of stream data.¶
+Once all data for the stream has been received, the receiving part enters the +"Data Recvd" state. This might happen as a result of receiving the same STREAM +frame that causes the transition to "Size Known". After all data has been +received, any STREAM or STREAM_DATA_BLOCKED frames for the stream can be +discarded.¶
+The "Data Recvd" state persists until stream data has been delivered to the +application. Once stream data has been delivered, the stream enters the "Data +Read" state, which is a terminal state.¶
+Receiving a RESET_STREAM frame in the "Recv" or "Size Known" states causes the +stream to enter the "Reset Recvd" state. This might cause the delivery of +stream data to the application to be interrupted.¶
+It is possible that all stream data has already been received when a +RESET_STREAM is received (that is, in the "Data Recvd" state). Similarly, it is +possible for remaining stream data to arrive after receiving a RESET_STREAM +frame (the "Reset Recvd" state). An implementation is free to manage this +situation as it chooses.¶
+Sending RESET_STREAM means that an endpoint cannot guarantee delivery of stream +data; however, there is no requirement that stream data not be delivered if a +RESET_STREAM is received. An implementation MAY interrupt delivery of stream +data, discard any data that was not consumed, and signal the receipt of the +RESET_STREAM. A RESET_STREAM signal might be suppressed or withheld if stream +data is completely received and is buffered to be read by the application. If +the RESET_STREAM is suppressed, the receiving part of the stream remains in +"Data Recvd".¶
+Once the application receives the signal indicating that the stream +was reset, the receiving part of the stream transitions to the "Reset Read" +state, which is a terminal state.¶
+The sender of a stream sends just three frame types that affect the state of a +stream at either sender or receiver: STREAM (Section 19.8), +STREAM_DATA_BLOCKED (Section 19.13), and RESET_STREAM +(Section 19.4).¶
+A sender MUST NOT send any of these frames from a terminal state ("Data Recvd" +or "Reset Recvd"). A sender MUST NOT send a STREAM or STREAM_DATA_BLOCKED frame +for a stream in the "Reset Sent" state or any terminal state, that is, after +sending a RESET_STREAM frame. A receiver could receive any of these three +frames in any state, due to the possibility of delayed delivery of packets +carrying them.¶
+The receiver of a stream sends MAX_STREAM_DATA (Section 19.10) and +STOP_SENDING frames (Section 19.5).¶
+The receiver only sends MAX_STREAM_DATA in the "Recv" state. A receiver MAY +send STOP_SENDING in any state where it has not received a RESET_STREAM frame; +that is, states other than "Reset Recvd" or "Reset Read". However, there is +little value in sending a STOP_SENDING frame in the "Data Recvd" state, since +all stream data has been received. A sender could receive either of these two +frames in any state as a result of delayed delivery of packets.¶
+A bidirectional stream is composed of sending and receiving parts. +Implementations can represent states of the bidirectional stream as composites +of sending and receiving stream states. The simplest model presents the stream +as "open" when either sending or receiving parts are in a non-terminal state and +"closed" when both sending and receiving streams are in terminal states.¶
+Table 2 shows a more complex mapping of bidirectional stream +states that loosely correspond to the stream states in HTTP/2 +[HTTP2]. This shows that multiple states on sending or receiving +parts of streams are mapped to the same composite state. Note that this is just +one possibility for such a mapping; this mapping requires that data is +acknowledged before the transition to a "closed" or "half-closed" state.¶
+Sending Part | +Receiving Part | +Composite State | +
---|---|---|
No Stream/Ready | +No Stream/Recv *1 | +idle | +
Ready/Send/Data Sent | +Recv/Size Known | +open | +
Ready/Send/Data Sent | +Data Recvd/Data Read | +half-closed (remote) | +
Ready/Send/Data Sent | +Reset Recvd/Reset Read | +half-closed (remote) | +
Data Recvd | +Recv/Size Known | +half-closed (local) | +
Reset Sent/Reset Recvd | +Recv/Size Known | +half-closed (local) | +
Reset Sent/Reset Recvd | +Data Recvd/Data Read | +closed | +
Reset Sent/Reset Recvd | +Reset Recvd/Reset Read | +closed | +
Data Recvd | +Data Recvd/Data Read | +closed | +
Data Recvd | +Reset Recvd/Reset Read | +closed | +
A stream is considered "idle" if it has not yet been created, or if the +receiving part of the stream is in the "Recv" state without yet having +received any frames.¶
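One possible realization of the Table 2 mapping is sketched below ("idle" is omitted, since per the footnote it additionally depends on whether any frames have been received; the function is illustrative):

```python
def composite_state(sending: str, receiving: str) -> str:
    """Map send/receive part states to the HTTP/2-like composite of Table 2.
    This mapping treats data as acknowledged before "closed"/"half-closed"."""
    send_active = sending in ("Ready", "Send", "Data Sent")
    recv_active = receiving in ("Recv", "Size Known")
    if send_active and recv_active:
        return "open"
    if send_active:
        return "half-closed (remote)"   # receiving part already finished
    if recv_active:
        return "half-closed (local)"    # sending part already finished
    return "closed"
```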
+If an application is no longer interested in the data it is receiving on a +stream, it can abort reading the stream and specify an application error code.¶
+If the stream is in the "Recv" or "Size Known" states, the transport SHOULD +signal this by sending a STOP_SENDING frame to prompt closure of the stream in +the opposite direction. This typically indicates that the receiving application +is no longer reading data it receives from the stream, but it is not a guarantee +that incoming data will be ignored.¶
+STREAM frames received after sending a STOP_SENDING frame are still counted +toward connection and stream flow control, even though these frames can be +discarded upon receipt.¶
+A STOP_SENDING frame requests that the receiving endpoint send a RESET_STREAM +frame. An endpoint that receives a STOP_SENDING frame MUST send a RESET_STREAM +frame if the stream is in the "Ready" or "Send" state. If the stream is in the +"Data Sent" state, the endpoint MAY defer sending the RESET_STREAM frame until +the packets containing outstanding data are acknowledged or declared lost. If +any outstanding data is declared lost, the endpoint SHOULD send a RESET_STREAM +frame instead of retransmitting the data.¶
+An endpoint SHOULD copy the error code from the STOP_SENDING frame to the +RESET_STREAM frame it sends, but can use any application error code. An +endpoint that sends a STOP_SENDING frame MAY ignore the error code in +any RESET_STREAM frames subsequently received for that stream.¶
+STOP_SENDING SHOULD only be sent for a stream that has not been reset by the +peer. STOP_SENDING is most useful for streams in the "Recv" or "Size Known" +states.¶
+An endpoint is expected to send another STOP_SENDING frame if a packet +containing a previous STOP_SENDING is lost. However, once either all stream +data or a RESET_STREAM frame has been received for the stream - that is, the +stream is in any state other than "Recv" or "Size Known" - sending a +STOP_SENDING frame is unnecessary.¶
+An endpoint that wishes to terminate both directions of a bidirectional stream +can terminate one direction by sending a RESET_STREAM frame, and it can +encourage prompt termination in the opposite direction by sending a STOP_SENDING +frame.¶
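The response to a STOP_SENDING frame described above can be sketched as follows. The `SendStream` class and its attributes are hypothetical stand-ins for illustration, not a defined API:

```python
class SendStream:
    """Minimal stand-in: tracks the sending-part state and any reset sent."""
    def __init__(self, state="Send"):
        self.state = state
        self.sent_reset = None      # error code carried in a sent RESET_STREAM
        self.reset_pending = None   # deferred reset (see "Data Sent" case)

    def send_reset_stream(self, error_code):
        self.sent_reset = error_code

def on_stop_sending(stream: SendStream, error_code: int):
    """A RESET_STREAM MUST follow in "Ready" or "Send"; in "Data Sent" the
    endpoint MAY defer until outstanding data is acknowledged or lost."""
    if stream.state in ("Ready", "Send"):
        stream.send_reset_stream(error_code)   # SHOULD copy the error code
        stream.state = "Reset Sent"
    elif stream.state == "Data Sent":
        stream.reset_pending = error_code
    # In "Reset Sent" or a terminal state, no further action is needed.
```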
+Receivers need to limit the amount of data that they are required to buffer, in +order to prevent a fast sender from overwhelming them or a malicious sender from +consuming a large amount of memory. To enable a receiver to limit memory +commitments for a connection, streams are flow controlled both individually and +across a connection as a whole. A QUIC receiver controls the maximum amount of +data the sender can send on a stream as well as across all streams at any time, +as described in Section 4.1 and Section 4.2.¶
+Similarly, to limit concurrency within a connection, a QUIC endpoint controls +the maximum cumulative number of streams that its peer can initiate, as +described in Section 4.6.¶
+Data sent in CRYPTO frames is not flow controlled in the same way as stream +data. QUIC relies on the cryptographic protocol implementation to avoid +excessive buffering of data; see [QUIC-TLS]. To avoid excessive buffering at +multiple layers, QUIC implementations SHOULD provide an interface for the +cryptographic protocol implementation to communicate its buffering limits.¶
+QUIC employs a limit-based flow-control scheme where a receiver advertises the +limit of total bytes it is prepared to receive on a given stream or for the +entire connection. This leads to two levels of data flow control in QUIC:¶
+Senders MUST NOT send data in excess of either limit.¶
+A receiver sets initial limits for all streams through transport parameters +during the handshake (Section 7.4). Subsequently, a receiver sends +MAX_STREAM_DATA (Section 19.10) or MAX_DATA (Section 19.9) +frames to the sender to advertise larger limits.¶
+A receiver can advertise a larger limit for a stream by sending a +MAX_STREAM_DATA frame with the corresponding stream ID. A MAX_STREAM_DATA frame +indicates the maximum absolute byte offset of a stream. A receiver could +determine the flow control offset to be advertised based on the current offset +of data consumed on that stream.¶
+A receiver can advertise a larger limit for a connection by sending a MAX_DATA +frame, which indicates the maximum of the sum of the absolute byte offsets of +all streams. A receiver maintains a cumulative sum of bytes received on all +streams, which is used to check for violations of the advertised connection or +stream data limits. A receiver could determine the maximum data limit to be +advertised based on the sum of bytes consumed on all streams.¶
+Once a receiver advertises a limit for the connection or a stream, it is not an +error to advertise a smaller limit, but the smaller limit has no effect.¶
+A receiver MUST close the connection with a FLOW_CONTROL_ERROR error +(Section 11) if the sender violates the advertised connection or stream +data limits.¶
+A sender MUST ignore any MAX_STREAM_DATA or MAX_DATA frames that do not increase +flow control limits.¶
+If a sender has sent data up to the limit, it will be unable to send new data +and is considered blocked. A sender SHOULD send a STREAM_DATA_BLOCKED or +DATA_BLOCKED frame to indicate to the receiver that it has data to write but is +blocked by flow control limits. If a sender is blocked for a period longer than +the idle timeout (Section 10.1), the receiver might close the connection +even when the sender has data that is available for transmission. To keep the +connection from closing, a sender that is flow control limited SHOULD +periodically send a STREAM_DATA_BLOCKED or DATA_BLOCKED frame when it has no +ack-eliciting packets in flight.¶
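The sender-side rules above (respect both limits, ignore non-increasing updates, detect when blocked) can be sketched for a single stream as follows; the class and method names are illustrative:

```python
class FlowControlledSender:
    """Track the two levels of flow control for one stream: the per-stream
    limit (MAX_STREAM_DATA) and the connection limit (MAX_DATA)."""
    def __init__(self, initial_stream_limit: int, initial_conn_limit: int):
        self.stream_limit = initial_stream_limit  # max absolute stream offset
        self.conn_limit = initial_conn_limit      # max sum across all streams
        self.stream_sent = 0
        self.conn_sent = 0

    def on_max_stream_data(self, limit: int):
        self.stream_limit = max(self.stream_limit, limit)  # ignore decreases

    def on_max_data(self, limit: int):
        self.conn_limit = max(self.conn_limit, limit)

    def sendable(self, want: int) -> int:
        """Bytes that may be sent now; 0 means blocked, i.e. time to send
        STREAM_DATA_BLOCKED or DATA_BLOCKED."""
        allowed = min(self.stream_limit - self.stream_sent,
                      self.conn_limit - self.conn_sent, want)
        return max(allowed, 0)

    def record_sent(self, n: int):
        self.stream_sent += n
        self.conn_sent += n
```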
+Implementations decide when and how much credit to advertise in MAX_STREAM_DATA +and MAX_DATA frames, but this section offers a few considerations.¶
+To avoid blocking a sender, a receiver MAY send a MAX_STREAM_DATA or MAX_DATA +frame multiple times within a round trip or send it early enough to allow time +for loss of the frame and subsequent recovery.¶
+Control frames contribute to connection overhead. Therefore, frequently sending +MAX_STREAM_DATA and MAX_DATA frames with small changes is undesirable. On the +other hand, if updates are less frequent, larger increments to limits are +necessary to avoid blocking a sender, requiring larger resource commitments at +the receiver. There is a trade-off between resource commitment and overhead +when determining how large a limit is advertised.¶
+A receiver can use an autotuning mechanism to tune the frequency and amount of +advertised additional credit based on a round-trip time estimate and the rate at +which the receiving application consumes data, similar to common TCP +implementations. As an optimization, an endpoint could send frames related to +flow control only when there are other frames to send, ensuring that flow +control does not cause extra packets to be sent.¶
+A blocked sender is not required to send STREAM_DATA_BLOCKED or DATA_BLOCKED +frames. Therefore, a receiver MUST NOT wait for a STREAM_DATA_BLOCKED or +DATA_BLOCKED frame before sending a MAX_STREAM_DATA or MAX_DATA frame; doing so +could result in the sender being blocked for the rest of the connection. Even if +the sender sends these frames, waiting for them will result in the sender being +blocked for at least an entire round trip.¶
+When a sender receives credit after being blocked, it might be able to send a +large amount of data in response, resulting in short-term congestion; see +Section 7.7 in [QUIC-RECOVERY] for a discussion of how a sender can avoid this +congestion.¶
+If an endpoint cannot ensure that its peer always has available flow control +credit that is greater than the peer's bandwidth-delay product on this +connection, its receive throughput will be limited by flow control.¶
+Packet loss can cause gaps in the receive buffer, preventing the application +from consuming data and freeing up receive buffer space.¶
+Sending timely updates of flow control limits can improve performance. +Sending packets only to provide flow control updates can increase network +load and adversely affect performance. Sending flow control updates along with +other frames, such as ACK frames, reduces the cost of those updates.¶
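The autotuning idea mentioned above can be sketched as sizing the advertised credit from a bandwidth-delay estimate. The scaling factor and rate estimate here are purely illustrative; this document does not prescribe any particular algorithm:

```python
def next_max_stream_data(consumed: int, rtt: float, consume_rate: float,
                         scale: float = 2.0) -> int:
    """Advertise credit beyond the consumed offset at a multiple of the
    bandwidth-delay product, so the sender is unlikely to block within a
    round trip.  `consume_rate` is bytes/second consumed by the application;
    `scale` is an illustrative safety factor."""
    window = int(scale * consume_rate * rtt)
    return consumed + window
```

For example, with a 100 ms round-trip time and an application consuming 100 kB/s, the receiver would advertise roughly 20 kB of credit beyond the consumed offset.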
+Endpoints need to eventually agree on the amount of flow control credit that has +been consumed on every stream, to be able to account for all bytes for +connection-level flow control.¶
+On receipt of a RESET_STREAM frame, an endpoint will tear down state for the +matching stream and ignore further data arriving on that stream.¶
+RESET_STREAM terminates one direction of a stream abruptly. For a bidirectional +stream, RESET_STREAM has no effect on data flow in the opposite direction. Both +endpoints MUST maintain flow control state for the stream in the unterminated +direction until that direction enters a terminal state.¶
+The final size is the amount of flow control credit that is consumed by a +stream. Assuming that every contiguous byte on the stream was sent once, the +final size is the number of bytes sent. More generally, this is one higher +than the offset of the byte with the largest offset sent on the stream, or zero +if no bytes were sent.¶
+A sender always communicates the final size of a stream to the receiver +reliably, no matter how the stream is terminated. The final size is the sum of +the Offset and Length fields of a STREAM frame with a FIN flag, noting that +these fields might be implicit. Alternatively, the Final Size field of a +RESET_STREAM frame carries this value. This guarantees that both endpoints agree +on how much flow control credit was consumed by the sender on that stream.¶
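The two ways a final size is signaled reduce to two one-line computations; the function names below are illustrative stand-ins, not a real frame-parsing API.

```python
def final_size_from_stream_fin(offset, length):
    """Final size signaled by a STREAM frame carrying a FIN flag:
    the sum of its (possibly implicit) Offset and Length fields."""
    return offset + length


def final_size_from_reset(final_size_field):
    """A RESET_STREAM frame carries the final size explicitly in
    its Final Size field."""
    return final_size_field
```

Either way, both endpoints arrive at the same count of flow control credit consumed on the stream.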
+An endpoint will know the final size for a stream when the receiving part of the +stream enters the "Size Known" or "Reset Recvd" state (Section 3). The +receiver MUST use the final size of the stream to account for all bytes sent on +the stream in its connection level flow controller.¶
+An endpoint MUST NOT send data on a stream at or beyond the final size.¶
+Once a final size for a stream is known, it cannot change. If a RESET_STREAM or +STREAM frame is received indicating a change in the final size for the stream, +an endpoint SHOULD respond with a FINAL_SIZE_ERROR error; see +Section 11. A receiver SHOULD treat receipt of data at or beyond the +final size as a FINAL_SIZE_ERROR error, even after a stream is closed. +Generating these errors is not mandatory, because requiring that an +endpoint generate these errors also means that the endpoint needs to maintain +the final size state for closed streams, which could mean a significant state +commitment.¶
+An endpoint limits the cumulative number of incoming streams a peer can open. Only streams with a stream ID less than (max_streams * 4 + initial_stream_id_for_type) can be opened; see Table 1. Initial limits are set in the transport parameters; see Section 18.2. Subsequent limits are advertised using MAX_STREAMS frames; see Section 19.11. Separate limits apply to unidirectional and bidirectional streams.¶
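The stream-limit formula can be sketched directly, using the stream types from Table 1 (0x00 client bidirectional, 0x01 server bidirectional, 0x02 client unidirectional, 0x03 server unidirectional, encoded in the two low-order bits of the stream ID). The function names are illustrative; the error mapping to TRANSPORT_PARAMETER_ERROR or FRAME_ENCODING_ERROR depends on where the offending value arrived.

```python
def may_open_stream(stream_id, max_streams):
    """A stream can be opened only if its ID is below
    max_streams * 4 + initial_stream_id_for_type."""
    stream_type = stream_id & 0x3  # low two bits encode the type
    return stream_id < max_streams * 4 + stream_type


def validate_max_streams(value):
    """Values above 2^60 would permit stream IDs that cannot be
    encoded as variable-length integers; the connection must be
    closed (TRANSPORT_PARAMETER_ERROR or FRAME_ENCODING_ERROR,
    depending on where the value was received)."""
    if value > 2 ** 60:
        raise ValueError("stream limit exceeds 2^60")
```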
+If a max_streams transport parameter or a MAX_STREAMS frame is received with a +value greater than 2^60, this would allow a maximum stream ID that cannot be +expressed as a variable-length integer; see Section 16. If either is +received, the connection MUST be closed immediately with a connection error of +type TRANSPORT_PARAMETER_ERROR if the offending value was received in a +transport parameter or of type FRAME_ENCODING_ERROR if it was received in a +frame; see Section 10.2.¶
+Endpoints MUST NOT exceed the limit set by their peer. An endpoint that +receives a frame with a stream ID exceeding the limit it has sent MUST treat +this as a connection error of type STREAM_LIMIT_ERROR (Section 11).¶
+Once a receiver advertises a stream limit using the MAX_STREAMS frame, +advertising a smaller limit has no effect. A receiver MUST ignore any +MAX_STREAMS frame that does not increase the stream limit.¶
+As with stream and connection flow control, this document leaves it to implementations to decide when and how many streams to advertise to a peer via MAX_STREAMS. Implementations might choose to increase limits as streams are closed, to keep the number of streams available to peers roughly consistent.¶
+An endpoint that is unable to open a new stream due to the peer's limits SHOULD +send a STREAMS_BLOCKED frame (Section 19.14). This signal is +considered useful for debugging. An endpoint MUST NOT wait to receive this +signal before advertising additional credit, since doing so will mean that the +peer will be blocked for at least an entire round trip, and potentially +indefinitely if the peer chooses not to send STREAMS_BLOCKED frames.¶
+A QUIC connection is shared state between a client and a server.¶
+Each connection starts with a handshake phase, during which the two endpoints +establish a shared secret using the cryptographic handshake protocol +[QUIC-TLS] and negotiate the application protocol. The handshake +(Section 7) confirms that both endpoints are willing to communicate +(Section 8.1) and establishes parameters for the connection +(Section 7.4).¶
+An application protocol can use the connection during the handshake phase with +some limitations. 0-RTT allows application data to be sent by a client before +receiving a response from the server. However, 0-RTT provides no protection +against replay attacks; see Section 9.2 of [QUIC-TLS]. A server can also send +application data to a client before it receives the final cryptographic +handshake messages that allow it to confirm the identity and liveness of the +client. These capabilities allow an application protocol to offer the option of +trading some security guarantees for reduced latency.¶
+The use of connection IDs (Section 5.1) allows connections to migrate to a +new network path, both as a direct choice of an endpoint and when forced by a +change in a middlebox. Section 9 describes mitigations for the security and +privacy issues associated with migration.¶
+For connections that are no longer needed or desired, there are several ways for +a client and server to terminate a connection, as described in Section 10.¶
+Each connection possesses a set of connection identifiers, or connection IDs, +each of which can identify the connection. Connection IDs are independently +selected by endpoints; each endpoint selects the connection IDs that its peer +uses.¶
+The primary function of a connection ID is to ensure that changes in addressing +at lower protocol layers (UDP, IP) do not cause packets for a QUIC +connection to be delivered to the wrong endpoint. Each endpoint selects +connection IDs using an implementation-specific (and perhaps +deployment-specific) method that will allow packets with that connection ID to +be routed back to the endpoint and to be identified by the endpoint upon +receipt.¶
+Multiple connection IDs are used so that endpoints can send packets that cannot +be identified by an observer as being for the same connection without +cooperation from an endpoint; see Section 9.5.¶
+Connection IDs MUST NOT contain any information that can be used by an external +observer (that is, one that does not cooperate with the issuer) to correlate +them with other connection IDs for the same connection. As a trivial example, +this means the same connection ID MUST NOT be issued more than once on the same +connection.¶
+Packets with long headers include Source Connection ID and Destination +Connection ID fields. These fields are used to set the connection IDs for new +connections; see Section 7.2 for details.¶
+Packets with short headers (Section 17.3) only include the Destination +Connection ID and omit the explicit length. The length of the Destination +Connection ID field is expected to be known to endpoints. Endpoints using a +load balancer that routes based on connection ID could agree with the load +balancer on a fixed length for connection IDs, or agree on an encoding scheme. +A fixed portion could encode an explicit length, which allows the entire +connection ID to vary in length and still be used by the load balancer.¶
+A Version Negotiation (Section 17.2.1) packet echoes the connection IDs +selected by the client, both to ensure correct routing toward the client and to +demonstrate that the packet is in response to a packet sent by the client.¶
+A zero-length connection ID can be used when a connection ID is not needed to +route to the correct endpoint. However, multiplexing connections on the same +local IP address and port while using zero-length connection IDs will cause +failures in the presence of peer connection migration, NAT rebinding, and client +port reuse. An endpoint MUST NOT use the same IP address and port for multiple +concurrent connections with zero-length connection IDs, unless it is certain +that those protocol features are not in use.¶
+When an endpoint uses a non-zero-length connection ID, it needs to ensure that +the peer has a supply of connection IDs from which to choose for packets sent to +the endpoint. These connection IDs are supplied by the endpoint using the +NEW_CONNECTION_ID frame (Section 19.15).¶
+Each Connection ID has an associated sequence number to assist in detecting when +NEW_CONNECTION_ID or RETIRE_CONNECTION_ID frames refer to the same value. The +initial connection ID issued by an endpoint is sent in the Source Connection ID +field of the long packet header (Section 17.2) during the handshake. The +sequence number of the initial connection ID is 0. If the preferred_address +transport parameter is sent, the sequence number of the supplied connection ID +is 1.¶
+Additional connection IDs are communicated to the peer using NEW_CONNECTION_ID +frames (Section 19.15). The sequence number on each newly issued +connection ID MUST increase by 1. The connection ID that a client selects for +the first Destination Connection ID field it sends and any connection ID +provided by a Retry packet are not assigned sequence numbers.¶
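The sequence-number rules above can be modeled with a small issuer object. This is a hypothetical sketch, not a real QUIC API: sequence 0 is the handshake connection ID, sequence 1 is reserved for the preferred_address connection ID if one is sent, and each subsequent NEW_CONNECTION_ID increments the sequence by one.

```python
class ConnectionIdIssuer:
    """Illustrative tracker for locally issued connection IDs."""

    def __init__(self, handshake_cid, preferred_address_cid=None):
        # Sequence 0: the CID sent in the long header's Source
        # Connection ID field during the handshake.
        self.issued = {0: handshake_cid}
        self.next_seq = 1
        # Sequence 1: the CID supplied via preferred_address, if any.
        if preferred_address_cid is not None:
            self.issued[1] = preferred_address_cid
            self.next_seq = 2

    def issue(self, cid):
        """Each NEW_CONNECTION_ID frame increases the sequence by 1."""
        seq = self.next_seq
        self.issued[seq] = cid
        self.next_seq += 1
        return seq
```

The client's first Destination Connection ID and any Retry-supplied connection ID would sit outside this table, as they carry no sequence number.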
+When an endpoint issues a connection ID, it MUST accept packets that carry this +connection ID for the duration of the connection or until its peer invalidates +the connection ID via a RETIRE_CONNECTION_ID frame +(Section 19.16). Connection IDs that are issued and not +retired are considered active; any active connection ID is valid for use with +the current connection at any time, in any packet type. This includes the +connection ID issued by the server via the preferred_address transport +parameter.¶
+An endpoint SHOULD ensure that its peer has a sufficient number of available and +unused connection IDs. Endpoints advertise the number of active connection IDs +they are willing to maintain using the active_connection_id_limit transport +parameter. An endpoint MUST NOT provide more connection IDs than the peer's +limit. An endpoint MAY send connection IDs that temporarily exceed a peer's +limit if the NEW_CONNECTION_ID frame also requires the retirement of any excess, +by including a sufficiently large value in the Retire Prior To field.¶
+A NEW_CONNECTION_ID frame might cause an endpoint to add some active connection +IDs and retire others based on the value of the Retire Prior To field. After +processing a NEW_CONNECTION_ID frame and adding and retiring active connection +IDs, if the number of active connection IDs exceeds the value advertised in its +active_connection_id_limit transport parameter, an endpoint MUST close the +connection with an error of type CONNECTION_ID_LIMIT_ERROR.¶
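A sketch of the receiving side of this rule, with illustrative names only: retire everything below Retire Prior To, add the new connection ID, then enforce the advertised active_connection_id_limit.

```python
def process_new_connection_id(active, seq, cid, retire_prior_to, limit):
    """active maps sequence number -> connection ID for this peer.
    Returns the sequence numbers that must now be retired with
    RETIRE_CONNECTION_ID frames; raises on a limit violation."""
    # Retire every active CID with a sequence number below
    # Retire Prior To before counting against the limit.
    to_retire = [s for s in active if s < retire_prior_to]
    for s in to_retire:
        del active[s]
    if seq >= retire_prior_to:
        active[seq] = cid
    if len(active) > limit:
        # Maps to a CONNECTION_ID_LIMIT_ERROR connection error.
        raise ConnectionError("CONNECTION_ID_LIMIT_ERROR")
    return to_retire
```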
+An endpoint SHOULD supply a new connection ID when the peer retires a connection +ID. If an endpoint provided fewer connection IDs than the peer's +active_connection_id_limit, it MAY supply a new connection ID when it receives a +packet with a previously unused connection ID. An endpoint MAY limit the +total number of connection IDs issued for each connection to +avoid the risk of running out of connection IDs; see Section 10.3.2. An +endpoint MAY also limit the issuance of connection IDs to reduce the amount of +per-path state it maintains, such as path validation status, as its peer +might interact with it over as many paths as there are issued connection +IDs.¶
+An endpoint that initiates migration and requires non-zero-length connection IDs +SHOULD ensure that the pool of connection IDs available to its peer allows the +peer to use a new connection ID on migration, as the peer will be unable to +respond if the pool is exhausted.¶
+An endpoint that selects a zero-length connection ID during the handshake +cannot issue a new connection ID. A zero-length Destination Connection ID +field is used in all packets sent toward such an endpoint over any network +path.¶
+An endpoint can change the connection ID it uses for a peer to another available +one at any time during the connection. An endpoint consumes connection IDs in +response to a migrating peer; see Section 9.5 for more.¶
+An endpoint maintains a set of connection IDs received from its peer, any of +which it can use when sending packets. When the endpoint wishes to remove a +connection ID from use, it sends a RETIRE_CONNECTION_ID frame to its peer. +Sending a RETIRE_CONNECTION_ID frame indicates that the connection ID will not +be used again and requests that the peer replace it with a new connection ID +using a NEW_CONNECTION_ID frame.¶
+As discussed in Section 9.5, endpoints limit the use of a +connection ID to packets sent from a single local address to a single +destination address. Endpoints SHOULD retire connection IDs when they are no +longer actively using either the local or destination address for which the +connection ID was used.¶
+An endpoint might need to stop accepting previously issued connection IDs in +certain circumstances. Such an endpoint can cause its peer to retire connection +IDs by sending a NEW_CONNECTION_ID frame with an increased Retire Prior To +field. The endpoint SHOULD continue to accept the previously issued connection +IDs until they are retired by the peer. If the endpoint can no longer process +the indicated connection IDs, it MAY close the connection.¶
+Upon receipt of an increased Retire Prior To field, the peer MUST stop using +the corresponding connection IDs and retire them with RETIRE_CONNECTION_ID +frames before adding the newly provided connection ID to the set of active +connection IDs. This ordering allows an endpoint to replace all active +connection IDs without the possibility of a peer having no available connection +IDs and without exceeding the limit the peer sets in the +active_connection_id_limit transport parameter; see +Section 18.2. Failure to cease using the connection IDs +when requested can result in connection failures, as the issuing endpoint might +be unable to continue using the connection IDs with the active connection.¶
+An endpoint SHOULD limit the number of connection IDs it has retired locally for which RETIRE_CONNECTION_ID frames have not yet been acknowledged. An endpoint SHOULD allow for sending and tracking a number of RETIRE_CONNECTION_ID frames of at least twice the active_connection_id_limit. An endpoint MUST NOT forget a connection ID without retiring it, though it MAY choose to treat having connection IDs in need of retirement that exceed this limit as a connection error of type CONNECTION_ID_LIMIT_ERROR.¶
+Endpoints SHOULD NOT issue updates of the Retire Prior To field before receiving +RETIRE_CONNECTION_ID frames that retire all connection IDs indicated by the +previous Retire Prior To value.¶
+Incoming packets are classified on receipt. Packets can either be associated +with an existing connection, or - for servers - potentially create a new +connection.¶
+Endpoints try to associate a packet with an existing connection. If the packet +has a non-zero-length Destination Connection ID corresponding to an existing +connection, QUIC processes that packet accordingly. Note that more than one +connection ID can be associated with a connection; see Section 5.1.¶
+If the Destination Connection ID is zero length and the addressing information +in the packet matches the addressing information the endpoint uses to identify a +connection with a zero-length connection ID, QUIC processes the packet as part +of that connection. An endpoint can use just destination IP and port or both +source and destination addresses for identification, though this makes +connections fragile as described in Section 5.1.¶
+Endpoints can send a Stateless Reset (Section 10.3) for any packets that +cannot be attributed to an existing connection. A stateless reset allows a peer +to more quickly identify when a connection becomes unusable.¶
+Packets that are matched to an existing connection are discarded if the packets +are inconsistent with the state of that connection. For example, packets are +discarded if they indicate a different protocol version than that of the +connection, or if the removal of packet protection is unsuccessful once the +expected keys are available.¶
+Invalid packets that lack strong integrity protection, such as Initial, Retry, +or Version Negotiation, MAY be discarded. An endpoint MUST generate a +connection error if processing the contents of these packets prior to +discovering an error, or fully revert any changes made during that processing.¶
+Valid packets sent to clients always include a Destination Connection ID that +matches a value the client selects. Clients that choose to receive zero-length +connection IDs can use the local address and port to identify a connection. +Packets that do not match an existing connection, based on Destination +Connection ID or, if this value is zero-length, local IP address and port, are +discarded.¶
+Due to packet reordering or loss, a client might receive packets for a +connection that are encrypted with a key it has not yet computed. The client MAY +drop these packets, or MAY buffer them in anticipation of later packets that +allow it to compute the key.¶
+If a client receives a packet that uses a different version than it initially +selected, it MUST discard that packet.¶
+If a server receives a packet that indicates an unsupported version and if the +packet is large enough to initiate a new connection for any supported version, +the server SHOULD send a Version Negotiation packet as described in Section 6.1. +A server MAY limit the number of packets to which it responds with a Version +Negotiation packet. Servers MUST drop smaller packets that specify unsupported +versions.¶
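The server's decision for an unsupported-version datagram is a simple size test. A minimal sketch, assuming this version's 1200-byte minimum Initial datagram size from Section 14.1; the function and return values are illustrative.

```python
def respond_to_unsupported_version(datagram_len, min_initial_size=1200):
    """Only datagrams large enough to initiate a new connection for
    a supported version elicit a Version Negotiation packet; smaller
    datagrams carrying unsupported versions are dropped."""
    if datagram_len >= min_initial_size:
        return "version_negotiation"
    return "drop"
```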
+The first packet for an unsupported version can use different semantics and +encodings for any version-specific field. In particular, different packet +protection keys might be used for different versions. Servers that do not +support a particular version are unlikely to be able to decrypt the payload of +the packet or properly interpret the result. Servers SHOULD respond with a +Version Negotiation packet, provided that the datagram is sufficiently long.¶
+Packets with a supported version, or no version field, are matched to a +connection using the connection ID or - for packets with zero-length connection +IDs - the local address and port. These packets are processed using the +selected connection; otherwise, the server continues below.¶
+If the packet is an Initial packet fully conforming with the specification, the +server proceeds with the handshake (Section 7). This commits the server to +the version that the client selected.¶
+If a server refuses to accept a new connection, it SHOULD send an Initial packet +containing a CONNECTION_CLOSE frame with error code CONNECTION_REFUSED.¶
+If the packet is a 0-RTT packet, the server MAY buffer a limited number of these +packets in anticipation of a late-arriving Initial packet. Clients are not able +to send Handshake packets prior to receiving a server response, so servers +SHOULD ignore any such packets.¶
+Servers MUST drop incoming packets under all other circumstances.¶
+A server deployment could load balance among servers using only source and +destination IP addresses and ports. Changes to the client's IP address or port +could result in packets being forwarded to the wrong server. Such a server +deployment could use one of the following methods for connection continuity +when a client's address changes.¶
+A server in a deployment that does not implement a solution to maintain +connection continuity when the client address changes SHOULD indicate migration +is not supported using the disable_active_migration transport parameter. The +disable_active_migration transport parameter does not prohibit connection +migration after a client has acted on a preferred_address transport parameter.¶
+Server deployments that use this simple form of load balancing MUST avoid the +creation of a stateless reset oracle; see Section 21.11.¶
+This document does not define an API for QUIC, but instead defines a set of +functions for QUIC connections that application protocols can rely upon. An +application protocol can assume that an implementation of QUIC provides an +interface that includes the operations described in this section. An +implementation designed for use with a specific application protocol might +provide only those operations that are used by that protocol.¶
+When implementing the client role, an application protocol can:¶
+When implementing the server role, an application protocol can:¶
+In either role, an application protocol can:¶
+Version negotiation allows a server to indicate that it does not support +the version the client used. A server sends a Version Negotiation packet in +response to each packet that might initiate a new connection; see +Section 5.2 for details.¶
+The size of the first packet sent by a client will determine whether a server +sends a Version Negotiation packet. Clients that support multiple QUIC versions +SHOULD ensure that the first UDP datagram they send is sized to the largest of +the minimum datagram sizes from all versions they support, using PADDING frames +(Section 19.1) as necessary. This ensures that the server responds if there +is a mutually supported version. A server might not send a Version Negotiation +packet if the datagram it receives is smaller than the minimum size specified in +a different version; see Section 14.1.¶
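The sizing rule above amounts to taking a maximum and padding up to it. A hypothetical sketch (the per-version size table is an assumption; only this version's 1200-byte minimum is defined here):

```python
def first_datagram_size(min_sizes_by_version):
    """Pad the first datagram to the largest minimum datagram size
    across all supported versions, so a server speaking any of them
    will respond."""
    return max(min_sizes_by_version.values())


def padding_needed(payload_len, target):
    """Bytes of PADDING frames to append; each PADDING frame is a
    single zero byte, so this is also the frame count."""
    return max(0, target - payload_len)
```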
+If the version selected by the client is not acceptable to the server, the +server responds with a Version Negotiation packet; see Section 17.2.1. This +includes a list of versions that the server will accept. An endpoint MUST NOT +send a Version Negotiation packet in response to receiving a Version Negotiation +packet.¶
+This system allows a server to process packets with unsupported versions without +retaining state. Though either the Initial packet or the Version Negotiation +packet that is sent in response could be lost, the client will send new packets +until it successfully receives a response or it abandons the connection attempt.¶
+A server MAY limit the number of Version Negotiation packets it sends. For +instance, a server that is able to recognize packets as 0-RTT might choose not +to send Version Negotiation packets in response to 0-RTT packets with the +expectation that it will eventually receive an Initial packet.¶
+Version Negotiation packets are designed to allow for functionality to be +defined in the future that allows QUIC to negotiate the version of QUIC to use +for a connection. Future standards-track specifications might change how +implementations that support multiple versions of QUIC react to Version +Negotiation packets received in response to an attempt to establish a +connection using this version.¶
+A client that supports only this version of QUIC MUST abandon the current +connection attempt if it receives a Version Negotiation packet, with the +following two exceptions. A client MUST discard any Version Negotiation packet +if it has received and successfully processed any other packet, including an +earlier Version Negotiation packet. A client MUST discard a Version Negotiation +packet that lists the QUIC version selected by the client.¶
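These two exceptions can be captured in a small decision function; a sketch for a client supporting only this version, with illustrative names and string results.

```python
def handle_version_negotiation(listed_versions, our_version,
                               processed_other_packet):
    """Client-side handling of a received Version Negotiation packet."""
    if processed_other_packet:
        # Any other packet (including an earlier VN packet) was
        # already successfully processed: discard this one.
        return "discard"
    if our_version in listed_versions:
        # A VN packet listing our own version is spurious: discard.
        return "discard"
    # Otherwise abandon the current connection attempt.
    return "abandon"
```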
+How to perform version negotiation is left as future work defined by future +standards-track specifications. In particular, that future work will +ensure robustness against version downgrade attacks; see +Section 21.12.¶
+[[RFC editor: please remove this section before publication.]]¶
+When a draft implementation receives a Version Negotiation packet, it MAY use +it to attempt a new connection with one of the versions listed in the packet, +instead of abandoning the current connection attempt; see Section 6.2.¶
+The client MUST check that the Destination and Source Connection ID fields +match the Source and Destination Connection ID fields in a packet that the +client sent. If this check fails, the packet MUST be discarded.¶
+Once the Version Negotiation packet is determined to be valid, the client then +selects an acceptable protocol version from the list provided by the server. +The client then attempts to create a new connection using that version. The new +connection MUST use a new random Destination Connection ID different from the +one it had previously sent.¶
+Note that this mechanism does not protect against downgrade attacks and +MUST NOT be used outside of draft implementations.¶
+For a server to use a new version in the future, clients need to correctly +handle unsupported versions. Some version numbers (0x?a?a?a?a as defined in +Section 15) are reserved for inclusion in fields that contain version +numbers.¶
+Endpoints MAY add reserved versions to any field where unknown or unsupported +versions are ignored to test that a peer correctly ignores the value. For +instance, an endpoint could include a reserved version in a Version Negotiation +packet; see Section 17.2.1. Endpoints MAY send packets with a reserved +version to test that a peer correctly discards the packet.¶
+QUIC relies on a combined cryptographic and transport handshake to minimize +connection establishment latency. QUIC uses the CRYPTO frame (Section 19.6) +to transmit the cryptographic handshake. The version of QUIC defined in this +document is identified as 0x00000001 and uses TLS as described in [QUIC-TLS]; +a different QUIC version could indicate that a different cryptographic +handshake protocol is in use.¶
+QUIC provides reliable, ordered delivery of the cryptographic handshake +data. QUIC packet protection is used to encrypt as much of the handshake +protocol as possible. The cryptographic handshake MUST provide the following +properties:¶
+authenticated key exchange, where¶
+ +The CRYPTO frame can be sent in different packet number spaces +(Section 12.3). The offsets used by CRYPTO frames to ensure ordered +delivery of cryptographic handshake data start from zero in each packet number +space.¶
+Figure 4 shows a simplified handshake and the exchange of packets and frames +that are used to advance the handshake. Exchange of application data during the +handshake is enabled where possible, shown with a '*'. Once the handshake is +complete, endpoints are able to exchange application data freely.¶
+Endpoints can use packets sent during the handshake to test for Explicit +Congestion Notification (ECN) support; see Section 13.4. An endpoint validates +support for ECN by observing whether the ACK frames acknowledging the first +packets it sends carry ECN counts, as described in Section 13.4.2.¶
+Endpoints MUST explicitly negotiate an application protocol. This avoids +situations where there is a disagreement about the protocol that is in use.¶
+Details of how TLS is integrated with QUIC are provided in [QUIC-TLS], but +some examples are provided here. An extension of this exchange to support +client address validation is shown in Section 8.1.2.¶
+Once any address validation exchanges are complete, the +cryptographic handshake is used to agree on cryptographic keys. The +cryptographic handshake is carried in Initial (Section 17.2.2) and Handshake +(Section 17.2.4) packets.¶
+Figure 5 provides an overview of the 1-RTT handshake. Each line shows a QUIC packet with the packet type and packet number shown first, followed by the frames that are typically contained in those packets. So, for instance, the first packet is of type Initial, with packet number 0, and contains a CRYPTO frame carrying the ClientHello.¶
+Multiple QUIC packets -- even of different packet types -- can be coalesced into +a single UDP datagram; see Section 12.2. As a result, this handshake +could consist of as few as 4 UDP datagrams, or any number more (subject to +limits inherent to the protocol, such as congestion control and +anti-amplification). For instance, the server's first flight contains Initial +packets, Handshake packets, and "0.5-RTT data" in 1-RTT packets.¶
+Figure 6 shows an example of a connection with a 0-RTT handshake +and a single packet of 0-RTT data. Note that as described in +Section 12.3, the server acknowledges 0-RTT data in 1-RTT packets, and +the client sends 1-RTT packets in the same packet number space.¶
+A connection ID is used to ensure consistent routing of packets, as described in +Section 5.1. The long header contains two connection IDs: the Destination +Connection ID is chosen by the recipient of the packet and is used to provide +consistent routing; the Source Connection ID is used to set the Destination +Connection ID used by the peer.¶
+During the handshake, packets with the long header (Section 17.2) are used +to establish the connection IDs used by both endpoints. Each endpoint uses the +Source Connection ID field to specify the connection ID that is used in the +Destination Connection ID field of packets being sent to them. After processing +the first Initial packet, each endpoint sets the Destination Connection ID +field in subsequent packets it sends to the value of the Source Connection ID +field that it received.¶
+When an Initial packet is sent by a client that has not previously received an +Initial or Retry packet from the server, the client populates the Destination +Connection ID field with an unpredictable value. This Destination Connection ID +MUST be at least 8 bytes in length. Until a packet is received from the server, +the client MUST use the same Destination Connection ID value on all packets in +this connection.¶
+The Destination Connection ID field from the first Initial packet sent by a +client is used to determine packet protection keys for Initial packets. These +keys change after receiving a Retry packet; see Section 5.2 of [QUIC-TLS].¶
+The client populates the Source Connection ID field with a value of its choosing +and sets the Source Connection ID Length field to indicate the length.¶
+The first flight of 0-RTT packets uses the same Destination Connection ID and Source Connection ID values as the client's first Initial packet.¶
+Upon first receiving an Initial or Retry packet from the server, the client uses +the Source Connection ID supplied by the server as the Destination Connection ID +for subsequent packets, including any 0-RTT packets. This means that a client +might have to change the connection ID it sets in the Destination Connection ID +field twice during connection establishment: once in response to a Retry, and +once in response to an Initial packet from the server. Once a client has +received a valid Initial packet from the server, it MUST discard any subsequent +packet it receives on that connection with a different Source Connection ID.¶
+A client MUST change the Destination Connection ID it uses for sending packets +in response to only the first received Initial or Retry packet. A server MUST +set the Destination Connection ID it uses for sending packets based on the first +received Initial packet. Any further changes to the Destination Connection ID +are only permitted if the values are taken from NEW_CONNECTION_ID frames; if +subsequent Initial packets include a different Source Connection ID, they MUST +be discarded. This avoids unpredictable outcomes that might otherwise result +from stateless processing of multiple Initial packets with different Source +Connection IDs.¶
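The client's two permitted Destination Connection ID changes can be modeled as a tiny state holder. This is a hypothetical sketch: a Retry may update the value once before it is locked, the first server Initial updates it and locks it, and any later Initial with a different Source Connection ID is discarded.

```python
class ClientDcidState:
    """Illustrative model of client DCID updates during the handshake."""

    def __init__(self, random_dcid):
        self.dcid = random_dcid  # unpredictable, at least 8 bytes
        self.locked = False      # set once a server Initial arrives

    def on_server_packet(self, packet_type, scid):
        """Returns False if the packet must be discarded."""
        if packet_type == "retry" and not self.locked:
            self.dcid = scid     # first change: in response to Retry
            return True
        if packet_type == "initial":
            if not self.locked:
                self.dcid = scid  # second change: first server Initial
                self.locked = True
                return True
            # Later Initials must carry the same Source Connection ID.
            return self.dcid == scid
        return True
```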
+The Destination Connection ID that an endpoint sends can change over the +lifetime of a connection, especially in response to connection migration +(Section 9); see Section 5.1.1 for details.¶
+The choice each endpoint makes about connection IDs during the handshake is +authenticated by including all values in transport parameters; see +Section 7.4. This ensures that all connection IDs used for the +handshake are also authenticated by the cryptographic handshake.¶
+Each endpoint includes the value of the Source Connection ID field from the +first Initial packet it sent in the initial_source_connection_id transport +parameter; see Section 18.2. A server includes the +Destination Connection ID field from the first Initial packet it received from +the client in the original_destination_connection_id transport parameter; if the +server sent a Retry packet, this refers to the first Initial packet received +before sending the Retry packet. If it sends a Retry packet, a server also +includes the Source Connection ID field from the Retry packet in the +retry_source_connection_id transport parameter.¶
+The values provided by a peer for these transport parameters MUST match the
+values that an endpoint used in the Destination and Source Connection ID fields
+of Initial packets that it sent (and received, for servers). Endpoints MUST
+validate that received transport parameters match received Connection ID values.
+Including connection ID values in transport
+parameters and verifying them ensures that an attacker cannot influence
+the choice of connection ID for a successful connection by injecting packets
+carrying attacker-chosen connection IDs during the handshake.¶
+An endpoint MUST treat absence of the initial_source_connection_id transport +parameter from either endpoint or absence of the +original_destination_connection_id transport parameter from the server as a +connection error of type TRANSPORT_PARAMETER_ERROR.¶
+An endpoint MUST treat the following as a connection error of type +TRANSPORT_PARAMETER_ERROR or PROTOCOL_VIOLATION:¶
+If a zero-length connection ID is selected, the corresponding transport +parameter is included with a zero-length value.¶
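The consistency checks above can be illustrated with a small sketch. This is not code from any QUIC implementation; the function name, the dictionary-based parameter representation, and the error-string return are all assumptions made for illustration. It shows a client comparing the server's authenticated transport parameters against the connection IDs it actually observed in Initial and Retry packets.

```python
from typing import Optional

def validate_server_cid_params(params: dict,
                               first_dcid: bytes,
                               retry_scid: Optional[bytes],
                               server_scid: bytes) -> Optional[str]:
    """Return an error code string, or None if the values are consistent.

    first_dcid is the Destination Connection ID the client sent in its
    first Initial packet (before any Retry); retry_scid is the Source
    Connection ID from a Retry packet, or None if no Retry was received;
    server_scid is the Source Connection ID from the server's Initial.
    """
    # Absence of either mandatory parameter is a connection error.
    if "initial_source_connection_id" not in params:
        return "TRANSPORT_PARAMETER_ERROR"
    if "original_destination_connection_id" not in params:
        return "TRANSPORT_PARAMETER_ERROR"

    # Values must match what was actually observed during the handshake.
    if params["original_destination_connection_id"] != first_dcid:
        return "TRANSPORT_PARAMETER_ERROR"
    if params["initial_source_connection_id"] != server_scid:
        return "TRANSPORT_PARAMETER_ERROR"

    # retry_source_connection_id must be present if and only if a Retry
    # was received, and must match the Retry's Source Connection ID.
    if retry_scid is None:
        if "retry_source_connection_id" in params:
            return "TRANSPORT_PARAMETER_ERROR"
    elif params.get("retry_source_connection_id") != retry_scid:
        return "TRANSPORT_PARAMETER_ERROR"
    return None
```

Using the connection IDs from the handshake examples (C1, S1, S2, S3), a handshake with a Retry would pass these checks only when all three server-supplied parameters match the observed packets.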
+Figure 7 shows the connection IDs (with DCID=Destination Connection ID, +SCID=Source Connection ID) that are used in a complete handshake. The exchange +of Initial packets is shown, plus the later exchange of 1-RTT packets that +includes the connection ID established during the handshake.¶
+Figure 8 shows a similar handshake that includes a Retry packet.¶
+In both cases (Figure 7 and Figure 8), the client sets the value of the
+initial_source_connection_id transport parameter to C1.¶
+When the handshake does not include a Retry (Figure 7), the server sets
+original_destination_connection_id to S1 and initial_source_connection_id to S3.
+In this case, the server does not include a retry_source_connection_id
+transport parameter.¶
+When the handshake includes a Retry (Figure 8), the server sets
+original_destination_connection_id to S1, retry_source_connection_id to S2, and
+initial_source_connection_id to S3.¶
During connection establishment, both endpoints make authenticated declarations +of their transport parameters. Endpoints are required to comply with the +restrictions that each parameter defines; the description of each parameter +includes rules for its handling.¶
+Transport parameters are declarations that are made unilaterally by each +endpoint. Each endpoint can choose values for transport parameters independent +of the values chosen by its peer.¶
+The encoding of the transport parameters is detailed in +Section 18.¶
+QUIC includes the encoded transport parameters in the cryptographic handshake. +Once the handshake completes, the transport parameters declared by the peer are +available. Each endpoint validates the values provided by its peer.¶
+Definitions for each of the defined transport parameters are included in +Section 18.2.¶
+An endpoint MUST treat receipt of a transport parameter with an invalid value as +a connection error of type TRANSPORT_PARAMETER_ERROR.¶
+An endpoint MUST NOT send a parameter more than once in a given transport +parameters extension. An endpoint SHOULD treat receipt of duplicate transport +parameters as a connection error of type TRANSPORT_PARAMETER_ERROR.¶
+Endpoints use transport parameters to authenticate the negotiation of +connection IDs during the handshake; see Section 7.3.¶
+Application Layer Protocol Negotiation (ALPN; see [ALPN]) allows +clients to offer multiple application protocols during connection +establishment. The transport parameters that a client includes during the +handshake apply to all application protocols that the client offers. Application +protocols can recommend values for transport parameters, such as the initial +flow control limits. However, application protocols that set constraints on +values for transport parameters could make it impossible for a client to offer +multiple application protocols if these constraints conflict.¶
+Using 0-RTT depends on both client and server using protocol parameters that +were negotiated from a previous connection. To enable 0-RTT, endpoints store +the value of the server transport parameters from a connection and apply them +to any 0-RTT packets that are sent in subsequent connections to that peer that +use a session ticket issued on that connection. This +information is stored with any information required by the application +protocol or cryptographic handshake; see Section 4.6 of [QUIC-TLS].¶
+Remembered transport parameters apply to the new connection until the handshake +completes and the client starts sending 1-RTT packets. Once the handshake +completes, the client uses the transport parameters established in the +handshake. Not all transport parameters are remembered, as some do not apply to +future connections or they have no effect on use of 0-RTT.¶
+The definition of a new transport parameter (Section 7.4.2) MUST +specify whether storing the transport parameter for 0-RTT is mandatory, +optional, or prohibited. A client need not store a transport parameter it cannot +process.¶
+A client MUST NOT use remembered values for the following parameters: +ack_delay_exponent, max_ack_delay, initial_source_connection_id, +original_destination_connection_id, preferred_address, +retry_source_connection_id, and stateless_reset_token. The client MUST use the +server's new values in the handshake instead; if the server does not provide new +values, the default value is used.¶
+A client that attempts to send 0-RTT data MUST remember all other transport +parameters used by the server that it is able to process. The server can +remember these transport parameters, or store an integrity-protected copy of +the values in the ticket and recover the information when accepting 0-RTT data. +A server uses the transport parameters in determining whether to accept 0-RTT +data.¶
+If 0-RTT data is accepted by the server, the server MUST NOT reduce any +limits or alter any values that might be violated by the client with its +0-RTT data. In particular, a server that accepts 0-RTT data MUST NOT set +values for the following parameters (Section 18.2) +that are smaller than the remembered value of the parameters.¶
+Omitting or setting a zero value for certain transport parameters can result in +0-RTT data being enabled, but not usable. The applicable subset of transport +parameters that permit sending of application data SHOULD be set to non-zero +values for 0-RTT. This includes initial_max_data and either +initial_max_streams_bidi and initial_max_stream_data_bidi_remote, or +initial_max_streams_uni and initial_max_stream_data_uni.¶
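The server-side rule that no remembered limit may shrink when 0-RTT is accepted can be sketched as follows. This is an illustrative fragment, not part of any specified API; the parameter list and function name are assumptions, and a real server would compare every remembered parameter its 0-RTT data could violate.

```python
# Transport parameters whose remembered values MUST NOT be reduced by a
# server that accepts 0-RTT data (names as defined in Section 18.2).
ZERO_RTT_LIMITS = [
    "initial_max_data",
    "initial_max_stream_data_bidi_local",
    "initial_max_stream_data_bidi_remote",
    "initial_max_stream_data_uni",
    "initial_max_streams_bidi",
    "initial_max_streams_uni",
]

def can_accept_0rtt(remembered: dict, current: dict) -> bool:
    """True if the server's current limits are no smaller than the
    limits remembered from the connection that issued the ticket."""
    return all(current.get(name, 0) >= remembered.get(name, 0)
               for name in ZERO_RTT_LIMITS)
```

A server failing this check would reject 0-RTT rather than risk the client's early data violating its current limits.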
+A server MAY store and recover the previously sent values of the +max_idle_timeout, max_udp_payload_size, and disable_active_migration parameters +and reject 0-RTT if it selects smaller values. Lowering the values of these +parameters while also accepting 0-RTT data could degrade the performance of the +connection. Specifically, lowering the max_udp_payload_size could result in +dropped packets leading to worse performance compared to rejecting 0-RTT data +outright.¶
+A server MUST reject 0-RTT data if the restored values for transport +parameters cannot be supported.¶
+When sending frames in 0-RTT packets, a client MUST only use remembered +transport parameters; importantly, it MUST NOT use updated values that it learns +from the server's updated transport parameters or from frames received in 1-RTT +packets. Updated values of transport parameters from the handshake apply only +to 1-RTT packets. For instance, flow control limits from remembered transport +parameters apply to all 0-RTT packets even if those values are increased by the +handshake or by frames sent in 1-RTT packets. A server MAY treat use of updated +transport parameters in 0-RTT as a connection error of type PROTOCOL_VIOLATION.¶
+New transport parameters can be used to negotiate new protocol behavior. An +endpoint MUST ignore transport parameters that it does not support. Absence of +a transport parameter therefore disables any optional protocol feature that is +negotiated using the parameter. As described in Section 18.1, +some identifiers are reserved in order to exercise this requirement.¶
+A client that does not understand a transport parameter can discard it and +attempt 0-RTT on subsequent connections. However, if the client adds support +for a discarded transport parameter, it risks violating the constraints that +the transport parameter establishes if it attempts 0-RTT. New transport +parameters can avoid this problem by setting a default of the most conservative +value. Clients can avoid this problem by remembering all parameters, even +ones not currently supported.¶
+New transport parameters can be registered according to the rules in +Section 22.3.¶
+Implementations need to maintain a buffer of CRYPTO data received out of order. +Because there is no flow control of CRYPTO frames, an endpoint could +potentially force its peer to buffer an unbounded amount of data.¶
+Implementations MUST support buffering at least 4096 bytes of data received in +out-of-order CRYPTO frames. Endpoints MAY choose to allow more data to be +buffered during the handshake. A larger limit during the handshake could allow +for larger keys or credentials to be exchanged. An endpoint's buffer size does +not need to remain constant during the life of the connection.¶
+Being unable to buffer CRYPTO frames during the handshake can lead to a +connection failure. If an endpoint's buffer is exceeded during the handshake, it +can expand its buffer temporarily to complete the handshake. If an endpoint +does not expand its buffer, it MUST close the connection with a +CRYPTO_BUFFER_EXCEEDED error code.¶
+Once the handshake completes, if an endpoint is unable to buffer all data in a +CRYPTO frame, it MAY discard that CRYPTO frame and all CRYPTO frames received in +the future, or it MAY close the connection with a CRYPTO_BUFFER_EXCEEDED error +code. Packets containing discarded CRYPTO frames MUST be acknowledged because +the packet has been received and processed by the transport even though the +CRYPTO frame was discarded.¶
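A minimal sketch of the out-of-order CRYPTO buffering described above, with the 4096-byte minimum, might look like the following. The class structure and method names are illustrative assumptions; only the 4096-byte floor and the CRYPTO_BUFFER_EXCEEDED behavior come from the text.

```python
class CryptoBuffer:
    """Reassemble CRYPTO stream data, buffering out-of-order frames."""

    def __init__(self, limit: int = 4096):   # 4096 is the required minimum
        self.limit = limit
        self.next_offset = 0   # next in-order byte expected
        self.pending = {}      # offset -> out-of-order data

    def receive(self, offset: int, data: bytes) -> bytes:
        """Store a CRYPTO frame; return any newly contiguous data."""
        if offset > self.next_offset:
            buffered = sum(len(d) for d in self.pending.values())
            if buffered + len(data) > self.limit:
                # During the handshake this is a CRYPTO_BUFFER_EXCEEDED
                # connection error unless the endpoint grows its buffer.
                raise OverflowError("CRYPTO_BUFFER_EXCEEDED")
            self.pending[offset] = data
            return b""
        # Deliver in-order data, then drain anything now contiguous.
        out = data[self.next_offset - offset:]
        self.next_offset += len(out)
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            out += chunk
            self.next_offset += len(chunk)
        return out
```

Note that after the handshake, an implementation could instead silently drop frames that exceed the buffer, per the paragraph above.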
+Address validation ensures that an endpoint cannot be used for a traffic +amplification attack. In such an attack, a packet is sent to a server with +spoofed source address information that identifies a victim. If a server +generates more or larger packets in response to that packet, the attacker can +use the server to send more data toward the victim than it would be able to send +on its own.¶
+The primary defense against amplification attacks is verifying that a peer is +able to receive packets at the transport address that it claims. Therefore, +after receiving packets from an address that is not yet validated, an endpoint +MUST limit the amount of data it sends to the unvalidated address to three times +the amount of data received from that address. This limit on the size of +responses is known as the anti-amplification limit.¶
+Address validation is performed both during connection establishment (see +Section 8.1) and during connection migration (see +Section 8.2).¶
+Connection establishment implicitly provides address validation for both +endpoints. In particular, receipt of a packet protected with Handshake keys +confirms that the peer successfully processed an Initial packet. Once an +endpoint has successfully processed a Handshake packet from the peer, it can +consider the peer address to have been validated.¶
+Additionally, an endpoint MAY consider the peer address validated if the peer +uses a connection ID chosen by the endpoint and the connection ID contains at +least 64 bits of entropy.¶
+For the client, the value of the Destination Connection ID field in its first +Initial packet allows it to validate the server address as a part of +successfully processing any packet. Initial packets from the server are +protected with keys that are derived from this value (see Section 5.2 of +[QUIC-TLS]). Alternatively, the value is echoed by the server in Version +Negotiation packets (Section 6) or included in the Integrity Tag +in Retry packets (Section 5.8 of [QUIC-TLS]).¶
+Prior to validating the client address, servers MUST NOT send more than three +times as many bytes as the number of bytes they have received. This limits the +magnitude of any amplification attack that can be mounted using spoofed source +addresses. For the purposes of avoiding amplification prior to address +validation, servers MUST count all of the payload bytes received in datagrams +that are uniquely attributed to a single connection. This includes datagrams +that contain packets that are successfully processed and datagrams that contain +packets that are all discarded.¶
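The three-times limit can be captured in a few lines of bookkeeping. This is a sketch under the assumption of one counter pair per connection; the class and method names are invented for illustration.

```python
class AmplificationLimiter:
    """Track the 3x anti-amplification limit for an unvalidated address."""

    FACTOR = 3   # servers MUST NOT send more than 3x the bytes received

    def __init__(self):
        self.received = 0
        self.sent = 0
        self.validated = False   # set once the client address is validated

    def on_datagram_received(self, size: int) -> None:
        # All payload bytes attributable to the connection count, even if
        # every packet in the datagram is discarded.
        self.received += size

    def can_send(self, size: int) -> bool:
        if self.validated:
            return True
        return self.sent + size <= self.FACTOR * self.received

    def on_datagram_sent(self, size: int) -> None:
        self.sent += size
```

A 1200-byte padded client Initial thus allows the server up to 3600 bytes of Initial and Handshake packets before validation completes.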
+Clients MUST ensure that UDP datagrams containing Initial packets have UDP +payloads of at least 1200 bytes, adding PADDING frames as necessary. +A client that sends padded datagrams allows the server to +send more data prior to completing address validation.¶
+Loss of an Initial or Handshake packet from the server can cause a deadlock if +the client does not send additional Initial or Handshake packets. A deadlock +could occur when the server reaches its anti-amplification limit and the client +has received acknowledgments for all the data it has sent. In this case, when +the client has no reason to send additional packets, the server will be unable +to send more data because it has not validated the client's address. To prevent +this deadlock, clients MUST send a packet on a probe timeout (PTO, see Section +6.2 of [QUIC-RECOVERY]). Specifically, the client MUST send an Initial packet +in a UDP datagram that contains at least 1200 bytes if it does not have +Handshake keys, and otherwise send a Handshake packet.¶
+A server might wish to validate the client address before starting the +cryptographic handshake. QUIC uses a token in the Initial packet to provide +address validation prior to completing the handshake. This token is delivered to +the client during connection establishment with a Retry packet (see +Section 8.1.2) or in a previous connection using the NEW_TOKEN frame (see +Section 8.1.3).¶
+In addition to sending limits imposed prior to address validation, servers are +also constrained in what they can send by the limits set by the congestion +controller. Clients are only constrained by the congestion controller.¶
+A token sent in a NEW_TOKEN frame or a Retry packet MUST be constructed in a +way that allows the server to identify how it was provided to a client. These +tokens are carried in the same field, but require different handling from +servers.¶
+Upon receiving the client's Initial packet, the server can request address +validation by sending a Retry packet (Section 17.2.5) containing a token. This +token MUST be repeated by the client in all Initial packets it sends for that +connection after it receives the Retry packet.¶
+In response to processing an Initial containing a token that was provided in a +Retry packet, a server cannot send another Retry packet; it can only refuse the +connection or permit it to proceed.¶
+As long as it is not possible for an attacker to generate a valid token for +its own address (see Section 8.1.4) and the client is able to return +that token, it proves to the server that it received the token.¶
+A server can also use a Retry packet to defer the state and processing costs of +connection establishment. Requiring the server to provide a different +connection ID, along with the original_destination_connection_id transport +parameter defined in Section 18.2, forces the server to +demonstrate that it, or an entity it cooperates with, received the original +Initial packet from the client. Providing a different connection ID also grants +a server some control over how subsequent packets are routed. This can be used +to direct connections to a different server instance.¶
+If a server receives a client Initial that contains an invalid Retry token but +is otherwise valid, it knows the client will not accept another Retry token. +The server can discard such a packet and allow the client to time out to +detect handshake failure, but that could impose a significant latency penalty on +the client. Instead, the server SHOULD immediately close (Section 10.2) +the connection with an INVALID_TOKEN error. Note that a server has not +established any state for the connection at this point and so does not enter the +closing period.¶
+A flow showing the use of a Retry packet is shown in Figure 9.¶
+A server MAY provide clients with an address validation token during one +connection that can be used on a subsequent connection. Address validation is +especially important with 0-RTT because a server potentially sends a significant +amount of data to a client in response to 0-RTT data.¶
+The server uses the NEW_TOKEN frame (Section 19.7) to provide the client +with an address validation token that can be used to validate future +connections. In a future connection, the client includes this token in Initial +packets to provide address validation. The client MUST include the token in all +Initial packets it sends, unless a Retry replaces the token with a newer one. +The client MUST NOT use the token provided in a Retry for future connections. +Servers MAY discard any Initial packet that does not carry the expected token.¶
+Unlike the token that is created for a Retry packet, which is used immediately, +the token sent in the NEW_TOKEN frame can be used after some period of +time has passed. Thus, a token SHOULD have an expiration time, which could +be either an explicit expiration time or an issued timestamp that can be +used to dynamically calculate the expiration time. A server can store the +expiration time or include it in an encrypted form in the token.¶
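One way to realize this is to store an issued timestamp in the (encrypted) token and derive freshness at validation time. The lifetimes below are assumptions chosen for illustration; the specification only requires that Retry tokens be short-lived and NEW_TOKEN tokens longer-lived.

```python
# Assumed lifetimes (seconds); not values from the specification.
RETRY_TOKEN_LIFETIME = 30        # Retry tokens are returned immediately
NEW_TOKEN_LIFETIME = 24 * 3600   # NEW_TOKEN tokens may be used much later

def token_is_fresh(issued_at: float, now: float, from_retry: bool) -> bool:
    """Derive an expiration time from the issued timestamp in the token."""
    lifetime = RETRY_TOKEN_LIFETIME if from_retry else NEW_TOKEN_LIFETIME
    return now - issued_at <= lifetime
```

Storing the timestamp in the token itself, rather than in server state, keeps the scheme stateless, consistent with the token designs discussed later in this section.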
+A token issued with NEW_TOKEN MUST NOT include information that would allow +values to be linked by an observer to the connection on which it was +issued. For example, it cannot include the previous connection ID or addressing +information, unless the values are encrypted. A server MUST ensure that +every NEW_TOKEN frame it sends is unique across all clients, with the exception +of those sent to repair losses of previously sent NEW_TOKEN frames. Information +that allows the server to distinguish between tokens from Retry and NEW_TOKEN +MAY be accessible to entities other than the server.¶
+It is unlikely that the client port number is the same on two different +connections; validating the port is therefore unlikely to be successful.¶
+A token received in a NEW_TOKEN frame is applicable to any server that the +connection is considered authoritative for (e.g., server names included in the +certificate). When connecting to a server for which the client retains an +applicable and unused token, it SHOULD include that token in the Token field of +its Initial packet. Including a token might allow the server to validate the +client address without an additional round trip. A client MUST NOT include a +token that is not applicable to the server that it is connecting to, unless the +client has the knowledge that the server that issued the token and the server +the client is connecting to are jointly managing the tokens. A client MAY use a +token from any previous connection to that server.¶
+A token allows a server to correlate activity between the connection where the +token was issued and any connection where it is used. Clients that want to +break continuity of identity with a server can discard tokens provided using the +NEW_TOKEN frame. In comparison, a token obtained in a Retry packet MUST be used +immediately during the connection attempt and cannot be used in subsequent +connection attempts.¶
+A client SHOULD NOT reuse a NEW_TOKEN token for different connection attempts. +Reusing a token allows connections to be linked by entities on the network path; +see Section 9.5.¶
+Clients might receive multiple tokens on a single connection. Aside from
+preventing linkability, any token can be used in any connection attempt.
+Servers can send additional tokens to either enable address validation for
+multiple connection attempts or to replace older tokens that might become
+invalid. For a client, this ambiguity means that sending the most recent unused
+token is most likely to be effective. Though saving and using older tokens has
+no negative consequences, clients can regard older tokens as being less likely
+to be useful to the server for address validation.¶
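A client's token selection can therefore be as simple as the sketch below. The storage layout (a list of issued-time/token pairs) is an assumption made for illustration.

```python
def pick_token(tokens):
    """Choose the most recent unused NEW_TOKEN token, consuming it.

    tokens: mutable list of (issued_at, token_bytes) pairs not yet used.
    Returns the newest token, or None if the list is empty.
    """
    if not tokens:
        return None
    issued_at, token = max(tokens)   # newest issue time wins
    tokens.remove((issued_at, token))  # a token SHOULD NOT be reused
    return token
```

Removing the token on use reflects the guidance that reusing a token allows connections to be linked by on-path observers.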
+When a server receives an Initial packet with an address validation token, it +MUST attempt to validate the token, unless it has already completed address +validation. If the token is invalid then the server SHOULD proceed as if +the client did not have a validated address, including potentially sending +a Retry. Tokens provided with NEW_TOKEN frames and Retry packets can be +distinguished by servers (see Section 8.1.1), and the latter +validated more strictly. If the validation succeeds, the server SHOULD then +allow the handshake to proceed.¶
+The rationale for treating the client as unvalidated rather than discarding +the packet is that the client might have received the token in a previous +connection using the NEW_TOKEN frame, and if the server has lost state, it +might be unable to validate the token at all, leading to connection failure if +the packet is discarded.¶
+In a stateless design, a server can use encrypted and authenticated tokens to +pass information to clients that the server can later recover and use to +validate a client address. Tokens are not integrated into the cryptographic +handshake and so they are not authenticated. For instance, a client might be +able to reuse a token. To avoid attacks that exploit this property, a server +can limit its use of tokens to only the information needed to validate client +addresses.¶
+Clients MAY use tokens obtained on one connection for any connection attempt +using the same version. When selecting a token to use, clients do not need to +consider other properties of the connection that is being attempted, including +the choice of possible application protocols, session tickets, or other +connection properties.¶
+An address validation token MUST be difficult to guess. Including a random +value with at least 128 bits of entropy in the token would be sufficient, but +this depends on the server remembering the value it sends to clients.¶
+A token-based scheme allows the server to offload any state associated with +validation to the client. For this design to work, the token MUST be covered by +integrity protection against modification or falsification by clients. Without +integrity protection, malicious clients could generate or guess values for +tokens that would be accepted by the server. Only the server requires access to +the integrity protection key for tokens.¶
+There is no need for a single well-defined format for the token because the +server that generates the token also consumes it. Tokens sent in Retry packets +SHOULD include information that allows the server to verify that the source IP +address and port in client packets remain constant.¶
+Tokens sent in NEW_TOKEN frames MUST include information that allows the server +to verify that the client IP address has not changed from when the token was +issued. Servers can use tokens from NEW_TOKEN in deciding not to send a Retry +packet, even if the client address has changed. If the client IP address has +changed, the server MUST adhere to the anti-amplification limit; see +Section 8. Note that in the presence of NAT, this requirement +might be insufficient to protect other hosts that share the NAT from +amplification attack.¶
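A minimal sketch of an integrity-protected token that binds the client IP address and an issue timestamp is shown below. The format is invented for illustration; a production server would typically also encrypt the token (as noted above, tokens sent in NEW_TOKEN frames must not expose addressing information), and only the server holds the key.

```python
import hashlib
import hmac
import socket
import struct
from typing import Optional

def make_token(key: bytes, client_ip: str, issued_at: int) -> bytes:
    """Token = IPv4 address (4B) || issue time (8B) || HMAC-SHA256 tag."""
    body = socket.inet_aton(client_ip) + struct.pack("!Q", issued_at)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag

def check_token(key: bytes, token: bytes, client_ip: str) -> Optional[int]:
    """Return the issue time if the token is valid for this IP, else None."""
    if len(token) != 4 + 8 + 32:
        return None
    body, tag = token[:12], token[12:]
    # Constant-time comparison defeats tag-forging via timing.
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    if body[:4] != socket.inet_aton(client_ip):
        return None
    return struct.unpack("!Q", body[4:])[0]
```

Because only the server verifies the tag, clients cannot forge or modify tokens, satisfying the integrity requirement; the embedded timestamp supports the expiration handling described earlier.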
+Attackers could replay tokens to use servers as amplifiers in DDoS attacks. To +protect against such attacks, servers MUST ensure that replay of tokens is +prevented or limited. Servers SHOULD ensure that tokens sent in Retry packets +are only accepted for a short time, as they are returned immediately by clients. +Tokens that are provided in NEW_TOKEN frames (Section 19.7) need to be +valid for longer, but SHOULD NOT be accepted multiple times. Servers are +encouraged to allow tokens to be used only once, if possible; tokens MAY +include additional information about clients to further narrow applicability or +reuse.¶
+Path validation is used by both peers during connection migration +(see Section 9) to verify reachability after a change of address. +In path validation, endpoints test reachability between a specific local +address and a specific peer address, where an address is the two-tuple of +IP address and port.¶
+Path validation tests that packets sent on a path to a peer are +received by that peer. Path validation is used to ensure that packets received +from a migrating peer do not carry a spoofed source address.¶
+Path validation does not validate that a peer can send in the return direction. +Acknowledgments cannot be used for return path validation because they contain +insufficient entropy and might be spoofed. Endpoints independently determine +reachability on each direction of a path, and therefore return reachability can +only be established by the peer.¶
+Path validation can be used at any time by either endpoint. For instance, an +endpoint might check that a peer is still in possession of its address after a +period of quiescence.¶
+Path validation is not designed as a NAT traversal mechanism. Though the +mechanism described here might be effective for the creation of NAT bindings +that support NAT traversal, the expectation is that one or other peer is able to +receive packets without first having sent a packet on that path. Effective NAT +traversal needs additional synchronization mechanisms that are not provided +here.¶
+An endpoint MAY include other frames with the PATH_CHALLENGE and PATH_RESPONSE +frames used for path validation. In particular, an endpoint can include PADDING +frames with a PATH_CHALLENGE frame for Path Maximum Transmission Unit Discovery +(PMTUD; see Section 14.2.1); it can also include its own PATH_CHALLENGE frame with +a PATH_RESPONSE frame.¶
+An endpoint uses a new connection ID for probes sent from a new local address; +see Section 9.5. When probing a new path, an endpoint can +ensure that its peer has an unused connection ID available for +responses. Sending NEW_CONNECTION_ID and PATH_CHALLENGE frames in the same +packet, if the peer's active_connection_id_limit permits, ensures that an unused +connection ID will be available to the peer when sending a response.¶
+An endpoint can choose to simultaneously probe multiple paths. The number of +simultaneous paths used for probes is limited by the number of extra connection +IDs its peer has previously supplied, since each new local address used for a +probe requires a previously unused connection ID.¶
+To initiate path validation, an endpoint sends a PATH_CHALLENGE frame containing +an unpredictable payload on the path to be validated.¶
+An endpoint MAY send multiple PATH_CHALLENGE frames to guard against packet +loss. However, an endpoint SHOULD NOT send multiple PATH_CHALLENGE frames in a +single packet.¶
+An endpoint SHOULD NOT probe a new path with packets containing a PATH_CHALLENGE
+frame more frequently than it would send an Initial packet. This ensures that
+connection migration places no more load on a new path than establishing a new
+connection would.¶
+The endpoint MUST use unpredictable data in every PATH_CHALLENGE frame so that +it can associate the peer's response with the corresponding PATH_CHALLENGE.¶
+An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame to at +least the smallest allowed maximum datagram size of 1200 bytes, unless the +anti-amplification limit for the path does not permit sending a datagram of +this size. Sending UDP datagrams of this size ensures that the network path +from the endpoint to the peer can be used for QUIC; see Section 14.¶
+When an endpoint is unable to expand the datagram size to 1200 bytes due to the +anti-amplification limit, the path MTU will not be validated. To ensure that +the path MTU is large enough, the endpoint MUST perform a second path validation +by sending a PATH_CHALLENGE frame in a datagram of at least 1200 bytes. This +additional validation can be performed after a PATH_RESPONSE is successfully +received or when enough bytes have been received on the path that sending the +larger datagram will not result in exceeding the anti-amplification limit.¶
+Unlike other cases where datagrams are expanded, endpoints MUST NOT discard +datagrams that appear to be too small when they contain PATH_CHALLENGE or +PATH_RESPONSE.¶
+On receiving a PATH_CHALLENGE frame, an endpoint MUST respond by echoing the +data contained in the PATH_CHALLENGE frame in a PATH_RESPONSE frame. An +endpoint MUST NOT delay transmission of a packet containing a PATH_RESPONSE +frame unless constrained by congestion control.¶
+A PATH_RESPONSE frame MUST be sent on the network path where the +PATH_CHALLENGE was received. This ensures that path validation by a peer only +succeeds if the path is functional in both directions. This requirement MUST +NOT be enforced by the endpoint that initiates path validation as that would +enable an attack on migration; see Section 9.3.3.¶
+An endpoint MUST expand datagrams that contain a PATH_RESPONSE frame to at +least the smallest allowed maximum datagram size of 1200 bytes. This verifies +that the path is able to carry datagrams of this size in both directions. +However, an endpoint MUST NOT expand the datagram containing the PATH_RESPONSE +if the resulting data exceeds the anti-amplification limit. This is expected to +only occur if the received PATH_CHALLENGE was not sent in an expanded datagram.¶
+An endpoint MUST NOT send more than one PATH_RESPONSE frame in response to one +PATH_CHALLENGE frame; see Section 13.3. The peer is +expected to send more PATH_CHALLENGE frames as necessary to evoke additional +PATH_RESPONSE frames.¶
+Path validation succeeds when a PATH_RESPONSE frame is received that contains +the data that was sent in a previous PATH_CHALLENGE frame. A PATH_RESPONSE +frame received on any network path validates the path on which the +PATH_CHALLENGE was sent.¶
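The challenge bookkeeping implied by these rules can be sketched as follows. The class is illustrative; only the 8-byte unpredictable payload and the rule that a response validates the path its challenge was sent on come from the text.

```python
import os

class PathValidator:
    """Track outstanding PATH_CHALLENGE payloads per probed path."""

    def __init__(self):
        self.outstanding = {}   # challenge data -> path identifier

    def start(self, path):
        """Begin validating a path; returns the 8-byte challenge payload."""
        data = os.urandom(8)    # MUST be unpredictable per challenge
        self.outstanding[data] = path
        return data

    def on_path_response(self, data: bytes):
        """Return the path validated by this PATH_RESPONSE, or None.

        The response validates the path the PATH_CHALLENGE was sent on,
        regardless of which network path the response arrived on.
        """
        return self.outstanding.pop(data, None)
```

Keying the lookup on the challenge data is what lets the endpoint associate each PATH_RESPONSE with its PATH_CHALLENGE, which is why predictable payloads are prohibited.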
+If an endpoint sends a PATH_CHALLENGE frame in a datagram that is not expanded +to at least 1200 bytes, and if the response to it validates the peer address, +the path is validated but not the path MTU. As a result, the endpoint can now +send more than three times the amount of data that has been received. However, +the endpoint MUST initiate another path validation with an expanded datagram to +verify that the path supports the required MTU.¶
+Receipt of an acknowledgment for a packet containing a PATH_CHALLENGE frame is +not adequate validation, since the acknowledgment can be spoofed by a malicious +peer.¶
+Path validation only fails when the endpoint attempting to validate the path +abandons its attempt to validate the path.¶
+Endpoints SHOULD abandon path validation based on a timer. When setting this +timer, implementations are cautioned that the new path could have a longer +round-trip time than the original. A value of three times the larger of the +current Probe Timeout (PTO) or the PTO for the new path (that is, using +kInitialRtt as defined in [QUIC-RECOVERY]) is RECOMMENDED.¶
+This timeout allows for multiple PTOs to expire prior to failing path +validation, so that loss of a single PATH_CHALLENGE or PATH_RESPONSE frame +does not cause path validation failure.¶
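The RECOMMENDED timeout can be computed as in the sketch below. The PTO formula and kInitialRtt value follow [QUIC-RECOVERY]; the helper names are ours, and the calculation assumes no RTT sample yet exists for the new path.

```python
K_INITIAL_RTT = 0.333   # seconds, per [QUIC-RECOVERY]
K_GRANULARITY = 0.001   # seconds, per [QUIC-RECOVERY]

def pto_for_new_path(max_ack_delay: float) -> float:
    # PTO = smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay,
    # seeded from kInitialRtt because the new path has no RTT sample.
    smoothed_rtt = K_INITIAL_RTT
    rttvar = K_INITIAL_RTT / 2
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay

def path_validation_timeout(current_pto: float,
                            max_ack_delay: float = 0.0) -> float:
    """Three times the larger of the current PTO and the new path's PTO."""
    return 3 * max(current_pto, pto_for_new_path(max_ack_delay))
```

With no ack delay, the new-path PTO is 0.333 + 0.666 = 0.999 seconds, so the floor on the timeout is roughly three seconds even when the current PTO is small.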
+Note that the endpoint might receive packets containing other frames on the new +path, but a PATH_RESPONSE frame with appropriate data is required for path +validation to succeed.¶
+When an endpoint abandons path validation, it determines that the path is
+unusable. This does not necessarily imply a failure of the connection;
+endpoints can continue sending packets over other paths as appropriate. If no
+paths are available, an endpoint can wait for a new path to become available or
+close the connection. An endpoint that has no valid network path to its peer
+MAY signal this using the NO_VIABLE_PATH connection error, noting that this is
+only possible if the network path exists but does not support the required
+MTU (Section 14).¶
+A path validation might be abandoned for other reasons besides +failure. Primarily, this happens if a connection migration to a new path is +initiated while a path validation on the old path is in progress.¶
+The use of a connection ID allows connections to survive changes to endpoint +addresses (IP address and port), such as those caused by an +endpoint migrating to a new network. This section describes the process by +which an endpoint migrates to a new address.¶
+The design of QUIC relies on endpoints retaining a stable address for the +duration of the handshake. An endpoint MUST NOT initiate connection migration +before the handshake is confirmed, as defined in section 4.1.2 of [QUIC-TLS].¶
+If the peer sent the disable_active_migration transport parameter, an endpoint +also MUST NOT send packets (including probing packets; see Section 9.1) from a +different local address to the address the peer used during the handshake, +unless the endpoint has acted on a preferred_address transport parameter from +the peer. If the peer violates this requirement, the endpoint MUST either drop +the incoming packets on that path without generating a stateless reset or +proceed with path validation and allow the peer to migrate. Generating a +stateless reset or closing the connection would allow third parties in the +network to cause connections to close by spoofing or otherwise manipulating +observed traffic.¶
+Not all changes of peer address are intentional, or active, migrations. The peer +could experience NAT rebinding: a change of address due to a middlebox, usually +a NAT, allocating a new outgoing port or even a new outgoing IP address for a +flow. An endpoint MUST perform path validation (Section 8.2) if it +detects any change to a peer's address, unless it has previously validated that +address.¶
+When an endpoint has no validated path on which to send packets, it MAY discard +connection state. An endpoint capable of connection migration MAY wait for a +new path to become available before discarding connection state.¶
+This document limits migration of connections to new client addresses, except as +described in Section 9.6. Clients are responsible for initiating all +migrations. Servers do not send non-probing packets (see Section 9.1) toward a +client address until they see a non-probing packet from that address. If a +client receives packets from an unknown server address, the client MUST discard +these packets.¶
+An endpoint MAY probe for peer reachability from a new local address using path +validation (Section 8.2) prior to migrating the connection to the new +local address. Failure of path validation simply means that the new path is not +usable for this connection. Failure to validate a path does not cause the +connection to end unless there are no valid alternative paths available.¶
+PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames are +"probing frames", and all other frames are "non-probing frames". A packet +containing only probing frames is a "probing packet", and a packet containing +any other frame is a "non-probing packet".¶
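The classification above reduces to a simple predicate over the frames a packet carries. A sketch (illustrative, not normative; frame type values are taken from the QUIC frame type registry):

```python
# A packet is a "probing packet" only if every frame it carries is a
# probing frame; any other frame makes it a "non-probing packet".
PROBING_FRAME_TYPES = {
    0x00,  # PADDING
    0x18,  # NEW_CONNECTION_ID
    0x1a,  # PATH_CHALLENGE
    0x1b,  # PATH_RESPONSE
}

def is_probing_packet(frame_types):
    return all(t in PROBING_FRAME_TYPES for t in frame_types)
```

For example, a packet containing only PATH_CHALLENGE and PADDING is probing, while adding even a PING frame (0x01) makes it non-probing.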
+An endpoint can migrate a connection to a new local address by sending packets +containing non-probing frames from that address.¶
+Each endpoint validates its peer's address during connection establishment. +Therefore, a migrating endpoint can send to its peer knowing that the peer is +willing to receive at the peer's current address. Thus an endpoint can migrate +to a new local address without first validating the peer's address.¶
+To establish reachability on the new path, an endpoint initiates path +validation (Section 8.2) on the new path. An endpoint MAY defer path +validation until after a peer sends the next non-probing frame to its new +address.¶
+When migrating, the new path might not support the endpoint's current sending +rate. Therefore, the endpoint resets its congestion controller and RTT estimate, +as described in Section 9.4.¶
+The new path might not have the same ECN capability. Therefore, the endpoint +validates ECN capability as described in Section 13.4.¶
+Receiving a packet from a new peer address containing a non-probing frame +indicates that the peer has migrated to that address.¶
+If the recipient permits the migration, it MUST send subsequent packets +to the new peer address and MUST initiate path validation (Section 8.2) +to verify the peer's ownership of the address if validation is not already +underway.¶
+An endpoint only changes the address to which it sends packets in response to +the highest-numbered non-probing packet. This ensures that an endpoint does not +send packets to an old peer address in the case that it receives reordered +packets.¶
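The reordering rule above can be sketched as state tracking the highest-numbered non-probing packet (illustrative Python; the class and method names are hypothetical):

```python
# The peer address is updated only when a non-probing packet carries a
# packet number higher than any previously seen, so reordered packets
# from an old address cannot undo a migration.
class PeerAddressTracker:
    def __init__(self, initial_addr):
        self.active_addr = initial_addr
        self.highest_non_probing_pn = -1

    def on_non_probing_packet(self, pn, src_addr):
        if pn > self.highest_non_probing_pn:
            self.highest_non_probing_pn = pn
            if src_addr != self.active_addr:
                # Apparent migration: switch and start path validation.
                self.active_addr = src_addr
```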
+An endpoint MAY send data to an unvalidated peer address, but it MUST protect +against potential attacks as described in Section 9.3.1 and +Section 9.3.2. An endpoint MAY skip validation of a peer address if that +address has been seen recently. In particular, if an endpoint returns to a +previously-validated path after detecting some form of spurious migration, +skipping address validation and restoring loss detection and congestion state +can reduce the performance impact of the attack.¶
+After changing the address to which it sends non-probing packets, an endpoint +can abandon any path validation for other addresses.¶
+Receiving a packet from a new peer address could be the result of a NAT +rebinding at the peer.¶
+After verifying a new client address, the server SHOULD send new address +validation tokens (Section 8) to the client.¶
+It is possible that a peer is spoofing its source address to cause an endpoint +to send excessive amounts of data to an unwilling host. If the endpoint sends +significantly more data than the spoofing peer, connection migration might be +used to amplify the volume of data that an attacker can generate toward a +victim.¶
+As described in Section 9.3, an endpoint is required to validate a +peer's new address to confirm the peer's possession of the new address. Until a +peer's address is deemed valid, an endpoint limits the amount of data it sends +to that address; see Section 8. In the absence of this limit, an +endpoint risks being used for a denial of service attack against an +unsuspecting victim.¶
+If an endpoint skips validation of a peer address as described above, it does +not need to limit its sending rate.¶
+An on-path attacker could cause a spurious connection migration by copying and +forwarding a packet with a spoofed address such that it arrives before the +original packet. The packet with the spoofed address will be seen to come from +a migrating connection, and the original packet will be seen as a duplicate and +dropped. After a spurious migration, validation of the source address will fail +because the entity at the source address does not have the necessary +cryptographic keys to read or respond to the PATH_CHALLENGE frame that is sent +to it even if it wanted to.¶
+To protect the connection from failing due to such a spurious migration, an +endpoint MUST revert to using the last validated peer address when validation +of a new peer address fails. Additionally, receipt of packets with higher +packet numbers from the legitimate peer address will trigger another connection +migration. This will cause the validation of the address of the spurious +migration to be abandoned, thus containing migrations initiated by the attacker +injecting a single packet.¶
+If an endpoint has no state about the last validated peer address, it MUST close +the connection silently by discarding all connection state. This results in new +packets on the connection being handled generically. For instance, an endpoint +MAY send a stateless reset in response to any further incoming packets.¶
+An off-path attacker that can observe packets might forward copies of genuine +packets to endpoints. If the copied packet arrives before the genuine packet, +this will appear as a NAT rebinding. Any genuine packet will be discarded as a +duplicate. If the attacker is able to continue forwarding packets, it might be +able to cause migration to a path via the attacker. This places the attacker on +path, giving it the ability to observe or drop all subsequent packets.¶
+This style of attack relies on the attacker using a path that has approximately +the same characteristics as the direct path between endpoints. The attack is +more reliable if relatively few packets are sent or if packet loss coincides +with the attempted attack.¶
+A non-probing packet received on the original path that increases the maximum +received packet number will cause the endpoint to move back to that path. +Eliciting packets on this path increases the likelihood that the attack is +unsuccessful. Therefore, mitigation of this attack relies on triggering the +exchange of packets.¶
+In response to an apparent migration, endpoints MUST validate the previously +active path using a PATH_CHALLENGE frame. This induces the sending of new +packets on that path. If the path is no longer viable, the validation attempt +will time out and fail; if the path is viable, but no longer desired, the +validation will succeed, but only results in probing packets being sent on the +path.¶
+An endpoint that receives a PATH_CHALLENGE on an active path SHOULD send a +non-probing packet in response. If the non-probing packet arrives before any +copy made by an attacker, this results in the connection being migrated back to +the original path. Any subsequent migration to another path restarts this +entire process.¶
+This defense is imperfect, but this is not considered a serious problem. If the +path via the attacker is reliably faster than the original path despite multiple +attempts to use that original path, it is not possible to distinguish between an +attack and an improvement in routing.¶
+An endpoint could also use heuristics to improve detection of this style of +attack. For instance, NAT rebinding is improbable if packets were recently +received on the old path; similarly, rebinding is rare on IPv6 paths. Endpoints +can also look for duplicated packets. Conversely, a change in connection ID is +more likely to indicate an intentional migration rather than an attack.¶
+The capacity available on the new path might not be the same as the old path. +Packets sent on the old path MUST NOT contribute to congestion control or RTT +estimation for the new path.¶
+On confirming a peer's ownership of its new address, an endpoint MUST +immediately reset the congestion controller and round-trip time estimator for +the new path to initial values (see Appendices A.3 and B.3 in [QUIC-RECOVERY]) +unless the only change in the peer's address is its port number. Because +port-only changes are commonly the result of NAT rebinding or other middlebox +activity, the endpoint MAY retain its congestion control state and +round-trip estimate in those cases instead of reverting to initial values. +In cases where congestion control state +retained from an old path is used on a new path with substantially different +characteristics, a sender could transmit too aggressively until the congestion +controller and the RTT estimator have adapted. Generally, implementations are +advised to be cautious when using previous values on a new path.¶
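The port-only exception above amounts to a simple address comparison when an address change is confirmed. A sketch (illustrative; the function name is hypothetical):

```python
# Called when a peer's address has changed and its ownership of the new
# address has been confirmed. Returns True if the congestion controller
# and RTT estimator MUST be reset to initial values.
def should_reset_path_state(old_addr, new_addr):
    old_ip, old_port = old_addr
    new_ip, new_port = new_addr
    # A port-only change is commonly NAT rebinding or other middlebox
    # activity; congestion and RTT state MAY be retained in that case.
    port_only_change = (old_ip == new_ip and old_port != new_port)
    return not port_only_change
```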
+There could be apparent reordering at the receiver when an endpoint sends data +and probes from/to multiple addresses during the migration period, since the two +resulting paths could have different round-trip times. A receiver of packets on +multiple paths will still send ACK frames covering all received packets.¶
+While multiple paths might be used during connection migration, a single +congestion control context and a single loss recovery context (as described in +[QUIC-RECOVERY]) could be adequate. For instance, an endpoint might delay +switching to a new congestion control context until it is confirmed that an old +path is no longer needed (such as the case in Section 9.3.3).¶
+A sender can make exceptions for probe packets so that their loss detection is +independent and does not unduly cause the congestion controller to reduce its +sending rate. An endpoint might set a separate timer when a PATH_CHALLENGE is +sent, which is cancelled if the corresponding PATH_RESPONSE is received. If the +timer fires before the PATH_RESPONSE is received, the endpoint might send a new +PATH_CHALLENGE, and restart the timer for a longer period of time. This timer +SHOULD be set as described in Section 6.2.1 of [QUIC-RECOVERY] and MUST NOT be +more aggressive.¶
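The separate probe timer described above might be structured as follows (an illustrative sketch; the doubling backoff and the API shape are implementation choices, not requirements of this document):

```python
# A timer dedicated to PATH_CHALLENGE probes, independent of loss
# detection, so probe loss does not reduce the main path's sending rate.
class ProbeTimer:
    def __init__(self, initial_pto):
        # The period MUST NOT be more aggressive than the PTO
        # (Section 6.2.1 of [QUIC-RECOVERY]).
        self.period = initial_pto
        self.outstanding = None

    def on_challenge_sent(self, challenge_data):
        self.outstanding = challenge_data  # timer armed for self.period

    def on_path_response(self, data):
        if data == self.outstanding:
            self.outstanding = None        # response received: cancel timer

    def on_timer_fired(self):
        # Returns True if a new PATH_CHALLENGE should be sent; each
        # expiry lengthens the period.
        if self.outstanding is None:
            return False
        self.period *= 2
        return True
```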
+Using a stable connection ID on multiple network paths would allow a passive +observer to correlate activity between those paths. An endpoint that moves +between networks might not wish to have their activity correlated by any entity +other than their peer, so different connection IDs are used when sending from +different local addresses, as discussed in Section 5.1. For this to be +effective, endpoints need to ensure that connection IDs they provide cannot be +linked by any other entity.¶
+At any time, endpoints MAY change the Destination Connection ID they transmit +with to a value that has not been used on another path.¶
+An endpoint MUST NOT reuse a connection ID when sending from more than one local +address, for example when initiating connection migration as described in +Section 9.2 or when probing a new network path as described in +Section 9.1.¶
+Similarly, an endpoint MUST NOT reuse a connection ID when sending to more than +one destination address. Due to network changes outside the control of its +peer, an endpoint might receive packets from a new source address with the same +destination connection ID, in which case it MAY continue to use the current +connection ID with the new remote address while still sending from the same +local address.¶
+These requirements regarding connection ID reuse apply only to the sending of +packets, as unintentional changes in path without a change in connection ID are +possible. For example, after a period of network inactivity, NAT rebinding +might cause packets to be sent on a new path when the client resumes sending. +An endpoint responds to such an event as described in Section 9.3.¶
+Using different connection IDs for packets sent in both directions on each new +network path eliminates the use of the connection ID for linking packets from +the same connection across different network paths. Header protection ensures +that packet numbers cannot be used to correlate activity. This does not prevent +other properties of packets, such as timing and size, from being used to +correlate activity.¶
+An endpoint SHOULD NOT initiate migration with a peer that has requested a +zero-length connection ID, because traffic over the new path might be trivially +linkable to traffic over the old one. If the server is able to associate +packets with a zero-length connection ID to the right connection, it means that +the server is using other information to demultiplex packets. For example, a +server might provide a unique address to every client, for instance using HTTP +alternative services [ALTSVC]. Information that might allow correct +routing of packets across multiple network paths will also allow activity on +those paths to be linked by entities other than the peer.¶
+A client might wish to reduce linkability by switching to a new connection ID, +source UDP port, or IP address (see [RFC4941]) when sending traffic after a +period of inactivity. Changing the address from which it sends packets at the +same time might cause the server to detect a connection migration. This +ensures that the mechanisms that support migration are exercised even for +clients that do not experience NAT rebindings or genuine migrations. Changing +address can cause a peer to reset its congestion control state (see +Section 9.4), so addresses SHOULD only be changed infrequently.¶
+An endpoint that exhausts available connection IDs cannot probe new paths or +initiate migration, nor can it respond to probes or attempts by its peer to +migrate. To ensure that migration is possible and packets sent on different +paths cannot be correlated, endpoints SHOULD provide new connection IDs before +peers migrate; see Section 5.1.1. If a peer might have exhausted available +connection IDs, a migrating endpoint could include a NEW_CONNECTION_ID frame in +all packets sent on a new network path.¶
+QUIC allows servers to accept connections on one IP address and attempt to +transfer these connections to a more preferred address shortly after the +handshake. This is particularly useful when clients initially connect to an +address shared by multiple servers but would prefer to use a unicast address to +ensure connection stability. This section describes the protocol for migrating a +connection to a preferred server address.¶
+Migrating a connection to a new server address mid-connection is not supported +by the version of QUIC specified in this document. If a client receives packets +from a new server address when the client has not initiated a migration to that +address, the client SHOULD discard these packets.¶
+A server conveys a preferred address by including the preferred_address +transport parameter in the TLS handshake.¶
+Servers MAY communicate a preferred address of each address family (IPv4 and +IPv6) to allow clients to pick the one most suited to their network attachment.¶
+Once the handshake is confirmed, the client SHOULD select one of the two +addresses provided by the server and initiate path validation (see +Section 8.2). A client constructs packets using any previously unused +active connection ID, taken from either the preferred_address transport +parameter or a NEW_CONNECTION_ID frame.¶
+As soon as path validation succeeds, the client SHOULD begin sending all +future packets to the new server address using the new connection ID and +discontinue use of the old server address. If path validation fails, the client +MUST continue sending all future packets to the server's original IP address.¶
+A client that migrates to a preferred address MUST validate the address it +chooses before migrating; see Section 21.5.3.¶
+A server might receive a packet addressed to its preferred IP address at any +time after it accepts a connection. If this packet contains a PATH_CHALLENGE +frame, the server sends a packet containing a PATH_RESPONSE frame as per +Section 8.2. The server MUST send non-probing packets from its +original address until it receives a non-probing packet from the client at its +preferred address and until the server has validated the new path.¶
+The server MUST probe on the path toward the client from its preferred address. +This helps to guard against spurious migration initiated by an attacker.¶
+Once the server has completed its path validation and has received a non-probing +packet with a new largest packet number on its preferred address, the server +begins sending non-probing packets to the client exclusively from its preferred +IP address. The server SHOULD drop newer packets for this connection that are +received on the old IP address. The server MAY continue to process delayed +packets that are received on the old IP address.¶
+The addresses that a server provides in the preferred_address transport +parameter are only valid for the connection in which they are provided. A +client MUST NOT use these for other connections, including connections that are +resumed from the current connection.¶
+A client might need to perform a connection migration before it has migrated to +the server's preferred address. In this case, the client SHOULD perform path +validation to both the original and preferred server address from the client's +new address concurrently.¶
+If path validation of the server's preferred address succeeds, the client MUST +abandon validation of the original address and migrate to using the server's +preferred address. If path validation of the server's preferred address fails +but validation of the server's original address succeeds, the client MAY migrate +to its new address and continue sending to the server's original address.¶
+If packets received at the server's preferred address have a different source +address than observed from the client during the handshake, the server MUST +protect against potential attacks as described in Section 9.3.1 and +Section 9.3.2. In addition to intentional simultaneous migration, this +might also occur because the client's access network used a different NAT +binding for the server's preferred address.¶
+Servers SHOULD initiate path validation to the client's new address upon +receiving a probe packet from a different address; see Section 8.¶
+A client that migrates to a new address SHOULD use a preferred address from the +same address family for the server.¶
+The connection ID provided in the preferred_address transport parameter is not +specific to the addresses that are provided. This connection ID is provided to +ensure that the client has a connection ID available for migration, but the +client MAY use this connection ID on any path.¶
+Endpoints that send data using IPv6 SHOULD apply an IPv6 flow label in +compliance with [RFC6437], unless the local API does not allow setting IPv6 +flow labels.¶
+The flow label generation MUST be designed to minimize the chances of +linkability with a previously used flow label, as a stable flow label would +enable correlating activity on multiple paths; see Section 9.5.¶
+[RFC6437] suggests deriving values using a pseudorandom function to generate +flow labels. Including the Destination Connection ID field in addition to +source and destination addresses when generating flow labels ensures that +changes are synchronized with changes in other observable identifiers. A +cryptographic hash function that combines these inputs with a local secret is +one way this might be implemented.¶
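One way the derivation above might be implemented is a keyed hash over the addresses and Destination Connection ID, truncated to the 20-bit flow label field (an illustrative sketch; the choice of HMAC-SHA-256 is an assumption, not a requirement):

```python
import hashlib
import hmac

def flow_label(secret, src_addr, dst_addr, dcid):
    # Keying on a local secret prevents off-path observers from
    # predicting labels; including the DCID ensures the label changes
    # whenever the connection ID does.
    digest = hmac.new(secret, src_addr + dst_addr + dcid,
                      hashlib.sha256).digest()
    # Flow labels are 20 bits (RFC 6437).
    return int.from_bytes(digest[:3], "big") & 0xFFFFF
```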
+An established QUIC connection can be terminated in one of three ways:¶
+An endpoint MAY discard connection state if it does not have a validated path on +which it can send packets; see Section 8.2.¶
+If a max_idle_timeout is specified by either peer in its transport parameters +(Section 18.2), the connection is silently closed +and its state is discarded when it remains idle for longer than the minimum of +both peers' max_idle_timeout values.¶
+Each endpoint advertises a max_idle_timeout, but the effective value +at an endpoint is computed as the minimum of the two advertised values (or the +sole advertised value, if only one endpoint advertises a nonzero value). By +announcing a max_idle_timeout, an endpoint commits to initiating an immediate +close (Section 10.2) if it abandons the connection prior to the effective +value.¶
+An endpoint restarts its idle timer when a packet from its peer is received and +processed successfully. An endpoint also restarts its idle timer when sending an +ack-eliciting packet if no other ack-eliciting packets have been sent since last +receiving and processing a packet. Restarting this timer when sending a packet +ensures that connections are not closed after new activity is initiated.¶
+To avoid excessively small idle timeout periods, endpoints MUST increase the +idle timeout period to be at least three times the current Probe Timeout (PTO). +This allows for multiple PTOs to expire, and therefore multiple probes to be +sent and lost, prior to idle timeout.¶
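The effective idle timeout described above combines the advertised values with the PTO floor; a sketch (illustrative Python, not normative):

```python
# Effective idle timeout: the minimum of the nonzero advertised
# max_idle_timeout values, floored at three times the current PTO so
# multiple probes can be sent and lost before the connection times out.
def effective_idle_timeout(local_max_idle, peer_max_idle, current_pto):
    advertised = [t for t in (local_max_idle, peer_max_idle) if t > 0]
    if not advertised:
        return None  # neither peer advertised a timeout: no idle timeout
    return max(min(advertised), 3 * current_pto)
```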
+An endpoint that sends packets close to the effective timeout risks having +them be discarded at the peer, since the idle timeout period might have expired +at the peer before these packets arrive.¶
+An endpoint can send a PING or another ack-eliciting frame to test the +connection for liveness if the peer could time out soon, such as within a PTO; +see Section 6.2 of [QUIC-RECOVERY]. This is especially useful if any +available application data cannot be safely retried. Note that the application +determines what data is safe to retry.¶
+An endpoint might need to send ack-eliciting packets to avoid an idle timeout +if it is expecting response data, but does not have or is unable to send +application data.¶
+An implementation of QUIC might provide applications with an option to defer an +idle timeout. This facility could be used when the application wishes to avoid +losing state that has been associated with an open connection, but does not +expect to exchange application data for some time. With this option, an +endpoint could send a PING frame (Section 19.2) periodically, which will cause +the peer to restart its idle timeout period. Sending a packet containing a PING +frame also restarts the idle timeout for this endpoint if it is the first +ack-eliciting packet sent since receiving a packet. Sending a PING frame causes +the peer to respond with an acknowledgment, which also restarts the idle +timeout for the endpoint.¶
+Application protocols that use QUIC SHOULD provide guidance on when deferring an +idle timeout is appropriate. Unnecessary sending of PING frames could have a +detrimental effect on performance.¶
+A connection will time out if no packets are sent or received for a period +longer than the time negotiated using the max_idle_timeout transport parameter; +see Section 10. However, state in middleboxes might time out earlier than +that. Though REQ-5 in [RFC4787] recommends a 2-minute timeout interval, +experience shows that sending packets every 30 seconds is necessary to prevent +the majority of middleboxes from losing state for UDP flows +[GATEWAY].¶
+An endpoint sends a CONNECTION_CLOSE frame (Section 19.19) to +terminate the connection immediately. A CONNECTION_CLOSE frame causes all +streams to immediately become closed; open streams can be assumed to be +implicitly reset.¶
+After sending a CONNECTION_CLOSE frame, an endpoint immediately enters the +closing state; see Section 10.2.1. After receiving a CONNECTION_CLOSE frame, +endpoints enter the draining state; see Section 10.2.2.¶
+Violations of the protocol lead to an immediate close.¶
+An immediate close can be used after an application protocol has arranged to +close a connection. This might be after the application protocol negotiates a +graceful shutdown. The application protocol can exchange messages that are +needed for both application endpoints to agree that the connection can be +closed, after which the application requests that QUIC close the connection. +When QUIC consequently closes the connection, a CONNECTION_CLOSE frame with an +application-supplied error code will be used to signal closure to the peer.¶
+The closing and draining connection states exist to ensure that connections +close cleanly and that delayed or reordered packets are properly discarded. +These states SHOULD persist for at least three times the current Probe Timeout +(PTO) interval as defined in [QUIC-RECOVERY].¶
+Disposing of connection state prior to exiting the closing or draining state +could result in an endpoint generating a stateless reset unnecessarily when it +receives a late-arriving packet. Endpoints that have some alternative means +to ensure that late-arriving packets do not induce a response, such as those +that are able to close the UDP socket, MAY end these states earlier to allow +for faster resource recovery. Servers that retain an open socket for accepting +new connections SHOULD NOT end the closing or draining states early.¶
+Once its closing or draining state ends, an endpoint SHOULD discard all +connection state. The endpoint MAY send a stateless reset in response to any +further incoming packets belonging to this connection.¶
+An endpoint enters the closing state after initiating an immediate close.¶
+In the closing state, an endpoint retains only enough information to generate a +packet containing a CONNECTION_CLOSE frame and to identify packets as belonging +to the connection. An endpoint in the closing state sends a packet containing a +CONNECTION_CLOSE frame in response to any incoming packet that it attributes to +the connection.¶
+An endpoint SHOULD limit the rate at which it generates packets in the closing +state. For instance, an endpoint could wait for a progressively increasing +number of received packets or amount of time before responding to received +packets.¶
+An endpoint's selected connection ID and the QUIC version are sufficient +information to identify packets for a closing connection; the endpoint MAY +discard all other connection state. An endpoint that is closing is not required +to process any received frame. An endpoint MAY retain packet protection keys for +incoming packets to allow it to read and process a CONNECTION_CLOSE frame.¶
+An endpoint MAY drop packet protection keys when entering the closing state and +send a packet containing a CONNECTION_CLOSE frame in response to any UDP +datagram that is received. However, an endpoint that discards packet protection +keys cannot identify and discard invalid packets. To avoid being used for an +amplification attack, such endpoints MUST limit the cumulative size of packets +they send to three times the cumulative size of the packets that are received +and attributed to the connection. To minimize the state that an endpoint +maintains for a closing connection, endpoints MAY send the exact same packet in +response to any received packet.¶
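The closing-state amplification limit above can be sketched as a pair of byte counters (an illustrative sketch; the class shape is hypothetical):

```python
# A closing endpoint without packet protection keys caps what it sends at
# three times what it has received on the connection, and MAY resend the
# exact same CONNECTION_CLOSE packet to minimize retained state.
class ClosingState:
    def __init__(self, close_packet):
        self.close_packet = close_packet
        self.bytes_received = 0
        self.bytes_sent = 0

    def on_datagram(self, size):
        self.bytes_received += size
        # Respond only while under the 3x amplification limit.
        if self.bytes_sent + len(self.close_packet) <= 3 * self.bytes_received:
            self.bytes_sent += len(self.close_packet)
            return self.close_packet
        return None
```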
+Allowing retransmission of a closing packet is an exception to the requirement +that a new packet number be used for each packet in Section 12.3. +Sending new packet numbers is primarily of advantage to loss recovery and +congestion control, which are not expected to be relevant for a closed +connection. Retransmitting the final packet requires less state.¶
+While in the closing state, an endpoint could receive packets from a new source +address, possibly indicating a connection migration; see Section 9. An +endpoint in the closing state MUST either discard packets received from an +unvalidated address or limit the cumulative size of packets it sends to an +unvalidated address to three times the size of packets it receives from that +address.¶
+An endpoint is not expected to handle key updates when it is closing (Section 6 +of [QUIC-TLS]). A key update might prevent the endpoint from moving from the +closing state to the draining state, as the endpoint will not be able to process +subsequently received packets, but it otherwise has no impact.¶
+The draining state is entered once an endpoint receives a CONNECTION_CLOSE +frame, which indicates that its peer is closing or draining. While otherwise +identical to the closing state, an endpoint in the draining state MUST NOT send +any packets. Retaining packet protection keys is unnecessary once a connection +is in the draining state.¶
+An endpoint that receives a CONNECTION_CLOSE frame MAY send a single packet +containing a CONNECTION_CLOSE frame before entering the draining state, using a +NO_ERROR code if appropriate. An endpoint MUST NOT send further packets. Doing +so could result in a constant exchange of CONNECTION_CLOSE frames until one of +the endpoints exits the closing state.¶
+An endpoint MAY enter the draining state from the closing state if it receives a +CONNECTION_CLOSE frame, which indicates that the peer is also closing or +draining. In this case, the draining state ends when the closing state would +have ended. In other words, the endpoint uses the same end time, but ceases +transmission of any packets on this connection.¶
+When sending CONNECTION_CLOSE, the goal is to ensure that the peer will process +the frame. Generally, this means sending the frame in a packet with the highest +level of packet protection to avoid the packet being discarded. After the +handshake is confirmed (see Section 4.1.2 of [QUIC-TLS]), an endpoint MUST +send any CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to +confirming the handshake, it is possible that more advanced packet protection +keys are not available to the peer, so another CONNECTION_CLOSE frame MAY be +sent in a packet that uses a lower packet protection level. More specifically:¶
+Sending a CONNECTION_CLOSE of type 0x1d in an Initial or Handshake packet could +expose application state or be used to alter application state. A +CONNECTION_CLOSE of type 0x1d MUST be replaced by a CONNECTION_CLOSE of type +0x1c when sending the frame in Initial or Handshake packets. Otherwise, +information about the application state might be revealed. Endpoints MUST clear +the value of the Reason Phrase field and SHOULD use the APPLICATION_ERROR code +when converting to a CONNECTION_CLOSE of type 0x1c.¶
+CONNECTION_CLOSE frames sent in multiple packet types can be coalesced into a +single UDP datagram; see Section 12.2.¶
+An endpoint can send a CONNECTION_CLOSE frame in an Initial packet. This might +be in response to unauthenticated information received in Initial or Handshake +packets. Such an immediate close might expose legitimate connections to a +denial of service. QUIC does not include defensive measures for on-path attacks +during the handshake; see Section 21.2. However, at the cost of reducing +feedback about errors for legitimate peers, some forms of denial of service can +be made more difficult for an attacker if endpoints discard illegal packets +rather than terminating a connection with CONNECTION_CLOSE. For this reason, +endpoints MAY discard packets rather than immediately close if errors are +detected in packets that lack authentication.¶
+An endpoint that has not established state, such as a server that detects an +error in an Initial packet, does not enter the closing state. An endpoint that +has no state for the connection does not enter a closing or draining period on +sending a CONNECTION_CLOSE frame.¶
+A stateless reset is provided as an option of last resort for an endpoint that +does not have access to the state of a connection. A crash or outage might +result in peers continuing to send data to an endpoint that is unable to +properly continue the connection. An endpoint MAY send a stateless reset in +response to receiving a packet that it cannot associate with an active +connection.¶
+A stateless reset is not appropriate for indicating errors in active +connections. An endpoint that wishes to communicate a fatal connection error +MUST use a CONNECTION_CLOSE frame if it is able.¶
+To support this process, an endpoint issues a stateless reset token, which is a +16-byte value that is hard to guess. If the peer subsequently receives a +stateless reset, which is a UDP datagram that ends in that stateless reset +token, the peer will immediately end the connection.¶
+A stateless reset token is specific to a connection ID. An endpoint issues a stateless reset token by including the value in the Stateless Reset Token field of a NEW_CONNECTION_ID frame. A server can also issue a stateless_reset_token transport parameter during the handshake that applies to the connection ID that it selected during the handshake. These exchanges are protected by encryption, so only the client and server know their value. Note that clients cannot use the stateless_reset_token transport parameter because their transport parameters do not have confidentiality protection.¶
+Tokens are invalidated when their associated connection ID is retired via a +RETIRE_CONNECTION_ID frame (Section 19.16).¶
+An endpoint that receives packets that it cannot process sends a packet in the following layout (see Section 1.3):¶
+Stateless Reset {
+  Fixed Bits (2) = 1,
+  Unpredictable Bits (38..),
+  Stateless Reset Token (128),
+}¶
+This design ensures that a stateless reset packet is - to the extent possible - +indistinguishable from a regular packet with a short header.¶
+A stateless reset uses an entire UDP datagram, starting with the first two bits +of the packet header. The remainder of the first byte and an arbitrary number +of bytes following it are set to values that SHOULD be indistinguishable +from random. The last 16 bytes of the datagram contain a Stateless Reset Token.¶
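As an illustration only (not part of this specification), the construction above can be sketched in Python. The 42-byte default length and the use of `os.urandom` are assumptions of this sketch:

```python
import os

def build_stateless_reset(token: bytes, total_len: int = 42) -> bytes:
    """Build a stateless reset datagram (sketch of the layout above).

    The datagram starts with the two fixed header bits set to 01 (short
    header form plus the fixed bit), continues with bytes that should be
    indistinguishable from random, and ends with the 16-byte token.
    """
    assert len(token) == 16 and total_len >= 21
    unpredictable = bytearray(os.urandom(total_len - 16))
    unpredictable[0] = 0x40 | (unpredictable[0] & 0x3F)  # top two bits -> 01
    return bytes(unpredictable) + token
```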
+To entities other than its intended recipient, a stateless reset will appear to +be a packet with a short header. For the stateless reset to appear as a valid +QUIC packet, the Unpredictable Bits field needs to include at least 38 bits of +data (or 5 bytes, less the two fixed bits).¶
+The resulting minimum size of 21 bytes does not guarantee that a stateless reset +is difficult to distinguish from other packets if the recipient requires the use +of a connection ID. To achieve that end, the endpoint SHOULD ensure that all +packets it sends are at least 22 bytes longer than the minimum connection ID +length that it requests the peer to include in its packets, adding PADDING +frames as necessary. This ensures that any stateless reset sent by the peer +is indistinguishable from a valid packet sent to the endpoint. An endpoint that +sends a stateless reset in response to a packet that is 43 bytes or shorter +SHOULD send a stateless reset that is one byte shorter than the packet it +responds to.¶
+These values assume that the Stateless Reset Token is the same length as the +minimum expansion of the packet protection AEAD. Additional unpredictable bytes +are necessary if the endpoint could have negotiated a packet protection scheme +with a larger minimum expansion.¶
+An endpoint MUST NOT send a stateless reset that is three times or more larger +than the packet it receives to avoid being used for amplification. +Section 10.3.3 describes additional limits on stateless reset size.¶
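The sizing rules above can be combined into a small helper. This is a sketch; the 42-byte default used for larger triggering packets is an arbitrary assumption, and any value below three times the received size would satisfy the amplification limit:

```python
def stateless_reset_len(received_len: int, min_len: int = 21):
    """Pick a stateless reset size within the limits described above.

    - one byte shorter than triggering packets of 43 bytes or less
      (loop avoidance)
    - never three times or more the received packet (amplification limit)
    - at least 21 bytes, or no plausible reset can be sent at all
    """
    if received_len <= 43:
        size = received_len - 1
    else:
        size = 42  # assumed default; must stay below 3 * received_len
    if size < min_len or size >= 3 * received_len:
        return None  # too small to look like a valid packet; drop instead
    return size
```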
+Endpoints MUST discard packets that are too small to be valid QUIC packets. To +give an example, with the set of AEAD functions defined in [QUIC-TLS], short +header packets that are smaller than 21 bytes are never valid.¶
+Endpoints MUST send stateless reset packets formatted as a packet with a short +header. However, endpoints MUST treat any packet ending in a valid stateless +reset token as a stateless reset, as other QUIC versions might allow the use of +a long header.¶
+An endpoint MAY send a stateless reset in response to a packet with a long +header. Sending a stateless reset is not effective prior to the stateless reset +token being available to a peer. In this QUIC version, packets with a long +header are only used during connection establishment. Because the stateless +reset token is not available until connection establishment is complete or near +completion, ignoring an unknown packet with a long header might be as effective +as sending a stateless reset.¶
+An endpoint cannot determine the Source Connection ID from a packet with a short header; therefore, it cannot set the Destination Connection ID in the stateless reset packet. The Destination Connection ID will consequently differ from the value used in previous packets. A random Destination Connection ID makes the connection ID appear to be the result of moving to a new connection ID that was provided using a NEW_CONNECTION_ID frame (Section 19.15).¶
+Using a randomized connection ID results in two problems:¶
+The packet might not reach the peer. If the Destination Connection ID is critical for routing toward the peer, then this packet could be incorrectly routed. This might also trigger another stateless reset in response; see Section 10.3.3. A stateless reset that is not correctly routed is an ineffective error detection and recovery mechanism. In this case, endpoints will need to rely on other methods, such as timers, to detect that the connection has failed.¶
+The randomly generated connection ID can be used by entities other than the peer to identify this as a potential stateless reset. An endpoint that occasionally uses different connection IDs might introduce some uncertainty about this.¶
+This stateless reset design is specific to QUIC version 1. An endpoint that +supports multiple versions of QUIC needs to generate a stateless reset that will +be accepted by peers that support any version that the endpoint might support +(or might have supported prior to losing state). Designers of new versions of +QUIC need to be aware of this and either reuse this design, or use a portion of +the packet other than the last 16 bytes for carrying data.¶
+An endpoint detects a potential stateless reset using the trailing 16 bytes of +the UDP datagram. An endpoint remembers all Stateless Reset Tokens associated +with the connection IDs and remote addresses for datagrams it has recently sent. +This includes Stateless Reset Tokens from NEW_CONNECTION_ID frames and the +server's transport parameters but excludes Stateless Reset Tokens associated +with connection IDs that are either unused or retired. The endpoint identifies +a received datagram as a stateless reset by comparing the last 16 bytes of the +datagram with all Stateless Reset Tokens associated with the remote address on +which the datagram was received.¶
+This comparison can be performed for every inbound datagram. Endpoints MAY skip +this check if any packet from a datagram is successfully processed. However, +the comparison MUST be performed when the first packet in an incoming datagram +either cannot be associated with a connection, or cannot be decrypted.¶
+An endpoint MUST NOT check for any Stateless Reset Tokens associated with +connection IDs it has not used or for connection IDs that have been retired.¶
+When comparing a datagram to Stateless Reset Token values, endpoints MUST +perform the comparison without leaking information about the value of the token. +For example, performing this comparison in constant time protects the value of +individual Stateless Reset Tokens from information leakage through timing side +channels. Another approach would be to store and compare the transformed values +of Stateless Reset Tokens instead of the raw token values, where the +transformation is defined as a cryptographically-secure pseudo-random function +using a secret key (e.g., block cipher, HMAC [RFC2104]). An endpoint is not +expected to protect information about whether a packet was successfully +decrypted, or the number of valid Stateless Reset Tokens.¶
+If the last 16 bytes of the datagram are identical in value to a Stateless Reset +Token, the endpoint MUST enter the draining period and not send any further +packets on this connection.¶
+The stateless reset token MUST be difficult to guess. In order to create a +Stateless Reset Token, an endpoint could randomly generate ([RANDOM]) +a secret for every connection that it creates. However, this presents a +coordination problem when there are multiple instances in a cluster or a storage +problem for an endpoint that might lose state. Stateless reset specifically +exists to handle the case where state is lost, so this approach is suboptimal.¶
+A single static key can be used across all connections to the same endpoint by +generating the proof using a pseudorandom function that takes a static key and +the connection ID chosen by the endpoint (see Section 5.1) as input. An +endpoint could use HMAC [RFC2104] (for example, HMAC(static_key, +connection_id)) or HKDF [RFC5869] (for example, using the static key as input +keying material, with the connection ID as salt). The output of this function +is truncated to 16 bytes to produce the Stateless Reset Token for that +connection.¶
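The static-key construction described above can be sketched directly with HMAC. The choice of SHA-256 here is an assumption of this sketch; the text does not mandate a specific hash:

```python
import hashlib
import hmac

def stateless_reset_token(static_key: bytes, connection_id: bytes) -> bytes:
    """Derive a 16-byte Stateless Reset Token from a static key and a
    connection ID: HMAC output truncated to 16 bytes, as suggested above."""
    return hmac.new(static_key, connection_id, hashlib.sha256).digest()[:16]
```

An endpoint that has lost state can recompute the same token from the connection ID in a received packet, which is the property this design relies on.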
+An endpoint that loses state can use the same method to generate a valid +Stateless Reset Token. The connection ID comes from the packet that the +endpoint receives.¶
+This design relies on the peer always sending a connection ID in its packets so +that the endpoint can use the connection ID from a packet to reset the +connection. An endpoint that uses this design MUST either use the same +connection ID length for all connections or encode the length of the connection +ID such that it can be recovered without state. In addition, it cannot provide +a zero-length connection ID.¶
+Revealing the Stateless Reset Token allows any entity to terminate the +connection, so a value can only be used once. This method for choosing the +Stateless Reset Token means that the combination of connection ID and static key +MUST NOT be used for another connection. A denial of service attack is possible +if the same connection ID is used by instances that share a static key, or if an +attacker can cause a packet to be routed to an instance that has no state but +the same static key; see Section 21.11. A connection ID from a connection +that is reset by revealing the Stateless Reset Token MUST NOT be reused for new +connections at nodes that share a static key.¶
+The same Stateless Reset Token MUST NOT be used for multiple connection IDs. +Endpoints are not required to compare new values against all previous values, +but a duplicate value MAY be treated as a connection error of type +PROTOCOL_VIOLATION.¶
+Note that Stateless Reset packets do not have any cryptographic protection.¶
+The design of a Stateless Reset is such that without knowing the stateless reset token it is indistinguishable from a valid packet. For instance, if a server sends a Stateless Reset to another server, it might receive another Stateless Reset in response, which could lead to an infinite exchange.¶
+An endpoint MUST ensure that every Stateless Reset that it sends is smaller than +the packet that triggered it, unless it maintains state sufficient to prevent +looping. In the event of a loop, this results in packets eventually being too +small to trigger a response.¶
+An endpoint can remember the number of Stateless Reset packets that it has sent +and stop generating new Stateless Reset packets once a limit is reached. Using +separate limits for different remote addresses will ensure that Stateless Reset +packets can be used to close connections when other peers or connections have +exhausted limits.¶
+Reducing the size of a Stateless Reset below 41 bytes means that the packet +could reveal to an observer that it is a Stateless Reset, depending upon the +length of the peer's connection IDs. Conversely, refusing to send a Stateless +Reset in response to a small packet might result in Stateless Reset not being +useful in detecting cases of broken connections where only very small packets +are sent; such failures might only be detected by other means, such as timers.¶
+An endpoint that detects an error SHOULD signal the existence of that error to +its peer. Both transport-level and application-level errors can affect an +entire connection; see Section 11.1. Only application-level +errors can be isolated to a single stream; see Section 11.2.¶
+The most appropriate error code (Section 20) SHOULD be included in the +frame that signals the error. Where this specification identifies error +conditions, it also identifies the error code that is used; though these are +worded as requirements, different implementation strategies might lead to +different errors being reported. In particular, an endpoint MAY use any +applicable error code when it detects an error condition; a generic error code +(such as PROTOCOL_VIOLATION or INTERNAL_ERROR) can always be used in place of +specific error codes.¶
+A stateless reset (Section 10.3) is not suitable for any error that can +be signaled with a CONNECTION_CLOSE or RESET_STREAM frame. A stateless reset +MUST NOT be used by an endpoint that has the state necessary to send a frame on +the connection.¶
+Errors that result in the connection being unusable, such as an obvious +violation of protocol semantics or corruption of state that affects an entire +connection, MUST be signaled using a CONNECTION_CLOSE frame +(Section 19.19).¶
+Application-specific protocol errors are signaled using the CONNECTION_CLOSE +frame with a frame type of 0x1d. Errors that are specific to the transport, +including all those described in this document, are carried in the +CONNECTION_CLOSE frame with a frame type of 0x1c.¶
+A CONNECTION_CLOSE frame could be sent in a packet that is lost. An endpoint +SHOULD be prepared to retransmit a packet containing a CONNECTION_CLOSE frame if +it receives more packets on a terminated connection. Limiting the number of +retransmissions and the time over which this final packet is sent limits the +effort expended on terminated connections.¶
+An endpoint that chooses not to retransmit packets containing a CONNECTION_CLOSE +frame risks a peer missing the first such packet. The only mechanism available +to an endpoint that continues to receive data for a terminated connection is to +attempt the stateless reset process (Section 10.3).¶
+As the AEAD on Initial packets does not provide strong authentication, an +endpoint MAY discard an invalid Initial packet. Discarding an Initial packet is +permitted even where this specification otherwise mandates a connection error. +An endpoint can only discard a packet if it does not process the frames in the +packet or reverts the effects of any processing. Discarding invalid Initial +packets might be used to reduce exposure to denial of service; see +Section 21.2.¶
+If an application-level error affects a single stream, but otherwise leaves the +connection in a recoverable state, the endpoint can send a RESET_STREAM frame +(Section 19.4) with an appropriate error code to terminate just the +affected stream.¶
+Resetting a stream without the involvement of the application protocol could +cause the application protocol to enter an unrecoverable state. RESET_STREAM +MUST only be instigated by the application protocol that uses QUIC.¶
+The semantics of the application error code carried in RESET_STREAM are +defined by the application protocol. Only the application protocol is able to +cause a stream to be terminated. A local instance of the application protocol +uses a direct API call and a remote instance uses the STOP_SENDING frame, which +triggers an automatic RESET_STREAM.¶
+Application protocols SHOULD define rules for handling streams that are +prematurely cancelled by either endpoint.¶
+QUIC endpoints communicate by exchanging packets. Packets have confidentiality +and integrity protection; see Section 12.1. Packets are carried in UDP +datagrams; see Section 12.2.¶
+This version of QUIC uses the long packet header during connection +establishment; see Section 17.2. Packets with the long header are Initial +(Section 17.2.2), 0-RTT (Section 17.2.3), Handshake (Section 17.2.4), +and Retry (Section 17.2.5). Version negotiation uses a version-independent +packet with a long header; see Section 17.2.1.¶
+Packets with the short header are designed for minimal overhead and are used +after a connection is established and 1-RTT keys are available; see +Section 17.3.¶
+QUIC packets have different levels of cryptographic protection based on the +type of packet. Details of packet protection are found in [QUIC-TLS]; this +section includes an overview of the protections that are provided.¶
+Version Negotiation packets have no cryptographic protection; see +[QUIC-INVARIANTS].¶
+Retry packets use an authenticated encryption with associated data function +(AEAD; [AEAD]) to protect against accidental modification.¶
+Initial packets use an AEAD, the keys for which are derived using a value that +is visible on the wire. Initial packets therefore do not have effective +confidentiality protection. Initial protection exists to ensure that the sender +of the packet is on the network path. Any entity that receives an Initial packet +from a client can recover the keys that will allow them to both read the +contents of the packet and generate Initial packets that will be successfully +authenticated at either endpoint. The AEAD also protects Initial packets +against accidental modification.¶
+All other packets are protected with keys derived from the cryptographic +handshake. The cryptographic handshake ensures that only the communicating +endpoints receive the corresponding keys for Handshake, 0-RTT, and 1-RTT +packets. Packets protected with 0-RTT and 1-RTT keys have strong +confidentiality and integrity protection.¶
+The Packet Number field that appears in some packet types has alternative +confidentiality protection that is applied as part of header protection; see +Section 5.4 of [QUIC-TLS] for details. The underlying packet number increases +with each packet sent in a given packet number space; see Section 12.3 for +details.¶
+Initial (Section 17.2.2), 0-RTT (Section 17.2.3), and Handshake +(Section 17.2.4) packets contain a Length field that determines the end +of the packet. The length includes both the Packet Number and Payload +fields, both of which are confidentiality protected and initially of unknown +length. The length of the Payload field is learned once header protection is +removed.¶
+Using the Length field, a sender can coalesce multiple QUIC packets into one UDP +datagram. This can reduce the number of UDP datagrams needed to complete the +cryptographic handshake and start sending data. This can also be used to +construct PMTU probes; see Section 14.4.1. Receivers MUST be able to +process coalesced packets.¶
+Coalescing packets in order of increasing encryption levels (Initial, 0-RTT, +Handshake, 1-RTT; see Section 4.1.4 of [QUIC-TLS]) makes it more likely the +receiver will be able to process all the packets in a single pass. A packet +with a short header does not include a length, so it can only be the last +packet included in a UDP datagram. An endpoint SHOULD include multiple frames +in a single packet if they are to be sent at the same encryption level, instead +of coalescing multiple packets at the same encryption level.¶
+Receivers MAY route based on the information in the first packet contained in a +UDP datagram. Senders MUST NOT coalesce QUIC packets with different connection +IDs into a single UDP datagram. Receivers SHOULD ignore any subsequent packets +with a different Destination Connection ID than the first packet in the +datagram.¶
+Every QUIC packet that is coalesced into a single UDP datagram is separate and +complete. The receiver of coalesced QUIC packets MUST individually process each +QUIC packet and separately acknowledge them, as if they were received as the +payload of different UDP datagrams. For example, if decryption fails (because +the keys are not available or any other reason), the receiver MAY either discard +or buffer the packet for later processing and MUST attempt to process the +remaining packets.¶
+Retry packets (Section 17.2.5), Version Negotiation packets +(Section 17.2.1), and packets with a short header (Section 17.3) do not +contain a Length field and so cannot be followed by other packets in the same +UDP datagram. Note also that there is no situation where a Retry or Version +Negotiation packet is coalesced with another packet.¶
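A simplified sketch of splitting a coalesced datagram using the Length field follows. Retry and Version Negotiation handling is omitted, and the parsing is illustrative rather than complete:

```python
def read_varint(buf: bytes, i: int):
    """Decode a variable-length integer (Section 16) at offset i."""
    length = 1 << (buf[i] >> 6)
    value = buf[i] & 0x3F
    for b in buf[i + 1:i + length]:
        value = (value << 8) | b
    return value, i + length

def split_coalesced(datagram: bytes):
    """Split a UDP datagram into coalesced QUIC packets (sketch).

    Long-header packets carry a Length field covering the Packet Number
    and Payload; a short-header packet has no Length field and therefore
    consumes the remainder of the datagram.
    """
    packets, i = [], 0
    while i < len(datagram):
        start, first = i, datagram[i]
        if first & 0x80 == 0:            # short header: last packet
            packets.append(datagram[start:])
            break
        i += 5                           # flags (1) + version (4)
        dcil = datagram[i]; i += 1 + dcil    # Destination Connection ID
        scil = datagram[i]; i += 1 + scil    # Source Connection ID
        if (first & 0x30) >> 4 == 0:     # Initial: Token Length + Token
            tlen, i = read_varint(datagram, i)
            i += tlen
        length, i = read_varint(datagram, i) # Packet Number + Payload
        i += length
        packets.append(datagram[start:i])
    return packets
```

Each extracted packet would then be processed and acknowledged individually, as the text above requires.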
+The packet number is an integer in the range 0 to 2^62-1. This number is used +in determining the cryptographic nonce for packet protection. Each endpoint +maintains a separate packet number for sending and receiving.¶
+Packet numbers are limited to this range because they need to be representable in whole in the Largest Acknowledged field of an ACK frame (Section 19.3). When present in a long or short header, however, packet numbers are reduced and encoded in 1 to 4 bytes; see Section 17.1.¶
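A sketch of recovering a full packet number from its reduced encoding, following the approach referenced in Section 17.1: select the candidate closest to one more than the largest packet number received so far.

```python
def decode_packet_number(largest_pn: int, truncated_pn: int,
                         pn_nbits: int) -> int:
    """Expand a truncated packet number (sketch, not normative)."""
    expected = largest_pn + 1
    pn_win = 1 << pn_nbits          # range covered by the truncated field
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    candidate = (expected & ~pn_mask) | truncated_pn
    if candidate <= expected - pn_hwin and candidate < (1 << 62) - pn_win:
        return candidate + pn_win   # candidate fell a full window behind
    if candidate > expected + pn_hwin and candidate >= pn_win:
        return candidate - pn_win   # candidate jumped a full window ahead
    return candidate
```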
+Version Negotiation (Section 17.2.1) and Retry (Section 17.2.5) packets +do not include a packet number.¶
+Packet numbers are divided into three spaces in QUIC:¶
+Initial space: All Initial packets (Section 17.2.2) are in this space.¶
+Handshake space: All Handshake packets (Section 17.2.4) are in this space.¶
+Application data space: All 0-RTT (Section 17.2.3) and 1-RTT (Section 17.3.1) packets are in this space.¶
+As described in [QUIC-TLS], each packet type uses different protection keys.¶
+Conceptually, a packet number space is the context in which a packet can be +processed and acknowledged. Initial packets can only be sent with Initial +packet protection keys and acknowledged in packets that are also Initial +packets. Similarly, Handshake packets are sent at the Handshake encryption +level and can only be acknowledged in Handshake packets.¶
+This enforces cryptographic separation between the data sent in the different +packet number spaces. Packet numbers in each space start at packet number 0. +Subsequent packets sent in the same packet number space MUST increase the packet +number by at least one.¶
+0-RTT and 1-RTT data exist in the same packet number space to make loss recovery +algorithms easier to implement between the two packet types.¶
+A QUIC endpoint MUST NOT reuse a packet number within the same packet number +space in one connection. If the packet number for sending reaches 2^62 - 1, the +sender MUST close the connection without sending a CONNECTION_CLOSE frame or any +further packets; an endpoint MAY send a Stateless Reset (Section 10.3) in +response to further packets that it receives.¶
+A receiver MUST discard a newly unprotected packet unless it is certain that it +has not processed another packet with the same packet number from the same +packet number space. Duplicate suppression MUST happen after removing packet +protection for the reasons described in Section 9.5 of [QUIC-TLS].¶
+Endpoints that track all individual packets for the purposes of detecting +duplicates are at risk of accumulating excessive state. The data required for +detecting duplicates can be limited by maintaining a minimum packet number below +which all packets are immediately dropped. Any minimum needs to account for +large variations in round trip time, which includes the possibility that a peer +might probe network paths with much larger round trip times; see Section 9.¶
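The bounded duplicate-detection idea above can be sketched with a minimum packet number (a floor) plus a small set of recently seen numbers; the window size is an arbitrary assumption of this sketch:

```python
class DuplicateFilter:
    """Bounded duplicate detection for one packet number space (sketch).

    Packet numbers at or below `floor` are rejected immediately, so only
    a bounded set of recently seen numbers needs to be remembered.
    """
    def __init__(self, window: int = 1024):
        self.window = window
        self.floor = -1        # everything <= floor is dropped outright
        self.seen = set()

    def accept(self, pn: int) -> bool:
        if pn <= self.floor or pn in self.seen:
            return False       # duplicate or below the retained window
        self.seen.add(pn)
        new_floor = max(self.seen) - self.window
        if new_floor > self.floor:
            self.floor = new_floor
            self.seen = {p for p in self.seen if p > self.floor}
        return True
```

As the text notes, the floor must be chosen conservatively enough to accommodate large round-trip-time variations across paths.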
+Packet number encoding at a sender and decoding at a receiver are described in +Section 17.1.¶
+The payload of QUIC packets, after removing packet protection, consists of a +sequence of complete frames, as shown in Figure 11. Version +Negotiation, Stateless Reset, and Retry packets do not contain frames.¶
+The payload of a packet that contains frames MUST contain at least one frame, +and MAY contain multiple frames and multiple frame types. An endpoint MUST +treat receipt of a packet containing no frames as a connection error of type +PROTOCOL_VIOLATION. Frames always fit within a single QUIC packet and cannot +span multiple packets.¶
+Each frame begins with a Frame Type, indicating its type, followed by additional type-dependent fields:¶
+Frame {
+  Frame Type (i),
+  Type-Dependent Fields (..),
+}¶
+Table 3 lists and summarizes information about each frame type that is +defined in this specification. A description of this summary is included after +the table.¶
+Type Value | Frame Type Name | Definition | Pkts | Spec
+---|---|---|---|---
+0x00 | PADDING | Section 19.1 | IH01 | NP
+0x01 | PING | Section 19.2 | IH01 |
+0x02 - 0x03 | ACK | Section 19.3 | IH_1 | NC
+0x04 | RESET_STREAM | Section 19.4 | __01 |
+0x05 | STOP_SENDING | Section 19.5 | __01 |
+0x06 | CRYPTO | Section 19.6 | IH_1 |
+0x07 | NEW_TOKEN | Section 19.7 | ___1 |
+0x08 - 0x0f | STREAM | Section 19.8 | __01 | F
+0x10 | MAX_DATA | Section 19.9 | __01 |
+0x11 | MAX_STREAM_DATA | Section 19.10 | __01 |
+0x12 - 0x13 | MAX_STREAMS | Section 19.11 | __01 |
+0x14 | DATA_BLOCKED | Section 19.12 | __01 |
+0x15 | STREAM_DATA_BLOCKED | Section 19.13 | __01 |
+0x16 - 0x17 | STREAMS_BLOCKED | Section 19.14 | __01 |
+0x18 | NEW_CONNECTION_ID | Section 19.15 | __01 | P
+0x19 | RETIRE_CONNECTION_ID | Section 19.16 | __01 |
+0x1a | PATH_CHALLENGE | Section 19.17 | __01 | P
+0x1b | PATH_RESPONSE | Section 19.18 | ___1 | P
+0x1c - 0x1d | CONNECTION_CLOSE | Section 19.19 | ih01 | N
+0x1e | HANDSHAKE_DONE | Section 19.20 | ___1 |
The format and semantics of each frame type are explained in more detail in +Section 19. The remainder of this section provides a summary of +important and general information.¶
+The Frame Type in ACK, STREAM, MAX_STREAMS, STREAMS_BLOCKED, and +CONNECTION_CLOSE frames is used to carry other frame-specific flags. For all +other frames, the Frame Type field simply identifies the frame.¶
+The "Pkts" column in Table 3 lists the types of packets that each frame +type could appear in, indicated by the following characters:¶
+I: Initial (Section 17.2.2)¶
+H: Handshake (Section 17.2.4)¶
+0: 0-RTT (Section 17.2.3)¶
+1: 1-RTT (Section 17.3.1)¶
+Only a CONNECTION_CLOSE frame of type 0x1c can appear in Initial or Handshake +packets.¶
+For more detail about these restrictions, see Section 12.5. Note +that all frames can appear in 1-RTT packets. An endpoint MUST treat receipt of +a frame in a packet type that is not permitted as a connection error of type +PROTOCOL_VIOLATION.¶
+The "Spec" column in Table 3 summarizes any special rules governing the +processing or generation of the frame type, as indicated by the following +characters:¶
+N: Packets containing only frames with this marking are not ack-eliciting; see Section 13.2.¶
+C: Packets containing only frames with this marking do not count toward bytes in flight for congestion control purposes; see [QUIC-RECOVERY].¶
+P: Packets containing only frames with this marking can be used to probe new network paths during connection migration; see Section 9.1.¶
+F: The contents of frames with this marking are flow controlled; see Section 4.¶
+The "Pkts" and "Spec" columns in Table 3 do not form part of the IANA +registry; see Section 22.4.¶
+An endpoint MUST treat the receipt of a frame of unknown type as a connection +error of type FRAME_ENCODING_ERROR.¶
+All frames are idempotent in this version of QUIC. That is, a valid frame does +not cause undesirable side effects or errors when received more than once.¶
+The Frame Type field uses a variable-length integer encoding (see +Section 16) with one exception. To ensure simple and efficient +implementations of frame parsing, a frame type MUST use the shortest possible +encoding. For frame types defined in this document, this means a single-byte +encoding, even though it is possible to encode these values as a two-, four- +or eight-byte variable-length integer. For instance, though 0x4001 is +a legitimate two-byte encoding for a variable-length integer with a value +of 1, PING frames are always encoded as a single byte with the value 0x01. +This rule applies to all current and future QUIC frame types. An endpoint +MAY treat the receipt of a frame type that uses a longer encoding than +necessary as a connection error of type PROTOCOL_VIOLATION.¶
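The shortest-encoding rule can be checked mechanically. The helpers below sketch the Section 16 variable-length integer encoding and a receiver-side check; they are illustrative, not normative:

```python
def encode_varint(value: int) -> bytes:
    """Encode a variable-length integer (Section 16), shortest form only.

    The top two bits of the first byte select a 1-, 2-, 4-, or 8-byte
    encoding; the remaining bits carry the value in network byte order.
    """
    for prefix, nbytes in ((0b00, 1), (0b01, 2), (0b10, 4), (0b11, 8)):
        if value < 1 << (8 * nbytes - 2):
            raw = value.to_bytes(nbytes, "big")
            return bytes([raw[0] | (prefix << 6)]) + raw[1:]
    raise ValueError("value exceeds 2^62 - 1")

def is_shortest_encoding(encoded: bytes) -> bool:
    """True if a received varint used the minimal number of bytes."""
    nbytes = 1 << (encoded[0] >> 6)
    value = int.from_bytes(bytes([encoded[0] & 0x3F]) + encoded[1:nbytes],
                           "big")
    return encode_varint(value) == encoded[:nbytes]
```

For the PING example above, the two-byte sequence 0x40 0x01 decodes to 1 but fails the shortest-encoding check, which an endpoint MAY treat as PROTOCOL_VIOLATION.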
+Some frames are prohibited in different packet number spaces. The rules here generalize those of TLS, in that frames associated with establishing the connection can usually appear in packets in any packet number space, whereas those associated with transferring data can only appear in the application data packet number space:¶
+PADDING, PING, and CRYPTO frames MAY appear in any packet number space.¶
+CONNECTION_CLOSE frames signaling errors at the QUIC layer (type 0x1c) MAY appear in any packet number space. CONNECTION_CLOSE frames signaling application errors (type 0x1d) MUST only appear in the application data packet number space.¶
+ACK frames MAY appear in any packet number space, but can only acknowledge packets that appeared in that packet number space. However, as noted below, 0-RTT packets cannot contain ACK frames.¶
+All other frame types MUST only be sent in the application data packet number space.¶
+Note that it is not possible to send the following frames in 0-RTT packets for +various reasons: ACK, CRYPTO, HANDSHAKE_DONE, NEW_TOKEN, PATH_RESPONSE, and +RETIRE_CONNECTION_ID. A server MAY treat receipt of these frames in 0-RTT +packets as a connection error of type PROTOCOL_VIOLATION.¶
+A sender sends one or more frames in a QUIC packet; see Section 12.4.¶
+A sender can minimize per-packet bandwidth and computational costs by including +as many frames as possible in each QUIC packet. A sender MAY wait for a short +period of time to collect multiple frames before sending a packet that is not +maximally packed, to avoid sending out large numbers of small packets. An +implementation MAY use knowledge about application sending behavior or +heuristics to determine whether and for how long to wait. This waiting period +is an implementation decision, and an implementation should be careful to delay +conservatively, since any delay is likely to increase application-visible +latency.¶
+Stream multiplexing is achieved by interleaving STREAM frames from multiple +streams into one or more QUIC packets. A single QUIC packet can include +multiple STREAM frames from one or more streams.¶
+One of the benefits of QUIC is avoidance of head-of-line blocking across +multiple streams. When a packet loss occurs, only streams with data in that +packet are blocked waiting for a retransmission to be received, while other +streams can continue making progress. Note that when data from multiple streams +is included in a single QUIC packet, loss of that packet blocks all those +streams from making progress. Implementations are advised to include as few +streams as necessary in outgoing packets without losing transmission efficiency +to underfilled packets.¶
+A packet MUST NOT be acknowledged until packet protection has been successfully +removed and all frames contained in the packet have been processed. For STREAM +frames, this means the data has been enqueued in preparation to be received by +the application protocol, but it does not require that data is delivered and +consumed.¶
+Once the packet has been fully processed, a receiver acknowledges receipt by +sending one or more ACK frames containing the packet number of the received +packet.¶
+An endpoint SHOULD treat receipt of an acknowledgment for a packet it did not +send as a connection error of type PROTOCOL_VIOLATION, if it is able to detect +the condition. Further discussion of how this might be achieved is in +Section 21.4.¶
+Endpoints acknowledge all packets they receive and process. However, only +ack-eliciting packets cause an ACK frame to be sent within the maximum ack +delay. Packets that are not ack-eliciting are only acknowledged when an ACK +frame is sent for other reasons.¶
+When sending a packet for any reason, an endpoint SHOULD attempt to include an +ACK frame if one has not been sent recently. Doing so helps with timely loss +detection at the peer.¶
+In general, frequent feedback from a receiver improves loss and congestion +response, but this has to be balanced against excessive load generated by a +receiver that sends an ACK frame in response to every ack-eliciting packet. The +guidance offered below seeks to strike this balance.¶
+Every packet SHOULD be acknowledged at least once, and ack-eliciting packets +MUST be acknowledged at least once within the maximum delay an endpoint +communicated using the max_ack_delay transport parameter; see +Section 18.2. max_ack_delay declares an explicit +contract: an endpoint promises to never intentionally delay acknowledgments of +an ack-eliciting packet by more than the indicated value. If it does, any excess +accrues to the RTT estimate and could result in spurious or delayed +retransmissions from the peer. A sender uses the receiver's max_ack_delay value +in determining timeouts for timer-based retransmission, as detailed in Section +6.2 of [QUIC-RECOVERY].¶
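The interaction between max_ack_delay and timer-based retransmission can be sketched as follows (an illustration of the PTO computation in Section 6.2 of [QUIC-RECOVERY]; the function and parameter names are ours, and the timer granularity is an assumed constant):

```python
def probe_timeout(smoothed_rtt, rttvar, max_ack_delay, granularity=0.001):
    """Probe timeout (PTO), sketched from Section 6.2 of QUIC-RECOVERY.
    The peer's max_ack_delay is added so that acknowledgments the peer
    intentionally delays do not trigger spurious timer-based probes."""
    return smoothed_rtt + max(4 * rttvar, granularity) + max_ack_delay

# With a 50 ms smoothed RTT, 10 ms variance, and a peer max_ack_delay of
# 25 ms, the timer fires no earlier than 115 ms after the last
# ack-eliciting packet was sent.
```

If the peer delays acknowledgments beyond its declared max_ack_delay, the excess is indistinguishable from path RTT, inflating smoothed_rtt and delaying this timer.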
+An endpoint MUST acknowledge all ack-eliciting Initial and Handshake packets +immediately and all ack-eliciting 0-RTT and 1-RTT packets within its advertised +max_ack_delay, with the following exception. Prior to handshake confirmation, an +endpoint might not have packet protection keys for decrypting Handshake, 0-RTT, +or 1-RTT packets when they are received. It might therefore buffer them and +acknowledge them when the requisite keys become available.¶
+Since packets containing only ACK frames are not congestion controlled, an +endpoint MUST NOT send more than one such packet in response to receiving an +ack-eliciting packet.¶
+An endpoint MUST NOT send a non-ack-eliciting packet in response to a +non-ack-eliciting packet, even if there are packet gaps that precede the +received packet. This avoids an infinite feedback loop of acknowledgments, +which could prevent the connection from ever becoming idle. Non-ack-eliciting +packets are eventually acknowledged when the endpoint sends an ACK frame in +response to other events.¶
+In order to assist loss detection at the sender, an endpoint SHOULD generate and send an ACK frame without delay when it receives an ack-eliciting packet either: when the received packet has a packet number less than another ack-eliciting packet that has been received, or when the packet has a packet number larger than the highest-numbered ack-eliciting packet that has been received and there are missing packets between that packet and this packet.¶
+Similarly, packets marked with the ECN Congestion Experienced (CE) codepoint in +the IP header SHOULD be acknowledged immediately, to reduce the peer's response +time to congestion events.¶
+The algorithms in [QUIC-RECOVERY] are expected to be resilient to receivers +that do not follow the guidance offered above. However, an implementation +should only deviate from these requirements after careful consideration of the +performance implications of a change, for connections made by the endpoint and +for other users of the network.¶
+An endpoint that is only sending ACK frames will not receive acknowledgments +from its peer unless those acknowledgments are included in packets with +ack-eliciting frames. An endpoint SHOULD send an ACK frame with other frames +when there are new ack-eliciting packets to acknowledge. When only +non-ack-eliciting packets need to be acknowledged, an endpoint MAY wait until an +ack-eliciting packet has been received to include an ACK frame with outgoing +frames.¶
+A receiver MUST NOT send an ack-eliciting frame in all packets that would +otherwise be non-ack-eliciting, to avoid an infinite feedback loop of +acknowledgments.¶
+A receiver determines how frequently to send acknowledgments in response to +ack-eliciting packets. This determination involves a trade-off.¶
+Endpoints rely on timely acknowledgment to detect loss; see Section 6 of +[QUIC-RECOVERY]. Window-based congestion controllers, such as the one in +Section 7 of [QUIC-RECOVERY], rely on acknowledgments to manage their +congestion window. In both cases, delaying acknowledgments can adversely affect +performance.¶
+On the other hand, reducing the frequency of packets that carry only +acknowledgments reduces packet transmission and processing cost at both +endpoints. It can improve connection throughput on severely asymmetric links +and reduce the volume of acknowledgment traffic using return path capacity; +see Section 3 of [RFC3449].¶
+A receiver SHOULD send an ACK frame after receiving at least two ack-eliciting +packets. This recommendation is general in nature and consistent with +recommendations for TCP endpoint behavior [RFC5681]. Knowledge of network +conditions, knowledge of the peer's congestion controller, or further research +and experimentation might suggest alternative acknowledgment strategies with +better performance characteristics.¶
+A receiver MAY process multiple available packets before determining whether to +send an ACK frame in response.¶
+When an ACK frame is sent, one or more ranges of acknowledged packets are +included. Including acknowledgments for older packets reduces the chance of +spurious retransmissions caused by losing previously sent ACK frames, at the +cost of larger ACK frames.¶
+ACK frames SHOULD always acknowledge the most recently received packets, and the +more out-of-order the packets are, the more important it is to send an updated +ACK frame quickly, to prevent the peer from declaring a packet as lost and +spuriously retransmitting the frames it contains. An ACK frame is expected +to fit within a single QUIC packet. If it does not, then older ranges +(those with the smallest packet numbers) are omitted.¶
+A receiver limits the number of ACK Ranges (Section 19.3.1) it remembers and +sends in ACK frames, both to limit the size of ACK frames and to avoid resource +exhaustion. After receiving acknowledgments for an ACK frame, the receiver +SHOULD stop tracking those acknowledged ACK Ranges. Senders can expect +acknowledgments for most packets, but QUIC does not guarantee receipt of an +acknowledgment for every packet that the receiver processes.¶
+It is possible that retaining many ACK Ranges could cause an ACK frame to become +too large. A receiver can discard unacknowledged ACK Ranges to limit ACK frame +size, at the cost of increased retransmissions from the sender. This is +necessary if an ACK frame would be too large to fit in a packet. +Receivers MAY also limit ACK frame size further to preserve space for other +frames or to limit the capacity that acknowledgments consume.¶
+A receiver MUST retain an ACK Range unless it can ensure that it will not +subsequently accept packets with numbers in that range. Maintaining a minimum +packet number that increases as ranges are discarded is one way to achieve this +with minimal state.¶
+Receivers can discard all ACK Ranges, but they MUST retain the largest packet +number that has been successfully processed as that is used to recover packet +numbers from subsequent packets; see Section 17.1.¶
+A receiver SHOULD include an ACK Range containing the largest received packet +number in every ACK frame. The Largest Acknowledged field is used in ECN +validation at a sender and including a lower value than what was included in a +previous ACK frame could cause ECN to be unnecessarily disabled; see +Section 13.4.2.¶
+Section 13.2.4 describes an exemplary approach for determining what packets +to acknowledge in each ACK frame. Though the goal of this algorithm is to +generate an acknowledgment for every packet that is processed, it is still +possible for acknowledgments to be lost.¶
+When a packet containing an ACK frame is sent, the largest acknowledged in that +frame can be saved. When a packet containing an ACK frame is acknowledged, the +receiver can stop acknowledging packets less than or equal to the largest +acknowledged in the sent ACK frame.¶
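That bookkeeping can be sketched as follows (an illustrative sketch only; the class and method names are invented):

```python
class AckTracker:
    """Tracks received packet numbers and prunes state once an ACK frame
    covering them is known to have reached the peer."""

    def __init__(self):
        self.received = set()   # packet numbers still being reported
        self.sent_acks = {}     # our packet number -> largest acked it carried

    def on_receive(self, pn):
        self.received.add(pn)

    def on_send_ack(self, own_pn):
        # Save the largest acknowledged carried in the ACK frame just sent.
        if self.received:
            self.sent_acks[own_pn] = max(self.received)

    def on_ack_of_ack(self, own_pn):
        # The peer acknowledged the packet that carried our ACK frame:
        # stop reporting packet numbers at or below its largest acknowledged.
        largest = self.sent_acks.pop(own_pn, None)
        if largest is not None:
            self.received = {pn for pn in self.received if pn > largest}
```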
+A receiver that sends only non-ack-eliciting packets, such as ACK frames, might +not receive an acknowledgment for a long period of time. This could cause the +receiver to maintain state for a large number of ACK frames for a long period of +time, and ACK frames it sends could be unnecessarily large. In such a case, a +receiver could send a PING or other small ack-eliciting frame occasionally, +such as once per round trip, to elicit an ACK from the peer.¶
+In cases without ACK frame loss, this algorithm allows for a minimum of 1 RTT of +reordering. In cases with ACK frame loss and reordering, this approach does not +guarantee that every acknowledgment is seen by the sender before it is no +longer included in the ACK frame. Packets could be received out of order and all +subsequent ACK frames containing them could be lost. In this case, the loss +recovery algorithm could cause spurious retransmissions, but the sender will +continue making forward progress.¶
+An endpoint measures the delays intentionally introduced between the time the +packet with the largest packet number is received and the time an acknowledgment +is sent. The endpoint encodes this acknowledgment delay in the ACK Delay field +of an ACK frame; see Section 19.3. This allows the receiver of the ACK frame +to adjust for any intentional delays, which is important for getting a better +estimate of the path RTT when acknowledgments are delayed.¶
+A packet might be held in the OS kernel or elsewhere on the host before being +processed. An endpoint MUST NOT include delays that it does not control when +populating the ACK Delay field in an ACK frame. However, endpoints SHOULD +include buffering delays caused by unavailability of decryption keys, since +these delays can be large and are likely to be non-repeating.¶
+When the measured acknowledgment delay is larger than its max_ack_delay, an +endpoint SHOULD report the measured delay. This information is especially useful +during the handshake when delays might be large; see +Section 13.2.1.¶
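The field arithmetic can be sketched as follows (a sketch assuming the default ack_delay_exponent of 3 from Section 18.2; the function names are ours):

```python
DEFAULT_ACK_DELAY_EXPONENT = 3   # default value from Section 18.2

def encode_ack_delay(delay_microseconds, exponent=DEFAULT_ACK_DELAY_EXPONENT):
    # The ACK Delay field carries the intentional delay in microseconds
    # divided by 2 to the power of the ack_delay_exponent.
    return delay_microseconds >> exponent

def decode_ack_delay(field_value, exponent=DEFAULT_ACK_DELAY_EXPONENT):
    return field_value << exponent
```

A 10 ms intentional delay is thus carried as 1250 under the default exponent.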
+ACK frames MUST only be carried in a packet that has the same packet number +space as the packet being acknowledged; see Section 12.1. For instance, +packets that are protected with 1-RTT keys MUST be acknowledged in packets that +are also protected with 1-RTT keys.¶
+Packets that a client sends with 0-RTT packet protection MUST be acknowledged by +the server in packets protected by 1-RTT keys. This can mean that the client is +unable to use these acknowledgments if the server cryptographic handshake +messages are delayed or lost. Note that the same limitation applies to other +data sent by the server protected by the 1-RTT keys.¶
+Packets containing PADDING frames are considered to be in flight for congestion +control purposes [QUIC-RECOVERY]. Packets containing only PADDING frames +therefore consume congestion window but do not generate acknowledgments that +will open the congestion window. To avoid a deadlock, a sender SHOULD ensure +that other frames are sent periodically in addition to PADDING frames to elicit +acknowledgments from the receiver.¶
+QUIC packets that are determined to be lost are not retransmitted whole. The +same applies to the frames that are contained within lost packets. Instead, the +information that might be carried in frames is sent again in new frames as +needed.¶
+New frames and packets are used to carry information that is determined to have +been lost. In general, information is sent again when a packet containing that +information is determined to be lost and sending ceases when a packet +containing that information is acknowledged.¶
+Endpoints SHOULD prioritize retransmission of data over sending new data, unless +priorities specified by the application indicate otherwise; see +Section 2.3.¶
+Even though a sender is encouraged to assemble frames containing up-to-date +information every time it sends a packet, it is not forbidden to retransmit +copies of frames from lost packets. A sender that retransmits copies of frames +needs to handle decreases in available payload size due to change in packet +number length, connection ID length, and path MTU. A receiver MUST accept +packets containing an outdated frame, such as a MAX_DATA frame carrying a +smaller maximum data than one found in an older packet.¶
+A sender SHOULD avoid retransmitting information from packets once they are +acknowledged. This includes packets that are acknowledged after being declared +lost, which can happen in the presence of network reordering. Doing so requires +senders to retain information about packets after they are declared lost. A +sender can discard this information after a period of time elapses that +adequately allows for reordering, such as a PTO (Section 6.2 of +[QUIC-RECOVERY]), or on other events, such as reaching a memory limit.¶
+Upon detecting losses, a sender MUST take appropriate congestion control action. +The details of loss detection and congestion control are described in +[QUIC-RECOVERY].¶
+QUIC endpoints can use Explicit Congestion Notification (ECN) [RFC3168] to +detect and respond to network congestion. ECN allows an endpoint to set an ECT +codepoint in the ECN field of an IP packet. A network node can then indicate +congestion by setting the CE codepoint in the ECN field instead of dropping the +packet [RFC8087]. Endpoints react to reported congestion by reducing their +sending rate in response, as described in [QUIC-RECOVERY].¶
+To enable ECN, a sending QUIC endpoint first determines whether a path supports +ECN marking and whether the peer reports the ECN values in received IP headers; +see Section 13.4.2.¶
+Use of ECN requires the receiving endpoint to read the ECN field from an IP +packet, which is not possible on all platforms. If an endpoint does not +implement ECN support or does not have access to received ECN fields, it +does not report ECN counts for packets it receives.¶
+Even if an endpoint does not set an ECT field on packets it sends, the endpoint +MUST provide feedback about ECN markings it receives, if these are accessible. +Failing to report the ECN counts will cause the sender to disable use of ECN +for this connection.¶
+On receiving an IP packet with an ECT(0), ECT(1) or CE codepoint, an +ECN-enabled endpoint accesses the ECN field and increases the corresponding +ECT(0), ECT(1), or CE count. These ECN counts are included in subsequent ACK +frames; see Section 13.2 and Section 19.3.¶
+Each packet number space maintains separate acknowledgment state and separate +ECN counts. Coalesced QUIC packets (see Section 12.2) share the same IP +header so the ECN counts are incremented once for each coalesced QUIC packet.¶
+For example, if one each of an Initial, Handshake, and 1-RTT QUIC packet are +coalesced into a single UDP datagram, the ECN counts for all three packet number +spaces will be incremented by one each, based on the ECN field of the single IP +header.¶
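The per-space counting in this example can be sketched as follows (an illustrative sketch; the names and the count representation are ours):

```python
def count_ecn(counts_by_space, coalesced_spaces, ip_ecn):
    """Increment the ECN count once per QUIC packet coalesced into a single
    UDP datagram, keyed by packet number space, based on the one IP
    header's ECN field ('ect0', 'ect1', or 'ce')."""
    for space in coalesced_spaces:
        counts_by_space[space][ip_ecn] += 1
```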
+ECN counts are only incremented when QUIC packets from the received IP +packet are processed. As such, duplicate QUIC packets are not processed and +do not increase ECN counts; see Section 21.10 for relevant security +concerns.¶
+It is possible for faulty network devices to corrupt or erroneously drop +packets that carry a non-zero ECN codepoint. To ensure connectivity in the +presence of such devices, an endpoint validates the ECN counts for each network +path and disables use of ECN on that path if errors are detected.¶
+To perform ECN validation for a new path:¶
+If an endpoint has cause to expect that IP packets with an ECT codepoint might +be dropped by a faulty network element, the endpoint could set an ECT codepoint +for only the first ten outgoing packets on a path, or for a period of three +PTOs (see Section 6.2 of [QUIC-RECOVERY]). If all packets marked with non-zero +ECN codepoints are subsequently lost, it can disable marking on the assumption +that the marking caused the loss.¶
+An endpoint thus attempts to use ECN and validates this for each new connection, +when switching to a server's preferred address, and on active connection +migration to a new path. Appendix A.4 describes one possible algorithm.¶
+Other methods of probing paths for ECN support are possible, as are different +marking strategies. Implementations MAY use other methods defined in RFCs; see +[RFC8311]. Implementations that use the ECT(1) codepoint need to +perform ECN validation using the reported ECT(1) counts.¶
+Erroneous application of CE markings by the network can result in degraded +connection performance. An endpoint that receives an ACK frame with ECN counts +therefore validates the counts before using them. It performs this validation by +comparing newly received counts against those from the last successfully +processed ACK frame. Any increase in the ECN counts is validated based on the +ECN markings that were applied to packets that are newly acknowledged in the ACK +frame.¶
+If an ACK frame newly acknowledges a packet that the endpoint sent with either +the ECT(0) or ECT(1) codepoint set, ECN validation fails if the corresponding +ECN counts are not present in the ACK frame. This check detects a network +element that zeroes the ECN field or a peer that does not report ECN markings.¶
+ECN validation also fails if the sum of the increase in ECT(0) and ECN-CE counts +is less than the number of newly acknowledged packets that were originally sent +with an ECT(0) marking. Similarly, ECN validation fails if the sum of the +increases to ECT(1) and ECN-CE counts is less than the number of newly +acknowledged packets sent with an ECT(1) marking. These checks can detect +remarking of ECN-CE markings by the network.¶
+An endpoint could miss acknowledgments for a packet when ACK frames are lost. +It is therefore possible for the total increase in ECT(0), ECT(1), and ECN-CE +counts to be greater than the number of packets that are newly acknowledged by +an ACK frame. This is why ECN counts are permitted to be larger than the total +number of packets that are acknowledged.¶
+Validating ECN counts from reordered ACK frames can result in failure. An +endpoint MUST NOT fail ECN validation as a result of processing an ACK frame +that does not increase the largest acknowledged packet number.¶
+ECN validation can fail if the received total count for either ECT(0) or ECT(1) +exceeds the total number of packets sent with each corresponding ECT codepoint. +In particular, validation will fail when an endpoint receives a non-zero ECN +count corresponding to an ECT codepoint that it never applied. This check +detects when packets are remarked to ECT(0) or ECT(1) in the network.¶
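Taken together, the checks in the preceding paragraphs can be sketched as a single validation routine (an illustration with invented names; per the reordering rule above, callers would skip it for ACK frames that do not advance the largest acknowledged packet number):

```python
def ecn_counts_valid(prev, new, newly_acked_ect0, newly_acked_ect1,
                     total_sent_ect0, total_sent_ect1):
    """Validate ECN counts from an ACK frame against those from the last
    successfully processed ACK frame (`prev`).  `new` is None when the ACK
    frame carried no ECN counts."""
    if new is None:
        # Counts absent while ECT-marked packets were newly acknowledged:
        # a network element zeroed the field, or the peer does not report.
        return newly_acked_ect0 == 0 and newly_acked_ect1 == 0
    d_ect0 = new["ect0"] - prev["ect0"]
    d_ect1 = new["ect1"] - prev["ect1"]
    d_ce = new["ce"] - prev["ce"]
    if min(d_ect0, d_ect1, d_ce) < 0:
        return False  # counts never decrease
    # Increases must cover the newly acknowledged marked packets; the CE
    # count may absorb packets the network remarked from ECT to CE.
    if d_ect0 + d_ce < newly_acked_ect0:
        return False
    if d_ect1 + d_ce < newly_acked_ect1:
        return False
    # Totals cannot exceed the packets actually sent with each codepoint.
    return new["ect0"] <= total_sent_ect0 and new["ect1"] <= total_sent_ect1
```

Note that the routine deliberately tolerates increases larger than the newly acknowledged packet count, since lost ACK frames can carry acknowledgments the sender never sees.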
+If validation fails, then the endpoint MUST disable ECN. It stops setting the +ECT codepoint in IP packets that it sends, assuming that either the network path +or the peer does not support ECN.¶
+Even if validation fails, an endpoint MAY revalidate ECN for the same path at +any later time in the connection. An endpoint could continue to periodically +attempt validation.¶
+Upon successful validation, an endpoint MAY continue to set an ECT codepoint in +subsequent packets it sends, with the expectation that the path is ECN-capable. +Network routing and path elements can however change mid-connection; an endpoint +MUST disable ECN if validation later fails.¶
+A UDP datagram can include one or more QUIC packets. The datagram size refers to +the total UDP payload size of a single UDP datagram carrying QUIC packets. The +datagram size includes one or more QUIC packet headers and protected payloads, +but not the UDP or IP headers.¶
+The maximum datagram size is defined as the largest size of UDP payload that can +be sent across a network path using a single UDP datagram. QUIC MUST NOT be +used if the network path cannot support a maximum datagram size of at least 1200 +bytes.¶
+QUIC assumes a minimum IP packet size of at least 1280 bytes. This is the IPv6 +minimum size ([IPv6]) and is also supported by most modern IPv4 +networks. Assuming the minimum IP header size of 40 bytes for IPv6 and 20 bytes +for IPv4 and a UDP header size of 8 bytes, this results in a maximum datagram +size of 1232 bytes for IPv6 and 1252 bytes for IPv4. Thus, modern IPv4 +and all IPv6 network paths are expected to be able to support QUIC.¶
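The arithmetic above works out as follows:

```python
IPV6_MIN_MTU = 1280   # minimum IP packet size QUIC assumes
IPV6_HEADER = 40      # minimum IPv6 header size
IPV4_HEADER = 20      # minimum IPv4 header size
UDP_HEADER = 8

# Maximum UDP payload available for QUIC at the assumed minimum MTU.
MAX_DATAGRAM_IPV6 = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER   # 1232 bytes
MAX_DATAGRAM_IPV4 = IPV6_MIN_MTU - IPV4_HEADER - UDP_HEADER   # 1252 bytes
```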
+This requirement to support a UDP payload of 1200 bytes limits the space +available for IPv6 extension headers to 32 bytes or IPv4 options to 52 bytes +if the path only supports the IPv6 minimum MTU of 1280 bytes. This affects +Initial packets and path validation.¶
+Any maximum datagram size larger than 1200 bytes can be discovered using Path +Maximum Transmission Unit Discovery (PMTUD; see Section 14.2.1) or Datagram +Packetization Layer PMTU Discovery (DPLPMTUD; see Section 14.3).¶
+Enforcement of the max_udp_payload_size transport parameter +(Section 18.2) might act as an additional limit on the +maximum datagram size. A sender can avoid exceeding this limit, once the value +is known. However, prior to learning the value of the transport parameter, +endpoints risk datagrams being lost if they send datagrams larger than the +smallest allowed maximum datagram size of 1200 bytes.¶
+UDP datagrams MUST NOT be fragmented at the IP layer. In IPv4 ([IPv4]), the Don't Fragment (DF) bit MUST be set if possible, to prevent fragmentation on the path.¶
+QUIC sometimes requires datagrams to be no smaller than a certain size; see +Section 8.1 as an example. However, the size of a datagram is not +authenticated. That is, if an endpoint receives a datagram of a certain size, it +cannot know that the sender sent the datagram at the same size. Therefore, an +endpoint MUST NOT close a connection when it receives a datagram that does not +meet size constraints; the endpoint MAY however discard such datagrams.¶
+A client MUST expand the payload of all UDP datagrams carrying Initial packets +to at least the smallest allowed maximum datagram size of 1200 bytes by adding +PADDING frames to the Initial packet or by coalescing the Initial packet; see +Section 12.2. Initial packets can even be coalesced with invalid +packets, which a receiver will discard. Similarly, a server MUST expand the +payload of all UDP datagrams carrying ack-eliciting Initial packets to at least +the smallest allowed maximum datagram size of 1200 bytes.¶
+Sending UDP datagrams of this size ensures that the network path supports a +reasonable Path Maximum Transmission Unit (PMTU), in both directions. +Additionally, a client that expands Initial packets helps reduce the amplitude +of amplification attacks caused by server responses toward an unverified client +address; see Section 8.¶
+Datagrams containing Initial packets MAY exceed 1200 bytes if the sender +believes that the network path and peer both support the size that it chooses.¶
+A server MUST discard an Initial packet that is carried in a UDP datagram with a +payload that is smaller than the smallest allowed maximum datagram size of 1200 +bytes. A server MAY also immediately close the connection by sending a +CONNECTION_CLOSE frame with an error code of PROTOCOL_VIOLATION; see +Section 10.2.3.¶
+The server MUST also limit the number of bytes it sends before validating the +address of the client; see Section 8.¶
+The Path Maximum Transmission Unit (PMTU) is the maximum size of the entire IP +packet including the IP header, UDP header, and UDP payload. The UDP payload +includes one or more QUIC packet headers and protected payloads. The PMTU can +depend on path characteristics, and can therefore change over time. The largest +UDP payload an endpoint sends at any given time is referred to as the endpoint's +maximum datagram size.¶
+An endpoint SHOULD use DPLPMTUD (Section 14.3) or PMTUD (Section 14.2.1) to determine +whether the path to a destination will support a desired maximum datagram size +without fragmentation. In the absence of these mechanisms, QUIC endpoints +SHOULD NOT send datagrams larger than the smallest allowed maximum datagram +size.¶
+Both DPLPMTUD and PMTUD send datagrams that are larger than the current maximum +datagram size, referred to as PMTU probes. All QUIC packets that are not sent +in a PMTU probe SHOULD be sized to fit within the maximum datagram size to avoid +the datagram being fragmented or dropped ([RFC8085]).¶
+If a QUIC endpoint determines that the PMTU between any pair of local and +remote IP addresses cannot support the smallest allowed maximum datagram size +of 1200 bytes, it MUST immediately cease sending QUIC packets, except for those +in PMTU probes or those containing CONNECTION_CLOSE frames, on the affected +path. An endpoint MAY terminate the connection if an alternative path cannot be +found.¶
+Each pair of local and remote addresses could have a different PMTU. QUIC +implementations that implement any kind of PMTU discovery therefore SHOULD +maintain a maximum datagram size for each combination of local and remote IP +addresses.¶
+A QUIC implementation MAY be more conservative in computing the maximum datagram +size to allow for unknown tunnel overheads or IP header options/extensions.¶
+Path Maximum Transmission Unit Discovery (PMTUD; [RFC1191], [RFC8201]) +relies on reception of ICMP messages (e.g., IPv6 Packet Too Big messages) that +indicate when an IP packet is dropped because it is larger than the local router +MTU. DPLPMTUD can also optionally use these messages. This use of ICMP messages +is potentially vulnerable to attacks by entities that cannot observe packets +but might successfully guess the addresses used on the path. These attacks +could reduce the PMTU to a bandwidth-inefficient value.¶
+An endpoint MUST ignore an ICMP message that claims the PMTU has decreased below +QUIC's smallest allowed maximum datagram size.¶
+The requirements for generating ICMP ([RFC1812], [RFC4443]) state that the +quoted packet should contain as much of the original packet as possible without +exceeding the minimum MTU for the IP version. The size of the quoted packet can +actually be smaller, or the information unintelligible, as described in Section +1.1 of [DPLPMTUD].¶
+QUIC endpoints using PMTUD SHOULD validate ICMP messages to protect from +packet injection as specified in [RFC8201] and Section 5.2 of [RFC8085]. +This validation SHOULD use the quoted packet supplied in the payload of an ICMP +message to associate the message with a corresponding transport connection (see +Section 4.6.1 of [DPLPMTUD]). ICMP message validation MUST include matching +IP addresses and UDP ports ([RFC8085]) and, when possible, connection IDs to +an active QUIC session. The endpoint SHOULD ignore all ICMP messages that fail +validation.¶
+An endpoint MUST NOT increase PMTU based on ICMP messages; see Section 3, clause +6 of [DPLPMTUD]. Any reduction in QUIC's maximum datagram size in response +to ICMP messages MAY be provisional until QUIC's loss detection algorithm +determines that the quoted packet has actually been lost.¶
+Datagram Packetization Layer PMTU Discovery (DPLPMTUD; [DPLPMTUD]) +relies on tracking loss or acknowledgment of QUIC packets that are carried in +PMTU probes. PMTU probes for DPLPMTUD that use the PADDING frame implement +"Probing using padding data", as defined in Section 4.1 of [DPLPMTUD].¶
+Endpoints SHOULD set the initial value of BASE_PLPMTU (Section 5.1 of +[DPLPMTUD]) to be consistent with QUIC's smallest allowed maximum datagram +size. The MIN_PLPMTU is the same as the BASE_PLPMTU.¶
+QUIC endpoints implementing DPLPMTUD maintain a DPLPMTUD Maximum Packet Size +(MPS, Section 4.4 of [DPLPMTUD]) for each combination of local and remote IP +addresses. This corresponds to the maximum datagram size.¶
+From the perspective of DPLPMTUD, QUIC is an acknowledged Packetization Layer +(PL). A QUIC sender can therefore enter the DPLPMTUD BASE state (Section 5.2 of +[DPLPMTUD]) when the QUIC connection handshake has been completed.¶
+Because QUIC is an acknowledged PL, a QUIC sender does not implement a DPLPMTUD CONFIRMATION_TIMER while in the SEARCH_COMPLETE state; see Section 5.2 of [DPLPMTUD].¶
+An endpoint using DPLPMTUD requires the validation of any received ICMP Packet +Too Big (PTB) message before using the PTB information, as defined in Section +4.6 of [DPLPMTUD]. In addition to UDP port validation, QUIC validates an +ICMP message by using other PL information (e.g., validation of connection IDs +in the quoted packet of any received ICMP message).¶
+The considerations for processing ICMP messages described in Section 14.2.1 also +apply if these messages are used by DPLPMTUD.¶
+PMTU probes are ack-eliciting packets.¶
+Endpoints could limit the content of PMTU probes to PING and PADDING frames, +since packets that are larger than the current maximum datagram size are more +likely to be dropped by the network. Loss of a QUIC packet that is carried in a +PMTU probe is therefore not a reliable indication of congestion and SHOULD NOT +trigger a congestion control reaction; see Section 3, Bullet 7 of [DPLPMTUD]. +However, PMTU probes consume congestion window, which could delay subsequent +transmission by an application.¶
+Endpoints that rely on the destination connection ID for routing incoming QUIC +packets are likely to require that the connection ID be included in +PMTU probes to route any resulting ICMP messages (Section 14.2.1) back to the correct +endpoint. However, only long header packets (Section 17.2) contain the +Source Connection ID field, and long header packets are not decrypted or +acknowledged by the peer once the handshake is complete.¶
+One way to construct a PMTU probe is to coalesce (see Section 12.2) a +packet with a long header, such as a Handshake or 0-RTT packet +(Section 17.2), with a short header packet in a single UDP datagram. If the +resulting PMTU probe reaches the endpoint, the packet with the long header will +be ignored, but the short header packet will be acknowledged. If the PMTU probe +causes an ICMP message to be sent, the first part of the probe will be quoted in +that message. If the Source Connection ID field is within the quoted portion of +the probe, that could be used for routing or validation of the ICMP message.¶
+The purpose of using a packet with a long header is only to ensure that the +quoted packet contained in the ICMP message contains a Source Connection ID +field. This packet does not need to be a valid packet and it can be sent even +if there is no current use for packets of that type.¶
+QUIC versions are identified using a 32-bit unsigned number.¶
+The version 0x00000000 is reserved to represent version negotiation. This +version of the specification is identified by the number 0x00000001.¶
+Other versions of QUIC might have different properties from this version. The +properties of QUIC that are guaranteed to be consistent across all versions of +the protocol are described in [QUIC-INVARIANTS].¶
+Version 0x00000001 of QUIC uses TLS as a cryptographic handshake protocol, as +described in [QUIC-TLS].¶
+Versions with the most significant 16 bits of the version number cleared are +reserved for use in future IETF consensus documents.¶
+Versions that follow the pattern 0x?a?a?a?a are reserved for use in forcing version negotiation to be exercised; that is, any version number where the low four bits of all bytes are 1010 (in binary). A client or server MAY advertise support for any of these reserved versions.¶
+Reserved version numbers will never represent a real protocol; a client MAY use +one of these version numbers with the expectation that the server will initiate +version negotiation; a server MAY advertise support for one of these versions +and can expect that clients ignore the value.¶
+QUIC packets and frames commonly use a variable-length encoding for non-negative +integer values. This encoding ensures that smaller integer values need fewer +bytes to encode.¶
+The QUIC variable-length integer encoding reserves the two most significant bits +of the first byte to encode the base 2 logarithm of the integer encoding length +in bytes. The integer value is encoded on the remaining bits, in network byte +order.¶
+This means that integers are encoded on 1, 2, 4, or 8 bytes and can encode 6-, +14-, 30-, or 62-bit values respectively. Table 4 summarizes the +encoding properties.¶
+2Bit | +Length | +Usable Bits | +Range | +
---|---|---|---|
00 | +1 | +6 | +0-63 | +
01 | +2 | +14 | +0-16383 | +
10 | +4 | +30 | +0-1073741823 | +
11 | +8 | +62 | +0-4611686018427387903 | +
Examples and a sample decoding algorithm are shown in Appendix A.1.¶
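Appendix A.1 contains the authoritative examples and decoding algorithm; purely as an illustration, a minimal Python sketch of the encoding and decoding described above (the function names are hypothetical) might look like this:

```python
def encode_varint(v):
    """Encode a non-negative integer with QUIC's variable-length encoding.

    The two most significant bits of the first byte carry the base-2
    logarithm of the total length in bytes (00=1, 01=2, 10=4, 11=8)."""
    if v < 0x40:
        return v.to_bytes(1, "big")
    if v < 0x4000:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 0x40000000:
        return (v | 0x80000000).to_bytes(4, "big")
    if v < 0x4000000000000000:
        return (v | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value exceeds 62 bits")


def decode_varint(data, offset=0):
    """Return (value, bytes consumed) for the integer starting at offset."""
    first = data[offset]
    length = 1 << (first >> 6)            # 1, 2, 4, or 8 bytes
    value = first & 0x3F                  # clear the two length bits
    for b in data[offset + 1:offset + length]:
        value = (value << 8) | b
    return value, length
```

For example, the single byte 0x25 decodes to 37 and the two-byte sequence 0x7bbd decodes to 15293, consistent with the table above.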
+Values do not need to be encoded on the minimum number of bytes necessary, with +the sole exception of the Frame Type field; see Section 12.4.¶
+Versions (Section 15), packet numbers sent in the header +(Section 17.1), and the length of connection IDs in long header packets +(Section 17.2) are described using integers, but do not use this encoding.¶
+All numeric values are encoded in network byte order (that is, big-endian) and +all field sizes are in bits. Hexadecimal notation is used for describing the +value of fields.¶
+Packet numbers are integers in the range 0 to 2^62-1 (Section 12.3). When +present in long or short packet headers, they are encoded in 1 to 4 bytes. The +number of bits required to represent the packet number is reduced by including +only the least significant bits of the packet number.¶
+The encoded packet number is protected as described in Section 5.4 of +[QUIC-TLS].¶
+Prior to receiving an acknowledgment for a packet number space, the full packet +number MUST be included; it is not to be truncated as described below.¶
+After an acknowledgment is received for a packet number space, the sender MUST +use a packet number size able to represent more than twice as large a range as +the difference between the largest acknowledged packet and the packet number +being sent. A peer receiving the packet will then correctly decode the packet number, +unless the packet is delayed in transit such that it arrives after many +higher-numbered packets have been received. An endpoint SHOULD use a large +enough packet number encoding to allow the packet number to be recovered even if +the packet arrives after packets that are sent afterwards.¶
+As a result, the size of the packet number encoding is at least one bit more +than the base-2 logarithm of the number of contiguous unacknowledged packet +numbers, including the new packet. Pseudocode and examples for packet number +encoding can be found in Appendix A.2.¶
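Appendix A.2 holds the normative pseudocode; the sizing rule above can be sketched in Python as follows (illustrative only, the function name is hypothetical):

```python
import math

def pn_encoded_length(full_pn, largest_acked=None):
    """Bytes needed to encode full_pn given the largest acknowledged packet."""
    if largest_acked is None:
        num_unacked = full_pn + 1          # nothing acknowledged in this space yet
    else:
        num_unacked = full_pn - largest_acked
    # at least one bit more than the base-2 logarithm of the
    # contiguous unacknowledged range, including the new packet
    min_bits = math.log2(num_unacked) + 1
    return math.ceil(min_bits / 8)         # 1 to 4 bytes in this version
```

With a largest acknowledged packet of 0xabe8b3 and a packet number of 0xac5c02, the range of 29,519 packets needs 16 bits, so a two-byte encoding suffices.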
+At a receiver, protection of the packet number is removed prior to recovering +the full packet number. The full packet number is then reconstructed based on +the number of significant bits present, the value of those bits, and the largest +packet number received in a successfully authenticated packet. Recovering the +full packet number is necessary to successfully remove packet protection.¶
+Once header protection is removed, the packet number is decoded by finding the +packet number value that is closest to the next expected packet. The next +expected packet is the highest received packet number plus one. Pseudocode and +an example for packet number decoding can be found in +Appendix A.3.¶
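Appendix A.3 contains the normative pseudocode and an example; the window-based recovery it describes can be sketched as:

```python
def decode_packet_number(largest_pn, truncated_pn, pn_nbits):
    """Recover a full packet number from its truncated form (cf. Appendix A.3)."""
    expected_pn = largest_pn + 1           # next expected packet number
    pn_win = 1 << pn_nbits                 # size of the truncated-number window
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # candidate closest to expected_pn that matches the truncated bits
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    if (candidate_pn <= expected_pn - pn_hwin
            and candidate_pn < (1 << 62) - pn_win):
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn
```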
+Long headers are used for packets that are sent prior to the establishment +of 1-RTT keys. Once 1-RTT keys are available, +a sender switches to sending packets using the short header +(Section 17.3). The long form allows for special packets - such as the +Version Negotiation packet - to be represented in this uniform fixed-length +packet format. Packets that use the long header contain the following fields:¶
+The most significant bit (0x80) of byte 0 (the first byte) is set to 1 for +long headers.¶
+The next bit (0x40) of byte 0 is set to 1, unless the packet is a Version +Negotiation packet. Packets containing a zero value for this bit are not +valid packets in this version and MUST be discarded. A value of 1 for this +bit allows QUIC to coexist with other protocols; see [RFC7983].¶
+The next two bits (those with a mask of 0x30) of byte 0 contain a packet type. +Packet types are listed in Table 5.¶
+The semantics of the lower four bits (those with a mask of 0x0f) of byte 0 are +determined by the packet type.¶
+The QUIC Version is a 32-bit field that follows the first byte. This field +indicates the version of QUIC that is in use and determines how the rest of +the protocol fields are interpreted.¶
+The byte following the version contains the length in bytes of the Destination +Connection ID field that follows it. This length is encoded as an 8-bit +unsigned integer. In QUIC version 1, this value MUST NOT exceed 20. +Endpoints that receive a version 1 long header with a value larger than 20 +MUST drop the packet. In order to properly form a Version Negotiation packet, +servers SHOULD be able to read longer connection IDs from other QUIC versions.¶
+The Destination Connection ID field follows the Destination Connection ID +Length field, which indicates the length of this field. +Section 7.2 describes the use of this field in more detail.¶
+The byte following the Destination Connection ID contains the length in bytes +of the Source Connection ID field that follows it. This length is encoded as +an 8-bit unsigned integer. In QUIC version 1, this value MUST NOT exceed 20 +bytes. Endpoints that receive a version 1 long header with a value larger +than 20 MUST drop the packet. In order to properly form a Version Negotiation +packet, servers SHOULD be able to read longer connection IDs from other QUIC +versions.¶
+The Source Connection ID field follows the Source Connection ID Length field, +which indicates the length of this field. Section 7.2 +describes the use of this field in more detail.¶
+The remainder of the packet, if any, is type-specific.¶
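As an illustrative sketch (not an implementation), the long header fields described above could be parsed like this; the function name and returned dictionary are hypothetical, and no length validation is shown:

```python
def parse_long_header(datagram):
    """Parse the fields of a long header packet as laid out above (sketch)."""
    first = datagram[0]
    if not first & 0x80:
        raise ValueError("not a long header packet")
    header = {
        "fixed_bit": (first & 0x40) >> 6,    # must be 1 except Version Negotiation
        "packet_type": (first & 0x30) >> 4,  # 0=Initial, 1=0-RTT, 2=Handshake, 3=Retry
        "type_specific": first & 0x0F,       # semantics depend on the packet type
        "version": int.from_bytes(datagram[1:5], "big"),
    }
    pos = 5
    dcid_len = datagram[pos]; pos += 1
    header["dcid"] = datagram[pos:pos + dcid_len]; pos += dcid_len
    scid_len = datagram[pos]; pos += 1
    header["scid"] = datagram[pos:pos + scid_len]; pos += scid_len
    header["rest"] = datagram[pos:]          # type-specific remainder
    return header
```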
+In this version of QUIC, the following packet types with the long header are +defined:¶
+Type | +Name | +Section | +
---|---|---|
0x0 | +Initial | ++ Section 17.2.2 + | +
0x1 | +0-RTT | ++ Section 17.2.3 + | +
0x2 | +Handshake | ++ Section 17.2.4 + | +
0x3 | +Retry | ++ Section 17.2.5 + | +
The header form bit, Destination and Source Connection ID lengths, Destination +and Source Connection ID fields, and Version fields of a long header packet are +version-independent. The other fields in the first byte are version-specific. +See [QUIC-INVARIANTS] for details on how packets from different versions of +QUIC are interpreted.¶
+The interpretation of the fields and the payload are specific to a version and +packet type. While type-specific semantics for this version are described in +the following sections, several long-header packets in this version of QUIC +contain these additional fields:¶
+Two bits (those with a mask of 0x0c) of byte 0 are reserved across multiple +packet types. These bits are protected using header protection; see Section +5.4 of [QUIC-TLS]. The value included prior to protection MUST be set to 0. +An endpoint MUST treat receipt of a packet that has a non-zero value for these +bits after removing both packet and header protection as a connection error +of type PROTOCOL_VIOLATION. Discarding such a packet after only removing +header protection can expose the endpoint to attacks; see Section 9.5 of +[QUIC-TLS].¶
+In packet types that contain a Packet Number field, the least significant two +bits (those with a mask of 0x03) of byte 0 contain the length of the packet +number, encoded as an unsigned, two-bit integer that is one less than the +length of the packet number field in bytes. That is, the length of the packet +number field is the value of this field, plus one. These bits are protected +using header protection; see Section 5.4 of [QUIC-TLS].¶
+The length of the remainder of the packet (that is, the Packet Number and +Payload fields) in bytes, encoded as a variable-length integer +(Section 16).¶
+The packet number field is 1 to 4 bytes long. The packet number is protected +using header protection; see Section 5.4 of [QUIC-TLS]. The length of the +packet number field is encoded in the Packet Number Length bits of byte 0; see +above.¶
+A Version Negotiation packet is inherently not version-specific. Upon receipt by +a client, it will be identified as a Version Negotiation packet based on the +Version field having a value of 0.¶
+The Version Negotiation packet is a response to a client packet that contains a +version that is not supported by the server, and is only sent by servers.¶
+The layout of a Version Negotiation packet is:¶
+The value in the Unused field is set to an arbitrary value by the server. +Clients MUST ignore the value of this field. Where QUIC might be multiplexed +with other protocols (see [RFC7983]), servers SHOULD set the most significant +bit of this field (0x40) to 1 so that Version Negotiation packets appear to have +the Fixed Bit field. Note that other versions of QUIC might not make a similar +recommendation.¶
+The Version field of a Version Negotiation packet MUST be set to 0x00000000.¶
+The server MUST include the value from the Source Connection ID field of the +packet it receives in the Destination Connection ID field. The value for Source +Connection ID MUST be copied from the Destination Connection ID of the received +packet, which is initially randomly selected by a client. Echoing both +connection IDs gives clients some assurance that the server received the packet +and that the Version Negotiation packet was not generated by an entity that +did not observe the Initial packet.¶
+Future versions of QUIC could have different requirements for the lengths of +connection IDs. In particular, connection IDs might have a smaller minimum +length or a greater maximum length. Version-specific rules for the connection +ID therefore MUST NOT influence a server's decision about whether to send a +Version Negotiation packet.¶
+The remainder of the Version Negotiation packet is a list of 32-bit versions +that the server supports.¶
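A hypothetical sketch of constructing a Version Negotiation packet following the rules above (the function name is illustrative, and a real server would apply further checks before responding):

```python
import os

def build_version_negotiation(received_dcid, received_scid, supported_versions):
    """Assemble a Version Negotiation packet (sketch, no validation)."""
    # Long form bit plus the recommended 0x40 bit; remaining bits arbitrary.
    first = 0x80 | 0x40 | (os.urandom(1)[0] & 0x3F)
    pkt = bytes([first])
    pkt += (0).to_bytes(4, "big")                       # Version MUST be 0x00000000
    pkt += bytes([len(received_scid)]) + received_scid  # echo client SCID as DCID
    pkt += bytes([len(received_dcid)]) + received_dcid  # echo client DCID as SCID
    for v in supported_versions:
        pkt += v.to_bytes(4, "big")                     # 32-bit supported versions
    return pkt
```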
+A Version Negotiation packet is not acknowledged. It is only sent in response +to a packet that indicates an unsupported version; see Section 5.2.2.¶
+The Version Negotiation packet does not include the Packet Number and Length +fields present in other packets that use the long header form. Consequently, +a Version Negotiation packet consumes an entire UDP datagram.¶
+A server MUST NOT send more than one Version Negotiation packet in response to a +single UDP datagram.¶
+See Section 6 for a description of the version negotiation +process.¶
+An Initial packet uses long headers with a type value of 0x0. It carries the +first CRYPTO frames sent by the client and server to perform key exchange, and +carries ACKs in either direction.¶
+The Initial packet contains a long header as well as the Length and Packet +Number fields; see Section 17.2. The first byte contains the Reserved and +Packet Number Length bits; see also Section 17.2. Between the Source +Connection ID and Length fields, there are two additional fields specific to +the Initial packet.¶
+A variable-length integer specifying the length of the Token field, in bytes. +This value is zero if no token is present. Initial packets sent by the server +MUST set the Token Length field to zero; clients that receive an Initial +packet with a non-zero Token Length field MUST either discard the packet or +generate a connection error of type PROTOCOL_VIOLATION.¶
+The value of the token that was previously provided in a Retry packet or +NEW_TOKEN frame; see Section 8.1.¶
+The payload of the packet.¶
+In order to prevent tampering by version-unaware middleboxes, Initial packets +are protected with connection- and version-specific keys (Initial keys) as +described in [QUIC-TLS]. This protection does not provide confidentiality or +integrity against attackers that can observe packets, but provides some level of +protection against attackers that cannot observe packets.¶
+The client and server use the Initial packet type for any packet that contains +an initial cryptographic handshake message. This includes all cases where a new +packet containing the initial cryptographic message needs to be created, such as +the packets sent after receiving a Retry packet (Section 17.2.5).¶
+A server sends its first Initial packet in response to a client Initial. A +server MAY send multiple Initial packets. The cryptographic key exchange could +require multiple round trips or retransmissions of this data.¶
+The payload of an Initial packet includes a CRYPTO frame (or frames) containing +a cryptographic handshake message, ACK frames, or both. PING, PADDING, and +CONNECTION_CLOSE frames of type 0x1c are also permitted. An endpoint that +receives an Initial packet containing other frames can either discard the +packet as spurious or treat it as a connection error.¶
+The first packet sent by a client always includes a CRYPTO frame that contains +the start or all of the first cryptographic handshake message. The first +CRYPTO frame sent always begins at an offset of 0; see Section 7.¶
+Note that if the server sends a TLS HelloRetryRequest (see Section 4.7 of +[QUIC-TLS]), the client will send another series of Initial packets. These +Initial packets will continue the cryptographic handshake and will contain +CRYPTO frames starting at an offset matching the size of the CRYPTO frames sent +in the first flight of Initial packets.¶
+A client stops both sending and processing Initial packets when it sends its +first Handshake packet. A server stops sending and processing Initial packets +when it receives its first Handshake packet. Though packets might still be in +flight or awaiting acknowledgment, no further Initial packets need to be +exchanged beyond this point. Initial packet protection keys are discarded (see +Section 4.9.1 of [QUIC-TLS]) along with any loss recovery and congestion +control state; see Section 6.4 of [QUIC-RECOVERY].¶
+Any data in CRYPTO frames is discarded - and no longer retransmitted - when +Initial keys are discarded.¶
+A 0-RTT packet uses long headers with a type value of 0x1, followed by the +Length and Packet Number fields; see Section 17.2. The first byte contains +the Reserved and Packet Number Length bits; see Section 17.2. A 0-RTT packet +is used to carry "early" data from the client to the server as part of the +first flight, prior to handshake completion. As part of the TLS handshake, the +server can accept or reject this early data.¶
+See Section 2.3 of [TLS13] for a discussion of 0-RTT data and its +limitations.¶
+ +Packet numbers for 0-RTT protected packets use the same space as 1-RTT protected +packets.¶
+After a client receives a Retry packet, 0-RTT packets are likely to have been +lost or discarded by the server. A client SHOULD attempt to resend data in +0-RTT packets after it sends a new Initial packet. New packet numbers MUST be +used for any new packets that are sent; as described in Section 17.2.5.3, +reusing packet numbers could compromise packet protection.¶
+A client only receives acknowledgments for its 0-RTT packets once the handshake +is complete, as defined in Section 4.1.1 of [QUIC-TLS].¶
+A client MUST NOT send 0-RTT packets once it starts processing 1-RTT packets +from the server. This means that 0-RTT packets cannot contain any response to +frames from 1-RTT packets. For instance, a client cannot send an ACK frame in a +0-RTT packet, because that can only acknowledge a 1-RTT packet. An +acknowledgment for a 1-RTT packet MUST be carried in a 1-RTT packet.¶
+A server SHOULD treat a violation of remembered limits (Section 7.4.1) +as a connection error of an appropriate type (for instance, a FLOW_CONTROL_ERROR +for exceeding stream data limits).¶
+A Handshake packet uses long headers with a type value of 0x2, followed by the +Length and Packet Number fields; see Section 17.2. The first byte contains +the Reserved and Packet Number Length bits; see Section 17.2. It is used +to carry cryptographic handshake messages and acknowledgments from the server +and client.¶
+Once a client has received a Handshake packet from a server, it uses Handshake +packets to send subsequent cryptographic handshake messages and acknowledgments +to the server.¶
+The Destination Connection ID field in a Handshake packet contains a connection +ID that is chosen by the recipient of the packet; the Source Connection ID +includes the connection ID that the sender of the packet wishes to use; see +Section 7.2.¶
+Handshake packets have their own packet number space, and thus the first +Handshake packet sent by a server contains a packet number of 0.¶
+The payload of this packet contains CRYPTO frames and could contain PING, +PADDING, or ACK frames. Handshake packets MAY contain CONNECTION_CLOSE frames +of type 0x1c. Endpoints MUST treat receipt of Handshake packets with other +frames as a connection error of type PROTOCOL_VIOLATION.¶
+Like Initial packets (see Section 17.2.2.1), data in CRYPTO frames for +Handshake packets is discarded - and no longer retransmitted - when Handshake +protection keys are discarded.¶
+A Retry packet uses a long packet header with a type value of 0x3. It carries +an address validation token created by the server. It is used by a server that +wishes to perform a retry; see Section 8.1.¶
+A Retry packet (shown in Figure 18) does not contain any protected +fields. The value in the Unused field is set to an arbitrary value by the +server; a client MUST ignore these bits. In addition to the fields from the +long header, it contains these additional fields:¶
+An opaque token that the server can use to validate the client's address.¶
+The server populates the Destination Connection ID with the connection ID that +the client included in the Source Connection ID of the Initial packet.¶
+The server includes a connection ID of its choice in the Source Connection ID +field. This value MUST NOT be equal to the Destination Connection ID field of +the packet sent by the client. A client MUST discard a Retry packet that +contains a Source Connection ID field that is identical to the Destination +Connection ID field of its Initial packet. The client MUST use the value from +the Source Connection ID field of the Retry packet in the Destination Connection +ID field of subsequent packets that it sends.¶
+A server MAY send Retry packets in response to Initial and 0-RTT packets. A +server can either discard or buffer 0-RTT packets that it receives. A server +can send multiple Retry packets as it receives Initial or 0-RTT packets. A +server MUST NOT send more than one Retry packet in response to a single UDP +datagram.¶
+A client MUST accept and process at most one Retry packet for each connection +attempt. After the client has received and processed an Initial or Retry packet +from the server, it MUST discard any subsequent Retry packets that it receives.¶
+Clients MUST discard Retry packets that have a Retry Integrity Tag that cannot +be validated; see the Retry Packet Integrity section of [QUIC-TLS]. This +diminishes an attacker's ability to inject a Retry packet and protects against +accidental corruption of Retry packets. A client MUST discard a Retry packet +with a zero-length Retry Token field.¶
+The client responds to a Retry packet with an Initial packet that includes the +provided Retry Token to continue connection establishment.¶
+A client sets the Destination Connection ID field of this Initial packet to the +value from the Source Connection ID in the Retry packet. Changing Destination +Connection ID also results in a change to the keys used to protect the Initial +packet. It also sets the Token field to the token provided in the Retry. The +client MUST NOT change the Source Connection ID because the server could include +the connection ID as part of its token validation logic; see +Section 8.1.4.¶
+A Retry packet does not include a packet number and cannot be explicitly +acknowledged by a client.¶
+Subsequent Initial packets from the client include the connection ID and token +values from the Retry packet. The client copies the Source Connection ID field +from the Retry packet to the Destination Connection ID field and uses this +value until an Initial packet with an updated value is received; see +Section 7.2. The value of the Token field is copied to all +subsequent Initial packets; see Section 8.1.2.¶
+Other than updating the Destination Connection ID and Token fields, the Initial +packet sent by the client is subject to the same restrictions as the first +Initial packet. A client MUST use the same cryptographic handshake message it +included in this packet. A server MAY treat a packet that contains a different +cryptographic handshake message as a connection error or discard it. Note that +including a Token field reduces the available space for the cryptographic +handshake message, which might result in the client needing to send multiple +Initial packets.¶
+A client MAY attempt 0-RTT after receiving a Retry packet by sending 0-RTT +packets to the connection ID provided by the server.¶
+A client MUST NOT reset the packet number for any packet number space after +processing a Retry packet. In particular, 0-RTT packets contain confidential +information that will most likely be retransmitted on receiving a Retry packet. +The keys used to protect these new 0-RTT packets will not change as a result of +responding to a Retry packet. However, the data sent in these packets could be +different than what was sent earlier. Sending these new packets with the same +packet number is likely to compromise the packet protection for those packets +because the same key and nonce could be used to protect different content. +A server MAY abort the connection if it detects that the client reset the +packet number.¶
+The connection IDs used on Initial and Retry packets exchanged between client +and server are copied to the transport parameters and validated as described +in Section 7.3.¶
+This version of QUIC defines a single packet type that uses the short packet +header.¶
+A 1-RTT packet uses a short packet header. It is used after the version and +1-RTT keys are negotiated.¶
+ +1-RTT packets contain the following fields:¶
+The most significant bit (0x80) of byte 0 is set to 0 for the short header.¶
+The next bit (0x40) of byte 0 is set to 1. Packets containing a zero value +for this bit are not valid packets in this version and MUST be discarded. A +value of 1 for this bit allows QUIC to coexist with other protocols; see +[RFC7983].¶
+The third most significant bit (0x20) of byte 0 is the latency spin bit, set +as described in Section 17.4.¶
+The next two bits (those with a mask of 0x18) of byte 0 are reserved. These +bits are protected using header protection; see Section 5.4 of +[QUIC-TLS]. The value included prior to protection MUST be set to 0. An +endpoint MUST treat receipt of a packet that has a non-zero value for these +bits, after removing both packet and header protection, as a connection error +of type PROTOCOL_VIOLATION. Discarding such a packet after only removing +header protection can expose the endpoint to attacks; see Section 9.5 of +[QUIC-TLS].¶
+The next bit (0x04) of byte 0 indicates the key phase, which allows a +recipient of a packet to identify the packet protection keys that are used to +protect the packet. See [QUIC-TLS] for details. This bit is protected +using header protection; see Section 5.4 of [QUIC-TLS].¶
+The least significant two bits (those with a mask of 0x03) of byte 0 contain +the length of the packet number, encoded as an unsigned, two-bit integer that +is one less than the length of the packet number field in bytes. That is, the +length of the packet number field is the value of this field, plus one. These +bits are protected using header protection; see Section 5.4 of [QUIC-TLS].¶
+The Destination Connection ID is a connection ID that is chosen by the +intended recipient of the packet. See Section 5.1 for more details.¶
+The packet number field is 1 to 4 bytes long. The packet number is protected +using header protection; see +Section 5.4 of [QUIC-TLS]. The length of the packet number field is encoded +in the Packet Number Length field. See Section 17.1 for details.¶
+1-RTT packets always include a 1-RTT protected payload.¶
+The header form bit and the connection ID field of a short header packet are +version-independent. The remaining fields are specific to the selected QUIC +version. See [QUIC-INVARIANTS] for details on how packets from different +versions of QUIC are interpreted.¶
+The latency spin bit, which is defined for 1-RTT packets (Section 17.3.1), +enables passive latency monitoring from observation points on the network path +throughout the duration of a connection. The server reflects the spin value +received, while the client 'spins' it after one RTT. On-path observers can +measure the time between two spin bit toggle events to estimate the end-to-end +RTT of a connection.¶
+The spin bit is only present in 1-RTT packets, since it is possible to measure +the initial RTT of a connection by observing the handshake. Therefore, the spin +bit is available after version negotiation and connection establishment are +completed. On-path measurement and use of the latency spin bit is further +discussed in [QUIC-MANAGEABILITY].¶
+The spin bit is an OPTIONAL feature of this version of QUIC. An endpoint that +does not support this feature MUST disable it, as defined below.¶
+Each endpoint unilaterally decides if the spin bit is enabled or disabled for a +connection. Implementations MUST allow administrators of clients and servers to +disable the spin bit either globally or on a per-connection basis. Even when the +spin bit is not disabled by the administrator, endpoints MUST disable their use +of the spin bit for a random selection of at least one in every 16 network +paths, or for one in every 16 connection IDs, in order to ensure that QUIC +connections that disable the spin bit are commonly observed on the network. As +each endpoint disables the spin bit independently, this ensures that the spin +bit signal is disabled on approximately one in eight network paths.¶
+When the spin bit is disabled, endpoints MAY set the spin bit to any value, and +MUST ignore any incoming value. It is RECOMMENDED that endpoints set the spin +bit to a random value either chosen independently for each packet or chosen +independently for each connection ID.¶
+If the spin bit is enabled for the connection, the endpoint maintains a spin +value for each network path and sets the spin bit in the packet header to the +currently stored value when a 1-RTT packet is sent on that path. The spin value +is initialized to 0 in the endpoint for each network path. Each endpoint also +remembers the highest packet number seen from its peer on each path.¶
+When a server receives a 1-RTT packet that increases the highest packet number +seen by the server from the client on a given network path, it sets the spin +value for that path to be equal to the spin bit in the received packet.¶
+When a client receives a 1-RTT packet that increases the highest packet number +seen by the client from the server on a given network path, it sets the spin +value for that path to the inverse of the spin bit in the received packet.¶
+An endpoint resets the spin value for a network path to zero when changing the +connection ID being used on that network path.¶
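The per-path spin rules above can be sketched as a small state machine; this class and its method names are hypothetical, for illustration only:

```python
class SpinState:
    """Per-path latency spin bit state (illustrative sketch)."""

    def __init__(self, is_server):
        self.is_server = is_server
        self.spin = 0          # value sent in outgoing 1-RTT packets on this path
        self.highest_pn = -1   # highest packet number seen from the peer on this path

    def on_receive(self, packet_number, spin_bit):
        if packet_number <= self.highest_pn:
            return             # only a new highest packet updates the spin value
        self.highest_pn = packet_number
        # The server reflects the bit; the client inverts it,
        # producing one spin bit edge per round trip.
        self.spin = spin_bit if self.is_server else 1 - spin_bit

    def on_connection_id_change(self):
        self.spin = 0          # reset when the path's connection ID changes
```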
+The extension_data field of the quic_transport_parameters extension defined in +[QUIC-TLS] contains the QUIC transport parameters. They are encoded as a +sequence of transport parameters, as shown in Figure 20:¶
+Each transport parameter is encoded as an (identifier, length, value) tuple, +as shown in Figure 21:¶
+The Transport Parameter Length field contains the length of the Transport +Parameter Value field in bytes.¶
+QUIC encodes transport parameters into a sequence of bytes, which is then +included in the cryptographic handshake.¶
+Transport parameters with an identifier of the form 31 * N + 27 for integer +values of N are reserved to exercise the requirement that unknown transport +parameters be ignored. These transport parameters have no semantics, and can +carry arbitrary values.¶
This section details the transport parameters defined in this document.¶
This section details the transport parameters defined in this document.¶
+Many transport parameters listed here have integer values. Those transport +parameters that are identified as integers use a variable-length integer +encoding; see Section 16. Transport parameters have a default value +of 0 if the transport parameter is absent unless otherwise stated.¶
+The following transport parameters are defined:¶
+The value of the Destination Connection ID field from the first Initial packet +sent by the client; see Section 7.3. This transport parameter is only sent +by a server.¶
+The max idle timeout is a value in milliseconds that is encoded as an integer; +see Section 10.1. Idle timeout is disabled when both endpoints omit +this transport parameter or specify a value of 0.¶
+A stateless reset token is used in verifying a stateless reset; see +Section 10.3. This parameter is a sequence of 16 bytes. This +transport parameter MUST NOT be sent by a client, but MAY be sent by a server. +A server that does not send this transport parameter cannot use stateless +reset (Section 10.3) for the connection ID negotiated during the +handshake.¶
+The maximum UDP payload size parameter is an integer value that limits the +size of UDP payloads that the endpoint is willing to receive. UDP datagrams +with payloads larger than this limit are not likely to be processed by the +receiver.¶
+The default for this parameter is the maximum permitted UDP payload of 65527. +Values below 1200 are invalid.¶
+This limit does act as an additional constraint on datagram size in the same +way as the path MTU, but it is a property of the endpoint and not the path; +see Section 14. It is expected that this is the space an endpoint +dedicates to holding incoming packets.¶
+The initial maximum data parameter is an integer value that contains the +initial value for the maximum amount of data that can be sent on the +connection. This is equivalent to sending a MAX_DATA (Section 19.9) for +the connection immediately after completing the handshake.¶
+This parameter is an integer value specifying the initial flow control limit +for locally-initiated bidirectional streams. This limit applies to newly +created bidirectional streams opened by the endpoint that sends the transport +parameter. In client transport parameters, this applies to streams with an +identifier with the least significant two bits set to 0x0; in server transport +parameters, this applies to streams with the least significant two bits set to +0x1.¶
+This parameter is an integer value specifying the initial flow control limit +for peer-initiated bidirectional streams. This limit applies to newly created +bidirectional streams opened by the endpoint that receives the transport +parameter. In client transport parameters, this applies to streams with an +identifier with the least significant two bits set to 0x1; in server transport +parameters, this applies to streams with the least significant two bits set to +0x0.¶
+This parameter is an integer value specifying the initial flow control limit +for unidirectional streams. This limit applies to newly created +unidirectional streams opened by the endpoint that receives the transport +parameter. In client transport parameters, this applies to streams with an +identifier with the least significant two bits set to 0x3; in server transport +parameters, this applies to streams with the least significant two bits set to +0x2.¶
+The initial maximum bidirectional streams parameter is an integer value that +contains the initial maximum number of bidirectional streams the endpoint +that receives this transport parameter is +permitted to initiate. If this parameter is absent or zero, the peer cannot +open bidirectional streams until a MAX_STREAMS frame is sent. Setting this +parameter is equivalent to sending a MAX_STREAMS (Section 19.11) of +the corresponding type with the same value.¶
+The initial maximum unidirectional streams parameter is an integer value that +contains the initial maximum number of unidirectional streams the endpoint +that receives this transport parameter is +permitted to initiate. If this parameter is absent or zero, the peer cannot +open unidirectional streams until a MAX_STREAMS frame is sent. Setting this +parameter is equivalent to sending a MAX_STREAMS (Section 19.11) of +the corresponding type with the same value.¶
+The acknowledgment delay exponent is an integer value indicating an exponent +used to decode the ACK Delay field in the ACK frame (Section 19.3). If this +value is absent, a default value of 3 is assumed (indicating a multiplier of +8). Values above 20 are invalid.¶
+The maximum acknowledgment delay is an integer value indicating the maximum +amount of time in milliseconds by which the endpoint will delay sending +acknowledgments. This value SHOULD include the receiver's expected delays in +alarms firing. For example, if a receiver sets a timer for 5ms and alarms +commonly fire up to 1ms late, then it should send a max_ack_delay of 6ms. If +this value is absent, a default of 25 milliseconds is assumed. Values of 2^14 +or greater are invalid.¶
+The disable active migration transport parameter is included if the endpoint +does not support active connection migration (Section 9) on the address +being used during the handshake. An endpoint that receives this transport +parameter MUST NOT use a new local address when sending to the address that +the peer used during the handshake. This transport parameter does not +prohibit connection migration after a client has acted on a preferred_address +transport parameter. This parameter is a zero-length value.¶
+The server's preferred address is used to effect a change in server address at +the end of the handshake, as described in Section 9.6. This +transport parameter is only sent by a server. Servers MAY choose to only send +a preferred address of one address family by sending an all-zero address and +port (0.0.0.0:0 or [::]:0) for the other family. IP addresses are encoded in +network byte order.¶
+The preferred_address transport parameter contains an address and port for +both IP version 4 and 6. The four-byte IPv4 Address field is followed by the +associated two-byte IPv4 Port field. This is followed by a 16-byte IPv6 +Address field and two-byte IPv6 Port field. After address and port pairs, +a Connection ID Length field describes the length of the following Connection +ID field. Finally, a 16-byte Stateless Reset Token field includes the +stateless reset token associated with the connection ID. The format of this +transport parameter is shown in Figure 22.¶
+The Connection ID field and the Stateless Reset Token field contain an +alternative connection ID that has a sequence number of 1; see Section 5.1.1. +Having these values sent alongside the preferred address ensures that there +will be at least one unused active connection ID when the client initiates +migration to the preferred address.¶
+The Connection ID and Stateless Reset Token fields of a preferred address are +identical in syntax and semantics to the corresponding fields of a +NEW_CONNECTION_ID frame (Section 19.15). A server that chooses +a zero-length connection ID MUST NOT provide a preferred address. Similarly, +a server MUST NOT include a zero-length connection ID in this transport +parameter. A client MUST treat violation of these requirements as a +connection error of type TRANSPORT_PARAMETER_ERROR.¶
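The fixed layout described above (address/port pairs, a length-prefixed connection ID, and a 16-byte token) can be sketched as a small parser. This is an illustrative sketch, not normative: the function name is hypothetical, and the 1-to-20-byte connection ID bound is carried over from the NEW_CONNECTION_ID field definition that this parameter mirrors.

```python
import struct

def parse_preferred_address(buf: bytes):
    """Split a preferred_address transport parameter body into its fields.

    Layout per the text above: 4-byte IPv4 Address, 2-byte IPv4 Port,
    16-byte IPv6 Address, 2-byte IPv6 Port, 1-byte Connection ID Length,
    Connection ID, 16-byte Stateless Reset Token (network byte order).
    """
    ipv4 = buf[0:4]
    (v4_port,) = struct.unpack("!H", buf[4:6])
    ipv6 = buf[6:22]
    (v6_port,) = struct.unpack("!H", buf[22:24])
    cid_len = buf[24]
    if not 1 <= cid_len <= 20:
        # A zero-length connection ID is forbidden here; lengths over 20 are
        # invalid for any connection ID.
        raise ValueError("TRANSPORT_PARAMETER_ERROR")
    cid = buf[25:25 + cid_len]
    token = buf[25 + cid_len:25 + cid_len + 16]
    return ipv4, v4_port, ipv6, v6_port, cid, token
```

A server advertising only one address family would place an all-zero address and port in the slot for the other family, which this layout still carries.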
+The active connection ID limit is an integer value specifying the +maximum number of connection IDs from the peer that an endpoint is willing +to store. This value includes the connection ID received during the handshake, +that received in the preferred_address transport parameter, and those received +in NEW_CONNECTION_ID frames. +The value of the active_connection_id_limit parameter MUST be at least 2. +An endpoint that receives a value less than 2 MUST close the connection +with an error of type TRANSPORT_PARAMETER_ERROR. +If this transport parameter is absent, a default of 2 is assumed. If an +endpoint issues a zero-length connection ID, it will never send a +NEW_CONNECTION_ID frame and therefore ignores the active_connection_id_limit +value received from its peer.¶
+The value that the endpoint included in the Source Connection ID field of the +first Initial packet it sends for the connection; see Section 7.3.¶
+The value that the server included in the Source Connection ID field of a +Retry packet; see Section 7.3. This transport parameter is only sent by a +server.¶
+If present, transport parameters that set initial per-stream flow control limits +(initial_max_stream_data_bidi_local, initial_max_stream_data_bidi_remote, and +initial_max_stream_data_uni) are equivalent to sending a MAX_STREAM_DATA frame +(Section 19.10) on every stream of the corresponding type +immediately after opening. If the transport parameter is absent, streams of +that type start with a flow control limit of 0.¶
+A client MUST NOT include any server-only transport parameter: +original_destination_connection_id, preferred_address, +retry_source_connection_id, or stateless_reset_token. A server MUST treat +receipt of any of these transport parameters as a connection error of type +TRANSPORT_PARAMETER_ERROR.¶
+As described in Section 12.4, packets contain one or more frames. This section +describes the format and semantics of the core QUIC frame types.¶
+A PADDING frame (type=0x00) has no semantic value. PADDING frames can be used +to increase the size of a packet. Padding can be used to increase an initial +client packet to the minimum required size, or to provide protection against +traffic analysis for protected packets.¶
+PADDING frames are formatted as shown in Figure 23, which shows that +PADDING frames have no content. That is, a PADDING frame consists of the single +byte that identifies the frame as a PADDING frame.¶
+Endpoints can use PING frames (type=0x01) to verify that their peers are still +alive or to check reachability to the peer.¶
+PING frames are formatted as shown in Figure 24, which shows that PING +frames have no content.¶
+The receiver of a PING frame simply needs to acknowledge the packet containing +this frame.¶
+The PING frame can be used to keep a connection alive when an application or +application protocol wishes to prevent the connection from timing out; see +Section 10.1.2.¶
+Receivers send ACK frames (types 0x02 and 0x03) to inform senders of packets +they have received and processed. The ACK frame contains one or more ACK Ranges. +ACK Ranges identify acknowledged packets. If the frame type is 0x03, ACK frames +also contain the cumulative count of QUIC packets with associated ECN marks +received on the connection up until this point. QUIC implementations MUST +properly handle both types and, if they have enabled ECN for packets they send, +they SHOULD use the information in the ECN section to manage their congestion +state.¶
+QUIC acknowledgments are irrevocable. Once acknowledged, a packet remains +acknowledged, even if it does not appear in a future ACK frame. This is unlike +reneging for TCP SACKs ([RFC2018]).¶
+Packets from different packet number spaces can be identified using the same +numeric value. An acknowledgment for a packet needs to indicate both a packet +number and a packet number space. This is accomplished by having each ACK frame +only acknowledge packet numbers in the same space as the packet in which the +ACK frame is contained.¶
+Version Negotiation and Retry packets cannot be acknowledged because they do not +contain a packet number. Rather than relying on ACK frames, these packets are +implicitly acknowledged by the next Initial packet sent by the client.¶
+ACK frames are formatted as shown in Figure 25.¶
+ACK frames contain the following fields:¶
+A variable-length integer representing the largest packet number the peer is +acknowledging; this is usually the largest packet number that the peer has +received prior to generating the ACK frame. Unlike the packet number in the +QUIC long or short header, the value in an ACK frame is not truncated.¶
+A variable-length integer encoding the acknowledgment delay in +microseconds; see Section 13.2.5. It is decoded by multiplying the +value in the field by 2 to the power of the ack_delay_exponent transport +parameter sent by the sender of the ACK frame; see +Section 18.2. Compared to simply expressing +the delay as an integer, this encoding allows for a larger range of +values within the same number of bytes, at the cost of lower resolution.¶
+A variable-length integer specifying the number of ACK Range fields in +the frame.¶
+A variable-length integer indicating the number of contiguous packets +preceding the Largest Acknowledged that are being acknowledged. +That is, the smallest packet acknowledged in the +range is determined by subtracting the First ACK Range value from the Largest +Acknowledged.¶
+Contains additional ranges of packets that are alternately not +acknowledged (Gap) and acknowledged (ACK Range); see Section 19.3.1.¶
+The three ECN Counts; see Section 19.3.2.¶
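The ACK Delay decoding described above (multiplying the field by 2 to the power of the sender's ack_delay_exponent) can be sketched as follows; the function name is hypothetical, and the default exponent of 3 and the upper bound of 20 come from the ack_delay_exponent transport parameter definition.

```python
def decode_ack_delay_us(ack_delay_field: int, ack_delay_exponent: int = 3) -> int:
    """Decode an ACK frame's ACK Delay field into microseconds.

    The field value is multiplied by 2^ack_delay_exponent; the exponent
    defaults to 3 (a multiplier of 8), and values above 20 are invalid.
    """
    if ack_delay_exponent > 20:
        raise ValueError("TRANSPORT_PARAMETER_ERROR")
    return ack_delay_field << ack_delay_exponent
```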
+Each ACK Range consists of alternating Gap and ACK Range Length values in +descending packet number order. ACK Ranges can be repeated. The number of Gap +and ACK Range Length values is determined by the ACK Range Count field; one of +each value is present for each value in the ACK Range Count field.¶
+ACK Ranges are structured as shown in Figure 26.¶
+The fields that form each ACK Range are:¶
+A variable-length integer indicating the number of contiguous unacknowledged +packets preceding the packet number one lower than the smallest in the +preceding ACK Range.¶
+A variable-length integer indicating the number of contiguous acknowledged +packets preceding the largest packet number, as determined by the +preceding Gap.¶
+Gap and ACK Range Length values use a relative integer encoding for efficiency. +Though each encoded value is positive, the values are subtracted, so that each +ACK Range describes progressively lower-numbered packets.¶
+Each ACK Range acknowledges a contiguous range of packets by indicating the +number of acknowledged packets that precede the largest packet number in that +range. A value of zero indicates that only the largest packet number is +acknowledged. Larger ACK Range values indicate a larger range, with +corresponding lower values for the smallest packet number in the range. Thus, +given a largest packet number for the range, the smallest value is determined by +the formula:¶
++ smallest = largest - ack_range +¶ +
An ACK Range acknowledges all packets between the smallest packet number and the +largest, inclusive.¶
+The largest value for an ACK Range is determined by cumulatively subtracting the +size of all preceding ACK Range Lengths and Gaps.¶
+Each Gap indicates a range of packets that are not being acknowledged. The +number of packets in the gap is one higher than the encoded value of the Gap +field.¶
+The value of the Gap field establishes the largest packet number value for the +subsequent ACK Range using the following formula:¶
++ largest = previous_smallest - gap - 2 +¶ +
If any computed packet number is negative, an endpoint MUST generate a +connection error of type FRAME_ENCODING_ERROR.¶
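The two formulas above can be combined into a decoding sketch that expands an ACK frame's fields into inclusive packet-number ranges. The function name and the list-of-pairs input are illustrative assumptions, not part of the wire format.

```python
def decode_ack_ranges(largest_ack, first_ack_range, gaps_and_ranges):
    """Expand ACK frame fields into inclusive (smallest, largest) pairs.

    gaps_and_ranges is a list of (Gap, ACK Range Length) value pairs in the
    order they appear in the frame, i.e. in descending packet-number order.
    """
    ranges = []
    largest = largest_ack
    smallest = largest - first_ack_range          # smallest = largest - ack_range
    if smallest < 0:
        raise ValueError("FRAME_ENCODING_ERROR")
    ranges.append((smallest, largest))
    for gap, ack_range in gaps_and_ranges:
        largest = smallest - gap - 2              # largest = previous_smallest - gap - 2
        smallest = largest - ack_range            # smallest = largest - ack_range
        if smallest < 0:
            # Any negative computed packet number is a connection error.
            raise ValueError("FRAME_ENCODING_ERROR")
        ranges.append((smallest, largest))
    return ranges
```

For example, Largest Acknowledged 10 with First ACK Range 2 acknowledges packets 8 through 10; a following Gap of 0 and ACK Range Length of 1 then acknowledges packets 5 and 6 (the encoded Gap of 0 covers an actual gap of one packet, number 7).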
+The ACK frame uses the least significant bit of the type value (that is, type +0x03) to indicate ECN feedback and report receipt of QUIC packets with +associated ECN codepoints of ECT(0), ECT(1), or CE in the packet's IP header. +ECN Counts are only present when the ACK frame type is 0x03.¶
+When present, there are 3 ECN counts, as shown in Figure 27.¶
+The three ECN Counts are:¶
+A variable-length integer representing the total number of packets received +with the ECT(0) codepoint in the packet number space of the ACK frame.¶
+A variable-length integer representing the total number of packets received +with the ECT(1) codepoint in the packet number space of the ACK frame.¶
+A variable-length integer representing the total number of packets received +with the CE codepoint in the packet number space of the ACK frame.¶
+ECN counts are maintained separately for each packet number space.¶
+An endpoint uses a RESET_STREAM frame (type=0x04) to abruptly terminate the +sending part of a stream.¶
+After sending a RESET_STREAM, an endpoint ceases transmission and retransmission +of STREAM frames on the identified stream. A receiver of RESET_STREAM can +discard any data that it already received on that stream.¶
+An endpoint that receives a RESET_STREAM frame for a send-only stream MUST +terminate the connection with error STREAM_STATE_ERROR.¶
+RESET_STREAM frames are formatted as shown in Figure 28.¶
+RESET_STREAM frames contain the following fields:¶
+A variable-length integer encoding of the Stream ID of the stream being +terminated.¶
+A variable-length integer containing the application protocol error +code (see Section 20.2) that indicates why the stream is being +closed.¶
+A variable-length integer indicating the final size of the stream by the
RESET_STREAM sender, in units of bytes; see Section 4.5.¶
+An endpoint uses a STOP_SENDING frame (type=0x05) to communicate that incoming +data is being discarded on receipt at application request. STOP_SENDING +requests that a peer cease transmission on a stream.¶
+A STOP_SENDING frame can be sent for streams in the Recv or Size Known states; +see Section 3.1. Receiving a STOP_SENDING frame for a +locally-initiated stream that has not yet been created MUST be treated as a +connection error of type STREAM_STATE_ERROR. An endpoint that receives a +STOP_SENDING frame for a receive-only stream MUST terminate the connection with +error STREAM_STATE_ERROR.¶
+STOP_SENDING frames are formatted as shown in Figure 29.¶
+STOP_SENDING frames contain the following fields:¶
+A variable-length integer carrying the Stream ID of the stream being ignored.¶
+A variable-length integer containing the application-specified reason the +sender is ignoring the stream; see Section 20.2.¶
+A CRYPTO frame (type=0x06) is used to transmit cryptographic handshake messages. +It can be sent in all packet types except 0-RTT. The CRYPTO frame offers the +cryptographic protocol an in-order stream of bytes. CRYPTO frames are +functionally identical to STREAM frames, except that they do not bear a stream +identifier; they are not flow controlled; and they do not carry markers for +optional offset, optional length, and the end of the stream.¶
+CRYPTO frames are formatted as shown in Figure 30.¶
+CRYPTO frames contain the following fields:¶
+A variable-length integer specifying the byte offset in the stream for the +data in this CRYPTO frame.¶
+A variable-length integer specifying the length of the Crypto Data field in +this CRYPTO frame.¶
+The cryptographic message data.¶
+There is a separate flow of cryptographic handshake data in each encryption +level, each of which starts at an offset of 0. This implies that each encryption +level is treated as a separate CRYPTO stream of data.¶
+The largest offset delivered on a stream - the sum of the offset and data +length - cannot exceed 2^62-1. Receipt of a frame that exceeds this limit MUST +be treated as a connection error of type FRAME_ENCODING_ERROR or +CRYPTO_BUFFER_EXCEEDED.¶
+Unlike STREAM frames, which include a Stream ID indicating to which stream the +data belongs, the CRYPTO frame carries data for a single stream per encryption +level. The stream does not have an explicit end, so CRYPTO frames do not have a +FIN bit.¶
+A server sends a NEW_TOKEN frame (type=0x07) to provide the client with a token +to send in the header of an Initial packet for a future connection.¶
+NEW_TOKEN frames are formatted as shown in Figure 31.¶
+NEW_TOKEN frames contain the following fields:¶
+A variable-length integer specifying the length of the token in bytes.¶
+An opaque blob that the client can use with a future Initial packet. The token +MUST NOT be empty. A client MUST treat receipt of a NEW_TOKEN frame with +an empty Token field as a connection error of type FRAME_ENCODING_ERROR.¶
+A client might receive multiple NEW_TOKEN frames that contain the same token +value if packets containing the frame are incorrectly determined to be lost. +Clients are responsible for discarding duplicate values, which might be used +to link connection attempts; see Section 8.1.3.¶
+Clients MUST NOT send NEW_TOKEN frames. A server MUST treat receipt of a +NEW_TOKEN frame as a connection error of type PROTOCOL_VIOLATION.¶
+STREAM frames implicitly create a stream and carry stream data. The STREAM +frame Type field takes the form 0b00001XXX (or the set of values from 0x08 to +0x0f). The three low-order bits of the frame type determine the fields that +are present in the frame:¶
+An endpoint MUST terminate the connection with error STREAM_STATE_ERROR if it +receives a STREAM frame for a locally-initiated stream that has not yet been +created, or for a send-only stream.¶
+STREAM frames are formatted as shown in Figure 32.¶
+STREAM frames contain the following fields:¶
+A variable-length integer indicating the stream ID of the stream; see +Section 2.1.¶
+A variable-length integer specifying the byte offset in the stream for the +data in this STREAM frame. This field is present when the OFF bit is set to +1. When the Offset field is absent, the offset is 0.¶
+A variable-length integer specifying the length of the Stream Data field in +this STREAM frame. This field is present when the LEN bit is set to 1. When +the LEN bit is set to 0, the Stream Data field consumes all the remaining +bytes in the packet.¶
+The bytes from the designated stream to be delivered.¶
+When a Stream Data field has a length of 0, the offset in the STREAM frame is +the offset of the next byte that would be sent.¶
+The first byte in the stream has an offset of 0. The largest offset delivered +on a stream - the sum of the offset and data length - cannot exceed 2^62-1, as +it is not possible to provide flow control credit for that data. Receipt of a +frame that exceeds this limit MUST be treated as a connection error of type +FRAME_ENCODING_ERROR or FLOW_CONTROL_ERROR.¶
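The three low-order type bits of a STREAM frame can be unpacked as in this sketch. It assumes the OFF (0x04), LEN (0x02), and FIN (0x01) bit positions of the STREAM frame definition; the function name and returned dictionary are illustrative only.

```python
def stream_frame_flags(frame_type: int) -> dict:
    """Interpret the three low-order bits of a STREAM frame type (0x08-0x0f)."""
    if not 0x08 <= frame_type <= 0x0f:
        raise ValueError("not a STREAM frame type")
    return {
        "off": bool(frame_type & 0x04),  # Offset field present (else offset is 0)
        "len": bool(frame_type & 0x02),  # Length field present (else data fills the packet)
        "fin": bool(frame_type & 0x01),  # frame marks the end of the stream
    }
```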
+A MAX_DATA frame (type=0x10) is used in flow control to inform the peer of the +maximum amount of data that can be sent on the connection as a whole.¶
+MAX_DATA frames are formatted as shown in Figure 33.¶
+MAX_DATA frames contain the following field:¶
+A variable-length integer indicating the maximum amount of data that can be +sent on the entire connection, in units of bytes.¶
+All data sent in STREAM frames counts toward this limit. The sum of the final +sizes on all streams - including streams in terminal states - MUST NOT exceed +the value advertised by a receiver. An endpoint MUST terminate a connection +with a FLOW_CONTROL_ERROR error if it receives more data than the maximum data +value that it has sent. This includes violations of remembered limits in Early +Data; see Section 7.4.1.¶
+A MAX_STREAM_DATA frame (type=0x11) is used in flow control to inform a peer +of the maximum amount of data that can be sent on a stream.¶
+A MAX_STREAM_DATA frame can be sent for streams in the Recv state; see +Section 3.1. Receiving a MAX_STREAM_DATA frame for a +locally-initiated stream that has not yet been created MUST be treated as a +connection error of type STREAM_STATE_ERROR. An endpoint that receives a +MAX_STREAM_DATA frame for a receive-only stream MUST terminate the connection +with error STREAM_STATE_ERROR.¶
+MAX_STREAM_DATA frames are formatted as shown in Figure 34.¶
+MAX_STREAM_DATA frames contain the following fields:¶
+The stream ID of the affected stream, encoded as a variable-length
integer.¶
+A variable-length integer indicating the maximum amount of data that can be +sent on the identified stream, in units of bytes.¶
+When counting data toward this limit, an endpoint accounts for the largest +received offset of data that is sent or received on the stream. Loss or +reordering can mean that the largest received offset on a stream can be greater +than the total size of data received on that stream. Receiving STREAM frames +might not increase the largest received offset.¶
+The data sent on a stream MUST NOT exceed the largest maximum stream data value +advertised by the receiver. An endpoint MUST terminate a connection with a +FLOW_CONTROL_ERROR error if it receives more data than the largest maximum +stream data that it has sent for the affected stream. This includes violations +of remembered limits in Early Data; see Section 7.4.1.¶
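The largest-received-offset accounting described above can be sketched as a hypothetical helper: the tracked value is the maximum stream offset seen so far, which reordering can push beyond the total number of bytes actually received, and it must never exceed the advertised limit.

```python
def check_stream_flow_control(largest_offset: int, frame_offset: int,
                              frame_len: int, max_stream_data: int) -> int:
    """Account for a received STREAM frame against the stream's advertised limit.

    Returns the updated largest received offset; raises if the peer exceeded
    the largest maximum stream data value this endpoint sent.
    """
    new_largest = max(largest_offset, frame_offset + frame_len)
    if new_largest > max_stream_data:
        raise ValueError("FLOW_CONTROL_ERROR")
    return new_largest
```

Note that a retransmitted or reordered frame entirely below the current largest offset leaves the tracked value unchanged, matching the observation that receiving STREAM frames might not increase the largest received offset.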
+A MAX_STREAMS frame (type=0x12 or 0x13) informs the peer of the cumulative
number of streams of a given type it is permitted to open. A MAX_STREAMS frame
with a type of 0x12 applies to bidirectional streams, and a MAX_STREAMS frame
with a type of 0x13 applies to unidirectional streams.¶
+MAX_STREAMS frames are formatted as shown in Figure 35.¶
+MAX_STREAMS frames contain the following field:¶
+A count of the cumulative number of streams of the corresponding type that +can be opened over the lifetime of the connection. This value cannot exceed +2^60, as it is not possible to encode stream IDs larger than 2^62-1. +Receipt of a frame that permits opening of a stream larger than this limit +MUST be treated as a FRAME_ENCODING_ERROR.¶
+Loss or reordering can cause a MAX_STREAMS frame to be received that states a
lower stream limit than an endpoint has previously received. MAX_STREAMS frames
that do not increase the stream limit MUST be ignored.¶
+An endpoint MUST NOT open more streams than permitted by the current stream
limit set by its peer. For instance, a server that receives a unidirectional
stream limit of 3 is permitted to open streams 3, 7, and 11, but not stream 15.
An endpoint MUST terminate a connection with a STREAM_LIMIT_ERROR error if a
peer opens more streams than were permitted. This includes violations of
remembered limits in Early Data; see Section 7.4.1.¶
+Note that these frames (and the corresponding transport parameters) do not +describe the number of streams that can be opened concurrently. The limit +includes streams that have been closed as well as those that are open.¶
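The cumulative-limit arithmetic above can be illustrated with a small sketch. Stream IDs of a given type are spaced four apart, with the two low-order bits identifying the type (0x3 for server-initiated unidirectional streams, as in the example of streams 3, 7, and 11); the function name is hypothetical.

```python
def permitted_stream_ids(max_streams: int, type_bits: int) -> list:
    """Stream IDs an endpoint may open under a cumulative MAX_STREAMS limit.

    type_bits is the two-bit stream type encoded in the ID's least
    significant bits (e.g. 0x3 for server-initiated unidirectional streams).
    """
    return [4 * i + type_bits for i in range(max_streams)]
```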
+A sender SHOULD send a DATA_BLOCKED frame (type=0x14) when it wishes to send +data, but is unable to do so due to connection-level flow control; see +Section 4. DATA_BLOCKED frames can be used as input to tuning of flow +control algorithms; see Section 4.2.¶
+DATA_BLOCKED frames are formatted as shown in Figure 36.¶
+DATA_BLOCKED frames contain the following field:¶
+A variable-length integer indicating the connection-level limit at which +blocking occurred.¶
+A sender SHOULD send a STREAM_DATA_BLOCKED frame (type=0x15) when it wishes to +send data, but is unable to do so due to stream-level flow control. This frame +is analogous to DATA_BLOCKED (Section 19.12).¶
+An endpoint that receives a STREAM_DATA_BLOCKED frame for a send-only stream +MUST terminate the connection with error STREAM_STATE_ERROR.¶
+STREAM_DATA_BLOCKED frames are formatted as shown in +Figure 37.¶
+STREAM_DATA_BLOCKED frames contain the following fields:¶
+A variable-length integer indicating the stream that is blocked due to flow
control.¶
+A variable-length integer indicating the offset of the stream at which the
blocking occurred.¶
+A sender SHOULD send a STREAMS_BLOCKED frame (type=0x16 or 0x17) when it wishes
to open a stream, but is unable to do so due to the maximum stream limit set by
its peer; see Section 19.11. A STREAMS_BLOCKED frame of type 0x16 is used
to indicate reaching the bidirectional stream limit, and a STREAMS_BLOCKED frame
of type 0x17 is used to indicate reaching the unidirectional stream limit.¶
+A STREAMS_BLOCKED frame does not open the stream, but informs the peer that a +new stream was needed and the stream limit prevented the creation of the stream.¶
+STREAMS_BLOCKED frames are formatted as shown in Figure 38.¶
+STREAMS_BLOCKED frames contain the following field:¶
+A variable-length integer indicating the maximum number of streams allowed +at the time the frame was sent. This value cannot exceed 2^60, as it is +not possible to encode stream IDs larger than 2^62-1. Receipt of a frame +that encodes a larger stream ID MUST be treated as a STREAM_LIMIT_ERROR or a +FRAME_ENCODING_ERROR.¶
+An endpoint sends a NEW_CONNECTION_ID frame (type=0x18) to provide its peer with +alternative connection IDs that can be used to break linkability when migrating +connections; see Section 9.5.¶
+NEW_CONNECTION_ID frames are formatted as shown in Figure 39.¶
+NEW_CONNECTION_ID frames contain the following fields:¶
+The sequence number assigned to the connection ID by the sender, encoded as a +variable-length integer; see Section 5.1.1.¶
+A variable-length integer indicating which connection IDs should be retired; +see Section 5.1.2.¶
+An 8-bit unsigned integer containing the length of the connection ID. Values +less than 1 and greater than 20 are invalid and MUST be treated as a +connection error of type FRAME_ENCODING_ERROR.¶
+A connection ID of the specified length.¶
+A 128-bit value that will be used for a stateless reset when the associated +connection ID is used; see Section 10.3.¶
+An endpoint MUST NOT send this frame if it currently requires that its peer send +packets with a zero-length Destination Connection ID. Changing the length of a +connection ID to or from zero-length makes it difficult to identify when the +value of the connection ID changed. An endpoint that is sending packets with a +zero-length Destination Connection ID MUST treat receipt of a NEW_CONNECTION_ID +frame as a connection error of type PROTOCOL_VIOLATION.¶
+Transmission errors, timeouts, and retransmissions might cause the same
NEW_CONNECTION_ID frame to be received multiple times. Receipt of the same
frame multiple times MUST NOT be treated as a connection error. A receiver can
use the sequence number supplied in the NEW_CONNECTION_ID frame to handle
receiving the same NEW_CONNECTION_ID frame multiple times.¶
+If an endpoint receives a NEW_CONNECTION_ID frame that repeats a previously +issued connection ID with a different Stateless Reset Token or a different +sequence number, or if a sequence number is used for different connection +IDs, the endpoint MAY treat that receipt as a connection error of type +PROTOCOL_VIOLATION.¶
+The Retire Prior To field applies to connection IDs established during +connection setup and the preferred_address transport parameter; see +Section 5.1.2. The Retire Prior To field MUST be less than or equal to the +Sequence Number field. Receiving a value greater than the Sequence Number MUST +be treated as a connection error of type FRAME_ENCODING_ERROR.¶
+Once a sender indicates a Retire Prior To value, smaller values sent in +subsequent NEW_CONNECTION_ID frames have no effect. A receiver MUST ignore any +Retire Prior To fields that do not increase the largest received Retire Prior To +value.¶
+An endpoint that receives a NEW_CONNECTION_ID frame with a sequence number +smaller than the Retire Prior To field of a previously received +NEW_CONNECTION_ID frame MUST send a corresponding RETIRE_CONNECTION_ID frame +that retires the newly received connection ID, unless it has already done so +for that sequence number.¶
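The Retire Prior To rules above can be sketched as receiver-side logic. The `state` dictionary and function name are hypothetical scaffolding; the sketch covers only the sequence-number checks for the newly received connection ID, not retirement of previously stored IDs.

```python
def handle_new_connection_id(state: dict, seq: int,
                             retire_prior_to: int, cid: bytes):
    """Apply the Retire Prior To rules to one received NEW_CONNECTION_ID frame.

    Returns the sequence number to echo in a RETIRE_CONNECTION_ID frame when
    the new connection ID is already retired, or None if it was stored.
    """
    if retire_prior_to > seq:
        # Retire Prior To MUST be <= Sequence Number.
        raise ValueError("FRAME_ENCODING_ERROR")
    # Values that do not increase the largest received Retire Prior To are ignored.
    state["retire_prior_to"] = max(state.get("retire_prior_to", 0), retire_prior_to)
    if seq < state["retire_prior_to"]:
        # The newly received connection ID is already retired; respond with
        # RETIRE_CONNECTION_ID (unless already done for this sequence number).
        return seq
    state.setdefault("active_cids", {})[seq] = cid
    return None
```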
+An endpoint sends a RETIRE_CONNECTION_ID frame (type=0x19) to indicate that it +will no longer use a connection ID that was issued by its peer. This includes +the connection ID provided during the handshake. Sending a RETIRE_CONNECTION_ID +frame also serves as a request to the peer to send additional connection IDs for +future use; see Section 5.1. New connection IDs can be delivered to a +peer using the NEW_CONNECTION_ID frame (Section 19.15).¶
+Retiring a connection ID invalidates the stateless reset token associated with +that connection ID.¶
+RETIRE_CONNECTION_ID frames are formatted as shown in +Figure 40.¶
+RETIRE_CONNECTION_ID frames contain the following field:¶
+The sequence number of the connection ID being retired; see Section 5.1.2.¶
+Receipt of a RETIRE_CONNECTION_ID frame containing a sequence number greater +than any previously sent to the peer MUST be treated as a connection error of +type PROTOCOL_VIOLATION.¶
+The sequence number specified in a RETIRE_CONNECTION_ID frame MUST NOT refer +to the Destination Connection ID field of the packet in which the frame is +contained. The peer MAY treat this as a connection error of type +PROTOCOL_VIOLATION.¶
+An endpoint cannot send this frame if it was provided with a zero-length +connection ID by its peer. An endpoint that provides a zero-length connection +ID MUST treat receipt of a RETIRE_CONNECTION_ID frame as a connection error of +type PROTOCOL_VIOLATION.¶
+Endpoints can use PATH_CHALLENGE frames (type=0x1a) to check reachability to the +peer and for path validation during connection migration.¶
+PATH_CHALLENGE frames are formatted as shown in Figure 41.¶
+PATH_CHALLENGE frames contain the following field:¶
+This 8-byte field contains arbitrary data.¶
+Including 64 bits of entropy in a PATH_CHALLENGE frame ensures that it is easier +to receive the packet than it is to guess the value correctly.¶
+The recipient of this frame MUST generate a PATH_RESPONSE frame +(Section 19.18) containing the same Data.¶
+A PATH_RESPONSE frame (type=0x1b) is sent in response to a PATH_CHALLENGE frame.¶
+PATH_RESPONSE frames are formatted as shown in Figure 42, which is +identical to the PATH_CHALLENGE frame (Section 19.17).¶
+If the content of a PATH_RESPONSE frame does not match the content of a +PATH_CHALLENGE frame previously sent by the endpoint, the endpoint MAY generate +a connection error of type PROTOCOL_VIOLATION.¶
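The challenge/response exchange above can be sketched as follows; the function names and the `outstanding` set used to remember unanswered challenges are illustrative assumptions.

```python
import os

def new_path_challenge(outstanding: set) -> bytes:
    """Create the 8-byte Data field for a PATH_CHALLENGE and remember it."""
    data = os.urandom(8)  # 64 bits of entropy, per the text above
    outstanding.add(data)
    return data

def on_path_response(outstanding: set, data: bytes) -> bool:
    """True if a PATH_RESPONSE echoes a challenge this endpoint sent.

    A mismatch MAY be treated as a connection error of type
    PROTOCOL_VIOLATION.
    """
    return data in outstanding
```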
+An endpoint sends a CONNECTION_CLOSE frame (type=0x1c or 0x1d) to notify its +peer that the connection is being closed. The CONNECTION_CLOSE with a frame +type of 0x1c is used to signal errors at only the QUIC layer, or the absence of +errors (with the NO_ERROR code). The CONNECTION_CLOSE frame with a type of 0x1d +is used to signal an error with the application that uses QUIC.¶
+If there are open streams that have not been explicitly closed, they are +implicitly closed when the connection is closed.¶
+CONNECTION_CLOSE frames are formatted as shown in Figure 43.¶
+CONNECTION_CLOSE frames contain the following fields:¶
+A variable-length integer error code that indicates the reason for +closing this connection. A CONNECTION_CLOSE frame of type 0x1c uses codes +from the space defined in Section 20.1. A CONNECTION_CLOSE frame +of type 0x1d uses codes from the application protocol error code space; see +Section 20.2.¶
+A variable-length integer encoding the type of frame that triggered the error. +A value of 0 (equivalent to the mention of the PADDING frame) is used when the +frame type is unknown. The application-specific variant of CONNECTION_CLOSE +(type 0x1d) does not include this field.¶
+A variable-length integer specifying the length of the reason phrase in bytes. +Because a CONNECTION_CLOSE frame cannot be split between packets, any limits +on packet size will also limit the space available for a reason phrase.¶
+Additional diagnostic information for the closure. This can be zero length if +the sender chooses not to give details beyond the Error Code. This SHOULD be +a UTF-8 encoded string [RFC3629], though the frame does not carry +information, such as language tags, that would aid comprehension by any entity +other than the one that created the text.¶
+The application-specific variant of CONNECTION_CLOSE (type 0x1d) can only be +sent using 0-RTT or 1-RTT packets; see Section 12.5. When an +application wishes to abandon a connection during the handshake, an endpoint +can send a CONNECTION_CLOSE frame (type 0x1c) with an error code of +APPLICATION_ERROR in an Initial or a Handshake packet.¶
+The server uses a HANDSHAKE_DONE frame (type=0x1e) to signal confirmation of +the handshake to the client.¶
+HANDSHAKE_DONE frames are formatted as shown in Figure 44, which +shows that HANDSHAKE_DONE frames have no content.¶
+A HANDSHAKE_DONE frame can only be sent by the server. Servers MUST NOT send a +HANDSHAKE_DONE frame before completing the handshake. A server MUST treat +receipt of a HANDSHAKE_DONE frame as a connection error of type +PROTOCOL_VIOLATION.¶
+QUIC frames do not use a self-describing encoding. An endpoint therefore needs +to understand the syntax of all frames before it can successfully process a +packet. This allows for efficient encoding of frames, but it means that an +endpoint cannot send a frame of a type that is unknown to its peer.¶
+An extension to QUIC that wishes to use a new type of frame MUST first ensure +that a peer is able to understand the frame. An endpoint can use a transport +parameter to signal its willingness to receive extension frame types. One +transport parameter can indicate support for one or more extension frame types.¶
+Extensions that modify or replace core protocol functionality (including frame +types) will be difficult to combine with other extensions that modify or +replace the same functionality unless the behavior of the combination is +explicitly defined. Such extensions SHOULD define their interaction with +previously-defined extensions modifying the same protocol components.¶
+Extension frames MUST be congestion controlled and MUST cause an ACK frame to +be sent. The exception is extension frames that replace or supplement the ACK +frame. Extension frames are not included in flow control unless specified +in the extension.¶
+An IANA registry is used to manage the assignment of frame types; see +Section 22.4.¶
+QUIC transport error codes and application error codes are 62-bit unsigned +integers.¶
+This section lists the defined QUIC transport error codes that can be used in a +CONNECTION_CLOSE frame with a type of 0x1c. These errors apply to the entire +connection.¶
+An endpoint uses this with CONNECTION_CLOSE to signal that the connection is +being closed abruptly in the absence of any error.¶
+The endpoint encountered an internal error and cannot continue with the +connection.¶
+The server refused to accept a new connection.¶
+An endpoint received more data than it permitted in its advertised data +limits; see Section 4.¶
+An endpoint received a frame for a stream identifier that exceeded its +advertised stream limit for the corresponding stream type.¶
+An endpoint received a frame for a stream that was not in a state that +permitted that frame; see Section 3.¶
+An endpoint received a STREAM frame containing data that exceeded the +previously established final size; received a STREAM frame or a RESET_STREAM +frame containing a final size that was lower than the size of stream data that +was already received; or received a STREAM frame or a RESET_STREAM frame +containing a final size different from the one already established.¶
+An endpoint received a frame that was badly formatted. For instance, a frame +of an unknown type, or an ACK frame that has more acknowledgment ranges than +the remainder of the packet could carry.¶
+An endpoint received transport parameters that were badly formatted, included +an invalid value, omitted a mandatory transport parameter, included a +forbidden transport parameter, or were otherwise in error.¶
+The number of connection IDs provided by the peer exceeds the advertised +active_connection_id_limit.¶
+An endpoint detected an error with protocol compliance that was not covered by +more specific error codes.¶
+A server received a client Initial that contained an invalid Token field.¶
+The application or application protocol caused the connection to be closed.¶
+An endpoint has received more data in CRYPTO frames than it can buffer.¶
+An endpoint detected errors in performing key updates; see Section 6 of +[QUIC-TLS].¶
+An endpoint has reached the confidentiality or integrity limit for the AEAD +algorithm used by the given connection.¶
+An endpoint has determined that the network path is incapable of supporting +QUIC. An endpoint is unlikely to receive CONNECTION_CLOSE carrying this code +except when the path does not support a large enough MTU.¶
+The cryptographic handshake failed. A range of 256 values is reserved for +carrying error codes specific to the cryptographic handshake that is used. +Codes for errors occurring when TLS is used for the crypto handshake are +described in Section 4.8 of [QUIC-TLS].¶
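The reserved 256-value range can be illustrated with a small helper. The base value 0x0100 is the start of the CRYPTO_ERROR range described above, and the mapping shown (TLS alert description added to the base) follows Section 4.8 of [QUIC-TLS]:

```python
CRYPTO_ERROR_BASE = 0x0100  # start of the 256-value handshake error range

def crypto_error_from_tls_alert(alert_description: int) -> int:
    # Per Section 4.8 of QUIC-TLS, the one-byte TLS alert description
    # (0..255) is added to 0x0100 to form the QUIC transport error code.
    if not 0 <= alert_description <= 0xFF:
        raise ValueError("alert descriptions are one byte")
    return CRYPTO_ERROR_BASE + alert_description
```

For example, the TLS handshake_failure alert (description 40) maps to the transport error code 0x128.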
+See Section 22.5 for details of registering new error codes.¶
+In defining these error codes, several principles are applied. Error conditions +that might require specific action on the part of a recipient are given unique +codes. Errors that represent common conditions are given specific codes. +Absent either of these conditions, error codes are used to identify a general +function of the stack, like flow control or transport parameter handling. +Finally, generic errors are provided for conditions where implementations are +unable or unwilling to use more specific codes.¶
+The management of application error codes is left to application protocols. +Application protocol error codes are used for the RESET_STREAM frame +(Section 19.4), the STOP_SENDING frame (Section 19.5), and +the CONNECTION_CLOSE frame with a type of 0x1d (Section 19.19).¶
+The goal of QUIC is to provide a secure transport connection. +Section 21.1 provides an overview of those properties; subsequent +sections discuss constraints and caveats regarding these properties, including +descriptions of known attacks and countermeasures.¶
+A complete security analysis of QUIC is outside the scope of this document. +This section provides an informal description of the desired security properties +as an aid to implementors and to help guide protocol analysis.¶
+QUIC assumes the threat model described in [SEC-CONS] and provides +protections against many of the attacks that arise from that model.¶
+For this purpose, attacks are divided into passive and active attacks. Passive +attackers have the capability to read packets from the network, while active +attackers also have the capability to write packets into the network. However, +a passive attack could involve an attacker with the ability to cause a routing +change or other modification in the path taken by packets that comprise a +connection.¶
+Attackers are additionally categorized as either on-path attackers or off-path +attackers. An on-path attacker can read, +modify, or remove any packet it observes such that it no longer reaches its +destination, while an off-path attacker observes the packets, but cannot prevent +the original packet from reaching its intended destination. Both types of +attackers can also transmit arbitrary packets. This definition differs from +that of Section 3.5 of [SEC-CONS] in that an off-path attacker is able to +observe packets.¶
+Properties of the handshake, protected packets, and connection migration are +considered separately.¶
+The QUIC handshake incorporates the TLS 1.3 handshake and inherits the +cryptographic properties described in Appendix E.1 of [TLS13]. Many +of the security properties of QUIC depend on the TLS handshake providing these +properties. Any attack on the TLS handshake could affect QUIC.¶
+Any attack on the TLS handshake that compromises the secrecy or uniqueness +of session keys, or the authentication of the participating peers, affects other +security guarantees provided by QUIC that depend on those keys. For instance, +migration (Section 9) depends on the efficacy of confidentiality +protections, both for the negotiation of keys using the TLS handshake and for +QUIC packet protection, to avoid linkability across network paths.¶
+An attack on the integrity of the TLS handshake might allow an attacker to +affect the selection of application protocol or QUIC version.¶
+In addition to the properties provided by TLS, the QUIC handshake provides some +defense against DoS attacks on the handshake.¶
+Address validation (Section 8) is used to verify that an entity +that claims a given address is able to receive packets at that address. Address +validation limits amplification attack targets to addresses for which an +attacker can observe packets.¶
+Prior to address validation, endpoints are limited in what they are able to +send. Endpoints cannot send data toward an unvalidated address in excess of +three times the data received from that address.¶
+The anti-amplification limit only applies when an endpoint responds to packets +received from an unvalidated address. The anti-amplification limit does not +apply to clients when establishing a new connection or when initiating +connection migration.¶
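The three-times limit can be tracked per unvalidated path with a simple counter pair, as in this hypothetical sketch (class and method names are invented for illustration):

```python
class PathState:
    """Tracks the anti-amplification limit for one unvalidated address."""

    AMPLIFICATION_FACTOR = 3

    def __init__(self) -> None:
        self.validated = False
        self.bytes_received = 0
        self.bytes_sent = 0

    def on_receive(self, n: int) -> None:
        self.bytes_received += n

    def sendable(self, n: int) -> bool:
        # Before address validation, at most three times the number of
        # bytes received from this address may be sent toward it.
        if self.validated:
            return True
        return self.bytes_sent + n <= self.AMPLIFICATION_FACTOR * self.bytes_received

    def on_send(self, n: int) -> None:
        self.bytes_sent += n
```

Once the path is validated (for example, by completing the handshake or path validation), the limit no longer applies.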
+Computing the server's first flight for a full handshake is potentially +expensive, requiring both a signature and a key exchange computation. In order +to prevent computational DoS attacks, the Retry packet provides a cheap token +exchange mechanism that allows servers to validate a client's IP address prior +to doing any expensive computations at the cost of a single round trip. After a +successful handshake, servers can issue new tokens to a client, which will allow +new connection establishment without incurring this cost.¶
+An on-path or off-path attacker can force a handshake to fail by replacing or +racing Initial packets. Once valid Initial packets have been exchanged, +subsequent Handshake packets are protected with the handshake keys and an +on-path attacker cannot force handshake failure other than by dropping packets +to cause endpoints to abandon the attempt.¶
+An on-path attacker can also replace the addresses of packets on either side and +therefore cause the client or server to have an incorrect view of the remote +addresses. Such an attack is indistinguishable from the functions performed by a +NAT.¶
+The entire handshake is cryptographically protected, with the Initial packets +being encrypted with per-version keys and the Handshake and later packets being +encrypted with keys derived from the TLS key exchange. Further, parameter +negotiation is folded into the TLS transcript and thus provides the same +integrity guarantees as ordinary TLS negotiation. An attacker can observe +the client's transport parameters (as long as it knows the version-specific +salt) but cannot observe the server's transport parameters and cannot influence +parameter negotiation.¶
+Connection IDs are unencrypted but integrity protected in all packets.¶
+This version of QUIC does not incorporate a version negotiation mechanism; +implementations of incompatible versions will simply fail to establish a +connection.¶
+Packet protection (Section 12.1) applies authenticated encryption +to all packets except Version Negotiation packets, though Initial and Retry +packets have limited protection due to the use of version-specific +keying material; see [QUIC-TLS] for more details. This section considers +passive and active attacks against protected packets.¶
+Both on-path and off-path attackers can mount a passive attack in which they +save observed packets for an offline attack against packet protection at a +future time; this is true for any observer of any packet on any network.¶
+A blind attacker, one who injects packets without being able to observe valid +packets for a connection, is unlikely to be successful, since packet protection +ensures that valid packets are only generated by endpoints that possess the +key material established during the handshake; see Section 7 and +Section 21.1.1. Similarly, any active attacker that observes packets +and attempts to insert new data or modify existing data in those packets should +not be able to generate packets deemed valid by the receiving endpoint, +other than Initial packets.¶
+A spoofing attack, in which an active attacker rewrites unprotected parts of a +packet that it forwards or injects, such as the source or destination +address, is only effective if the attacker can forward packets to the original +endpoint. Packet protection ensures that the packet payloads can only be +processed by the endpoints that completed the handshake, and invalid +packets are ignored by those endpoints.¶
+An attacker can also modify the boundaries between packets and UDP datagrams, +causing multiple packets to be coalesced into a single datagram, or splitting +coalesced packets into multiple datagrams. Aside from datagrams containing +Initial packets, which require padding, modification of how packets are +arranged in datagrams has no functional effect on a connection, although it +might change some performance characteristics.¶
+Connection Migration (Section 9) provides endpoints with the ability to +transition between IP addresses and ports on multiple paths, using one path at a +time for transmission and receipt of non-probing frames. Path validation +(Section 8.2) establishes that a peer is both willing and able +to receive packets sent on a particular path. This helps reduce the effects of +address spoofing by limiting the number of packets sent to a spoofed address.¶
+This section describes the intended security properties of connection migration +under various types of DoS attacks.¶
+An attacker that can cause a packet it observes to no longer reach its intended +destination is considered an on-path attacker. When an attacker is present +between a client and server, endpoints are required to send packets through the +attacker to establish connectivity on a given path.¶
+An on-path attacker can:¶
+An on-path attacker cannot:¶
+An on-path attacker has the opportunity to modify the packets that it +observes; however, any modification to an authenticated portion of a packet +will cause it to be dropped by the receiving endpoint as invalid, as packet +payloads are both authenticated and encrypted.¶
+In the presence of an on-path attacker, QUIC aims to provide the following +properties:¶
+An off-path attacker is not directly on the path between a client and server, +but could be able to obtain copies of some or all packets sent between the +client and the server. It is also able to send copies of those packets to +either endpoint.¶
+An off-path attacker can:¶
+ +An off-path attacker cannot:¶
+An off-path attacker can create modified copies of packets that it has observed +and inject those copies into the network, potentially with spoofed source and +destination addresses.¶
+For the purposes of this discussion, it is assumed that an off-path attacker has +the ability to inject a modified copy of a packet into the network that will +reach the destination endpoint prior to the arrival of the original packet +observed by the attacker. In other words, an attacker has the ability to +consistently "win" a race with the legitimate packets between the endpoints, +potentially causing the original packet to be ignored by the recipient.¶
+It is also assumed that an attacker has the resources necessary to affect NAT +state: it can cause an endpoint to lose its NAT binding and then obtain the +same port for use with its own traffic.¶
+In the presence of an off-path attacker, QUIC aims to provide the following +properties:¶
+A limited on-path attacker is an off-path attacker that has offered improved +routing of packets by duplicating and forwarding original packets between the +server and the client, causing those packets to arrive before the original +copies such that the original packets are dropped by the destination endpoint.¶
+A limited on-path attacker differs from an on-path attacker in that it is not on +the original path between endpoints, and therefore the original packets sent by +an endpoint are still reaching their destination. This means that a future +failure to route copied packets to the destination faster than their original +path will not prevent the original packets from reaching the destination.¶
+A limited on-path attacker can:¶
+A limited on-path attacker cannot:¶
+A limited on-path attacker can only delay packets up to the point that the +original packets arrive before the duplicate packets, meaning that it cannot +offer routing with worse latency than the original path. If a limited on-path +attacker drops packets, the original copy will still arrive at the destination +endpoint.¶
+In the presence of a limited on-path attacker, QUIC aims to provide the +following properties:¶
+Note that these guarantees are the same guarantees provided for any NAT, for the +same reasons.¶
+As an encrypted and authenticated transport QUIC provides a range of protections +against denial of service. Once the cryptographic handshake is complete, QUIC +endpoints discard most packets that are not authenticated, greatly limiting the +ability of an attacker to interfere with existing connections.¶
+Once a connection is established, QUIC endpoints might accept some +unauthenticated ICMP packets (see Section 14.2.1), but the use of these packets +is extremely limited. The only other type of packet that an endpoint might +accept is a stateless reset (Section 10.3), which relies on the token +being kept secret until it is used.¶
+During the creation of a connection, QUIC only provides protection against +attack from off the network path. All QUIC packets contain proof that the +recipient saw a preceding packet from its peer.¶
+Addresses cannot change during the handshake, so endpoints can discard packets +that are received on a different network path.¶
+The Source and Destination Connection ID fields are the primary means of +protection against off-path attack during the handshake; see +Section 8.1. These are required to match those set by a peer. +Except for Initial and Stateless Reset packets, an endpoint only accepts +packets that include a Destination Connection ID field that matches a value the +endpoint previously chose. This is the only protection offered for Version +Negotiation packets.¶
+The Destination Connection ID field in an Initial packet is selected by a client +to be unpredictable, which serves an additional purpose. The packets that carry +the cryptographic handshake are protected with a key that is derived from this +connection ID and a salt specific to the QUIC version. This allows endpoints to +use the same process for authenticating packets that they receive as they use +after the cryptographic handshake completes. Packets that cannot be +authenticated are discarded. Protecting packets in this fashion provides a +strong assurance that the sender of the packet saw the Initial packet and +understood it.¶
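This derivation can be sketched as an HKDF-Extract step over the client's Destination Connection ID. The salt value below is the one published for recent draft versions in [QUIC-TLS]; earlier drafts used different salts, so treat it as an assumption:

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # HKDF-Extract with SHA-256 is just HMAC keyed by the salt.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

# Version-specific salt (value from recent drafts of QUIC-TLS; assumed here).
VERSION_SALT = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")

def initial_secret(client_dcid: bytes) -> bytes:
    # Anyone who observes the client's Initial packet (and knows the
    # version salt) can derive this secret — which is why Initial packets
    # have only limited protection, yet still prove that the sender saw
    # the Initial packet carrying this connection ID.
    return hkdf_extract(VERSION_SALT, client_dcid)
```

Client and server secrets, keys, and IVs are then expanded from this secret with further HKDF steps defined in [QUIC-TLS].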
+These protections are not intended to be effective against an attacker that is +able to receive QUIC packets prior to the connection being established. Such an +attacker can potentially send packets that will be accepted by QUIC endpoints. +This version of QUIC attempts to detect this sort of attack, but it expects that +endpoints will fail to establish a connection rather than recovering. For the +most part, the cryptographic handshake protocol [QUIC-TLS] is responsible for +detecting tampering during the handshake.¶
+Endpoints are permitted to use other methods to detect and attempt to recover +from interference with the handshake. Invalid packets can be identified and +discarded using other methods, but no specific method is mandated in this +document.¶
+An attacker might be able to receive an address validation token +(Section 8) from a server and then release the IP address it used +to acquire that token. At a later time, the attacker can initiate a 0-RTT +connection with a server by spoofing this same address, which might now address +a different (victim) endpoint. The attacker can thus potentially cause the +server to send an initial congestion window's worth of data towards the victim.¶
+Servers SHOULD provide mitigations for this attack by limiting the usage and +lifetime of address validation tokens; see Section 8.1.3.¶
+An endpoint that acknowledges packets it has not received might cause a +congestion controller to permit sending at rates beyond what the network +supports. An endpoint MAY skip packet numbers when sending packets to detect +this behavior. An endpoint can then immediately close the connection with a +connection error of type PROTOCOL_VIOLATION; see Section 10.2.¶
+A request forgery attack occurs where an endpoint causes its peer to issue a +request towards a victim, with the request controlled by the endpoint. Request +forgery attacks aim to provide an attacker with access to capabilities of its +peer that might otherwise be unavailable to the attacker. For a networking +protocol, a request forgery attack is often used to exploit any implicit +authorization conferred on the peer by the victim due to the peer's location in +the network.¶
+For request forgery to be effective, an attacker needs to be able to influence +what packets the peer sends and where these packets are sent. If an attacker +can target a vulnerable service with a controlled payload, that service might +perform actions that are attributed to the attacker's peer, but decided by the +attacker.¶
+For example, cross-site request forgery [CSRF] +exploits on the Web cause a client to issue requests that include authorization +cookies [COOKIE], allowing one site access to information and +actions that are intended to be restricted to a different site.¶
+As QUIC runs over UDP, the primary attack modality of concern is one where an +attacker can select the address to which its peer sends UDP datagrams and can +control some of the unprotected content of those packets. As much of the data +sent by QUIC endpoints is protected, this includes control over ciphertext. An +attack is successful if an attacker can cause a peer to send a UDP datagram to +a host that will perform some action based on content in the datagram.¶
+This section discusses ways in which QUIC might be used for request forgery +attacks.¶
+This section also describes limited countermeasures that can be implemented by +QUIC endpoints. These mitigations can be employed unilaterally by a QUIC +implementation or deployment, without potential targets for request forgery +attacks taking action. However, these countermeasures could be insufficient if +UDP-based services do not properly authorize requests.¶
+Because the migration attack described in +Section 21.5.4 is quite powerful and does not have +adequate countermeasures, QUIC server implementations should assume that +attackers can cause them to generate arbitrary UDP payloads to arbitrary +destinations. QUIC servers SHOULD NOT be deployed in networks that do not deploy +ingress filtering [BCP38] and also have inadequately secured UDP endpoints.¶
+Although it is not generally possible to ensure that clients are not co-located +with vulnerable endpoints, this version of QUIC does not allow servers to +migrate, thus preventing spoofed migration attacks on clients. Any future +extension which allows server migration MUST also define countermeasures for +forgery attacks.¶
+QUIC offers some opportunities for an attacker to influence or control where +its peer sends UDP datagrams:¶
+In all cases, the attacker can cause its peer to send datagrams to a +victim that might not understand QUIC. That is, these packets are sent by +the peer prior to address validation; see Section 8.¶
+Outside of the encrypted portion of packets, QUIC offers an endpoint several +options for controlling the content of UDP datagrams that its peer sends. The +Destination Connection ID field offers direct control over bytes that appear +early in packets sent by the peer; see Section 5.1. The Token field in +Initial packets offers a server control over other bytes of Initial packets; +see Section 17.2.2.¶
+There are no measures in this version of QUIC to prevent indirect control over +the encrypted portions of packets. It is necessary to assume that endpoints are +able to control the contents of frames that a peer sends, especially those +frames that convey application data, such as STREAM frames. Though this depends +to some degree on details of the application protocol, some control is possible +in many protocol usage contexts. As the attacker has access to packet +protection keys, they are likely to be capable of predicting how a peer will +encrypt future packets. Successful control over datagram content then only +requires that the attacker be able to predict the packet number and placement +of frames in packets with some amount of reliability.¶
+This section assumes that limiting control over datagram content is not +feasible. The focus of the mitigations in subsequent sections is on limiting +the ways in which datagrams that are sent prior to address validation can be +used for request forgery.¶
+An attacker acting as a server can choose the IP address and port on which it +advertises its availability, so Initial packets from clients are assumed to be +available for use in this sort of attack. The address validation implicit in +the handshake ensures that - for a new connection - a client will not send +other types of packet to a destination that does not understand QUIC or is not +willing to accept a QUIC connection.¶
+Initial packet protection (Section 5.2 of [QUIC-TLS]) makes it difficult for +servers to control the content of Initial packets sent by clients. A client +choosing an unpredictable Destination Connection ID ensures that servers are +unable to control any of the encrypted portion of Initial packets from clients.¶
+However, the Token field is open to server control and does allow a server to +use clients to mount request forgery attacks. Use of tokens provided with the +NEW_TOKEN frame (Section 8.1.3) offers the only option for request +forgery during connection establishment.¶
+Clients however are not obligated to use the NEW_TOKEN frame. Request forgery +attacks that rely on the Token field can be avoided if clients send an empty +Token field when the server address has changed from when the NEW_TOKEN frame +was received.¶
+Clients could avoid using NEW_TOKEN if the server address changes. However, not +including a Token field could adversely affect performance. Servers could rely +on NEW_TOKEN to enable sending of data in excess of the three times limit on +sending data; see Section 8.1. In particular, this affects cases +where clients use 0-RTT to request data from servers.¶
+Sending a Retry packet (Section 17.2.5) offers a server the option to change +the Token field. After sending a Retry, the server can also control the +Destination Connection ID field of subsequent Initial packets from the client. +This also might allow indirect control over the encrypted content of Initial +packets. However, the exchange of a Retry packet validates the server's +address, thereby preventing the use of subsequent Initial packets for request +forgery.¶
+Servers can specify a preferred address, which clients then migrate to after +confirming the handshake; see Section 9.6. The Destination Connection +ID field of packets that the client sends to a preferred address can be used +for request forgery.¶
+A client MUST NOT send non-probing frames to a preferred address prior to +validating that address; see Section 8. This greatly reduces the +options that a server has to control the encrypted portion of datagrams.¶
+This document does not offer any additional countermeasures that are specific +to use of preferred addresses and can be implemented by endpoints. The generic +measures described in Section 21.5.6 could be used as further mitigation.¶
+Clients are able to present a spoofed source address as part of an apparent +connection migration to cause a server to send datagrams to that address.¶
+The Destination Connection ID field in any packets that a server subsequently +sends to this spoofed address can be used for request forgery. A client might +also be able to influence the ciphertext.¶
+A server that only sends probing packets (Section 9.1) to an address prior to +address validation provides an attacker with only limited control over the +encrypted portion of datagrams. However, particularly for NAT rebinding, this +can adversely affect performance. If the server sends frames carrying +application data, an attacker might be able to control most of the content of +datagrams.¶
+This document does not offer specific countermeasures that can be implemented +by endpoints aside from the generic measures described in Section 21.5.6. +However, countermeasures for address spoofing at the network level, in +particular ingress filtering [BCP38], are especially effective +against attacks that use spoofing and originate from an external network.¶
+Clients that are able to present a spoofed source address on a packet can cause +a server to send a Version Negotiation packet (Section 17.2.1) to that +address.¶
+The absence of size restrictions on the connection ID fields for packets of an +unknown version increases the amount of data that the client controls from the +resulting datagram. The first byte of this packet is not under client control +and the next four bytes are zero, but the client is able to control up to 512 +bytes starting from the fifth byte.¶
+No specific countermeasures are provided for this attack, though generic +protections Section 21.5.6 could apply. In this case, ingress filtering +[BCP38] is also effective.¶
+The most effective defense against request forgery attacks is to modify +vulnerable services to use strong authentication. However, this is not always +something that is within the control of a QUIC deployment. This section +outlines some other steps that QUIC endpoints could take unilaterally. These +additional steps are all discretionary as, depending on circumstances, they +could interfere with or prevent legitimate uses.¶
+Services offered over loopback interfaces often lack proper authentication. +Endpoints MAY prevent connection attempts or migration to a loopback address. +Endpoints SHOULD NOT allow connections or migration to a loopback address if the +same service was previously available at a different interface or if the address +was provided by a service at a non-loopback address. Endpoints that depend on +these capabilities could offer an option to disable these protections.¶
+Similarly, endpoints could regard a change in address to link-local address +[RFC4291] or an address in a private use range [RFC1918] from a global, +unique-local [RFC4193], or non-private address as a potential attempt at +request forgery. Endpoints could refuse to use these addresses entirely, but +that carries a significant risk of interfering with legitimate uses. Endpoints +SHOULD NOT refuse to use an address unless they have specific knowledge about +the network indicating that sending datagrams to unvalidated addresses in a +given range is not safe.¶
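A heuristic of this kind can be expressed with ordinary address classification. The sketch below uses Python's ipaddress module and flags loopback, link-local, and private-range targets; which categories to flag (and whether to flag them at all) is a local policy decision, as the text above cautions:

```python
import ipaddress

def risky_migration_target(addr: str) -> bool:
    """Flags addresses in ranges that commonly host unauthenticated
    services (loopback, link-local, private use). A heuristic only —
    refusing these ranges can interfere with legitimate uses."""
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback or ip.is_link_local or ip.is_private
```

An endpoint applying this check might require explicit configuration before sending unvalidated datagrams to a flagged address, rather than refusing outright.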
+Endpoints MAY choose to reduce the risk of request forgery by not including +values from NEW_TOKEN frames in Initial packets or by only sending probing +frames in packets prior to completing address validation. Note that this does +not prevent an attacker from using the Destination Connection ID field for an +attack.¶
+Endpoints are not expected to have specific information about the location of +servers that could be vulnerable targets of a request forgery attack. However, +it might be possible over time to identify specific UDP ports that are common +targets of attacks or particular patterns in datagrams that are used for +attacks. Endpoints MAY choose to avoid sending datagrams to these ports or not +send datagrams that match these patterns prior to validating the destination +address. Endpoints MAY retire connection IDs containing patterns known to be +problematic without using them.¶
+Modifying endpoints to apply these protections is more efficient than +deploying network-based protections, as endpoints do not need to perform +any additional processing when sending to an address that has been validated.¶
+The attacks commonly known as Slowloris ([SLOWLORIS]) try to keep many +connections to the target endpoint open and hold them open as long as possible. +These attacks can be executed against a QUIC endpoint by generating the minimum +amount of activity necessary to avoid being closed for inactivity. This might +involve sending small amounts of data, gradually opening flow control windows in +order to control the sender rate, or manufacturing ACK frames that simulate a +high loss rate.¶
+QUIC deployments SHOULD provide mitigations for the Slowloris attacks, such as +increasing the maximum number of clients the server will allow, limiting the +number of connections a single IP address is allowed to make, imposing +restrictions on the minimum transfer speed a connection is allowed to have, and +restricting the length of time an endpoint is allowed to stay connected.¶
+An adversarial sender might intentionally not send portions of the stream data, +causing the receiver to commit resources for the unsent data. This could +cause a disproportionate receive buffer memory commitment and/or the creation of +a large and inefficient data structure at the receiver.¶
+An adversarial receiver might intentionally not acknowledge packets containing +stream data in an attempt to force the sender to store the unacknowledged stream +data for retransmission.¶
+The attack on receivers is mitigated if flow control windows correspond to +available memory. However, some receivers will over-commit memory and +advertise flow control offsets in the aggregate that exceed actual available +memory. The over-commitment strategy can lead to better performance when +endpoints are well behaved, but renders endpoints vulnerable to the stream +fragmentation attack.¶
+QUIC deployments SHOULD provide mitigations against stream fragmentation +attacks. Mitigations could consist of avoiding over-committing memory, +limiting the size of tracking data structures, delaying reassembly +of STREAM frames, implementing heuristics based on the age and +duration of reassembly holes, or some combination.¶
+An adversarial endpoint can open a large number of streams, exhausting state on +an endpoint. The adversarial endpoint could repeat the process on a large +number of connections, in a manner similar to SYN flooding attacks in TCP.¶
+Normally, clients will open streams sequentially, as explained in Section 2.1. +However, when several streams are initiated at short intervals, loss or +reordering can cause STREAM frames that open streams to be received out of +sequence. On receiving a higher-numbered stream ID, a receiver is required to +open all intervening streams of the same type; see Section 3.2. +Thus, on a new connection, opening stream 4000000 opens 1 million and 1 +client-initiated bidirectional streams.¶
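To make the arithmetic in the example above concrete, here is an illustrative sketch (the helper name is ours, not the draft's): client-initiated bidirectional streams use stream IDs that are multiples of 4, so the stream number is the ID divided by 4, and receiving a stream implicitly opens every lower-numbered stream of the same type.

```python
def streams_opened(stream_id: int) -> int:
    """Streams of the same type that become open on a new connection
    when `stream_id` is the first such stream received.
    Client-initiated bidirectional streams use IDs that are multiples
    of 4, so the stream number is stream_id // 4; that stream plus all
    lower-numbered streams of the type are opened."""
    return stream_id // 4 + 1
```

With this sketch, `streams_opened(4000000)` yields 1,000,001, matching the example.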
+The number of active streams is limited by the initial_max_streams_bidi and
initial_max_streams_uni transport parameters as updated by any received
MAX_STREAMS frames, as explained in
Section 4.6. If chosen judiciously, these limits mitigate the
effect of the stream commitment attack. However, setting the limit too low
could affect performance when applications expect to open a large number of
streams.¶
+QUIC and TLS both contain frames or messages that have legitimate uses in some +contexts, but that can be abused to cause a peer to expend processing resources +without having any observable impact on the state of the connection.¶
+Messages can also be used to change and revert state in small or inconsequential +ways, such as by sending small increments to flow control limits.¶
+If processing costs are disproportionately large in comparison to bandwidth +consumption or effect on state, then this could allow a malicious peer to +exhaust processing capacity.¶
+While there are legitimate uses for all messages, implementations SHOULD track +cost of processing relative to progress and treat excessive quantities of any +non-productive packets as indicative of an attack. Endpoints MAY respond to +this condition with a connection error, or by dropping packets.¶
+An on-path attacker could manipulate the value of ECN fields in the IP header +to influence the sender's rate. [RFC3168] discusses manipulations and their +effects in more detail.¶
+A limited on-path attacker can duplicate and send packets with modified ECN +fields to affect the sender's rate. If duplicate packets are discarded by a +receiver, an attacker will need to race the duplicate packet against the +original to be successful in this attack. Therefore, QUIC endpoints ignore the +ECN field on an IP packet unless at least one QUIC packet in that IP packet is +successfully processed; see Section 13.4.¶
+Stateless resets create a possible denial of service attack analogous to a TCP +reset injection. This attack is possible if an attacker is able to cause a +stateless reset token to be generated for a connection with a selected +connection ID. An attacker that can cause this token to be generated can reset +an active connection with the same connection ID.¶
+If a packet can be routed to different instances that share a static key, for +example by changing an IP address or port, then an attacker can cause the server +to send a stateless reset. To defend against this style of denial of service, +endpoints that share a static key for stateless reset (see Section 10.3.2) MUST +be arranged so that packets with a given connection ID always arrive at an +instance that has connection state, unless that connection is no longer active.¶
+More generally, servers MUST NOT generate a stateless reset if a connection with +the corresponding connection ID could be active on any endpoint using the same +static key.¶
+In the case of a cluster that uses dynamic load balancing, it is possible that a +change in load balancer configuration could occur while an active instance +retains connection state. Even if an instance retains connection state, the +change in routing and resulting stateless reset will result in the connection +being terminated. If there is no chance of the packet being routed to the +correct instance, it is better to send a stateless reset than wait for the +connection to time out. However, this is acceptable only if the routing cannot +be influenced by an attacker.¶
+This document defines QUIC Version Negotiation packets in +Section 6 that can be used to negotiate the QUIC version used +between two endpoints. However, this document does not specify how this +negotiation will be performed between this version and subsequent future +versions. In particular, Version Negotiation packets do not contain any +mechanism to prevent version downgrade attacks. Future versions of QUIC that +use Version Negotiation packets MUST define a mechanism that is robust against +version downgrade attacks.¶
+Deployments should limit the ability of an attacker to target a new connection +to a particular server instance. Ideally, routing decisions are made +independently of client-selected values, including addresses. Once an instance +is selected, a connection ID can be selected so that later packets are routed to +the same instance.¶
+The length of QUIC packets can reveal information about the length of the +content of those packets. The PADDING frame is provided so that endpoints have +some ability to obscure the length of packet content; see Section 19.1.¶
+Note however that defeating traffic analysis is challenging and the subject of +active research. Length is not the only way that information might leak. +Endpoints might also reveal sensitive information through other side channels, +such as the timing of packets.¶
+This document establishes several registries for the management of codepoints in +QUIC. These registries operate on a common set of policies as defined in +Section 22.1.¶
+All QUIC registries allow for both provisional and permanent registration of +codepoints. This section documents policies that are common to these +registries.¶
+Provisional registration of codepoints is intended to allow for private use and
experimentation with extensions to QUIC. Provisional registrations only require
the inclusion of the codepoint value and contact information. However,
provisional registrations could be reclaimed and reassigned for another purpose.¶
+Provisional registrations require Expert Review, as defined in Section 4.5 of +[RFC8126]. Designated expert(s) are advised that only registrations for an +excessive proportion of remaining codepoint space or the very first unassigned +value (see Section 22.1.2) can be rejected.¶
+Provisional registrations will include a date field that indicates when the +registration was last updated. A request to update the date on any provisional +registration can be made without review from the designated expert(s).¶
+All QUIC registries include the following fields to support provisional +registration:¶
+The assigned codepoint.¶
+"Permanent" or "Provisional".¶
+A reference to a publicly available specification for the value.¶
+The date of last update to the registration.¶
+The entity that is responsible for the definition of the registration.¶
+Contact details for the registrant.¶
+Supplementary notes about the registration.¶
+Provisional registrations MAY omit the Specification and Notes fields, plus any +additional fields that might be required for a permanent registration. The Date +field is not required as part of requesting a registration as it is set to the +date the registration is created or updated.¶
+New uses of codepoints from QUIC registries SHOULD use a randomly selected +codepoint that excludes both existing allocations and the first unallocated +codepoint in the selected space. Requests for multiple codepoints MAY use a +contiguous range. This minimizes the risk that differing semantics are +attributed to the same codepoint by different implementations.¶
+Use of the first unassigned codepoint is reserved for allocation using the +Standards Action policy; see Section 4.9 of [RFC8126]. The early codepoint +assignment process [EARLY-ASSIGN] can be used for these values.¶
+For codepoints that are encoded in variable-length integers +(Section 16), such as frame types, codepoints that encode to four or +eight bytes (that is, values 2^14 and above) SHOULD be used unless the usage is +especially sensitive to having a longer encoding.¶
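To make the size guidance concrete, here is a hedged sketch (function name ours) of the variable-length integer length rule from Section 16: a 2-bit prefix selects a 1-, 2-, 4-, or 8-byte encoding, so values of 2^14 and above occupy four or eight bytes.

```python
def varint_encoded_length(value: int) -> int:
    """Bytes needed to encode `value` as a QUIC variable-length
    integer: the 2-bit prefix selects 1, 2, 4, or 8 bytes, leaving
    6, 14, 30, or 62 usable bits respectively."""
    if value < 2 ** 6:
        return 1
    if value < 2 ** 14:
        return 2
    if value < 2 ** 30:
        return 4
    if value < 2 ** 62:
        return 8
    raise ValueError("value too large for a QUIC varint")
```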
+Applications to register codepoints in QUIC registries MAY include a +requested codepoint +as part of the registration. IANA MUST allocate the selected codepoint if the +codepoint is unassigned and the requirements of the registration policy are met.¶
+A request might be made to remove an unused provisional registration from the
registry to reclaim space in a registry, or a portion of the registry (such as
the 64-16383 range for codepoints that use variable-length encodings). This
SHOULD be done only for codepoints with the earliest recorded dates; entries
that have been updated less than a year prior SHOULD NOT be reclaimed.¶
+A request to remove a codepoint MUST be reviewed by the designated expert(s). +The expert(s) MUST attempt to determine whether the codepoint is still in use. +Experts are advised to contact the listed contacts for the registration, plus as +wide a set of protocol implementers as possible in order to determine whether +any use of the codepoint is known. The expert(s) are advised to allow at least +four weeks for responses.¶
+If any use of the codepoints is identified by this search or a request to update +the registration is made, the codepoint MUST NOT be reclaimed. Instead, the +date on the registration is updated. A note might be added for the registration +recording relevant information that was learned.¶
+If no use of the codepoint was identified and no request was made to update the +registration, the codepoint MAY be removed from the registry.¶
+This review and consultation process also applies to requests to change a +provisional registration into a permanent registration, except that the goal is +not to determine whether there is no use of the codepoint, but to determine that +the registration is an accurate representation of any deployed usage.¶
+Permanent registrations in QUIC registries use the Specification Required policy +([RFC8126]), unless otherwise specified. The designated expert(s) verify +that a specification exists and is readily accessible. Expert(s) are encouraged +to be biased towards approving registrations unless they are abusive, frivolous, +or actively harmful (not merely aesthetically displeasing, or architecturally +dubious). The creation of a registry MAY specify additional constraints on +permanent registrations.¶
+The creation of a registry MAY identify a range of codepoints where +registrations are governed by a different registration policy. For instance, +the frame type registry in Section 22.4 has a stricter policy for codepoints +in the range from 0 to 63.¶
+Any stricter requirements for permanent registrations do not prevent provisional +registrations for affected codepoints. For instance, a provisional registration +for a frame type of 61 could be requested.¶
+All registrations made by Standards Track publications MUST be permanent.¶
+All registrations in this document are assigned a permanent status and list a +change controller of the IETF and a contact of the QUIC working group +(quic@ietf.org).¶
+IANA [SHALL add/has added] a registry for "QUIC Versions" under a "QUIC" +heading.¶
+The "QUIC Versions" registry governs a 32-bit space; see Section 15. This +registry follows the registration policy from Section 22.1. Permanent +registrations in this registry are assigned using the Specification Required +policy ([RFC8126]).¶
+The codepoint of 0x00000001 is assigned with permanent status to the protocol
defined in this document. The codepoint of 0x00000000 is
permanently reserved; the note for this codepoint [shall] indicate[s] that
this version is reserved for Version Negotiation.¶
+All codepoints that follow the pattern 0x?a?a?a?a are reserved and MUST NOT be +assigned by IANA and MUST NOT appear in the listing of assigned values.¶
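As an illustration (function name ours, not part of the draft), the 0x?a?a?a?a pattern can be tested by checking that the low nibble of each of the four bytes of a version number is 0xa:

```python
def is_reserved_version(version: int) -> bool:
    """True if a 32-bit version number matches 0x?a?a?a?a,
    i.e. the low nibble of every byte is 0xa."""
    return all((version >> shift) & 0x0f == 0x0a
               for shift in (0, 8, 16, 24))
```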
+[[RFC editor: please remove the following note before publication.]]¶
+Several pre-standardization versions will likely be in use at the time of
publication. There is no need to document these in an RFC, but recording
information about these versions will ensure that the information in the
registry is accurate. The document editors or working group chairs can
facilitate getting the necessary information.¶
+IANA [SHALL add/has added] a registry for "QUIC Transport Parameters" under a +"QUIC" heading.¶
+The "QUIC Transport Parameters" registry governs a 62-bit space. This registry +follows the registration policy from Section 22.1. Permanent registrations +in this registry are assigned using the Specification Required policy +([RFC8126]).¶
+In addition to the fields in Section 22.1.1, permanent registrations in +this registry MUST include the following field:¶
+A short mnemonic for the parameter.¶
+The initial contents of this registry are shown in Table 6.¶
+Value | +Parameter Name | +Specification | +
---|---|---|
0x00 | +original_destination_connection_id | ++ Section 18.2 + | +
0x01 | +max_idle_timeout | ++ Section 18.2 + | +
0x02 | +stateless_reset_token | ++ Section 18.2 + | +
0x03 | +max_udp_payload_size | ++ Section 18.2 + | +
0x04 | +initial_max_data | ++ Section 18.2 + | +
0x05 | +initial_max_stream_data_bidi_local | ++ Section 18.2 + | +
0x06 | +initial_max_stream_data_bidi_remote | ++ Section 18.2 + | +
0x07 | +initial_max_stream_data_uni | ++ Section 18.2 + | +
0x08 | +initial_max_streams_bidi | ++ Section 18.2 + | +
0x09 | +initial_max_streams_uni | ++ Section 18.2 + | +
0x0a | +ack_delay_exponent | ++ Section 18.2 + | +
0x0b | +max_ack_delay | ++ Section 18.2 + | +
0x0c | +disable_active_migration | ++ Section 18.2 + | +
0x0d | +preferred_address | ++ Section 18.2 + | +
0x0e | +active_connection_id_limit | ++ Section 18.2 + | +
0x0f | +initial_source_connection_id | ++ Section 18.2 + | +
0x10 | +retry_source_connection_id | ++ Section 18.2 + | +
+Each value of the format 31 * N + 27 for integer values of N (that is, 27, 58,
89, ...) is reserved; these values MUST NOT be assigned by IANA and MUST NOT
appear in the listing of assigned values.¶
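As an illustration (not part of the draft; the function name is ours), the reserved-value rule for transport parameter codepoints reduces to a one-line predicate:

```python
def is_reserved_transport_parameter(value: int) -> bool:
    """True if a transport parameter codepoint matches 31 * N + 27
    for some non-negative integer N (27, 58, 89, ...)."""
    return value % 31 == 27
```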
IANA [SHALL add/has added] a registry for "QUIC Frame Types" under a +"QUIC" heading.¶
+The "QUIC Frame Types" registry governs a 62-bit space. This registry follows +the registration policy from Section 22.1. Permanent registrations in this +registry are assigned using the Specification Required policy ([RFC8126]), +except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are +assigned using Standards Action or IESG Approval as defined in Section 4.9 and +4.10 of [RFC8126].¶
+In addition to the fields in Section 22.1.1, permanent registrations in +this registry MUST include the following field:¶
+A short mnemonic for the frame type.¶
+In addition to the advice in Section 22.1, specifications for new permanent +registrations SHOULD describe the means by which an endpoint might determine +that it can send the identified type of frame. An accompanying transport +parameter registration is expected for most registrations; see +Section 22.3. Specifications for permanent registrations also +need to describe the format and assigned semantics of any fields in the frame.¶
+The initial contents of this registry are tabulated in Table 3. Note +that the registry does not include the "Pkts" and "Spec" columns from +Table 3.¶
+IANA [SHALL add/has added] a registry for "QUIC Transport Error Codes" under a +"QUIC" heading.¶
+The "QUIC Transport Error Codes" registry governs a 62-bit space. This space is +split into three regions that are governed by different policies. Permanent +registrations in this registry are assigned using the Specification Required +policy ([RFC8126]), except for values between 0x00 and 0x3f (in hexadecimal; +inclusive), which are assigned using Standards Action or IESG Approval as +defined in Section 4.9 and 4.10 of [RFC8126].¶
+In addition to the fields in Section 22.1.1, permanent registrations in +this registry MUST include the following fields:¶
+A short mnemonic for the error code.¶
+A brief description of the error code semantics, which MAY be a summary if a +specification reference is provided.¶
+The initial contents of this registry are shown in Table 7.¶
+Value | +Code | +Description | +Specification | +
---|---|---|---|
0x0 | +NO_ERROR | +No error | ++ Section 20 + | +
0x1 | +INTERNAL_ERROR | +Implementation error | ++ Section 20 + | +
0x2 | +CONNECTION_REFUSED | +Server refuses a connection | ++ Section 20 + | +
0x3 | +FLOW_CONTROL_ERROR | +Flow control error | ++ Section 20 + | +
0x4 | +STREAM_LIMIT_ERROR | +Too many streams opened | ++ Section 20 + | +
0x5 | +STREAM_STATE_ERROR | +Frame received in invalid stream state | ++ Section 20 + | +
0x6 | +FINAL_SIZE_ERROR | +Change to final size | ++ Section 20 + | +
0x7 | +FRAME_ENCODING_ERROR | +Frame encoding error | ++ Section 20 + | +
0x8 | +TRANSPORT_PARAMETER_ERROR | +Error in transport parameters | ++ Section 20 + | +
0x9 | +CONNECTION_ID_LIMIT_ERROR | +Too many connection IDs received | ++ Section 20 + | +
0xa | +PROTOCOL_VIOLATION | +Generic protocol violation | ++ Section 20 + | +
0xb | +INVALID_TOKEN | +Invalid Token Received | ++ Section 20 + | +
0xc | +APPLICATION_ERROR | +Application error | ++ Section 20 + | +
0xd | +CRYPTO_BUFFER_EXCEEDED | +CRYPTO data buffer overflowed | ++ Section 20 + | +
0xe | +KEY_UPDATE_ERROR | +Invalid packet protection update | ++ Section 20 + | +
0xf | +AEAD_LIMIT_REACHED | +Excessive use of packet protection keys | ++ Section 20 + | +
0x10 | +NO_VIABLE_PATH | +No viable network path exists | ++ Section 20 + | +
The pseudocode in this section describes sample algorithms. These algorithms +are intended to be correct and clear, rather than being optimally performant.¶
+The pseudocode segments in this section are licensed as Code Components; see the +copyright notice.¶
+The pseudocode in Figure 45 shows how a variable-length integer can be
read from a stream of bytes. The function ReadVarint takes a single argument: a
sequence of bytes that can be read in network byte order.¶
+For example, the eight-byte sequence 0xc2197c5eff14e88c decodes to the decimal +value 151,288,809,941,952,652; the four-byte sequence 0x9d7f3e7d decodes to +494,878,333; the two-byte sequence 0x7bbd decodes to 15,293; and the single byte +0x25 decodes to 37 (as does the two-byte sequence 0x4025).¶
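Figure 45 itself is not reproduced in this excerpt; the following Python sketch (names ours, not the draft's Code Component) implements the same rule: the top two bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding, and the remaining bits carry the value in network byte order.

```python
def read_varint(data: bytes):
    """Decode a QUIC variable-length integer from the start of `data`.
    Returns (value, bytes_consumed)."""
    # Two most significant bits give the length: 0b00->1, 0b01->2,
    # 0b10->4, 0b11->8 bytes.
    length = 1 << (data[0] >> 6)
    value = data[0] & 0x3f          # low six bits of the first byte
    for b in data[1:length]:        # remaining bytes, network byte order
        value = (value << 8) | b
    return value, length
```

Running this against the worked examples above, `read_varint(bytes.fromhex("7bbd"))` yields (15293, 2) and `read_varint(bytes.fromhex("4025"))` yields (37, 2).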
+The pseudocode in Figure 46 shows how an implementation can select +an appropriate size for packet number encodings.¶
+The EncodePacketNumber function takes two arguments:¶
+For example, if an endpoint has received an acknowledgment for packet 0xabe8b3
and is sending a packet with a number of 0xac5c02, there are 29,519 (0x734f)
outstanding packets. In order to represent at least twice this range (59,038
packets, or 0xe69e), 16 bits are required.¶
+In the same state, sending a packet with a number of 0xace8fe uses the 24-bit
encoding, because at least 18 bits are required to represent twice the range
(131,222 packets, or 0x20096).¶
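The sample length selection can be sketched in Python (names ours): it picks the smallest whole number of bytes whose bit count is at least log2 of the number of unacknowledged packets plus one, i.e. enough to represent at least twice the range.

```python
import math

def packet_number_length(full_pn, largest_acked=None):
    """Bytes needed to encode full_pn, following the sample
    EncodePacketNumber algorithm: use enough bits to represent at
    least twice the number of packets in flight."""
    if largest_acked is None:
        # No acks yet: the whole packet number space so far is in flight.
        num_unacked = full_pn + 1
    else:
        num_unacked = full_pn - largest_acked
    min_bits = math.log2(num_unacked) + 1
    return math.ceil(min_bits / 8)
```

For the worked example, `packet_number_length(0xac5c02, 0xabe8b3)` yields 2 (a 16-bit encoding) and `packet_number_length(0xace8fe, 0xabe8b3)` yields 3 (24 bits).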
+The pseudocode in Figure 47 includes an example algorithm for decoding +packet numbers after header protection has been removed.¶
+The DecodePacketNumber function takes three arguments:¶
+For example, if the highest successfully authenticated packet had a packet +number of 0xa82f30ea, then a packet containing a 16-bit value of 0x9b32 will be +decoded as 0xa82f9b32.¶
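As a sketch of the Figure 47 algorithm (Python, names ours): the decoder forms the candidate by combining the truncated bits with the expected packet number, largest_pn + 1, then adjusts by one window if the candidate falls outside half a window of the expected value.

```python
def decode_packet_number(largest_pn, truncated_pn, pn_nbits):
    """Recover a full packet number from its truncated encoding by
    choosing the candidate closest to largest_pn + 1."""
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits          # size of the truncated-number window
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # Substitute the truncated bits into the expected packet number.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    if (candidate_pn <= expected_pn - pn_hwin and
            candidate_pn < (1 << 62) - pn_win):
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn
```

This reproduces the example above: `decode_packet_number(0xa82f30ea, 0x9b32, 16)` yields 0xa82f9b32.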
+Each time an endpoint commences sending on a new network path, it determines +whether the path supports ECN; see Section 13.4. If the path supports ECN, the goal +is to use ECN. Endpoints might also periodically reassess a path that was +determined to not support ECN.¶
+This section describes one method for testing new paths. This algorithm is +intended to show how a path might be tested for ECN support. Endpoints can +implement different methods.¶
+The path is assigned an ECN state that is one of "testing", "unknown", "failed", +or "capable". On paths with a "testing" or "capable" state the endpoint sends +packets with an ECT marking, by default ECT(0); otherwise, the endpoint sends +unmarked packets.¶
+To start testing a path, the ECN state is set to "testing" and existing ECN +counts are remembered as a baseline.¶
+The testing period runs for a number of packets or a limited time, as +determined by the endpoint. The goal is not to limit the duration of the +testing period, but to ensure that enough marked packets are sent for received +ECN counts to provide a clear indication of how the path treats marked packets. +Section 13.4.2 suggests limiting this to 10 packets or 3 times the probe +timeout.¶
+After the testing period ends, the ECN state for the path becomes "unknown".
From the "unknown" state, successful validation of the ECN counts in an ACK frame
(see Section 13.4.2.1) causes the ECN state for the path to become "capable", unless
no marked packet has been acknowledged.¶
+If validation of ECN counts fails at any time, the ECN state for the affected +path becomes "failed". An endpoint can also mark the ECN state for a path as +"failed" if marked packets are all declared lost or if they are all CE marked.¶
+Following this algorithm ensures that ECN is rarely disabled for paths that +properly support ECN. Any path that incorrectly modifies markings will cause +ECN to be disabled. For those rare cases where marked packets are discarded by +the path, the short duration of the testing period limits the number of losses +incurred.¶
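The state machine described above can be sketched as follows (Python, names ours; the transition triggers are assumptions about how an implementation might surface validation results):

```python
from enum import Enum

class EcnState(Enum):
    TESTING = "testing"
    UNKNOWN = "unknown"
    FAILED = "failed"
    CAPABLE = "capable"

class EcnPathTester:
    """Sketch of the per-path ECN testing state machine."""

    def __init__(self):
        self.state = EcnState.TESTING   # start testing a new path

    def sends_ect_marked(self):
        # ECT-marked packets are sent only while testing or once capable.
        return self.state in (EcnState.TESTING, EcnState.CAPABLE)

    def testing_period_ended(self):
        if self.state is EcnState.TESTING:
            self.state = EcnState.UNKNOWN

    def counts_validated(self, marked_packet_acked):
        # Successful validation with at least one marked packet acked.
        if self.state is EcnState.UNKNOWN and marked_packet_acked:
            self.state = EcnState.CAPABLE

    def validation_failed(self):
        # Also used when marked packets are all lost or all CE-marked.
        self.state = EcnState.FAILED
```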
+Issue and pull request numbers are listed with a leading octothorp.¶
+A number of improvements to IANA considerations:¶
+ +Require expansion of datagrams to ensure that a path supports at least 1200 +bytes in both directions:¶
+ +Stateless reset changes (#2152, #2993)¶
+ +Rework the first byte (#2006)¶
+Substantial editorial reorganization; no technical changes.¶
+Changes to integration of the TLS handshake (#829, #1018, #1094, #1165, #1190, +#1233, #1242, #1252, #1450, #1458)¶
+Streams are split into unidirectional and bidirectional (#643, #656, #720, +#872, #175, #885)¶
+ +Improvements to connection close¶
+ +Split some frames into separate connection- and stream- level frames +(#443)¶
+ +Transport parameters for 0-RTT are retained from a previous connection (#405, +#513, #512)¶
+The original design and rationale behind this protocol draw significantly from +work by Jim Roskind [EARLY-DESIGN].¶
+The IETF QUIC Working Group received an enormous amount of support from many +people. The following people provided substantive contributions to this +document:¶
+奥 一穂 (Kazuho Oku)¶
+Mikkel Fahnøe Jørgensen¶
+Mirja Kühlewind¶