Internet-Draft | HTTP/3 | December 2020
Bishop | Expires 13 June 2021 | [Page]
The QUIC transport protocol has several features that are desirable in a -transport for HTTP, such as stream multiplexing, per-stream flow control, and -low-latency connection establishment. This document describes a mapping of HTTP -semantics over QUIC. This document also identifies HTTP/2 features that are -subsumed by QUIC, and describes how HTTP/2 extensions can be ported to HTTP/3.¶
-Discussion of this draft takes place on the QUIC working group mailing list -(quic@ietf.org), which is archived at -https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
-Working Group information can be found at https://github.com/quicwg; source -code and issues list for this draft can be found at -https://github.com/quicwg/base-drafts/labels/-http.¶
-- This Internet-Draft is submitted in full conformance with the - provisions of BCP 78 and BCP 79.¶
-- Internet-Drafts are working documents of the Internet Engineering Task - Force (IETF). Note that other groups may also distribute working - documents as Internet-Drafts. The list of current Internet-Drafts is - at https://datatracker.ietf.org/drafts/current/.¶
-- Internet-Drafts are draft documents valid for a maximum of six months - and may be updated, replaced, or obsoleted by other documents at any - time. It is inappropriate to use Internet-Drafts as reference - material or to cite them other than as "work in progress."¶
-- This Internet-Draft will expire on 13 June 2021.¶
-- Copyright (c) 2020 IETF Trust and the persons identified as the - document authors. All rights reserved.¶
-- This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of - publication of this document. Please review these documents - carefully, as they describe your rights and restrictions with - respect to this document. Code Components extracted from this - document must include Simplified BSD License text as described in - Section 4.e of the Trust Legal Provisions and are provided without - warranty as described in the Simplified BSD License.¶
-HTTP semantics ([SEMANTICS]) are used for a broad -range of services on the Internet. These semantics have most commonly been used -with HTTP/1.1, over a variety of transport and session layers, and with HTTP/2 -over TLS. HTTP/3 supports the same semantics over a new transport protocol, -QUIC.¶
-HTTP/1.1 ([HTTP11]) uses whitespace-delimited text -fields to convey HTTP messages. While these exchanges are human-readable, using -whitespace for message formatting leads to parsing complexity and excessive -tolerance of variant behavior. Because HTTP/1.x does not include a multiplexing -layer, multiple TCP connections are often used to service requests in parallel. -However, that has a negative impact on congestion control and network -efficiency, since TCP does not share congestion control across multiple -connections.¶
-HTTP/2 ([HTTP2]) introduced a binary framing and multiplexing layer -to improve latency without modifying the transport layer. However, because the -parallel nature of HTTP/2's multiplexing is not visible to TCP's loss recovery -mechanisms, a lost or reordered packet causes all active transactions to -experience a stall regardless of whether that transaction was directly impacted -by the lost packet.¶
-The QUIC transport protocol incorporates stream multiplexing and per-stream flow -control, similar to that provided by the HTTP/2 framing layer. By providing -reliability at the stream level and congestion control across the entire -connection, QUIC has the capability to improve the performance of HTTP compared -to a TCP mapping. QUIC also incorporates TLS 1.3 ([TLS13]) at the -transport layer, offering comparable security to running TLS over TCP, with the -improved connection setup latency of TCP Fast Open ([TFO]).¶
-This document defines a mapping of HTTP semantics over the QUIC transport -protocol, drawing heavily on the design of HTTP/2. While delegating stream -lifetime and flow control issues to QUIC, a similar binary framing is used on -each stream. Some HTTP/2 features are subsumed by QUIC, while other features are -implemented atop QUIC.¶
-QUIC is described in [QUIC-TRANSPORT]. For a full description of HTTP/2, see -[HTTP2].¶
-HTTP/3 provides a transport for HTTP semantics using the QUIC transport protocol -and an internal framing layer similar to HTTP/2.¶
-Once a client knows that an HTTP/3 server exists at a certain endpoint, it opens -a QUIC connection. QUIC provides protocol negotiation, stream-based -multiplexing, and flow control. Discovery of an HTTP/3 endpoint is described in -Section 3.1.¶
-Within each stream, the basic unit of HTTP/3 communication is a frame -(Section 7.2). Each frame type serves a different purpose. For example, HEADERS -and DATA frames form the basis of HTTP requests and responses -(Section 4.1).¶
-Multiplexing of requests is performed using the QUIC stream abstraction, -described in Section 2 of [QUIC-TRANSPORT]. Each request-response pair -consumes a single QUIC stream. Streams are independent of each other, so one -stream that is blocked or suffers packet loss does not prevent progress on other -streams.¶
-Server push is an interaction mode introduced in HTTP/2 ([HTTP2]) that -permits a server to push a request-response exchange to a client in anticipation -of the client making the indicated request. This trades off network usage -against a potential latency gain. Several HTTP/3 frames are used to manage -server push, such as PUSH_PROMISE, MAX_PUSH_ID, and CANCEL_PUSH.¶
-As in HTTP/2, request and response fields are compressed for transmission. -Because HPACK ([HPACK]) relies on in-order transmission of compressed -field sections (a guarantee not provided by QUIC), HTTP/3 replaces HPACK with -QPACK ([QPACK]). QPACK uses separate unidirectional streams to modify and track -field table state, while encoded field sections refer to the state of the table -without modifying it.¶
-The following sections provide a detailed overview of the lifecycle of an HTTP/3 -connection:¶
-The details of the wire protocol and interactions with the transport are -described in subsequent sections:¶
-Additional resources are provided in the final sections:¶
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL -NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in BCP 14 [RFC2119] [RFC8174] when, and only when, they -appear in all capitals, as shown here.¶
-This document uses the variable-length integer encoding from -[QUIC-TRANSPORT].¶
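As a rough illustration of that encoding (Section 16 of [QUIC-TRANSPORT]: the two high-order bits of the first byte select a total length of 1, 2, 4, or 8 bytes, leaving 6, 14, 30, or 62 bits for the value), a minimal sketch; the function names are ours:

```python
def encode_varint(v: int) -> bytes:
    """Encode v as a QUIC variable-length integer: the two high bits
    of the first byte give the total length (00=1, 01=2, 10=4, 11=8)."""
    if v < 0x40:
        return v.to_bytes(1, "big")
    if v < 0x4000:
        return (v | 0x4000).to_bytes(2, "big")
    if v < 0x40000000:
        return (v | 0x80000000).to_bytes(4, "big")
    if v < 0x4000000000000000:
        return (v | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value too large for a variable-length integer")

def decode_varint(buf: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    length = 1 << (buf[0] >> 6)
    # Clear the two length bits in the most significant byte.
    value = int.from_bytes(buf[:length], "big") & ~(0xC0 << (8 * (length - 1)))
    return value, length
```

For example, the two-byte sequence 0x7bbd decodes to 15293, matching the worked example in [QUIC-TRANSPORT].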
-The following terms are used:¶
-abort: An abrupt termination of a connection or stream, possibly due to an error condition.¶
-client: The endpoint that initiates an HTTP/3 connection. Clients send HTTP requests and receive HTTP responses.¶
-connection: A transport-layer connection between two endpoints, using QUIC as the transport protocol.¶
-connection error: An error that affects the entire HTTP/3 connection.¶
-endpoint: Either the client or server of the connection.¶
-frame: The smallest unit of communication on a stream in HTTP/3, consisting of a header and a variable-length sequence of bytes structured according to the frame type.¶
-Protocol elements called "frames" exist in both this document and -[QUIC-TRANSPORT]. Where frames from [QUIC-TRANSPORT] are referenced, the -frame name will be prefaced with "QUIC." For example, "QUIC CONNECTION_CLOSE -frames." References without this preface refer to frames defined in -Section 7.2.¶
-HTTP/3 connection: A QUIC connection where the negotiated application protocol is HTTP/3.¶
-peer: An endpoint. When discussing a particular endpoint, "peer" refers to the endpoint that is remote to the primary subject of discussion.¶
-receiver: An endpoint that is receiving frames.¶
-sender: An endpoint that is transmitting frames.¶
-server: The endpoint that accepts an HTTP/3 connection. Servers receive HTTP requests and send HTTP responses.¶
-stream: A bidirectional or unidirectional bytestream provided by the QUIC transport. All streams within an HTTP/3 connection can be considered "HTTP/3 streams," but multiple stream types are defined within HTTP/3.¶
-stream error: An application-level error on the individual stream.¶
-The term "payload body" is defined in Section 5.5.4 of [SEMANTICS].¶
-Finally, the terms "resource", "message", "user agent", "origin server", -"gateway", "intermediary", "proxy", and "tunnel" are defined in Section 3 of -[SEMANTICS].¶
-Packet diagrams in this document use the format defined in Section 1.3 of -[QUIC-TRANSPORT] to illustrate the order and size of fields.¶
-HTTP relies on the notion of an authoritative response: a response that has been -determined to be the most appropriate response for that request given the state -of the target resource at the time of response message origination by (or at the -direction of) the origin server identified within the target URI. Locating an -authoritative server for an HTTP URL is discussed in Section 4.3 of -[SEMANTICS].¶
-The "https" scheme associates authority with possession of a certificate that -the client considers to be trustworthy for the host identified by the authority -component of the URL. If a server presents a certificate and proof that it -controls the corresponding private key, then a client will accept a secured -TLS session with that server as being authoritative for all origins with the -"https" scheme and a host identified in the certificate.¶
-A client MAY attempt access to a resource with an "https" URI by resolving the -host identifier to an IP address, establishing a QUIC connection to that address -on the indicated port, and sending an HTTP/3 request message targeting the URI -to the server over that secured connection. Unless some other mechanism is used -to select HTTP/3, the token "h3" is used in the Application Layer Protocol -Negotiation (ALPN; see [RFC7301]) extension during the TLS handshake.¶
-Connectivity problems (e.g., blocking UDP) can result in QUIC connection -establishment failure; clients SHOULD attempt to use TCP-based versions of HTTP -in this case.¶
-Servers MAY serve HTTP/3 on any UDP port; an alternative service advertisement -always includes an explicit port, and URLs contain either an explicit port or a -default port associated with the scheme.¶
-An HTTP origin advertises the availability of an equivalent HTTP/3 endpoint via -the Alt-Svc HTTP response header field or the HTTP/2 ALTSVC frame ([ALTSVC]), -using the "h3" ALPN token.¶
-For example, an origin could indicate in an HTTP response that HTTP/3 was -available on UDP port 50781 at the same hostname by including the following -header field:¶
-Alt-Svc: h3=":50781"¶
On receipt of an Alt-Svc record indicating HTTP/3 support, a client MAY attempt -to establish a QUIC connection to the indicated host and port; if this -connection is successful, the client can send HTTP requests using the mapping -described in this document.¶
-Although HTTP is independent of the transport protocol, the "http" scheme -associates authority with the ability to receive TCP connections on the -indicated port of whatever host is identified within the authority component. -Because HTTP/3 does not use TCP, HTTP/3 cannot be used for direct access to the -authoritative server for a resource identified by an "http" URI. However, -protocol extensions such as [ALTSVC] permit the authoritative server -to identify other services that are also authoritative and that might be -reachable over HTTP/3.¶
-Prior to making requests for an origin whose scheme is not "https", the client -MUST ensure the server is willing to serve that scheme. For origins whose scheme -is "http", an experimental method to accomplish this is described in -[RFC8164]. Other mechanisms might be defined for various schemes in the -future.¶
-HTTP/3 relies on QUIC version 1 as the underlying transport. The use of other -QUIC transport versions with HTTP/3 MAY be defined by future specifications.¶
-QUIC version 1 uses TLS version 1.3 or greater as its handshake protocol. -HTTP/3 clients MUST support a mechanism to indicate the target host to the -server during the TLS handshake. If the server is identified by a DNS name, -clients MUST send the Server Name Indication (SNI; [RFC6066]) TLS extension -unless an alternative mechanism to indicate the target host is used.¶
-QUIC connections are established as described in [QUIC-TRANSPORT]. During -connection establishment, HTTP/3 support is indicated by selecting the ALPN -token "h3" in the TLS handshake. Support for other application-layer protocols -MAY be offered in the same handshake.¶
-While connection-level options pertaining to the core QUIC protocol are set in -the initial crypto handshake, HTTP/3-specific settings are conveyed in the -SETTINGS frame. After the QUIC connection is established, a SETTINGS frame -(Section 7.2.4) MUST be sent by each endpoint as the initial frame of their -respective HTTP control stream; see Section 6.2.1.¶
-HTTP/3 connections are persistent across multiple requests. For best -performance, it is expected that clients will not close connections until it is -determined that no further communication with a server is necessary (for -example, when a user navigates away from a particular web page) or until the -server closes the connection.¶
-Once a connection exists to a server endpoint, this connection MAY be reused for -requests with multiple different URI authority components. In general, a server -is considered authoritative for all URIs with the "https" scheme for which the -hostname in the URI is present in the authenticated certificate provided by the -server, either as the CN field of the certificate subject or as a dNSName in the -subjectAltName field of the certificate; see [RFC6125]. For a host that is -an IP address, the client MUST verify that the address appears as an iPAddress -in the subjectAltName field of the certificate. If the hostname or address is -not present in the certificate, the client MUST NOT consider the server -authoritative for origins containing that hostname or address. See Section 4.3 -of [SEMANTICS] for more detail on authoritative access.¶
-Clients SHOULD NOT open more than one HTTP/3 connection to a given host and port -pair, where the host is derived from a URI, a selected alternative service -([ALTSVC]), or a configured proxy. A client MAY open multiple HTTP/3 -connections to the same IP address and UDP port using different transport or TLS -configurations but SHOULD avoid creating multiple connections with the same -configuration.¶
-Servers are encouraged to maintain open HTTP/3 connections for as long as -possible but are permitted to terminate idle connections if necessary. When -either endpoint chooses to close the HTTP/3 connection, the terminating endpoint -SHOULD first send a GOAWAY frame (Section 5.2) so that both -endpoints can reliably determine whether previously sent frames have been -processed and gracefully complete or terminate any necessary remaining tasks.¶
-A server that does not wish clients to reuse HTTP/3 connections for a particular -origin can indicate that it is not authoritative for a request by sending a 421 -(Misdirected Request) status code in response to the request; see Section 9.1.2 -of [HTTP2].¶
-A client sends an HTTP request on a request stream, which is a client-initiated -bidirectional QUIC stream; see Section 6.1. A client MUST send only a -single request on a given stream. A server sends zero or more interim HTTP -responses on the same stream as the request, followed by a single final HTTP -response, as detailed below. See Section 14 of [SEMANTICS] for a description -of interim and final HTTP responses.¶
-Pushed responses are sent on a server-initiated unidirectional QUIC stream; see -Section 6.2.2. A server sends zero or more interim HTTP responses, followed -by a single final HTTP response, in the same manner as a standard response. -Push is described in more detail in Section 4.4.¶
-On a given stream, receipt of multiple requests or receipt of an additional HTTP -response following a final HTTP response MUST be treated as malformed -(Section 4.1.3).¶
-An HTTP message (request or response) consists of:¶
-Header and trailer field sections are described in Sections 5.4 and 5.6 of -[SEMANTICS]; the payload body is described in Section 5.5.4 of -[SEMANTICS].¶
-Receipt of an invalid sequence of frames MUST be treated as a connection error -of type H3_FRAME_UNEXPECTED; see Section 8. In particular, a DATA frame before -any HEADERS frame, or a HEADERS or DATA frame after the trailing HEADERS frame -is considered invalid. Other frame types, especially unknown frame types, -might be permitted subject to their own rules; see Section 9.¶
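The permitted ordering can be checked mechanically. A sketch (the helper is ours; it validates a complete message and skips unknown frame types, as Section 9 permits):

```python
import re

def valid_message_frames(frame_types: list[str]) -> bool:
    """A complete message is a HEADERS frame, zero or more DATA
    frames, and an optional trailing HEADERS frame; anything else
    (e.g. DATA before HEADERS) is invalid."""
    s = "".join({"HEADERS": "H", "DATA": "D"}.get(t, "") for t in frame_types)
    return re.fullmatch("HD*H?", s) is not None
```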
-A server MAY send one or more PUSH_PROMISE frames (Section 7.2.5) -before, after, or interleaved with the frames of a response message. These -PUSH_PROMISE frames are not part of the response; see Section 4.4 for more -details. PUSH_PROMISE frames are not permitted on push streams; a pushed -response that includes PUSH_PROMISE frames MUST be treated as a connection error -of type H3_FRAME_UNEXPECTED; see Section 8.¶
-Frames of unknown types (Section 9), including reserved frames -(Section 7.2.8) MAY be sent on a request or push stream before, after, or -interleaved with other frames described in this section.¶
-The HEADERS and PUSH_PROMISE frames might reference updates to the QPACK dynamic -table. While these updates are not directly part of the message exchange, they -must be received and processed before the message can be consumed. See -Section 4.1.1 for more details.¶
-The "chunked" transfer encoding defined in Section 7.1 of [HTTP11] MUST NOT -be used.¶
-A response MAY consist of multiple messages when and only when one or more -interim responses (1xx; see Section 14.2 of [SEMANTICS]) precede a final -response to the same request. Interim responses do not contain a payload body -or trailers.¶
-An HTTP request/response exchange fully consumes a client-initiated -bidirectional QUIC stream. After sending a request, a client MUST close the -stream for sending. Unless using the CONNECT method (see Section 4.2), clients -MUST NOT make stream closure dependent on receiving a response to their request. -After sending a final response, the server MUST close the stream for sending. At -this point, the QUIC stream is fully closed.¶
-When a stream is closed, this indicates the end of the final HTTP message. -Because some messages are large or unbounded, endpoints SHOULD begin processing -partial HTTP messages once enough of the message has been received to make -progress. If a client-initiated stream terminates without enough of the HTTP -message to provide a complete response, the server SHOULD abort its response -with the error code H3_REQUEST_INCOMPLETE; see Section 8.¶
-A server can send a complete response prior to the client sending an entire -request if the response does not depend on any portion of the request that has -not been sent and received. When the server does not need to receive the -remainder of the request, it MAY abort reading the request stream, send a -complete response, and cleanly close the sending part of the stream. The error -code H3_NO_ERROR SHOULD be used when requesting that the client stop sending on -the request stream. Clients MUST NOT discard complete responses as a result of -having their request terminated abruptly, though clients can always discard -responses at their discretion for other reasons. If the server sends a partial -or complete response but does not abort reading the request, clients SHOULD -continue sending the body of the request and close the stream normally.¶
-HTTP messages carry metadata as a series of key-value pairs called HTTP fields; -see Sections 5.4 and 5.6 of [SEMANTICS]. For a listing of registered HTTP -fields, see the "Hypertext Transfer Protocol (HTTP) Field Name Registry" -maintained at https://www.iana.org/assignments/http-fields/.¶
-As in previous versions of HTTP, field names are strings containing a subset of -ASCII characters that are compared in a case-insensitive fashion. Properties of -HTTP field names and values are discussed in more detail in Section 5.4.3 of -[SEMANTICS]. As in HTTP/2, characters in field names MUST be converted to -lowercase prior to their encoding. A request or response containing uppercase -characters in field names MUST be treated as malformed (Section 4.1.3).¶
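The lowercase requirement can be illustrated with a simple check (a sketch; the character set is the HTTP token alphabet restricted to lowercase letters, and the function name is ours):

```python
# tchar per HTTP semantics, with ALPHA restricted to lowercase as
# HTTP/3 requires of encoded field names.
TCHAR = set("!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyz")

def field_name_ok(name: str) -> bool:
    """A field name containing uppercase (or otherwise invalid)
    characters makes the message malformed in HTTP/3."""
    return bool(name) and all(c in TCHAR for c in name)
```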
-Like HTTP/2, HTTP/3 does not use the Connection header field to indicate -connection-specific fields; in this protocol, connection-specific metadata is -conveyed by other means. An endpoint MUST NOT generate an HTTP/3 field section -containing connection-specific fields; any message containing -connection-specific fields MUST be treated as malformed (Section 4.1.3).¶
-The only exception to this is the TE header field, which MAY be present in an -HTTP/3 request header; when it is, it MUST NOT contain any value other than -"trailers".¶
-This means that an intermediary transforming an HTTP/1.x message to HTTP/3 will -need to remove any fields nominated by the Connection field, along with the -Connection field itself. Such intermediaries SHOULD also remove other -connection-specific fields, such as Keep-Alive, Proxy-Connection, -Transfer-Encoding, and Upgrade, even if they are not nominated by the Connection -field.¶
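The conversion rule for such an intermediary can be sketched as follows (names are ours; the TE exception above is not special-cased, since TE is only dropped here if the Connection field nominates it):

```python
CONNECTION_SPECIFIC = {"connection", "keep-alive", "proxy-connection",
                       "transfer-encoding", "upgrade"}

def strip_connection_fields(fields: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop the Connection field, every field it nominates, and the
    other connection-specific fields when converting to HTTP/3."""
    nominated = {
        token.strip().lower()
        for name, value in fields if name.lower() == "connection"
        for token in value.split(",")
    }
    drop = CONNECTION_SPECIFIC | nominated
    return [(n, v) for n, v in fields if n.lower() not in drop]
```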
-Like HTTP/2, HTTP/3 employs a series of pseudo-header fields where the field -name begins with the ':' character (ASCII 0x3a). These pseudo-header fields -convey the target URI, the method of the request, and the status code for the -response.¶
-Pseudo-header fields are not HTTP fields. Endpoints MUST NOT generate -pseudo-header fields other than those defined in this document; however, an -extension could negotiate a modification of this restriction; see -Section 9.¶
-Pseudo-header fields are only valid in the context in which they are defined. -Pseudo-header fields defined for requests MUST NOT appear in responses; -pseudo-header fields defined for responses MUST NOT appear in requests. -Pseudo-header fields MUST NOT appear in trailers. Endpoints MUST treat a -request or response that contains undefined or invalid pseudo-header fields as -malformed (Section 4.1.3).¶
-All pseudo-header fields MUST appear in the header field section before regular -header fields. Any request or response that contains a pseudo-header field that -appears in a header field section after a regular header field MUST be treated -as malformed (Section 4.1.3).¶
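The placement rules above reduce to a single pass over the field section; a minimal sketch (our helper):

```python
def pseudo_headers_valid(fields: list[tuple[str, str]], *,
                         is_trailer: bool = False) -> bool:
    """Pseudo-header fields (names starting with ':') must all precede
    regular fields and must not appear in trailers."""
    seen_regular = False
    for name, _ in fields:
        if name.startswith(":"):
            if is_trailer or seen_regular:
                return False
        else:
            seen_regular = True
    return True
```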
-The following pseudo-header fields are defined for requests:¶
-Contains the scheme portion of the target URI (Section 3.1 of - [URI])¶
-":scheme" is not restricted to "http" and "https" schemed URIs. A proxy or -gateway can translate requests for non-HTTP schemes, enabling the use of -HTTP to interact with non-HTTP services.¶
-Contains the authority portion of the target URI (Section 3.2 of -[URI]). The authority MUST NOT include the deprecated "userinfo" -subcomponent for "http" or "https" schemed URIs.¶
-To ensure that the HTTP/1.1 request line can be reproduced accurately, this -pseudo-header field MUST be omitted when translating from an HTTP/1.1 -request that has a request target in origin or asterisk form; see Section -3.2 of [HTTP11]. Clients that generate HTTP/3 requests directly SHOULD -use the ":authority" pseudo-header field instead of the Host field. An -intermediary that converts an HTTP/3 request to HTTP/1.1 MUST create a Host -field if one is not present in a request by copying the value of the -":authority" pseudo-header field.¶
-Contains the path and query parts of the target URI (the "path-absolute" production and optionally a '?' character followed by the "query" production; see Sections 3.3 and 3.4 of [URI]). A request in asterisk form includes the value '*' for the ":path" pseudo-header field.¶
-This pseudo-header field MUST NOT be empty for "http" or "https" URIs; -"http" or "https" URIs that do not contain a path component MUST include a -value of '/'. The exception to this rule is an OPTIONS request for an -"http" or "https" URI that does not include a path component; these MUST -include a ":path" pseudo-header field with a value of '*'; see Section 3.2.4 -of [HTTP11].¶
-All HTTP/3 requests MUST include exactly one value for the ":method", ":scheme", and ":path" pseudo-header fields, unless the request is a CONNECT request; see Section 4.2.¶
-If the ":scheme" pseudo-header field identifies a scheme that has a mandatory -authority component (including "http" and "https"), the request MUST contain -either an ":authority" pseudo-header field or a "Host" header field. If these -fields are present, they MUST NOT be empty. If both fields are present, they -MUST contain the same value. If the scheme does not have a mandatory authority -component and none is provided in the request target, the request MUST NOT -contain the ":authority" pseudo-header or "Host" header fields.¶
-An HTTP request that omits mandatory pseudo-header fields or contains invalid -values for those pseudo-header fields is malformed (Section 4.1.3).¶
-HTTP/3 does not define a way to carry the version identifier that is included in -the HTTP/1.1 request line.¶
-For responses, a single ":status" pseudo-header field is defined that carries -the HTTP status code; see Section 14 of [SEMANTICS]. This pseudo-header -field MUST be included in all responses; otherwise, the response is malformed -(Section 4.1.3).¶
-HTTP/3 does not define a way to carry the version or reason phrase that is -included in an HTTP/1.1 status line.¶
-HTTP/3 uses QPACK field compression as described in [QPACK], a variation of -HPACK that allows the flexibility to avoid compression-induced head-of-line -blocking. See that document for additional details.¶
-To allow for better compression efficiency, the "Cookie" field ([RFC6265]) -MAY be split into separate field lines, each with one or more cookie-pairs, -before compression. If a decompressed field section contains multiple cookie -field lines, these MUST be concatenated into a single octet string using the -two-octet delimiter of 0x3b, 0x20 (the ASCII string "; ") before being passed -into a context other than HTTP/2 or HTTP/3, such as an HTTP/1.1 connection, or a -generic HTTP server application.¶
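The recombination step can be sketched as (our helper; field names are assumed already lowercased):

```python
def recombine_cookies(fields: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Join split cookie field lines with the two-octet delimiter
    0x3b 0x20 ("; ") before leaving an HTTP/2 or HTTP/3 context."""
    cookies = [v for n, v in fields if n == "cookie"]
    others = [(n, v) for n, v in fields if n != "cookie"]
    if cookies:
        others.append(("cookie", "; ".join(cookies)))
    return others
```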
-An HTTP/3 implementation MAY impose a limit on the maximum size of the message -header it will accept on an individual HTTP message. A server that receives a -larger header section than it is willing to handle can send an HTTP 431 (Request -Header Fields Too Large) status code ([RFC6585]). A client can discard -responses that it cannot process. The size of a field list is calculated based -on the uncompressed size of fields, including the length of the name and value -in bytes plus an overhead of 32 bytes for each field.¶
-If an implementation wishes to advise its peer of this limit, it can be conveyed as a number of bytes in the SETTINGS_MAX_FIELD_SECTION_SIZE parameter. An implementation that has received this parameter SHOULD NOT send an HTTP message header that exceeds the indicated size, as the peer will likely refuse to process it. However, an HTTP message can traverse one or more intermediaries before reaching the origin server; see Section 3.7 of [SEMANTICS]. Because this limit is applied separately by each implementation that processes the message, messages below this limit are not guaranteed to be accepted.¶
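The size calculation is straightforward; a sketch (function name ours):

```python
def field_section_size(fields: list[tuple[bytes, bytes]]) -> int:
    """The size compared against SETTINGS_MAX_FIELD_SECTION_SIZE: the
    uncompressed length of each name and value in bytes, plus 32 bytes
    of overhead per field."""
    return sum(len(name) + len(value) + 32 for name, value in fields)
```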
-Once a request stream has been opened, the request MAY be cancelled by either endpoint. Clients cancel requests if the response is no longer of interest; servers cancel requests if they are unable to or choose not to respond. When possible, it is RECOMMENDED that servers send an HTTP response with an appropriate status code rather than cancelling requests they have already begun processing.¶
-Implementations SHOULD cancel requests by abruptly terminating any -directions of a stream that are still open. This means resetting the -sending parts of streams and aborting reading on receiving parts of streams; -see Section 2.4 of [QUIC-TRANSPORT].¶
-When the server cancels a request without performing any application processing, -the request is considered "rejected." The server SHOULD abort its response -stream with the error code H3_REQUEST_REJECTED. In this context, "processed" -means that some data from the stream was passed to some higher layer of software -that might have taken some action as a result. The client can treat requests -rejected by the server as though they had never been sent at all, thereby -allowing them to be retried later.¶
-Servers MUST NOT use the H3_REQUEST_REJECTED error code for requests that were -partially or fully processed. When a server abandons a response after partial -processing, it SHOULD abort its response stream with the error code -H3_REQUEST_CANCELLED.¶
-Clients SHOULD use the error code H3_REQUEST_CANCELLED to cancel requests. Upon receipt of this error code, a server MAY abruptly terminate the response using the error code H3_REQUEST_REJECTED if no processing was performed. Clients MUST NOT use the H3_REQUEST_REJECTED error code, except when a server has requested closure of the request stream with this error code.¶
-If a stream is cancelled after receiving a complete response, the client MAY ignore the cancellation and use the response. However, if a stream is cancelled after receiving a partial response, the response SHOULD NOT be used. Only idempotent actions such as GET, PUT, or DELETE can be safely retried; a client SHOULD NOT automatically retry a request with a non-idempotent method unless it has some means to know that the request semantics are idempotent independent of the method or some means to detect that the original request was never applied. See Section 8.2.2 of [SEMANTICS] for more details.¶
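One way to frame a client's automatic-retry decision (a sketch under our own naming; the method set also includes HEAD, OPTIONS, and TRACE, which [SEMANTICS] defines as idempotent):

```python
IDEMPOTENT = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}

def may_auto_retry(method: str, server_rejected: bool) -> bool:
    """A rejected request (H3_REQUEST_REJECTED, no processing
    performed) can be treated as never sent and retried; otherwise
    only requests with idempotent methods are retried automatically."""
    return server_rejected or method in IDEMPOTENT
```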
-A malformed request or response is one that is an otherwise valid sequence of -frames but is invalid due to:¶
-A request or response that includes a payload body can include a -Content-Length header field. A request or response is also malformed if the -value of a content-length header field does not equal the sum of the DATA frame -payload lengths that form the body. A response that is defined to have no -payload, as described in Section 5.5.4 of [SEMANTICS], can have a non-zero -content-length field, even though no content is included in DATA frames.¶
-Intermediaries that process HTTP requests or responses (i.e., any intermediary -not acting as a tunnel) MUST NOT forward a malformed request or response. -Malformed requests or responses that are detected MUST be treated as a stream -error (Section 8) of type H3_GENERAL_PROTOCOL_ERROR.¶
-For malformed requests, a server MAY send an HTTP response indicating the error -prior to closing or resetting the stream. Clients MUST NOT accept a malformed -response. Note that these requirements are intended to protect against several -types of common attacks against HTTP; they are deliberately strict because being -permissive can expose implementations to these vulnerabilities.¶
-The CONNECT method requests that the recipient establish a tunnel to the -destination origin server identified by the request-target; see Section 8.3.6 of -[SEMANTICS]. It is primarily used with HTTP proxies to establish a TLS -session with an origin server for the purposes of interacting with "https" -resources.¶
-In HTTP/1.x, CONNECT is used to convert an entire HTTP connection into a tunnel -to a remote host. In HTTP/2 and HTTP/3, the CONNECT method is used to establish -a tunnel over a single stream.¶
-A CONNECT request MUST be constructed as follows:¶
The ":method" pseudo-header field is set to "CONNECT".¶
The ":scheme" and ":path" pseudo-header fields are omitted.¶
The ":authority" pseudo-header field contains the host and port to connect to.¶
-The request stream remains open at the end of the request to carry the data to -be transferred. A CONNECT request that does not conform to these restrictions -is malformed; see Section 4.1.3.¶
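The construction rules reduce to a very small field section; a sketch (helper name ours):

```python
def connect_request(host: str, port: int) -> list[tuple[str, str]]:
    """A CONNECT request carries only ":method" and an ":authority"
    of host:port; ":scheme" and ":path" are omitted."""
    return [(":method", "CONNECT"), (":authority", f"{host}:{port}")]
```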
-A proxy that supports CONNECT establishes a TCP connection ([RFC0793]) to the -server identified in the ":authority" pseudo-header field. Once this connection -is successfully established, the proxy sends a HEADERS frame containing a 2xx -series status code to the client, as defined in Section 14.3 of [SEMANTICS].¶
-All DATA frames on the stream correspond to data sent or received on the TCP -connection. The payload of any DATA frame sent by the client is transmitted by -the proxy to the TCP server; data received from the TCP server is packaged into -DATA frames by the proxy. Note that the size and number of TCP segments is not -guaranteed to map predictably to the size and number of HTTP DATA or QUIC STREAM -frames.¶
-Once the CONNECT method has completed, only DATA frames are permitted to be sent -on the stream. Extension frames MAY be used if specifically permitted by the -definition of the extension. Receipt of any other known frame type MUST be -treated as a connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
-The TCP connection can be closed by either peer. When the client ends the -request stream (that is, the receive stream at the proxy enters the "Data Recvd" -state), the proxy will set the FIN bit on its connection to the TCP server. When -the proxy receives a packet with the FIN bit set, it will close the send stream -that it sends to the client. TCP connections that remain half-closed in a -single direction are not invalid, but are often handled poorly by servers, so -clients SHOULD NOT close a stream for sending while they still expect to receive -data from the target of the CONNECT.¶
-A TCP connection error is signaled by abruptly terminating the stream. A proxy -treats any error in the TCP connection, which includes receiving a TCP segment -with the RST bit set, as a stream error of type H3_CONNECT_ERROR; see -Section 8. Correspondingly, if a proxy detects an error with the stream or the -QUIC connection, it MUST close the TCP connection. If the underlying TCP -implementation permits it, the proxy SHOULD send a TCP segment with the RST bit -set.¶
-HTTP/3 does not support the HTTP Upgrade mechanism (Section 6.6 of -[SEMANTICS]) or 101 (Switching Protocols) informational status code (Section -14.2.2 of [SEMANTICS]).¶
-Server push is an interaction mode that permits a server to push a -request-response exchange to a client in anticipation of the client making the -indicated request. This trades off network usage against a potential latency -gain. HTTP/3 server push is similar to what is described in Section 8.2 of -[HTTP2], but uses different mechanisms.¶
-Each server push is assigned a unique Push ID by the server. The Push ID is -used to refer to the push in various contexts throughout the lifetime of the -HTTP/3 connection.¶
-The Push ID space begins at zero, and ends at a maximum value set by the -MAX_PUSH_ID frame; see Section 7.2.7. In particular, a server is not -able to push until after the client sends a MAX_PUSH_ID frame. A client sends -MAX_PUSH_ID frames to control the number of pushes that a server can promise. A -server SHOULD use Push IDs sequentially, beginning from zero. A client MUST -treat receipt of a push stream as a connection error of type H3_ID_ERROR -(Section 8) when no MAX_PUSH_ID frame has been sent or when the stream -references a Push ID that is greater than the maximum Push ID.¶
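The client-side validation described above can be modeled as follows. This is a sketch; the function shape is an assumption, and the H3_ID_ERROR code value is taken from the HTTP/3 error-code registry:

```python
H3_ID_ERROR = 0x108  # HTTP/3 error code used for Push ID violations

def validate_push_stream(push_id, max_push_id):
    """Client-side check for a newly received push stream.  max_push_id is
    the largest Push ID advertised in a MAX_PUSH_ID frame, or None when no
    MAX_PUSH_ID frame has been sent yet.  Returns a connection error code,
    or None when the push stream is acceptable."""
    if max_push_id is None:
        return H3_ID_ERROR   # server pushed before the client permitted it
    if push_id > max_push_id:
        return H3_ID_ERROR   # Push ID exceeds the advertised maximum
    return None
```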
-The Push ID is used in one or more PUSH_PROMISE frames (Section 7.2.5) -that carry the header section of the request message. These frames are sent on -the request stream that generated the push. This allows the server push to be -associated with a client request. When the same Push ID is promised on multiple -request streams, the decompressed request field sections MUST contain the same -fields in the same order, and both the name and the value in each field MUST be -identical.¶
-The Push ID is then included with the push stream that ultimately fulfills -those promises; see Section 6.2.2. The push stream identifies the Push ID of -the promise that it fulfills, then contains a response to the promised request -as described in Section 4.1.¶
-Finally, the Push ID can be used in CANCEL_PUSH frames; see -Section 7.2.3. Clients use this frame to indicate they do not wish to -receive a promised resource. Servers use this frame to indicate they will not -be fulfilling a previous promise.¶
-Not all requests can be pushed. A server MAY push requests that have the
-following properties: the request is cacheable, the request is safe, and the
-request does not include a request body or trailer section.¶
-The server MUST include a value in the ":authority" pseudo-header field for -which the server is authoritative; see Section 3.3.¶
-Clients SHOULD send a CANCEL_PUSH frame upon receipt of a PUSH_PROMISE frame
-carrying a request that is not cacheable, that is not known to be safe, that
-indicates the presence of a request body, or for which the client does not
-consider the server authoritative. Any corresponding responses MUST NOT be used
-or cached.¶
-Each pushed response is associated with one or more client requests. The push -is associated with the request stream on which the PUSH_PROMISE frame was -received. The same server push can be associated with additional client -requests using a PUSH_PROMISE frame with the same Push ID on multiple request -streams. These associations do not affect the operation of the protocol, but -MAY be considered by user agents when deciding how to use pushed resources.¶
-Ordering of a PUSH_PROMISE frame in relation to certain parts of the response is -important. The server SHOULD send PUSH_PROMISE frames prior to sending HEADERS -or DATA frames that reference the promised responses. This reduces the chance -that a client requests a resource that will be pushed by the server.¶
-Due to reordering, push stream data can arrive before the corresponding
-PUSH_PROMISE frame. When a client receives a new push stream with an
-as-yet-unknown Push ID, both the associated client request and the pushed
-request header fields are unknown. The client can buffer the stream data in
-expectation of the matching PUSH_PROMISE. The client can use stream flow control
-(see Section 4.1 of [QUIC-TRANSPORT]) to limit the amount of data a server may
-commit to the pushed stream.¶
-Push stream data can also arrive after a client has cancelled a push. In this
-case, the client can abort reading the stream with an error code of
-H3_REQUEST_CANCELLED. This asks the server not to transfer additional data and
-indicates that it will be discarded upon receipt.¶
-Pushed responses that are cacheable (see Section 3 of -[CACHING]) can be stored by the client, if it -implements an HTTP cache. Pushed responses are considered successfully -validated on the origin server (e.g., if the "no-cache" cache response directive -is present; see Section 5.2.2.3 of [CACHING]) at the time the pushed response -is received.¶
-Pushed responses that are not cacheable MUST NOT be stored by any HTTP cache. -They MAY be made available to the application separately.¶
-Once established, an HTTP/3 connection can be used for many requests and -responses over time until the connection is closed. Connection closure can -happen in any of several different ways.¶
-Each QUIC endpoint declares an idle timeout during the handshake. If the QUIC -connection remains idle (no packets received) for longer than this duration, the -peer will assume that the connection has been closed. HTTP/3 implementations -will need to open a new HTTP/3 connection for new requests if the existing -connection has been idle for longer than the idle timeout negotiated during the -QUIC handshake, and SHOULD do so if approaching the idle timeout; see Section -10.1 of [QUIC-TRANSPORT].¶
-HTTP clients are expected to request that the transport keep connections open -while there are responses outstanding for requests or server pushes, as -described in Section 10.1.2 of [QUIC-TRANSPORT]. If the client is not -expecting a response from the server, allowing an idle connection to time out is -preferred over expending effort maintaining a connection that might not be -needed. A gateway MAY maintain connections in anticipation of need rather than -incur the latency cost of connection establishment to servers. Servers SHOULD -NOT actively keep connections open.¶
-Even when a connection is not idle, either endpoint can decide to stop using the -connection and initiate a graceful connection close. Endpoints initiate the -graceful shutdown of an HTTP/3 connection by sending a GOAWAY frame -(Section 7.2.6). The GOAWAY frame contains an identifier that indicates to -the receiver the range of requests or pushes that were or might be processed in -this connection. The server sends a client-initiated bidirectional Stream ID; -the client sends a Push ID (Section 4.4). Requests or pushes with the -indicated identifier or greater are rejected (Section 4.1.2) by the -sender of the GOAWAY. This identifier MAY be zero if no requests or pushes were -processed.¶
-The information in the GOAWAY frame enables a client and server to agree on -which requests or pushes were accepted prior to the shutdown of the HTTP/3 -connection. Upon sending a GOAWAY frame, the endpoint SHOULD explicitly cancel -(see Section 4.1.2 and Section 7.2.3) any requests or pushes -that have identifiers greater than or equal to that indicated, in order to clean -up transport state for the affected streams. The endpoint SHOULD continue to do -so as more requests or pushes arrive.¶
-Endpoints MUST NOT initiate new requests or promise new pushes on the connection -after receipt of a GOAWAY frame from the peer. Clients MAY establish a new -connection to send additional requests.¶
-Some requests or pushes might already be in transit:¶
-Upon receipt of a GOAWAY frame, if the client has already sent requests with -a Stream ID greater than or equal to the identifier contained in the GOAWAY -frame, those requests will not be processed. Clients can safely retry -unprocessed requests on a different HTTP connection. A client that is -unable to retry requests loses all requests that are in flight when the -server closes the connection.¶
-Requests on Stream IDs less than the Stream ID in a GOAWAY frame from the
-server might have been processed; their status cannot be known until a
-response is received, the stream is reset individually, another GOAWAY is
-received, or the connection terminates.¶
-Servers MAY reject individual requests on streams below the indicated ID if
-these requests were not processed.¶
-Servers SHOULD send a GOAWAY frame when the closing of a connection is known -in advance, even if the advance notice is small, so that the remote peer can -know whether a request has been partially processed or not. For example, if an -HTTP client sends a POST at the same time that a server closes a QUIC -connection, the client cannot know if the server started to process that POST -request if the server does not send a GOAWAY frame to indicate what streams it -might have acted on.¶
-An endpoint MAY send multiple GOAWAY frames indicating different identifiers, -but the identifier in each frame MUST NOT be greater than the identifier in any -previous frame, since clients might already have retried unprocessed requests on -another HTTP connection. Receiving a GOAWAY containing a larger identifier than -previously received MUST be treated as a connection error of type H3_ID_ERROR; -see Section 8.¶
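A receiver can enforce the non-increasing rule with a single running bound. The following is a sketch; the function shape and use of `ConnectionError` are illustrative:

```python
def process_goaway(new_id, current_bound=None):
    """Track the GOAWAY identifier, which MUST NOT increase across frames.
    current_bound is the identifier from the most recent GOAWAY, or None if
    none has been received yet.  Raises on a violation (H3_ID_ERROR);
    otherwise returns the updated bound."""
    if current_bound is not None and new_id > current_bound:
        raise ConnectionError("H3_ID_ERROR: GOAWAY identifier increased")
    return new_id
```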
-An endpoint that is attempting to gracefully shut down a connection can send a -GOAWAY frame with a value set to the maximum possible value (2^62-4 for servers, -2^62-1 for clients). This ensures that the peer stops creating new requests or -pushes. After allowing time for any in-flight requests or pushes to arrive, the -endpoint can send another GOAWAY frame indicating which requests or pushes it -might accept before the end of the connection. This ensures that a connection -can be cleanly shut down without losing requests.¶
-A client has more flexibility in the value it chooses for the Push ID in a -GOAWAY that it sends. A value of 2^62 - 1 indicates that the server can -continue fulfilling pushes that have already been promised. A smaller value -indicates the client will reject pushes with Push IDs greater than or equal to -this value. Like the server, the client MAY send subsequent GOAWAY frames so -long as the specified Push ID is no greater than any previously sent value.¶
-Even when a GOAWAY indicates that a given request or push will not be processed -or accepted upon receipt, the underlying transport resources still exist. The -endpoint that initiated these requests can cancel them to clean up transport -state.¶
-Once all accepted requests and pushes have been processed, the endpoint can -permit the connection to become idle, or MAY initiate an immediate closure of -the connection. An endpoint that completes a graceful shutdown SHOULD use the -H3_NO_ERROR error code when closing the connection.¶
-If a client has consumed all available bidirectional stream IDs with requests, -the server need not send a GOAWAY frame, since the client is unable to make -further requests.¶
-An HTTP/3 implementation can immediately close the QUIC connection at any time. -This results in sending a QUIC CONNECTION_CLOSE frame to the peer indicating -that the application layer has terminated the connection. The application error -code in this frame indicates to the peer why the connection is being closed. -See Section 8 for error codes that can be used when closing a connection in -HTTP/3.¶
-Before closing the connection, a GOAWAY frame MAY be sent to allow the client to -retry some requests. Including the GOAWAY frame in the same packet as the QUIC -CONNECTION_CLOSE frame improves the chances of the frame being received by -clients.¶
-For various reasons, the QUIC transport could indicate to the application layer -that the connection has terminated. This might be due to an explicit closure -by the peer, a transport-level error, or a change in network topology that -interrupts connectivity.¶
-If a connection terminates without a GOAWAY frame, clients MUST assume that any -request that was sent, whether in whole or in part, might have been processed.¶
-A QUIC stream provides reliable in-order delivery of bytes, but makes no -guarantees about order of delivery with regard to bytes on other streams. On the -wire, data is framed into QUIC STREAM frames, but this framing is invisible to -the HTTP framing layer. The transport layer buffers and orders received QUIC -STREAM frames, exposing the data contained within as a reliable byte stream to -the application. Although QUIC permits out-of-order delivery within a stream, -HTTP/3 does not make use of this feature.¶
-QUIC streams can be either unidirectional, carrying data only from initiator to -receiver, or bidirectional. Streams can be initiated by either the client or -the server. For more detail on QUIC streams, see Section 2 of -[QUIC-TRANSPORT].¶
-When HTTP fields and data are sent over QUIC, the QUIC layer handles most of
-the stream management. HTTP does not need to do any separate multiplexing when
-using QUIC: data sent over a QUIC stream always maps to a particular HTTP
-transaction or to the entire HTTP/3 connection context.¶
-All client-initiated bidirectional streams are used for HTTP requests and -responses. A bidirectional stream ensures that the response can be readily -correlated with the request. These streams are referred to as request streams.¶
-This means that the client's first request occurs on QUIC stream 0, with -subsequent requests on stream 4, 8, and so on. In order to permit these streams -to open, an HTTP/3 server SHOULD configure non-zero minimum values for the -number of permitted streams and the initial stream flow control window. So as -to not unnecessarily limit parallelism, at least 100 requests SHOULD be -permitted at a time.¶
-HTTP/3 does not use server-initiated bidirectional streams, though an extension -could define a use for these streams. Clients MUST treat receipt of a -server-initiated bidirectional stream as a connection error of type -H3_STREAM_CREATION_ERROR (Section 8) unless such an extension has been -negotiated.¶
-Unidirectional streams, in either direction, are used for a range of purposes. -The purpose is indicated by a stream type, which is sent as a variable-length -integer at the start of the stream. The format and structure of data that -follows this integer is determined by the stream type.¶
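The stream type prefix uses the QUIC variable-length integer encoding, in which the two most significant bits of the first byte select a 1-, 2-, 4-, or 8-byte field (see Section 16 of [QUIC-TRANSPORT]). A minimal decoder might look like this (an illustrative sketch, not a reference implementation):

```python
def decode_varint(buf, offset=0):
    """Decode one QUIC variable-length integer from buf at offset.
    Returns (value, new_offset)."""
    first = buf[offset]
    length = 1 << (first >> 6)      # prefix 00/01/10/11 -> 1/2/4/8 bytes
    value = first & 0x3F            # remaining 6 bits of the first byte
    for b in buf[offset + 1 : offset + length]:
        value = (value << 8) | b
    return value, offset + length

# The first varint on a unidirectional stream is its stream type:
stream_type, pos = decode_varint(bytes([0x00]))  # 0x00 = control stream
```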
-Two stream types are defined in this document: control streams -(Section 6.2.1) and push streams (Section 6.2.2). [QPACK] defines two -additional stream types. Other stream types can be defined by extensions to -HTTP/3; see Section 9 for more details. Some stream types are reserved -(Section 6.2.3).¶
-The performance of HTTP/3 connections in the early phase of their lifetime is -sensitive to the creation and exchange of data on unidirectional streams. -Endpoints that excessively restrict the number of streams or the flow control -window of these streams will increase the chance that the remote peer reaches -the limit early and becomes blocked. In particular, implementations should -consider that remote peers may wish to exercise reserved stream behavior -(Section 6.2.3) with some of the unidirectional streams they are permitted -to use. To avoid blocking, the transport parameters sent by both clients and -servers MUST allow the peer to create at least one unidirectional stream for the -HTTP control stream plus the number of unidirectional streams required by -mandatory extensions (three being the minimum number required for the base -HTTP/3 protocol and QPACK), and SHOULD provide at least 1,024 bytes of flow -control credit to each stream.¶
-Note that an endpoint is not required to grant additional credits to create more -unidirectional streams if its peer consumes all the initial credits before -creating the critical unidirectional streams. Endpoints SHOULD create the HTTP -control stream as well as the unidirectional streams required by mandatory -extensions (such as the QPACK encoder and decoder streams) first, and then -create additional streams as allowed by their peer.¶
-If the stream header indicates a stream type that is not supported by the -recipient, the remainder of the stream cannot be consumed as the semantics are -unknown. Recipients of unknown stream types MAY abort reading of the stream with -an error code of H3_STREAM_CREATION_ERROR or a reserved error code -(Section 8.1), but MUST NOT consider such streams to be a connection -error of any kind.¶
-Implementations MAY send stream types before knowing whether the peer supports -them. However, stream types that could modify the state or semantics of -existing protocol components, including QPACK or other extensions, MUST NOT be -sent until the peer is known to support them.¶
-A sender can close or reset a unidirectional stream unless otherwise specified. -A receiver MUST tolerate unidirectional streams being closed or reset prior to -the reception of the unidirectional stream header.¶
-A control stream is indicated by a stream type of 0x00. Data on this stream -consists of HTTP/3 frames, as defined in Section 7.2.¶
-Each side MUST initiate a single control stream at the beginning of the -connection and send its SETTINGS frame as the first frame on this stream. If -the first frame of the control stream is any other frame type, this MUST be -treated as a connection error of type H3_MISSING_SETTINGS. Only one control -stream per peer is permitted; receipt of a second stream claiming to be a -control stream MUST be treated as a connection error of type -H3_STREAM_CREATION_ERROR. The sender MUST NOT close the control stream, and the -receiver MUST NOT request that the sender close the control stream. If either -control stream is closed at any point, this MUST be treated as a connection -error of type H3_CLOSED_CRITICAL_STREAM. Connection errors are described in -Section 8.¶
-A pair of unidirectional streams is used rather than a single bidirectional -stream. This allows either peer to send data as soon as it is able. Depending -on whether 0-RTT is enabled on the QUIC connection, either client or server -might be able to send stream data first after the cryptographic handshake -completes.¶
-Server push is an optional feature introduced in HTTP/2 that allows a server to -initiate a response before a request has been made. See Section 4.4 for -more details.¶
-A push stream is indicated by a stream type of 0x01, followed by the Push ID -of the promise that it fulfills, encoded as a variable-length integer. The -remaining data on this stream consists of HTTP/3 frames, as defined in -Section 7.2, and fulfills a promised server push by zero or more interim HTTP -responses followed by a single final HTTP response, as defined in -Section 4.1. Server push and Push IDs are described in -Section 4.4.¶
-Only servers can push; if a server receives a client-initiated push stream, this -MUST be treated as a connection error of type H3_STREAM_CREATION_ERROR; see -Section 8.¶
-Each Push ID MUST only be used once in a push stream header. If a push stream -header includes a Push ID that was used in another push stream header, the -client MUST treat this as a connection error of type H3_ID_ERROR; see -Section 8.¶
-Stream types of the format 0x1f * N + 0x21 for non-negative integer values of N
-are reserved to exercise the requirement that unknown types be ignored. These
-streams have no semantics, and can be sent when application-layer padding is
-desired. They MAY also be sent on connections where no data is currently being
-transferred. Endpoints MUST NOT consider these streams to have any meaning upon
-receipt.¶
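The reserved ("grease") types above follow a simple arithmetic pattern that can be generated and recognized directly (a sketch; the function names are illustrative):

```python
def reserved_stream_type(n):
    """Reserved stream type for non-negative integer N: 0x1f * N + 0x21."""
    return 0x1F * n + 0x21

def is_reserved_stream_type(stream_type):
    """True when stream_type has the form 0x1f * N + 0x21 for some N >= 0."""
    return stream_type >= 0x21 and (stream_type - 0x21) % 0x1F == 0
```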
-The payload and length of the stream are selected in any manner the sending
-implementation chooses. When sending a reserved stream type, the implementation
-MAY either terminate the stream cleanly or reset it. When resetting the stream,
-either the H3_NO_ERROR error code or a reserved error code (Section 8.1) SHOULD
-be used.¶
-HTTP frames are carried on QUIC streams, as described in Section 6. -HTTP/3 defines three stream types: control stream, request stream, and push -stream. This section describes HTTP/3 frame formats and their permitted stream -types; see Table 1 for an overview. A comparison between -HTTP/2 and HTTP/3 frames is provided in Appendix A.2.¶
-Frame | -Control Stream | -Request Stream | -Push Stream | -Section | -
---|---|---|---|---|
DATA | -No | -Yes | -Yes | -- Section 7.2.1 - | -
HEADERS | -No | -Yes | -Yes | -- Section 7.2.2 - | -
CANCEL_PUSH | -Yes | -No | -No | -- Section 7.2.3 - | -
SETTINGS | -Yes (1) | -No | -No | -- Section 7.2.4 - | -
PUSH_PROMISE | -No | -Yes | -No | -- Section 7.2.5 - | -
GOAWAY | -Yes | -No | -No | -- Section 7.2.6 - | -
MAX_PUSH_ID | -Yes | -No | -No | -- Section 7.2.7 - | -
Reserved | -Yes | -Yes | -Yes | -- Section 7.2.8 - | -
Certain frames can only occur as the first frame of a particular stream type; -these are indicated in Table 1 with a (1). Specific guidance -is provided in the relevant section.¶
-Note that, unlike QUIC frames, HTTP/3 frames can span multiple packets.¶
-All frames have the following format:¶
-A frame includes the following fields:¶
-A variable-length integer that identifies the frame type.¶
-A variable-length integer that describes the length in bytes of -the Frame Payload.¶
-A payload, the semantics of which are determined by the Type field.¶
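Putting the three fields together, serializing a frame amounts to two variable-length integers followed by the payload. A sketch of an encoder (illustrative, not a reference implementation):

```python
def encode_varint(value):
    """Encode a QUIC variable-length integer (values up to 2^62 - 1)."""
    if value < 0x40:
        return bytes([value])
    if value < 0x4000:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 0x40000000:
        return (value | 0x80000000).to_bytes(4, "big")
    return (value | 0xC000000000000000).to_bytes(8, "big")

def encode_frame(frame_type, payload):
    """Serialize one HTTP/3 frame: Type (i), Length (i), Frame Payload."""
    return encode_varint(frame_type) + encode_varint(len(payload)) + payload

# A DATA frame (type 0x00) carrying five bytes of payload:
wire = encode_frame(0x00, b"hello")  # b"\x00\x05hello"
```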
-Each frame's payload MUST contain exactly the fields identified in its -description. A frame payload that contains additional bytes after the -identified fields or a frame payload that terminates before the end of the -identified fields MUST be treated as a connection error of type -H3_FRAME_ERROR; see Section 8.¶
-When a stream terminates cleanly, if the last frame on the stream was truncated, -this MUST be treated as a connection error of type H3_FRAME_ERROR; see -Section 8. Streams that terminate abruptly may be reset at any point in a -frame.¶
-DATA frames (type=0x0) convey arbitrary, variable-length sequences of bytes -associated with an HTTP request or response payload body.¶
-DATA frames MUST be associated with an HTTP request or response. If a DATA -frame is received on a control stream, the recipient MUST respond with a -connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
-The HEADERS frame (type=0x1) is used to carry an HTTP field section, encoded -using QPACK. See [QPACK] for more details.¶
-HEADERS frames can only be sent on request or push streams. If a HEADERS frame -is received on a control stream, the recipient MUST respond with a connection -error (Section 8) of type H3_FRAME_UNEXPECTED.¶
-The CANCEL_PUSH frame (type=0x3) is used to request cancellation of a server -push prior to the push stream being received. The CANCEL_PUSH frame identifies -a server push by Push ID (see Section 4.4), encoded as a variable-length -integer.¶
-When a client sends CANCEL_PUSH, it is indicating that it does not wish to -receive the promised resource. The server SHOULD abort sending the resource, -but the mechanism to do so depends on the state of the corresponding push -stream. If the server has not yet created a push stream, it does not create -one. If the push stream is open, the server SHOULD abruptly terminate that -stream. If the push stream has already ended, the server MAY still abruptly -terminate the stream or MAY take no action.¶
-A server sends CANCEL_PUSH to indicate that it will not be fulfilling a promise -which was previously sent. The client cannot expect the corresponding promise -to be fulfilled, unless it has already received and processed the promised -response. Regardless of whether a push stream has been opened, a server -SHOULD send a CANCEL_PUSH frame when it determines that promise will not be -fulfilled. If a stream has already been opened, the server can -abort sending on the stream with an error code of H3_REQUEST_CANCELLED.¶
-Sending a CANCEL_PUSH frame has no direct effect on the state of existing push -streams. A client SHOULD NOT send a CANCEL_PUSH frame when it has already -received a corresponding push stream. A push stream could arrive after a client -has sent a CANCEL_PUSH frame, because a server might not have processed the -CANCEL_PUSH. The client SHOULD abort reading the stream with an error code of -H3_REQUEST_CANCELLED.¶
-A CANCEL_PUSH frame is sent on the control stream. Receiving a CANCEL_PUSH -frame on a stream other than the control stream MUST be treated as a connection -error of type H3_FRAME_UNEXPECTED.¶
-The CANCEL_PUSH frame carries a Push ID encoded as a variable-length integer. -The Push ID identifies the server push that is being cancelled; see -Section 4.4. If a CANCEL_PUSH frame is received that references a Push ID -greater than currently allowed on the connection, this MUST be treated as a -connection error of type H3_ID_ERROR.¶
-If the client receives a CANCEL_PUSH frame, that frame might identify a Push ID -that has not yet been mentioned by a PUSH_PROMISE frame due to reordering. If a -server receives a CANCEL_PUSH frame for a Push ID that has not yet been -mentioned by a PUSH_PROMISE frame, this MUST be treated as a connection error of -type H3_ID_ERROR.¶
-The SETTINGS frame (type=0x4) conveys configuration parameters that affect how -endpoints communicate, such as preferences and constraints on peer behavior. -Individually, a SETTINGS parameter can also be referred to as a "setting"; the -identifier and value of each setting parameter can be referred to as a "setting -identifier" and a "setting value".¶
-SETTINGS frames always apply to an entire HTTP/3 connection, never a single -stream. A SETTINGS frame MUST be sent as the first frame of each control stream -(see Section 6.2.1) by each peer, and MUST NOT be sent subsequently. If an -endpoint receives a second SETTINGS frame on the control stream, the endpoint -MUST respond with a connection error of type H3_FRAME_UNEXPECTED.¶
-SETTINGS frames MUST NOT be sent on any stream other than the control stream. -If an endpoint receives a SETTINGS frame on a different stream, the endpoint -MUST respond with a connection error of type H3_FRAME_UNEXPECTED.¶
-SETTINGS parameters are not negotiated; they describe characteristics of the
-sending peer that can be used by the receiving peer. However, a negotiation
-can be implied by the use of SETTINGS: each peer uses SETTINGS to advertise a
-set of supported values. The definition of the setting would describe how each
-peer combines the two sets to conclude which choice will be used. SETTINGS does
-not provide a mechanism to identify when the choice takes effect.¶
-Different values for the same parameter can be advertised by each peer. For -example, a client might be willing to consume a very large response field -section, while servers are more cautious about request size.¶
-The same setting identifier MUST NOT occur more than once in the SETTINGS frame. -A receiver MAY treat the presence of duplicate setting identifiers as a -connection error of type H3_SETTINGS_ERROR.¶
-The payload of a SETTINGS frame consists of zero or more parameters. Each -parameter consists of a setting identifier and a value, both encoded as QUIC -variable-length integers.¶
-An implementation MUST ignore the contents for any SETTINGS identifier it does -not understand.¶
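A parser consistent with these rules reads identifier/value pairs, rejects duplicates, and silently ignores identifiers it does not know. This is a sketch; the `KNOWN_SETTINGS` set and error handling are assumptions of the example:

```python
def decode_varint(buf, offset):
    """Minimal QUIC variable-length integer decoder (see [QUIC-TRANSPORT])."""
    first = buf[offset]
    length = 1 << (first >> 6)
    value = first & 0x3F
    for b in buf[offset + 1 : offset + length]:
        value = (value << 8) | b
    return value, offset + length

KNOWN_SETTINGS = {0x06}  # e.g. the field section size limit setting

def parse_settings(payload):
    """Parse a SETTINGS frame payload into {identifier: value}.  Duplicate
    identifiers are a connection error; unknown identifiers are ignored."""
    settings, seen, offset = {}, set(), 0
    while offset < len(payload):
        ident, offset = decode_varint(payload, offset)
        value, offset = decode_varint(payload, offset)
        if ident in seen:
            raise ConnectionError("H3_SETTINGS_ERROR: duplicate identifier")
        seen.add(ident)
        if ident in KNOWN_SETTINGS:
            settings[ident] = value  # unknown identifiers fall through
    return settings
```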
-The following settings are defined in HTTP/3:¶
-SETTINGS_MAX_FIELD_SECTION_SIZE (0x06): The default value is unlimited. See
-Section 4.1.1.3 for usage.¶
-Setting identifiers of the format 0x1f * N + 0x21 for non-negative integer
-values of N are reserved to exercise the requirement that unknown identifiers be
-ignored. Such settings have no defined meaning. Endpoints SHOULD include at
-least one such setting in their SETTINGS frame. Endpoints MUST NOT consider such
-settings to have any meaning upon receipt.¶
-Because the setting has no defined meaning, the value of the setting can be any
-value the implementation selects.¶
-Setting identifiers that were used in HTTP/2 but have no corresponding HTTP/3
-setting have also been reserved (Section 11.2.2). These settings MUST NOT be
-sent, and their receipt MUST be treated as a connection error of type
-H3_SETTINGS_ERROR.¶
-Additional settings can be defined by extensions to HTTP/3; see Section 9 -for more details.¶
-An HTTP implementation MUST NOT send frames or requests that would be invalid -based on its current understanding of the peer's settings.¶
-All settings begin at an initial value. Each endpoint SHOULD use these initial -values to send messages before the peer's SETTINGS frame has arrived, as packets -carrying the settings can be lost or delayed. When the SETTINGS frame arrives, -any settings are changed to their new values.¶
-This removes the need to wait for the SETTINGS frame before sending messages. -Endpoints MUST NOT require any data to be received from the peer prior to -sending the SETTINGS frame; settings MUST be sent as soon as the transport is -ready to send data.¶
-For servers, the initial value of each client setting is the default value.¶
-For clients using a 1-RTT QUIC connection, the initial value of each server -setting is the default value. 1-RTT keys will always become available prior to -the packet containing SETTINGS being processed by QUIC, even if the server sends -SETTINGS immediately. Clients SHOULD NOT wait indefinitely for SETTINGS to -arrive before sending requests, but SHOULD process received datagrams in order -to increase the likelihood of processing SETTINGS before sending the first -request.¶
-When a 0-RTT QUIC connection is being used, the initial value of each server -setting is the value used in the previous session. Clients SHOULD store the -settings the server provided in the HTTP/3 connection where resumption -information was provided, but MAY opt not to store settings in certain cases -(e.g., if the session ticket is received before the SETTINGS frame). A client -MUST comply with stored settings -- or default values, if no values are stored --- when attempting 0-RTT. Once a server has provided new settings, clients MUST -comply with those values.¶
-A server can remember the settings that it advertised, or store an -integrity-protected copy of the values in the ticket and recover the information -when accepting 0-RTT data. A server uses the HTTP/3 settings values in -determining whether to accept 0-RTT data. If the server cannot determine that -the settings remembered by a client are compatible with its current settings, it -MUST NOT accept 0-RTT data. Remembered settings are compatible if a client -complying with those settings would not violate the server's current settings.¶
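The compatibility rule above can be sketched in a few lines. This is a hypothetical illustration, not an API from the draft: settings are modelled as a dict keyed by setting identifier, and the only limit-style setting considered is MAX_FIELD_SECTION_SIZE, where `None` stands for the default (unlimited).

```python
# Illustrative sketch of a server's 0-RTT acceptance decision based on
# remembered settings.  Names and the single-limit model are assumptions.

MAX_FIELD_SECTION_SIZE = 0x6  # the one limit-style setting defined here


def compatible(remembered: dict, current: dict) -> bool:
    """Remembered settings are compatible if a client complying with them
    cannot violate the server's current settings.  For a limit like
    MAX_FIELD_SECTION_SIZE, the current limit must be at least as
    permissive as the remembered one (None means unlimited/default)."""
    old = remembered.get(MAX_FIELD_SECTION_SIZE)
    new = current.get(MAX_FIELD_SECTION_SIZE)
    if new is None:   # current limit is unlimited: any compliant client is fine
        return True
    if old is None:   # client remembers "unlimited" but server now limits
        return False
    return new >= old


def accept_0rtt(remembered: dict, current: dict) -> bool:
    # The server MUST NOT accept 0-RTT unless compatibility is established.
    return compatible(remembered, current)
```

Raising a limit is always safe; a server that has since lowered a limit, or that cannot recover the remembered values at all, falls back to rejecting 0-RTT.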
-A server MAY accept 0-RTT and subsequently provide different settings in its -SETTINGS frame. If 0-RTT data is accepted by the server, its SETTINGS frame MUST -NOT reduce any limits or alter any values that might be violated by the client -with its 0-RTT data. The server MUST include all settings that differ from -their default values. If a server accepts 0-RTT but then sends settings that -are not compatible with the previously specified settings, this MUST be treated -as a connection error of type H3_SETTINGS_ERROR. If a server accepts 0-RTT but -then sends a SETTINGS frame that omits a setting value that the client -understands (apart from reserved setting identifiers) that was previously -specified to have a non-default value, this MUST be treated as a connection -error of type H3_SETTINGS_ERROR.¶
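On the client side, the two failure modes above (a reduced limit, or an omitted previously non-default setting) can be checked mechanically. A minimal sketch, assuming settings dicts keyed by identifier and treating every stored setting as a limit; the `ConnectionError` carrying the H3 error code is an illustrative convention, not the draft's API:

```python
# Hypothetical client-side validation of the server's SETTINGS frame after
# 0-RTT was accepted.  `stored` holds the previously non-default settings
# the client complied with while sending 0-RTT data.

H3_SETTINGS_ERROR = 0x0109


def check_post_0rtt_settings(stored: dict, received: dict) -> None:
    for ident, old_value in stored.items():
        if ident not in received:
            # Server omitted a setting that previously had a non-default value.
            raise ConnectionError(H3_SETTINGS_ERROR)
        if received[ident] < old_value:
            # Server reduced a limit the 0-RTT data may have relied on.
            raise ConnectionError(H3_SETTINGS_ERROR)
```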
-The PUSH_PROMISE frame (type=0x5) is used to carry a promised request header -field section from server to client on a request stream, as in HTTP/2.¶
-The payload consists of:¶
-Push ID: A variable-length integer that identifies the server push operation. A Push ID is used in push stream headers (Section 4.4) and CANCEL_PUSH frames (Section 7.2.3).¶
-Encoded Field Section: QPACK-encoded request header fields for the promised response. See [QPACK] for more details.¶
-A server MUST NOT use a Push ID that is larger than the client has provided in a MAX_PUSH_ID frame (Section 7.2.7). A client MUST treat receipt of a PUSH_PROMISE frame that contains a larger Push ID than the client has advertised as a connection error of type H3_ID_ERROR.¶
-A server MAY use the same Push ID in multiple PUSH_PROMISE frames. If so, the -decompressed request header sets MUST contain the same fields in the same order, -and both the name and the value in each field MUST be exact matches. Clients -SHOULD compare the request header sections for resources promised multiple -times. If a client receives a Push ID that has already been promised and detects -a mismatch, it MUST respond with a connection error of type -H3_GENERAL_PROTOCOL_ERROR. If the decompressed field sections match exactly, the -client SHOULD associate the pushed content with each stream on which a -PUSH_PROMISE frame was received.¶
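The client-side checks above (Push ID within the advertised maximum; duplicate promises must carry identical request header sections) can be sketched as follows. The class and method names are hypothetical; field sections are modelled as ordered `(name, value)` tuples so "same fields in the same order" reduces to tuple equality:

```python
# Illustrative client bookkeeping for PUSH_PROMISE frames (assumed helper,
# not an API from the draft).

H3_ID_ERROR = 0x0108
H3_GENERAL_PROTOCOL_ERROR = 0x0101


class PushTracker:
    def __init__(self, max_push_id=None):
        self.max_push_id = max_push_id   # value the client sent in MAX_PUSH_ID
        self.promises = {}               # push_id -> request field section

    def on_push_promise(self, push_id, fields):
        if self.max_push_id is None or push_id > self.max_push_id:
            # Push ID larger than the client has advertised.
            raise ConnectionError(H3_ID_ERROR)
        seen = self.promises.setdefault(push_id, tuple(fields))
        if seen != tuple(fields):
            # Duplicate promise with a mismatched request header section.
            raise ConnectionError(H3_GENERAL_PROTOCOL_ERROR)
```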
-Allowing duplicate references to the same Push ID is primarily to reduce -duplication caused by concurrent requests. A server SHOULD avoid reusing a Push -ID over a long period. Clients are likely to consume server push responses and -not retain them for reuse over time. Clients that see a PUSH_PROMISE frame that -uses a Push ID that they have already consumed and discarded are forced to -ignore the promise.¶
-If a PUSH_PROMISE frame is received on the control stream, the client MUST -respond with a connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
-A client MUST NOT send a PUSH_PROMISE frame. A server MUST treat the receipt of -a PUSH_PROMISE frame as a connection error of type H3_FRAME_UNEXPECTED; see -Section 8.¶
-See Section 4.4 for a description of the overall server push mechanism.¶
-The GOAWAY frame (type=0x7) is used to initiate graceful shutdown of an HTTP/3 -connection by either endpoint. GOAWAY allows an endpoint to stop accepting new -requests or pushes while still finishing processing of previously received -requests and pushes. This enables administrative actions, like server -maintenance. GOAWAY by itself does not close a connection.¶
-The GOAWAY frame is always sent on the control stream. In the server to client -direction, it carries a QUIC Stream ID for a client-initiated bidirectional -stream encoded as a variable-length integer. A client MUST treat receipt of a -GOAWAY frame containing a Stream ID of any other type as a connection error of -type H3_ID_ERROR.¶
-In the client to server direction, the GOAWAY frame carries a Push ID encoded as -a variable-length integer.¶
-The GOAWAY frame applies to the entire connection, not a specific stream. A -client MUST treat a GOAWAY frame on a stream other than the control stream as a -connection error of type H3_FRAME_UNEXPECTED; see Section 8.¶
-See Section 5.2 for more information on the use of the GOAWAY frame.¶
-The MAX_PUSH_ID frame (type=0xd) is used by clients to control the number of -server pushes that the server can initiate. This sets the maximum value for a -Push ID that the server can use in PUSH_PROMISE and CANCEL_PUSH frames. -Consequently, this also limits the number of push streams that the server can -initiate in addition to the limit maintained by the QUIC transport.¶
-The MAX_PUSH_ID frame is always sent on the control stream. Receipt of a -MAX_PUSH_ID frame on any other stream MUST be treated as a connection error of -type H3_FRAME_UNEXPECTED.¶
-A server MUST NOT send a MAX_PUSH_ID frame. A client MUST treat the receipt of -a MAX_PUSH_ID frame as a connection error of type H3_FRAME_UNEXPECTED.¶
-The maximum Push ID is unset when an HTTP/3 connection is created, meaning that -a server cannot push until it receives a MAX_PUSH_ID frame. A client that -wishes to manage the number of promised server pushes can increase the maximum -Push ID by sending MAX_PUSH_ID frames as the server fulfills or cancels server -pushes.¶
-The MAX_PUSH_ID frame carries a single variable-length integer that identifies -the maximum value for a Push ID that the server can use; see Section 4.4. A -MAX_PUSH_ID frame cannot reduce the maximum Push ID; receipt of a MAX_PUSH_ID -frame that contains a smaller value than previously received MUST be treated as -a connection error of type H3_ID_ERROR.¶
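A server's handling of these rules (maximum unset at connection start, only ever growing, smaller values rejected) can be sketched as below. The class is a hypothetical helper, not part of the draft:

```python
# Illustrative server-side tracking of the maximum Push ID.

H3_ID_ERROR = 0x0108


class PushIdAllocator:
    def __init__(self):
        self.max_push_id = None   # unset: the server cannot push yet
        self.next_push_id = 0     # next Push ID the server would use

    def on_max_push_id(self, value: int) -> None:
        if self.max_push_id is not None and value < self.max_push_id:
            # A MAX_PUSH_ID frame cannot reduce the maximum Push ID.
            raise ConnectionError(H3_ID_ERROR)
        self.max_push_id = value

    def can_push(self) -> bool:
        return (self.max_push_id is not None
                and self.next_push_id <= self.max_push_id)
```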
-Frame types of the format 0x1f * N + 0x21 for non-negative integer values of N are reserved to exercise the requirement that unknown types be ignored (Section 9). These frames have no semantics, and MAY be sent on any stream where frames are allowed to be sent. This enables their use for application-layer padding. Endpoints MUST NOT consider these frames to have any meaning upon receipt.¶
-The payload and length of the frames are selected in any manner the implementation chooses.¶
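The codepoint pattern is trivial to generate and recognize; the function names below are illustrative:

```python
# Sketch of the reserved codepoint pattern 0x1f * N + 0x21, usable both to
# pick "grease" frame types to send and to recognize them on receipt.

def reserved_type(n: int) -> int:
    """The N-th reserved codepoint."""
    return 0x1f * n + 0x21


def is_reserved_type(t: int) -> bool:
    """True if t is of the form 0x1f * N + 0x21 for some N >= 0."""
    return t >= 0x21 and (t - 0x21) % 0x1f == 0
```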
-Frame types that were used in HTTP/2 where there is no corresponding HTTP/3 -frame have also been reserved (Section 11.2.1). These frame types MUST NOT be -sent, and their receipt MUST be treated as a connection error of type -H3_FRAME_UNEXPECTED.¶
-When a stream cannot be completed successfully, QUIC allows the application to -abruptly terminate (reset) that stream and communicate a reason; see Section 2.4 -of [QUIC-TRANSPORT]. This is referred to as a "stream error." An HTTP/3 -implementation can decide to close a QUIC stream and communicate the type of -error. Wire encodings of error codes are defined in Section 8.1. -Stream errors are distinct from HTTP status codes which indicate error -conditions. Stream errors indicate that the sender did not transfer or consume -the full request or response, while HTTP status codes indicate the result of a -request that was successfully received.¶
-If an entire connection needs to be terminated, QUIC similarly provides -mechanisms to communicate a reason; see Section 5.3 of [QUIC-TRANSPORT]. This -is referred to as a "connection error." Similar to stream errors, an HTTP/3 -implementation can terminate a QUIC connection and communicate the reason using -an error code from Section 8.1.¶
-Although the reasons for closing streams and connections are called "errors," -these actions do not necessarily indicate a problem with the connection or -either implementation. For example, a stream can be reset if the requested -resource is no longer needed.¶
-An endpoint MAY choose to treat a stream error as a connection error under -certain circumstances, closing the entire connection in response to a condition -on a single stream. Implementations need to consider the impact on outstanding -requests before making this choice.¶
-Because new error codes can be defined without negotiation (see Section 9), -use of an error code in an unexpected context or receipt of an unknown error -code MUST be treated as equivalent to H3_NO_ERROR. However, closing a stream -can have other effects regardless of the error code; for example, see -Section 4.1.¶
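The "unknown is H3_NO_ERROR" rule amounts to a one-line normalization step on receipt. A sketch, assuming the known-code set from the definitions in this document (0x0100 through 0x0110, with 0x010e unassigned here):

```python
# Illustrative normalization of received error codes.

H3_NO_ERROR = 0x0100
# Codes defined in this document; 0x010e is not assigned in its table.
KNOWN_ERROR_CODES = set(range(0x0100, 0x0111)) - {0x010e}


def effective_error_code(code: int) -> int:
    """Unknown codes (including greased ones) behave like H3_NO_ERROR."""
    return code if code in KNOWN_ERROR_CODES else H3_NO_ERROR
```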
-The following error codes are defined for use when abruptly terminating streams, -aborting reading of streams, or immediately closing HTTP/3 connections.¶
-H3_NO_ERROR (0x0100): No error. This is used when the connection or stream needs to be closed, but there is no error to signal.¶
-H3_GENERAL_PROTOCOL_ERROR (0x0101): Peer violated protocol requirements in a way that does not match a more specific error code, or endpoint declines to use the more specific error code.¶
-H3_INTERNAL_ERROR (0x0102): An internal error has occurred in the HTTP stack.¶
-H3_STREAM_CREATION_ERROR (0x0103): The endpoint detected that its peer created a stream that it will not accept.¶
-H3_CLOSED_CRITICAL_STREAM (0x0104): A stream required by the HTTP/3 connection was closed or reset.¶
-H3_FRAME_UNEXPECTED (0x0105): A frame was received that was not permitted in the current state or on the current stream.¶
-H3_FRAME_ERROR (0x0106): A frame that fails to satisfy layout requirements or with an invalid size was received.¶
-H3_EXCESSIVE_LOAD (0x0107): The endpoint detected that its peer is exhibiting a behavior that might be generating excessive load.¶
-H3_ID_ERROR (0x0108): A Stream ID or Push ID was used incorrectly, such as exceeding a limit, reducing a limit, or being reused.¶
-H3_SETTINGS_ERROR (0x0109): An endpoint detected an error in the payload of a SETTINGS frame.¶
-H3_MISSING_SETTINGS (0x010a): No SETTINGS frame was received at the beginning of the control stream.¶
-H3_REQUEST_REJECTED (0x010b): A server rejected a request without performing any application processing.¶
-H3_REQUEST_CANCELLED (0x010c): The request or its response (including pushed response) is cancelled.¶
-H3_REQUEST_INCOMPLETE (0x010d): The client's stream terminated without containing a fully-formed request.¶
-H3_CONNECT_ERROR (0x010f): The TCP connection established in response to a CONNECT request was reset or abnormally closed.¶
-H3_VERSION_FALLBACK (0x0110): The requested operation cannot be served over HTTP/3. The peer should retry over HTTP/1.1.¶
-Error codes of the format 0x1f * N + 0x21 for non-negative integer values of N are reserved to exercise the requirement that unknown error codes be treated as equivalent to H3_NO_ERROR (Section 9). Implementations SHOULD select an error code from this space with some probability when they would have sent H3_NO_ERROR.¶
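One way to implement that SHOULD is shown below; the probability and the bound on N are arbitrary illustrative choices:

```python
import random

# Sketch of greasing: with some probability, send a reserved code of the
# form 0x1f * N + 0x21 instead of H3_NO_ERROR.

H3_NO_ERROR = 0x0100


def close_code(rng=random, grease_probability=0.5) -> int:
    if rng.random() < grease_probability:
        n = rng.randrange(0, 1 << 8)   # small N keeps the varint encoding short
        return 0x1f * n + 0x21
    return H3_NO_ERROR
```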
HTTP/3 permits extension of the protocol. Within the limitations described in -this section, protocol extensions can be used to provide additional services or -alter any aspect of the protocol. Extensions are effective only within the -scope of a single HTTP/3 connection.¶
-This applies to the protocol elements defined in this document. This does not -affect the existing options for extending HTTP, such as defining new methods, -status codes, or fields.¶
-Extensions are permitted to use new frame types (Section 7.2), new settings -(Section 7.2.4.1), new error codes (Section 8), or new unidirectional -stream types (Section 6.2). Registries are established for -managing these extension points: frame types (Section 11.2.1), settings -(Section 11.2.2), error codes (Section 11.2.3), and stream types -(Section 11.2.4).¶
-Implementations MUST ignore unknown or unsupported values in all extensible -protocol elements. Implementations MUST discard frames and unidirectional -streams that have unknown or unsupported types. This means that any of these -extension points can be safely used by extensions without prior arrangement or -negotiation. However, where a known frame type is required to be in a specific -location, such as the SETTINGS frame as the first frame of the control stream -(see Section 6.2.1), an unknown frame type does not satisfy that -requirement and SHOULD be treated as an error.¶
-Extensions that could change the semantics of existing protocol components MUST -be negotiated before being used. For example, an extension that changes the -layout of the HEADERS frame cannot be used until the peer has given a positive -signal that this is acceptable. Coordinating when such a revised layout comes -into effect could prove complex. As such, allocating new identifiers for -new definitions of existing protocol elements is likely to be more effective.¶
-This document does not mandate a specific method for negotiating the use of an -extension but notes that a setting (Section 7.2.4.1) could be used for -that purpose. If both peers set a value that indicates willingness to use the -extension, then the extension can be used. If a setting is used for extension -negotiation, the default value MUST be defined in such a fashion that the -extension is disabled if the setting is omitted.¶
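As an illustration of this negotiation pattern, consider a hypothetical extension setting (the name and codepoint below are invented for the example, not defined anywhere): with a default of 0 ("disabled"), the extension is used only when both peers explicitly advertise it.

```python
# Sketch of negotiating an extension via a setting with a default-disabled
# value.  SETTINGS_ENABLE_FROBNICATION and its codepoint are hypothetical.

SETTINGS_ENABLE_FROBNICATION = 0x21   # hypothetical extension setting


def extension_enabled(local: dict, peer: dict) -> bool:
    # An omitted setting takes its default (0), so the extension stays
    # disabled unless both endpoints opted in.
    return (local.get(SETTINGS_ENABLE_FROBNICATION, 0) == 1
            and peer.get(SETTINGS_ENABLE_FROBNICATION, 0) == 1)
```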
-The security considerations of HTTP/3 should be comparable to those of HTTP/2 -with TLS. However, many of the considerations from Section 10 of [HTTP2] -apply to [QUIC-TRANSPORT] and are discussed in that document.¶
- -The use of ALPN in the TLS and QUIC handshakes establishes the target -application protocol before application-layer bytes are processed. Because all -QUIC packets are encrypted, it is difficult for an attacker to control the -plaintext bytes of an HTTP/3 connection, which could be used in a cross-protocol -attack on a plaintext protocol.¶
-The HTTP/3 field encoding allows the expression of names that are not valid -field names in the syntax used by HTTP (Section 5.4.3 of [SEMANTICS]). -Requests or responses containing invalid field names MUST be treated as -malformed (Section 4.1.3). An intermediary therefore cannot translate an HTTP/3 -request or response containing an invalid field name into an HTTP/1.1 message.¶
-Similarly, HTTP/3 can transport field values that are not valid. While most -values that can be encoded will not alter field parsing, carriage return (CR, -ASCII 0xd), line feed (LF, ASCII 0xa), and the zero character (NUL, ASCII 0x0) -might be exploited by an attacker if they are translated verbatim. Any request -or response that contains a character not permitted in a field value MUST be -treated as malformed (Section 4.1.3). Valid characters are defined by the -"field-content" ABNF rule in Section 5.4.4 of [SEMANTICS].¶
-Pushed responses do not have an explicit request from the client; the request is -provided by the server in the PUSH_PROMISE frame.¶
-Caching responses that are pushed is possible based on the guidance provided by -the origin server in the Cache-Control header field. However, this can cause -issues if a single server hosts more than one tenant. For example, a server -might offer multiple users each a small portion of its URI space.¶
-Where multiple tenants share space on the same server, that server MUST ensure -that tenants are not able to push representations of resources that they do not -have authority over. Failure to enforce this would allow a tenant to provide a -representation that would be served out of cache, overriding the actual -representation that the authoritative tenant provides.¶
-Clients are required to reject pushed responses for which an origin server is -not authoritative; see Section 4.4.¶
-An HTTP/3 connection can demand a greater commitment of resources to operate -than an HTTP/1.1 or HTTP/2 connection. The use of field compression and flow -control depend on a commitment of resources for storing a greater amount of -state. Settings for these features ensure that memory commitments for these -features are strictly bounded.¶
-The number of PUSH_PROMISE frames is constrained in a similar fashion. A client -that accepts server push SHOULD limit the number of Push IDs it issues at a -time.¶
-Processing capacity cannot be guarded as effectively as state capacity.¶
-The ability to send undefined protocol elements that the peer is required to -ignore can be abused to cause a peer to expend additional processing time. This -might be done by setting multiple undefined SETTINGS parameters, unknown frame -types, or unknown stream types. Note, however, that some uses are entirely -legitimate, such as optional-to-understand extensions and padding to increase -resistance to traffic analysis.¶
-Compression of field sections also offers some opportunities to waste processing -resources; see Section 7 of [QPACK] for more details on potential abuses.¶
-All these features -- i.e., server push, unknown protocol elements, field -compression -- have legitimate uses. These features become a burden only when -they are used unnecessarily or to excess.¶
-An endpoint that does not monitor this behavior exposes itself to a risk of -denial-of-service attack. Implementations SHOULD track the use of these -features and set limits on their use. An endpoint MAY treat activity that is -suspicious as a connection error of type H3_EXCESSIVE_LOAD (Section 8), but -false positives will result in disrupting valid connections and requests.¶
-A large field section (Section 4.1) can cause an implementation to -commit a large amount of state. Header fields that are critical for routing can -appear toward the end of a header field section, which prevents streaming of the -header field section to its ultimate destination. This ordering and other -reasons, such as ensuring cache correctness, mean that an endpoint likely needs -to buffer the entire header field section. Since there is no hard limit to the -size of a field section, some endpoints could be forced to commit a large amount -of available memory for header fields.¶
-An endpoint can use the SETTINGS_MAX_FIELD_SECTION_SIZE -(Section 4.1.1.3) setting to advise peers of limits that might apply -on the size of field sections. This setting is only advisory, so endpoints MAY -choose to send field sections that exceed this limit and risk having the request -or response being treated as malformed. This setting is specific to an HTTP/3 -connection, so any request or response could encounter a hop with a lower, -unknown limit. An intermediary can attempt to avoid this problem by passing on -values presented by different peers, but they are not obligated to do so.¶
-A server that receives a larger field section than it is willing to handle can -send an HTTP 431 (Request Header Fields Too Large) status code ([RFC6585]). -A client can discard responses that it cannot process.¶
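The limit check itself can be sketched as below, using the field section size accounting from Section 4.1.1.3 (name length plus value length plus 32 bytes of per-field overhead); the helper names are illustrative:

```python
# Illustrative enforcement of an advertised SETTINGS_MAX_FIELD_SECTION_SIZE.
# Fields are (name, value) byte-string pairs.

def field_section_size(fields) -> int:
    # Uncompressed size: name + value + 32 bytes of overhead per field.
    return sum(len(name) + len(value) + 32 for name, value in fields)


def exceeds_limit(fields, max_field_section_size) -> bool:
    if max_field_section_size is None:   # default: unlimited
        return False
    return field_section_size(fields) > max_field_section_size
```

A server finding `exceeds_limit(...)` true for a request could respond with 431 (Request Header Fields Too Large) rather than processing it.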
-The CONNECT method can be used to create disproportionate load on a proxy, -since stream creation is relatively inexpensive when compared to the creation -and maintenance of a TCP connection. A proxy might also maintain some resources -for a TCP connection beyond the closing of the stream that carries the CONNECT -request, since the outgoing TCP connection remains in the TIME_WAIT state. -Therefore, a proxy cannot rely on QUIC stream limits alone to control the -resources consumed by CONNECT requests.¶
-Compression can allow an attacker to recover secret data when it is compressed -in the same context as data under attacker control. HTTP/3 enables compression -of fields (Section 4.1.1); the following concerns also apply to the use -of HTTP compressed content-codings; see Section 7.5.1 of [SEMANTICS].¶
-There are demonstrable attacks on compression that exploit the characteristics -of the web (e.g., [BREACH]). The attacker induces multiple requests -containing varying plaintext, observing the length of the resulting ciphertext -in each, which reveals a shorter length when a guess about the secret is -correct.¶
-Implementations communicating on a secure channel MUST NOT compress content that -includes both confidential and attacker-controlled data unless separate -compression contexts are used for each source of data. Compression MUST NOT be -used if the source of data cannot be reliably determined.¶
-Further considerations regarding the compression of fields sections are -described in [QPACK].¶
-Padding can be used to obscure the exact size of frame content and is provided -to mitigate specific attacks within HTTP, for example, attacks where compressed -content includes both attacker-controlled plaintext and secret data (e.g., -[BREACH]).¶
-Where HTTP/2 employs PADDING frames and Padding fields in other frames to make a -connection more resistant to traffic analysis, HTTP/3 can either rely on -transport-layer padding or employ the reserved frame and stream types discussed -in Section 7.2.8 and Section 6.2.3. These methods of padding produce -different results in terms of the granularity of padding, how padding is -arranged in relation to the information that is being protected, whether padding -is applied in the case of packet loss, and how an implementation might control -padding.¶
-Reserved stream types can be used to give the appearance of sending traffic even -when the connection is idle. Because HTTP traffic often occurs in bursts, -apparent traffic can be used to obscure the timing or duration of such bursts, -even to the point of appearing to send a constant stream of data. However, as -such traffic is still flow controlled by the receiver, a failure to promptly -drain such streams and provide additional flow control credit can limit the -sender's ability to send real traffic.¶
-To mitigate attacks that rely on compression, disabling or limiting compression -might be preferable to padding as a countermeasure.¶
-Use of padding can result in less protection than might seem immediately -obvious. Redundant padding could even be counterproductive. At best, padding -only makes it more difficult for an attacker to infer length information by -increasing the number of frames an attacker has to observe. Incorrectly -implemented padding schemes can be easily defeated. In particular, randomized -padding with a predictable distribution provides very little protection; -similarly, padding payloads to a fixed size exposes information as payload sizes -cross the fixed-sized boundary, which could be possible if an attacker can -control plaintext.¶
-Several protocol elements contain nested length elements, typically in the form -of frames with an explicit length containing variable-length integers. This -could pose a security risk to an incautious implementer. An implementation MUST -ensure that the length of a frame exactly matches the length of the fields it -contains.¶
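A minimal sketch of that check for a frame carrying a single variable-length integer, using the QUIC varint encoding (the two high bits of the first byte give the encoded length); the parser names are illustrative:

```python
# Illustrative "length must match exactly" validation with QUIC varints.

H3_FRAME_ERROR = 0x0106


def read_varint(buf: bytes, pos: int):
    """Decode a QUIC variable-length integer starting at pos."""
    first = buf[pos]
    length = 1 << (first >> 6)          # 1, 2, 4, or 8 bytes
    if pos + length > len(buf):
        raise ConnectionError(H3_FRAME_ERROR)   # truncated field
    value = first & 0x3f
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_cancel_push_payload(payload: bytes) -> int:
    """CANCEL_PUSH carries exactly one varint (a Push ID)."""
    push_id, end = read_varint(payload, 0)
    if end != len(payload):
        raise ConnectionError(H3_FRAME_ERROR)   # trailing bytes in the frame
    return push_id
```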
-The use of 0-RTT with HTTP/3 creates an exposure to replay attack. The -anti-replay mitigations in [HTTP-REPLAY] MUST be applied when using -HTTP/3 with 0-RTT.¶
-Certain HTTP implementations use the client address for logging or -access-control purposes. Since a QUIC client's address might change during a -connection (and future versions might support simultaneous use of multiple -addresses), such implementations will need to either actively retrieve the -client's current address or addresses when they are relevant or explicitly -accept that the original address might change.¶
-Several characteristics of HTTP/3 provide an observer an opportunity to -correlate actions of a single client or server over time. These include the -value of settings, the timing of reactions to stimulus, and the handling of any -features that are controlled by settings.¶
-As far as these create observable differences in behavior, they could be used as -a basis for fingerprinting a specific client.¶
-HTTP/3's preference for using a single QUIC connection allows correlation of a -user's activity on a site. Reusing connections for different origins allows -for correlation of activity across those origins.¶
-Several features of QUIC solicit immediate responses and can be used by an -endpoint to measure latency to their peer; this might have privacy implications -in certain scenarios.¶
-This document registers a new ALPN protocol ID (Section 11.1) and creates new -registries that manage the assignment of codepoints in HTTP/3.¶
-This document creates a new registration for the identification of -HTTP/3 in the "Application Layer Protocol Negotiation (ALPN) -Protocol IDs" registry established in [RFC7301].¶
-The "h3" string identifies HTTP/3:¶
- -New registries created in this document operate under the QUIC registration policy documented in Section 22.1 of [QUIC-TRANSPORT]. These registries all include the common set of fields listed in Section 22.1.1 of [QUIC-TRANSPORT]. These registries are collected under a "Hypertext Transfer Protocol version 3 (HTTP/3) Parameters" heading.¶
-The initial allocations in these registries created in this document are all -assigned permanent status and list a change controller of the IETF and a contact -of the HTTP working group (ietf-http-wg@w3.org).¶
-This document establishes a registry for HTTP/3 frame type codes. The "HTTP/3 -Frame Type" registry governs a 62-bit space. This registry follows the QUIC -registry policy; see Section 11.2. Permanent registrations in this registry -are assigned using the Specification Required policy ([RFC8126]), except for -values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned -using Standards Action or IESG Approval as defined in Section 4.9 and 4.10 of -[RFC8126].¶
-While this registry is separate from the "HTTP/2 Frame Type" registry defined in -[HTTP2], it is preferable that the assignments parallel each other where the -code spaces overlap. If an entry is present in only one registry, every effort -SHOULD be made to avoid assigning the corresponding value to an unrelated -operation.¶
-In addition to common fields as described in Section 11.2, permanent -registrations in this registry MUST include the following field:¶
-Frame Type: A name or label for the frame type.¶
-Specifications of frame types MUST include a description of the frame layout and -its semantics, including any parts of the frame that are conditionally present.¶
-The entries in Table 2 are registered by this document.¶
-Frame Type | Value | Specification
----|---|---
-DATA | 0x0 | Section 7.2.1
-HEADERS | 0x1 | Section 7.2.2
-Reserved | 0x2 | N/A
-CANCEL_PUSH | 0x3 | Section 7.2.3
-SETTINGS | 0x4 | Section 7.2.4
-PUSH_PROMISE | 0x5 | Section 7.2.5
-Reserved | 0x6 | N/A
-GOAWAY | 0x7 | Section 7.2.6
-Reserved | 0x8 | N/A
-Reserved | 0x9 | N/A
-MAX_PUSH_ID | 0xd | Section 7.2.7
-Additionally, each code of the format 0x1f * N + 0x21 for non-negative integer values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by IANA.¶
This document establishes a registry for HTTP/3 settings. The "HTTP/3 Settings" -registry governs a 62-bit space. This registry follows the QUIC registry -policy; see Section 11.2. Permanent registrations in this registry are -assigned using the Specification Required policy ([RFC8126]), except for -values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned -using Standards Action or IESG Approval as defined in Section 4.9 and 4.10 of -[RFC8126].¶
-While this registry is separate from the "HTTP/2 Settings" registry defined in -[HTTP2], it is preferable that the assignments parallel each other. If an -entry is present in only one registry, every effort SHOULD be made to avoid -assigning the corresponding value to an unrelated operation.¶
-In addition to common fields as described in Section 11.2, permanent -registrations in this registry MUST include the following fields:¶
-Setting Name: A symbolic name for the setting. Specifying a setting name is optional.¶
-Default: The value of the setting unless otherwise indicated. A default SHOULD be the most restrictive possible value.¶
-The entries in Table 3 are registered by this document.¶
-Setting Name | Value | Specification | Default
----|---|---|---
-Reserved | 0x2 | N/A | N/A
-Reserved | 0x3 | N/A | N/A
-Reserved | 0x4 | N/A | N/A
-Reserved | 0x5 | N/A | N/A
-MAX_FIELD_SECTION_SIZE | 0x6 | Section 7.2.4.1 | Unlimited
-Additionally, each code of the format 0x1f * N + 0x21 for non-negative integer values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by IANA.¶
This document establishes a registry for HTTP/3 error codes. The "HTTP/3 Error -Code" registry manages a 62-bit space. This registry follows the QUIC registry -policy; see Section 11.2. Permanent registrations in this registry are -assigned using the Specification Required policy ([RFC8126]), except for -values between 0x00 and 0x3f (in hexadecimal; inclusive), which are assigned -using Standards Action or IESG Approval as defined in Section 4.9 and 4.10 of -[RFC8126].¶
-Registrations for error codes are required to include a description of the -error code. An expert reviewer is advised to examine new registrations for -possible duplication with existing error codes. Use of existing -registrations is to be encouraged, but not mandated. Use of values that -are registered in the "HTTP/2 Error Code" registry is discouraged.¶
-In addition to common fields as described in Section 11.2, this registry includes two additional fields. Permanent registrations in this registry MUST include the following fields:¶
-Name: A name for the error code.¶
-Description: A brief description of the error code semantics.¶
-The entries in Table 4 are registered by this document. These -error codes were selected from the range that operates on a Specification -Required policy to avoid collisions with HTTP/2 error codes.¶
-Name | Value | Description | Specification
----|---|---|---
-H3_NO_ERROR | 0x0100 | No error | Section 8.1
-H3_GENERAL_PROTOCOL_ERROR | 0x0101 | General protocol error | Section 8.1
-H3_INTERNAL_ERROR | 0x0102 | Internal error | Section 8.1
-H3_STREAM_CREATION_ERROR | 0x0103 | Stream creation error | Section 8.1
-H3_CLOSED_CRITICAL_STREAM | 0x0104 | Critical stream was closed | Section 8.1
-H3_FRAME_UNEXPECTED | 0x0105 | Frame not permitted in the current state | Section 8.1
-H3_FRAME_ERROR | 0x0106 | Frame violated layout or size rules | Section 8.1
-H3_EXCESSIVE_LOAD | 0x0107 | Peer generating excessive load | Section 8.1
-H3_ID_ERROR | 0x0108 | An identifier was used incorrectly | Section 8.1
-H3_SETTINGS_ERROR | 0x0109 | SETTINGS frame contained invalid values | Section 8.1
-H3_MISSING_SETTINGS | 0x010a | No SETTINGS frame received | Section 8.1
-H3_REQUEST_REJECTED | 0x010b | Request not processed | Section 8.1
-H3_REQUEST_CANCELLED | 0x010c | Data no longer needed | Section 8.1
-H3_REQUEST_INCOMPLETE | 0x010d | Stream terminated early | Section 8.1
-H3_CONNECT_ERROR | 0x010f | TCP reset or error on CONNECT request | Section 8.1
-H3_VERSION_FALLBACK | 0x0110 | Retry over HTTP/1.1 | Section 8.1
-Additionally, each code of the format 0x1f * N + 0x21 for non-negative integer values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be assigned by IANA.¶
This document establishes a registry for HTTP/3 unidirectional stream types. The -"HTTP/3 Stream Type" registry governs a 62-bit space. This registry follows the -QUIC registry policy; see Section 11.2. Permanent registrations in this -registry are assigned using the Specification Required policy ([RFC8126]), -except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are -assigned using Standards Action or IESG Approval as defined in Section 4.9 and -4.10 of [RFC8126].¶
-In addition to common fields as described in Section 11.2, permanent -registrations in this registry MUST include the following fields:¶
-Stream Type: A name or label for the stream type.¶
-Sender: Which endpoint on an HTTP/3 connection may initiate a stream of this type. Values are "Client", "Server", or "Both".¶
-Specifications for permanent registrations MUST include a description of the -stream type, including the layout and semantics of the stream contents.¶
-The entries in the following table are registered by this document.¶
-Stream Type | Value | Specification | Sender
----|---|---|---
-Control Stream | 0x00 | Section 6.2.1 | Both
-Push Stream | 0x01 | Section 4.4 | Server
Additionally, each code of the format 0x1f * N + 0x21
for non-negative integer
-values of N (that is, 0x21, 0x40, ..., through 0x3ffffffffffffffe) MUST NOT be
-assigned by IANA.¶
HTTP/3 is strongly informed by HTTP/2, and bears many similarities. This -section describes the approach taken to design HTTP/3, points out important -differences from HTTP/2, and describes how to map HTTP/2 extensions into HTTP/3.¶
-HTTP/3 begins from the premise that similarity to HTTP/2 is preferable, but not -a hard requirement. HTTP/3 departs from HTTP/2 where QUIC differs from TCP, -either to take advantage of QUIC features (like streams) or to accommodate -important shortcomings (such as a lack of total ordering). These differences -make HTTP/3 similar to HTTP/2 in key aspects, such as the relationship of -requests and responses to streams. However, the details of the HTTP/3 design are -substantially different from HTTP/2.¶
-These departures are noted in this section.¶
-HTTP/3 permits use of a larger number of streams (2^62-1) than HTTP/2. The same -considerations about exhaustion of stream identifier space apply, though the -space is significantly larger such that it is likely that other limits in QUIC -are reached first, such as the limit on the connection flow control window.¶
-In contrast to HTTP/2, stream concurrency in HTTP/3 is managed by QUIC. QUIC -considers a stream closed when all data has been received and sent data has been -acknowledged by the peer. HTTP/2 considers a stream closed when the frame -containing the END_STREAM bit has been committed to the transport. As a result, -the stream for an equivalent exchange could remain "active" for a longer period -of time. HTTP/3 servers might choose to permit a larger number of concurrent -client-initiated bidirectional streams to achieve equivalent concurrency to -HTTP/2, depending on the expected usage patterns.¶
-Due to the presence of other unidirectional stream types, HTTP/3 does not rely -exclusively on the number of concurrent unidirectional streams to control the -number of concurrent in-flight pushes. Instead, HTTP/3 clients use the -MAX_PUSH_ID frame to control the number of pushes received from an HTTP/3 -server.¶
-Many framing concepts from HTTP/2 can be elided on QUIC, because the transport -deals with them. Because frames are already on a stream, they can omit the -stream number. Because frames do not block multiplexing (QUIC's multiplexing -occurs below this layer), the support for variable-maximum-length packets can be -removed. Because stream termination is handled by QUIC, an END_STREAM flag is -not required. This permits the removal of the Flags field from the generic -frame layout.¶
-Frame payloads are largely drawn from [HTTP2]. However, QUIC includes many -features (e.g., flow control) that are also present in HTTP/2. In these cases, -the HTTP mapping does not re-implement them. As a result, several HTTP/2 frame -types are not required in HTTP/3. Where an HTTP/2-defined frame is no longer -used, the frame ID has been reserved in order to maximize portability between -HTTP/2 and HTTP/3 implementations. However, even equivalent frames between the -two mappings are not identical.¶
-Many of the differences arise from the fact that HTTP/2 provides an absolute -ordering between frames across all streams, while QUIC provides this guarantee -on each stream only. As a result, if a frame type makes assumptions that frames -from different streams will still be received in the order sent, HTTP/3 will -break them.¶
-Some examples of feature adaptations are described below, as well as general -guidance to extension frame implementors converting an HTTP/2 extension to -HTTP/3.¶
-HTTP/2 specifies priority assignments in PRIORITY frames and (optionally) in -HEADERS frames. HTTP/3 does not provide a means of signaling priority.¶
-Note that while there is no explicit signaling for priority, this does not mean -that prioritization is not important for achieving good performance.¶
-HPACK was designed with the assumption of in-order delivery. A sequence of -encoded field sections must arrive (and be decoded) at an endpoint in the same -order in which they were encoded. This ensures that the dynamic state at the two -endpoints remains in sync.¶
-Because this total ordering is not provided by QUIC, HTTP/3 uses a modified -version of HPACK, called QPACK. QPACK uses a single unidirectional stream to -make all modifications to the dynamic table, ensuring a total order of updates. -All frames that contain encoded fields merely reference the table state at a -given time without modifying it.¶
- -HTTP/2 specifies a stream flow control mechanism. Although all HTTP/2 frames are -delivered on streams, only the DATA frame payload is subject to flow control. -QUIC provides flow control for stream data and all HTTP/3 frame types defined in -this document are sent on streams. Therefore, all frame headers and payload are -subject to flow control.¶
-Frame type definitions in HTTP/3 often use the QUIC variable-length integer -encoding. In particular, Stream IDs use this encoding, which allows for a -larger range of possible values than the encoding used in HTTP/2. Some frames -in HTTP/3 use an identifier rather than a Stream ID (e.g., Push -IDs). Redefinition of the encoding of extension frame types might be necessary -if the encoding includes a Stream ID.¶
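The QUIC variable-length integer encoding referred to here uses the two most significant bits of the first byte to signal a 1-, 2-, 4-, or 8-byte big-endian encoding, carrying 6, 14, 30, or 62 bits of value. A minimal Python sketch of this encoding (the function names are illustrative):

```python
def encode_varint(v: int) -> bytes:
    """Encode v as a QUIC variable-length integer (big-endian)."""
    if v < 0x40:
        return v.to_bytes(1, "big")                        # 00xxxxxx, 6-bit value
    if v < 0x4000:
        return (v | 0x4000).to_bytes(2, "big")             # 01..., 14-bit value
    if v < 0x40000000:
        return (v | 0x80000000).to_bytes(4, "big")         # 10..., 30-bit value
    if v < 0x4000000000000000:
        return (v | 0xC000000000000000).to_bytes(8, "big") # 11..., 62-bit value
    raise ValueError("value does not fit in a QUIC varint")

def decode_varint(buf: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of buf."""
    length = 1 << (buf[0] >> 6)                  # top two bits select the length
    value = int.from_bytes(buf[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```

For example, 15293 encodes to the two bytes 0x7b 0xbd, while values up to 63 fit in a single byte.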
-Because the Flags field is not present in generic HTTP/3 frames, those frames -that depend on the presence of flags need to allocate space for flags as part -of their frame payload.¶
-Other than these issues, HTTP/2 extensions that define new frame types are -typically portable to QUIC simply by replacing Stream 0 in HTTP/2 with a control -stream in HTTP/3. HTTP/3 extensions will not assume ordering, but would not be -harmed by ordering, and would be portable to HTTP/2 in the same manner.¶
-Padding is not defined in HTTP/3 frames. See Section 7.2.1.¶
-The PRIORITY region of HEADERS is not defined in HTTP/3 frames. Padding is not -defined in HTTP/3 frames. See Section 7.2.2.¶
-As described in Appendix A.2.1, HTTP/3 does not provide a means of -signaling priority.¶
-RST_STREAM frames do not exist in HTTP/3, since QUIC provides stream lifecycle -management. The same code point is used for the CANCEL_PUSH frame -(Section 7.2.3).¶
-SETTINGS frames are sent only at the beginning of the connection. See -Section 7.2.4 and Appendix A.3.¶
-The PUSH_PROMISE frame does not reference a stream; instead the push stream -references the PUSH_PROMISE frame using a Push ID. See -Section 7.2.5.¶
-PING frames do not exist in HTTP/3, as QUIC provides equivalent -functionality.¶
-GOAWAY does not contain an error code. In the client-to-server direction, -it carries a Push ID instead of a server-initiated stream ID. -See Section 7.2.6.¶
-WINDOW_UPDATE frames do not exist in HTTP/3, since QUIC provides flow control.¶
-CONTINUATION frames do not exist in HTTP/3; instead, larger -HEADERS/PUSH_PROMISE frames than HTTP/2 are permitted.¶
-Frame types defined by extensions to HTTP/2 need to be separately registered for -HTTP/3 if still applicable. The IDs of frames defined in [HTTP2] have been -reserved for simplicity. Note that the frame type space in HTTP/3 is -substantially larger (62 bits versus 8 bits), so many HTTP/3 frame types have no -equivalent HTTP/2 code points. See Section 11.2.1.¶
-An important difference from HTTP/2 is that settings are sent once, as the first -frame of the control stream, and thereafter cannot change. This eliminates many -corner cases around synchronization of changes.¶
-Some transport-level options that HTTP/2 specifies via the SETTINGS frame are -superseded by QUIC transport parameters in HTTP/3. The HTTP-level options that -are retained in HTTP/3 have the same value as in HTTP/2. The superseded -settings are reserved, and their receipt is an error. See -Section 7.2.4.1 for discussion of both the retained and reserved values.¶
-Below is a listing of how each HTTP/2 SETTINGS parameter is mapped:¶
-This is removed in favor of the MAX_PUSH_ID frame, which provides a more -granular control over server push. Specifying a setting with the identifier -0x2 (corresponding to the SETTINGS_ENABLE_PUSH parameter) in the HTTP/3 -SETTINGS frame is an error.¶
-QUIC controls the largest open Stream ID as part of its flow control logic. -Specifying a setting with the identifier 0x3 (corresponding to the -SETTINGS_MAX_CONCURRENT_STREAMS parameter) in the HTTP/3 SETTINGS frame is an -error.¶
-QUIC requires both stream and connection flow control window sizes to be -specified in the initial transport handshake. Specifying a setting with the -identifier 0x4 (corresponding to the SETTINGS_INITIAL_WINDOW_SIZE parameter) -in the HTTP/3 SETTINGS frame is an error.¶
-This setting has no equivalent in HTTP/3. Specifying a setting with the -identifier 0x5 (corresponding to the SETTINGS_MAX_FRAME_SIZE parameter) in the -HTTP/3 SETTINGS frame is an error.¶
-This setting identifier has been renamed SETTINGS_MAX_FIELD_SECTION_SIZE.¶
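The reserved identifiers listed above can be rejected mechanically when a SETTINGS frame is parsed. A minimal sketch of such a check; the function name and the use of a Python exception are assumptions (a real endpoint would instead close the connection with an error of type H3_SETTINGS_ERROR):

```python
# Reserved HTTP/2 setting identifiers whose receipt in HTTP/3 is an error.
RESERVED_H2_SETTINGS = {
    0x2: "SETTINGS_ENABLE_PUSH",
    0x3: "SETTINGS_MAX_CONCURRENT_STREAMS",
    0x4: "SETTINGS_INITIAL_WINDOW_SIZE",
    0x5: "SETTINGS_MAX_FRAME_SIZE",
}

def validate_h3_settings(settings: dict[int, int]) -> None:
    """Raise if a received SETTINGS frame carries a reserved HTTP/2 identifier."""
    for ident in settings:
        if ident in RESERVED_H2_SETTINGS:
            raise ValueError(
                f"H3_SETTINGS_ERROR: {RESERVED_H2_SETTINGS[ident]} "
                f"(0x{ident:x}) is reserved in HTTP/3"
            )
```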
-In HTTP/3, setting values are variable-length integers (6, 14, 30, or 62 bits -long) rather than fixed-length 32-bit fields as in HTTP/2. This will often -produce a shorter encoding, but can produce a longer encoding for settings that -use the full 32-bit space. Settings ported from HTTP/2 might choose to redefine -their value to limit it to 30 bits for more efficient encoding, or to make use -of the 62-bit space if more than 30 bits are required.¶
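The size trade-off described above can be made concrete by computing the encoded length of a setting value under the QUIC varint rules; a short illustrative helper:

```python
def varint_len(v: int) -> int:
    """Bytes needed to encode v as a QUIC variable-length integer."""
    for bits, nbytes in ((6, 1), (14, 2), (30, 4), (62, 8)):
        if v < (1 << bits):
            return nbytes
    raise ValueError("value does not fit in a QUIC varint")
```

A small value such as 37 takes 1 byte instead of HTTP/2's fixed 4, while a value using the full 32-bit space, such as 0xFFFFFFFF, requires the 8-byte encoding.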
-Settings need to be defined separately for HTTP/2 and HTTP/3. The IDs of -settings defined in [HTTP2] have been reserved for simplicity. Note that -the settings identifier space in HTTP/3 is substantially larger (62 bits versus -16 bits), so many HTTP/3 settings have no equivalent HTTP/2 code point. See -Section 11.2.2.¶
-As QUIC streams might arrive out of order, endpoints are advised not to wait for -the peer's settings to arrive before responding to other streams. See -Section 7.2.4.2.¶
-QUIC has the same concepts of "stream" and "connection" errors that HTTP/2 -provides. However, the differences between HTTP/2 and HTTP/3 mean that error -codes are not directly portable between versions.¶
-The HTTP/2 error codes defined in Section 7 of [HTTP2] logically map to -the HTTP/3 error codes as follows:¶
-H3_NO_ERROR in Section 8.1.¶
-This is mapped to H3_GENERAL_PROTOCOL_ERROR except in cases where more -specific error codes have been defined. Such cases include H3_FRAME_UNEXPECTED -and H3_CLOSED_CRITICAL_STREAM defined in Section 8.1.¶
-H3_INTERNAL_ERROR in Section 8.1.¶
-Not applicable, since QUIC handles flow control.¶
-Not applicable, since no acknowledgement of SETTINGS is defined.¶
-Not applicable, since QUIC handles stream management.¶
-H3_FRAME_ERROR error code defined in Section 8.1.¶
-H3_REQUEST_REJECTED (in Section 8.1) is used to indicate that a -request was not processed. Otherwise, not applicable because QUIC handles -stream management.¶
-H3_REQUEST_CANCELLED in Section 8.1.¶
-H3_CONNECT_ERROR in Section 8.1.¶
-H3_EXCESSIVE_LOAD in Section 8.1.¶
-Not applicable, since QUIC is assumed to provide sufficient security on all -connections.¶
-H3_VERSION_FALLBACK in Section 8.1.¶
-Error codes need to be defined for HTTP/2 and HTTP/3 separately. See -Section 11.2.3.¶
-An intermediary that converts between HTTP/2 and HTTP/3 may encounter error -conditions from either upstream. It is useful to communicate the occurrence of an -error downstream, but error codes largely reflect connection-local -problems that generally do not make sense to propagate.¶
-An intermediary that encounters an error from an upstream origin can indicate -this by sending an HTTP status code such as 502, which is suitable for a broad -class of errors.¶
-There are some rare cases where it is beneficial to propagate the error by -mapping it to the closest matching error type to the receiver. For example, an -intermediary that receives an HTTP/2 stream error of type REFUSED_STREAM from -the origin has a clear signal that the request was not processed and that the -request is safe to retry. Propagating this error condition to the client as an -HTTP/3 stream error of type H3_REQUEST_REJECTED allows the client to take the -action it deems most appropriate. In the reverse direction, the intermediary -might deem it beneficial to pass on client request cancellations that are -indicated by terminating a stream with H3_REQUEST_CANCELLED; see -Section 4.1.2.¶
-Conversion between errors is described in the logical mapping. The error codes -are defined in non-overlapping spaces in order to protect against accidental -conversion that could result in the use of inappropriate or unknown error codes -for the target version. An intermediary is permitted to promote stream errors to -connection errors, but it should be aware of the cost to the HTTP/3 connection -for what might be a temporary or intermittent error.¶
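The logical mapping above can be sketched as a lookup table. This is illustrative only; the names are the error codes from the mapping, and codes absent from the table have no HTTP/3 equivalent because QUIC subsumes the feature (flow control, stream management, transport security):

```python
# HTTP/2 stream error codes (Section 7 of [HTTP2]) -> HTTP/3 equivalents.
H2_TO_H3 = {
    0x0: "H3_NO_ERROR",
    0x1: "H3_GENERAL_PROTOCOL_ERROR",  # unless a more specific code applies
    0x2: "H3_INTERNAL_ERROR",
    0x6: "H3_FRAME_ERROR",             # FRAME_SIZE_ERROR
    0x7: "H3_REQUEST_REJECTED",        # REFUSED_STREAM
    0x8: "H3_REQUEST_CANCELLED",       # CANCEL
    0xa: "H3_CONNECT_ERROR",           # CONNECT_ERROR
    0xb: "H3_EXCESSIVE_LOAD",          # ENHANCE_YOUR_CALM
    0xd: "H3_VERSION_FALLBACK",        # HTTP_1_1_REQUIRED
}

def map_h2_stream_error(code: int):
    """Return the HTTP/3 error name, or None where QUIC makes the code moot."""
    return H2_TO_H3.get(code)
```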
-Editorial changes only.¶
-Editorial changes only.¶
-Further changes to error codes (#2662,#2551):¶
- -http-opportunistic
resource (RFC 8164) when scheme is
-http
(#2439,#2973)¶
-Changes to SETTINGS frames in 0-RTT (#2972,#2790,#2945):¶
-No changes¶
-Extensive changes to error codes and conditions of their sending¶
-Use variable-length integers throughout (#2437,#2233,#2253,#2275)¶
- -Changes to PRIORITY frame (#1865, #2075)¶
- -Substantial editorial reorganization; no technical changes.¶
-None.¶
-SETTINGS changes (#181):¶
- -The original authors of this specification were Robbie Shade and Mike Warres.¶
-The IETF QUIC Working Group received an enormous amount of support from many -people. Among others, the following people provided substantial contributions to -this document:¶
-奥 一穂 (Kazuho Oku)¶
-A portion of Mike's contribution was supported by Microsoft during his -employment there.¶
-Internet-Draft | -QUIC Invariants | -December 2020 | -
Thomson | -Expires 13 June 2021 | -[Page] | -
This document defines the properties of the QUIC transport protocol that are -expected to remain unchanged over time as new versions of the protocol are -developed.¶
-Discussion of this draft takes place on the QUIC working group mailing list -(quic@ietf.org), which is archived at -https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
-Working Group information can be found at https://github.com/quicwg; source -code and issues list for this draft can be found at -https://github.com/quicwg/base-drafts/labels/-invariants.¶
-- This Internet-Draft is submitted in full conformance with the - provisions of BCP 78 and BCP 79.¶
-- Internet-Drafts are working documents of the Internet Engineering Task - Force (IETF). Note that other groups may also distribute working - documents as Internet-Drafts. The list of current Internet-Drafts is - at https://datatracker.ietf.org/drafts/current/.¶
-- Internet-Drafts are draft documents valid for a maximum of six months - and may be updated, replaced, or obsoleted by other documents at any - time. It is inappropriate to use Internet-Drafts as reference - material or to cite them other than as "work in progress."¶
-- This Internet-Draft will expire on 13 June 2021.¶
-- Copyright (c) 2020 IETF Trust and the persons identified as the - document authors. All rights reserved.¶
-- This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of - publication of this document. Please review these documents - carefully, as they describe your rights and restrictions with - respect to this document. Code Components extracted from this - document must include Simplified BSD License text as described in - Section 4.e of the Trust Legal Provisions and are provided without - warranty as described in the Simplified BSD License.¶
-QUIC is a connection-oriented protocol between two endpoints. Those endpoints -exchange UDP datagrams. These UDP datagrams contain QUIC packets. QUIC -endpoints use QUIC packets to establish a QUIC connection, which is shared -protocol state between those endpoints.¶
-In addition to providing secure, multiplexed transport, QUIC [QUIC-TRANSPORT] -allows for the option to negotiate a version. This allows the protocol to -change over time in response to new requirements. Many characteristics of the -protocol could change between versions.¶
-This document describes the subset of QUIC that is intended to remain stable as -new versions are developed and deployed. All of these invariants are -IP-version-independent.¶
-The primary goal of this document is to ensure that it is possible to deploy new -versions of QUIC. By documenting the properties that cannot change, this -document aims to preserve the ability for QUIC endpoints to negotiate changes to -any other aspect of the protocol. As a consequence, this also guarantees a -minimal amount of information that is made available to entities other than -endpoints. Unless specifically prohibited in this document, any aspect of the -protocol can change between different versions.¶
-Appendix A is a non-exhaustive list of some incorrect assumptions that -might be made based on knowledge of QUIC version 1; these do not apply to every -version of QUIC.¶
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL -NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in BCP 14 [RFC2119] [RFC8174] when, and only when, they -appear in all capitals, as shown here.¶
-This document defines requirements on future QUIC versions, even where normative -language is not used.¶
-This document uses terms and notational conventions from [QUIC-TRANSPORT].¶
-Packet diagrams in this document use a format defined in [QUIC-TRANSPORT] to -illustrate the order and size of fields.¶
-Complex fields are named and then followed by a list of fields surrounded by a -pair of matching braces. Each field in this list is separated by commas.¶
-Individual fields include length information, plus indications about fixed -value, optionality, or repetitions. Individual fields use the following -notational conventions, with all lengths in bits:¶
-Indicates that x is A bits long¶
-Indicates that x can be any length from A to B; A can be omitted to indicate -a minimum of zero bits and B can be omitted to indicate no set upper limit; -values in this format always end on an octet boundary¶
-Indicates that x has a fixed value of C¶
-Indicates that x is repeated zero or more times (and that each instance is -length E)¶
-This document uses network byte order (that is, big endian) values. Fields -are placed starting from the high-order bits of each byte.¶
-Figure 1 shows an example structure:¶
-QUIC endpoints exchange UDP datagrams that contain one or more QUIC packets. -This section describes the invariant characteristics of a QUIC packet. A -version of QUIC could permit multiple QUIC packets in a single UDP datagram, but -the invariant properties only describe the first packet in a datagram.¶
-QUIC defines two types of packet header: long and short. Packets with long -headers are identified by the most significant bit of the first byte being set; -packets with a short header have that bit cleared.¶
-QUIC packets might be integrity protected, including the header. However, QUIC -Version Negotiation packets are not integrity protected; see Section 6.¶
-Aside from the values described here, the payload of QUIC packets is -version-specific and of arbitrary length.¶
-Long headers take the form described in Figure 2.¶
-A QUIC packet with a long header has the high bit of the first byte set to 1. -All other bits in that byte are version specific.¶
-The next four bytes include a 32-bit Version field. Versions are described in -Section 5.4.¶
-The next byte contains the length in bytes of the Destination Connection ID -field that follows it. This length is encoded as an 8-bit unsigned integer. -The Destination Connection ID field follows the Destination Connection ID Length -field and is between 0 and 255 bytes in length. Connection IDs are described in -Section 5.3.¶
-The next byte contains the length in bytes of the Source Connection ID field -that follows it. This length is encoded as an 8-bit unsigned integer. The -Source Connection ID field follows the Source Connection ID Length field and is -between 0 and 255 bytes in length.¶
-The remainder of the packet contains version-specific content.¶
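The invariant long-header fields described above can be extracted without any knowledge of the packet's version. A minimal Python sketch, with no validation beyond the invariants; the names are illustrative:

```python
from typing import NamedTuple

class LongHeader(NamedTuple):
    first_byte: int   # high bit set; remaining bits version-specific
    version: int      # 32-bit Version field
    dcid: bytes       # Destination Connection ID (0..255 bytes)
    scid: bytes       # Source Connection ID (0..255 bytes)
    rest: bytes       # version-specific remainder

def parse_long_header(datagram: bytes) -> LongHeader:
    first = datagram[0]
    if not first & 0x80:
        raise ValueError("not a long header packet")
    version = int.from_bytes(datagram[1:5], "big")
    pos = 5
    dcid_len = datagram[pos]; pos += 1
    dcid = datagram[pos:pos + dcid_len]; pos += dcid_len
    scid_len = datagram[pos]; pos += 1
    scid = datagram[pos:pos + scid_len]; pos += scid_len
    return LongHeader(first, version, dcid, scid, datagram[pos:])
```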
-Short headers take the form described in Figure 3.¶
-A QUIC packet with a short header has the high bit of the first byte set to 0.¶
-A QUIC packet with a short header includes a Destination Connection ID -immediately following the first byte. The short header does not include the -Connection ID Lengths, Source Connection ID, or Version fields. The length of -the Destination Connection ID is not encoded in packets with a short header -and is not constrained by this specification.¶
-The remainder of the packet has version-specific semantics.¶
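Because the Destination Connection ID length is not encoded in a short-header packet, a receiver must know it from out-of-band context, typically because it chose the connection ID itself. A brief sketch under that assumption:

```python
def parse_short_header(datagram: bytes, cid_len: int) -> tuple[int, bytes, bytes]:
    """Split a short-header packet, given the receiver's known connection ID length."""
    first = datagram[0]
    if first & 0x80:
        raise ValueError("not a short header packet")
    dcid = datagram[1:1 + cid_len]
    return first, dcid, datagram[1 + cid_len:]  # remainder is version-specific
```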
-A connection ID is an opaque field of arbitrary length.¶
-The primary function of a connection ID is to ensure that changes in addressing -at lower protocol layers (UDP, IP, and below) do not cause packets for a QUIC -connection to be delivered to the wrong QUIC endpoint. The connection ID -is used by endpoints and the intermediaries that support them to ensure that -each QUIC packet can be delivered to the correct instance of an endpoint. At -the endpoint, the connection ID is used to identify the QUIC connection for -which the packet is intended.¶
-The connection ID is chosen by each endpoint using version-specific methods. -Packets for the same QUIC connection might use different connection ID values.¶
-The Version field contains a 4-byte identifier. This value can be used by -endpoints to identify a QUIC Version. A Version field with a value of -0x00000000 is reserved for version negotiation; see Section 6. All other values -are potentially valid.¶
-The properties described in this document apply to all versions of QUIC. A -protocol that does not conform to the properties described in this document is -not QUIC. Future documents might describe additional properties that apply to -a specific QUIC version, or to a range of QUIC versions.¶
-A QUIC endpoint that receives a packet with a long header and a version it -either does not understand or does not support might send a Version Negotiation -packet in response. Packets with a short header do not trigger version -negotiation.¶
-A Version Negotiation packet sets the high bit of the first byte, and thus it -conforms with the format of a packet with a long header as defined in -Section 5.1. A Version Negotiation packet is identifiable as such by the -Version field, which is set to 0x00000000.¶
-Only the most significant bit of the first byte of a Version Negotiation packet -has any defined value. The remaining 7 bits, labeled Unused, can be set to any -value when sending and MUST be ignored on receipt.¶
-After the Source Connection ID field, the Version Negotiation packet contains a -list of Supported Version fields, each identifying a version that the endpoint -sending the packet supports. A Version Negotiation packet contains no other -fields. An endpoint MUST ignore a packet that contains no Supported Version -fields, or a truncated Supported Version.¶
-Version Negotiation packets do not use integrity or confidentiality protection. -Specific QUIC versions might include protocol elements that allow endpoints to -detect modification or corruption in the set of supported versions.¶
-An endpoint MUST include the value from the Source Connection ID field of the -packet it receives in the Destination Connection ID field. The value for Source -Connection ID MUST be copied from the Destination Connection ID of the received -packet, which is initially randomly selected by a client. Echoing both -connection IDs gives clients some assurance that the server received the packet -and that the Version Negotiation packet was not generated by an off-path -attacker.¶
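The connection ID echoing rules above can be sketched as follows. This is an illustrative construction, not a normative one; the function name and version list are assumptions:

```python
def build_version_negotiation(received_dcid: bytes, received_scid: bytes,
                              supported: list[int]) -> bytes:
    pkt = bytearray([0x80])                              # high bit set; other 7 bits unused
    pkt += (0).to_bytes(4, "big")                        # Version 0x00000000 marks VN
    pkt += bytes([len(received_scid)]) + received_scid   # echo peer's SCID as our DCID
    pkt += bytes([len(received_dcid)]) + received_dcid   # echo peer's DCID as our SCID
    for v in supported:                                  # 32-bit Supported Version fields
        pkt += v.to_bytes(4, "big")
    return bytes(pkt)
```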
-An endpoint that receives a Version Negotiation packet might change the version -that it decides to use for subsequent packets. The conditions under which an -endpoint changes QUIC version will depend on the version of QUIC that it -chooses.¶
-See [QUIC-TRANSPORT] for a more thorough description of how an endpoint that -supports QUIC version 1 generates and consumes a Version Negotiation packet.¶
-It is possible that middleboxes could observe traits of a specific version of -QUIC and assume that when other versions of QUIC exhibit similar traits the same -underlying semantic is being expressed. There are potentially many such traits; -see Appendix A. Some effort has been made to either eliminate or -obscure some observable traits in QUIC version 1, but many of these remain. -Other QUIC versions might make different design decisions and so exhibit -different traits.¶
-The QUIC version number does not appear in all QUIC packets, which means that -reliably extracting information from a flow based on version-specific traits -requires that middleboxes retain state for every connection ID they see.¶
-The Version Negotiation packet described in this document is not -integrity-protected; it only has modest protection against insertion by off-path -attackers. An endpoint MUST authenticate the contents of a Version Negotiation -packet if it attempts a different QUIC version as a result.¶
-This document makes no request of IANA.¶
-There are several traits of QUIC version 1 [QUIC-TRANSPORT] that are not -protected from observation, but are nonetheless considered to be changeable when -a new version is deployed.¶
-This section lists a sampling of incorrect assumptions that might be made based -on knowledge of QUIC version 1. Some of these statements are not even true for -QUIC version 1. This is not an exhaustive list; it is intended to be -illustrative only.¶
-Any and all of the following statements can be false for a given QUIC -version:¶
-Internet-Draft | -QPACK | -December 2020 | -
Krasic, et al. | -Expires 13 June 2021 | -[Page] | -
This specification defines QPACK, a compression format for efficiently -representing HTTP fields, to be used in HTTP/3. This is a variation of HPACK -compression that seeks to reduce head-of-line blocking.¶
-Discussion of this draft takes place on the QUIC working group mailing list -(quic@ietf.org), which is archived at -https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
-Working Group information can be found at https://github.com/quicwg; source -code and issues list for this draft can be found at -https://github.com/quicwg/base-drafts/labels/-qpack.¶
-- This Internet-Draft is submitted in full conformance with the - provisions of BCP 78 and BCP 79.¶
-- Internet-Drafts are working documents of the Internet Engineering Task - Force (IETF). Note that other groups may also distribute working - documents as Internet-Drafts. The list of current Internet-Drafts is - at https://datatracker.ietf.org/drafts/current/.¶
-- Internet-Drafts are draft documents valid for a maximum of six months - and may be updated, replaced, or obsoleted by other documents at any - time. It is inappropriate to use Internet-Drafts as reference - material or to cite them other than as "work in progress."¶
-- This Internet-Draft will expire on 13 June 2021.¶
-- Copyright (c) 2020 IETF Trust and the persons identified as the - document authors. All rights reserved.¶
-- This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of - publication of this document. Please review these documents - carefully, as they describe your rights and restrictions with - respect to this document. Code Components extracted from this - document must include Simplified BSD License text as described in - Section 4.e of the Trust Legal Provisions and are provided without - warranty as described in the Simplified BSD License.¶
-The QUIC transport protocol ([QUIC-TRANSPORT]) is designed to support HTTP -semantics, and its design subsumes many of the features of HTTP/2 -([RFC7540]). HTTP/2 uses HPACK ([RFC7541]) for compression of the header -and trailer sections. If HPACK were used for HTTP/3 ([HTTP3]), it would -induce head-of-line blocking for field sections due to built-in assumptions of a -total ordering across frames on all streams.¶
-QPACK reuses core concepts from HPACK, but is redesigned to allow correctness in -the presence of out-of-order delivery, with flexibility for implementations to -balance between resilience against head-of-line blocking and optimal compression -ratio. The design goals are to closely approach the compression ratio of HPACK -with substantially less head-of-line blocking under the same loss conditions.¶
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL -NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in BCP 14 [RFC2119] [RFC8174] when, and only when, they -appear in all capitals, as shown here.¶
-Definitions of terms that are used in this document:¶
-Metadata sent as part of an HTTP message. The term encompasses both header -and trailer fields. Colloquially, the term "headers" has often been used to -refer to HTTP header fields and trailer fields; this document uses "fields" -for generality.¶
-A name-value pair sent as part of an HTTP field section. See Section 5.4 -and Section 5.6 of [SEMANTICS].¶
-Data associated with a field name, composed from all field line values with -that field name in that section, concatenated together and separated with -commas.¶
-An ordered collection of HTTP field lines associated with an HTTP message. A -field section can contain multiple field lines with the same name. It can -also contain duplicate field lines. An HTTP message can include both header -field and trailer field sections.¶
-An instruction that represents a field line, possibly by reference to the -dynamic and static tables.¶
-An implementation that encodes field sections.¶
-An implementation that decodes encoded field sections.¶
-A unique index for each entry in the dynamic table.¶
-A reference point for relative and post-base indices. Representations that -reference dynamic table entries are relative to a Base.¶
-The total number of entries inserted in the dynamic table.¶
-QPACK is a name, not an acronym.¶
-Diagrams use the format described in Section 3.1 of [RFC2360], with the -following additional conventions:¶
-Indicates that x is A bits long¶
-Indicates that x uses the prefixed integer encoding defined in -Section 4.1.1, beginning with an A-bit prefix.¶
-Indicates that x is variable-length and extends to the end of the region.¶
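The prefixed integer encoding referenced in this notation is shared with HPACK (Section 5.1 of [RFC7541]): values below the prefix maximum fit in the prefix bits, and larger values spill into continuation octets of 7 bits each. A Python sketch:

```python
def encode_prefix_int(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    """Encode value with an N-bit prefix; flags occupy the bits above the prefix."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = bytearray([flags | limit])     # prefix saturated; continue in extra octets
    value -= limit
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # 7 value bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)

def decode_prefix_int(buf: bytes, prefix_bits: int) -> tuple[int, int]:
    """Return (value, bytes consumed) for a prefixed integer at the start of buf."""
    limit = (1 << prefix_bits) - 1
    value = buf[0] & limit
    pos = 1
    if value == limit:
        shift = 0
        while True:
            b = buf[pos]; pos += 1
            value += (b & 0x7F) << shift
            shift += 7
            if not b & 0x80:
                break
    return value, pos
```

Using the classic HPACK example, 1337 with a 5-bit prefix encodes to the three bytes 0x1f 0x9a 0x0a.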
-Like HPACK, QPACK uses two tables for associating field lines ("headers") to -indices. The static table (Section 3.1) is predefined and contains -common header field lines (some of them with an empty value). The dynamic table -(Section 3.2) is built up over the course of the connection and can -be used by the encoder to index both header and trailer field lines in the -encoded field sections.¶
-QPACK defines unidirectional streams for sending instructions from encoder to -decoder and vice versa.¶
-An encoder converts a header or trailer field section into a series of -representations by emitting either an indexed or a literal representation for -each field line in the list; see Section 4.5. Indexed -representations achieve high compression by replacing the literal name and -possibly the value with an index to either the static or dynamic table. -References to the static table and literal representations do not require any -dynamic state and never risk head-of-line blocking. References to the dynamic -table risk head-of-line blocking if the encoder has not received an -acknowledgement indicating the entry is available at the decoder.¶
-An encoder MAY insert any entry in the dynamic table it chooses; it is not -limited to field lines it is compressing.¶
-QPACK preserves the ordering of field lines within each field section. An -encoder MUST emit field representations in the order they appear in the input -field section.¶
-QPACK is designed to confine the more complex state tracking to the encoder, -keeping the decoder relatively simple.¶
-Inserting entries into the dynamic table might not be possible if the table -contains entries that cannot be evicted.¶
-A dynamic table entry cannot be evicted immediately after insertion, even if it -has never been referenced. Once the insertion of a dynamic table entry has been -acknowledged and there are no outstanding references to the entry in -unacknowledged representations, the entry becomes evictable. Note that -references on the encoder stream never preclude the eviction of an entry, -because those references are guaranteed to be processed before the instruction -evicting the entry.¶
-If the dynamic table does not contain enough room for a new entry without -evicting other entries, and the entries that would be evicted are not -evictable, the encoder MUST NOT insert that entry into the dynamic table -(including duplicates of existing entries). In order to avoid this, an encoder -that uses the dynamic table has to keep track of each dynamic table entry -referenced by each field section until those representations are acknowledged by -the decoder; see Section 4.4.1.¶
-To ensure that the encoder is not prevented from adding new entries, the encoder -can avoid referencing entries that are close to eviction. Rather than -reference such an entry, the encoder can emit a Duplicate instruction -(Section 4.3.4), and reference the duplicate instead.¶
-Determining which entries are too close to eviction to reference is an encoder -preference. One heuristic is to target a fixed amount of available space in the -dynamic table: either unused space or space that can be reclaimed by evicting -non-blocking entries. To achieve this, the encoder can maintain a draining -index, which is the smallest absolute index (Section 3.2.4) in the dynamic table -that it will emit a reference for. As new entries are inserted, the encoder -increases the draining index to maintain the section of the table that it will -not reference. If the encoder does not create new references to entries with an -absolute index lower than the draining index, the number of unacknowledged -references to those entries will eventually become zero, allowing them to be -evicted.¶
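As a non-normative illustration of the heuristic above, the following Python sketch maintains a draining index by targeting a fixed amount of reclaimable space. The class and parameter names (DrainingTracker, free_target_fraction) are this example's own, not part of the protocol.

```python
class DrainingTracker:
    """Illustrative sketch: tracks the smallest absolute index the
    encoder will still emit references for."""

    def __init__(self, capacity, free_target_fraction=0.125):
        self.capacity = capacity
        # Target amount of table space to keep unused or reclaimable.
        self.free_target = int(capacity * free_target_fraction)

    def draining_index(self, entries):
        """entries: list of (absolute_index, size) pairs, oldest first.
        Returns the smallest absolute index that may still be referenced;
        entries below it are considered draining."""
        reclaimable = 0
        for abs_index, size in entries:
            if reclaimable >= self.free_target:
                return abs_index
            reclaimable += size
        # The whole table falls inside the draining region.
        return entries[-1][0] + 1 if entries else 0
```

Once no new references target entries below the draining index, outstanding references to them eventually drop to zero and the entries become evictable.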
-Because QUIC does not guarantee order between data on different streams, a -decoder might encounter a representation that references a dynamic table entry -that it has not yet received.¶
-Each encoded field section contains a Required Insert Count (Section 4.5.1), -the lowest possible value for the Insert Count with which the field section can -be decoded. For a field section encoded using references to the dynamic table, -the Required Insert Count is one larger than the largest absolute index of all -referenced dynamic table entries. For a field section encoded with no references -to the dynamic table, the Required Insert Count is zero.¶
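The Required Insert Count rule above can be stated directly in code. This is a non-normative sketch; the function name is illustrative.

```python
def required_insert_count(referenced_absolute_indices):
    """Required Insert Count for an encoded field section: one larger
    than the largest absolute index of any referenced dynamic table
    entry, or zero if the section makes no dynamic table references."""
    if not referenced_absolute_indices:
        return 0
    return max(referenced_absolute_indices) + 1
```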
-When the decoder receives an encoded field section with a Required Insert Count -greater than its own Insert Count, the stream cannot be processed immediately, -and is considered "blocked"; see Section 2.2.1.¶
-The decoder specifies an upper bound on the number of streams that can be -blocked using the SETTINGS_QPACK_BLOCKED_STREAMS setting; see Section 5. -An encoder MUST limit the number of streams that could become blocked to the -value of SETTINGS_QPACK_BLOCKED_STREAMS at all times. If a decoder encounters -more blocked streams than it promised to support, it MUST treat this as a -connection error of type QPACK_DECOMPRESSION_FAILED.¶
-Note that the decoder might not become blocked on every stream that risks -becoming blocked.¶
-An encoder can decide whether to risk having a stream become blocked. If -permitted by the value of SETTINGS_QPACK_BLOCKED_STREAMS, compression efficiency -can often be improved by referencing dynamic table entries that are still in -transit, but if there is loss or reordering the stream can become blocked at the -decoder. An encoder can avoid the risk of blocking by only referencing dynamic -table entries that have been acknowledged, but this could mean using literals. -Since literals make the encoded field section larger, this can result in the -encoder becoming blocked on congestion or flow control limits.¶
-Writing instructions on streams that are limited by flow control can produce -deadlocks.¶
-A decoder might stop issuing flow control credit on the stream that carries an -encoded field section until the necessary updates are received on the encoder -stream. If the granting of flow control credit on the encoder stream (or the -connection as a whole) depends on the consumption and release of data on the -stream carrying the encoded field section, a deadlock might result.¶
-More generally, a stream containing a large instruction can become deadlocked if -the decoder withholds flow control credit until the instruction is completely -received.¶
-To avoid these deadlocks, an encoder SHOULD avoid writing an instruction unless -sufficient stream and connection flow control credit is available for the entire -instruction.¶
-The Known Received Count is the total number of dynamic table insertions and -duplications acknowledged by the decoder. The encoder tracks the Known Received -Count in order to identify which dynamic table entries can be referenced without -potentially blocking a stream. The decoder tracks the Known Received Count in -order to be able to send Insert Count Increment instructions.¶
-A Section Acknowledgement instruction (Section 4.4.1) implies that -the decoder has received all dynamic table state necessary to decode the field -section. If the Required Insert Count of the acknowledged field section is -greater than the current Known Received Count, Known Received Count is updated -to the value of the Required Insert Count.¶
-An Insert Count Increment instruction (Section 4.4.3) increases the -Known Received Count by its Increment parameter. See Section 2.2.2.3 for -guidance.¶
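The two Known Received Count updates described above, together with the error condition from Section 4.4.3, can be sketched as follows (non-normative; function names are illustrative):

```python
def on_section_ack(known_received_count, required_insert_count):
    # A Section Acknowledgement advances the Known Received Count to the
    # acknowledged section's Required Insert Count, if that is larger.
    return max(known_received_count, required_insert_count)

def on_insert_count_increment(known_received_count, increment, total_sent):
    # An Increment of zero, or one that would advance the Known Received
    # Count beyond the insertions the encoder has actually sent, is a
    # connection error of type QPACK_DECODER_STREAM_ERROR.
    if increment == 0 or known_received_count + increment > total_sent:
        raise ConnectionError("QPACK_DECODER_STREAM_ERROR")
    return known_received_count + increment
```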
-As in HPACK, the decoder processes a series of representations and emits the -corresponding field sections. It also processes instructions received on the -encoder stream that modify the dynamic table. Note that encoded field sections -and encoder stream instructions arrive on separate streams. This is unlike -HPACK, where encoded field sections (header blocks) can contain instructions -that modify the dynamic table, and there is no dedicated stream of HPACK -instructions.¶
-The decoder MUST emit field lines in the order their representations appear in -the encoded field section.¶
-Upon receipt of an encoded field section, the decoder examines the Required -Insert Count. When the Required Insert Count is less than or equal to the -decoder's Insert Count, the field section can be processed immediately. -Otherwise, the stream on which the field section was received becomes blocked.¶
-While blocked, encoded field section data SHOULD remain in the blocked stream's -flow control window. A stream becomes unblocked when the Insert Count becomes -greater than or equal to the Required Insert Count for all encoded field -sections the decoder has started reading from the stream.¶
-When processing encoded field sections, the decoder expects the Required Insert -Count to equal the lowest possible value for the Insert Count with which the -field section can be decoded, as prescribed in Section 2.1.2. If it -encounters a Required Insert Count smaller than expected, it MUST treat this as -a connection error of type QPACK_DECOMPRESSION_FAILED; see -Section 2.2.3. If it encounters a Required Insert Count larger than -expected, it MAY treat this as a connection error of type -QPACK_DECOMPRESSION_FAILED.¶
-The decoder signals the following events by emitting decoder instructions -(Section 4.4) on the decoder stream.¶
-After the decoder finishes decoding a field section encoded using -representations containing dynamic table references, it MUST emit a Section -Acknowledgement instruction (Section 4.4.1). A stream may carry -multiple field sections in the case of intermediate responses, trailers, and -pushed requests. The encoder interprets each Section Acknowledgement -instruction as acknowledging the earliest unacknowledged field section -containing dynamic table references sent on the given stream.¶
-When an endpoint receives a stream reset before the end of a stream or before -all encoded field sections are processed on that stream, or when it abandons -reading of a stream, it generates a Stream Cancellation instruction; see -Section 4.4.2. This signals to the encoder that all references to the -dynamic table on that stream are no longer outstanding. A decoder with a -maximum dynamic table capacity (Section 3.2.3) equal to -zero MAY omit sending Stream Cancellations, because the encoder cannot have any -dynamic table references. An encoder cannot infer from this instruction that -any updates to the dynamic table have been received.¶
-The Section Acknowledgement and Stream Cancellation instructions permit the -encoder to remove references to entries in the dynamic table. When an entry -with absolute index lower than the Known Received Count has zero references, -then it is considered evictable; see Section 2.1.1.¶
-After receiving new table entries on the encoder stream, the decoder chooses -when to emit Insert Count Increment instructions; see -Section 4.4.3. Emitting this instruction after adding each new -dynamic table entry will provide the timeliest feedback to the encoder, but -could be redundant with other decoder feedback. By delaying an Insert Count -Increment instruction, the decoder might be able to coalesce multiple Insert -Count Increment instructions, or replace them entirely with Section -Acknowledgements; see Section 4.4.1. However, delaying too long -may lead to compression inefficiencies if the encoder waits for an entry to be -acknowledged before using it.¶
-If the decoder encounters a reference in a field line representation to a -dynamic table entry that has already been evicted or that has an absolute -index greater than or equal to the declared Required Insert Count -(Section 4.5.1), it MUST treat this as a connection error of type -QPACK_DECOMPRESSION_FAILED.¶
-If the decoder encounters a reference in an encoder instruction to a dynamic -table entry that has already been evicted, it MUST treat this as a connection -error of type QPACK_ENCODER_STREAM_ERROR.¶
-Unlike in HPACK, entries in the QPACK static and dynamic tables are addressed -separately. The following sections describe how entries in each table are -addressed.¶
-The static table consists of a predefined list of field lines, each of which has -a fixed index over time. Its entries are defined in Appendix A.¶
-All entries in the static table have a name and a value. However, values can be -empty (that is, have a length of 0). Each entry is identified by a unique -index.¶
-Note that the QPACK static table is indexed from 0, whereas the HPACK static -table is indexed from 1.¶
-When the decoder encounters an invalid static table index in a field line -representation, it MUST treat this as a connection error of type -QPACK_DECOMPRESSION_FAILED. If this index is received on the encoder stream, -this MUST be treated as a connection error of type QPACK_ENCODER_STREAM_ERROR.¶
-The dynamic table consists of a list of field lines maintained in first-in, -first-out order. Each HTTP/3 endpoint holds a dynamic table that is initially -empty. Entries are added by encoder instructions received on the encoder -stream; see Section 4.3.¶
-The dynamic table can contain duplicate entries (i.e., entries with the same -name and same value). Therefore, duplicate entries MUST NOT be treated as an -error by the decoder.¶
-Dynamic table entries can have empty values.¶
-The size of the dynamic table is the sum of the size of its entries.¶
-The size of an entry is the sum of its name's length in bytes, its value's -length in bytes, and 32. The size of an entry is calculated using the length of -its name and value without Huffman encoding applied.¶
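For example, the entry size computation is simply:

```python
def entry_size(name: bytes, value: bytes) -> int:
    """Size of a dynamic table entry: the name length plus the value
    length plus 32, measured on the strings before Huffman encoding."""
    return len(name) + len(value) + 32
```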
-The encoder sets the capacity of the dynamic table, which serves as the upper -limit on its size. The initial capacity of the dynamic table is zero. The -encoder sends a Set Dynamic Table Capacity instruction -(Section 4.3.1) with a non-zero capacity to begin using the dynamic -table.¶
-Before a new entry is added to the dynamic table, entries are evicted from the -end of the dynamic table until the size of the dynamic table is less than or -equal to (table capacity - size of new entry). The encoder MUST NOT cause a -dynamic table entry to be evicted unless that entry is evictable; see -Section 2.1.1. The new entry is then added to the table. It is an -error if the encoder attempts to add an entry that is larger than the dynamic -table capacity; the decoder MUST treat this as a connection error of type -QPACK_ENCODER_STREAM_ERROR.¶
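A non-normative sketch of the insertion procedure above; a real encoder must additionally track outstanding references to determine which entries are evictable (here modeled by a per-entry flag):

```python
from collections import deque

def insert_entry(table, capacity, new_size):
    """table: deque of (size, evictable) pairs, oldest at the left.
    Evicts from the oldest end until the new entry fits, then appends it.
    Illustrative only; error handling uses ValueError as a stand-in."""
    if new_size > capacity:
        # Entry larger than the table capacity: the decoder treats this
        # as a connection error of type QPACK_ENCODER_STREAM_ERROR.
        raise ValueError("QPACK_ENCODER_STREAM_ERROR")
    current = sum(size for size, _ in table)
    while current > capacity - new_size:
        size, evictable = table[0]
        if not evictable:
            # The encoder MUST NOT insert in this situation.
            raise ValueError("insertion blocked by unevictable entry")
        table.popleft()
        current -= size
    table.append((new_size, False))  # new entries start unacknowledged
```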
-A new entry can reference an entry in the dynamic table that will be evicted -when adding this new entry into the dynamic table. Implementations are -cautioned to avoid deleting the referenced name or value if the referenced entry -is evicted from the dynamic table prior to inserting the new entry.¶
-Whenever the dynamic table capacity is reduced by the encoder -(Section 4.3.1), entries are evicted from the end of the dynamic -table until the size of the dynamic table is less than or equal to the new table -capacity. This mechanism can be used to completely clear entries from the -dynamic table by setting a capacity of 0, which can subsequently be restored.¶
-To bound the memory requirements of the decoder, the decoder limits the maximum -value the encoder is permitted to set for the dynamic table capacity. In -HTTP/3, this limit is determined by the value of -SETTINGS_QPACK_MAX_TABLE_CAPACITY sent by the decoder; see Section 5. -The encoder MUST NOT set a dynamic table capacity that exceeds this maximum, but -it can choose to use a lower dynamic table capacity; see -Section 4.3.1.¶
-For clients using 0-RTT data in HTTP/3, the server's maximum table capacity is -the remembered value of the setting, or zero if the value was not previously -sent. When the client's 0-RTT value of the SETTING is zero, the server MAY set -it to a non-zero value in its SETTINGS frame. If the remembered value is -non-zero, the server MUST send the same non-zero value in its SETTINGS frame. If -it specifies any other value, or omits SETTINGS_QPACK_MAX_TABLE_CAPACITY from -SETTINGS, the encoder MUST treat this as a connection error of type -QPACK_DECODER_STREAM_ERROR.¶
-For HTTP/3 servers and HTTP/3 clients when 0-RTT is not attempted or is -rejected, the maximum table capacity is 0 until the encoder processes a SETTINGS -frame with a non-zero value of SETTINGS_QPACK_MAX_TABLE_CAPACITY.¶
-When the maximum table capacity is zero, the encoder MUST NOT insert entries -into the dynamic table, and MUST NOT send any encoder instructions on the -encoder stream.¶
-Each entry possesses an absolute index that is fixed for the lifetime of that -entry. The first entry inserted has an absolute index of "0"; indices increase -by one with each insertion.¶
-Relative indices begin at zero and increase in the opposite direction from the -absolute index. Determining which entry has a relative index of "0" depends on -the context of the reference.¶
-In encoder instructions (Section 4.3), a relative index of "0" -refers to the most recently inserted value in the dynamic table. Note that this -means the entry referenced by a given relative index will change while -interpreting instructions on the encoder stream.¶
- -Unlike in encoder instructions, relative indices in field line representations -are relative to the Base at the beginning of the encoded field section; see -Section 4.5.1. This ensures that references are stable even if encoded field -sections and dynamic table updates are processed out of order.¶
-In a field line representation, a relative index of "0" refers to the entry with -absolute index equal to Base - 1.¶
- -Post-Base indices are used in field line representations for entries with -absolute indices greater than or equal to Base, starting at 0 for the entry with -absolute index equal to Base, and increasing in the same direction as the -absolute index.¶
-Post-Base indices allow an encoder to process a field section in a single pass -and include references to entries added while processing this (or other) field -sections.¶
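The three addressing schemes above reduce to simple conversions to absolute indices. A non-normative sketch (function names are illustrative):

```python
def relative_to_absolute_encoder(relative, insert_count):
    # On the encoder stream, relative index 0 is the most recently
    # inserted entry, so the mapping shifts as insertions arrive.
    return insert_count - 1 - relative

def relative_to_absolute_field_line(relative, base):
    # In field line representations, relative index 0 refers to the
    # entry with absolute index Base - 1.
    return base - 1 - relative

def post_base_to_absolute(post_base, base):
    # Post-Base index 0 refers to the entry with absolute index Base,
    # counting upward with the absolute index.
    return base + post_base
```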
- -The prefixed integer from Section 5.1 of [RFC7541] is used heavily throughout -this document. The format from [RFC7541] is used unmodified. Note, however, -that QPACK uses some prefix sizes not actually used in HPACK.¶
-QPACK implementations MUST be able to decode integers up to and including 62 -bits long.¶
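The prefixed integer encoding of Section 5.1 of [RFC7541], used unmodified here, can be sketched as follows (non-normative; the first_byte_flags parameter, holding any pattern bits above the prefix, is this example's own convention):

```python
def encode_prefixed_int(value, prefix_bits, first_byte_flags=0):
    """Prefixed integer per Section 5.1 of RFC 7541."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([first_byte_flags | value])
    # Fill the prefix, then emit 7-bit continuation groups.
    out = [first_byte_flags | limit]
    value -= limit
    while value >= 128:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def decode_prefixed_int(data, prefix_bits):
    """Returns (value, number of bytes consumed)."""
    limit = (1 << prefix_bits) - 1
    value = data[0] & limit
    if value < limit:
        return value, 1
    shift, i = 0, 1
    while True:
        byte = data[i]
        value += (byte & 0x7F) << shift
        shift += 7
        i += 1
        if not (byte & 0x80):
            return value, i
```

The values 10 and 1337 with a 5-bit prefix reproduce the worked examples in Appendix C.1 of [RFC7541].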
-The string literal defined by Section 5.2 of [RFC7541] is also used throughout. -This string format includes optional Huffman encoding.¶
-HPACK defines string literals to begin on a byte boundary. They begin with a -single bit flag, denoted as 'H' in this document (indicating whether the string -is Huffman-coded), followed by the Length encoded as a 7-bit prefix integer, and -finally Length bytes of data. When Huffman encoding is enabled, the Huffman -table from Appendix B of [RFC7541] is used without modification and Length -indicates the size of the string after encoding.¶
-This document expands the definition of string literals by permitting them to -begin other than on a byte boundary. An "N-bit prefix string literal" begins -mid-byte, with the first (8-N) bits allocated to a previous field. The string -uses one bit for the Huffman flag, followed by the Length encoded as an -(N-1)-bit prefix integer. The prefix size, N, can have a value between 2 and 8 -inclusive. The remainder of the string literal is unmodified.¶
-A string literal without a prefix length noted is an 8-bit prefix string literal -and follows the definitions in [RFC7541] without modification.¶
-QPACK defines two unidirectional stream types:¶
-HTTP/3 endpoints contain a QPACK encoder and decoder. Each endpoint MUST -initiate at most one encoder stream and at most one decoder stream. Receipt of a -second instance of either stream type MUST be treated as a connection error of -type H3_STREAM_CREATION_ERROR. These streams MUST NOT be closed. Closure of -either unidirectional stream type MUST be treated as a connection error of type -H3_CLOSED_CRITICAL_STREAM.¶
-An endpoint MAY avoid creating an encoder stream if it will not be used (for -example if its encoder does not wish to use the dynamic table, or if the maximum -size of the dynamic table permitted by the peer is zero).¶
-An endpoint MAY avoid creating a decoder stream if its decoder sets the maximum -capacity of the dynamic table to zero.¶
-An endpoint MUST allow its peer to create an encoder stream and a decoder stream -even if the connection's settings prevent their use.¶
-An encoder sends encoder instructions on the encoder stream to set the capacity -of the dynamic table and add dynamic table entries. Instructions adding table -entries can use existing entries to avoid transmitting redundant information. -The name can be transmitted as a reference to an existing entry in the static or -the dynamic table or as a string literal. For entries that already exist in -the dynamic table, the full entry can also be used by reference, creating a -duplicate entry.¶
-An encoder informs the decoder of a change to the dynamic table capacity using -an instruction that begins with the '001' three-bit pattern. This is followed -by the new dynamic table capacity represented as an integer with a 5-bit prefix; -see Section 4.1.1.¶
-The new capacity MUST be lower than or equal to the limit described in -Section 3.2.3. In HTTP/3, this limit is the value of the -SETTINGS_QPACK_MAX_TABLE_CAPACITY parameter (Section 5) received from -the decoder. The decoder MUST treat a new dynamic table capacity value that -exceeds this limit as a connection error of type QPACK_ENCODER_STREAM_ERROR.¶
-Reducing the dynamic table capacity can cause entries to be evicted; see -Section 3.2.2. This MUST NOT cause the eviction of entries that are not -evictable; see Section 2.1.1. Changing the capacity of the dynamic -table is not acknowledged as this instruction does not insert an entry.¶
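As a non-normative illustration, the Set Dynamic Table Capacity instruction's wire format ('001' pattern followed by a 5-bit prefix integer) could be produced as follows:

```python
def encode_set_capacity(capacity):
    """Set Dynamic Table Capacity instruction: the '001' three-bit
    pattern, then the capacity as a 5-bit prefix integer."""
    if capacity < 31:
        return bytes([0b00100000 | capacity])
    # Prefix saturated: emit 7-bit continuation groups (RFC 7541 s5.1).
    out = [0b00111111]
    capacity -= 31
    while capacity >= 128:
        out.append((capacity & 0x7F) | 0x80)
        capacity >>= 7
    out.append(capacity)
    return bytes(out)
```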
-An encoder adds an entry to the dynamic table where the field name matches the -field name of an entry stored in the static or the dynamic table using an -instruction that starts with the '1' one-bit pattern. The second ('T') bit -indicates whether the reference is to the static or dynamic table. The 6-bit -prefix integer (Section 4.1.1) that follows is used to locate the table -entry for the field name. When T=1, the number represents the static table -index; when T=0, the number is the relative index of the entry in the dynamic -table.¶
-The field name reference is followed by the field value represented as a string -literal; see Section 4.1.2.¶
- -An encoder adds an entry to the dynamic table where both the field name and the -field value are represented as string literals using an instruction that starts -with the '01' two-bit pattern.¶
-This is followed by the name represented as a 6-bit prefix string literal, and -the value represented as an 8-bit prefix string literal; see -Section 4.1.2.¶
- -An encoder duplicates an existing entry in the dynamic table using an -instruction that begins with the '000' three-bit pattern. This is followed by -the relative index of the existing entry represented as an integer with a 5-bit -prefix; see Section 4.1.1.¶
-The existing entry is re-inserted into the dynamic table without resending -either the name or the value. This is useful to avoid adding a reference to an -older entry, which might block inserting new entries.¶
-A decoder sends decoder instructions on the decoder stream to inform the encoder -about the processing of field sections and table updates to ensure consistency -of the dynamic table.¶
-After processing an encoded field section whose declared Required Insert Count -is not zero, the decoder emits a Section Acknowledgement instruction. The -instruction begins with the '1' one-bit pattern, followed by the field -section's associated stream ID encoded as a 7-bit prefix integer; see -Section 4.1.1.¶
-This instruction is used as described in Section 2.1.4 and -in Section 2.2.2.¶
-If an encoder receives a Section Acknowledgement instruction referring to a -stream on which every encoded field section with a non-zero Required Insert -Count has already been acknowledged, this MUST be treated as a connection error -of type QPACK_DECODER_STREAM_ERROR.¶
-The Section Acknowledgement instruction might increase the Known Received Count; -see Section 2.1.4.¶
-When a stream is reset or reading is abandoned, the decoder emits a Stream -Cancellation instruction. The instruction begins with the '01' two-bit -pattern, followed by the stream ID of the affected stream encoded as a -6-bit prefix integer.¶
-This instruction is used as described in Section 2.2.2.¶
-The Insert Count Increment instruction begins with the '00' two-bit pattern, -followed by the Increment encoded as a 6-bit prefix integer. This instruction -increases the Known Received Count (Section 2.1.4) by the value of -the Increment parameter. The decoder should send an Increment value that -increases the Known Received Count to the total number of dynamic table -insertions and duplications processed so far.¶
-An encoder that receives an Increment field equal to zero, or one that increases -the Known Received Count beyond what the encoder has sent, MUST treat this as a -connection error of type QPACK_DECODER_STREAM_ERROR.¶
-An encoded field section consists of a prefix and a possibly empty sequence of -representations defined in this section. Each representation corresponds to a -single field line. These representations reference the static table or the -dynamic table in a particular state, but do not modify that state.¶
-Encoded field sections are carried in frames on streams defined by the enclosing -protocol.¶
-Each encoded field section is prefixed with two integers. The Required Insert -Count is encoded as an integer with an 8-bit prefix using the encoding described -in Section 4.5.1.1. The Base is encoded as a sign bit ('S') and a Delta Base value -with a 7-bit prefix; see Section 4.5.1.2.¶
-Required Insert Count identifies the state of the dynamic table needed to -process the encoded field section. Blocking decoders use the Required Insert -Count to determine when it is safe to process the rest of the field section.¶
-The encoder transforms the Required Insert Count as follows before encoding:¶
-- if ReqInsertCount == 0: - EncInsertCount = 0 - else: - EncInsertCount = (ReqInsertCount mod (2 * MaxEntries)) + 1 -¶ -
Here MaxEntries is the maximum number of entries that the dynamic table can
-have. The smallest entry has empty name and value strings and has a size of
-32. Hence MaxEntries is calculated as¶
- MaxEntries = floor( MaxTableCapacity / 32 ) -¶ -
MaxTableCapacity is the maximum capacity of the dynamic table as specified by
-the decoder; see Section 3.2.3.¶
This encoding limits the length of the prefix on long-lived connections.¶
-The decoder can reconstruct the Required Insert Count using an algorithm such as -the following. If the decoder encounters a value of EncodedInsertCount that -could not have been produced by a conformant encoder, it MUST treat this as a -connection error of type QPACK_DECOMPRESSION_FAILED.¶
-TotalNumberOfInserts is the total number of inserts into the decoder's dynamic -table.¶
-- FullRange = 2 * MaxEntries - if EncodedInsertCount == 0: - ReqInsertCount = 0 - else: - if EncodedInsertCount > FullRange: - Error - MaxValue = TotalNumberOfInserts + MaxEntries - - # MaxWrapped is the largest possible value of - # ReqInsertCount that is 0 mod 2*MaxEntries - MaxWrapped = floor(MaxValue / FullRange) * FullRange - ReqInsertCount = MaxWrapped + EncodedInsertCount - 1 - - # If ReqInsertCount exceeds MaxValue, the Encoder's value - # must have wrapped one fewer time - if ReqInsertCount > MaxValue: - if ReqInsertCount <= FullRange: - Error - ReqInsertCount -= FullRange - - # Value of 0 must be encoded as 0. - if ReqInsertCount == 0: - Error -¶ -
For example, if the dynamic table is 100 bytes, then the Required Insert Count -will be encoded modulo 6. If a decoder has received 10 inserts, then an encoded -value of 4 indicates that the Required Insert Count is 9 for the field section.¶
-The Base is used to resolve references in the dynamic table as described in -Section 3.2.5.¶
-To save space, the Base is encoded relative to the Required Insert Count using a -one-bit sign ('S') and the Delta Base value. A sign bit of 0 indicates that the -Base is greater than or equal to the value of the Required Insert Count; the -decoder adds the value of Delta Base to the Required Insert Count to determine -the value of the Base. A sign bit of 1 indicates that the Base is less than the -Required Insert Count; the decoder subtracts the value of Delta Base from the -Required Insert Count and also subtracts one to determine the value of the Base. -That is:¶
-- if S == 0: - Base = ReqInsertCount + DeltaBase - else: - Base = ReqInsertCount - DeltaBase - 1 -¶ -
A single-pass encoder determines the Base before encoding a field section. If -the encoder inserted entries in the dynamic table while encoding the field -section and is referencing them, Required Insert Count will be greater than the -Base, so the encoded difference is negative and the sign bit is set to 1. If -the field section was not encoded using representations that reference the most -recent entry in the table and did not insert any new entries, the Base will be -greater than the Required Insert Count, so the delta will be positive and the -sign bit is set to 0.¶
-An encoder that produces table updates before encoding a field section might set -Base to the value of Required Insert Count. In such case, both the sign bit and -the Delta Base will be set to zero.¶
-A field section that was encoded without references to the dynamic table can use -any value for the Base; setting Delta Base to zero is one of the most efficient -encodings.¶
-For example, with a Required Insert Count of 9, a decoder receives an S bit of 1 -and a Delta Base of 2. This sets the Base to 6 and enables post-base indexing -for three entries. In this example, a relative index of 1 refers to the 5th -entry that was added to the table; a post-base index of 1 refers to the 8th -entry.¶
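The sign-and-delta scheme for the Base can be sketched as follows (non-normative; function names are illustrative), and reproduces the example above:

```python
def encode_base(req_insert_count, base):
    """Returns (sign_bit, delta_base) for the field section prefix."""
    if base >= req_insert_count:
        return 0, base - req_insert_count
    return 1, req_insert_count - base - 1

def decode_base(sign, delta_base, req_insert_count):
    if sign == 0:
        return req_insert_count + delta_base
    return req_insert_count - delta_base - 1
```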
-An indexed field line representation identifies an entry in the static table, -or an entry in the dynamic table with an absolute index less than the value of -the Base.¶
- -This representation starts with the '1' 1-bit pattern, followed by the 'T' bit -indicating whether the reference is into the static or dynamic table. The 6-bit -prefix integer (Section 4.1.1) that follows is used to locate the -table entry for the field line. When T=1, the number represents the static -table index; when T=0, the number is the relative index of the entry in the -dynamic table.¶
-An indexed field line with post-base index representation identifies an entry -in the dynamic table with an absolute index greater than or equal to the value -of the Base.¶
- -This representation starts with the '0001' 4-bit pattern. This is followed by -the post-base index (Section 3.2.6) of the matching field line, represented as -an integer with a 4-bit prefix; see Section 4.1.1.¶
-A literal field line with name reference representation encodes a field line -where the field name matches the field name of an entry in the static table, or -the field name of an entry in the dynamic table with an absolute index less than -the value of the Base.¶
- -This representation starts with the '01' two-bit pattern. The following bit, -'N', indicates whether an intermediary is permitted to add this field line to -the dynamic table on subsequent hops. When the 'N' bit is set, the encoded field -line MUST always be encoded with a literal representation. In particular, when a -peer sends a field line that it received represented as a literal field line -with the 'N' bit set, it MUST use a literal representation to forward this field -line. This bit is intended for protecting field values that are not to be put -at risk by compressing them; see Section 7 for more details.¶
-The fourth ('T') bit indicates whether the reference is to the static or dynamic -table. The 4-bit prefix integer (Section 4.1.1) that follows is used to -locate the table entry for the field name. When T=1, the number represents the -static table index; when T=0, the number is the relative index of the entry in -the dynamic table.¶
-Only the field name is taken from the dynamic table entry; the field value is -encoded as an 8-bit prefix string literal; see Section 4.1.2.¶
-A literal field line with post-base name reference representation encodes a -field line where the field name matches the field name of a dynamic table entry -with an absolute index greater than or equal to the value of the Base.¶
- -This representation starts with the '0000' four-bit pattern. The fifth bit is -the 'N' bit as described in Section 4.5.4. This is followed by a -post-base index of the dynamic table entry (Section 3.2.6) encoded as an -integer with a 3-bit prefix; see Section 4.1.1.¶
-Only the field name is taken from the dynamic table entry; the field value is -encoded as an 8-bit prefix string literal; see Section 4.1.2.¶
-The literal field line with literal name representation encodes a -field name and a field value as string literals.¶
- -This representation begins with the '001' three-bit pattern. The fourth bit is -the 'N' bit as described in Section 4.5.4. The name follows, -represented as a 4-bit prefix string literal, then the value, represented as an -8-bit prefix string literal; see Section 4.1.2.¶
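The representations above all build on the prefix integer (Section 4.1.1) and string literal (Section 4.1.2) primitives. The following sketch, with Huffman coding omitted for brevity, encodes the two literal representations described here; the function names are illustrative, not taken from this specification.

```python
def encode_integer(buf: bytearray, pattern: int,
                   prefix_bits: int, value: int) -> None:
    """Prefix integer (Section 4.1.1): 'pattern' holds the bits above
    the prefix; values too large for the prefix spill into
    continuation bytes carrying 7 bits each."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        buf.append(pattern | value)
        return
    buf.append(pattern | limit)
    value -= limit
    while value >= 128:
        buf.append((value & 0x7F) | 0x80)
        value >>= 7
    buf.append(value)

def literal_with_name_ref(index: int, value: bytes,
                          static: bool = True,
                          never_index: bool = False) -> bytes:
    """'01NT' pattern with a 4-bit prefix index, then the value as an
    8-bit prefix string literal (H=0, i.e. no Huffman coding)."""
    buf = bytearray()
    pattern = 0x40 | (0x20 if never_index else 0) | (0x10 if static else 0)
    encode_integer(buf, pattern, 4, index)
    encode_integer(buf, 0x00, 7, len(value))  # H=0 + 7-bit length
    buf.extend(value)
    return bytes(buf)

def literal_with_literal_name(name: bytes, value: bytes,
                              never_index: bool = False) -> bytes:
    """'001N' pattern, the name as a 4-bit prefix string literal
    (H flag + 3-bit length), then the value as an 8-bit prefix
    string literal."""
    buf = bytearray()
    pattern = 0x20 | (0x10 if never_index else 0)  # H=0 in bit 3
    encode_integer(buf, pattern, 3, len(name))
    buf.extend(name)
    encode_integer(buf, 0x00, 7, len(value))
    buf.extend(value)
    return bytes(buf)
```

Encoding a literal field line with a static name reference to index 1 and the value "/index.html" reproduces the bytes `510b 2f69 6e64 6578 2e68 746d 6c` shown in the worked examples later in this document.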
-QPACK defines two settings for the HTTP/3 SETTINGS frame:¶
SETTINGS_QPACK_MAX_TABLE_CAPACITY (0x1): The default value is zero. See Section 3.2 for usage. This is the equivalent of SETTINGS_HEADER_TABLE_SIZE from HTTP/2.¶
SETTINGS_QPACK_BLOCKED_STREAMS (0x7): The default value is zero. See Section 2.1.2.¶
-The following error codes are defined for HTTP/3 to indicate failures of -QPACK that prevent the stream or connection from continuing:¶
QPACK_DECOMPRESSION_FAILED: The decoder failed to interpret an encoded field section and is not able to continue decoding that field section.¶
QPACK_ENCODER_STREAM_ERROR: The decoder failed to interpret an encoder instruction received on the encoder stream.¶
QPACK_DECODER_STREAM_ERROR: The encoder failed to interpret a decoder instruction received on the decoder stream.¶
This section describes potential areas of security concern with QPACK: the use of compression as a length-based oracle for verifying guesses about secrets that are compressed into a shared compression context, and denial of service resulting from exhausting processing or memory capacity at a decoder.¶
-QPACK reduces the encoded size of field sections by exploiting the redundancy -inherent in protocols like HTTP. The ultimate goal of this is to reduce the -amount of data that is required to send HTTP requests or responses.¶
-The compression context used to encode header and trailer fields can be probed -by an attacker who can both define fields to be encoded and transmitted and -observe the length of those fields once they are encoded. When an attacker can -do both, they can adaptively modify requests in order to confirm guesses about -the dynamic table state. If a guess is compressed into a shorter length, the -attacker can observe the encoded length and infer that the guess was correct.¶
-This is possible even over the Transport Layer Security Protocol (TLS, see -[TLS]), because while TLS provides confidentiality protection for -content, it only provides a limited amount of protection for the length of that -content.¶
-Padding schemes only provide limited protection against an attacker with these -capabilities, potentially only forcing an increased number of guesses to learn -the length associated with a given guess. Padding schemes also work directly -against compression by increasing the number of bits that are transmitted.¶
-Attacks like CRIME ([CRIME]) demonstrated the existence of these general -attacker capabilities. The specific attack exploited the fact that DEFLATE -([RFC1951]) removes redundancy based on prefix matching. This permitted the -attacker to confirm guesses a character at a time, reducing an exponential-time -attack into a linear-time attack.¶
QPACK mitigates but does not completely prevent attacks modeled on CRIME ([CRIME]) by forcing a guess to match an entire field line rather than individual characters. An attacker can only learn whether a guess is correct, and so is reduced to a brute-force search over the field values associated with a given field name.¶
-The viability of recovering specific field values therefore depends on the -entropy of values. As a result, values with high entropy are unlikely to be -recovered successfully. However, values with low entropy remain vulnerable.¶
-Attacks of this nature are possible any time that two mutually distrustful -entities control requests or responses that are placed onto a single HTTP/3 -connection. If the shared QPACK compressor permits one entity to add entries to -the dynamic table, and the other to access those entries to encode chosen field -lines, then the attacker can learn the state of the table by observing the -length of the encoded output.¶
Having requests or responses from mutually distrustful entities occurs when an intermediary either sends requests from multiple clients on a single connection toward an origin server, or takes responses from multiple origin servers and places them on a shared connection toward a client.¶
-Web browsers also need to assume that requests made on the same connection by -different web origins ([RFC6454]) are made by mutually distrustful entities.¶
-Users of HTTP that require confidentiality for header or trailer fields can use -values with entropy sufficient to make guessing infeasible. However, this is -impractical as a general solution because it forces all users of HTTP to take -steps to mitigate attacks. It would impose new constraints on how HTTP is used.¶
-Rather than impose constraints on users of HTTP, an implementation of QPACK can -instead constrain how compression is applied in order to limit the potential for -dynamic table probing.¶
-An ideal solution segregates access to the dynamic table based on the entity -that is constructing the message. Field values that are added to the table are -attributed to an entity, and only the entity that created a particular value can -extract that value.¶
-To improve compression performance of this option, certain entries might be -tagged as being public. For example, a web browser might make the values of the -Accept-Encoding header field available in all requests.¶
-An encoder without good knowledge of the provenance of field values might -instead introduce a penalty for many field lines with the same field name and -different values. This penalty could cause a large number of attempts to guess -a field value to result in the field not being compared to the dynamic table -entries in future messages, effectively preventing further guesses.¶
-Simply removing entries corresponding to the field from the dynamic table can -be ineffectual if the attacker has a reliable way of causing values to be -reinstalled. For example, a request to load an image in a web browser -typically includes the Cookie header field (a potentially highly valued target -for this sort of attack), and web sites can easily force an image to be -loaded, thereby refreshing the entry in the dynamic table.¶
-This response might be made inversely proportional to the length of the -field value. Disabling access to the dynamic table for a given field name might -occur for shorter values more quickly or with higher probability than for longer -values.¶
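The penalty described above can be sketched as a per-name counter: an encoder tracks how many distinct values it has seen for each field name and stops using the dynamic table for that name once a threshold is crossed. The class, threshold, and length scaling below are illustrative assumptions, not requirements of this specification.

```python
from collections import defaultdict

class GuessPenalty:
    """Illustrative sketch of the penalty defense: after too many
    distinct values are observed for the same field name, stop
    comparing that name against dynamic table entries.  The threshold
    value and its scaling by value length are assumptions."""
    def __init__(self, base_threshold: int = 16):
        self.base_threshold = base_threshold
        self.seen = defaultdict(set)   # field name -> distinct values seen
        self.blocked = set()           # names no longer table-eligible

    def may_use_dynamic_table(self, name: bytes, value: bytes) -> bool:
        if name in self.blocked:
            return False
        self.seen[name].add(value)
        # Shorter values are cheaper to guess, so block their names
        # sooner (the "inversely proportional to length" response).
        threshold = self.base_threshold * max(1, len(value) // 8)
        if len(self.seen[name]) > threshold:
            self.blocked.add(name)
            return False
        return True
```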
-Implementations can also choose to protect sensitive fields by not compressing -them and instead encoding their value as literals.¶
-Refusing to insert a field line into the dynamic table is only effective if -doing so is avoided on all hops. The never-indexed literal bit (see -Section 4.5.4) can be used to signal to intermediaries that a -particular value was intentionally sent as a literal.¶
-An intermediary MUST NOT re-encode a value that uses a literal representation -with the 'N' bit set with another representation that would index it. If QPACK -is used for re-encoding, a literal representation with the 'N' bit set MUST be -used. If HPACK is used for re-encoding, the never-indexed literal -representation (see Section 6.2.3 of [RFC7541]) MUST be used.¶
-The choice to mark that a field value should never be indexed depends on several -factors. Since QPACK does not protect against guessing an entire field value, -short or low-entropy values are more readily recovered by an adversary. -Therefore, an encoder might choose not to index values with low entropy.¶
-An encoder might also choose not to index values for fields that are considered -to be highly valuable or sensitive to recovery, such as the Cookie or -Authorization header fields.¶
-On the contrary, an encoder might prefer indexing values for fields that have -little or no value if they were exposed. For instance, a User-Agent header field -does not commonly vary between requests and is sent to any server. In that case, -confirmation that a particular User-Agent value has been used provides little -value.¶
-Note that these criteria for deciding to use a never-indexed literal -representation will evolve over time as new attacks are discovered.¶
There is no currently known attack against a static Huffman encoding. A study has shown that using a static Huffman encoding table created an information leakage; however, this same study concluded that an attacker could not take advantage of this information leakage to recover any meaningful amount of information (see [PETAL]).¶
-An attacker can try to cause an endpoint to exhaust its memory. QPACK is -designed to limit both the peak and stable amounts of memory allocated by an -endpoint.¶
-The amount of memory used by the encoder is limited by the protocol using -QPACK through the definition of the maximum size of the dynamic table, and the -maximum number of blocking streams. In HTTP/3, these values are controlled by -the decoder through the settings parameters SETTINGS_QPACK_MAX_TABLE_CAPACITY -and SETTINGS_QPACK_BLOCKED_STREAMS, respectively (see -Section 3.2.3 and Section 2.1.2). The limit on the -size of the dynamic table takes into account the size of the data stored in the -dynamic table, plus a small allowance for overhead. The limit on the number of -blocked streams is only a proxy for the maximum amount of memory required by the -decoder. The actual maximum amount of memory will depend on how much memory the -decoder uses to track each blocked stream.¶
-A decoder can limit the amount of state memory used for the dynamic table by -setting an appropriate value for the maximum size of the dynamic table. In -HTTP/3, this is realized by setting an appropriate value for the -SETTINGS_QPACK_MAX_TABLE_CAPACITY parameter. An encoder can limit the amount of -state memory it uses by signaling a lower dynamic table size than the decoder -allows (see Section 3.2.2).¶
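This accounting can be sketched as follows, assuming the QPACK rule that an entry's size is charged as its name length plus its value length plus 32 bytes of overhead. The class below omits reference tracking (a real encoder must not evict entries with outstanding references), but it reproduces the table sizes shown in the worked examples later in this document.

```python
ENTRY_OVERHEAD = 32  # per-entry allowance for overhead

def entry_size(name: bytes, value: bytes) -> int:
    return len(name) + len(value) + ENTRY_OVERHEAD

class DynamicTable:
    """Minimal sketch of capacity accounting: inserting evicts the
    oldest entries until the new entry fits within the capacity the
    decoder allowed via SETTINGS_QPACK_MAX_TABLE_CAPACITY."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []   # oldest first; (name, value) pairs
        self.size = 0

    def insert(self, name: bytes, value: bytes) -> bool:
        needed = entry_size(name, value)
        if needed > self.capacity:
            return False    # entry can never fit
        while self.size + needed > self.capacity:
            old_name, old_value = self.entries.pop(0)
            self.size -= entry_size(old_name, old_value)
        self.entries.append((name, value))
        self.size += needed
        return True
```

With a capacity of 220, inserting the entries from the example exchanges yields the sizes shown there (106, 160, 217, and, after one eviction, 215).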
A decoder can limit the amount of state memory used for blocked streams by setting an appropriate value for the maximum number of blocked streams. In HTTP/3, this is realized by setting an appropriate value for the SETTINGS_QPACK_BLOCKED_STREAMS parameter. Streams that risk becoming blocked consume no additional state memory on the encoder.¶
-An encoder allocates memory to track all dynamic table references in -unacknowledged field sections. An implementation can directly limit the amount -of state memory by only using as many references to the dynamic table as it -wishes to track; no signaling to the decoder is required. However, limiting -references to the dynamic table will reduce compression effectiveness.¶
-The amount of temporary memory consumed by an encoder or decoder can be limited -by processing field lines sequentially. A decoder implementation does not need -to retain a complete list of field lines while decoding a field section. An -encoder implementation does not need to retain a complete list of field lines -while encoding a field section if it is using a single-pass algorithm. Note -that it might be necessary for an application to retain a complete list of field -lines for other reasons; even if QPACK does not force this to occur, application -constraints might make this necessary.¶
-While the negotiated limit on the dynamic table size accounts for much of the -memory that can be consumed by a QPACK implementation, data that cannot be -immediately sent due to flow control is not affected by this limit. -Implementations should limit the size of unsent data, especially on the decoder -stream where flexibility to choose what to send is limited. Possible responses -to an excess of unsent data might include limiting the ability of the peer to -open new streams, reading only from the encoder stream, or closing the -connection.¶
-An implementation of QPACK needs to ensure that large values for integers, long -encoding for integers, or long string literals do not create security -weaknesses.¶
-An implementation has to set a limit for the values it accepts for integers, as -well as for the encoded length; see Section 4.1.1. In the same way, it -has to set a limit to the length it accepts for string literals; see -Section 4.1.2.¶
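A minimal sketch of such limits for a prefix integer decoder follows; the specific bounds are illustrative assumptions that an implementation would choose for itself, not values given by this specification.

```python
MAX_INTEGER = 2 ** 30    # illustrative acceptance limit (assumption)
MAX_CONT_BYTES = 5       # bound on continuation bytes (assumption)

def decode_integer(data: bytes, pos: int, prefix_bits: int):
    """Decode a prefix integer (Section 4.1.1), rejecting over-long
    encodings and values beyond MAX_INTEGER so that a malicious peer
    cannot force unbounded work or allocation.  Returns the value and
    the position of the next unread byte."""
    limit = (1 << prefix_bits) - 1
    value = data[pos] & limit
    pos += 1
    if value < limit:
        return value, pos
    shift = 0
    for _ in range(MAX_CONT_BYTES):
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if value > MAX_INTEGER:
                raise ValueError("integer exceeds acceptance limit")
            return value, pos
    raise ValueError("integer encoding too long")
```

String literals would be bounded the same way: decode the length first, and reject it before allocating or reading the string bytes.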
-This document specifies two settings. The entries in the following table are -registered in the "HTTP/3 Settings" registry established in [HTTP3].¶
Setting Name             | Code | Specification | Default
-------------------------|------|---------------|--------
QPACK_MAX_TABLE_CAPACITY | 0x1  | Section 5     | 0
QPACK_BLOCKED_STREAMS    | 0x7  | Section 5     | 0
This document specifies two stream types. The entries in the following table are -registered in the "HTTP/3 Stream Type" registry established in [HTTP3].¶
Stream Type          | Code | Specification | Sender
---------------------|------|---------------|-------
QPACK Encoder Stream | 0x02 | Section 4.2   | Both
QPACK Decoder Stream | 0x03 | Section 4.2   | Both
This document specifies three error codes. The entries in the following table -are registered in the "HTTP/3 Error Code" registry established in [HTTP3].¶
Name                       | Code  | Description                        | Specification
---------------------------|-------|------------------------------------|--------------
QPACK_DECOMPRESSION_FAILED | 0x200 | Decoding of a field section failed | Section 6
QPACK_ENCODER_STREAM_ERROR | 0x201 | Error on the encoder stream        | Section 6
QPACK_DECODER_STREAM_ERROR | 0x202 | Error on the decoder stream        | Section 6
This table was generated by analyzing actual Internet traffic in 2018 and -including the most common header fields, after filtering out some unsupported -and non-standard values. Due to this methodology, some of the entries may be -inconsistent or appear multiple times with similar but not identical values. The -order of the entries is optimized to encode the most common header fields with -the smallest number of bytes.¶
Index | Name                             | Value
------|----------------------------------|------
0     | :authority                       |
1     | :path                            | /
2     | age                              | 0
3     | content-disposition              |
4     | content-length                   | 0
5     | cookie                           |
6     | date                             |
7     | etag                             |
8     | if-modified-since                |
9     | if-none-match                    |
10    | last-modified                    |
11    | link                             |
12    | location                         |
13    | referer                          |
14    | set-cookie                       |
15    | :method                          | CONNECT
16    | :method                          | DELETE
17    | :method                          | GET
18    | :method                          | HEAD
19    | :method                          | OPTIONS
20    | :method                          | POST
21    | :method                          | PUT
22    | :scheme                          | http
23    | :scheme                          | https
24    | :status                          | 103
25    | :status                          | 200
26    | :status                          | 304
27    | :status                          | 404
28    | :status                          | 503
29    | accept                           | */*
30    | accept                           | application/dns-message
31    | accept-encoding                  | gzip, deflate, br
32    | accept-ranges                    | bytes
33    | access-control-allow-headers     | cache-control
34    | access-control-allow-headers     | content-type
35    | access-control-allow-origin      | *
36    | cache-control                    | max-age=0
37    | cache-control                    | max-age=2592000
38    | cache-control                    | max-age=604800
39    | cache-control                    | no-cache
40    | cache-control                    | no-store
41    | cache-control                    | public, max-age=31536000
42    | content-encoding                 | br
43    | content-encoding                 | gzip
44    | content-type                     | application/dns-message
45    | content-type                     | application/javascript
46    | content-type                     | application/json
47    | content-type                     | application/x-www-form-urlencoded
48    | content-type                     | image/gif
49    | content-type                     | image/jpeg
50    | content-type                     | image/png
51    | content-type                     | text/css
52    | content-type                     | text/html; charset=utf-8
53    | content-type                     | text/plain
54    | content-type                     | text/plain;charset=utf-8
55    | range                            | bytes=0-
56    | strict-transport-security        | max-age=31536000
57    | strict-transport-security        | max-age=31536000; includesubdomains
58    | strict-transport-security        | max-age=31536000; includesubdomains; preload
59    | vary                             | accept-encoding
60    | vary                             | origin
61    | x-content-type-options           | nosniff
62    | x-xss-protection                 | 1; mode=block
63    | :status                          | 100
64    | :status                          | 204
65    | :status                          | 206
66    | :status                          | 302
67    | :status                          | 400
68    | :status                          | 403
69    | :status                          | 421
70    | :status                          | 425
71    | :status                          | 500
72    | accept-language                  |
73    | access-control-allow-credentials | FALSE
74    | access-control-allow-credentials | TRUE
75    | access-control-allow-headers     | *
76    | access-control-allow-methods     | get
77    | access-control-allow-methods     | get, post, options
78    | access-control-allow-methods     | options
79    | access-control-expose-headers    | content-length
80    | access-control-request-headers   | content-type
81    | access-control-request-method    | get
82    | access-control-request-method    | post
83    | alt-svc                          | clear
84    | authorization                    |
85    | content-security-policy          | script-src 'none'; object-src 'none'; base-uri 'none'
86    | early-data                       | 1
87    | expect-ct                        |
88    | forwarded                        |
89    | if-range                         |
90    | origin                           |
91    | purpose                          | prefetch
92    | server                           |
93    | timing-allow-origin              | *
94    | upgrade-insecure-requests        | 1
95    | user-agent                       |
96    | x-forwarded-for                  |
97    | x-frame-options                  | deny
98    | x-frame-options                  | sameorigin
The following examples represent a series of exchanges between an encoder and a decoder. The exchanges are designed to exercise most QPACK instructions, and highlight potentially common patterns and their impact on dynamic table state. The encoder sends three encoded field sections containing one field line each, as well as two speculative inserts that are not referenced.¶
-The state of the encoder's dynamic table is shown, along with its -current size. Each entry is shown with the Absolute Index of the entry (Abs), -the current number of outstanding encoded field sections with references to that -entry (Ref), along with the name and value. Entries above the 'acknowledged' -line have been acknowledged by the decoder.¶
-The encoder sends an encoded field section containing a literal representation -of a field with a static name reference.¶
--Data | Interpretation - | Encoder's Dynamic Table - -Stream: 0 -0000 | Required Insert Count = 0, Base = 0 -510b 2f69 6e64 6578 | Literal Field Line with Name Reference -2e68 746d 6c | Static Table, Index=1 - | (:path=/index.html) - - Abs Ref Name Value - ^-- acknowledged --^ - Size=0 -¶ -
The encoder sets the dynamic table capacity, inserts a header with a dynamic -name reference, then sends a potentially blocking, encoded field section -referencing this new entry. The decoder acknowledges processing the encoded -field section, which implicitly acknowledges all dynamic table insertions up to -the Required Insert Count.¶
--Stream: Encoder -3fbd01 | Set Dynamic Table Capacity=220 -c00f 7777 772e 6578 | Insert With Name Reference -616d 706c 652e 636f | Static Table, Index=0 -6d | (:authority=www.example.com) -c10c 2f73 616d 706c | Insert With Name Reference -652f 7061 7468 | Static Table, Index=1 - | (:path=/sample/path) - - Abs Ref Name Value - ^-- acknowledged --^ - 1 0 :authority www.example.com - 2 0 :path /sample/path - Size=106 - -Stream: 4 -0381 | Required Insert Count = 2, Base = 0 -10 | Indexed Field Line With Post-Base Index - | Absolute Index = Base(0) + Index(0) + 1 = 1 - | (:authority=www.example.com) -11 | Indexed Field Line With Post-Base Index - | Absolute Index = Base(0) + Index(1) + 1 = 2 - | (:path=/sample/path) - - Abs Ref Name Value - ^-- acknowledged --^ - 1 1 :authority www.example.com - 2 1 :path /sample/path - Size=106 - -Stream: Decoder -84 | Section Acknowledgement (stream=4) - - Abs Ref Name Value - 1 0 :authority www.example.com - 2 0 :path /sample/path - ^-- acknowledged --^ - Size=106 -¶ -
The encoder inserts a header into the dynamic table with a literal name. -The decoder acknowledges receipt of the entry. The encoder does not send -any encoded field sections.¶
--Stream: Encoder -4a63 7573 746f 6d2d | Insert With Literal Name -6b65 790c 6375 7374 | (custom-key=custom-value) -6f6d 2d76 616c 7565 | - - Abs Ref Name Value - 1 0 :authority www.example.com - 2 0 :path /sample/path - ^-- acknowledged --^ - 3 0 custom-key custom-value - Size=160 - -Stream: Decoder -01 | Insert Count Increment (1) - - Abs Ref Name Value - 1 0 :authority www.example.com - 2 0 :path /sample/path - 3 0 custom-key custom-value - ^-- acknowledged --^ - Size=160 - -¶ -
The encoder duplicates an existing entry in the dynamic table, then sends an -encoded field section referencing the dynamic table entries including the -duplicated entry. The packet containing the encoder stream data is delayed. -Before the packet arrives, the decoder cancels the stream and notifies the -encoder that the encoded field section was not processed.¶
--Stream: Encoder -02 | Duplicate (Relative Index=2) - - Abs Ref Name Value - 1 0 :authority www.example.com - 2 0 :path /sample/path - 3 0 custom-key custom-value - ^-- acknowledged --^ - 4 0 :authority www.example.com - Size=217 - -Stream: 8 -0500 | Required Insert Count = 4, Base = 4 -80 | Indexed Field Line, Dynamic Table - | Absolute Index = Base(4) - Index(0) = 4 - | (:authority=www.example.com) -c1 | Indexed Field Line, Static Table Index = 1 - | (:path=/) -81 | Indexed Field Line, Dynamic Table - | Absolute Index = Base(4) - Index(1) = 3 - | (custom-key=custom-value) - - Abs Ref Name Value - 1 0 :authority www.example.com - 2 0 :path /sample/path - 3 1 custom-key custom-value - ^-- acknowledged --^ - 4 1 :authority www.example.com - Size=217 - -Stream: Decoder -48 | Stream Cancellation (Stream=8) - - Abs Ref Name Value - 1 0 :authority www.example.com - 2 0 :path /sample/path - 3 0 custom-key custom-value - ^-- acknowledged --^ - 4 0 :authority www.example.com - Size=215 - -¶ -
The encoder inserts another header into the dynamic table, which evicts the -oldest entry. The encoder does not send any encoded field sections.¶
--Stream: Encoder -810d 6375 7374 6f6d | Insert With Name Reference -2d76 616c 7565 32 | Dynamic Table, Absolute Index=2 - | (custom-key=custom-value2) - - Abs Ref Name Value - 2 0 :path /sample/path - 3 0 custom-key custom-value - ^-- acknowledged --^ - 4 0 :authority www.example.com - 5 0 custom-key custom-value2 - Size=215 -¶ -
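The index arithmetic used throughout these examples can be summarized as two conversions between wire indices and absolute dynamic table indices:

```python
def abs_from_relative(base: int, rel_index: int) -> int:
    """Relative indexing in field section representations counts
    backwards from the Base: relative index 0 is the entry whose
    absolute index equals the Base."""
    return base - rel_index

def abs_from_post_base(base: int, pb_index: int) -> int:
    """Post-Base indexing counts forwards from the Base, referring
    to entries inserted at or after the Base."""
    return base + pb_index + 1
```

These reproduce the annotations above: with Base 0, post-base indices 0 and 1 refer to absolute entries 1 and 2; with Base 4, relative indices 0 and 1 refer to absolute entries 4 and 3.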
Pseudo-code for single pass encoding, excluding handling of duplicates, -non-blocking mode, available encoder stream flow control and reference tracking.¶
base = dynamicTable.getInsertCount()
requiredInsertCount = 0
for line in field_lines:
    staticIndex = staticTable.findIndex(line)
    if staticIndex is not None:
        encodeIndexReference(streamBuffer, staticIndex)
        continue

    dynamicIndex = dynamicTable.findIndex(line)
    if dynamicIndex is None:
        # No matching entry. Either insert+index or encode literal
        dynamicNameIndex = None
        staticNameIndex = staticTable.findName(line.name)
        if staticNameIndex is None:
            dynamicNameIndex = dynamicTable.findName(line.name)

        if shouldIndex(line) and dynamicTable.canIndex(line):
            encodeInsert(encoderBuffer, staticNameIndex,
                         dynamicNameIndex, line)
            dynamicIndex = dynamicTable.add(line)

    if dynamicIndex is None:
        # Could not index it, literal
        if dynamicNameIndex is not None:
            # Encode literal with dynamic name, possibly above base
            encodeDynamicLiteral(streamBuffer, dynamicNameIndex,
                                 base, line)
            requiredInsertCount = max(requiredInsertCount,
                                      dynamicNameIndex)
        else:
            # Encodes a literal with a static name or literal name
            encodeLiteral(streamBuffer, staticNameIndex, line)
    else:
        # Dynamic index reference
        assert(dynamicIndex is not None)
        requiredInsertCount = max(requiredInsertCount, dynamicIndex)
        # Encode dynamicIndex, possibly above base
        encodeDynamicIndexReference(streamBuffer, dynamicIndex, base)

# encode the prefix
if requiredInsertCount == 0:
    encodeInteger(prefixBuffer, 0, 0, 8)
    encodeInteger(prefixBuffer, 0, 0, 7)
else:
    wireRIC = (
        requiredInsertCount
        % (2 * getMaxEntries(maxTableCapacity))
    ) + 1
    encodeInteger(prefixBuffer, 0x00, wireRIC, 8)
    if base >= requiredInsertCount:
        encodeInteger(prefixBuffer, 0, base - requiredInsertCount, 7)
    else:
        encodeInteger(prefixBuffer, 0x80,
                      requiredInsertCount - base - 1, 7)

return encoderBuffer, prefixBuffer + streamBuffer
¶
Editorial changes only¶
-Editorial changes only¶
-Editorial changes only¶
-No changes¶
-Added security considerations¶
-No changes¶
-Editorial changes only¶
-Editorial changes only¶
-Editorial changes only¶
-The IETF QUIC Working Group received an enormous amount of support from many -people.¶
-The compression design team did substantial work exploring the problem space and -influencing the initial draft. The contributions of design team members Roberto -Peon, Martin Thomson, and Dmitri Tikhonov are gratefully acknowledged.¶
-The following people also provided substantial contributions to this document:¶
-奥 一穂 (Kazuho Oku)¶
-This draft draws heavily on the text of [RFC7541]. The indirect input of -those authors is also gratefully acknowledged.¶
-Buck's contribution was supported by Google during his employment there.¶
-A portion of Mike's contribution was supported by Microsoft during his -employment there.¶
-Internet-Draft | -QUIC Loss Detection | -December 2020 | -
Iyengar & Swett | -Expires 13 June 2021 | -[Page] | -
This document describes loss detection and congestion control mechanisms for -QUIC.¶
-Discussion of this draft takes place on the QUIC working group mailing list -(quic@ietf.org), which is archived at -https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
-Working Group information can be found at https://github.com/quicwg; source -code and issues list for this draft can be found at -https://github.com/quicwg/base-drafts/labels/-recovery.¶
QUIC is a secure general-purpose transport protocol, described in [QUIC-TRANSPORT]. This document describes loss detection and congestion control mechanisms for QUIC.¶
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL -NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in BCP 14 [RFC2119] [RFC8174] when, and only when, they -appear in all capitals, as shown here.¶
-Definitions of terms that are used in this document:¶
-All frames other than ACK, PADDING, and CONNECTION_CLOSE are considered -ack-eliciting.¶
-Packets that contain ack-eliciting frames elicit an ACK from the receiver -within the maximum acknowledgement delay and are called ack-eliciting packets.¶
-Packets are considered in-flight when they are ack-eliciting or contain a -PADDING frame, and they have been sent but are not acknowledged, declared -lost, or discarded along with old keys.¶
-All transmissions in QUIC are sent with a packet-level header, which indicates -the encryption level and includes a packet sequence number (referred to below as -a packet number). The encryption level indicates the packet number space, as -described in [QUIC-TRANSPORT]. Packet numbers never repeat within a packet -number space for the lifetime of a connection. Packet numbers are sent in -monotonically increasing order within a space, preventing ambiguity.¶
-This design obviates the need for disambiguating between transmissions and -retransmissions; this eliminates significant complexity from QUIC's -interpretation of TCP loss detection mechanisms.¶
QUIC packets can contain multiple frames of different types. The recovery mechanisms ensure that data and frames that need reliable delivery are acknowledged or declared lost and sent in new packets as necessary. The types of frames contained in a packet affect recovery and congestion control logic.¶
-Readers familiar with TCP's loss detection and congestion control will find -algorithms here that parallel well-known TCP ones. However, protocol differences -between QUIC and TCP contribute to algorithmic differences. These protocol -differences are briefly described below.¶
QUIC uses separate packet number spaces for each encryption level, except that 0-RTT and all generations of 1-RTT keys use the same packet number space. Separate packet number spaces ensure that acknowledgement of packets sent with one level of encryption will not cause spurious retransmission of packets sent with a different encryption level. Congestion control and round-trip time (RTT) measurement are unified across packet number spaces.¶
-TCP conflates transmission order at the sender with delivery order at the -receiver, resulting in the retransmission ambiguity problem -([RETRANSMISSION]). QUIC separates transmission order from delivery order: -packet numbers indicate transmission order, and delivery order is determined by -the stream offsets in STREAM frames.¶
-QUIC's packet number is strictly increasing within a packet number space, -and directly encodes transmission order. A higher packet number signifies -that the packet was sent later, and a lower packet number signifies that -the packet was sent earlier. When a packet containing ack-eliciting -frames is detected lost, QUIC includes necessary frames in a new packet -with a new packet number, removing ambiguity about which packet is -acknowledged when an ACK is received. Consequently, more accurate RTT -measurements can be made, spurious retransmissions are trivially detected, and -mechanisms such as Fast Retransmit can be applied universally, based only on -packet number.¶
-This design point significantly simplifies loss detection mechanisms for QUIC. -Most TCP mechanisms implicitly attempt to infer transmission ordering based on -TCP sequence numbers - a non-trivial task, especially when TCP timestamps are -not available.¶
-QUIC starts a loss epoch when a packet is lost. The loss epoch ends when any -packet sent after the start of the epoch is acknowledged. TCP waits for the gap -in the sequence number space to be filled, and so if a segment is lost multiple -times in a row, the loss epoch may not end for several round trips. Because both -should reduce their congestion windows only once per epoch, QUIC will do it once -for every round trip that experiences loss, while TCP may only do it once across -multiple round trips.¶
-QUIC ACKs contain information that is similar to TCP SACK, but QUIC does not -allow any acknowledged packet to be reneged, greatly simplifying implementations -on both sides and reducing memory pressure on the sender.¶
QUIC supports many ACK ranges, as opposed to TCP's three SACK ranges. In high-loss environments, this speeds recovery, reduces spurious retransmits, and ensures forward progress without relying on timeouts.¶
-QUIC endpoints measure the delay incurred between when a packet is received and -when the corresponding acknowledgment is sent, allowing a peer to maintain a -more accurate round-trip time estimate; see Section 13.2 of [QUIC-TRANSPORT].¶
-QUIC uses a probe timeout (PTO; see Section 6.2), with a timer based on TCP's RTO -computation. QUIC's PTO includes the peer's maximum expected acknowledgement -delay instead of using a fixed minimum timeout. QUIC does not collapse the -congestion window until persistent congestion (Section 7.6) is -declared, unlike TCP, which collapses the congestion window upon expiry of an -RTO. Instead of collapsing the congestion window and declaring everything -in-flight lost, QUIC allows probe packets to temporarily exceed the congestion -window whenever the timer expires.¶
-In doing this, QUIC avoids unnecessary congestion window reductions, obviating -the need for correcting mechanisms such as F-RTO ([RFC5682]). Since QUIC does -not collapse the congestion window on a PTO expiration, a QUIC sender is not -limited from sending more in-flight packets after a PTO expiration if it still -has available congestion window. This occurs when a sender is -application-limited and the PTO timer expires. This is more aggressive than -TCP's RTO mechanism when application-limited, but identical when not -application-limited.¶
-A single packet loss at the tail does not indicate persistent congestion, so -QUIC specifies a time-based definition to ensure one or more packets are sent -prior to a dramatic decrease in congestion window; see -Section 7.6.¶
-TCP uses a minimum congestion window of one packet. However, loss of
-that single packet means that the sender needs to wait for a PTO
-(Section 6.2) to recover, which can be much longer than a round-trip time.
-Sending a single ack-eliciting packet also increases the chances of incurring
-additional latency when a receiver delays its acknowledgement.¶
-QUIC therefore recommends that the minimum congestion window be two -packets. While this increases network load, it is considered safe, since the -sender will still reduce its sending rate exponentially under persistent -congestion (Section 6.2).¶
-At a high level, an endpoint measures the time from when a packet was sent to -when it is acknowledged as a round-trip time (RTT) sample. The endpoint uses -RTT samples and peer-reported host delays (see Section 13.2 of -[QUIC-TRANSPORT]) to generate a statistical description of the network -path's RTT. An endpoint computes the following three values for each path: -the minimum value observed over the lifetime of the path (min_rtt), an -exponentially-weighted moving average (smoothed_rtt), and the mean deviation -(referred to as "variation" in the rest of this document) in the observed RTT -samples (rttvar).¶
-An endpoint generates an RTT sample on receiving an ACK frame that meets the
-following two conditions: the largest acknowledged packet number is newly
-acknowledged, and at least one of the newly acknowledged packets was
-ack-eliciting.¶
-The RTT sample, latest_rtt, is generated as the time elapsed since the largest -acknowledged packet was sent:¶
--latest_rtt = ack_time - send_time_of_largest_acked -¶ -
An RTT sample is generated using only the largest acknowledged packet in the -received ACK frame. This is because a peer reports acknowledgment delays for -only the largest acknowledged packet in an ACK frame. While the reported -acknowledgment delay is not used by the RTT sample measurement, it is used to -adjust the RTT sample in subsequent computations of smoothed_rtt and rttvar -(Section 5.3).¶
-To avoid generating multiple RTT samples for a single packet, an ACK frame -SHOULD NOT be used to update RTT estimates if it does not newly acknowledge the -largest acknowledged packet.¶
-An RTT sample MUST NOT be generated on receiving an ACK frame that does not -newly acknowledge at least one ack-eliciting packet. A peer usually does not -send an ACK frame when only non-ack-eliciting packets are received. Therefore -an ACK frame that contains acknowledgements for only non-ack-eliciting packets -could include an arbitrarily large ACK Delay value. Ignoring -such ACK frames avoids complications in subsequent smoothed_rtt and rttvar -computations.¶
-A sender might generate multiple RTT samples per RTT when multiple ACK frames -are received within an RTT. As suggested in [RFC6298], doing so might result -in inadequate history in smoothed_rtt and rttvar. Ensuring that RTT estimates -retain sufficient history is an open research question.¶
-min_rtt is the sender's estimate of the minimum RTT observed for a given network
-path. In this document, min_rtt is used by loss detection to reject implausibly
-small RTT samples.¶
-min_rtt MUST be set to the latest_rtt on the first RTT sample. min_rtt MUST be -set to the lesser of min_rtt and latest_rtt (Section 5.1) on all other -samples.¶
-An endpoint uses only locally observed times in computing the min_rtt and does -not adjust for acknowledgment delays reported by the peer. Doing so allows the -endpoint to set a lower bound for the smoothed_rtt based entirely on what it -observes (see Section 5.3), and limits potential underestimation due to -erroneously-reported delays by the peer.¶
-The RTT for a network path may change over time. If a path's actual RTT -decreases, the min_rtt will adapt immediately on the first low sample. If the -path's actual RTT increases however, the min_rtt will not adapt to it, allowing -future RTT samples that are smaller than the new RTT to be included in -smoothed_rtt.¶
-Endpoints SHOULD set the min_rtt to the newest RTT sample after persistent -congestion is established. This is to allow a connection to reset its estimate -of min_rtt and smoothed_rtt (Section 5.3) after a disruptive network event, -and because it is possible that an increase in path delay resulted in persistent -congestion being incorrectly declared.¶
-Endpoints MAY re-establish the min_rtt at other times in the connection, such as -when traffic volume is low and an acknowledgement is received with a low -acknowledgement delay. Implementations SHOULD NOT refresh the min_rtt -value too often, since the actual minimum RTT of the path is not -frequently observable.¶
-smoothed_rtt is an exponentially-weighted moving average of an endpoint's RTT -samples, and rttvar is the variation in the RTT samples, estimated using a -mean variation.¶
-The calculation of smoothed_rtt uses RTT samples after adjusting them for -acknowledgement delays. These delays are decoded from the ACK Delay field of -ACK frames as described in Section 19.3 of [QUIC-TRANSPORT].¶
-The peer might report acknowledgement delays that are larger than the peer's -max_ack_delay during the handshake (Section 13.2.1 of [QUIC-TRANSPORT]). To -account for this, the endpoint SHOULD ignore max_ack_delay until the handshake -is confirmed (Section 4.1.2 of [QUIC-TLS]). When they occur, these large -acknowledgement delays are likely to be non-repeating and limited to the -handshake. The endpoint can therefore use them without limiting them to the -max_ack_delay, avoiding unnecessary inflation of the RTT estimate.¶
-Note however that a large acknowledgement delay can result in a substantially -inflated smoothed_rtt, if there is either an error in the peer's reporting of -the acknowledgement delay or in the endpoint's min_rtt estimate. Therefore, -prior to handshake confirmation, an endpoint MAY ignore RTT samples if adjusting -the RTT sample for acknowledgement delay causes the sample to be less than the -min_rtt.¶
-After the handshake is confirmed, any acknowledgement delays reported by the -peer that are greater than the peer's max_ack_delay are attributed to -unintentional but potentially repeating delays, such as scheduler latency at the -peer or loss of previous acknowledgements. Excess delays could also be due to -a non-compliant receiver. Therefore, these extra delays are considered -effectively part of path delay and incorporated into the RTT estimate.¶
-Therefore, when adjusting an RTT sample using peer-reported acknowledgement
-delays, an endpoint MAY ignore the acknowledgement delay for Initial packets,
-since these acknowledgements are not delayed by the peer (Section 13.2.1 of
-[QUIC-TRANSPORT]); SHOULD ignore the peer's max_ack_delay until the handshake
-is confirmed; MUST use the lesser of the acknowledgement delay and the peer's
-max_ack_delay after the handshake is confirmed; and MUST NOT subtract the
-acknowledgement delay from the RTT sample if the resulting value is smaller
-than the min_rtt.¶
-Additionally, an endpoint might postpone the processing of acknowledgements when -the corresponding decryption keys are not immediately available. For example, a -client might receive an acknowledgement for a 0-RTT packet that it cannot -decrypt because 1-RTT packet protection keys are not yet available to it. In -such cases, an endpoint SHOULD subtract such local delays from its RTT sample -until the handshake is confirmed.¶
-Similar to [RFC6298], smoothed_rtt and rttvar are computed as follows.¶
-An endpoint initializes the RTT estimator during connection establishment and -when the estimator is reset during connection migration; see Section 9.4 of -[QUIC-TRANSPORT]. Before any RTT samples are available for a new path or when -the estimator is reset, the estimator is initialized using the initial RTT; see -Section 6.2.2.¶
-smoothed_rtt and rttvar are initialized as follows, where kInitialRtt contains -the initial RTT value:¶
--smoothed_rtt = kInitialRtt -rttvar = kInitialRtt / 2 -¶ -
RTT samples for the network path are recorded in latest_rtt; see -Section 5.1. On the first RTT sample after initialization, the estimator is -reset using that sample. This ensures that the estimator retains no history of -past samples.¶
-On the first RTT sample after initialization, smoothed_rtt and rttvar are set as -follows:¶
--smoothed_rtt = latest_rtt -rttvar = latest_rtt / 2 -¶ -
On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:¶
--ack_delay = decoded acknowledgement delay from ACK frame -if (handshake confirmed): - ack_delay = min(ack_delay, max_ack_delay) -adjusted_rtt = latest_rtt -if (min_rtt + ack_delay < latest_rtt): - adjusted_rtt = latest_rtt - ack_delay -smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt -rttvar_sample = abs(smoothed_rtt - adjusted_rtt) -rttvar = 3/4 * rttvar + 1/4 * rttvar_sample -¶ -
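-The estimator above can be sketched in Python. This is an illustrative
-sketch, not part of this specification; the RttEstimator name and the use of
-seconds for all time values are assumptions.¶

```python
class RttEstimator:
    """Sketch of the smoothed_rtt/rttvar estimator; all times in seconds."""

    def __init__(self, k_initial_rtt=0.333):
        # Before any samples, seed the estimator with the initial RTT.
        self.has_sample = False
        self.min_rtt = 0.0
        self.smoothed_rtt = k_initial_rtt
        self.rttvar = k_initial_rtt / 2

    def on_rtt_sample(self, latest_rtt, ack_delay, max_ack_delay,
                      handshake_confirmed):
        if not self.has_sample:
            # The first sample resets the estimator, discarding all
            # history based on the initial RTT.
            self.has_sample = True
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        # min_rtt uses only locally observed times, ignoring ack_delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # After handshake confirmation, cap the reported delay.
        if handshake_confirmed:
            ack_delay = min(ack_delay, max_ack_delay)
        # Only subtract the delay if it cannot drive the sample below min_rtt.
        adjusted_rtt = latest_rtt
        if self.min_rtt + ack_delay < latest_rtt:
            adjusted_rtt = latest_rtt - ack_delay
        # Same update order as the pseudocode above: smoothed_rtt first,
        # then rttvar from the new smoothed_rtt.
        self.smoothed_rtt = 7 / 8 * self.smoothed_rtt + 1 / 8 * adjusted_rtt
        rttvar_sample = abs(self.smoothed_rtt - adjusted_rtt)
        self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * rttvar_sample
```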
-QUIC senders use acknowledgements to detect lost packets, and a probe
-timeout (see Section 6.2) to ensure acknowledgements are received. This section
-provides a description of these algorithms.¶
-If a packet is lost, the QUIC transport needs to recover from that loss, such -as by retransmitting the data, sending an updated frame, or discarding the -frame. For more information, see Section 13.3 of [QUIC-TRANSPORT].¶
-Loss detection is separate per packet number space, unlike RTT measurement and -congestion control, because RTT and congestion control are properties of the -path, whereas loss detection also relies upon key availability.¶
-Acknowledgement-based loss detection implements the spirit of TCP's Fast -Retransmit ([RFC5681]), Early Retransmit ([RFC5827]), FACK ([FACK]), -SACK loss recovery ([RFC6675]), and RACK ([RACK]). This -section provides an overview of how these algorithms are implemented in QUIC.¶
-A packet is declared lost if it meets all of the following conditions: the
-packet is unacknowledged, in flight, and was sent prior to an acknowledged
-packet; and the packet was sent kPacketThreshold packets before an acknowledged
-packet (Section 6.1.1), or it was sent long enough in the past
-(Section 6.1.2).¶
-The acknowledgement indicates that a packet sent later was delivered, and the -packet and time thresholds provide some tolerance for packet reordering.¶
-Spuriously declaring packets as lost leads to unnecessary retransmissions and -may result in degraded performance due to the actions of the congestion -controller upon detecting loss. Implementations can detect spurious -retransmissions and increase the reordering threshold in packets or time to -reduce future spurious retransmissions and loss events. Implementations with -adaptive time thresholds MAY choose to start with smaller initial reordering -thresholds to minimize recovery latency.¶
-The RECOMMENDED initial value for the packet reordering threshold -(kPacketThreshold) is 3, based on best practices for TCP loss detection -([RFC5681], [RFC6675]). In order to remain similar to TCP, -implementations SHOULD NOT use a packet threshold less than 3; see [RFC5681].¶
-Some networks may exhibit higher degrees of packet reordering, causing a sender
-to detect spurious losses. Additionally, packet reordering could be more common
-with QUIC than TCP, because network elements that could observe and reorder
-TCP packets cannot do that for QUIC, as QUIC packet numbers are encrypted.
-Algorithms that increase the reordering threshold after spuriously detecting
-losses, such as RACK [RACK], have proven to be useful in TCP and are
-expected to be at least as useful in QUIC.¶
-Once a later packet within the same packet number space has been acknowledged, -an endpoint SHOULD declare an earlier packet lost if it was sent a threshold -amount of time in the past. To avoid declaring packets as lost too early, this -time threshold MUST be set to at least the local timer granularity, as -indicated by the kGranularity constant. The time threshold is:¶
--max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity) -¶ -
If packets sent prior to the largest acknowledged packet cannot yet be declared -lost, then a timer SHOULD be set for the remaining time.¶
-Using max(smoothed_rtt, latest_rtt) protects from the two following cases: the
-latest RTT sample is lower than the smoothed RTT, perhaps due to reordering
-where the acknowledgement encountered a shorter path; and the latest RTT sample
-is higher than the smoothed RTT, perhaps due to a sustained increase in the
-actual RTT, but the smoothed RTT has not yet caught up.¶
-The RECOMMENDED time threshold (kTimeThreshold), expressed as a round-trip time -multiplier, is 9/8. The RECOMMENDED value of the timer granularity -(kGranularity) is 1ms.¶
-TCP's RACK ([RACK]) specifies a slightly larger -threshold, equivalent to 5/4, for a similar purpose. Experience with QUIC shows -that 9/8 works well.¶
-Implementations MAY experiment with absolute thresholds, thresholds from -previous connections, adaptive thresholds, or including RTT variation. Smaller -thresholds reduce reordering resilience and increase spurious retransmissions, -and larger thresholds increase loss detection delay.¶
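-The packet-threshold and time-threshold conditions above can be combined into
-a single pass over unacknowledged packets, as in this illustrative Python
-sketch (the function name and the dict-of-send-times representation are
-assumptions, not part of this specification):¶

```python
K_PACKET_THRESHOLD = 3       # RECOMMENDED reordering threshold in packets
K_TIME_THRESHOLD = 9 / 8     # RECOMMENDED time threshold multiplier
K_GRANULARITY = 0.001        # RECOMMENDED timer granularity: 1 ms

def detect_lost_packets(unacked, largest_acked, now, smoothed_rtt, latest_rtt):
    """Return (lost, loss_time): packet numbers now deemed lost, plus the
    earliest future time at which a remaining packet crosses the time
    threshold (for arming the loss-detection timer), or None."""
    loss_delay = max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt),
                     K_GRANULARITY)
    lost, loss_time = [], None
    for pn, sent_time in unacked.items():
        if pn > largest_acked:
            continue  # only packets sent before an acknowledged packet
        if (largest_acked - pn >= K_PACKET_THRESHOLD
                or sent_time + loss_delay <= now):
            lost.append(pn)
        else:
            # Not yet lost: remember when the time threshold would trip.
            when = sent_time + loss_delay
            loss_time = when if loss_time is None else min(loss_time, when)
    return lost, loss_time
```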
-A Probe Timeout (PTO) triggers sending one or two probe datagrams when -ack-eliciting packets are not acknowledged within the expected period of -time or the server may not have validated the client's address. A PTO enables -a connection to recover from loss of tail packets or acknowledgements.¶
-As with loss detection, the probe timeout is per packet number space. That is, a -PTO value is computed per packet number space.¶
-A PTO timer expiration event does not indicate packet loss and MUST NOT cause -prior unacknowledged packets to be marked as lost. When an acknowledgement -is received that newly acknowledges packets, loss detection proceeds as -dictated by packet and time threshold mechanisms; see Section 6.1.¶
-The PTO algorithm used in QUIC implements the reliability functions of -Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for -TCP [RFC5682]. The timeout computation is based on TCP's retransmission -timeout period [RFC6298].¶
-When an ack-eliciting packet is transmitted, the sender schedules a timer for -the PTO period as follows:¶
--PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay -¶ -
The PTO period is the amount of time that a sender ought to wait for an -acknowledgement of a sent packet. This time period includes the estimated -network roundtrip-time (smoothed_rtt), the variation in the estimate (4*rttvar), -and max_ack_delay, to account for the maximum time by which a receiver might -delay sending an acknowledgement.¶
-When the PTO is armed for Initial or Handshake packet number spaces, the
-max_ack_delay in the PTO period computation is set to 0, since the peer is
-expected to not delay these packets intentionally; see Section 13.2.1 of
-[QUIC-TRANSPORT].¶
-The PTO period MUST be at least kGranularity, to avoid the timer expiring -immediately.¶
-When ack-eliciting packets in multiple packet number spaces are in flight, the -timer MUST be set to the earlier value of the Initial and Handshake packet -number spaces.¶
-An endpoint MUST NOT set its PTO timer for the application data packet number -space until the handshake is confirmed. Doing so prevents the endpoint from -retransmitting information in packets when either the peer does not yet have the -keys to process them or the endpoint does not yet have the keys to process their -acknowledgements. For example, this can happen when a client sends 0-RTT packets -to the server; it does so without knowing whether the server will be able to -decrypt them. Similarly, this can happen when a server sends 1-RTT packets -before confirming that the client has verified the server's certificate and can -therefore read these 1-RTT packets.¶
-A sender SHOULD restart its PTO timer every time an ack-eliciting packet is sent -or acknowledged, when the handshake is confirmed (Section 4.1.2 of -[QUIC-TLS]), or when Initial or Handshake keys are discarded (Section 4.9 of -[QUIC-TLS]). This ensures the PTO is always set based on the latest estimate -of the round-trip time and for the correct packet across packet number spaces.¶
-When a PTO timer expires, the PTO backoff MUST be increased, resulting in the -PTO period being set to twice its current value. The PTO backoff factor is reset -when an acknowledgement is received, except in the following case. A server -might take longer to respond to packets during the handshake than otherwise. To -protect such a server from repeated client probes, the PTO backoff is not reset -at a client that is not yet certain that the server has finished validating the -client's address. That is, a client does not reset the PTO backoff factor on -receiving acknowledgements in Initial packets.¶
-This exponential reduction in the sender's rate is important because -consecutive PTOs might be caused by loss of packets or acknowledgements due to -severe congestion. Even when there are ack-eliciting packets in-flight in -multiple packet number spaces, the exponential increase in probe timeout -occurs across all spaces to prevent excess load on the network. For example, -a timeout in the Initial packet number space doubles the length of the timeout -in the Handshake packet number space.¶
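-The PTO period formula and the exponential backoff described above combine
-into a single computation, sketched here in Python (illustrative only; the
-function name and seconds-based units are assumptions):¶

```python
K_GRANULARITY = 0.001  # 1 ms timer granularity

def pto_period(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay,
    doubled once per consecutive unanswered PTO (pto_count)."""
    # For Initial/Handshake spaces, callers pass max_ack_delay = 0.
    pto = smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay
    return pto * (2 ** pto_count)
```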
-The total length of time over which consecutive PTOs expire is limited by the -idle timeout.¶
-The PTO timer MUST NOT be set if a timer is set for time threshold -loss detection; see Section 6.1.2. A timer that is set for time -threshold loss detection will expire earlier than the PTO timer -in most cases and is less likely to spuriously retransmit data.¶
-Resumed connections over the same network MAY use the previous connection's -final smoothed RTT value as the resumed connection's initial RTT. When no -previous RTT is available, the initial RTT SHOULD be set to 333ms. This -results in handshakes starting with a PTO of 1 second, as recommended -for TCP's initial retransmission timeout; see Section 2 of [RFC6298].¶
-A connection MAY use the delay between sending a PATH_CHALLENGE and receiving a -PATH_RESPONSE to set the initial RTT (see kInitialRtt in -Appendix A.2) for a new path, but the delay SHOULD NOT be -considered an RTT sample.¶
-Initial packets and Handshake packets might never be acknowledged, but they are
-removed from bytes in flight when the Initial and Handshake keys are discarded,
-as described below in Section 6.4. When Initial or Handshake keys are
-discarded, the PTO and loss detection timers MUST be reset, because discarding
-keys indicates forward progress and the loss detection timer might have been set
-for a now discarded packet number space.¶
-Until the server has validated the client's address on the path, the amount of -data it can send is limited to three times the amount of data received, -as specified in Section 8.1 of [QUIC-TRANSPORT]. If no additional data can be -sent, the server's PTO timer MUST NOT be armed until datagrams have been -received from the client, because packets sent on PTO count against the -anti-amplification limit. Note that the server could fail to validate the -client's address even if 0-RTT is accepted.¶
-Since the server could be blocked until more datagrams are received from the -client, it is the client's responsibility to send packets to unblock the server -until it is certain that the server has finished its address validation -(see Section 8 of [QUIC-TRANSPORT]). That is, the client MUST set the -probe timer if the client has not received an acknowledgement for one of its -Handshake packets and the handshake is not confirmed (see Section 4.1.2 of -[QUIC-TLS]), even if there are no packets in flight. When the PTO fires, -the client MUST send a Handshake packet if it has Handshake keys, otherwise it -MUST send an Initial packet in a UDP datagram with a payload of at least 1200 -bytes.¶
-When a server receives an Initial packet containing duplicate CRYPTO data, -it can assume the client did not receive all of the server's CRYPTO data sent -in Initial packets, or the client's estimated RTT is too small. When a -client receives Handshake or 1-RTT packets prior to obtaining Handshake keys, -it may assume some or all of the server's Initial packets were lost.¶
-To speed up handshake completion under these conditions, an endpoint MAY, for a
-limited number of times per connection, send a packet containing
-unacknowledged CRYPTO data earlier than the PTO expiry, subject to the address
-validation limits in Section 8.1 of [QUIC-TRANSPORT]. Doing so at most once
-for each connection is adequate to quickly recover from a single packet loss.
-Endpoints that do not cease retransmitting packets in response to
-unauthenticated data risk creating an infinite exchange of packets.¶
-Endpoints can also use coalesced packets (see Section 12.2 of -[QUIC-TRANSPORT]) to ensure that each datagram elicits at least one -acknowledgement. For example, a client can coalesce an Initial packet -containing PING and PADDING frames with a 0-RTT data packet and a server can -coalesce an Initial packet containing a PING frame with one or more packets in -its first flight.¶
-When a PTO timer expires, a sender MUST send at least one ack-eliciting packet -in the packet number space as a probe. An endpoint MAY send up to two -full-sized datagrams containing ack-eliciting packets, to avoid an expensive -consecutive PTO expiration due to a single lost datagram or transmit data -from multiple packet number spaces. All probe packets sent on a PTO MUST be -ack-eliciting.¶
-In addition to sending data in the packet number space for which the timer -expired, the sender SHOULD send ack-eliciting packets from other packet -number spaces with in-flight data, coalescing packets if possible. This is -particularly valuable when the server has both Initial and Handshake data -in-flight or the client has both Handshake and Application Data in-flight, -because the peer might only have receive keys for one of the two packet number -spaces.¶
-If the sender wants to elicit a faster acknowledgement on PTO, it can skip a -packet number to eliminate the acknowledgment delay.¶
-When the PTO timer expires, an ack-eliciting packet MUST be sent. An endpoint -SHOULD include new data in this packet. Previously sent data MAY be sent if -no new data can be sent. Implementations MAY use alternative strategies for -determining the content of probe packets, including sending new or -retransmitted data based on the application's priorities.¶
-It is possible the sender has no new or previously-sent data to send. -As an example, consider the following sequence of events: new application data -is sent in a STREAM frame, deemed lost, then retransmitted in a new packet, -and then the original transmission is acknowledged. When there is no data to -send, the sender SHOULD send a PING or other ack-eliciting frame in a single -packet, re-arming the PTO timer.¶
-Alternatively, instead of sending an ack-eliciting packet, the sender MAY mark -any packets still in flight as lost. Doing so avoids sending an additional -packet, but increases the risk that loss is declared too aggressively, resulting -in an unnecessary rate reduction by the congestion controller.¶
-Consecutive PTO periods increase exponentially, and as a result, connection -recovery latency increases exponentially as packets continue to be dropped in -the network. Sending two packets on PTO expiration increases resilience to -packet drops, thus reducing the probability of consecutive PTO events.¶
-When the PTO timer expires multiple times and new data cannot be sent, -implementations must choose between sending the same payload every time -or sending different payloads. Sending the same payload may be simpler -and ensures the highest priority frames arrive first. Sending different -payloads each time reduces the chances of spurious retransmission.¶
-A Retry packet causes a client to send another Initial packet, effectively -restarting the connection process. A Retry packet indicates that the Initial -was received, but not processed. A Retry packet cannot be treated as an -acknowledgment, because it does not indicate that a packet was processed or -specify the packet number.¶
-Clients that receive a Retry packet reset congestion control and loss recovery -state, including resetting any pending timers. Other connection state, in -particular cryptographic handshake messages, is retained; see Section 17.2.5 of -[QUIC-TRANSPORT].¶
-The client MAY compute an RTT estimate to the server as the time period from -when the first Initial was sent to when a Retry or a Version Negotiation packet -is received. The client MAY use this value in place of its default for the -initial RTT estimate.¶
-When packet protection keys are discarded (see Section 4.9 of [QUIC-TLS]), -all packets that were sent with those keys can no longer be acknowledged because -their acknowledgements cannot be processed anymore. The sender MUST discard -all recovery state associated with those packets and MUST remove them from -the count of bytes in flight.¶
-Endpoints stop sending and receiving Initial packets once they start exchanging -Handshake packets; see Section 17.2.2.1 of [QUIC-TRANSPORT]. At this point, -recovery state for all in-flight Initial packets is discarded.¶
-When 0-RTT is rejected, recovery state for all in-flight 0-RTT packets is -discarded.¶
-If a server accepts 0-RTT, but does not buffer 0-RTT packets that arrive -before Initial packets, early 0-RTT packets will be declared lost, but that -is expected to be infrequent.¶
-It is expected that keys are discarded after packets encrypted with them would -be acknowledged or declared lost. However, Initial secrets are discarded as -soon as handshake keys are proven to be available to both client and server; -see Section 4.9.1 of [QUIC-TLS].¶
-This document specifies a sender-side congestion controller for QUIC similar to -TCP NewReno ([RFC6582]).¶
-The signals QUIC provides for congestion control are generic and are designed to -support different sender-side algorithms. A sender can unilaterally choose a -different algorithm to use, such as Cubic ([RFC8312]).¶
-If a sender uses a different controller than that specified in this document, -the chosen controller MUST conform to the congestion control guidelines -specified in Section 3.1 of [RFC8085].¶
-Similar to TCP, packets containing only ACK frames do not count towards bytes -in flight and are not congestion controlled. Unlike TCP, QUIC can detect the -loss of these packets and MAY use that information to adjust the congestion -controller or the rate of ACK-only packets being sent, but this document does -not describe a mechanism for doing so.¶
-The algorithm in this document specifies and uses the controller's congestion -window in bytes.¶
-An endpoint MUST NOT send a packet if it would cause bytes_in_flight (see -Appendix B.2) to be larger than the congestion window, unless the packet -is sent on a PTO timer expiration (see Section 6.2) or when entering recovery -(see Section 7.3.2).¶
-If a path has been validated to support ECN ([RFC3168], [RFC8311]), QUIC -treats a Congestion Experienced (CE) codepoint in the IP header as a signal of -congestion. This document specifies an endpoint's response when the -peer-reported ECN-CE count increases; see Section 13.4.2 of [QUIC-TRANSPORT].¶
-QUIC begins every connection in slow start with the congestion window set to
-an initial value. Endpoints SHOULD use an initial congestion window of 10 times
-the maximum datagram size (max_datagram_size), limited to the larger of 14720
-bytes or twice the maximum datagram size. This follows the analysis and
-recommendations in [RFC6928], increasing the byte limit to account for the
-smaller 8-byte overhead of UDP compared to the 20-byte overhead for TCP.¶
-If the maximum datagram size changes during the connection, the initial -congestion window SHOULD be recalculated with the new size. If the maximum -datagram size is decreased in order to complete the handshake, the -congestion window SHOULD be set to the new initial congestion window.¶
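-The initial window rule above reduces to a one-line computation, sketched
-here in Python (illustrative; the function name is an assumption):¶

```python
def initial_congestion_window(max_datagram_size):
    """10 * max_datagram_size, limited to the larger of 14720 bytes
    or twice the maximum datagram size."""
    return min(10 * max_datagram_size,
               max(14720, 2 * max_datagram_size))
```

-For the common 1200-byte minimum datagram size this yields 12000 bytes; for
-larger datagram sizes the 14720-byte (or twice-datagram-size) limit applies.¶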
-Prior to validating the client's address, the server can be further limited by -the anti-amplification limit as specified in Section 8.1 of [QUIC-TRANSPORT]. -Though the anti-amplification limit can prevent the congestion window from -being fully utilized and therefore slow down the increase in congestion window, -it does not directly affect the congestion window.¶
-The minimum congestion window is the smallest value the congestion window can -decrease to as a response to loss, increase in the peer-reported ECN-CE count, -or persistent congestion. The RECOMMENDED value is 2 * max_datagram_size.¶
-The NewReno congestion controller described in this document has three -distinct states, as shown in Figure 1.¶
-These states and the transitions between them are described in subsequent -sections.¶
-A NewReno sender is in slow start any time the congestion window is below the -slow start threshold. A sender begins in slow start because the slow start -threshold is initialized to an infinite value.¶
-While a sender is in slow start, the congestion window increases by the number -of bytes acknowledged when each acknowledgment is processed. This results in -exponential growth of the congestion window.¶
-The sender MUST exit slow start and enter a recovery period when a packet is -lost or when the ECN-CE count reported by its peer increases.¶
-A sender re-enters slow start any time the congestion window is less than the -slow start threshold, which only occurs after persistent congestion is -declared.¶
-A NewReno sender enters a recovery period when it detects the loss of a packet -or the ECN-CE count reported by its peer increases. A sender that is already in -a recovery period stays in it and does not re-enter it.¶
-On entering a recovery period, a sender MUST set the slow start threshold to -half the value of the congestion window when loss is detected. The congestion -window MUST be set to the reduced value of the slow start threshold before -exiting the recovery period.¶
-Implementations MAY reduce the congestion window immediately upon entering a -recovery period or use other mechanisms, such as Proportional Rate Reduction -([PRR]), to reduce the congestion window more gradually. If the -congestion window is reduced immediately, a single packet can be sent prior to -reduction. This speeds up loss recovery if the data in the lost packet is -retransmitted and is similar to TCP as described in Section 5 of [RFC6675].¶
-The recovery period aims to limit congestion window reduction to once per round -trip. Therefore during a recovery period, the congestion window does not change -in response to new losses or increases in the ECN-CE count.¶
-A recovery period ends and the sender enters congestion avoidance when a packet -sent during the recovery period is acknowledged. This is slightly different -from TCP's definition of recovery, which ends when the lost segment that -started recovery is acknowledged ([RFC5681]).¶
-A NewReno sender is in congestion avoidance any time the congestion window is -at or above the slow start threshold and not in a recovery period.¶
-A sender in congestion avoidance uses an Additive Increase Multiplicative -Decrease (AIMD) approach that MUST limit the increase to the congestion window -to at most one maximum datagram size for each congestion window that is -acknowledged.¶
-The sender exits congestion avoidance and enters a recovery period when a -packet is lost or when the ECN-CE count reported by its peer increases.¶
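-The slow start, recovery, and congestion avoidance behavior described in the
-preceding sections can be sketched as a minimal Python controller. This is an
-illustrative sketch under simplifying assumptions (the NewRenoSketch name is
-hypothetical; persistent congestion, ECN, and pacing are omitted):¶

```python
class NewRenoSketch:
    def __init__(self, max_datagram_size=1200):
        self.mds = max_datagram_size
        # Initial window: 10 * max_datagram_size, with the byte limit above.
        self.cwnd = min(10 * max_datagram_size,
                        max(14720, 2 * max_datagram_size))
        self.ssthresh = float('inf')  # sender begins in slow start
        self.recovery_start_time = None

    def in_slow_start(self):
        return self.cwnd < self.ssthresh

    def on_congestion_event(self, sent_time, now):
        """Loss or ECN-CE increase for a packet sent at sent_time."""
        # Packets sent before recovery began do not trigger a second
        # reduction: at most one window reduction per round trip.
        if (self.recovery_start_time is not None
                and sent_time <= self.recovery_start_time):
            return
        self.recovery_start_time = now
        self.ssthresh = self.cwnd / 2
        # Reduce immediately, respecting the minimum window of 2 datagrams.
        self.cwnd = max(self.ssthresh, 2 * self.mds)

    def on_packet_acked(self, sent_time, acked_bytes):
        if (self.recovery_start_time is not None
                and sent_time <= self.recovery_start_time):
            return  # still in recovery; window does not grow
        # An ack of a packet sent after recovery began implicitly ends it.
        if self.in_slow_start():
            self.cwnd += acked_bytes  # exponential growth
        else:
            # AIMD: at most one datagram per congestion window acknowledged.
            self.cwnd += self.mds * acked_bytes / self.cwnd
```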
-During the handshake, some packet protection keys might not be available when -a packet arrives and the receiver can choose to drop the packet. In particular, -Handshake and 0-RTT packets cannot be processed until the Initial packets -arrive and 1-RTT packets cannot be processed until the handshake completes. -Endpoints MAY ignore the loss of Handshake, 0-RTT, and 1-RTT packets that might -have arrived before the peer had packet protection keys to process those -packets. Endpoints MUST NOT ignore the loss of packets that were sent after -the earliest acknowledged packet in a given packet number space.¶
-Probe packets MUST NOT be blocked by the congestion controller. A sender MUST -however count these packets as being additionally in flight, since these packets -add network load without establishing packet loss. Note that sending probe -packets might cause the sender's bytes in flight to exceed the congestion window -until an acknowledgement is received that establishes loss or delivery of -packets.¶
-When a sender establishes loss of all packets sent over a long enough duration, -the network is considered to be experiencing persistent congestion.¶
-The persistent congestion duration is computed as follows:¶
--(smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay) * - kPersistentCongestionThreshold -¶ -
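As an illustrative sketch of this computation (times in seconds; the granularity value is an example consistent with the 1ms recommendation for kGranularity):

```python
K_GRANULARITY = 0.001  # seconds; example timer granularity
K_PERSISTENT_CONGESTION_THRESHOLD = 3

def persistent_congestion_duration(smoothed_rtt, rttvar, max_ack_delay):
    # (smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay)
    #     * kPersistentCongestionThreshold
    return ((smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay)
            * K_PERSISTENT_CONGESTION_THRESHOLD)
```

For example, with a smoothed RTT of 1 second, an RTT variance of 0.25 seconds, and no ack delay, the PTO base is 2 seconds and the persistent congestion duration is 6 seconds.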
Unlike the PTO computation in Section 6.2, this duration includes the max_ack_delay -irrespective of the packet number spaces in which losses are established.¶
-This duration allows a sender to send as many packets before establishing -persistent congestion, including some in response to PTO expiration, as TCP does -with Tail Loss Probes ([RACK]) and a Retransmission Timeout ([RFC5681]).¶
-Larger values of kPersistentCongestionThreshold cause the sender to become less -responsive to persistent congestion in the network, which can result in -aggressive sending into a congested network. Too small a value can result in a -sender declaring persistent congestion unnecessarily, resulting in reduced -throughput for the sender.¶
-The RECOMMENDED value for kPersistentCongestionThreshold is 3, which results in -behavior that is approximately equivalent to a TCP sender declaring an RTO after -two TLPs.¶
-This design does not use consecutive PTO events to establish persistent -congestion, since application patterns impact PTO expirations. For example, a -sender that sends small amounts of data with silence periods between them -restarts the PTO timer every time it sends, potentially preventing the PTO timer -from expiring for a long period of time, even when no acknowledgments are being -received. The use of a duration enables a sender to establish persistent -congestion without depending on PTO expiration.¶
-A sender establishes persistent congestion after the receipt of an -acknowledgement if at least two ack-eliciting packets are declared lost, and:¶ -across all packet number spaces, none of the packets sent between the send -times of these two packets are acknowledged;¶ -the duration between the send times of these two packets exceeds the -persistent congestion duration; and¶ -a prior RTT sample existed when these two packets were sent.¶
-These two packets MUST be ack-eliciting, since a receiver is required to -acknowledge only ack-eliciting packets within its maximum ack delay; see Section -13.2 of [QUIC-TRANSPORT].¶
-The persistent congestion period SHOULD NOT start until there is at least one -RTT sample. Before the first RTT sample, a sender arms its PTO timer based on -the initial RTT (Section 6.2.2), which could be substantially larger than -the actual RTT. Requiring a prior RTT sample prevents a sender from establishing -persistent congestion with potentially too few probes.¶
-Since network congestion is not affected by packet number spaces, persistent -congestion SHOULD consider packets sent across packet number spaces. A sender -that does not have state for all packet number spaces or an implementation that -cannot compare send times across packet number spaces MAY use state for just the -packet number space that was acknowledged.¶
-When persistent congestion is declared, the sender's congestion window MUST be -reduced to the minimum congestion window (kMinimumWindow), similar to a TCP -sender's response on an RTO ([RFC5681]).¶
-The following example illustrates how a sender might establish persistent -congestion. Assume:¶
--smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay = 2 -kPersistentCongestionThreshold = 3 -¶ -
Consider the following sequence of events:¶
| Time   | Action                     |
|--------|----------------------------|
| t=0    | Send packet #1 (app data)  |
| t=1    | Send packet #2 (app data)  |
| t=1.2  | Recv acknowledgement of #1 |
| t=2    | Send packet #3 (app data)  |
| t=3    | Send packet #4 (app data)  |
| t=4    | Send packet #5 (app data)  |
| t=5    | Send packet #6 (app data)  |
| t=6    | Send packet #7 (app data)  |
| t=8    | Send packet #8 (PTO 1)     |
| t=12   | Send packet #9 (PTO 2)     |
| t=12.2 | Recv acknowledgement of #9 |
Packets 2 through 8 are declared lost when the acknowledgement for packet 9 is -received at t = 12.2.¶
-The congestion period is calculated as the time between the oldest and newest -lost packets: 8 - 1 = 7. The persistent congestion duration is: 2 * 3 = 6. -Because the threshold was reached and because none of the packets between the -oldest and the newest lost packets were acknowledged, the network is considered -to have experienced persistent congestion.¶
-While this example shows PTO expirations, they are not required for persistent -congestion to be established.¶
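The arithmetic of the example above can be checked with a short sketch (illustrative only; times are in the same abstract units as the table):

```python
# Send times of the packets declared lost in the example (#2 through #8).
lost_send_times = [1, 2, 3, 4, 5, 6, 8]

# The congestion period spans the oldest to the newest lost packet.
congestion_period = max(lost_send_times) - min(lost_send_times)  # 8 - 1 = 7

# Persistent congestion duration from the example's assumptions:
# PTO base of 2 multiplied by kPersistentCongestionThreshold of 3.
duration = 2 * 3

# The period exceeds the duration, so persistent congestion is declared.
persistent_congestion = congestion_period > duration
```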
-A sender SHOULD pace sending of all in-flight packets based on input from the -congestion controller.¶
-Sending multiple packets into the network without any delay between them creates -a packet burst that might cause short-term congestion and losses. Senders MUST -either use pacing or limit such bursts. Senders SHOULD limit bursts to the -initial congestion window; see Section 7.2. A sender with knowledge that -the network path to the receiver can absorb larger bursts MAY use a higher -limit.¶
-An implementation should take care to architect its congestion controller to -work well with a pacer. For instance, a pacer might wrap the congestion -controller and control the availability of the congestion window, or a pacer -might pace out packets handed to it by the congestion controller.¶
-Timely delivery of ACK frames is important for efficient loss recovery. Packets -containing only ACK frames SHOULD therefore not be paced, to avoid delaying -their delivery to the peer.¶
-Endpoints can implement pacing as they choose. A perfectly paced sender spreads -packets exactly evenly over time. For a window-based congestion controller, such -as the one in this document, that rate can be computed by averaging the -congestion window over the round-trip time. Expressed as a rate in bytes:¶
--rate = N * congestion_window / smoothed_rtt -¶ -
Or, expressed as an inter-packet interval:¶
--interval = smoothed_rtt * packet_size / congestion_window / N -¶ -
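The two formulas above can be expressed directly; this sketch is illustrative, with N chosen as the example value of 1.25 and times in seconds:

```python
N = 1.25  # pacing gain; small, but at least 1

def pacing_rate(congestion_window, smoothed_rtt):
    # rate = N * congestion_window / smoothed_rtt  (bytes per second)
    return N * congestion_window / smoothed_rtt

def packet_interval(packet_size, congestion_window, smoothed_rtt):
    # interval = smoothed_rtt * packet_size / congestion_window / N
    return smoothed_rtt * packet_size / congestion_window / N
```

For instance, a 12000-byte congestion window and a 100ms smoothed RTT yield a pacing rate of 150000 bytes per second, or an 8ms interval between 1200-byte packets.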
Using a value for N that is small, but at least 1 (for example, 1.25) ensures that variations in round-trip time do not result in under-utilization of the congestion window.¶
Practical considerations, such as packetization, scheduling delays, and -computational efficiency, can cause a sender to deviate from this rate over time -periods that are much shorter than a round-trip time.¶
-One possible implementation strategy for pacing uses a leaky bucket algorithm, -where the capacity of the "bucket" is limited to the maximum burst size and the -rate the "bucket" fills is determined by the above function.¶
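A minimal sketch of such a pacer, under the assumption that tokens represent bytes of sending credit and that the caller supplies a monotonic clock value (the class and method names here are hypothetical, not from the specification):

```python
class LeakyBucketPacer:
    """Illustrative leaky-bucket pacer: the bucket's capacity is the
    maximum burst size and it fills at the pacing rate."""

    def __init__(self, rate_bytes_per_sec, max_burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = max_burst_bytes
        self.tokens = max_burst_bytes  # start with a full burst allowance
        self.last_time = 0.0

    def can_send(self, now, packet_size):
        # Refill credit for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_time) * self.rate)
        self.last_time = now
        if self.tokens >= packet_size:
            self.tokens -= packet_size
            return True
        return False
```

With a rate of 1200 bytes per second and a 1200-byte burst limit, a second 1200-byte packet sent immediately after the first is held back until a full second of credit has accrued.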
-When bytes in flight is smaller than the congestion window and sending is not -pacing limited, the congestion window is under-utilized. When this occurs, -the congestion window SHOULD NOT be increased in either slow start or -congestion avoidance. This can happen due to insufficient application data -or flow control limits.¶
-A sender that paces packets (see Section 7.7) might delay sending packets -and not fully utilize the congestion window due to this delay. A sender -SHOULD NOT consider itself application limited if it would have fully -utilized the congestion window without pacing delay.¶
-A sender MAY implement alternative mechanisms to update its congestion window -after periods of under-utilization, such as those proposed for TCP in -[RFC7661].¶
-Congestion control fundamentally involves the consumption of signals -- both -loss and ECN codepoints -- from unauthenticated entities. On-path attackers can -spoof or alter these signals. An attacker can cause endpoints to reduce their -sending rate by dropping packets, or alter send rate by changing ECN codepoints.¶
-Packets that carry only ACK frames can be heuristically identified by observing -packet size. Acknowledgement patterns may expose information about link -characteristics or application behavior. To reduce leaked information, -endpoints can bundle acknowledgments with other frames, or they can use PADDING -frames at a potential cost to performance.¶
-A receiver can misreport ECN markings to alter the congestion response of a -sender. Suppressing reports of ECN-CE markings could cause a sender to -increase their send rate. This increase could result in congestion and loss.¶
-A sender can detect suppression of reports by marking occasional packets that it -sends with an ECN-CE marking. If a packet sent with an ECN-CE marking is not -reported as having been CE marked when the packet is acknowledged, then the -sender can disable ECN for that path by not setting ECT codepoints in subsequent -packets sent on that path [RFC3168].¶
-Reporting additional ECN-CE markings will cause a sender to reduce their sending -rate, which is similar in effect to advertising reduced connection flow control -limits and so no advantage is gained by doing so.¶
-Endpoints choose the congestion controller that they use. Congestion controllers -respond to reports of ECN-CE by reducing their rate, but the response may vary. -Markings can be treated as equivalent to loss ([RFC3168]), but other -responses can be specified, such as those in [RFC8511] or [RFC8311].¶
-This document has no IANA actions.¶
-We now describe an example implementation of the loss detection mechanisms -described in Section 6.¶
-The pseudocode segments in this section are licensed as Code Components; see the -copyright notice.¶
-To correctly implement congestion control, a QUIC sender tracks every -ack-eliciting packet until the packet is acknowledged or lost. -It is expected that implementations will be able to access this information by -packet number and crypto context and store the per-packet fields -(Appendix A.1.1) for loss recovery and congestion control.¶
-After a packet is declared lost, the endpoint can still maintain state for it -for an amount of time to allow for packet reordering; see Section 13.3 of -[QUIC-TRANSPORT]. This enables a sender to detect spurious retransmissions.¶
-Sent packets are tracked for each packet number space, and ACK -processing only applies to a single space.¶
-The packet number of the sent packet.¶
-A boolean that indicates whether a packet is ack-eliciting. -If true, it is expected that an acknowledgement will be received, -though the peer could delay sending the ACK frame containing it -by up to the max_ack_delay.¶
-A boolean that indicates whether the packet counts towards bytes in -flight.¶
-The number of bytes sent in the packet, not including UDP or IP -overhead, but including QUIC framing overhead.¶
-The time the packet was sent.¶
-Constants used in loss recovery are based on a combination of RFCs, papers, and -common practice.¶
-Maximum reordering in packets before packet threshold loss detection -considers a packet lost. The value recommended in Section 6.1.1 is 3.¶
-Maximum reordering in time before time threshold loss detection -considers a packet lost. Specified as an RTT multiplier. The value -recommended in Section 6.1.2 is 9/8.¶
-Timer granularity. This is a system-dependent value, and Section 6.1.2 -recommends a value of 1ms.¶
-The RTT used before an RTT sample is taken. The value recommended in -Section 6.2.2 is 333ms.¶
-An enum to enumerate the three packet number spaces.¶
--enum kPacketNumberSpace { - Initial, - Handshake, - ApplicationData, -} -¶ -
Variables required to implement the congestion control mechanisms -are described in this section.¶
-The most recent RTT measurement made when receiving an ack for -a previously unacked packet.¶
-The smoothed RTT of the connection, computed as described in -Section 5.3.¶
-The RTT variation, computed as described in Section 5.3.¶
-The minimum RTT seen in the connection, ignoring acknowledgment delay, as -described in Section 5.2.¶
-The time that the first RTT sample was obtained.¶
-The maximum amount of time by which the receiver intends to delay -acknowledgments for packets in the Application Data packet number -space, as defined by the eponymous transport parameter (Section 18.2 -of [QUIC-TRANSPORT]). Note that the actual ack_delay in a received -ACK frame may be larger due to late timers, reordering, or loss.¶
-Multi-modal timer used for loss detection.¶
-The number of times a PTO has been sent without receiving an ack.¶
-The time the most recent ack-eliciting packet was sent.¶
-The largest packet number acknowledged in the packet number space so far.¶
-The time at which the next packet in that packet number space will be -considered lost based on exceeding the reordering window in time.¶
-An association of packet numbers in a packet number space to information -about them. Described in detail above in Appendix A.1.¶
-At the beginning of the connection, initialize the loss detection variables as -follows:¶
--loss_detection_timer.reset() -pto_count = 0 -latest_rtt = 0 -smoothed_rtt = kInitialRtt -rttvar = kInitialRtt / 2 -min_rtt = 0 -first_rtt_sample = 0 -for pn_space in [ Initial, Handshake, ApplicationData ]: - largest_acked_packet[pn_space] = infinite - time_of_last_ack_eliciting_packet[pn_space] = 0 - loss_time[pn_space] = 0 -¶ -
After a packet is sent, information about the packet is stored. The parameters -to OnPacketSent are described in detail above in Appendix A.1.1.¶
-Pseudocode for OnPacketSent follows:¶
--OnPacketSent(packet_number, pn_space, ack_eliciting, - in_flight, sent_bytes): - sent_packets[pn_space][packet_number].packet_number = - packet_number - sent_packets[pn_space][packet_number].time_sent = now() - sent_packets[pn_space][packet_number].ack_eliciting = - ack_eliciting - sent_packets[pn_space][packet_number].in_flight = in_flight - sent_packets[pn_space][packet_number].sent_bytes = sent_bytes - if (in_flight): - if (ack_eliciting): - time_of_last_ack_eliciting_packet[pn_space] = now() - OnPacketSentCC(sent_bytes) - SetLossDetectionTimer() -¶ -
When a server is blocked by anti-amplification limits, receiving -a datagram unblocks it, even if none of the packets in the -datagram are successfully processed. In such a case, the PTO -timer will need to be re-armed.¶
-Pseudocode for OnDatagramReceived follows:¶
--OnDatagramReceived(datagram): - // If this datagram unblocks the server, arm the - // PTO timer to avoid deadlock. - if (server was at anti-amplification limit): - SetLossDetectionTimer() -¶ -
When an ACK frame is received, it may newly acknowledge any number of packets.¶
-Pseudocode for OnAckReceived and UpdateRtt follow:¶
--IncludesAckEliciting(packets): - for packet in packets: - if (packet.ack_eliciting): - return true - return false - -OnAckReceived(ack, pn_space): - if (largest_acked_packet[pn_space] == infinite): - largest_acked_packet[pn_space] = ack.largest_acked - else: - largest_acked_packet[pn_space] = - max(largest_acked_packet[pn_space], ack.largest_acked) - - // DetectAndRemoveAckedPackets finds packets that are newly - // acknowledged and removes them from sent_packets. - newly_acked_packets = - DetectAndRemoveAckedPackets(ack, pn_space) - // Nothing to do if there are no newly acked packets. - if (newly_acked_packets.empty()): - return - - // Update the RTT if the largest acknowledged is newly acked - // and at least one ack-eliciting was newly acked. - if (newly_acked_packets.largest().packet_number == - ack.largest_acked && - IncludesAckEliciting(newly_acked_packets)): - latest_rtt = - now() - newly_acked_packets.largest().time_sent - UpdateRtt(ack.ack_delay) - - // Process ECN information if present. - if (ACK frame contains ECN information): - ProcessECN(ack, pn_space) - - lost_packets = DetectAndRemoveLostPackets(pn_space) - if (!lost_packets.empty()): - OnPacketsLost(lost_packets) - OnPacketsAcked(newly_acked_packets) - - // Reset pto_count unless the client is unsure if - // the server has validated the client's address. - if (PeerCompletedAddressValidation()): - pto_count = 0 - SetLossDetectionTimer() - - -UpdateRtt(ack_delay): - if (first_rtt_sample == 0): - min_rtt = latest_rtt - smoothed_rtt = latest_rtt - rttvar = latest_rtt / 2 - first_rtt_sample = now() - return - - // min_rtt ignores acknowledgment delay. - min_rtt = min(min_rtt, latest_rtt) - // Limit ack_delay by max_ack_delay after handshake - // confirmation. Note that ack_delay is 0 for - // acknowledgements of Initial and Handshake packets. - if (handshake confirmed): - ack_delay = min(ack_delay, max_ack_delay) - - // Adjust for acknowledgment delay if plausible. 
- adjusted_rtt = latest_rtt - if (latest_rtt > min_rtt + ack_delay): - adjusted_rtt = latest_rtt - ack_delay - - rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt) - smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt -¶ -
QUIC loss detection uses a single timer for all timeout loss detection. The -duration of the timer is based on the timer's mode, which is set in the packet -and timer events further below. The function SetLossDetectionTimer defined -below shows how the single timer is set.¶
-This algorithm may result in the timer being set in the past, particularly if -timers wake up late. Timers set in the past fire immediately.¶
-Pseudocode for SetLossDetectionTimer follows:¶
--GetLossTimeAndSpace(): - time = loss_time[Initial] - space = Initial - for pn_space in [ Handshake, ApplicationData ]: - if (time == 0 || loss_time[pn_space] < time): - time = loss_time[pn_space]; - space = pn_space - return time, space - -GetPtoTimeAndSpace(): - duration = (smoothed_rtt + max(4 * rttvar, kGranularity)) - * (2 ^ pto_count) - // Arm PTO from now when there are no inflight packets. - if (no in-flight packets): - assert(!PeerCompletedAddressValidation()) - if (has handshake keys): - return (now() + duration), Handshake - else: - return (now() + duration), Initial - pto_timeout = infinite - pto_space = Initial - for space in [ Initial, Handshake, ApplicationData ]: - if (no in-flight packets in space): - continue; - if (space == ApplicationData): - // Skip Application Data until handshake confirmed. - if (handshake is not confirmed): - return pto_timeout, pto_space - // Include max_ack_delay and backoff for Application Data. - duration += max_ack_delay * (2 ^ pto_count) - - t = time_of_last_ack_eliciting_packet[space] + duration - if (t < pto_timeout): - pto_timeout = t - pto_space = space - return pto_timeout, pto_space - -PeerCompletedAddressValidation(): - // Assume clients validate the server's address implicitly. - if (endpoint is server): - return true - // Servers complete address validation when a - // protected packet is received. - return has received Handshake ACK || - handshake confirmed - -SetLossDetectionTimer(): - earliest_loss_time, _ = GetLossTimeAndSpace() - if (earliest_loss_time != 0): - // Time threshold loss detection. - loss_detection_timer.update(earliest_loss_time) - return - - if (server is at anti-amplification limit): - // The server's timer is not set if nothing can be sent. - loss_detection_timer.cancel() - return - - if (no ack-eliciting packets in flight && - PeerCompletedAddressValidation()): - // There is nothing to detect lost, so no timer is set. 
- // However, the client needs to arm the timer if the - // server might be blocked by the anti-amplification limit. - loss_detection_timer.cancel() - return - - // Determine which PN space to arm PTO for. - timeout, _ = GetPtoTimeAndSpace() - loss_detection_timer.update(timeout) -¶ -
When the loss detection timer expires, the timer's mode determines the action -to be performed.¶
-Pseudocode for OnLossDetectionTimeout follows:¶
--OnLossDetectionTimeout(): - earliest_loss_time, pn_space = GetLossTimeAndSpace() - if (earliest_loss_time != 0): - // Time threshold loss Detection - lost_packets = DetectAndRemoveLostPackets(pn_space) - assert(!lost_packets.empty()) - OnPacketsLost(lost_packets) - SetLossDetectionTimer() - return - - if (bytes_in_flight > 0): - // PTO. Send new data if available, else retransmit old data. - // If neither is available, send a single PING frame. - _, pn_space = GetPtoTimeAndSpace() - SendOneOrTwoAckElicitingPackets(pn_space) - else: - assert(!PeerCompletedAddressValidation()) - // Client sends an anti-deadlock packet: Initial is padded - // to earn more anti-amplification credit, - // a Handshake packet proves address ownership. - if (has Handshake keys): - SendOneAckElicitingHandshakePacket() - else: - SendOneAckElicitingPaddedInitialPacket() - - pto_count++ - SetLossDetectionTimer() -¶ -
DetectAndRemoveLostPackets is called every time an ACK is received or the time -threshold loss detection timer expires. This function operates on the -sent_packets for that packet number space and returns a list of packets newly -detected as lost.¶
-Pseudocode for DetectAndRemoveLostPackets follows:¶
--DetectAndRemoveLostPackets(pn_space): - assert(largest_acked_packet[pn_space] != infinite) - loss_time[pn_space] = 0 - lost_packets = [] - loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) - - // Minimum time of kGranularity before packets are deemed lost. - loss_delay = max(loss_delay, kGranularity) - - // Packets sent before this time are deemed lost. - lost_send_time = now() - loss_delay - - foreach unacked in sent_packets[pn_space]: - if (unacked.packet_number > largest_acked_packet[pn_space]): - continue - - // Mark packet as lost, or set time when it should be marked. - // Note: The use of kPacketThreshold here assumes that there - // were no sender-induced gaps in the packet number space. - if (unacked.time_sent <= lost_send_time || - largest_acked_packet[pn_space] >= - unacked.packet_number + kPacketThreshold): - sent_packets[pn_space].remove(unacked.packet_number) - if (unacked.in_flight): - lost_packets.insert(unacked) - else: - if (loss_time[pn_space] == 0): - loss_time[pn_space] = unacked.time_sent + loss_delay - else: - loss_time[pn_space] = min(loss_time[pn_space], - unacked.time_sent + loss_delay) - return lost_packets -¶ -
When Initial or Handshake keys are discarded, packets from the space -are discarded and loss detection state is updated.¶
-Pseudocode for OnPacketNumberSpaceDiscarded follows:¶
--OnPacketNumberSpaceDiscarded(pn_space): - assert(pn_space != ApplicationData) - RemoveFromBytesInFlight(sent_packets[pn_space]) - sent_packets[pn_space].clear() - // Reset the loss detection and PTO timer - time_of_last_ack_eliciting_packet[pn_space] = 0 - loss_time[pn_space] = 0 - pto_count = 0 - SetLossDetectionTimer() -¶ -
We now describe an example implementation of the congestion controller described -in Section 7.¶
-The pseudocode segments in this section are licensed as Code Components; see the -copyright notice.¶
-Constants used in congestion control are based on a combination of RFCs, papers, -and common practice.¶
-Default limit on the initial bytes in flight as described in Section 7.2.¶
-Minimum congestion window in bytes as described in Section 7.2.¶
-Reduction in congestion window when a new loss event is detected. -Section 7 recommends a value of 0.5.¶
-Period of time for persistent congestion to be established, specified as a PTO -multiplier. Section 7.6 recommends a value of 3.¶
-Variables required to implement the congestion control mechanisms -are described in this section.¶
-The sender's current maximum payload size. Does not include UDP or IP -overhead. The max datagram size is used for congestion window -computations. An endpoint sets the value of this variable based on its Path -Maximum Transmission Unit (PMTU; see Section 14.2 of [QUIC-TRANSPORT]), with -a minimum value of 1200 bytes.¶
-The highest value reported for the ECN-CE counter in the packet number space -by the peer in an ACK frame. This value is used to detect increases in the -reported ECN-CE counter.¶
-The sum of the size in bytes of all sent packets that contain at least one -ack-eliciting or PADDING frame, and have not been acknowledged or declared -lost. The size does not include IP or UDP overhead, but does include the QUIC -header and AEAD overhead. Packets only containing ACK frames do not count -towards bytes_in_flight to ensure congestion control does not impede -congestion feedback.¶
-Maximum number of bytes-in-flight that may be sent.¶
-The time when QUIC first detects congestion due to loss or ECN, causing -it to enter congestion recovery. When a packet sent after this time is -acknowledged, QUIC exits congestion recovery.¶
-Slow start threshold in bytes. When the congestion window is below ssthresh, -the mode is slow start and the window grows by the number of bytes -acknowledged.¶
-The congestion control pseudocode also accesses some of the variables from the -loss recovery pseudocode.¶
-At the beginning of the connection, initialize the congestion control -variables as follows:¶
--congestion_window = kInitialWindow -bytes_in_flight = 0 -congestion_recovery_start_time = 0 -ssthresh = infinite -for pn_space in [ Initial, Handshake, ApplicationData ]: - ecn_ce_counters[pn_space] = 0 -¶ -
Whenever a packet is sent, and it contains non-ACK frames, the packet -increases bytes_in_flight.¶
--OnPacketSentCC(sent_bytes): - bytes_in_flight += sent_bytes -¶ -
-Invoked from loss detection's OnAckReceived and is supplied with the -newly acknowledged packets (newly_acked_packets) from sent_packets.¶
-In congestion avoidance, implementers that use an integer representation -for congestion_window should be careful with division, and can use -the alternative approach suggested in Section 2.1 of [RFC3465].¶
--InCongestionRecovery(sent_time): - return sent_time <= congestion_recovery_start_time - -OnPacketsAcked(acked_packets): - for acked_packet in acked_packets: - OnPacketAcked(acked_packet) - -OnPacketAcked(acked_packet): - if (!acked_packet.in_flight): - return; - // Remove from bytes_in_flight. - bytes_in_flight -= acked_packet.sent_bytes - // Do not increase congestion_window if application - // limited or flow control limited. - if (IsAppOrFlowControlLimited()) - return - // Do not increase congestion window in recovery period. - if (InCongestionRecovery(acked_packet.time_sent)): - return - if (congestion_window < ssthresh): - // Slow start. - congestion_window += acked_packet.sent_bytes - else: - // Congestion avoidance. - congestion_window += - max_datagram_size * acked_packet.sent_bytes - / congestion_window -¶ -
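The integer-division caution above can be addressed with byte counting, in the spirit of Section 2.1 of [RFC3465]: accumulate acknowledged bytes and grow the window by one datagram per full congestion window acknowledged. A minimal sketch (illustrative, not the pseudocode of this appendix; state is a plain dictionary for brevity):

```python
def on_ack_abc(state, acked_bytes, max_datagram_size):
    # Byte-counting variant of the congestion avoidance increase that
    # avoids per-ACK integer division: count acknowledged bytes, and
    # once a full congestion window has been acknowledged, grow the
    # window by one maximum datagram size.
    state["bytes_acked"] += acked_bytes
    if state["bytes_acked"] >= state["congestion_window"]:
        state["bytes_acked"] -= state["congestion_window"]
        state["congestion_window"] += max_datagram_size
```

Ten acknowledgments of 1200 bytes against a 12000-byte window accumulate one full window and grow it by exactly one 1200-byte datagram.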
Invoked from ProcessECN and OnPacketsLost when a new congestion event is -detected. If not already in recovery, this starts a recovery period and -reduces the slow start threshold and congestion window immediately.¶
--OnCongestionEvent(sent_time): - // No reaction if already in a recovery period. - if (InCongestionRecovery(sent_time)): - return - - // Enter recovery period. - congestion_recovery_start_time = now() - ssthresh = congestion_window * kLossReductionFactor - congestion_window = max(ssthresh, kMinimumWindow) - // A packet can be sent to speed up loss recovery. - MaybeSendOnePacket() -¶ -
Invoked when an ACK frame with an ECN section is received from the peer.¶
--ProcessECN(ack, pn_space): - // If the ECN-CE counter reported by the peer has increased, - // this could be a new congestion event. - if (ack.ce_counter > ecn_ce_counters[pn_space]): - ecn_ce_counters[pn_space] = ack.ce_counter - sent_time = sent_packets[ack.largest_acked].time_sent - OnCongestionEvent(sent_time) -¶ -
Invoked when DetectAndRemoveLostPackets deems packets lost.¶
--OnPacketsLost(lost_packets): - // Remove lost packets from bytes_in_flight. - for lost_packet in lost_packets: - assert(lost_packet.in_flight) - bytes_in_flight -= lost_packet.sent_bytes - OnCongestionEvent(lost_packets.largest().time_sent) - - // Reset the congestion window if the loss of these - // packets indicates persistent congestion. - // Only consider packets sent after getting an RTT sample. - if (first_rtt_sample == 0): - return - pc_lost = [] - for lost in lost_packets: - if lost.time_sent > first_rtt_sample: - pc_lost.insert(lost) - if (InPersistentCongestion(pc_lost)): - congestion_window = kMinimumWindow - congestion_recovery_start_time = 0 -¶ -
When Initial or Handshake keys are discarded, packets sent in that space no -longer count toward bytes in flight.¶
-Pseudocode for RemoveFromBytesInFlight follows:¶
--RemoveFromBytesInFlight(discarded_packets): - // Remove any unacknowledged packets from flight. - foreach packet in discarded_packets: - if packet.in_flight: - bytes_in_flight -= packet.sent_bytes -¶ -
Issue and pull request numbers are listed with a leading octothorp.¶
-Editorial changes only.¶
-No changes.¶
-No significant changes.¶
-No significant changes.¶
-No significant changes.¶
-No significant changes.¶
-No significant changes.¶
-No significant changes.¶
-The IETF QUIC Working Group received an enormous amount of support from many -people. The following people provided substantive contributions to this -document:¶
- -Internet-Draft | -Using TLS to Secure QUIC | -December 2020 | -
Thomson & Turner | -Expires 13 June 2021 | -[Page] | -
This document describes how Transport Layer Security (TLS) is used to secure -QUIC.¶
-Discussion of this draft takes place on the QUIC working group mailing list -(quic@ietf.org), which is archived at -https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
-Working Group information can be found at https://github.com/quicwg; source -code and issues list for this draft can be found at -https://github.com/quicwg/base-drafts/labels/-tls.¶
-- This Internet-Draft is submitted in full conformance with the - provisions of BCP 78 and BCP 79.¶
-- Internet-Drafts are working documents of the Internet Engineering Task - Force (IETF). Note that other groups may also distribute working - documents as Internet-Drafts. The list of current Internet-Drafts is - at https://datatracker.ietf.org/drafts/current/.¶
-- Internet-Drafts are draft documents valid for a maximum of six months - and may be updated, replaced, or obsoleted by other documents at any - time. It is inappropriate to use Internet-Drafts as reference - material or to cite them other than as "work in progress."¶
-- This Internet-Draft will expire on 13 June 2021.¶
-- Copyright (c) 2020 IETF Trust and the persons identified as the - document authors. All rights reserved.¶
-- This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of - publication of this document. Please review these documents - carefully, as they describe your rights and restrictions with - respect to this document. Code Components extracted from this - document must include Simplified BSD License text as described in - Section 4.e of the Trust Legal Provisions and are provided without - warranty as described in the Simplified BSD License.¶
-This document describes how QUIC [QUIC-TRANSPORT] is secured using TLS -[TLS13].¶
-TLS 1.3 provides critical latency improvements for connection establishment over -previous versions. Absent packet loss, most new connections can be established -and secured within a single round trip; on subsequent connections between the -same client and server, the client can often send application data immediately, -that is, using a zero round trip setup.¶
-This document describes how TLS acts as a security component of QUIC.¶
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL -NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in BCP 14 [RFC2119] [RFC8174] when, and only when, they -appear in all capitals, as shown here.¶
-This document uses the terminology established in [QUIC-TRANSPORT].¶
-For brevity, the acronym TLS is used to refer to TLS 1.3, though a newer version -could be used; see Section 4.2.¶
-TLS provides two endpoints with a way to establish a means of communication -over an untrusted medium (that is, the Internet). TLS enables authentication of -peers and provides confidentiality and integrity protection for messages that -endpoints exchange.¶
-Internally, TLS is a layered protocol, with the structure shown in -Figure 1.¶
-Each Handshake layer message (e.g., Handshake, Alerts, and Application Data) is -carried as a series of typed TLS records by the Record layer. Records are -individually cryptographically protected and then transmitted over a reliable -transport (typically TCP), which provides sequencing and guaranteed delivery.¶
-The TLS authenticated key exchange occurs between two endpoints: client and -server. The client initiates the exchange and the server responds. If the key -exchange completes successfully, both client and server will agree on a secret. -TLS supports both pre-shared key (PSK) and Diffie-Hellman over either finite -fields or elliptic curves ((EC)DHE) key exchanges. PSK is the basis for Early -Data (0-RTT); the latter provides perfect forward secrecy (PFS) when the (EC)DHE -keys are destroyed.¶
-After completing the TLS handshake, the client will have learned and -authenticated an identity for the server and the server is optionally able to -learn and authenticate an identity for the client. TLS supports X.509 -[RFC5280] certificate-based authentication for both server and client.¶
-The TLS key exchange is resistant to tampering by attackers and it produces -shared secrets that cannot be controlled by either participating peer.¶
-TLS provides two basic handshake modes of interest to QUIC:¶
-A simplified TLS handshake with 0-RTT application data is shown in Figure 2.¶
-Figure 2 omits the EndOfEarlyData message, which is not used in QUIC; see -Section 8.3. Likewise, neither ChangeCipherSpec nor KeyUpdate messages are -used by QUIC. ChangeCipherSpec is redundant in TLS 1.3; see Section 8.4. -QUIC has its own key update mechanism; see Section 6.¶
-Data is protected using a number of encryption levels:¶
-Application Data may appear only in the Early Data and Application Data -levels. Handshake and Alert messages may appear in any level.¶
-The 0-RTT handshake is only possible if the client and server have previously -communicated. In the 1-RTT handshake, the client is unable to send protected -Application Data until it has received all of the Handshake messages sent by the -server.¶
-QUIC [QUIC-TRANSPORT] assumes responsibility for the confidentiality and -integrity protection of packets. For this it uses keys derived from a TLS -handshake [TLS13], but instead of carrying TLS records over QUIC (as with -TCP), TLS Handshake and Alert messages are carried directly over the QUIC -transport, which takes over the responsibilities of the TLS record layer, as -shown in Figure 3.¶
-QUIC also relies on TLS for authentication and negotiation of parameters that -are critical to security and performance.¶
-Rather than a strict layering, these two protocols cooperate: QUIC uses the TLS -handshake; TLS uses the reliability, ordered delivery, and record layer provided -by QUIC.¶
-At a high level, there are two main interactions between the TLS and QUIC -components:¶
-Figure 4 shows these interactions in more detail, with the QUIC packet -protection being called out specially.¶
-Unlike TLS over TCP, QUIC applications that want to send data do not send it -through TLS "application_data" records. Rather, they send it as QUIC STREAM -frames or other frame types, which are then carried in QUIC packets.¶
-QUIC carries TLS handshake data in CRYPTO frames, each of which consists of a -contiguous block of handshake data identified by an offset and length. Those -frames are packaged into QUIC packets and encrypted under the current TLS -encryption level. As with TLS over TCP, once TLS handshake data has been -delivered to QUIC, it is QUIC's responsibility to deliver it reliably. Each -chunk of data that is produced by TLS is associated with the set of keys that -TLS is currently using. If QUIC needs to retransmit that data, it MUST use the -same keys even if TLS has already updated to newer keys.¶
-One important difference between TLS records (used with TCP) and QUIC CRYPTO -frames is that in QUIC multiple frames may appear in the same QUIC packet as -long as they are associated with the same packet number space. For instance, -an endpoint can bundle a Handshake message and an ACK for some Handshake data -into the same packet. Some frames are prohibited in different packet number -spaces; see Section 12.5 of [QUIC-TRANSPORT].¶
-Because packets could be reordered on the wire, QUIC uses the packet type to -indicate which keys were used to protect a given packet, as shown in -Table 1. When packets of different types need to be sent, -endpoints SHOULD use coalesced packets to send them in the same UDP datagram.¶
-Packet Type         | Encryption Keys | PN Space
---------------------|-----------------|-----------------
-Initial             | Initial secrets | Initial
-0-RTT Protected     | 0-RTT           | Application data
-Handshake           | Handshake       | Handshake
-Retry               | Retry           | N/A
-Version Negotiation | N/A             | N/A
-Short Header        | 1-RTT           | Application data
Section 17 of [QUIC-TRANSPORT] shows how packets at the various encryption -levels fit into the handshake process.¶
-As shown in Figure 4, the interface from QUIC to TLS consists of four -primary functions:¶
-Additional functions might be needed to configure TLS.¶
-In this document, the TLS handshake is considered complete when the TLS stack -has reported that the handshake is complete. This happens when the TLS stack -has both sent a Finished message and verified the peer's Finished message. -Verifying the peer's Finished provides the endpoints with an assurance that -previous handshake messages have not been modified. Note that the handshake -does not complete at both endpoints simultaneously. Consequently, any -requirement that is based on the completion of the handshake depends on the -perspective of the endpoint in question.¶
-In this document, the TLS handshake is considered confirmed at the server when -the handshake completes. At the client, the handshake is considered confirmed -when a HANDSHAKE_DONE frame is received.¶
-A client MAY consider the handshake to be confirmed when it receives an -acknowledgement for a 1-RTT packet. This can be implemented by recording the -lowest packet number sent with 1-RTT keys, and comparing it to the Largest -Acknowledged field in any received 1-RTT ACK frame: once the latter is greater -than or equal to the former, the handshake is confirmed.¶
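The packet-number comparison described above can be sketched as follows. This is a minimal illustration, not part of any QUIC API; the class and method names are hypothetical:

```python
class HandshakeConfirmation:
    """Client-side tracking of handshake confirmation via 1-RTT ACKs."""

    def __init__(self):
        self.lowest_1rtt_sent = None  # lowest packet number sent with 1-RTT keys
        self.confirmed = False

    def on_1rtt_packet_sent(self, packet_number: int) -> None:
        # Record only the first (lowest) 1-RTT packet number sent.
        if self.lowest_1rtt_sent is None:
            self.lowest_1rtt_sent = packet_number

    def on_1rtt_ack_received(self, largest_acknowledged: int) -> None:
        # Once any 1-RTT packet is acknowledged, the handshake is confirmed.
        if (self.lowest_1rtt_sent is not None
                and largest_acknowledged >= self.lowest_1rtt_sent):
            self.confirmed = True
```

Because packet numbers increase monotonically, comparing the Largest Acknowledged field against the lowest 1-RTT packet number sent is sufficient to detect that at least one 1-RTT packet was acknowledged.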
-In order to drive the handshake, TLS depends on being able to send and receive -handshake messages. There are two basic functions on this interface: one where -QUIC requests handshake messages and one where QUIC provides bytes that comprise -handshake messages.¶
-Before starting the handshake QUIC provides TLS with the transport parameters -(see Section 8.2) that it wishes to carry.¶
-A QUIC client starts TLS by requesting TLS handshake bytes from TLS. The client -acquires handshake bytes before sending its first packet. A QUIC server starts -the process by providing TLS with the client's handshake bytes.¶
-At any time, the TLS stack at an endpoint will have a current sending -encryption level and receiving encryption level. Encryption levels determine -the packet type and keys that are used for protecting data.¶
-Each encryption level is associated with a different sequence of bytes, which is -reliably transmitted to the peer in CRYPTO frames. When TLS provides handshake -bytes to be sent, they are appended to the handshake bytes for the current -encryption level. The encryption level then determines the type of packet that -the resulting CRYPTO frame is carried in; see Table 1.¶
-Four encryption levels are used, producing keys for Initial, 0-RTT, Handshake, -and 1-RTT packets. CRYPTO frames are carried in just three of these levels, -omitting the 0-RTT level. These four levels correspond to three packet number -spaces: Initial and Handshake encrypted packets use their own separate spaces; -0-RTT and 1-RTT packets use the application data packet number space.¶
-QUIC takes the unprotected content of TLS handshake records as the content of -CRYPTO frames. TLS record protection is not used by QUIC. QUIC assembles -CRYPTO frames into QUIC packets, which are protected using QUIC packet -protection.¶
-QUIC is only capable of conveying TLS handshake records in CRYPTO frames. TLS -alerts are turned into QUIC CONNECTION_CLOSE error codes; see Section 4.8. -TLS application data and other message types cannot be carried by QUIC at any -encryption level; it is an error if they are received from the TLS stack.¶
-When an endpoint receives a QUIC packet containing a CRYPTO frame from the -network, it proceeds as follows:¶
-Each time that TLS is provided with new data, new handshake bytes are requested -from TLS. TLS might not provide any bytes if the handshake messages it has -received are incomplete or it has no data to send.¶
-The content of CRYPTO frames might either be processed incrementally by TLS or -buffered until complete messages or flights are available. TLS is responsible -for buffering handshake bytes that have arrived in order. QUIC is responsible -for buffering handshake bytes that arrive out of order or for encryption levels -that are not yet ready. QUIC does not provide any means of flow control for -CRYPTO frames; see Section 7.5 of [QUIC-TRANSPORT].¶
-Once the TLS handshake is complete, this is indicated to QUIC along with any -final handshake bytes that TLS needs to send. TLS also provides QUIC with the -transport parameters that the peer advertised during the handshake.¶
-Once the handshake is complete, TLS becomes passive. TLS can still receive data
-from its peer and respond in kind, but it will not need to send more data unless
-specifically requested, either by an application or by QUIC. One reason to send
-data is that the server might wish to provide additional or updated session
-tickets to a client.¶
-When the handshake is complete, QUIC only needs to provide TLS with any data -that arrives in CRYPTO streams. In the same way that is done during the -handshake, new data is requested from TLS after providing received data.¶
-As keys at a given encryption level become available to TLS, TLS indicates to
-QUIC that keys for reading or writing at that encryption level are available.¶
-The availability of new keys is always a result of providing inputs to TLS. TLS -only provides new keys after being initialized (by a client) or when provided -with new handshake data.¶
-However, a TLS implementation could perform some of its processing -asynchronously. In particular, the process of validating a certificate can take -some time. While waiting for TLS processing to complete, an endpoint SHOULD -buffer received packets if they might be processed using keys that aren't yet -available. These packets can be processed once keys are provided by TLS. An -endpoint SHOULD continue to respond to packets that can be processed during this -time.¶
-After processing inputs, TLS might produce handshake bytes, keys for new -encryption levels, or both.¶
-TLS provides QUIC with three items as a new encryption level becomes available:¶
-These values are based on the values that TLS negotiates and are used by QUIC to -generate packet and header protection keys; see Section 5 and -Section 5.4.¶
-If 0-RTT is possible, it is ready after the client sends a TLS ClientHello -message or the server receives that message. After providing a QUIC client with -the first handshake bytes, the TLS stack might signal the change to 0-RTT -keys. On the server, after receiving handshake bytes that contain a ClientHello -message, a TLS server might signal that 0-RTT keys are available.¶
-Although TLS only uses one encryption level at a time, QUIC may use more than -one level. For instance, after sending its Finished message (using a CRYPTO -frame at the Handshake encryption level) an endpoint can send STREAM data (in -1-RTT encryption). If the Finished message is lost, the endpoint uses the -Handshake encryption level to retransmit the lost message. Reordering or loss -of packets can mean that QUIC will need to handle packets at multiple encryption -levels. During the handshake, this means potentially handling packets at higher -and lower encryption levels than the current encryption level used by TLS.¶
-In particular, server implementations need to be able to read packets at the -Handshake encryption level at the same time as the 0-RTT encryption level. A -client could interleave ACK frames that are protected with Handshake keys with -0-RTT data and the server needs to process those acknowledgments in order to -detect lost Handshake packets.¶
-QUIC also needs access to keys that might not ordinarily be available to a TLS -implementation. For instance, a client might need to acknowledge Handshake -packets before it is ready to send CRYPTO frames at that encryption level. TLS -therefore needs to provide keys to QUIC before it might produce them for its own -use.¶
-Figure 5 summarizes the exchange between QUIC and TLS for both -client and server. Solid arrows indicate packets that carry handshake data; -dashed arrows show where application data can be sent. Each arrow is tagged -with the encryption level used for that transmission.¶
-Figure 5 shows the multiple packets that form a single "flight" of -messages being processed individually, to show what incoming messages trigger -different actions. New handshake messages are requested after incoming packets -have been processed. This process varies based on the structure of endpoint -implementations and the order in which packets arrive; this is intended to -illustrate the steps involved in a single handshake exchange.¶
-This document describes how TLS 1.3 [TLS13] is used with QUIC.¶
-In practice, the TLS handshake will negotiate a version of TLS to use. This -could result in a newer version of TLS than 1.3 being negotiated if both -endpoints support that version. This is acceptable provided that the features -of TLS 1.3 that are used by QUIC are supported by the newer version.¶
-Clients MUST NOT offer TLS versions older than 1.3. A badly configured TLS -implementation could negotiate TLS 1.2 or another older version of TLS. An -endpoint MUST terminate the connection if a version of TLS older than 1.3 is -negotiated.¶
-The first Initial packet from a client contains the start or all of its first
-cryptographic handshake message, which for TLS is the ClientHello. Servers
-might need to parse the entire ClientHello (e.g., to access extensions such as
-Server Name Indication (SNI) or Application Layer Protocol Negotiation
-(ALPN)) in order to decide whether to accept the new incoming QUIC connection.
-If the ClientHello spans multiple Initial packets, such servers would need to
-buffer the first received fragments, which could consume excessive resources if
-the client's address has not yet been validated. To avoid this, servers MAY
-use the Retry feature (see Section 8.1 of [QUIC-TRANSPORT]) to only buffer
-partial ClientHello messages from clients with a validated address.¶
-The QUIC packet header and framing add at least 36 bytes of overhead to the
-ClientHello message. That overhead increases if the client chooses a source
-connection ID longer than zero bytes. Overheads also do not include the token
-or a destination connection ID longer than 8 bytes, both of which might be
-required if a server sends a Retry packet.¶
-A typical TLS ClientHello can easily fit into a 1200-byte packet. However, in -addition to the overheads added by QUIC, there are several variables that could -cause this limit to be exceeded. Large session tickets, multiple or large key -shares, and long lists of supported ciphers, signature algorithms, versions, -QUIC transport parameters, and other negotiable parameters and extensions could -cause this message to grow.¶
-For servers, in addition to connection IDs and tokens, the size of TLS session -tickets can have an effect on a client's ability to connect efficiently. -Minimizing the size of these values increases the probability that clients can -use them and still fit their ClientHello message in their first Initial packet.¶
-The TLS implementation does not need to ensure that the ClientHello is -sufficiently large. QUIC PADDING frames are added to increase the size of the -packet as necessary.¶
-The requirements for authentication depend on the application protocol that is -in use. TLS provides server authentication and permits the server to request -client authentication.¶
-A client MUST authenticate the identity of the server. This typically involves -verification that the identity of the server is included in a certificate and -that the certificate is issued by a trusted entity (see for example -[RFC2818]).¶
-Where servers provide certificates for authentication, the size of -the certificate chain can consume a large number of bytes. Controlling the -size of certificate chains is critical to performance in QUIC as servers are -limited to sending 3 bytes for every byte received prior to validating the -client address; see Section 8.1 of [QUIC-TRANSPORT]. The size of a -certificate chain can be managed by limiting the number of names or -extensions; using keys with small public key representations, like ECDSA; or -by using certificate compression -[COMPRESS].¶
-A server MAY request that the client authenticate during the handshake. A server -MAY refuse a connection if the client is unable to authenticate when requested. -The requirements for client authentication vary based on application protocol -and deployment.¶
-A server MUST NOT use post-handshake client authentication (as defined in -Section 4.6.2 of [TLS13]), because the multiplexing offered by QUIC prevents -clients from correlating the certificate request with the application-level -event that triggered it (see [HTTP2-TLS13]). -More specifically, servers MUST NOT send post-handshake TLS CertificateRequest -messages and clients MUST treat receipt of such messages as a connection error -of type PROTOCOL_VIOLATION.¶
-QUIC can use the session resumption feature of TLS 1.3. It does this by -carrying NewSessionTicket messages in CRYPTO frames after the handshake is -complete. Session resumption is the basis of 0-RTT, but can be used without -also enabling 0-RTT.¶
-Endpoints that use session resumption might need to remember some information -about the current connection when creating a resumed connection. TLS requires -that some information be retained; see Section 4.6.1 of [TLS13]. QUIC itself -does not depend on any state being retained when resuming a connection, unless -0-RTT is also used; see Section 4.6.1 and Section 7.4.1 of -[QUIC-TRANSPORT]. Application protocols could depend on state that is -retained between resumed connections.¶
-Clients can store any state required for resumption along with the session -ticket. Servers can use the session ticket to help carry state.¶
-Session resumption allows servers to link activity on the original connection -with the resumed connection, which might be a privacy issue for clients. -Clients can choose not to enable resumption to avoid creating this correlation. -Clients SHOULD NOT reuse tickets as that allows entities other than the server -to correlate connections; see Section C.4 of [TLS13].¶
-The 0-RTT feature in QUIC allows a client to send application data before the -handshake is complete. This is made possible by reusing negotiated parameters -from a previous connection. To enable this, 0-RTT depends on the client -remembering critical parameters and providing the server with a TLS session -ticket that allows the server to recover the same information.¶
-This information includes parameters that determine TLS state, as governed by -[TLS13], QUIC transport parameters, the chosen application protocol, and any -information the application protocol might need; see Section 4.6.3. This -information determines how 0-RTT packets and their contents are formed.¶
-To ensure that the same information is available to both endpoints, all -information used to establish 0-RTT comes from the same connection. Endpoints -cannot selectively disregard information that might alter the sending or -processing of 0-RTT.¶
-[TLS13] sets a limit of 7 days on the time between the original connection -and any attempt to use 0-RTT. There are other constraints on 0-RTT usage, -notably those caused by the potential exposure to replay attack; see Section 9.2.¶
-To communicate their willingness to process 0-RTT data, servers send a -NewSessionTicket message that contains the early_data extension with a -max_early_data_size of 0xffffffff. The TLS max_early_data_size parameter is not -used in QUIC. The amount of data that the client can send in 0-RTT is -controlled by the initial_max_data transport parameter supplied by the server.¶
-Servers MUST NOT send the early_data extension with a max_early_data_size field -set to any value other than 0xffffffff. A client MUST treat receipt of a -NewSessionTicket that contains an early_data extension with any other value as -a connection error of type PROTOCOL_VIOLATION.¶
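The client-side check above is simple to state in code. The following is a sketch; the function name is hypothetical and a real implementation would surface the error as a QUIC connection error rather than a Python exception:

```python
def validate_early_data_extension(max_early_data_size: int) -> None:
    """Validate the early_data extension of a received NewSessionTicket.

    QUIC requires max_early_data_size to be exactly 0xffffffff; any other
    value is a connection error of type PROTOCOL_VIOLATION.
    """
    if max_early_data_size != 0xffffffff:
        # Hypothetical error surface; stands in for closing the connection
        # with PROTOCOL_VIOLATION.
        raise ValueError("PROTOCOL_VIOLATION: unexpected max_early_data_size")
```

Note that the actual amount of 0-RTT data a client can send is governed by the initial_max_data transport parameter, not by this TLS field.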
-A client that wishes to send 0-RTT packets uses the early_data extension in -the ClientHello message of a subsequent handshake; see Section 4.2.10 of -[TLS13]. It then sends application data in 0-RTT packets.¶
-A client that attempts 0-RTT might also provide an address validation token if -the server has sent a NEW_TOKEN frame; see Section 8.1 of [QUIC-TRANSPORT].¶
-A server accepts 0-RTT by sending an early_data extension in the -EncryptedExtensions (see Section 4.2.10 of [TLS13]). The server then -processes and acknowledges the 0-RTT packets that it receives.¶
-A server rejects 0-RTT by sending the EncryptedExtensions without an early_data
-extension. A server will always reject 0-RTT if it sends a TLS
-HelloRetryRequest. When rejecting 0-RTT, a server MUST NOT process any 0-RTT
-packets, even if it could. When 0-RTT is rejected, a client SHOULD treat
-receipt of an acknowledgement for a 0-RTT packet as a connection error of type
-PROTOCOL_VIOLATION, if it is able to detect the condition.¶
-When 0-RTT is rejected, all connection characteristics that the client assumed -might be incorrect. This includes the choice of application protocol, transport -parameters, and any application configuration. The client therefore MUST reset -the state of all streams, including application state bound to those streams.¶
-A client MAY reattempt 0-RTT if it receives a Retry or Version Negotiation -packet. These packets do not signify rejection of 0-RTT.¶
-When a server receives a ClientHello with the early_data extension, it has to -decide whether to accept or reject early data from the client. Some of this -decision is made by the TLS stack (e.g., checking that the cipher suite being -resumed was included in the ClientHello; see Section 4.2.10 of [TLS13]). Even -when the TLS stack has no reason to reject early data, the QUIC stack or the -application protocol using QUIC might reject early data because the -configuration of the transport or application associated with the resumed -session is not compatible with the server's current configuration.¶
-QUIC requires additional transport state to be associated with a 0-RTT session -ticket. One common way to implement this is using stateless session tickets and -storing this state in the session ticket. Application protocols that use QUIC -might have similar requirements regarding associating or storing state. This -associated state is used for deciding whether early data must be rejected. For -example, HTTP/3 ([QUIC-HTTP]) settings determine how early data from the -client is interpreted. Other applications using QUIC could have different -requirements for determining whether to accept or reject early data.¶
-The HelloRetryRequest message (see Section 4.1.4 of [TLS13]) can be used to -request that a client provide new information, such as a key share, or to -validate some characteristic of the client. From the perspective of QUIC, -HelloRetryRequest is not differentiated from other cryptographic handshake -messages that are carried in Initial packets. Although it is in principle -possible to use this feature for address verification, QUIC implementations -SHOULD instead use the Retry feature; see Section 8.1 of [QUIC-TRANSPORT].¶
-If TLS experiences an error, it generates an appropriate alert as defined in -Section 6 of [TLS13].¶
-A TLS alert is converted into a QUIC connection error. The alert description is -added to 0x100 to produce a QUIC error code from the range reserved for -CRYPTO_ERROR. The resulting value is sent in a QUIC CONNECTION_CLOSE frame of -type 0x1c.¶
-The alert level of all TLS alerts is "fatal"; a TLS stack MUST NOT generate -alerts at the "warning" level.¶
-QUIC permits the use of a generic code in place of a specific error code; see -Section 11 of [QUIC-TRANSPORT]. For TLS alerts, this includes replacing any -alert with a generic alert, such as handshake_failure (0x128 in QUIC). -Endpoints MAY use a generic error code to avoid possibly exposing confidential -information.¶
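The alert-to-error-code conversion described above is a fixed offset. A minimal sketch (the function and constant names are illustrative, not part of any API):

```python
CRYPTO_ERROR_BASE = 0x100       # base of the QUIC error range reserved for TLS alerts
CONNECTION_CLOSE_TYPE = 0x1c    # CONNECTION_CLOSE frame type used for these errors

def tls_alert_to_quic_error(alert_description: int) -> int:
    """Map a TLS alert description (0..255) onto a QUIC CRYPTO_ERROR code."""
    return CRYPTO_ERROR_BASE + alert_description

# Example: handshake_failure has TLS alert description 40,
# so it becomes QUIC error code 0x128.
```

The resulting code is carried in a CONNECTION_CLOSE frame of type 0x1c; the range reservation means a receiver can recover the original alert description by subtracting 0x100.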
-After QUIC moves to a new encryption level, packet protection keys for previous -encryption levels can be discarded. This occurs several times during the -handshake, as well as when keys are updated; see Section 6.¶
-Packet protection keys are not discarded immediately when new keys are -available. If packets from a lower encryption level contain CRYPTO frames, -frames that retransmit that data MUST be sent at the same encryption level. -Similarly, an endpoint generates acknowledgements for packets at the same -encryption level as the packet being acknowledged. Thus, it is possible that -keys for a lower encryption level are needed for a short time after keys for a -newer encryption level are available.¶
-An endpoint cannot discard keys for a given encryption level until it has both
-received and acknowledged all CRYPTO frames for that encryption level and all
-CRYPTO frames it sent at that encryption level have been acknowledged by its
-peer. However, this does not guarantee that no further packets will need to be
-received or sent at that encryption level because a peer might not have received
-all the acknowledgements necessary to reach the same state.¶
-Though an endpoint might retain older keys, new data MUST be sent at the highest -currently-available encryption level. Only ACK frames and retransmissions of -data in CRYPTO frames are sent at a previous encryption level. These packets -MAY also include PADDING frames.¶
-Packets protected with Initial secrets (Section 5.2) are not -authenticated, meaning that an attacker could spoof packets with the intent to -disrupt a connection. To limit these attacks, Initial packet protection keys -are discarded more aggressively than other keys.¶
-The successful use of Handshake packets indicates that no more Initial packets -need to be exchanged, as these keys can only be produced after receiving all -CRYPTO frames from Initial packets. Thus, a client MUST discard Initial keys -when it first sends a Handshake packet and a server MUST discard Initial keys -when it first successfully processes a Handshake packet. Endpoints MUST NOT -send Initial packets after this point.¶
-This results in abandoning loss recovery state for the Initial encryption level -and ignoring any outstanding Initial packets.¶
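The asymmetry in when Initial keys are discarded (client on first Handshake send, server on first successful Handshake receipt) can be captured in a small state tracker. This is an illustrative sketch only; the class and method names are hypothetical:

```python
class InitialKeys:
    """Tracks when Initial packet protection keys must be discarded.

    A client discards Initial keys when it first sends a Handshake packet;
    a server, when it first successfully processes one.
    """

    def __init__(self, is_client: bool):
        self.is_client = is_client
        self.discarded = False

    def on_handshake_packet_sent(self) -> None:
        # Client rule: discard on first Handshake packet sent.
        if self.is_client:
            self.discarded = True

    def on_handshake_packet_processed(self) -> None:
        # Server rule: discard on first Handshake packet successfully processed.
        if not self.is_client:
            self.discarded = True

    def can_send_initial(self) -> bool:
        # Endpoints MUST NOT send Initial packets after discarding these keys.
        return not self.discarded
```

Discarding the keys also implies abandoning loss recovery state for the Initial packet number space, since no Initial packet can be retransmitted afterward.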
-An endpoint MUST discard its handshake keys when the TLS handshake is confirmed -(Section 4.1.2). The server MUST send a HANDSHAKE_DONE frame as soon -as it completes the handshake.¶
-0-RTT and 1-RTT packets share the same packet number space, and clients do not -send 0-RTT packets after sending a 1-RTT packet (Section 5.6).¶
-Therefore, a client SHOULD discard 0-RTT keys as soon as it installs 1-RTT -keys, since they have no use after that moment.¶
-Additionally, a server MAY discard 0-RTT keys as soon as it receives a 1-RTT -packet. However, due to packet reordering, a 0-RTT packet could arrive after -a 1-RTT packet. Servers MAY temporarily retain 0-RTT keys to allow decrypting -reordered packets without requiring their contents to be retransmitted with -1-RTT keys. After receiving a 1-RTT packet, servers MUST discard 0-RTT keys -within a short time; the RECOMMENDED time period is three times the Probe -Timeout (PTO, see [QUIC-RECOVERY]). A server MAY discard 0-RTT keys earlier -if it determines that it has received all 0-RTT packets, which can be done by -keeping track of missing packet numbers.¶
-As with TLS over TCP, QUIC protects packets with keys derived from the TLS -handshake, using the AEAD algorithm [AEAD] negotiated by TLS.¶
-QUIC packets have varying protections depending on their type:¶
-This section describes how packet protection is applied to Handshake packets, -0-RTT packets, and 1-RTT packets. The same packet protection process is applied -to Initial packets. However, as it is trivial to determine the keys used for -Initial packets, these packets are not considered to have confidentiality or -integrity protection. Retry packets use a fixed key and so similarly lack -confidentiality and integrity protection.¶
-QUIC derives packet protection keys in the same way that TLS derives record -protection keys.¶
-Each encryption level has separate secret values for protection of packets sent -in each direction. These traffic secrets are derived by TLS (see Section 7.1 of -[TLS13]) and are used by QUIC for all encryption levels except the Initial -encryption level. The secrets for the Initial encryption level are computed -based on the client's initial Destination Connection ID, as described in -Section 5.2.¶
-The keys used for packet protection are computed from the TLS secrets using the -KDF provided by TLS. In TLS 1.3, the HKDF-Expand-Label function described in -Section 7.1 of [TLS13] is used, using the hash function from the negotiated -cipher suite. Note that labels, which are described using strings, are encoded -as bytes using ASCII [ASCII] without quotes or any trailing NUL -byte. Other versions of TLS MUST provide a similar function in order to be -used with QUIC.¶
-The current encryption level secret and the label "quic key" are input to the -KDF to produce the AEAD key; the label "quic iv" is used to derive the -Initialization Vector (IV); see Section 5.3. The header protection key uses the -"quic hp" label; see Section 5.4. Using these labels provides key -separation between QUIC and TLS; see Section 9.6.¶
-The KDF used for initial secrets is always the HKDF-Expand-Label function from -TLS 1.3; see Section 5.2.¶
-Initial packets apply the packet protection process, but use a secret derived -from the Destination Connection ID field from the client's first Initial -packet.¶
-This secret is determined by using HKDF-Extract (see Section 2.2 of
-[HKDF]) with a salt of 0xafbfec289993d24c9e9786f19c6111e04390a899
-and an IKM of the Destination Connection ID field. This produces an intermediate
-pseudorandom key (PRK) that is used to derive two separate secrets for sending
-and receiving.¶
-The secret used by clients to construct Initial packets uses the PRK and the -label "client in" as input to the HKDF-Expand-Label function from TLS -[TLS13] to produce a 32-byte secret. Packets constructed by the server use -the same process with the label "server in". The hash function for HKDF when -deriving initial secrets and keys is SHA-256 -[SHA].¶
-This process in pseudocode is:¶
-initial_salt = 0xafbfec289993d24c9e9786f19c6111e04390a899
-initial_secret = HKDF-Extract(initial_salt,
-                              client_dst_connection_id)
-
-client_initial_secret = HKDF-Expand-Label(initial_secret,
-                                          "client in", "",
-                                          Hash.length)
-server_initial_secret = HKDF-Expand-Label(initial_secret,
-                                          "server in", "",
-                                          Hash.length)
-¶
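The pseudocode above can be exercised with a short, self-contained implementation of HKDF-Extract (RFC 5869) and the TLS 1.3 HKDF-Expand-Label construction over SHA-256, using only the Python standard library. The Destination Connection ID value here is an arbitrary example; the "quic key", "quic iv", and "quic hp" derivations assume the AES-128-GCM output sizes (16-byte key, 12-byte IV, 16-byte header protection key):

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # HKDF-Extract (RFC 5869): HMAC keyed by the salt over the IKM.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand_label(secret: bytes, label: bytes,
                      context: bytes, length: int) -> bytes:
    # HkdfLabel structure per RFC 8446, Section 7.1:
    # uint16 length, prefixed label ("tls13 " + Label), prefixed context.
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big")
            + len(full_label).to_bytes(1, "big") + full_label
            + len(context).to_bytes(1, "big") + context)
    # HKDF-Expand (RFC 5869), generic multi-block loop.
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(secret, t + info + bytes([counter]),
                     hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

initial_salt = bytes.fromhex("afbfec289993d24c9e9786f19c6111e04390a899")
client_dcid = bytes.fromhex("8394c8f03e515708")  # example Destination Connection ID

initial_secret = hkdf_extract(initial_salt, client_dcid)
client_initial_secret = hkdf_expand_label(initial_secret, b"client in", b"", 32)
server_initial_secret = hkdf_expand_label(initial_secret, b"server in", b"", 32)

# Per-direction packet protection material (AES-128-GCM sizes assumed):
client_key = hkdf_expand_label(client_initial_secret, b"quic key", b"", 16)
client_iv = hkdf_expand_label(client_initial_secret, b"quic iv", b"", 12)
client_hp = hkdf_expand_label(client_initial_secret, b"quic hp", b"", 16)
```

Because the salt and connection ID fully determine the output, any two endpoints (or an observer) can compute the same Initial secrets, which is why Initial packets are not considered confidentiality protected.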
The connection ID used with HKDF-Expand-Label is the Destination Connection ID -in the Initial packet sent by the client. This will be a randomly-selected -value unless the client creates the Initial packet after receiving a Retry -packet, where the Destination Connection ID is selected by the server.¶
-Future versions of QUIC SHOULD generate a new salt value, thus ensuring that -the keys are different for each version of QUIC. This prevents a middlebox that -recognizes only one version of QUIC from seeing or modifying the contents of -packets from future versions.¶
-The HKDF-Expand-Label function defined in TLS 1.3 MUST be used for Initial -packets even where the TLS versions offered do not include TLS 1.3.¶
-The secrets used for constructing Initial packets change when a server sends a -Retry packet to use the connection ID value selected by the server. The secrets -do not change when a client changes the Destination Connection ID it uses in -response to an Initial packet from the server.¶
-The Destination Connection ID field could be any length up to 20 bytes, -including zero length if the server sends a Retry packet with a zero-length -Source Connection ID field. After a Retry, the Initial keys provide the client -no assurance that the server received its packet, so the client has to rely on -the exchange that included the Retry packet to validate the server address; -see Section 8.1 of [QUIC-TRANSPORT].¶
-Appendix A contains sample Initial packets.¶
-The Authenticated Encryption with Associated Data (AEAD; see [AEAD]) function -used for QUIC packet protection is the AEAD that is negotiated for use with the -TLS connection. For example, if TLS is using the TLS_AES_128_GCM_SHA256 cipher -suite, the AEAD_AES_128_GCM function is used.¶
-QUIC can use any of the cipher suites defined in [TLS13] with the exception -of TLS_AES_128_CCM_8_SHA256. A cipher suite MUST NOT be negotiated unless a -header protection scheme is defined for the cipher suite. This document defines -a header protection scheme for all cipher suites defined in [TLS13] aside -from TLS_AES_128_CCM_8_SHA256. These cipher suites have a 16-byte -authentication tag and produce an output 16 bytes larger than their input.¶
-An endpoint MUST NOT reject a ClientHello that offers a cipher suite that it -does not support, or it would be impossible to deploy a new cipher suite. -This also applies to TLS_AES_128_CCM_8_SHA256.¶
-When constructing packets, the AEAD function is applied prior to applying -header protection; see Section 5.4. The unprotected packet header is part -of the associated data (A). When processing packets, an endpoint first -removes the header protection.¶
-The key and IV for the packet are computed as described in Section 5.1. -The nonce, N, is formed by combining the packet protection IV with the packet -number. The 62 bits of the reconstructed QUIC packet number in network byte -order are left-padded with zeros to the size of the IV. The exclusive OR of the -padded packet number and the IV forms the AEAD nonce.¶
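The nonce construction can be sketched as follows (non-normative; the 12-byte IV length matches the AEADs defined for TLS 1.3):

```python
def aead_nonce(iv: bytes, packet_number: int) -> bytes:
    # Left-pad the reconstructed packet number with zeros to the IV
    # length, then XOR it with the packet protection IV.
    padded_pn = packet_number.to_bytes(len(iv), "big")
    return bytes(i ^ p for i, p in zip(iv, padded_pn))
```

Because the packet number is unique within a packet number space, each packet protected under a given key uses a distinct nonce.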
-The associated data, A, for the AEAD is the contents of the QUIC header, -starting from the first byte of either the short or long header, up to and -including the unprotected packet number.¶
-The input plaintext, P, for the AEAD is the payload of the QUIC packet, as -described in [QUIC-TRANSPORT].¶
-The output ciphertext, C, of the AEAD is transmitted in place of P.¶
-Some AEAD functions have limits for how many packets can be encrypted under the -same key and IV; see Section 6.6. This might be lower than the packet -number limit. An endpoint MUST initiate a key update (Section 6) prior to -exceeding any limit set for the AEAD that is in use.¶
-Parts of QUIC packet headers, in particular the Packet Number field, are -protected using a key that is derived separately from the packet protection key -and IV. The key derived using the "quic hp" label is used to provide -confidentiality protection for those fields that are not exposed to on-path -elements.¶
-This protection applies to the least significant bits of the first byte, plus -the Packet Number field. The four least significant bits of the first byte are -protected for packets with long headers; the five least significant bits of the -first byte are protected for packets with short headers. For both header forms, -this covers the reserved bits and the Packet Number Length field; the Key Phase -bit is also protected for packets with a short header.¶
-The same header protection key is used for the duration of the connection, with -the value not changing after a key update (see Section 6). This allows -header protection to be used to protect the key phase.¶
-This process does not apply to Retry or Version Negotiation packets, which do -not contain a protected payload or any of the fields that are protected by this -process.¶
-Header protection is applied after packet protection is applied (see Section 5.3). -The ciphertext of the packet is sampled and used as input to an encryption -algorithm. The algorithm used depends on the negotiated AEAD.¶
-The output of this algorithm is a 5-byte mask that is applied to the protected -header fields using exclusive OR. The least significant bits of the first byte -of the packet are masked by the least significant bits of the first mask byte, -and the packet number is masked with the remaining bytes. Any trailing bytes of -the mask that are not needed because of a shorter packet number encoding are -discarded.¶
-Figure 6 shows a sample algorithm for applying header protection. Removing -header protection only differs in the order in which the packet number length -(pn_length) is determined.¶
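The masking step can be illustrated in Python. This is a non-normative sketch of the Figure 6 logic; the only difference between applying and removing protection is whether pn_length is read before or after the first byte is masked.

```python
def _mask_first_byte(packet: bytearray, mask: bytes) -> None:
    if packet[0] & 0x80:      # long header: 4 least significant bits
        packet[0] ^= mask[0] & 0x0F
    else:                     # short header: 5 least significant bits
        packet[0] ^= mask[0] & 0x1F

def protect(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    # Applying protection: pn_length is read from the (still
    # unprotected) first byte before masking it.
    pn_length = (packet[0] & 0x03) + 1
    _mask_first_byte(packet, mask)
    for i in range(pn_length):
        packet[pn_offset + i] ^= mask[1 + i]

def unprotect(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    # Removing protection: the first byte is unmasked first, and only
    # then can pn_length be read from it.
    _mask_first_byte(packet, mask)
    pn_length = (packet[0] & 0x03) + 1
    for i in range(pn_length):
        packet[pn_offset + i] ^= mask[1 + i]
```

The form bit (0x80) is never masked, so checking it before or after masking gives the same result; only the ordering of the pn_length read differs between the two directions.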
-Specific header protection functions are defined based on the selected cipher -suite; see Section 5.4.3 and Section 5.4.4.¶
-Figure 7 shows an example long header packet (Initial) and a short header -packet (1-RTT), indicating the fields in each header that are covered by -header protection and the portion of the protected packet payload that is -sampled.¶
-Before a TLS cipher suite can be used with QUIC, a header protection algorithm -MUST be specified for the AEAD used with that cipher suite. This document -defines algorithms for AEAD_AES_128_GCM, AEAD_AES_128_CCM, AEAD_AES_256_GCM (all -these AES AEADs are defined in [AEAD]), and AEAD_CHACHA20_POLY1305 -(defined in [CHACHA]). Prior to TLS selecting a cipher suite, AES -header protection is used (Section 5.4.3), matching the AEAD_AES_128_GCM packet -protection.¶
-The header protection algorithm uses both the header protection key and a sample -of the ciphertext from the packet Payload field.¶
-The same number of bytes are always sampled, but an allowance needs to be made -for the endpoint removing protection, which will not know the length of the -Packet Number field. In sampling the packet ciphertext, the Packet Number field -is assumed to be 4 bytes long (its maximum possible encoded length).¶
-An endpoint MUST discard packets that are not long enough to contain a complete -sample.¶
-To ensure that sufficient data is available for sampling, packets are padded so -that the combined lengths of the encoded packet number and protected payload are -at least 4 bytes longer than the sample required for header protection. The -cipher suites defined in [TLS13] - other than TLS_AES_128_CCM_8_SHA256, for -which a header protection scheme is not defined in this document - have 16-byte -expansions and 16-byte header protection samples. This results in needing at -least 3 bytes of frames in the unprotected payload if the packet number is -encoded on a single byte, or 2 bytes of frames for a 2-byte packet number -encoding.¶
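The padding arithmetic above can be checked with a small helper (non-normative; the defaults assume the 16-byte sample and 16-byte AEAD expansion described here):

```python
def min_frame_bytes(pn_encoded_len: int,
                    sample_len: int = 16,
                    aead_expansion: int = 16) -> int:
    # Requirement: pn_encoded_len + frames + aead_expansion
    #              >= sample_len + 4
    return max(0, sample_len + 4 - aead_expansion - pn_encoded_len)
```

With a 1-byte packet number this yields 3 bytes of frames, with a 2-byte packet number 2 bytes, matching the text.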
-The sampled ciphertext for a packet with a short header can be determined by the -following pseudocode:¶
-sample_offset = 1 + len(connection_id) + 4

sample = packet[sample_offset..sample_offset+sample_length]
¶
For example, for a packet with a short header, an 8-byte connection ID, and -protected with AEAD_AES_128_GCM, the sample takes bytes 13 to 28 inclusive -(using zero-based indexing).¶
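A non-normative sketch of short-header sampling, reproducing this worked example (8-byte connection ID, AEAD_AES_128_GCM's 16-byte sample):

```python
def short_header_sample(packet: bytes, connection_id_len: int,
                        sample_length: int = 16) -> bytes:
    # The Packet Number field is assumed to be its maximum encoded
    # length of 4 bytes, regardless of its actual length.
    sample_offset = 1 + connection_id_len + 4
    return packet[sample_offset:sample_offset + sample_length]
```

For an 8-byte connection ID, the offset is 1 + 8 + 4 = 13, so the sample covers bytes 13 through 28 inclusive.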
-A packet with a long header is sampled in the same way, noting that multiple -QUIC packets might be included in the same UDP datagram and that each one is -handled separately.¶
-sample_offset = 7 + len(destination_connection_id) +
                    len(source_connection_id) +
                    len(payload_length) + 4
if packet_type == Initial:
    sample_offset += len(token_length) +
                     len(token)

sample = packet[sample_offset..sample_offset+sample_length]
¶
This section defines the header protection algorithm for AEAD_AES_128_GCM, -AEAD_AES_128_CCM, and AEAD_AES_256_GCM. AEAD_AES_128_GCM and AEAD_AES_128_CCM -use 128-bit AES in electronic codebook (ECB) mode. AEAD_AES_256_GCM uses -256-bit AES in ECB mode. AES is defined in [AES].¶
-This algorithm samples 16 bytes from the packet ciphertext. This value is used -as the input to AES-ECB. In pseudocode, the header protection function is -defined as:¶
-header_protection(hp_key, sample):
  mask = AES-ECB(hp_key, sample)
¶
When AEAD_CHACHA20_POLY1305 is in use, header protection uses the raw ChaCha20 -function as defined in Section 2.4 of [CHACHA]. This uses a 256-bit key and -16 bytes sampled from the packet protection output.¶
-The first 4 bytes of the sampled ciphertext are the block counter. A ChaCha20 -implementation could take a 32-bit integer in place of a byte sequence, in -which case the byte sequence is interpreted as a little-endian value.¶
-The remaining 12 bytes are used as the nonce. A ChaCha20 implementation might -take an array of three 32-bit integers in place of a byte sequence, in which -case the nonce bytes are interpreted as a sequence of 32-bit little-endian -integers.¶
-The encryption mask is produced by invoking ChaCha20 to protect 5 zero bytes. In -pseudocode, the header protection function is defined as:¶
-header_protection(hp_key, sample):
  counter = sample[0..3]
  nonce = sample[4..15]
  mask = ChaCha20(hp_key, counter, nonce, {0,0,0,0,0})
¶
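The byte interpretation can be sketched as follows (non-normative; this shows only the little-endian conversions of the sampled bytes, not ChaCha20 itself):

```python
def chacha20_hp_inputs(sample: bytes):
    # First 4 sampled bytes: the block counter, interpreted as a
    # little-endian 32-bit integer. Remaining 12 bytes: the nonce.
    counter = int.from_bytes(sample[0:4], "little")
    nonce = sample[4:16]
    return counter, nonce
```

An implementation whose ChaCha20 API takes three 32-bit nonce words would likewise split the 12 nonce bytes into little-endian integers.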
Once an endpoint successfully receives a packet with a given packet number, it -MUST discard all packets in the same packet number space with higher packet -numbers if they cannot be successfully unprotected with either the same key or -- if there is a key update - the next packet protection key (see -Section 6). Similarly, a packet that appears to trigger a key update but -cannot be unprotected successfully MUST be discarded.¶
-Failure to unprotect a packet does not necessarily indicate the existence of a -protocol error in a peer or an attack. The truncated packet number encoding -used in QUIC can cause packet numbers to be decoded incorrectly if they are -delayed significantly.¶
-If 0-RTT keys are available (see Section 4.6.1), the lack of replay protection -means that restrictions on their use are necessary to avoid replay attacks on -the protocol.¶
-A client MUST only use 0-RTT keys to protect data that is idempotent. A client -MAY wish to apply additional restrictions on what data it sends prior to the -completion of the TLS handshake. A client otherwise treats 0-RTT keys as -equivalent to 1-RTT keys, except that it MUST NOT send ACKs with 0-RTT keys.¶
-A client that receives an indication that its 0-RTT data has been accepted by a -server can send 0-RTT data until it receives all of the server's handshake -messages. A client SHOULD stop sending 0-RTT data if it receives an indication -that 0-RTT data has been rejected.¶
-A server MUST NOT use 0-RTT keys to protect packets; it uses 1-RTT keys to -protect acknowledgements of 0-RTT packets. A client MUST NOT attempt to -decrypt 0-RTT packets it receives and instead MUST discard them.¶
-Once a client has installed 1-RTT keys, it MUST NOT send any more 0-RTT -packets.¶
-0-RTT data can be acknowledged by the server as it receives it, but any -packets containing acknowledgments of 0-RTT data cannot have packet protection -removed by the client until the TLS handshake is complete. The 1-RTT keys -necessary to remove packet protection cannot be derived until the client -receives all server handshake messages.¶
-Due to reordering and loss, protected packets might be received by an endpoint -before the final TLS handshake messages are received. A client will be unable -to decrypt 1-RTT packets from the server, whereas a server will be able to -decrypt 1-RTT packets from the client. Endpoints in either role MUST NOT -decrypt 1-RTT packets from their peer prior to completing the handshake.¶
-Even though 1-RTT keys are available to a server after receiving the first -handshake messages from a client, it is missing assurances on the client state: -the client is not authenticated, unless the server has chosen to use a -pre-shared key and validated the client's pre-shared key binder, and the client -has not demonstrated liveness, unless the server has used a Retry packet or -another mechanism to validate the client's address.¶
-Therefore, the server's use of 1-RTT keys before the handshake is complete is -limited to sending data. A server MUST NOT process incoming 1-RTT protected -packets before the TLS handshake is complete. Because sending acknowledgments -indicates that all frames in a packet have been processed, a server cannot send -acknowledgments for 1-RTT packets until the TLS handshake is complete. Received -packets protected with 1-RTT keys MAY be stored and later decrypted and used -once the handshake is complete.¶
-TLS implementations might provide all 1-RTT secrets prior to handshake -completion. Even where QUIC implementations have 1-RTT read keys, those keys -cannot be used prior to completing the handshake.¶
-The requirement for the server to wait for the client Finished message creates -a dependency on that message being delivered. A client can avoid the -potential for head-of-line blocking that this implies by sending its 1-RTT -packets coalesced with a Handshake packet containing a copy of the CRYPTO frame -that carries the Finished message, until one of the Handshake packets is -acknowledged. This enables immediate server processing for those packets.¶
-A server could receive packets protected with 0-RTT keys prior to receiving a -TLS ClientHello. The server MAY retain these packets for later decryption in -anticipation of receiving a ClientHello.¶
-A client generally receives 1-RTT keys at the same time as the handshake -completes. Even if it has 1-RTT secrets, a client MUST NOT process -incoming 1-RTT protected packets before the TLS handshake is complete.¶
-Retry packets (see the Retry Packet section of [QUIC-TRANSPORT]) carry a -Retry Integrity Tag that provides two properties: it allows discarding -packets that have accidentally been corrupted by the network, and it diminishes -off-path attackers' ability to send valid Retry packets.¶
-The Retry Integrity Tag is a 128-bit field that is computed as the output of -AEAD_AES_128_GCM ([AEAD]) used with the following inputs: the secret key, K, -and the nonce, N, derived as described below; an empty plaintext, P; and the -contents of the Retry Pseudo-Packet as the associated data, A.¶
-The secret key and the nonce are values derived by calling HKDF-Expand-Label -using 0x8b0d37eb8535022ebc8d76a207d80df22646ec06dc809642c30a8baa2baaff4c as the -secret, with labels being "quic key" and "quic iv" (Section 5.1).¶
-The Retry Pseudo-Packet is not sent over the wire. It is computed by taking -the transmitted Retry packet, removing the Retry Integrity Tag and prepending -the two following fields:¶
-The ODCID Length field contains the length in bytes of the Original -Destination Connection ID field that follows it, encoded as an 8-bit unsigned -integer.¶
-The Original Destination Connection ID contains the value of the Destination -Connection ID from the Initial packet that this Retry is in response to. The -length of this field is given in ODCID Length. The presence of this field -mitigates an off-path attacker's ability to inject a Retry packet.¶
-Once the handshake is confirmed (see Section 4.1.2), an endpoint MAY -initiate a key update.¶
-The Key Phase bit indicates which packet protection keys are used to protect the -packet. The Key Phase bit is initially set to 0 for the first set of 1-RTT -packets and toggled to signal each subsequent key update.¶
-The Key Phase bit allows a recipient to detect a change in keying material -without needing to receive the first packet that triggered the change. An -endpoint that notices a changed Key Phase bit updates keys and decrypts the -packet that contains the changed value.¶
-This mechanism replaces the TLS KeyUpdate message. Endpoints MUST NOT send a -TLS KeyUpdate message. Endpoints MUST treat the receipt of a TLS KeyUpdate -message as a connection error of type 0x10a, equivalent to a fatal TLS alert of -unexpected_message (see Section 4.8).¶
-Figure 9 shows a key update process, where the initial set of keys used -(identified with @M) are replaced by updated keys (identified with @N). The -value of the Key Phase bit is indicated in brackets [].¶
-Endpoints maintain separate read and write secrets for packet protection. An -endpoint initiates a key update by updating its packet protection write secret -and using that to protect new packets. The endpoint creates a new write secret -from the existing write secret as performed in Section 7.2 of [TLS13]. This -uses the KDF function provided by TLS with a label of "quic ku". The -corresponding key and IV are created from that secret as defined in -Section 5.1. The header protection key is not updated.¶
-For example, to update write keys with TLS 1.3, HKDF-Expand-Label is used as:¶
-secret_<n+1> = HKDF-Expand-Label(secret_<n>, "quic ku",
                                 "", Hash.length)
¶
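A non-normative Python sketch of this update for SHA-256-based suites (the HkdfLabel encoding is repeated here so the example stands alone):

```python
import hashlib
import hmac

def hkdf_expand_label(secret: bytes, label: bytes,
                      context: bytes, length: int) -> bytes:
    # HkdfLabel from TLS 1.3; one HMAC block suffices for <= 32 bytes.
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big")
            + len(full_label).to_bytes(1, "big") + full_label
            + len(context).to_bytes(1, "big") + context)
    return hmac.new(secret, info + b"\x01", hashlib.sha256).digest()[:length]

def next_1rtt_secret(secret: bytes) -> bytes:
    # secret_<n+1> = HKDF-Expand-Label(secret_<n>, "quic ku",
    #                                  "", Hash.length)
    return hkdf_expand_label(secret, b"quic ku", b"", len(secret))
```

Because the derivation is a one-way function of the previous secret, an endpoint can compute the next secret in advance and discard old secrets without losing the ability to keep updating.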
The endpoint toggles the value of the Key Phase bit and uses the updated key and -IV to protect all subsequent packets.¶
-An endpoint MUST NOT initiate a key update prior to having confirmed the -handshake (Section 4.1.2). An endpoint MUST NOT initiate a subsequent -key update unless it has received an acknowledgment for a packet that was sent -protected with keys from the current key phase. This ensures that keys are -available to both peers before another key update can be initiated. This can be -implemented by tracking the lowest packet number sent with each key phase, and -the highest acknowledged packet number in the 1-RTT space: once the latter is -higher than or equal to the former, another key update can be initiated.¶
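The tracking described above amounts to a simple comparison; the function below is an illustrative sketch, not spec text, and the parameter names are invented for the example:

```python
def can_initiate_key_update(handshake_confirmed: bool,
                            lowest_pn_sent_current_phase: int,
                            highest_acked_pn: int) -> bool:
    # Another update is permitted only once the handshake is confirmed
    # and a packet protected with the current key phase has been
    # acknowledged.
    return (handshake_confirmed
            and highest_acked_pn >= lowest_pn_sent_current_phase)
```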
-Keys of packets other than the 1-RTT packets are never updated; their keys are -derived solely from the TLS handshake state.¶
-The endpoint that initiates a key update also updates the keys that it uses for -receiving packets. These keys will be needed to process packets the peer sends -after updating.¶
-An endpoint MUST retain old keys until it has successfully unprotected a packet -sent using the new keys. An endpoint SHOULD retain old keys for some time -after unprotecting a packet sent using the new keys. Discarding old keys too -early can cause delayed packets to be discarded. Discarding packets will be -interpreted as packet loss by the peer and could adversely affect performance.¶
-A peer is permitted to initiate a key update after receiving an acknowledgement -of a packet in the current key phase. An endpoint detects a key update when -processing a packet with a key phase that differs from the value used to protect -the last packet it sent. To process this packet, the endpoint uses the next -packet protection key and IV. See Section 6.3 for considerations -about generating these keys.¶
-If a packet is successfully processed using the next key and IV, then the peer -has initiated a key update. The endpoint MUST update its send keys to the -corresponding key phase in response, as described in Section 6.1. -Sending keys MUST be updated before sending an acknowledgement for the packet -that was received with updated keys. By acknowledging the packet that triggered -the key update in a packet protected with the updated keys, the endpoint signals -that the key update is complete.¶
-An endpoint can defer sending the packet or acknowledgement according to its -normal packet sending behavior; it is not necessary to immediately generate a -packet in response to a key update. The next packet sent by the endpoint will -use the updated keys. The next packet that contains an acknowledgement will -cause the key update to be completed. If an endpoint detects a second update -before it has sent any packets with updated keys containing an -acknowledgement for the packet that initiated the key update, this indicates that -its peer has updated keys twice without awaiting confirmation. An endpoint MAY -treat such consecutive key updates as a connection error of type KEY_UPDATE_ERROR.¶
-An endpoint that receives an acknowledgement that is carried in a packet -protected with old keys where any acknowledged packet was protected with newer -keys MAY treat that as a connection error of type KEY_UPDATE_ERROR. This -indicates that a peer has received and acknowledged a packet that initiates a -key update, but has not updated keys in response.¶
-Endpoints responding to an apparent key update MUST NOT generate a timing -side-channel signal that might indicate that the Key Phase bit was invalid (see -Section 9.4). Endpoints can use dummy packet protection keys in -place of discarded keys when key updates are not yet permitted. Using dummy -keys will generate no variation in the timing signal produced by attempting to -remove packet protection, and results in all packets with an invalid Key Phase -bit being rejected.¶
-The process of creating new packet protection keys for receiving packets could -reveal that a key update has occurred. An endpoint MAY perform this process as -part of packet processing, but this creates a timing signal that can be used by -an attacker to learn when key updates happen and thus the value of the Key Phase -bit in certain packets. Endpoints MAY instead defer the creation of the next -set of receive packet protection keys until some time after a key update -completes, up to three times the PTO; see Section 6.5.¶
-Once generated, the next set of packet protection keys SHOULD be retained, even -if the packet that was received was subsequently discarded. Packets containing -apparent key updates are easy to forge and - while the process of key update -does not require significant effort - triggering this process could be used by -an attacker for DoS.¶
-For this reason, endpoints MUST be able to retain two sets of packet protection -keys for receiving packets: the current and the next. Retaining the previous -keys in addition to these might improve performance, but this is not essential.¶
-An endpoint never sends packets that are protected with old keys. Only the -current keys are used. Keys used for protecting packets can be discarded -immediately after switching to newer keys.¶
-Packets with higher packet numbers MUST be protected with either the same or -newer packet protection keys than packets with lower packet numbers. An -endpoint that successfully removes protection with old keys when newer keys were -used for packets with lower packet numbers MUST treat this as a connection error -of type KEY_UPDATE_ERROR.¶
-For receiving packets during a key update, packets protected with older keys -might arrive if they were delayed by the network. Retaining old packet -protection keys allows these packets to be successfully processed.¶
-As packets protected with keys from the next key phase use the same Key Phase -value as those protected with keys from the previous key phase, it can be -necessary to distinguish between the two. This can be done using packet -numbers. A recovered packet number that is lower than any packet number from -the current key phase uses the previous packet protection keys; a recovered -packet number that is higher than any packet number from the current key phase -requires the use of the next packet protection keys.¶
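Selecting keys on a Key Phase mismatch can be sketched as follows (non-normative; current_low and current_high stand for the lowest and highest packet numbers seen in the current phase):

```python
def keys_for_phase_mismatch(pn: int,
                            current_low: int,
                            current_high: int) -> str:
    # A recovered packet number below the current phase selects the
    # previous keys; one above it selects the next keys.
    if pn < current_low:
        return "previous"
    if pn > current_high:
        return "next"
    # A mismatched Key Phase bit within the current range cannot be
    # attributed to either set of keys.
    return "invalid"
```

In a real implementation this selection must be done in constant time to avoid the side channel discussed below.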
-Some care is necessary to ensure that any process for selecting between -previous, current, and next packet protection keys does not expose a timing side -channel that might reveal which keys were used to remove packet protection. See -Section 9.5 for more information.¶
-Alternatively, endpoints can retain only two sets of packet protection keys, -swapping previous for next after enough time has passed to allow for reordering -in the network. In this case, the Key Phase bit alone can be used to select -keys.¶
-An endpoint MAY allow a period of approximately the Probe Timeout (PTO; see -[QUIC-RECOVERY]) after receiving a packet that uses the new key generation -before it creates the next set of packet protection keys. These updated keys -MAY replace the previous keys at that time. With the caveat that PTO is a -subjective measure - that is, a peer could have a different view of the RTT - -this time is expected to be long enough that any reordered packets would be -declared lost by a peer even if they were acknowledged and short enough to -allow for subsequent key updates.¶
-Endpoints need to allow for the possibility that a peer might not be able to -decrypt packets that initiate a key update during the period when it retains old -keys. Endpoints SHOULD wait three times the PTO before initiating a key update -after receiving an acknowledgment that confirms that the previous key update was -received. Failing to allow sufficient time could lead to packets being -discarded.¶
-An endpoint SHOULD retain old read keys for no more than three times the PTO -after having received a packet protected using the new keys. After this period, -old read keys and their corresponding secrets SHOULD be discarded.¶
-This document sets usage limits for AEAD algorithms to ensure that overuse does -not give an adversary a disproportionate advantage in attacking the -confidentiality and integrity of communications when using QUIC.¶
-The usage limits defined in TLS 1.3 exist for protection against attacks -on confidentiality and apply to successful applications of AEAD protection. The -integrity protections in authenticated encryption also depend on limiting the -number of attempts to forge packets. TLS achieves this by closing connections -after any record fails an authentication check. In comparison, QUIC ignores any -packet that cannot be authenticated, allowing multiple forgery attempts.¶
-QUIC accounts for AEAD confidentiality and integrity limits separately. The -confidentiality limit applies to the number of packets encrypted with a given -key. The integrity limit applies to the number of packets decrypted within a -given connection. Details on enforcing these limits for each AEAD algorithm -follow below.¶
-Endpoints MUST count the number of encrypted packets for each set of keys. If -the total number of encrypted packets with the same key exceeds the -confidentiality limit for the selected AEAD, the endpoint MUST stop using those -keys. Endpoints MUST initiate a key update before sending more protected packets -than the confidentiality limit for the selected AEAD permits. If a key update -is not possible or integrity limits are reached, the endpoint MUST stop using -the connection and only send stateless resets in response to receiving packets. -It is RECOMMENDED that endpoints immediately close the connection with a -connection error of type AEAD_LIMIT_REACHED before reaching a state where key -updates are not possible.¶
-For AEAD_AES_128_GCM and AEAD_AES_256_GCM, the confidentiality limit is 2^23 -encrypted packets; see Appendix B.1. For AEAD_CHACHA20_POLY1305, the -confidentiality limit is greater than the number of possible packets (2^62) and -so can be disregarded. For AEAD_AES_128_CCM, the confidentiality limit is 2^21.5 -encrypted packets; see Appendix B.2. Applying a limit reduces the probability -that an attacker can distinguish the AEAD in use from a random permutation; see -[AEBounds], [ROBUST], and [GCM-MU].¶
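A non-normative sketch of enforcing the confidentiality limit; the table values come from this section, and the function and table names are illustrative only:

```python
# Confidentiality limits per AEAD, as given in this section.
CONFIDENTIALITY_LIMIT = {
    "AEAD_AES_128_GCM": 2**23,
    "AEAD_AES_256_GCM": 2**23,
    "AEAD_AES_128_CCM": int(2**21.5),
    "AEAD_CHACHA20_POLY1305": 2**62,  # exceeds the packet-number space
}

def must_update_keys(aead: str, packets_encrypted: int) -> bool:
    # True once the count of packets encrypted under the current keys
    # reaches the limit, i.e. before one more packet may be sent.
    return packets_encrypted >= CONFIDENTIALITY_LIMIT[aead]
```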
-In addition to counting packets sent, endpoints MUST count the number of -received packets that fail authentication during the lifetime of a connection. -If the total number of received packets that fail authentication within the -connection, across all keys, exceeds the integrity limit for the selected AEAD, -the endpoint MUST immediately close the connection with a connection error of -type AEAD_LIMIT_REACHED and not process any more packets.¶
-For AEAD_AES_128_GCM and AEAD_AES_256_GCM, the integrity limit is 2^52 invalid -packets; see Appendix B.1. For AEAD_CHACHA20_POLY1305, the integrity limit is -2^36 invalid packets; see [AEBounds]. For AEAD_AES_128_CCM, the integrity -limit is 2^21.5 invalid packets; see Appendix B.2. Applying this limit reduces -the probability that an attacker can successfully forge a packet; see -[AEBounds], [ROBUST], and [GCM-MU].¶
-Endpoints that limit the size of packets MAY use higher confidentiality and -integrity limits; see Appendix B for details.¶
-Future analyses and specifications MAY relax confidentiality or integrity limits -for an AEAD.¶
-These limits were originally calculated using assumptions about the -limits on TLS record size. The maximum size of a TLS record is 2^14 bytes. -In comparison, QUIC packets can be up to 2^16 bytes. However, it is -expected that QUIC packets will generally be smaller than TLS records. -Where packets might be larger than 2^14 bytes in length, smaller limits might -be needed.¶
-Any TLS cipher suite that is specified for use with QUIC MUST define limits on -the use of the associated AEAD function that preserve margins for -confidentiality and integrity. That is, limits MUST be specified for the number -of packets that can be authenticated and for the number of packets that can fail -authentication. Providing a reference to any analysis upon which values are -based - and any assumptions used in that analysis - allows limits to be adapted -to varying usage conditions.¶
-The KEY_UPDATE_ERROR error code (0xe) is used to signal errors related to key -updates.¶
-Initial packets are not protected with a secret key, so they are subject to -potential tampering by an attacker. QUIC provides protection against attackers -that cannot read packets, but does not attempt to provide additional protection -against attacks where the attacker can observe and inject packets. Some forms -of tampering -- such as modifying the TLS messages themselves -- are detectable, -but some -- such as modifying ACKs -- are not.¶
-For example, an attacker could inject a packet containing an ACK frame that -makes it appear that a packet had not been received or to create a false -impression of the state of the connection (e.g., by modifying the ACK Delay). -Note that such a packet could cause a legitimate packet to be dropped as a -duplicate. Implementations SHOULD use caution in relying on any data that is -contained in Initial packets that is not otherwise authenticated.¶
-It is also possible for the attacker to tamper with data that is carried in -Handshake packets, but because that tampering requires modifying TLS handshake -messages, that tampering will cause the TLS handshake to fail.¶
-Certain aspects of the TLS handshake are different when used with QUIC.¶
-QUIC also requires additional features from TLS. In addition to negotiation of -cryptographic parameters, the TLS handshake carries and authenticates values for -QUIC transport parameters.¶
-QUIC requires that the cryptographic handshake provide authenticated protocol -negotiation. TLS uses Application Layer Protocol Negotiation -([ALPN]) to select an application protocol. Unless another mechanism -is used for agreeing on an application protocol, endpoints MUST use ALPN for -this purpose.¶
-When using ALPN, endpoints MUST immediately close a connection (see Section -10.2 of [QUIC-TRANSPORT]) with a no_application_protocol TLS alert (QUIC error -code 0x178; see Section 4.8) if an application protocol is not negotiated. -While [ALPN] only specifies that servers use this alert, QUIC clients MUST -use error 0x178 to terminate a connection when ALPN negotiation fails.¶
-An application protocol MAY restrict the QUIC versions that it can operate over. -Servers MUST select an application protocol compatible with the QUIC version -that the client has selected. The server MUST treat the inability to select a -compatible application protocol as a connection error of type 0x178 -(no_application_protocol). Similarly, a client MUST treat the selection of an -incompatible application protocol by a server as a connection error of type -0x178.¶
-QUIC transport parameters are carried in a TLS extension. Different versions of -QUIC might define a different method for negotiating transport configuration.¶
-Including transport parameters in the TLS handshake provides integrity -protection for these values.¶
-enum {
   quic_transport_parameters(0xffa5), (65535)
} ExtensionType;
¶
The extension_data field of the quic_transport_parameters extension contains a -value that is defined by the version of QUIC that is in use.¶
-The quic_transport_parameters extension is carried in the ClientHello and the -EncryptedExtensions messages during the handshake. Endpoints MUST send the -quic_transport_parameters extension; endpoints that receive ClientHello or -EncryptedExtensions messages without the quic_transport_parameters extension -MUST close the connection with an error of type 0x16d (equivalent to a fatal TLS -missing_extension alert, see Section 4.8).¶
-While the transport parameters are technically available prior to the completion -of the handshake, they cannot be fully trusted until the handshake completes, -and reliance on them should be minimized. However, any tampering with the -parameters will cause the handshake to fail.¶
-Endpoints MUST NOT send this extension in a TLS connection that does not use -QUIC (such as the use of TLS with TCP defined in [TLS13]). A fatal -unsupported_extension alert MUST be sent by an implementation that supports this -extension if the extension is received when the transport is not QUIC.¶
-The TLS EndOfEarlyData message is not used with QUIC. QUIC does not rely on -this message to mark the end of 0-RTT data or to signal the change to Handshake -keys.¶
-Clients MUST NOT send the EndOfEarlyData message. A server MUST treat receipt -of a CRYPTO frame in a 0-RTT packet as a connection error of type -PROTOCOL_VIOLATION.¶
-As a result, EndOfEarlyData does not appear in the TLS handshake transcript.¶
-Appendix D.4 of [TLS13] describes an alteration to the TLS 1.3 handshake as -a workaround for bugs in some middleboxes. The TLS 1.3 middlebox compatibility -mode involves setting the legacy_session_id field to a 32-byte value in the -ClientHello and ServerHello, then sending a change_cipher_spec record. Both -field and record carry no semantic content and are ignored.¶
-This mode has no use in QUIC as it only applies to middleboxes that interfere -with TLS over TCP. QUIC also provides no means to carry a change_cipher_spec -record. A client MUST NOT request the use of the TLS 1.3 compatibility mode. A -server SHOULD treat the receipt of a TLS ClientHello with a non-empty -legacy_session_id field as a connection error of type PROTOCOL_VIOLATION.¶
-All of the security considerations that apply to TLS also apply to the use of -TLS in QUIC. Reading all of [TLS13] and its appendices is the best way to -gain an understanding of the security properties of QUIC.¶
-This section summarizes some of the more important security aspects specific to -the TLS integration, though there are many security-relevant details in the -remainder of the document.¶
-Use of TLS session tickets allows servers and possibly other entities to -correlate connections made by the same client; see Section 4.5 for details.¶
-As described in Section 8 of [TLS13], use of TLS early data comes with an -exposure to replay attack. The use of 0-RTT in QUIC is similarly vulnerable to -replay attack.¶
-Endpoints MUST implement and use the replay protections described in [TLS13]; however, it is recognized that these protections are imperfect. Therefore, additional consideration of the risk of replay is needed.¶
-QUIC is not vulnerable to replay attack, except via the application protocol -information it might carry. The management of QUIC protocol state based on the -frame types defined in [QUIC-TRANSPORT] is not vulnerable to replay. -Processing of QUIC frames is idempotent and cannot result in invalid connection -states if frames are replayed, reordered or lost. QUIC connections do not -produce effects that last beyond the lifetime of the connection, except for -those produced by the application protocol that QUIC serves.¶
-TLS session tickets and address validation tokens are used to carry QUIC configuration information between connections; specifically, they enable a server to efficiently recover state that is used in connection establishment and address validation. These MUST NOT be used to communicate application semantics between endpoints; clients MUST treat them as opaque values. The potential for reuse of these tokens means that they require stronger protections against replay.¶
-A server that accepts 0-RTT on a connection incurs a higher cost than accepting -a connection without 0-RTT. This includes higher processing and computation -costs. Servers need to consider the probability of replay and all associated -costs when accepting 0-RTT.¶
-Ultimately, the responsibility for managing the risks of replay attacks with -0-RTT lies with an application protocol. An application protocol that uses QUIC -MUST describe how the protocol uses 0-RTT and the measures that are employed to -protect against replay attack. An analysis of replay risk needs to consider -all QUIC protocol features that carry application semantics.¶
-Disabling 0-RTT entirely is the most effective defense against replay attack.¶
-QUIC extensions MUST describe how replay attacks affect their operation, or -prohibit their use in 0-RTT. Application protocols MUST either prohibit the use -of extensions that carry application semantics in 0-RTT or provide replay -mitigation strategies.¶
-A small ClientHello that results in a large block of handshake messages from a -server can be used in packet reflection attacks to amplify the traffic generated -by an attacker.¶
-QUIC includes three defenses against this attack. First, the packet containing a -ClientHello MUST be padded to a minimum size. Second, if responding to an -unverified source address, the server is forbidden to send more than three times -as many bytes as the number of bytes it has received (see Section 8.1 of -[QUIC-TRANSPORT]). Finally, because acknowledgements of Handshake packets are -authenticated, a blind attacker cannot forge them. Put together, these defenses -limit the level of amplification.¶
-[NAN] analyzes authenticated encryption algorithms that provide nonce privacy, referred to as "Hide Nonce" (HN) transforms. The general header protection construction in this document is one of those algorithms (HN1). Header protection uses the output of the packet protection AEAD to derive sample, and then encrypts the header field using a pseudorandom function (PRF) as follows:¶
-protected_field = field XOR PRF(hp_key, sample) -¶ -
The header protection variants in this document use a pseudorandom permutation -(PRP) in place of a generic PRF. However, since all PRPs are also PRFs [IMC], -these variants do not deviate from the HN1 construction.¶
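The XOR transform above can be sketched in a few lines. This is an illustrative stand-in only: it substitutes HMAC-SHA256 for the PRF, whereas QUIC's actual header protection variants use AES-ECB or ChaCha20; the point is the shape of the construction and that applying the same mask twice removes protection.

```python
import hmac
import hashlib

def prf(hp_key: bytes, sample: bytes, n: int) -> bytes:
    # Stand-in PRF for illustration; QUIC's real variants use AES-ECB
    # or ChaCha20, not HMAC-SHA256. Output truncated to n bytes.
    return hmac.new(hp_key, sample, hashlib.sha256).digest()[:n]

def protect(field: bytes, hp_key: bytes, sample: bytes) -> bytes:
    # protected_field = field XOR PRF(hp_key, sample)
    # Works for fields up to 32 bytes with this stand-in PRF.
    mask = prf(hp_key, sample, len(field))
    return bytes(f ^ m for f, m in zip(field, mask))

field = bytes.fromhex("449e00000002")
hp_key, sample = bytes(16), bytes(16)
masked = protect(field, hp_key, sample)
# XOR is an involution: the same operation removes protection.
assert protect(masked, hp_key, sample) == field
```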
-As hp_key is distinct from the packet protection key, it follows that header protection achieves AE2 security as defined in [NAN] and therefore guarantees privacy of field, the protected packet header. Future header protection variants based on this construction MUST use a PRF to ensure equivalent security guarantees.¶
Use of the same key and ciphertext sample more than once risks compromising -header protection. Protecting two different headers with the same key and -ciphertext sample reveals the exclusive OR of the protected fields. Assuming -that the AEAD acts as a PRF, if L bits are sampled, the odds of two ciphertext -samples being identical approach 2^(-L/2), that is, the birthday bound. For the -algorithms described in this document, that probability is one in 2^64.¶
-To prevent an attacker from modifying packet headers, the header is transitively -authenticated using packet protection; the entire packet header is part of the -authenticated additional data. Protected fields that are falsified or modified -can only be detected once the packet protection is removed.¶
-An attacker could guess values for packet numbers or Key Phase and have an endpoint confirm guesses through timing side channels. Similarly, guesses for the packet number length can be tried and exposed. If the recipient of a packet discards packets with duplicate packet numbers without attempting to remove packet protection, it could reveal through timing side channels that the packet number matches a received packet. For authentication to be free from side channels, the entire process of header protection removal, packet number recovery, and packet protection removal MUST be applied together without timing and other side channels.¶
-For the sending of packets, construction and protection of packet payloads and -packet numbers MUST be free from side-channels that would reveal the packet -number or its encoded size.¶
-During a key update, the time taken to generate new keys could reveal through -timing side-channels that a key update has occurred. Alternatively, where an -attacker injects packets this side-channel could reveal the value of the Key -Phase on injected packets. After receiving a key update, an endpoint SHOULD -generate and save the next set of receive packet protection keys, as described -in Section 6.3. By generating new keys before a key update is -received, receipt of packets will not create timing signals that leak the value -of the Key Phase.¶
-This defense depends on not performing this key generation during packet processing, and it can require that endpoints maintain three sets of packet protection keys for receiving: for the previous key phase, for the current key phase, and for the next key phase. Endpoints can instead choose to defer generation of the next receive packet protection keys until they discard old keys, so that only two sets of receive keys need to be retained at any point in time.¶
-In using TLS, the central key schedule of TLS is used. As a result of the TLS -handshake messages being integrated into the calculation of secrets, the -inclusion of the QUIC transport parameters extension ensures that handshake and -1-RTT keys are not the same as those that might be produced by a server running -TLS over TCP. To avoid the possibility of cross-protocol key synchronization, -additional measures are provided to improve key separation.¶
-The QUIC packet protection keys and IVs are derived using a different label than -the equivalent keys in TLS.¶
-To preserve this separation, a new version of QUIC SHOULD define new labels for -key derivation for packet protection key and IV, plus the header protection -keys. This version of QUIC uses the string "quic". Other versions can use a -version-specific label in place of that string.¶
-The initial secrets use a key that is specific to the negotiated QUIC version. -New QUIC versions SHOULD define a new salt value used in calculating initial -secrets.¶
-QUIC depends on endpoints being able to generate secure random numbers, both -directly for protocol values such as the connection ID, and transitively via -TLS. See [RFC4086] for guidance on secure random number generation.¶
-This document registers the quic_transport_parameters extension found in -Section 8.2 in the TLS ExtensionType Values Registry -[TLS-REGISTRIES].¶
-The Recommended column is to be marked Yes. The TLS 1.3 Column is to include CH -and EE.¶
-This section shows examples of packet protection so that implementations can be -verified incrementally. Samples of Initial packets from both client and server, -plus a Retry packet are defined. These packets use an 8-byte client-chosen -Destination Connection ID of 0x8394c8f03e515708. Some intermediate values are -included. All values are shown in hexadecimal.¶
-The labels generated during the execution of the HKDF-Expand-Label function and -given to the HKDF-Expand function in order to produce its output are:¶
-client in: 00200f746c73313320636c69656e7420696e00¶
-server in: 00200f746c7331332073657276657220696e00¶
-quic key: 00100e746c7331332071756963206b657900¶
-quic iv: 000c0d746c733133207175696320697600¶
-quic hp: 00100d746c733133207175696320687000¶
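These byte strings follow the HkdfLabel encoding from TLS 1.3: a two-byte output length, a length-prefixed label (the "tls13 " prefix plus the label name), and a length-prefixed context, which is empty here. A minimal sketch that reproduces the values above:

```python
import struct

def hkdf_label(label: str, context: bytes, length: int) -> bytes:
    # HkdfLabel from TLS 1.3: uint16 output length, then a one-byte
    # length-prefixed label ("tls13 " + Label), then a one-byte
    # length-prefixed context.
    full = b"tls13 " + label.encode("ascii")
    return (struct.pack("!H", length) + bytes([len(full)]) + full
            + bytes([len(context)]) + context)

assert hkdf_label("client in", b"", 32).hex() == \
    "00200f746c73313320636c69656e7420696e00"
assert hkdf_label("quic key", b"", 16).hex() == \
    "00100e746c7331332071756963206b657900"
assert hkdf_label("quic hp", b"", 16).hex() == \
    "00100d746c733133207175696320687000"
```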
-The initial secret is common:¶
--initial_secret = HKDF-Extract(initial_salt, cid) - = 1e7e7764529715b1e0ddc8e9753c6157 - 6769605187793ed366f8bbf8c9e986eb -¶ -
The secrets for protecting client packets are:¶
--client_initial_secret - = HKDF-Expand-Label(initial_secret, "client in", _, 32) - = 0088119288f1d866733ceeed15ff9d50 - 902cf82952eee27e9d4d4918ea371d87 - -key = HKDF-Expand-Label(client_initial_secret, "quic key", _, 16) - = 175257a31eb09dea9366d8bb79ad80ba - -iv = HKDF-Expand-Label(client_initial_secret, "quic iv", _, 12) - = 6b26114b9cba2b63a9e8dd4f - -hp = HKDF-Expand-Label(client_initial_secret, "quic hp", _, 16) - = 9ddd12c994c0698b89374a9c077a3077 -¶ -
The secrets for protecting server packets are:¶
--server_initial_secret - = HKDF-Expand-Label(initial_secret, "server in", _, 32) - = 006f881359244dd9ad1acf85f595bad6 - 7c13f9f5586f5e64e1acae1d9ea8f616 - -key = HKDF-Expand-Label(server_initial_secret, "quic key", _, 16) - = 149d0b1662ab871fbe63c49b5e655a5d - -iv = HKDF-Expand-Label(server_initial_secret, "quic iv", _, 12) - = bab2b12a4c76016ace47856d - -hp = HKDF-Expand-Label(server_initial_secret, "quic hp", _, 16) - = c0c499a65a60024a18a250974ea01dfa -¶ -
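With SHA-256 and outputs of at most 32 bytes, HKDF-Expand reduces to a single HMAC invocation, so HKDF-Expand-Label with an empty context can be sketched with only the Python standard library. The following reproduces the client secrets above from the given initial secret:

```python
import hmac
import hashlib
import struct

def hkdf_expand_label(secret: bytes, label: str, length: int) -> bytes:
    # HKDF-Expand-Label with an empty context, as used for QUIC secrets.
    full = b"tls13 " + label.encode("ascii")
    info = struct.pack("!H", length) + bytes([len(full)]) + full + b"\x00"
    # Single-block HKDF-Expand: T(1) = HMAC(PRK, info || 0x01),
    # sufficient because length <= 32 bytes with SHA-256.
    return hmac.new(secret, info + b"\x01", hashlib.sha256).digest()[:length]

initial_secret = bytes.fromhex(
    "1e7e7764529715b1e0ddc8e9753c61576769605187793ed366f8bbf8c9e986eb")
client = hkdf_expand_label(initial_secret, "client in", 32)
assert client.hex() == ("0088119288f1d866733ceeed15ff9d50"
                        "902cf82952eee27e9d4d4918ea371d87")
assert hkdf_expand_label(client, "quic key", 16).hex() == \
    "175257a31eb09dea9366d8bb79ad80ba"
assert hkdf_expand_label(client, "quic iv", 12).hex() == \
    "6b26114b9cba2b63a9e8dd4f"
```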
The client sends an Initial packet. The unprotected payload of this packet -contains the following CRYPTO frame, plus enough PADDING frames to make a 1162 -byte payload:¶
--060040f1010000ed0303ebf8fa56f129 39b9584a3896472ec40bb863cfd3e868 -04fe3a47f06a2b69484c000004130113 02010000c000000010000e00000b6578 -616d706c652e636f6dff01000100000a 00080006001d00170018001000070005 -04616c706e0005000501000000000033 00260024001d00209370b2c9caa47fba -baf4559fedba753de171fa71f50f1ce1 5d43e994ec74d748002b000302030400 -0d0010000e0403050306030203080408 050806002d00020101001c00024001ff -a500320408ffffffffffffffff050480 00ffff07048000ffff08011001048000 -75300901100f088394c8f03e51570806 048000ffff -¶ -
The unprotected header includes the connection ID and a 4-byte packet number -encoding for a packet number of 2:¶
--c3ff000020088394c8f03e5157080000449e00000002 -¶ -
-Protecting the payload produces output that is sampled for header protection. Because the header uses a 4-byte packet number encoding, the first 16 bytes of the protected payload are sampled, then applied to the header:¶
--sample = fb66bc6a93032b50dd8973972d149421 - -mask = AES-ECB(hp, sample)[0..4] - = 1e9cdb9909 - -header[0] ^= mask[0] & 0x0f - = cd -header[18..21] ^= mask[1..4] - = 9cdb990b -header = cdff000020088394c8f03e5157080000449e9cdb990b -¶ -
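The masking step shown above is plain XOR. A sketch that reproduces the protected header from the given mask (computing the mask itself requires AES-ECB, which is omitted here; the mask value is taken from the example above):

```python
def apply_hp_mask(header: bytes, mask: bytes,
                  pn_offset: int, pn_len: int) -> bytes:
    out = bytearray(header)
    # Long header: the low 4 bits of the first byte are protected.
    out[0] ^= mask[0] & 0x0f
    # The packet number bytes are XORed with the following mask bytes.
    for i in range(pn_len):
        out[pn_offset + i] ^= mask[1 + i]
    return bytes(out)

header = bytes.fromhex("c3ff000020088394c8f03e5157080000449e00000002")
mask = bytes.fromhex("1e9cdb9909")
protected = apply_hp_mask(header, mask, pn_offset=18, pn_len=4)
assert protected.hex() == "cdff000020088394c8f03e5157080000449e9cdb990b"
```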
The resulting protected packet is:¶
--cdff000020088394c8f03e5157080000 449e9cdb990bfb66bc6a93032b50dd89 -73972d149421874d3849e3708d71354e a33bcdc356f3ea6e2a1a1bd7c3d14003 -8d3e784d04c30a2cdb40c32523aba2da fe1c1bf3d27a6be38fe38ae033fbb071 -3c1c73661bb6639795b42b97f77068ea d51f11fbf9489af2501d09481e6c64d4 -b8551cd3cea70d830ce2aeeec789ef55 1a7fbe36b3f7e1549a9f8d8e153b3fac -3fb7b7812c9ed7c20b4be190ebd89956 26e7f0fc887925ec6f0606c5d36aa81b -ebb7aacdc4a31bb5f23d55faef5c5190 5783384f375a43235b5c742c78ab1bae -0a188b75efbde6b3774ed61282f9670a 9dea19e1566103ce675ab4e21081fb58 -60340a1e88e4f10e39eae25cd685b109 29636d4f02e7fad2a5a458249f5c0298 -a6d53acbe41a7fc83fa7cc01973f7a74 d1237a51974e097636b6203997f921d0 -7bc1940a6f2d0de9f5a11432946159ed 6cc21df65c4ddd1115f86427259a196c -7148b25b6478b0dc7766e1c4d1b1f515 9f90eabc61636226244642ee148b464c -9e619ee50a5e3ddc836227cad938987c 4ea3c1fa7c75bbf88d89e9ada642b2b8 -8fe8107b7ea375b1b64889a4e9e5c38a 1c896ce275a5658d250e2d76e1ed3a34 -ce7e3a3f383d0c996d0bed106c2899ca 6fc263ef0455e74bb6ac1640ea7bfedc -59f03fee0e1725ea150ff4d69a7660c5 542119c71de270ae7c3ecfd1af2c4ce5 -51986949cc34a66b3e216bfe18b347e6 c05fd050f85912db303a8f054ec23e38 -f44d1c725ab641ae929fecc8e3cefa56 19df4231f5b4c009fa0c0bbc60bc75f7 -6d06ef154fc8577077d9d6a1d2bd9bf0 81dc783ece60111bea7da9e5a9748069 -d078b2bef48de04cabe3755b197d52b3 2046949ecaa310274b4aac0d008b1948 -c1082cdfe2083e386d4fd84c0ed0666d 3ee26c4515c4fee73433ac703b690a9f -7bf278a77486ace44c489a0c7ac8dfe4 d1a58fb3a730b993ff0f0d61b4d89557 -831eb4c752ffd39c10f6b9f46d8db278 da624fd800e4af85548a294c1518893a -8778c4f6d6d73c93df200960104e062b 388ea97dcf4016bced7f62b4f062cb6c -04c20693d9a0e3b74ba8fe74cc012378 84f40d765ae56a51688d985cf0ceaef4 -3045ed8c3f0c33bced08537f6882613a cd3b08d665fce9dd8aa73171e2d3771a -61dba2790e491d413d93d987e2745af2 9418e428be34941485c93447520ffe23 -1da2304d6a0fd5d07d08372202369661 59bef3cf904d722324dd852513df39ae -030d8173908da6364786d3c1bfcb19ea 77a63b25f1e7fc661def480c5d00d444 -56269ebd84efd8e3a8b2c257eec76060 
682848cbf5194bc99e49ee75e4d0d254 -bad4bfd74970c30e44b65511d4ad0e6e c7398e08e01307eeeea14e46ccd87cf3 -6b285221254d8fc6a6765c524ded0085 dca5bd688ddf722e2c0faf9d0fb2ce7a -0c3f2cee19ca0ffba461ca8dc5d2c817 8b0762cf67135558494d2a96f1a139f0 -edb42d2af89a9c9122b07acbc29e5e72 2df8615c343702491098478a389c9872 -a10b0c9875125e257c7bfdf27eef4060 bd3d00f4c14fd3e3496c38d3c5d1a566 -8c39350effbc2d16ca17be4ce29f02ed 969504dda2a8c6b9ff919e693ee79e09 -089316e7d1d89ec099db3b2b268725d8 88536a4b8bf9aee8fb43e82a4d919d48 -b5a464ca5b62df3be35ee0d0a2ec68f3 -¶ -
The server sends the following payload in response, including an ACK frame, a -CRYPTO frame, and no PADDING frames:¶
--02000000000600405a020000560303ee fce7f7b37ba1d1632e96677825ddf739 -88cfc79825df566dc5430b9a045a1200 130100002e00330024001d00209d3c94 -0d89690b84d08a60993c144eca684d10 81287c834d5311bcf32bb9da1a002b00 -020304 -¶ -
The header from the server includes a new connection ID and a 2-byte packet -number encoding for a packet number of 1:¶
--c1ff0000200008f067a5502a4262b50040750001 -¶ -
As a result, after protection, the header protection sample is taken starting -from the third protected octet:¶
--sample = 823a5d24534d906ce4c76782a2167e34 -mask = abaaf34fdc -header = c7ff0000200008f067a5502a4262b5004075fb12 -¶ -
The final protected packet is then:¶
--c7ff0000200008f067a5502a4262b500 4075fb12ff07823a5d24534d906ce4c7 -6782a2167e3479c0f7f6395dc2c91676 302fe6d70bb7cbeb117b4ddb7d173498 -44fd61dae200b8338e1b932976b61d91 e64a02e9e0ee72e3a6f63aba4ceeeec5 -be2f24f2d86027572943533846caa13e 6f163fb257473d0eda5047360fd4a47e -fd8142fafc0f76 -¶ -
This shows a Retry packet that might be sent in response to the Initial packet -in Appendix A.2. The integrity check includes the client-chosen -connection ID value of 0x8394c8f03e515708, but that value is not -included in the final Retry packet:¶
--ffff0000200008f067a5502a4262b574 6f6b656e59756519dd6cc85bd90e33a9 -34d2ff85 -¶ -
This example shows some of the steps required to protect a packet with -a short header. This example uses AEAD_CHACHA20_POLY1305.¶
-In this example, TLS produces an application write secret from which a server -uses HKDF-Expand-Label to produce four values: a key, an IV, a header -protection key, and the secret that will be used after keys are updated (this -last value is not used further in this example).¶
--secret - = 9ac312a7f877468ebe69422748ad00a1 - 5443f18203a07d6060f688f30f21632b - -key = HKDF-Expand-Label(secret, "quic key", _, 32) - = c6d98ff3441c3fe1b2182094f69caa2e - d4b716b65488960a7a984979fb23e1c8 - -iv = HKDF-Expand-Label(secret, "quic iv", _, 12) - = e0459b3474bdd0e44a41c144 - -hp = HKDF-Expand-Label(secret, "quic hp", _, 32) - = 25a282b9e82f06f21f488917a4fc8f1b - 73573685608597d0efcb076b0ab7a7a4 - -ku = HKDF-Expand-Label(secret, "quic ku", _, 32) - = 1223504755036d556342ee9361d25342 - 1a826c9ecdf3c7148684b36b714881f9 -¶ -
The following shows the steps involved in protecting a minimal packet with an -empty Destination Connection ID. This packet contains a single PING frame (that -is, a payload of just 0x01) and has a packet number of 654360564. In this -example, using a packet number of length 3 (that is, 49140 is encoded) avoids -having to pad the payload of the packet; PADDING frames would be needed if the -packet number is encoded on fewer octets.¶
--pn = 654360564 (decimal) -nonce = e0459b3474bdd0e46d417eb0 -unprotected header = 4200bff4 -payload plaintext = 01 -payload ciphertext = 655e5cd55c41f69080575d7999c25a5bfb -¶ -
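The nonce shown above is formed by left-padding the packet number to the IV length and XORing it with the IV; a minimal sketch:

```python
def aead_nonce(iv: bytes, packet_number: int) -> bytes:
    # Left-pad the packet number to the IV length, then XOR with the IV.
    pn = packet_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, pn))

iv = bytes.fromhex("e0459b3474bdd0e44a41c144")
assert aead_nonce(iv, 654360564).hex() == "e0459b3474bdd0e46d417eb0"
```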
The resulting ciphertext is the minimum size possible. One byte is skipped to -produce the sample for header protection.¶
--sample = 5e5cd55c41f69080575d7999c25a5bfb -mask = aefefe7d03 -header = 4cfe4189 -¶ -
The protected packet is the smallest possible QUIC packet: 21 bytes.¶
--packet = 4cfe4189655e5cd55c41f69080575d7999c25a5bfb -¶ -
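For short headers, the first-byte mask covers the low five bits (0x1f) rather than the four bits used for long headers. A sketch reproducing the protected header above from the given mask:

```python
def mask_short_header(header: bytes, mask: bytes, pn_len: int) -> bytes:
    out = bytearray(header)
    # Short header: the low 5 bits of the first byte are protected,
    # rather than the low 4 bits used for long headers.
    out[0] ^= mask[0] & 0x1f
    # The trailing packet number bytes are XORed with mask bytes 1..pn_len.
    for i in range(pn_len):
        out[len(header) - pn_len + i] ^= mask[1 + i]
    return bytes(out)

header = bytes.fromhex("4200bff4")  # first byte plus a 3-byte packet number
mask = bytes.fromhex("aefefe7d03")
assert mask_short_header(header, mask, pn_len=3).hex() == "4cfe4189"
```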
This section documents analyses used in deriving AEAD algorithm limits for -AEAD_AES_128_GCM, AEAD_AES_128_CCM, and AEAD_AES_256_GCM. The analyses that -follow use symbols for multiplication (*), division (/), and exponentiation (^), -plus parentheses for establishing precedence. The following symbols are also -used:¶
-t: The size of the authentication tag in bits. For these ciphers, t is 128.¶
-n: The size of the block function in bits. For these ciphers, n is 128.¶
-k: The size of the key in bits. This is 128 for AEAD_AES_128_GCM and AEAD_AES_128_CCM; 256 for AEAD_AES_256_GCM.¶
-l: The number of blocks in each packet (see below).¶
-q: The number of genuine packets created and protected by endpoints. This value is the bound on the number of packets that can be protected before updating keys.¶
-v: The number of forged packets that endpoints will accept. This value is the bound on the number of forged packets that an endpoint can reject before updating keys.¶
-o: The number of offline ideal cipher queries made by an adversary.¶
-The analyses that follow rely on a count of the number of block operations involved in producing each message. This analysis is performed for packets of size up to 2^11 (l = 2^7) and 2^16 (l = 2^12). A size of 2^11 is expected to be a limit that matches common deployment patterns, whereas 2^16 is the maximum possible size of a QUIC packet. Only endpoints that strictly limit packet size can use the larger confidentiality and integrity limits that are derived using the smaller packet size.¶
-For AEAD_AES_128_GCM and AEAD_AES_256_GCM, the message length (l) is the length -of the associated data in blocks plus the length of the plaintext in blocks.¶
-For AEAD_AES_128_CCM, the total number of block cipher operations is the sum of: the length of the associated data in blocks, the length of the ciphertext in blocks, the length of the plaintext in blocks, plus 1. In this analysis, this is simplified to a value of twice the length of the packet in blocks (that is, 2l = 2^8 for packets that are limited to 2^11 bytes, or 2l = 2^13 otherwise). This simplification is based on the packet containing all of the associated data and ciphertext. This results in a 1 to 3 block overestimation of the number of operations per packet.¶
[GCM-MU] specifies concrete bounds for AEAD_AES_128_GCM and AEAD_AES_256_GCM as used in TLS 1.3 and QUIC. This section documents this analysis using several simplifying assumptions:¶
-The bounds in [GCM-MU] are tighter and more complete than those used in -[AEBounds], which allows for larger limits than those described in -[TLS13].¶
-For confidentiality, Theorem (4.3) in [GCM-MU] establishes that, for a single user that does not repeat nonces, the dominant term in determining the distinguishing advantage between a real and random AEAD algorithm gained by an attacker is:¶
--2 * (q * l)^2 / 2^n -¶ -
For a target advantage of 2^-57, this results in the relation:¶
--q <= 2^35 / l -¶ -
Thus, endpoints that do not send packets larger than 2^11 bytes cannot protect more than 2^28 packets in a single connection without causing an attacker to gain a larger advantage than the target of 2^-57. The limit for endpoints that allow for the packet size to be as large as 2^16 is instead 2^23.¶
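The arithmetic behind these limits can be checked directly by solving the advantage bound for q * l:

```python
# Solve 2 * (q * l)^2 / 2^n <= 2^-57 for n = 128: (q * l)^2 <= 2^70.
n, adv = 128, -57
ql_bits = (n + adv - 1) / 2
assert ql_bits == 35            # q * l <= 2^35
assert ql_bits - 7 == 28        # l = 2^7  (2^11-byte packets): q <= 2^28
assert ql_bits - 12 == 23       # l = 2^12 (2^16-byte packets): q <= 2^23
```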
-For integrity, Theorem (4.3) in [GCM-MU] establishes that an attacker gains -an advantage in successfully forging a packet of no more than:¶
--(1 / 2^(8 * n)) + ((2 * v) / 2^(2 * n)) - + ((2 * o * v) / 2^(k + n)) + (n * (v + (v * l)) / 2^k) -¶ -
The goal is to limit this advantage to 2^-57. For AEAD_AES_128_GCM, the fourth -term in this inequality dominates the rest, so the others can be removed without -significant effect on the result. This produces the following approximation:¶
--v <= 2^64 / l -¶ -
Endpoints that do not attempt to remove protection from packets larger than 2^11 -bytes can attempt to remove protection from at most 2^57 packets. Endpoints that -do not restrict the size of processed packets can attempt to remove protection -from at most 2^52 packets.¶
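The same arithmetic applies to the integrity bound, approximating n * (v + v * l) as 2^7 * v * l:

```python
# Dominant term: n * (v + v * l) / 2^k <= 2^-57 with n = k = 128.
# Since n = 2^7 and v + v * l is approximately v * l, this gives
# v * l <= 2^(128 - 57 - 7).
v_bits = 128 - 57 - 7
assert v_bits == 64             # v <= 2^64 / l
assert v_bits - 7 == 57         # l = 2^7:  v <= 2^57
assert v_bits - 12 == 52        # l = 2^12: v <= 2^52
```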
-For AEAD_AES_256_GCM, the same term dominates, but the larger value of k -produces the following approximation:¶
--v <= 2^192 / l -¶ -
This is substantially larger than the limit for AEAD_AES_128_GCM. However, this -document recommends that the same limit be applied to both functions as either -limit is acceptably large.¶
-TLS [TLS13] and [AEBounds] do not specify limits on usage -for AEAD_AES_128_CCM. However, any AEAD that is used with QUIC requires limits -on use that ensure that both confidentiality and integrity are preserved. This -section documents that analysis.¶
-[CCM-ANALYSIS] is used as the basis of this -analysis. The results of that analysis are used to derive usage limits that are -based on those chosen in [TLS13].¶
-For confidentiality, Theorem 2 in [CCM-ANALYSIS] establishes that an attacker -gains a distinguishing advantage over an ideal pseudorandom permutation (PRP) of -no more than:¶
--(2l * q)^2 / 2^n -¶ -
The integrity limit in Theorem 1 in [CCM-ANALYSIS] provides an attacker a -strictly higher advantage for the same number of messages. As the targets for -the confidentiality advantage and the integrity advantage are the same, only -Theorem 1 needs to be considered.¶
-Theorem 1 establishes that an attacker gains an advantage over an -ideal PRP of no more than:¶
--v / 2^t + (2l * (v + q))^2 / 2^n -¶ -
As t and n are both 128, the first term is negligible relative to the second, so it can be removed without a significant effect on the result.¶
This produces a relation that combines both encryption and decryption attempts -with the same limit as that produced by the theorem for confidentiality alone. -For a target advantage of 2^-57, this results in:¶
--v + q <= 2^34.5 / l -¶ -
By setting q = v, values for both confidentiality and integrity limits can be produced. Endpoints that limit packets to 2^11 bytes therefore have both confidentiality and integrity limits of 2^26.5 packets. Endpoints that do not restrict packet size have a limit of 2^21.5.¶
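These CCM limits follow from solving (2l * (v + q))^2 / 2^n = 2^-57 for n = 128 and splitting the budget evenly between q and v:

```python
# Solve (2l * (v + q))^2 / 2^n <= 2^-57 for n = 128:
# 2l * (v + q) <= 2^35.5, so l * (v + q) <= 2^34.5.
vq_bits = (128 - 57) / 2 - 1
assert vq_bits == 34.5
# With q = v, the budget halves again: l * v <= 2^33.5.
assert vq_bits - 1 - 7 == 26.5     # l = 2^7:  q = v <= 2^26.5
assert vq_bits - 1 - 12 == 21.5    # l = 2^12: q = v <= 2^21.5
```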
Issue and pull request numbers are listed with a leading octothorp.¶
-Changes to integration of the TLS handshake (#829, #1018, #1094, #1165, #1190, -#1233, #1242, #1252, #1450)¶
-No significant changes.¶
-No significant changes.¶
-The IETF QUIC Working Group received an enormous amount of support from many -people. The following people provided substantive contributions to this -document:¶
-奥 一穂 (Kazuho Oku)¶
-Mikkel Fahnøe Jørgensen¶
-Internet-Draft | -QUIC Transport Protocol | -December 2020 | -
Iyengar & Thomson | -Expires 13 June 2021 | -[Page] | -
This document defines the core of the QUIC transport protocol. QUIC provides -applications with flow-controlled streams for structured communication, -low-latency connection establishment, and network path migration. QUIC includes -security measures that ensure confidentiality, integrity, and availability in a -range of deployment circumstances. Accompanying documents describe the -integration of TLS for key negotiation, loss detection, and an exemplary -congestion control algorithm.¶
-Discussion of this draft takes place on the QUIC working group mailing list (quic@ietf.org), which is archived at https://mailarchive.ietf.org/arch/search/?email_list=quic.¶
-Working Group information can be found at https://github.com/quicwg; source -code and issues list for this draft can be found at -https://github.com/quicwg/base-drafts/labels/-transport.¶
-QUIC is a secure general-purpose transport protocol. This -document defines version 1 of QUIC, which conforms to the version-independent -properties of QUIC defined in [QUIC-INVARIANTS].¶
-QUIC is a connection-oriented protocol that creates a stateful interaction -between a client and server.¶
-The QUIC handshake combines negotiation of cryptographic and transport parameters. QUIC integrates the TLS handshake [TLS13], although using a customized framing for protecting packets. The integration of TLS and QUIC is described in more detail in [QUIC-TLS]. The handshake is structured to permit the exchange of application data as soon as possible. This includes an option for clients to send data immediately (0-RTT), which might require prior communication to enable.¶
-Endpoints communicate in QUIC by exchanging QUIC packets. Most packets contain -frames, which carry control information and application data between -endpoints. QUIC authenticates all packets and encrypts as much as is practical. -QUIC packets are carried in UDP datagrams ([UDP]) to better -facilitate deployment in existing systems and networks.¶
-Application protocols exchange information over a QUIC connection via streams, -which are ordered sequences of bytes. Two types of stream can be created: -bidirectional streams, which allow both endpoints to send data; and -unidirectional streams, which allow a single endpoint to send data. A -credit-based scheme is used to limit stream creation and to bound the amount of -data that can be sent.¶
-QUIC provides the necessary feedback to implement reliable delivery and -congestion control. An algorithm for detecting and recovering from loss of -data is described in [QUIC-RECOVERY]. QUIC depends on congestion control -to avoid network congestion. An exemplary congestion control algorithm is -also described in [QUIC-RECOVERY].¶
-QUIC connections are not strictly bound to a single network path. Connection -migration uses connection identifiers to allow connections to transfer to a new -network path. Only clients are able to migrate in this version of QUIC. This -design also allows connections to continue after changes in network topology or -address mappings, such as might be caused by NAT rebinding.¶
-Once established, multiple options are provided for connection termination. -Applications can manage a graceful shutdown, endpoints can negotiate a timeout -period, errors can cause immediate connection teardown, and a stateless -mechanism provides for termination of connections after one endpoint has lost -state.¶
-This document describes the core QUIC protocol and is structured as follows:¶
-Streams are the basic service abstraction that QUIC provides.¶
- -Connections are the context in which QUIC endpoints communicate.¶
-Packets and frames are the basic unit used by QUIC to communicate.¶
-Finally, encoding details of QUIC protocol elements are described in:¶
-Accompanying documents describe QUIC's loss detection and congestion control -[QUIC-RECOVERY], and the use of TLS for key negotiation [QUIC-TLS].¶
-This document defines QUIC version 1, which conforms to the protocol invariants -in [QUIC-INVARIANTS].¶
-To refer to QUIC version 1, cite this document. References to the limited -set of version-independent properties of QUIC can cite [QUIC-INVARIANTS].¶
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL -NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", -"MAY", and "OPTIONAL" in this document are to be interpreted as -described in BCP 14 [RFC2119] [RFC8174] when, and only when, they -appear in all capitals, as shown here.¶
-Commonly used terms in the document are described below.¶
-QUIC: The transport protocol described by this document. QUIC is a name, not an acronym.¶
-Endpoint: An entity that can participate in a QUIC connection by generating, receiving, and processing QUIC packets. There are only two types of endpoint in QUIC: client and server.¶
-Client: The endpoint that initiates a QUIC connection.¶
-Server: The endpoint that accepts a QUIC connection.¶
-QUIC packet: A complete processable unit of QUIC that can be encapsulated in a UDP datagram. One or more QUIC packets can be encapsulated in a single UDP datagram.¶
-Ack-eliciting packet: A QUIC packet that contains frames other than ACK, PADDING, and CONNECTION_CLOSE. These cause a recipient to send an acknowledgment; see Section 13.2.1.¶
-Frame: A unit of structured protocol information. There are multiple frame types, each of which carries different information. Frames are contained in QUIC packets.¶
-Address: When used without qualification, the tuple of IP version, IP address, and UDP port number that represents one end of a network path.¶
-Connection ID: An identifier that is used to identify a QUIC connection at an endpoint. Each endpoint selects one or more Connection IDs for its peer to include in packets sent towards the endpoint. This value is opaque to the peer.¶
-Stream: A unidirectional or bidirectional channel of ordered bytes within a QUIC connection. A QUIC connection can carry multiple simultaneous streams.¶
-Application: An entity that uses QUIC to send and receive data.¶
-This document uses the terms "QUIC packets", "UDP datagrams", and "IP packets" -to refer to the units of the respective protocols. That is, one or more QUIC -packets can be encapsulated in a UDP datagram, which is in turn encapsulated in -an IP packet.¶
-Packet and frame diagrams in this document use a custom format. The purpose of -this format is to summarize, not define, protocol elements. Prose defines the -complete semantics and details of structures.¶
-Complex fields are named and then followed by a list of fields surrounded by a -pair of matching braces. Each field in this list is separated by commas.¶
-Individual fields include length information, plus indications about fixed -value, optionality, or repetitions. Individual fields use the following -notational conventions, with all lengths in bits:¶
-x (A): Indicates that x is A bits long¶
-x (i): Indicates that x uses the variable-length encoding in Section 16¶
-x (A..B): Indicates that x can be any length from A to B; A can be omitted to indicate -a minimum of zero bits and B can be omitted to indicate no set upper limit; -values in this format always end on an octet boundary¶
-x (L) = C: Indicates that x has a fixed value of C with the length described by -L, as above¶
-x (L) = C..D: Indicates that x has a value in the range from C to D, inclusive, -with the length described by L, as above¶
-[x (E)]: Indicates that x is optional (and has a length of E)¶
-x (E) ...: Indicates that x is repeated zero or more times (and that each instance is -length E)¶
-This document uses network byte order (that is, big endian) values. Fields -are placed starting from the high-order bits of each byte.¶
-By convention, individual fields reference a complex field by using the name of -the complex field.¶
-For example:¶
-Example Structure {
-  One-bit Field (1),
-  7-bit Field with Fixed Value (7) = 61,
-  Field with Variable-Length Integer (i),
-  Arbitrary-Length Field (..),
-  Variable-Length Field (8..24),
-  Field With Minimum Length (16..),
-  Field With Maximum Length (..128),
-  [Optional Field (64)],
-  Repeated Field (8) ...,
-}
-Figure 1: Example Format¶
-When a single-bit field is referenced in prose, the position of that field can -be clarified by using the value of the byte that carries the field with the -field's value set. For example, the value 0x80 could be used to refer to the -single-bit field in the most significant bit of the byte, such as One-bit Field -in Figure 1.¶
-Streams in QUIC provide a lightweight, ordered byte-stream abstraction to an -application. Streams can be unidirectional or bidirectional.¶
-Streams can be created by sending data. Other processes associated with stream -management - ending, cancelling, and managing flow control - are all designed to -impose minimal overheads. For instance, a single STREAM frame (Section 19.8) -can open, carry data for, and close a stream. Streams can also be long-lived and -can last the entire duration of a connection.¶
-Streams can be created by either endpoint, can concurrently send data -interleaved with other streams, and can be cancelled. QUIC does not provide any -means of ensuring ordering between bytes on different streams.¶
-QUIC allows for an arbitrary number of streams to operate concurrently and for -an arbitrary amount of data to be sent on any stream, subject to flow control -constraints and stream limits; see Section 4.¶
-Streams can be unidirectional or bidirectional. Unidirectional streams carry -data in one direction: from the initiator of the stream to its peer. -Bidirectional streams allow for data to be sent in both directions.¶
-Streams are identified within a connection by a numeric value, referred to as -the stream ID. A stream ID is a 62-bit integer (0 to 2^62-1) that is unique for -all streams on a connection. Stream IDs are encoded as variable-length -integers; see Section 16. A QUIC endpoint MUST NOT reuse a stream ID -within a connection.¶
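The variable-length integer encoding referenced above (Section 16) uses the two most significant bits of the first byte to encode the total length (1, 2, 4, or 8 bytes), leaving 6, 14, 30, or 62 bits for the value. A minimal illustrative codec, not a normative implementation:

```python
def encode_varint(v: int) -> bytes:
    """Encode a QUIC variable-length integer: the two high bits of the
    first byte hold log2 of the encoded length in bytes."""
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return (v | (1 << 14)).to_bytes(2, "big")
    if v < 1 << 30:
        return (v | (2 << 30)).to_bytes(4, "big")
    if v < 1 << 62:
        return (v | (3 << 62)).to_bytes(8, "big")
    raise ValueError("value too large for a variable-length integer")

def decode_varint(b: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed) for a varint at the start of b."""
    length = 1 << (b[0] >> 6)
    value = int.from_bytes(b[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```

For instance, the two-byte sequence 0x7bbd decodes to 15293, and any value up to 2^62-1 fits in eight bytes.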
-The least significant bit (0x1) of the stream ID identifies the initiator of the -stream. Client-initiated streams have even-numbered stream IDs (with the bit -set to 0), and server-initiated streams have odd-numbered stream IDs (with the -bit set to 1).¶
-The second least significant bit (0x2) of the stream ID distinguishes between -bidirectional streams (with the bit set to 0) and unidirectional streams (with -the bit set to 1).¶
-The two least significant bits from a stream ID therefore identify a stream as -one of four types, as summarized in Table 1.¶
-Bits | Stream Type
------|---------------------------------
-0x0 | Client-Initiated, Bidirectional
-0x1 | Server-Initiated, Bidirectional
-0x2 | Client-Initiated, Unidirectional
-0x3 | Server-Initiated, Unidirectional
-Table 1: Stream ID Types
The stream space for each type begins at the minimum value (0x0 through 0x3 -respectively); successive streams of each type are created with numerically -increasing stream IDs. A stream ID that is used out of order results in all -streams of that type with lower-numbered stream IDs also being opened.¶
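The bit layout and numbering rules above can be sketched in a few lines of illustrative Python (not part of the protocol):

```python
def stream_type(stream_id: int) -> str:
    """Classify a stream by the two least significant bits of its ID
    (Table 1): 0x1 selects the initiator, 0x2 the directionality."""
    initiator = "Server" if stream_id & 0x1 else "Client"
    direction = "Unidirectional" if stream_id & 0x2 else "Bidirectional"
    return f"{initiator}-Initiated, {direction}"

def nth_stream_id(type_bits: int, n: int) -> int:
    """ID of the nth stream (zero-based) of a given type; successive
    streams of each type have numerically increasing IDs."""
    return 4 * n + type_bits
```

So the second client-initiated unidirectional stream has ID 6 (type bits 0x2, n = 1).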
-STREAM frames (Section 19.8) encapsulate data sent by an application. An -endpoint uses the Stream ID and Offset fields in STREAM frames to place data in -order.¶
-Endpoints MUST be able to deliver stream data to an application as an ordered -byte-stream. Delivering an ordered byte-stream requires that an endpoint buffer -any data that is received out of order, up to the advertised flow control limit.¶
-QUIC makes no specific allowances for delivery of stream data out of -order. However, implementations MAY choose to offer the ability to deliver data -out of order to a receiving application.¶
-An endpoint could receive data for a stream at the same stream offset multiple -times. Data that has already been received can be discarded. The data at a -given offset MUST NOT change if it is sent multiple times; an endpoint MAY treat -receipt of different data at the same offset within a stream as a connection -error of type PROTOCOL_VIOLATION.¶
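The rule above can be illustrated with a minimal, hypothetical receive routine: bytes that duplicate already-received data are discarded, while differing bytes at the same offset are treated as a connection error.

```python
class ProtocolViolation(Exception):
    """Connection error of type PROTOCOL_VIOLATION (illustrative)."""

def receive_stream_data(received: dict[int, int], offset: int, data: bytes) -> None:
    """Hypothetical helper: store bytes by stream offset, rejecting
    conflicting retransmissions; identical duplicates are discarded."""
    for i, byte in enumerate(data):
        pos = offset + i
        if pos in received:
            if received[pos] != byte:
                raise ProtocolViolation("different data at the same stream offset")
        else:
            received[pos] = byte
```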
-Streams are an ordered byte-stream abstraction with no other structure visible -to QUIC. STREAM frame boundaries are not expected to be preserved when -data is transmitted, retransmitted after packet loss, or delivered to the -application at a receiver.¶
-An endpoint MUST NOT send data on any stream without ensuring that it is within -the flow control limits set by its peer. Flow control is described in detail in -Section 4.¶
-Stream multiplexing can have a significant effect on application performance if -resources allocated to streams are correctly prioritized.¶
-QUIC does not provide a mechanism for exchanging prioritization information. -Instead, it relies on receiving priority information from the application.¶
-A QUIC implementation SHOULD provide ways in which an application can indicate -the relative priority of streams. An implementation uses information provided -by the application to determine how to allocate resources to active streams.¶
-This document does not define an API for QUIC, but instead defines a set of -functions on streams that application protocols can rely upon. An application -protocol can assume that a QUIC implementation provides an interface that -includes the operations described in this section. An implementation designed -for use with a specific application protocol might provide only those operations -that are used by that protocol.¶
-On the sending part of a stream, an application protocol can:¶
-- write data, understanding when stream flow control credit (Section 4.1) has successfully been reserved to send the written data;¶
-- end the stream (clean termination), resulting in a STREAM frame (Section 19.8) with the FIN bit set; and¶
-- reset the stream (abrupt termination), resulting in a RESET_STREAM frame (Section 19.4) if the stream was not already in a terminal state.¶
-On the receiving part of a stream, an application protocol can:¶
-- read data; and¶
-- abort reading of the stream and request closure, possibly resulting in a STOP_SENDING frame (Section 19.5).¶
-An application protocol can also request to be informed of state changes on -streams, including when the peer has opened or reset a stream, when a peer -aborts reading on a stream, when new data is available, and when data can or -cannot be written to the stream due to flow control.¶
-This section describes streams in terms of their send or receive components. -Two state machines are described: one for the streams on which an endpoint -transmits data (Section 3.1), and another for streams on which an -endpoint receives data (Section 3.2).¶
-Unidirectional streams use the applicable state machine directly. Bidirectional -streams use both state machines. For the most part, the use of these state -machines is the same whether the stream is unidirectional or bidirectional. The -conditions for opening a stream are slightly more complex for a bidirectional -stream because the opening of either the send or receive side causes the stream -to open in both directions.¶
-The state machines shown in this section are largely informative. This -document uses stream states to describe rules for when and how different types -of frames can be sent and the reactions that are expected when different types -of frames are received. Though these state machines are intended to be useful -in implementing QUIC, these states are not intended to constrain -implementations. An implementation can define a different state machine as long -as its behavior is consistent with an implementation that implements these -states.¶
-In some cases, a single event or action can cause a transition through -multiple states. For instance, sending STREAM with a FIN bit set can cause -two state transitions for a sending stream: from the Ready state to the Send -state, and from the Send state to the Data Sent state.¶
-Figure 2 shows the states for the part of a stream that sends -data to a peer.¶
-The sending part of a stream that the endpoint initiates (types 0 -and 2 for clients, 1 and 3 for servers) is opened by the application. The -"Ready" state represents a newly created stream that is able to accept data from -the application. Stream data might be buffered in this state in preparation for -sending.¶
-Sending the first STREAM or STREAM_DATA_BLOCKED frame causes a sending part of a -stream to enter the "Send" state. An implementation might choose to defer -allocating a stream ID to a stream until it sends the first STREAM frame and -enters this state, which can allow for better stream prioritization.¶
-The sending part of a bidirectional stream initiated by a peer (type 0 for a -server, type 1 for a client) starts in the "Ready" state when the receiving part -is created.¶
-In the "Send" state, an endpoint transmits - and retransmits as necessary - -stream data in STREAM frames. The endpoint respects the flow control limits set -by its peer, and continues to accept and process MAX_STREAM_DATA frames. An -endpoint in the "Send" state generates STREAM_DATA_BLOCKED frames if it is -blocked from sending by stream or connection flow control limits; see -Section 4.1.¶
-After the application indicates that all stream data has been sent and a STREAM -frame containing the FIN bit is sent, the sending part of the stream enters the -"Data Sent" state. From this state, the endpoint only retransmits stream data -as necessary. The endpoint does not need to check flow control limits or send -STREAM_DATA_BLOCKED frames for a stream in this state. MAX_STREAM_DATA frames -might be received until the peer receives the final stream offset. The endpoint -can safely ignore any MAX_STREAM_DATA frames it receives from its peer for a -stream in this state.¶
-Once all stream data has been successfully acknowledged, the sending part of the -stream enters the "Data Recvd" state, which is a terminal state.¶
-From any of the "Ready", "Send", or "Data Sent" states, an application can -signal that it wishes to abandon transmission of stream data. Alternatively, an -endpoint might receive a STOP_SENDING frame from its peer. In either case, the -endpoint sends a RESET_STREAM frame, which causes the stream to enter the "Reset -Sent" state.¶
-An endpoint MAY send a RESET_STREAM as the first frame that mentions a stream; -this causes the sending part of that stream to open and then immediately -transition to the "Reset Sent" state.¶
-Once a packet containing a RESET_STREAM has been acknowledged, the sending part -of the stream enters the "Reset Recvd" state, which is a terminal state.¶
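The send-side transitions described in this section can be sketched as a simple lookup table; the state and event names below follow the prose, but this is an illustrative model, not a normative API:

```python
# Illustrative sketch of the sending-part state machine.
SEND_TRANSITIONS = {
    ("Ready", "send_stream"): "Send",           # first STREAM or STREAM_DATA_BLOCKED
    ("Ready", "send_reset"): "Reset Sent",      # RESET_STREAM as first frame
    ("Send", "send_fin"): "Data Sent",          # STREAM frame with FIN bit
    ("Send", "send_reset"): "Reset Sent",
    ("Data Sent", "all_data_acked"): "Data Recvd",   # terminal state
    ("Data Sent", "send_reset"): "Reset Sent",
    ("Reset Sent", "reset_acked"): "Reset Recvd",    # terminal state
}

def next_send_state(state: str, event: str) -> str:
    """Return the state entered after the given event."""
    return SEND_TRANSITIONS[(state, event)]
```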
-Figure 3 shows the states for the part of a stream that -receives data from a peer. The states for a receiving part of a stream mirror -only some of the states of the sending part of the stream at the peer. The -receiving part of a stream does not track states on the sending part that cannot -be observed, such as the "Ready" state. Instead, the receiving part of a stream -tracks the delivery of data to the application, some of which cannot be observed -by the sender.¶
-The receiving part of a stream initiated by a peer (types 1 and 3 for a client, -or 0 and 2 for a server) is created when the first STREAM, STREAM_DATA_BLOCKED, -or RESET_STREAM frame is received for that stream. For bidirectional streams -initiated by a peer, receipt of a MAX_STREAM_DATA or STOP_SENDING frame for the -sending part of the stream also creates the receiving part. The initial state -for the receiving part of a stream is "Recv".¶
-The receiving part of a stream enters the "Recv" state when the sending part of -a bidirectional stream initiated by the endpoint (type 0 for a client, type 1 -for a server) enters the "Ready" state.¶
-An endpoint opens a bidirectional stream when a MAX_STREAM_DATA or STOP_SENDING -frame is received from the peer for that stream. Receiving a MAX_STREAM_DATA -frame for an unopened stream indicates that the remote peer has opened the -stream and is providing flow control credit. Receiving a STOP_SENDING frame for -an unopened stream indicates that the remote peer no longer wishes to receive -data on this stream. Either frame might arrive before a STREAM or -STREAM_DATA_BLOCKED frame if packets are lost or reordered.¶
-Before a stream is created, all streams of the same type with lower-numbered -stream IDs MUST be created. This ensures that the creation order for streams is -consistent on both endpoints.¶
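The requirement above implies that creating stream N implicitly creates every lower-numbered stream of the same type. A hypothetical sketch:

```python
def open_stream(opened: set[int], stream_id: int) -> None:
    """Opening a stream also opens all lower-numbered streams of the same
    type (same two low-order bits), keeping creation order consistent
    at both endpoints."""
    first_of_type = stream_id & 0x3
    for sid in range(first_of_type, stream_id + 1, 4):
        opened.add(sid)
```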
-In the "Recv" state, the endpoint receives STREAM and STREAM_DATA_BLOCKED -frames. Incoming data is buffered and can be reassembled into the correct order -for delivery to the application. As data is consumed by the application and -buffer space becomes available, the endpoint sends MAX_STREAM_DATA frames to -allow the peer to send more data.¶
-When a STREAM frame with a FIN bit is received, the final size of the stream is -known; see Section 4.5. The receiving part of the stream then enters the -"Size Known" state. In this state, the endpoint no longer needs to send -MAX_STREAM_DATA frames; it only receives any retransmissions of stream data.¶
-Once all data for the stream has been received, the receiving part enters the -"Data Recvd" state. This might happen as a result of receiving the same STREAM -frame that causes the transition to "Size Known". After all data has been -received, any STREAM or STREAM_DATA_BLOCKED frames for the stream can be -discarded.¶
-The "Data Recvd" state persists until stream data has been delivered to the -application. Once stream data has been delivered, the stream enters the "Data -Read" state, which is a terminal state.¶
-Receiving a RESET_STREAM frame in the "Recv" or "Size Known" states causes the -stream to enter the "Reset Recvd" state. This might cause the delivery of -stream data to the application to be interrupted.¶
-It is possible that all stream data has already been received when a -RESET_STREAM is received (that is, in the "Data Recvd" state). Similarly, it is -possible for remaining stream data to arrive after receiving a RESET_STREAM -frame (the "Reset Recvd" state). An implementation is free to manage this -situation as it chooses.¶
-Sending RESET_STREAM means that an endpoint cannot guarantee delivery of stream -data; however, there is no requirement that stream data not be delivered if a -RESET_STREAM is received. An implementation MAY interrupt delivery of stream -data, discard any data that was not consumed, and signal the receipt of the -RESET_STREAM. A RESET_STREAM signal might be suppressed or withheld if stream -data is completely received and is buffered to be read by the application. If -the RESET_STREAM is suppressed, the receiving part of the stream remains in -"Data Recvd".¶
-Once the application receives the signal indicating that the stream -was reset, the receiving part of the stream transitions to the "Reset Read" -state, which is a terminal state.¶
-The sender of a stream sends just three frame types that affect the state of a -stream at either sender or receiver: STREAM (Section 19.8), -STREAM_DATA_BLOCKED (Section 19.13), and RESET_STREAM -(Section 19.4).¶
-A sender MUST NOT send any of these frames from a terminal state ("Data Recvd" -or "Reset Recvd"). A sender MUST NOT send a STREAM or STREAM_DATA_BLOCKED frame -for a stream in the "Reset Sent" state or any terminal state, that is, after -sending a RESET_STREAM frame. A receiver could receive any of these three -frames in any state, due to the possibility of delayed delivery of packets -carrying them.¶
-The receiver of a stream sends MAX_STREAM_DATA (Section 19.10) and -STOP_SENDING frames (Section 19.5).¶
-The receiver only sends MAX_STREAM_DATA in the "Recv" state. A receiver MAY -send STOP_SENDING in any state where it has not received a RESET_STREAM frame; -that is, states other than "Reset Recvd" or "Reset Read". However, there is -little value in sending a STOP_SENDING frame in the "Data Recvd" state, since -all stream data has been received. A sender could receive either of these two -frames in any state as a result of delayed delivery of packets.¶
-A bidirectional stream is composed of sending and receiving parts. -Implementations can represent states of the bidirectional stream as composites -of sending and receiving stream states. The simplest model presents the stream -as "open" when either sending or receiving parts are in a non-terminal state and -"closed" when both sending and receiving streams are in terminal states.¶
-Table 2 shows a more complex mapping of bidirectional stream -states that loosely correspond to the stream states in HTTP/2 -[HTTP2]. This shows that multiple states on sending or receiving -parts of streams are mapped to the same composite state. Note that this is just -one possibility for such a mapping; this mapping requires that data is -acknowledged before the transition to a "closed" or "half-closed" state.¶
-Sending Part | Receiving Part | Composite State
--------------|----------------|----------------
-No Stream/Ready | No Stream/Recv (*1) | idle
-Ready/Send/Data Sent | Recv/Size Known | open
-Ready/Send/Data Sent | Data Recvd/Data Read | half-closed (remote)
-Ready/Send/Data Sent | Reset Recvd/Reset Read | half-closed (remote)
-Data Recvd | Recv/Size Known | half-closed (local)
-Reset Sent/Reset Recvd | Recv/Size Known | half-closed (local)
-Reset Sent/Reset Recvd | Data Recvd/Data Read | closed
-Reset Sent/Reset Recvd | Reset Recvd/Reset Read | closed
-Data Recvd | Data Recvd/Data Read | closed
-Data Recvd | Reset Recvd/Reset Read | closed
-Table 2: Possible Mapping of Stream States to HTTP/2
*1: A stream is considered "idle" if it has not yet been created, or if the -receiving part of the stream is in the "Recv" state without yet having -received any frames.¶
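The simplest composite model described in this section - "open" while either part is non-terminal, "closed" once both are terminal - can be sketched as:

```python
# Terminal states from the send-side and receive-side state machines.
SEND_TERMINAL = {"Data Recvd", "Reset Recvd"}
RECV_TERMINAL = {"Data Read", "Reset Read"}

def composite_state(send_state: str, recv_state: str) -> str:
    """Simplest composite model of a bidirectional stream: 'closed' only
    when both the sending and receiving parts are in terminal states."""
    if send_state in SEND_TERMINAL and recv_state in RECV_TERMINAL:
        return "closed"
    return "open"
```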
-If an application is no longer interested in the data it is receiving on a -stream, it can abort reading the stream and specify an application error code.¶
-If the stream is in the "Recv" or "Size Known" states, the transport SHOULD -signal this by sending a STOP_SENDING frame to prompt closure of the stream in -the opposite direction. This typically indicates that the receiving application -is no longer reading data it receives from the stream, but it is not a guarantee -that incoming data will be ignored.¶
-STREAM frames received after sending a STOP_SENDING frame are still counted -toward connection and stream flow control, even though these frames can be -discarded upon receipt.¶
-A STOP_SENDING frame requests that the receiving endpoint send a RESET_STREAM -frame. An endpoint that receives a STOP_SENDING frame MUST send a RESET_STREAM -frame if the stream is in the "Ready" or "Send" state. If the stream is in the -"Data Sent" state, the endpoint MAY defer sending the RESET_STREAM frame until -the packets containing outstanding data are acknowledged or declared lost. If -any outstanding data is declared lost, the endpoint SHOULD send a RESET_STREAM -frame instead of retransmitting the data.¶
-An endpoint SHOULD copy the error code from the STOP_SENDING frame to the -RESET_STREAM frame it sends, but MAY use any application error code. An -endpoint that sends a STOP_SENDING frame MAY ignore the error code in -any RESET_STREAM frames subsequently received for that stream.¶
-STOP_SENDING SHOULD only be sent for a stream that has not been reset by the -peer. STOP_SENDING is most useful for streams in the "Recv" or "Size Known" -states.¶
-An endpoint is expected to send another STOP_SENDING frame if a packet -containing a previous STOP_SENDING is lost. However, once either all stream -data or a RESET_STREAM frame has been received for the stream - that is, the -stream is in any state other than "Recv" or "Size Known" - sending a -STOP_SENDING frame is unnecessary.¶
-An endpoint that wishes to terminate both directions of a bidirectional stream -can terminate one direction by sending a RESET_STREAM frame, and it can -encourage prompt termination in the opposite direction by sending a STOP_SENDING -frame.¶
-It is necessary to limit the amount of data that a receiver could buffer, to -prevent a fast sender from overwhelming a slow receiver, or to prevent a -malicious sender from consuming a large amount of memory at a receiver. To -enable a receiver to limit memory commitment to a connection and to apply back -pressure on the sender, streams are flow controlled both individually and as an -aggregate. A QUIC receiver controls the maximum amount of data the sender can -send on a stream at any time, as described in Section 4.1 and -Section 4.2.¶
-Similarly, to limit concurrency within a connection, a QUIC endpoint controls -the maximum cumulative number of streams that its peer can initiate, as -described in Section 4.6.¶
-Data sent in CRYPTO frames is not flow controlled in the same way as stream -data. QUIC relies on the cryptographic protocol implementation to avoid -excessive buffering of data; see [QUIC-TLS]. To avoid excessive buffering at -multiple layers, QUIC implementations SHOULD provide an interface for the -cryptographic protocol implementation to communicate its buffering limits.¶
-QUIC employs a limit-based flow-control scheme where a receiver advertises the -limit of total bytes it is prepared to receive on a given stream or for the -entire connection. This leads to two levels of data flow control in QUIC:¶
-Senders MUST NOT send data in excess of either limit.¶
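A sender therefore has to respect both levels at once; an illustrative helper (names are assumptions, not a normative API):

```python
def bytes_sendable(stream_offset: int, max_stream_data: int,
                   conn_bytes_sent: int, max_data: int) -> int:
    """Number of new bytes a sender may place on a stream, bounded by the
    stream-level limit (MAX_STREAM_DATA) and the connection-level limit
    (MAX_DATA); zero means the sender is blocked."""
    return max(0, min(max_stream_data - stream_offset,
                      max_data - conn_bytes_sent))
```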
-A receiver sets initial limits for all streams through transport parameters -during the handshake (Section 7.4). Subsequently, a receiver sends -MAX_STREAM_DATA (Section 19.10) or MAX_DATA (Section 19.9) -frames to the sender to advertise larger limits.¶
-A receiver can advertise a larger limit for a stream by sending a -MAX_STREAM_DATA frame with the corresponding stream ID. A MAX_STREAM_DATA frame -indicates the maximum absolute byte offset of a stream. A receiver could -determine the flow control offset to be advertised based on the current offset -of data consumed on that stream.¶
-A receiver can advertise a larger limit for a connection by sending a MAX_DATA -frame, which indicates the maximum of the sum of the absolute byte offsets of -all streams. A receiver maintains a cumulative sum of bytes received on all -streams, which is used to check for violations of the advertised connection or -stream data limits. A receiver could determine the maximum data limit to be -advertised based on the sum of bytes consumed on all streams.¶
-Once a receiver advertises a limit for the connection or a stream, it MAY -advertise a smaller limit, but this has no effect.¶
-A receiver MUST close the connection with a FLOW_CONTROL_ERROR error -(Section 11) if the sender violates the advertised connection or stream -data limits.¶
-A sender MUST ignore any MAX_STREAM_DATA or MAX_DATA frames that do not increase -flow control limits.¶
-If a sender has sent data up to the limit, it will be unable to send new data -and is considered blocked. A sender SHOULD send a STREAM_DATA_BLOCKED or -DATA_BLOCKED frame to indicate to the receiver that it has data to write but is -blocked by flow control limits. If a sender is blocked for a period longer than -the idle timeout (Section 10.1), the receiver might close the connection -even when the sender has data that is available for transmission. To keep the -connection from closing, a sender that is flow control limited SHOULD -periodically send a STREAM_DATA_BLOCKED or DATA_BLOCKED frame when it has no -ack-eliciting packets in flight.¶
-Implementations decide when and how much credit to advertise in MAX_STREAM_DATA -and MAX_DATA frames, but this section offers a few considerations.¶
-To avoid blocking a sender, a receiver MAY send a MAX_STREAM_DATA or MAX_DATA -frame multiple times within a round trip or send it early enough to allow time -for loss of the frame and subsequent recovery.¶
-Control frames contribute to connection overhead. Therefore, frequently sending -MAX_STREAM_DATA and MAX_DATA frames with small changes is undesirable. On the -other hand, if updates are less frequent, larger increments to limits are -necessary to avoid blocking a sender, requiring larger resource commitments at -the receiver. There is a trade-off between resource commitment and overhead -when determining how large a limit is advertised.¶
-A receiver can use an autotuning mechanism to tune the frequency and amount of -advertised additional credit based on a round-trip time estimate and the rate at -which the receiving application consumes data, similar to common TCP -implementations. As an optimization, an endpoint could send frames related to -flow control only when there are other frames to send, ensuring that flow -control does not cause extra packets to be sent.¶
-A blocked sender is not required to send STREAM_DATA_BLOCKED or DATA_BLOCKED -frames. Therefore, a receiver MUST NOT wait for a STREAM_DATA_BLOCKED or -DATA_BLOCKED frame before sending a MAX_STREAM_DATA or MAX_DATA frame; doing so -could result in the sender being blocked for the rest of the connection. Even if -the sender sends these frames, waiting for them will result in the sender being -blocked for at least an entire round trip.¶
-When a sender receives credit after being blocked, it might be able to send a -large amount of data in response, resulting in short-term congestion; see -Section 6.9 in [QUIC-RECOVERY] for a discussion of how a sender can avoid this -congestion.¶
-If an endpoint cannot ensure that its peer always has available flow control -credit that is greater than the peer's bandwidth-delay product on this -connection, its receive throughput will be limited by flow control.¶
-Packet loss can cause gaps in the receive buffer, preventing the application -from consuming data and freeing up receive buffer space.¶
-Sending timely updates of flow control limits can improve performance. -Sending packets only to provide flow control updates can increase network -load and adversely affect performance. Sending flow control updates along with -other frames, such as ACK frames, reduces the cost of those updates.¶
-Endpoints need to eventually agree on the amount of flow control credit that has -been consumed on every stream, to be able to account for all bytes for -connection-level flow control.¶
-On receipt of a RESET_STREAM frame, an endpoint will tear down state for the -matching stream and ignore further data arriving on that stream.¶
-RESET_STREAM terminates one direction of a stream abruptly. For a bidirectional -stream, RESET_STREAM has no effect on data flow in the opposite direction. Both -endpoints MUST maintain flow control state for the stream in the unterminated -direction until that direction enters a terminal state.¶
-The final size is the amount of flow control credit that is consumed by a -stream. Assuming that every contiguous byte on the stream was sent once, the -final size is the number of bytes sent. More generally, this is one higher -than the offset of the byte with the largest offset sent on the stream, or zero -if no bytes were sent.¶
-A sender always communicates the final size of a stream to the receiver -reliably, no matter how the stream is terminated. The final size is the sum of -the Offset and Length fields of a STREAM frame with a FIN flag, noting that -these fields might be implicit. Alternatively, the Final Size field of a -RESET_STREAM frame carries this value. This guarantees that both endpoints agree -on how much flow control credit was consumed by the sender on that stream.¶
-An endpoint will know the final size for a stream when the receiving part of the -stream enters the "Size Known" or "Reset Recvd" state (Section 3). The -receiver MUST use the final size of the stream to account for all bytes sent on -the stream in its connection-level flow controller.¶
-An endpoint MUST NOT send data on a stream at or beyond the final size.¶
-Once a final size for a stream is known, it cannot change. If a RESET_STREAM or -STREAM frame is received indicating a change in the final size for the stream, -an endpoint SHOULD respond with a FINAL_SIZE_ERROR error; see -Section 11. A receiver SHOULD treat receipt of data at or beyond the -final size as a FINAL_SIZE_ERROR error, even after a stream is closed. -Generating these errors is not mandatory, because requiring that an -endpoint generate these errors also means that the endpoint needs to maintain -the final size state for closed streams, which could mean a significant state -commitment.¶
-An endpoint limits the cumulative number of incoming streams a peer can open. -Only streams with a stream ID less than (max_streams * 4 + -initial_stream_id_for_type) can be opened; see Table 1. Initial -limits are set in the transport parameters; see -Section 18.2. Subsequent limits are advertised using -MAX_STREAMS frames; see Section 19.11. Separate limits apply to -unidirectional and bidirectional streams.¶
-If a max_streams transport parameter or a MAX_STREAMS frame is received with a -value greater than 2^60, this would allow a maximum stream ID that cannot be -expressed as a variable-length integer; see Section 16. If either is -received, the connection MUST be closed immediately with a connection error of -type TRANSPORT_PARAMETER_ERROR if the offending value was received in a -transport parameter or of type FRAME_ENCODING_ERROR if it was received in a -frame; see Section 10.2.¶
-Endpoints MUST NOT exceed the limit set by their peer. An endpoint that -receives a frame with a stream ID exceeding the limit it has sent MUST treat -this as a connection error of type STREAM_LIMIT_ERROR (Section 11).¶
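The stream-limit formula and the 2^60 cap can be illustrated together (an informal sketch, not a normative check):

```python
def stream_permitted(stream_id: int, max_streams: int) -> bool:
    """A stream may be opened only if its ID is below
    max_streams * 4 + initial_stream_id_for_type (Table 1)."""
    if max_streams > 2 ** 60:
        # Would permit stream IDs that cannot be encoded as
        # variable-length integers; a real endpoint closes the connection.
        raise ValueError("max_streams exceeds 2^60")
    initial_id_for_type = stream_id & 0x3
    return stream_id < max_streams * 4 + initial_id_for_type
```

With a limit of 2 unidirectional server streams, IDs 3 and 7 are permitted but 11 is not.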
-Once a receiver advertises a stream limit using the MAX_STREAMS frame, -advertising a smaller limit has no effect. A receiver MUST ignore any -MAX_STREAMS frame that does not increase the stream limit.¶
-As with stream and connection flow control, this document leaves implementations -to decide when and how many streams should be advertised -to a peer via MAX_STREAMS. Implementations might choose to increase limits as -streams are closed, to keep the number of streams available to peers roughly -consistent.¶
-An endpoint that is unable to open a new stream due to the peer's limits SHOULD -send a STREAMS_BLOCKED frame (Section 19.14). This signal is -considered useful for debugging. An endpoint MUST NOT wait to receive this -signal before advertising additional credit, since doing so will mean that the -peer will be blocked for at least an entire round trip, and potentially -indefinitely if the peer chooses not to send STREAMS_BLOCKED frames.¶
-A QUIC connection is shared state between a client and a server.¶
-Each connection starts with a handshake phase, during which the two endpoints -establish a shared secret using the cryptographic handshake protocol -[QUIC-TLS] and negotiate the application protocol. The handshake -(Section 7) confirms that both endpoints are willing to communicate -(Section 8.1) and establishes parameters for the connection -(Section 7.4).¶
-An application protocol can use the connection during the handshake phase with -some limitations. 0-RTT allows application data to be sent by a client before -receiving a response from the server. However, 0-RTT provides no protection -against replay attacks; see Section 9.2 of [QUIC-TLS]. A server can also send -application data to a client before it receives the final cryptographic -handshake messages that allow it to confirm the identity and liveness of the -client. These capabilities allow an application protocol to offer the option of -trading some security guarantees for reduced latency.¶
-The use of connection IDs (Section 5.1) allows connections to migrate to a -new network path, both as a direct choice of an endpoint and when forced by a -change in a middlebox. Section 9 describes mitigations for the security and -privacy issues associated with migration.¶
-For connections that are no longer needed or desired, there are several ways for -a client and server to terminate a connection, as described in Section 10.¶
-Each connection possesses a set of connection identifiers, or connection IDs, -each of which can identify the connection. Connection IDs are independently -selected by endpoints; each endpoint selects the connection IDs that its peer -uses.¶
-The primary function of a connection ID is to ensure that changes in addressing -at lower protocol layers (UDP, IP) do not cause packets for a QUIC -connection to be delivered to the wrong endpoint. Each endpoint selects -connection IDs using an implementation-specific (and perhaps -deployment-specific) method that will allow packets with that connection ID to -be routed back to the endpoint and to be identified by the endpoint upon -receipt.¶
-Connection IDs MUST NOT contain any information that can be used by an external -observer (that is, one that does not cooperate with the issuer) to correlate -them with other connection IDs for the same connection. As a trivial example, -this means the same connection ID MUST NOT be issued more than once on the same -connection.¶
-Packets with long headers include Source Connection ID and Destination -Connection ID fields. These fields are used to set the connection IDs for new -connections; see Section 7.2 for details.¶
-Packets with short headers (Section 17.3) only include the Destination -Connection ID and omit the explicit length. The length of the Destination -Connection ID field is expected to be known to endpoints. Endpoints using a -load balancer that routes based on connection ID could agree with the load -balancer on a fixed length for connection IDs, or agree on an encoding scheme. -A fixed portion could encode an explicit length, which allows the entire -connection ID to vary in length and still be used by the load balancer.¶
-A Version Negotiation (Section 17.2.1) packet echoes the connection IDs -selected by the client, both to ensure correct routing toward the client and to -demonstrate that the packet is in response to a packet sent by the client.¶
-A zero-length connection ID can be used when a connection ID is not needed to -route to the correct endpoint. However, multiplexing connections on the same -local IP address and port while using zero-length connection IDs will cause -failures in the presence of peer connection migration, NAT rebinding, and client -port reuse. An endpoint MUST NOT use the same IP address and port for multiple -connections with zero-length connection IDs, unless it is certain that those -protocol features are not in use.¶
-When an endpoint uses a non-zero-length connection ID, it needs to ensure that -the peer has a supply of connection IDs from which to choose for packets sent to -the endpoint. These connection IDs are supplied by the endpoint using the -NEW_CONNECTION_ID frame (Section 19.15).¶
-Each Connection ID has an associated sequence number to assist in detecting when -NEW_CONNECTION_ID or RETIRE_CONNECTION_ID frames refer to the same value. The -initial connection ID issued by an endpoint is sent in the Source Connection ID -field of the long packet header (Section 17.2) during the handshake. The -sequence number of the initial connection ID is 0. If the preferred_address -transport parameter is sent, the sequence number of the supplied connection ID -is 1.¶
-Additional connection IDs are communicated to the peer using NEW_CONNECTION_ID -frames (Section 19.15). The sequence number on each newly issued -connection ID MUST increase by 1. The connection ID randomly selected by the -client in the Initial packet and any connection ID provided by a Retry packet -are not assigned sequence numbers unless a server opts to retain them as its -initial connection ID.¶
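The sequence-numbering rules above can be sketched with a small data structure of our own devising (non-normative; the class and field names are assumptions): the handshake connection ID takes sequence number 0, a preferred_address connection ID takes sequence number 1, and each subsequently issued connection ID increases the sequence number by 1.

```python
class ConnectionIdIssuer:
    """Sketch of sequence-number assignment for issued connection IDs."""

    def __init__(self, handshake_cid: bytes, preferred_address_cid=None):
        # Sequence 0: the Source Connection ID from the long header.
        self.issued = {0: handshake_cid}
        # Sequence 1: the connection ID in preferred_address, if sent.
        if preferred_address_cid is not None:
            self.issued[1] = preferred_address_cid
        self.next_seq = max(self.issued) + 1

    def issue(self, cid: bytes) -> int:
        """Issue a connection ID via NEW_CONNECTION_ID; returns its sequence number."""
        seq = self.next_seq
        self.issued[seq] = cid
        self.next_seq += 1
        return seq

issuer = ConnectionIdIssuer(b"hs-cid", preferred_address_cid=b"pa-cid")
first_new = issuer.issue(b"cid-2")  # first NEW_CONNECTION_ID gets sequence 2
```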
-When an endpoint issues a connection ID, it MUST accept packets that carry this -connection ID for the duration of the connection or until its peer invalidates -the connection ID via a RETIRE_CONNECTION_ID frame -(Section 19.16). Connection IDs that are issued and not -retired are considered active; any active connection ID is valid for use with -the current connection at any time, in any packet type. This includes the -connection ID issued by the server via the preferred_address transport -parameter.¶
-An endpoint SHOULD ensure that its peer has a sufficient number of available and -unused connection IDs. Endpoints advertise the number of active connection IDs -they are willing to maintain using the active_connection_id_limit transport -parameter. An endpoint MUST NOT provide more connection IDs than the peer's -limit. An endpoint MAY send connection IDs that temporarily exceed a peer's -limit if the NEW_CONNECTION_ID frame also requires the retirement of any excess, -by including a sufficiently large value in the Retire Prior To field.¶
-A NEW_CONNECTION_ID frame might cause an endpoint to add some active connection -IDs and retire others based on the value of the Retire Prior To field. After -processing a NEW_CONNECTION_ID frame and adding and retiring active connection -IDs, if the number of active connection IDs exceeds the value advertised in its -active_connection_id_limit transport parameter, an endpoint MUST close the -connection with an error of type CONNECTION_ID_LIMIT_ERROR.¶
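A hedged sketch of that processing order (illustrative only; the dict-based frame representation is an assumption): retire active connection IDs with sequence numbers below Retire Prior To, add the newly provided connection ID, then enforce the advertised active_connection_id_limit.

```python
def process_new_connection_id(active, frame, active_connection_id_limit):
    """active: dict mapping sequence number -> connection ID.
    Returns (sequence numbers retired, error name or None)."""
    retired = [seq for seq in active if seq < frame["retire_prior_to"]]
    for seq in retired:
        del active[seq]  # a RETIRE_CONNECTION_ID frame is sent for each
    active[frame["seq"]] = frame["cid"]
    if len(active) > active_connection_id_limit:
        return retired, "CONNECTION_ID_LIMIT_ERROR"
    return retired, None

active = {0: b"a", 1: b"b"}
frame = {"seq": 2, "cid": b"c", "retire_prior_to": 2}
retired, error = process_new_connection_id(active, frame, 2)
```

Because retirement happens before the count is checked, a frame that replaces all prior connection IDs does not spuriously exceed the limit.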
-An endpoint SHOULD supply a new connection ID when the peer retires a connection -ID. If an endpoint provided fewer connection IDs than the peer's -active_connection_id_limit, it MAY supply a new connection ID when it receives a -packet with a previously unused connection ID. An endpoint MAY limit the -total number of connection IDs issued for each connection to -avoid the risk of running out of connection IDs; see Section 10.3.2. An -endpoint MAY also limit the issuance of connection IDs to reduce the amount of -per-path state it maintains, such as path validation status, as its peer -might interact with it over as many paths as there are issued connection -IDs.¶
-An endpoint that initiates migration and requires non-zero-length connection IDs -SHOULD ensure that the pool of connection IDs available to its peer allows the -peer to use a new connection ID on migration, as the peer will be unable to -respond if the pool is exhausted.¶
-An endpoint that selects a zero-length connection ID during the handshake -cannot issue a new connection ID. A zero-length Destination Connection ID -field is used in all packets sent toward such an endpoint over any network -path.¶
-An endpoint can change the connection ID it uses for a peer to another available -one at any time during the connection. An endpoint consumes connection IDs in -response to a migrating peer; see Section 9.5 for more.¶
-An endpoint maintains a set of connection IDs received from its peer, any of -which it can use when sending packets. When the endpoint wishes to remove a -connection ID from use, it sends a RETIRE_CONNECTION_ID frame to its peer. -Sending a RETIRE_CONNECTION_ID frame indicates that the connection ID will not -be used again and requests that the peer replace it with a new connection ID -using a NEW_CONNECTION_ID frame.¶
-As discussed in Section 9.5, endpoints limit the use of a -connection ID to packets sent from a single local address to a single -destination address. Endpoints SHOULD retire connection IDs when they are no -longer actively using either the local or destination address for which the -connection ID was used.¶
-An endpoint might need to stop accepting previously issued connection IDs in -certain circumstances. Such an endpoint can cause its peer to retire connection -IDs by sending a NEW_CONNECTION_ID frame with an increased Retire Prior To -field. The endpoint SHOULD continue to accept the previously issued connection -IDs until they are retired by the peer. If the endpoint can no longer process -the indicated connection IDs, it MAY close the connection.¶
-Upon receipt of an increased Retire Prior To field, the peer MUST stop using -the corresponding connection IDs and retire them with RETIRE_CONNECTION_ID -frames before adding the newly provided connection ID to the set of active -connection IDs. This ordering allows an endpoint to replace all active -connection IDs without the possibility of a peer having no available connection -IDs and without exceeding the limit the peer sets in the -active_connection_id_limit transport parameter; see -Section 18.2. Failure to cease using the connection IDs -when requested can result in connection failures, as the issuing endpoint might -be unable to continue using the connection IDs with the active connection.¶
-An endpoint SHOULD limit the number of connection IDs it has retired locally for which RETIRE_CONNECTION_ID frames have not yet been acknowledged. An endpoint SHOULD allow for sending and tracking a number of RETIRE_CONNECTION_ID frames of at least twice the value of the active_connection_id_limit transport parameter. An endpoint MUST NOT forget a connection ID without retiring it, though it MAY choose to treat having connection IDs in need of retirement that exceed this limit as a connection error of type CONNECTION_ID_LIMIT_ERROR.¶
-Endpoints SHOULD NOT issue updates of the Retire Prior To field before receiving -RETIRE_CONNECTION_ID frames that retire all connection IDs indicated by the -previous Retire Prior To value.¶
-Incoming packets are classified on receipt. Packets can either be associated -with an existing connection, or - for servers - potentially create a new -connection.¶
-Endpoints try to associate a packet with an existing connection. If the packet -has a non-zero-length Destination Connection ID corresponding to an existing -connection, QUIC processes that packet accordingly. Note that more than one -connection ID can be associated with a connection; see Section 5.1.¶
-If the Destination Connection ID is zero length and the addressing information -in the packet matches the addressing information the endpoint uses to identify a -connection with a zero-length connection ID, QUIC processes the packet as part -of that connection. An endpoint can use just destination IP and port or both -source and destination addresses for identification, though this makes -connections fragile as described in Section 5.1.¶
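The classification of the two preceding paragraphs can be summarized in a small lookup sketch (non-normative; the packet and table structures are ours): match by Destination Connection ID when one is present, otherwise by the addressing information associated with zero-length connection IDs.

```python
def classify(packet, by_cid, by_addr):
    """Return the matching connection, or None for unattributable packets."""
    if packet["dcid"]:
        # Non-zero-length Destination Connection ID: look up directly.
        return by_cid.get(packet["dcid"])
    # Zero-length connection ID: fall back to addressing information.
    return by_addr.get(packet["addr"])

conn = object()
by_cid = {b"\x01\x02": conn}
by_addr = {("203.0.113.5", 4433): conn}
hit = classify({"dcid": b"\x01\x02", "addr": None}, by_cid, by_addr)
miss = classify({"dcid": b"\xff", "addr": None}, by_cid, by_addr)
```

A real implementation would index every active connection ID for a connection, since more than one can be associated with it.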
-Endpoints can send a Stateless Reset (Section 10.3) for any packets that -cannot be attributed to an existing connection. A stateless reset allows a peer -to more quickly identify when a connection becomes unusable.¶
-Packets that are matched to an existing connection are discarded if the packets -are inconsistent with the state of that connection. For example, packets are -discarded if they indicate a different protocol version than that of the -connection, or if the removal of packet protection is unsuccessful once the -expected keys are available.¶
-Invalid packets that lack strong integrity protection, such as Initial, Retry, or Version Negotiation packets, MAY be discarded. An endpoint MUST generate a connection error if it processed the contents of such a packet before discovering the error, unless it fully reverts any resulting state changes.¶
-Valid packets sent to clients always include a Destination Connection ID that -matches a value the client selects. Clients that choose to receive zero-length -connection IDs can use the local address and port to identify a connection. -Packets that do not match an existing connection, based on Destination -Connection ID or, if this value is zero-length, local IP address and port, are -discarded.¶
-Due to packet reordering or loss, a client might receive packets for a -connection that are encrypted with a key it has not yet computed. The client MAY -drop these packets, or MAY buffer them in anticipation of later packets that -allow it to compute the key.¶
-If a client receives a packet that uses a different version than it initially -selected, it MUST discard that packet.¶
-If a server receives a packet that indicates an unsupported version but is large -enough to initiate a new connection for any supported version, the server -SHOULD send a Version Negotiation packet as described in Section 6.1. A server -MAY limit the number of packets to which it responds with a Version Negotiation -packet. Servers MUST drop smaller packets that specify unsupported versions.¶
-The first packet for an unsupported version can use different semantics and -encodings for any version-specific field. In particular, different packet -protection keys might be used for different versions. Servers that do not -support a particular version are unlikely to be able to decrypt the payload of -the packet or properly interpret the result. Servers SHOULD respond with a -Version Negotiation packet, provided that the datagram is sufficiently long.¶
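The server-side decision described in the two preceding paragraphs can be sketched as follows (illustrative only; the supported-version set is an assumption, and the 1200-byte figure assumes this version's minimum Initial datagram size from Section 14.1).

```python
SUPPORTED_VERSIONS = {0x00000001}  # assumption for this example
MIN_INITIAL_DATAGRAM = 1200        # minimum Initial datagram size, Section 14.1

def version_negotiation_response(version: int, datagram_len: int) -> str:
    """Decide how a server handles the first packet of a connection attempt."""
    if version in SUPPORTED_VERSIONS:
        return "process"
    if datagram_len >= MIN_INITIAL_DATAGRAM:
        # SHOULD respond; a server MAY rate-limit these responses.
        return "send-version-negotiation"
    return "drop"  # MUST drop smaller packets with unsupported versions
```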
-Packets with a supported version, or no version field, are matched to a -connection using the connection ID or - for packets with zero-length connection -IDs - the local address and port. These packets are processed using the -selected connection; otherwise, the server continues below.¶
-If the packet is an Initial packet fully conforming with the specification, the -server proceeds with the handshake (Section 7). This commits the server to -the version that the client selected.¶
-If a server refuses to accept a new connection, it SHOULD send an Initial packet -containing a CONNECTION_CLOSE frame with error code CONNECTION_REFUSED.¶
-If the packet is a 0-RTT packet, the server MAY buffer a limited number of these -packets in anticipation of a late-arriving Initial packet. Clients are not able -to send Handshake packets prior to receiving a server response, so servers -SHOULD ignore any such packets.¶
-Servers MUST drop incoming packets under all other circumstances.¶
-A server deployment could load balance among servers using only source and -destination IP addresses and ports. Changes to the client's IP address or port -could result in packets being forwarded to the wrong server. Such a server -deployment could use one of the following methods for connection continuity -when a client's address changes.¶
-A server in a deployment that does not implement a solution to maintain -connection continuity when the client address changes SHOULD indicate migration -is not supported using the disable_active_migration transport parameter. The -disable_active_migration transport parameter does not prohibit connection -migration after a client has acted on a preferred_address transport parameter.¶
-Server deployments that use this simple form of load balancing MUST avoid the -creation of a stateless reset oracle; see Section 21.11.¶
-This document does not define an API for QUIC, but instead defines a set of -functions for QUIC connections that application protocols can rely upon. An -application protocol can assume that an implementation of QUIC provides an -interface that includes the operations described in this section. An -implementation designed for use with a specific application protocol might -provide only those operations that are used by that protocol.¶
-When implementing the client role, an application protocol can:¶
-When implementing the server role, an application protocol can:¶
-In either role, an application protocol can:¶
-Version negotiation allows a server to indicate that it does not support -the version the client used. A server sends a Version Negotiation packet in -response to each packet that might initiate a new connection; see -Section 5.2 for details.¶
-The size of the first packet sent by a client will determine whether a server -sends a Version Negotiation packet. Clients that support multiple QUIC versions -SHOULD ensure that the first UDP datagram they send is sized to the largest of -the minimum datagram sizes from all versions they support, using PADDING frames -(Section 19.1) as necessary. This ensures that the server responds if there -is a mutually supported version. A server might not send a Version Negotiation -packet if the datagram it receives is smaller than the minimum size specified in -a different version; see Section 14.1.¶
-If the version selected by the client is not acceptable to the server, the -server responds with a Version Negotiation packet; see Section 17.2.1. This -includes a list of versions that the server will accept. An endpoint MUST NOT -send a Version Negotiation packet in response to receiving a Version Negotiation -packet.¶
-This system allows a server to process packets with unsupported versions without retaining state. Though either the Initial packet or the Version Negotiation packet that is sent in response could be lost, the client will send new packets until it successfully receives a response or it abandons the connection attempt. If the client abandons the attempt, it discards all state for the connection and does not send any more packets on the connection.¶
-A server MAY limit the number of Version Negotiation packets it sends. For -instance, a server that is able to recognize packets as 0-RTT might choose not -to send Version Negotiation packets in response to 0-RTT packets with the -expectation that it will eventually receive an Initial packet.¶
-Version Negotiation packets are designed to allow future versions of QUIC to -negotiate the version in use between endpoints. Future versions of QUIC might -change how implementations that support multiple versions of QUIC react to -Version Negotiation packets when attempting to establish a connection using this -version.¶
-A client that supports only this version of QUIC MUST abandon the current -connection attempt if it receives a Version Negotiation packet, with the -following two exceptions. A client MUST discard any Version Negotiation packet -if it has received and successfully processed any other packet, including an -earlier Version Negotiation packet. A client MUST discard a Version Negotiation -packet that lists the QUIC version selected by the client.¶
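The two exceptions above can be expressed as a small decision function (a non-normative sketch; names are ours): a Version Negotiation packet is discarded if any other packet has already been successfully processed, or if it lists the version the client selected; otherwise the connection attempt is abandoned.

```python
def on_version_negotiation(selected_version, listed_versions,
                           processed_other_packet: bool) -> str:
    """Client handling of a Version Negotiation packet for this QUIC version."""
    if processed_other_packet:
        return "discard"  # exception 1: another packet was already processed
    if selected_version in listed_versions:
        return "discard"  # exception 2: our selected version is listed
    return "abandon"      # otherwise, abandon the connection attempt
```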
-How to perform version negotiation is left as future work defined by future -versions of QUIC. In particular, that future work will ensure robustness -against version downgrade attacks; see Section 21.12.¶
-[[RFC editor: please remove this section before publication.]]¶
-When a draft implementation receives a Version Negotiation packet, it MAY use -it to attempt a new connection with one of the versions listed in the packet, -instead of abandoning the current connection attempt; see Section 6.2.¶
-The client MUST check that the Destination and Source Connection ID fields -match the Source and Destination Connection ID fields in a packet that the -client sent. If this check fails, the packet MUST be discarded.¶
-Once the Version Negotiation packet is determined to be valid, the client then -selects an acceptable protocol version from the list provided by the server. -The client then attempts to create a new connection using that version. The new -connection MUST use a new random Destination Connection ID different from the -one it had previously sent.¶
-Note that this mechanism does not protect against downgrade attacks and -MUST NOT be used outside of draft implementations.¶
-For a server to use a new version in the future, clients need to correctly -handle unsupported versions. Some version numbers (0x?a?a?a?a as defined in -Section 15) are reserved for inclusion in fields that contain version -numbers.¶
-Endpoints MAY add reserved versions to any field where unknown or unsupported -versions are ignored to test that a peer correctly ignores the value. For -instance, an endpoint could include a reserved version in a Version Negotiation -packet; see Section 17.2.1. Endpoints MAY send packets with a reserved -version to test that a peer correctly discards the packet.¶
-QUIC relies on a combined cryptographic and transport handshake to minimize -connection establishment latency. QUIC uses the CRYPTO frame (Section 19.6) -to transmit the cryptographic handshake. Version 0x00000001 of QUIC uses TLS as -described in [QUIC-TLS]; a different QUIC version number could indicate that a -different cryptographic handshake protocol is in use.¶
-QUIC provides reliable, ordered delivery of the cryptographic handshake -data. QUIC packet protection is used to encrypt as much of the handshake -protocol as possible. The cryptographic handshake MUST provide the following -properties:¶
-authenticated key exchange, where¶
-Endpoints can use packets sent during the handshake to test for Explicit Congestion Notification (ECN) support; see Section 13.4. An endpoint verifies support for ECN by observing whether the ACK frames acknowledging the first packets it sends carry ECN counts, as described in Section 13.4.2.¶
-The CRYPTO frame can be sent in different packet number spaces -(Section 12.3). The offsets used by CRYPTO frames to ensure ordered -delivery of cryptographic handshake data start from zero in each packet number -space.¶
-Figure 4 shows a simplified handshake and the exchange of packets and frames -that are used to advance the handshake. Exchange of application data during the -handshake is enabled where possible, shown with a '*'. Once completed, -endpoints are able to exchange application data.¶
-Endpoints MUST explicitly negotiate an application protocol. This avoids -situations where there is a disagreement about the protocol that is in use.¶
-Details of how TLS is integrated with QUIC are provided in [QUIC-TLS], but -some examples are provided here. An extension of this exchange to support -client address validation is shown in Section 8.1.2.¶
-Once any address validation exchanges are complete, the -cryptographic handshake is used to agree on cryptographic keys. The -cryptographic handshake is carried in Initial (Section 17.2.2) and Handshake -(Section 17.2.4) packets.¶
-Figure 5 provides an overview of the 1-RTT handshake. Each line -shows a QUIC packet with the packet type and packet number shown first, followed -by the frames that are typically contained in those packets. So, for instance -the first packet is of type Initial, with packet number 0, and contains a CRYPTO -frame carrying the ClientHello.¶
-Multiple QUIC packets -- even of different packet types -- can be coalesced into -a single UDP datagram; see Section 12.2. As a result, this handshake -could consist of as few as 4 UDP datagrams, or any number more (subject to -limits inherent to the protocol, such as congestion control and -anti-amplification). For instance, the server's first flight contains Initial -packets, Handshake packets, and "0.5-RTT data" in 1-RTT packets.¶
-Figure 6 shows an example of a connection with a 0-RTT handshake -and a single packet of 0-RTT data. Note that as described in -Section 12.3, the server acknowledges 0-RTT data in 1-RTT packets, and -the client sends 1-RTT packets in the same packet number space.¶
-A connection ID is used to ensure consistent routing of packets, as described in -Section 5.1. The long header contains two connection IDs: the Destination -Connection ID is chosen by the recipient of the packet and is used to provide -consistent routing; the Source Connection ID is used to set the Destination -Connection ID used by the peer.¶
-During the handshake, packets with the long header (Section 17.2) are used -to establish the connection IDs used by both endpoints. Each endpoint uses the -Source Connection ID field to specify the connection ID that is used in the -Destination Connection ID field of packets being sent to them. After processing -the first Initial packet, each endpoint sets the Destination Connection ID -field in subsequent packets it sends to the value of the Source Connection ID -field that it received.¶
-When an Initial packet is sent by a client that has not previously received an -Initial or Retry packet from the server, the client populates the Destination -Connection ID field with an unpredictable value. This Destination Connection ID -MUST be at least 8 bytes in length. Until a packet is received from the server, -the client MUST use the same Destination Connection ID value on all packets in -this connection.¶
-The Destination Connection ID field from the first Initial packet sent by a -client is used to determine packet protection keys for Initial packets. These -keys change after receiving a Retry packet; see Section 5.2 of [QUIC-TLS].¶
-The client populates the Source Connection ID field with a value of its choosing -and sets the Source Connection ID Length field to indicate the length.¶
-The first flight of 0-RTT packets uses the same Destination Connection ID and Source Connection ID values as the client's first Initial packet.¶
-Upon first receiving an Initial or Retry packet from the server, the client uses -the Source Connection ID supplied by the server as the Destination Connection ID -for subsequent packets, including any 0-RTT packets. This means that a client -might have to change the connection ID it sets in the Destination Connection ID -field twice during connection establishment: once in response to a Retry, and -once in response to an Initial packet from the server. Once a client has -received a valid Initial packet from the server, it MUST discard any subsequent -packet it receives with a different Source Connection ID.¶
-A client MUST change the Destination Connection ID it uses for sending packets -in response to only the first received Initial or Retry packet. A server MUST -set the Destination Connection ID it uses for sending packets based on the first -received Initial packet. Any further changes to the Destination Connection ID -are only permitted if the values are taken from NEW_CONNECTION_ID frames; if -subsequent Initial packets include a different Source Connection ID, they MUST -be discarded. This avoids unpredictable outcomes that might otherwise result -from stateless processing of multiple Initial packets with different Source -Connection IDs.¶
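A sketch of the client-side rule above (illustrative; the class is our own and assumes a Retry, if any, arrives before the server's first Initial packet): the Destination Connection ID changes in response to a Retry and to the first received Initial packet, after which later Initial packets carrying a different Source Connection ID are discarded.

```python
class ClientDcidState:
    """Tracks the Destination Connection ID a client uses during the handshake."""

    def __init__(self, initial_dcid: bytes):
        self.dcid = initial_dcid    # unpredictable value chosen by the client
        self.saw_initial = False

    def on_retry(self, scid: bytes) -> None:
        # A Retry restarts the exchange; adopt its Source Connection ID.
        self.dcid = scid

    def on_initial(self, scid: bytes) -> str:
        if not self.saw_initial:
            self.dcid = scid        # adopt the server's Source Connection ID
            self.saw_initial = True
            return "updated"
        # Subsequent Initial packets with a different SCID are discarded.
        return "ok" if scid == self.dcid else "discard"

state = ClientDcidState(b"random-initial")
state.on_retry(b"S2")  # first change, in response to a Retry
```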
-The Destination Connection ID that an endpoint sends can change over the -lifetime of a connection, especially in response to connection migration -(Section 9); see Section 5.1.1 for details.¶
-The choice each endpoint makes about connection IDs during the handshake is -authenticated by including all values in transport parameters; see -Section 7.4. This ensures that all connection IDs used for the -handshake are also authenticated by the cryptographic handshake.¶
-Each endpoint includes the value of the Source Connection ID field from the -first Initial packet it sent in the initial_source_connection_id transport -parameter; see Section 18.2. A server includes the -Destination Connection ID field from the first Initial packet it received from -the client in the original_destination_connection_id transport parameter; if the -server sent a Retry packet, this refers to the first Initial packet received -before sending the Retry packet. If it sends a Retry packet, a server also -includes the Source Connection ID field from the Retry packet in the -retry_source_connection_id transport parameter.¶
-The values provided by a peer for these transport parameters MUST match the values that an endpoint used in the Destination and Source Connection ID fields of Initial packets that it sent. Including connection ID values in transport parameters and verifying them ensures that an attacker cannot influence the choice of connection ID for a successful connection by injecting packets carrying attacker-chosen connection IDs during the handshake.¶
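The matching requirement above amounts to a simple equality check, sketched here as a non-normative fragment (the dict representation and use of None for an absent retry_source_connection_id are assumptions): each connection ID value declared in the transport parameters must equal the value observed in the corresponding Initial or Retry packet fields.

```python
def authenticate_cids(observed, params) -> bool:
    """Compare declared transport parameters against observed header values."""
    return all(params.get(name) == observed.get(name)
               for name in ("initial_source_connection_id",
                            "original_destination_connection_id",
                            "retry_source_connection_id"))

observed = {"initial_source_connection_id": b"S3",
            "original_destination_connection_id": b"S1",
            "retry_source_connection_id": None}  # no Retry in this handshake
ok = authenticate_cids(observed, dict(observed))
bad = authenticate_cids(observed,
                        {**observed, "initial_source_connection_id": b"evil"})
```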
-An endpoint MUST treat absence of the initial_source_connection_id transport -parameter from either endpoint or absence of the -original_destination_connection_id transport parameter from the server as a -connection error of type TRANSPORT_PARAMETER_ERROR.¶
-An endpoint MUST treat the following as a connection error of type -TRANSPORT_PARAMETER_ERROR or PROTOCOL_VIOLATION:¶
-If a zero-length connection ID is selected, the corresponding transport -parameter is included with a zero-length value.¶
-Figure 7 shows the connection IDs (with DCID=Destination Connection ID, -SCID=Source Connection ID) that are used in a complete handshake. The exchange -of Initial packets is shown, plus the later exchange of 1-RTT packets that -includes the connection ID established during the handshake.¶
-Figure 8 shows a similar handshake that includes a Retry packet.¶
-In both cases (Figure 7 and Figure 8), the client sets the value of the initial_source_connection_id transport parameter to C1.¶
-When the handshake does not include a Retry (Figure 7), the server sets original_destination_connection_id to S1 and initial_source_connection_id to S3. In this case, the server does not include a retry_source_connection_id transport parameter.¶
-When the handshake includes a Retry (Figure 8), the server sets original_destination_connection_id to S1, retry_source_connection_id to S2, and initial_source_connection_id to S3.¶
During connection establishment, both endpoints make authenticated declarations -of their transport parameters. Endpoints are required to comply with the -restrictions that each parameter defines; the description of each parameter -includes rules for its handling.¶
-Transport parameters are declarations that are made unilaterally by each -endpoint. Each endpoint can choose values for transport parameters independent -of the values chosen by its peer.¶
-The encoding of the transport parameters is detailed in -Section 18.¶
-QUIC includes the encoded transport parameters in the cryptographic handshake. -Once the handshake completes, the transport parameters declared by the peer are -available. Each endpoint validates the values provided by its peer.¶
-Definitions for each of the defined transport parameters are included in -Section 18.2.¶
-An endpoint MUST treat receipt of a transport parameter with an invalid value as -a connection error of type TRANSPORT_PARAMETER_ERROR.¶
-An endpoint MUST NOT send a parameter more than once in a given transport -parameters extension. An endpoint SHOULD treat receipt of duplicate transport -parameters as a connection error of type TRANSPORT_PARAMETER_ERROR.¶
-Endpoints use transport parameters to authenticate the negotiation of -connection IDs during the handshake; see Section 7.3.¶
-Application Layer Protocol Negotiation (ALPN; see [ALPN]) allows -clients to offer multiple application protocols during connection -establishment. The transport parameters that a client includes during the -handshake apply to all application protocols that the client offers. Application -protocols can recommend values for transport parameters, such as the initial -flow control limits. However, application protocols that set constraints on -values for transport parameters could make it impossible for a client to offer -multiple application protocols if these constraints conflict.¶
Using 0-RTT depends on both client and server using protocol parameters that were negotiated from a previous connection. To enable 0-RTT, endpoints store the value of the server transport parameters from a connection and apply them to any 0-RTT packets that are sent in subsequent connections to that peer. This information is stored with any information required by the application protocol or cryptographic handshake; see Section 4.6 of [QUIC-TLS].¶
Remembered transport parameters apply to the new connection until the handshake completes and the client starts sending 1-RTT packets. Once the handshake completes, the client uses the transport parameters established in the handshake. Not all transport parameters are remembered, as some do not apply to future connections or they have no effect on use of 0-RTT.¶
The definition of a new transport parameter (Section 7.4.2) MUST specify whether storing the transport parameter for 0-RTT is mandatory, optional, or prohibited. A client need not store a transport parameter it cannot process.¶
A client MUST NOT use remembered values for the following parameters: ack_delay_exponent, max_ack_delay, initial_source_connection_id, original_destination_connection_id, preferred_address, retry_source_connection_id, and stateless_reset_token. The client MUST use the server's new values in the handshake instead; if the server does not provide new values, the default value is used.¶
A client that attempts to send 0-RTT data MUST remember all other transport parameters used by the server that it is able to process. The server can remember these transport parameters, or store an integrity-protected copy of the values in the ticket and recover the information when accepting 0-RTT data. A server uses the transport parameters in determining whether to accept 0-RTT data.¶
If 0-RTT data is accepted by the server, the server MUST NOT reduce any limits or alter any values that might be violated by the client with its 0-RTT data. In particular, a server that accepts 0-RTT data MUST NOT set values for the following parameters (Section 18.2) that are smaller than the remembered value of the parameters.¶
Omitting or setting a zero value for certain transport parameters can result in 0-RTT data being enabled, but not usable. The applicable subset of transport parameters that permit sending of application data SHOULD be set to non-zero values for 0-RTT. This includes initial_max_data and either initial_max_streams_bidi and initial_max_stream_data_bidi_remote, or initial_max_streams_uni and initial_max_stream_data_uni.¶
A server MAY store and recover the previously sent values of the max_idle_timeout, max_udp_payload_size, and disable_active_migration parameters and reject 0-RTT if it selects smaller values. Lowering the values of these parameters while also accepting 0-RTT data could degrade the performance of the connection. Specifically, lowering the max_udp_payload_size could result in dropped packets, leading to worse performance compared to rejecting 0-RTT data outright.¶
A server MUST reject 0-RTT data if the restored values for transport parameters cannot be supported.¶
When sending frames in 0-RTT packets, a client MUST only use remembered transport parameters; importantly, it MUST NOT use updated values that it learns from the server's updated transport parameters or from frames received in 1-RTT packets. Updated values of transport parameters from the handshake apply only to 1-RTT packets. For instance, flow control limits from remembered transport parameters apply to all 0-RTT packets even if those values are increased by the handshake or by frames sent in 1-RTT packets. A server MAY treat use of updated transport parameters in 0-RTT as a connection error of type PROTOCOL_VIOLATION.¶
New transport parameters can be used to negotiate new protocol behavior. An endpoint MUST ignore transport parameters that it does not support. Absence of a transport parameter therefore disables any optional protocol feature that is negotiated using the parameter. As described in Section 18.1, some identifiers are reserved in order to exercise this requirement.¶
A client that does not understand a transport parameter can discard it and attempt 0-RTT on subsequent connections. However, if the client adds support for a discarded transport parameter, it risks violating the constraints that the transport parameter establishes if it attempts 0-RTT. New transport parameters can avoid this problem by setting a default of the most conservative value.¶
New transport parameters can be registered according to the rules in Section 22.2.¶
Implementations need to maintain a buffer of CRYPTO data received out of order. Because there is no flow control of CRYPTO frames, an endpoint could potentially force its peer to buffer an unbounded amount of data.¶
Implementations MUST support buffering at least 4096 bytes of data received in out-of-order CRYPTO frames. Endpoints MAY choose to allow more data to be buffered during the handshake. A larger limit during the handshake could allow for larger keys or credentials to be exchanged. An endpoint's buffer size does not need to remain constant during the life of the connection.¶
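The buffering rule above can be sketched as follows. This is an illustrative model, not taken from any real QUIC stack; the class and method names are hypothetical, and a real implementation would also reassemble contiguous data and feed it to the TLS layer.

```python
# Sketch of an out-of-order CRYPTO buffer that enforces the 4096-byte
# floor the spec requires during the handshake. Names are illustrative.

CRYPTO_BUFFER_MIN = 4096  # minimum an implementation MUST support


class CryptoBuffer:
    def __init__(self, limit=CRYPTO_BUFFER_MIN):
        assert limit >= CRYPTO_BUFFER_MIN
        self.limit = limit
        self.pending = {}  # offset -> bytes, frames held out of order

    def buffered_bytes(self):
        return sum(len(d) for d in self.pending.values())

    def receive(self, offset, data, handshake_done):
        """Returns True if the frame was buffered, False if discarded."""
        if self.buffered_bytes() + len(data) > self.limit:
            if not handshake_done:
                # During the handshake, exceeding the buffer is a
                # CRYPTO_BUFFER_EXCEEDED connection error (unless the
                # endpoint chooses to expand its buffer temporarily).
                raise ConnectionError("CRYPTO_BUFFER_EXCEEDED")
            # After the handshake, the frame MAY simply be discarded;
            # the containing packet is still acknowledged.
            return False
        self.pending[offset] = data
        return True
```

Note the asymmetry the text describes: before the handshake completes, overflow must either expand the buffer or fail the connection; afterwards, silently discarding the frame is permitted.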
Being unable to buffer CRYPTO frames during the handshake can lead to a connection failure. If an endpoint's buffer is exceeded during the handshake, it can expand its buffer temporarily to complete the handshake. If an endpoint does not expand its buffer, it MUST close the connection with a CRYPTO_BUFFER_EXCEEDED error code.¶
Once the handshake completes, if an endpoint is unable to buffer all data in a CRYPTO frame, it MAY discard that CRYPTO frame and all CRYPTO frames received in the future, or it MAY close the connection with a CRYPTO_BUFFER_EXCEEDED error code. Packets containing discarded CRYPTO frames MUST be acknowledged because the packet has been received and processed by the transport even though the CRYPTO frame was discarded.¶
Address validation ensures that an endpoint cannot be used for a traffic amplification attack. In such an attack, a packet is sent to a server with spoofed source address information that identifies a victim. If a server generates more or larger packets in response to that packet, the attacker can use the server to send more data toward the victim than it would be able to send on its own.¶
The primary defense against amplification attacks is verifying that an endpoint is able to receive packets at the transport address that it claims. Address validation is performed both during connection establishment (see Section 8.1) and during connection migration (see Section 8.2).¶
Connection establishment implicitly provides address validation for both endpoints. In particular, receipt of a packet protected with Handshake keys confirms that the peer successfully processed an Initial packet. Once an endpoint has successfully processed a Handshake packet from the peer, it can consider the peer address to have been validated.¶
Additionally, an endpoint MAY consider the peer address validated if the peer uses a connection ID chosen by the endpoint and the connection ID contains at least 64 bits of entropy.¶
For the client, the value of the Destination Connection ID field in its first Initial packet allows it to validate the server address as a part of successfully processing any packet. Initial packets from the server are protected with keys that are derived from this value (see Section 5.2 of [QUIC-TLS]). Alternatively, the value is echoed by the server in Version Negotiation packets (Section 6) or included in the Integrity Tag in Retry packets (Section 5.8 of [QUIC-TLS]).¶
Prior to validating the client address, servers MUST NOT send more than three times as many bytes as the number of bytes they have received. This limits the magnitude of any amplification attack that can be mounted using spoofed source addresses. For the purposes of avoiding amplification prior to address validation, servers MUST count all of the payload bytes received in datagrams that are uniquely attributed to a single connection. This includes datagrams that contain packets that are successfully processed and datagrams that contain packets that are all discarded.¶
Clients MUST ensure that UDP datagrams containing Initial packets have UDP payloads of at least 1200 bytes, adding PADDING frames as necessary. A client that sends padded datagrams allows the server to send more data prior to completing address validation.¶
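The 3x anti-amplification limit can be modeled as simple per-connection byte accounting. This is a hypothetical sketch under the assumptions above; all names are illustrative, and a real server would perform this check before every send until the client address is validated.

```python
# Sketch of the server's 3x anti-amplification limit. The server counts
# every datagram payload byte attributed to the connection, including
# datagrams whose packets are all discarded.

AMPLIFICATION_FACTOR = 3


class AntiAmplification:
    def __init__(self):
        self.bytes_received = 0
        self.bytes_sent = 0
        self.address_validated = False

    def on_datagram_received(self, size):
        self.bytes_received += size

    def can_send(self, size):
        if self.address_validated:
            return True  # no limit once the address is validated
        return (self.bytes_sent + size
                <= AMPLIFICATION_FACTOR * self.bytes_received)

    def on_send(self, size):
        self.bytes_sent += size
```

This also shows why client padding matters: a 1200-byte padded Initial datagram entitles the server to send up to 3600 bytes before validation completes.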
Loss of an Initial or Handshake packet from the server can cause a deadlock if the client does not send additional Initial or Handshake packets. A deadlock could occur when the server reaches its anti-amplification limit and the client has received acknowledgements for all the data it has sent. In this case, when the client has no reason to send additional packets, the server will be unable to send more data because it has not validated the client's address. To prevent this deadlock, clients MUST send a packet on a probe timeout (PTO, see Section 6.2 of [QUIC-RECOVERY]). Specifically, the client MUST send an Initial packet in a UDP datagram that contains at least 1200 bytes if it does not have Handshake keys, and otherwise send a Handshake packet.¶
A server might wish to validate the client address before starting the cryptographic handshake. QUIC uses a token in the Initial packet to provide address validation prior to completing the handshake. This token is delivered to the client during connection establishment with a Retry packet (see Section 8.1.2) or in a previous connection using the NEW_TOKEN frame (see Section 8.1.3).¶
In addition to sending limits imposed prior to address validation, servers are also constrained in what they can send by the limits set by the congestion controller. Clients are only constrained by the congestion controller.¶
A token sent in a NEW_TOKEN frame or a Retry packet MUST be constructed in a way that allows the server to identify how it was provided to a client. These tokens are carried in the same field but require different handling from servers.¶
Upon receiving the client's Initial packet, the server can request address validation by sending a Retry packet (Section 17.2.5) containing a token. This token MUST be repeated by the client in all Initial packets it sends for that connection after it receives the Retry packet.¶
In response to processing an Initial packet containing a token that was provided in a Retry packet, a server cannot send another Retry packet; it can only refuse the connection or permit it to proceed.¶
As long as it is not possible for an attacker to generate a valid token for its own address (see Section 8.1.4), the client's ability to return that token proves to the server that the client received it.¶
A server can also use a Retry packet to defer the state and processing costs of connection establishment. Requiring the server to provide a different connection ID, along with the original_destination_connection_id transport parameter defined in Section 18.2, forces the server to demonstrate that it, or an entity it cooperates with, received the original Initial packet from the client. Providing a different connection ID also grants a server some control over how subsequent packets are routed. This can be used to direct connections to a different server instance.¶
If a server receives a client Initial that can be unprotected but contains an invalid Retry token, it knows the client will not accept another Retry token. The server can discard such a packet and allow the client to time out to detect handshake failure, but that could impose a significant latency penalty on the client. Instead, the server SHOULD immediately close (Section 10.2) the connection with an INVALID_TOKEN error. Note that a server has not established any state for the connection at this point and so does not enter the closing period.¶
A flow showing the use of a Retry packet is shown in Figure 9.¶
A server MAY provide clients with an address validation token during one connection that can be used on a subsequent connection. Address validation is especially important with 0-RTT because a server potentially sends a significant amount of data to a client in response to 0-RTT data.¶
The server uses the NEW_TOKEN frame (Section 19.7) to provide the client with an address validation token that can be used to validate future connections. In a future connection, the client includes this token in Initial packets to provide address validation. The client MUST include the token in all Initial packets it sends, unless a Retry replaces the token with a newer one. The client MUST NOT use the token provided in a Retry for future connections. Servers MAY discard any Initial packet that does not carry the expected token.¶
Unlike the token that is created for a Retry packet, which is used immediately, the token sent in the NEW_TOKEN frame can be used after some period of time has passed. Thus, a token SHOULD have an expiration time, which could be either an explicit expiration time or an issued timestamp that can be used to dynamically calculate the expiration time. A server can store the expiration time or include it in an encrypted form in the token.¶
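The issued-timestamp approach above can be sketched as follows. This is a deliberately simplified, hypothetical format: the lifetime, the field layout, and the plaintext encoding are all assumptions. A real token would be encrypted and authenticated (and per the linkability rules in Section 8.1.3, addressing information must not appear in cleartext); the HMAC sketch in Section 8.1.4 shows the integrity side.

```python
# Hypothetical NEW_TOKEN-style token that embeds its issue time so the
# server can compute expiry on receipt. Plaintext here only for
# illustration; a deployment would encrypt and authenticate the token.
import struct

TOKEN_LIFETIME = 24 * 3600  # assumed validity window, in seconds


def issue_token(client_ip: str, now: float) -> bytes:
    # 8-byte big-endian issue timestamp, then the client address the
    # server will later compare against.
    return struct.pack("!d", now) + client_ip.encode()


def validate_token(token: bytes, client_ip: str, now: float) -> bool:
    issued, = struct.unpack("!d", token[:8])
    ip = token[8:].decode()
    return (now - issued) <= TOKEN_LIFETIME and ip == client_ip
```

Storing the timestamp inside the token keeps the server stateless: it needs no per-token database to decide whether a presented token is still fresh.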
A token issued with NEW_TOKEN MUST NOT include information that would allow values to be linked by an observer to the connection on which it was issued. For example, it cannot include the previous connection ID or addressing information, unless the values are encrypted. A server MUST ensure that every NEW_TOKEN frame it sends is unique across all clients, with the exception of those sent to repair losses of previously sent NEW_TOKEN frames. Information that allows the server to distinguish between tokens from Retry and NEW_TOKEN MAY be accessible to entities other than the server.¶
It is unlikely that the client port number is the same on two different connections; validating the port is therefore unlikely to be successful.¶
A token received in a NEW_TOKEN frame is applicable to any server that the connection is considered authoritative for (e.g., server names included in the certificate). When connecting to a server for which the client retains an applicable and unused token, it SHOULD include that token in the Token field of its Initial packet. Including a token might allow the server to validate the client address without an additional round trip. A client MUST NOT include a token that is not applicable to the server that it is connecting to, unless the client has the knowledge that the server that issued the token and the server the client is connecting to are jointly managing the tokens. A client MAY use a token from any previous connection to that server.¶
A token allows a server to correlate activity between the connection where the token was issued and any connection where it is used. Clients that want to break continuity of identity with a server MAY discard tokens provided using the NEW_TOKEN frame. In comparison, a token obtained in a Retry packet MUST be used immediately during the connection attempt and cannot be used in subsequent connection attempts.¶
A client SHOULD NOT reuse a NEW_TOKEN token for different connection attempts. Reusing a token allows connections to be linked by entities on the network path; see Section 9.5.¶
Clients might receive multiple tokens on a single connection. Aside from preventing linkability, any token can be used in any connection attempt. Servers can send additional tokens to either enable address validation for multiple connection attempts or to replace older tokens that might become invalid. For a client, this ambiguity means that sending the most recent unused token is most likely to be effective. Though saving and using older tokens has no negative consequences, clients can regard older tokens as being less likely to be useful to the server for address validation.¶
When a server receives an Initial packet with an address validation token, it MUST attempt to validate the token, unless it has already completed address validation. If the token is invalid, then the server SHOULD proceed as if the client did not have a validated address, including potentially sending a Retry. A server SHOULD encode tokens provided with NEW_TOKEN frames and Retry packets differently, and validate the latter more strictly. If the validation succeeds, the server SHOULD then allow the handshake to proceed.¶
The rationale for treating the client as unvalidated rather than discarding the packet is that the client might have received the token in a previous connection using the NEW_TOKEN frame, and if the server has lost state, it might be unable to validate the token at all, leading to connection failure if the packet is discarded.¶
In a stateless design, a server can use encrypted and authenticated tokens to pass information to clients that the server can later recover and use to validate a client address. Tokens are not integrated into the cryptographic handshake and so they are not authenticated. For instance, a client might be able to reuse a token. To avoid attacks that exploit this property, a server can limit its use of tokens to only the information needed to validate client addresses.¶
Clients MAY use tokens obtained on one connection for any connection attempt using the same version. When selecting a token to use, clients do not need to consider other properties of the connection that is being attempted, including the choice of possible application protocols, session tickets, or other connection properties.¶
An address validation token MUST be difficult to guess. Including a large enough random value in the token would be sufficient, but this depends on the server remembering the value it sends to clients.¶
A token-based scheme allows the server to offload any state associated with validation to the client. For this design to work, the token MUST be covered by integrity protection against modification or falsification by clients. Without integrity protection, malicious clients could generate or guess values for tokens that would be accepted by the server. Only the server requires access to the integrity protection key for tokens.¶
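One way to satisfy the integrity requirement, since only the server ever needs the key, is an HMAC over the token payload. This is a minimal sketch, not a prescribed construction; the tag size and layout are assumptions, and a production scheme would typically use authenticated encryption so the payload is also confidential.

```python
# Minimal sketch of server-side token integrity protection with HMAC.
# The key never leaves the server, so clients cannot forge tokens.
import hashlib
import hmac
import os

SERVER_KEY = os.urandom(32)  # known only to the server
TAG_LEN = 32  # SHA-256 digest length


def protect(payload: bytes) -> bytes:
    tag = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    return payload + tag


def verify(token: bytes):
    payload, tag = token[:-TAG_LEN], token[-TAG_LEN:]
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # forged or modified token
    return payload
```

`hmac.compare_digest` is used so the comparison runs in constant time, avoiding a timing side channel on tag verification.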
There is no need for a single well-defined format for the token because the server that generates the token also consumes it. Tokens sent in Retry packets SHOULD include information that allows the server to verify that the source IP address and port in client packets remain constant.¶
Tokens sent in NEW_TOKEN frames MUST include information that allows the server to verify that the client IP address has not changed from when the token was issued. Servers can use tokens from NEW_TOKEN in deciding not to send a Retry packet, even if the client address has changed. If the client IP address has changed, the server MUST adhere to the anti-amplification limits found in Section 8.1. Note that in the presence of NAT, this requirement might be insufficient to protect other hosts that share the NAT from amplification attack.¶
Attackers could replay tokens to use servers as amplifiers in DDoS attacks. To protect against such attacks, servers MUST ensure that replay of tokens is prevented or limited. Servers SHOULD ensure that tokens sent in Retry packets are only accepted for a short time. Tokens that are provided in NEW_TOKEN frames (Section 19.7) need to be valid for longer, but SHOULD NOT be accepted multiple times in a short period. Servers are encouraged to allow tokens to be used only once, if possible; tokens MAY include additional information about clients to further narrow applicability or reuse.¶
Path validation is used by both peers during connection migration (see Section 9) to verify reachability after a change of address. In path validation, endpoints test reachability between a specific local address and a specific peer address, where an address is the two-tuple of IP address and port.¶
Path validation tests that packets sent on a path to a peer are received by that peer. Path validation is used to ensure that packets received from a migrating peer do not carry a spoofed source address.¶
Path validation does not validate that a peer can send in the return direction. Acknowledgments cannot be used for return path validation because they contain insufficient entropy and might be spoofed. Endpoints independently determine reachability on each direction of a path, and therefore return reachability can only be established by the peer.¶
Path validation can be used at any time by either endpoint. For instance, an endpoint might check that a peer is still in possession of its address after a period of quiescence.¶
Path validation is not designed as a NAT traversal mechanism. Though the mechanism described here might be effective for the creation of NAT bindings that support NAT traversal, the expectation is that one or the other peer is able to receive packets without first having sent a packet on that path. Effective NAT traversal needs additional synchronization mechanisms that are not provided here.¶
An endpoint MAY include other frames with the PATH_CHALLENGE and PATH_RESPONSE frames used for path validation. In particular, an endpoint can include PADDING frames with a PATH_CHALLENGE frame for Path Maximum Transmission Unit Discovery (PMTUD; see Section 14.2.1); it can also include its own PATH_CHALLENGE frame with a PATH_RESPONSE frame.¶
An endpoint uses a new connection ID for probes sent from a new local address; see Section 9.5. When probing a new path, an endpoint can ensure that its peer has an unused connection ID available for responses. Sending NEW_CONNECTION_ID and PATH_CHALLENGE frames in the same packet, if the peer's active_connection_id_limit permits, ensures that an unused connection ID will be available to the peer when sending a response.¶
An endpoint can choose to simultaneously probe multiple paths. The number of simultaneous paths used for probes is limited by the number of extra connection IDs its peer has previously supplied, since each new local address used for a probe requires a previously unused connection ID.¶
To initiate path validation, an endpoint sends a PATH_CHALLENGE frame containing an unpredictable payload on the path to be validated.¶
An endpoint MAY send multiple PATH_CHALLENGE frames to guard against packet loss. However, an endpoint SHOULD NOT send multiple PATH_CHALLENGE frames in a single packet.¶
An endpoint SHOULD NOT probe a new path with packets containing a PATH_CHALLENGE frame more frequently than it would send an Initial packet. This ensures that connection migration places no more load on a new path than establishing a new connection does.¶
The endpoint MUST use unpredictable data in every PATH_CHALLENGE frame so that it can associate the peer's response with the corresponding PATH_CHALLENGE.¶
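The challenge/response bookkeeping implied above can be sketched as follows. The class and attribute names are hypothetical; the only protocol-derived details are the 8-byte unpredictable payload and the rule that a matching response validates the path the challenge was sent on, regardless of where the response arrives.

```python
# Sketch of PATH_CHALLENGE tracking: each probe carries 8 random bytes
# so a later PATH_RESPONSE can be matched to the challenge it echoes.
import os


class PathValidator:
    def __init__(self):
        self.outstanding = {}  # challenge data -> path being probed

    def send_challenge(self, path):
        data = os.urandom(8)  # PATH_CHALLENGE carries 8 unpredictable bytes
        self.outstanding[data] = path
        return data  # to be placed in the PATH_CHALLENGE frame

    def on_response(self, data):
        # A PATH_RESPONSE validates the path on which the matching
        # PATH_CHALLENGE was sent, whichever path it arrived on.
        return self.outstanding.pop(data, None)
```

Because the payload is unpredictable, an off-path attacker cannot fabricate a PATH_RESPONSE that matches an outstanding challenge.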
An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame to at least the smallest allowed maximum datagram size of 1200 bytes. Sending UDP datagrams of this size ensures that the network path from the endpoint to the peer can be used for QUIC; see Section 14.¶
On receiving a PATH_CHALLENGE frame, an endpoint MUST respond by echoing the data contained in the PATH_CHALLENGE frame in a PATH_RESPONSE frame. An endpoint MUST NOT delay transmission of a packet containing a PATH_RESPONSE frame unless constrained by congestion control.¶
A PATH_RESPONSE frame MUST be sent on the network path where the PATH_CHALLENGE was received. This ensures that path validation by a peer only succeeds if the path is functional in both directions. This requirement MUST NOT be enforced by the endpoint that initiates path validation, as that would enable an attack on migration; see Section 9.3.3.¶
An endpoint MUST expand datagrams that contain a PATH_RESPONSE frame to at least the smallest allowed maximum datagram size of 1200 bytes. This verifies that the path is able to carry datagrams of this size in both directions.¶
An endpoint MUST NOT send more than one PATH_RESPONSE frame in response to one PATH_CHALLENGE frame; see Section 13.3. The peer is expected to send more PATH_CHALLENGE frames as necessary to evoke additional PATH_RESPONSE frames.¶
Path validation succeeds when a PATH_RESPONSE frame is received that contains the data that was sent in a previous PATH_CHALLENGE frame. A PATH_RESPONSE frame received on any network path validates the path on which the PATH_CHALLENGE was sent.¶
Receipt of an acknowledgment for a packet containing a PATH_CHALLENGE frame is not adequate validation, since the acknowledgment can be spoofed by a malicious peer.¶
Path validation only fails when the endpoint attempting to validate the path abandons its attempt to validate the path.¶
Endpoints SHOULD abandon path validation based on a timer. When setting this timer, implementations are cautioned that the new path could have a longer round-trip time than the original. A value of three times the larger of the current Probe Timeout (PTO) or the PTO for the new path (that is, using kInitialRtt as defined in [QUIC-RECOVERY]) is RECOMMENDED.¶
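The RECOMMENDED timer value can be computed as sketched below. The PTO formula and constants are taken from [QUIC-RECOVERY] (smoothed RTT plus four times the RTT variance, plus the peer's max_ack_delay, with kInitialRtt = 333 ms and an initial RTT variance of half the initial RTT); the function names themselves are illustrative.

```python
# Sketch of the RECOMMENDED path validation timeout: three times the
# larger of the current PTO and a PTO computed for the new path from
# kInitialRtt, since the new path's RTT is unknown.

K_INITIAL_RTT = 0.333   # seconds, per QUIC-RECOVERY
K_GRANULARITY = 0.001   # timer granularity, per QUIC-RECOVERY


def pto(smoothed_rtt, rttvar, max_ack_delay):
    # PTO = smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay


def path_validation_timeout(current_pto, max_ack_delay):
    # For the unmeasured new path, seed the PTO from kInitialRtt with
    # rttvar = kInitialRtt / 2, as for a fresh connection.
    new_path_pto = pto(K_INITIAL_RTT, K_INITIAL_RTT / 2, max_ack_delay)
    return 3 * max(current_pto, new_path_pto)
```

Using the larger of the two PTOs guards against abandoning validation prematurely when the new path is slower than the one currently in use.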
This timeout allows for multiple PTOs to expire prior to failing path validation, so that loss of a single PATH_CHALLENGE or PATH_RESPONSE frame does not cause path validation failure.¶
Note that the endpoint might receive packets containing other frames on the new path, but a PATH_RESPONSE frame with appropriate data is required for path validation to succeed.¶
When an endpoint abandons path validation, it determines that the path is unusable. This does not necessarily imply a failure of the connection; endpoints can continue sending packets over other paths as appropriate. If no paths are available, an endpoint can wait for a new path to become available or close the connection.¶
A path validation might be abandoned for other reasons besides failure. Primarily, this happens if a connection migration to a new path is initiated while a path validation on the old path is in progress.¶
The use of a connection ID allows connections to survive changes to endpoint addresses (IP address and port), such as those caused by an endpoint migrating to a new network. This section describes the process by which an endpoint migrates to a new address.¶
The design of QUIC relies on endpoints retaining a stable address for the duration of the handshake. An endpoint MUST NOT initiate connection migration before the handshake is confirmed, as defined in Section 4.1.2 of [QUIC-TLS].¶
If the peer sent the disable_active_migration transport parameter, an endpoint also MUST NOT send packets (including probing packets; see Section 9.1) from a different local address to the address the peer used during the handshake, unless the endpoint has acted on a preferred_address transport parameter from the peer. If the peer violates this requirement, the endpoint MUST either drop the incoming packets on that path without generating a stateless reset or proceed with path validation and allow the peer to migrate. Generating a stateless reset or closing the connection would allow third parties in the network to cause connections to close by spoofing or otherwise manipulating observed traffic.¶
Not all changes of peer address are intentional, or active, migrations. The peer could experience NAT rebinding: a change of address due to a middlebox, usually a NAT, allocating a new outgoing port or even a new outgoing IP address for a flow. An endpoint MUST perform path validation (Section 8.2) if it detects any change to a peer's address, unless it has previously validated that address.¶
When an endpoint has no validated path on which to send packets, it MAY discard connection state. An endpoint capable of connection migration MAY wait for a new path to become available before discarding connection state.¶
This document limits migration of connections to new client addresses, except as described in Section 9.6. Clients are responsible for initiating all migrations. Servers do not send non-probing packets (see Section 9.1) toward a client address until they see a non-probing packet from that address. If a client receives packets from an unknown server address, the client MUST discard these packets.¶
An endpoint MAY probe for peer reachability from a new local address using path validation (Section 8.2) prior to migrating the connection to the new local address. Failure of path validation simply means that the new path is not usable for this connection. Failure to validate a path does not cause the connection to end unless there are no valid alternative paths available.¶
PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames are "probing frames", and all other frames are "non-probing frames". A packet containing only probing frames is a "probing packet", and a packet containing any other frame is a "non-probing packet".¶
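The classification above reduces to a small predicate: a packet is a probing packet only if every frame it carries is one of the four probing frame types. A minimal sketch, with frame types represented as plain strings for illustration:

```python
# Sketch of the probing/non-probing distinction from Section 9.1.
# One non-probing frame makes the whole packet non-probing.

PROBING_FRAMES = {
    "PATH_CHALLENGE",
    "PATH_RESPONSE",
    "NEW_CONNECTION_ID",
    "PADDING",
}


def is_probing_packet(frame_types):
    """True if every frame in the packet is a probing frame."""
    return all(f in PROBING_FRAMES for f in frame_types)
```

This predicate matters because, per the following sections, only a non-probing packet from a new peer address signals an actual migration; probing packets merely test the path.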
An endpoint can migrate a connection to a new local address by sending packets containing non-probing frames from that address.¶
Each endpoint validates its peer's address during connection establishment. Therefore, a migrating endpoint can send to its peer knowing that the peer is willing to receive at the peer's current address. Thus an endpoint can migrate to a new local address without first validating the peer's address.¶
To establish reachability on the new path, an endpoint initiates path validation (Section 8.2) on the new path. An endpoint MAY defer path validation until after a peer sends the next non-probing frame to its new address.¶
When migrating, the new path might not support the endpoint's current sending rate. Therefore, the endpoint resets its congestion controller and RTT estimate, as described in Section 9.4.¶
The new path might not have the same ECN capability. Therefore, the endpoint validates ECN capability as described in Section 13.4.¶
Receiving a packet from a new peer address containing a non-probing frame indicates that the peer has migrated to that address.¶
If the recipient permits the migration, it MUST send subsequent packets to the new peer address and MUST initiate path validation (Section 8.2) to verify the peer's ownership of the address if validation is not already underway.¶
An endpoint only changes the address to which it sends packets in response to the highest-numbered non-probing packet. This ensures that an endpoint does not send packets to an old peer address in the case that it receives reordered packets.¶
An endpoint MAY send data to an unvalidated peer address, but it MUST protect against potential attacks as described in Section 9.3.1 and Section 9.3.2. An endpoint MAY skip validation of a peer address if that address has been seen recently. In particular, if an endpoint returns to a previously validated path after detecting some form of spurious migration, skipping address validation and restoring loss detection and congestion state can reduce the performance impact of the attack.¶
After changing the address to which it sends non-probing packets, an endpoint can abandon any path validation for other addresses.¶
Receiving a packet from a new peer address could be the result of a NAT rebinding at the peer.¶
After verifying a new client address, the server SHOULD send new address validation tokens (Section 8) to the client.¶
It is possible that a peer is spoofing its source address to cause an endpoint to send excessive amounts of data to an unwilling host. If the endpoint sends significantly more data than the spoofing peer, connection migration might be used to amplify the volume of data that an attacker can generate toward a victim.¶
As described in Section 9.3, an endpoint is required to validate a peer's new address to confirm the peer's possession of the new address. Until a peer's address is deemed valid, an endpoint MUST limit the rate at which it sends data to this address. The endpoint MUST NOT send more than a minimum congestion window's worth of data per estimated round-trip time (kMinimumWindow, as defined in [QUIC-RECOVERY]). In the absence of this limit, an endpoint risks being used for a denial of service attack against an unsuspecting victim. Note that since the endpoint will not have any round-trip time measurements to this address, the estimate SHOULD be the default initial value; see [QUIC-RECOVERY].¶
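The rate limit toward an unvalidated address can be sketched as a fixed-window budget. The constants follow [QUIC-RECOVERY] (kMinimumWindow of two maximum-sized datagrams, kInitialRtt as the RTT estimate since the new address has no measurements); the class and its windowing approach are an illustrative assumption, not a prescribed mechanism.

```python
# Sketch of the limit on sending to an unvalidated peer address: at
# most a minimum congestion window of data per estimated round trip.

MAX_DATAGRAM_SIZE = 1200
K_MINIMUM_WINDOW = 2 * MAX_DATAGRAM_SIZE  # per QUIC-RECOVERY
K_INITIAL_RTT = 0.333  # no RTT samples exist for the new address


class UnvalidatedPathLimiter:
    def __init__(self):
        self.window_start = 0.0
        self.sent_in_window = 0

    def can_send(self, size, now):
        # Open a fresh budget once an estimated round trip has elapsed.
        if now - self.window_start >= K_INITIAL_RTT:
            self.window_start = now
            self.sent_in_window = 0
        return self.sent_in_window + size <= K_MINIMUM_WINDOW

    def on_send(self, size):
        self.sent_in_window += size
```

Once path validation succeeds, the limiter is discarded and normal congestion control governs the path.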
-If an endpoint skips validation of a peer address as described above, it does -not need to limit its sending rate.¶
-An on-path attacker could cause a spurious connection migration by copying and -forwarding a packet with a spoofed address such that it arrives before the -original packet. The packet with the spoofed address will be seen to come from -a migrating connection, and the original packet will be seen as a duplicate and -dropped. After a spurious migration, validation of the source address will fail -because the entity at the source address does not have the cryptographic keys -necessary to read or respond to the PATH_CHALLENGE frame that is sent to it.¶
-To protect the connection from failing due to such a spurious migration, an -endpoint MUST revert to using the last validated peer address when validation -of a new peer address fails. Additionally, receipt of packets with higher -packet numbers from the legitimate peer address will trigger another connection -migration. This will cause the validation of the address of the spurious -migration to be abandoned, thus containing migrations initiated by the attacker -injecting a single packet.¶
-If an endpoint has no state about the last validated peer address, it MUST close -the connection silently by discarding all connection state. This results in new -packets on the connection being handled generically. For instance, an endpoint -MAY send a stateless reset in response to any further incoming packets.¶
-An off-path attacker that can observe packets might forward copies of genuine -packets to endpoints. If the copied packet arrives before the genuine packet, -this will appear as a NAT rebinding. Any genuine packet will be discarded as a -duplicate. If the attacker is able to continue forwarding packets, it might be -able to cause migration to a path via the attacker. This places the attacker on -path, giving it the ability to observe or drop all subsequent packets.¶
-This style of attack relies on the attacker using a path that has approximately -the same characteristics as the direct path between endpoints. The attack is -more reliable if relatively few packets are sent or if packet loss coincides -with the attempted attack.¶
-A non-probing packet received on the original path that increases the maximum -received packet number will cause the endpoint to move back to that path. -Eliciting packets on this path increases the likelihood that the attack is -unsuccessful. Therefore, mitigation of this attack relies on triggering the -exchange of packets.¶
-In response to an apparent migration, endpoints MUST validate the previously -active path using a PATH_CHALLENGE frame. This induces the sending of new -packets on that path. If the path is no longer viable, the validation attempt -will time out and fail; if the path is viable, but no longer desired, the -validation will succeed, but only results in probing packets being sent on the -path.¶
-An endpoint that receives a PATH_CHALLENGE on an active path SHOULD send a -non-probing packet in response. If the non-probing packet arrives before any -copy made by an attacker, this results in the connection being migrated back to -the original path. Any subsequent migration to another path restarts this -entire process.¶
-This defense is imperfect, but this is not considered a serious problem. If the -path via the attacker is reliably faster than the original path despite multiple -attempts to use that original path, it is not possible to distinguish between an -attack and an improvement in routing.¶
-An endpoint could also use heuristics to improve detection of this style of -attack. For instance, NAT rebinding is improbable if packets were recently -received on the old path; similarly, rebinding is rare on IPv6 paths. Endpoints -can also look for duplicated packets. Conversely, a change in connection ID is -more likely to indicate an intentional migration rather than an attack.¶
-The capacity available on the new path might not be the same as the old path. -Packets sent on the old path MUST NOT contribute to congestion control or RTT -estimation for the new path.¶
-On confirming a peer's ownership of its new address, an endpoint MUST -immediately reset the congestion controller and round-trip time estimator for -the new path to initial values (see Appendices A.3 and B.3 in [QUIC-RECOVERY]) -unless the only change in the peer's address is its port number. Because -port-only changes are commonly the result of NAT rebinding or other middlebox -activity, the endpoint MAY instead retain its congestion control state and -round-trip estimate in those cases rather than reverting to initial values. -In cases where congestion control state -retained from an old path is used on a new path with substantially different -characteristics, a sender could transmit too aggressively until the congestion -controller and the RTT estimator have adapted. Generally, implementations are -advised to be cautious when using previous values on a new path.¶
-There could be apparent reordering at the receiver when an endpoint sends data -and probes from/to multiple addresses during the migration period, since the two -resulting paths could have different round-trip times. A receiver of packets on -multiple paths will still send ACK frames covering all received packets.¶
-While multiple paths might be used during connection migration, a single -congestion control context and a single loss recovery context (as described in -[QUIC-RECOVERY]) could be adequate. For instance, an endpoint might delay -switching to a new congestion control context until it is confirmed that an old -path is no longer needed (such as the case in Section 9.3.3).¶
-A sender can make exceptions for probe packets so that their loss detection is -independent and does not unduly cause the congestion controller to reduce its -sending rate. An endpoint might set a separate timer when a PATH_CHALLENGE is -sent, which is cancelled if the corresponding PATH_RESPONSE is received. If the -timer fires before the PATH_RESPONSE is received, the endpoint might send a new -PATH_CHALLENGE, and restart the timer for a longer period of time. This timer -SHOULD be set as described in Section 6.2.1 of [QUIC-RECOVERY] and MUST NOT be -more aggressive.¶
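A probe timer of this kind can be sketched as follows. The class and method names are hypothetical, and the actual timer scheduling and frame encoding are left to the surrounding implementation; the backoff-on-fire and cancel-on-response behavior mirrors the paragraph above.

```python
import os

class PathProbe:
    """Independent retransmission timer for PATH_CHALLENGE frames, so probe
    loss does not feed the main loss detection or congestion controller."""

    def __init__(self, initial_pto):
        self.timeout = initial_pto   # start no more aggressive than the PTO
        self.data = os.urandom(8)    # PATH_CHALLENGE carries 8 unpredictable bytes
        self.outstanding = True

    def on_timer_fired(self):
        # Send a fresh challenge and back off the timer for the next attempt.
        self.data = os.urandom(8)
        self.timeout *= 2
        return self.data             # payload for the new PATH_CHALLENGE frame

    def on_path_response(self, data):
        if data == self.data:        # the response must echo the challenge data
            self.outstanding = False # cancel the timer: path validation succeeded
            return True
        return False
```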
-Using a stable connection ID on multiple network paths would allow a passive -observer to correlate activity between those paths. An endpoint that moves -between networks might not wish to have their activity correlated by any entity -other than their peer, so different connection IDs are used when sending from -different local addresses, as discussed in Section 5.1. For this to be -effective, endpoints need to ensure that connection IDs they provide cannot be -linked by any other entity.¶
-At any time, endpoints MAY change the Destination Connection ID they transmit -with to a value that has not been used on another path.¶
-An endpoint MUST NOT reuse a connection ID when sending from more than one local -address, for example when initiating connection migration as described in -Section 9.2 or when probing a new network path as described in -Section 9.1.¶
-Similarly, an endpoint MUST NOT reuse a connection ID when sending to more than -one destination address. Due to network changes outside the control of its -peer, an endpoint might receive packets from a new source address with the same -destination connection ID, in which case it MAY continue to use the current -connection ID with the new remote address while still sending from the same -local address.¶
-These requirements regarding connection ID reuse apply only to the sending of -packets, as unintentional changes in path without a change in connection ID are -possible. For example, after a period of network inactivity, NAT rebinding -might cause packets to be sent on a new path when the client resumes sending. -An endpoint responds to such an event as described in Section 9.3.¶
-Using different connection IDs for packets sent in both directions on each new -network path eliminates the use of the connection ID for linking packets from -the same connection across different network paths. Header protection ensures -that packet numbers cannot be used to correlate activity. This does not prevent -other properties of packets, such as timing and size, from being used to -correlate activity.¶
-An endpoint SHOULD NOT initiate migration with a peer that has requested a -zero-length connection ID, because traffic over the new path might be trivially -linkable to traffic over the old one. If the server is able to associate -packets with a zero-length connection ID to the right connection, it means that -the server is using other information to demultiplex packets. For example, a -server might provide a unique address to every client using HTTP -alternative services [ALTSVC]. Information that might allow correct -routing of packets across multiple network paths will also allow activity on -those paths to be linked by entities other than the peer.¶
-A client might wish to reduce linkability by employing a new connection ID and -source UDP port when sending traffic after a period of inactivity. Changing the -UDP port from which it sends packets makes the traffic appear to the peer as a -connection migration. This ensures that the mechanisms that support -migration are exercised even for clients that do not experience NAT rebindings -or genuine migrations. Changing port number can cause a peer to reset its -congestion state (see Section 9.4), so the port SHOULD only be changed -infrequently.¶
-An endpoint that exhausts available connection IDs cannot probe new paths or -initiate migration, nor can it respond to probes or attempts by its peer to -migrate. To ensure that migration is possible and packets sent on different -paths cannot be correlated, endpoints SHOULD provide new connection IDs before -peers migrate; see Section 5.1.1. If a peer might have exhausted available -connection IDs, a migrating endpoint could include a NEW_CONNECTION_ID frame in -all packets sent on a new network path.¶
-QUIC allows servers to accept connections on one IP address and attempt to -transfer these connections to a more preferred address shortly after the -handshake. This is particularly useful when clients initially connect to an -address shared by multiple servers but would prefer to use a unicast address to -ensure connection stability. This section describes the protocol for migrating a -connection to a preferred server address.¶
-Migrating a connection to a new server address mid-connection is not supported -by the version of QUIC specified in this document. If a client receives packets -from a new server address when the client has not initiated a migration to that -address, the client SHOULD discard these packets.¶
-A server conveys a preferred address by including the preferred_address -transport parameter in the TLS handshake.¶
-Servers MAY communicate a preferred address of each address family (IPv4 and -IPv6) to allow clients to pick the one most suited to their network attachment.¶
-Once the handshake is confirmed, the client SHOULD select one of the two -addresses provided by the server and initiate path validation (see -Section 8.2). A client constructs packets using any previously unused -active connection ID, taken from either the preferred_address transport -parameter or a NEW_CONNECTION_ID frame.¶
-As soon as path validation succeeds, the client SHOULD begin sending all -future packets to the new server address using the new connection ID and -discontinue use of the old server address. If path validation fails, the client -MUST continue sending all future packets to the server's original IP address.¶
-A client that migrates to a preferred address MUST validate the address it -chooses before migrating; see Section 21.5.3.¶
-A server might receive a packet addressed to its preferred IP address at any -time after it accepts a connection. If this packet contains a PATH_CHALLENGE -frame, the server sends a packet containing a PATH_RESPONSE frame as per -Section 8.2. The server MUST send non-probing packets from its -original address until it receives a non-probing packet from the client at its -preferred address and until the server has validated the new path.¶
-The server MUST probe on the path toward the client from its preferred address. -This helps to guard against spurious migration initiated by an attacker.¶
-Once the server has completed its path validation and has received a non-probing -packet with a new largest packet number on its preferred address, the server -begins sending non-probing packets to the client exclusively from its preferred -IP address. It SHOULD drop packets for this connection received on the old IP -address, but MAY continue to process delayed packets.¶
-The addresses that a server provides in the preferred_address transport -parameter are only valid for the connection in which they are provided. A -client MUST NOT use these for other connections, including connections that are -resumed from the current connection.¶
-A client might need to perform a connection migration before it has migrated to -the server's preferred address. In this case, the client SHOULD perform path -validation to both the original and preferred server address from the client's -new address concurrently.¶
-If path validation of the server's preferred address succeeds, the client MUST -abandon validation of the original address and migrate to using the server's -preferred address. If path validation of the server's preferred address fails -but validation of the server's original address succeeds, the client MAY migrate -to its new address and continue sending to the server's original address.¶
-If packets received at the server's preferred address have a different source -address than observed from the client during the handshake, the server MUST -protect against potential attacks as described in Section 9.3.1 and -Section 9.3.2. In addition to intentional simultaneous migration, this -might also occur because the client's access network used a different NAT -binding for the server's preferred address.¶
-Servers SHOULD initiate path validation to the client's new address upon -receiving a probe packet from a different address. Servers MUST NOT send more -than a minimum congestion window's worth of non-probing packets to the new -address before path validation is complete.¶
-A client that migrates to a new address SHOULD use a preferred address from the -same address family for the server.¶
-The connection ID provided in the preferred_address transport parameter is not -specific to the addresses that are provided. This connection ID is provided to -ensure that the client has a connection ID available for migration, but the -client MAY use this connection ID on any path.¶
-Endpoints that send data using IPv6 SHOULD apply an IPv6 flow label -in compliance with [RFC6437], unless the local API does not allow -setting IPv6 flow labels.¶
-The IPv6 flow label SHOULD be a pseudo-random function of the source and -destination addresses, source and destination UDP ports, and the Destination -Connection ID field. The flow label generation MUST be designed to minimize the -chances of linkability with a previously used flow label, as this would enable -correlating activity on multiple paths; see Section 9.5.¶
-A possible implementation is to compute the flow label as a cryptographic hash -function of the source and destination addresses, source and destination -UDP ports, Destination Connection ID field, and a local secret.¶
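One way to realize that suggestion is sketched below. The function name and the choice of BLAKE2s as the keyed hash are assumptions for illustration; any keyed pseudo-random function over the same inputs would satisfy the requirement, with the output truncated to the 20-bit flow label field.

```python
import hashlib

def flow_label(secret, src_addr, dst_addr, src_port, dst_port, dcid):
    """Derive a 20-bit IPv6 flow label from the address pair, UDP ports,
    Destination Connection ID, and a local secret, so that labels used on
    different paths are not linkable by an observer."""
    h = hashlib.blake2s(
        src_addr + dst_addr
        + src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big")
        + dcid,
        key=secret,              # local secret keys the hash
        digest_size=4)
    return int.from_bytes(h.digest(), "big") & 0xFFFFF   # low 20 bits
```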
-An established QUIC connection can be terminated in one of three ways:¶
-An endpoint MAY discard connection state if it does not have a validated path on -which it can send packets; see Section 8.2.¶
-If a max_idle_timeout is specified by either peer in its transport parameters -(Section 18.2), the connection is silently closed -and its state is discarded when it remains idle for longer than the minimum of -both peers' max_idle_timeout values.¶
-Each endpoint advertises a max_idle_timeout, but the effective value -at an endpoint is computed as the minimum of the two advertised values. By -announcing a max_idle_timeout, an endpoint commits to initiating an immediate -close (Section 10.2) if it abandons the connection prior to the effective -value.¶
-An endpoint restarts its idle timer when a packet from its peer is received and -processed successfully. An endpoint also restarts its idle timer when sending an -ack-eliciting packet if no other ack-eliciting packets have been sent since last -receiving and processing a packet. Restarting this timer when sending a packet -ensures that connections are not closed after new activity is initiated.¶
-To avoid excessively small idle timeout periods, endpoints MUST increase the -idle timeout period to be at least three times the current Probe Timeout (PTO). -This allows for multiple PTOs to expire, and therefore multiple probes to be -sent and lost, prior to idle timeout.¶
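The timeout rules in the preceding paragraphs can be combined into one small computation. This is a sketch; the function name is invented, values are in milliseconds, and a value of zero is treated here as the peer omitting the parameter (i.e., imposing no idle timeout).

```python
def effective_idle_timeout(local_ms, peer_ms, pto_ms):
    """Effective idle timeout: the minimum of both peers' advertised
    max_idle_timeout values (0 meaning no timeout is imposed), raised to
    at least three times the current Probe Timeout (PTO)."""
    advertised = [t for t in (local_ms, peer_ms) if t > 0]
    if not advertised:
        return None                      # neither peer imposes an idle timeout
    return max(min(advertised), 3 * pto_ms)
```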
-An endpoint that sends packets close to the effective timeout risks having -them be discarded at the peer, since the idle timeout period might have expired -at the peer before these packets arrive.¶
-An endpoint can send a PING or another ack-eliciting frame to test the -connection for liveness if the peer could time out soon, such as within a PTO; -see Section 6.2 of [QUIC-RECOVERY]. This is especially useful if any -available application data cannot be safely retried. Note that the application -determines what data is safe to retry.¶
-An endpoint might need to send ack-eliciting packets to avoid an idle timeout -if it is expecting response data, but does not have or is unable to send -application data.¶
-An implementation of QUIC might provide applications with an option to defer an -idle timeout. This facility could be used when the application wishes to avoid -losing state that has been associated with an open connection, but does not -expect to exchange application data for some time. With this option, an -endpoint could send a PING frame (Section 19.2) periodically, which will cause -the peer to restart its idle timeout period. Sending a packet containing a PING -frame also restarts this endpoint's own idle timeout if it is the first -ack-eliciting packet sent since a packet was last received and processed. -Sending a PING frame causes the peer to respond with an acknowledgment, which -also restarts the idle timeout for the endpoint.¶
-Application protocols that use QUIC SHOULD provide guidance on when deferring an -idle timeout is appropriate. Unnecessary sending of PING frames could have a -detrimental effect on performance.¶
-A connection will time out if no packets are sent or received for a period -longer than the time negotiated using the max_idle_timeout transport parameter; -see Section 10. However, state in middleboxes might time out earlier than -that. Though REQ-5 in [RFC4787] recommends a 2 minute timeout interval, -experience shows that sending packets every 30 seconds is necessary to prevent -the majority of middleboxes from losing state for UDP flows -[GATEWAY].¶
-An endpoint sends a CONNECTION_CLOSE frame (Section 19.19) to -terminate the connection immediately. A CONNECTION_CLOSE frame causes all -streams to immediately become closed; open streams can be assumed to be -implicitly reset.¶
-After sending a CONNECTION_CLOSE frame, an endpoint immediately enters the -closing state; see Section 10.2.1. After receiving a CONNECTION_CLOSE frame, -endpoints enter the draining state; see Section 10.2.2.¶
-Violations of the protocol lead to an immediate close.¶
-An immediate close can be used after an application protocol has arranged to -close a connection. This might be after the application protocol negotiates a -graceful shutdown. The application protocol can exchange messages that are -needed for both application endpoints to agree that the connection can be -closed, after which the application requests that QUIC close the connection. -When QUIC consequently closes the connection, a CONNECTION_CLOSE frame with an -application-supplied error code will be used to signal closure to the peer.¶
-The closing and draining connection states exist to ensure that connections -close cleanly and that delayed or reordered packets are properly discarded. -These states SHOULD persist for at least three times the current Probe Timeout -(PTO) interval as defined in [QUIC-RECOVERY].¶
-Disposing of connection state prior to exiting the closing or draining state -could result in an endpoint generating a stateless reset unnecessarily when it -receives a late-arriving packet. Endpoints that have some alternative means -to ensure that late-arriving packets do not induce a response, such as those -that are able to close the UDP socket, MAY end these states earlier to allow -for faster resource recovery. Servers that retain an open socket for accepting -new connections SHOULD NOT end the closing or draining states early.¶
-Once its closing or draining state ends, an endpoint SHOULD discard all -connection state. The endpoint MAY send a stateless reset in response to any -further incoming packets belonging to this connection.¶
-An endpoint enters the closing state after initiating an immediate close.¶
-In the closing state, an endpoint retains only enough information to generate a -packet containing a CONNECTION_CLOSE frame and to identify packets as belonging -to the connection. An endpoint in the closing state sends a packet containing a -CONNECTION_CLOSE frame in response to any incoming packet that it attributes to -the connection.¶
-An endpoint SHOULD limit the rate at which it generates packets in the closing -state. For instance, an endpoint could wait for a progressively increasing -number of received packets or amount of time before responding to received -packets.¶
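The "progressively increasing number of received packets" strategy might look like the following sketch. The class name and the doubling factor are assumptions; any monotonically growing threshold achieves the same goal of not answering every stray packet with a CONNECTION_CLOSE.

```python
class ClosingResponseLimiter:
    """In the closing state, respond with the stored CONNECTION_CLOSE
    packet only after a progressively increasing number of incoming
    packets, so a flood of packets does not elicit a flood of replies."""

    def __init__(self):
        self.received_since_reply = 0
        self.threshold = 1               # doubles after every response sent

    def on_packet_received(self):
        self.received_since_reply += 1
        if self.received_since_reply >= self.threshold:
            self.received_since_reply = 0
            self.threshold *= 2
            return True                  # send the CONNECTION_CLOSE packet now
        return False                     # stay silent for this packet
```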
-An endpoint's selected connection ID and the QUIC version are sufficient -information to identify packets for a closing connection; the endpoint MAY -discard all other connection state. An endpoint that is closing is not required -to process any received frame. An endpoint MAY retain packet protection keys for -incoming packets to allow it to read and process a CONNECTION_CLOSE frame.¶
-An endpoint MAY drop packet protection keys when entering the closing state and -send a packet containing a CONNECTION_CLOSE frame in response to any UDP -datagram that is received. However, an endpoint that discards packet protection -keys cannot identify and discard invalid packets. To avoid being used for an -amplification attack, such an endpoint MUST limit the cumulative size of the -packets it sends to three times the cumulative size of the packets that are -received and attributed to the connection. To minimize the state that an -endpoint maintains for a closing connection, endpoints MAY send the exact same -packet in response to any received packet.¶
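For an endpoint that has dropped its keys, the three-times limit reduces to simple byte accounting, sketched below with invented names. Unlike the per-RTT limit used for unvalidated addresses, this one is cumulative over the lifetime of the closing state.

```python
class ClosingAmplificationLimit:
    """After dropping packet protection keys, a closing endpoint cannot
    tell valid packets from junk, so it caps what it sends at three times
    the cumulative bytes received and attributed to the connection."""

    def __init__(self):
        self.bytes_received = 0
        self.bytes_sent = 0

    def on_datagram_received(self, size):
        self.bytes_received += size

    def may_send(self, size):
        # Enforce the 3x anti-amplification bound.
        return self.bytes_sent + size <= 3 * self.bytes_received

    def on_datagram_sent(self, size):
        self.bytes_sent += size
```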
-Allowing retransmission of a closing packet is an exception to the requirement -that a new packet number be used for each packet in Section 12.3. -Sending new packet numbers is primarily of advantage to loss recovery and -congestion control, which are not expected to be relevant for a closed -connection. Retransmitting the final packet requires less state.¶
-While in the closing state, an endpoint could receive packets from a new source -address, possibly indicating a connection migration; see Section 9. An -endpoint in the closing state MUST either discard packets received from an -unvalidated address or limit the cumulative size of packets it sends to an -unvalidated address to three times the size of packets it receives from that -address.¶
-An endpoint is not expected to handle key updates when it is closing (Section 6 -of [QUIC-TLS]). A key update might prevent the endpoint from moving from the -closing state to the draining state, as the endpoint will not be able to process -subsequently received packets, but it otherwise has no impact.¶
-The draining state is entered once an endpoint receives a CONNECTION_CLOSE -frame, which indicates that its peer is closing or draining. While otherwise -identical to the closing state, an endpoint in the draining state MUST NOT send -any packets. Retaining packet protection keys is unnecessary once a connection -is in the draining state.¶
-An endpoint that receives a CONNECTION_CLOSE frame MAY send a single packet -containing a CONNECTION_CLOSE frame before entering the draining state, using a -NO_ERROR code if appropriate. An endpoint MUST NOT send further packets. Doing -so could result in a constant exchange of CONNECTION_CLOSE frames until one of -the endpoints exits the closing state.¶
-An endpoint MAY enter the draining state from the closing state if it receives a -CONNECTION_CLOSE frame, which indicates that the peer is also closing or -draining. In this case, the draining state SHOULD end when the closing state -would have ended. In other words, the endpoint uses the same end time, but -ceases transmission of any packets on this connection.¶
-When sending CONNECTION_CLOSE, the goal is to ensure that the peer will process -the frame. Generally, this means sending the frame in a packet with the highest -level of packet protection to avoid the packet being discarded. After the -handshake is confirmed (see Section 4.1.2 of [QUIC-TLS]), an endpoint MUST -send any CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to -confirming the handshake, it is possible that more advanced packet protection -keys are not available to the peer, so another CONNECTION_CLOSE frame MAY be -sent in a packet that uses a lower packet protection level. More specifically:¶
-Sending a CONNECTION_CLOSE of type 0x1d in an Initial or Handshake packet could -expose application state or be used to alter application state. A -CONNECTION_CLOSE of type 0x1d MUST be replaced by a CONNECTION_CLOSE of type -0x1c when sending the frame in Initial or Handshake packets. Otherwise, -information about the application state might be revealed. Endpoints MUST clear -the value of the Reason Phrase field and SHOULD use the APPLICATION_ERROR code -when converting to a CONNECTION_CLOSE of type 0x1c.¶
-CONNECTION_CLOSE frames sent in multiple packet types can be coalesced into a -single UDP datagram; see Section 12.2.¶
-An endpoint can send a CONNECTION_CLOSE frame in an Initial packet. This might -be in response to unauthenticated information received in Initial or Handshake -packets. Such an immediate close might expose legitimate connections to a -denial of service. QUIC does not include defensive measures for on-path attacks -during the handshake; see Section 21.2. However, at the cost of reducing -feedback about errors for legitimate peers, some forms of denial of service can -be made more difficult for an attacker if endpoints discard illegal packets -rather than terminating a connection with CONNECTION_CLOSE. For this reason, -endpoints MAY discard packets rather than immediately close if errors are -detected in packets that lack authentication.¶
-An endpoint that has not established state, such as a server that detects an -error in an Initial packet, does not enter the closing state. An endpoint that -has no state for the connection does not enter a closing or draining period on -sending a CONNECTION_CLOSE frame.¶
-A stateless reset is provided as an option of last resort for an endpoint that -does not have access to the state of a connection. A crash or outage might -result in peers continuing to send data to an endpoint that is unable to -properly continue the connection. An endpoint MAY send a stateless reset in -response to receiving a packet that it cannot associate with an active -connection.¶
-A stateless reset is not appropriate for indicating errors in active -connections. An endpoint that wishes to communicate a fatal connection error -MUST use a CONNECTION_CLOSE frame if it is able.¶
-To support this process, an endpoint issues a stateless reset token, which is a -16-byte value that is hard to guess. If the peer subsequently receives a -stateless reset, which is a UDP datagram that ends in that stateless reset -token, the peer will immediately end the connection.¶
-A stateless reset token is specific to a connection ID. An endpoint issues a -stateless reset token by including the value in the Stateless Reset Token field -of a NEW_CONNECTION_ID frame. Servers can also issue a stateless_reset_token -transport parameter during the handshake that applies to the connection ID that -it selected during the handshake. These exchanges are protected by encryption, -so only the client and server know their value. Note that clients cannot use -the stateless_reset_token transport parameter because their transport -parameters do not have confidentiality protection.¶
-Tokens are invalidated when their associated connection ID is retired via a -RETIRE_CONNECTION_ID frame (Section 19.16).¶
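One common way to issue tokens that survive a loss of state is to derive each token from its connection ID with a keyed PRF under a long-lived local secret; the sketch below assumes that design (the function name and use of HMAC-SHA-256 are illustrative, and the only hard requirement from the text is that the 16-byte value be hard to guess).

```python
import hmac, hashlib

def stateless_reset_token(static_key, connection_id):
    """Derive the 16-byte Stateless Reset Token for a connection ID as a
    keyed PRF of that ID, so an endpoint that has lost connection state
    can still regenerate the token from the static key alone."""
    return hmac.new(static_key, connection_id, hashlib.sha256).digest()[:16]
```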
-An endpoint that receives packets that it cannot process sends a packet in the -following layout (see Section 1.3):¶
-This design ensures that a stateless reset packet is, to the extent possible, -indistinguishable from a regular packet with a short header.¶
-A stateless reset uses an entire UDP datagram, starting with the first two bits -of the packet header. The remainder of the first byte and an arbitrary number -of bytes following it are set to values that SHOULD be indistinguishable -from random. The last 16 bytes of the datagram contain a Stateless Reset Token.¶
-To entities other than its intended recipient, a stateless reset will appear to -be a packet with a short header. For the stateless reset to appear as a valid -QUIC packet, the Unpredictable Bits field needs to include at least 38 bits of -data (or 5 bytes, less the two fixed bits).¶
-The resulting minimum size of 21 bytes does not guarantee that a stateless reset -is difficult to distinguish from other packets if the recipient requires the use -of a connection ID. To achieve that end, the endpoint SHOULD ensure that all -packets it sends are at least 22 bytes longer than the minimum connection ID -length that it requests the peer to include in its packets, adding PADDING -frames as necessary. This ensures that any stateless reset sent by the peer -is indistinguishable from a valid packet sent to the endpoint. An endpoint that -sends a stateless reset in response to a packet that is 43 bytes or shorter -SHOULD send a stateless reset that is one byte shorter than the packet it -responds to.¶
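Constructing a stateless reset under these size rules can be sketched as follows. The function name and the fixed fallback length of 43 bytes are assumptions; a real sender must additionally keep the reset under three times the size of what it has received, per the anti-amplification limit.

```python
import os

def build_stateless_reset(token, triggering_size, pad_to=43):
    """Build a stateless reset datagram: unpredictable bytes that look like
    a short-header packet, with the 16-byte Stateless Reset Token last.
    At least 21 bytes; one byte shorter than the triggering packet when
    that packet is 43 bytes or less, so resets cannot loop indefinitely."""
    assert len(token) == 16
    if triggering_size <= 43:
        size = max(21, triggering_size - 1)   # one byte shorter, floor of 21
    else:
        size = pad_to                         # any length >= 21 works here
    unpredictable = bytearray(os.urandom(size - 16))
    # First two bits: header form 0 (short header) and fixed bit 1.
    unpredictable[0] = (unpredictable[0] & 0x3F) | 0x40
    return bytes(unpredictable) + token
```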
-These values assume that the Stateless Reset Token is the same length as the -minimum expansion of the packet protection AEAD. Additional unpredictable bytes -are necessary if the endpoint could have negotiated a packet protection scheme -with a larger minimum expansion.¶
-An endpoint MUST NOT send a stateless reset that is three times or more larger -than the packet it receives to avoid being used for amplification. -Section 10.3.3 describes additional limits on stateless reset size.¶
-Endpoints MUST discard packets that are too small to be valid QUIC packets. To -give an example, with the set of AEAD functions defined in [QUIC-TLS], short -header packets that are smaller than 21 bytes are never valid.¶
-Endpoints MUST send stateless reset packets formatted as a packet with a short -header. However, endpoints MUST treat any packet ending in a valid stateless -reset token as a stateless reset, as other QUIC versions might allow the use of -a long header.¶
-An endpoint MAY send a stateless reset in response to a packet with a long -header. Sending a stateless reset is not effective prior to the stateless reset -token being available to a peer. In this QUIC version, packets with a long -header are only used during connection establishment. Because the stateless -reset token is not available until connection establishment is complete or near -completion, ignoring an unknown packet with a long header might be as effective -as sending a stateless reset.¶
-An endpoint cannot determine the Source Connection ID from a packet with a short -header, therefore it cannot set the Destination Connection ID in the stateless -reset packet. The Destination Connection ID will therefore differ from the -value used in previous packets. A random Destination Connection ID makes the -connection ID appear to be the result of moving to a new connection ID that was -provided using a NEW_CONNECTION_ID frame (Section 19.15).¶
-Using a randomized connection ID results in two problems:¶
-This stateless reset design is specific to QUIC version 1. An endpoint that -supports multiple versions of QUIC needs to generate a stateless reset that will -be accepted by peers that support any version that the endpoint might support -(or might have supported prior to losing state). Designers of new versions of -QUIC need to be aware of this and either reuse this design, or use a portion of -the packet other than the last 16 bytes for carrying data.¶
-An endpoint detects a potential stateless reset using the trailing 16 bytes of -the UDP datagram. An endpoint remembers all Stateless Reset Tokens associated -with the connection IDs and remote addresses for datagrams it has recently sent. -This includes Stateless Reset Tokens from NEW_CONNECTION_ID frames and the -server's transport parameters but excludes Stateless Reset Tokens associated -with connection IDs that are either unused or retired. The endpoint identifies -a received datagram as a stateless reset by comparing the last 16 bytes of the -datagram with all Stateless Reset Tokens associated with the remote address on -which the datagram was received.¶
-This comparison can be performed for every inbound datagram. Endpoints MAY skip -this check if any packet from a datagram is successfully processed. However, -the comparison MUST be performed when the first packet in an incoming datagram -either cannot be associated with a connection, or cannot be decrypted.¶
-An endpoint MUST NOT check for any Stateless Reset Tokens associated with -connection IDs it has not used or for connection IDs that have been retired.¶
-When comparing a datagram to Stateless Reset Token values, endpoints MUST -perform the comparison without leaking information about the value of the token. -For example, performing this comparison in constant time protects the value of -individual Stateless Reset Tokens from information leakage through timing side -channels. Another approach would be to store and compare the transformed values -of Stateless Reset Tokens instead of the raw token values, where the -transformation is defined as a cryptographically-secure pseudo-random function -using a secret key (e.g., block cipher, HMAC [RFC2104]). An endpoint is not -expected to protect information about whether a packet was successfully -decrypted, or the number of valid Stateless Reset Tokens.¶
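A timing-safe scan over candidate tokens might look like the following sketch (the helper name is hypothetical); Python's `hmac.compare_digest` performs a constant-time comparison:

```python
import hmac

MIN_SHORT_HEADER = 21  # smallest valid short-header packet

def is_stateless_reset(datagram: bytes, candidate_tokens: list[bytes]) -> bool:
    """Compare the trailing 16 bytes of a datagram against every candidate
    Stateless Reset Token without leaking token values through timing."""
    if len(datagram) < MIN_SHORT_HEADER:
        return False                    # too short to be a valid reset
    tail = datagram[-16:]
    found = False
    for token in candidate_tokens:
        # Every token is compared in constant time, so the loop's duration
        # does not depend on where (or whether) a match occurs.
        found |= hmac.compare_digest(tail, token)
    return found
```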
-If the last 16 bytes of the datagram are identical in value to a Stateless Reset -Token, the endpoint MUST enter the draining period and not send any further -packets on this connection.¶
-The stateless reset token MUST be difficult to guess. In order to create a -Stateless Reset Token, an endpoint could randomly generate ([RFC4086]) a -secret for every connection that it creates. However, this presents a -coordination problem when there are multiple instances in a cluster or a storage -problem for an endpoint that might lose state. Stateless reset specifically -exists to handle the case where state is lost, so this approach is suboptimal.¶
-A single static key can be used across all connections to the same endpoint by -generating the proof using a second iteration of a preimage-resistant function -that takes a static key and the connection ID chosen by the endpoint (see -Section 5.1) as input. An endpoint could use HMAC [RFC2104] (for -example, HMAC(static_key, connection_id)) or HKDF [RFC5869] (for example, -using the static key as input keying material, with the connection ID as salt). -The output of this function is truncated to 16 bytes to produce the Stateless -Reset Token for that connection.¶
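The HMAC option described above might be sketched like this (the helper name and key handling are illustrative, not part of the specification):

```python
import hashlib
import hmac

def stateless_reset_token(static_key: bytes, connection_id: bytes) -> bytes:
    """Derive a 16-byte Stateless Reset Token by truncating
    HMAC(static_key, connection_id), as described above."""
    return hmac.new(static_key, connection_id, hashlib.sha256).digest()[:16]
```

Because the token is a pure function of the static key and the connection ID, an instance that has lost all per-connection state can recompute it from the Destination Connection ID of an incoming packet.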
-An endpoint that loses state can use the same method to generate a valid -Stateless Reset Token. The connection ID comes from the packet that the -endpoint receives.¶
-This design relies on the peer always sending a connection ID in its packets so -that the endpoint can use the connection ID from a packet to reset the -connection. An endpoint that uses this design MUST either use the same -connection ID length for all connections or encode the length of the connection -ID such that it can be recovered without state. In addition, it cannot provide -a zero-length connection ID.¶
-Revealing the Stateless Reset Token allows any entity to terminate the -connection, so a value can only be used once. This method for choosing the -Stateless Reset Token means that the combination of connection ID and static key -MUST NOT be used for another connection. A denial of service attack is possible -if the same connection ID is used by instances that share a static key, or if an -attacker can cause a packet to be routed to an instance that has no state but -the same static key; see Section 21.11. A connection ID from a connection -that is reset by revealing the Stateless Reset Token MUST NOT be reused for new -connections at nodes that share a static key.¶
-The same Stateless Reset Token MUST NOT be used for multiple connection IDs. -Endpoints are not required to compare new values against all previous values, -but a duplicate value MAY be treated as a connection error of type -PROTOCOL_VIOLATION.¶
-Note that Stateless Reset packets do not have any cryptographic protection.¶
-The design of a Stateless Reset is such that without knowing the stateless reset -token it is indistinguishable from a valid packet. For instance, if a server -sends a Stateless Reset to another server it might receive another Stateless -Reset in response, which could lead to an infinite exchange.¶
-An endpoint MUST ensure that every Stateless Reset that it sends is smaller than -the packet that triggered it, unless it maintains state sufficient to prevent -looping. In the event of a loop, this results in packets eventually being too -small to trigger a response.¶
-An endpoint can remember the number of Stateless Reset packets that it has sent -and stop generating new Stateless Reset packets once a limit is reached. Using -separate limits for different remote addresses will ensure that Stateless Reset -packets can be used to close connections when other peers or connections have -exhausted limits.¶
-Reducing the size of a Stateless Reset below 41 bytes means that the packet -could reveal to an observer that it is a Stateless Reset, depending upon the -length of the peer's connection IDs. Conversely, refusing to send a Stateless -Reset in response to a small packet might result in Stateless Reset not being -useful in detecting cases of broken connections where only very small packets -are sent; such failures might only be detected by other means, such as timers.¶
-An endpoint that detects an error SHOULD signal the existence of that error to -its peer. Both transport-level and application-level errors can affect an -entire connection; see Section 11.1. Only application-level -errors can be isolated to a single stream; see Section 11.2.¶
-The most appropriate error code (Section 20) SHOULD be included in the -frame that signals the error. Where this specification identifies error -conditions, it also identifies the error code that is used; though these are -worded as requirements, different implementation strategies might lead to -different errors being reported. In particular, an endpoint MAY use any -applicable error code when it detects an error condition; a generic error code -(such as PROTOCOL_VIOLATION or INTERNAL_ERROR) can always be used in place of -specific error codes.¶
-A stateless reset (Section 10.3) is not suitable for any error that can -be signaled with a CONNECTION_CLOSE or RESET_STREAM frame. A stateless reset -MUST NOT be used by an endpoint that has the state necessary to send a frame on -the connection.¶
-Errors that result in the connection being unusable, such as an obvious -violation of protocol semantics or corruption of state that affects an entire -connection, MUST be signaled using a CONNECTION_CLOSE frame -(Section 19.19).¶
-Application-specific protocol errors are signaled using the CONNECTION_CLOSE -frame with a frame type of 0x1d. Errors that are specific to the transport, -including all those described in this document, are carried in the -CONNECTION_CLOSE frame with a frame type of 0x1c.¶
-A CONNECTION_CLOSE frame could be sent in a packet that is lost. An endpoint -SHOULD be prepared to retransmit a packet containing a CONNECTION_CLOSE frame if -it receives more packets on a terminated connection. Limiting the number of -retransmissions and the time over which this final packet is sent limits the -effort expended on terminated connections.¶
-An endpoint that chooses not to retransmit packets containing a CONNECTION_CLOSE -frame risks a peer missing the first such packet. The only mechanism available -to an endpoint that continues to receive data for a terminated connection is to -use the stateless reset process (Section 10.3).¶
-As the AEAD on Initial packets does not provide strong authentication, an -endpoint MAY discard an invalid Initial packet. Discarding an Initial packet is -permitted even where this specification otherwise mandates a connection error. -An endpoint can only discard a packet if it does not process the frames in the -packet or reverts the effects of any processing. Discarding invalid Initial -packets might be used to reduce exposure to denial of service; see -Section 21.2.¶
-If an application-level error affects a single stream, but otherwise leaves the -connection in a recoverable state, the endpoint can send a RESET_STREAM frame -(Section 19.4) with an appropriate error code to terminate just the -affected stream.¶
-Resetting a stream without the involvement of the application protocol could -cause the application protocol to enter an unrecoverable state. RESET_STREAM -MUST only be instigated by the application protocol that uses QUIC.¶
-The semantics of the application error code carried in RESET_STREAM are -defined by the application protocol. Only the application protocol is able to -cause a stream to be terminated. A local instance of the application protocol -uses a direct API call and a remote instance uses the STOP_SENDING frame, which -triggers an automatic RESET_STREAM.¶
-Application protocols SHOULD define rules for handling streams that are -prematurely cancelled by either endpoint.¶
-QUIC endpoints communicate by exchanging packets. Packets have confidentiality -and integrity protection; see Section 12.1. Packets are carried in UDP -datagrams; see Section 12.2.¶
-This version of QUIC uses the long packet header during connection -establishment; see Section 17.2. Packets with the long header are Initial -(Section 17.2.2), 0-RTT (Section 17.2.3), Handshake (Section 17.2.4), -and Retry (Section 17.2.5). Version negotiation uses a version-independent -packet with a long header; see Section 17.2.1.¶
-Packets with the short header are designed for minimal overhead and are used -after a connection is established and 1-RTT keys are available; see -Section 17.3.¶
-QUIC packets have different levels of cryptographic protection based on the -type of packet. Details of packet protection are found in [QUIC-TLS]; this -section includes an overview of the protections that are provided.¶
-Version Negotiation packets have no cryptographic protection; see -[QUIC-INVARIANTS].¶
-Retry packets use an authenticated encryption with associated data function -(AEAD; [AEAD]) to protect against accidental modification.¶
-Initial packets use an AEAD, the keys for which are derived using a value that -is visible on the wire. Initial packets therefore do not have effective -confidentiality protection. Initial protection exists to ensure that the sender -of the packet is on the network path. Any entity that receives an Initial packet -from a client can recover the keys that will allow them to both read the -contents of the packet and generate Initial packets that will be successfully -authenticated at either endpoint. The AEAD also protects Initial packets -against accidental modification.¶
-All other packets are protected with keys derived from the cryptographic -handshake. The cryptographic handshake ensures that only the communicating -endpoints receive the corresponding keys for Handshake, 0-RTT, and 1-RTT -packets. Packets protected with 0-RTT and 1-RTT keys have strong -confidentiality and integrity protection.¶
-The Packet Number field that appears in some packet types has alternative -confidentiality protection that is applied as part of header protection; see -Section 5.4 of [QUIC-TLS] for details. The underlying packet number increases -with each packet sent in a given packet number space; see Section 12.3 for -details.¶
-Initial (Section 17.2.2), 0-RTT (Section 17.2.3), and Handshake -(Section 17.2.4) packets contain a Length field that determines the end -of the packet. The length includes both the Packet Number and Payload -fields, both of which are confidentiality protected and initially of unknown -length. The length of the Payload field is learned once header protection is -removed.¶
-Using the Length field, a sender can coalesce multiple QUIC packets into one UDP -datagram. This can reduce the number of UDP datagrams needed to complete the -cryptographic handshake and start sending data. This can also be used to -construct PMTU probes; see Section 14.4.1. Receivers MUST be able to -process coalesced packets.¶
-Coalescing packets in order of increasing encryption levels (Initial, 0-RTT, -Handshake, 1-RTT; see Section 4.1.4 of [QUIC-TLS]) makes it more likely the -receiver will be able to process all the packets in a single pass. A packet -with a short header does not include a length, so it can only be the last -packet included in a UDP datagram. An endpoint SHOULD include multiple frames -in a single packet if they are to be sent at the same encryption level, instead -of coalescing multiple packets at the same encryption level.¶
-Receivers MAY route based on the information in the first packet contained in a -UDP datagram. Senders MUST NOT coalesce QUIC packets with different connection -IDs into a single UDP datagram. Receivers SHOULD ignore any subsequent packets -with a different Destination Connection ID than the first packet in the -datagram.¶
-Every QUIC packet that is coalesced into a single UDP datagram is separate and -complete. The receiver of coalesced QUIC packets MUST individually process each -QUIC packet and separately acknowledge them, as if they were received as the -payload of different UDP datagrams. For example, if decryption fails (because -the keys are not available or any other reason), the receiver MAY either discard -or buffer the packet for later processing and MUST attempt to process the -remaining packets.¶
-Retry packets (Section 17.2.5), Version Negotiation packets -(Section 17.2.1), and packets with a short header (Section 17.3) do not -contain a Length field and so cannot be followed by other packets in the same -UDP datagram. Note also that there is no situation where a Retry or Version -Negotiation packet is coalesced with another packet.¶
-The packet number is an integer in the range 0 to 2^62-1. This number is used -in determining the cryptographic nonce for packet protection. Each endpoint -maintains a separate packet number for sending and receiving.¶
-Packet numbers are limited to this range because they need to be representable -in whole in the Largest Acknowledged field of an ACK frame (Section 19.3). -When present in a long or short header however, packet numbers are reduced and -encoded in 1 to 4 bytes; see Section 17.1.¶
-Version Negotiation (Section 17.2.1) and Retry (Section 17.2.5) packets -do not include a packet number.¶
-Packet numbers are divided into three spaces in QUIC:¶
-Initial space: All Initial packets (Section 17.2.2) are in this space.¶
-Handshake space: All Handshake packets (Section 17.2.4) are in this space.¶
-Application data space: All 0-RTT (Section 17.2.3) and 1-RTT (Section 17.3) packets are in this space.¶
-As described in [QUIC-TLS], each packet type uses different protection keys.¶
-Conceptually, a packet number space is the context in which a packet can be -processed and acknowledged. Initial packets can only be sent with Initial -packet protection keys and acknowledged in packets that are also Initial -packets. Similarly, Handshake packets are sent at the Handshake encryption -level and can only be acknowledged in Handshake packets.¶
-This enforces cryptographic separation between the data sent in the different -packet number spaces. Packet numbers in each space start at packet number 0. -Subsequent packets sent in the same packet number space MUST increase the packet -number by at least one.¶
-0-RTT and 1-RTT data exist in the same packet number space to make loss recovery -algorithms easier to implement between the two packet types.¶
-A QUIC endpoint MUST NOT reuse a packet number within the same packet number -space in one connection. If the packet number for sending reaches 2^62 - 1, the -sender MUST close the connection without sending a CONNECTION_CLOSE frame or any -further packets; an endpoint MAY send a Stateless Reset (Section 10.3) in -response to further packets that it receives.¶
-A receiver MUST discard a newly unprotected packet unless it is certain that it -has not processed another packet with the same packet number from the same -packet number space. Duplicate suppression MUST happen after removing packet -protection for the reasons described in Section 9.5 of [QUIC-TLS].¶
-Endpoints that track all individual packets for the purposes of detecting -duplicates are at risk of accumulating excessive state. The data required for -detecting duplicates can be limited by maintaining a minimum packet number below -which all packets are immediately dropped. Any minimum needs to account for -large variations in round trip time, which includes the possibility that a peer -might probe network paths with much larger round trip times; see Section 9.¶
-Packet number encoding at a sender and decoding at a receiver are described in -Section 17.1.¶
-The payload of QUIC packets, after removing packet protection, consists of a -sequence of complete frames, as shown in Figure 11. Version -Negotiation, Stateless Reset, and Retry packets do not contain frames.¶
-The payload of a packet that contains frames MUST contain at least one frame, -and MAY contain multiple frames and multiple frame types. An endpoint MUST -treat receipt of a packet containing no frames as a connection error of type -PROTOCOL_VIOLATION. Frames always fit within a single QUIC packet and cannot -span multiple packets.¶
-Each frame begins with a Frame Type, indicating its type, followed by -additional type-dependent fields:¶
-Table 3 lists and summarizes information about each frame type that is -defined in this specification. A description of this summary is included after -the table.¶
| Type Value  | Frame Type Name      | Definition    | Pkts | Spec |
|-------------|----------------------|---------------|------|------|
| 0x00        | PADDING              | Section 19.1  | IH01 | NP   |
| 0x01        | PING                 | Section 19.2  | IH01 |      |
| 0x02 - 0x03 | ACK                  | Section 19.3  | IH_1 | NC   |
| 0x04        | RESET_STREAM         | Section 19.4  | __01 |      |
| 0x05        | STOP_SENDING         | Section 19.5  | __01 |      |
| 0x06        | CRYPTO               | Section 19.6  | IH_1 |      |
| 0x07        | NEW_TOKEN            | Section 19.7  | ___1 |      |
| 0x08 - 0x0f | STREAM               | Section 19.8  | __01 | F    |
| 0x10        | MAX_DATA             | Section 19.9  | __01 |      |
| 0x11        | MAX_STREAM_DATA      | Section 19.10 | __01 |      |
| 0x12 - 0x13 | MAX_STREAMS          | Section 19.11 | __01 |      |
| 0x14        | DATA_BLOCKED         | Section 19.12 | __01 |      |
| 0x15        | STREAM_DATA_BLOCKED  | Section 19.13 | __01 |      |
| 0x16 - 0x17 | STREAMS_BLOCKED      | Section 19.14 | __01 |      |
| 0x18        | NEW_CONNECTION_ID    | Section 19.15 | __01 | P    |
| 0x19        | RETIRE_CONNECTION_ID | Section 19.16 | __01 |      |
| 0x1a        | PATH_CHALLENGE       | Section 19.17 | __01 | P    |
| 0x1b        | PATH_RESPONSE        | Section 19.18 | __01 | P    |
| 0x1c - 0x1d | CONNECTION_CLOSE     | Section 19.19 | ih01 | N    |
| 0x1e        | HANDSHAKE_DONE       | Section 19.20 | ___1 |      |
The format and semantics of each frame type are explained in more detail in -Section 19. The remainder of this section provides a summary of -important and general information.¶
-The Frame Type in ACK, STREAM, MAX_STREAMS, STREAMS_BLOCKED, and -CONNECTION_CLOSE frames is used to carry other frame-specific flags. For all -other frames, the Frame Type field simply identifies the frame.¶
-The "Pkts" column in Table 3 lists the types of packets that each frame -type could appear in, indicated by the following characters:¶
-I: Initial (Section 17.2.2)¶
-H: Handshake (Section 17.2.4)¶
-0: 0-RTT (Section 17.2.3)¶
-1: 1-RTT (Section 17.3.1)¶
-Only a CONNECTION_CLOSE frame of type 0x1c can appear in Initial or Handshake -packets.¶
-For more detail about these restrictions, see Section 12.5. Note -that all frames can appear in 1-RTT packets. An endpoint MUST treat receipt of -a frame in a packet type that is not permitted as a connection error of type -PROTOCOL_VIOLATION.¶
-The "Spec" column in Table 3 summarizes any special rules governing the -processing or generation of the frame type, as indicated by the following -characters:¶
-N: Packets containing only frames with this marking are not ack-eliciting; see -Section 13.2.¶
-C: Packets containing only frames with this marking do not count toward bytes -in flight for congestion control purposes; see [QUIC-RECOVERY].¶
-P: Packets containing only frames with this marking can be used to probe new -network paths during connection migration; see Section 9.1.¶
-F: The contents of frames with this marking are flow controlled; see -Section 4.¶
-The "Pkts" and "Spec" columns in Table 3 do not form part of the IANA -registry; see Section 22.3.¶
-An endpoint MUST treat the receipt of a frame of unknown type as a connection -error of type FRAME_ENCODING_ERROR.¶
-All frames are idempotent in this version of QUIC. That is, a valid frame does -not cause undesirable side effects or errors when received more than once.¶
-The Frame Type field uses a variable-length integer encoding (see -Section 16) with one exception. To ensure simple and efficient -implementations of frame parsing, a frame type MUST use the shortest possible -encoding. For frame types defined in this document, this means a single-byte -encoding, even though it is possible to encode these values as a two-, four- -or eight-byte variable-length integer. For instance, though 0x4001 is -a legitimate two-byte encoding for a variable-length integer with a value -of 1, PING frames are always encoded as a single byte with the value 0x01. -This rule applies to all current and future QUIC frame types. An endpoint -MAY treat the receipt of a frame type that uses a longer encoding than -necessary as a connection error of type PROTOCOL_VIOLATION.¶
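The shortest-encoding check for frame types can be sketched as follows (helper names are hypothetical); a QUIC variable-length integer stores its length in the two most significant bits of the first byte:

```python
# Non-normative sketch of variable-length integer decoding and the
# "shortest possible encoding" rule for frame types.

def decode_varint(data: bytes) -> tuple[int, int]:
    """Decode a QUIC variable-length integer; return (value, length)."""
    length = 1 << (data[0] >> 6)        # 2-bit prefix: 1, 2, 4, or 8 bytes
    value = data[0] & 0x3F
    for b in data[1:length]:
        value = (value << 8) | b
    return value, length

def varint_min_len(value: int) -> int:
    """Shortest legal encoding length for a given value."""
    if value < 0x40:
        return 1
    if value < 0x4000:
        return 2
    if value < 0x40000000:
        return 4
    return 8

def frame_type_is_valid(data: bytes) -> bool:
    """A frame type MUST use the shortest possible varint encoding."""
    value, length = decode_varint(data)
    return length == varint_min_len(value)
```

Decoding the two-byte sequence 0x40 0x01 yields the value 1, but a frame type of 1 (PING) encoded that way fails the shortest-encoding check.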
-Some frames are prohibited in different packet number spaces. The rules here -generalize those of TLS, in that frames associated with establishing the -connection can usually appear in packets in any packet number space, whereas -those associated with transferring data can only appear in the application -data packet number space:¶
-Note that it is not possible to send the following frames in 0-RTT packets for -various reasons: ACK, CRYPTO, HANDSHAKE_DONE, NEW_TOKEN, PATH_RESPONSE, and -RETIRE_CONNECTION_ID. A server MAY treat receipt of these frames in 0-RTT -packets as a connection error of type PROTOCOL_VIOLATION.¶
-A sender sends one or more frames in a QUIC packet; see Section 12.4.¶
-A sender can minimize per-packet bandwidth and computational costs by including -as many frames as possible in each QUIC packet. A sender MAY wait for a short -period of time to collect multiple frames before sending a packet that is not -maximally packed, to avoid sending out large numbers of small packets. An -implementation MAY use knowledge about application sending behavior or -heuristics to determine whether and for how long to wait. This waiting period -is an implementation decision, and an implementation should be careful to delay -conservatively, since any delay is likely to increase application-visible -latency.¶
-Stream multiplexing is achieved by interleaving STREAM frames from multiple -streams into one or more QUIC packets. A single QUIC packet can include -multiple STREAM frames from one or more streams.¶
-One of the benefits of QUIC is avoidance of head-of-line blocking across -multiple streams. When a packet loss occurs, only streams with data in that -packet are blocked waiting for a retransmission to be received, while other -streams can continue making progress. Note that when data from multiple streams -is included in a single QUIC packet, loss of that packet blocks all those -streams from making progress. Implementations are advised to include as few -streams as necessary in outgoing packets without losing transmission efficiency -to underfilled packets.¶
-A packet MUST NOT be acknowledged until packet protection has been successfully -removed and all frames contained in the packet have been processed. For STREAM -frames, this means the data has been enqueued in preparation to be received by -the application protocol, but it does not require that data is delivered and -consumed.¶
-Once the packet has been fully processed, a receiver acknowledges receipt by -sending one or more ACK frames containing the packet number of the received -packet.¶
-An endpoint SHOULD treat receipt of an acknowledgment for a packet it did not -send as a connection error of type PROTOCOL_VIOLATION, if it is able to detect -the condition.¶
-Endpoints acknowledge all packets they receive and process. However, only -ack-eliciting packets cause an ACK frame to be sent within the maximum ack -delay. Packets that are not ack-eliciting are only acknowledged when an ACK -frame is sent for other reasons.¶
-When sending a packet for any reason, an endpoint SHOULD attempt to include an -ACK frame if one has not been sent recently. Doing so helps with timely loss -detection at the peer.¶
-In general, frequent feedback from a receiver improves loss and congestion -response, but this has to be balanced against excessive load generated by a -receiver that sends an ACK frame in response to every ack-eliciting packet. The -guidance offered below seeks to strike this balance.¶
-Every packet SHOULD be acknowledged at least once, and ack-eliciting packets -MUST be acknowledged at least once within the maximum delay an endpoint -communicated using the max_ack_delay transport parameter; see -Section 18.2. max_ack_delay declares an explicit -contract: an endpoint promises to never intentionally delay acknowledgments of -an ack-eliciting packet by more than the indicated value. If it does, any excess -accrues to the RTT estimate and could result in spurious or delayed -retransmissions from the peer. A sender uses the receiver's max_ack_delay value -in determining timeouts for timer-based retransmission, as detailed in Section -6.2 of [QUIC-RECOVERY].¶
-An endpoint MUST acknowledge all ack-eliciting Initial and Handshake packets -immediately and all ack-eliciting 0-RTT and 1-RTT packets within its advertised -max_ack_delay, with the following exception. Prior to handshake confirmation, an -endpoint might not have packet protection keys for decrypting Handshake, 0-RTT, -or 1-RTT packets when they are received. It might therefore buffer them and -acknowledge them when the requisite keys become available.¶
-Since packets containing only ACK frames are not congestion controlled, an -endpoint MUST NOT send more than one such packet in response to receiving an -ack-eliciting packet.¶
-An endpoint MUST NOT send a non-ack-eliciting packet in response to a -non-ack-eliciting packet, even if there are packet gaps that precede the -received packet. This avoids an infinite feedback loop of acknowledgements, -which could prevent the connection from ever becoming idle. Non-ack-eliciting -packets are eventually acknowledged when the endpoint sends an ACK frame in -response to other events.¶
-In order to assist loss detection at the sender, an endpoint SHOULD generate -and send an ACK frame without delay when it receives an ack-eliciting packet -either:¶
-Similarly, packets marked with the ECN Congestion Experienced (CE) codepoint in -the IP header SHOULD be acknowledged immediately, to reduce the peer's response -time to congestion events.¶
-The algorithms in [QUIC-RECOVERY] are expected to be resilient to receivers -that do not follow the guidance offered above. However, an implementation -should only deviate from these requirements after careful consideration of the -performance implications of a change, for connections made by the endpoint and -for other users of the network.¶
-An endpoint that is only sending ACK frames will not receive acknowledgments -from its peer unless those acknowledgements are included in packets with -ack-eliciting frames. An endpoint SHOULD send an ACK frame with other frames -when there are new ack-eliciting packets to acknowledge. When only -non-ack-eliciting packets need to be acknowledged, an endpoint MAY wait until an -ack-eliciting packet has been received to include an ACK frame with outgoing -frames.¶
-A receiver MUST NOT send an ack-eliciting frame in all packets that would -otherwise be non-ack-eliciting, to avoid an infinite feedback loop of -acknowledgements.¶
-A receiver determines how frequently to send acknowledgements in response to -ack-eliciting packets. This determination involves a trade-off.¶
-Endpoints rely on timely acknowledgment to detect loss; see Section 6 of -[QUIC-RECOVERY]. Window-based congestion controllers, such as the one in -Section 7 of [QUIC-RECOVERY], rely on acknowledgments to manage their -congestion window. In both cases, delaying acknowledgments can adversely affect -performance.¶
-On the other hand, reducing the frequency of packets that carry only -acknowledgements reduces packet transmission and processing cost at both -endpoints. It can improve connection throughput on severely asymmetric links -and reduce the volume of acknowledgment traffic using return path capacity; -see Section 3 of [RFC3449].¶
-A receiver SHOULD send an ACK frame after receiving at least two ack-eliciting -packets. This recommendation is general in nature and consistent with -recommendations for TCP endpoint behavior [RFC5681]. Knowledge of network -conditions, knowledge of the peer's congestion controller, or further research -and experimentation might suggest alternative acknowledgment strategies with -better performance characteristics.¶
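A toy sketch of the "ACK after two ack-eliciting packets" guidance follows; the class is hypothetical, and a real implementation would also honor the max_ack_delay timer, immediate acknowledgment of Initial and Handshake packets, out-of-order arrival, and ECN-CE marks:

```python
# Toy, non-normative sketch of delayed acknowledgment scheduling.

class AckScheduler:
    ACK_ELICITING_THRESHOLD = 2  # per the guidance above

    def __init__(self) -> None:
        self.unacked_eliciting = 0

    def on_packet(self, ack_eliciting: bool) -> bool:
        """Return True when an ACK frame should be sent immediately."""
        if not ack_eliciting:
            return False              # never respond to non-eliciting packets
        self.unacked_eliciting += 1
        if self.unacked_eliciting >= self.ACK_ELICITING_THRESHOLD:
            self.unacked_eliciting = 0
            return True
        return False                  # defer, bounded by max_ack_delay
```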
-A receiver MAY process multiple available packets before determining whether to -send an ACK frame in response.¶
-When an ACK frame is sent, one or more ranges of acknowledged packets are -included. Including acknowledgements for older packets reduces the chance of -spurious retransmissions caused by losing previously sent ACK frames, at the -cost of larger ACK frames.¶
-ACK frames SHOULD always acknowledge the most recently received packets, and the -more out-of-order the packets are, the more important it is to send an updated -ACK frame quickly, to prevent the peer from declaring a packet as lost and -spuriously retransmitting the frames it contains. An ACK frame is expected -to fit within a single QUIC packet. If it does not, then older ranges -(those with the smallest packet numbers) are omitted.¶
-A receiver limits the number of ACK Ranges (Section 19.3.1) it remembers and -sends in ACK frames, both to limit the size of ACK frames and to avoid resource -exhaustion. After receiving acknowledgments for an ACK frame, the receiver -SHOULD stop tracking those acknowledged ACK Ranges. Senders can expect -acknowledgements for most packets, but QUIC does not guarantee receipt of an -acknowledgment for every packet that the receiver processes.¶
-It is possible that retaining many ACK Ranges could cause an ACK frame to become -too large. A receiver can discard unacknowledged ACK Ranges to limit ACK frame -size, at the cost of increased retransmissions from the sender. This is -necessary if an ACK frame would be too large to fit in a packet. -Receivers MAY also limit ACK frame size further to preserve space for other -frames or to limit the capacity that acknowledgments consume.¶
-A receiver MUST retain an ACK Range unless it can ensure that it will not -subsequently accept packets with numbers in that range. Maintaining a minimum -packet number that increases as ranges are discarded is one way to achieve this -with minimal state.¶
-Receivers can discard all ACK Ranges, but they MUST retain the largest packet -number that has been successfully processed as that is used to recover packet -numbers from subsequent packets; see Section 17.1.¶
-A receiver SHOULD include an ACK Range containing the largest received packet -number in every ACK frame. The Largest Acknowledged field is used in ECN -validation at a sender and including a lower value than what was included in a -previous ACK frame could cause ECN to be unnecessarily disabled; see -Section 13.4.2.¶
-Section 13.2.4 describes an exemplary approach for determining what packets -to acknowledge in each ACK frame. Though the goal of this algorithm is to -generate an acknowledgment for every packet that is processed, it is still -possible for acknowledgments to be lost.¶
-When a packet containing an ACK frame is sent, the largest acknowledged in that -frame can be saved. When a packet containing an ACK frame is acknowledged, the -receiver can stop acknowledging packets less than or equal to the largest -acknowledged in the sent ACK frame.¶
-A receiver that sends only non-ack-eliciting packets, such as ACK frames, might -not receive an acknowledgement for a long period of time. This could cause the -receiver to maintain state for a large number of ACK frames for a long period of -time, and ACK frames it sends could be unnecessarily large. In such a case, a -receiver could send a PING or other small ack-eliciting frame occasionally, -such as once per round trip, to elicit an ACK from the peer.¶
-In cases without ACK frame loss, this algorithm allows for a minimum of 1 RTT of -reordering. In cases with ACK frame loss and reordering, this approach does not -guarantee that every acknowledgement is seen by the sender before it is no -longer included in the ACK frame. Packets could be received out of order and all -subsequent ACK frames containing them could be lost. In this case, the loss -recovery algorithm could cause spurious retransmissions, but the sender will -continue making forward progress.¶
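The bookkeeping described above, saving the largest acknowledged packet number for each ACK frame sent and pruning ranges once that ACK frame is itself acknowledged, can be sketched as follows. This is an illustration only; the class and method names are not taken from the specification.

```python
class AckTracker:
    """Sketch of receiver-side ACK Range pruning (illustrative names)."""

    def __init__(self):
        self.ranges = []     # list of (lo, hi) received packet-number ranges
        self.sent_acks = {}  # packet number of a sent ACK -> largest acked in it

    def on_ack_sent(self, pkt_num, largest_acked):
        # Remember the largest acknowledged value carried in this ACK frame.
        self.sent_acks[pkt_num] = largest_acked

    def on_ack_of_ack(self, pkt_num):
        # The peer acknowledged the packet that carried our ACK frame;
        # ranges at or below its largest acknowledged no longer need reporting.
        largest = self.sent_acks.pop(pkt_num, None)
        if largest is not None:
            self.prune(largest)

    def prune(self, up_to):
        # Drop ranges entirely at or below up_to; trim one that straddles it.
        # A real implementation separately retains the largest packet number
        # processed, which MUST be kept even if all ranges are discarded.
        kept = []
        for lo, hi in self.ranges:
            if hi <= up_to:
                continue
            kept.append((max(lo, up_to + 1), hi))
        self.ranges = kept
```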
-An endpoint measures the delays intentionally introduced between the time the -packet with the largest packet number is received and the time an acknowledgment -is sent. The endpoint encodes this acknowledgement delay in the ACK Delay field -of an ACK frame; see Section 19.3. This allows the receiver of the ACK frame -to adjust for any intentional delays, which is important for getting a better -estimate of the path RTT when acknowledgments are delayed.¶
-A packet might be held in the OS kernel or elsewhere on the host before being -processed. An endpoint MUST NOT include delays that it does not control when -populating the ACK Delay field in an ACK frame. However, endpoints SHOULD -include buffering delays caused by unavailability of decryption keys, since -these delays can be large and are likely to be non-repeating.¶
-When the measured acknowledgement delay is larger than its max_ack_delay, an -endpoint SHOULD report the measured delay. This information is especially useful -during the handshake when delays might be large; see -Section 13.2.1.¶
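Section 19.3 defines the ACK Delay field as the delay in microseconds scaled down by two to the power of the ack_delay_exponent transport parameter, whose default is 3 (Section 18.2). A minimal sketch of that scaling, with illustrative function names:

```python
DEFAULT_ACK_DELAY_EXPONENT = 3  # transport parameter default (Section 18.2)

def encode_ack_delay(delay_us: int,
                     exponent: int = DEFAULT_ACK_DELAY_EXPONENT) -> int:
    # The ACK Delay field carries microseconds divided by 2^exponent.
    return delay_us >> exponent

def decode_ack_delay(field: int,
                     exponent: int = DEFAULT_ACK_DELAY_EXPONENT) -> int:
    # The receiver of the ACK frame multiplies by 2^exponent to recover
    # the (truncated) intentional delay in microseconds.
    return field << exponent
```

With the default exponent, a 1 ms intentional delay is encoded as 125.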
-ACK frames MUST only be carried in a packet that has the same packet number -space as the packet being acknowledged; see Section 12.1. For instance, -packets that are protected with 1-RTT keys MUST be acknowledged in packets that -are also protected with 1-RTT keys.¶
-Packets that a client sends with 0-RTT packet protection MUST be acknowledged by -the server in packets protected by 1-RTT keys. This can mean that the client is -unable to use these acknowledgments if the server cryptographic handshake -messages are delayed or lost. Note that the same limitation applies to other -data sent by the server protected by the 1-RTT keys.¶
-Packets containing PADDING frames are considered to be in flight for congestion -control purposes [QUIC-RECOVERY]. Packets containing only PADDING frames -therefore consume congestion window but do not generate acknowledgments that -will open the congestion window. To avoid a deadlock, a sender SHOULD ensure -that other frames are sent periodically in addition to PADDING frames to elicit -acknowledgments from the receiver.¶
-QUIC packets that are determined to be lost are not retransmitted whole. The -same applies to the frames that are contained within lost packets. Instead, the -information that might be carried in frames is sent again in new frames as -needed.¶
-New frames and packets are used to carry information that is determined to have -been lost. In general, information is sent again when a packet containing that -information is determined to be lost and sending ceases when a packet -containing that information is acknowledged.¶
-Endpoints SHOULD prioritize retransmission of data over sending new data, unless -priorities specified by the application indicate otherwise; see -Section 2.3.¶
-Even though a sender is encouraged to assemble frames containing up-to-date -information every time it sends a packet, it is not forbidden to retransmit -copies of frames from lost packets. A sender that retransmits copies of frames -needs to handle decreases in available payload size due to change in packet -number length, connection ID length, and path MTU. A receiver MUST accept -packets containing an outdated frame, such as a MAX_DATA frame carrying a -smaller maximum data than one found in an older packet.¶
-A sender SHOULD avoid retransmitting information from packets once they are -acknowledged. This includes packets that are acknowledged after being declared -lost, which can happen in the presence of network reordering. Doing so requires -senders to retain information about packets after they are declared lost. A -sender can discard this information after a period of time elapses that -adequately allows for reordering, such as a PTO (Section 6.2 of -[QUIC-RECOVERY]), or on other events, such as reaching a memory limit.¶
-Upon detecting losses, a sender MUST take appropriate congestion control action. -The details of loss detection and congestion control are described in -[QUIC-RECOVERY].¶
-QUIC endpoints can use Explicit Congestion Notification (ECN) [RFC3168] to -detect and respond to network congestion. ECN allows an endpoint to set an ECT -codepoint in the ECN field of an IP packet. A network node can then indicate -congestion by setting the CE codepoint in the ECN field instead of dropping the -packet [RFC8087]. Endpoints react to reported congestion by reducing their -sending rate in response, as described in [QUIC-RECOVERY].¶
-To enable ECN, a sending QUIC endpoint first determines whether a path supports -ECN marking and whether the peer reports the ECN values in received IP headers; -see Section 13.4.2.¶
-Use of ECN requires the receiving endpoint to read the ECN field from an IP -packet, which is not possible on all platforms. If an endpoint does not -implement ECN support or does not have access to received ECN fields, it -does not report ECN counts for packets it receives.¶
-Even if an endpoint does not set an ECT field on packets it sends, the endpoint -MUST provide feedback about ECN markings it receives, if these are accessible. -Failing to report the ECN counts will cause the sender to disable use of ECN -for this connection.¶
-On receiving an IP packet with an ECT(0), ECT(1) or CE codepoint, an -ECN-enabled endpoint accesses the ECN field and increases the corresponding -ECT(0), ECT(1), or CE count. These ECN counts are included in subsequent ACK -frames; see Section 13.2 and Section 19.3.¶
-Each packet number space maintains separate acknowledgement state and separate -ECN counts. Coalesced QUIC packets (see Section 12.2) share the same IP -header so the ECN counts are incremented once for each coalesced QUIC packet.¶
-For example, if one each of an Initial, Handshake, and 1-RTT QUIC packet are -coalesced into a single UDP datagram, the ECN counts for all three packet number -spaces will be incremented by one each, based on the ECN field of the single IP -header.¶
-ECN counts are only incremented when QUIC packets from the received IP -packet are processed. As such, duplicate QUIC packets are not processed and -do not increase ECN counts; see Section 21.10 for relevant security -concerns.¶
-It is possible for faulty network devices to corrupt or erroneously drop -packets that carry a non-zero ECN codepoint. To ensure connectivity in the -presence of such devices, an endpoint validates the ECN counts for each network -path and disables use of ECN on that path if errors are detected.¶
-To perform ECN validation for a new path:¶
-If an endpoint has cause to expect that IP packets with an ECT codepoint might -be dropped by a faulty network element, the endpoint could set an ECT codepoint -for only the first ten outgoing packets on a path, or for a period of three -PTOs (see Section 6.2 of [QUIC-RECOVERY]). If all packets marked with non-zero -ECN codepoints are subsequently lost, it can disable marking on the assumption -that the marking caused the loss.¶
-An endpoint thus attempts to use ECN and validates this for each new connection, -when switching to a server's preferred address, and on active connection -migration to a new path. Appendix A.4 describes one possible algorithm.¶
-Other methods of probing paths for ECN support are possible, as are different -marking strategies. Implementations MAY use other methods defined in RFCs; see -[RFC8311]. Implementations that use the ECT(1) codepoint need to -perform ECN validation using the reported ECT(1) counts.¶
-Erroneous application of CE markings by the network can result in degraded -connection performance. An endpoint that receives an ACK frame with ECN counts -therefore validates the counts before using them. It performs this validation by -comparing newly received counts against those from the last successfully -processed ACK frame. Any increase in the ECN counts is validated based on the -ECN markings that were applied to packets that are newly acknowledged in the ACK -frame.¶
-If an ACK frame newly acknowledges a packet that the endpoint sent with either -the ECT(0) or ECT(1) codepoint set, ECN validation fails if the corresponding -ECN counts are not present in the ACK frame. This check detects a network -element that zeroes the ECN field or a peer that does not report ECN markings.¶
-ECN validation also fails if the sum of the increase in ECT(0) and ECN-CE counts -is less than the number of newly acknowledged packets that were originally sent -with an ECT(0) marking. Similarly, ECN validation fails if the sum of the -increases to ECT(1) and ECN-CE counts is less than the number of newly -acknowledged packets sent with an ECT(1) marking. These checks can detect -remarking of ECN-CE markings by the network.¶
-An endpoint could miss acknowledgements for a packet when ACK frames are lost. -It is therefore possible for the total increase in ECT(0), ECT(1), and ECN-CE -counts to be greater than the number of packets that are newly acknowledged by -an ACK frame. This is why ECN counts are permitted to be larger than the total -number of packets that are acknowledged.¶
-Validating ECN counts from reordered ACK frames can result in failure. An -endpoint MUST NOT fail ECN validation as a result of processing an ACK frame -that does not increase the largest acknowledged packet number.¶
-ECN validation can fail if the received total count for either ECT(0) or ECT(1) -exceeds the total number of packets sent with each corresponding ECT codepoint. -In particular, validation will fail when an endpoint receives a non-zero ECN -count corresponding to an ECT codepoint that it never applied. This check -detects when packets are remarked to ECT(0) or ECT(1) in the network.¶
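The validation checks above can be collected into one sketch; the function and parameter names are illustrative, and a real sender would also track the per-codepoint markings of each newly acknowledged packet.

```python
def ecn_counts_valid(prev, new, newly_acked_ect0, newly_acked_ect1,
                     total_sent_ect0, total_sent_ect1):
    """Sketch of ECN count validation (illustrative names).

    prev and new are (ect0, ect1, ce) tuples from the last successfully
    processed ACK frame and the current one.
    """
    d_ect0 = new[0] - prev[0]
    d_ect1 = new[1] - prev[1]
    d_ce = new[2] - prev[2]
    if d_ect0 < 0 or d_ect1 < 0 or d_ce < 0:
        return False  # counts never decrease
    # Newly acknowledged marked packets must appear as ECT or CE increases;
    # a shortfall indicates the markings were removed by the network.
    if d_ect0 + d_ce < newly_acked_ect0:
        return False
    if d_ect1 + d_ce < newly_acked_ect1:
        return False
    # A count exceeding what was sent with that codepoint (in particular,
    # a non-zero count for a codepoint never applied) indicates remarking.
    if new[0] > total_sent_ect0 or new[1] > total_sent_ect1:
        return False
    return True
```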
-If validation fails, then the endpoint MUST disable ECN. It stops setting the -ECT codepoint in IP packets that it sends, assuming that either the network path -or the peer does not support ECN.¶
-Even if validation fails, an endpoint MAY revalidate ECN for the same path at -any later time in the connection. An endpoint could continue to periodically -attempt validation.¶
-Upon successful validation, an endpoint MAY continue to set an ECT codepoint in -subsequent packets it sends, with the expectation that the path is ECN-capable. -Network routing and path elements can however change mid-connection; an endpoint -MUST disable ECN if validation later fails.¶
-A UDP datagram can include one or more QUIC packets. The datagram size refers to -the total UDP payload size of a single UDP datagram carrying QUIC packets. The -datagram size includes one or more QUIC packet headers and protected payloads, -but not the UDP or IP headers.¶
-The maximum datagram size is defined as the largest size of UDP payload that can -be sent across a network path using a single UDP datagram. QUIC MUST NOT be -used if the network path cannot support a maximum datagram size of at least 1200 -bytes.¶
-QUIC assumes a minimum IP packet size of at least 1280 bytes. This is the IPv6 -minimum size ([IPv6]) and is also supported by most modern IPv4 -networks. Assuming the minimum IP header size of 40 bytes for IPv6 and 20 bytes -for IPv4 and a UDP header size of 8 bytes, this results in a maximum datagram -size of 1232 bytes for IPv6 and 1252 bytes for IPv4. Thus, modern IPv4 -and all IPv6 network paths will be able to support QUIC.¶
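The arithmetic in the previous paragraph can be checked directly:

```python
IPV6_MIN_MTU = 1280  # minimum IP packet size assumed by QUIC
IPV6_HEADER = 40     # minimum IPv6 header, no extension headers
IPV4_HEADER = 20     # minimum IPv4 header, no options
UDP_HEADER = 8

MAX_DATAGRAM_IPV6 = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER  # 1232 bytes
MAX_DATAGRAM_IPV4 = IPV6_MIN_MTU - IPV4_HEADER - UDP_HEADER  # 1252 bytes
```

Both values exceed the smallest allowed maximum datagram size of 1200 bytes.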
-Any maximum datagram size larger than 1200 bytes can be discovered using Path -Maximum Transmission Unit Discovery (PMTUD; see Section 14.2.1) or Datagram -Packetization Layer PMTU Discovery (DPLPMTUD; see Section 14.3).¶
-Enforcement of the max_udp_payload_size transport parameter -(Section 18.2) might act as an additional limit on the -maximum datagram size. A sender can avoid exceeding this limit, once the value -is known. However, prior to learning the value of the transport parameter, -endpoints risk datagrams being lost if they send datagrams larger than the -smallest allowed maximum datagram size of 1200 bytes.¶
-UDP datagrams MUST NOT be fragmented at the IP layer. In IPv4 -([IPv4]), the DF bit MUST be set if possible, to prevent -fragmentation on the path.¶
-QUIC sometimes requires datagrams to be no smaller than a certain size; see -Section 8.1 as an example. However, the size of a datagram is not -authenticated. That is, if an endpoint receives a datagram of a certain size, it -cannot know that the sender sent the datagram at the same size. Therefore, an -endpoint MUST NOT close a connection when it receives a datagram that does not -meet size constraints; the endpoint MAY however discard such datagrams.¶
-A client MUST expand the payload of all UDP datagrams carrying Initial packets -to at least the smallest allowed maximum datagram size of 1200 bytes by adding -PADDING frames to the Initial packet or by coalescing the Initial packet; see -Section 12.2. Similarly, a server MUST expand the payload of all UDP -datagrams carrying ack-eliciting Initial packets to at least the smallest -allowed maximum datagram size of 1200 bytes. Sending UDP datagrams of this size -ensures that the network path supports a reasonable Path Maximum Transmission -Unit (PMTU), in both directions. Additionally, a client that expands Initial -packets helps reduce the amplitude of amplification attacks caused by server -responses toward an unverified client address; see Section 8.¶
-Datagrams containing Initial packets MAY exceed 1200 bytes if the sender -believes that the network path and peer both support the size that it chooses.¶
-A server MUST discard an Initial packet that is carried in a UDP datagram with a -payload that is smaller than the smallest allowed maximum datagram size of 1200 -bytes. A server MAY also immediately close the connection by sending a -CONNECTION_CLOSE frame with an error code of PROTOCOL_VIOLATION; see -Section 10.2.3.¶
-The server MUST also limit the number of bytes it sends before validating the -address of the client; see Section 8.¶
-The Path Maximum Transmission Unit (PMTU) is the maximum size of the entire IP -packet including the IP header, UDP header, and UDP payload. The UDP payload -includes one or more QUIC packet headers and protected payloads. The PMTU can -depend on path characteristics, and can therefore change over time. The largest -UDP payload an endpoint sends at any given time is referred to as the endpoint's -maximum datagram size.¶
-An endpoint SHOULD use DPLPMTUD (Section 14.3) or PMTUD (Section 14.2.1) to determine -whether the path to a destination will support a desired maximum datagram size -without fragmentation. In the absence of these mechanisms, QUIC endpoints -SHOULD NOT send datagrams larger than the smallest allowed maximum datagram -size.¶
-Both DPLPMTUD and PMTUD send datagrams that are larger than the current maximum -datagram size, referred to as PMTU probes. All QUIC packets that are not sent -in a PMTU probe SHOULD be sized to fit within the maximum datagram size to avoid -the datagram being fragmented or dropped ([RFC8085]).¶
-If a QUIC endpoint determines that the PMTU between any pair of local and remote -IP addresses has fallen below the smallest allowed maximum datagram size of 1200 -bytes, it MUST immediately cease sending QUIC packets, except for those in PMTU -probes or those containing CONNECTION_CLOSE frames, on the affected path. An -endpoint MAY terminate the connection if an alternative path cannot be found.¶
-Each pair of local and remote addresses could have a different PMTU. QUIC -implementations that implement any kind of PMTU discovery therefore SHOULD -maintain a maximum datagram size for each combination of local and remote IP -addresses.¶
-A QUIC implementation MAY be more conservative in computing the maximum datagram -size to allow for unknown tunnel overheads or IP header options/extensions.¶
-Path Maximum Transmission Unit Discovery (PMTUD; [RFC1191], [RFC8201]) -relies on reception of ICMP messages (e.g., IPv6 Packet Too Big messages) that -indicate when an IP packet is dropped because it is larger than the local router -MTU. DPLPMTUD can also optionally use these messages. This use of ICMP messages -is potentially vulnerable to off-path attacks that successfully guess the -addresses used on the path and reduce the PMTU to a bandwidth-inefficient value.¶
-An endpoint MUST ignore an ICMP message that claims the PMTU has decreased below -QUIC's smallest allowed maximum datagram size.¶
-The requirements for generating ICMP ([RFC1812], [RFC4443]) state that the -quoted packet should contain as much of the original packet as possible without -exceeding the minimum MTU for the IP version. The size of the quoted packet can -actually be smaller, or the information unintelligible, as described in Section -1.1 of [DPLPMTUD].¶
-QUIC endpoints using PMTUD SHOULD validate ICMP messages to protect from -off-path injection as specified in [RFC8201] and Section 5.2 of [RFC8085]. -This validation SHOULD use the quoted packet supplied in the payload of an ICMP -message to associate the message with a corresponding transport connection (see -Section 4.6.1 of [DPLPMTUD]). ICMP message validation MUST include matching -IP addresses and UDP ports ([RFC8085]) and, when possible, connection IDs to -an active QUIC session. The endpoint SHOULD ignore all ICMP messages that fail -validation.¶
-An endpoint MUST NOT increase PMTU based on ICMP messages; see Section 3, clause -6 of [DPLPMTUD]. Any reduction in QUIC's maximum datagram size in response -to ICMP messages MAY be provisional until QUIC's loss detection algorithm -determines that the quoted packet has actually been lost.¶
-Datagram Packetization Layer PMTU Discovery (DPLPMTUD; [DPLPMTUD]) -relies on tracking loss or acknowledgment of QUIC packets that are carried in -PMTU probes. PMTU probes for DPLPMTUD that use the PADDING frame implement -"Probing using padding data", as defined in Section 4.1 of [DPLPMTUD].¶
-Endpoints SHOULD set the initial value of BASE_PLPMTU (Section 5.1 of -[DPLPMTUD]) to be consistent with QUIC's smallest allowed maximum datagram -size. The MIN_PLPMTU is the same as the BASE_PLPMTU.¶
-QUIC endpoints implementing DPLPMTUD maintain a DPLPMTUD Maximum Packet Size -(MPS, Section 4.4 of [DPLPMTUD]) for each combination of local and remote IP -addresses. This corresponds to the maximum datagram size.¶
-From the perspective of DPLPMTUD, QUIC is an acknowledged Packetization Layer -(PL). A QUIC sender can therefore enter the DPLPMTUD BASE state (Section 5.2 of -[DPLPMTUD]) when the QUIC connection handshake has been completed.¶
-QUIC is an acknowledged PL; therefore, a QUIC sender does not implement a DPLPMTUD CONFIRMATION_TIMER while in the SEARCH_COMPLETE state; see Section 5.2 of [DPLPMTUD].¶
-An endpoint using DPLPMTUD requires the validation of any received ICMP Packet -Too Big (PTB) message before using the PTB information, as defined in Section -4.6 of [DPLPMTUD]. In addition to UDP port validation, QUIC validates an -ICMP message by using other PL information (e.g., validation of connection IDs -in the quoted packet of any received ICMP message).¶
-The considerations for processing ICMP messages described in Section 14.2.1 also -apply if these messages are used by DPLPMTUD.¶
-PMTU probes are ack-eliciting packets.¶
-Endpoints could limit the content of PMTU probes to PING and PADDING frames, -since packets that are larger than the current maximum datagram size are more -likely to be dropped by the network. Loss of a QUIC packet that is carried in a -PMTU probe is therefore not a reliable indication of congestion and SHOULD NOT -trigger a congestion control reaction; see Section 3, Bullet 7 of [DPLPMTUD]. -However, PMTU probes consume congestion window, which could delay subsequent -transmission by an application.¶
-Endpoints that rely on the destination connection ID for routing incoming QUIC -packets are likely to require that the connection ID be included in -PMTU probes to route any resulting ICMP messages (Section 14.2.1) back to the correct -endpoint. However, only long header packets (Section 17.2) contain the -Source Connection ID field, and long header packets are not decrypted or -acknowledged by the peer once the handshake is complete.¶
-One way to construct a PMTU probe is to coalesce (see Section 12.2) a -packet with a long header, such as a Handshake or 0-RTT packet -(Section 17.2), with a short header packet in a single UDP datagram. If the -resulting PMTU probe reaches the endpoint, the packet with the long header will -be ignored, but the short header packet will be acknowledged. If the PMTU probe -causes an ICMP message to be sent, the first part of the probe will be quoted in -that message. If the Source Connection ID field is within the quoted portion of -the probe, that could be used for routing or validation of the ICMP message.¶
-The purpose of using a packet with a long header is only to ensure that the -quoted packet contained in the ICMP message contains a Source Connection ID -field. This packet does not need to be a valid packet and it can be sent even -if there is no current use for packets of that type.¶
-QUIC versions are identified using a 32-bit unsigned number.¶
-The version 0x00000000 is reserved to represent version negotiation. This -version of the specification is identified by the number 0x00000001.¶
-Other versions of QUIC might have different properties from this version. The -properties of QUIC that are guaranteed to be consistent across all versions of -the protocol are described in [QUIC-INVARIANTS].¶
-Version 0x00000001 of QUIC uses TLS as a cryptographic handshake protocol, as -described in [QUIC-TLS].¶
-Versions with the most significant 16 bits of the version number cleared are -reserved for use in future IETF consensus documents.¶
-Versions that follow the pattern 0x?a?a?a?a are reserved for use in forcing version negotiation to be exercised. That is, any version number where the low four bits of each byte are 1010 (in binary). A client or server MAY advertise support for any of these reserved versions.¶
-Reserved version numbers will never represent a real protocol; a client MAY use -one of these version numbers with the expectation that the server will initiate -version negotiation; a server MAY advertise support for one of these versions -and can expect that clients ignore the value.¶
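The 0x?a?a?a?a pattern can be tested with a single mask and compare; the function name here is illustrative:

```python
def is_reserved_version(version: int) -> bool:
    # 0x?a?a?a?a: the low four bits of every byte are 0b1010.
    return (version & 0x0F0F0F0F) == 0x0A0A0A0A
```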
-[[RFC editor: please remove the remainder of this section before -publication.]]¶
-The version number for the final version of this specification (0x00000001) is reserved for the version of the protocol that is published as an RFC.¶
-Version numbers used to identify IETF drafts are created by adding the draft -number to 0xff000000. For example, draft-ietf-quic-transport-13 would be -identified as 0xff00000d.¶
-Implementors are encouraged to register version numbers of QUIC that they are -using for private experimentation on the GitHub wiki at -https://github.com/quicwg/base-drafts/wiki/QUIC-Versions.¶
-QUIC packets and frames commonly use a variable-length encoding for non-negative -integer values. This encoding ensures that smaller integer values need fewer -bytes to encode.¶
-The QUIC variable-length integer encoding reserves the two most significant bits -of the first byte to encode the base 2 logarithm of the integer encoding length -in bytes. The integer value is encoded on the remaining bits, in network byte -order.¶
-This means that integers are encoded on 1, 2, 4, or 8 bytes and can encode 6-, 14-, 30-, or 62-bit values respectively. Table 4 summarizes the encoding properties.¶
2Bit | Length | Usable Bits | Range
---|---|---|---
00 | 1 | 6 | 0-63
01 | 2 | 14 | 0-16383
10 | 4 | 30 | 0-1073741823
11 | 8 | 62 | 0-4611686018427387903
Examples and a sample decoding algorithm are shown in Appendix A.1.¶
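Independently of the appendix, a minimal decoder consistent with the table above can be sketched as:

```python
def decode_varint(data: bytes, offset: int = 0):
    """Decode a QUIC variable-length integer; return (value, bytes_consumed)."""
    first = data[offset]
    length = 1 << (first >> 6)   # two most significant bits: log2 of the length
    value = first & 0x3F         # remaining 6 bits of the first byte
    for b in data[offset + 1: offset + length]:
        value = (value << 8) | b  # subsequent bytes in network byte order
    return value, length
```

For example, the single byte 0x25 decodes to 37, and the two-byte sequence 0x7bbd decodes to 15293.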
-Versions (Section 15) and packet numbers sent in the header -(Section 17.1) are described using integers, but do not use this -encoding.¶
-All numeric values are encoded in network byte order (that is, big-endian) and -all field sizes are in bits. Hexadecimal notation is used for describing the -value of fields.¶
-Packet numbers are integers in the range 0 to 2^62-1 (Section 12.3). When -present in long or short packet headers, they are encoded in 1 to 4 bytes. The -number of bits required to represent the packet number is reduced by including -only the least significant bits of the packet number.¶
-The encoded packet number is protected as described in Section 5.4 of -[QUIC-TLS].¶
-Prior to receiving an acknowledgement for a packet number space, the full packet -number MUST be included; it is not to be truncated as described below.¶
-After an acknowledgement is received for a packet number space, the sender MUST use a packet number size able to represent more than twice as large a range as the difference between the largest acknowledged packet and the packet number being sent. A peer receiving the packet will then correctly decode the packet number, unless the packet is delayed in transit such that it arrives after many higher-numbered packets have been received. An endpoint SHOULD use a large enough packet number encoding to allow the packet number to be recovered even if the packet arrives after packets that are sent afterwards.¶
-As a result, the size of the packet number encoding is at least one bit more -than the base-2 logarithm of the number of contiguous unacknowledged packet -numbers, including the new packet. Pseudocode and examples for packet number -encoding can be found in Appendix A.2.¶
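A sketch of choosing the encoded length, in the spirit of the Appendix A.2 pseudocode (the function name is illustrative; `bit_length` gives a safe, slightly conservative bound on the required bits):

```python
def packet_number_length(pn: int, largest_acked) -> int:
    """Bytes needed so the receiver can recover pn (cf. Appendix A.2)."""
    if largest_acked is None:
        num_unacked = pn + 1          # nothing acknowledged yet in this space
    else:
        num_unacked = pn - largest_acked
    # At least one bit more than the base-2 logarithm of the unacked range.
    min_bits = num_unacked.bit_length() + 1
    # The packet header carries a 1- to 4-byte packet number (Section 17.1).
    return min((min_bits + 7) // 8, 4)
```

For instance, with largest acknowledged 0xa82f30ea, sending packet 0xa82f9b32 requires a 2-byte encoding.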
-At a receiver, protection of the packet number is removed prior to recovering -the full packet number. The full packet number is then reconstructed based on -the number of significant bits present, the value of those bits, and the largest -packet number received on a successfully authenticated packet. Recovering the -full packet number is necessary to successfully remove packet protection.¶
-Once header protection is removed, the packet number is decoded by finding the -packet number value that is closest to the next expected packet. The next -expected packet is the highest received packet number plus one. Pseudocode and -an example for packet number decoding can be found in -Appendix A.3.¶
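A decoding sketch consistent with this description (in the spirit of the Appendix A.3 pseudocode; names are illustrative):

```python
def decode_packet_number(largest_pn: int, truncated_pn: int,
                         pn_nbits: int) -> int:
    """Recover the full packet number closest to the next expected packet."""
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits        # size of the truncated number's window
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # Combine the expected number's high bits with the received low bits.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    if (candidate_pn <= expected_pn - pn_hwin
            and candidate_pn < (1 << 62) - pn_win):
        return candidate_pn + pn_win   # candidate is a window behind
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win   # candidate is a window ahead
    return candidate_pn
```

For example, after receiving packet 0xa82f30ea, a 16-bit truncated value of 0x9b32 decodes to 0xa82f9b32.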
-Long headers are used for packets that are sent prior to the establishment of 1-RTT keys. Once 1-RTT keys are available, a sender switches to sending packets using the short header (Section 17.3). The long form allows for special packets, such as the Version Negotiation packet, to be represented in this uniform fixed-length packet format. Packets that use the long header contain the following fields:¶
-The most significant bit (0x80) of byte 0 (the first byte) is set to 1 for -long headers.¶
-The next bit (0x40) of byte 0 is set to 1. Packets containing a zero value -for this bit are not valid packets in this version and MUST be discarded.¶
-The next two bits (those with a mask of 0x30) of byte 0 contain a packet type. -Packet types are listed in Table 5.¶
-The lower four bits (those with a mask of 0x0f) of byte 0 are type-specific.¶
-The QUIC Version is a 32-bit field that follows the first byte. This field -indicates the version of QUIC that is in use and determines how the rest of -the protocol fields are interpreted.¶
-The byte following the version contains the length in bytes of the Destination -Connection ID field that follows it. This length is encoded as an 8-bit -unsigned integer. In QUIC version 1, this value MUST NOT exceed 20. -Endpoints that receive a version 1 long header with a value larger than 20 -MUST drop the packet. In order to properly form a Version Negotiation packet, -servers SHOULD be able to read longer connection IDs from other QUIC versions.¶
-The Destination Connection ID field follows the Destination Connection ID -Length field, which indicates the length of this field. -Section 7.2 describes the use of this field in more detail.¶
-The byte following the Destination Connection ID contains the length in bytes of the Source Connection ID field that follows it. This length is encoded as an 8-bit unsigned integer. In QUIC version 1, this value MUST NOT exceed 20 bytes. Endpoints that receive a version 1 long header with a value larger than 20 MUST drop the packet. In order to properly form a Version Negotiation packet, servers SHOULD be able to read longer connection IDs from other QUIC versions.¶
-The Source Connection ID field follows the Source Connection ID Length field, -which indicates the length of this field. Section 7.2 -describes the use of this field in more detail.¶
-The remainder of the packet, if any, is type-specific.¶
-In this version of QUIC, the following packet types with the long header are -defined:¶
Type | Name | Section
---|---|---
0x0 | Initial | Section 17.2.2
0x1 | 0-RTT | Section 17.2.3
0x2 | Handshake | Section 17.2.4
0x3 | Retry | Section 17.2.5
The header form bit, Destination and Source Connection ID lengths, Destination -and Source Connection ID fields, and Version fields of a long header packet are -version-independent. The other fields in the first byte are version-specific. -See [QUIC-INVARIANTS] for details on how packets from different versions of -QUIC are interpreted.¶
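As an illustration (not the specification's own pseudocode), the first byte of a long header packet can be split using the masks given above:

```python
# Long header packet types in this version of QUIC.
LONG_PACKET_TYPES = {0x0: "Initial", 0x1: "0-RTT", 0x2: "Handshake", 0x3: "Retry"}

def parse_long_header_byte0(byte0: int):
    """Split byte 0 of a long header into (packet type name, type-specific bits)."""
    if not (byte0 & 0x80):
        raise ValueError("header form bit is 0: this is a short header")
    if not (byte0 & 0x40):
        raise ValueError("fixed bit is 0: not a valid packet; discard")
    packet_type = (byte0 & 0x30) >> 4
    type_specific = byte0 & 0x0F   # interpretation depends on the packet type
    return LONG_PACKET_TYPES[packet_type], type_specific
```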
-The interpretation of the fields and the payload are specific to a version and -packet type. While type-specific semantics for this version are described in -the following sections, several long-header packets in this version of QUIC -contain these additional fields:¶
-Two bits (those with a mask of 0x0c) of byte 0 are reserved across multiple -packet types. These bits are protected using header protection; see Section -5.4 of [QUIC-TLS]. The value included prior to protection MUST be set to 0. -An endpoint MUST treat receipt of a packet that has a non-zero value for these -bits after removing both packet and header protection as a connection error -of type PROTOCOL_VIOLATION. Discarding such a packet after only removing -header protection can expose the endpoint to attacks; see Section 9.5 of -[QUIC-TLS].¶
-In packet types that contain a Packet Number field, the least significant two -bits (those with a mask of 0x03) of byte 0 contain the length of the packet -number, encoded as an unsigned, two-bit integer that is one less than the -length of the packet number field in bytes. That is, the length of the packet -number field is the value of this field, plus one. These bits are protected -using header protection; see Section 5.4 of [QUIC-TLS].¶
-The length of the remainder of the packet (that is, the Packet Number and -Payload fields) in bytes, encoded as a variable-length integer -(Section 16).¶
-The packet number field is 1 to 4 bytes long. The packet number is protected -using header protection; see Section 5.4 of [QUIC-TLS]. The length of the -packet number field is encoded in the Packet Number Length bits of byte 0; see -above.¶
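The reserved and Packet Number Length bits of byte 0 can be checked with a small helper, applied only after header protection has been removed. This is a sketch with an assumed function name:

```python
def first_byte_fields(byte0: int) -> int:
    """Interpret the low bits of byte 0 of a long header packet, after
    header protection removal. Returns the packet number field length."""
    reserved = (byte0 & 0x0C) >> 2          # bits with a mask of 0x0c
    pn_length = (byte0 & 0x03) + 1          # two-bit value is one less than length
    if reserved != 0:
        # Only valid after removing BOTH packet and header protection;
        # see Section 9.5 of [QUIC-TLS].
        raise ValueError("PROTOCOL_VIOLATION: reserved bits are non-zero")
    return pn_length
```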
-A Version Negotiation packet is inherently not version-specific. Upon receipt by -a client, it will be identified as a Version Negotiation packet based on the -Version field having a value of 0.¶
-The Version Negotiation packet is a response to a client packet that contains a -version that is not supported by the server, and is only sent by servers.¶
-The layout of a Version Negotiation packet is:¶
-The value in the Unused field is selected randomly by the server. Clients MUST -ignore the value of this field. Servers SHOULD set the most significant bit of -this field (0x40) to 1 so that Version Negotiation packets appear to have the -Fixed Bit field.¶
-The Version field of a Version Negotiation packet MUST be set to 0x00000000.¶
-The server MUST include the value from the Source Connection ID field of the -packet it receives in the Destination Connection ID field. The value for Source -Connection ID MUST be copied from the Destination Connection ID of the received -packet, which is initially randomly selected by a client. Echoing both -connection IDs gives clients some assurance that the server received the packet -and that the Version Negotiation packet was not generated by an off-path -attacker.¶
-Future versions of QUIC could have different requirements for the lengths of -connection IDs. In particular, connection IDs might have a smaller minimum -length or a greater maximum length. Version-specific rules for the connection -ID therefore MUST NOT influence a server decision about whether to send a -Version Negotiation packet.¶
-The remainder of the Version Negotiation packet is a list of 32-bit versions -that the server supports.¶
-A Version Negotiation packet is not acknowledged. It is only sent in response -to a packet that indicates an unsupported version; see Section 5.2.2.¶
-The Version Negotiation packet does not include the Packet Number and Length -fields present in other packets that use the long header form. Consequently, -a Version Negotiation packet consumes an entire UDP datagram.¶
-A server MUST NOT send more than one Version Negotiation packet in response to a -single UDP datagram.¶
-See Section 6 for a description of the version negotiation -process.¶
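A server forming a Version Negotiation packet per the rules above (echoed connection IDs, Version 0, random Unused field with the 0x40 bit set) might look like the following sketch; names and structure are illustrative:

```python
import os

def build_version_negotiation(received_dcid: bytes, received_scid: bytes,
                              supported_versions: list[int]) -> bytes:
    """Form a Version Negotiation packet: the client's Source Connection ID
    is echoed as the Destination Connection ID and vice versa, the Version
    field is 0x00000000, and the supported versions follow as 32-bit values."""
    first = 0x80 | (os.urandom(1)[0] & 0x7F)   # long header form, random Unused
    first |= 0x40        # SHOULD: make the packet appear to have the Fixed Bit
    out = bytes([first]) + (0).to_bytes(4, "big")       # Version = 0x00000000
    out += bytes([len(received_scid)]) + received_scid  # DCID <- peer's SCID
    out += bytes([len(received_dcid)]) + received_dcid  # SCID <- peer's DCID
    for v in supported_versions:
        out += v.to_bytes(4, "big")                     # supported version list
    return out
```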
-An Initial packet uses long headers with a type value of 0x0. It carries the -first CRYPTO frames sent by the client and server to perform key exchange, and -carries ACKs in either direction.¶
-The Initial packet contains a long header as well as the Length and Packet -Number fields; see Section 17.2. The first byte contains the Reserved and -Packet Number Length bits; see also Section 17.2. Between the Source -Connection ID and Length fields, there are two additional fields specific to -the Initial packet.¶
-A variable-length integer specifying the length of the Token field, in bytes. -This value is zero if no token is present. Initial packets sent by the server -MUST set the Token Length field to zero; clients that receive an Initial -packet with a non-zero Token Length field MUST either discard the packet or -generate a connection error of type PROTOCOL_VIOLATION.¶
-The value of the token that was previously provided in a Retry packet or -NEW_TOKEN frame; see Section 8.1.¶
-The payload of the packet.¶
-In order to prevent tampering by version-unaware middleboxes, Initial packets -are protected with connection- and version-specific keys (Initial keys) as -described in [QUIC-TLS]. This protection does not provide confidentiality or -integrity against on-path attackers, but provides some level of protection -against off-path attackers.¶
-The client and server use the Initial packet type for any packet that contains -an initial cryptographic handshake message. This includes all cases where a new -packet containing the initial cryptographic message needs to be created, such as -the packets sent after receiving a Retry packet (Section 17.2.5).¶
-A server sends its first Initial packet in response to a client Initial. A -server MAY send multiple Initial packets. The cryptographic key exchange could -require multiple round trips or retransmissions of this data.¶
-The payload of an Initial packet includes a CRYPTO frame (or frames) containing -a cryptographic handshake message, ACK frames, or both. PING, PADDING, and -CONNECTION_CLOSE frames of type 0x1c are also permitted. An endpoint that -receives an Initial packet containing other frames can either discard the -packet as spurious or treat it as a connection error.¶
-The first packet sent by a client always includes a CRYPTO frame that contains -the start or all of the first cryptographic handshake message. The first -CRYPTO frame sent always begins at an offset of 0; see Section 7.¶
-Note that if the server sends a HelloRetryRequest, the client will send another -series of Initial packets. These Initial packets will continue the -cryptographic handshake and will contain CRYPTO frames starting at an offset -matching the size of the CRYPTO frames sent in the first flight of Initial -packets.¶
-A client stops both sending and processing Initial packets when it sends its -first Handshake packet. A server stops sending and processing Initial packets -when it receives its first Handshake packet. Though packets might still be in -flight or awaiting acknowledgment, no further Initial packets need to be -exchanged beyond this point. Initial packet protection keys are discarded (see -Section 4.9.1 of [QUIC-TLS]) along with any loss recovery and congestion -control state; see Section 6.4 of [QUIC-RECOVERY].¶
-Any data in CRYPTO frames is discarded - and no longer retransmitted - when -Initial keys are discarded.¶
-A 0-RTT packet uses long headers with a type value of 0x1, followed by the -Length and Packet Number fields; see Section 17.2. The first byte contains -the Reserved and Packet Number Length bits; see Section 17.2. A 0-RTT packet -is used to carry "early" data from the client to the server as part of the -first flight, prior to handshake completion. As part of the TLS handshake, the -server can accept or reject this early data.¶
-See Section 2.3 of [TLS13] for a discussion of 0-RTT data and its -limitations.¶
-Packet numbers for 0-RTT protected packets use the same space as 1-RTT protected -packets.¶
-After a client receives a Retry packet, 0-RTT packets are likely to have been -lost or discarded by the server. A client SHOULD attempt to resend data in -0-RTT packets after it sends a new Initial packet. New packet numbers MUST be -used for any new packets that are sent; as described in Section 17.2.5.3, -reusing packet numbers could compromise packet protection.¶
-A client only receives acknowledgments for its 0-RTT packets once the handshake -is complete, as defined in Section 4.1.1 of [QUIC-TLS].¶
-A client MUST NOT send 0-RTT packets once it starts processing 1-RTT packets -from the server. This means that 0-RTT packets cannot contain any response to -frames from 1-RTT packets. For instance, a client cannot send an ACK frame in a -0-RTT packet, because that can only acknowledge a 1-RTT packet. An -acknowledgment for a 1-RTT packet MUST be carried in a 1-RTT packet.¶
-A server SHOULD treat a violation of remembered limits (Section 7.4.1) -as a connection error of an appropriate type (for instance, a FLOW_CONTROL_ERROR -for exceeding stream data limits).¶
-A Handshake packet uses long headers with a type value of 0x2, followed by the -Length and Packet Number fields; see Section 17.2. The first byte contains -the Reserved and Packet Number Length bits; see Section 17.2. It is used -to carry cryptographic handshake messages and acknowledgments from the server -and client.¶
-Once a client has received a Handshake packet from a server, it uses Handshake -packets to send subsequent cryptographic handshake messages and acknowledgments -to the server.¶
-The Destination Connection ID field in a Handshake packet contains a connection -ID that is chosen by the recipient of the packet; the Source Connection ID -includes the connection ID that the sender of the packet wishes to use; see -Section 7.2.¶
-Handshake packets have their own packet number space, and thus the first -Handshake packet sent by a server contains a packet number of 0.¶
-The payload of this packet contains CRYPTO frames and could contain PING, -PADDING, or ACK frames. Handshake packets MAY contain CONNECTION_CLOSE frames -of type 0x1c. Endpoints MUST treat receipt of Handshake packets with other -frames as a connection error of type PROTOCOL_VIOLATION.¶
-Like Initial packets (see Section 17.2.2.1), data in CRYPTO frames for -Handshake packets is discarded - and no longer retransmitted - when Handshake -protection keys are discarded.¶
-A Retry packet uses a long packet header with a type value of 0x3. It carries -an address validation token created by the server. It is used by a server that -wishes to perform a retry; see Section 8.1.¶
-A Retry packet (shown in Figure 18) does not contain any protected -fields. The value in the Unused field is set to an arbitrary value by the -server; a client MUST ignore these bits. In addition to the fields from the -long header, it contains these additional fields:¶
-An opaque token that the server can use to validate the client's address.¶
-The server populates the Destination Connection ID with the connection ID that -the client included in the Source Connection ID of the Initial packet.¶
-The server includes a connection ID of its choice in the Source Connection ID -field. This value MUST NOT be equal to the Destination Connection ID field of -the packet sent by the client. A client MUST discard a Retry packet that -contains a Source Connection ID field that is identical to the Destination -Connection ID field of its Initial packet. The client MUST use the value from -the Source Connection ID field of the Retry packet in the Destination Connection -ID field of subsequent packets that it sends.¶
-A server MAY send Retry packets in response to Initial and 0-RTT packets. A -server can either discard or buffer 0-RTT packets that it receives. A server -can send multiple Retry packets as it receives Initial or 0-RTT packets. A -server MUST NOT send more than one Retry packet in response to a single UDP -datagram.¶
-A client MUST accept and process at most one Retry packet for each connection -attempt. After the client has received and processed an Initial or Retry packet -from the server, it MUST discard any subsequent Retry packets that it receives.¶
-Clients MUST discard Retry packets that have a Retry Integrity Tag that cannot -be validated; see the Retry Packet Integrity section of [QUIC-TLS]. This -diminishes an off-path attacker's ability to inject a Retry packet and protects -against accidental corruption of Retry packets. A client MUST discard a Retry -packet with a zero-length Retry Token field.¶
-The client responds to a Retry packet with an Initial packet that includes the -provided Retry Token to continue connection establishment.¶
-A client sets the Destination Connection ID field of this Initial packet to the -value from the Source Connection ID in the Retry packet. Changing Destination -Connection ID also results in a change to the keys used to protect the Initial -packet. It also sets the Token field to the token provided in the Retry. The -client MUST NOT change the Source Connection ID because the server could include -the connection ID as part of its token validation logic; see -Section 8.1.4.¶
-A Retry packet does not include a packet number and cannot be explicitly -acknowledged by a client.¶
-Subsequent Initial packets from the client include the connection ID and token -values from the Retry packet. The client copies the Source Connection ID field -from the Retry packet to the Destination Connection ID field and uses this -value until an Initial packet with an updated value is received; see -Section 7.2. The value of the Token field is copied to all -subsequent Initial packets; see Section 8.1.2.¶
-Other than updating the Destination Connection ID and Token fields, the Initial -packet sent by the client is subject to the same restrictions as the first -Initial packet. A client MUST use the same cryptographic handshake message it -included in this packet. A server MAY treat a packet that contains a different -cryptographic handshake message as a connection error or discard it.¶
-A client MAY attempt 0-RTT after receiving a Retry packet by sending 0-RTT -packets to the connection ID provided by the server. A client MUST NOT change -the cryptographic handshake message it sends in response to receiving a Retry.¶
-A client MUST NOT reset the packet number for any packet number space after -processing a Retry packet. In particular, 0-RTT packets contain confidential -information that will most likely be retransmitted on receiving a Retry packet. -The keys used to protect these new 0-RTT packets will not change as a result of -responding to a Retry packet. However, the data sent in these packets could be -different than what was sent earlier. Sending these new packets with the same -packet number is likely to compromise the packet protection for those packets -because the same key and nonce could be used to protect different content. -A server MAY abort the connection if it detects that the client reset the -packet number.¶
-The connection IDs used on Initial and Retry packets exchanged between client -and server are copied to the transport parameters and validated as described -in Section 7.3.¶
-This version of QUIC defines a single packet type that uses the short packet -header.¶
-A 1-RTT packet uses a short packet header. It is used after the version and -1-RTT keys are negotiated.¶
-1-RTT packets contain the following fields:¶
-The most significant bit (0x80) of byte 0 is set to 0 for the short header.¶
-The next bit (0x40) of byte 0 is set to 1. Packets containing a zero value -for this bit are not valid packets in this version and MUST be discarded.¶
-The third most significant bit (0x20) of byte 0 is the latency spin bit, set -as described in Section 17.4.¶
-The next two bits (those with a mask of 0x18) of byte 0 are reserved. These -bits are protected using header protection; see Section 5.4 of -[QUIC-TLS]. The value included prior to protection MUST be set to 0. An -endpoint MUST treat receipt of a packet that has a non-zero value for these -bits, after removing both packet and header protection, as a connection error -of type PROTOCOL_VIOLATION. Discarding such a packet after only removing -header protection can expose the endpoint to attacks; see Section 9.5 of -[QUIC-TLS].¶
-The next bit (0x04) of byte 0 indicates the key phase, which allows a -recipient of a packet to identify the packet protection keys that are used to -protect the packet. See [QUIC-TLS] for details. This bit is protected -using header protection; see Section 5.4 of [QUIC-TLS].¶
-The least significant two bits (those with a mask of 0x03) of byte 0 contain -the length of the packet number, encoded as an unsigned, two-bit integer that -is one less than the length of the packet number field in bytes. That is, the -length of the packet number field is the value of this field, plus one. These -bits are protected using header protection; see Section 5.4 of [QUIC-TLS].¶
-The Destination Connection ID is a connection ID that is chosen by the -intended recipient of the packet. See Section 5.1 for more details.¶
-The packet number field is 1 to 4 bytes long. The packet number has -confidentiality protection separate from packet protection, as described in -Section 5.4 of [QUIC-TLS]. The length of the packet number field is encoded -in Packet Number Length field. See Section 17.1 for details.¶
-1-RTT packets always include a 1-RTT protected payload.¶
-The header form bit and the connection ID field of a short header packet are -version-independent. The remaining fields are specific to the selected QUIC -version. See [QUIC-INVARIANTS] for details on how packets from different -versions of QUIC are interpreted.¶
-The latency spin bit, which is defined for 1-RTT packets (Section 17.3.1), -enables passive latency monitoring from observation points on the network path -throughout the duration of a connection. The server reflects the spin value -received, while the client 'spins' it after one RTT. On-path observers can -measure the time between two spin bit toggle events to estimate the end-to-end -RTT of a connection.¶
-The spin bit is only present in 1-RTT packets, since it is possible to measure -the initial RTT of a connection by observing the handshake. Therefore, the spin -bit is available after version negotiation and connection establishment are -completed. On-path measurement and use of the latency spin bit is further -discussed in [QUIC-MANAGEABILITY].¶
-The spin bit is an OPTIONAL feature of this version of QUIC. A QUIC stack that -chooses to support the spin bit MUST implement it as specified in this section.¶
-Each endpoint unilaterally decides if the spin bit is enabled or disabled for a -connection. Implementations MUST allow administrators of clients and servers -to disable the spin bit either globally or on a per-connection basis. Even when -the spin bit is not disabled by the administrator, endpoints MUST disable their -use of the spin bit for a random selection of at least one in every 16 network -paths, or for one in every 16 connection IDs. As each endpoint disables the -spin bit independently, this ensures that the spin bit signal is disabled on -approximately one in eight network paths.¶
-When the spin bit is disabled, endpoints MAY set the spin bit to any value, and -MUST ignore any incoming value. It is RECOMMENDED that endpoints set the spin -bit to a random value either chosen independently for each packet or chosen -independently for each connection ID.¶
-If the spin bit is enabled for the connection, the endpoint maintains a spin -value for each network path and sets the spin bit in the packet header to the -currently stored value when a 1-RTT packet is sent on that path. The spin value -is initialized to 0 in the endpoint for each network path. Each endpoint also -remembers the highest packet number seen from its peer on each path.¶
-When a server receives a 1-RTT packet that increases the highest packet number -seen by the server from the client on a given network path, it sets the spin -value for that path to be equal to the spin bit in the received packet.¶
-When a client receives a 1-RTT packet that increases the highest packet number -seen by the client from the server on a given network path, it sets the spin -value for that path to the inverse of the spin bit in the received packet.¶
-An endpoint resets the spin value for a network path to zero when changing the -connection ID being used on that network path.¶
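The per-path spin value maintenance described above can be sketched as follows; the class and method names are illustrative, not taken from the specification, and the sketch assumes the endpoint has the spin bit enabled:

```python
class SpinBitState:
    """Per-network-path spin bit state for one endpoint."""

    def __init__(self, is_server: bool):
        self.is_server = is_server
        self.spin_value = 0      # initialized to 0 for each network path
        self.highest_pn = -1     # highest packet number seen from the peer

    def on_1rtt_received(self, packet_number: int, spin_bit: int) -> None:
        """Update the spin value only when the packet increases the highest
        packet number seen on this path."""
        if packet_number > self.highest_pn:
            self.highest_pn = packet_number
            # The server echoes the received value; the client inverts it.
            self.spin_value = spin_bit if self.is_server else 1 - spin_bit

    def outgoing_spin_bit(self) -> int:
        """Value to place in the spin bit of the next 1-RTT packet sent."""
        return self.spin_value
```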
-The extension_data field of the quic_transport_parameters extension defined in -[QUIC-TLS] contains the QUIC transport parameters. They are encoded as a -sequence of transport parameters, as shown in Figure 20:¶
-Each transport parameter is encoded as an (identifier, length, value) tuple, -as shown in Figure 21:¶
-The Transport Parameter Length field contains the length of the Transport -Parameter Value field in bytes.¶
-QUIC encodes transport parameters into a sequence of bytes, which is then -included in the cryptographic handshake.¶
-Transport parameters with an identifier of the form 31 * N + 27 for integer -values of N are reserved to exercise the requirement that unknown transport -parameters be ignored. These transport parameters have no semantics, and can -carry arbitrary values.¶
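The (identifier, length, value) tuple encoding, the variable-length integer it relies on, and the reserved-identifier rule can be sketched as follows; the helper names are illustrative:

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer encoding (Section 16): the two most
    significant bits of the first byte select a 1-, 2-, 4-, or 8-byte form."""
    for prefix, length in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if v < (1 << (8 * length - 2)):
            return (v | (prefix << (8 * (length - 1)))).to_bytes(length, "big")
    raise ValueError("value too large for a varint")

def encode_transport_parameter(param_id: int, value: bytes) -> bytes:
    """Encode one (identifier, length, value) tuple."""
    return encode_varint(param_id) + encode_varint(len(value)) + value

def is_reserved_parameter(param_id: int) -> bool:
    """Identifiers of the form 31 * N + 27 are reserved and must be ignored."""
    return param_id % 31 == 27
```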
This section details the transport parameters defined in this document.¶
-Many transport parameters listed here have integer values. Those transport -parameters that are identified as integers use a variable-length integer -encoding; see Section 16. Transport parameters have a default value -of 0 if the transport parameter is absent unless otherwise stated.¶
-The following transport parameters are defined:¶
-The value of the Destination Connection ID field from the first Initial packet -sent by the client; see Section 7.3. This transport parameter is only sent -by a server.¶
-The max idle timeout is a value in milliseconds that is encoded as an integer; -see Section 10.1. Idle timeout is disabled when both endpoints omit -this transport parameter or specify a value of 0.¶
-A stateless reset token is used in verifying a stateless reset; see -Section 10.3. This parameter is a sequence of 16 bytes. This -transport parameter MUST NOT be sent by a client, but MAY be sent by a server. -A server that does not send this transport parameter cannot use stateless -reset (Section 10.3) for the connection ID negotiated during the -handshake.¶
-The maximum UDP payload size parameter is an integer value that limits the -size of UDP payloads that the endpoint is willing to receive. UDP datagrams -with payloads larger than this limit are not likely to be processed by the -receiver.¶
-The default for this parameter is the maximum permitted UDP payload of 65527. -Values below 1200 are invalid.¶
-This limit does act as an additional constraint on datagram size in the same -way as the path MTU, but it is a property of the endpoint and not the path; -see Section 14. It is expected that this is the space an endpoint -dedicates to holding incoming packets.¶
-The initial maximum data parameter is an integer value that contains the -initial value for the maximum amount of data that can be sent on the -connection. This is equivalent to sending a MAX_DATA (Section 19.9) for -the connection immediately after completing the handshake.¶
-This parameter is an integer value specifying the initial flow control limit -for locally-initiated bidirectional streams. This limit applies to newly -created bidirectional streams opened by the endpoint that sends the transport -parameter. In client transport parameters, this applies to streams with an -identifier with the least significant two bits set to 0x0; in server transport -parameters, this applies to streams with the least significant two bits set to -0x1.¶
-This parameter is an integer value specifying the initial flow control limit -for peer-initiated bidirectional streams. This limit applies to newly created -bidirectional streams opened by the endpoint that receives the transport -parameter. In client transport parameters, this applies to streams with an -identifier with the least significant two bits set to 0x1; in server transport -parameters, this applies to streams with the least significant two bits set to -0x0.¶
-This parameter is an integer value specifying the initial flow control limit -for unidirectional streams. This limit applies to newly created -unidirectional streams opened by the endpoint that receives the transport -parameter. In client transport parameters, this applies to streams with an -identifier with the least significant two bits set to 0x3; in server transport -parameters, this applies to streams with the least significant two bits set to -0x2.¶
-The initial maximum bidirectional streams parameter is an integer value that -contains the initial maximum number of bidirectional streams the peer is -permitted to initiate. If this parameter is absent or zero, the peer cannot -open bidirectional streams until a MAX_STREAMS frame is sent. Setting this -parameter is equivalent to sending a MAX_STREAMS (Section 19.11) of -the corresponding type with the same value.¶
-The initial maximum unidirectional streams parameter is an integer value that -contains the initial maximum number of unidirectional streams the peer is -permitted to initiate. If this parameter is absent or zero, the peer cannot -open unidirectional streams until a MAX_STREAMS frame is sent. Setting this -parameter is equivalent to sending a MAX_STREAMS (Section 19.11) of -the corresponding type with the same value.¶
-The acknowledgement delay exponent is an integer value indicating an exponent -used to decode the ACK Delay field in the ACK frame (Section 19.3). If this -value is absent, a default value of 3 is assumed (indicating a multiplier of -8). Values above 20 are invalid.¶
-The maximum acknowledgement delay is an integer value indicating the maximum -amount of time in milliseconds by which the endpoint will delay sending -acknowledgments. This value SHOULD include the receiver's expected delays in -alarms firing. For example, if a receiver sets a timer for 5ms and alarms -commonly fire up to 1ms late, then it should send a max_ack_delay of 6ms. If -this value is absent, a default of 25 milliseconds is assumed. Values of 2^14 -or greater are invalid.¶
-The disable active migration transport parameter is included if the endpoint -does not support active connection migration (Section 9) on the address -being used during the handshake. When a peer sets this transport parameter, -an endpoint MUST NOT use a new local address when sending to the address that -the peer used during the handshake. This transport parameter does not -prohibit connection migration after a client has acted on a preferred_address -transport parameter. This parameter is a zero-length value.¶
-The server's preferred address is used to effect a change in server address at -the end of the handshake, as described in Section 9.6. This -transport parameter is only sent by a server. Servers MAY choose to only send -a preferred address of one address family by sending an all-zero address and -port (0.0.0.0:0 or [::]:0) for the other family. IP addresses are encoded in -network byte order.¶
-The preferred_address transport parameter contains an address and port for -both IP version 4 and 6. The four-byte IPv4 Address field is followed by the -associated two-byte IPv4 Port field. This is followed by a 16-byte IPv6 -Address field and two-byte IPv6 Port field. After address and port pairs, -a Connection ID Length field describes the length of the following Connection -ID field. Finally, a 16-byte Stateless Reset Token field includes the -stateless reset token associated with the connection ID. The format of this -transport parameter is shown in Figure 22.¶
-The Connection ID field and the Stateless Reset Token field contain an -alternative connection ID that has a sequence number of 1; see Section 5.1.1. -Having these values sent alongside the preferred address ensures that there -will be at least one unused active connection ID when the client initiates -migration to the preferred address.¶
-The Connection ID and Stateless Reset Token fields of a preferred address are -identical in syntax and semantics to the corresponding fields of a -NEW_CONNECTION_ID frame (Section 19.15). A server that chooses -a zero-length connection ID MUST NOT provide a preferred address. Similarly, -a server MUST NOT include a zero-length connection ID in this transport -parameter. A client MUST treat violation of these requirements as a -connection error of type TRANSPORT_PARAMETER_ERROR.¶
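A decoder for the preferred_address value, following the field order just described, might look like this sketch; the function name and returned dictionary shape are assumptions:

```python
import ipaddress

def parse_preferred_address(buf: bytes) -> dict:
    """Decode a preferred_address transport parameter value: IPv4 address and
    port, IPv6 address and port, Connection ID Length, Connection ID, and a
    16-byte Stateless Reset Token, in that order."""
    ip4 = ipaddress.IPv4Address(buf[0:4])
    port4 = int.from_bytes(buf[4:6], "big")
    ip6 = ipaddress.IPv6Address(buf[6:22])
    port6 = int.from_bytes(buf[22:24], "big")
    cid_len = buf[24]
    if cid_len == 0:
        # A zero-length connection ID is a TRANSPORT_PARAMETER_ERROR.
        raise ValueError("TRANSPORT_PARAMETER_ERROR: zero-length connection ID")
    cid = buf[25:25 + cid_len]
    token = buf[25 + cid_len:25 + cid_len + 16]
    return {"ipv4": (str(ip4), port4), "ipv6": (str(ip6), port6),
            "connection_id": cid, "stateless_reset_token": token}
```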
-The active connection ID limit is an integer value specifying the -maximum number of connection IDs from the peer that an endpoint is willing -to store. This value includes the connection ID received during the handshake, -that received in the preferred_address transport parameter, and those received -in NEW_CONNECTION_ID frames. -The value of the active_connection_id_limit parameter MUST be at least 2. -An endpoint that receives a value less than 2 MUST close the connection -with an error of type TRANSPORT_PARAMETER_ERROR. -If this transport parameter is absent, a default of 2 is assumed. If an -endpoint issues a zero-length connection ID, it will never send a -NEW_CONNECTION_ID frame and therefore ignores the active_connection_id_limit -value received from its peer.¶
-The value that the endpoint included in the Source Connection ID field of the -first Initial packet it sends for the connection; see Section 7.3.¶
-The value that the server included in the Source Connection ID field of a -Retry packet; see Section 7.3. This transport parameter is only sent by a -server.¶
-If present, transport parameters that set initial flow control limits -(initial_max_stream_data_bidi_local, initial_max_stream_data_bidi_remote, and -initial_max_stream_data_uni) are equivalent to sending a MAX_STREAM_DATA frame -(Section 19.10) on every stream of the corresponding type -immediately after opening. If the transport parameter is absent, streams of -that type start with a flow control limit of 0.¶
-A client MUST NOT include any server-only transport parameter: -original_destination_connection_id, preferred_address, -retry_source_connection_id, or stateless_reset_token. A server MUST treat -receipt of any of these transport parameters as a connection error of type -TRANSPORT_PARAMETER_ERROR.¶
-As described in Section 12.4, packets contain one or more frames. This section -describes the format and semantics of the core QUIC frame types.¶
-A PADDING frame (type=0x00) has no semantic value. PADDING frames can be used -to increase the size of a packet. Padding can be used to increase an initial -client packet to the minimum required size, or to provide protection against -traffic analysis for protected packets.¶
-PADDING frames are formatted as shown in Figure 23, which shows that -PADDING frames have no content. That is, a PADDING frame consists of the single -byte that identifies the frame as a PADDING frame.¶
-Endpoints can use PING frames (type=0x01) to verify that their peers are still -alive or to check reachability to the peer.¶
-PING frames are formatted as shown in Figure 24, which shows that PING -frames have no content.¶
-The receiver of a PING frame simply needs to acknowledge the packet containing -this frame.¶
-The PING frame can be used to keep a connection alive when an application or -application protocol wishes to prevent the connection from timing out; see -Section 10.1.2.¶
-Receivers send ACK frames (types 0x02 and 0x03) to inform senders of packets -they have received and processed. The ACK frame contains one or more ACK Ranges. -ACK Ranges identify acknowledged packets. If the frame type is 0x03, ACK frames -also contain the sum of QUIC packets with associated ECN marks received on the -connection up until this point. QUIC implementations MUST properly handle both -types and, if they have enabled ECN for packets they send, they SHOULD use the -information in the ECN section to manage their congestion state.¶
-QUIC acknowledgements are irrevocable. Once acknowledged, a packet remains -acknowledged, even if it does not appear in a future ACK frame. This is unlike -reneging for TCP SACKs ([RFC2018]).¶
-Packets from different packet number spaces can be identified using the same -numeric value. An acknowledgment for a packet needs to indicate both a packet -number and a packet number space. This is accomplished by having each ACK frame -only acknowledge packet numbers in the same space as the packet in which the -ACK frame is contained.¶
-Version Negotiation and Retry packets cannot be acknowledged because they do not -contain a packet number. Rather than relying on ACK frames, these packets are -implicitly acknowledged by the next Initial packet sent by the client.¶
-ACK frames are formatted as shown in Figure 25.¶
-ACK frames contain the following fields:¶
-A variable-length integer representing the largest packet number the peer is -acknowledging; this is usually the largest packet number that the peer has -received prior to generating the ACK frame. Unlike the packet number in the -QUIC long or short header, the value in an ACK frame is not truncated.¶
-A variable-length integer encoding the acknowledgement delay in -microseconds; see Section 13.2.5. It is decoded by multiplying the -value in the field by 2 to the power of the ack_delay_exponent transport -parameter sent by the sender of the ACK frame; see -Section 18.2. Compared to simply expressing -the delay as an integer, this encoding allows for a larger range of -values within the same number of bytes, at the cost of lower resolution.¶
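-For example, the scaling described above means a raw field value of 1000 with
the default ack_delay_exponent of 3 decodes to 8000 microseconds. A minimal
sketch (Python; the function name is illustrative):¶

```python
def decode_ack_delay(raw: int, ack_delay_exponent: int = 3) -> int:
    # ACK Delay in microseconds: the field value multiplied by
    # 2^ack_delay_exponent (3 is the default used when the ACK sender's
    # ack_delay_exponent transport parameter is absent).
    return raw << ack_delay_exponent
```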
-A variable-length integer specifying the number of Gap and ACK Range fields in -the frame.¶
-A variable-length integer indicating the number of contiguous packets
preceding the Largest Acknowledged that are being acknowledged. The First ACK
Range is encoded as an ACK Range (see Section 19.3.1) starting from the
Largest Acknowledged. That is, the smallest packet acknowledged in the
range is determined by subtracting the First ACK Range value from the Largest
Acknowledged.¶
-Contains additional ranges of packets that are alternately not -acknowledged (Gap) and acknowledged (ACK Range); see Section 19.3.1.¶
-The three ECN Counts; see Section 19.3.2.¶
-Each ACK Range consists of alternating Gap and ACK Range values in descending -packet number order. ACK Ranges can be repeated. The number of Gap and ACK -Range values is determined by the ACK Range Count field; one of each value is -present for each value in the ACK Range Count field.¶
-ACK Ranges are structured as shown in Figure 26.¶
-The fields that form each ACK Range are:¶
-A variable-length integer indicating the number of contiguous unacknowledged -packets preceding the packet number one lower than the smallest in the -preceding ACK Range.¶
-A variable-length integer indicating the number of contiguous acknowledged -packets preceding the largest packet number, as determined by the -preceding Gap.¶
-Gap and ACK Range values use a relative integer encoding for efficiency. Though
each encoded value is positive, the values are subtracted, so that each ACK
Range describes progressively lower-numbered packets.¶
-Each ACK Range acknowledges a contiguous range of packets by indicating the -number of acknowledged packets that precede the largest packet number in that -range. A value of zero indicates that only the largest packet number is -acknowledged. Larger ACK Range values indicate a larger range, with -corresponding lower values for the smallest packet number in the range. Thus, -given a largest packet number for the range, the smallest value is determined by -the formula:¶
-- smallest = largest - ack_range -¶ -
An ACK Range acknowledges all packets between the smallest packet number and the -largest, inclusive.¶
-The largest value for an ACK Range is determined by cumulatively subtracting the -size of all preceding ACK Ranges and Gaps.¶
-Each Gap indicates a range of packets that are not being acknowledged. The -number of packets in the gap is one higher than the encoded value of the Gap -field.¶
-The value of the Gap field establishes the largest packet number value for the -subsequent ACK Range using the following formula:¶
-- largest = previous_smallest - gap - 2 -¶ -
If any computed packet number is negative, an endpoint MUST generate a -connection error of type FRAME_ENCODING_ERROR.¶
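-The arithmetic above can be summarized as a decoder that expands the Largest
Acknowledged, First ACK Range, and the Gap/ACK Range pairs into inclusive
packet-number ranges. A sketch (Python; assumes the fields have already been
parsed from the wire, and helper names are illustrative):¶

```python
def decode_ack_ranges(largest_ack, first_range, gaps_and_ranges):
    """Expand ACK frame fields into inclusive (smallest, largest) ranges.

    gaps_and_ranges: one (gap, ack_range) pair per ACK Range Count entry,
    in descending packet-number order.
    """
    largest = largest_ack
    smallest = largest - first_range           # smallest = largest - ack_range
    if smallest < 0:
        raise ValueError("FRAME_ENCODING_ERROR: negative packet number")
    ranges = [(smallest, largest)]
    for gap, ack_range in gaps_and_ranges:
        # A Gap value of g skips g+1 unacknowledged packets.
        largest = smallest - gap - 2           # largest = previous_smallest - gap - 2
        smallest = largest - ack_range
        if smallest < 0:
            raise ValueError("FRAME_ENCODING_ERROR: negative packet number")
        ranges.append((smallest, largest))
    return ranges
```

-For instance, a Largest Acknowledged of 100, a First ACK Range of 2, and one
(Gap=0, ACK Range=1) pair acknowledge packets 98-100 and 95-96, leaving packet
97 unacknowledged.¶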
-The ACK frame uses the least significant bit (that is, type 0x03) to indicate -ECN feedback and report receipt of QUIC packets with associated ECN codepoints -of ECT(0), ECT(1), or CE in the packet's IP header. ECN Counts are only present -when the ACK frame type is 0x03.¶
-When present, there are 3 ECN counts, as shown in Figure 27.¶
-The three ECN Counts are:¶
-A variable-length integer representing the total number of packets received -with the ECT(0) codepoint in the packet number space of the ACK frame.¶
-A variable-length integer representing the total number of packets received -with the ECT(1) codepoint in the packet number space of the ACK frame.¶
-A variable-length integer representing the total number of packets received -with the CE codepoint in the packet number space of the ACK frame.¶
-ECN counts are maintained separately for each packet number space.¶
-An endpoint uses a RESET_STREAM frame (type=0x04) to abruptly terminate the -sending part of a stream.¶
-After sending a RESET_STREAM, an endpoint ceases transmission and retransmission -of STREAM frames on the identified stream. A receiver of RESET_STREAM can -discard any data that it already received on that stream.¶
-An endpoint that receives a RESET_STREAM frame for a send-only stream MUST -terminate the connection with error STREAM_STATE_ERROR.¶
-RESET_STREAM frames are formatted as shown in Figure 28.¶
-RESET_STREAM frames contain the following fields:¶
-A variable-length integer encoding of the Stream ID of the stream being -terminated.¶
-A variable-length integer containing the application protocol error -code (see Section 20.2) that indicates why the stream is being -closed.¶
-A variable-length integer indicating the final size of the stream by the
RESET_STREAM sender, in units of bytes; see Section 4.5.¶
-An endpoint uses a STOP_SENDING frame (type=0x05) to communicate that incoming -data is being discarded on receipt at application request. STOP_SENDING -requests that a peer cease transmission on a stream.¶
-A STOP_SENDING frame can be sent for streams in the Recv or Size Known states; -see Section 3.1. Receiving a STOP_SENDING frame for a -locally-initiated stream that has not yet been created MUST be treated as a -connection error of type STREAM_STATE_ERROR. An endpoint that receives a -STOP_SENDING frame for a receive-only stream MUST terminate the connection with -error STREAM_STATE_ERROR.¶
-STOP_SENDING frames are formatted as shown in Figure 29.¶
-STOP_SENDING frames contain the following fields:¶
-A variable-length integer carrying the Stream ID of the stream being ignored.¶
-A variable-length integer containing the application-specified reason the -sender is ignoring the stream; see Section 20.2.¶
-A CRYPTO frame (type=0x06) is used to transmit cryptographic handshake messages. -It can be sent in all packet types except 0-RTT. The CRYPTO frame offers the -cryptographic protocol an in-order stream of bytes. CRYPTO frames are -functionally identical to STREAM frames, except that they do not bear a stream -identifier; they are not flow controlled; and they do not carry markers for -optional offset, optional length, and the end of the stream.¶
-CRYPTO frames are formatted as shown in Figure 30.¶
-CRYPTO frames contain the following fields:¶
-A variable-length integer specifying the byte offset in the stream for the -data in this CRYPTO frame.¶
-A variable-length integer specifying the length of the Crypto Data field in -this CRYPTO frame.¶
-The cryptographic message data.¶
-There is a separate flow of cryptographic handshake data in each encryption -level, each of which starts at an offset of 0. This implies that each encryption -level is treated as a separate CRYPTO stream of data.¶
-The largest offset delivered on a stream - the sum of the offset and data -length - cannot exceed 2^62-1. Receipt of a frame that exceeds this limit MUST -be treated as a connection error of type FRAME_ENCODING_ERROR or -CRYPTO_BUFFER_EXCEEDED.¶
-Unlike STREAM frames, which include a Stream ID indicating to which stream the -data belongs, the CRYPTO frame carries data for a single stream per encryption -level. The stream does not have an explicit end, so CRYPTO frames do not have a -FIN bit.¶
-A server sends a NEW_TOKEN frame (type=0x07) to provide the client with a token -to send in the header of an Initial packet for a future connection.¶
-NEW_TOKEN frames are formatted as shown in Figure 31.¶
-NEW_TOKEN frames contain the following fields:¶
-A variable-length integer specifying the length of the token in bytes.¶
-An opaque blob that the client can use with a future Initial packet. The token -MUST NOT be empty. A client MUST treat receipt of a NEW_TOKEN frame with -an empty Token field as a connection error of type FRAME_ENCODING_ERROR.¶
-A client might receive multiple NEW_TOKEN frames that contain the same token -value if packets containing the frame are incorrectly determined to be lost. -Clients are responsible for discarding duplicate values, which might be used -to link connection attempts; see Section 8.1.3.¶
-Clients MUST NOT send NEW_TOKEN frames. A server MUST treat receipt of a -NEW_TOKEN frame as a connection error of type PROTOCOL_VIOLATION.¶
-STREAM frames implicitly create a stream and carry stream data. The STREAM
frame Type field takes the form 0b00001XXX (or the set of values from 0x08 to
0x0f). The three low-order bits of the frame type determine the fields that
are present in the frame: the OFF bit (0x04) indicates the presence of the
Offset field, the LEN bit (0x02) indicates the presence of the Length field,
and the FIN bit (0x01) indicates that the frame marks the end of the stream.
The final size of the stream is the sum of the offset and the length of this
frame.¶
-An endpoint MUST terminate the connection with error STREAM_STATE_ERROR if it -receives a STREAM frame for a locally-initiated stream that has not yet been -created, or for a send-only stream.¶
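-A sketch of decoding the three low-order type bits (Python; the OFF/LEN/FIN
bit masks follow the STREAM frame definition, and the function name is
illustrative):¶

```python
def parse_stream_frame_type(frame_type: int) -> dict:
    # STREAM frame types occupy 0x08..0x0f (0b00001XXX).
    if not 0x08 <= frame_type <= 0x0f:
        raise ValueError("not a STREAM frame type")
    return {
        "off": bool(frame_type & 0x04),  # Offset field present
        "len": bool(frame_type & 0x02),  # Length field present
        "fin": bool(frame_type & 0x01),  # frame marks the end of the stream
    }
```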
-STREAM frames are formatted as shown in Figure 32.¶
-STREAM frames contain the following fields:¶
-A variable-length integer indicating the stream ID of the stream; see -Section 2.1.¶
-A variable-length integer specifying the byte offset in the stream for the -data in this STREAM frame. This field is present when the OFF bit is set to -1. When the Offset field is absent, the offset is 0.¶
-A variable-length integer specifying the length of the Stream Data field in -this STREAM frame. This field is present when the LEN bit is set to 1. When -the LEN bit is set to 0, the Stream Data field consumes all the remaining -bytes in the packet.¶
-The bytes from the designated stream to be delivered.¶
-When a Stream Data field has a length of 0, the offset in the STREAM frame is -the offset of the next byte that would be sent.¶
-The first byte in the stream has an offset of 0. The largest offset delivered -on a stream - the sum of the offset and data length - cannot exceed 2^62-1, as -it is not possible to provide flow control credit for that data. Receipt of a -frame that exceeds this limit MUST be treated as a connection error of type -FRAME_ENCODING_ERROR or FLOW_CONTROL_ERROR.¶
-A MAX_DATA frame (type=0x10) is used in flow control to inform the peer of the -maximum amount of data that can be sent on the connection as a whole.¶
-MAX_DATA frames are formatted as shown in Figure 33.¶
-MAX_DATA frames contain the following field:¶
-A variable-length integer indicating the maximum amount of data that can be -sent on the entire connection, in units of bytes.¶
-All data sent in STREAM frames counts toward this limit. The sum of the final -sizes on all streams - including streams in terminal states - MUST NOT exceed -the value advertised by a receiver. An endpoint MUST terminate a connection -with a FLOW_CONTROL_ERROR error if it receives more data than the maximum data -value that it has sent. This includes violations of remembered limits in Early -Data; see Section 7.4.1.¶
-A MAX_STREAM_DATA frame (type=0x11) is used in flow control to inform a peer -of the maximum amount of data that can be sent on a stream.¶
-A MAX_STREAM_DATA frame can be sent for streams in the Recv state; see -Section 3.1. Receiving a MAX_STREAM_DATA frame for a -locally-initiated stream that has not yet been created MUST be treated as a -connection error of type STREAM_STATE_ERROR. An endpoint that receives a -MAX_STREAM_DATA frame for a receive-only stream MUST terminate the connection -with error STREAM_STATE_ERROR.¶
-MAX_STREAM_DATA frames are formatted as shown in Figure 34.¶
-MAX_STREAM_DATA frames contain the following fields:¶
-The stream ID of the stream that is affected encoded as a variable-length -integer.¶
-A variable-length integer indicating the maximum amount of data that can be -sent on the identified stream, in units of bytes.¶
-When counting data toward this limit, an endpoint accounts for the largest -received offset of data that is sent or received on the stream. Loss or -reordering can mean that the largest received offset on a stream can be greater -than the total size of data received on that stream. Receiving STREAM frames -might not increase the largest received offset.¶
-The data sent on a stream MUST NOT exceed the largest maximum stream data value -advertised by the receiver. An endpoint MUST terminate a connection with a -FLOW_CONTROL_ERROR error if it receives more data than the largest maximum -stream data that it has sent for the affected stream. This includes violations -of remembered limits in Early Data; see Section 7.4.1.¶
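-The receiver-side accounting described above can be sketched as follows
(Python; a simplified model with hypothetical names, ignoring reassembly and
connection-level accounting):¶

```python
def check_stream_flow_control(largest_offset_seen, frame_offset, frame_len,
                              max_stream_data):
    """Return the updated largest received offset, or raise on violation."""
    end = frame_offset + frame_len
    if end > max_stream_data:
        # Data beyond the advertised limit is a FLOW_CONTROL_ERROR.
        raise ValueError("FLOW_CONTROL_ERROR")
    # Due to loss or reordering, a frame might not advance the
    # largest received offset.
    return max(largest_offset_seen, end)
```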
-A MAX_STREAMS frame (type=0x12 or 0x13) informs the peer of the cumulative
number of streams of a given type it is permitted to open. A MAX_STREAMS frame
with a type of 0x12 applies to bidirectional streams, and a MAX_STREAMS frame
with a type of 0x13 applies to unidirectional streams.¶
-MAX_STREAMS frames are formatted as shown in Figure 35.¶
-MAX_STREAMS frames contain the following field:¶
-A count of the cumulative number of streams of the corresponding type that
can be opened over the lifetime of the connection. This value cannot exceed
2^60, as it is not possible to encode stream IDs larger than 2^62-1.
Receipt of a frame that permits opening of a stream larger than this limit
MUST be treated as a connection error of type FRAME_ENCODING_ERROR.¶
-Loss or reordering can cause a MAX_STREAMS frame to be received that states a
lower stream limit than an endpoint has previously received. MAX_STREAMS frames
that do not increase the stream limit MUST be ignored.¶
-An endpoint MUST NOT open more streams than permitted by the current stream
limit set by its peer. For instance, a server that receives a unidirectional
stream limit of 3 is permitted to open streams 3, 7, and 11, but not stream 15.
An endpoint MUST terminate a connection with a STREAM_LIMIT_ERROR error if a
peer opens more streams than was permitted. This includes violations of
remembered limits in Early Data; see Section 7.4.1.¶
-Note that these frames (and the corresponding transport parameters) do not -describe the number of streams that can be opened concurrently. The limit -includes streams that have been closed as well as those that are open.¶
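-The stream numbering above follows from the stream ID layout in Section 2.1:
the two low-order bits of a stream ID encode the initiator and directionality,
so the nth stream of a given type has ID 4*n plus those type bits. A sketch
(Python; helper name illustrative):¶

```python
def allowed_stream_ids(max_streams: int, type_bits: int) -> list:
    # type_bits: 0x0 client bidirectional, 0x1 server bidirectional,
    #            0x2 client unidirectional, 0x3 server unidirectional.
    # The nth stream of a given type has ID 4*n + type_bits.
    return [4 * n + type_bits for n in range(max_streams)]
```

-With a limit of 3, server-initiated unidirectional streams are 3, 7, and 11,
matching the example above.¶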
-A sender SHOULD send a DATA_BLOCKED frame (type=0x14) when it wishes to send -data, but is unable to do so due to connection-level flow control; see -Section 4. DATA_BLOCKED frames can be used as input to tuning of flow -control algorithms; see Section 4.2.¶
-DATA_BLOCKED frames are formatted as shown in Figure 36.¶
-DATA_BLOCKED frames contain the following field:¶
-A variable-length integer indicating the connection-level limit at which -blocking occurred.¶
-A sender SHOULD send a STREAM_DATA_BLOCKED frame (type=0x15) when it wishes to -send data, but is unable to do so due to stream-level flow control. This frame -is analogous to DATA_BLOCKED (Section 19.12).¶
-An endpoint that receives a STREAM_DATA_BLOCKED frame for a send-only stream -MUST terminate the connection with error STREAM_STATE_ERROR.¶
-STREAM_DATA_BLOCKED frames are formatted as shown in -Figure 37.¶
-STREAM_DATA_BLOCKED frames contain the following fields:¶
-A variable-length integer indicating the stream that is blocked due to flow
control.¶
-A variable-length integer indicating the offset of the stream at which the
blocking occurred.¶
-A sender SHOULD send a STREAMS_BLOCKED frame (type=0x16 or 0x17) when it wishes
to open a stream, but is unable to do so due to the maximum stream limit set by
its peer; see Section 19.11. A STREAMS_BLOCKED frame of type 0x16 is used
to indicate reaching the bidirectional stream limit, and a STREAMS_BLOCKED frame
of type 0x17 is used to indicate reaching the unidirectional stream limit.¶
-A STREAMS_BLOCKED frame does not open the stream, but informs the peer that a -new stream was needed and the stream limit prevented the creation of the stream.¶
-STREAMS_BLOCKED frames are formatted as shown in Figure 38.¶
-STREAMS_BLOCKED frames contain the following field:¶
-A variable-length integer indicating the maximum number of streams allowed -at the time the frame was sent. This value cannot exceed 2^60, as it is -not possible to encode stream IDs larger than 2^62-1. Receipt of a frame -that encodes a larger stream ID MUST be treated as a STREAM_LIMIT_ERROR or a -FRAME_ENCODING_ERROR.¶
-An endpoint sends a NEW_CONNECTION_ID frame (type=0x18) to provide its peer with -alternative connection IDs that can be used to break linkability when migrating -connections; see Section 9.5.¶
-NEW_CONNECTION_ID frames are formatted as shown in Figure 39.¶
-NEW_CONNECTION_ID frames contain the following fields:¶
-The sequence number assigned to the connection ID by the sender, encoded as a -variable-length integer; see Section 5.1.1.¶
-A variable-length integer indicating which connection IDs should be retired; -see Section 5.1.2.¶
-An 8-bit unsigned integer containing the length of the connection ID. Values -less than 1 and greater than 20 are invalid and MUST be treated as a -connection error of type FRAME_ENCODING_ERROR.¶
-A connection ID of the specified length.¶
-A 128-bit value that will be used for a stateless reset when the associated -connection ID is used; see Section 10.3.¶
-An endpoint MUST NOT send this frame if it currently requires that its peer send -packets with a zero-length Destination Connection ID. Changing the length of a -connection ID to or from zero-length makes it difficult to identify when the -value of the connection ID changed. An endpoint that is sending packets with a -zero-length Destination Connection ID MUST treat receipt of a NEW_CONNECTION_ID -frame as a connection error of type PROTOCOL_VIOLATION.¶
-Transmission errors, timeouts and retransmissions might cause the same -NEW_CONNECTION_ID frame to be received multiple times. Receipt of the same -frame multiple times MUST NOT be treated as a connection error. A receiver can -use the sequence number supplied in the NEW_CONNECTION_ID frame to handle -receiving the same NEW_CONNECTION_ID frame multiple times.¶
-If an endpoint receives a NEW_CONNECTION_ID frame that repeats a previously -issued connection ID with a different Stateless Reset Token or a different -sequence number, or if a sequence number is used for different connection -IDs, the endpoint MAY treat that receipt as a connection error of type -PROTOCOL_VIOLATION.¶
-The Retire Prior To field applies to connection IDs established during -connection setup and the preferred_address transport parameter; see -Section 5.1.2. The Retire Prior To field MUST be less than or equal to the -Sequence Number field. Receiving a value greater than the Sequence Number MUST -be treated as a connection error of type FRAME_ENCODING_ERROR.¶
-Once a sender indicates a Retire Prior To value, smaller values sent in -subsequent NEW_CONNECTION_ID frames have no effect. A receiver MUST ignore any -Retire Prior To fields that do not increase the largest received Retire Prior To -value.¶
-An endpoint that receives a NEW_CONNECTION_ID frame with a sequence number -smaller than the Retire Prior To field of a previously received -NEW_CONNECTION_ID frame MUST send a corresponding RETIRE_CONNECTION_ID frame -that retires the newly received connection ID, unless it has already done so -for that sequence number.¶
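-The receiver-side rules above (idempotent duplicates, a monotonically
increasing Retire Prior To, and retirement of stale sequence numbers) can be
sketched as follows (Python; a simplified model with hypothetical names,
omitting the handshake-provided connection ID and the
active_connection_id_limit check):¶

```python
class ConnectionIdSet:
    """Simplified receiver bookkeeping for NEW_CONNECTION_ID frames."""

    def __init__(self):
        self.by_seq = {}          # sequence number -> (connection ID, reset token)
        self.retire_prior_to = 0
        self.to_retire = set()    # sequence numbers owing a RETIRE_CONNECTION_ID

    def on_new_connection_id(self, seq, retire_prior_to, cid, token):
        if retire_prior_to > seq:
            raise ValueError("FRAME_ENCODING_ERROR")   # Retire Prior To > Sequence Number
        if seq in self.by_seq and self.by_seq[seq] != (cid, token):
            raise ValueError("PROTOCOL_VIOLATION")     # sequence number reused
        if retire_prior_to > self.retire_prior_to:
            # Retire Prior To never moves backwards; smaller values are ignored.
            self.retire_prior_to = retire_prior_to
            for s in list(self.by_seq):
                if s < self.retire_prior_to:
                    self.to_retire.add(s)
                    del self.by_seq[s]
        if seq < self.retire_prior_to:
            self.to_retire.add(seq)                    # already-retired sequence number
        else:
            self.by_seq[seq] = (cid, token)            # duplicates are idempotent
```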
-An endpoint sends a RETIRE_CONNECTION_ID frame (type=0x19) to indicate that it -will no longer use a connection ID that was issued by its peer. This includes -the connection ID provided during the handshake. Sending a RETIRE_CONNECTION_ID -frame also serves as a request to the peer to send additional connection IDs for -future use; see Section 5.1. New connection IDs can be delivered to a -peer using the NEW_CONNECTION_ID frame (Section 19.15).¶
-Retiring a connection ID invalidates the stateless reset token associated with -that connection ID.¶
-RETIRE_CONNECTION_ID frames are formatted as shown in -Figure 40.¶
-RETIRE_CONNECTION_ID frames contain the following field:¶
-The sequence number of the connection ID being retired; see Section 5.1.2.¶
-Receipt of a RETIRE_CONNECTION_ID frame containing a sequence number greater -than any previously sent to the peer MUST be treated as a connection error of -type PROTOCOL_VIOLATION.¶
-The sequence number specified in a RETIRE_CONNECTION_ID frame MUST NOT refer -to the Destination Connection ID field of the packet in which the frame is -contained. The peer MAY treat this as a connection error of type -PROTOCOL_VIOLATION.¶
-An endpoint cannot send this frame if it was provided with a zero-length -connection ID by its peer. An endpoint that provides a zero-length connection -ID MUST treat receipt of a RETIRE_CONNECTION_ID frame as a connection error of -type PROTOCOL_VIOLATION.¶
-Endpoints can use PATH_CHALLENGE frames (type=0x1a) to check reachability to the -peer and for path validation during connection migration.¶
-PATH_CHALLENGE frames are formatted as shown in Figure 41.¶
-PATH_CHALLENGE frames contain the following field:¶
-This 8-byte field contains arbitrary data.¶
-Including 64 bits of entropy in a PATH_CHALLENGE frame ensures that it is easier -to receive the packet than it is to guess the value correctly.¶
-The recipient of this frame MUST generate a PATH_RESPONSE frame -(Section 19.18) containing the same Data.¶
-A PATH_RESPONSE frame (type=0x1b) is sent in response to a PATH_CHALLENGE frame.¶
-PATH_RESPONSE frames are formatted as shown in Figure 42, which is -identical to the PATH_CHALLENGE frame (Section 19.17).¶
-If the content of a PATH_RESPONSE frame does not match the content of a -PATH_CHALLENGE frame previously sent by the endpoint, the endpoint MAY generate -a connection error of type PROTOCOL_VIOLATION.¶
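-A sketch of the challenge/response exchange (Python; names illustrative):¶

```python
import os

def make_path_challenge() -> bytes:
    # The Data field: 8 bytes (64 bits) of unpredictable data.
    return os.urandom(8)

def path_validated(sent_challenges, response_data: bytes) -> bool:
    # A PATH_RESPONSE validates the path only if it echoes the Data of a
    # PATH_CHALLENGE this endpoint previously sent.
    return response_data in sent_challenges
```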
-An endpoint sends a CONNECTION_CLOSE frame (type=0x1c or 0x1d) to notify its -peer that the connection is being closed. The CONNECTION_CLOSE with a frame -type of 0x1c is used to signal errors at only the QUIC layer, or the absence of -errors (with the NO_ERROR code). The CONNECTION_CLOSE frame with a type of 0x1d -is used to signal an error with the application that uses QUIC.¶
-If there are open streams that have not been explicitly closed, they are -implicitly closed when the connection is closed.¶
-CONNECTION_CLOSE frames are formatted as shown in Figure 43.¶
-CONNECTION_CLOSE frames contain the following fields:¶
-A variable-length integer error code that indicates the reason for -closing this connection. A CONNECTION_CLOSE frame of type 0x1c uses codes -from the space defined in Section 20.1. A CONNECTION_CLOSE frame -of type 0x1d uses codes from the application protocol error code space; see -Section 20.2.¶
-A variable-length integer encoding the type of frame that triggered the error. -A value of 0 (equivalent to the mention of the PADDING frame) is used when the -frame type is unknown. The application-specific variant of CONNECTION_CLOSE -(type 0x1d) does not include this field.¶
-A variable-length integer specifying the length of the reason phrase in bytes. -Because a CONNECTION_CLOSE frame cannot be split between packets, any limits -on packet size will also limit the space available for a reason phrase.¶
-A human-readable explanation for why the connection was closed. This can be -zero length if the sender chooses not to give details beyond the Error Code. -This SHOULD be a UTF-8 encoded string [RFC3629].¶
-The application-specific variant of CONNECTION_CLOSE (type 0x1d) can only be -sent using 0-RTT or 1-RTT packets; see Section 12.5. When an -application wishes to abandon a connection during the handshake, an endpoint -can send a CONNECTION_CLOSE frame (type 0x1c) with an error code of -APPLICATION_ERROR in an Initial or a Handshake packet.¶
-The server uses a HANDSHAKE_DONE frame (type=0x1e) to signal confirmation of -the handshake to the client.¶
-HANDSHAKE_DONE frames are formatted as shown in Figure 44, which -shows that HANDSHAKE_DONE frames have no content.¶
-A HANDSHAKE_DONE frame can only be sent by the server. Servers MUST NOT send a -HANDSHAKE_DONE frame before completing the handshake. A server MUST treat -receipt of a HANDSHAKE_DONE frame as a connection error of type -PROTOCOL_VIOLATION.¶
-QUIC frames do not use a self-describing encoding. An endpoint therefore needs -to understand the syntax of all frames before it can successfully process a -packet. This allows for efficient encoding of frames, but it means that an -endpoint cannot send a frame of a type that is unknown to its peer.¶
-An extension to QUIC that wishes to use a new type of frame MUST first ensure -that a peer is able to understand the frame. An endpoint can use a transport -parameter to signal its willingness to receive extension frame types. One -transport parameter can indicate support for one or more extension frame types.¶
-Extensions that modify or replace core protocol functionality (including frame -types) will be difficult to combine with other extensions that modify or -replace the same functionality unless the behavior of the combination is -explicitly defined. Such extensions SHOULD define their interaction with -previously-defined extensions modifying the same protocol components.¶
-Extension frames MUST be congestion controlled and MUST cause an ACK frame to -be sent. The exception is extension frames that replace or supplement the ACK -frame. Extension frames are not included in flow control unless specified -in the extension.¶
-An IANA registry is used to manage the assignment of frame types; see -Section 22.3.¶
-QUIC transport error codes and application error codes are 62-bit unsigned -integers.¶
-This section lists the defined QUIC transport error codes that can be used in a -CONNECTION_CLOSE frame with a type of 0x1c. These errors apply to the entire -connection.¶
-An endpoint uses this with CONNECTION_CLOSE to signal that the connection is -being closed abruptly in the absence of any error.¶
-The endpoint encountered an internal error and cannot continue with the -connection.¶
-The server refused to accept a new connection.¶
-An endpoint received more data than it permitted in its advertised data -limits; see Section 4.¶
-An endpoint received a frame for a stream identifier that exceeded its -advertised stream limit for the corresponding stream type.¶
-An endpoint received a frame for a stream that was not in a state that -permitted that frame; see Section 3.¶
-An endpoint received a STREAM frame containing data that exceeded the
previously established final size; received a STREAM frame or a RESET_STREAM
frame containing a final size that was lower than the size of stream data that
was already received; or received a STREAM frame or a RESET_STREAM frame
containing a final size different from the one already established.¶
-An endpoint received a frame that was badly formatted. For instance, a frame -of an unknown type, or an ACK frame that has more acknowledgment ranges than -the remainder of the packet could carry.¶
-An endpoint received transport parameters that were badly formatted, included
an invalid value, omitted a mandatory transport parameter, included a forbidden
transport parameter, or were otherwise in error.¶
-The number of connection IDs provided by the peer exceeds the advertised -active_connection_id_limit.¶
-An endpoint detected an error with protocol compliance that was not covered by -more specific error codes.¶
-A server received a client Initial that contained an invalid Token field.¶
-The application or application protocol caused the connection to be closed.¶
-An endpoint has received more data in CRYPTO frames than it can buffer.¶
-An endpoint detected errors in performing key updates; see Section 6 of -[QUIC-TLS].¶
-An endpoint has reached the confidentiality or integrity limit for the AEAD -algorithm used by the given connection.¶
-The cryptographic handshake failed. A range of 256 values is reserved for -carrying error codes specific to the cryptographic handshake that is used. -Codes for errors occurring when TLS is used for the crypto handshake are -described in Section 4.8 of [QUIC-TLS].¶
-See Section 22.4 for details of registering new error codes.¶
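-The numeric values for these codes are assigned in the transport draft's
error-code table, which is not reproduced in the prose above. As a reference
sketch (Python; values per that table):¶

```python
from enum import IntEnum

class TransportErrorCode(IntEnum):
    # Transport error codes for CONNECTION_CLOSE frames of type 0x1c.
    NO_ERROR                  = 0x00
    INTERNAL_ERROR            = 0x01
    CONNECTION_REFUSED        = 0x02
    FLOW_CONTROL_ERROR        = 0x03
    STREAM_LIMIT_ERROR        = 0x04
    STREAM_STATE_ERROR        = 0x05
    FINAL_SIZE_ERROR          = 0x06
    FRAME_ENCODING_ERROR      = 0x07
    TRANSPORT_PARAMETER_ERROR = 0x08
    CONNECTION_ID_LIMIT_ERROR = 0x09
    PROTOCOL_VIOLATION        = 0x0A
    INVALID_TOKEN             = 0x0B
    APPLICATION_ERROR         = 0x0C
    CRYPTO_BUFFER_EXCEEDED    = 0x0D
    KEY_UPDATE_ERROR          = 0x0E
    AEAD_LIMIT_REACHED        = 0x0F

def crypto_error(alert: int) -> int:
    # CRYPTO_ERROR occupies the reserved range 0x0100-0x01FF, one value per
    # alert code of the cryptographic handshake protocol.
    if not 0 <= alert <= 0xFF:
        raise ValueError("alert code out of range")
    return 0x0100 + alert
```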
-In defining these error codes, several principles are applied. Error conditions -that might require specific action on the part of a recipient are given unique -codes. Errors that represent common conditions are given specific codes. -Absent either of these conditions, error codes are used to identify a general -function of the stack, like flow control or transport parameter handling. -Finally, generic errors are provided for conditions where implementations are -unable or unwilling to use more specific codes.¶
-The management of application error codes is left to application protocols. -Application protocol error codes are used for the RESET_STREAM frame -(Section 19.4), the STOP_SENDING frame (Section 19.5), and -the CONNECTION_CLOSE frame with a type of 0x1d (Section 19.19).¶
-The goal of QUIC is to provide a secure transport connection. -Section 21.1 provides an overview of those properties; subsequent -sections discuss constraints and caveats regarding these properties, including -descriptions of known attacks and countermeasures.¶
-A complete security analysis of QUIC is outside the scope of this document. -This section provides an informal description of the desired security properties -as an aid to implementors and to help guide protocol analysis.¶
-QUIC assumes the threat model described in [SEC-CONS] and provides -protections against many of the attacks that arise from that model.¶
-For this purpose, attacks are divided into passive and active attacks. Passive -attackers have the capability to read packets from the network, while active -attackers also have the capability to write packets into the network. However, -a passive attack could involve an attacker with the ability to cause a routing -change or other modification in the path taken by packets that comprise a -connection.¶
-Attackers are additionally categorized as either on-path attackers or off-path -attackers; see Section 3.5 of [SEC-CONS]. An on-path attacker can read, -modify, or remove any packet it observes such that it no longer reaches its -destination, while an off-path attacker observes the packets, but cannot prevent -the original packet from reaching its intended destination. Both types of -attackers can also transmit arbitrary packets.¶
-Properties of the handshake, protected packets, and connection migration are -considered separately.¶
-The QUIC handshake incorporates the TLS 1.3 handshake and inherits the -cryptographic properties described in Appendix E.1 of [TLS13]. Many -of the security properties of QUIC depend on the TLS handshake providing these -properties. Any attack on the TLS handshake could affect QUIC.¶
-Any attack on the TLS handshake that compromises the secrecy or uniqueness
of session keys affects other security guarantees provided by QUIC that depend
on these keys. For instance, migration (Section 9) depends on the efficacy
of confidentiality protections, both for the negotiation of keys using the TLS
handshake and for QUIC packet protection, to avoid linkability across network
paths.¶
-An attack on the integrity of the TLS handshake might allow an attacker to -affect the selection of application protocol or QUIC version.¶
-In addition to the properties provided by TLS, the QUIC handshake provides some -defense against DoS attacks on the handshake.¶
-Address validation (Section 8) is used to verify that an entity -that claims a given address is able to receive packets at that address. Address -validation limits amplification attack targets to addresses for which an -attacker can observe packets.¶
-Prior to validation, endpoints are limited in what they are able to send. -During the handshake, a server cannot send more than three times the data it -receives; clients that initiate new connections or migrate to a new network -path are limited in the rate of the packets they can send.¶
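As an illustration of the pre-validation limit on server sending, the accounting can be sketched as follows (class and method names are illustrative, not taken from any implementation):

```python
# Sketch of the anti-amplification limit: before an address is validated,
# a server may send at most three times the bytes it has received from it.
AMPLIFICATION_FACTOR = 3

class PathState:
    def __init__(self):
        self.validated = False
        self.bytes_received = 0
        self.bytes_sent = 0

    def on_datagram_received(self, size: int) -> None:
        self.bytes_received += size

    def sendable_bytes(self) -> float:
        """Bytes the server may still send on this path."""
        if self.validated:
            return float("inf")
        return AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent

    def on_datagram_sent(self, size: int) -> None:
        assert size <= self.sendable_bytes()
        self.bytes_sent += size
```

Once the address is validated, the limit no longer applies, which is why address validation bounds the usefulness of QUIC servers as amplifiers.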
-Computing the server's first flight for a full handshake is potentially -expensive, requiring both a signature and a key exchange computation. In order -to prevent computational DoS attacks, the Retry packet provides a cheap token -exchange mechanism that allows servers to validate a client's IP address prior -to doing any expensive computations at the cost of a single round trip. After a -successful handshake, servers can issue new tokens to a client, which will allow -new connection establishment without incurring this cost.¶
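The token exchange can be sketched as a server-local message authentication code bound to the client's address; the token format below is hypothetical (real tokens are opaque and implementation-specific), but it shows why validation is cheap relative to a key exchange:

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical Retry-style token: bind a timestamp and the client's address
# with an HMAC under a server-local secret, so the next Initial can be
# validated with a single MAC computation.
KEY = secrets.token_bytes(32)   # server-local secret
TOKEN_LIFETIME = 30             # seconds; illustrative only

def issue_token(client_addr: str) -> bytes:
    ts = int(time.time()).to_bytes(8, "big")
    tag = hmac.new(KEY, ts + client_addr.encode(), hashlib.sha256).digest()
    return ts + tag

def validate_token(token: bytes, client_addr: str) -> bool:
    ts, tag = token[:8], token[8:]
    expected = hmac.new(KEY, ts + client_addr.encode(), hashlib.sha256).digest()
    fresh = time.time() - int.from_bytes(ts, "big") < TOKEN_LIFETIME
    return fresh and hmac.compare_digest(tag, expected)
```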
-An on-path or off-path attacker can force a handshake to fail by replacing or -racing Initial packets. Once valid Initial packets have been exchanged, -subsequent Handshake packets are protected with the handshake keys and an -on-path attacker cannot force handshake failure other than by dropping packets -to cause endpoints to abandon the attempt.¶
-An on-path attacker can also replace the addresses of packets on either side and -therefore cause the client or server to have an incorrect view of the remote -addresses. Such an attack is indistinguishable from the functions performed by a -NAT.¶
-The entire handshake is cryptographically protected, with the Initial packets -being encrypted with per-version keys and the Handshake and later packets being -encrypted with keys derived from the TLS key exchange. Further, parameter -negotiation is folded into the TLS transcript and thus provides the same -integrity guarantees as ordinary TLS negotiation. An attacker can observe -the client's transport parameters (as long as it knows the version-specific -salt) but cannot observe the server's transport parameters and cannot influence -parameter negotiation.¶
-Connection IDs are unencrypted but integrity protected in all packets.¶
-This version of QUIC does not incorporate a version negotiation mechanism; -implementations of incompatible versions will simply fail to establish a -connection.¶
-Packet protection (Section 12.1) provides authentication and encryption -of all packets except Version Negotiation packets, though Initial and Retry -packets have limited encryption and authentication based on version-specific -inputs; see [QUIC-TLS] for more details. This section considers passive and -active attacks against protected packets.¶
-Both on-path and off-path attackers can mount a passive attack in which they -save observed packets for an offline attack against packet protection at a -future time; this is true for any observer of any packet on any network.¶
-A blind attacker, one who injects packets without being able to observe valid -packets for a connection, is unlikely to be successful, since packet protection -ensures that valid packets are only generated by endpoints that possess the -key material established during the handshake; see Section 7 and -Section 21.1.1. Similarly, any active attacker that observes packets -and attempts to insert new data or modify existing data in those packets should -not be able to generate packets deemed valid by the receiving endpoint.¶
-A spoofing attack, in which an active attacker rewrites unprotected parts of a -packet that it forwards or injects, such as the source or destination -address, is only effective if the attacker can forward packets to the original -endpoint. Packet protection ensures that the packet payloads can only be -processed by the endpoints that completed the handshake, and invalid -packets are ignored by those endpoints.¶
-An attacker can also modify the boundaries between packets and UDP datagrams, -causing multiple packets to be coalesced into a single datagram, or splitting -coalesced packets into multiple datagrams. Aside from datagrams containing -Initial packets, which require padding, modification of how packets are -arranged in datagrams has no functional effect on a connection, although it -might change some performance characteristics.¶
-Connection Migration (Section 9) provides endpoints with the ability to -transition between IP addresses and ports on multiple paths, using one path at a -time for transmission and receipt of non-probing frames. Path validation -(Section 8.2) establishes that a peer is both willing and able -to receive packets sent on a particular path. This helps reduce the effects of -address spoofing by limiting the number of packets sent to a spoofed address.¶
-This section describes the intended security properties of connection migration -when under various types of DoS attacks.¶
-An attacker that can cause a packet it observes to no longer reach its intended -destination is considered an on-path attacker. When an attacker is present -between a client and server, endpoints are required to send packets through the -attacker to establish connectivity on a given path.¶
-An on-path attacker can:¶
-An on-path attacker cannot:¶
-An on-path attacker has the opportunity to modify the packets that it observes; -however, any modifications to an authenticated portion of a packet will cause it -to be dropped by the receiving endpoint as invalid, as packet payloads are both -authenticated and encrypted.¶
-In the presence of an on-path attacker, QUIC aims to provide the following -properties:¶
-An off-path attacker is not directly on the path between a client and server, -but could be able to obtain copies of some or all packets sent between the -client and the server. It is also able to send copies of those packets to -either endpoint.¶
-An off-path attacker can:¶
-An off-path attacker cannot:¶
-An off-path attacker can modify packets that it has observed and inject them -back into the network, potentially with spoofed source and destination -addresses.¶
-For the purposes of this discussion, it is assumed that an off-path attacker -has the ability to observe, modify, and re-inject a packet into the network -that will reach the destination endpoint prior to the arrival of the original -packet observed by the attacker. In other words, an attacker has the ability to -consistently "win" a race with the legitimate packets between the endpoints, -potentially causing the original packet to be ignored by the recipient.¶
-It is also assumed that an attacker has the resources necessary to affect NAT -state, potentially both causing an endpoint to lose its NAT binding, and an -attacker to obtain the same port for use with its traffic.¶
-In the presence of an off-path attacker, QUIC aims to provide the following -properties:¶
-A limited on-path attacker is an off-path attacker that has offered improved -routing of packets by duplicating and forwarding original packets between the -server and the client, causing those packets to arrive before the original -copies such that the original packets are dropped by the destination endpoint.¶
-A limited on-path attacker differs from an on-path attacker in that it is not on -the original path between endpoints, and therefore the original packets sent by -an endpoint are still reaching their destination. This means that a future -failure to route copied packets to the destination faster than their original -path will not prevent the original packets from reaching the destination.¶
-A limited on-path attacker can:¶
-A limited on-path attacker cannot:¶
-A limited on-path attacker can only delay packets up to the point that the -original packets arrive before the duplicate packets, meaning that it cannot -offer routing with worse latency than the original path. If a limited on-path -attacker drops packets, the original copy will still arrive at the destination -endpoint.¶
-In the presence of a limited on-path attacker, QUIC aims to provide the -following properties:¶
-Note that these guarantees are the same guarantees provided for any NAT, for the -same reasons.¶
-As an encrypted and authenticated transport, QUIC provides a range of protections -against denial of service. Once the cryptographic handshake is complete, QUIC -endpoints discard most packets that are not authenticated, greatly limiting the -ability of an attacker to interfere with existing connections.¶
-Once a connection is established, QUIC endpoints might accept some -unauthenticated ICMP packets (see Section 14.2.1), but the use of these packets -is extremely limited. The only other type of packet that an endpoint might -accept is a stateless reset (Section 10.3), which relies on the token -being kept secret until it is used.¶
-During the creation of a connection, QUIC only provides protection against -attack from off the network path. All QUIC packets contain proof that the -recipient saw a preceding packet from its peer.¶
-Addresses cannot change during the handshake, so endpoints can discard packets -that are received on a different network path.¶
-The Source and Destination Connection ID fields are the primary means of -protection against off-path attack during the handshake. These are required to -match those set by a peer. Except for Initial and stateless reset packets, -an endpoint only accepts packets that include a Destination Connection ID field -that matches a value the endpoint previously chose. This is the only protection -offered for Version Negotiation packets.¶
-The Destination Connection ID field in an Initial packet is selected by a client -to be unpredictable, which serves an additional purpose. The packets that carry -the cryptographic handshake are protected with a key that is derived from this -connection ID and salt specific to the QUIC version. This allows endpoints to -use the same process for authenticating packets that they receive as they use -after the cryptographic handshake completes. Packets that cannot be -authenticated are discarded. Protecting packets in this fashion provides a -strong assurance that the sender of the packet saw the Initial packet and -understood it.¶
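The dependency of Initial keys on the client's Destination Connection ID can be sketched with the HKDF-Extract step (HMAC with the version-specific salt as the key); the salt below is a placeholder, not the real per-version value defined in [QUIC-TLS]:

```python
import hashlib
import hmac

# Placeholder salt; the real salt is fixed per QUIC version in [QUIC-TLS].
VERSION_SALT = bytes.fromhex("00" * 20)

def initial_secret(client_dcid: bytes) -> bytes:
    # HKDF-Extract with SHA-256 is HMAC(salt, input-keying-material).
    # Anyone who knows the salt and saw the client's Initial can compute
    # this, so Initial protection authenticates rather than hides.
    return hmac.new(VERSION_SALT, client_dcid, hashlib.sha256).digest()
```

Because the secret is a deterministic function of the connection ID and salt, a sender that produces a validly protected packet demonstrably saw the Initial packet, which is the assurance described above.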
-These protections are not intended to be effective against an attacker that is -able to receive QUIC packets prior to the connection being established. Such an -attacker can potentially send packets that will be accepted by QUIC endpoints. -This version of QUIC attempts to detect this sort of attack, but it expects that -endpoints will fail to establish a connection rather than recovering. For the -most part, the cryptographic handshake protocol [QUIC-TLS] is responsible for -detecting tampering during the handshake.¶
-Endpoints are permitted to use other methods to detect and attempt to recover -from interference with the handshake. Invalid packets can be identified and -discarded using other methods, but no specific method is mandated in this -document.¶
-An attacker might be able to receive an address validation token -(Section 8) from a server and then release the IP address it used -to acquire that token. At a later time, the attacker can initiate a 0-RTT -connection with a server by spoofing this same address, which might now address -a different (victim) endpoint. The attacker can thus potentially cause the -server to send an initial congestion window's worth of data towards the victim.¶
-Servers SHOULD provide mitigations for this attack by limiting the usage and -lifetime of address validation tokens; see Section 8.1.3.¶
-An endpoint that acknowledges packets it has not received might cause a -congestion controller to permit sending at rates beyond what the network -supports. An endpoint MAY skip packet numbers when sending packets to detect -this behavior. An endpoint can then immediately close the connection with a -connection error of type PROTOCOL_VIOLATION; see Section 10.2.¶
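The detection mechanism can be sketched as follows (names are illustrative): the sender deliberately skips a packet number, and an acknowledgment for the skipped number proves the peer acknowledged a packet it cannot have received:

```python
# Sketch of detecting an opportunistic ACK attack via skipped packet numbers.
class AckSanityChecker:
    def __init__(self):
        self.sent = set()
        self.skipped = set()
        self.next_pn = 0

    def next_packet_number(self, skip: bool = False) -> int:
        if skip:
            # Deliberately leave a hole in the packet number space.
            self.skipped.add(self.next_pn)
            self.next_pn += 1
        pn = self.next_pn
        self.sent.add(pn)
        self.next_pn += 1
        return pn

    def on_ack(self, pn: int) -> None:
        if pn in self.skipped:
            # The peer acknowledged a packet that was never sent.
            raise ConnectionError("PROTOCOL_VIOLATION: ACK for unsent packet")
```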
-A request forgery attack occurs where an endpoint causes its peer to issue a -request towards a victim, with the request controlled by the endpoint. Request -forgery attacks aim to provide an attacker with access to capabilities of its -peer that might otherwise be unavailable to the attacker. For a networking -protocol, a request forgery attack is often used to exploit any implicit -authorization conferred on the peer by the victim due to the peer's location in -the network.¶
-For request forgery to be effective, an attacker needs to be able to influence -what packets the peer sends and where these packets are sent. If an attacker -can target a vulnerable service with a controlled payload, that service might -perform actions that are attributed to the attacker's peer, but decided by the -attacker.¶
-For example, cross-site request forgery [CSRF] -exploits on the Web cause a client to issue requests that include authorization -cookies [COOKIE], allowing one site access to information and -actions that are intended to be restricted to a different site.¶
-As QUIC runs over UDP, the primary attack modality of concern is one where an -attacker can select the address to which its peer sends UDP datagrams and can -control some of the unprotected content of those packets. As much of the data -sent by QUIC endpoints is protected, this includes control over ciphertext. An -attack is successful if an attacker can cause a peer to send a UDP datagram to -a host that will perform some action based on content in the datagram.¶
-This section discusses ways in which QUIC might be used for request forgery -attacks.¶
-This section also describes limited countermeasures that can be implemented by -QUIC endpoints. These mitigations can be employed unilaterally by a QUIC -implementation or deployment, without potential targets for request forgery -attacks taking action. However, these countermeasures could be insufficient if -UDP-based services do not properly authorize requests.¶
-Because the migration attack described in -Section 21.5.4 is quite powerful and does not have -adequate countermeasures, QUIC server implementations should assume that -attackers can cause them to generate arbitrary UDP payloads to arbitrary -destinations. QUIC servers SHOULD NOT be deployed in networks that also have -inadequately secured UDP endpoints.¶
-Although it is not generally possible to ensure that clients are not co-located -with vulnerable endpoints, this version of QUIC does not allow servers to -migrate, thus preventing spoofed migration attacks on clients. Any future -extension that allows server migration MUST also define countermeasures for -forgery attacks.¶
-QUIC offers some opportunities for an attacker to influence or control where -its peer sends UDP datagrams:¶
-In all three cases, the attacker can cause its peer to send datagrams to a -victim that might not understand QUIC. That is, these packets are sent by -the peer prior to address validation; see Section 8.¶
-Outside of the encrypted portion of packets, QUIC offers an endpoint several -options for controlling the content of UDP datagrams that its peer sends. The -Destination Connection ID field offers direct control over bytes that appear -early in packets sent by the peer; see Section 5.1. The Token field in -Initial packets offers a server control over other bytes of Initial packets; -see Section 17.2.2.¶
-There are no measures in this version of QUIC to prevent indirect control over -the encrypted portions of packets. It is necessary to assume that endpoints are -able to control the contents of frames that a peer sends, especially those -frames that convey application data, such as STREAM frames. Though this depends -to some degree on details of the application protocol, some control is possible -in many protocol usage contexts. As the attacker has access to packet -protection keys, they are likely to be capable of predicting how a peer will -encrypt future packets. Successful control over datagram content then only -requires that the attacker be able to predict the packet number and placement -of frames in packets with some amount of reliability.¶
-This section assumes that limiting control over datagram content is not -feasible. The focus of the mitigations in subsequent sections is on limiting -the ways in which datagrams that are sent prior to address validation can be -used for request forgery.¶
-An attacker acting as a server can choose the IP address and port on which it -advertises its availability, so Initial packets from clients are assumed to be -available for use in this sort of attack. The address validation implicit in -the handshake ensures that - for a new connection - a client will not send -other types of packet to a destination that does not understand QUIC or is not -willing to accept a QUIC connection.¶
-Initial packet protection (Section 5.2 of [QUIC-TLS]) makes it difficult for -servers to control the content of Initial packets sent by clients. A client -choosing an unpredictable Destination Connection ID ensures that servers are -unable to control any of the encrypted portion of Initial packets from clients.¶
-However, the Token field is open to server control and does allow a server to -use clients to mount request forgery attacks. Use of tokens provided with the -NEW_TOKEN frame (Section 8.1.3) offers the only option for request -forgery during connection establishment.¶
-Clients however are not obligated to use the NEW_TOKEN frame. Request forgery -attacks that rely on the Token field can be avoided if clients send an empty -Token field when the server address has changed from when the NEW_TOKEN frame -was received.¶
-Clients could avoid using NEW_TOKEN if the server address changes. However, not -including a Token field could adversely affect performance. Servers could rely -on NEW_TOKEN to enable sending of data in excess of the three times limit on -sending data; see Section 8.1. In particular, this affects cases -where clients use 0-RTT to request data from servers.¶
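The client-side mitigation above can be sketched as a token store keyed by the server address from which the NEW_TOKEN frame was received; names are illustrative:

```python
# Sketch of a NEW_TOKEN cache that yields an empty Token field whenever the
# server address differs from the one the token was received from.
class TokenStore:
    def __init__(self):
        self._tokens: dict[tuple[str, int], bytes] = {}

    def store(self, server_addr: tuple[str, int], token: bytes) -> None:
        self._tokens[server_addr] = token

    def token_for(self, server_addr: tuple[str, int]) -> bytes:
        # Empty token if the address changed since the NEW_TOKEN was received.
        return self._tokens.get(server_addr, b"")
```

The trade-off described above is visible here: a changed address yields an empty token, forgoing the larger pre-validation sending allowance that a valid token would permit.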
-Sending a Retry packet (Section 17.2.5) offers a server the option to change -the Token field. After sending a Retry, the server can also control the -Destination Connection ID field of subsequent Initial packets from the client. -This also might allow indirect control over the encrypted content of Initial -packets. However, the exchange of a Retry packet validates the server's -address, thereby preventing the use of subsequent Initial packets for request -forgery.¶
-Servers can specify a preferred address, which clients then migrate to after -confirming the handshake; see Section 9.6. The Destination Connection -ID field of packets that the client sends to a preferred address can be used -for request forgery.¶
-A client MUST NOT send non-probing frames to a preferred address prior to -validating that address; see Section 8. This greatly reduces the -options that a server has to control the encrypted portion of datagrams.¶
-This document does not offer any additional countermeasures that are specific -to use of preferred addresses and can be implemented by endpoints. The generic -measures described in Section 21.5.5 could be used as further mitigation.¶
-Clients are able to present a spoofed source address as part of an apparent -connection migration to cause a server to send datagrams to that address.¶
-The Destination Connection ID field in any packets that a server subsequently -sends to this spoofed address can be used for request forgery. A client might -also be able to influence the ciphertext.¶
-A server that only sends probing packets (Section 9.1) to an address prior to -address validation provides an attacker with only limited control over the -encrypted portion of datagrams. However, particularly for NAT rebinding, this -can adversely affect performance. If the server sends frames carrying -application data, an attacker might be able to control most of the content of -datagrams.¶
-This document does not offer specific countermeasures that can be implemented -by endpoints aside from the generic measures described in Section 21.5.5. -However, countermeasures for address spoofing at the network level, in -particular ingress filtering [BCP38], are especially effective -against attacks that use spoofing and originate from an external network.¶
-The most effective defense against request forgery attacks is to modify -vulnerable services to use strong authentication. However, this is not always -something that is within the control of a QUIC deployment. This section -outlines some other steps that QUIC endpoints could take unilaterally. These -additional steps are all discretionary as, depending on circumstances, they -could interfere with or prevent legitimate uses.¶
-Services offered over loopback interfaces often lack proper authentication. -Endpoints MAY prevent connection attempts or migration to a loopback address. -Endpoints SHOULD NOT allow connections or migration to a loopback address if the -same service was previously available at a different interface or if the address -was provided by a service at a non-loopback address. Endpoints that depend on -these capabilities could offer an option to disable these protections.¶
-Similarly, endpoints could regard a change in address to link-local address -[RFC4291] or an address in a private use range [RFC1918] from a global, -unique-local [RFC4193], or non-private address as a potential attempt at -request forgery. Endpoints could refuse to use these addresses entirely, but -that carries a significant risk of interfering with legitimate uses. Endpoints -SHOULD NOT refuse to use an address unless they have specific knowledge about -the network indicating that sending datagrams to unvalidated addresses in a -given range is not safe.¶
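A heuristic of this kind can be sketched using address classification; the policy below (flagging a move from a global address to a loopback, link-local, or private-use address) is illustrative, not normative:

```python
import ipaddress

# Sketch of flagging a migration from a global address to an internal one
# as a possible request forgery attempt.
def suspicious_migration(old: str, new: str) -> bool:
    o = ipaddress.ip_address(old)
    n = ipaddress.ip_address(new)
    new_internal = n.is_loopback or n.is_link_local or n.is_private
    return o.is_global and new_internal
```

As the text notes, acting on such a heuristic without knowledge of the local network risks interfering with legitimate uses, so it is a policy input rather than a hard rule.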
-Endpoints MAY choose to reduce the risk of request forgery by not including -values from NEW_TOKEN frames in Initial packets or by only sending probing -frames in packets prior to completing address validation. Note that this does -not prevent an attacker from using the Destination Connection ID field for an -attack.¶
-Endpoints are not expected to have specific information about the location of -servers that could be vulnerable targets of a request forgery attack. However, -it might be possible over time to identify specific UDP ports that are common -targets of attacks or particular patterns in datagrams that are used for -attacks. Endpoints MAY choose to avoid sending datagrams to these ports or not -send datagrams that match these patterns prior to validating the destination -address. Endpoints MAY retire connection IDs containing patterns known to be -problematic without using them.¶
-Modifying endpoints to apply these protections is more efficient than -deploying network-based protections, as endpoints do not need to perform -any additional processing when sending to an address that has been validated.¶
-The attacks commonly known as Slowloris ([SLOWLORIS]) try to keep many -connections to the target endpoint open and hold them open as long as possible. -These attacks can be executed against a QUIC endpoint by generating the minimum -amount of activity necessary to avoid being closed for inactivity. This might -involve sending small amounts of data, gradually opening flow control windows in -order to control the sender rate, or manufacturing ACK frames that simulate a -high loss rate.¶
-QUIC deployments SHOULD provide mitigations for the Slowloris attacks, such as -increasing the maximum number of clients the server will allow, limiting the -number of connections a single IP address is allowed to make, imposing -restrictions on the minimum transfer speed a connection is allowed to have, and -restricting the length of time an endpoint is allowed to stay connected.¶
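One of the mitigations listed above, a minimum transfer speed, can be sketched as follows; the rate floor and grace period are illustrative values, not recommendations:

```python
# Sketch of a minimum-rate check: close connections whose average
# throughput stays below a floor after an initial grace period.
class RateMonitor:
    def __init__(self, min_bytes_per_sec: int = 1024, grace_secs: int = 10):
        self.min_rate = min_bytes_per_sec
        self.grace = grace_secs
        self.bytes = 0

    def on_data(self, n: int) -> None:
        self.bytes += n

    def should_close(self, elapsed_secs: float) -> bool:
        if elapsed_secs <= self.grace:
            return False
        return self.bytes / elapsed_secs < self.min_rate
```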
-An adversarial sender might intentionally not send portions of the stream data, -causing the receiver to commit resources for the unsent data. This could -cause a disproportionate receive buffer memory commitment and/or the creation of -a large and inefficient data structure at the receiver.¶
-An adversarial receiver might intentionally not acknowledge packets containing -stream data in an attempt to force the sender to store the unacknowledged stream -data for retransmission.¶
-The attack on receivers is mitigated if flow control windows correspond to -available memory. However, some receivers will over-commit memory and -advertise flow control offsets in the aggregate that exceed actual available -memory. The over-commitment strategy can lead to better performance when -endpoints are well behaved, but renders endpoints vulnerable to the stream -fragmentation attack.¶
-QUIC deployments SHOULD provide mitigations against stream fragmentation -attacks. Mitigations could consist of avoiding over-committing memory, -limiting the size of tracking data structures, delaying reassembly -of STREAM frames, implementing heuristics based on the age and -duration of reassembly holes, or some combination.¶
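One of the mitigations above, limiting the size of tracking data structures, can be sketched by bounding the number of reassembly holes a stream will track; the limit is illustrative:

```python
MAX_HOLES = 32  # illustrative limit on tracked reassembly gaps

# Sketch of bounded reassembly state for one stream: received byte ranges
# are kept sorted and disjoint, and too many gaps triggers an error.
class Reassembler:
    def __init__(self):
        self.ranges: list[tuple[int, int]] = []  # disjoint [start, end)

    def receive(self, start: int, end: int) -> None:
        self.ranges.append((start, end))
        self.ranges.sort()
        merged: list[tuple[int, int]] = []
        for s, e in self.ranges:
            if merged and s <= merged[-1][1]:
                last_s, last_e = merged[-1]
                merged[-1] = (last_s, max(last_e, e))
            else:
                merged.append((s, e))
        self.ranges = merged
        # Each disjoint range beyond the first implies a hole before it.
        if len(self.ranges) - 1 > MAX_HOLES:
            raise ConnectionError("too many reassembly holes")
```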
-An adversarial endpoint can open a large number of streams, exhausting state on -an endpoint. The adversarial endpoint could repeat the process on a large -number of connections, in a manner similar to SYN flooding attacks in TCP.¶
-Normally, clients will open streams sequentially, as explained in Section 2.1. -However, when several streams are initiated at short intervals, loss or -reordering can cause STREAM frames that open streams to be received out of -sequence. On receiving a higher-numbered stream ID, a receiver is required to -open all intervening streams of the same type; see Section 3.2. -Thus, on a new connection, opening stream 4000000 opens 1 million and 1 -client-initiated bidirectional streams.¶
-The number of active streams is limited by the initial_max_streams_bidi and -initial_max_streams_uni transport parameters, as explained in -Section 4.6. If chosen judiciously, these limits mitigate the -effect of the stream commitment attack. However, setting the limit too low -could affect performance when applications expect to open a large number of -streams.¶
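The arithmetic behind the stream commitment check can be sketched directly from the stream ID encoding: the low two bits carry the stream type, and the remaining bits are the sequence number, so a received stream ID implies all lower-numbered streams of the same type:

```python
# Sketch of the stream-commitment check from the stream ID encoding.
def streams_opened_by(stream_id: int) -> int:
    # Low two bits are the type; the rest is the stream sequence number.
    return (stream_id >> 2) + 1

def check_stream_limit(stream_id: int, max_streams: int) -> None:
    if streams_opened_by(stream_id) > max_streams:
        raise ConnectionError("STREAM_LIMIT_ERROR")
```

For example, stream ID 4000000 (a client-initiated bidirectional stream) implies 1,000,001 streams of that type, matching the example in the text.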
-QUIC and TLS both contain frames or messages that have legitimate uses in some -contexts, but that can be abused to cause a peer to expend processing resources -without having any observable impact on the state of the connection.¶
-Messages can also be used to change and revert state in small or inconsequential -ways, such as by sending small increments to flow control limits.¶
-If processing costs are disproportionately large in comparison to bandwidth -consumption or effect on state, then this could allow a malicious peer to -exhaust processing capacity.¶
-While there are legitimate uses for all messages, implementations SHOULD track -cost of processing relative to progress and treat excessive quantities of any -non-productive packets as indicative of an attack. Endpoints MAY respond to -this condition with a connection error, or by dropping packets.¶
-An on-path attacker could manipulate the value of ECN fields in the IP header -to influence the sender's rate. [RFC3168] discusses manipulations and their -effects in more detail.¶
-An on-the-side attacker can duplicate and send packets with modified ECN fields -to affect the sender's rate. If duplicate packets are discarded by a receiver, -an off-path attacker will need to race the duplicate packet against the -original to be successful in this attack. Therefore, QUIC endpoints ignore the -ECN field on an IP packet unless at least one QUIC packet in that IP packet is -successfully processed; see Section 13.4.¶
-Stateless resets create a possible denial of service attack analogous to a TCP -reset injection. This attack is possible if an attacker is able to cause a -stateless reset token to be generated for a connection with a selected -connection ID. An attacker that can cause this token to be generated can reset -an active connection with the same connection ID.¶
-If a packet can be routed to different instances that share a static key, for -example by changing an IP address or port, then an attacker can cause the server -to send a stateless reset. To defend against this style of denial of service, -endpoints that share a static key for stateless reset (see Section 10.3.2) MUST -be arranged so that packets with a given connection ID always arrive at an -instance that has connection state, unless that connection is no longer active.¶
-More generally, servers MUST NOT generate a stateless reset if a connection with -the corresponding connection ID could be active on any endpoint using the same -static key.¶
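A stateless reset token shared across instances can be sketched as a keyed pseudorandom function of the connection ID under the shared static key (one construction consistent with Section 10.3.2; the choice of HMAC-SHA-256 here is illustrative):

```python
import hashlib
import hmac

# Sketch: every instance holding the static key derives the same 16-byte
# stateless reset token for a given connection ID.
def stateless_reset_token(static_key: bytes, cid: bytes) -> bytes:
    return hmac.new(static_key, cid, hashlib.sha256).digest()[:16]
```

This determinism is exactly what makes the routing requirement above necessary: any instance with the key can produce a token that resets a connection using that connection ID.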
-In the case of a cluster that uses dynamic load balancing, it is possible that a -change in load balancer configuration could occur while an active instance -retains connection state. Even if an instance retains connection state, the -change in routing and resulting stateless reset will result in the connection -being terminated. If there is no chance of the packet being routed to the -correct instance, it is better to send a stateless reset than wait for the -connection to time out. However, this is acceptable only if the routing cannot -be influenced by an attacker.¶
-This document defines QUIC Version Negotiation packets in -Section 6 that can be used to negotiate the QUIC version used -between two endpoints. However, this document does not specify how this -negotiation will be performed between this version and subsequent future -versions. In particular, Version Negotiation packets do not contain any -mechanism to prevent version downgrade attacks. Future versions of QUIC that -use Version Negotiation packets MUST define a mechanism that is robust against -version downgrade attacks.¶
-Deployments should limit the ability of an attacker to target a new connection -to a particular server instance. This means that client-controlled fields, such -as the initial Destination Connection ID used on Initial and 0-RTT packets -SHOULD NOT be used by themselves to make routing decisions. Ideally, routing -decisions are made independently of client-selected values; a Source Connection -ID can be selected to route later packets to the same server.¶
-The length of QUIC packets can reveal information about the length of the -content of those packets. The PADDING frame is provided so that endpoints have -some ability to obscure the length of packet content; see Section 19.1.¶
-Note however that defeating traffic analysis is challenging and the subject of -active research. Length is not the only way that information might leak. -Endpoints might also reveal sensitive information through other side channels, -such as the timing of packets.¶
-This document establishes several registries for the management of codepoints in -QUIC. These registries operate on a common set of policies as defined in -Section 22.1.¶
-All QUIC registries allow for both provisional and permanent registration of -codepoints. This section documents policies that are common to these -registries.¶
-Provisional registrations of codepoints are intended to allow for private use and -experimentation with extensions to QUIC. Provisional registrations only require -the inclusion of the codepoint value and contact information. However, -provisional registrations could be reclaimed and reassigned for another purpose.¶
-Provisional registrations require Expert Review, as defined in Section 4.5 of -[RFC8126]. Designated expert(s) are advised that only registrations for an -excessive proportion of remaining codepoint space or the very first unassigned -value (see Section 22.1.2) can be rejected.¶
-Provisional registrations will include a date field that indicates when the -registration was last updated. A request to update the date on any provisional -registration can be made without review from the designated expert(s).¶
-All QUIC registries include the following fields to support provisional -registration:¶
-The assigned codepoint.¶
-"Permanent" or "Provisional".¶
-A reference to a publicly available specification for the value.¶
-The date of last update to the registration.¶
-The entity that is responsible for the definition of the registration.¶
-Contact details for the registrant.¶
-Supplementary notes about the registration.¶
-Provisional registrations MAY omit the Specification and Notes fields, plus any -additional fields that might be required for a permanent registration. The Date -field is not required as part of requesting a registration as it is set to the -date the registration is created or updated.¶
-New uses of codepoints from QUIC registries SHOULD use a randomly selected -codepoint that excludes both existing allocations and the first unallocated -codepoint in the selected space. Requests for multiple codepoints MAY use a -contiguous range. This minimizes the risk that differing semantics are -attributed to the same codepoint by different implementations.¶
-Use of the first available codepoint in a range is reserved for allocation using -the Standards Action policy; see Section 4.9 of [RFC8126]. The early -codepoint assignment process [EARLY-ASSIGN] can be used for these -values.¶
-For codepoints that are encoded in variable-length integers -(Section 16), such as frame types, codepoints that encode to four or -eight bytes (that is, values 2^14 and above) SHOULD be used unless the usage is -especially sensitive to having a longer encoding.¶
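As an informal illustration of why this threshold is 2^14, the encoded length of a variable-length integer follows directly from the two-bit length prefix (Section 16); the helper below is a sketch, not part of the registry text:

```python
def varint_encoded_len(value: int) -> int:
    """Length in bytes of a QUIC variable-length integer encoding."""
    if value < 0x40:                  # 6-bit value, 1-byte encoding
        return 1
    if value < 0x4000:                # 14-bit value, 2-byte encoding
        return 2
    if value < 0x40000000:            # 30-bit value, 4-byte encoding
        return 4
    if value < 0x4000000000000000:    # 62-bit value, 8-byte encoding
        return 8
    raise ValueError("value exceeds the 62-bit varint range")
```

Values of 2^14 and above always occupy the four- or eight-byte encodings, which is why they are preferred for new codepoints.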
-Applications to register codepoints in QUIC registries MAY include a codepoint -as part of the registration. IANA MUST allocate the selected codepoint if the -codepoint is unassigned and the requirements of the registration policy are met.¶
-A request might be made to remove an unused provisional registration from the registry to reclaim space in a registry, or a portion of the registry (such as the 64-16383 range for codepoints that use variable-length encodings). This SHOULD be done only for the codepoints with the earliest recorded date, and entries that have been updated less than a year prior SHOULD NOT be reclaimed.¶
-A request to remove a codepoint MUST be reviewed by the designated expert(s). -The expert(s) MUST attempt to determine whether the codepoint is still in use. -Experts are advised to contact the listed contacts for the registration, plus as -wide a set of protocol implementers as possible in order to determine whether -any use of the codepoint is known. The expert(s) are advised to allow at least -four weeks for responses.¶
-If any use of the codepoints is identified by this search or a request to update -the registration is made, the codepoint MUST NOT be reclaimed. Instead, the -date on the registration is updated. A note might be added for the registration -recording relevant information that was learned.¶
-If no use of the codepoint was identified and no request was made to update the -registration, the codepoint MAY be removed from the registry.¶
-This process also applies to requests to change a provisional registration into -a permanent registration, except that the goal is not to determine whether there -is no use of the codepoint, but to determine that the registration is an -accurate representation of any deployed usage.¶
-Permanent registrations in QUIC registries use the Specification Required policy -([RFC8126]), unless otherwise specified. The designated expert(s) verify -that a specification exists and is readily accessible. Expert(s) are encouraged -to be biased towards approving registrations unless they are abusive, frivolous, -or actively harmful (not merely aesthetically displeasing, or architecturally -dubious). The creation of a registry MAY specify additional constraints on -permanent registrations.¶
-The creation of a registry MAY identify a range of codepoints where -registrations are governed by a different registration policy. For instance, -the registries for 62-bit codepoints in this document have stricter policies for -codepoints in the range from 0 to 63.¶
-Any stricter requirements for permanent registrations do not prevent provisional -registrations for affected codepoints. For instance, a provisional registration -for a frame type (Section 22.3) of 61 could be requested.¶
-All registrations made by Standards Track publications MUST be permanent.¶
-All registrations in this document are assigned a permanent status and list a -change controller of the IETF and a contact of the QUIC working group -(quic@ietf.org).¶
-IANA [SHALL add/has added] a registry for "QUIC Transport Parameters" under a -"QUIC" heading.¶
-The "QUIC Transport Parameters" registry governs a 62-bit space. This registry -follows the registration policy from Section 22.1. Permanent registrations -in this registry are assigned using the Specification Required policy -([RFC8126]).¶
-In addition to the fields in Section 22.1.1, permanent registrations in -this registry MUST include the following field:¶
-A short mnemonic for the parameter.¶
-The initial contents of this registry are shown in Table 6.¶
-Value | -Parameter Name | -Specification | -
---|---|---|
0x00 | -original_destination_connection_id | -- Section 18.2 - | -
0x01 | -max_idle_timeout | -- Section 18.2 - | -
0x02 | -stateless_reset_token | -- Section 18.2 - | -
0x03 | -max_udp_payload_size | -- Section 18.2 - | -
0x04 | -initial_max_data | -- Section 18.2 - | -
0x05 | -initial_max_stream_data_bidi_local | -- Section 18.2 - | -
0x06 | -initial_max_stream_data_bidi_remote | -- Section 18.2 - | -
0x07 | -initial_max_stream_data_uni | -- Section 18.2 - | -
0x08 | -initial_max_streams_bidi | -- Section 18.2 - | -
0x09 | -initial_max_streams_uni | -- Section 18.2 - | -
0x0a | -ack_delay_exponent | -- Section 18.2 - | -
0x0b | -max_ack_delay | -- Section 18.2 - | -
0x0c | -disable_active_migration | -- Section 18.2 - | -
0x0d | -preferred_address | -- Section 18.2 - | -
0x0e | -active_connection_id_limit | -- Section 18.2 - | -
0x0f | -initial_source_connection_id | -- Section 18.2 - | -
0x10 | -retry_source_connection_id | -- Section 18.2 - | -
Additionally, each value of the format 31 * N + 27
for integer values of N
-(that is, 27, 58, 89, ...) is reserved and MUST NOT be assigned by IANA.¶
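As a rough sketch of how an implementation might recognize these reserved values (the function name is invented for this example, not taken from the specification), values of the form 31 * N + 27 are exactly those congruent to 27 modulo 31:

```python
def is_reserved_transport_parameter(value: int) -> bool:
    """Check whether a transport parameter ID is a reserved value.

    Reserved IDs have the form 31 * N + 27 for integer N,
    i.e. 27, 58, 89, ...; equivalently, value mod 31 == 27.
    """
    return value % 31 == 27
```

Endpoints can send these reserved values to exercise the requirement that peers ignore unknown transport parameters.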
IANA [SHALL add/has added] a registry for "QUIC Frame Types" under a -"QUIC" heading.¶
-The "QUIC Frame Types" registry governs a 62-bit space. This registry follows -the registration policy from Section 22.1. Permanent registrations in this -registry are assigned using the Specification Required policy ([RFC8126]), -except for values between 0x00 and 0x3f (in hexadecimal; inclusive), which are -assigned using Standards Action or IESG Approval as defined in Section 4.9 and -4.10 of [RFC8126].¶
-In addition to the fields in Section 22.1.1, permanent registrations in -this registry MUST include the following field:¶
-A short mnemonic for the frame type.¶
-In addition to the advice in Section 22.1, specifications for new permanent -registrations SHOULD describe the means by which an endpoint might determine -that it can send the identified type of frame. An accompanying transport -parameter registration is expected for most registrations; see -Section 22.2. Specifications for permanent registrations also -need to describe the format and assigned semantics of any fields in the frame.¶
-The initial contents of this registry are tabulated in Table 3. Note -that the registry does not include the "Pkts" and "Spec" columns from -Table 3.¶
-IANA [SHALL add/has added] a registry for "QUIC Transport Error Codes" under a -"QUIC" heading.¶
-The "QUIC Transport Error Codes" registry governs a 62-bit space. This space is -split into three regions that are governed by different policies. Permanent -registrations in this registry are assigned using the Specification Required -policy ([RFC8126]), except for values between 0x00 and 0x3f (in hexadecimal; -inclusive), which are assigned using Standards Action or IESG Approval as -defined in Section 4.9 and 4.10 of [RFC8126].¶
-In addition to the fields in Section 22.1.1, permanent registrations in -this registry MUST include the following fields:¶
-A short mnemonic for the parameter.¶
-A brief description of the error code semantics, which MAY be a summary if a -specification reference is provided.¶
-The initial contents of this registry are shown in Table 7.¶
-Value | -Error | -Description | -Specification | -
---|---|---|---|
0x0 | -NO_ERROR | -No error | -- Section 20 - | -
0x1 | -INTERNAL_ERROR | -Implementation error | -- Section 20 - | -
0x2 | -CONNECTION_REFUSED | -Server refuses a connection | -- Section 20 - | -
0x3 | -FLOW_CONTROL_ERROR | -Flow control error | -- Section 20 - | -
0x4 | -STREAM_LIMIT_ERROR | -Too many streams opened | -- Section 20 - | -
0x5 | -STREAM_STATE_ERROR | -Frame received in invalid stream state | -- Section 20 - | -
0x6 | -FINAL_SIZE_ERROR | -Change to final size | -- Section 20 - | -
0x7 | -FRAME_ENCODING_ERROR | -Frame encoding error | -- Section 20 - | -
0x8 | -TRANSPORT_PARAMETER_ERROR | -Error in transport parameters | -- Section 20 - | -
0x9 | -CONNECTION_ID_LIMIT_ERROR | -Too many connection IDs received | -- Section 20 - | -
0xa | -PROTOCOL_VIOLATION | -Generic protocol violation | -- Section 20 - | -
0xb | -INVALID_TOKEN | -Invalid Token received | -- Section 20 - | -
0xc | -APPLICATION_ERROR | -Application error | -- Section 20 - | -
0xd | -CRYPTO_BUFFER_EXCEEDED | -CRYPTO data buffer overflowed | -- Section 20 - | -
0xe | -KEY_UPDATE_ERROR | -Invalid packet protection update | -- Section 20 - | -
0xf | -AEAD_LIMIT_REACHED | -Excessive use of packet protection keys | -- Section 20 - | -
The pseudocode in this section describes sample algorithms. These algorithms -are intended to be correct and clear, rather than being optimally performant.¶
-The pseudocode segments in this section are licensed as Code Components; see the -copyright notice.¶
-The pseudocode in Figure 45 shows how a variable-length integer can be -read from a stream of bytes. The function ReadVarint takes a single argument, a -sequence of bytes which can be read in network byte order.¶
-For example, the eight-byte sequence 0xc2197c5eff14e88c decodes to the decimal -value 151,288,809,941,952,652; the four-byte sequence 0x9d7f3e7d decodes to -494,878,333; the two-byte sequence 0x7bbd decodes to 15,293; and the single byte -0x25 decodes to 37 (as does the two-byte sequence 0x4025).¶
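The pseudocode in Figure 45 is not reproduced here; as an informal rendering, the ReadVarint logic can be transcribed into Python roughly as follows (a sketch, not the normative algorithm):

```python
def read_varint(data: bytes) -> int:
    """Decode a QUIC variable-length integer from the front of data."""
    # The two most significant bits of the first byte encode the length:
    # 0b00 -> 1 byte, 0b01 -> 2 bytes, 0b10 -> 4 bytes, 0b11 -> 8 bytes.
    prefix = data[0] >> 6
    length = 1 << prefix
    # The remaining six bits of the first byte hold the most
    # significant bits of the value.
    value = data[0] & 0x3F
    for byte in data[1:length]:
        value = (value << 8) | byte
    return value
```

This reproduces the examples above; in particular, both the one-byte sequence 0x25 and the two-byte sequence 0x4025 decode to 37.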
-The pseudocode in Figure 46 shows how an implementation can select -an appropriate size for packet number encodings.¶
-The EncodePacketNumber function takes two arguments:¶
-For example, if an endpoint has received an acknowledgment for packet 0xabe8b3 and is sending a packet with a number of 0xac5c02, there are 29,519 (0x734f) outstanding packets. In order to represent at least twice this range (59,038 packets, or 0xe69e), 16 bits are required.¶
-In the same state, sending a packet with a number of 0xace8fe uses the 24-bit encoding, because at least 18 bits are required to represent twice the range (131,222 packets, or 0x20096).¶
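The size selection can be sketched as follows (a non-normative rendering of the EncodePacketNumber logic; Python's int.bit_length substitutes for the logarithm in the pseudocode and guarantees at least twice the range is representable):

```python
def encode_packet_number(full_pn, largest_acked=None):
    """Truncate a packet number to the smallest sufficient encoding."""
    # The number of packet numbers that must be distinguishable.
    if largest_acked is None:
        num_unacked = full_pn + 1
    else:
        num_unacked = full_pn - largest_acked
    # One extra bit ensures at least twice the unacknowledged range
    # can be represented.
    min_bits = num_unacked.bit_length() + 1
    num_bytes = (min_bits + 7) // 8
    # Keep only the least significant num_bytes of the packet number.
    truncated = full_pn & ((1 << (8 * num_bytes)) - 1)
    return truncated.to_bytes(num_bytes, "big")
```

With a largest acknowledged packet of 0xabe8b3, packet 0xac5c02 truncates to the two-byte value 0x5c02, while packet 0xace8fe requires the full three bytes.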
-The pseudocode in Figure 47 includes an example algorithm for decoding -packet numbers after header protection has been removed.¶
-The DecodePacketNumber function takes three arguments:¶
-For example, if the highest successfully authenticated packet had a packet -number of 0xa82f30ea, then a packet containing a 16-bit value of 0x9b32 will be -decoded as 0xa82f9b32.¶
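The recovery of the full packet number can be sketched directly from the DecodePacketNumber description (an informal Python rendering; the 2^62 bound is QUIC's maximum packet number space):

```python
def decode_packet_number(largest_pn: int, truncated_pn: int,
                         pn_nbits: int) -> int:
    """Reconstruct a full packet number from its truncated encoding."""
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # The candidate is the value closest to expected_pn whose
    # low-order bits match the truncated packet number.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    if (candidate_pn <= expected_pn - pn_hwin
            and candidate_pn < (1 << 62) - pn_win):
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn
```

This reproduces the example above: with a highest authenticated packet number of 0xa82f30ea, the 16-bit value 0x9b32 decodes to 0xa82f9b32.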
-Each time an endpoint commences sending on a new network path, it determines -whether the path supports ECN; see Section 13.4. If the path supports ECN, the goal -is to use ECN. Endpoints might also periodically reassess a path that was -determined to not support ECN.¶
-This section describes one method for testing new paths. This algorithm is -intended to show how a path might be tested for ECN support. Endpoints can -implement different methods.¶
-The path is assigned an ECN state that is one of "testing", "unknown", "failed", or "capable". On paths with a "testing" or "capable" state, the endpoint sends packets with an ECT marking, by default ECT(0); otherwise, the endpoint sends unmarked packets.¶
-To start testing a path, the ECN state is set to "testing" and existing ECN -counts are remembered as a baseline.¶
-The testing period runs for a number of packets or a limited time, as -determined by the endpoint. The goal is not to limit the duration of the -testing period, but to ensure that enough marked packets are sent for received -ECN counts to provide a clear indication of how the path treats marked packets. -Section 13.4.2 suggests limiting this to 10 packets or 3 times the probe -timeout.¶
-After the testing period ends, the ECN state for the path becomes "unknown". From the "unknown" state, successful validation of the ECN counts in an ACK frame (see Section 13.4.2.1) causes the ECN state for the path to become "capable", unless no marked packet has been acknowledged.¶
-If validation of ECN counts fails at any time, the ECN state for the affected -path becomes "failed". An endpoint can also mark the ECN state for a path as -"failed" if marked packets are all declared lost or if they are all CE marked.¶
-Following this algorithm ensures that ECN is rarely disabled for paths that -properly support ECN. Any path that incorrectly modifies markings will cause -ECN to be disabled. For those rare cases where marked packets are discarded by -the path, the short duration of the testing period limits the number of losses -incurred.¶
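The state transitions described above might be organized as in the following sketch (class and method names are invented for this illustration; a real implementation would also track the ECN count baselines and timers):

```python
from enum import Enum


class EcnState(Enum):
    TESTING = "testing"
    UNKNOWN = "unknown"
    FAILED = "failed"
    CAPABLE = "capable"


class PathEcn:
    """Per-path ECN capability tracking (hypothetical helper)."""

    def __init__(self):
        # A new path starts in the testing state, with current ECN
        # counts remembered as a baseline (not modeled here).
        self.state = EcnState.TESTING

    def sends_ect(self) -> bool:
        # ECT(0) is applied only while testing or once the path has
        # been determined to be capable.
        return self.state in (EcnState.TESTING, EcnState.CAPABLE)

    def end_testing_period(self):
        # The testing period is bounded by packet count or time.
        if self.state is EcnState.TESTING:
            self.state = EcnState.UNKNOWN

    def on_ack(self, counts_valid: bool, marked_packet_acked: bool):
        # Failed validation disables ECN; successful validation of a
        # marked packet from the unknown state enables it.
        if not counts_valid:
            self.state = EcnState.FAILED
        elif self.state is EcnState.UNKNOWN and marked_packet_acked:
            self.state = EcnState.CAPABLE
```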
-Issue and pull request numbers are listed with a leading octothorp.¶
-Require expansion of datagrams to ensure that a path supports at least 1200 -bytes in both directions:¶
- -Stateless reset changes (#2152, #2993)¶
- -Rework the first byte (#2006)¶
-Substantial editorial reorganization; no technical changes.¶
-Changes to integration of the TLS handshake (#829, #1018, #1094, #1165, #1190, -#1233, #1242, #1252, #1450, #1458)¶
-Streams are split into unidirectional and bidirectional (#643, #656, #720, -#872, #175, #885)¶
- -Improvements to connection close¶
- -Split some frames into separate connection- and stream- level frames -(#443)¶
- -Transport parameters for 0-RTT are retained from a previous connection (#405, -#513, #512)¶
-The original design and rationale behind this protocol draw significantly from -work by Jim Roskind [EARLY-DESIGN].¶
-The IETF QUIC Working Group received an enormous amount of support from many -people. The following people provided substantive contributions to this -document:¶
-奥 一穂 (Kazuho Oku)¶
-Mikkel Fahnøe Jørgensen¶
-Mirja Kühlewind¶