Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: support QUIC as a transport #44886

Open
neild opened this issue Mar 9, 2021 · 76 comments
Open

crypto/tls: support QUIC as a transport #44886

neild opened this issue Mar 9, 2021 · 76 comments
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Milestone

Comments

@neild
Copy link
Contributor

neild commented Mar 9, 2021

QUIC is the transport protocol underlying HTTP/3, as detailed in https://tools.ietf.org/html/draft-ietf-quic-tls-34. A QUIC implementation requires an unusually tight coupling with TLS; quoting the draft:

Rather than a strict layering, these two protocols cooperate: QUIC uses the TLS handshake; TLS uses the reliability, ordered delivery, and record layer provided by QUIC.

The crypto/tls package does not currently provide an API suitable for use by a QUIC implementation. I propose that we add one.

Related background

BoringSSL provides a set of functions specifically for use by QUIC implementations:
https://commondatastorage.googleapis.com/chromium-boringssl-docs/ssl.h.html#QUIC-integration

The quic-go QUIC implementation uses a fork of crypto/tls. The quic-go fork adds a RecordLayer interface, which is quite similar to the BoringSSL API.

A QUIC implementation needs to:

  • accept TLS handshake bytes to send to the peer
  • provide TLS handshake bytes received from the peer
  • learn the read and write secrets and cipher suites for the connection
  • receive TLS alerts
  • communicate transport parameters in the quic_transport_parameters TLS extension

BoringSSL and the quic-go fork of crypto/tls both provide these capabilities.

The quic-go fork provide additional extensions to crypto/tls around the handling of early data and session tickets. Those changes are out of scope for this proposal, which addresses only the QUIC-specific need to replace the record layer.

Proposed API changes

package tls

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
  EncryptionLevelInitial = iota
  EncryptionLevelEarlyData
  EncryptionLevelHandshake
  EncryptionLevelApplication
)

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
//
// It is an error to call Read, Write, or CloseWrite on a connection with
// a non-nil QUICTransport.
type QUICTransport struct {
  // SetReadSecret configures the read secret and cipher suite for the given
  // encryption level. It will be called at most once per encryption level.
  //
  // QUIC ACKs packets at the same level they were received at, except that
  // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
  // keys will always be installed with SetWriteSecret before the
  // packet-reading keys with SetReadSecret, ensuring that QUIC can always
  // ACK any packet that it decrypts.
  SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

  // SetWriteSecret configures the write secret and cipher suite for the
  // given encryption level. It will be called at most once per encryption
  // level.
  //
  // See SetReadSecret for additional invariants between packets and their
  // ACKs.
  SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

  // WriteHandshakeData adds handshake data to the current flight at the
  // given encryption level.
  //
  // A single handshake flight may include data from multiple encryption
  // levels. QUIC implementations should defer writing data to the network
  // until FlushHandshakeData to better pack QUIC packets into transport
  // datagrams.
  WriteHandshakeData func(level EncryptionLevel, data []byte) error

  // FlushHandshakeData is called when the current flight is complete and
  // should be written to the transport. Note that a flight may contain
  // data at several encryption levels.
  FlushHandshakeData func() error

  // ReadHandshakeData is called to request handshake data. It follows the
  // same contract as io.Reader's Read method, but returns the encryption
  // level of the data as well as the number of bytes read and error.
  //
  // ReadHandshakeData must not combine data from multiple encryption levels.
  //
  // ReadHandshakeData must block until at least one byte of data is
  // available, and must return as soon as least one byte of data is
  // available.
  ReadHandshakeData func(p []byte) (level EncryptionLevel, n int, err error)

  // Alert sends a fatal alert at the specified encryption level.
  //
  // If the level is not EncryptionLevelInitial, this function will not
  // be called before the write secret for the level is initialized.
  Alert func(EncryptionLevel, uint8)

  // SetTransportParameters provides the extension_data field of the
  // quic_transport_parameters extension sent by the peer. It will
  // always be called before the successful completion of a handshake.
  SetTransportParameters func([]byte)

  // GetTransportParameters returns the extension_data field of the
  // quic_transport_parameters extension to send to the peer.
  GetTransportParameters func() []byte
}

type Config struct {
  // If QUICTransport is non-nil, it replaces the TLS transport layer.
  // In this case, MinVersion and MaxVersion must be VersionTLS13.
  QUICTransport *QUICTransport
}

// ProcessQUICPostHandshake processes data that has become available
// after the handshake has completed. It must not be called until
// Handshake has returned successfully. It causes a call to the
// QUICTransport ReadHandshakeData function requesting the new data.
func (c *Conn) ProcessQUICPostHandshake() error { … }
@gopherbot gopherbot added this to the Proposal milestone Mar 9, 2021
@neild
Copy link
Contributor Author

neild commented Mar 9, 2021

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Mar 9, 2021
@ianlancetaylor ianlancetaylor added the Proposal-Crypto Proposal related to crypto packages or other security issues label Mar 9, 2021
@marten-seemann
Copy link
Contributor

Thank you @neild! I really like the API you're proposing.

A few thoughts:

  1. ALPN: QUIC mandates the use of ALPN (see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-8.1). In particular, it requires that the handhake fails if client and server can't agree on an application protocol. crypto/tls currently doesn't enforce this. We would need to enforce this check if Config.QUICTransport is set (no additional API required).
  2. Transport Parameters: We should guarantee that GetTransportParameters is called before SetTransportParameters on the server side (this is trivially possible, as the client's transport parameters are sent in the ClientHello. We just need to document it). Transport parameters are used to negotiate QUIC extensions, so a server might modify its transport parameters based on what it received from the client.
  3. WriteHandshakeData: not sure if we need the encryption level here. TLS writes messages in ascending order, i.e. it first writes all data at the Initial level, then at Handshake level, then at Application level. Once it's done with one level, it never goes back.
    In my crypto/tls fork the (admittedly insufficiently documented) contract is that after SetWriteSecret was called for encryption level X, all data written belongs to that encryption level, until SetWriteSecret is called for encryption level X+1.
    I really like the idea of explicitly setting the boundaries between flights with FlushHandshakeData, that would simplify packet generation quite a bit.
  4. You defined a EncryptionLevelEarlyData, but I'm not sure if QUIC 0-RTT is in scope here (note that crypto/tls doesn't currently support 0-RTT for TLS1.3/TCP). In order to support 0-RTT, the QUIC server needs to be able to check if it wants to allow 0-RTT handshake (see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-4.6.2 for details). That means that a QUIC implementation needs to be able to:
    1. server: save the transport parameters used on the old connection in the session ticket
    2. server: restore it from the session ticket when the client tries to resume a connection
    3. server: decide if to accept / reject 0-RTT
    4. client: learn if 0-RTT was accepted or rejected.

@neild
Copy link
Contributor Author

neild commented Mar 10, 2021

  1. Is it necessary for crypto/tls to enforce application protocol negotiation? The QUIC layer could check the ConnectionState after the handshake completes to verify a protocol was negotiated.
  2. Guaranteeing that the client quic_transport_parameters extension is provided to the server before requesting transport parameters to send to the client sounds reasonable. (You said GetTransportParameters is called before SetTransportParameters, but I think you mean the other way around.)
  3. You're right that we technically don't need to provide the encryption level in WriteHandshakeData, but I think doing so makes it less likely the QUIC implementation passes data at the wrong encryption level. I'll defer to @FiloSottile's expertise here.
  4. I defined EncryptionLevelEarlyData, because that enum entry is the only component of the transport-layer API required to support early data. But I think you're right that there are other crypto/tls changes required to support 0-RTT. I think those changes are less QUIC-specific, and should probably be a separate proposal. I don't have a strong opinion on whether it's worth including the EncryptionLevelEarlyData enum entry before that proposal or not.

@marten-seemann
Copy link
Contributor

  1. The QUIC spec says that you "MUST immediately" close the connection if no application protocol is negotiated, see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-8.1. Not sure if right after handshake completion still qualifies as "immediately". I decided to throw the error as early as possible.
  2. Yes, that's what I meant.

@OneOfOne
Copy link
Contributor

Any progress?

@neild
Copy link
Contributor Author

neild commented Jan 17, 2023

Going to try to move this along. Updated proposal after some discussions with @rolandshoemaker and @FiloSottile:

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
        EncryptionLevelInitial = iota
        EncryptionLevelHandshake
        EncryptionLevelApplication
)

// An AlertError is a TLS alert.
type AlertError uint8

// A QUICConn represents a connection which uses a QUIC implementation as the underlying
// transport as described in RFC 9001.
type QUICConn

// QUICClient returns a new TLS client side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICClient(t *QUICTransport, config *Config) *QUICConn

// QUICClient returns a new TLS server side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICServer(t *QUICTransport, config *Config) *QUICConn

// Handshake runs the client or server handshake protocol.
//
// When the handshake ends with a TLS alert, Handshake returns an error wrapping AlertError.
func (q *QUICConn) Handshake() error

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
type QUICTransport struct {
        // SetReadSecret configures the read secret and cipher suite for the given
        // encryption level. It will be called at most once per encryption level.
        //
        // QUIC ACKs packets at the same level they were received at, except that
        // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
        // keys will always be installed with SetWriteSecret before the
        // packet-reading keys with SetReadSecret, ensuring that QUIC can always
        // ACK any packet that it decrypts.
        //
        // After SetReadSecret is called for an encryption level, no more data is
        // expected at previous levels.
        SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

        // SetWriteSecret configures the write secret and cipher suite for the
        // given encryption level. It will be called at most once per encryption
        // level.
        //
        // See SetReadSecret for additional invariants between packets and their
        // ACKs.
        //
        // After SetWriteSecret is called for an encryption level, all future writes
        // will be at that level until the next call to SetWriteSecret.
        SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

        // WriteHandshakeData adds handshake data to the current flight at the
        // given encryption level.
        //
        // A single handshake flight may include data from multiple encryption
        // levels. QUIC implementations should defer writing data to the network
        // until FlushHandshakeData to better pack QUIC packets into transport
        // datagrams.
        WriteHandshakeData func(level EncryptionLevel, data []byte) error

        // FlushHandshakeData is called when the current flight is complete and
        // should be written to the transport. Note that a flight may contain
        // data at several encryption levels.
        FlushHandshakeData func() error

        // ReadHandshakeData is called to request handshake data. It follows the
        // same contract as io.Reader's Read method, but returns the encryption
        // level of the data as well as the number of bytes read and error.
        //
        // ReadHandshakeData must not combine data from multiple encryption levels.
        //
        // ReadHandshakeData must block until at least one byte of data is
        // available, and must return as soon as least one byte of data is
        // available.
        ReadHandshakeData func(p []byte) (level EncryptionLevel, n int, err error)

        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer. It will
        // always be called before the successful completion of a handshake.
        SetTransportParameters func([]byte) error

        // GetTransportParameters returns the extension_data field of the
        // quic_transport_parameters extension to send to the peer.
        //
        // Server connections always call SetTransportParameters before
        // GetTransportParameters.
        GetTransportParameters func() ([]byte, error)
}

This moves the QUIC support into a new QUICConn type. (Internally, QUICConn probably just wraps a tls.Conn, but this keeps the API surface clearer.)

In QUIC, TLS alerts are always fatal and terminate the connection. The Alert callback is dropped in favor of just returning a distinguishable error from Handshake. We'd probably do this for all handshakes, not just QUIC ones, which would satisfy #35234.

The progression of encryption levels is more explicitly spelled out: Once a read or write secret is provided for a level, no more data is read/written at previous levels. This means the WriteHandshakeData and ReadHandshakeData functions don't really need to pass around the encryption level, but doing so makes it harder to accidentally do the wrong thing.

I've added the guarantee that servers call SetTransportParameters before GetTransportParameters as suggested by @marten-seemann (thanks!). (Perhaps these functions would be clearer as SendTransportParameters and ReceiveTransportParameters?)

@marten-seemann points out (#44886 (comment)) that RFC 9001 requires connections to be terminated with a TLS alert when ALPN negotiation fails. The specific language in the RFC is "unless another mechanism is used for agreeing on an application protocol, endpoints MUST use ALPN for this purpose." Since this allows for an unspecified "another mechanism", I think it's best for the crypto/tls package to stay out of the way here and leave this up to the QUIC implementation. The negative impact is that a QUIC server using ALPN won't have an opportunity to reject a connection until after the handshake completes, rather than when processing the Initial message. This seems like a minor issue.

An open question is how to extend this to support early data / 0-RTT. Since crypto/tls has no early data support at the moment, it's difficult to say exactly what the QUIC-specific interface to this should be. In general terms, we would need:

  • Servers need a way to request session tickets be sent to the client in 1-RTT CRYPTO frames.
  • Servers need to be able to store data in the session ticket: A server should save its transport parameters (or a hash thereof) in the ticket so it can identify when a client is using a ticket with an obsolete configuration.
  • Clients need a way to receive session tickets (probably a new func in QUICTransport).
  • Clients need to be able to request 0-RTT by providing a prior session ticket when creating a connection.
  • Servers need to be informed when a client is attempting 0-RTT, and given a chance to accept or reject it.
  • Servers and clients that accept 0-RTT need to be informed of 0-RTT keys (a new EncryptionLevel value).

I don't see any obstacle to adding this functionality to the proposed API above, once crypto/tls has 0-RTT support. The only QUIC-specific portions of the API are, I think, a mechanism to receive and send NewSessionTicket messages. (Perhaps we call QUICTransport.WriteHandshakeData when prompted to send a ticket, and provide a QUICConn.ProcessPostHandshakeData function to handle post-handshake CRYPTO frames.)

Edit: Added an error return to SetTransportParameters and GetTransportParameters. SetTransportParameters can fail if the parameters cannot be parsed or are invalid. I don't see how GetTransportParameters can fail in any reasonable scenario, but may as well let it return an error for consistency.

@awnumar
Copy link
Contributor

awnumar commented Jan 17, 2023

I don't see any obstacle to adding this functionality to the proposed API above, once crypto/tls has 0-RTT support.

I thought crypto/tls deliberately does not support 0-RTT to avoid replay attacks?

@DeedleFake
Copy link

I thought crypto/tls deliberately does not support 0-RTT to avoid replay attacks?

I'm not sure that it's so clear. I found #9671 (comment) and

// We don't check the obfuscated ticket age because it's affected by
// clock skew and it's only a freshness signal useful for shrinking the
// window for replay attacks, which don't affect us as we don't do 0-RTT.

It looks to me like support just hasn't been implemented yet, not that there's any particular opposition to it.

@neild
Copy link
Contributor Author

neild commented Jan 17, 2023

Perhaps I should say if crypto/tls gains 0-RTT support.

My goal is to ensure we don't paint ourselves into a corner with the QUIC APIs such that it would be difficult to support 0-RTT with them in the future, in the event crypto/tls does add that feature.

@rsc
Copy link
Contributor

rsc commented Feb 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@tmthrgd
Copy link
Contributor

tmthrgd commented Feb 9, 2023

@neild Two questions about the proposed API, why is QUICTransport a struct containing functions rather than an interface? I can't think of anything similar in the standard library.

And secondly who provides the QUICTransport? How do you go from 'I want to connect to google.com:443 over QUIC' or 'I want to host a server on port 443 using QUIC' to a QUICConn?

@DeedleFake
Copy link

DeedleFake commented Feb 9, 2023

I think it's a struct because it's a field in tls.Config, which also has a number of fields that are funcs for similar purposes. The idea here is to modify tls.Config to add support for the necessary pieces for QUIC.

Note that this proposal is specifically for crypto/tls, not for net. It doesn't cover things like QUICConn. It's just for adding the pieces of TLS-related functionality necessary to make something like QUICConn possible, and then that can hopefully be added later.

@FiloSottile
Copy link
Contributor

Indeed, this is about the crypto/tls API needed by HTTP/3 implementations in other packages.

It's a struct of callbacks because that can be extended in the future with other optional callbacks, while interfaces can't ever add method backwards-compatibly. I agree it's a little weird, but I don't have a better suggestion.

@neild
Copy link
Contributor Author

neild commented Feb 9, 2023

Yes, QUICTransport is a struct so we can add additional functions in the future.

The QUICTransport is provided by a QUIC implementation. This proposal is about making it possible to write a QUIC implementation that uses crypto/tls, not about providing a QUIC implementation. For example, github.com/lucas-clemente/quic-go needs to use a forked version of crypto/tls, since the crypto/tls API doesn't currently allow for building a QUIC implementation atop it.

QUICConn in this proposal is a QUIC-specific tls.Conn, not a QUIC connection. We'll want to clearly explain that in the documentation.

@marten-seemann
Copy link
Contributor

A few thoughts about the API suggested in #44886 (comment):

  1. Each endpoint can send post-handshake TLS messages at any point, commonly used for sending session tickets. It would be sad if crypto/tls was spawning a Go routine that continuously blocks on ReadHandshakeData. Instead, I'd prefer to re-add the ProcessQUICPostHandshake suggested in crypto/tls: support QUIC as a transport #44886 (comment) to the QUICConn.
  2. Section 4.1.3 specifies how to treat CRYPTO data sent at a previous encryption level (2nd bullet point). The responsibility for detecting this error condition would now be shared between crypto/tls and the QUIC implementation.
  3. Is there a plan to expose cipher suite constructors for the three cipher suites defined for TLS 1.3?
  4. Purely for testing purposes, it would be nice if it was possible to somehow make crypto/tls pick a specific cipher suite. tls.Config.CipherSuites doesn't have any effect in TLS 1.3, and I'm not suggesting to change that, the removal of that config option makes sense. However, a QUIC implementation will need to implement different code paths (e.g. for header encryption using AES and using ChaCha), and there needs to be some way to test those.
        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer. It will
        // always be called before the successful completion of a handshake.
        SetTransportParameters func([]byte) error

Is there any reason to not make the contract stricter here, e.g. "It will be called before the keys for the next encryption level are installed.". I imagine this is how it would be implemented anyway. This would be a requirement if one wanted to implemented the QUIC Version Negotiation extension. This would also be a requirement to send missing_extension alert if the quic_transport_parameter extension is missing (see Section 8.2. On the server side, it would also be nice if SetTransportParameters was called before GetTransportParameters, as the server would then be able to generate transport parameters (incl. extensions) in response to the values that the client sent.

@marten-seemann points out (#44886 (comment)) that RFC 9001 requires connections to be terminated with a TLS alert when ALPN negotiation fails. The specific language in the RFC is "unless another mechanism is used for agreeing on an application protocol, endpoints MUST use ALPN for this purpose." Since this allows for an unspecified "another mechanism", I think it's best for the crypto/tls package to stay out of the way here and leave this up to the QUIC implementation. The negative impact is that a QUIC server using ALPN won't have an opportunity to reject a connection until after the handshake completes, rather than when processing the Initial message. This seems like a minor issue.

Sending the no_application_protocol TLS alert after completion of the handshake can hardly be interpreted as "immediately". That means that using this API, it won't be possible to correctly implement the RFC.
What about adding a new field to the tls.Config.EnforceALPN bool (or alternatively a HandleALPN([]string) (string, error)). This would certainly be useful for TLS / TCP as well, and make it easier to prevent protocol confusion attacks / bugs.


@neild Please let me know once you have a branch to play around with. I'd be more than happy to integrate it into a quic-go branch (even at an early stage), so we can how the new API works in practice.

@james-lawrence
Copy link
Contributor

james-lawrence commented Feb 13, 2023

@neild think functional constructors will work better than the struct based approach here. will let us hide the internals of the structure layout.

@gopherbot
Copy link

Change https://go.dev/cl/461608 mentions this issue: crypto/tls: support QUIC as a transport

@neild
Copy link
Contributor Author

neild commented Feb 13, 2023

@marten-seemann Thanks for the comments!

Each endpoint can send post-handshake TLS messages at any point, commonly used for sending session tickets.

I was thinking that there isn't any use for post-handshake TLS messages until crypto/tls supports early data, but I'd forgotten about session tickets. I agree, we need a way to handle post-handshake data.

// HandlePostHandshakeData processes data that has become available
// after the handshake has completed.
// It must not be called before Handshake has returned successfully.
// It returns the number of bytes of data successfully processed,
// which may be zero if data does not contain a complete message.
// When n <= len(data), the caller should retain the unprocessed bytes
// and provide them with additional data when available.
func (q *QUICConn) HandlePostHandshakeData(data []byte) (n int, err error) {}

Alternatively, HandlePostHandshakeData could buffer partial messages. This would be simpler for the caller, but given that a QUIC implementation needs to have some form of input buffering to allow for packet reordering, perhaps it's best not to introduce an additional buffering layer here. What do you think?

Section 4.1.3 specifies how to treat CRYPTO data sent at a previous encryption level (2nd bullet point). The responsibility for detecting this error condition would now be shared between crypto/tls and the QUIC implementation.

My view is that the QUIC implementation should detect CRYPTO data sent at a previous encryption level, but crypto/tls will verify that the QUIC implementation has done so correctly.

Is there a plan to expose cipher suite constructors for the three cipher suites defined for TLS 1.3?

Isn't this aes.NewCipher and chacha20poly1305.New? Or is there something else I'm not thinking of?

Purely for testing purposes, it would be nice if it was possible to somehow make crypto/tls pick a specific cipher suite.

This seems like it'd be useful to have, but I think it's out of scope for this proposal.

Is there any reason to not make the contract stricter here, e.g. "It will be called before the keys for the next encryption level are installed.".

Seems reasonable. How about:

        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer.
        //
        // For client connections, SetTransportParameters will be called before
        // EncryptionLevelApplication keys are installed with SetWriteSecret.
        // For server connections, SetTransportParameters will be called before
        // EncryptionLevelHandshake keys are installed with SetWriteSecret.
        SetTransportParameters func([]byte) error

Sending the no_application_protocol TLS alert after completion of the handshake can hardly be interpreted as "immediately".

I'm convinced.

How about: If tls.Config.NextProtos is not set, then the handshake will complete successfully even if the peer doesn't provide an ALPN protocol. However, if tls.Config.NextProtos is set, then we generate a no_application_protocol alert if the peer doesn't support ALPN. This is in contrast to the non-QUIC behavior of succeeding with no negotiated protocol when the peer doesn't support ALPN.

This requires no API changes, and the only behavioral change is to return an error when QUIC is enabled, NextProtos is set, and the peer doesn't support ALPN.

Please let me know once you have a branch to play around with.

https://go.dev/cl/461608 is a very-draft implementation of the proposal. (No tests, doesn't enforce all the constraints defined in RFC 9001.)

@neild
Copy link
Contributor Author

neild commented Feb 13, 2023

@neild think functional constructors will work better than the struct based approach here. will let use hide the internals of the structure layout.

What's the motivation for hiding the internals of QUICTransport? It's a configuration struct.

@james-lawrence
Copy link
Contributor

@neild because it'll lock in the API for all time. functional constructors wont.

@neild
Copy link
Contributor Author

neild commented Feb 14, 2023

Implementing HandlePostHandshakeData got me thinking a bit more about the API for providing handshake data to the QUICConn.

The design of crypto/tls is such that we pretty much have to have a goroutine running the handshake, since the handshake is structured as blocking functions which wait for additional data at various points. It would be possible to restructure the handshake to be non-blocking, of course, but not trivial.

The QUICConn design above places the responsibility for running the handshake goroutine on the QUIC implementation.

An alternative would be to have crypto/tls run the handshake goroutine, replacing QUICTransport.ReadHandshakeData with a synchronous call on the QUICConn:

// HandleData processes handshake data received in CRYPTO frames.
//
// HandleData may call hooks on the QUICTransport.
func (q *QUICConn) HandleData(level EncryptionLevel, data []byte) error {
}

// QUICTransport hooks are only called from QUICClient, QUICServer, or QUICConn.HandleData.
// Hooks are never called asynchronously.
type QUICTransport struct {
  // HandshakeComplete is called when the TLS handshake completes successfully.
  HandshakeComplete func()

  // ...
}

(HandleData would also supersede HandlePostHandshakeData, of course.)

Internally, crypto/tls would start a goroutine to run the handshake, but the QUIC implementation could engage with it using synchronous function calls.

This has the advantage of being (I think) a simpler interface for a QUIC implementation to use, and leaving the door open for us adding a non-blocking handshake implementation some day. A disadvantage is that HandleData will block until the provided data has been processed; it would be unfortunate if QUIC implementations felt the need to run HandleData in a background goroutine to avoid blocking packet processing.

@neild
Copy link
Contributor Author

neild commented May 2, 2023

Transport parameters seem very fiddly. The server needs to access them from a callback, while the client needs to know what event precedes them becoming valid. QUICConfig.TLSConfig.GetConfigForClient will need to be a connection-scoped closure, since the server needs to associate the transport parameters with the connection. For client connections, we'll need to store the transport parameters indefinitely on the QUICConn, or define some point at which they are discarded and TransportParameters is no longer usable.

I understand the desire to avoid connection-scoped information on a Config, but I think this is all much simpler if the QUICConn can call methods on the QUIC layer's per-connection state object. Avoiding it feels like trying to design an io.Copy that doesn't take an io.Writer parameter. And for me, at least, managing locking on the application side hasn't been an issue at all; none of these changes feel like they're making life easier for me as a either a user or an implementer of this API.

That said, perhaps we could do something like this for transport parameters:

// On the client side, SetTransportParameters must be called before QUICConn.Start.
// On the server side, SetTransportParameters may be called after transport parameters
// have been received from the client.
func (q *QUICConn) SetTransportParameters(params []byte)

// QUICTransportParametersReceived indicates that transport parameters have been received from the peer.
// The parameters are provided in QUICOp.Data.
const QUICTransportParametersReceived QUICOpKind

// ErrQUICTransportParametersRequired is returned by QUICConn.HandleData when a server-side
// connection must provide transport parameters to progress.
// The user should call QUICConn.SetTransportParameters and call QUICConn.HandleData again
// (possibly with a zero-length data parameter).
var ErrQUICTransportParametersRequired error

A server-side connection can receive the client's transport parameters as a QUICOp, provide its own transport parameters, and continue. Servers that don't vary transport parameters based on the client's configuration can just provide them up-front.

@FiloSottile
Copy link
Contributor

FiloSottile commented May 2, 2023

That said, perhaps we could do something like this for transport parameters:

I do like that more than my design. I had tried to toy with interrupting the HandleData call to feed back transport parameters, but had not reached the sentinel error solution and gave up.

I understand the desire to avoid connection-scoped information on a Config, but I think this is all much simpler if the QUICConn can call methods on the QUIC layer's per-connection state object. [...] none of these changes feel like they're making life easier for me as a either a user or an implementer of this API.

With the ErrQUICTransportParametersRequired, not even GetConfigForClient needs to be a connection-scoped closure, which might finally produce the simplicity outcome of avoiding callbacks? @marten-seemann, I remember you mentioned having to deal with locking, does this API look like it would be simpler to use than the callback one?

Full API
type QUICConfig struct {
	TLSConfig *Config
}

func QUICClient(*QUICConfig) *QUICConn
func QUICServer(*QUICConfig) *QUICConn

// On the client side, SetTransportParameters must be called before QUICConn.Start.
// On the server side, SetTransportParameters may be called after transport parameters
// have been received from the client.
func (q *QUICConn) SetTransportParameters(params []byte)

func (q *QUICConn) ConnectionState() ConnectionState

func (q *QUICConn) Start(context.Context) ([]QUICOp, error)

func (q *QUICConn) HandleData(level QUICEncryptionLevel, data []byte) ([]QUICOp, error)

// ErrQUICTransportParametersRequired is returned by QUICConn.HandleData when a
// server-side connection must provide transport parameters to progress.
// The user should call QUICConn.SetTransportParameters and call
// QUICConn.HandleData again (possibly with a zero-length data parameter).
var ErrQUICTransportParametersRequired error

type QUICOpKind uint16

const (
	QUICSetReadSecret QUICOpKind = iota
	QUICSetWriteSecret
	QUICWriteData
	QUICHandshakeDone
	QUICTransportParameters
)

type QUICOp struct {
	Kind QUICOpKind

	// Set for QUICSetReadSecret, QUICSetWriteSecret, and QUICWriteData.
	Level QUICEncryptionLevel

	// Set for QUICTransportParameters, QUICSetReadSecret, QUICSetWriteSecret, and QUICWriteData.
	Data []byte // owned by crypto/tls, valid until next HandleData

	// Set for QUICSetReadSecret and QUICSetWriteSecret.
	Suite uint16
}

type QUICEncryptionLevel int

const (
	QUICEncryptionLevelInitial QUICEncryptionLevel = iota
	QUICEncryptionLevelHandshake
	QUICEncryptionLevelApplication
)

EDIT May 3: added QUICConn.ConnectionState, Start Context.

@marten-seemann
Copy link
Contributor

Still thinking about callbacks vs. non-connection-scoped config, but here's a thought I had about the QUICOp return values. To me, the QUICOp feels a bit messy and not very idiomatic.

What about defining concrete types:

func (q *QUICConn) HandleData(level QUICEncryptionLevel, data []byte) ([]any, error)

type QUICSetReadSecret struct { // equivalently for QUICSetWriteSecret
     Level QUICEncryptionLevel
     Secret []byte
}

type QUICTransportParameters struct {
     Data []byte
}

type QUICWriteData struct {
     Level QUICEncryptionLevel
     Data []byte
}

type QUICHandshakeDone struct{}

In both proposals, the receiver will have to run a select in a loop to process the return values from HandleData. With my proposal it would be a type switch instead of switching on QUICOp.Kind.

@FiloSottile
Copy link
Contributor

About QUICOp vs separate types, I went back and forth. Putting any in a public API is annoying, as there is nothing in the types that tells you what values could be returned. At least with QUICOpKind there's a list of consts.

I went with QUICOpKind because it polluted the crypto/tls godoc index less. Ultimately, I'd be happy with both, I was unsure which one to propose after all. I'd call them QUICOpSetReadSecret, QUICOpHandshakeDone, etc. though, so they always sort together.

@marten-seemann
Copy link
Contributor

After thinking this through once more, I believe that both proposal will work.

I think I'd prefer @FiloSottile's proposal though. Having a []QUICOp return value (or, as I suggested earlier, concrete types) makes it easier to understand how these events are sequenced. I find it a bit hard to wrap my head around when which callback will be called by crypto/tls.

@gopherbot
Copy link

Change https://go.dev/cl/493655 mentions this issue: crypto/tls: support QUIC as a transport (QUICEvent)

@neild
Copy link
Contributor Author

neild commented May 8, 2023

I worked up the QUICOp variant in a separate CL: https://go.dev/cl/493655 (Tests not updated yet.)

I made a couple tweaks:

  • Renamed QUICOp to QUICEvent. "Event" seemed a bit more natural when writing the documentation. I can change this back to "Op" easily enough if preferred.
  • There is a QUICTransportParametersRequired event kind rather than than an ErrQUICTransportParametersRequired error.

I've tested this CL in practice and found it workable. I don't think it's any easier to use than the callback-based approach, but it's not especially harder either.

I still prefer the callback-based API. The event-based API makes data ownership and lifetimes more complicated: Since we're returning []byte data to the caller, we need to copy the handshake bytes before returning them. We could define the handshake []byte data as being valid only up until the next HandshakeData call, but that seems very error prone. In contrast, it's well-understood that an io.Writer's Write call should not retain the []byte slice being written after returning.

Handling transport parameters is also simpler in the callback-based API: We just call a function to get the transport parameters to send at the time that we need to send them. The event-based API needs to put the handshake on hold while waiting for transport parameters. It's not particularly difficult, but it's still more complexity than the alternative.

(While "callback-based" is accurate, I think it also sounds more unusual than it actually is: While io.Copy is technically callback-based, in that it calls methods on the source and destination, we don't usually think of it as such.)

If @FiloSottile and @marten-seemann both prefer the event-based form, however, I'm happy to proceed with it.

@marten-seemann
Copy link
Contributor

I updated quic-go to use the new CL: https://github.com/quic-go/quic-go/blob/cryptotls-events/internal/handshake/crypto_setup.go#L235-L277. I kind of like the way we just process the events one after the other in a loop. Although, as expected, the QUICTransportParametersRequired is a little bit ugly.

For comparison, this is the change I made for the other CL: https://github.com/quic-go/quic-go/blob/cryptotls/internal/handshake/crypto_setup.go#L178-L185.

Most important observation: both of them work.
And regarding the code that's needed to drive any of them, I don't think any of them is obviously easier or harder than the other.

Also, I expect that I can significantly simplify the cryptoSetup struct in my code, once I drop support for Go 1.20 (I have a similar policy as the Go project in that I aim to always support the 2 most recent Go versions), and am able to fully embrace the new API. In fact, I might just back-port whatever we decide to do into my Go 1.20 crypto/tls fork :)

We could define the handshake []byte data as being valid only up until the next HandshakeData call, but that seems very error prone. In contrast, it's well-understood that an io.Writer's Write call should not retain the []byte slice being written after returning.

I don't think that would be a particularly terrible contract, but I agree that it's not the nicest one either. In general, I'm more concerned about allocs during the handshake (I've seen production servers running quic-go where the majority of all allocs comes from the crypto package in the standard library) than about memory copies though.

@neild
Copy link
Contributor Author

neild commented May 9, 2023

In general, I'm more concerned about allocs during the handshake (I've seen production servers running quic-go where the majority of all allocs comes from the crypto package in the standard library) than about memory copies though.

Returning a []byte containing the handshake data to send in a CRYPTO frame means we either allocate a new []byte for every chunk of data returned, or define the lifetime of the []byte as bounded by the next call to a QUICConn method.

The same goes for the []QUICEvent--either we allocate a new slice of events on each call, or we invalidate the previous slice at some point.

I generally don't like easy-to-misuse lifetime semantics, but perhaps it's acceptable here on the grounds that very, very few people will ever use this API directly. (Which is also my argument for preferring the callback-based API: It makes the lifetime of data clear, avoids allocations, and any lack of clarity in callback ordering is mitigated by the fact that the only people who will use this API are a small number of QUIC implementers.)

Anyway, I leave it up to @FiloSottile whether he prefers https://go.dev/cl/472621 or https://go.dev/cl/493655.

@rsc
Copy link
Contributor

rsc commented May 10, 2023

@FiloSottile it sounds like you are the decider for which variant. Do you still prefer the []QUICEvent version?

@rsc
Copy link
Contributor

rsc commented May 16, 2023

Talked to @FiloSottile and we decided on the QUICEvent version. The allocation is not a show-stopper, and it will be easier to understand and extend in the future, which should lead to fewer bugs.

@neild
Copy link
Contributor Author

neild commented May 16, 2023

Updating https://go.dev/cl/493655 with the tests now.

I'm wondering if it's simpler to provide events via a separate method:

// NextEvent returns the next event on the connection.
// NextEvent returns an event with a Kind of QUICNoEvent when there are no events available.
// New events may be generated by Start, HandleData, and SetTransportParameters.
func (q *QUICConn) NextEvent() QUICEvent

This avoids questions of ownership of the []QUICEvent slice, and the somewhat clumsy way you currently need to call HandleData(level, nil) to make progress after providing transport parameters mid-handshake.

@marten-seemann
Copy link
Contributor

I like @neild's proposal, as it provides an easy way to avoid allocations. The fact that we don't need HandleData(level, nil) is also a welcome simplification.

Any opinions / decision on QUICOp.Kind vs. concrete types, as I suggested in #44886 (comment)?

@neild
Copy link
Contributor Author

neild commented May 16, 2023

Any opinions / decision on QUICOp.Kind vs. concrete types, as I suggested in #44886 (comment)?

I'd rather stick with a single concrete QUICEvent type, with a Kind enum. It's less API surface and avoids an allocation per event.

I've updated https://go.dev/cl/493655 with tests, and changed it to report events with QUICConn.NextEvent.

@marten-seemann
Copy link
Contributor

Any opinions / decision on QUICOp.Kind vs. concrete types, as I suggested in #44886 (comment)?

I'd rather stick with a single concrete QUICEvent type, with a Kind enum. It's less API surface and avoids an allocation per event.

Sounds good to me.

I've updated https://go.dev/cl/493655 with tests, and changed it to report events with QUICConn.NextEvent.

I've made the (minimal) change to quic-go to use NextEvent in quic-go/quic-go@f507612. I like this new API. Let's ship it!

@FiloSottile
Copy link
Contributor

Oh I like NextEvent as a solution to having to interrupt HandleData for SetTransportParameters!

Agreed on a single QUICEvent type with an enum (and on calling it Event not Op, much better).

@rsc
Copy link
Contributor

rsc commented May 17, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 24, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: crypto/tls: support QUIC as a transport crypto/tls: support QUIC as a transport May 24, 2023
@rsc rsc modified the milestones: Proposal, Backlog May 24, 2023
@mvdan
Copy link
Member

mvdan commented May 24, 2023

Out of curiosity, is this still aiming for Go 1.21, or might it be pushed back to 1.22? I ask because I believe the 1.21 tree is meant to be frozen today.

@neild
Copy link
Contributor Author

neild commented May 24, 2023

We are indeed attempting to slide this in for 1.21 before the window closes.

gopherbot pushed a commit that referenced this issue May 24, 2023
Add a QUICConn type for use by QUIC implementations.

A QUICConn provides unencrypted handshake bytes and connection
secrets to the QUIC layer, and receives handshake bytes.

For #44886

Change-Id: I859dda4cc6d466a1df2fb863a69d3a2a069110d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/493655
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Marten Seemann <martenseemann@gmail.com>
@mknyszek mknyszek mentioned this issue Jun 1, 2023
420 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Projects
Status: Accepted
Development

No branches or pull requests