New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: Transport should handle 421 #18341

Open
bradfitz opened this Issue Dec 16, 2016 · 7 comments

Comments

Projects
None yet
4 participants
@bradfitz
Copy link
Member

bradfitz commented Dec 16, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

What operating system and processor architecture are you using (go env)?

What did you do?

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

What did you expect to see?

What did you see instead?

@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Jan 4, 2017

Triage, @bradfitz?

@bradfitz bradfitz added this to the Unreleased milestone Jan 4, 2017

@bradfitz bradfitz added the help wanted label Jan 4, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Jan 4, 2017

Thanks. Yeah, we could still do this sometime. It's just lowish priority, since I've never seen one in the wild and the spec only says we "MAY" retry the request.

http://httpwg.org/specs/rfc7540.html#rfc.section.9.1.2

/cc @tombergan

@meirf

This comment has been minimized.

Copy link
Contributor

meirf commented Feb 5, 2017

@bradfitz

The title of this issue puts the retry responsibility in the hands of Transport.

I don't see how the transport code could handle a retry. According to the RoundTripper, "RoundTrip should not attempt to handle higher-level protocol details such as redirects, authentication, or cookies." Though a redirect is not the same as a retry, they are at a similar abstraction level. I.e., callers of transport's round tripper would probably not expect that there was an intermediate 421 response before an additional request. I know there is a kind of precedence here with acting on 100 responses. Maybe you're saying that despite 4xx being client errors, 421 should be on the level of 1xx in terms of client's expectation of involvement.

(There's also the language of "should" vs "must". The quote above uses should, whereas a prior sentence in that godoc RoundTripper paragraph, uses "must" regarding error responses - "...RoundTrip must return err == nil..." Therefore, the restriction is not an absolute requirement.)

I'm new here so please let me know if I'm missing something.

@tombergan

This comment has been minimized.

Copy link
Contributor

tombergan commented Feb 6, 2017

There is more detail in RFC 7838. I believe what can happen is:

  1. Client is informed that foo.cdn.com is an Alt-Svc for foo.com
  2. Client sends http://foo.com/request to foo.cdn.com
  3. foo.cdn.com replies with 421 (I don't fully understand why the server would do this in practice, but I guess it could happen if foo.com is removed from the CDN and the client's Alt-Svc cache has not been updated)

RFC 7838 Section 6 says that the Alt-Svc cache entry MUST be cleared, then the request MAY be retried:

Clients receiving 421 (Misdirected Request) from an alternative service MUST remove the corresponding entry from its alternative service cache (see Section 2.2) for that origin. Regardless of the idempotency of the request method, they MAY retry the request, either at another alternative server, or at the origin.

Since this is a MAY, I tend to agree with @meirf that RoundTrip should not automatically retry. Instead, the implementation can remove foo.cdn.com from the Alt-Svc cache to guarantee that a subsequent retry will use foo.com.

However, RFC 7540 Section 9.1.2 says:

[Receiving 421] is possible if a connection is reused (Section 9.1.1)

I really don't understand why this would happen -- isn't that what GOAWAY is for? In any case, I think the http2 client library could handle this by treating 421 as GOAWAY(NOERROR).

@meirf

This comment has been minimized.

Copy link
Contributor

meirf commented Feb 13, 2017

I really don't understand why this would happen

Here's an example of 421 because of connection reuse.

isn't that what GOAWAY is for?

I think GOAWAY is too extreme for reuse-caused 421s. Specifically, 6.8 says:

The GOAWAY frame (type=0x7) is used to initiate shutdown of a connection or to signal serious error conditions...Receivers of a GOAWAY frame MUST NOT open additional streams on the connection, although a new connection can be established for new streams...The GOAWAY frame applies to the connection, not a specific stream.

GOAWAY prevents any additional steams, but 421 (at least for connection reuse) indicates that the connection can be used again - just not for requests destined for the problematic host.

I think the http2 client library could handle this by treating 421 as GOAWAY(NOERROR)

Indeed, we want the request to be retried on a different connection just as we do for GOAWAY(NOERROR). We agree that http.Transport should not be retrying as discussed above. So http.Client might have a boolean field saying retry on 421, which defaults to false. The client has a Transport field which is of type RoundTripper. A 421 is received and the client knows it should tell the transport to try the request on a new/different connection. But the RoundTripper interface in Client only takes a Request - no other options. Changing the API is probably not what we want. There is a RoundTripOpt in x/net/http2 which takes options and is bundled into h2_bundle.go. But RoundTrip calls into RoundTripOpt not vice versa.

I'd love to submit some code for this, but I don't see how it would work. I'm open to any ideas/corrections on what I've written here or above.

@tombergan

This comment has been minimized.

Copy link
Contributor

tombergan commented Feb 13, 2017

@meirf, thanks for the virtual hosts example. I knew about 421 with Alt-Svc, but not virtual hosts. Before diving into a solution, I want to make sure we completely understand the problem. Specifically, I want to understand all the ways a client may decide to send a request to a server that will return 421. I believe there are (at least) three ways this can happen:

  1. Via Alt-Svc (RFC 7838), the client previously discovered that foo.com is also available at cdn.com. The client happens to have an open connection to cdn.com, so it sends requests to foo.com on that connection rather than opening a new connection.

  2. The client has a connection open to www.foo.com. The TLS cert for that connection names multiple hosts (perhaps using a wildcard), including www1.foo.com. The client recognizes this, so it sends requests for www1.foo.com over the same connection.

  3. foo.com is served by three distinct IPs, each of which are virtual hosts. One of those IPs is misconfigured and will return 421 for all foo.com requests. The client sees all three IPs in the DNS record for foo.com. If the client unluckily selects the wrong IP, it will get a 421.

I believe 1 and 2 cannot happen in the current Go library because Go does not implement Alt-Svc or connection sharing when advertised by TLS certs (@bradfitz, correct me if I'm wrong). The third case can happen, but unfortunately, this is trickier to fix because we'd need to blacklist specific IPs in the dialer so we don't end up with the same IP when redialing a new connection.

Does that sound right?

@meirf

This comment has been minimized.

Copy link
Contributor

meirf commented Mar 1, 2017

Thanks for bringing up the status quo: "I believe... connection."

I think it's worthwhile to go back to 9.1.2. The (unfortunately vague) gist is

the request was directed at a server that is not able to produce a response

It then lists 3 categories of cases when this might happen.

Case A

server that is not configured to produce responses for the combination of scheme and authority that are included in the request URI

Case B

a connection is reused (Section 9.1.1)

Case C

an alternative service is selected [ALT-SVC].

Cases B & C are the only cases referred to in the context of retrying. (For A, we would never want to retry the request because it would certainly fail.) For examples of B, the spec describes one with a middlebox. As another example, I referenced one above with virtual hosts that had different ssl configurations. For C, 7838 says that this happens when "an alternative service is not authoritative". One way to be authoritative is to present a TLS certificate that’s valid for the origin . 7838 also says "they MAY retry the request, either at another alternative server, or at the origin."

Going back to the 3 cases you listed.

  1. I believe this conflates Case B and Case C together. Either the alternate service aspect or the connection reuse aspect is enough to get a 421 in some scenarios.
  2. This is a vanilla case of when an http2 + ssl connection can be reused: "host having resolved to the same IP address" and "a certificate that is valid for the host in the URI". (7540)
  3. This is just an example I gave of Case B. I wouldn't put it into a separate case/category.

Go does not implement Alt-Svc.

If it does not implement Alt-Svc/there is no plan to, then we can rule out Case C.

Go does not implement ... connection sharing when advertised by TLS certs.

I think this is correct - connection sharing is done only on the basis of authority (host:port). Though there is a currently a "TODO: add support for sharing conns based on cert names". The spec's only example of 421 in case B and the only cases I've seen involve different authorities that are covered by a single cert. Because of that, I assume there are currently no cases of 421 occurring in Case B with the current implementation.

The third case can happen

I don't see how the third case could happen. The vhosts have different authorities. Additionally, it requires connection reuse based on cert because it's https.

If Cases A, B and C are exhaustive and my assumption ("...there are currently no cases...") is correct, then it looks like retrying would accomplish nothing right now. It would only be valuable once alt-svc or sharing conns based on cert names was implemented. Even at that point, we would still have to address the issue I brought up way earlier - "retry responsibility".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment