Validation request formation and Vary #832

Closed
mnot opened this issue Apr 6, 2021 · 13 comments

Comments

@mnot
Member

mnot commented Apr 6, 2021

#110 introduced this text regarding creation of a validation request with 7c3ecea:

It then updates that request with one or more precondition header fields. These contain validator metadata sourced from stored response(s) that have the same cache key.

(emphasis added)

We've since clarified cache key to be the whole thing - URL and any Varying request headers.

My tests show that in the following situation:

  1. A request is made and the response is stored with an ETag validator and a Vary header that indicates a value that was present in the request
  2. A subsequent request is made with a different Varying request header value

... most implementations are compliant with the intent expressed above -- they won't include the ETag of the first request in If-None-Match.

The exceptions are Chrome and Firefox, both of which will.

In other words, these two implementations are updating a request with validators from responses that have different cache keys -- responses that by definition cannot be used to satisfy the current request.
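
To make the two behaviours concrete, here is a minimal sketch (not taken from any of the tested implementations) of how a cache might pick entity-tags for If-None-Match. The `StoredResponse` type, the `X-Test` header and the `same_cache_key_only` flag are all hypothetical names for illustration, and the secondary key is assumed to be a straight string comparison of the Vary-nominated request headers, as in the test described above.

```python
from dataclasses import dataclass


@dataclass
class StoredResponse:
    """A cached response; every field name here is illustrative."""
    url: str
    etag: str
    vary: tuple            # lower-cased field names listed in Vary
    request_headers: dict  # headers of the request that produced this response


def same_secondary_key(stored, request_headers):
    """Straight string comparison of every Vary-nominated request header."""
    return all(
        stored.request_headers.get(name) == request_headers.get(name)
        for name in stored.vary
    )


def if_none_match(cache, url, request_headers, same_cache_key_only):
    """Return the If-None-Match value for a validation request, or None."""
    candidates = [r for r in cache if r.url == url]
    if same_cache_key_only:
        # Only responses that could be selected for this request.
        candidates = [r for r in candidates
                      if same_secondary_key(r, request_headers)]
    return ", ".join(r.etag for r in candidates) or None


# Step 1: a response was stored for a request carrying X-Test: one.
cache = [StoredResponse("https://example.com/a", '"v1"',
                        ("x-test",), {"x-test": "one"})]
# Step 2: a later request carries a different value.
second_request = {"x-test": "two"}

strict = if_none_match(cache, "https://example.com/a", second_request, True)
loose = if_none_match(cache, "https://example.com/a", second_request, False)
```

With `same_cache_key_only=True` no validator is sent for the second request; with `False` the entity-tag from the first response is sent even though that response cannot be selected, which appears to be what Chrome and Firefox do.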

It might make sense to use other stored responses for validators if they're selectable -- e.g., when Accept-Language is what's being varied on, and there are several acceptable languages in cache. However, it makes no sense when the selecting header doesn't allow for selection of multiple representations (such as when it's a straight string comparison, as is the case in my test).

Arguably, sending extra validators doesn't cause any harm, as long as the cache properly performs selection before it processes preconditions (as specified somewhat opaquely here). However, UAs always have better information about their capabilities and user preferences, so I strongly suspect there are going to be some nasty edge cases here where a cache is going to need to trust the UA about selection (especially if Client Hints start getting traction).

At a minimum, I think we need to clarify the language above; e.g., to:

It then updates that request with one or more precondition header fields. These contain validator metadata sourced from stored response(s) that could be selected for the request [ref].

I'd also like to consider giving stronger guidance, e.g., by appending:

A validating request SHOULD NOT contain other validator metadata.

(that's very rough; it needs adjustment, but you get the idea)

We might also need to clarify the language about selection in Handling a Received Validation Request.

// cc @martinthomson and @davidben for implementer perspectives.

@mnot mnot added the caching label Apr 6, 2021
@reschke
Contributor

reschke commented Apr 6, 2021

Sounds right to me.

@davidben

davidben commented Apr 6, 2021

@morlovich can probably speak to our HTTP caching implementation.

However, UAs always have better information about their capabilities and user preferences, so I strongly suspect there are going to be some nasty edge cases here where a cache is going to need to trust the UA about selection (especially if Client Hints start getting traction).

I'm not sure I follow this concern. It sounds like this is ultimately a question of whether an ETag is scoped to the primary key or (primary, secondary) tuple. If we've thus far said it was scoped to the whole primary key, it seems that should be fine. It's true that Client Hints, like fetch() and XMLHttpRequest, are another place where UA request headers vary, but all these mechanisms ultimately turn into request headers, which Vary and friends handle.

Could you elaborate on the edge cases here?

@reschke
Contributor

reschke commented Apr 7, 2021

Is this something that needs to be resolved before IETF LC?

@mnot
Member Author

mnot commented Apr 7, 2021

I think we can consider it as an LC issue.

@royfielding
Member

I think the spec is correct as is, or at least it was before cache key was redefined, and is only slightly less efficient now.

In other words, these two implementations are updating a request with validators from responses that have different cache keys -- responses that by definition cannot be used to satisfy the current request.

They are literally saying "here is a request, but I have these variants stored already, so just let me know if I should deliver one of them instead of the new one you generate specific to this request".

It doesn't matter that the etags reflect variants that don't match the cache key -- the origin server decides what matches the cache key and that decision is multilevel (might differ based on how far the request proceeds in the resource mapping process) and varies over time. Likewise, the origin can decide that it is too busy to generate a new representation and direct the cache to serve one of those instead.
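
As a rough illustration of that origin-side decision (a sketch under stated assumptions, not any server's actual code): the origin selects a representation by whatever means it likes, then checks whether the cache already offered that representation's entity-tag. `origin_respond` and the sample tags are hypothetical; the parser is deliberately simplistic and ignores `*` and commas inside quoted tags. If-None-Match uses weak comparison, so a leading `W/` is ignored when matching.

```python
def parse_if_none_match(value):
    """Split a simple If-None-Match list into individual entity-tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def opaque(tag):
    """Drop the weakness indicator so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def origin_respond(selected_etag, if_none_match):
    """Return (status, headers) for the representation the origin selected."""
    offered = parse_if_none_match(if_none_match or "")
    if any(opaque(selected_etag) == opaque(tag) for tag in offered):
        # The cache already holds this representation: tell it which one.
        return 304, {"ETag": selected_etag}
    # Otherwise a full response would be generated (body omitted here).
    return 200, {"ETag": selected_etag}


hit = origin_respond('"fr-v3"', '"en-v1", "fr-v3"')
miss = origin_respond('"fr-v3"', '"en-v1"')
```

Here `hit` is a 304 naming `"fr-v3"`, regardless of which cache key the cache originally stored it under; `miss` is a 200 because none of the offered tags identify the selected representation.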

It's important to keep in mind that Vary is an instruction to caches on what they are allowed to do with this representation. It is not a restriction on the resource, nor on the origin server.

@davidben

davidben commented Apr 7, 2021

That's a good point. Especially without draft-ietf-httpbis-variants, the representation the cache has may well apply to multiple sets of request headers.

@mnot
Member Author

mnot commented Apr 26, 2021

Especially without draft-ietf-httpbis-variants, the representation the cache has may well apply to multiple sets of request headers.

The proposal above doesn't affect situations where information about the selecting headers is available to all relevant caches, like with variants; the text allows sending selectable entity-tags, and variants gives the client enough information to know that.

It doesn't matter that the etags reflect variants that don't match the cache key -- the origin server decides what matches the cache key and that decision is multilevel (might differ based on how far the request proceeds in the resource mapping process) and varies over time. Likewise, the origin can decide that it is too busy to generate a new representation and direct the cache to serve one of those instead.

Right, but If-None-Match is also processed by intermediary caches, and they don't necessarily have enough information to select a single response for validation. That means that when more than one listed entity-tag is selectable, they need to either forward it to origin, or guess. Most intermediaries will be biased towards guessing for performance reasons.

It's important to keep in mind that Vary is an instruction to caches on what they are allowed to do with this representation. It is not a restriction on the resource, nor on the origin server.

Yes. Allowing non-selectable entity-tags into validating requests allows an origin to effectively 'change its mind' about what it said previously, and extend the selectability of a stored response to a new request. At least in the near future, I think that capability is of extremely limited value; it's theoretically interesting, but not particularly useful on a day-to-day basis.

OTOH in situations where browsers have more information -- for example, if they were to implement Variants first, or if they knew more about the client's relative preferences than they could emit in request headers -- constraining the entity-tags sent could help an intermediary make a selection decision when it has more than one stored response.

A counter-argument here could be that in some situations, intermediaries might have more information about selection than clients. That might be, but I suspect that information would be used to further pare down the list of candidates, rather than select from something that was previously not applicable to this request.

So, I think it's overall better protocol design for the entity-tags listed to be constrained. However, it's pretty clear this is a change from 7230 that will make some implementations non-conformant. So, perhaps we could make progress by:

  1. Changing Sending a Validation Request to be more open regarding the candidate stored responses used for validator metadata (i.e., backing out this perhaps unintentional change)
  2. Adding non-normative advice that only selectable stored responses should be used when creating a validating request unless the implications are understood (pointing to (4) below)
  3. More strongly emphasising the need to perform selection in Handling a Received Validation Request
  4. Giving some advice to caches about selecting from multiple stored responses when handling a received validation request -- or at least illustrating the tradeoffs
  5. Looking at Semantics 13.1.2's use of selected representation and the implication that there's always only one

@davidben

Right, but If-None-Match is also processed by intermediary caches, and they don't necessarily have enough information to select a single response for validation. That means that when more than one listed entity-tag is selectable, they need to either forward it to origin, or guess. Most intermediaries will be biased towards guessing for performance reasons.

I may be misunderstanding this, but why does the intermediary cache need to guess? It sounds like this is, as with other cases, a question of what is the scope of an etag. If we believe an etag is scoped to the primary key, then:

  • Clients may offer If-None-Match across secondary keys.
  • Intermediaries may match them across secondary keys.
  • Origins may not reuse etags for different representations across secondary keys

If we believe an etag is scoped to the (primary, secondary) tuple, then:

  • Clients may not offer If-None-Match across secondary keys.
  • Intermediaries may not match them across secondary keys.
  • Origins may not reuse etags for different representations across secondary keys

The first interpretation results in slightly better caching and matches existing behavior. Hopefully origins aren't misbehaving here, but given how long it's worked this way, I think we've pretty clearly decided on a behavior by now.
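
Both interpretations share the last bullet, so it is the one invariant that can be checked mechanically. A minimal sketch, assuming a cache (or test harness) can enumerate what it has stored for a single URL as `(secondary_key, etag, body)` tuples; `etag_collisions` is a hypothetical helper, and comparing body bytes is only meaningful for strong entity-tags.

```python
import hashlib
from collections import defaultdict


def etag_collisions(stored):
    """Map each entity-tag to the set of body digests seen under it.

    `stored` is an iterable of (secondary_key, etag, body) tuples for ONE
    primary key (URL). Only tags used for more than one body are returned.
    """
    digests = defaultdict(set)
    for _secondary_key, etag, body in stored:
        digests[etag].add(hashlib.sha256(body).hexdigest())
    return {etag: seen for etag, seen in digests.items() if len(seen) > 1}


stored = [
    ("en", '"a"', b"hello"),
    ("fr", '"b"', b"bonjour"),
    ("de", '"a"', b"hallo"),  # same tag as "en", different body
]
bad = etag_collisions(stored)
```

An origin that produced the `stored` list above would be misbehaving under either reading, because `"a"` names two different representations of the same resource. The same tag appearing under several secondary keys with an identical body is not flagged.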

OTOH in situations where browsers have more information -- for example, if they were to implement Variants first, or if they knew more about the client's relative preferences than they could emit in request headers -- constraining the entity-tags sent could help an intermediary make a selection decision when it has more than one stored response.

I don't follow this. Variants doesn't introduce request header fields, only response header fields. The only place a browser could implement variants is in the HTTP cache. Variants is useful if you see many secondary keys for the same primary key. That's much more likely in something like a CDN, than something like a browser.

What exactly are you suggesting browsers do with what information? (I also didn't understand an earlier remark about better information over in #832 (comment) so this might be the same request for clarification.)

@mnot
Member Author

mnot commented Apr 28, 2021

I think this is all an aside, but to answer your questions:

An intermediary cache needs to guess when it receives a request like this (for illustration only):

Accept-Language: en, fr
Accept: text/html, application/pdf

... and it has some combination of English, French, HTML and PDF responses in cache, but none that were cached based upon a request like this one (inserting q-values might help make a decision in one plane or the other, but it doesn't resolve preferences across both planes of negotiation).

For a request like this, the cache key is ambiguous. It has to select exactly one if it wants to generate a 304, so it needs to choose carefully.
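
A small sketch of why this is ambiguous, for illustration only. It assumes the cache knows the language and media type of each stored response (in practice that would need something like Variants, or inspection of Content-Language and Content-Type); the `stored` entries, their tags and the `acceptable` helper are all made up, and q-values are deliberately ignored.

```python
def parse_list(value):
    """Split a comma-separated field value, dropping parameters such as q=."""
    return [item.split(";")[0].strip().lower()
            for item in value.split(",") if item.strip()]


def acceptable(stored, request_headers):
    """Return every stored response that satisfies both negotiation planes."""
    langs = parse_list(request_headers["accept-language"])
    types = parse_list(request_headers["accept"])
    return [r for r in stored
            if r["language"] in langs and r["type"] in types]


stored = [
    {"etag": '"en-pdf"', "language": "en", "type": "application/pdf"},
    {"etag": '"fr-html"', "language": "fr", "type": "text/html"},
    {"etag": '"de-html"', "language": "de", "type": "text/html"},
]
request = {"accept-language": "en, fr",
           "accept": "text/html, application/pdf"}

candidates = acceptable(stored, request)
```

Two candidates survive (English PDF and French HTML), and nothing in the request says whether language or format matters more, so a cache that wants to answer with a 304 has to pick one of them.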

If the client also includes:

If-None-Match: "catalonian-css" <-- again, just for illustration

... the first question you'd ask is "why the hell would you do that?", and then you'd start wondering if the client knows something you don't.

WRT Variants -- each response header field defines its own algorithm for selection. If a cache doesn't implement that field, it falls back to Vary for it; caches will implement new fields at different rates.

P.S. entity-tags are very clearly scoped to the resource (primary key).

@mnot
Member Author

mnot commented Apr 28, 2021

See PR. I haven't tried to address this issue of multiple selected responses because it would be pretty invasive, and we'd need another pretty big cycle to get it right.

@reschke
Contributor

reschke commented Apr 29, 2021

Can this one be closed?

@mnot
Member Author

mnot commented Apr 30, 2021

I'm inclined to leave it open so we can remember to bring it up during IETF LC; it needs wider review since it slipped between WGLC and IETF LC.

@mnot
Member Author

mnot commented Jul 15, 2021

Hmm, we've made other changes in response to LC feedback and closed the issues, so closing this too (am assuming we'll send a summary out).

@mnot mnot closed this as completed Jul 15, 2021

4 participants