# [HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy #26532

Merged
merged 1 commit into from Feb 25, 2019

## Conversation

Projects
None yet
10 participants
Contributor

### aschempp commented Mar 14, 2018 • edited

Q A
Branch? 3.4
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #26245, #26352, #28872
Doc PR -

This PR is a first draft to fix the incorrect merging of private and other cache-related headers that are not meant for the shared cache but the browser (see mentioned issues).

The existing implementation of HttpFoundation\Response is very much tailored to the HttpCache, for example isCacheable returns false if the response is private, which is not true for a browser cache. That is why my implementation does not longer use much of the response methods. They are however still used by the HttpCache and we should keep them as-is. FYI, the ResponseCacheStrategy does not affect the stored data of HttpCache but is only applied to the result of multiple merged subrequests/ESI responses.

I did read up a lot on RFC2616 as a reference. Section 13.4 gives an overall view of when a response MAY be cached. Section 14.9.1 has more insight into the Cache-Control directives.

Here's a summary of the relevant information I applied to the implementation:

• Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation.

A response without cache control headers is totally fine, and it's up to the cache (shared or private) to decide what to do with it. That is why the implementation does not longer set no-cache if no Cache-Control headers are present.

• A response received with a status code of 200, 203, 206, 300, 301 or 410 MAY be stored […] unless a cache-control directive prohibits caching.

A response received with any other status code (e.g. status codes 302 and 307) MUST NOT be returned […] unless there are cache-control directives or another header(s) that explicitly allow it.

This is what ResponseCacheStrategy::isUncacheable implements to decide whether a response is not cacheable at all. It differs from Response::isCacheable which only returns true if there are actual Cache-Control headers.

• Section 13.2.3: When a response is generated from a cache entry, the cache MUST include a single Age header field in the response with a value equal to the cache entry's current_age.

That's why the implementation always adds the Age header. It takes the oldest age of any of the responses as common denominator for the content.

• Section 14.9.3: If a response includes an s-maxage directive, then for a shared cache (but not for a private cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header.

This effectively means that max-age, s-maxage and Expires must all be kept on the response. My implementation assumes that we can only do that if they exist in all of the responses, and then takes the lowest value of any of them. Be aware the implementation might look confusing at first. Due to the fact that the Age header might come from another subresponse than the lowest expiration value, the values are stored relative to the current response date and then re-calculated based on the age header.

The Symfony implementation did not and still does not implement the full RFC. As an example, some of the Cache-Control headers (like private and no-cache) MAY actually have a string value, but the implementation only supports boolean. Also, Custom Cache-Control headers are currently not merged into the final response.

ToDo/Questions:

1. Section 13.5.2 specifies that we must add a Warning 214 Transformation applied if we modify the response headers.

2. Should we add an Expires headers based on max-age if none is explicitly set in the responses? This would essentially provide the same information as max-age but with support for HTTP/1.0 proxies/clients.

3. I'm not sure about the implemented handling of the private directive. The directive is currently only added to the final response if it is present in all of the subresponses. This can effectively result in no cache-control directive, which does not tell a shared cache that the response must not be cached. However, adding a private might also tell a browser to actually cache it, even though non of the other responses asked for that.

4. Section 14.9.2: The purpose of the no-store directive is to prevent the inadvertent release or retention of sensitive information […]. The no-store directive applies to the entire message, and MAY be sent either in a response or in a request. If sent in a request, a cache MUST NOT store any part of either this request or any response to it. If sent in a response, a cache MUST NOT store any part of either this response or the request that elicited it.

I have not (yet) validated whether the HttpCache implementation respects any of this.

5. As far as I understand, the current implementation of ResponseHeaderBag::computeCacheControlValue is incorrect. no-cache means a response must not be cached by a shared or private cache, which overrides private automatically.

6. The unit tests are still very limited and I want to add plenty more to test and sort-of describe the implementation or assumptions on the RFC.

#SymfonyConHackday2018

Member

### nicolas-grekas left a comment • edited

 Some minor comments to start the review. My most important comment is that this should be merged as a new feature to me, thus rebased against master.

Member

### fabpot commented Mar 21, 2018

 If you have time @mpdude, I would love to get your review on this one.

Closed

Contributor

### mpdude commented Mar 23, 2018 • edited

 I'd really like to help here! It's a bit difficult, though, as the description does not directly make clear what we're after. After reading the linked issues my understanding is that we want to enable the ResponseCacheStrategy to deal with private responses and come up with a result better than no-cache, must-revalidate – that is, provide an expiration time. Before discussing the implementation, can we agree on what the merge policy needs to be? I have reviewed the existing tests and think that the merge rules could be as follows. If we have no embedded response, leave the master response alone. If one of the responses is private, the result must be private. Otherwise, all responses are public. Validation-type headers (Last-Modified, ETag) need to be removed. Validation cannot reasonably be used on merged responses. If the result is public, compute the final s-maxage by looking for the smallest value among all responses, considering s-maxage, max-age and Expires in this order. If only a single response does not provide any of these, the result must use no-cache, must-revalidate. Try to compute a final max-age by looking for the smallest value among all responses, considering max-age and Expires in this order. If only a single response does not provide any of these two, we cannot compute max-age. If the result is private and we don't have the max-age, we need to resort to no-cache, must-revalidate. Calculate Age in a way that the resulting response becomes stale as soon as the lowest TTL of any response expires. [Do we have to consider s-maxage or max-age here?] Does that make sense?
Contributor Author

### aschempp commented Mar 25, 2018

 agreed If not all of the responses are marked as public, the final should not have public imho. Not having a public directive is not the same as not having any directive. In the same sense, having one private response (allow browser caching) does not mean all responses can be cached by the browser. Only if all other are either public or private a browser should be allowed to cache the result. FYI: this might not be implemented in my PR. agreed Why do we merge multiple header values that are not meant to be the same? s-maxage is for shared caches only max-age is for private caches as well as for shared caches if no s-maxage is defined Expires headers are similar to max-age, but mostly useful for HTTP/1.0 protocol. Expires can existing in combination with max-age to control or prohibit caching for HTTP/1.0 servers, especially if they do not support features like must-revalidate. same as 3. A private directive without max-age is perfectly fine. Each cache definition is valid without other. It's up to the cache server to decide on a cache duration if none is given. This assumes all responses are marked as private but all or some of them do not have a max-age. The age has nothing to do with expiration time. Age is determined by the difference between Date header (when the response has been added to cache) and now. After merging multiple responses, age must represent the oldest response's age imho. I still don't understand why Symfony generally adds a no-cache, must-revalidation. Quoting from #26245 A no-cache directive essentially tells all caches (and that includes the Browser, which is a private cache) to no cache the response. I think you're misreading the RFC you quoted? no-cache means "don't serve this from a cache without revalidation", private means "store in private caches only". no-store is the way of stating "don't keep this in caches". You're somewhat right with my incorrect interpretation. I wasn't very sure about revalidation, so I looked it up: https://stackoverflow.com/questions/18148884/difference-between-no-cache-and-must-revalidate#19938619 So essentially, no-cache, must-revalidate allows a cache to store it but never to use it without revalidation. And we are removing both possible revalidation options (Etag and Last-Modified, see 2.), which essentially results in a useless waste of storage? Also, no-cache implies must-revalidate, so the second directive is not useful either?

Open

Closed

Contributor

### Toflar commented Apr 9, 2018

 Any progress here? The current implementation breaks client cache headers when working with ESI fragments :(
Member

### nicolas-grekas commented Apr 14, 2018

 friendly ping @aschempp
Contributor Author

### aschempp commented Apr 15, 2018 • edited

 I have updated the PR to fix some mentioned coding style and implemented point 1 in my comment. I know there's more, but I think we should discuss the general idea before I finish the implementation. However, most of the notes in my thoughts in the last comment still stand, so I guess we should also ping @mpdude 😁 Before we discuss the fine details of this though, we should agree or even write down our thoughts on the supposed default behavior. Symfony always adds cache control headers which I consider strange. Especially the second if-condition means I can have an Expires header but the ResponseHeaderBag will add a cache-control directive that "breaks" this for any HTTP/1.1 cache. According the the RFC, not having a Cache-Control header is perfectly fine, but that is not possible with Symfony. Is this behavior considered correct @mpdude / @nicolas-grekas ? And if we want to be conservative (prevent caching if no caching headers are given), I think we should still correctly handle HTTP/1.0 headers?
Contributor Author

### aschempp commented Apr 30, 2018

 Is there anything I can do to push this forward?
Member

### nicolas-grekas commented Jun 19, 2018

 @aschempp would you mind rebasing please?

### aschemppforce-pushed the aschempp:http-header-merge branch from 5f3f951 to b98e25aJun 20, 2018

Contributor Author

### aschempp commented Jun 20, 2018

 Rebase is done. The change from aschempp@3d27b59#diff-e5a339b48cec0fa0a4d0c5c4c705f568R75 has been dropped, but I preserved the unit test and they are still passing.
Contributor Author

### aschempp commented Jun 20, 2018

 I just looked at the disabled unit tests and will follow up with a few changes shortly.
Contributor Author

### aschempp commented Jun 20, 2018

 Sorry for taking a while. I tried to implement merging of Last-Modified and Etag but realized this won't work reliably. They now render a response uncachable without additional expiry headers. One thing I'm not sure about is the default Response behavior. If there are not Cache-Control headers, Symfony always adds private, must-revalidate. https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpFoundation/ResponseHeaderBag.php#L309-L331 First of all, private means a browser is allowed to cache it, which is not the same as no Cache-Control provided, ie. probably not cacheable. Secondly, must-revalidate does not make sense because there is never any validation information (Last-Modified and Etag are removed), a validation will never be possible.
Contributor Author

### aschempp commented Jun 21, 2018

 I have removed two assertions in the HttpCacheTest to fix tests. I'm very well aware that removing assertions to fix a test is not really a practical solution. However, as explained in #26245 (comment), I think the assertions (and Symfony behavior) were wrong in the first place …

### ausi reviewed Jun 21, 2018

Contributor Author

### aschempp commented Jul 10, 2018

 Any follow-up questions for this PR I need to address?
Member

### nicolas-grekas commented Jul 26, 2018

 (could you please rebase on latest 3.4?)
Contributor

### mpdude commented Jul 27, 2018

 For me, this PR is too confusing to give meaningful feedback. Can we somehow simplify this or break it into smaller pieces we could discuss independently?
Contributor Author

### aschempp commented Jul 30, 2018

 @mpdude did you look at the diff or at the new version of the file directly? What can I do to simplify the changes? I think they are all related, there are no "individual features" that could be split into smaller pieces. @nicolas-grekas I will rebase as soon as any questions are resolved. Previously all changes in the old file were irrelevant because it is basically a complete rewrite of the class.

### Toflar referenced this pull request Oct 15, 2018

Closed

#### HttpCache does not merge directive no-store is correctly #28872

Contributor Author

### aschempp commented Oct 22, 2018

 Is there anything I can do to get this merged at some point?
Member

### fabpot commented Dec 8, 2018

 @aschempp Can you rebase this pull request? We cannot merge a pull request with a merge commit. Thank you.

### stof requested changes Dec 8, 2018

#### aschempp Dec 18, 2018

Author Contributor

That would not be the same!

• Your suggestion means that a response with either ETag or Last-Modified willMakeFinalResponseUncacheable. That is not what I (tried to) implement
• The final response is uncache if
1. ( it is not a given status code OR it has ETag or Last-Modified )
2. AND it does not have any other caching info (like max-age).

The implementation just works the opposite way.

1. If it has a given status code and none of the headers, we already know it will not
…MakeFinalResponseUncacheable
2. If either it has a different status code OR it has one of the headers, we must also check for Cache-Control headers

#### dbu Dec 18, 2018

Contributor

right, this is not the same. hm, so if status is 200 and we have no etag/last-modified, we say its ok to cache, even if max-age is set to 0? ah, but then we would still mark the final result with max-age: 0, and same goes for private.

okay, then i agree this is correct, though somewhat counterintuitive. can you mention this in the phpdoc, that cache-control instructions are handled elsewhere?

also, if the fragment has no cache-control header but the master response has max-age: 1 day, would we end up with caching the combined response for a whole day? or should we assume a default max-age when caching something with status 200 and no cache-control instruction? varnish takes 2 minutes in that case, by default...

#### aschempp Dec 18, 2018

Author Contributor

hm, so if status is 200 and we have no etag/last-modified, we say its ok to cache, even if max-age is set to 0?

No, we don't say its ok to cache. We just determine that this response does not make the final response uncacheable. The final response can still have no useful caching info and therefore be not cacheable.

if the fragment has no cache-control header but the master response has max-age: 1 day, would we end up with caching the combined response for a whole day?

There is no difference between ESI fragments and the master response. They are all sent to the add method, and the result of ALL responses (regardless of their type) is added in the update method. So if any - regardless if master or fragment - has no mergeable data, nothing will be added to the final response.

#### dbu Dec 19, 2018

Contributor

ah right, this happens here: https://github.com/symfony/symfony/pull/26532/files#diff-e5a339b48cec0fa0a4d0c5c4c705f568R212 - if one response has no max-age we set that info to false.

so this should indeed be fine. maybe put part of this explanation into the phpdoc?

We just determine that this response does not make the final response uncacheable. The final response can still have no useful caching info and therefore be not cacheable.

#### dbu Jan 30, 2019

Contributor

can you port the gist of this discussion into the comment here?

src/Symfony/Component/HttpKernel/HttpCache/ResponseCacheStrategy.php

Contributor

### dbu left a comment

 i think its important that we improve things here. it is quite confusing to reason about these things, as interactions are quite complicated with the many directives. i think we need to tweak the code to be easier to understand to be more confident we get it right this time. could you have another iteration to improve the readability of this code?
src/Symfony/Component/HttpKernel/HttpCache/ResponseCacheStrategy.php
 } elseif (null !== $maxAge = min($this->maxAges)) { $response->setSharedMaxAge($maxAge); $response->headers->set('Age',$maxAge - min(\$this->ttls)); // Last-Modified and Etag headers cannot be merged, they render the response uncacheable

#### dbu Dec 16, 2018

Contributor

maybe formulate it as "If an embedded response uses Last-Modified or an Etag, the combined response is not cacheable."

hm, but the code is saying that if we have a success status AND no last modified / etag, it certainly is cacheable.
should we instead say that if there is either etag or last-modified, we return true?

and to simplify reasoning in this method, i suggest flipping this method over to keepsFinalResponseCacheable and switch true / false. and invert the bool where we call the method.

src/Symfony/Component/HttpKernel/HttpCache/ResponseCacheStrategy.php
src/Symfony/Component/HttpKernel/HttpCache/ResponseCacheStrategy.php Outdated
src/Symfony/Component/HttpKernel/HttpCache/ResponseCacheStrategy.php
src/Symfony/Component/HttpKernel/HttpCache/ResponseCacheStrategy.php Outdated

### dbu reviewed Dec 19, 2018

 * we have to subtract the age so that the value is normalized for an age of 0. * * If the value is lower than the currently stored value, we update the value, to keep a rolling * minimal value of each instruction.

#### dbu Dec 19, 2018

Contributor

lets also mention that this method is always called and if there is no setting, the default value is null, and we won't set the information on the final response if it was not present on one of the responses.

#### dbu Jan 30, 2019

Contributor

formulation suggestion:

This method is always called, even for responses that do not have the respective instruction in
their Cache-Control header (or have no Cache-Control header at all). If any response does
not have an instruction, we do not set that instruction on the final response.


Member

### nicolas-grekas commented Jan 27, 2019

 What's the status here? Please rebase to account for short arrays also.
Contributor Author

### aschempp commented Jan 27, 2019

 This PR is still ready to me. Be aware that it currently points to 3.4 as a bugfix, so I'm not sure about short arrays? I have finished everything requested at the SymfonyCon, but we haven't made any more progress since. I'm not sure who's in charge of a final decision on what needs to be completed and actually merge this?
Member

### nicolas-grekas commented Jan 27, 2019

 Short arrays have been applied to 3.4 also, that's why a rebase is needed :)

### aschemppforce-pushed the aschempp:http-header-merge branch from 8fd595e to 76adb4aJan 29, 2019

Contributor Author

### aschempp commented Jan 29, 2019

 Rebased now and updated to short arrays using the php-cs-fixer
Member

### nicolas-grekas commented Jan 29, 2019

Contributor Author

### aschempp commented Jan 29, 2019

 Thanks! What about unanswered comments from David? You mean this one? #26532 (comment) I'm not sure how to handle this, or how a different comment would be an improvement…
Contributor

### dbu commented Jan 30, 2019

 the open comments are all about improving the phpdoc or doc comments. basically for the places where i misunderstood the code before, because i think the explanations given in the discussion here and in person in lisbon would merit to be in the code file for future reference. some of the interactions are quite intricate (i don't see a way to improve that) and therefore need to be well documented so we still remember why things are as they are in a couple of years.
Member

### nicolas-grekas commented Jan 30, 2019

 Any suggestions maybe? :)

Member

### nicolas-grekas left a comment

 I'm going to trust you on this one :)

Member

### stof left a comment

 This implementation is the one we discussed during the SymfonyCon hackday, solving lots of weird cases. Great to see it finally validated 😄
 [HttpKernel] Correctly merging cache directives in HttpCache/Response… 
…CacheStrategy
 893118f 

Member

### fabpot commented Feb 25, 2019

 Thank you @aschempp.

### fabpot merged commit 893118f into symfony:3.4 Feb 25, 2019 1 of 3 checks passed

#### 1 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Details

### fabpot added a commit that referenced this pull request Feb 25, 2019

 bug #26532 [HttpKernel] Correctly merging cache directives in HttpCac… 
…he/ResponseCacheStrategy (aschempp)

This PR was squashed before being merged into the 3.4 branch (closes #26532).

Discussion
----------

[HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #26245, #26352, #28872
| Doc PR        | -

This PR is a first draft to fix the incorrect merging of private and other cache-related headers that are not meant for the shared cache but the browser (see mentioned issues).

The existing implementation of HttpFoundation\Response is very much tailored to the HttpCache, for example isCacheable returns false if the response is private, which is not true for a browser cache. That is why my implementation does not longer use much of the response methods. They are however still used by the HttpCache and we should keep them as-is. FYI, the ResponseCacheStrategy does **not** affect the stored data of HttpCache but is only applied to the result of multiple merged subrequests/ESI responses.

I did read up a lot on RFC2616 as a reference. [Section 13.4](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4) gives an overall view of when a response MAY be cached. [Section 14.9.1](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1) has more insight into the Cache-Control directives.

Here's a summary of the relevant information I applied to the implementation:

- > Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation.

A response without cache control headers is totally fine, and it's up to the cache (shared or private) to decide what to do with it. That is why the implementation does not longer set no-cache if no Cache-Control headers are present.

- > A response received with a status code of 200, 203, 206, 300, 301 or 410 MAY be stored […] unless a cache-control directive prohibits caching.

> A response received with any other status code (e.g. status codes 302 and 307) MUST NOT be returned […] unless there are cache-control directives or another header(s) that explicitly allow it.

This is what ResponseCacheStrategy::isUncacheable implements to decide whether a response is not cacheable at all. It differs from Response::isCacheable which only returns true if there are actual Cache-Control headers.

- > [Section 13.2.3](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.3): When a response is generated from a cache entry, the cache MUST include a single Age header field in the response with a value equal to the cache entry's current_age.

That's why the implementation **always** adds the Age header. It takes the oldest age of any of the responses as common denominator for the content.

- > [Section 14.9.3](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3): If a response includes an s-maxage directive, then for a shared cache (but not for a private cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header.

This effectively means that max-age, s-maxage and Expires must all be kept on the response. My implementation assumes that we can only do that if they exist in **all** of the responses, and then takes the lowest value of any of them. Be aware the implementation might look confusing at first. Due to the fact that the Age header might come from another subresponse than the lowest expiration value, the values are stored relative to the current response date and then re-calculated based on the age header.

The Symfony implementation did not and still does not implement the full RFC. As an example, some of the Cache-Control headers (like private and no-cache) MAY actually have a string value, but the implementation only supports boolean. Also, [Custom Cache-Control headers](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.6) are currently not merged into the final response.

**ToDo/Questions:**

1. [Section 13.5.2](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.2) specifies that we must add a [Warning 214 Transformation applied](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.46) if we modify the response headers.

2. Should we add an Expires headers based on max-age if none is explicitly set in the responses? This would essentially provide the same information as max-age but with support for HTTP/1.0 proxies/clients.

3. I'm not sure about the implemented handling of the private directive. The directive is currently only added to the final response if it is present in all of the subresponses. This can effectively result in no cache-control directive, which does not tell a shared cache that the response must not be cached. However, adding a private might also tell a browser to actually cache it, even though non of the other responses asked for that.

4. > [Section 14.9.2](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.2): The purpose of the no-store directive is to prevent the inadvertent release or retention of sensitive information […]. The no-store directive applies to the entire message, and MAY be sent either in a response or in a request. If sent in a request, a cache MUST NOT store any part of either this request or any response to it. If sent in a response, a cache MUST NOT store any part of either this response or the request that elicited it.

I have not (yet) validated whether the HttpCache implementation respects any of this.

5. As far as I understand, the current implementation of [ResponseHeaderBag::computeCacheControlValue](https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpFoundation/ResponseHeaderBag.php#L313) is incorrect. no-cache means a response [must not be cached by a shared or private cache](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1), which overrides private automatically.

5. The unit tests are still very limited and I want to add plenty more to test and sort-of describe the implementation or assumptions on the RFC.

/cc @nicolas-grekas

#SymfonyConHackday2018

Commits
-------

893118f [HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy
 bb90359 
Contributor

### mpdude commented Feb 25, 2019

 Great job @aschempp and all others involved!

Merged

Merged

Closed