New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP NEW Add better HTTP cache-control manipulation #8086
Conversation
_config/config.yml
Outdated
must-revalidate: "true" | ||
no-transform: "true" | ||
vary: "Cookie, X-Forwarded-Protocol, User-Agent, Accept" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this an implicit API change? There'll be configurations which relied on those defaults, however bad they might be. At least this will need a pretty solid upgrading guide.
- Removing
Vary: Cookie
: Leaving this out could be dangerous unless HTTP proxies implement "safe but not spec compliant behaviour", namely that they ignore caching instructions if detecting cookie that aren't whitelisted, regardless ofVary
(CloudFlare and Incapsula do that). The proper answer is not to sendCache-Control
headers in the first place with those cookies, soVary
doesn't apply. I think as long as we stick with "PHP session = potentially private content", we're good. There's some edge cases around cookie-based a/b testing, but I think they can be covered in the upgrading guide rather than 100% backwards compat. - Removing
Vary: Accept
: That seems fine, it's auto-added by mod_gzip etc anyway - Removing
Vary: X-Forwarded-Protocol
: Do you have a reference on why this is bad practice? It seems common practice for a cache to include the full URI (incl. protocol) in computing the cache key, making this header redundant. It's more relevant for the application processing a request. - Adding
Vary: X-Requested-With
: That seems overkill as a default? It's fairly uncommon to vary responses between ajax and non-ajax requests. We might do this in the CMS, but those are never cacheable requests anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They may be an implicit API change and yes some sites may rely on these, but this combination is practically "no-cache" as far as a proxy is concerned because these Headers are likely to be unique per visitor so if we want to provide meaningful HTTP caches we just can't have this set up.
Vary: Cookie
: If we denote the content as private using the private
directive then this vary is not needed. Fastly explicitly say Vary: Cookie
is no good. If we use Vary: Cookie
we may as well not use Cache-Control
it seems.
Vary: Accept
: Note this is different from Accept-Encoding
(which is added by mod_deflate
and other server layers that do compression). Accept
is the mime types the browser understands (eg: text/html
or application/json
). I can't seem to find many references online to support using Vary: Accept
anywhere.
Vary: X-Forwarded-Protocol
: This is a header that should only be set by proxies, load balancers, CDNs and is not a header the browser sends, this means it provides no value as a Vary
.
Vary: X-Requested-With
: I've added that purely because of our HTTP.cache_ajax_requests
config option as it seems sensible; though I think you're right that it's not needed... Happy to remove it if you think even with our HTTP.cache_ajax_requests
option it is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, thanks for taking the time to respond!
cache_ajax_requests
is such a whacky behaviour, I think we should deprecate it in 4.x. But in the meantime, I would only add this Vary header if its turned off - because that's the only time the behaviour "varies" based on that metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only add this Vary header if its turned off - because that's the only time the behaviour "varies" based on that metadata.
That's true, so I'm happy to take it out, it does make managing the Vary
stuff a bit harder, though
_config/config.yml
Outdated
must-revalidate: "true" | ||
no-transform: "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mozilla says:
No transformations or conversions should be made to the resource. The Content-Encoding, Content-Range, Content-Type headers must not be modified by a proxy. A non- transparent proxy might, for example, convert between image formats in order to save cache space or to reduce the amount of traffic on a slow link. The no-transform directive disallows this.
That seems like over-reach for framework defaults, and should be left to the specific implementation. Happy to have that removed.
|
||
// Validate argument | ||
if($body && !($body instanceof SS_HTTPResponse)) { | ||
user_error("HTTP::add_cache_headers() must be passed an SS_HTTPResponse object", E_USER_WARNING); | ||
$body = null; | ||
} | ||
|
||
// Development sites have frequently changing templates; this can get stuffed up by the code | ||
// below. | ||
if(Director::isDev()) $cacheAge = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think given the security implications of getting this change wrong, it'd be good to have quite granular commits for specific stuff, for example one commit for "move isDev below headers_sent check" to make that explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split this PR into smaller commits that are hopefully a bit easier to follow
Hi Dan, This is a good start and I like the general direction, although I've got a lot of feedback. :-) General:
Specific:
|
control/HTTPCacheControl.php
Outdated
* | ||
* | ||
*/ | ||
class HTTPCacheControl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about separating HTTPCacheControl
from the HTTP
class. Maybe we should just handle the state statically in HTTP
instead, and then move to an instance-based (and more testable) HTTP
class in 5.x? It seems weird to have HTTP::add_cache_headers()
and HTTP::set_cache_age()
controlling the Vary
, ETag
and Max-Age
aspects of HTTP caching, and then a separate HTTPCacheControl
class specifically for the Cache-Control
header. These headers conceptually belong together, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr: The Cache-Control header is quite complicated / specialised I felt it needed a dedicated class to manage its state.
The main reason I went with a separate class was just for my sanity in keeping this closely related logic together.
The Cache-Control
header is a special kind of header with these multiple directives that can be applied, a bit like a header of headers (like Cookies in that respect) and I just felt this was best contained in one single class. Symfony do something semi-similar where their "Header Bag" has a special treatment of the cache control header (https://github.com/symfony/http-foundation/blob/master/HeaderBag.php).
I don't much like this approach either, but I prefer it to adding it onto HTTP
and making that more bloated and less focused. Really, I'd have preferred to be managing this on the response object itself, but that seemed a bit too much of a shift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Symfony has it right, treating all headers in one API/class, and then specialising with methods like addCacheControlDirective()
.
control/HTTPCacheControl.php
Outdated
{ | ||
$allowedDirectoves = Config::inst()->get(__CLASS__, 'allowed_directives'); | ||
$directive = strtolower($directive); | ||
if (in_array($directive, $allowedDirectoves)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling
control/HTTP.php
Outdated
// Grab header for checking. Unfortunately HTTPRequest uses a mistyped variant. | ||
$contentDisposition = $body->getHeader('Content-disposition', true); | ||
if ($currentVary) { | ||
$vary = $currentVary . ', ' . $vary; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could lead to double ups (from the specific response and the generic config), but that's acceptable - I assume it's unlikely to throw off HTTP caches. So before generic config would override specific Vary
headers on the response, which makes that feature useless. Now it appends both. Ideally you could opt out of certain Vary
headers set in config (e.g. instruct to vary on cookie, apart from specific responses). But that's quite specialised, can't think of a great use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this unique shouldn't be too tough; I'll look at adding that in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this unique for now given the current approach, but this may change with some refactors
control/HTTP.php
Outdated
HTTPCacheControl::inst() | ||
->privateCache() | ||
->removeDirective('no-cache') | ||
->removeDirective('no-store'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed the else
case from the original implementation which added the no-cache
and no-store
flags. Why wasn't this replaced with HTTPCacheControl::inst()->privateCache()
instead? Are you intentionally removing those defaults for non-content-disposition responses, and rely on other logic paths to mark this as a private cache? I can't see any invocations of setPrivate()
or privateCache()
in your changeset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old logic flow was essentially [logic to figure out $cacheAge
value] -> [if $cacheAge > 0] cache response [else if MSIE HTTPS attachment] cache response for short period [ else ] no cache.
The new logic flow is: [ work out if response should be cache-able and modify HTTPCacheControl
as appropriate ] -> [ fix edge case bugs ]
The previous logic was "if this is a non-cacheable resource ($cacheAge is <= 0) AND this is MSIE requesting a file download over HTTPS, make it cacheable but for a short period of time).
In my refactor I'm no longer using $cacheAge
as a way of determining if a response should be cached, instead I'm directly calling HTTPCacheControl::inst()->disableCaching();
.
So what happens now is we are saying "if this is MSIE requesting a file download over HTTPS and we have the no-cache
or no-store
directives, then make this a private cache response. This is more inline with fixing the actual bug as described by MS: "When you try to open a Microsoft Office document or a PDF file by typing an HTTPS URL for the document in the Address bar ..., you may receive [an error message]. ... This issue occurs if the server sends a "Cache-control:no-store" header or sends a "Cache-control:no-cache" header."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see any invocations of setPrivate() or privateCache() in your changeset?
3 lines above this review comment I use privateCache
:)
$body->setBody(''); | ||
} else { | ||
// this is wrong, we need to send the same vary headers and so on | ||
header('HTTP/1.0 304 Not Modified'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this logic branch exists. Why does the presence or lack of a response body influence the way the response code is sent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pre-existing behaviour of HTTP::add_cache_headers() that should be deprecated in 4 but we still need to support in SS3:
You can call it without an SS_HTTPResponse object and it will instead perform direct header()
calls. Yuck, but it's used in RSSFeed.php in SS3 and in principle could be used by project code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, it's checking for the presence of a response object implicitly by checking for $body
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RSSFeed could still take a response here, and we could make the missing response arg a deprecation notice.
We should also have, say, "temporary" could also be "dynamic", "uncacheable", or even "ephemeral" (okay that last one is kind of a job) |
Do we need to make On modes, this sounds a bit like gold plating the API? I think a "temporary" mode would be an advanced enough use case that we should rely on the implementor to understand the HTTP caching behaviour and use the more lowlevel HTTP APIs like |
We should also have, say, "temporary" could also be "dynamic", "uncacheable", or even "ephemeral" (okay that last one is kind of a job) |
This is a non-trivial problem. It's almost like you need a prioritised list of "set private" and "set public" (or set temporary) declarations and the highest priority one wins. For example, if we say "set private if a session cookie is present", then a conscientious developer might choose to override that with something more targeted. In general, however, temporary > private > public for any given priority level. |
Temporary is not advanced. It occurs on pretty well every POST request, and
we need to put the default cache headers for these situations somewhere.
|
Just a quick summary for any further The way I understand the For simplicity I'm only going to consider the flow for a browser. When a browser makes a request to a server, if the response is cache-able (ie: If we look at a typical HTTP request that chrome might make:
And the server responds with:
Then the browsers cache key is no longer just the requested URL it will now also contain the value the browser sent with it's For CDNs it's similar except they'll ignore more requests (like those marked as Therefore when thinking of what goes into a
Especially when considering CDNs, we want to be caching resources that can be shared and using |
It looks like must of this API has stayed the same in 4.x, with
My intention was the
I think that's true, I've added
I hadn't thought about this, mostly because this isn't possible without modifying config at the moment anyway. I was thinking we'd leave this as is for the current change / allowing users to directly set the vary header on the response object... |
So looks like we've got a few unresolved points:
|
I think to answer this we need to see a PR to silverstripe-controllerpolicy designed to work with this that ensure that the configuration currently being provided to controllerpolicy only applies to public responses. This is absolutely critical; all of this work is meaningless unless that is achieved. @dhensby can you do some work on a controllerpolicy PR to go alongside this and see how you get on? Make sure that it doesn't require any change to preexisting controllerpolicy configuration so that an upgrade to both framework and controllerpolicy can be dropped in-place to an existing site. |
The design of Given the lifetime of 4, we're going to need to be able to undertake refactorings in a non-breaking way and it's not in the interests of the project to leave them 'til 5. |
<moved tasks from this comment to silverstripe/silverstripe-versioned#146 > |
61c3cb3
to
4cd7c32
Compare
@dhensby Can you please prioritise the rest of your checklist for your working day today? We need to get this fully into peer review (incl. any changes requires to controllerpolicy) by Thursday NZST, if we want to release 3.7.0-rc1 by Tuesday NZST (Monday is a public holiday here)
I'm on the fence with this. We want to teach developers good caching habits, and having consistent API that can be memorised across releases is a big part of that. But then again, the 3.x release line won't have a big impact on developer habits any more. I don't see a whole lot of practical difference between e.g.
I note you haven't added that to your checklist ;) I think if we go to a middleware based approach in 4.x, I wouldn't expect there to be another API called We have slightly longer for 4.2.0-rc1 than we have for 3.7.0-rc1 (deadline next Tuesday), but if we introduce an API in 3.x we should have a plan for how to continue it in 4.x. I don't feel strongly about the above, tending towards "whatever gets the job done" at the moment. We need to make a call one way or another in the next 24h on this, and I'm happy to follow Dan's lead since he's put the most energy into this. |
6c8e66c
to
8774857
Compare
Dan has sent a pull request for controllerpolicy: silverstripe/silverstripe-controllerpolicy#21 |
…stance Fix ajax detection Reset helper for httpcachecontrol
d120130
to
1cb8a39
Compare
control/HTTP.php
Outdated
if ($body && $body->getHeader('Cache-Control')) { | ||
trigger_error("Cache-Control header has already been set", E_USER_WARNING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, we should say something like "Please use HTTPCacheControl::singleton() methods to more safely customise your cache-control headers."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
trigger_error(
"Cache-Control header has already been set. "
. "Please use HTTPCacheControl API to set caching options instead.",
E_USER_WARNING
);
I'm going to squash on merge; There are too many commits here to be worthwhile keeping if that's ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo!
Merged! |
Looks like it might be a bit late now heh, but earlier in this thread there were requests from @chillu to keep some of the commit messages separate so that they were clearer about their intention to future developers. Oh well - great to see it merged regardless! |
Using new core API, see silverstripe/silverstripe-framework#8086
Using new core API, see silverstripe/silverstripe-framework#8086
SS3 code for silverstripe/silverstripe-versioned#146
@chillu / @sminnee please don't hate me with this. This is just an initial POC of how we can get Cache-Control working in a more controlled way. Once the general approach is accepted I'd then start applying
HTTPCacheControl::inst()->disableCaching()
(or even
HTTPCacheControl::inst()->privateCache()`) to the relevant parts of the framework that indicate to us that there's some globally non-cachable content being passed around.HTTPCacheControl
which keeps better track of theCache-Control
directives and adds some helper methods to set up the cache-control to what we'd want (eg:disableCaching
to add the holy trinity ofno-cache, no-store, must-revalidate
andprivateCache
to set the cache-control to the sensible defaults for private cache-control headers.HTTP::add_cache_headers()
to fix the issues outlined here: Enforce no-cache when viewing draft or private content silverstripe-versioned#146 (comment)SS_HTTPRequest::getHeader
so that we can simplify a small piece of silly code here4, Fixed our inconsistent ETag generation