Rack::Cache swallows If-None-Match and If-Modified-Since #24

Closed
jacobat opened this Issue Jan 29, 2011 · 15 comments

Projects

None yet

3 participants

@jacobat
jacobat commented Jan 29, 2011

It seems Rack::Cache will swallow If-None-Match and If-Modified-Since headers. This seems to be a problem if I want to use Rack::Cache but have parts of my application that only allows private caching. In this case private caching will not reduce the strain on my server.

I've made a fresh Rails3 app and added Rack::Cache.

I'm calling my application with the following command:

$ curl -H "If-Modified-Since: Sat, 29 Jan 2011 11:14:37 GMT" -H "If-None-Match: \"03273f2f207cb7864f217458f0f85e4e\"" -q -v "http://127.0.0.1:3000/users"

And this is what I'm seeing in the log:

Started GET "/users" for 127.0.0.1 at 2011-01-29 14:15:02 +0100
  Processing by UsersController#index as */*
HTTP_IF_MODIFIED_SINCE: Sat, 29 Jan 2011 11:14:37 GMT
HTTP_IF_NONE_MATCH: "03273f2f207cb7864f217458f0f85e4e"
  User Load (0.9ms)  SELECT "users".* FROM "users"
Completed 304 Not Modified in 26ms

After adding Rack::Cache to application.rb and restarting my server this is what I get with the same curl command:

cache: [GET /users] miss

Started GET "/users" for 127.0.0.1 at 2011-01-29 14:24:22 +0100
  Processing by UsersController#index as */*
HTTP_IF_MODIFIED_SINCE: 
HTTP_IF_NONE_MATCH: 
Rendered users/index.html.erb within layouts/application (30.7ms)
Completed 200 OK in 91ms (Views: 43.8ms | ActiveRecord: 1.9ms)

In both cases I do get a 304 back from the server as reported by curl - I'm guessing that might be due to Rack::ConditionalGet though.

@rtomayko
Owner
rtomayko commented Apr 6, 2011

Hmm. That's a tough one.

Just so I'm clear, when you say "private caching" you mean setting the Cache-Control response header to private so that the response is cached in the browser/client only?

That seems valid to me. I'll have to think about this a little. The reason we remove conditional GET request headers is because we always want content from upstream if we don't already have the response in our cache, so that we can serve subsequent conditional requests without making additional upstream requests. That falls down in this case and I'm not sure how to get around it. We could store that a previous request for the same URL returned a private response but that's a pretty big change.

Let me do some research. It'd be great if clients sent a Cache-Control: private request header when making validation requests for responses that were previously marked private or something like that. In that case, we could simply bypass all the cache related logic when we see a private request.

@jacobat
jacobat commented Apr 7, 2011

Yes exactly, by private caching I mean "Cache-Control: private" to disallow shared caches to store the response.

I've been thinking about the problem and I think maybe the issue is in the assumption that the cache needs to fetch the full content if the content is not already in the cache.

Let's say the cache does not contain anything for a given request (etag/url combination). The cache would then hit the backend at the url passing the etag. Now two things can happen:

  1. The backend sees the etag is a match and returns 304. The cache can then store the 304 for that etag/url combination and use that to serve additional 304's without hitting the backend. It does not need to retrieve the full content in this case.
  2. The backend sees the etag is not a match (or the request etag is empty) and return 200 along with a full response body. The cache will store the response as usual.

Does this sound like a solution?

@rtomayko
Owner
rtomayko commented Apr 8, 2011

The backend sees the etag is a match and returns 304. The cache can then store the 304 for that etag/url combination and use that to serve additional 304's without hitting the backend. It does not need to retrieve the full content in this case.

That's interesting. It would only be valid for the amount of time given in the cache control max-age response value and would be useless to cache if there was none, but yeah that could be workable.

Implementing this is probably non-trivial. I'm not against just letting conditional requests pass through and bypassing any cache storage for now.

@jacobat
jacobat commented Apr 8, 2011

That's interesting. It would only be valid for the amount of time given in the cache control max-age response value and would be useless to cache if there was none, but yeah that could be workable.

Indeed. That's identical to the current situation in which the cache will need to hit the backend on each request if there's no expiration in the response.

Implementing this is probably non-trivial. I'm not against just letting conditional requests pass through and bypassing any cache storage for now.

Cool. I might take a stab at it when I get some time. I'll close this for now and get back to you if I get an implementation working.

@jacobat jacobat closed this Apr 8, 2011
@eric1234

Was wondering if this issue could be opened back up (or a new issue created) as this is causing a problem for Rails apps. In Rails 3.1 rack-cache is enabled by default (from what I can tell). This means that http://apidock.com/rails/ActionController/ConditionalGet/stale%3F doesn't work like it is supposed to.

Since the headers are removed Rails never has a chance to skip the expensive processing contained inside the if statement. It also renders the template which may be expensive as well. Even though a 304 is returned correctly Rails still did all the work (making the stale? useless).

I really don't complete understand this all but the swallowing of these headers is interfering with stale? In addition this behavior is un-expected. I am getting headers zapped and the fact that rack-cache was the one doing it made it VERY hard to track down. What if other headers can do useful stuff with these headers. Seems to violate principal of least surprise.

Just my two cents and notes. I still bow down before the gods of programming that put together these wonderful frameworks. :)

@rtomayko
Owner

Hmm. I think I see what you're saying.

One thing to note is that Rack::Cache does send If-Modified-Since and/or If-None-Match headers when it has an existing response in cache to validate that the cached response is fresh. In this way it makes stale? even more useful.

The only case where the IMS/INM headers are removed is when rack-cache does not have a version of the response in its cache, which should only be on the first request for each resource.

Further, rack-cache handles the same kind of stale? logic for you automatically when the response is cached and has an expiration lifetime (Cache-Control: max-age=n). In that case, the request doesn't even need to go to the rails app to validate freshness.

The following may be helpful for getting your ahead around some of this behavior:

http://tomayko.com/writings/things-caches-do

It's about HTTP reverse proxy caches in general but describes rack-cache's behavior exactly.

@eric1234

Two things:

  • If stale? is not given the :public => true flag (the default) then Rack-Cache doesn't cache it right? So it ends up hitting Rails every time and since the headers are removed it doesn't give Rails a chance to skip some expensive processing.
  • While Cache-Control: max-age=n is useful, sometimes you cannot know when in the future the item should expire. That is what is so nice about stale? It lets you do some processing (check some timestamps on some database objects) and potentially skip some heavy processing if the database hasn't updated since the user last requested the action. If the database has changed (which may happen at any time and cannot be predicted) then we end up inside the if statement doing the heavy processing.

This stale? functionality worked great prior to rack-cache being added to Rails 3.1. But after rack-cache was added stale? stale stopped working properly.

Feel free to correct my understanding here. But my experience is that upgrading to Rails 3.1 made it end up inside the stale? block on EVERY request (even though a 304 goes back after the first response). I can get around this by saying :public => true. But now my lack of understand has me not sure if my stale? check is being performed (hence a database update could cause a page change but the browser may not see it).

@rtomayko
Owner

If stale? is not given the :public => true flag (the default) then Rack-Cache doesn't cache it right? So it ends up hitting Rails every time and since the headers are removed it doesn't give Rails a chance to skip some expensive processing.

I'm not 100% sure how the :public flag in Rails effects things (does it just set cache-control: public?) but rack-cache stores anything that includes at least a Last-Modified or ETag header in its cache, since it can use the cached response for validation on future requests. This is a major reason rack-cache exists. Not just for Cache-Control: max-age=n responses.

... my experience is that upgrading to Rails 3.1 made it end up inside the stale? block on EVERY request (even though a 304 goes back after the first response).

I'm confused how it can both go inside the stale? block and send a 304 on subsequent responses. Do you mean the rails app sends a 200 for every response and then rack-cache sends a 304 to the client?

@rtomayko
Owner

I think a small example that reproduces this would be the most useful thing here. It's possible there's a bug somewhere or something changed in Rails to cause rack-cache validations to not kick in. What you're describing is definitely not intended behavior.

@eric1234

I'm not 100% sure how the :public flag in Rails effects things (does it just set cache-control: public?)

According to the documentation I linked in my first post that is exactly what it does.

Do you mean the rails app sends a 200 for every response and then rack-cache sends a 304 to the client?

That is exactly the behavior I am getting. Every request is sent onto Rails (I guess since it is private it doesn't keep it in it's cache?). Then rails not seeing the If-Modified-Since header returns "true" on stale? causing extra processing, template rendering and returns status 200. But the status I get back in browser (according to Chrome) is 304. I assumed Rack was seeing the content didn't change and sent back a 304?

@rtomayko
Owner

Interesting. Okay, I think I should be able to reproduce that. Thanks.

@eric1234

OK, I have have a small example. This is a factory rails app created via the following:

  1. rails new cache_test
  2. create app/controller/cache_controller.rb
  3. send the root request to cache#expensive
  4. turned on caching in the config/environments/development.rb (which adds the Rack-Cache middleware)
  5. Start up local server (i.e. rails s)

You will see every request goes inside the if statement even though the action indicates the page has not been modified since the epoch! Also Rails returns status 200 while the browser gets 304.

http://files.pixelwareinc.com/cache_test.tar.bz2

@rtomayko
Owner

Thanks so much for that.

@rtomayko
Owner

I was able to recreate the issue locally. It definitely looks like it's due to the Cache-Control: private header in the response. That's causing it not to go into cache so validation fails on subsequent requests.

@rtomayko rtomayko reopened this Sep 18, 2011
@rtomayko rtomayko added a commit that closed this issue Sep 18, 2011
@rtomayko allow IMS/INM requests through to backend on miss, fixes #24
Prior to this change, rack-cache would always strip any
If-None-Match and/or If-Modified-Since headers from the request
before passing along to the backend in an attempt to retrieve a
response to put in the cache (304 responses are not currently
cacheable). This approach falls down when the response includes a
'Cache-Control: private' header (Rails default) because the response
cannot be cached but will also never allow for client initiated
validation.

The downside to this change is that the cache will fill in more
slowly when clients make conditional requests for public resources;
it won't fill in until a non-conditional request is received. The
upside is that validation works for non-cacheable private responses.
f949823
@rtomayko rtomayko closed this in f949823 Sep 18, 2011
@rtomayko
Owner

This should be better now. More info:

f949823

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment