Merge branch 'caching' of github.com:kaiwren/wrest
Jasim A Basheer committed Jan 25, 2011
2 parents 9c0ad94 + aa345d6 commit 2828621
Showing 24 changed files with 1,494 additions and 108 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -21,3 +21,5 @@ TAGS
.bundle
.redcar
.rvmrc
.idea
spec_all_rubies.sh
154 changes: 154 additions & 0 deletions Caching.markdown
@@ -0,0 +1,154 @@
# Caching in Wrest #

[RFC 2616's Caching section](http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html) describes in detail how caching is to be implemented by clients.

A response should obey the following conditions to be considered cacheable by Wrest:

* Only responses to GET requests are cached.
* The response code must be 200, 203, 300, 301, 302, 304 or 307.
* The Cache-Control header must contain neither the no-cache nor the no-store directive.
* There must be no Pragma: no-cache header. (This header is only used by HTTP 1.0 servers.)
* Either Cache-Control: max-age or the Expires header (or both) must be set. (Cache-Control: max-age always takes priority over the Expires header.)
* If only the Expires header is set, it must not be earlier than the response's Date header. It must also be later than the time at which the client received the response.
* The date headers (Date, Expires) must be in [RFC 1123 format](http://www.ietf.org/rfc/rfc1123.txt).
* The Vary header must not be present at all. (The Vary mechanism is used to conditionally control caching, which Wrest does not currently implement. Section 14.44 of RFC 2616 describes the Vary header in detail.)
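
The conditions above can be read as a single predicate. The following is a minimal sketch of that reading, not Wrest's actual implementation: `cacheable_sketch?`, `CACHEABLE_CODES` and the argument layout are hypothetical names chosen for illustration, and the handling of a missing Date header is an assumption.

```ruby
require 'time'

# Hypothetical helper (not part of Wrest): applies the cacheability
# conditions listed above to plain values.
CACHEABLE_CODES = %w[200 203 300 301 302 304 307].freeze

def cacheable_sketch?(http_method, code, headers, received_at = Time.now)
  h = {}
  headers.each { |k, v| h[k.downcase] = v } # header names are case-insensitive

  return false unless http_method.to_s.upcase == 'GET'
  return false unless CACHEABLE_CODES.include?(code.to_s)
  return false if h.key?('vary')
  return false if h['pragma'].to_s.downcase.include?('no-cache')

  directives = h['cache-control'].to_s.downcase.split(',').map { |d| d.strip }
  return false if directives.any? { |d| d =~ /\A(no-cache|no-store)\b/ }

  # max-age takes priority over Expires.
  return true if directives.any? { |d| d =~ /\Amax-age=\d+\z/ }

  return false unless h['expires']
  begin
    expires = Time.httpdate(h['expires'])
    # Assumption: a missing Date header simply skips the Date comparison.
    date = h['date'] ? Time.httpdate(h['date']) : nil
  rescue ArgumentError
    return false # unparseable HTTP date
  end

  return false if date && expires < date
  expires > received_at
end
```

For example, `cacheable_sketch?('GET', '200', 'Cache-Control' => 'max-age=60')` is true, while the same call with `'no-store, max-age=60'` or with a `Vary` header is false.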

Whenever a GET request is made through Wrest, it consults the cache store for a matching entry. If an entry is found and has not expired, it is returned as the response without making a request to the server.

A cache entry is considered to be fresh (not expired) if:

* Its freshness lifetime is greater than zero.
    * The freshness lifetime of a cache entry is its Cache-Control: max-age value, if max-age is defined. If max-age is not defined, it is the cache entry's Expires header minus the current time.
      (Note: either max-age or the Expires header is guaranteed to be present in the cache entry, since only such responses are cached at all.)

**AND**

* Its freshness lifetime is greater than the cache entry's age.
    * The age of a cache entry is the current date and time minus the cached response's Date header, or the value of the Age header in the cached response, whichever is greater.
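
As a rough illustration of the two freshness rules above, here is a small sketch. The method names are hypothetical, the headers hash is assumed to use lower-case keys, and a Date header is assumed to be present. It follows the definitions in this document rather than Wrest's actual code.

```ruby
require 'time'

# Hypothetical helpers (not Wrest's API) mirroring the rules above.

# max-age if defined, otherwise Expires minus the current time.
def freshness_lifetime_sketch(headers, now)
  if headers['cache-control'].to_s =~ /max-age=(\d+)/
    $1.to_i
  else
    (Time.httpdate(headers['expires']) - now).to_i
  end
end

# The greater of (now - Date) and the Age header.
def age_sketch(headers, now)
  apparent_age = (now - Time.httpdate(headers['date'])).to_i
  [apparent_age, headers['age'].to_i].max
end

def fresh_sketch?(headers, now = Time.now)
  lifetime = freshness_lifetime_sketch(headers, now)
  lifetime > 0 && lifetime > age_sketch(headers, now)
end
```

With `max-age=60` and a Date header 30 seconds in the past, the entry is fresh; adding `Age: 90` to the same headers makes it expired.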

If a cache entry is available, but expired, Wrest sees if the entry can be validated. A cache entry can be validated if:

* It has a Last-Modified header, or an ETag header, or both.

If a cache entry can be validated, Wrest sends the actual GET request to the server, along with:

* If-Modified-Since: <Last-Modified value of the cache entry> (if the Last-Modified header was present in the cache entry), and/or
* If-None-Match: <ETag of the cache entry> (if the ETag header was present in the cache entry)

The server determines whether the response cached at the client is still valid by looking at the values of the If-Modified-Since/If-None-Match headers. If the response held by the client is still valid, the server sends a 304 (Not Modified) response without a body.

Upon receiving the 304, Wrest updates the existing cache entry with the headers provided in the 304 (RFC 2616 13.5.3 Combining Headers) and returns the cached response to the client.

If the server determines that the entry cached at the client is invalid, it sends a full response (usually 200 OK), which Wrest passes to the client after updating the existing cache entry with the new response.

If the cache entry has expired but cannot be validated, Wrest sends a full GET request to the server. The response is passed to the client after updating the existing cache entry with the new response.
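
The lookup flow described above boils down to a three-way decision. The sketch below uses a hypothetical `CacheEntrySketch` struct as a stand-in for a cached response; it is not Wrest's response class, and the method names are illustrative only.

```ruby
# Hypothetical stand-in for a cached response (Wrest's real class differs).
CacheEntrySketch = Struct.new(:expired, :last_modified, :etag) do
  def expired?
    expired
  end

  # An entry can be validated if it has Last-Modified, ETag, or both.
  def can_be_validated?
    !(last_modified.nil? && etag.nil?)
  end
end

# Decide what to do for a GET, given the entry found in the cache (or nil).
def cache_action_sketch(entry)
  return :fetch if entry.nil?
  return :use_cache unless entry.expired?
  entry.can_be_validated? ? :validate : :fetch
end

# Build the conditional headers sent with a validation request.
def validation_headers_sketch(entry)
  headers = {}
  headers['If-Modified-Since'] = entry.last_modified if entry.last_modified
  headers['If-None-Match'] = entry.etag if entry.etag
  headers
end
```

An expired entry that carries only an ETag yields `:validate`, and its validation request carries just the `If-None-Match` header.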

#### Edge Case for HTML documents ####

    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">

Firefox respects this Pragma meta tag in the HTML document (nsHttpResponseHead.h:NoCache). Wrest cannot, since it does not parse the response body.


## A rough note on how browsers (Firefox and Chrome) implement caching ##

Browsers usually cache all responses, including non-cacheable ones. These are kept for use in the browser history (Forward and Back buttons). [ [RFC 2616](http://www.ietf.org/rfc/rfc2616.txt) 13.13 History Lists]
The non-cacheability restriction is usually applied after fetching a cache entry: if the stored response was not cacheable, it is not used.

A large chunk of the caching logic for Firefox 3 is in the file netwerk/protocol/http/nsHttpChannel.cpp inside its source tree.

Browsers are optimistic with respect to caching: if a response does not explicitly specify an expiration mechanism, they use their own heuristics to calculate an expiry time. Wrest, however, is pessimistic: if a response does not specify an explicit cache expiration mechanism, it is not cached at all.

The following is a rough outline that I wrote to understand how the browsers implement caching. It does not necessarily reflect the browsers' behaviour accurately, and it has been heavily adapted to suit Wrest.

## Firefox: nsHttpChannel::CheckCache() ##

    do_fetch if method.head != cache.head
    do_fetch if not (method.head == 'GET' || method.head == 'HEAD')

use_cache if Cache-Control: max-age validates. See cache_expired? below.

re_validate if:

* The Expires header is a past date OR cache_expired? is true.
* The cache entry has the 'must-revalidate' directive. [RFC 2616 14.9.4](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4)

## doValidation ##

Add an If-Modified-Since header to the request if the cache entry has a Last-Modified value.
Add an If-None-Match header to the request if the cache entry has an ETag.

Send the request.

If a full response is received, update the cache and return the result.
If a 304 (Not Modified) is received, return the cached response itself.

## Do Not Store in Cache If ##

* The original request was not a GET or HEAD.

* The response code is anything other than: (success codes) 200, 203; (cacheable redirects) 300, 301, 302, 304, 307. Any response with another code MUST NOT be cached.
[From Mozilla: nsHttpResponseHead.cpp::MustValidate(). Also, we cannot support 206 (Partial Content).]

* The response is a reply to a cache validation request, i.e. the original request contained
an 'if-modified-since' or 'if-none-match' header (http://codesearch.google.com/codesearch/p#OAMlx_jo-ck/src/net/http/http_cache_transaction.cc&l=45)

* The response has 'cache-control: no-cache or no-store', or 'pragma: no-cache' [HTTP 1.0].

* The response does not provide any explicit expiration time. To maintain maximum semantic transparency, we only cache those responses that explicitly permit caching. [RFC 2616 13.2.2](http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.2)

* No max-age is defined AND the cache entry expires in its own past: cache.expires < cache.date

* The response has a Vary header at all.
[TODO: implement fully.
(http://www.subbu.org/blog/2007/12/vary-header-for-restful-applications)
(http://devel.squid-cache.org/vary/vary-header.html) ]


## cache_expired? ##

Firefox: nsHttpResponseHead.cpp: ComputeCurrentAge
[Chrome: RequiresValidation in http_response_headers.cc](http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/net/http/http_response_headers.cc&q=RequiresValidation&exact_package=chromium&sa=N&cd=2&ct=rc)

    freshness_time = freshness_lifetime
    if freshness_time <= 0
      return true
    end

    return freshness_time <= current_age


## current_age ##

Adapted from [Chrome's http_response_headers.cc](http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/net/http/http_response_headers.cc&q=RequiresValidation&exact_package=chromium&l=817)

    date_value = headers['Date'] || response_time
    age_value = headers['Age'] || 0

    apparent_age = response_time - date_value
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = Time.now - response_time

    corrected_initial_age + resident_time
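
The outline above can be turned into runnable Ruby roughly as follows. This is a sketch under stated assumptions: `current_age_sketch` and its argument list are hypothetical, all times are passed in as `Time` objects, and the clamp of `apparent_age` to zero comes from RFC 2616 13.2.3 rather than from the outline itself.

```ruby
# Hypothetical helper (not Wrest's API). date_value, request_time,
# response_time and now are Time objects; age_header is the Age header
# in seconds (or nil when absent). Returns the age in seconds.
def current_age_sketch(date_value, age_header, request_time, response_time, now)
  date_value ||= response_time                # fall back when Date is missing

  # RFC 2616 13.2.3 clamps apparent_age at zero to tolerate clock skew.
  apparent_age = [0, response_time - date_value].max
  corrected_received_age = [apparent_age, age_header.to_i].max
  response_delay = response_time - request_time
  corrected_initial_age = corrected_received_age + response_delay
  resident_time = now - response_time

  corrected_initial_age + resident_time
end
```

For instance, with a request sent at `t0`, a response received at `t0 + 2` carrying `Date: t0 + 1` and `Age: 5`, the age at `t0 + 12` works out as max(1, 5) + 2 + 10 = 17 seconds.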


## freshness_lifetime ##

This is a [link to Chrome source code](http://codesearch.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/net/http/http_response_headers.cc&q=GetFreshnessLifetime&exact_package=chromium&l=848) where freshness_lifetime is defined.

# References #

* [RFC 2616 Section 13 : HTTP Caching protocol](http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html)
* [Mozilla HTTP Caching FAQ](http://www.mozilla.org/projects/netlib/http/http-caching-faq.html)
* [Mark Nottingham's Caching Tutorial](http://www.mnot.net/cache_docs/)
* [Redbot for analyzing HTTP headers](http://redbot.org)


### Alternate Cache Implementations ###

[Resourceful - Ruby HTTP client that does caching](https://github.com/pezra/resourceful/blob/master/lib/resourceful/response.rb#L25)

[Python Httplib2 library](http://code.google.com/p/httplib2/source/browse/python3/httplib2/__init__.py?r=c86239ee0b6271309be2374f0ebfffd4455b7fb7#237)
1 change: 1 addition & 0 deletions Gemfile
@@ -42,6 +42,7 @@ group :fast_xml_deserialisation_rexml do
end

group :development do
  gem 'dalli'
  gem 'rubyforge'
  gem 'hanna'
end
9 changes: 9 additions & 0 deletions README.rdoc
@@ -37,6 +37,15 @@ For Facebook, Twitter, Delicious, GitHub and other API examples, see http://gith

:follow_redirects_limit defaults to 5 if not specified.

* Caching

The following example will use Memcached to cache the response.

    c42 = "http://c42.in".to_uri(:cache_store => Wrest::Components::CacheStore::Memcached.new("localhost:11211"))
    response = c42.get

A detailed write-up on caching as defined by RFC 2616, and on how Wrest implements it, is at {Wrest Caching Doc}[https://github.com/kaiwren/wrest/blob/caching/Caching.markdown]

* Deserialise with XPath filtering

ActiveSupport::XmlMini.backend = 'REXML'
4 changes: 4 additions & 0 deletions Rakefile
@@ -20,6 +20,10 @@ else

  begin
    require 'metric_fu'
    MetricFu::Configuration.run do |config|
      config.rcov[:test_files] = ['spec/**/*_spec.rb']
      config.rcov[:rcov_opts] << "-Ispec" # Needed to find spec_helper
    end
  rescue LoadError
    puts 'metric_fu is not available. Install it with: gem install jscruggs-metric_fu -s http://gems.github.com'
  end
3 changes: 3 additions & 0 deletions lib/wrest.rb
@@ -56,16 +56,19 @@ def self.use_curl
ActiveSupport::JSON.backend = "JSONGem"

require "#{Wrest::Root}/wrest/core_ext/string"
require "#{Wrest::Root}/wrest/hash_with_case_insensitive_access"

# Load XmlMini Extensions
require "#{Wrest::Root}/wrest/xml_mini"

# Load Wrest Core
require "#{Wrest::Root}/wrest/version"
require "#{Wrest::Root}/wrest/cache_proxy"
require "#{Wrest::Root}/wrest/http_shared"
require "#{Wrest::Root}/wrest/http_codes"
require "#{Wrest::Root}/wrest/native"


# Load Wrest Wrappers
require "#{Wrest::Root}/wrest/uri"
require "#{Wrest::Root}/wrest/uri_template"
105 changes: 105 additions & 0 deletions lib/wrest/cache_proxy.rb
@@ -0,0 +1,105 @@
module Wrest

  class CacheProxy
    class << self
      def new(get, cache_store)
        if cache_store
          DefaultCacheProxy.new(get, cache_store)
        else
          NullCacheProxy.new(get)
        end
      end
    end

    class NullCacheProxy
      def initialize(get)
        @get = get
      end
      def get
        @get.invoke_without_cache_check
      end
    end

    class DefaultCacheProxy
      HOP_BY_HOP_HEADERS = ["connection",
                            "keep-alive",
                            "proxy-authenticate",
                            "proxy-authorization",
                            "te",
                            "trailers",
                            "transfer-encoding",
                            "upgrade"]

      def initialize(get, cache_store)
        @get = get
        @cache_store = cache_store
      end

      def get
        cached_response = @cache_store[@get.hash]
        return get_fresh_response if cached_response.nil?

        if cached_response.expired?
          if cached_response.can_be_validated?
            get_validated_response_for(cached_response)
          else
            get_fresh_response
          end
        else
          cached_response
        end
      end

      def update_cache_headers_for(cached_response, new_response)
        # RFC 2616 13.5.3 (Combining Headers)
        cached_response.headers.merge!(new_response.headers.select {|key, value| not (HOP_BY_HOP_HEADERS.include? key.downcase)})
      end

      def cache(response)
        @cache_store[@get.hash] = response.clone if response && response.cacheable?
      end

      #:nodoc:
      def get_fresh_response
        @cache_store.delete @get.hash

        response = @get.invoke_without_cache_check

        cache(response)

        response
      end

      #:nodoc:
      def get_validated_response_for(cached_response)
        new_response = send_validation_request_for(cached_response)
        if new_response.code == "304"
          update_cache_headers_for(cached_response, new_response)
          cached_response
        else
          cache(new_response)
          new_response
        end
      end

      #:nodoc:
      # Send a cache-validation request to the server. This would be the actual Get request with extra cache-validation headers.
      # If a 304 (Not Modified) is received, Wrest would use the cached_response itself. Otherwise the new response is cached and used.
      def send_validation_request_for(cached_response)
        last_modified = cached_response.last_modified
        etag = cached_response.headers["etag"]

        cache_validation_headers = {}
        cache_validation_headers["if-modified-since"] = last_modified unless last_modified.nil?
        cache_validation_headers["if-none-match"] = etag unless etag.nil?

        new_headers = @get.headers.clone.merge cache_validation_headers
        new_options = @get.options.clone.tap { |opts| opts.delete :cache_store } # do not run this through the caching mechanism.

        new_request = Wrest::Native::Get.new(@get.uri, @get.parameters, new_headers, new_options)

        new_request.invoke
      end
    end
  end
end
1 change: 1 addition & 0 deletions lib/wrest/components.rb
@@ -17,3 +17,4 @@ module Components

require "#{Wrest::Root}/wrest/components/container"
require "#{Wrest::Root}/wrest/components/translators"
require "#{Wrest::Root}/wrest/components/cache_stores"
27 changes: 27 additions & 0 deletions lib/wrest/components/cache_store/memcached.rb
@@ -0,0 +1,27 @@
require 'dalli'

module Wrest::Components::CacheStore
  class Memcached

    def initialize(server_urls=nil, options={})
      @memcached = Dalli::Client.new(server_urls, options)
    end

    def [](key)
      @memcached.get(key)
    end

    def []=(key, value)
      @memcached.set(key, value)
    end

    # should be compatible with Hash - return value of the deleted element.
    def delete(key)
      value = self[key]

      @memcached.delete key

      return value
    end
  end
end
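
The Memcached store above exposes only three Hash-like methods (`[]`, `[]=` and `delete`), which appear to be all that `DefaultCacheProxy` calls on a cache store. As a sketch of that interface, assuming nothing else is required of a store, a hypothetical in-memory equivalent could look like this. `InMemoryStoreSketch` is an illustrative name and is not part of this commit.

```ruby
# Hypothetical in-memory store with the same three methods as the
# Memcached store above. Not part of Wrest.
class InMemoryStoreSketch
  def initialize
    @entries = {}
  end

  def [](key)
    @entries[key]
  end

  def []=(key, value)
    @entries[key] = value
  end

  # Like Hash#delete, returns the removed value (or nil if absent).
  def delete(key)
    @entries.delete(key)
  end
end
```

A store like this keeps entries only for the lifetime of the process, which may be enough for tests or short-lived scripts where running Memcached is not practical.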
1 change: 1 addition & 0 deletions lib/wrest/components/cache_stores.rb
@@ -0,0 +1 @@
require "#{Wrest::Root}/wrest/components/cache_store/memcached"