
Add caching support with pluggable storage classes #304

Closed
jaraco opened this Issue · 20 comments

7 participants

@jaraco

Our application relies heavily on the caching support of httplib2. We cannot consider switching to requests until requests has basic caching support with pluggable storage backends (we use a BSDDB backend for a multi-process shared cache).
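A minimal sketch of what "pluggable storage backends" could mean, assuming a backend only needs `get`, `set` and `delete` (the same three methods httplib2's `FileCache` exposes). `CacheStorage`, `DictStorage` and `ShelveStorage` are hypothetical names, not part of requests. The stdlib `shelve` module stands in for a BSDDB-style on-disk store, and no cross-process locking is attempted.

```python
import shelve
from typing import Dict, Optional, Protocol


class CacheStorage(Protocol):
    """Hypothetical storage interface: any object with these methods can be plugged in."""

    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class DictStorage:
    """In-process storage; nothing is shared between processes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class ShelveStorage:
    """On-disk storage via the stdlib shelve module (no locking: assumes one writer at a time)."""

    def __init__(self, path: str) -> None:
        self._path = path

    def get(self, key: str) -> Optional[bytes]:
        with shelve.open(self._path) as db:
            return db.get(key)

    def set(self, key: str, value: bytes) -> None:
        with shelve.open(self._path) as db:
            db[key] = value

    def delete(self, key: str) -> None:
        with shelve.open(self._path) as db:
            if key in db:
                del db[key]
```

Caching code written against `CacheStorage` never needs to know which backend it was handed, so swapping the dictionary for the on-disk store (or a BSDDB wrapper) is a one-line change.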

Glancing over the code, I did not see any cache support. Let's sprint on this some day. I maintain an HTTP cache handler for urllib2 and I've also worked on the httplib2 code base (though I recognize that might be as much a detriment as a compliment ;).

@jaraco

What's the status of caching support? Has anyone considered it yet? Would pull requests be considered? Are there any other considerations that should be made before going down the path of an implementation?

@kennethreitz

Excellent! It's absolutely something that I'd like to support eventually, as long as the API is right, and it doesn't make requests difficult to maintain.

If it would, it may be best served as a "sister module". We'll have to see. I'd like to see it in core if it fits, though.

@jokull

I'm interested in middleware that handles storing, caching and purging, and additionally protects against API failures and downtime. Failures and downtime are a reality for most APIs, so why not bake that protection in, since we can serve values from the cache while the API is down.

The logic for such a module:

  • Cache on path and parameter dictionary variations
  • By default, only cache responses for GET requests
  • If the API response is erroneous or the API is non-responsive, use the cached value and reset its TTL so the service gets a little time to breathe

For an example of a CachedResponse class, check out this tumblr.py module I'm using in production: https://gist.github.com/1455583
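The three rules in jokull's list can be sketched as a small wrapper. This is a hypothetical illustration rather than the `CachedResponse` class from that gist: `StaleOnErrorCache`, `fetch` and the injected `send` callable are made-up names, and the fixed `ttl` and `grace` values stand in for whatever policy a real module would use. Like the proposal, it ignores HTTP cache headers entirely.

```python
import time
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlencode

Key = Tuple[str, str]


class StaleOnErrorCache:
    """Hypothetical wrapper: caches GET bodies by path + sorted params, serves stale data on failure."""

    def __init__(self, send: Callable[[str, str, dict], str], ttl: float = 60.0,
                 grace: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send    # performs the real request; raises on failure
        self._ttl = ttl      # normal freshness window in seconds
        self._grace = grace  # extra time granted to a stale entry after an error
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}  # key -> (body, expires_at)

    @staticmethod
    def key(path: str, params: dict) -> Key:
        # Sorting makes {"a": 1, "b": 2} and {"b": 2, "a": 1} share one cache entry.
        return path, urlencode(sorted(params.items()))

    def fetch(self, method: str, path: str, params: Optional[dict] = None) -> str:
        params = params or {}
        if method.upper() != "GET":
            return self._send(method, path, params)  # only GET is cached by default
        k = self.key(path, params)
        now = self._clock()
        cached = self._entries.get(k)
        if cached is not None and cached[1] > now:
            return cached[0]
        try:
            body = self._send(method, path, params)
        except Exception:  # any failure raised by send (timeout, connection error, ...)
            if cached is None:
                raise
            # Reuse the stale body and push its expiry out a little so the
            # service gets some breathing room before the next attempt.
            self._entries[k] = (cached[0], now + self._grace)
            return cached[0]
        self._entries[k] = (body, now + self._ttl)
        return body
```

The `grace` extension is close in spirit to a grace period in an HTTP cache: while the backend is failing, clients keep getting the last good value instead of an error.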

@megaman821

@jokull The Varnish HTTP cache does something very similar: you set a grace period during which Varnish returns stale values if the back end is not responding.

The only note I would like to add is that cache headers should be respected, rather than blindly caching all GET requests.
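One way to honor those headers before storing anything, under simplifying assumptions: only `GET` responses with status 200 are considered, header names are already lower-cased, and the checks are limited to `no-store` plus the presence of freshness information (`max-age`, `Expires`) or a validator (`ETag`, `Last-Modified`). `parse_cache_control` and `should_store` are hypothetical helpers; the HTTP caching specification covers many more directives and status codes.

```python
from typing import Dict, Mapping, Optional


def parse_cache_control(value: Optional[str]) -> Dict[str, Optional[str]]:
    """Split 'max-age=60, no-cache' into {'max-age': '60', 'no-cache': None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in (value or "").split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def should_store(method: str, status: int, headers: Mapping[str, str]) -> bool:
    """Decide whether a response may be stored, instead of caching every GET blindly."""
    if method.upper() != "GET" or status != 200:
        return False
    # Assumes lower-cased header names; a real client would use a case-insensitive mapping.
    cc = parse_cache_control(headers.get("cache-control"))
    if "no-store" in cc:
        return False
    # Worth storing only with freshness information or a validator to revalidate with later.
    return ("max-age" in cc or "expires" in headers
            or "etag" in headers or "last-modified" in headers)
```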

Also other nice properties for a cache would be:

@queeup

I would love to have cache-to-file support :)

@git2samus

There's a project named python-cache that provides this functionality for httplib2; maybe it would be best to make requests able to use it too, if it can't already.

There's also a project called requests-cache that provides a cache backed by SQLite.

@jaraco

I briefly reviewed the mentioned libraries.

Requests-cache does provide a cache, but as best as I can tell, it doesn't respect HTTP headers and protocol. Also, requests-cache relies on monkey-patching, which suggests that requests at the very least needs better hooks.

It's interesting that python-cache was implemented at all. It wraps httplib2 to provide caching, even though httplib2 integrates caching at the protocol level. Still, that approach won't be viable for requests because it doesn't respect HTTP headers and protocol.

Dogbutler appears to implement good HTTP-level caching, but it is also a separate HTTP request handler.

@kennethreitz

I should make my long term plans a little more well known ;)

Essentially, a new library called 'cachecore' will be made as part of the werkzeug/requests refactor. It will provide all the cache storage backends that werkzeug.contrib.cache does currently. Requests will then be able to use those backends for cache storage.

The actual implementation of the HTTP cache algorithm is yet to be determined.
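cachecore's API isn't spelled out in this thread, so the following is only a sketch of the kind of backend described, loosely modeled on the `get` / `set(key, value, timeout)` / `delete` style of werkzeug.contrib.cache's `SimpleCache`. `TimeoutCache` is a hypothetical name and its timeout semantics are not meant to match werkzeug's exactly (for instance, a timeout of 0 expires immediately here). The point is the division of labor: the HTTP logic decides what to store and for how long, while the backend only stores values with an expiry.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TimeoutCache:
    """Hypothetical in-memory backend: every entry carries its own expiry time."""

    def __init__(self, default_timeout: float = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.default_timeout = default_timeout
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self._clock():
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: str, value: Any, timeout: Optional[float] = None) -> None:
        ttl = self.default_timeout if timeout is None else timeout
        self._data[key] = (self._clock() + ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
```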

@git2samus

Thanks for that; sounds like a good plan.

@ionrock

I ported httplib2's caching algorithm to a simple library I'm hesitantly calling httpcache. It currently focuses on HTTP 1.1 Cache-Control support for caching requests, but I plan to add HTTP 1.0 Expires support as well. The test for whether to return a cached value is a direct port from httplib2.

It supports plugging in your own cache object. It comes with an ultra-simple dictionary-based cache, but I believe any cache that works with httplib2 would work as well.

If this is helpful, I'm happy to work on merging this work into requests. That said, I'm not a heavy requests user (because of this caching aspect), so suggestions on how best to make the code requests-friendly are welcome. Any other suggestions or criticisms are also appreciated.
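The heart of the freshness test ionrock describes porting is a comparison between a response's freshness lifetime and its current age. The sketch below follows the same idea in simplified form and is not httplib2's actual code: the lifetime comes from `Cache-Control: max-age` (HTTP 1.1) or, failing that, `Expires` minus `Date` (HTTP 1.0), and the age is measured from `Date`. `freshness` is a hypothetical function; header names are assumed lower-cased, and request-side directives such as `min-fresh`, as well as the `Age` header, are ignored.

```python
import email.utils
import time
from typing import Mapping, Optional

FRESH, STALE = "FRESH", "STALE"


def _http_date(value: Optional[str]) -> Optional[float]:
    """Parse an HTTP date into a POSIX timestamp, or None if missing or malformed."""
    if not value:
        return None
    try:
        return email.utils.parsedate_to_datetime(value).timestamp()
    except (TypeError, ValueError):
        return None


def freshness(headers: Mapping[str, str], now: Optional[float] = None) -> str:
    """Classify a cached response (lower-cased header names) as FRESH or STALE."""
    now = time.time() if now is None else now
    cc = {d.strip().lower() for d in headers.get("cache-control", "").split(",")}
    if "no-cache" in cc:
        return STALE  # must always be revalidated with the origin
    date = _http_date(headers.get("date"))
    if date is None:
        return STALE  # without a Date header the age is unknown
    current_age = max(0.0, now - date)
    lifetime = 0.0
    max_age = next((d.split("=", 1)[1] for d in cc if d.startswith("max-age=")), None)
    if max_age is not None and max_age.isdigit():
        lifetime = float(max_age)  # HTTP 1.1: max-age wins over Expires
    else:
        expires = _http_date(headers.get("expires"))
        if expires is not None:
            lifetime = max(0.0, expires - date)  # HTTP 1.0 fallback
    return FRESH if lifetime > current_age else STALE
```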

@queeup

Just to be sure: will it cache to disk for requests.get() or post()?

@ionrock

@queeup cachecore is a storage interface for the cache and doesn't deal specifically with HTTP caching. For example, if you have code that looks at the Cache-Control headers to store GET responses, you could store those responses in cachecore.
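To make that split concrete: the HTTP layer decides what to keep and turns a response into bytes, and the storage layer only ever sees keys and bytes. `dump_response`, `load_response` and the JSON layout are hypothetical; the body is base64-encoded so binary content survives the round trip.

```python
import base64
import json
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


def dump_response(status: int, headers: Dict[str, str], body: bytes) -> bytes:
    """Serialize a response into bytes that any key/value storage backend can hold."""
    record = {
        "status": status,
        "headers": headers,
        "body": base64.b64encode(body).decode("ascii"),  # keep binary bodies intact
    }
    return json.dumps(record).encode("utf-8")


def load_response(blob: bytes) -> Response:
    """Inverse of dump_response."""
    record = json.loads(blob.decode("utf-8"))
    return record["status"], record["headers"], base64.b64decode(record["body"])
```

A caller would then do something like `storage.set(url, dump_response(status, headers, body))`, where `storage` is any backend with `get`/`set` (again, a hypothetical name).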

If you are looking for a tool to handle the HTTP protocol details for caching, I wrote a wrapper based on the httplib2 algorithms. If you do try it out, please let me know about any issues you find: https://bitbucket.org/elarson/httpcache

@kennethreitz

@ionrock: this looks perfect. I think I'm going to try to merge this into the codebase ;)

@ionrock

@kennethreitz That is great news. Please let me know if I can help. I'm happy to fork and try to merge it myself. I also plan on adding ETag and If-* header support, which I'm happy to submit as a patch later if need be.
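For context, ETag and If-* support usually means revalidating a stale entry with a conditional request and reusing the stored body when the server answers `304 Not Modified`. A hypothetical sketch; `conditional_headers`, `merge_revalidation` and the `(status, headers, body)` tuple with lower-cased header names are assumptions, not httpcache's API:

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, lower-cased headers, body)


def conditional_headers(cached: Response) -> Dict[str, str]:
    """Build If-None-Match / If-Modified-Since headers from a cached response's validators."""
    _, headers, _ = cached
    extra: Dict[str, str] = {}
    if "etag" in headers:
        extra["If-None-Match"] = headers["etag"]
    if "last-modified" in headers:
        extra["If-Modified-Since"] = headers["last-modified"]
    return extra


def merge_revalidation(cached: Response, fresh: Response) -> Response:
    """On 304, keep the cached status and body but take updated headers; otherwise use the new response."""
    status, headers, _ = fresh
    if status != 304:
        return fresh
    old_status, old_headers, old_body = cached
    merged = dict(old_headers)
    merged.update(headers)  # e.g. a new Date or Cache-Control sent with the 304
    return old_status, merged, old_body
```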

@kennethreitz

@ionrock Fantastic! Start watching the #700 pull request, where I'll be working on it. It'll be using the cachecore caching interfaces.

@queeup

Fantastic :) Thanks to both of you. Once this is finished, I can use requests for all my XBMC add-ons :)

@kennethreitz

Closing for #700
