No cache when no internet connection - even with forever set to True #49

Closed
femtotrader opened this issue Dec 12, 2014 · 11 comments

@femtotrader

Hello,

I tried this code with my internet connection enabled:

import requests
from cachecontrol import CacheControl
from cachecontrol.caches import FileCache

req_session = requests.session()
cache = FileCache('web_cache', forever=True)
session = CacheControl(req_session, cache=cache)
response = session.get('http://www.google.com')
print(response.status_code)

I then disabled my internet connection and ran the code again.

It raised ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))

This is probably a misunderstanding on my side, but I thought that if both the request and the response are stored in a file, I could still get the response while my connection is disabled.

I also don't understand why this forever flag exists. In my understanding, we should pass a custom caching strategy (a.k.a. caching heuristic) to CacheControl:

class Forever(BaseHeuristic):
    pass

and use it like

req_session = requests.session()
cache = FileCache('web_cache')
session = CacheControl(req_session, cache=cache, heuristic=Forever())
response = session.get('http://www.google.com')
print(response.status_code)

Any ideas? As I said, this is probably a misunderstanding on my side.

Kind regards
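The heuristic idea above can be sketched without CacheControl installed. In CacheControl, a heuristic subclasses `BaseHeuristic` and returns replacement headers from `update_headers(response)`. The plain function below only shows the headers such a "forever" heuristic might return. The name `forever_headers` and the ten-year horizon are assumptions made for illustration, not part of the library.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def forever_headers(now=None, years=10):
    """Return headers that mark a response as fresh far into the future.

    Hypothetical helper: a BaseHeuristic subclass could return this dict
    from its update_headers(response) method.
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=365 * years)
    return {
        # HTTP dates are formatted in GMT, e.g. "Fri, 01 Jan 2016 00:00:00 GMT".
        "expires": format_datetime(expires, usegmt=True),
        "cache-control": "public",
    }
```

A heuristic like this only changes how long a stored response counts as fresh. It is not by itself a statement about what happens when the network is unavailable.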

@sigmavirus24
Contributor

It looks like cachecontrol still performs a DNS lookup.

@femtotrader
Author

Is there a way to make CacheControl behave differently?
Maybe this is related to #48

@femtotrader
Author

Hello,

After this commit, I tried this:

import requests

from cachecontrol import CacheControl
from cachecontrol.heuristics import ExpiresAfter
from cachecontrol.caches import FileCache

req_session = requests.session()
cache = FileCache('web_cache') #, forever=True
cached_sess = CacheControl(req_session, cache=cache, heuristic=ExpiresAfter(hours=1))

response = cached_sess.get('http://google.com')

print(response.text)
print(response.status_code)
assert response.status_code == 200

Unfortunately, the first try succeeds (with the internet connection enabled), but it still fails when the connection is disabled.

@ionrock
Contributor

ionrock commented Dec 22, 2014

This doesn't change the requests behavior: the DNS lookup still happens.

@ionrock
Contributor

ionrock commented Jan 16, 2015

@sigmavirus24 I'm not seeing where CacheControl would be performing the DNS lookup. If there is a cached response we find when doing the send in the adapter, the last step is to call build_response, which just puts together the Response object. Any ideas where that DNS lookup might implicitly happen in the requests code?

@ionrock ionrock added the bug label Jan 16, 2015
@sigmavirus24
Contributor

requests does not do DNS lookups; that's up to urllib3. gaierror, however, is raised by getaddrinfo (the "gai" in gaierror). So something is trying to look up a name and we're running into problems. I may be able to work on this this weekend.
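To confirm that a failure like the one reported really originates in name resolution, one could search the exception for a nested `socket.gaierror`. The sketch below walks an exception's `args` and its `__cause__`/`__context__` chain. The helper name `find_gaierror` is hypothetical, and the assumption that the wrapper carries the original error in `args` is taken from the error message quoted in the report.

```python
import socket


def find_gaierror(exc, _seen=None):
    """Return the first socket.gaierror nested inside exc, or None."""
    seen = _seen if _seen is not None else set()
    if exc is None or id(exc) in seen:
        return None
    seen.add(id(exc))  # guard against cycles in the exception chain
    if isinstance(exc, socket.gaierror):
        return exc
    # Wrapped errors may sit in args, or in the implicit/explicit chain.
    candidates = [a for a in getattr(exc, "args", ()) if isinstance(a, BaseException)]
    candidates += [exc.__cause__, exc.__context__]
    for candidate in candidates:
        found = find_gaierror(candidate, seen)
        if found is not None:
            return found
    return None
```

If this returns an error for the caught exception, the request reached the point of resolving a host name, which suggests the cached response was not served directly.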

@ionrock
Contributor

ionrock commented Mar 24, 2016

One option here is to go ahead and do a DNS resolution manually on the URL before passing it into requests. That could cache the result or use some sort of a lookup. I don't see a good generic way to do this in CacheControl, but I might look into adding a DNS cache feature that provides some common tooling that could be used.
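A minimal sketch of the DNS cache idea, using only the standard library: remember the last successful `socket.getaddrinfo` result per host and port, and fall back to it when a later lookup fails. The class name `CachingResolver` and its interface are assumptions for illustration. Wiring such a resolver into requests or urllib3 is not shown here.

```python
import socket


class CachingResolver:
    """Remember successful lookups and reuse them when resolution fails."""

    def __init__(self, resolve=socket.getaddrinfo):
        self._resolve = resolve  # injectable so it can be replaced in tests
        self._known = {}

    def lookup(self, host, port):
        key = (host, port)
        try:
            result = self._resolve(host, port)
        except socket.gaierror:
            # Offline or resolver failure: serve the last known answer if any.
            if key in self._known:
                return self._known[key]
            raise
        self._known[key] = result
        return result
```

The cache here lives in memory only, so it would not survive a process restart. A persistent variant would need its own storage and expiry policy.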

@sigmavirus24
Contributor

Yeah, this could also be related to the proxy look-ups that requests does when dealing with that stuff. A complete traceback would have been really nice to have to help diagnose this.
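For anyone reproducing this, a complete traceback can be captured with the standard library and pasted into the report. The wrapper name `call_with_traceback` is hypothetical; in practice the callable would be something like the `session.get(...)` call from the original report.

```python
import traceback


def call_with_traceback(fn, *args, **kwargs):
    """Call fn and return (result, None), or (None, full traceback text)."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() includes every frame, which shows who triggered the lookup.
        return None, traceback.format_exc()
```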

@dsully

dsully commented Dec 7, 2016

I worked around this by creating an adapter subclass:

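import logging

import requests
from cachecontrol.adapter import CacheControlAdapter

log = logging.getLogger(__name__)


class ResilientCacheControlAdapter(CacheControlAdapter):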
    """Subclass to always return a cached response if it is available."""

    def send(self, request, **kw):

        try:
            resp = super(ResilientCacheControlAdapter, self).send(request, **kw)
        except requests.exceptions.RequestException:
            log.debug("Failed to make HTTP Request, attempting to return stale data from cache: ", exc_info=True)

            cache_url = self.controller.cache_url(request.url)
            cache_data = self.controller.cache.get(cache_url)
            cached_response = self.controller.serializer.loads(request, cache_data)

            if cached_response:
                return self.build_response(request, cached_response, from_cache=True)

            raise

        return resp

It can be instantiated like this (`DictCache` comes from `cachecontrol.cache`; `InterProcessFileCache` is not defined in this comment):

def cached_session(cache=True, cache_path=None, controller_class=None):
    """Return a requests.Session() object that handles Cache-Control headers.

    :param bool cache: Use caching. If `False`, an un-cached Session will be returned.
    :param str cache_path: If set, a file based cache will be used. Otherwise, an in-memory dict cache.
    :param class controller_class: An optional controller class to use in place of CacheController.
    :returns: A requests session object.
    :rtype: requests.Session
    """

    session = requests.Session()

    if cache:
        if cache_path:
            cache_obj = InterProcessFileCache(cache_path)
        else:
            cache_obj = DictCache()

        adapter = ResilientCacheControlAdapter(cache_obj, cache_etags=True, controller_class=controller_class)
        session.mount('http://', adapter)
        session.mount('https://', adapter)

    return session

# controller_class is left at its default: it expects a CacheController
# class, not an adapter class.
session = cached_session(cache_path='/tmp')

@elnuno
Contributor

elnuno commented Apr 11, 2017

Any interest in adding the behavior of @dsully's contribution to CacheControlAdapter? Might filter only ConnectionErrors somehow.
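The narrower variant suggested here, falling back to stale data only for connection failures, can be sketched independently of CacheControl. In the sketch below, `fetch` and `cache` are stand-ins (a callable and a dict), and the built-in `ConnectionError` is only a placeholder default. With requests, one would pass `requests.exceptions.ConnectionError` as `errors`, since it is a separate class.

```python
def get_with_stale_fallback(key, fetch, cache, errors=(ConnectionError,)):
    """Return (value, from_cache); use the cached value only on listed errors."""
    try:
        value = fetch(key)
    except errors:
        # Only connection-type failures fall back to stale data.
        if key in cache:
            return cache[key], True
        raise
    cache[key] = value
    return value, False
```

Any other exception, such as a timeout or an HTTP error raised by the caller's `fetch`, propagates unchanged, which keeps the fallback from hiding unrelated failures.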

@ionrock
Contributor

ionrock commented Jan 27, 2018

I think this use case is unique enough that someone could distribute it as its own library. I personally don't have time or interest to maintain this sort of specialized case.

That said, the usage above seems totally reasonable to me for those that do want the behavior.

@ionrock ionrock closed this as completed Jan 27, 2018
5 participants