Setting max-age on the response causes cachecontrol to fail with exception about 'expires' argument #262

Open · svanoort opened this issue Oct 29, 2021 · 7 comments

@svanoort

Applies to cachecontrol 0.12.7 and probably 0.12.8

When we have a response with the following header: 'Cache-Control': 'public, max-age=1'

I believe this may be related to this PR: #233

We get the following exception from cachecontrol:

TypeError: set() got an unexpected keyword argument 'expires'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.9/site-packages/requests/sessions.py:555: in get
    return self.request('GET', url, **kwargs)
/usr/local/lib/python3.9/site-packages/requests/sessions.py:542: in request
    resp = self.send(prep, **send_kwargs)
/usr/local/lib/python3.9/site-packages/requests/sessions.py:697: in send
    r.content
/usr/local/lib/python3.9/site-packages/requests/models.py:836: in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
/usr/local/lib/python3.9/site-packages/requests/models.py:758: in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
/usr/local/lib/python3.9/site-packages/urllib3/response.py:576: in stream
    data = self.read(amt=amt, decode_content=decode_content)
/usr/local/lib/python3.9/site-packages/urllib3/response.py:519: in read
    data = self._fp.read(amt) if not fp_closed else b""
/usr/local/lib/python3.9/site-packages/cachecontrol/filewrapper.py:96: in read
    self._close()
/usr/local/lib/python3.9/site-packages/cachecontrol/filewrapper.py:76: in _close
    self.__callback(result)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <cachecontrol.controller.CacheController object at 0x7f30ba331640>
request = <PreparedRequest [GET]>
response = <urllib3.response.HTTPResponse object at 0x7f30ba3179a0>, body = b''
status_codes = None
    def cache_response(self, request, response, body=None, status_codes=None):
        """
        Algorithm for caching requests.
    
        This assumes a requests Response object.
        """
        # From httplib2: Don't cache 206's since we aren't going to
        #                handle byte range requests
        cacheable_status_codes = status_codes or self.cacheable_status_codes
        if response.status not in cacheable_status_codes:
            logger.debug(
                "Status code %s not in %s", response.status, cacheable_status_codes
            )
            return
    
        response_headers = CaseInsensitiveDict(response.headers)
    
        if 'date' in response_headers:
            date = calendar.timegm(
                parsedate_tz(response_headers['date'])
            )
        else:
            date = 0
    
        # If we've been given a body, our response has a Content-Length, that
        # Content-Length is valid then we can check to see if the body we've
        # been given matches the expected size, and if it doesn't we'll just
        # skip trying to cache it.
        if (
            body is not None
            and "content-length" in response_headers
            and response_headers["content-length"].isdigit()
            and int(response_headers["content-length"]) != len(body)
        ):
            return
    
        cc_req = self.parse_cache_control(request.headers)
        cc = self.parse_cache_control(response_headers)
    
        cache_url = self.cache_url(request.url)
        logger.debug('Updating cache with response from "%s"', cache_url)
    
        # Delete it from the cache if we happen to have it stored there
        no_store = False
        if "no-store" in cc:
            no_store = True
            logger.debug('Response header has "no-store"')
        if "no-store" in cc_req:
            no_store = True
            logger.debug('Request header has "no-store"')
        if no_store and self.cache.get(cache_url):
            logger.debug('Purging existing cache entry to honor "no-store"')
            self.cache.delete(cache_url)
        if no_store:
            return
    
        # https://tools.ietf.org/html/rfc7234#section-4.1:
        # A Vary header field-value of "*" always fails to match.
        # Storing such a response leads to a deserialization warning
        # during cache lookup and is not allowed to ever be served,
        # so storing it can be avoided.
        if "*" in response_headers.get("vary", ""):
            logger.debug('Response header has "Vary: *"')
            return
    
        # If we've been given an etag, then keep the response
        if self.cache_etags and "etag" in response_headers:
            expires_time = 0
            if response_headers.get('expires'):
                expires = parsedate_tz(response_headers['expires'])
                if expires is not None:
                    expires_time = calendar.timegm(expires) - date
    
            expires_time = max(expires_time, 14 * 86400)
    
            logger.debug('etag object cached for {0} seconds'.format(expires_time))
            logger.debug("Caching due to etag")
            self.cache.set(
                cache_url,
                self.serializer.dumps(request, response, body),
                expires=expires_time
            )
    
        # Add to the cache any permanent redirects. We do this before looking
        # that the Date headers.
        elif int(response.status) in PERMANENT_REDIRECT_STATUSES:
            logger.debug("Caching permanent redirect")
            self.cache.set(cache_url, self.serializer.dumps(request, response, b''))
    
        # Add to the cache if the response headers demand it. If there
        # is no date header then we can't do anything about expiring
        # the cache.
        elif "date" in response_headers:
            date = calendar.timegm(
                parsedate_tz(response_headers['date'])
            )
            # cache when there is a max-age > 0
            if "max-age" in cc and cc["max-age"] > 0:
                logger.debug("Caching b/c date exists and max-age > 0")
                expires_time = cc['max-age']
>               self.cache.set(
                    cache_url,
                    self.serializer.dumps(request, response, body),
                    expires=expires_time
                )
E               TypeError: set() got an unexpected keyword argument 'expires'
/usr/local/lib/python3.9/site-packages/cachecontrol/controller.py:354: TypeError
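For anyone trying to reproduce this: a minimal sketch, assuming the session is wrapped around a custom cache backend whose set() was written against the pre-0.12.7 interface (the MyDictCache class and the URL are hypothetical):

    import requests
    from cachecontrol import CacheControl
    from cachecontrol.cache import BaseCache

    class MyDictCache(BaseCache):
        """Hypothetical backend written against the old set(key, value) signature."""

        def __init__(self):
            self.data = {}

        def get(self, key):
            return self.data.get(key)

        def set(self, key, value):  # note: no 'expires' keyword here
            self.data[key] = value

        def delete(self, key):
            self.data.pop(key, None)

    sess = CacheControl(requests.Session(), cache=MyDictCache())
    # A response carrying 'Cache-Control: public, max-age=1' makes the controller
    # call cache.set(..., expires=...), which raises
    # TypeError: set() got an unexpected keyword argument 'expires'
    sess.get("https://example.com/")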
@svanoort

CC @ionrock. I suspect this one is straightforward, but I'm not sure whether any other supporting information is needed here (other library versions, etc.).

@ionrock

ionrock commented Oct 31, 2021

@svanoort I'm not able to reproduce this in a test with the headers you mentioned. Do you mind sending the actual response headers? Also, are you using any heuristics or a custom cache implementation?

@ziddey

ziddey commented Oct 31, 2021

@svanoort what backend are you using?

I'm using diskcache's FanoutCache here and had to shim it:

    from diskcache import FanoutCache as _FanoutCache

    class FanoutCache(_FanoutCache):
        """Translate cachecontrol's 'expires' keyword to diskcache's 'expire'."""

        def set(self, key, value, **kwargs):
            kwargs['expire'] = kwargs.pop('expires', None)
            return super().set(key, value, **kwargs)

Or you can just pop and drop it, so entries never expire in the backend.
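A hypothetical usage sketch of the shimmed class above (the cache directory path is made up):

    import requests
    from cachecontrol import CacheControl

    # FanoutCache here is the shimmed subclass defined above
    sess = CacheControl(requests.Session(), cache=FanoutCache("/tmp/http-cache"))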

@itamarst

itamarst commented Nov 3, 2021

I'm testing the new release with pip and had the same issue; I had to add an expires keyword to pip's custom cache:

itamarst/pip@99a0bcf
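As with pip's fix, a custom backend only needs to grow the new keyword. A minimal sketch (MyAppCache is a hypothetical class, not pip's actual cache code), assuming a dict-backed store:

    from cachecontrol.cache import BaseCache

    class MyAppCache(BaseCache):
        def __init__(self):
            self.data = {}

        def get(self, key):
            return self.data.get(key)

        # 'expires' (a TTL in seconds) is new in cachecontrol 0.12.7+; a
        # backend with no expiry support can accept it and simply ignore it.
        def set(self, key, value, expires=None):
            self.data[key] = value

        def delete(self, key):
            self.data.pop(key, None)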

@dsully

dsully commented Nov 8, 2021

I'm having the same issue with PR #233.

This PR breaks those of us who use fasteners instead of the deprecated lockfile module, which hasn't been updated since 2015. Both fasteners and oslo.concurrency are recommended replacements:

Note: This package is deprecated. It is highly preferred that instead of using this code base that instead fasteners or oslo.concurrency is used instead.

However, neither of those accepts an expires= argument in its .lock() method interface.

See also #109, which I appear to have opened a few years ago. 😁

An alternative referenced there is https://github.com/tox-dev/py-filelock.git, which also does not support the concept of expiry.

@anarcat

anarcat commented Dec 10, 2021

I don't know when this was introduced, but it also causes a regression in feed2exec: https://gitlab.com/anarcat/feed2exec/-/issues/22

Basically, the cache backend's set() method is now expected to accept an expires parameter, which is new. It's not mentioned in the release notes, for what it's worth, and it doesn't seem like a minor version bump signaled the API change either... 0.12.6 worked fine and 0.12.10 (or earlier?) broke it.

I patched it with: https://gitlab.com/anarcat/feed2exec/-/commit/c5563d093cfd6e86123438cacee9358ac491cbc7.patch

But it would be nice to have a little heads-up for things like this. :)

@anarcat

anarcat commented Dec 10, 2021

Oh, and thanks for implementing this feature; it's quite nice!
