FileCache does not work #111
Comments
I'm experiencing this problem as well with requests 2.9.1. Tracing through the implementation, it appears that the
I believe the problem is that by the time that
@burtgulash @wcraigtrader is correct. Unless the response is actually read, it won't be cached. The reason for this is that unless you use the result, there isn't a good reason to exhaust the file handle and save the result, potentially, in memory. I'm going to close this out. Feel free to open a new issue if you find some other issue!
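To illustrate the "cache only once the body has been read" behaviour described above, here is a minimal standard-library sketch. It is not CacheControl's actual code: `CallbackReader`, `on_complete`, and the plain `dict` standing in for a cache are all hypothetical names chosen for this example.

```python
import io


class CallbackReader:
    """Hypothetical wrapper: calls `on_complete` once the wrapped stream is fully read."""

    def __init__(self, fp, on_complete):
        self._fp = fp
        self._buf = io.BytesIO()          # accumulates everything read so far
        self._on_complete = on_complete
        self._done = False

    def read(self, amt=-1):
        data = self._fp.read(amt)
        self._buf.write(data)
        # An unbounded read, or an empty result, means the stream is exhausted.
        exhausted = (amt is None or amt < 0) or data == b""
        if exhausted and not self._done:
            self._done = True
            self._on_complete(self._buf.getvalue())
        return data


cache = {}  # stand-in for a real cache backend (assumption)
body = CallbackReader(io.BytesIO(b"hello"), lambda data: cache.update(page=data))

assert cache == {}                  # nothing stored: the body has not been read yet
body.read()
assert cache == {"page": b"hello"}  # stored only after the body was consumed
```

With this pattern, a caller that never touches the body never pays for reading or storing it, which is the trade-off described in the comment above.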
My diagnosis may be correct, but that still doesn't solve the problem: it isn't caching data. When I use 'response.text', I am using the data, therefore the request and its result should be cached, and that isn't happening.
@ionrock looking at @burtgulash's example, they are not using
@sigmavirus24 Very true! The other thing to recognize is that if the response isn't cacheable, then it won't preemptively create the cache dir either.
I'm not sure offhand which headers might be getting in the way, but it looks like Google is asking that the response not be cached.
Good catch @ionrock
FWIW, I substituted
@wcraigtrader I just used
I didn't see any cache headers in there. To make sure this wasn't something with requests, I tried curl and received:
Is there a site that this can be reproduced against?
@sigmavirus24 You prove my point -- without any cache headers, cachecontrol should be caching that page, and it doesn't. That demonstrates the bug.
So looking at the Controller which handles the caching logic, it looks like cachecontrol isn't caching because all that exists on the site is a
@wcraigtrader Thanks for taking the time to dig into this. The RFCs (as I've read them) leave this aspect a little ambiguous, I think somewhat intentionally, in order to prescribe caching mechanisms that can work across many different use cases. CacheControl, by default, uses a very explicit caching pattern: a response won't be cached unless the server explicitly provides headers that make the response cacheable. This is essentially what @sigmavirus24 has confirmed. Thanks @sigmavirus24!
With that said, browsers typically apply heuristics to the response data and guess whether something can be cached. This was why I added heuristics in the first place! For example, CacheControl does come with a heuristic that is meant to act more like a web browser (https://github.com/ionrock/cachecontrol/blob/master/cachecontrol/heuristics.py#L91). In CacheControl, heuristics take an original response and allow adjusting the headers before the caching logic is applied. This essentially lets you take something that isn't normally cacheable and cache it using CacheControl.
If you give the heuristics a try or have suggestions to improve them, please let me know in a ticket or pull request. The feature, while powerful, I've left rather bare bones until there is a clear consensus on how to improve it. Thanks again for digging into this issue, reading the code and commenting on the ticket. This conversation makes me think it would be a good idea to move the
@ionrock could doesitcache be improved to allow the user to specify a heuristic so users could test the heuristic(s) they might want to apply?
@sigmavirus24 Great idea!
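As a rough illustration of the idea discussed in this thread (a strict "explicit headers only" rule, plus a heuristic that adjusts headers before that rule is applied), here is a standalone sketch. It does not use or reproduce CacheControl's code: `has_explicit_freshness` and `last_modified_heuristic` are hypothetical helpers, header names are assumed to be lowercased already, and the "10% of the age, capped at one day" lifetime is an assumption picked for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def has_explicit_freshness(headers):
    """Strict rule: cacheable only if the server sent explicit freshness info."""
    cache_control = headers.get("cache-control", "").lower()
    if "no-store" in cache_control:
        return False
    return "max-age" in cache_control or "expires" in headers


def last_modified_heuristic(headers, now):
    """Hypothetical heuristic: derive Expires from the age of Last-Modified."""
    # Never override what the server said explicitly.
    if "cache-control" in headers or "expires" in headers:
        return headers
    if "last-modified" not in headers:
        return headers
    modified = parsedate_to_datetime(headers["last-modified"])
    # Assumed rule of thumb: fresh for 10% of the age, capped at one day.
    lifetime = min((now - modified) / 10, timedelta(days=1))
    return {**headers, "expires": format_datetime(now + lifetime, usegmt=True)}


now = datetime(2016, 2, 11, tzinfo=timezone.utc)
plain = {"last-modified": "Mon, 01 Feb 2016 00:00:00 GMT"}

assert not has_explicit_freshness(plain)      # strict rule: not cacheable
adjusted = last_modified_heuristic(plain, now)
assert has_explicit_freshness(adjusted)       # cacheable after the heuristic
print(adjusted["expires"])
```

A checker in the spirit of doesitcache could accept such a function as an optional argument and report the verdict both before and after applying it.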
I tried to test for the existence of the cache directory, similar to the included test test_storage_filecache.py, but it does not get created. The forever=True flag does not help, and neither does changing the directory from .web_cache to something else.
Attached log: