Not getting the caching I expected #95
I fixed an embarrassing typo in the logging modifications, but the log output still contains no mention of
Ping @ionrock - any chance that this is going to be tackled anytime soon?
@toolforger the fastest way to a solution is for you to pick this up and run with it. I'm sure @ionrock has more important real-life things to do at the moment.
I doubt that it's any of my (or anybody else's) business what his real-life things are. What is my business is whether he's going to do anything about it in the foreseeable future or not. If no, fine; if yes, fine again; if unknown or no response, I'll take that as a no and get on with it. I don't know what you mean by "pick this up and run with it".
I doubt it's impossible. Perhaps it's impossible to do in 30 seconds or 5 minutes, but it's not impossible. In fact it's impossible that it's impossible. ;)
@toolforger Indeed @sigmavirus24 is correct. Life has very much been getting in the way of providing you a response. My apologies.

Can you send along information on what the response looks like? Specifically, I need to see the headers in the response. Usually, if something isn't getting cached, it is because the server is providing a header that is telling the client not to cache things. Once we figure out what the issue is, we can look at writing a heuristic that might work around the problem.
On 31.08.2015 at 00:39, Ian Cordasco wrote:

As I said: unknown specs. At least three RFCs and an unknown host of
I can tell you that it's impossible even after five days of reading

On 31.08.2015 at 00:47, Eric Larson wrote:

No worries, I can understand that. I just wanted to know, I didn't want

URL sent:
Headers sent:
Headers received:

I'm not sure how etag and date/max-age interact. In fact it would be
Hopefully the heuristics can be applied on a per-site basis. I'd hate to
After some more debugging, I'm seeing that
What's happening is that
@toolforger The general use case is that if you don't use the response, you don't want to cache it. If your app was using a simple in-memory cache and made a request that resulted in a huge response, it could fill up the available memory or have other unintended consequences depending on the cache store. While I can agree that it does make things slightly more complex, the result is safer for users.

As for caching things larger than RAM, this really depends on the cache store. There are many tools that can handle larger data, such as MongoDB's GridFS, S3 or even the FileStore. Sometimes the largest responses are the ones you want cached the most.

Thanks for providing info on the response. I'll take a look. In the meantime, it sounds like the file handle of the response is not getting consumed. Take a look at the body content docs to see how you can ensure the file handle is exhausted.
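The "file handle must be exhausted" behaviour can be sketched with a toy wrapper (this is an illustration of the idea, not CacheControl's real file wrapper class): the cache write is a callback that only fires once the underlying stream has been read to EOF, so an unread or partially read body never produces a cache entry.

```python
import io


class CallbackWrapper:
    """Toy stand-in for the kind of file wrapper a caching layer uses:
    the callback (the cache write) fires only once the stream is
    read to exhaustion."""

    def __init__(self, fp, callback):
        self._fp = fp
        self._callback = callback
        self._buf = io.BytesIO()
        self._done = False

    def read(self, amt=None):
        data = self._fp.read() if amt is None else self._fp.read(amt)
        self._buf.write(data)
        at_eof = (amt is None) or (data == b"")
        if at_eof and not self._done:
            self._done = True
            self._callback(self._buf.getvalue())
        return data


cached = {}
body = CallbackWrapper(io.BytesIO(b"hello world"),
                       lambda b: cached.update(entry=b))

body.read(5)                        # partial read: cache not written yet
print("after partial read:", cached)
body.read()                         # exhaust the stream: callback fires
print("after full read:", cached)
```

This is why a request whose response you never consume ends up missing from the cache even though everything else is configured correctly.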
@toolforger I wrote a simple tool to hopefully help sort out why your response isn't getting cached. It doesn't handle the use case where the response is not used, so keep that in mind. If you do try it, please feel free to give any feedback in PR #100. |
Yeah, after some more thinking I suspected myself that eliminating
Streaming is not the culprit I'm seeing, I have
The immediate cause seems to be chunking. I'm seeing
Some more analysis...
Sounds easy to solve, just hook
In the light of these findings, wouldn't hooking the
Old code used to hook the `HTTPResponse._fp` object, on the code path that deals just with non-chunked responses. Since `_fp` isn't responsible for detecting the end of chunked responses, it cannot know when to create the cache entry. New code hooks the `HTTPResponse` itself, on the `stream()` function, which is the common code path for chunked and non-chunked responses. Fixes psf#95.
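The idea in that patch can be sketched as follows (a standalone illustration, not the actual CacheControl code): by wrapping the common streaming path as a generator, the cache callback fires exactly when the body iterator is exhausted, regardless of whether the transfer was chunked.

```python
def stream_with_callback(chunks, callback):
    """Wrap an iterator of body chunks (like what urllib3's
    HTTPResponse.stream() yields) so that a callback fires once the
    stream is fully consumed.  Because stream() is the common code
    path, this works the same for chunked and non-chunked transfers."""
    collected = []
    for chunk in chunks:
        collected.append(chunk)
        yield chunk
    # Only reached when the underlying iterator is exhausted,
    # i.e. the full body has been delivered.
    callback(b"".join(collected))


cache = {}
body = stream_with_callback(iter([b"chu", b"nked", b" body"]),
                            lambda b: cache.setdefault("entry", b))
print(b"".join(body))   # consuming the stream triggers the cache write
print(cache)
```

Hooking `_fp.read()` instead would miss the end of a chunked body, because end-of-message is detected one layer up from the raw file object.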
@toolforger I've merged a patch to work with chunked responses. If you are still interested, please give master a try and see if that solves the issue. Otherwise, I'll plan on closing this issue. Thanks for all your work debugging and working on a fix to this! |
So I installed cachecontrol from PyPI, and github3.py, and did a simple test:

```python
import cachecontrol
import github3

g = github3.GitHub()
cachecontrol.CacheControl(g.session)
print(g.rate_limit()['rate']['remaining'])
print(g.user('sigmavirus24'))
print(g.rate_limit()['rate']['remaining'])
print(g.user('sigmavirus24'))
print(g.rate_limit()['rate']['remaining'])
```

Assuming everything is working correctly you would see:
What I saw initially was a 58 at the end instead. I then installed
So I believe it's working now. 👍 |
Here's my CCSSE:

In the output, I'm seeing that the rate limit is being ticked off for each `g.repository` call.

With logging cranked up to the max and tons of logging added (see #93 for the exact code, 50a76aa to be specific), I'm seeing the long trace below. Conspicuously, there's no

    Updating cache with response from "http://..."

which is what I added at https://github.com/toolforger/cachecontrol/blob/50a76aa0f022a47d34c65c13c4c813ecb1f2c086/cachecontrol/controller.py#L228, so I guess `cachecontrol.controller.CacheController.cache_response` is never called.

Since that call is stashed away in a `functools.partial`, I have no idea where and when that call should have happened, so I have come to a dead end.
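For what it's worth, the `functools.partial` indirection just defers the cache write until someone invokes it. A tiny standalone example (the function and URL here are illustrative stand-ins, not the real CacheControl internals) of why the log line never appears if nobody calls the partial:

```python
import functools

log = []


def cache_response(request_url, response, body):
    # Stand-in for a cache_response-style method; names are illustrative.
    log.append('Updating cache with response from "%s"' % request_url)


# The cache write is stashed away as a partial: nothing happens yet,
# and no log line is produced.
pending = functools.partial(cache_response, "http://example.org/", "<response>")
print(log)        # still empty: the partial has not been invoked

# Only when some later code path (e.g. exhausting the body) actually
# calls the partial does the log entry appear.
pending(b"payload")
print(log)
```

So an absent "Updating cache" line means the code path that was supposed to invoke the stashed partial never ran, which is consistent with the body never being fully consumed.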