
Implementing caching of chunked/streamed responses #106

Closed · wants to merge 4 commits from rmcgibbo:stream into ionrock:master

Conversation

@rmcgibbo (Contributor)

Resolves #105, #81. This takes a different approach from #82 in that it doesn't get rid of CallbackFileWrapper, so there is still the same delay actually caching the response until the calling application itself reads the data.

On the other hand, this requires dynamically monkeypatching urllib3, so ¯\_(ツ)_/¯.
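The delayed-caching behavior of CallbackFileWrapper mentioned above can be sketched roughly like this (a minimal illustration of the idea only, not cachecontrol's actual implementation):

```python
import io

class CallbackFileWrapper:
    """Buffer everything read through a file-like body and fire a
    callback with the complete buffer once the stream is exhausted.
    Minimal sketch of the idea only -- not cachecontrol's actual
    CallbackFileWrapper."""

    def __init__(self, fp, callback):
        self._fp = fp
        self._callback = callback
        self._buffer = io.BytesIO()

    def read(self, amt=None):
        data = self._fp.read(amt)
        if data:
            self._buffer.write(data)
        elif self._callback is not None:
            # EOF: the caller has consumed the body, so it is now
            # safe to hand the buffered bytes to the cache.
            self._callback(self._buffer.getvalue())
            self._callback = None  # fire at most once
        return data

# Demo: nothing reaches the "cache" until the consumer drains the stream.
cached = []
wrapper = CallbackFileWrapper(io.BytesIO(b"hello"), cached.append)
first = wrapper.read(3)   # partial read -- callback not fired yet
rest = wrapper.read()     # drains the remaining bytes
wrapper.read()            # EOF read triggers the callback
```

This is why caching is deferred until the calling application itself reads the data: the wrapper only sees bytes as they pass through.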

@landscape-bot

Code Health
Repository health decreased by 0.49% when pulling 6d42027 on rmcgibbo:stream into bc89ecc on ionrock:master.

@landscape-bot

Code Health
Repository health decreased by 0.63% when pulling d466ead on rmcgibbo:stream into bc89ecc on ionrock:master.

@rmcgibbo rmcgibbo force-pushed the stream branch 2 times, most recently from 97be551 to 0750e56 Compare November 24, 2015 07:04
@landscape-bot
Copy link

Code Health
Code quality remained the same when pulling 0750e56 on rmcgibbo:stream into bc89ecc on ionrock:master.

@landscape-bot

Code Health
Repository health decreased by 0.64% when pulling 81b21fd on rmcgibbo:stream into bc89ecc on ionrock:master.

@rmcgibbo (Contributor, Author)

Fixes #95 (comment)

@rmcgibbo rmcgibbo changed the title Cache streamed responses Implementing caching of chunked/streamed responses Nov 24, 2015
    content_1 = resp_1.content

    resp_2 = self.sess.get(url + 'stream')
    content_2  = resp_1.content
Review comment (Contributor):

There are two spaces before the = here.

@landscape-bot

Code Health
Repository health decreased by 0.64% when pulling e60d0e4 on rmcgibbo:stream into bc89ecc on ionrock:master.

@ionrock (Contributor)

ionrock commented Nov 26, 2015

@rmcgibbo In trying this out, I added a test:

def test_stream_is_not_cached_when_content_is_not_read(self, url):
    self.sess.get(url + 'stream')
    resp = self.sess.get(url + 'stream')
    assert not resp.from_cache

This fails when (I think) it should pass. I'm thinking a chunked response is essentially the same as if you did something like sess.get(url, stream=True), which means you need to consume the file handle in order to cache it.

Let me know your thoughts and if you have ideas for fixing it. At the moment, I think it would be most helpful if urllib3 provided a "consumed" method on the HTTPResponse that helped solve this type of problem where CacheControl has to guess that the file handle has been read.
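The "consumed" flag wished for above could look roughly like this. This is a hypothetical sketch -- urllib3's HTTPResponse exposes no such attribute, and `TrackedResponse` and `maybe_cache` are illustrative names, not real APIs:

```python
import io

class TrackedResponse:
    """File-like body that records whether it has been fully drained.
    Hypothetical sketch of the proposed 'consumed' flag -- urllib3's
    HTTPResponse has no such attribute."""

    def __init__(self, raw):
        self._fp = io.BytesIO(raw)
        self.consumed = False

    def read(self, amt=None):
        data = self._fp.read(amt)
        if not data:
            # EOF: the caller has drained the handle.
            self.consumed = True
        return data

def maybe_cache(cache, key, resp, body):
    # Only store a body the caller has actually drained; otherwise a
    # stream=True request could cache a partial (or empty) body.
    if resp.consumed:
        cache[key] = body

cache = {}
resp = TrackedResponse(b"chunk data")
body = resp.read()   # caller reads everything...
resp.read()          # ...and hits EOF, marking it consumed
maybe_cache(cache, "http://example.com/stream", resp, body)
```

With such a flag, CacheControl would not have to guess whether the file handle had been read before deciding to cache.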

@rmcgibbo (Contributor, Author)

Is that a fresh session or in the TestStream?

@ionrock (Contributor)

ionrock commented Nov 26, 2015

@rmcgibbo That is the same session in the TestStream.

@rmcgibbo (Contributor, Author)

Then it depends on the order in which the two tests get executed, no? If your test is executed afterwards, then it should be cached.

@rmcgibbo (Contributor, Author)

Or, maybe not. I just added your test in a way that I thought should pass, but no. I'll take a look.

@landscape-bot

Code Health
Repository health decreased by 0.62% when pulling dee246b on rmcgibbo:stream into bc89ecc on ionrock:master.

@rmcgibbo (Contributor, Author)

> Let me know your thoughts and if you have ideas for fixing it. At the moment, I think it would be most helpful if urllib3 provided a "consumed" method on the HTTPResponse that helped solve this type of problem where CacheControl has to guess that the file handle has been read.

I'm taking a look now. A consumed method would definitely be cleaner.

@rmcgibbo (Contributor, Author)

@ionrock: this test is failing, even without anything to do with streamed or chunked responses:

class TestNonchunked(object):
    def test_not_cached_when_content_is_not_read(self, url):
        sess = CacheControl(requests.Session())
        sess.get(url)
        resp = sess.get(url)

        assert not resp.from_cache

https://travis-ci.org/ionrock/cachecontrol/jobs/93289665#L153-L154

@landscape-bot

Code Health
Repository health decreased by 0.57% when pulling 1dd1e51 on rmcgibbo:stream into 8d2e6e8 on ionrock:master.

@landscape-bot

Code Health
Repository health decreased by 0.60% when pulling 50778f3 on rmcgibbo:stream into 8d2e6e8 on ionrock:master.

@scollinson

Any movement on this PR? Would be nice to use this lib for services with chunked responses.

@rmcgibbo (Contributor, Author)

@scollinson I don't really remember. I haven't looked at this in a long time. But I think this PR is good to go. IIRC, the apparent bug that @ionrock noted above is not a bug in this PR, but actually a bug somewhere else that exists even in the current version of cachecontrol without this PR (see the travis output in #107).

    headers.pop('transfer-encoding')

    cached['response']['headers'] = headers

Review comment (Contributor):

Is there a good reason for this? It seems like it doesn't hurt to know the original response was chunked.

Reply (Contributor, Author):

I don't really remember this PR, but I think that when the initial response is transferred with chunks, one of the first steps is to strip off the chunk markers, so the data that actually enters the cache lacks them. When it's read back out of the cache, it doesn't look like a chunked response.
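The explanation above can be illustrated with a rough sketch of what dechunking does (illustrative only, not urllib3's code, which performs this decoding itself before the bytes reach the cache). Because the stored body has no chunk framing, replaying a Transfer-Encoding: chunked header alongside it would misdescribe the data:

```python
def dechunk(raw: bytes) -> bytes:
    """Decode a chunked transfer-encoded body into the plain payload.
    Sketch of why the cached entry drops the chunk markers."""
    body = b""
    pos = 0
    while True:
        # Each chunk starts with a hex size line terminated by CRLF.
        idx = raw.index(b"\r\n", pos)
        size = int(raw[pos:idx], 16)
        if size == 0:
            # A zero-size chunk terminates the body.
            break
        start = idx + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the chunk's trailing CRLF
    return body

chunked = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
plain = dechunk(chunked)
```

The cached bytes are the plain payload on the last line, with no sizes or CRLF framing left, which is why the stored headers drop transfer-encoding.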

Reply (Contributor):

Ah ok, that is what I suspected. Thanks!

I've finally gotten around to getting this merged with some changes. Thanks for your patience!

@ionrock (Contributor)

ionrock commented Mar 23, 2016

Made some changes and merged this locally. Unfortunately, it seems I blew it and it didn't pick up this PR as merged :(

Closing it now.
