Fix chunked decompression logic #747

Lukasa · 2015-11-22T15:32:52Z

Resolves #743.

shazow · 2015-11-22T20:05:37Z

Missing coverage. Also do we need that extra _decode? Should we re-order the loop instead?

Lukasa · 2015-11-23T08:03:51Z

Yeah, I meant to write a thing about that but then...I didn't. I don't know why.

What I'm not sure about is whether that last flush is needed at all. I've yet to be able to construct a scenario where we need it. You're right though, we might be able to resolve this problem by re-ordering the loop, let me have a think.

Lukasa · 2015-11-23T09:37:11Z

Hmm. The way the loop is written makes it really hard to have a single decode point. In particular, there's a lot of state being hung off the object. This means, for example, we cannot safely call handle_chunk on the final zero-length chunk. That's somewhat frustrating.

However, we can refactor to pull the flush logic out of _decode and then call that directly, which will at least make the whole thing conceptually somewhat cleaner. Separation of concerns and all that.
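To make the proposed separation concrete, here is a minimal sketch, assuming a gzip-encoded body. `ChunkDecoderSketch` and `gzip_bytes` are hypothetical names used only for illustration; only `_decode` and `_flush_decoder` echo names from this thread, and the bodies shown are not urllib3's actual implementation.

```python
import zlib


class ChunkDecoderSketch:
    """Hypothetical stand-in for the response object; not urllib3's real class."""

    def __init__(self):
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        self._decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def _decode(self, data):
        # Decode one chunk of body data. No flushing happens here.
        return self._decoder.decompress(data)

    def _flush_decoder(self):
        # Flush whatever the decoder still holds; called once, at end of body.
        return self._decoder.flush()


def gzip_bytes(payload):
    # Build a gzip stream using only zlib so the example stays self-contained.
    compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return compressor.compress(payload) + compressor.flush()


payload = b"hello chunked world " * 50
body = gzip_bytes(payload)
chunks = [body[i:i + 16] for i in range(0, len(body), 16)]

decoder = ChunkDecoderSketch()
decoded = b"".join(decoder._decode(chunk) for chunk in chunks)
# The single flush point sits outside the per-chunk loop.
decoded += decoder._flush_decoder()
```

Keeping the flush in its own method means the chunk loop never has to treat the final zero-length chunk as a special case.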

Lukasa · 2015-11-23T09:45:19Z

So I can do that refactor, but we still don't hit the line in my testing. We're at a stage now where I have to investigate exactly how flush behaves in all the cases we use it.

Lukasa · 2015-11-23T10:01:21Z

AHA! I got it!

We don't need to call flush on CPython, at the very least. I give you the weirdly hard to find CPython bug 23200, which says:

The decompress() method changed from Z_NO_FLUSH to Z_SYNC_FLUSH in Feb 2001; see revision 01c2470eeb2e. I guess previously flush() was necessary to get all your data.

So, on CPython (and probably PyPy, I'll have to check), we definitely don't need it at all, in any circumstance. That leaves me with a question: what does Jython need?
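A quick way to check this claim on a given interpreter is sketched below. It uses only the standard `zlib` module; the payload and the 32-byte piece size are arbitrary choices for illustration. On CPython the final `flush()` is expected to return an empty bytes object, because every `decompress()` call has already emitted all the output it could.

```python
import zlib

original = b"The quick brown fox jumps over the lazy dog. " * 200
compressed = zlib.compress(original)

decompressor = zlib.decompressobj()
pieces = []
# Feed the stream in small pieces, roughly as a chunked body would arrive.
for start in range(0, len(compressed), 32):
    pieces.append(decompressor.decompress(compressed[start:start + 32]))

recovered = b"".join(pieces)
# If decompress() flushed as it went, nothing should be left over here.
leftover = decompressor.flush()

print("recovered everything before flush:", recovered == original)
print("bytes returned by flush():", len(leftover))
```

Running the same script on another implementation (Jython, for instance) would show whether `flush()` ever returns data there.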

Lukasa · 2015-11-23T10:08:20Z

Just did a source code check: PyPy also always calls the decompressor with Z_SYNC_FLUSH from the decompressobj, so we don't need it there either (good work PyPy!). That potentially leaves Jython as the odd-one-out. If we can prove that Jython doesn't automatically use Z_SYNC_FLUSH, then this is a bug worth reporting to them @sigmavirus24.

Lukasa · 2015-11-23T10:13:11Z

And I just walked straight into a Jython zlib bug unrelated to this one.

Lukasa · 2015-11-23T10:39:05Z

For those who care, the bug is here.

On the flush issue, though, I haven't been able to demonstrate that Jython needs it. That doesn't prove much: it may need it and I just haven't spotted it yet. I need to dive deeper, sadly.

Lukasa · 2015-11-23T11:32:35Z

Ok, so sadly Jython does not pass Z_SYNC_FLUSH, it passes Z_NO_FLUSH. I don't know if this is a problem or not, because I don't know what zlib does with this information: I'll see if I can find out. Regardless, I've reported this inconsistency as Jython bug 2434.

shazow · 2015-11-23T19:20:20Z

🔩

jimbaker · 2015-11-23T23:18:37Z

So here's the input from the Jython side, and really from the underlying Java support: Z_SYNC_FLUSH has been supported since Java 7.

More in this Java dev blog post.

So we can readily support it in Jython. Thanks for filing that bug with us!

Lukasa · 2015-11-24T08:05:04Z

Right, so in the meantime we need to decide what we're doing while Jython is suspected to require flushing. @shazow, got a preference?

shazow · 2015-11-24T08:33:57Z

What are the options? Always flush vs what? What's the down-side of always-flush in this case?

Lukasa · 2015-11-24T08:34:36Z

The only downside of always flush is testing-based: specifically, I'm currently unable to construct a test-case that hits the flush statement.

shazow · 2015-11-24T08:35:54Z

I'm alright with doing the # Platform-specific: Jython coverage omission if it's documented why (link to here I guess).

Lukasa · 2015-11-25T11:01:47Z

Ok, cool, I'll fix that up.

Lukasa · 2015-11-25T11:14:10Z

Hmm, why won't Travis build this?

Lukasa · 2015-11-25T11:15:24Z

Ok, now you build. Stupid Travis.

sigmavirus24 · 2015-11-25T14:14:38Z

👍

shazow · 2015-11-25T20:01:30Z

urllib3/response.py

+                # decoder. However, on Jython we *might* need to, so
+                # lets defensively do it anyway.
+                decoded = self._flush_decoder()
+                if decoded:  # Platform-specific: Jython.


Hm why is the yield Jython-specific? (Why does Jython yield an extra thing?)

(Should the parent if-block be the Platform-specific?)

Lukasa · 2015-11-25

Because the flush works on all platforms, but it doesn't return anything on any other platform (the decoder was already flushed at the last decompress call). The if block is not platform-specific because decode_content may be True on any platform.
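The distinction can be sketched as follows. `iter_decoded` is a hypothetical generator, only loosely modelled on the diff under review, and it uses a plain zlib stream rather than whatever decoder the real response object holds.

```python
import zlib


def iter_decoded(chunks, decode_content=True):
    """Hypothetical generator; not the actual urllib3 read loop."""
    decoder = zlib.decompressobj() if decode_content else None
    for chunk in chunks:
        data = decoder.decompress(chunk) if decoder else chunk
        if data:
            yield data
    if decoder:
        # This block runs on every platform whenever decoding is enabled.
        tail = decoder.flush()
        if tail:
            # Only reached where decompress() left data buffered, which was
            # suspected of Jython at the time of this thread.
            yield tail


payload = b"abc" * 1000
body = zlib.compress(payload)
chunks = [body[i:i + 10] for i in range(0, len(body), 10)]
result = b"".join(iter_decoded(chunks))
```

The outer `if decoder:` depends on a caller option, so it is exercised everywhere; the inner `if tail:` is the branch a coverage pragma would apply to.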

shazow · 2015-11-25

¯_(ツ)_/¯


jimbaker · 2016-02-10T14:38:40Z

Just closed out the Jython-related bug: http://bugs.jython.org/issue2434. We should now have compliant support, specifically for all flush behaviors, thanks to the underlying support available as of Java 7 (which Jython 2.7 requires). I don't think you need to do anything specific for Jython going forward, although this will of course require Jython >= 2.7.1.

shazow · 2016-02-10T20:04:49Z

Yay, this is excellent news. We should add Jython to our list of supported platforms in the README.

shazow · 2016-02-10T20:06:20Z

Wonder how hard it would be to set up a Jython 2.7.1 endpoint for Travis CI...

sigmavirus24 · 2016-02-10T21:06:05Z

@jimbaker 🍰

@shazow I think we'd have to use pyenv to get Jython 2.7.1.

Lukasa added 2 commits November 22, 2015 15:30

Don't flush the decoder repeatedly.

035cc03

Changelog for urllib3#743

2ce2fb5

Lukasa added 2 commits November 25, 2015 11:08

Add flush_decoder method.

cd088e6

Explain why we only flush on Jython

5663f0b

shazow reviewed Nov 25, 2015

shazow added a commit that referenced this pull request Nov 25, 2015

Merge pull request #747 from Lukasa/chunked-decompression

b44f539

Fix chunked decompression logic (Jython bugfix)

shazow merged commit b44f539 into urllib3:master Nov 25, 2015

zawan-ila mentioned this pull request Feb 4, 2024

Do we flush the decoder when reaching EOF in partial reads? #2799

Open
