Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix chunked decompression logic #747

Merged
merged 4 commits into from Nov 25, 2015
Merged

Conversation

Lukasa
Copy link
Sponsor Contributor

@Lukasa Lukasa commented Nov 22, 2015

Resolves #743.

@shazow
Copy link
Member

shazow commented Nov 22, 2015

Missing coverage. Also do we need that extra _decode? Should we re-order the loop instead?

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

Yeah, I meant to write a thing about that but then...I didn't. I don't know why.

What I'm not sure about is whether that last flush is needed at all. I've yet to be able to construct a scenario where we need it. You're right though, we might be able to resolve this problem by re-ordering the loop, let me have a think.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

Hmm. The way the loop is written makes it really hard to have a single decode point. In particular, there's a lot of state being hung off the object. This means, for example, we cannot safely call handle_chunk on the final zero-length chunk. That's somewhat frustrating.

However, we can refactor to pull the flush logic out of _decode and then call that directly, which will at least make the whole thing conceptually somewhat cleaner. Separation of concerns and all that.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

So I can do that refactor, but we still don't hit the line in my testing. We're at a stage now where I have to investigate exactly how flush behaves in all the cases we use it.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

AHA! I got it!

We don't need to call flush on CPython, at the very least. I give you the weirdly hard to find CPython bug 23200, which says:

The decompress() method changed from Z_NO_FLUSH to Z_SYNC_FLUSH in Feb 2001; see revision 01c2470eeb2e. I guess previously flush() was necessary to get all your data.

So, on CPython (and probably PyPy, I'll have to check), we definitely don't need it at all, in any circumstance. That leaves me with a question: what does Jython need?

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

Just did a source code check: PyPy also always calls the decompressor with Z_SYNC_FLUSH from the decompressobj, so we don't need it there either (good work PyPy!). That potentially leaves Jython as the odd-one-out. If we can prove that Jython doesn't automatically use Z_SYNC_FLUSH, then this is a bug worth reporting to them @sigmavirus24.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

And I just walked straight into a Jython zlib bug unrelated to this one.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

For those who care, the bug is here.

However, on the flush issue, I haven't been able to demonstrate that Jython needs it. However, that doesn't prove much: it may need it and I just haven't spotted it yet. I need to dive deeper, sadly.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 23, 2015

Ok, so sadly Jython does not pass Z_SYNC_FLUSH, it passes Z_NO_FLUSH. I don't know if this is a problem or not, because I don't know what zlib does with this information: I'll see if I can find out. Regardless, I've mentioned this inconsistency as Jython bug 2434.

@shazow
Copy link
Member

shazow commented Nov 23, 2015

🔩

@jimbaker
Copy link

So here's the input from the Jython side - and really from the underlying Java support - since Java 7, Z_SYNC_FLUSH is supported

More in this Java dev blog post

So we can readily support in Jython. Thanks for filing that bug with us!

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 24, 2015

Right, so in the meantime we need to decide what we're doing while Jython is suspected to require flushing. @shazow, got a preference?

@shazow
Copy link
Member

shazow commented Nov 24, 2015

What are the options? Always flush vs what? What's the down-side of always-flush in this case?

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 24, 2015

The only downside of always flush is testing-based: specifically, I'm currently unable to construct a test-case that hits the flush statement.

@shazow
Copy link
Member

shazow commented Nov 24, 2015

I'm alright with doing the # Platform-specific: Jython coverage omission if it's documented why (link to here I guess).

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 25, 2015

Ok, cool, I'll fix that up.

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 25, 2015

Hmm, why won't Travis build this?

@Lukasa
Copy link
Sponsor Contributor Author

Lukasa commented Nov 25, 2015

Ok, now you build. Stupid Travis.

@sigmavirus24
Copy link
Contributor

👍

# decoder. However, on Jython we *might* need to, so
# lets defensively do it anyway.
decoded = self._flush_decoder()
if decoded: # Platform-specific: Jython.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm why is the yield Jython-specific? (Why does Jython yield an extra thing?)

(Should the parent if-block be the Platform-specific?)

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the flush works on all platforms, but it doesn't return anything on any other platform (the decoder was flushed at the last decompress call). The if block is not platform specific because decode_content may be True on any platform.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯_(ツ)_/¯

shazow added a commit that referenced this pull request Nov 25, 2015
Fix chunked decompression logic (Jython bugfix)
@shazow shazow merged commit b44f539 into urllib3:master Nov 25, 2015
@jimbaker
Copy link

Just closed out the Jython related bug: http://bugs.jython.org/issue2434 - we should have compliant support, specifically all flush behaviors, thanks to using the underlying support available as of Java 7 (which Jython 2.7 requires). I don't think you need to do anything specific for Jython going forward, although this will of course require >= Jython 2.7.1

@shazow
Copy link
Member

shazow commented Feb 10, 2016

Yay this is excellent news, we should add Jython on our list of supported platforms in the README.

@shazow
Copy link
Member

shazow commented Feb 10, 2016

Wonder how hard it would be to setup a jython 2.7.1 endpoint for travisci...

@sigmavirus24
Copy link
Contributor

@jimbaker 🍰

@shazow I think we'd have to use pyenv to get jython 2.7.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants