Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for zstd in HttpCompressionMiddleware #4831

Merged
merged 7 commits into from Oct 8, 2020

Conversation

starrify
Copy link
Contributor

@starrify starrify commented Oct 5, 2020

No description provided.

Copy link
Member

@Gallaecio Gallaecio left a comment

What about tests? I see there are brotli tests, so it should not be too hard to add tests for this as well.

docs/topics/downloader-middleware.rst Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Oct 5, 2020

Codecov Report

No coverage uploaded for pull request base (master@004b40a). Click here to learn what that means.
The diff coverage is 77.77%.

@@            Coverage Diff            @@
##             master    #4831   +/-   ##
=========================================
  Coverage          ?   87.69%           
=========================================
  Files             ?      160           
  Lines             ?     9752           
  Branches          ?     1440           
=========================================
  Hits              ?     8552           
  Misses            ?      943           
  Partials          ?      257           
Impacted Files Coverage Δ
scrapy/downloadermiddlewares/httpcompression.py 87.27% <77.77%> (ø)

@starrify
Copy link
Contributor Author

starrify commented Oct 5, 2020

Hi @Gallaecio, thanks a lot for pointing that out! I didn't see a test for brotli at a quick glance, thus decided to submit the PR for review before adding tests for both.

I didn't see a test for brotli at a quick glance

However it turns out this was actually caused by a rather stupid mistake of mine as what I did was to check only the main source directory: 🤦

$ find scrapy/ -type f -name "*.py" | parallel grep -H brotli

I have now updated this PR with further enhancements and the corresponding test. :)

@wRAR
Copy link
Contributor

wRAR commented Oct 6, 2020

Interesting, do you know which websites use that?

@starrify
Copy link
Contributor Author

starrify commented Oct 6, 2020

Hi @wRAR that's surely a very good question 😅 and I'm unfortunately unable to name many. (nearly unused like brotli? 🤔)
I'm having it added majorly because I recently noticed that zstd is now included in IANA's registry.

By the way here's a list of "Zstandard is used by" tools / projects / websites / etc.: https://facebook.github.io/zstd/#:~:text=Zstandard%20is%20used%20by

Copy link
Member

@Gallaecio Gallaecio left a comment

Nice!

PS: The incomplete coverage is due to test always having the optional dependency installed, same as with brotli.

scrapy/downloadermiddlewares/httpcompression.py Outdated Show resolved Hide resolved
wRAR
wRAR approved these changes Oct 8, 2020
@wRAR
Copy link
Contributor

wRAR commented Oct 8, 2020

Thanks!

@wRAR wRAR merged commit 45c06cf into scrapy:master Oct 8, 2020
2 checks passed
if encoding == b'zstd' and b'zstd' in ACCEPTED_ENCODINGS:
# Using its streaming API since its simple API could handle only cases
# where there is content size data embedded in the frame
reader = zstandard.ZstdDecompressor().stream_reader(io.BytesIO(body))
Copy link
Member

@kmike kmike Oct 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey! I wonder if it makes sense to use a context manager, to close the reader after it is used. This is very minor, as it is not doing much (https://github.com/indygreg/python-zstandard/blob/53b71dc3f96961564c9c140bf88b0aa118589247/zstandard/cffi.py#L1937), but it still may clean up some references earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants