Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Use body to choose response type after decompression content #2393

Merged
merged 1 commit into from Mar 7, 2017

Conversation

@redapple
Copy link
Contributor

@redapple redapple commented Nov 10, 2016

Fixes #2145

@redapple redapple changed the title Use body to chose response type after decompression content Use body to choose response type after decompression content Nov 10, 2016
@codecov-io
Copy link

@codecov-io codecov-io commented Nov 10, 2016

Codecov Report

Merging #2393 into master will decrease coverage by -0.8%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master    #2393     +/-   ##
=========================================
- Coverage   84.16%   83.36%   -0.8%     
=========================================
  Files         162      161      -1     
  Lines        9079     8730    -349     
  Branches     1346     1285     -61     
=========================================
- Hits         7641     7278    -363     
- Misses       1177     1201     +24     
+ Partials      261      251     -10
Impacted Files Coverage Δ
scrapy/downloadermiddlewares/httpcompression.py 89.18% <ø> (+0.3%)
scrapy/extensions/memusage.py 25% <0%> (-23.92%)
scrapy/core/downloader/handlers/http.py 81.81% <0%> (-18.19%)
scrapy/resolver.py 80% <0%> (-9.48%)
scrapy/spidermiddlewares/referer.py 84.61% <0%> (-9.1%)
scrapy/mail.py 69.86% <0%> (-6.46%)
scrapy/core/downloader/tls.py 83.87% <0%> (-4.71%)
scrapy/linkextractors/regex.py 91.3% <0%> (-4.35%)
scrapy/utils/misc.py 91.66% <0%> (-3.34%)
scrapy/downloadermiddlewares/chunked.py 57.14% <0%> (-2.86%)
... and 44 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 451f147...e42b846. Read the comment docs.

@redapple redapple changed the title Use body to choose response type after decompression content [MRG] Use body to choose response type after decompression content Nov 16, 2016
@redapple redapple added this to the v1.3 milestone Nov 16, 2016
@kmike
Copy link
Member

@kmike kmike commented Mar 6, 2017

It looks similar to #2001.

This is quite inconsistent in Scrapy: e.g. cache or ftp handler still won't use body to get response class.

I guess a part of the problem is that ResponseTypes.from_body is not based on any specification, so there was no strong desire to use body consistently - not using body was not seen as a bug because from_body may look like a hack.

It seems the most up-to-date document is https://mimesniff.spec.whatwg.org, especially https://mimesniff.spec.whatwg.org/#identifying-a-resource-with-an-unknown-mime-type.

While this change looks fine, the fix is not complete, and it looks like a part of a larger problem.

@redapple
Copy link
Contributor Author

@redapple redapple commented Mar 6, 2017

@kmike , what do you mean by the fix not being complete?
Is it because responsetypes.from_args() does not follow whatwg?
Or because other uses of from_args() do not use the body either?

The aim here was to fix an issue at http decompression where it determines something else than the default Response type if there are hints in the body.

@kmike
Copy link
Member

@kmike kmike commented Mar 6, 2017

@redapple the patch looks good because it improves response handling in decompression middleware, so I'm fine with merging it after a rebase.

The logic middleware uses to detect response type is still different from what browsers do, and Scrapy is inconsistent in mime sniffing it performs. I should have opened another ticket for that, but I found it while reviewing this PR, so I wrote it in a comment :)

@redapple
Copy link
Contributor Author

@redapple redapple commented Mar 6, 2017

Maybe #2145 is so rare we don't need to care. (I have never seen it myself.)

@redapple redapple force-pushed the redapple:response-replace-encoding branch from 626de8b to e42b846 Mar 6, 2017
@kmike kmike merged commit 6320b74 into scrapy:master Mar 7, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.