[MRG+1] Do not degrade JPEG files. #3689

Closed
wants to merge 13 commits into from


anubhavp28
Contributor

Fixes #3055 by not re-encoding images that are already JPEG files.

@anubhavp28
Contributor Author

Hey @Gallaecio, could you help me, please? The tests are failing only on Python 3.7, and I am not sure whether that is related to my changes.

@anubhavp28
Contributor Author

Hey @kmike, could you help me with this, please? The tests are failing only on Python 3.7, and I am not sure whether that is related to my changes.

@Gallaecio Gallaecio closed this Mar 22, 2019
@Gallaecio Gallaecio reopened this Mar 22, 2019
@codecov
codecov bot commented Mar 22, 2019

Codecov Report

Merging #3689 into master will increase coverage by 0.9%.
The diff coverage is 57.14%.

@@            Coverage Diff            @@
##           master    #3689     +/-   ##
=========================================
+ Coverage   84.54%   85.44%   +0.9%     
=========================================
  Files         167      169      +2     
  Lines        9420     9989    +569     
  Branches     1402     1586    +184     
=========================================
+ Hits         7964     8535    +571     
  Misses       1199     1199             
+ Partials      257      255      -2
Impacted Files Coverage Δ
scrapy/pipelines/images.py 86.62% <57.14%> (-4.1%) ⬇️
scrapy/commands/check.py 24% <0%> (-1.72%) ⬇️
scrapy/http/__init__.py 100% <0%> (ø) ⬆️
scrapy/http/request/__init__.py 100% <0%> (ø) ⬆️
scrapy/core/downloader/middleware.py 100% <0%> (ø) ⬆️
scrapy/squeues.py 100% <0%> (ø) ⬆️
scrapy/utils/gz.py 100% <0%> (ø) ⬆️
scrapy/http/request/json_request.py 93.75% <0%> (ø)
scrapy/pqueues.py 98.97% <0%> (ø)
scrapy/item.py 98.48% <0%> (+0.07%) ⬆️
... and 23 more

@Gallaecio
Member

The error is not related to your pull request. That test failure occurs randomly; we'll have to look into it eventually. In the meantime, I've closed and reopened your pull request, which triggers the test runs again. Hopefully there will be no spurious failures this time.

Member

@Gallaecio Gallaecio left a comment

Please add a test to cover this change, one that verifies that JPEG/RGB images are not modified at the binary level.
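A minimal sketch of the kind of test requested, assuming the pipeline returns the downloaded bytes unchanged when an image is JPEG/RGB and is not being resized. FakeImage, the module-level convert_image and the b're-encoded' placeholder are hypothetical stand-ins so the example runs without Pillow; a real test would use an actual JPEG (built with Pillow or loaded from a fixture) and call the pipeline's own convert_image.

```python
from io import BytesIO


class FakeImage:
    """Hypothetical stand-in for a PIL Image; only .format and .mode are read."""

    def __init__(self, fmt, mode):
        self.format = fmt
        self.mode = mode


def convert_image(image, size=None, response_body=None):
    """Hypothetical simplification of the behaviour this PR aims for."""
    if (size is None and response_body is not None
            and image.format == 'JPEG' and image.mode == 'RGB'):
        # JPEG/RGB and no resizing: return the downloaded bytes untouched.
        return image, BytesIO(response_body)
    # Placeholder for the real re-encoding step (image.save(buf, 'JPEG')).
    return image, BytesIO(b're-encoded')


def test_jpeg_rgb_not_modified():
    # Arbitrary payload between the JPEG SOI (FFD8) and EOI (FFD9) markers;
    # a real test would use a JPEG produced by Pillow or a fixture file.
    body = b'\xff\xd8' + b'payload' + b'\xff\xd9'
    _, buf = convert_image(FakeImage('JPEG', 'RGB'), response_body=body)
    # The binary-level check: byte-for-byte equality with the input.
    assert buf.getvalue() == body
```

Comparing raw bytes, rather than decoded pixels, is what catches a lossy re-encode, since decoding a re-saved JPEG can produce visually similar but different data.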

@anubhavp28
Contributor Author

@Gallaecio, can you review this again? I have made the changes you suggested.

tests/test_pipeline_images.py Outdated
@anubhavp28
Contributor Author

@Gallaecio, can you review this again, please? If I am not mistaken, my changes have decreased code coverage, which is why Codecov is complaining. How should I increase it?

scrapy/pipelines/images.py Outdated
tests/test_pipeline_images.py Outdated
tests/test_pipeline_images.py Outdated
tests/test_pipeline_images.py Outdated
@Gallaecio
Member

I've left a few minor comments. Do not worry about coverage, though: its integration with GitHub feels a bit clunky at times, and if you look at the actual details, I think you are increasing coverage.

@Gallaecio Gallaecio changed the title Do not degrade JPEG files. [MRG+1] Do not degrade JPEG files. Apr 17, 2019
@Gallaecio
Member

I think this is ready. Thank you!

yield thumb_path, thumb_image, thumb_buf

- def convert_image(self, image, size=None):
+ def convert_image(self, image, response_body, size=None):
Member

@kmike kmike Apr 17, 2019

Unfortunately, this is backwards incompatible: there are image pipeline subclasses in the wild that override this method, and they'll stop working because of the signature change. I've checked some codebases and found examples of this. A pipeline may also override a method that calls this method. Could you please make it backwards compatible? It seems something like this is needed:

  • make response_body optional (None by default) and make it the last argument
  • return buf if response_body is not passed (i.e. disable the feature in this PR if a subclass uses the deprecated signature).
  • because a user may override the convert_image method, the pipeline shouldn't always pass response_body; it should inspect the convert_image signature and pass response_body only if that argument is present.
  • issue a ScrapyDeprecationWarning when response_body is not passed. It seems two warnings are needed: one when convert_image is overridden in an incompatible way, and another when convert_image is called in an incompatible way from an overridden method.
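The four points above could be combined roughly as follows. This is a sketch under stated assumptions, not Scrapy's implementation: PipelineSketch, handle_download, OldStylePipeline and the b're-encoded' placeholder are hypothetical, ScrapyDeprecationWarning is a local stand-in for the real class, and the JPEG format check is omitted. Signature inspection uses the standard library's inspect.signature.

```python
import inspect
import warnings
from io import BytesIO


class ScrapyDeprecationWarning(Warning):
    """Local stand-in for scrapy.exceptions.ScrapyDeprecationWarning."""


class PipelineSketch:
    """Hypothetical, stripped-down pipeline; not Scrapy's ImagesPipeline."""

    def __init__(self):
        # Inspect the (possibly overridden) convert_image once, up front.
        params = inspect.signature(self.convert_image).parameters
        self._deprecated_convert_image = 'response_body' not in params
        if self._deprecated_convert_image:
            # First warning: the override uses the old signature.
            warnings.warn(
                f'{type(self).__name__}.convert_image() does not accept '
                'response_body; JPEG files will be re-encoded.',
                ScrapyDeprecationWarning, stacklevel=2)

    def convert_image(self, image, size=None, response_body=None):
        # response_body is optional, last, and None by default.
        if response_body is None:
            # Second warning: called the old way, e.g. via super().
            warnings.warn(
                'convert_image() called without response_body; '
                'JPEG files will be re-encoded.',
                ScrapyDeprecationWarning, stacklevel=2)
        if size is None and response_body is not None:
            # The feature from this PR: keep the original bytes.
            return image, response_body
        # Feature disabled or resizing: placeholder for image.save(buf, 'JPEG').
        return image, BytesIO(b're-encoded')

    def handle_download(self, image, body):
        # Hypothetical caller: pass response_body only if it is accepted.
        if self._deprecated_convert_image:
            return self.convert_image(image)
        return self.convert_image(image, response_body=BytesIO(body))


class OldStylePipeline(PipelineSketch):
    """A subclass written against the old convert_image signature."""

    def convert_image(self, image, size=None):
        return super().convert_image(image, size)
```

With OldStylePipeline, both warnings fire and the bytes are re-encoded; with the base class, the original bytes come back and nothing warns.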

Contributor Author

@kmike, I have made the changes; could you review this again?

Member

@anubhavp28 Since the goal is to keep supporting the previous API, instead of modifying the tests to use the new API, you could keep the old tests and add new tests for the new API.

You could also extend the old tests to ensure that a deprecation warning is issued.

Contributor Author

@anubhavp28 anubhavp28 May 7, 2019

@Gallaecio, I am not sure how to write tests that check that deprecation warnings are issued. Currently, I am just counting the number of warnings. Would that be enough?

self.assertTrue(len(w) >= 4)

Member

@anubhavp28 Please check other usages of catch_warnings in the tests to see how to check the warning message as well. You could verify that the warnings contain certain text, to make sure they are the expected warnings and not unrelated ones.
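One way to check the category and text of each warning, instead of only counting them, is sketched here with the standard library's warnings.catch_warnings(record=True). old_api_call and deprecation_messages are hypothetical helpers, and ScrapyDeprecationWarning is defined locally as a stand-in for the real class.

```python
import warnings


class ScrapyDeprecationWarning(Warning):
    """Local stand-in for scrapy.exceptions.ScrapyDeprecationWarning."""


def old_api_call():
    # Hypothetical function that emits the deprecation warning under test.
    warnings.warn('convert_image() called without response_body',
                  ScrapyDeprecationWarning, stacklevel=2)
    return 42


def deprecation_messages(func):
    """Call func and return the texts of the ScrapyDeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default once-per-location filter hiding repeats.
        warnings.simplefilter('always')
        func()
    # Keep only the expected category, so unrelated warnings are ignored.
    return [str(w.message) for w in caught
            if issubclass(w.category, ScrapyDeprecationWarning)]


messages = deprecation_messages(old_api_call)
```

Asserting on the filtered messages (for example, that one contains "response_body") is more robust than len(w) >= 4, which can pass because of warnings emitted by unrelated code.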

yield path, image, buf

for thumb_id, size in six.iteritems(self.thumbs):
thumb_path = self.thumb_path(request, thumb_id, response=response, info=info)
thumb_image, thumb_buf = self.convert_image(image, size)
if convert_image_overriden:
_warn()
Member

I don't think you need to log a warning each time you check the value of convert_image_overriden; logging the warning once should be enough.
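A sketch of warning once per call rather than inside the thumbnail loop, assuming the deprecated-signature flag has already been computed elsewhere (for example, in the pipeline's __init__). make_thumbnails is a hypothetical helper, not Scrapy API, and ScrapyDeprecationWarning is a local stand-in.

```python
import warnings


class ScrapyDeprecationWarning(Warning):
    """Local stand-in for scrapy.exceptions.ScrapyDeprecationWarning."""


def make_thumbnails(convert_image, image, thumbs, deprecated):
    """Hypothetical helper mirroring the thumbnail loop under review.

    convert_image is any callable taking (image, size); thumbs maps
    thumbnail ids to (width, height) sizes, like self.thumbs.
    """
    if deprecated:
        # Decide once, warn once: the loop below never warns.
        warnings.warn('convert_image() uses a deprecated signature; '
                      'JPEG files will be re-encoded.',
                      ScrapyDeprecationWarning, stacklevel=2)
    # dict.items() is the Python 3 equivalent of six.iteritems().
    return {thumb_id: convert_image(image, size)
            for thumb_id, size in thumbs.items()}
```

With three thumbnail sizes and the flag set, this emits a single warning instead of three.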

@Gallaecio Gallaecio changed the title [MRG+1] Do not degrade JPEG files. Do not degrade JPEG files. May 16, 2019
@anubhavp28 anubhavp28 closed this May 16, 2019
@anubhavp28 anubhavp28 reopened this May 16, 2019
@anubhavp28
Contributor Author

@Gallaecio, I have made a few changes. The AppVeyor build is failing, but that does not seem to be related to my changes. Could you review this again?

Member

@Gallaecio Gallaecio left a comment

I’ve left a few minor comments, but it looks good to me.

scrapy/pipelines/images.py Outdated
scrapy/pipelines/images.py Outdated
@anubhavp28
Contributor Author

@Gallaecio, I have made the changes you suggested. Could you review this again?

@Gallaecio Gallaecio changed the title Do not degrade JPEG files. [MRG+1] Do not degrade JPEG files. May 27, 2019
@anubhavp28 anubhavp28 closed this Jul 4, 2019
@anubhavp28 anubhavp28 reopened this Jul 4, 2019
@kmike kmike added this to the v1.8 milestone Aug 8, 2019
@kmike kmike requested a review from wRAR September 26, 2019 12:45
self._deprecated_convert_image = 'response_body' not in get_func_args(self.convert_image)
if self._deprecated_convert_image:
from scrapy.exceptions import ScrapyDeprecationWarning
import warnings
Member

Could you please move these imports to the top level, as well as the imports in convert_image?

yield thumb_path, thumb_image, thumb_buf

- def convert_image(self, image, size=None):
+ def convert_image(self, image, size=None, response_body=None):
+ if not response_body:
Member

Suggested change
- if not response_body:
+ if response_body is None:

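I think response_body can be genuinely empty in some cases, and an empty body is falsy, so "if not response_body:" would treat it as if it had not been passed at all.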
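The difference between the two conditions only shows up for an empty body. A small illustration; is_missing_old and is_missing_new are hypothetical helper names wrapping the original and suggested checks.

```python
def is_missing_old(response_body):
    # Original check: an empty body (b'') is falsy, so it looks "missing".
    return not response_body


def is_missing_new(response_body):
    # Suggested check: only None means the argument was not passed.
    return response_body is None
```

For b'' the old check reports a missing argument (and would trigger the deprecation path), while the new check correctly treats it as passed; for None and for non-empty bodies both agree.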

@@ -151,7 +175,9 @@ def convert_image(self, image, size=None):
if size:
image = image.copy()
image.thumbnail(size, Image.ANTIALIAS)

elif response_body and image.format == 'JPEG':
Member

Suggested change
- elif response_body and image.format == 'JPEG':
+ elif response_body is not None and image.format == 'JPEG':

@@ -76,35 +76,71 @@ def test_thumbnail_name(self):
'thumbs/50/850233df65a5b83361798f532f1fc549cd13cbe9.jpg')

def test_convert_image(self):
# tests for old API
with warnings.catch_warnings(record=True) as w:
Member

Could you please move the "tests for old API" part to a separate test method (or several test methods)?
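One possible layout, assuming a stub pipeline in place of ImagesPipeline so the example is self-contained: old-API and new-API behaviour get their own test methods, and the old-API test also checks for the deprecation warning. StubPipeline and the b're-encoded' placeholder are hypothetical, size handling is omitted, and ScrapyDeprecationWarning is a local stand-in.

```python
import unittest
import warnings


class ScrapyDeprecationWarning(Warning):
    """Local stand-in for scrapy.exceptions.ScrapyDeprecationWarning."""


class StubPipeline:
    """Hypothetical stand-in for ImagesPipeline with the new signature."""

    def convert_image(self, image, size=None, response_body=None):
        if response_body is None:
            warnings.warn('convert_image() called without response_body',
                          ScrapyDeprecationWarning, stacklevel=2)
            return image, b're-encoded'  # placeholder for real re-encoding
        return image, response_body


class ConvertImageTest(unittest.TestCase):
    def setUp(self):
        self.pipeline = StubPipeline()

    def test_convert_image_old_api(self):
        # Old call style: still works, but warns and re-encodes.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter('always')
            _, buf = self.pipeline.convert_image('img', None)
        self.assertEqual(buf, b're-encoded')
        self.assertTrue(any(issubclass(w.category, ScrapyDeprecationWarning)
                            for w in caught))

    def test_convert_image_new_api(self):
        # New call style: original bytes come back and nothing warns.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter('always')
            _, buf = self.pipeline.convert_image('img', response_body=b'jpeg')
        self.assertEqual(buf, b'jpeg')
        self.assertEqual(caught, [])
```

Keeping the two call styles in separate methods means a failure points directly at the API that regressed.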

@Gallaecio Gallaecio modified the milestones: v1.8, v2.0 Oct 29, 2019
drs-11 added a commit to drs-11/scrapy that referenced this pull request Aug 25, 2020
@wRAR wRAR closed this in #4753 Nov 7, 2022
Development

Successfully merging this pull request may close these issues.

[bug] pillow will always recode images in imagepipieline
4 participants