-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Do not degrade JPEG files. #3689
Conversation
Hey @Gallaecio, could you help me please? The tests are failing only for Python 3.7. I am not sure if it is related to my changes. |
Hey @kmike, could you help me with this please? The tests are failing only for Python 3.7. I am not sure if it is related to my changes. |
Codecov Report
@@ Coverage Diff @@
## master #3689 +/- ##
=========================================
+ Coverage 84.54% 85.44% +0.9%
=========================================
Files 167 169 +2
Lines 9420 9989 +569
Branches 1402 1586 +184
=========================================
+ Hits 7964 8535 +571
Misses 1199 1199
+ Partials 257 255 -2
|
The error is not related to your pull request. That test failure occurs randomly, we’ll have to look into it eventually. In the meantime, I’ve closed and reopened your pull request, which triggers the test runs again. Hopefully there will be no false positives this time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a test to cover this change, one that verified that JPEG/RGB images are not modified at a binary level.
@Gallaecio Can you review this again? I have made the changes suggested by you. |
@Gallaecio Can you review this again please? If I am not wrong, my changes have decreased the code coverage, hence codecov is complaining. How should I increase it? |
I’ve left a few minor comments. But do not worry about coverage, its integration with GitHub feels a bit clunky at times, if you look at the actual details I think you are increasing coverage. |
I think this is ready. Thank you! |
scrapy/pipelines/images.py
Outdated
yield thumb_path, thumb_image, thumb_buf | ||
|
||
def convert_image(self, image, size=None): | ||
def convert_image(self, image, response_body, size=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is backwards incompatible; there are image pipeline subclasses in a wild which override this method - they'll stop working because of signature change. I've checked some codebases, and found examples of this. Pipeline may also override a method which calls this method. Could you please make it backwards compatible? It seems something like this is needed:
- make response_body optional (None by default), make it a last argument
- return buf if response_body is not passed (i.e. disable the feature in this PR if a middleware uses deprecated signature).
- because user may override convert_image method, pipeline shouldn't always pass response_body - it should inspect convert_image method signature, and pass response_body only if this argument is present.
- issue a warning when response_body is not passed (ScrapyDeprecationWarning) - it seems two warnings are needed, one when convert_image is overridden in incompatible way, and another one is when convert_image is called from an overridden method in incompatible way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kmike I have made changes, could you review this again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anubhavp28 One thing you could do, since the goal is to support the previous API, is to, instead of modifying the tests to use the new API, keep the old tests and add new tests that test the new API.
You could also extend the old tests to ensure that a deprecation warning is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gallaecio I am not sure how to write tests to ensure that deprecation warnings are received. Currently, I am just counting the number of warnings. Would that be enough?
scrapy/tests/test_pipeline_images.py
Line 111 in 881bade
self.assertTrue(len(w) >= 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anubhavp28 Please, check other usages of catch_warnings
in the tests and you’ll see how to check the warning message as well. You could ensure that the warnings contain certain text to ensure that they are the expected warnings, and not some other warnings.
scrapy/pipelines/images.py
Outdated
yield path, image, buf | ||
|
||
for thumb_id, size in six.iteritems(self.thumbs): | ||
thumb_path = self.thumb_path(request, thumb_id, response=response, info=info) | ||
thumb_image, thumb_buf = self.convert_image(image, size) | ||
if convert_image_overriden: | ||
_warn() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think you need to log a warning each time you check the value of convert_image_overriden
, logging the warning once should be enough.
@Gallaecio I have made few changes. Though the AppVeyor build is failing, it doesn't seem to be related to my changes? Could you review this again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve left a few minor comments, but it looks good to me.
@Gallaecio I have made the changes suggested by you. Could you review this again? |
self._deprecated_convert_image = 'response_body' not in get_func_args(self.convert_image) | ||
if self._deprecated_convert_image: | ||
from scrapy.exceptions import ScrapyDeprecationWarning | ||
import warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move these import to the top level, as well as imports in convert_image?
yield thumb_path, thumb_image, thumb_buf | ||
|
||
def convert_image(self, image, size=None): | ||
def convert_image(self, image, size=None, response_body=None): | ||
if not response_body: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not response_body: | |
if response_body is None: |
I think response_body can be really empty in some cases.
@@ -151,7 +175,9 @@ def convert_image(self, image, size=None): | |||
if size: | |||
image = image.copy() | |||
image.thumbnail(size, Image.ANTIALIAS) | |||
|
|||
elif response_body and image.format == 'JPEG': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif response_body and image.format == 'JPEG': | |
elif response_body is not None and image.format == 'JPEG': |
@@ -76,35 +76,71 @@ def test_thumbnail_name(self): | |||
'thumbs/50/850233df65a5b83361798f532f1fc549cd13cbe9.jpg') | |||
|
|||
def test_convert_image(self): | |||
# tests for old API | |||
with warnings.catch_warnings(record=True) as w: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please move "tests for old API" to a separate test method (or multiple test methods)?
Fixes #3055 by not converting a JPEG file again.