Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] pillow will always recode images in imagepipieline #3055

Closed
NewUserHa opened this issue Dec 31, 2017 · 13 comments · Fixed by #4753
Closed

[bug] pillow will always recode images in imagepipieline #3055

NewUserHa opened this issue Dec 31, 2017 · 13 comments · Fixed by #4753

Comments

@NewUserHa
Copy link
Contributor

image.save(buf, 'JPEG')

this line will always recode images silently and damage the image quality.

please add an option to avoid this.

@raphapassini
Copy link
Contributor

Well, seems reasonable to me, but you can simply extend the default image pipeline and override the method to do what you want.

@NewUserHa
Copy link
Contributor Author

I don't know image pipeline of scrapy will always damage the quality of image, even after have read scrapy's documents.

@ghost
Copy link

ghost commented Jan 11, 2018

Hi @NewUserHa - I'm sorry this has been a problem for you! The documents do say, at the higher-level overview, that the images pipeline will convert all images to a common format, namely JPG with RGBA colourspace.

I think similarly to you, that unless the user explicitly requests recoding, the images would be better left alone. But, that's one of the big differences between simply using the Files pipeline and the Images pipeline, I guess. I'd suggest using the Files pipeline for now, if you want the unaltered images to be exactly as they occur online.

@NewUserHa
Copy link
Contributor Author

I meant images pipeline will still convert a jpg file to a less quality jpg file. is that a problem?

@ghost ghost added docs discuss labels Feb 16, 2018
@ghost
Copy link

ghost commented Feb 16, 2018

I'd need to read the code again to see whether quality is 'expected' to change under normal conditions.. but I do think there's a problem with the docs if this behaviour is surprising.

@NewUserHa
Copy link
Contributor Author

NewUserHa commented Feb 16, 2018

it's a problem. jpg encoding is lossy. pillow shouldn't reencode jpg to jpg at least.
but I noticed not all jpgs will not be re-encoded by pillow. maybe there's difference within the jpgs(I checked both they are jpgs, not only the extensions.)

@Gallaecio
Copy link
Member

I believe the documentation is OK.

The Images Pipeline uses Pillow_ for thumbnailing and normalizing images to JPEG/RGB format.

It says JPEG/RGB. The RGB part must be the culprit of the quality change @NewUserHa is experiencing.

I’ve checked the code and JPEG/RGB images are not converted.

@NewUserHa If you believe they are, please share an example image that gets converted and should not. Otherwise, I believe we should close this issue.

@NewUserHa
Copy link
Contributor Author

Did you try yourself?
Image.open('a.jpg').save('b.jpg', 'JPEG')
I just pick a jpg randomly and run it with the above code, and it was re-encoded just like my last try.

DID you really double check it ?
picture info:

JPEG, quality: 77, subsampling ON (2x2)
16,7 Million   (24 BitsPerPixel)

@Gallaecio
Copy link
Member

If that is so, please provide the smallest image possible that allows to reproduce the issue. It can be used to test a fix.

@NewUserHa
Copy link
Contributor Author

Q20161115204613

@Gallaecio
Copy link
Member

Although I could not visually appreciate any difference, Image.open('original.jpg').save('modified.jpg', 'JPEG') does indeed slightly reduce the image size. Image.open('original.jpg').save('modified.jpg', 'JPEG', quality='keep') does not maintain the size, it increases it instead.

I think we could consider modifying convert_image to accept BytesIO(response.body) as a second parameter, and return it as buf when no conversion is necessary. @cathalgarvey I think this could be a good first issue.

@drs-11
Copy link
Contributor

drs-11 commented Aug 23, 2020

Looks like the author of the linked PR hasn't responded to the last few suggestions.
Can I take his PR and finish this issue?

@elacuesta
Copy link
Member

@drs-11 By all means, be our guest. The scenario is covered in our contributing guidelines:

Sometimes there is an existing pull request for the problem you’d like to solve, which is stalled for some reason. Often the pull request is in a right direction, but changes are requested by Scrapy maintainers, and the original pull request author hasn’t had time to address them. In this case consider picking up this pull request: open a new pull request with all commits from the original pull request, as well as additional changes to address the raised issues. Doing so helps a lot; it is not considered rude as soon as the original author is acknowledged by keeping his/her commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants