Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image Optimization #5548

Closed
wants to merge 14 commits into from
Closed

Image Optimization #5548

wants to merge 14 commits into from

Conversation

vardaan-raj
Copy link

I have simply optimized the images by compressing their size without compromising their quality to make them more space efficient

ImgBotApp and others added 2 commits August 29, 2021 00:23
*Total -- 5,465.43kb -> 4,908.52kb (10.19%)

/skimage/data/palette_gray.png -- 0.91kb -> 0.26kb (71.55%)
/skimage/data/horse.png -- 16.24kb -> 6.93kb (57.33%)
/skimage/data/astronaut.png -- 773.00kb -> 412.79kb (46.6%)
/skimage/data/palette_color.png -- 1.00kb -> 0.60kb (39.74%)
/skimage/data/bw_text.png -- 8.15kb -> 5.20kb (36.22%)
/skimage/data/chessboard_RGB.png -- 1.10kb -> 0.73kb (33.9%)
/doc/source/gitwash/pull_button.png -- 12.59kb -> 8.70kb (30.92%)
/doc/source/gitwash/forking_button.png -- 12.79kb -> 8.91kb (30.29%)
/skimage/data/phantom.png -- 3.31kb -> 2.34kb (29.33%)
/doc/source/skips/_static/skip-flowchart.png -- 12.62kb -> 9.00kb (28.7%)
/skimage/data/clock_motion.png -- 57.41kb -> 43.30kb (24.57%)
/skimage/data/moon.png -- 49.00kb -> 42.77kb (12.71%)
/skimage/data/logo.png -- 175.51kb -> 156.38kb (10.9%)
/skimage/data/page.png -- 46.56kb -> 41.62kb (10.62%)
/skimage/data/microaneurysms.png -- 4.83kb -> 4.38kb (9.47%)
/skimage/data/chelsea.png -- 234.88kb -> 214.01kb (8.88%)
/skimage/data/no_time_for_that_tiny.gif -- 4.33kb -> 3.97kb (8.43%)
/skimage/data/green_palette.png -- 1.67kb -> 1.55kb (6.74%)
/doc/source/gitwash/branch_dropdown.png -- 15.93kb -> 14.99kb (5.9%)
/skimage/data/hubble_deep_field.jpg -- 515.57kb -> 487.74kb (5.4%)
/skimage/data/coffee.png -- 455.77kb -> 431.74kb (5.27%)
/doc/source/user_guide/data/elevation_map.jpg -- 73.92kb -> 70.04kb (5.25%)
/skimage/data/retina.jpg -- 263.25kb -> 251.98kb (4.28%)
/skimage/data/rocket.jpg -- 109.89kb -> 106.36kb (3.21%)
/skimage/data/ihc.png -- 466.71kb -> 456.50kb (2.19%)
/doc/source/user_guide/data/denoise_viewer_window.png -- 89.45kb -> 88.20kb (1.4%)
/skimage/registration/tests/data/OriginalX75Y75.png -- 60.83kb -> 60.03kb (1.32%)
/skimage/registration/tests/data/OriginalX-130Y130.png -- 40.00kb -> 39.48kb (1.32%)
/skimage/data/grass.png -- 212.79kb -> 210.01kb (1.3%)
/doc/source/themes/scikit-image/static/img/logo.png -- 44.33kb -> 43.76kb (1.28%)
/skimage/registration/tests/data/OriginalX130Y130.png -- 40.81kb -> 40.30kb (1.23%)
/skimage/registration/tests/data/TransformedX75Y75.png -- 58.84kb -> 58.13kb (1.2%)
/skimage/data/motorcycle_left.png -- 629.59kb -> 622.69kb (1.1%)
/skimage/data/motorcycle_right.png -- 625.36kb -> 618.77kb (1.05%)
/skimage/registration/tests/data/TransformedX-130Y130.png -- 43.19kb -> 42.74kb (1.04%)
/skimage/data/coins.png -- 74.05kb -> 73.31kb (1%)
/skimage/registration/tests/data/TransformedX130Y130.png -- 39.58kb -> 39.32kb (0.64%)
/skimage/data/gravel.png -- 189.69kb -> 188.99kb (0.37%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
@grlee77
Copy link
Contributor

grlee77 commented Aug 29, 2021

Thanks @vardaan-raj.

We should not modify any images under the skimage/data folder. Changes to those will break the SHA256 checksums stored in skimage/data/_registry.py and may cause some test case failures or unexpected changes to downstream user code that relies on the exact pixel values. I think it happened to pass on the Azure CI in this PR because it downloaded previously cached data files (indicating that perhaps we should add additional checks of when to reuse the cache).

The majority of the image size in this project in terms of the website are the images that get generated for the gallery examples at runtime during the documentation builds. We do have image compression (via optipng) running there (see #4800).

@grlee77
Copy link
Contributor

grlee77 commented Aug 29, 2021

I was surprised this passes tests, but it is because the checks here:

if _has_hash(resolved_path, expected_hash):

and here:
if _has_hash(gh_repository_path, expected_hash):

Do not raise an error if the file is found, but the SHA256 does not match. It will instead just move onto the next source, eventually pulling the file from the main branch in this repository. So, none of the tests here are using the modified data files!

@JDWarner
Copy link
Contributor

JDWarner commented Aug 29, 2021

I agree with @grlee77 for JPEG which is inherently lossy so any re-compression or transcoding will produce artifacts or subtle changes. There are a few .jpg files in here which should be reverted.

However, the majority of these are PNG which is an inherently lossless format which uses lossless compression under the hood (like 7zip or FLAC, not like MP3). For PNGs it is safe to experiment with compression. Though it would require changing the checksums, swapping in a more optimally compressed PNG file yields the exact same data - and this should be simple to verify by loading them in pairs.

For just the PNG files here, could you sum the pre/post size @vardaan-raj to understand the potential space saved? A brief description of the PNG compression technique used would also be appreciated.

@vardaan-raj
Copy link
Author

vardaan-raj#1 (comment)

If you can just go to the above link you can see that the image file size has been reduced by 10% and to check for individual size reduction of each image just click on details and you can see a chart showing the initial sizes along with the new compressed sizes.

@vardaan-raj
Copy link
Author

The total image size was initially 5,465.43kb and after compression, it came down to 4,908.52kb, a 10.19% reduction

@grlee77
Copy link
Contributor

grlee77 commented Aug 29, 2021

The total image size was initially 5,465.43kb and after compression, it came down to 4,908.52kb, a 10.19% reduction

In one case a PNG shrank by > 70% and a number of others were > 30% which is not negligable.

@JDWarner's suggestion sounds good to me:
1.) revert the changes to any JPEG files in the data folder
2.) update SHA256 checksums in skimage/data/_registry.py for any PNG files that were modified. This will not need to be done for any of the images under the doc folder, but any modified files that already have a checksum in that registry file will need to be updated.

Let me know if the above is not clear or you need help with either of these suggestions

@vardaan-raj
Copy link
Author

Only PNG files have been optimised by img-bot

@vardaan-raj
Copy link
Author

How do I update the checksums?

@grlee77
Copy link
Contributor

grlee77 commented Aug 29, 2021

How do I update the checksums?

On linux I went to the skimage/data folder in a terminal and ran shasum -a 256 *.png. For this branch, I get the following (only a subset of these were changed by imgbot).

f108ee4bbd66a89e2b0d6d5fd81c7e40b6cc74a03b083a890f0135b878413b00  astronaut.png
4d2d021f0a7d8aa16bae8466cf35b1830ae60a3af3d0ab2518e5ca4940c6eb80  block.png
7966caf324f6ba843118d98f7a07746d22f6a343430add0233eca5f6eaaa8fcf  brick.png
c51a0b84a9269594de73f99579720b333bcface4e99519dfd3be44167818791e  bw_text.png
b0793d2adda0fa6ae899c03989482bff9a42d3d5690fc7e3648f2795d730c23a  camera.png
8d23a7fb81f7cc877cd09f330357fc7f595651306e84e17252f6e0a1b3f61515  cell.png
2e207e486545874a2a3e69ba653b28fdef923157be9017559540e65d1bcb8e28  checker_bilevel.png
d7784fad50b59a15b0b1fcff9904a9ce199a60eda0c368de1aae4f969f9e888e  chelsea.png
3e51870774515af4d07d820bd8827364c70839bf9b573c746e485095e893df90  chessboard_GRAY.png
13982eb7c596e16d003ecada8471120975d76572688f46b5b1b5f76780753cbe  chessboard_RGB.png
0f8b39c9829bf019c8168393813a4aa1b99e09b8c693a2e0debb6ae46b8b52ae  clock_motion.png
78769c7b02a40adee579122b73174b60da31474c6134240ec10be806e40d63ed  coffee.png
f44803741520d7f3529f4cf4adff9982c8b900b3ba6b4a6429a8e230bc6fe850  coins.png
7d2df993de2b4fa2a78e04e5df8050f49a9c511aa75e59ab3bd56ac9c98aef7e  color.png
48a64c25c6da000ffdb5fcc34ebafe9ba3b1c9b61d7984ea7ca6dc54f9312dfa  foo3x5x4indexed.png
d6fbd3f7c094262295a651c8b1aa193b6248bba0a3c5347b935c13e16fe8097f  grass.png
056cee2711fb76c49d9efb338bd261f2adc57c45d93a911081b7bdded6dd5515  gravel.png
343f4aaed9658dac0aa6c9c65820643b772c44080c7f4bd799a5552c13273535  green_palette.png
f742f76baff3b84c726d9b921b8c30b64e8c4d4fc4c0d7c8a0458292467bbf6f  horse.png
0c2324ab4adf58710c608a18f3ef8e687a4e2955de44a2e88295c0d099e6fba2  ihc.png
ee790cc1d9c6528f14c307feabb9417f0c39ba3ce815c3c2c65b6cb0c8a6e794  logo.png
d4f8809a453b4470167f98703d5daad37a5b994123b6bc6759e0c48ee253d14b  microaneurysms.png
a5fcc42fc4a34f0f2e5c87fdfdc10a5966f3df1bfefbbba7489d9c4bab13a504  moon.png
8e5b5fcb6cd6b3bf39cfdb93e93bb646c5fc37b08ff24b8de836bb562e3aff16  motorcycle_left.png
b8d1ddd7f55068f199a362965f504eb528136e9f7ad78a490aafdb05d890e8dc  motorcycle_right.png
c702c62db86cf6c0a43e712c844b137f78c4d9c31527b5063cf5bf85d6102e75  page.png
60f1cf3990989782fc26cd4de7f5d91504dd057280464b744be7d618ffd6f7f5  palette_color.png
101f460e14b13715a32e78f339dd2189a47d8fd8805824ffdccd9bd219c1664a  palette_gray.png
7689063ef8453e31a85e056013e76f793d5e31bf5fa29745177456bbbd7c9274  phantom.png
bd84aa3a6e3c9887850d45d606c96b2e59433fbef50338570b63c319e668e6d1  text.png

and for the files in skimage/registration/tests

653de553b1a57f69534978620794ba57aba7f11b0f6dc67f97148e8305441c41  OriginalX-130Y130.png
ed34208397350e87f773cb51335b8940406a9b0b3d3b52757555826dacee53c3  OriginalX130Y130.png
6bf6b88f97fa6d845ab725723e35c1c71f0ded56c5a5a7f4e64b0e70e418a5fe  OriginalX75Y75.png
900df2cbc38da8b2ff044780b836f146d974ab41d4be05c7e5aeff8211191fc6  TransformedX-130Y130.png
5a6766807d6218165d52283dbee584938b22d28f1180af53abc3427f6262abb4  TransformedX130Y130.png
c61bbf89b33dbbae2946191538a81cadc973811ba6544c7324459ac3627c8b91  TransformedX75Y75.png

You would need to change the existing dictionary entries in this variable to match the new values:

If there were hundreds of values to update it would probably be worth writing a custom script to help with this, but it is probably easiest in this case to just paste the values from the terminal into the file individually.

@pep8speaks
Copy link

pep8speaks commented Aug 30, 2021

Hello @vardaan-raj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 45:42: E231 missing whitespace after ':'
Line 45:80: E501 line too long (109 > 79 characters)
Line 47:41: E231 missing whitespace after ':'
Line 47:80: E501 line too long (108 > 79 characters)
Line 48:42: E231 missing whitespace after ':'
Line 48:80: E501 line too long (109 > 79 characters)
Line 88:80: E501 line too long (93 > 79 characters)
Line 93:80: E501 line too long (98 > 79 characters)
Line 94:80: E501 line too long (91 > 79 characters)
Line 95:80: E501 line too long (96 > 79 characters)
Line 96:80: E501 line too long (90 > 79 characters)
Line 97:80: E501 line too long (89 > 79 characters)
Line 100:80: E501 line too long (89 > 79 characters)
Line 101:80: E501 line too long (89 > 79 characters)
Line 103:80: E501 line too long (87 > 79 characters)
Line 106:80: E501 line too long (98 > 79 characters)
Line 107:80: E501 line too long (88 > 79 characters)
Line 108:80: E501 line too long (99 > 79 characters)
Line 109:80: E501 line too long (100 > 79 characters)
Line 112:80: E501 line too long (88 > 79 characters)
Line 113:80: E501 line too long (91 > 79 characters)
Line 116:80: E501 line too long (90 > 79 characters)
Line 128:80: E501 line too long (97 > 79 characters)
Line 130:80: E501 line too long (97 > 79 characters)
Line 134:80: E501 line too long (105 > 79 characters)
Line 140:80: E501 line too long (91 > 79 characters)
Line 150:80: E501 line too long (120 > 79 characters)
Line 151:80: E501 line too long (119 > 79 characters)
Line 152:80: E501 line too long (117 > 79 characters)
Line 153:80: E501 line too long (123 > 79 characters)
Line 154:80: E501 line too long (122 > 79 characters)
Line 155:80: E501 line too long (120 > 79 characters)

Comment last updated at 2021-08-31 05:21:42 UTC

@vardaan-raj
Copy link
Author

I have just updated the registry to include the new hash values for the modified png images, can you help as to why the checks are failing??

@vardaan-raj
Copy link
Author

I have updated each individual modified png file in the registry with the new checksum value

@vardaan-raj
Copy link
Author

Can someone please update me on what to do next??

many lines containing SHA256 values exceed 80 characters. It will be less readable if we
break up the lines.
see: https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes
skimage/io/test_pil.py's test_palette_is_gray requires this file have mode P, not L
@grlee77
Copy link
Contributor

grlee77 commented Aug 30, 2021

Can someone please update me on what to do next??

There are a mixture of real failures here and ones that are due to the fetcher comparing hashes in this PR to data that was fetched from main. Those SHA256 errors will resolve themselves if we merge.

There is one file, no_time_for_that_tiny.gif whos SHA256 changed, but it hadn't updated in the table

However, there are also a couple of more subtle errors here!

Specifically the "mode" of several of the PNG files (see Pillow docs on mode) has changed. Two of these changes cause test suite failures:
1.) logo.png was RGBA, but the compression here dropped the alpha channel (because it was all ones). However there is a test that explicitly checks that this file has 4 channels.
2.) palette_gray.png had mode P (palette) before, but now has mode L (uint8 grayscale).

Those are the only two mode changes that cause a test suite error, but there are others that got changed from RGB to type L or 1 as well:

measured mode changes

bw_text.png  RGB -> 1
chessboard_GRAY.png   RGB->I
horse.png   RGBA->LA
logo.png  RGBA->RGB
palette_gray.png  P->L
phantom.png   RGB->L

I am not sure we should be making any of those changes even if they are not causing a test suite error.

Modes were determined via navigating to the data folder and running the following script:

from glob import glob

from PIL import Image

img_files = sorted(glob('*.png') + glob('*.gif'))
for f in img_files:
    print(f, Image.open(f).mode)

@vardaan-raj
Copy link
Author

I have updated the sha256 for no_time_for_that_tiny.gif in the registry

ImgBotApp and others added 2 commits August 30, 2021 21:00
*Total -- 176.42kb -> 156.64kb (11.21%)

/skimage/data/palette_gray.png -- 0.91kb -> 0.26kb (71.55%)
/skimage/data/logo.png -- 175.51kb -> 156.38kb (10.9%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
@vardaan-raj
Copy link
Author

What's the next step?

Copy link
Contributor

@grlee77 grlee77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay now. My only question is if we should be even more conservative and not change any of the files under skimage/data.

This one will not pass the CI tests given how the data gets pulled from main so the SHA256 is going to mismatch until we merge this.

@grlee77
Copy link
Contributor

grlee77 commented Sep 2, 2021

What's the next step?

I have approved it now. Let's see if we can get a review from another maintainer before merging.

@rfezzani rfezzani added the 🔧 type: Maintenance Refactoring and maintenance of internals label Oct 25, 2021
@stefanv
Copy link
Member

stefanv commented Feb 6, 2023

I don't think we can make this change, unless we keep the old copies in the repository as well (otherwise older versions of scikit-image won't be able to fetch their data). So, as much as I appreciate the idea behind this PR, I think we'll have to close.

@grlee77 Happy to discuss if you think this is too conservative.

@stefanv stefanv closed this Feb 6, 2023
@grlee77
Copy link
Contributor

grlee77 commented Feb 16, 2023

I don't think we can make this change, unless we keep the old copies in the repository as well (otherwise older versions of scikit-image won't be able to fetch their data).

Hmm, fetch points to the version-specific branch. Those release branches would not be modified by this, so it should still work after the change. I guess I did not consider what the behavior was prior to the introduction of pooch, though? Is that what you are referring to? Overall, agree that this may not be worth the risk of breaking things.

@stefanv
Copy link
Member

stefanv commented Feb 16, 2023

@grlee77 If you think the change is worth it, that it won't cause problems (I was thinking of the registry itself, but sounds like that's not a real concern), and you'd like to shepherd it through, please go for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 type: Maintenance Refactoring and maintenance of internals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants