Use libjpeg-turbo in CI instead of libjpeg #5941
Conversation
If this works, we can probably close #5184, correct?
yup
Yes indeed, that was the reason
Since CI seems to be happy, I think this is good to go. Thanks Nicolas!
LGTM, thanks!
```diff
@@ -1,13 +1,14 @@
 channels:
   - pytorch
   - defaults
+  - conda-forge
```
cc @malfet as previously you've tried to phase out conda-forge from TorchVision.
There are two issues I'm slightly concerned about:
- Would it mean that we are testing something different than what we ship?
- This might unintentionally pull in different dependencies that we think exist in conda, but do not.

If this is only needed for testing, we can build and host libjpeg-turbo for all the platforms we care about in the pytorch or pytorch-nightly channels.
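If libjpeg-turbo were hosted on the pytorch channel, the CI environment file could drop conda-forge entirely. A hypothetical sketch of what that file might then look like (the availability of a `libjpeg-turbo` package on the pytorch channel is an assumption, not the current state):

```yaml
# Hypothetical: assumes libjpeg-turbo is built and hosted on the pytorch channel.
channels:
  - pytorch
  - defaults
dependencies:
  - libjpeg-turbo
```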
Thanks for your feedback, Nikita.

> Would it mean that we are testing something different than what we ship?

Yes, but only until #5951 is merged. #5951's goal is to ship torchvision with libjpeg-turbo. It's also not as bad as it sounds: it's fair at this point to assume that libjpeg-turbo's libjpeg.h is compatible with that of libjpeg. In terms of decoding results, they're both jpeg-compliant as well.
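The "both are jpeg-compliant" claim can be sanity-checked by decoding the same file with each stack and comparing pixels, which is essentially what the CI tests in this PR do. A stdlib-only sketch of that metric (the pixel values below are made up for illustration, not real decoder output):

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equal-length flat images."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Made-up pixel values standing in for libjpeg vs libjpeg-turbo decodes.
decoded_libjpeg = [128, 64, 200, 33, 91]
decoded_turbo   = [128, 65, 199, 33, 91]

diff = mean_abs_diff(decoded_libjpeg, decoded_turbo)
print(diff)  # 0.4
assert diff < 1  # the tolerance the tests below use
```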
> If this is only needed for testing, we can build and host libjpeg-turbo for all the platforms we care about in pytorch or pytorch-nightly channels

Ultimately we don't want this just for testing: we also want to ship torchvision with libjpeg-turbo in #5951. But if we're happy to host libjpeg-turbo in the pytorch channel, then this might make all of this much easier. WDYT @malfet?
```python
if mode == ImageReadMode.GRAY:
    abs_mean_diff = (img_ljpeg.type(torch.float32) - img_pil).abs().mean().item()
    assert abs_mean_diff < 1
```
Not too sure why this happens TBH. Maybe there are some slight differences in our decoding C++ code? But this is not a regression, in fact we're now making this check much stricter than it previously was.
If we merge this, we should test that it has no big effects on the model accuracies. We don't need to test all the models; a sample of 2-3 cases will do.
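A minimal sketch of the kind of spot-check proposed above: run the same model on validation images decoded by each backend and compare top-1 accuracy. This is plain Python with made-up predictions; a real check would use a few pretrained torchvision models:

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up outputs of one model on libjpeg- vs libjpeg-turbo-decoded images.
labels        = [3, 1, 2, 1, 0, 2]
preds_libjpeg = [3, 1, 2, 0, 0, 2]
preds_turbo   = [3, 1, 2, 1, 0, 2]

delta = top1_accuracy(preds_turbo, labels) - top1_accuracy(preds_libjpeg, labels)
print(f"top-1 delta: {delta:+.4f}")
```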
@NicolasHug Yes, I meant to comment on the other PR. :( Sorry for the confusion. Should I post it again on the other one, or are we good?
Minor changes detected in accuracy; needs more discussion.
@NicolasHug I ran a few benchmark checks to see what the effect of this change would be. I do factor in the argument that PIL changed its backend in 9.x, but I think we should carefully measure the effect on existing pre-trained models, the speed improvements we get by switching, and potential alternative approaches. This is one of those changes that are trivial to make on the code side but might have effects that need to be investigated.

**Accuracy Benchmarks**

First, some good news: the effect on the models is detectable but very small. Here is how the model accuracies change with JPEG and JPEG-turbo:

The above were executed on a single GPU with batch=1 to minimize variations. From the tests that your PR is removing, we know that there are differences between the two implementations (noted also in the PIL release notes you reference), but in practice it's mostly noise. It would be worth checking other model families, such as Object Detection, Semantic Segmentation and Optical Flow, to see if any of them is more sensitive to the change. If the differences remain this small and the speed gains are significant, I think there is a compelling case for introducing this minor BC-breaking change. Another argument for doing so is the stability we will get across platforms in our unit tests, and the ability to more easily compare the results of PIL vs TorchVision when reading images.

**Alternative approaches**

There are alternative routes we can take, some of which are temporary. None of them are "free", and we should consider them only if there is a massive effect on the accuracy of the existing models:
```diff
index 40d5e26d..982dd2ba 100644
--- a/torchvision/datasets/folder.py
+++ b/torchvision/datasets/folder.py
@@ -243,9 +243,15 @@ IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tif
 def pil_loader(path: str) -> Image.Image:
     # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835)
+    """
     with open(path, "rb") as f:
         img = Image.open(f)
         return img.convert("RGB")
+    """
+    from torchvision.io.image import read_image
+    from torchvision.transforms.functional import to_pil_image
+    img = read_image(path)
+    return to_pil_image(img).convert("RGB")
```

It's worth noting that this method is not used by most of the datasets right now (they opt for doing
I hope that none of the above will be necessary and that the difference in accuracy will remain minor, allowing us to get away with this minor BC breakage. I would also be OK with merging a PR like this immediately after the upcoming release, to give users plenty of time to detect and flag potential issues.

It's highly likely we would need to rerun inference jobs for all models to correct the metadata and documentation, so this needs to be factored into the amount of work that this switch entails. It might be worth starting a project doc or an RFC on GitHub to record all of this (instead of having these discussions spread across PRs and issues) and keep track of what we need to do. Happy to chat more on this.
Thanks a lot for the benchmarks @datumbox . Just sharing initial thoughts below:
Could you share the exact setup you used to compare libjpeg and libjpeg-turbo? Did you rely on
Unfortunately, our
I tried to isolate the effects of PIL versions as follows:
From the above, I'm fairly confident that our IO
That's a good call out. There are probably no CMYK images in the validation split, and that's why I didn't get an error during inference. I think CMYK is something we should look into eventually; I had put a TODO when I worked on the
This PR makes our CI rely on libjpeg-turbo instead of libjpeg. The main benefit of libjpeg-turbo is decoding speed, but this will also allow us to clean up our tests and make them more robust.
Note: This PR only concerns the CI tests, not the packaging for conda or PyPI. I'll look into that in a separate PR.
We had numerous issues and headaches in the past because of differences between PIL and torchvision and their underlying implementation (libjpeg vs libjpeg-turbo), e.g. #3913, #5910 or #5162.
PIL has shipped with libjpeg-turbo on Windows for a while (python-pillow/Pillow#3833 (comment)), and since PIL 9, the Linux and macOS wheels are also linked against turbo.
Shipping with libjpeg-turbo instead of libjpeg will allow us to be fully aligned with PIL and greatly simplify / clean up our tests, which had become a bit messy (see the amount of removed code in this PR). Note that internally, we already rely on turbo.
Closes #5184
Closes #5162
Closes #3913
Fixes #5910