
Evaluate models' accuracy when using inference transforms on Tensors (instead of PIL images) #6506

Closed
NicolasHug opened this issue Aug 26, 2022 · 6 comments


NicolasHug commented Aug 26, 2022

The accuracy figures that we currently report for our trained models come from evaluations run on PIL images. However, our inference-time transforms also support Tensors.

In the wild, our users might be passing Tensors to the pre-trained models (instead of PIL images), so it's worth figuring out whether the accuracy is consistent between Tensors and PIL.

Note: we do check that all the transforms are consistent between PIL and Tensors, so hopefully the differences should be minimal. But models are known to learn interpolation details, in particular the use of anti-aliasing. PIL uses anti-aliasing by default and this is what our models were trained on, but we don't pass antialias=True to the Resize transform, so it might be a source of discrepancy.
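As a quick illustration of where the discrepancy can come from (just a sketch, not part of the evaluation in this issue; the image path is illustrative), one can resize the same image through PIL and through the Tensor path with and without anti-aliasing:

```python
import torch
from PIL import Image
from torchvision.transforms import functional as F, InterpolationMode

pil_img = Image.open("example.jpg").convert("RGB")   # illustrative path
tensor_img = F.pil_to_tensor(pil_img)                # uint8, CHW

size = [256]
# PIL resizing always anti-aliases.
pil_resized = F.pil_to_tensor(F.resize(pil_img, size, interpolation=InterpolationMode.BILINEAR))
# Tensor resizing without / with anti-aliasing.
t_no_aa = F.resize(tensor_img, size, interpolation=InterpolationMode.BILINEAR, antialias=False)
t_aa = F.resize(tensor_img, size, interpolation=InterpolationMode.BILINEAR, antialias=True)

# The first gap is expected to be noticeably larger than the second,
# since PIL always anti-aliases.
print((pil_resized.float() - t_no_aa.float()).abs().mean())
print((pil_resized.float() - t_aa.float()).abs().mean())
```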

As discussed internally with @datumbox, figuring that out is part of the transforms rework plan (although it's relevant outside of the rework as well).

cc @vfdev-5 @datumbox

@datumbox (Contributor)

> PIL uses anti-aliasing by default and this is what our models were trained on, but we don't pass antialias=True to the Resize transform, so it might be a source of discrepancy.

@pmeier that's an important point that we need to include in the #6433 PR

@NicolasHug (Member, Author)

Some partial results below using #6508. efficientnet seems like an outlier, but for all the rest, transforming on Tensors led to significantly worse accuracy unless antialias=True is used. Perhaps this warrants a more in-depth analysis, but I wonder if we should change these presets to use antialias=True by default? This might even qualify as a bugfix?


All of these are on the DEFAULT weights, in this order:

- transforms on PIL Images with antialias=True (PIL doesn't support disabling anti-aliasing anyway)
- transforms on Tensors with antialias=False (current default)
- transforms on Tensors with antialias=True

Note: PIL was used for jpeg decoding in all cases - we're only interested in variance w.r.t. the transforms here, not in the decoding (although that might still be worth considering).

mobilenet_v3_large
Test:  Acc@1 75.274 Acc@5 92.564
Test:  Acc@1 74.264 Acc@5 92.232
Test:  Acc@1 75.246 Acc@5 92.580

resnet50
Test:  Acc@1 80.848 Acc@5 95.428
Test:  Acc@1 79.822 Acc@5 94.962
Test:  Acc@1 80.828 Acc@5 95.446

vit
Test:  Acc@1 81.068 Acc@5 95.318
Test:  Acc@1 80.858 Acc@5 95.216
Test:  Acc@1 81.054 Acc@5 95.310

efficientnet_b4
Test:  Acc@1 83.382 Acc@5 96.600
Test:  Acc@1 83.420 Acc@5 96.572
Test:  Acc@1 83.412 Acc@5 96.600

shufflenet_v2_x1_0
Test:  Acc@1 69.348 Acc@5 88.324
Test:  Acc@1 68.280 Acc@5 87.648
Test:  Acc@1 69.324 Acc@5 88.346
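For context, here is a rough sketch of the kind of per-image comparison behind these numbers (the actual evaluation script is in #6508; the model choice and image path below are purely illustrative):

```python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.transforms import functional as F

weights = ResNet50_Weights.DEFAULT
preset = weights.transforms()            # the inference preset under discussion
model = resnet50(weights=weights).eval()

pil_img = Image.open("val_image.jpg").convert("RGB")   # illustrative path
tensor_img = F.pil_to_tensor(pil_img)                   # uint8, CHW

with torch.inference_mode():
    out_pil = model(preset(pil_img).unsqueeze(0))        # PIL path (always anti-aliased)
    out_tensor = model(preset(tensor_img).unsqueeze(0))  # Tensor path (antialias depends on the preset)

# Any gap between the two predictions comes from the transforms, not from the model.
print(out_pil.argmax(1).item(), out_tensor.argmax(1).item())
```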

@datumbox (Contributor)

@NicolasHug thanks for publishing the results; this looks very interesting. It's obvious that antialiasing will have a massive effect on users once they start using the Tensor backend more.

@pmeier @vfdev-5 These are the numbers I quoted in our offline chat today. It's worth noting that if we wanted to ensure that the user has control over the antialiasing option on Transforms, we would have to expose this option on every Transform that uses resize behind the scenes.


NicolasHug commented Sep 28, 2023

Closed by #7949 (background: #7093)


CA4GitHub commented Oct 5, 2023

Re PIL images vs tensors: I was experiencing the issue with antialias set to "warn" in older torchvision versions. After looking at the forward method of the ImageClassification class in torchvision/transforms/_presets.py, I decided to test each step. Using torchvision.transforms.functional.resize I noticed differences between PIL inputs and Tensor inputs. However, for a few tests I ran, it also seemed like the PIL images I used were uint8 with pixel values in the 0-255 range, while the Tensor was float32 with pixel values in the 0.0-1.0 range. Converting the Tensor to uint8 with values in the range 0-255 (i.e. (255 * the_tensor).to(torch.uint8)) made the differences in the resize outputs go away.

I wonder if the real difference is due to using different data types (uint8 vs float32) and/or dynamic ranges. I think it's probably the different data types, but a more thorough investigation would be useful.
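For anyone who wants to reproduce this, here is a small sketch of the kind of check being described (the image path and resize size are illustrative):

```python
import torch
from PIL import Image
from torchvision.transforms import functional as F

pil_img = Image.open("example.jpg").convert("RGB")    # illustrative path
float_tensor = F.to_tensor(pil_img)                   # float32, values in [0.0, 1.0]
uint8_tensor = (255 * float_tensor).to(torch.uint8)   # uint8, values in [0, 255]

size = [224]
pil_out = F.pil_to_tensor(F.resize(pil_img, size))
float_out = F.resize(float_tensor, size)
uint8_out = F.resize(uint8_tensor, size)

# Compare each Tensor path against the PIL output on a common 0-255 scale.
print((pil_out.float() - 255 * float_out).abs().max())
print((pil_out.float() - uint8_out.float()).abs().max())
```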


NicolasHug commented Oct 5, 2023

Hi @CA4GitHub ,

It's not about dtype, it's really about antialiasing. In fact, even if you passed a uint8 tensor image, it would be converted to float internally, resized, and converted back to uint8.

(That's not the case anymore as of... yesterday, with v2.Resize() in 0.16, which properly handles uint8 natively.)
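For completeness, a tiny illustration of the v2 behaviour mentioned here, assuming torchvision 0.16+ (the input is random data just to show dtypes and shapes):

```python
import torch
from torchvision.transforms import v2

resize = v2.Resize(size=(224, 224), antialias=True)

img_uint8 = torch.randint(0, 256, (3, 500, 400), dtype=torch.uint8)
out = resize(img_uint8)

# Per the comment above, v2 resizes uint8 inputs natively and keeps them uint8.
print(out.dtype, out.shape)  # torch.uint8 torch.Size([3, 224, 224])
```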
