
Preprocessing for pretrained models? #39

Closed
jcjohnson opened this issue Jan 22, 2017 · 12 comments

@jcjohnson commented Jan 22, 2017

What kind of image preprocessing is expected for the pretrained models? I couldn't find this documented anywhere.

If I had to guess, I would assume that they expect RGB images with the mean/std normalization used in fb.resnet.torch and pytorch/examples/imagenet. Is this correct?

@soumith (Member) commented Jan 24, 2017

Yes, the mean/std normalization used in pytorch/examples/imagenet is what is expected. I'll document it now.

@Atcold commented Feb 8, 2017

@soumith, are you referring to this documentation? -> http://pytorch.org/docs/torchvision/models.html
I cannot find any reference there to preprocessing the images.
I think the network object should have a preprocessing attribute where those values are stored. Moreover, it should also have a classes attribute that lets you map the output's max index to the class name.
As they are right now, the models are hardly usable.
Finally, most of the time these nets are retrained, so it would be nice to have a method that lets you replace the final classifier.

Here is a link to the required preprocessing -> https://github.com/pytorch/examples/blob/master/imagenet/main.py#L92-L93
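
On that last point, a minimal sketch of how one might swap the final classifier on a torchvision ResNet; the fc attribute name is specific to torchvision's ResNet implementation, and num_classes is a hypothetical target-task size:

import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)

# Read in_features from the existing head so the sizes match,
# then replace the 1000-way ImageNet classifier with a fresh layer.
num_classes = 10  # hypothetical new task
model.fc = nn.Linear(model.fc.in_features, num_classes)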

@soumith (Member) commented Mar 18, 2017

Documented in the README of vision now.

https://github.com/pytorch/vision/blob/master/README.rst#models

soumith closed this Mar 18, 2017

@jianchao-li commented Jul 10, 2018

Replying here for easy reference:

from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

  • For training images

preprocessing = transforms.Compose([
    transforms.RandomResizedCrop(224),  # named RandomSizedCrop in older torchvision
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

  • For validation images

preprocessing = transforms.Compose([
    transforms.Resize(256),  # named Scale in older torchvision
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
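
For completeness, an assumed usage of these transforms with torchvision's ImageFolder (the dataset path is hypothetical):

from torchvision import datasets

# The transform runs on every image as the dataset loads it
train_set = datasets.ImageFolder('/path/to/imagenet/train', transform=preprocessing)
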
@youkaichao commented Jul 27, 2018

Would it be better to keep the mean and std inside the torchvision models? It is annoying to carry these magic numbers around in the code.

@fmassa (Member) commented Jul 30, 2018

@youkaichao this is a good point, and the pre-trained models should have something like that.
But that's not all of it: there are other underlying assumptions that should be made explicit as well (the image is RGB in the 0-1 range, even though that's the current default in PyTorch).
But I'm open to suggestions. I'm not sure where we should include such information: should it be in the state_dict of the serialized models (to be read by some special mechanism)? Should it be hard-coded in the model implementation?

@youkaichao commented Jul 30, 2018

@fmassa how about registering mean and std as buffers?
As for the input range, you could print a line saying "accepted images are in the range [0, 1]" at initialization.
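
For concreteness, a minimal sketch of the buffer idea; this is not actual torchvision code, and the wrapper name and the choice to normalize inside forward() are assumptions:

import torch
import torch.nn as nn

class NormalizedModel(nn.Module):
    # Wraps any model and applies ImageNet normalization inside forward().
    def __init__(self, model):
        super(NormalizedModel, self).__init__()
        self.model = model
        # Buffers travel with state_dict() and .to(device), unlike plain attributes
        self.register_buffer('mean', torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):
        # Assumes x is a batch of RGB images in the [0, 1] range
        return self.model((x - self.mean) / self.std)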

@fmassa (Member) commented Jul 30, 2018

Registering them as a buffer is an option, but that also means that we would either need to change the way we do image normalization (which is currently handled in a transform) and do it in the model, or find a way of loading the state dict into a transform.

Both solutions are backwards-incompatible, so I'm not very happy with them...

@youkaichao commented Jul 30, 2018

@fmassa you could add an __init__ parameter like pre_process=False, with the default value kept for backwards compatibility; if pre_process=True, the model uses the registered buffers. That way, users get the pre-defined preprocessing by setting a single boolean flag, which seems much better than hunting for the exact mean and std values everywhere.
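
A hypothetical sketch of that flag, reusing the NormalizedModel wrapper sketched in the earlier comment; the factory-function shape and the pre_process name are assumptions, not torchvision API:

import torchvision.models as models

def resnet18(pretrained=False, pre_process=False):
    model = models.resnet18(pretrained=pretrained)
    # Opt-in wrapping keeps existing code, which normalizes in a transform, unchanged
    return NormalizedModel(model) if pre_process else model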

@fmassa (Member) commented Jul 30, 2018

Well, the good thing about torchvision models is that (almost) all of them use the same pre-processing values.

Also, it's a bit more involved than that: before, one could just load the model using load_state_dict, but if we add extra buffers, old users might need to load with strict=False, or else their loading code will crash.
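
For reference, a runnable illustration of the strict=False escape hatch being described; the model and checkpoint here are stand-ins:

import torchvision.models as models

model = models.resnet18()
# Stand-in for a checkpoint saved before any normalization buffers existed
old_state_dict = model.state_dict()
# strict=False skips missing/unexpected keys instead of raising an error
model.load_state_dict(old_state_dict, strict=False)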

@gursimar commented Oct 29, 2018

Hi, I want to extract features from the pre-trained ResNet pool5 and res5c layers.
I'm using frames (RGB values) extracted from the TGIF-QA dataset (GIFs).

  1. Should I transform my images using the values specified above?
  2. I'm using the following preprocessing. Is this okay for my purpose?

from torchvision import transforms

IMAGE_SIZE = 224  # defined elsewhere in the original snippet; presumably 224 to match the models

loader = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(IMAGE_SIZE),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
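
A possible sketch of the extraction itself using forward hooks; the layer names follow torchvision's ResNet implementation (layer4 is the res5 stage, avgpool the global pool), and the file name and hook helper are illustrative, not from this thread:

import torch
import torchvision.models as models
from PIL import Image

resnet = models.resnet50(pretrained=True).eval()

features = {}
def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

resnet.layer4.register_forward_hook(save_output('res5c'))   # last residual stage
resnet.avgpool.register_forward_hook(save_output('pool5'))  # global average pool

img = Image.open('frame.jpg').convert('RGB')  # hypothetical extracted frame
with torch.no_grad():
    resnet(loader(img).unsqueeze(0))  # loader is the Compose defined above
# features['pool5'] has shape (1, 2048, 1, 1); features['res5c'] is (1, 2048, 7, 7)
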
@fmassa (Member) commented Oct 30, 2018

@gursimar yes, it should be fine.
