Origin of the means and stds used for preprocessing? #1439

Closed

pmeier opened this issue Oct 9, 2019 · 17 comments
Comments

@pmeier
Collaborator

pmeier commented Oct 9, 2019

Does anyone remember how exactly we came about the channel means and stds we use for the preprocessing?

transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

I think the first mention of the preprocessing in this repo is in #39. In that issue @soumith points to https://github.com/pytorch/examples/tree/master/imagenet for reference. If you look at the history of main.py, the commit pytorch/examples@27e2a46 first introduced the values. Unfortunately, it contains no explanation, hence my question.

Specifically, I'm seeking answers to the following questions:

  • Are these values rounded, floored, or even ceiled?
  • Did we use only the images in the training set of ImageNet or additionally the images of the validation set?
  • Did we perform any kind of resizing or cropping on each image before the calculations were performed?

I've tested some combinations and will post my results here.

| Parameters | mean | std |
| --- | --- | --- |
| train set only, no resizing / cropping | [0.4803, 0.4569, 0.4083] | [0.2806, 0.2736, 0.2877] |
| train set only, resize to 256 and center crop to 224 | [0.4845, 0.4541, 0.4025] | [0.2724, 0.2637, 0.2761] |
| train set only, center crop to 224 | [0.4701, 0.4340, 0.3832] | [0.2845, 0.2733, 0.2805] |

While the means match fairly well, the stds differ significantly.
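
For reference, dataset-wide per-channel statistics like the ones in the table can be computed with something along the lines of the following sketch. The ImageNet root path, batch size, and worker count are placeholders, and the transform corresponds to the second row:

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms as T

transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
dataset = datasets.ImageNet(".", split="train", transform=transform)
loader = DataLoader(dataset, batch_size=256, num_workers=8)

# accumulate per-channel sums over all pixels of the training set
n_pixels = 0
channel_sum = torch.zeros(3)
channel_sum_sq = torch.zeros(3)
for imgs, _ in loader:
    n_pixels += imgs.numel() // imgs.shape[1]
    channel_sum += imgs.sum(dim=(0, 2, 3))
    channel_sum_sq += (imgs ** 2).sum(dim=(0, 2, 3))

mean = channel_sum / n_pixels
std = (channel_sum_sq / n_pixels - mean ** 2).sqrt()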


Update:

The process for obtaining the values of mean and std was roughly equivalent to the following, but the concrete subset that was used is lost:

import torch
from torchvision import datasets, transforms as T

transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.PILToTensor(), T.ConvertImageDtype(torch.float)])
dataset = datasets.ImageNet(".", split="train", transform=transform)

# `subset` stands for the unknown random subset of the training images;
# this is the part of the original procedure that is lost.
means = []
stds = []
for img, _ in subset(dataset):
    # per-image, per-channel statistics
    means.append(img.mean(dim=(1, 2)))
    stds.append(img.std(dim=(1, 2)))

# average the per-image statistics over the subset
mean = torch.stack(means).mean(dim=0)
std = torch.stack(stds).mean(dim=0)

See #1965 for the reproduction experiments.

@nizhib

nizhib commented Oct 10, 2019

You need to go deeper ;)

https://github.com/facebook/fb.resnet.torch/blob/master/datasets/imagenet.lua

-- Computed from random subset of ImageNet training images
local meanstd = {
   mean = { 0.485, 0.456, 0.406 },
   std = { 0.229, 0.224, 0.225 },
}

@pmeier
Collaborator Author

pmeier commented Oct 11, 2019

For my project I need to know the covariances between the channels. Since they are not part of the current implementation, my hope was that I could calculate them myself if I knew the images and preprocessing that were used. Unfortunately

random subset

gives me little hope that I'll be able to do that. I suppose no one remembers how this random subset was selected?

Should we investigate this further? I'm a little anxious that we simply use this normalization for all our models without being able to reproduce it.

@fmassa
Member

fmassa commented Oct 14, 2019

@colesbury do you have more information here to clarify the mean / std for ImageNet that we use?

@soumith
Member

soumith commented Oct 14, 2019

afaik we calculated the mean / std to use by running one pass over the training set of ImageNet

@soumith
Member

soumith commented Oct 14, 2019

that being said, I see that the std is not matching. Possibly a bug of the past or some detail that we completely forgot about :-/

@apple2373

Can we put a batch normalization layer before the input so that the mean / std are computed automatically at training time?
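
A minimal sketch of that idea, prepending a BatchNorm2d over the three input channels to an off-the-shelf model (the choice of resnet18 is only illustrative):

import torch
from torch import nn
from torchvision import models

# learn (running estimates of) the input statistics instead of using fixed constants
model = nn.Sequential(
    nn.BatchNorm2d(3),
    models.resnet18(),
)

x = torch.rand(2, 3, 224, 224)  # images in [0, 1], no Normalize transform applied
print(model(x).shape)           # torch.Size([2, 1000])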

@pmeier
Collaborator Author

pmeier commented Oct 21, 2019

@apple2373 We are currently implementing the transforms for tensors in order to be able to use them within a model (see #1375). Whether we want to include them within the models is AFAIK still up for discussion (see #782).
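
For illustration, normalization baked into a model could look roughly like the sketch below; the InputNormalization module is hypothetical and not part of torchvision:

import torch
from torch import nn

class InputNormalization(nn.Module):
    # hypothetical module applying the fixed mean / std to a batch of tensors
    def __init__(self, mean, std):
        super().__init__()
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std

normalize = InputNormalization(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
x = torch.rand(2, 3, 224, 224)
print(normalize(x).shape)  # torch.Size([2, 3, 224, 224])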

@pmeier
Collaborator Author

pmeier commented Oct 21, 2019

@fmassa @soumith

Any update on this? Do we investigate further or keep it as is?

@fmassa
Member

fmassa commented Oct 21, 2019

@pmeier I don't know if we will ever be able to get back those numbers, given that they seem to have been computed on a randomly-sampled part of the dataset.

If we really want to see if this has any impact, we would need to run multiple end-to-end trainings with the new mean / std and see if they bring any noticeable improvement.

@pmeier
Collaborator Author

pmeier commented Oct 21, 2019

I don't think we would get a significant improvement (or decline) in performance. I just think we shouldn't use numbers that are not reproducible. A change like this is of course a lot of work, BC-breaking, etc., but we don't know what the future brings. Maybe this becomes significant in the future, and then it's even harder to correct.

@fmassa
Member

fmassa commented Oct 21, 2019

I don't think we would get a significant improvement (or decline) in performance. I just think we shouldn't use numbers that are not reproducible.

I agree. But given the scale of how things would break with such a change, I think we should just live with it for now, and maybe document somewhere the findings you have shown here.

@colesbury
Member

It's been almost four years, so I don't remember, but I probably just used the mean / std from the previous Lua ImageNet training script:

https://github.com/soumith/imagenet-multiGPU.torch/blob/deb5466a16e54ec7a69fe027e5fbcd3c1bfb49cc/donkey.lua#L161-L187

It uses the average standard deviation of an individual image's channels instead of an estimate of the standard deviation across the entire dataset.
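
A small illustration of the difference between the two estimators, with random data standing in for real images:

import torch

imgs = torch.rand(100, 3, 32, 32)  # fake "dataset", purely illustrative

# (a) average of per-image, per-channel standard deviations
per_image_std = imgs.reshape(100, 3, -1).std(dim=2).mean(dim=0)

# (b) standard deviation over all pixels of the whole dataset
dataset_std = imgs.transpose(0, 1).reshape(3, -1).std(dim=1)

print(per_image_std, dataset_std)

For real images (b) comes out larger than (a), because it also captures the variation of the per-image means across the dataset, which is consistent with the std mismatch reported above.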

I don't think we should change the mean / std, nor do I see any reproducibility issue. The scientific result here is the neural network, not the mean / std values, especially since the exact choice does not matter as long as the values approximately whiten the input.

@nizhib

nizhib commented Oct 23, 2019

A change like this is of course a lot of work, BC breaking etc, but we don't know what the future brings.

These numbers have become a standard for most neural networks created so far, so it's not just a lot of work: one would need to retrain hundreds of neural networks (approx. 2 GPU-weeks each for a model like ResNet-50) and create pull requests for all the pretrainedmodels / DPN / Wide ResNet / etc. repos all over GitHub, just to adjust the normalizing std by 0.05. What future could justify this?

@fmassa
Member

fmassa commented Oct 25, 2019

Following the discussion that we had here, I agree with @colesbury's and @nizhib's points above.

@pmeier would you like to send a PR adding some summary of the discussion that we had here, including @colesbury's comment on how those numbers were obtained?

@pmeier
Collaborator Author

pmeier commented Oct 28, 2019

I'm fully booked for the next few weeks, so this will take some time.

@Stannislav

Maybe the reason the stds don't match is that torch.std was originally called with unbiased=False?
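
For context, the two variants only differ by the Bessel correction (illustrative values only):

import torch

x = torch.rand(224 * 224)        # one channel's worth of pixels
print(x.std(unbiased=True))      # sample std, divides by n - 1 (the default)
print(x.std(unbiased=False))     # population std, divides by n
# the two differ by a factor of sqrt(n / (n - 1)), which is tiny for n = 224 * 224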

@pmeier
Collaborator Author

pmeier commented Jul 7, 2020

@Stannislav In #1965 I've managed to get pretty close to the original numbers.
