Possible inconsistency in data preprocessing #4
Hi @alirezazareian:
By "config files", I think you mean the Detectron2 config files in the configs/detectron2 directory. These config files are only used in the fine-tuning tasks that rely on Detectron2.
VirTex pretraining uses Normalize from albumentations (like our ImageNet supervised pretraining), with its default max_pixel_value=255.0, so the 0-255 images loaded by opencv are rescaled before the mean/std are applied. Also, note that our ImageNet supervised pretraining script is essentially the same as the torchvision pretraining script; we only swap in libraries shared with the rest of our codebase (like albumentations instead of torchvision) and apply some code-style formatting.
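A minimal sketch of why this works out (my own illustration, not from the codebase, assuming albumentations' `Normalize` rescales by its default `max_pixel_value=255.0` before standardizing; the mean/std below are the usual ImageNet values):

```python
import numpy as np

# ImageNet mean/std, expressed for images in the 0-1 range.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize_albumentations_style(img_uint8, max_pixel_value=255.0):
    """Mimic albumentations.Normalize: rescale by max_pixel_value, then standardize."""
    img = img_uint8.astype(np.float32) / max_pixel_value
    return (img - MEAN) / STD

def normalize_torchvision_style(img_uint8):
    """Mimic torchvision: ToTensor() scales to 0-1, then Normalize(MEAN, STD)."""
    img = img_uint8.astype(np.float32) / 255.0
    return (img - MEAN) / STD

img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
a = normalize_albumentations_style(img)
b = normalize_torchvision_style(img)
print(np.allclose(a, b))  # True: the two pipelines agree on 0-255 input
```

With the default `max_pixel_value`, feeding raw 0-255 opencv images into `Normalize` is therefore equivalent to the torchvision convention of scaling to 0-1 first.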
The Detectron2 configs and our VirTex configs follow completely different structures and are meant for very different use cases, so there is a limit to how much similarity we can enforce between them. But rest assured, the models receive inputs normalized with the same mean and std in all cases. For this issue, I will make the normalization steps in the code more explicit.
I pushed 91bfd0c with some inline comments and uniform API calls for normalization.
Thank you for your prompt response. It is much clearer now.
Hi, thank you so much for sharing this code. It is very helpful.
However, I am confused about the data preprocessing configuration. In the config files, Caffe-style image mean and std are specified, but it seems they are not used in the code. Instead, the code seems to hard-code torchvision-style mean and std (here). Can you confirm that both pretraining and fine-tuning use the latter?
Furthermore, I am not sure whether the images are in the 0-255 range or 0-1. For Caffe-style mean and std, they should be 0-255, but with your hard-coded mean and std, they should be 0-1. However, I noticed you use opencv to load images, which loads them in 0-255, and I did not find anywhere in the code that they are rescaled to 0-1, except in supervised pretraining (here).
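To illustrate why the range matters, here is a quick sketch (my own, not from the codebase; the torchvision-style stats are the common ImageNet values, and the Caffe-style BGR means are the ones typically seen in Detectron2 defaults):

```python
import numpy as np

# Torchvision-style ImageNet stats, defined for images in the 0-1 range.
TV_MEAN = np.array([0.485, 0.456, 0.406])
TV_STD = np.array([0.229, 0.224, 0.225])

# Caffe-style BGR means (as in Detectron2 defaults) expect 0-255 inputs,
# with std of 1.0, i.e. mean subtraction only.
CAFFE_MEAN_BGR = np.array([103.530, 116.280, 123.675])

# Mid-gray image, as opencv would load it: 0-255 range.
img = np.full((2, 2, 3), 128, dtype=np.uint8).astype(np.float32)

# Mixing conventions: 0-1 stats applied to 0-255 pixels blows up.
wrong = (img - TV_MEAN) / TV_STD
# Consistent torchvision-style: rescale to 0-1 first.
tv = (img / 255.0 - TV_MEAN) / TV_STD
# Consistent Caffe-style: subtract 0-255 means directly.
caffe = img - CAFFE_MEAN_BGR

print(wrong.max())          # hundreds: clearly out of range
print(np.abs(tv).max())     # well under 1
print(np.abs(caffe).max())  # tens, as expected for 0-255 mean subtraction
```

The mismatch in the first case is large enough that training would likely still limp along after fine-tuning, which is exactly why such an inconsistency could go unnoticed.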
Could you please comment on the issues above? In particular, it is important to make sure the preprocessing is identical across all pretraining and downstream settings. Since you fine-tune all layers and do not freeze the stem, such inconsistencies are hard to notice, because the fine-tuning process would correct for them to some extent.
Thank you so much.