Possible inconsistency in data preprocessing #4
Hi @alirezazareian:
By "config files", I think you mean the Detectron2 config files in the configs/detectron2 directory. These config files are only used in the fine-tuning tasks that rely on Detectron2.
VirTex pretraining uses Normalize from albumentations (like our ImageNet supervised pretraining), with its default max_pixel_value=255.0, so the 0-255 images loaded by opencv are rescaled before the mean/std are applied. Also, note that our ImageNet supervised pretraining script is essentially the same as the torchvision pretraining script; we only swap in libraries shared with the rest of our codebase (like albumentations instead of torchvision) and apply some code-style formatting.
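A minimal sketch of why this works out (my own illustration, not from the codebase, assuming albumentations' `Normalize` rescales by its default `max_pixel_value=255.0` before standardizing; the mean/std below are the usual ImageNet values):

```python
import numpy as np

# ImageNet mean/std, expressed for images in the 0-1 range.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize_albumentations_style(img_uint8, max_pixel_value=255.0):
    """Mimic albumentations.Normalize: rescale by max_pixel_value, then standardize."""
    img = img_uint8.astype(np.float32) / max_pixel_value
    return (img - MEAN) / STD

def normalize_torchvision_style(img_uint8):
    """Mimic torchvision: ToTensor() scales to 0-1, then Normalize(MEAN, STD)."""
    img = img_uint8.astype(np.float32) / 255.0
    return (img - MEAN) / STD

img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
a = normalize_albumentations_style(img)
b = normalize_torchvision_style(img)
print(np.allclose(a, b))  # True: the two pipelines agree on 0-255 input
```

With the default `max_pixel_value`, feeding raw 0-255 opencv images into `Normalize` is therefore equivalent to the torchvision convention of scaling to 0-1 first.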
The Detectron2 configs and our VirTex configs follow completely different structures and are meant for very different use cases, so there is a limit to how much similarity we can enforce between them. But rest assured, the models receive inputs normalized with the same mean and std in all cases. For this issue, I will make the normalization steps in the code more explicit.
I pushed 91bfd0c with some inline comments and uniform API calls for normalization.
Thank you for your prompt response. It is much clearer now.
Hi, thank you so much for sharing this code. It is very helpful.
However, I am confused about the data preprocessing configuration. In the config files, Caffe-style image mean and std are specified, but it seems they are not used in the code. Instead, the code seems to hard-code torchvision-style mean and std (here). Can you confirm that both pretraining and fine-tuning use the latter?
Furthermore, I am not sure whether the images are in the 0-255 range or 0-1. For Caffe-style mean and std, they should be 0-255, but with your hard-coded mean and std, they should be 0-1. However, I noticed you use opencv to load images, which loads them in 0-255, and I did not find anywhere in the code that they are rescaled to 0-1, except in supervised pretraining (here).
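To illustrate why the range matters, here is a quick sketch (my own, not from the codebase; the torchvision-style stats are the common ImageNet values, and the Caffe-style BGR means are the ones typically seen in Detectron2 defaults):

```python
import numpy as np

# Torchvision-style ImageNet stats, defined for images in the 0-1 range.
TV_MEAN = np.array([0.485, 0.456, 0.406])
TV_STD = np.array([0.229, 0.224, 0.225])

# Caffe-style BGR means (as in Detectron2 defaults) expect 0-255 inputs,
# with std of 1.0, i.e. mean subtraction only.
CAFFE_MEAN_BGR = np.array([103.530, 116.280, 123.675])

# Mid-gray image, as opencv would load it: 0-255 range.
img = np.full((2, 2, 3), 128, dtype=np.uint8).astype(np.float32)

# Mixing conventions: 0-1 stats applied to 0-255 pixels blows up.
wrong = (img - TV_MEAN) / TV_STD
# Consistent torchvision-style: rescale to 0-1 first.
tv = (img / 255.0 - TV_MEAN) / TV_STD
# Consistent Caffe-style: subtract 0-255 means directly.
caffe = img - CAFFE_MEAN_BGR

print(wrong.max())          # hundreds: clearly out of range
print(np.abs(tv).max())     # well under 1
print(np.abs(caffe).max())  # tens, as expected for 0-255 mean subtraction
```

The mismatch in the first case is large enough that training would likely still limp along after fine-tuning, which is exactly why such an inconsistency could go unnoticed.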
Could you please comment on the issues above? In particular, it is important to make sure the preprocessing is identical across all pretraining and downstream settings. Since you fine-tune all layers and do not freeze the stem, such inconsistencies are hard to notice, because the fine-tuning process would correct for them to some extent.
Thank you so much.