Global query: wrong repeating #10
Comments
@shkarupa-alex Thanks for your comment. I tested it again but got the correct order of images in the batch for both training and validation. We will test again using different PyTorch versions and hardware to ensure consistency.
@ahatamiz you can reproduce the issue as follows:

```python
batch_size = input.shape[0]
batch_range = np.arange(batch_size, dtype='int32')
np.random.shuffle(batch_range)
input = input[batch_range]
target = target[batch_range]
```

You will always get different top-1 and top-5 accuracy.
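The shuffle test above relies on a simple invariance: a model that treats batch elements independently must produce permuted outputs for permuted inputs, so accuracy cannot change. A minimal sketch of that invariance check, using a placeholder per-sample function (`per_sample_model` is hypothetical, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def per_sample_model(x):
    # Placeholder computation that is independent across batch elements,
    # as a correctly implemented model should be at test time.
    return x * 2.0

x = rng.normal(size=(16, 4))
perm = rng.permutation(16)

out = per_sample_model(x)
out_shuffled = per_sample_model(x[perm])

# Permuting the inputs must only permute the outputs; a batch-order bug
# (like the global-query repeat) breaks this equality.
assert np.allclose(out[perm], out_shuffled)
```

With the repeating bug present, the analogous check on GCViT logits fails for batch sizes greater than 1.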
I ran into the same issue when playing with the current GCViT model code. It does seem that repeating along the batch dimension produces the wrong element ordering.
We indeed refactored our internal version of the code and introduced some bugs while trying to simplify things. Thanks @tridao @shkarupa-alex for helping to identify those issues. We have fixed them and are retraining models to share in the repo. ETA is EOW.
Hi @tridao We have updated our models and re-trained them. In all cases, the accuracy is better than the numbers reported in the paper from our internal checkpoints. In addition, we have conducted a set of stress tests and verified that there is no inconsistency in model performance across different batch sizes at test time. Please check the new model weights and let us know if you face any issues.
Model code has an issue in adapting global query for attention estimation: https://github.com/NVlabs/GCVit/blob/main/models/gc_vit.py#L223
Repeating along the batch dimension results not in the [1, 1, 1, 2, 2, 2, 3, 3, 3] batched element order, but in [1, 2, 3, 1, 2, 3, 1, 2, 3].
At the same time, after window_partition the key and value "windows" are ordered as [1, 1, 1, 2, 2, 2, 3, 3, 3].
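The ordering mismatch is easy to see with numpy, where `np.tile` behaves like torch's `Tensor.repeat` along a dimension and `np.repeat` behaves like `torch.repeat_interleave`:

```python
import numpy as np

# Three batch elements, three windows each.
batch = np.array([1, 2, 3])

# Tensor.repeat-style expansion concatenates whole copies of the batch,
# giving the interleaved order the queries end up in.
tiled = np.tile(batch, 3)

# window_partition stacks all windows of element 1 first, then element 2,
# and so on, which matches repeat_interleave-style expansion.
contiguous = np.repeat(batch, 3)

print(tiled.tolist())       # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(contiguous.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```

So the repeated global queries line up against keys and values belonging to different batch elements.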
You can verify this by evaluating on ImageNet with batch sizes 1 and 16; the results will differ.
In my tests, the results for batch size 1 are always equal, while the results for batch size 16 are always different if we add stochasticity (shuffling batch items and labels simultaneously).
During training this bug leads to a partially wrong attention matrix estimation, because around a (B_//B - 1) / (B_//B) portion of the windows will try to attend to the wrong batch elements.
I suppose that fixing this bug and fine-tuning the existing checkpoints should yield even better metrics.
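A minimal sketch of the fix, again in numpy for illustration: the helper names are hypothetical, and in the actual PyTorch code this corresponds to replacing a `Tensor.repeat` on the batch dimension with `torch.repeat_interleave(..., dim=0)` so the expanded queries match the window ordering:

```python
import numpy as np

def expand_query_buggy(q_global, B_):
    # B_ is the total number of windows across the batch; q_global is (B, D).
    B = q_global.shape[0]
    # np.tile mirrors torch's Tensor.repeat: whole-batch copies, interleaved.
    return np.tile(q_global, (B_ // B, 1))

def expand_query_fixed(q_global, B_):
    B = q_global.shape[0]
    # np.repeat mirrors torch.repeat_interleave: each element contiguous,
    # matching the [1, 1, 1, 2, 2, 2, 3, 3, 3] order after window_partition.
    return np.repeat(q_global, B_ // B, axis=0)

q = np.array([[1], [2], [3]])  # B = 3 samples, D = 1 for illustration
print(expand_query_buggy(q, 9).ravel().tolist())  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(expand_query_fixed(q, 9).ravel().tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```

With batch size 1 both versions coincide (B_ // B copies of the single element), which is why evaluation at batch size 1 gives stable results while larger batches do not.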