pre-training details #2

Hi,
Could you please specify the meaning of "warmup learning rate=1e-6" in the pre-training stage? Does it mean that the learning rate starts at 1e-6 and grows linearly to 1.5e-4?
Additionally, for an image pair, the homography and color-jittering augmentations should be applied to each image independently, right?
Thank you for your attention and time!

Comments
Hi, thanks for your interest in our work.

Learning rate schedule: sorry, the row "warmup learning rate=1e-6" in the appendix table is a mistake on our side and should not be there.

Augmentation: we use independent augmentations for the two images in each pair.

Best
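In case it is useful, here is a minimal sketch of what we mean by independent augmentations; the specific transforms and parameters below are only illustrative, not our exact pipeline:

```python
import torchvision.transforms as T

# Illustrative transforms only; the actual CroCo augmentation parameters may differ.
augment = T.Compose([
    T.RandomPerspective(distortion_scale=0.2, p=1.0),  # homography-like warp
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.ToTensor(),
])

def augment_pair(img1, img2):
    # Calling the transform separately on each image draws the random
    # warp and jitter parameters independently for the two views.
    return augment(img1), augment(img2)
```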
Thank you so much for your clarification. In Section D.1, it is claimed that the finetuning configurations are consistent with MultiMAE, but in the second table several hyperparameters on NYUv2 differ, such as batch size (128 vs. 16), learning rate (3e-5 vs. 1e-4), and number of epochs (1500 vs. 2000). Could you please specify where these configurations come from, or did you set them by searching for the optimum in your setting? Besides, do you use the MultiMAE code for finetuning? By the way, in Table 3 the performance of MAE is 79.6, while MultiMAE reports 85.1. Could you please specify how you obtained this number? Thank you again for your time and attention!
While our codebase for semantic segmentation on ADE20K and Taskonomy is based on the MultiMAE code, the one for NYUv2 was developed independently. We have not heavily tuned it for any method; results were obtained by simply swapping the pre-training weights, so this may not be the optimal setup for MultiMAE, MAE, or even CroCo. To obtain better performance with CroCo, we can also leverage the decoder and reach over 88 Acc@1.25; see Table 7.
I have just tried the MultiMAE finetuning code for NYU depth with CroCo pre-trained weights, and their finetuning setup indeed seems much better than the one we implemented.
So I would recommend directly using their finetuning code and reporting those numbers. To be more complete, here are the val_stats output by their scripts:
Thank you so much for your detailed explanation.
With "MAE Habitat", I got 84.0. {'rmse': 3480.384033203125, 'rel': 0.14437608793377876, 'srel': 706.9155578613281, 'log10': 0.1991322711110115, 'delta_1': 0.8398233950138092, 'delta_2': 0.9496022164821625, 'delta_3': 0.9814011752605438, 'loss': 2.9076149463653564, 'depth_loss': 2.9076149463653564} |
Thank you so much for providing this result.
You can find it here. The run was done after converting the weights to the MultiMAE format, and using […]
Yep. At the end of this script, they load the weights of the best model. While it is not present in the code, the intent is clearly to run the test with the best model, as they do in their other finetuning scripts. So I added the lines doing so before running the script:

```python
test_stats = evaluate(model=model, tasks_loss_fn=tasks_loss_fn, data_loader=data_loader_test,
                      device=device, epoch=-1, in_domains=args.in_domains, mode='test', log_images=True,
                      return_all_layers=return_all_layers, standardize_depth=args.standardize_depth)
print(test_stats)
```
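For what it's worth, my reading of these arguments (inferred from the snippet, not verified against the MultiMAE internals): `mode='test'` makes `evaluate` run on the test split via `data_loader_test`, and `epoch=-1` is presumably only used for logging, marking the call as happening outside the training loop.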
Thank you for your clarification, but I find it difficult to understand the huge drop from online validation-set performance to test-set performance. I think we can rule out the difference between the best and the last checkpoint, as they perform very similarly in my experiments. As a gentle reminder, the argument `max_val_images` should be left unset to use the full validation set for evaluation, but in my experiments I did not observe a big performance difference between subset evaluation and full-validation-set evaluation, so I am not sure this explains the performance drop in your case. Thank you again for your time and patience!
It seems to be due only to the distributed setting.
When enabling `print` on all the processes while testing, I observed a huge difference between the batches of each process. They should also be quite imbalanced in terms of number of images, given the batch size of 96 (64 × 1.5) and the dataset size of 100, so taking a global average is probably not really fair. I then launched the test script on a single GPU with a 2x larger batch size (which covers all 100 validation/test images) and got:
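To make the averaging issue concrete, here is a toy sketch; the shard sizes and per-process values are made up for illustration, not taken from the actual run. Averaging per-process means gives a small shard the same weight as a full one, which biases the global figure:

```python
# Toy illustration: 100 images unevenly sharded across 2 processes,
# e.g. 96 images on one process and the 4 remaining on the other.
shard_sizes = [96, 4]
shard_means = [0.85, 0.60]  # hypothetical per-process delta_1 values

# Unweighted average of per-process means (what a naive global average does):
naive = sum(shard_means) / len(shard_means)  # 0.725

# Size-weighted average (the correct dataset-level figure):
weighted = sum(m * n for m, n in zip(shard_means, shard_sizes)) / sum(shard_sizes)  # 0.84

print(naive, weighted)
```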