
pre-training details #2

Closed
Huiimin5 opened this issue Apr 11, 2023 · 12 comments


@Huiimin5

Hi,
Could you please specify the meaning of "warmup learning rate=1e-6" in the pre-training stage? Does it mean that the learning rate starts from 1e-6 and linearly grows to 1.5e-4?

Additionally, for an image pair, the homography and color jittering augmentations should be applied to each image independently, right?

Thank you for your attention and time!

@PhilippeWeinzaepfel

Hi,

Thanks for your interest in our work.

Learning rate schedule

Sorry, the row "warmup learning rate=1e-6" in the appendix table is a mistake on our side and should not be there.
During the warm-up of 40 epochs, we linearly increase the learning rate from 0 to 1.5e-4.
We then use a cosine decay until epoch 800, but stop the training at epoch 400 as performance saturates.
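For reference, a minimal sketch of that schedule is given below (assuming an epoch-wise update and a final learning rate of 0; those two details are assumptions, not something stated above):

    import math

    # Minimal sketch of the schedule described above: linear warmup from 0 to
    # 1.5e-4 over 40 epochs, then cosine decay towards 0 at epoch 800
    # (training is stopped at epoch 400). The epoch-wise update and min_lr=0
    # are assumptions.
    def pretrain_lr(epoch, base_lr=1.5e-4, warmup_epochs=40, total_epochs=800, min_lr=0.0):
        if epoch < warmup_epochs:
            return base_lr * epoch / warmup_epochs
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))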

Augmentation

We use independent augmentations for the two images in each pair.
We actually found later that homography does not help at all, and even slightly decreases performance, so you can just ignore it.
For color jittering, we do not augment the hue but use standard values for brightness/contrast/saturation, i.e., ColorJitter(brightness=(0.6, 1.4), contrast=(0.6, 1.4), saturation=(0.6, 1.4), hue=0.0).
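For illustration, this corresponds to something like the following minimal sketch (assuming torchvision; the helper function is just for the example):

    import torchvision.transforms as T

    # Color jittering with the values above: no hue augmentation, standard
    # ranges for brightness/contrast/saturation.
    color_jitter = T.ColorJitter(brightness=(0.6, 1.4), contrast=(0.6, 1.4),
                                 saturation=(0.6, 1.4), hue=0.0)

    def augment_pair(img1, img2):
        # Calling the transform separately on each image samples independent
        # jitter parameters for the two views of the pair.
        return color_jitter(img1), color_jitter(img2)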

Best
Philippe

@Huiimin5
Author

Thank you so much for your clarification.

In Section D.1, it is claimed that the finetuning configurations are consistent with MultiMAE. But in the second table, many hyperparameters on NYUv2 are different, such as the batch size (128 vs. 16), learning rate (3e-5 vs. 1e-4), and number of epochs (1500 vs. 2000). Could you please specify where these configurations come from, or did you set them by searching for the optimal values in your setting? Besides, do you use the MultiMAE code for finetuning?

By the way, in Table 3, the performance of MAE is 79.6 while in MultiMAE, they report 85.1. Could you please specify how you get this number?

Thank you again for your time and attention!

@PhilippeWeinzaepfel

While our codebase for semantic segmentation on ADE20k and Taskonomy is based on the MultiMAE code, the one for NYUv2 has been developed independently. We have not heavily tuned it for any method; the results were obtained by simply changing the pre-training weights, so this might not be the optimal setup for MultiMAE, MAE, or even CroCo. To obtain better performance with CroCo, we could also leverage the decoder and reach over 88 Acc@1.25, see Table 7.

@PhilippeWeinzaepfel

PhilippeWeinzaepfel commented Apr 18, 2023

I have just tried the MultiMAE finetuning code for NYU depth with CroCo pretrained weights, and their finetuning setup indeed seems much better than the one we implemented.
Here is the Acc@1.25 (= delta_1) I obtain:

  • MAE: 85.1
  • CroCo: 87.8

So I would recommend directly using their finetuning code and reporting these numbers.

To be more complete, here are the val_stats output by their scripts:

  • MAE pretrained weights: {'rmse': 3196.1256103515625, 'rel': 0.12942856550216675, 'srel': 546.7084655761719, 'log10': 0.17717715352773666, 'delta_1': 0.8512685894966125, 'delta_2': 0.9644498229026794, 'delta_3': 0.9886034429073334, 'loss': 2.7408722639083862, 'depth_loss': 2.7408722639083862}
  • CroCo pretrained weights: {'rmse': 3025.2979736328125, 'rel': 0.12315808981657028, 'srel': 545.6929016113281, 'log10': 0.17656730860471725, 'delta_1': 0.8780348598957062, 'delta_2': 0.9566936492919922, 'delta_3': 0.9870030283927917, 'loss': 2.4889049530029297, 'depth_loss': 2.4889049530029297}

@Huiimin5
Author

Thank you so much for your detailed explanation.
Could you please also share the downstream depth estimation results initialized with MAE pretrained on the Habitat dataset and finetuned using the MultiMAE codebase?
Thank you again for your time!

@PhilippeWeinzaepfel

With "MAE Habitat", I got 84.0.

{'rmse': 3480.384033203125, 'rel': 0.14437608793377876, 'srel': 706.9155578613281, 'log10': 0.1991322711110115, 'delta_1': 0.8398233950138092, 'delta_2': 0.9496022164821625, 'delta_3': 0.9814011752605438, 'loss': 2.9076149463653564, 'depth_loss': 2.9076149463653564}

@Huiimin5
Author

Thank you so much for providing this result.
It seems I get a different number by finetuning with the checkpoint you provided.
Do you mind sharing the finetuning log of CroCo?
Thank you for your consideration.

@PhilippeWeinzaepfel

PhilippeWeinzaepfel commented Apr 28, 2023

You can find it here. The run was done after converting the weights to the MultiMAE format and using --num_global_tokens 0, as we do not have global tokens in the CroCo architecture.
nyucroco.stdout.txt
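For illustration only, the conversion amounts to remapping the checkpoint state dict into the layout the MultiMAE finetuning script expects; the file names, dictionary keys, and prefixes below are hypothetical, so treat this as a rough sketch rather than the actual conversion:

    import torch

    # Rough sketch (hypothetical keys and file names): load the CroCo
    # pre-trained checkpoint, keep only the encoder weights, and save them in
    # a MultiMAE-style checkpoint dictionary.
    ckpt = torch.load('CroCo.pth', map_location='cpu')
    state_dict = ckpt.get('model', ckpt)

    # Drop decoder/prediction-head weights (prefixes assumed), since only the
    # encoder is reused for finetuning.
    encoder_sd = {k: v for k, v in state_dict.items()
                  if not k.startswith(('dec', 'prediction_head'))}

    torch.save({'model': encoder_sd}, 'croco_multimae_format.pth')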

@Huiimin5
Author

Thank you so much for providing this log file.
I am curious how you got the final evaluation values in the last lines:
[screenshot of the extra evaluation output at the end of nyucroco.stdout.txt]
as the finetuning script ends at line 8790.
Could you please specify how you got these extra output lines?

@PhilippeWeinzaepfel

Yep. At the end of this script, they load the weights of the best model. While it is not present in the code, this is clearly meant to run the test with the best model, as they do in their other finetuning scripts. So I added the following lines of code to do so before running the script:

    # Evaluate the reloaded best model on the test loader and print the metrics
    test_stats = evaluate(model=model, tasks_loss_fn=tasks_loss_fn, data_loader=data_loader_test,
                          device=device, epoch=-1, in_domains=args.in_domains, mode='test', log_images=True,
                          return_all_layers=return_all_layers, standardize_depth=args.standardize_depth)
    print(test_stats)

@Huiimin5
Author

Huiimin5 commented May 1, 2023

Thank you for your clarification, but I find it difficult to understand the huge drop from the online validation-set performance to the test-set performance.
In my understanding, the test set (i.e., validation set) results have already been dumped into log.txt.
In my experiment, these dumped numbers are very close to the corresponding printouts in stdout.txt, with the former synchronized across all machines and the latter unsynchronized.
In other words, the difference between line 8794 and line 8791 in the previous screenshot should be insignificant.
Could you please specify what leads to such a huge performance drop?

I think we can rule out the difference between the best checkpoint and the last checkpoint, as they perform very similarly in my experiment.

As a gentle reminder, the argument max_val_images should be unset to use the full validation set for evaluation. However, in my experiment I did not observe a big performance difference between subset evaluation and full-validation-set evaluation, so I am not sure whether this could explain the performance drop in your case.

Thank you again for your time and patience!

@PhilippeWeinzaepfel

  • Difference between L8794 and L8791

It seems to be due only to the distributed setting.
I had deleted the checkpoints, so I reran the finetuning.
With the new run, I initially get this output:

(Eval) Epoch: [-1] [0/1] eta: 0:00:03 rmse: 2951.8828 (2951.8828) rel: 0.1080 (0.1080) srel: 455.3148 (455.3148) log10: 0.1533 (0.1533) delta_1: 0.9129 (0.9129) delta_2: 0.9736 (0.9736) delta_3: 0.9911 (0.9911) loss: 2.4124 (2.4124) depth_loss: 2.4124 (2.4124) time: 3.2422 data: 2.7493 max mem: 22481 
(Eval) Epoch: [-1] Total time: 0:00:03 (3.3992 s / it) 
* Loss 2.481 
* {'rmse': 3020.5174560546875, 'rel': 0.12364871427416801, 'srel': 550.2191009521484, 'log10': 0.1771184504032135, 'delta_1': 0.878442794084549, 'delta_2': 0.9569390416145325, 'delta_3': 0.9872540533542633, 'loss': 2.4809699058532715, 'depth_loss': 2.4809699058532715}

When enabling the `print` on all processes while testing, I got:

(Test) Epoch: [-1] [0/1] eta: 0:00:06 rmse: 3089.1521 (3089.1521) rel: 0.1393 (0.1393) srel: 645.1234 (645.1234) log10: 0.2010 (0.2010) delta_1: 0.8440 (0.8440) delta_2: 0.9403 (0.9403) delta_3: 0.9834 (0.9834) loss: 2.5496 (2.5496) depth_loss: 2.5496 (2.5496) time: 6.7801 data: 3.4942 max mem: 5493 
(Test) Epoch: [-1] [0/1] eta: 0:00:06 rmse: 2951.8828 (2951.8828) rel: 0.1080 (0.1080) srel: 455.3148 (455.3148) log10: 0.1533 (0.1533) delta_1: 0.9129 (0.9129) delta_2: 0.9736 (0.9736) delta_3: 0.9911 (0.9911) loss: 2.4124 (2.4124) depth_loss: 2.4124 (2.4124) time: 6.6569 data: 3.5448 max mem: 5489 
(Test) Epoch: [-1] Total time: 0:00:06 (6.9482 s / it) 
(Test) Epoch: [-1] Total time: 0:00:06 (6.8161 s / it) 
* Loss 2.481
* Loss 2.481 
{'rmse': 3020.5174560546875, 'rel': 0.12364871427416801, 'srel': 550.2191009521484, 'log10': 0.1771184504032135, 'delta_1': 0.878442794084549, 'delta_2': 0.9569390416145325, 'delta_3': 0.9872540533542633, 'loss': 2.4809699058532715, 'depth_loss': 2.4809699058532715} 
{'rmse': 3020.5174560546875, 'rel': 0.12364871427416801, 'srel': 550.2191009521484, 'log10': 0.1771184504032135, 'delta_1': 0.878442794084549, 'delta_2': 0.9569390416145325, 'delta_3': 0.9872540533542633, 'loss': 2.4809699058532715, 'depth_loss': 2.4809699058532715}

So there is a huge difference between the batches of the two processes; they should also be quite imbalanced in terms of number of images, due to the batch size of 96 (64*1.5) and the dataset size of 100; so doing a global average is probably not really fair.
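To make the averaging issue concrete, here is a toy illustration (the per-process means and image counts below are made up, not the actual ones):

    # Toy example with made-up numbers: averaging the per-process means
    # without weighting them by the number of images each process handled
    # differs from the true per-image average when the split is imbalanced.
    per_process = [
        {'mean_delta_1': 0.90, 'n_images': 96},  # hypothetical split
        {'mean_delta_1': 0.70, 'n_images': 4},
    ]

    unweighted = sum(p['mean_delta_1'] for p in per_process) / len(per_process)
    weighted = (sum(p['mean_delta_1'] * p['n_images'] for p in per_process)
                / sum(p['n_images'] for p in per_process))

    print(unweighted)  # 0.80
    print(weighted)    # 0.892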

I then launched the test script on a single GPU with a 2x larger batch size (which covers the 100 validation/test images) and got:

(Test) Epoch: [-1]  [0/1]  eta: 0:00:10  rmse: 3021.9480 (3021.9480)  rel: 0.1238 (0.1238)  srel: 551.1194 (551.1194)  log10: 0.1789 (0.1789)  delta_1: 0.8781 (0.8781)  delta_2: 0.9568 (0.9568)  delta_3: 0.9872 (0.9872)  loss: 2.4809 (2.4809)  depth_loss: 2.4809 (2.4809)  time: 10.3641  data: 6.1422  max mem: 9358
(Test) Epoch: [-1] Total time: 0:00:10 (10.7133 s / it)
* Loss 2.481
{'rmse': 3021.947998046875, 'rel': 0.12379693239927292, 'srel': 551.1194458007812, 'log10': 0.17894227802753448, 'delta_1': 0.8781160116195679, 'delta_2': 0.9567808508872986, 'delta_3': 0.9872171878814697, 'loss': 2.480867862701416, 'depth_loss': 2.480867862701416}
  • max_val_images
    The MultiMAE setup is anyway not directly comparable to the state of the art, due to the resizing, etc.
    All the numbers above are with max_val_images set to 100, as in their GitHub instructions.
    When validating on all images, I get higher values (on 1 GPU, with a batch_size of 1 so that the global average is correctly computed over all images):
{'rmse': 2364.8145019811227, 'rel': 0.10753235335971602, 'srel': 408.2935388460072, 'log10': 0.13091224415187896, 'delta_1': 0.9001198714153644, 'delta_2': 0.9783268600302842, 'delta_3': 0.9945694150727823, 'loss': 2.678799651130259, 'depth_loss': 2.678799651130259}
