
Setup for CIFAR-10 LARGE #7

Closed
zaccharieramzi opened this issue Apr 7, 2021 · 4 comments

@zaccharieramzi

Hi,

I am currently trying to replicate the MDEQ results on CIFAR-10 using the LARGE configuration.
I noticed discrepancies between the supplementary material of the paper and the config file in the code.

I wanted to know which config allowed you to reach the 93.8% top-1 accuracy.
The discrepancies I noted concern: batch size, weight decay, # channels, # epochs, thresholds, and dropout.
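For reference, a rough sketch of how the two setups can be diffed, assuming the paper values are transcribed into a separate YAML file; both file names below are placeholders, not the actual paths in the repo:

```python
import yaml  # PyYAML


def flatten(cfg, prefix=""):
    """Flatten a nested config dict into {"A.B.C": value} pairs."""
    out = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{name}."))
        else:
            out[name] = value
    return out


# Placeholder paths: the repo's LARGE config vs. the values copied
# from the paper's supplementary material.
with open("cls_mdeq_LARGE.yaml") as f:
    repo_cfg = flatten(yaml.safe_load(f))
with open("paper_large.yaml") as f:
    paper_cfg = flatten(yaml.safe_load(f))

# Print every entry that differs between the two configs.
for key in sorted(set(repo_cfg) | set(paper_cfg)):
    if repo_cfg.get(key) != paper_cfg.get(key):
        print(f"{key}: repo={repo_cfg.get(key)!r} paper={paper_cfg.get(key)!r}")
```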

Currently, with the content of the config file unchanged (except paths, # epochs, and resume), this is my learning curve (I ran it in two steps because I had originally set 50 epochs):

[Image: cifar_large learning curve]

@jerrybai1995
Member

Hi @zaccharieramzi ,

Thanks for the question! You should use the settings in the code.

In general, it's very important to keep the number of epochs the same. I believe you should be able to get the same level of results with the settings in the code (e.g., #1 (comment)). My impression is that your accuracy should be ~91.5% after ~100 epochs and ~92.5% after ~150-160 epochs, so maybe you can compare your log against that.

@zaccharieramzi
Author

I don't quite understand why the number of epochs would change the early behavior, especially cause such a huge drop.
Just to clarify: the blue and orange curves come from the same training; the blue one is the continuation of the orange one from epoch 50 to ~60.

Nonetheless, I will relaunch a full run with 220 epochs to make sure everything matches the original setup except the paths.

@jerrybai1995
Member

jerrybai1995 commented Apr 8, 2021

Not sure about the exact settings you are running with, but the code uses cosine annealing learning rate decay (see https://github.com/locuslab/mdeq/blob/master/tools/cls_train.py#L203). If you only change the "END_EPOCH" entry in the config file to 50, then that will indeed not be the "same" training process as the 220-epoch version.
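For illustration, a minimal sketch of why END_EPOCH matters, using the standard `torch.optim.lr_scheduler.CosineAnnealingLR` as a stand-in for whatever scheduler the training script actually builds at that line (the base LR of 0.1 is an arbitrary assumption):

```python
import torch

# Dummy parameter just so an optimizer/scheduler can be built;
# the only point is to show how T_max stretches the cosine decay.
params = [torch.nn.Parameter(torch.zeros(1))]


def lr_at_epoch(total_epochs, epoch, base_lr=0.1):
    """Learning rate reached at `epoch` when the cosine schedule spans `total_epochs`."""
    opt = torch.optim.SGD(params, lr=base_lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_epochs)
    for _ in range(epoch):
        opt.step()
        sched.step()
    return sched.get_last_lr()[0]


# With END_EPOCH=50 the LR has almost fully decayed by epoch 40,
# while with END_EPOCH=220 it is still close to the base LR:
print(lr_at_epoch(50, 40))   # ~0.0095
print(lr_at_epoch(220, 40))  # ~0.092
```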

@zaccharieramzi
Author

zaccharieramzi commented Apr 8, 2021

Surely this must be it!
I had the segmentation code in mind, which I think bypasses this cosine annealing schedule with the adjust_learning_rate function, which simply hard-sets the learning rate.

Indeed, if classification doesn't use such a mechanism, then the decay won't be correctly tuned when the number of epochs changes.
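For context, this is roughly the kind of hard-setting helper I had in mind; the polynomial decay and the signature below are just an illustration of the pattern, not necessarily the exact function in the segmentation code:

```python
def adjust_learning_rate(optimizer, base_lr, max_iters, cur_iters, power=0.9):
    """Hard-set the learning rate following a polynomial decay schedule.

    Illustrative sketch only: the exponent and signature are assumptions
    about how HRNet-style segmentation loops typically handle the LR.
    """
    lr = base_lr * (1 - cur_iters / max_iters) ** power
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    return lr
```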
My 220-epoch training is approaching the point where it failed last time, so let's see how it goes this time.

I will update this issue as soon as I get the info.

EDIT

It's indeed working (at least I don't see the drop I was seeing before). Thanks for being so patient; closing this.
[Image: working_cifar_large learning curve]
