The learning rate of linear classification #38

dddzg · 2020-11-04T13:32:16Z

Thanks for your awesome work.
I wonder why the learning rate is so small for linear classification (0.3 in eval_linear.py).
In MoCo's linear classification, the initial learning rate is 30 with a two-stage reduction, a 100x difference from this repo.
Have you ever run eval_linear.py with MoCo v2 weights, or evaluated SwAV weights with the MoCo code?
I wonder about the performance impact of the lr.

mathildecaron31 · 2020-11-04T14:11:35Z

The different methods (MoCo, SwAV, etc.) produce networks whose feature distributions (e.g., magnitudes) can be very different. That is why we run a grid search over learning rate and weight decay; for our network, lr=0.3 gives the best performance.
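
As a rough illustration of why feature magnitude alone can shift the best learning rate by orders of magnitude, the sketch below trains a binary logistic-regression probe on synthetic features and on the same features scaled by 0.1. With zero initialization and no weight decay, dividing the learning rate by the square of the scale (0.3 becomes 30) reproduces the same logits. The data, the `scale` value, and the helper names `train_logistic` and `logits` are assumptions made for illustration; they are not taken from eval_linear.py or from measured SwAV or MoCo features.

```python
import math
import random


def train_logistic(features, labels, lr, steps=50):
    """Full-batch gradient descent on a binary logistic probe, zero init, no weight decay."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(dim):
                grad[j] += (p - y) * x[j] / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def logits(w, features):
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in features]


rng = random.Random(0)
# Synthetic stand-in for frozen backbone features (assumption, not real activations).
feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
labels = [1.0 if x[0] + 0.5 * x[1] > 0 else 0.0 for x in feats]

scale = 0.1  # hypothetical: a second backbone whose features are 10x smaller
feats_small = [[scale * v for v in x] for x in feats]

base_lr = 0.3
w_a = train_logistic(feats, labels, lr=base_lr)
w_b = train_logistic(feats_small, labels, lr=base_lr / scale**2)  # roughly 30

# Largest difference between the two probes' logits on their own inputs.
max_gap = max(abs(a - b) for a, b in zip(logits(w_a, feats), logits(w_b, feats_small)))
print(max_gap)
```

With a nonzero weight decay held fixed, the equivalence no longer holds exactly, so the two hyperparameters interact.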

dddzg · 2020-11-04T14:49:03Z

Wow, thanks for your response. I am still surprised, though, that there is a 100x learning rate gap between the linear classification experiments.

mathildecaron31 · 2020-11-05T08:52:56Z

This is not that surprising given that the two methods are trained with a different loss, optimizer, learning rate, weight decay, etc. There is no reason for the resulting weight distributions to match.

dddzg · 2020-11-05T09:05:45Z

Thanks again for your response. Does this mean we should be careful when comparing linear classification results across different pre-trained models? For example, in Table 6 of the SwAV paper there are top-1 gaps of about 4% and 10% between MoCo v2 and SwAV on Places205 and iNat18. However, in our experiments we find that the MoCo weights perform badly with a low-lr linear classification on ImageNet.

Were the linear classification results in Table 6 obtained with the same lr for SwAV and MoCo?

mathildecaron31 · 2020-11-05T14:50:32Z

Each method performs its own learning rate grid search to find the best learning rate.
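
A per-method search of this kind can be sketched as follows. This is a minimal, self-contained stand-in that assumes a binary logistic probe on synthetic features; the grids, the names `train_probe`, `accuracy`, and `grid_search`, and the two "backbones" (the same features at two scales) are hypothetical and do not reproduce the actual protocol of either paper.

```python
import itertools
import math
import random


def train_probe(xs, ys, lr, wd, steps=100):
    """Full-batch gradient descent on a binary logistic probe with L2 weight decay."""
    dim = len(xs[0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [wd * wi for wi in w]
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp so exp() stays finite at large lr
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(dim):
                grad[j] += (p - y) * x[j] / len(xs)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def accuracy(w, xs, ys):
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1.0)
               for x, y in zip(xs, ys))
    return hits / len(xs)


def grid_search(train, val, lrs, wds):
    """Return (val_accuracy, lr, wd) for the best configuration on the grid."""
    best = None
    for lr, wd in itertools.product(lrs, wds):
        w = train_probe(*train, lr=lr, wd=wd)
        acc = accuracy(w, *val)
        if best is None or acc > best[0]:
            best = (acc, lr, wd)
    return best


rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(96)]
targets = [1.0 if x[0] - x[2] > 0 else 0.0 for x in points]

lrs = [0.03, 0.3, 30.0]  # illustrative grid only
wds = [0.0, 1e-4]
results = {}
# Each hypothetical backbone gets its own search instead of a shared lr.
for name, scale in [("backbone_a", 1.0), ("backbone_b", 0.1)]:
    feats = [[scale * v for v in x] for x in points]
    train = (feats[:64], targets[:64])
    val = (feats[64:], targets[64:])
    results[name] = grid_search(train, val, lrs, wds)
    print(name, results[name])
```

The point of the design is only that the selection loop runs once per set of pre-trained weights, so no method is evaluated at a learning rate tuned for another.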

dddzg · 2020-11-05T15:07:25Z

Thank you so much!

dddzg closed this as completed Nov 6, 2020
