This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

The learning rate of linear classification #38

Closed
dddzg opened this issue Nov 4, 2020 · 6 comments

Comments

@dddzg

dddzg commented Nov 4, 2020

Thanks for your awesome work.
I wonder why the learning rate is so small for linear classification (0.3 in eval_linear.py).
In MoCo's linear classification, the initial learning rate is 30 with a two-stage reduction, a 100x difference from this repo.
Have you ever run eval_linear.py with MoCo v2 weights, or evaluated SwAV weights with the MoCo code?
I am curious how much the lr affects performance.

@mathildecaron31
Contributor

The different methods (MoCo, SwAV, etc.) produce networks whose feature distributions (e.g., magnitudes) can be very different. That is why we run a grid search over learning rate and weight decay, and we find that lr=0.3 gives the best performance for our network.
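
A minimal sketch of what such a grid search over learning rate and weight decay can look like. This is not the code used in this repo: `grid_search`, `toy_evaluate`, and the candidate values are hypothetical, and `toy_evaluate` only stands in for "train a linear classifier on frozen features and return validation accuracy".

```python
import itertools


def grid_search(evaluate, lrs, weight_decays):
    """Try every (lr, wd) pair and return (best_score, best_lr, best_wd)."""
    best = None
    for lr, wd in itertools.product(lrs, weight_decays):
        score = evaluate(lr, wd)  # higher is better, e.g. validation top-1
        if best is None or score > best[0]:
            best = (score, lr, wd)
    return best


def toy_evaluate(lr, wd):
    """Hypothetical stand-in for a real linear-probe run (peaks at lr=0.3, wd=0)."""
    return -abs(lr - 0.3) - 10 * wd


# Candidate values are illustrative assumptions, not the grid used by the authors.
best_score, best_lr, best_wd = grid_search(
    toy_evaluate,
    lrs=[0.03, 0.3, 3.0, 30.0],
    weight_decays=[0.0, 1e-6, 1e-4],
)
print(best_lr, best_wd)
```

In practice `evaluate` would wrap a full linear evaluation run, so the grid is usually kept small and log-spaced.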

@dddzg
Author

dddzg commented Nov 4, 2020

Wow, thanks for your response. I am still surprised, though, that there is a 100x learning rate gap between the linear classification experiments.

@mathildecaron31
Contributor

This is not that surprising given that the two methods are trained with different losses, optimizers, learning rates, weight decays, etc. There is no reason for the resulting weight distributions to match.
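
A toy sketch of how feature magnitude alone can move the best learning rate by two orders of magnitude. It is not taken from either repository and does not claim to explain the actual SwAV/MoCo gap. It assumes a bias-free logistic regression trained with plain full-batch gradient descent, zero initialization, and no momentum or weight decay; the data and the helper `train_logistic` are made up. Under those assumptions, scaling the features by `c` and the learning rate by `1 / c**2` leaves the logits unchanged, so features that are 10x smaller call for a 100x larger lr.

```python
import math


def train_logistic(xs, ys, lr, steps):
    """Full-batch gradient descent on a bias-free logistic regression.

    Returns the logits on the training points after `steps` updates.
    """
    dim = len(xs[0])
    w = [0.0] * dim  # zero initialization
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(dim):
                grad[j] += (p - y) * x[j] / len(xs)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]


# Made-up 2-D "features" and binary labels.
xs = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3], [-0.7, -1.2]]
ys = [1, 1, 0, 0]

scale = 0.1  # pretend another method yields features that are 10x smaller
scaled_xs = [[scale * v for v in x] for x in xs]

logits_a = train_logistic(xs, ys, lr=0.3, steps=50)
# Compensating lr: 0.3 / 0.1**2, i.e. about 30.
logits_b = train_logistic(scaled_xs, ys, lr=0.3 / scale**2, steps=50)
# Same lr on the smaller features: the fit progresses much more slowly.
logits_c = train_logistic(scaled_xs, ys, lr=0.3, steps=50)

print(logits_a[0], logits_b[0], logits_c[0])
```

With momentum, weight decay, or a bias term the equivalence is no longer exact, which is one more reason to grid-search these hyperparameters per method rather than reuse them.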

@dddzg
Author

dddzg commented Nov 5, 2020

Thanks again for your response. Does this mean we should be careful when comparing linear classification results across different pre-trained models? For example, in Table 6 of the SwAV paper, there is a top-1 gap between MoCo v2 and SwAV of about 4% on Places205 and about 10% on iNat18. However, in our experiments, we find that the MoCo weights perform badly on ImageNet linear classification with a low lr.

Were the linear classification results in Table 6 obtained with the same lr for SwAV and MoCo?

@mathildecaron31
Contributor

The learning rate is grid-searched separately for each method to find its best value.

@dddzg
Author

dddzg commented Nov 5, 2020

Thank you so much!

@dddzg dddzg closed this as completed Nov 6, 2020