
A question for Experimental result #6

Closed
baek85 opened this issue Nov 10, 2019 · 1 comment

Comments


baek85 commented Nov 10, 2019

Thank you for sharing the benchmark.

In your experimental results, there are many teacher-student pairs. In particular, for KD (Distilling the Knowledge in a Neural Network), the optimal setting (e.g., the temperature) may differ for each pair. Does performance change a lot with different temperatures?

A similar issue may apply not only to KD but also to the other methods. What do you think about this?

@HobbitLong (Owner)

For all methods (including KD), I only tuned the hyper-parameters on one of the pairs. After that, I kept those parameters fixed and evaluated on the other pairs.

T=4 is what I found optimal, and it is also consistent with previous works. I think you are right that the best temperature might differ for different pairs. But on the other hand, the point of this benchmark is to assess the generalization ability of different methods, i.e., whether you can use the same hyper-parameters on different models and still get good performance.
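For reference, this is roughly what the vanilla KD objective with a temperature looks like; a minimal PyTorch sketch assuming Hinton-style KD with T=4 (the function name and the weighting scheme in the comment are illustrative, not necessarily this repository's exact implementation):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then take the KL
    # divergence between them. Scaling by T^2 keeps gradient magnitudes
    # roughly comparable across temperatures (Hinton et al., 2015).
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# The distillation term is usually combined with the ordinary
# cross-entropy on the ground-truth labels, e.g.
#   loss = gamma * F.cross_entropy(student_logits, labels) + alpha * kd_loss(student_logits, teacher_logits)
# where gamma and alpha are weighting hyper-parameters.
```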

@baek85 baek85 closed this as completed Nov 18, 2019