In your experimental results, there are many teacher–student pairs.
In particular, for the KD method (Distilling the Knowledge in a Neural Network), the optimal setting (i.e., the temperature) may differ for each pair.
Does performance change much with the temperature?
A similar problem may affect not only KD but also the other methods. What do you think about this?
For all methods (including KD), I only tuned hyper-parameters on one of the pairs. After that, I kept those parameters fixed and evaluated on the other pairs.
T=4 is what I found optimal, and it is also consistent with previous works. I think you are right that the optimum might differ between pairs. On the other hand, the point of this benchmark is to test the generalization ability of different methods, i.e., whether you can use the same hyper-parameters on different models and still get good performance.
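For context, a minimal sketch of the KD loss being discussed, with the temperature parameter exposed. This is plain Python for illustration, not the benchmark's actual (likely PyTorch) implementation; the function names and the batch-free, single-sample shape are my own simplifications. The `T * T` scaling follows the original KD paper's convention of keeping soft-target gradient magnitudes comparable across temperatures:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: larger T gives softer probabilities."""
    m = max(z / T for z in logits)                     # subtract max for stability
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradient magnitudes stay comparable across T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

With identical student and teacher logits the loss is zero regardless of T; sweeping T here (e.g. 1, 4, 16) on real logits is a quick way to see how much the soft targets, and hence the loss surface, change with temperature.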
Thank you for sharing the benchmark.