Some questions about the experiment settings and the discriminator #2
Comments
Hi @alxer, thanks for your interest in our work!
I'm curious about this target too. You didn't permute the input, so the discriminator outputs the same result across all training steps. Does this cause an overfitting problem?
Hi @wuwuwuxxx, the discriminator seems to converge very fast and tends to overfit. I'm not sure whether shuffling the input can alleviate this, since the feature patterns from the teacher ensemble and the student differ clearly and the discriminator can distinguish them easily, but I will try it later.
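(For reference, if shuffling were tried, a minimal sketch could look like the following. It assumes the discriminator batch is the concatenation of teacher and student logits with the mixed target shown in the issue; the function and variable names are hypothetical, not from this repo.)

```python
import torch

def shuffle_discriminator_batch(d_input, d_target):
    """Apply the same random permutation to the discriminator inputs and
    their real/fake targets, so each set of logits keeps its own label."""
    perm = torch.randperm(d_input.size(0), device=d_input.device)
    return d_input[perm], d_target[perm]
```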
@szq0214 I don't think shuffling the input would make a difference. The model isn't aware of the batch index of each input, so it shouldn't be able to overfit to the ordering of the logits within the batch.
@Freeman1937, see #4 for a comparison of training with and without weight decay.
I'm also confused by these settings.
Hi @szq0214,
I'm highly interested in your work!
Here is a question; I hope you can share your thoughts on it.
In the experiment settings, why is weight_decay set to 0? In general, weight decay is an important factor for the final performance; it usually makes about a 1% difference in validation accuracy on ILSVRC2012 ImageNet.
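For concreteness, the setting I am asking about is the optimizer configuration, something like the sketch below (illustrative values and a placeholder model, not necessarily the repo's actual training script):

```python
import torch

model = torch.nn.Linear(2048, 1000)  # placeholder for the student network

# weight_decay=0 as in the questioned setting; a typical ImageNet baseline
# would instead use something like weight_decay=1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0)
```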
About the discriminator: it contains three convolution operations, and its inputs are the logits of the student and the combined logits of the teachers, but the target for the discriminator does not look right. In the code it is the following:
target = torch.FloatTensor([[1, 0] for _ in range(batch_size//2)] + [[0, 1] for _ in range(batch_size//2)])
I think the target should be [1, 0] over the whole batch_size, so this looks weird. Are there any considerations behind it? If so, is the effect of the discriminator loss to push the student logits away from the teachers, something like a regularization?
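For what it's worth, the mixed target would be consistent with the usual GAN-style arrangement, where the discriminator batch itself is built by concatenating teacher and student logits along the batch dimension. This is just my own sketch of that arrangement (tensor contents are placeholders), not necessarily the exact code in this repo:

```python
import torch

batch_size = 64      # total discriminator batch: half teacher, half student
num_classes = 1000

# Placeholders standing in for the real network outputs.
teacher_logits = torch.randn(batch_size // 2, num_classes)  # ensembled teacher logits
student_logits = torch.randn(batch_size // 2, num_classes)  # student logits

# First half of the discriminator input comes from the teacher,
# second half from the student.
d_input = torch.cat([teacher_logits, student_logits], dim=0)

# Matching target: [1, 0] labels the teacher ("real") half,
# [0, 1] the student ("fake") half.
target = torch.FloatTensor([[1, 0] for _ in range(batch_size // 2)] +
                           [[0, 1] for _ in range(batch_size // 2)])
```

Under that reading, the two halves would just be the usual real/fake labels.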