About the change of network structure #18
An additional point I would like to raise is that the improvements in this paper, compared with other approaches, come from several sources. The first is the change to the network architecture: although the proposed ResNet variant matches the original ResNet in computational cost, keeping the cost the same (or approximately the same) does not guarantee the same accuracy. When comparing against baseline methods, the baselines should therefore use exactly the same network structure as the paper, so that the effect of the architectural change alone is eliminated. Only then can we study the contribution of the other components, such as the training methods and loss functions proposed in the paper, and measure how much each of them contributes to the final performance.
As the number of experts grows, the parameter count and computation of the network grow with it, and the network's learning capacity is strengthened accordingly; in that setting a final performance increase is all but guaranteed. For the baseline approach such a comparison is unfair, and it leaves the impression that what looks on the surface like a gain from adding experts is intrinsically a gain from adding parameters.
The clearest illustration is Table 2 in your paper: although ResNet-50 and ResNeXt-50 have approximately the same computational cost, their accuracies differ by about 2-3%.
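The point that equal FLOPs does not imply equal accuracy can be checked with a quick back-of-envelope count. The sketch below (plain Python, no framework assumed) compares a standard ResNet-50 stage-2 bottleneck (256→64→64→256) with the corresponding ResNeXt-50 32×4d block (256→128→128→256, 32 groups); the layer shapes are the published designs, and the two come out within about 1% of each other in both parameters and FLOPs, despite the reported accuracy gap:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Parameters of a conv layer: weights (c_out * c_in/groups * k*k) plus biases."""
    return c_out * (c_in // groups) * k * k + c_out

def conv_flops(c_in, c_out, k, h, w, groups=1):
    """Ops over an h*w output map, counting 2 ops per multiply-accumulate."""
    return 2 * (c_in // groups) * k * k * c_out * h * w

# ResNet-50 stage-2 bottleneck: 1x1 reduce, 3x3, 1x1 expand
resnet_layers  = [(256, 64, 1, 1), (64, 64, 3, 1), (64, 256, 1, 1)]
# ResNeXt-50 (32x4d) block: wider 3x3, but split into 32 groups
resnext_layers = [(256, 128, 1, 1), (128, 128, 3, 32), (128, 256, 1, 1)]

def total_params(layers):
    return sum(conv_params(ci, co, k, g) for ci, co, k, g in layers)

def total_flops(layers, h=56, w=56):
    return sum(conv_flops(ci, co, k, h, w, g) for ci, co, k, g in layers)

for name, layers in [("ResNet-50 block", resnet_layers),
                     ("ResNeXt-50 block", resnext_layers)]:
    print(f"{name}: {total_params(layers):,} params, "
          f"{total_flops(layers) / 1e6:.1f} MFLOPs")
```

Both blocks land near 70k parameters and roughly 440 MFLOPs at a 56×56 feature map, which is exactly why matched-compute comparisons alone cannot separate architecture effects from training-method effects.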
Hi @abababa-ai, thanks for your interest in our paper, but you seem to have some misunderstandings about our method. I will try to address some of your questions here.
Hope these answers are helpful! Feel free to let us know if you have more questions.
I observed that in your experiments you use more experts, such as 3 or 4, in which case the parameter count and computation of the base network increase. I think you should add a further comparative experiment to show whether the overall performance gain is attributable more to your proposed training method or to the increase in network parameters. For example, when comparing with other methods, you should keep the computation and parameter count consistent while also keeping the network structure consistent, because I noticed that your ResNet structure differs from the original, even though you state that the computation is the same. I suspect that this structural change may itself have a significant impact on the overall performance. In short, my suggestion is that the baseline methods should use the same network structure as yours, so that the effect of the structural change is excluded and we can conclude that the various loss functions proposed in your paper are meaningful and effective.
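One way to run the controlled comparison suggested above is to shrink each expert so that n experts together match the parameter budget of the single-expert baseline. Since a conv layer's parameter count scales roughly quadratically with channel width, width per expert ≈ base_width / sqrt(n). This width-scaling rule is a standard heuristic, not taken from the paper; a minimal sketch:

```python
import math

def conv_params(c_in, c_out, k=3):
    # weight tensor only; biases are negligible for a budget estimate
    return c_in * c_out * k * k

def scaled_width(base_width, n_experts):
    # conv params grow ~ width^2, so 1/sqrt(n) per expert keeps
    # the total over n experts near the single-expert budget
    return round(base_width / math.sqrt(n_experts))

base = 256
single = conv_params(base, base)      # one full-width expert branch
for n in (2, 3, 4):
    w = scaled_width(base, n)
    total = n * conv_params(w, w)     # n narrow expert branches
    print(f"{n} experts at width {w}: {total / single:.2f}x baseline params")
```

With the widths matched this way, any remaining gap between the n-expert model and the baseline can be attributed to the training method and losses rather than to raw capacity.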