
About the change of network structure #18

Closed
abababa-ai opened this issue Jul 4, 2021 · 7 comments
Labels
question Further information is requested

Comments

@abababa-ai

I observed that in your experiments you use more experts, such as 3 or 4, and in that case both the parameter count and the computation of the base network increase. I think you should add a comparative experiment to show whether the overall performance gain is attributable more to your proposed training method or to the increase in network parameters. For example, when comparing with other methods, you should keep the computation and parameter count consistent while also keeping the network structure consistent. I noticed that your ResNet structure differs from the original one, and although you claim the computation is consistent, I suspect the change in network structure may also have a significant impact on overall performance. My suggestion is that the baseline methods should use the same network structure as yours, so that the effect of the structural change can be excluded and we can conclude that the various loss functions proposed in your paper are meaningful and valid.

@abababa-ai added the question label Jul 4, 2021
@abababa-ai
Author

An additional comment I would like to raise: I think the gains in this paper, compared to other approaches, come from several sources. First, the change of network architecture. Although the proposed ResNet variant matches the original ResNet in computational cost, keeping the computation the same (or approximately the same) does not guarantee the same final accuracy. So when comparing with baseline methods, the baselines should use exactly the same network structure as the paper, in order to eliminate the effect of the structural change alone. Only then can we study the other aspects, such as the training methods and loss functions proposed by the authors, and check how much they contribute to the final performance.

@abababa-ai
Author

abababa-ai commented Jul 4, 2021

As the number of experts grows, the parameter count and computation of the network increase, and with them the learning capacity of the network. In that case a final performance increase is all but certain, but the comparison with the baseline approach becomes unfair: it suggests that the improvement from adding more experts is, at bottom, due to the increase in network parameters.

@abababa-ai
Author

The clearest illustration can be seen in Table 2 of your paper: although the computational cost of ResNet-50 and ResNeXt-50 is approximately the same, the difference in accuracy is about 2-3%.

@frank-xwang
Owner

Hi @abababa-ai, thanks for your interest in our paper, but you seem to have some misunderstandings about our method. I will try to address some of your questions here.

  1. Parameter count: when using RIDE with 2 experts, the model size is comparable to the baseline and the computational cost is lower than the baseline. We have reported results for RIDE with 2 experts on all experimented benchmarks in the paper and observed significant performance gains on all of them (6.7% on ImageNet-LT and 5.1% on iNaturalist). Compared with the previous SOTA method, Decouple with a ResNet-152 backbone, RIDE (2 experts) with ResNet-50 is still much better. Decouple-ResNet152 vs. RIDE-ResNet50: Params 60M vs. 27M, Acc 50.5 vs. 54.4.
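As a rough illustration of why adding experts can grow the model sub-linearly: suppose the experts share the early backbone stages and each expert only adds its own reduced final stage. The 17M/5M split below is hypothetical (not from the paper), chosen only so that the 2-expert total matches the 27M figure quoted above:

```python
# Hypothetical parameter split (illustrative only, not from the paper):
# experts share the early stages and each adds a reduced final stage.
SHARED_PARAMS = 17.0e6   # assumed parameters in the shared early stages
EXPERT_PARAMS = 5.0e6    # assumed parameters per expert branch

def total_params(num_experts: int) -> float:
    """Total parameter count when experts share the early stages."""
    return SHARED_PARAMS + num_experts * EXPERT_PARAMS

for k in (1, 2, 3, 4):
    print(f"{k} experts: {total_params(k) / 1e6:.0f}M params")
# Each extra expert adds only 5M here, not another full backbone.
```

Under this (assumed) sharing scheme, going from 2 to 4 experts costs far less than duplicating the whole ~25M-parameter ResNet-50.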

@frank-xwang
Owner

  2. Why change the architecture? Framework changes are part of our contribution. We found that a multi-expert framework can successfully reduce the variance, which increased substantially in all previous methods and caused significant performance degradation on the many-shot classes. To solve this problem, we proposed a multi-expert framework. Previous works mainly focused on losses and samplers for long-tailed data, but we believe that, due to the bias-variance problem analyzed in the paper, framework changes are also necessary and especially critical for long-tailed data.
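The variance-reduction argument can be illustrated with a toy simulation, under the simplifying assumption that the experts behave like independent noisy estimators of the same target (real experts are correlated, so the actual reduction is smaller):

```python
import random

random.seed(0)

def ensemble_variance(num_experts, noise=0.5, trials=20000):
    """Empirical variance of the average of `num_experts` independent
    noisy predictions of the same target (toy model, not RIDE itself)."""
    preds = []
    for _ in range(trials):
        avg = sum(random.gauss(1.0, noise)
                  for _ in range(num_experts)) / num_experts
        preds.append(avg)
    mean = sum(preds) / trials
    return sum((p - mean) ** 2 for p in preds) / trials

# For independent experts the variance drops roughly as 1/num_experts.
for k in (1, 2, 4):
    print(f"{k} experts: variance ~ {ensemble_variance(k):.3f}")
```

With independent noise of standard deviation 0.5, a single expert has variance near 0.25, and averaging 4 experts brings it near 0.0625; correlation between experts would shrink this gap.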

@frank-xwang
Owner

frank-xwang commented Jul 4, 2021

  3. Comparison with a baseline that changes only the architecture: we performed an ablation study in Table 4 to show the contribution of each component. Simply using a backbone with more parameters is not enough to provide good performance. In Decouple's supplementary materials, using ResNet-152 as the backbone brings only about a 3% improvement over ResNet-50, which is far lower than the 7-8% obtained by RIDE. We believe this is enough to show the effectiveness of RIDE.

@frank-xwang
Owner

Hope these answers are helpful! Feel free to let us know if you have more questions.
