
About the change of network structure #18

Closed
abababa-ai opened this issue Jul 4, 2021 · 7 comments
Labels
question Further information is requested

Comments

@abababa-ai

I observed that in your experiments you use more experts, such as 3 or 4, and in that case both the parameter count and the computation of the base network increase. I think you should add a comparative experiment to show whether the overall performance gain is attributable more to your proposed training method or to the increase in network parameters. For example, when comparing with other methods, you should keep the computation and parameter count consistent while also keeping the network structure consistent. I noticed that your ResNet structure differs from the original one, and although you claim the computation is consistent, I suspect the change in network structure may also have a significant impact on overall performance. My suggestion is that the baseline methods should use the same network structure as yours, so that the effect of the structural change can be excluded and we can conclude that the various loss functions proposed in your paper are meaningful and valid.

@abababa-ai added the question label Jul 4, 2021
@abababa-ai
Author

An additional comment I would like to raise: I think the gains in this paper, compared to other approaches, come from several sources. First, the change of network architecture. Although the proposed ResNet variant matches the original ResNet in computational cost, keeping the computation the same (or approximately the same) does not guarantee the same final accuracy. So when comparing with baseline methods, the baselines should use exactly the same network structure as the paper, in order to eliminate the effect of the structural change alone. Only then can we study the other aspects, such as the training methods and loss functions proposed by the authors, and check how much they contribute to the final performance.

@abababa-ai
Author

abababa-ai commented Jul 4, 2021

As the number of experts grows, the parameter count and computation of the network increase, and with them the learning capacity of the network. In that case a final performance increase is all but certain, but the comparison with the baseline approach becomes unfair: it suggests that the improvement from adding more experts is, at bottom, due to the increase in network parameters.

@abababa-ai
Author

The clearest illustration can be seen in Table 2 of your paper: although the computational cost of ResNet-50 and ResNeXt-50 is approximately the same, the difference in accuracy is about 2-3%.

@frank-xwang
Owner

Hi @abababa-ai, thanks for your interest in our paper, but you seem to have some misunderstandings about our method. I will try to address some of your questions here.

  1. Parameter count: when using RIDE with 2 experts, the model size is comparable to the baseline and the computational cost is lower than the baseline. We have reported results for RIDE with 2 experts on all experimented benchmarks in the paper and observed significant performance gains on all of them (6.7% on ImageNet-LT and 5.1% on iNaturalist). Compared with the previous SOTA method, Decouple with a ResNet-152 backbone, RIDE (2 experts) with ResNet-50 is still much better. Decouple-ResNet152 vs. RIDE-ResNet50: Params 60M vs. 27M, Acc 50.5 vs. 54.4.
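As a rough illustration of why adding experts can grow the model sub-linearly: suppose the experts share the early backbone stages and each expert only adds its own reduced final stage. The 17M/5M split below is hypothetical (not from the paper), chosen only so that the 2-expert total matches the 27M figure quoted above:

```python
# Hypothetical parameter split (illustrative only, not from the paper):
# experts share the early stages and each adds a reduced final stage.
SHARED_PARAMS = 17.0e6   # assumed parameters in the shared early stages
EXPERT_PARAMS = 5.0e6    # assumed parameters per expert branch

def total_params(num_experts: int) -> float:
    """Total parameter count when experts share the early stages."""
    return SHARED_PARAMS + num_experts * EXPERT_PARAMS

for k in (1, 2, 3, 4):
    print(f"{k} experts: {total_params(k) / 1e6:.0f}M params")
# Each extra expert adds only 5M here, not another full backbone.
```

Under this (assumed) sharing scheme, going from 2 to 4 experts costs far less than duplicating the whole ~25M-parameter ResNet-50.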

@frank-xwang
Owner

  2. Why change the architecture? Framework changes are part of our contribution. We found that a multi-expert framework can successfully reduce the variance, which increased substantially in all previous methods and caused significant performance degradation on the many-shot classes. To solve this problem, we proposed a multi-expert framework. Previous works mainly focused on losses and samplers for long-tailed data, but we believe that, due to the bias-variance problem analyzed in the paper, framework changes are also necessary and especially critical for long-tailed data.
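The variance-reduction argument can be illustrated with a toy simulation, under the simplifying assumption that the experts behave like independent noisy estimators of the same target (real experts are correlated, so the actual reduction is smaller):

```python
import random

random.seed(0)

def ensemble_variance(num_experts, noise=0.5, trials=20000):
    """Empirical variance of the average of `num_experts` independent
    noisy predictions of the same target (toy model, not RIDE itself)."""
    preds = []
    for _ in range(trials):
        avg = sum(random.gauss(1.0, noise)
                  for _ in range(num_experts)) / num_experts
        preds.append(avg)
    mean = sum(preds) / trials
    return sum((p - mean) ** 2 for p in preds) / trials

# For independent experts the variance drops roughly as 1/num_experts.
for k in (1, 2, 4):
    print(f"{k} experts: variance ~ {ensemble_variance(k):.3f}")
```

With independent noise of standard deviation 0.5, a single expert has variance near 0.25, and averaging 4 experts brings it near 0.0625; correlation between experts would shrink this gap.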

@frank-xwang
Owner

frank-xwang commented Jul 4, 2021

  3. Comparison with a baseline that changes only the architecture: we performed an ablation study in Table 4 to show the contribution of each component. Simply using a backbone with more parameters is not enough to provide good performance. In Decouple's supplementary materials, using ResNet-152 as the backbone brings only about a 3% improvement over ResNet-50, which is far lower than the 7-8% obtained by RIDE. We believe this is enough to show the effectiveness of RIDE.

@frank-xwang
Owner

Hope these answers are helpful! Feel free to let us know if you have more questions.
