Hi,
I had two questions about the implementation of DropPath:
Why is it done per sample? As far as I understand from https://arxiv.org/pdf/1603.09382.pdf, you either keep the whole residual branch or drop it altogether with probability p_l, so why is the decision made per sample here?
What is the `div_(keep_prob)` used for? I can't see it in the equations of the paper either. Can you please clarify the reasoning behind it?
@IsmaelElsharkawi This sort of question is more appropriate as a discussion. Stochastic depth is applied per sample, not per batch; I believe the paper says 'independently per sample' somewhere.
The rescale follows the somewhat convoluted eq. (5) and its explanation: because only a fraction of the activations participate in the output, the surviving ones are divided by keep_prob so their expected magnitude matches inference, where nothing is dropped.
pytorch-image-models/timm/models/layers/drop.py, line 140 at commit a6e8598
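For readers who can't follow the link, here is a minimal sketch of per-sample stochastic depth with the keep_prob rescale, written against standard PyTorch tensor ops. It illustrates the idea discussed above and is not necessarily identical to the code at the referenced line.

```python
import torch


def drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Randomly zero the residual branch per sample (stochastic depth)."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample in the batch, broadcast over all other dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    # Rescale survivors by 1/keep_prob so the expected activation magnitude
    # during training matches inference, where nothing is dropped.
    mask.div_(keep_prob)
    return x * mask
```

Because the mask has shape `(batch, 1, 1, ...)`, each sample in the batch keeps or drops its residual branch independently, which is the per-sample behaviour mentioned above.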