Hi,
I had two questions about the implementation of DropPath:
Why is it done per sample? As far as I understand from https://arxiv.org/pdf/1603.09382.pdf, you either keep the whole residual branch or drop it altogether with probability p_l, so why is the decision made per sample here?
What is the `div_(keep_prob)` used for? I can't see it in the equations of the paper either. Can you please clarify the reasoning behind it?
@IsmaelElsharkawi This sort of question is more appropriate as a discussion. Stochastic depth is applied per sample, not per batch; I believe the paper says 'independently per sample' somewhere.
The rescale follows the somewhat convoluted eq. (5) and its explanation: because only a fraction of the activations participate in the output, the surviving ones are divided by keep_prob so their expected magnitude matches inference, where nothing is dropped.
pytorch-image-models/timm/models/layers/drop.py, line 140 at commit a6e8598
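For readers who can't follow the link, here is a minimal sketch of per-sample stochastic depth with the keep_prob rescale, written against standard PyTorch tensor ops. It illustrates the idea discussed above and is not necessarily identical to the code at the referenced line.

```python
import torch


def drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Randomly zero the residual branch per sample (stochastic depth)."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample in the batch, broadcast over all other dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    # Rescale survivors by 1/keep_prob so the expected activation magnitude
    # during training matches inference, where nothing is dropped.
    mask.div_(keep_prob)
    return x * mask
```

Because the mask has shape `(batch, 1, 1, ...)`, each sample in the batch keeps or drops its residual branch independently, which is the per-sample behaviour mentioned above.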