
DropPath Implementation #2118

Closed
IsmaelElsharkawi opened this issue Mar 20, 2024 · 2 comments

Comments

@IsmaelElsharkawi

```python
def drop_path(x, drop_prob: float = 0., training: bool = False):
```

Hi,
I have two questions about the DropPath implementation:

  1. Why is the drop applied per sample? As far as I understand from https://arxiv.org/pdf/1603.09382.pdf, the residual block is either kept for the whole batch or dropped altogether with probability p_l. Why is it done per sample here?
  2. What is the `div_(keep_prob)` used for? I can't find it in the paper's equations either; could you please clarify the reason behind it?
@rwightman
Collaborator

@IsmaelElsharkawi This sort of question is more appropriate as a discussion. Stochastic depth is applied per sample, not per batch; I believe the paper says 'independently per sample' somewhere.

The rescale follows the somewhat convoluted eq. (5) and its explanation: it is needed because only a fraction of the activations participate in the output.
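To illustrate the per-sample behaviour, here is a minimal NumPy sketch (my own illustration, not timm's actual implementation, which operates on torch tensors): each sample in the batch draws its own independent Bernoulli keep/drop decision, which is then broadcast over the sample's remaining dimensions.

```python
import numpy as np

def drop_path_sketch(x, drop_prob=0.2, rng=None):
    # Hypothetical sketch of per-sample stochastic depth, not timm's code.
    # One Bernoulli keep/drop decision per sample (first axis), broadcast
    # over the remaining dims; survivors are rescaled by 1/keep_prob.
    rng = np.random.default_rng(rng)
    keep_prob = 1.0 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)   # one decision per sample
    mask = rng.binomial(1, keep_prob, size=shape).astype(x.dtype)
    return x * mask / keep_prob
```

With `drop_prob=0.5`, each row of a batch independently comes out either all zeros (dropped) or scaled by 2 (kept), rather than the whole batch being dropped together.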
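A quick numerical sketch (my own, just to illustrate the reply above): dividing the Bernoulli mask by keep_prob makes its expectation 1, so the expected activation is unchanged by the random dropping.

```python
import numpy as np

# E[mask] = keep_prob, so E[mask / keep_prob] = 1: the rescale keeps the
# expected output of the path equal to the un-dropped output.
rng = np.random.default_rng(0)
keep_prob = 0.8
scaled_masks = rng.binomial(1, keep_prob, size=200_000) / keep_prob
empirical_mean = scaled_masks.mean()  # close to 1.0
```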

@IsmaelElsharkawi
Author

Thanks a lot for the explanation, and sorry about that; I'll continue this in a discussion thread.
