
stop gradient operation in merging #2

Closed

jihwanp opened this issue Oct 29, 2023 · 1 comment

jihwanp commented Oct 29, 2023

Hi, I have a question about implementation details regarding learned threshold merging.

In this line of code, you detach the generated mask, which still has gradient flow via the straight-through trick.
In my understanding, the threshold can still be learned through the FLOP loss. Is there another reason for applying a stop gradient to the mask before multiplying it with the features? Does it make the model harder to train if no stop gradient is applied?
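
To make sure I understand, the straight-through mask I am referring to would look roughly like this (just a sketch on my side, names are illustrative, not your actual code):

```python
import torch

def straight_through_mask(scores: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    # Differentiable relaxation of the thresholding decision.
    soft = torch.sigmoid(scores - threshold)
    # Hard 0./1. decision used in the forward pass.
    hard = (soft > 0.5).float()
    # Straight-through: the forward pass returns the hard mask, while the
    # backward pass uses the gradient of the soft relaxation, so the learned
    # threshold still receives gradients (e.g. from a FLOP loss).
    return hard + soft - soft.detach()
```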

Thanks for providing this wonderful work!

Mxbonn (Owner) commented Feb 26, 2024

The gradient contribution happens in the line above.
Basically, we only want the gradient flow to go through the mask, similarly to how it is done for pruning.
The lines you mention need the 1. and 0. multiplication for the scatter-reduce trick, but we don't want them to influence the backpropagation. It may be easier to think of it as if we need merge_mask = (merge_mask.detach() > 0.5).float() after unm_mask = torch.ones_like(merge_mask) - merge_mask.
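
As a rough sketch of what I mean (illustrative only, not the exact code in the repository; names and shapes are placeholders):

```python
import torch

def apply_masks(x: torch.Tensor, merge_mask: torch.Tensor):
    # Gradient contribution happens here: the straight-through mask scales the
    # token features, so the learned threshold receives gradients through this
    # multiplication.
    x = x * merge_mask

    # Detached hard 0./1. selectors used only for the scatter-reduce
    # bookkeeping; they do not influence backpropagation.
    merge_sel = (merge_mask.detach() > 0.5).float()
    unm_sel = torch.ones_like(merge_sel) - merge_sel

    merged_tokens = x * merge_sel    # tokens that get scatter-reduced into others
    unmerged_tokens = x * unm_sel    # tokens that are kept as-is
    return merged_tokens, unmerged_tokens
```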

Mxbonn closed this as completed Feb 26, 2024