Summary:
[sparsification] Fix the sparsity calculation error when using the accumulate_mask option. Purpose: enable building up the mask cumulatively by permanently removing s% of the unpruned weights every X steps.
Previously, sparsity was calculated over ALL weights, which is wrong when the accumulate_mask option is used. In that case, a parameter is masked to 0 permanently, and future sparsification should be performed only over the still-unmasked weights.
The ._masks attribute stores the masks. In between the X steps, ._masks is still applied to the weights, but the masks themselves are only updated every X steps. When ._masks is updated, it prunes away s% of the weights that were previously "1" (kept) in ._masks. In addition, the mask is also applied to the gradient, so the parameter is effectively removed from the architecture, not just set to zero.
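For illustration, here is a minimal PyTorch sketch of the intended behavior. The names (AccumulateMaskPruner, _update_mask, update_interval) are hypothetical and not taken from this diff; the point is that the pruning threshold is computed over surviving weights only, and that the mask is applied to both the weights and their gradients.

```python
import torch

class AccumulateMaskPruner:
    """Sketch of cumulative magnitude pruning (hypothetical names).

    Every `update_interval` steps, permanently prune s% of the weights
    that are still unmasked; in between, keep applying the stored mask
    to both the weights and their gradients.
    """

    def __init__(self, param: torch.nn.Parameter, s: float, update_interval: int):
        self.param = param
        self.s = s                            # fraction pruned per update, e.g. 0.1
        self.update_interval = update_interval
        self._mask = torch.ones_like(param, dtype=torch.bool)
        # Zero out gradients of masked entries so pruned weights never
        # receive updates -- they are removed from the architecture,
        # not merely set to zero.
        param.register_hook(lambda grad: grad * self._mask)

    def step(self, step_num: int) -> None:
        if step_num % self.update_interval == 0:
            self._update_mask()
        # Re-apply the mask on every step so masked weights stay at zero.
        with torch.no_grad():
            self.param.mul_(self._mask)

    def _update_mask(self) -> None:
        # The fix described above: compute the threshold over *unmasked*
        # weights only, so s% is measured among surviving weights rather
        # than among ALL weights.
        surviving = self.param[self._mask].abs()
        k = int(self.s * surviving.numel())
        if k == 0:
            return
        threshold = surviving.kthvalue(k).values
        # Permanently remove the smallest-magnitude surviving weights;
        # previously-pruned entries can never come back.
        self._mask &= self.param.abs() > threshold
```

Under these assumptions, calling step() once per training iteration applies the (unchanged) mask between updates and tightens it every update_interval steps, so overall sparsity compounds as (1 - s)^n surviving weights after n mask updates.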
Differential Revision: D18698077