
Accumulative blockwise pruning #1170

Closed

Conversation

psuzhanhy
Contributor

Summary:
[sparsification] Fix a sparsity calculation error when the accumulate_mask option is used. Purpose: enable building up the mask accumulatively by permanently removing s% of the unpruned weights every X steps.

Previously, the sparsity was calculated over ALL weights, which is wrong when the accumulate_mask option is used. In that case a parameter is masked to 0 permanently, so future sparsification should be performed only among the unmasked weights.
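
A minimal sketch of the corrected calculation, assuming magnitude pruning on a single weight tensor with a 0/1 mask (the helper name and signature below are illustrative, not the actual API in this repo):

```python
import torch

def prune_threshold(weight: torch.Tensor, mask: torch.Tensor, sparsity: float) -> float:
    # Old (wrong) behaviour: rank ALL weights, so entries that were already
    # pruned to zero crowd the bottom of the ranking and skew the threshold.
    # Correct behaviour: rank only the weights that are still kept (mask == 1).
    kept = weight[mask.bool()].abs()
    k = int(sparsity * kept.numel())
    if k == 0:
        return 0.0
    # k-th smallest magnitude among the kept weights is the pruning cutoff.
    return kept.kthvalue(k).values.item()
```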

The ._masks attribute stores the masks. In between the X steps, ._masks is still applied to the weights, but the masks are only updated every X steps. When ._masks is updated, it prunes away s% of the weights that were previously "1" (kept) in ._masks. In addition, the mask is also applied to the gradient, so the parameter is effectively removed from the architecture, not just set to zero.
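
A rough sketch of the accumulate_mask behaviour described above; the class name, argument names, and the use of a gradient hook are assumptions made for illustration, not the repository's actual ._masks implementation:

```python
import torch

class AccumulativePruner:
    """Every `update_every` steps, permanently prune a further `sparsity`
    fraction of the weights that are still unmasked; in between, keep
    applying the current mask to both the weights and their gradients."""

    def __init__(self, param: torch.nn.Parameter, sparsity: float, update_every: int):
        self.param = param
        self.sparsity = sparsity
        self.update_every = update_every
        self.num_steps = 0
        self._mask = torch.ones_like(param)
        # Masking the gradient means a pruned weight can never grow back,
        # so it is effectively removed from the architecture.
        param.register_hook(lambda grad: grad * self._mask)

    def step(self) -> None:
        self.num_steps += 1
        if self.num_steps % self.update_every == 0:
            # Rank only the weights that are still kept (mask == 1).
            kept = self.param.data[self._mask.bool()].abs()
            k = int(self.sparsity * kept.numel())
            if k > 0:
                threshold = kept.kthvalue(k).values
                # Only weights that were previously "1" (kept) can be newly
                # pruned; entries already at 0 in the mask stay pruned.
                self._mask *= (self.param.data.abs() > threshold).float()
        # The mask is applied to the weights on every step.
        self.param.data *= self._mask
```

In a training loop, one would call step() after each optimizer update so the mask is reapplied continuously and tightened every X steps.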

Differential Revision: D18698077

@facebook-github-bot added the CLA Signed and fb-exported labels on Nov 25, 2019
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D18698077

Summary:
Pull Request resolved: facebookresearch#1170

Reviewed By: arbabu123

Differential Revision: D18698077

fbshipit-source-id: 79ad67ddbd8eb55e3ef0eaec320d2b3cf4ed5239

@facebook-github-bot
Contributor

This pull request has been merged in 8f54218.
