Questions about regularization and pruning #54
Hi, I'll start with a long explanation :-), and then I'll take your questions.

Regularization can be a means to achieve sparsity - but there is an important distinction between sparsity and pruning which relates to the rest of my answer. Sparsity is a measure of the absolute zeros in a tensor. Pruning algorithms are one approach to achieve sparsity. But the distinction is even deeper. Consider what happens when we prune connections: we remove those connections entirely from the network, which means that no information flows through them: neither forward data, nor backward gradients. Practically, we mask the weights during the forward pass and the gradients during the backward pass. But you know this 😉

What happens when we regularize? At first glance, there is no relation between pruning and regularization, because in regularization we just use an added loss term to put "downward pressure" on the weights (individually, or in grouped structures) - we don't remove connections. So no masking should be involved, right?
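To make the distinction concrete, here is a minimal PyTorch-style sketch (not Distiller's actual code; `masks`, `apply_pruning_masks`, and `l1_regularization` are illustrative names): pruning masks both the weights and their gradients, while regularization only contributes an extra loss term.

```python
def apply_pruning_masks(model, masks):
    """Pruning (sketch): zero the masked weights and block their gradients.
    `masks` is assumed to map parameter name -> a 0/1 tensor of the same shape."""
    for name, param in model.named_parameters():
        if name in masks:
            param.data.mul_(masks[name])           # mask the forward pass
            if param.grad is not None:
                param.grad.data.mul_(masks[name])  # mask the backward pass

def l1_regularization(model, lambda_reg=1e-4):
    """Regularization (sketch): an added loss term that pushes weights toward
    zero, but by itself never removes a connection."""
    return lambda_reg * sum(p.abs().sum() for p in model.parameters())
```

In this picture, a pruned connection is gone for both passes, whereas a regularized weight is merely discouraged from being large and can grow back unless it is eventually masked.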
Thanks for the interesting comments.
@hunterkun I'm closing because this has been idle for 19 days. If you have questions remaining we can reopen, or use another issue.
I notice that pruning happens at the beginning of a batch (`on_minibatch_begin`) while regularization happens at the end of a batch (`on_minibatch_end`). It means that you set the regularization term to zero below the threshold on every batch iteration during training. What is the reason for this? I think it is natural that this happens at the end of one epoch, or at the end of the whole training, when the regularization terms have been decreased enough for pruning.

Also, since regularization and pruning share `zeros_mask_dict`, it may bring some messes. For example, `apply_mask` in `on_minibatch_end` of class RegularizationPolicy would apply not only the regularization mask but also the pruning mask, if there are both a regularizer and a pruner (`thinning.py`, right?).
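For context, here is a rough sketch of where the two callbacks mentioned above would fire in a training loop. Only the callback names come from the discussion; the scheduler, loss, and argument wiring shown here are assumptions for illustration, not Distiller's exact API.

```python
for step, (inputs, targets) in enumerate(train_loader):
    # Pruning policies typically act here, masking weights before the forward pass.
    scheduler.on_minibatch_begin(epoch, step, steps_per_epoch)

    output = model(inputs)
    loss = criterion(output, targets)
    # A regularizer contributes its extra loss term before the backward pass
    # (regularizer_loss is a hypothetical helper, see the sketch above).
    loss = loss + regularizer_loss(model)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Regularization policies apply their masks here, after the weight update.
    scheduler.on_minibatch_end(epoch, step, steps_per_epoch)
```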