Replies: 6 comments
-
pruner.compress() only applies a mask (a tensor containing only 1s and 0s) to the weight (and bias, if any) before the module is called on its input; all the modules are kept the same in terms of their shape, weight, and bias, so the model stays the same size. speedup_model(), on the other hand, actually changes the modules, for example from Conv2d(in_channels=16, out_channels=32) to Conv2d(in_channels=13, out_channels=27), so the size of the model is physically reduced. speedup_model() does not remove the batch normalization layer; it only changes it if the shape of its input or output has changed. If you find it hard to converge after speedup_model(), my guess is that it is due to over-pruning. Try pruning less and see if it converges.
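For concreteness, here is a minimal sketch of the two paths, assuming the NNI 2.x pruning API (L1NormPruner, ModelSpeedup) and a toy model; the module paths and signatures follow the 2.x docs and may differ in other nni versions:

```python
import torch
import torch.nn as nn
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup

# toy model just for illustration
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
)
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

# Path 1: masking only. compress() wraps the modules and multiplies their
# weights (and biases) by 0/1 masks; every layer keeps its original shape.
pruner = L1NormPruner(model, config_list)
masked_model, masks = pruner.compress()
# finetuning here trains the full-sized, masked layers

# Path 2: physical shrinking. ModelSpeedup rebuilds the layers with smaller
# shapes and resizes (does not delete) the BatchNorm layers that follow them.
# In practice you pick one path; both appear here only for comparison.
pruner._unwrap_model()                    # remove the masking wrappers first
dummy_input = torch.rand(1, 3, 32, 32)
ModelSpeedup(model, dummy_input, masks).speedup_model()
# finetuning here trains the physically smaller layers
```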
-
What I meant is: is it OK to finetune directly from the model produced by speedup_model()?
-
Sorry, my misunderstanding. Yes, you are right.
-
My problem is that, with the same sparsity, finetuning from pruner.compress() is fine but finetuning from model_speedup() is hard to converge. The problem persists even after reducing the sparsity.
-
So my guess is that the un-removed bias of the batch normalization recovers some meaningful activation values. If finetuning from model_speedup() works well, then why don't the nni examples do that?
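A crude way to test that guess, as a sketch: zero the BatchNorm affine biases in the masked model so the pruned, all-zero channels can no longer be shifted back to non-zero values, then finetune and see whether it behaves more like the sped-up model. zero_bn_bias below is a hypothetical helper, not part of nni, and it also touches the kept channels, so it is only a rough check.

```python
import torch
import torch.nn as nn

def zero_bn_bias(model: nn.Module) -> None:
    # Zero every BatchNorm affine bias (beta). For a pruned channel whose
    # input is all zeros, the BN output then stays at zero instead of being
    # shifted to beta. Note this also zeroes the bias of kept channels.
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)) and m.affine:
            with torch.no_grad():
                m.bias.zero_()
```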
-
Thanks @twmht for raising this, and sorry for the late response; we were too busy with the recent release and somehow missed this issue. Hope you are still tuned in. I just converted the issue to a discussion, as we don't have a good resolution for it yet and would like to open it up for ideas and discussion.
-
nni version: latest
pytorch version: 1.8
Hi,
I found that when I finetune the model from pruner.compress(), the result is fine,
but when I finetune the model from speedup_model(), the loss is hard to converge.
I thought those two were almost equivalent, except that model speedup removes the batch normalization layer.
Any idea?
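For reference, here is a quick check I would run to compare the two models (summarize is just a hypothetical helper, not an nni API):

```python
import torch.nn as nn

def summarize(model: nn.Module) -> None:
    # Print the total parameter count and the shapes of the Conv2d /
    # BatchNorm2d layers. Running this before and after speedup_model()
    # shows whether the BatchNorm layers were removed or only resized.
    total = sum(p.numel() for p in model.parameters())
    print(f"total parameters: {total}")
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            print(f"{name}: Conv2d({m.in_channels}, {m.out_channels})")
        elif isinstance(m, nn.BatchNorm2d):
            print(f"{name}: BatchNorm2d({m.num_features})")
```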