Replies: 6 comments
-
pruner.compress() only applies a mask (a tensor containing only 1s and 0s) to the weight (and bias, if any) before the module is called on its input; all the modules are kept the same in terms of their shape, weight, and bias, so the model stays the same size. speedup_model(), on the other hand, actually changes the modules, for example from Conv2d(in_channels=16, out_channels=32) to Conv2d(in_channels=13, out_channels=27), so the size of the model is physically reduced. speedup_model() does not remove the batch normalization layer; it only changes it if the shape of its input or output has changed. If you find it hard to converge after speedup_model(), my guess is that it is due to over-pruning. Try pruning less and see if it converges.
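For concreteness, here is a minimal sketch of the two paths, assuming the NNI 2.x pruning API (L1NormPruner, ModelSpeedup) and a toy model; the module paths and signatures follow the 2.x docs and may differ in other nni versions:

```python
import torch
import torch.nn as nn
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup

# toy model just for illustration
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
)
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

# Path 1: masking only. compress() wraps the modules and multiplies their
# weights (and biases) by 0/1 masks; every layer keeps its original shape.
pruner = L1NormPruner(model, config_list)
masked_model, masks = pruner.compress()
# finetuning here trains the full-sized, masked layers

# Path 2: physical shrinking. ModelSpeedup rebuilds the layers with smaller
# shapes and resizes (does not delete) the BatchNorm layers that follow them.
# In practice you pick one path; both appear here only for comparison.
pruner._unwrap_model()                    # remove the masking wrappers first
dummy_input = torch.rand(1, 3, 32, 32)
ModelSpeedup(model, dummy_input, masks).speedup_model()
# finetuning here trains the physically smaller layers
```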
-
What I meant is: is it OK to finetune directly from the model produced by speedup_model()?
-
Sorry, my misunderstanding. Yes, you are right.
-
My problem is that, with the same sparsity, finetuning from pruner.compress() is fine but finetuning from model_speedup() is hard to converge. The problem persists even after reducing the sparsity.
-
So my guess is that the un-removed bias of the batch normalization recovers some meaningful activation values. If finetuning from model_speedup() works well, then why don't the nni examples do that?
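A crude way to test that guess, as a sketch: zero the BatchNorm affine biases in the masked model so the pruned, all-zero channels can no longer be shifted back to non-zero values, then finetune and see whether it behaves more like the sped-up model. zero_bn_bias below is a hypothetical helper, not part of nni, and it also touches the kept channels, so it is only a rough check.

```python
import torch
import torch.nn as nn

def zero_bn_bias(model: nn.Module) -> None:
    # Zero every BatchNorm affine bias (beta). For a pruned channel whose
    # input is all zeros, the BN output then stays at zero instead of being
    # shifted to beta. Note this also zeroes the bias of kept channels.
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)) and m.affine:
            with torch.no_grad():
                m.bias.zero_()
```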
-
Thanks @twmht for raising this, and sorry for the late response; we were too busy with the recent release and somehow missed this issue. Hope you are still tuned in. I just converted the issue to a discussion, as we don't have a good resolution for it yet and would like to open it up for ideas and discussion.
-
nni version: latest
pytorch version: 1.8
Hi,
I found that when I finetune the model from pruner.compress(), the result is fine,
but when I finetune the model from speedup_model(), the loss is hard to converge.
I thought those two were almost equivalent, except that model speedup removes the batch normalization layer.
Any idea?
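For reference, here is a quick check I would run to compare the two models (summarize is just a hypothetical helper, not an nni API):

```python
import torch.nn as nn

def summarize(model: nn.Module) -> None:
    # Print the total parameter count and the shapes of the Conv2d /
    # BatchNorm2d layers. Running this before and after speedup_model()
    # shows whether the BatchNorm layers were removed or only resized.
    total = sum(p.numel() for p in model.parameters())
    print(f"total parameters: {total}")
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            print(f"{name}: Conv2d({m.in_channels}, {m.out_channels})")
        elif isinstance(m, nn.BatchNorm2d):
            print(f"{name}: BatchNorm2d({m.num_features})")
```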