Enable BN adaptation upon compressed model initialization #41
Conversation
Can one of the admins verify this patch?
@mkaglins have a look. This initialization procedure can be used for models subjected to filter pruning. We should try to reproduce the results of the FNNP paper, which reports that accuracy after BN adaptation correlates well with fine-tuned model accuracy (basically using Figure 4 of the paper as a reference).
Need to further rename QuantizationRangeInitArgs properly since it's now being used outside the quantization algo scope.
…ameter for algo initializers
@mkaglins could you also please run your fine-tuning experiments with L1/L2 and Geomean filter selection criteria separately? It'd be good to know whether the correlation between metrics depends somehow on the pruning criterion chosen.
…um value. Approximately same accuracy with fewer init steps required.
…anyalzr/nncf_pytorch into il/batch_statistics_adaptation
Jenkins please retry a build
@AlexKoff88 @vshampor PR is ready to be merged. BN adaptation is switched off by default for now.
…olkit#41)
* Add batchnorm statistics adaptation to quantization algo initializer
* Enable BN stats adaptation for filter pruning. Need to further rename QuantizationRangeInitArgs properly since it's now being used outside the quantization algo scope.
* + BN adaptation for magnitude sparsity
* Extend the config schema to include the *num_bn_adaptation_steps* parameter for algo initializers
* Proper pass of BN adaptation args (dataloader) via BNAdaptationInitArgs struct
* Improved config setting for BN adaptation parameters
* Remove initializer in config schema for RB sparsity
* Replace resetting of BN stats with a forgetting step via large momentum value. Approximately same accuracy with fewer init steps required.
* Fix forgetting momentum value
* Default num of steps = 20 for BN adaptation since it is sufficient empirically
* Move BN adaptation to base algo class
* formatting
* Fix export_model method docstring for filter pruning
* Fix range init call counter test for BN adaptation
* No BN adaptation by default, adjust call counter test
* add BN adaptation docs
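For context, a hedged sketch of how the new *num_bn_adaptation_steps* parameter from the commit list might appear in an NNCF config, written here as a Python dict. The exact nesting under the `initializer` section is an assumption inferred from the commit messages, not the PR's verified schema:

```python
# Hypothetical config fragment (nesting assumed, not taken from the PR):
# the commit list states the schema gains a `num_bn_adaptation_steps`
# parameter for algo initializers, with an empirical default of 20.
nncf_config = {
    "compression": {
        "algorithm": "quantization",
        "initializer": {
            # BN adaptation is switched off by default in this PR, so the
            # parameter would need to be set explicitly to enable it.
            "num_bn_adaptation_steps": 20
        }
    }
}
```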
The PR adds a final BatchNorm statistics adaptation step to any compressed model initializer.
This is roughly based on the results of the FNNP paper, with the further consideration that BN statistics adaptation acts as a bias and variance correction procedure following the weight perturbations a model undergoes during compression.
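For readers unfamiliar with the technique, here is a minimal PyTorch sketch of what such an adaptation step does. This is an illustration, not the PR's implementation: the function name, the `(images, labels)` batch format, and the forgetting-momentum value of 0.9 are all assumptions.

```python
import torch

def adapt_bn_statistics(model, data_loader, num_steps=20, forgetting_momentum=0.9):
    """Illustrative BN statistics adaptation (not the PR's actual code).

    Runs `num_steps` batches through the model in train mode so that only the
    BatchNorm running mean/variance are updated; the weights stay frozen
    because autograd is disabled and no optimizer step is taken.
    """
    bn_layers = [m for m in model.modules()
                 if isinstance(m, torch.nn.modules.batchnorm._BatchNorm)]
    saved_momenta = [bn.momentum for bn in bn_layers]

    model.train()  # BN layers update running stats only in train mode
    with torch.no_grad():
        for step, (images, _) in enumerate(data_loader):
            if step >= num_steps:
                break
            # First pass is the "forgetting step" from the commit list: a
            # large momentum largely overwrites the stale statistics at once.
            for bn, orig in zip(bn_layers, saved_momenta):
                bn.momentum = forgetting_momentum if step == 0 else orig
            model(images)

    for bn, orig in zip(bn_layers, saved_momenta):
        bn.momentum = orig  # restore the original momenta
    model.eval()
```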
Preliminary results for quantization (ImageNet; fixed random seed for all experiments; accuracy measured right after initialization):
NB: the "per-channel" tag above relates to weight quantization; activations are always quantized per-tensor. "(A)symmetric" relates to both weights and activations.
Pruning/sparsity results (ImageNet; fixed random seed for all experiments; accuracy measured right after initialization):
Preliminary results were obtained by fully resetting BN statistics after initialization and running inference on 200 batches (256 samples each).
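For comparison, a minimal sketch of that earlier "full reset" variant, which the commit list later replaces with the forgetting step. The helper name and `(images, labels)` batch format are assumptions; PyTorch's `momentum=None` makes BN accumulate a cumulative moving average, which matches re-estimating the statistics from scratch:

```python
import torch

def reset_and_reestimate_bn(model, data_loader, num_batches=200):
    # Hypothetical helper: wipe BN running statistics, then re-estimate them
    # as a cumulative average over `num_batches` forward passes.
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()  # running_mean <- 0, running_var <- 1
            m.momentum = None        # None => cumulative moving average
    model.train()  # train mode so BN layers update their running stats
    with torch.no_grad():  # weights cannot change without autograd
        for step, (images, _) in enumerate(data_loader):
            if step >= num_batches:
                break
            model(images)
    model.eval()
```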
To-dos: