This repository has been archived by the owner on May 1, 2023. It is now read-only.

Early Exit Inference #103

Closed
HKLee2040 opened this issue Dec 11, 2018 · 7 comments
Labels
early exit This issue is related to early-exit

Comments

@HKLee2040

How do I train an early-exit model? Here is the command I used:

python3 compress_classifier.py --arch resnet20_cifar_earlyexit ../../../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/resnet20_cifar_baseline_training.yaml -j=1 --deterministic --earlyexit_thresholds 0.9 1.2 --earlyexit_lossweights 0.2 0.3

But Distiller shows me the following error message:

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
==> using cifar10 dataset
=> creating resnet20_cifar_earlyexit model for CIFAR10


Logging to TensorBoard - remember to execute the server:

tensorboard --logdir='./logs'

=> using early-exit threshold values of [0.9, 1.2]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'dampening': 0, 'weight_decay': 0.0001, 'momentum': 0.9, 'nesterov': False, 'lr': 0.3}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml

Training epoch: 45000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
Traceback (most recent call last):
  File "compress_classifier.py", line 789, in <module>
    main()
  File "compress_classifier.py", line 386, in main
    loggers=[tflogger, pylogger], args=args)
  File "compress_classifier.py", line 477, in train
    loss = earlyexit_loss(output, target, criterion, args)
  File "compress_classifier.py", line 645, in earlyexit_loss
    loss += (1.0 - sum_lossweights) * criterion(output[args.num_exits-1], target)
IndexError: list index out of range

resnet20_cifar_baseline_training.yaml ==>
lr_schedulers:
  training_lr:
    class: StepLR
    step_size: 45
    gamma: 0.10

policies:
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 45
    ending_epoch: 200
    frequency: 1
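For reference, the schedule above multiplies the learning rate by 0.1 every 45 epochs. A minimal PyTorch sketch of the equivalent StepLR behavior (the dummy parameter exists only to construct an optimizer, and Distiller's starting_epoch/ending_epoch policy gating is ignored here):

```python
import torch

# Dummy parameter, only needed to construct an optimizer.
param = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=0.3)

# Mirrors the YAML above: StepLR with step_size 45 and gamma 0.10.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=45, gamma=0.10)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()   # no-op here (no gradients); keeps the canonical step order
    scheduler.step()

# Epochs 0-44 run at lr 0.3, epochs 45-89 at 0.03, epoch 90 onward at 0.003.
```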

@HKLee2040
Author

2018-12-11 17:11:12,837 - Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-171112/2018.12.11-171112.log
2018-12-11 17:11:12,837 - Number of CPUs: 4
2018-12-11 17:11:12,992 - Number of GPUs: 1
2018-12-11 17:11:12,993 - CUDA version: 8.0.61
2018-12-11 17:11:12,993 - CUDNN version: 7102
2018-12-11 17:11:12,993 - Kernel: 4.15.0-42-generic
2018-12-11 17:11:13,001 - OS: Ubuntu 16.04.5 LTS
2018-12-11 17:11:13,002 - Python: 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
2018-12-11 17:11:13,002 - PyTorch: 0.4.0
2018-12-11 17:11:13,002 - Numpy: 1.14.3
2018-12-11 17:11:13,631 - Git is dirty
2018-12-11 17:11:13,632 - Active Git branch: master
2018-12-11 17:11:13,643 - Git commit: 37d5774

@Gxllii
Contributor

Gxllii commented Dec 11, 2018

resnet20_cifar_earlyexit has only one early exit, so you should pass a single threshold and a single loss weight in your command, e.g.:
--earlyexit_thresholds 0.9 --earlyexit_lossweights 0.2
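The IndexError above follows from how the weighted loss is combined. A minimal sketch (names are illustrative, based on the traceback, not Distiller's exact code): each early exit contributes its loss scaled by its weight, and the final exit receives the leftover weight, so exactly one weight per early exit is expected. Passing two weights to a model with one early exit makes the code index an exit output that does not exist.

```python
import torch
import torch.nn as nn

def earlyexit_loss(outputs, target, criterion, lossweights):
    """Weighted sum of per-exit losses; the final exit gets the
    remaining weight (1 - sum of the early-exit weights)."""
    # One weight per early exit: len(lossweights) must equal len(outputs) - 1.
    if len(lossweights) != len(outputs) - 1:
        raise IndexError("expected %d loss weight(s), got %d"
                         % (len(outputs) - 1, len(lossweights)))
    loss = 0.0
    for weight, exit_output in zip(lossweights, outputs[:-1]):
        loss = loss + weight * criterion(exit_output, target)
    loss = loss + (1.0 - sum(lossweights)) * criterion(outputs[-1], target)
    return loss

# resnet20_cifar_earlyexit produces two outputs: one early exit plus the final exit.
criterion = nn.CrossEntropyLoss()
target = torch.tensor([1, 3])
outputs = [torch.randn(2, 10), torch.randn(2, 10)]
loss = earlyexit_loss(outputs, target, criterion, lossweights=[0.2])
```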

@HKLee2040
Author

Thanks!
The first epoch finished, but another problem appeared, shown below:

=> using early-exit threshold values of [0.9]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'nesterov': False, 'dampening': 0, 'weight_decay': 0.0001, 'lr': 0.3, 'momentum': 0.9}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml

Training epoch: 45000 samples (256 per mini-batch)
Epoch: [0][ 50/ 176] Overall Loss 3.187736 Objective Loss 3.187736 Top1_exit0 9.898438 Top5_exit0 50.078125 Top1_exit1 10.554688 Top5_exit1 50.609375 LR 0.300000 Time 0.482580
Epoch: [0][ 100/ 176] Overall Loss 3.125266 Objective Loss 3.125266 Top1_exit0 10.285156 Top5_exit0 50.812500 Top1_exit1 10.597656 Top5_exit1 51.570312 LR 0.300000 Time 0.484268
Epoch: [0][ 150/ 176] Overall Loss 2.833748 Objective Loss 2.833748 Top1_exit0 12.307292 Top5_exit0 54.539062 Top1_exit1 12.354167 Top5_exit1 55.369792 LR 0.300000 Time 0.484920

Parameters:
+----+-------------------------------------+----------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
| | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean |
|----+-------------------------------------+----------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|
| 0 | module.conv1.weight | (16, 3, 3, 3) | 432 | 432 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.67642 | -0.01829 | 0.43613 |
| 1 | module.layer1.0.conv1.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.40499 | 0.06439 | 0.21337 |
| 2 | module.layer1.0.conv2.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.42122 | 0.02479 | 0.24595 |
| 3 | module.layer1.1.conv1.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.63190 | -0.03862 | 0.29190 |
| 4 | module.layer1.1.conv2.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.43385 | -0.09513 | 0.22181 |
| 5 | module.layer1.2.conv1.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.48105 | 0.00908 | 0.21156 |
| 6 | module.layer1.2.conv2.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.38058 | -0.04943 | 0.21154 |
| 7 | module.layer2.0.conv1.weight | (32, 16, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.13368 | 0.00847 | 0.07841 |
| 8 | module.layer2.0.conv2.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.10541 | 0.01295 | 0.07728 |
| 9 | module.layer2.0.downsample.0.weight | (32, 16, 1, 1) | 512 | 512 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.28425 | 0.00707 | 0.21435 |
| 10 | module.layer2.1.conv1.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.09172 | 0.00345 | 0.07236 |
| 11 | module.layer2.1.conv2.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08651 | 0.00590 | 0.06907 |
| 12 | module.layer2.2.conv1.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08414 | 0.00368 | 0.06735 |
| 13 | module.layer2.2.conv2.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08305 | 0.00277 | 0.06674 |
| 14 | module.layer3.0.conv1.weight | (64, 32, 3, 3) | 18432 | 18432 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.06003 | 0.00150 | 0.04792 |
| 15 | module.layer3.0.conv2.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05860 | -0.00236 | 0.04683 |
| 16 | module.layer3.0.downsample.0.weight | (64, 32, 1, 1) | 2048 | 2048 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.17145 | 0.00071 | 0.13722 |
| 17 | module.layer3.1.conv1.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05759 | -0.00043 | 0.04592 |
| 18 | module.layer3.1.conv2.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05811 | -0.00372 | 0.04645 |
| 19 | module.layer3.2.conv1.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05727 | 0.00079 | 0.04565 |
| 20 | module.layer3.2.conv2.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05744 | -0.00756 | 0.04619 |
| 21 | module.fc.weight | (10, 64) | 640 | 640 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.13957 | -0.00407 | 0.10648 |
| 22 | module.linear_exit0.weight | (10, 1600) | 16000 | 16000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.26628 | 0.00000 | 0.14020 |
| 23 | Total sparsity: | - | 286896 | 286896 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
+----+-------------------------------------+----------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
Total sparsity: 0.00

--- validate (epoch=0)-----------
5000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-182354/2018.12.11-182354.log
Traceback (most recent call last):
  File "compress_classifier.py", line 789, in <module>
    main()
  File "compress_classifier.py", line 395, in main
    top1, top5, vloss = validate(val_loader, model, criterion, [pylogger], args, epoch)
  File "compress_classifier.py", line 539, in validate
    return _validate(val_loader, model, criterion, loggers, args, epoch)
  File "compress_classifier.py", line 594, in _validate
    earlyexit_validate_loss(output, target, criterion, args)
  File "compress_classifier.py", line 656, in earlyexit_validate_loss
    earlyexit_validate_criterion = nn.CrossEntropyLoss(reduction='none').cuda()
TypeError: __init__() got an unexpected keyword argument 'reduction'

@haim-barad
Contributor

Hi - this happens because the newer PyTorch 1.0 deprecates the "reduce" parameter in favor of "reduction".

Either update to PyTorch 1.0, or if you don't want to use the pre-release version, change that line in the code to reduce=False (a boolean, not the string 'False').

@Gxllii
Contributor

Gxllii commented Dec 11, 2018

I ran into this problem, too.
The reduction argument requires PyTorch >= 0.4.1.

@haim-barad
Contributor

To be consistent with the rest of Distiller (which assumes PyTorch 0.4.0), I will revert that parameter to the 0.4.0 style of call. The parameter should be reduce=False.

If you want to run right now, make the change yourself. I'll be submitting a patch in the next day or two.
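One version-tolerant way to get per-sample losses (illustrative only; as noted above, the actual patch simply reverts to reduce=False): PyTorch 0.4.1 and later accept reduction='none', while 0.4.0 only understands reduce=False.

```python
import torch
import torch.nn as nn

def per_sample_criterion():
    """Build a CrossEntropyLoss returning one loss value per sample,
    using whichever keyword the installed PyTorch supports."""
    try:
        return nn.CrossEntropyLoss(reduction='none')   # PyTorch >= 0.4.1
    except TypeError:
        # PyTorch 0.4.0: note reduce=False (a boolean, not the string 'False').
        return nn.CrossEntropyLoss(reduce=False)

criterion = per_sample_criterion()
output = torch.randn(4, 10)
target = torch.tensor([0, 1, 2, 3])
losses = criterion(output, target)   # one loss value per sample, shape (4,)
```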

@HKLee2040
Author

@haim-barad It works. Thanks!

@nzmora nzmora added the early exit This issue is related to early-exit label Jan 2, 2019
@nzmora nzmora closed this as completed Jan 2, 2019