This repository has been archived by the owner on May 1, 2023. It is now read-only.

Early Exit Inference #103

Closed
HKLee2040 opened this issue Dec 11, 2018 · 7 comments
Labels
early exit This issue is related to early-exit

Comments

@HKLee2040

How do I train an early-exit model? Here is the command I used:

python3 compress_classifier.py --arch resnet20_cifar_earlyexit ../../../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/resnet20_cifar_baseline_training.yaml -j=1 --deterministic --earlyexit_thresholds 0.9 1.2 --earlyexit_lossweights 0.2 0.3

But Distiller shows me the following error message:

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
==> using cifar10 dataset
=> creating resnet20_cifar_earlyexit model for CIFAR10


Logging to TensorBoard - remember to execute the server:

tensorboard --logdir='./logs'

=> using early-exit threshold values of [0.9, 1.2]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'dampening': 0, 'weight_decay': 0.0001, 'momentum': 0.9, 'nesterov': False, 'lr': 0.3}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml

Training epoch: 45000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
Traceback (most recent call last):
  File "compress_classifier.py", line 789, in <module>
    main()
  File "compress_classifier.py", line 386, in main
    loggers=[tflogger, pylogger], args=args)
  File "compress_classifier.py", line 477, in train
    loss = earlyexit_loss(output, target, criterion, args)
  File "compress_classifier.py", line 645, in earlyexit_loss
    loss += (1.0 - sum_lossweights) * criterion(output[args.num_exits-1], target)
IndexError: list index out of range

resnet20_cifar_baseline_training.yaml ==>
lr_schedulers:
  training_lr:
    class: StepLR
    step_size: 45
    gamma: 0.10

policies:
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 45
    ending_epoch: 200
    frequency: 1
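For reference, the schedule above multiplies the learning rate by 0.1 every 45 epochs. A minimal PyTorch sketch of the equivalent StepLR behavior (the dummy parameter exists only to construct an optimizer, and Distiller's starting_epoch/ending_epoch policy gating is ignored here):

```python
import torch

# Dummy parameter, only needed to construct an optimizer.
param = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=0.3)

# Mirrors the YAML above: StepLR with step_size 45 and gamma 0.10.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=45, gamma=0.10)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()   # no-op here (no gradients); keeps the canonical step order
    scheduler.step()

# Epochs 0-44 run at lr 0.3, epochs 45-89 at 0.03, epoch 90 onward at 0.003.
```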

@HKLee2040
Author

2018-12-11 17:11:12,837 - Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-171112/2018.12.11-171112.log
2018-12-11 17:11:12,837 - Number of CPUs: 4
2018-12-11 17:11:12,992 - Number of GPUs: 1
2018-12-11 17:11:12,993 - CUDA version: 8.0.61
2018-12-11 17:11:12,993 - CUDNN version: 7102
2018-12-11 17:11:12,993 - Kernel: 4.15.0-42-generic
2018-12-11 17:11:13,001 - OS: Ubuntu 16.04.5 LTS
2018-12-11 17:11:13,002 - Python: 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
2018-12-11 17:11:13,002 - PyTorch: 0.4.0
2018-12-11 17:11:13,002 - Numpy: 1.14.3
2018-12-11 17:11:13,631 - Git is dirty
2018-12-11 17:11:13,632 - Active Git branch: master
2018-12-11 17:11:13,643 - Git commit: 37d5774

@Gxllii
Contributor

Gxllii commented Dec 11, 2018

resnet20_cifar_earlyexit has only one early exit, so you should pass a single threshold and a single loss weight in your command, e.g.:
--earlyexit_thresholds 0.9 --earlyexit_lossweights 0.2
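The IndexError above follows from how the weighted loss is combined. A minimal sketch (names are illustrative, based on the traceback, not Distiller's exact code): each early exit contributes its loss scaled by its weight, and the final exit receives the leftover weight, so exactly one weight per early exit is expected. Passing two weights to a model with one early exit makes the code index an exit output that does not exist.

```python
import torch
import torch.nn as nn

def earlyexit_loss(outputs, target, criterion, lossweights):
    """Weighted sum of per-exit losses; the final exit gets the
    remaining weight (1 - sum of the early-exit weights)."""
    # One weight per early exit: len(lossweights) must equal len(outputs) - 1.
    if len(lossweights) != len(outputs) - 1:
        raise IndexError("expected %d loss weight(s), got %d"
                         % (len(outputs) - 1, len(lossweights)))
    loss = 0.0
    for weight, exit_output in zip(lossweights, outputs[:-1]):
        loss = loss + weight * criterion(exit_output, target)
    loss = loss + (1.0 - sum(lossweights)) * criterion(outputs[-1], target)
    return loss

# resnet20_cifar_earlyexit produces two outputs: one early exit plus the final exit.
criterion = nn.CrossEntropyLoss()
target = torch.tensor([1, 3])
outputs = [torch.randn(2, 10), torch.randn(2, 10)]
loss = earlyexit_loss(outputs, target, criterion, lossweights=[0.2])
```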

@HKLee2040
Author

Thanks!
The first epoch finished, but another problem appeared, shown below:

=> using early-exit threshold values of [0.9]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'nesterov': False, 'dampening': 0, 'weight_decay': 0.0001, 'lr': 0.3, 'momentum': 0.9}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml

Training epoch: 45000 samples (256 per mini-batch)
Epoch: [0][ 50/ 176] Overall Loss 3.187736 Objective Loss 3.187736 Top1_exit0 9.898438 Top5_exit0 50.078125 Top1_exit1 10.554688 Top5_exit1 50.609375 LR 0.300000 Time 0.482580
Epoch: [0][ 100/ 176] Overall Loss 3.125266 Objective Loss 3.125266 Top1_exit0 10.285156 Top5_exit0 50.812500 Top1_exit1 10.597656 Top5_exit1 51.570312 LR 0.300000 Time 0.484268
Epoch: [0][ 150/ 176] Overall Loss 2.833748 Objective Loss 2.833748 Top1_exit0 12.307292 Top5_exit0 54.539062 Top1_exit1 12.354167 Top5_exit1 55.369792 LR 0.300000 Time 0.484920

Parameters:
+----+-------------------------------------+----------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
| | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean |
|----+-------------------------------------+----------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|
| 0 | module.conv1.weight | (16, 3, 3, 3) | 432 | 432 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.67642 | -0.01829 | 0.43613 |
| 1 | module.layer1.0.conv1.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.40499 | 0.06439 | 0.21337 |
| 2 | module.layer1.0.conv2.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.42122 | 0.02479 | 0.24595 |
| 3 | module.layer1.1.conv1.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.63190 | -0.03862 | 0.29190 |
| 4 | module.layer1.1.conv2.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.43385 | -0.09513 | 0.22181 |
| 5 | module.layer1.2.conv1.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.48105 | 0.00908 | 0.21156 |
| 6 | module.layer1.2.conv2.weight | (16, 16, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.38058 | -0.04943 | 0.21154 |
| 7 | module.layer2.0.conv1.weight | (32, 16, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.13368 | 0.00847 | 0.07841 |
| 8 | module.layer2.0.conv2.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.10541 | 0.01295 | 0.07728 |
| 9 | module.layer2.0.downsample.0.weight | (32, 16, 1, 1) | 512 | 512 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.28425 | 0.00707 | 0.21435 |
| 10 | module.layer2.1.conv1.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.09172 | 0.00345 | 0.07236 |
| 11 | module.layer2.1.conv2.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08651 | 0.00590 | 0.06907 |
| 12 | module.layer2.2.conv1.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08414 | 0.00368 | 0.06735 |
| 13 | module.layer2.2.conv2.weight | (32, 32, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08305 | 0.00277 | 0.06674 |
| 14 | module.layer3.0.conv1.weight | (64, 32, 3, 3) | 18432 | 18432 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.06003 | 0.00150 | 0.04792 |
| 15 | module.layer3.0.conv2.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05860 | -0.00236 | 0.04683 |
| 16 | module.layer3.0.downsample.0.weight | (64, 32, 1, 1) | 2048 | 2048 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.17145 | 0.00071 | 0.13722 |
| 17 | module.layer3.1.conv1.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05759 | -0.00043 | 0.04592 |
| 18 | module.layer3.1.conv2.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05811 | -0.00372 | 0.04645 |
| 19 | module.layer3.2.conv1.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05727 | 0.00079 | 0.04565 |
| 20 | module.layer3.2.conv2.weight | (64, 64, 3, 3) | 36864 | 36864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05744 | -0.00756 | 0.04619 |
| 21 | module.fc.weight | (10, 64) | 640 | 640 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.13957 | -0.00407 | 0.10648 |
| 22 | module.linear_exit0.weight | (10, 1600) | 16000 | 16000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.26628 | 0.00000 | 0.14020 |
| 23 | Total sparsity: | - | 286896 | 286896 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
+----+-------------------------------------+----------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
Total sparsity: 0.00

--- validate (epoch=0)-----------
5000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-182354/2018.12.11-182354.log
Traceback (most recent call last):
  File "compress_classifier.py", line 789, in <module>
    main()
  File "compress_classifier.py", line 395, in main
    top1, top5, vloss = validate(val_loader, model, criterion, [pylogger], args, epoch)
  File "compress_classifier.py", line 539, in validate
    return _validate(val_loader, model, criterion, loggers, args, epoch)
  File "compress_classifier.py", line 594, in _validate
    earlyexit_validate_loss(output, target, criterion, args)
  File "compress_classifier.py", line 656, in earlyexit_validate_loss
    earlyexit_validate_criterion = nn.CrossEntropyLoss(reduction='none').cuda()
TypeError: __init__() got an unexpected keyword argument 'reduction'

@haim-barad
Contributor

Hi - this happens because the newer PyTorch 1.0 deprecates the "reduce" parameter in favor of "reduction".

Either update to PyTorch 1.0, or if you don't want to use the pre-release version, change that line in the code to reduce=False (a boolean, not the string 'False').

@Gxllii
Contributor

Gxllii commented Dec 11, 2018

I ran into this problem, too.
The reduction argument requires PyTorch >= 0.4.1.

@haim-barad
Contributor

To be consistent with the rest of Distiller (which assumes PyTorch 0.4.0), I will revert that parameter to the 0.4.0 style of call. The parameter should be reduce=False.

If you want to run right now, make the change yourself. I'll be submitting a patch in the next day or two.
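One version-tolerant way to get per-sample losses (illustrative only; as noted above, the actual patch simply reverts to reduce=False): PyTorch 0.4.1 and later accept reduction='none', while 0.4.0 only understands reduce=False.

```python
import torch
import torch.nn as nn

def per_sample_criterion():
    """Build a CrossEntropyLoss returning one loss value per sample,
    using whichever keyword the installed PyTorch supports."""
    try:
        return nn.CrossEntropyLoss(reduction='none')   # PyTorch >= 0.4.1
    except TypeError:
        # PyTorch 0.4.0: note reduce=False (a boolean, not the string 'False').
        return nn.CrossEntropyLoss(reduce=False)

criterion = per_sample_criterion()
output = torch.randn(4, 10)
target = torch.tensor([0, 1, 2, 3])
losses = criterion(output, target)   # one loss value per sample, shape (4,)
```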

@HKLee2040
Author

@haim-barad It works. Thanks!

@nzmora nzmora added the early exit This issue is related to early-exit label Jan 2, 2019
@nzmora nzmora closed this as completed Jan 2, 2019