This repository has been archived by the owner on May 1, 2023. It is now read-only.

cannot resume model for training #59

Closed
gleefeng opened this issue Oct 22, 2018 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@gleefeng

I ran this command:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10/ -j 1 --resume ../../../data.cifar10/models/best.pth.tar --epochs 200 --compress=../quantization/preact_resnet20_cifar_pact.yaml --out-dir="logs/" --wd=0.0002 --vs=0

and got this error:

=> loading checkpoint ../../../data.cifar10/models/best.pth.tar
Checkpoint keys:
        arch
        compression_sched
        epoch
        optimizer
        state_dict
        quantizer_metadata
        best_top1
   best top@1: 39.310
Loaded compression schedule from checkpoint (epoch 0)
Loaded quantizer metadata from the checkpoint
{'params': {'bits_weights': 3, 'bits_activations': 4, 'quantize_bias': False, 'bits_overrides': OrderedDict([('conv1', OrderedDict([('wts', None), ('acts', None)])), ('layer1.0.pre_relu', OrderedDict([('wts', None), ('acts', None)])), ('final_relu', OrderedDict([('wts', None), ('acts', None)])), ('fc', OrderedDict([('wts', None), ('acts', None)]))])}, 'type': <class 'distiller.quantization.clipped_linear.PACTQuantizer'>}
Traceback (most recent call last):
  File "compress_classifier.py", line 686, in <module>
    main()
  File "compress_classifier.py", line 244, in main
    model, chkpt_file=args.resume)
  File "D:\pytorchProject\distiller\apputils\checkpoint.py", line 117, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'

How can I fix it?
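For context, the TypeError comes from load_checkpoint re-instantiating the quantizer from the saved metadata, qmd['type'](model, **qmd['params']), while PACTQuantizer's __init__ now requires an optimizer argument that was never stored in the checkpoint. A minimal sketch of that failure mode (QuantizerLike is a hypothetical stand-in, not Distiller code):

```python
# Hypothetical stand-in for a quantizer class whose __init__ grew a
# required 'optimizer' argument after the checkpoint was saved.
class QuantizerLike:
    def __init__(self, model, optimizer, bits_weights=3, bits_activations=4):
        self.model = model
        self.optimizer = optimizer

# What load_checkpoint effectively replays from the checkpoint metadata:
# the saved 'params' dict has no 'optimizer' entry.
qmd = {'type': QuantizerLike,
       'params': {'bits_weights': 3, 'bits_activations': 4}}

try:
    quantizer = qmd['type']('model', **qmd['params'])  # no optimizer anywhere
except TypeError as e:
    print(e)  # the message names the missing 'optimizer' argument
```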

@TonightGo

Hi, gleefeng, I got the following error when I ran the command "python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01":

2018-10-22 17:03:03,745 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log
2018-10-22 17:03:03,745 - Number of CPUs: 24
2018-10-22 17:03:03,850 - Number of GPUs: 8
2018-10-22 17:03:03,850 - CUDA version: 8.0.61
2018-10-22 17:03:03,850 - CUDNN version: 7102
2018-10-22 17:03:03,851 - Kernel: 4.4.0-98-generic
2018-10-22 17:03:03,851 - Python: 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
2018-10-22 17:03:03,851 - PyTorch: 0.4.0
2018-10-22 17:03:03,851 - Numpy: 1.14.3
2018-10-22 17:03:03,852 - Traceback (most recent call last):
  File "compress_classifier.py", line 686, in <module>
    main()
  File "compress_classifier.py", line 179, in main
    apputils.log_execution_env_state(sys.argv, gitroot=module_path)
  File "/home/project/compress/distiller-master/apputils/execution_env.py", line 78, in log_execution_env_state
    log_git_state()
  File "/home/project/compress/distiller-master/apputils/execution_env.py", line 56, in log_git_state
    repo = Repo(gitroot, search_parent_directories=True)
  File "/home/project/compress/distiller-master/env/lib/python3.5/site-packages/git/repo/base.py", line 168, in __init__
    raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /home/project/compress/distiller-master

2018-10-22 17:03:03,852 -
2018-10-22 17:03:03,852 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log

Do you have this issue?

@gleefeng
Author

You can solve this problem by cloning the project with git clone instead of downloading an archive: execution_env.py reads some git metadata from the repository.
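To explain why a plain download fails: execution_env.py calls GitPython's Repo(gitroot, search_parent_directories=True), which walks up the directory tree looking for a .git directory and raises InvalidGitRepositoryError when none exists. A rough stdlib sketch of that lookup (simplified, not GitPython itself):

```python
import os

def find_git_root(path):
    """Walk up from 'path' until a directory containing '.git' is found,
    roughly what search_parent_directories=True does in GitPython."""
    path = os.path.abspath(path)
    while True:
        if os.path.isdir(os.path.join(path, '.git')):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # hit the filesystem root: not a git repo
            return None
        path = parent
```

An archive download of distiller-master contains no .git directory, so the lookup fails; a git clone includes it.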

@nzmora
Copy link
Contributor

nzmora commented Oct 22, 2018

Hi @gleefeng ,

Unfortunately resuming from a quantization session is not supported currently. See #21 (comment).

Cheers,
Neta

@TonightGo

TonightGo commented Oct 23, 2018

You can git clone this project to solve this problem. some git messages must be checked by execution_env.py

Yeah, it's ok, thank you!

@hustzxd

hustzxd commented Oct 24, 2018

I encountered the same problem. I just want to evaluate the accuracy of a quantized model.
Temporary modification: comment out the check at quantizer.py:87,

        # if train_with_fp_copy and optimizer is None:
        #     raise ValueError('optimizer cannot be None when train_with_fp_copy is True')

and give optimizer a default of None in the quantizer's signature:

class WRPNQuantizer(Quantizer):

    def __init__(self, model, optimizer=None, bits_activations=32, bits_weights=32, bits_overrides=OrderedDict(),
                 quantize_bias=False):
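The workaround above can be sketched end to end with simplified classes (not the actual Distiller sources): disable the optimizer check in the base class and give optimizer a None default, so a quantized checkpoint can be re-instantiated for evaluation only.

```python
from collections import OrderedDict

class Quantizer:
    def __init__(self, model, optimizer=None, train_with_fp_copy=False):
        # Original safety check, disabled per the workaround:
        # if train_with_fp_copy and optimizer is None:
        #     raise ValueError('optimizer cannot be None when train_with_fp_copy is True')
        self.model = model
        self.optimizer = optimizer

class WRPNQuantizer(Quantizer):
    def __init__(self, model, optimizer=None, bits_activations=32, bits_weights=32,
                 bits_overrides=OrderedDict(), quantize_bias=False):
        super().__init__(model, optimizer)
        self.bits_activations = bits_activations
        self.bits_weights = bits_weights

# Evaluation-only re-instantiation now succeeds without an optimizer:
q = WRPNQuantizer(model='dummy_model')
```

Note this only helps for evaluation; resuming actual training still needs a real optimizer.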

@nzmora nzmora self-assigned this Oct 25, 2018
@nzmora nzmora assigned guyjacob and unassigned nzmora Nov 4, 2018
@nzmora nzmora added the enhancement New feature or request label Nov 4, 2018
@guyjacob
Contributor

guyjacob commented Nov 4, 2018

@hustzxd the workaround you detail indeed works, thanks for posting here.
We'll work on a more permanent solution.

@stvreumi

stvreumi commented Nov 9, 2018

I also have this problem.
A quantized checkpoint can't be loaded for summary or testing, and the error message is the same.

@guyjacob
Contributor

guyjacob commented Apr 2, 2019

We'll track this on #185, closing this one.

@guyjacob guyjacob closed this as completed Apr 2, 2019