
Error while testing the training code #116

Closed
kaixin-bai opened this issue Jun 10, 2020 · 6 comments · Fixed by #117
kaixin-bai commented Jun 10, 2020

python pointnet2/train.py task=cls

Running this reports the following error:

Epoch 1: 100%|██████████████| 385/385 [01:29<00:00,  4.31it/s, loss=1.314, train_acc=0.562, v_num=4, val_acc=0.654, val_loss=1.2]
Epoch 00000: val_acc reached 0.65385 (best 0.65385), saving model to cls-ssg/epoch=0-val_loss=1.20-val_acc=0.654.ckpt as top 2   
[2020-06-10 14:15:12,645][lightning][INFO] - 
Epoch 00000: val_acc reached 0.65385 (best 0.65385), saving model to cls-ssg/epoch=0-val_loss=1.20-val_acc=0.654.ckpt as top 2
/home/kb/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)
Traceback (most recent call last):
  File "pointnet2/train.py", line 61, in <module>
    main()
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/main.py", line 24, in decorated_main
    strict=strict,
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/_internal/utils.py", line 174, in run_hydra
    overrides=args.overrides,
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/_internal/hydra.py", line 86, in run
    job_subdir_key=None,
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/plugins/common/utils.py", line 109, in run_job
    ret.return_value = task_function(task_cfg)
  File "pointnet2/train.py", line 57, in main
    trainer.fit(model)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 853, in fit
    self.dp_train(model)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 578, in dp_train
    self.run_pretrain_routine(model)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 347, in train
    self.run_training_epoch()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 452, in run_training_epoch
    self.call_checkpoint_callback()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 790, in call_checkpoint_callback
    self.checkpoint_callback.on_validation_end(self, self.get_model())
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py", line 10, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 241, in on_validation_end
    self._do_check_save(filepath, current, epoch)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 275, in _do_check_save
    self._save_model(filepath)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 142, in _save_model
    self.save_function(filepath)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_io.py", line 260, in save_checkpoint
    checkpoint = self.dump_checkpoint()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_io.py", line 355, in dump_checkpoint
    f' not {checkpoint["hparams_type"]}'
ValueError: ('The acceptable hparams type is dict or argparse.Namespace,', ' not DictConfig')
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1086, in __del__
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1293, in close
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1471, in display
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1089, in __repr__
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: 'NoneType' object is not iterable
@kaixin-bai
Author

'checkpoint' is a dict with keys 'epoch', 'global_step', 'checkpoint_callback_best', and so on, but it has no 'hparams_type' or 'hparams' keys. That's why the training code raises the ValueError.

But I don't know how to fix it.

@kaixin-bai
Author

After each training epoch the checkpoint is saved, and pytorch-lightning checks whether the checkpoint's hparams_type (taken from model.hparams) is 'dict'. Here it is 'DictConfig', so the code raises the ValueError.

Change pytorch_lightning/trainer/training_io.py line 348 to:

if checkpoint['hparams_type'] in ('dict', 'DictConfig'):

then build pytorch-lightning from source, and the problem is solved.
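A side note on writing this check (a minimal sketch, not code from the patch): in Python, `x == 'dict' or 'DictConfig'` parses as `(x == 'dict') or 'DictConfig'`, and a non-empty string is always truthy, so that form would accept every hparams type. A membership test expresses the intended "one of these two names" check:

```python
# Minimal sketch of the two forms of the type-name check.
hparams_type = "Namespace"

# Pitfall: parses as (hparams_type == "dict") or "DictConfig";
# the non-empty string "DictConfig" is always truthy, so the
# result is truthy for ANY hparams_type value.
always_truthy = hparams_type == "dict" or "DictConfig"
print(bool(always_truthy))  # True even for an unrelated type

# Intended check: membership in the accepted type names.
accepted = hparams_type in ("dict", "DictConfig")
print(accepted)  # False here; True only for those two names
```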

@kaixin-bai
Author

I know this shouldn't be the real solution. It would be appreciated if somebody could help me change the type of model.hparams to 'dict', or find out why this happens.

@kaixin-bai
Author

It seems this pytorch-lightning problem has not been solved yet: https://github.com/PyTorchLightning/pytorch-lightning/issues/2027

@erikwijmans
Owner

Hi, can you try #117 to see if that fixes it?

@kaixin-bai
Author

> Hi, can you try #117 to see if that fixes it?

Thanks for the reply, I'll try.
