
Error while testing the training code #116

Closed
kaixin-bai opened this issue Jun 10, 2020 · 6 comments · Fixed by #117
kaixin-bai commented Jun 10, 2020

python pointnet2/train.py task=cls

Running this reports the following error:

Epoch 1: 100%|██████████████| 385/385 [01:29<00:00,  4.31it/s, loss=1.314, train_acc=0.562, v_num=4, val_acc=0.654, val_loss=1.2]
Epoch 00000: val_acc reached 0.65385 (best 0.65385), saving model to cls-ssg/epoch=0-val_loss=1.20-val_acc=0.654.ckpt as top 2   
[2020-06-10 14:15:12,645][lightning][INFO] - 
Epoch 00000: val_acc reached 0.65385 (best 0.65385), saving model to cls-ssg/epoch=0-val_loss=1.20-val_acc=0.654.ckpt as top 2
/home/kb/.local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)
Traceback (most recent call last):
  File "pointnet2/train.py", line 61, in <module>
    main()
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/main.py", line 24, in decorated_main
    strict=strict,
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/_internal/utils.py", line 174, in run_hydra
    overrides=args.overrides,
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/_internal/hydra.py", line 86, in run
    job_subdir_key=None,
  File "/home/kb/.local/lib/python3.6/site-packages/hydra/plugins/common/utils.py", line 109, in run_job
    ret.return_value = task_function(task_cfg)
  File "pointnet2/train.py", line 57, in main
    trainer.fit(model)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 853, in fit
    self.dp_train(model)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 578, in dp_train
    self.run_pretrain_routine(model)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 347, in train
    self.run_training_epoch()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 452, in run_training_epoch
    self.call_checkpoint_callback()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 790, in call_checkpoint_callback
    self.checkpoint_callback.on_validation_end(self, self.get_model())
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py", line 10, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 241, in on_validation_end
    self._do_check_save(filepath, current, epoch)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 275, in _do_check_save
    self._save_model(filepath)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 142, in _save_model
    self.save_function(filepath)
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_io.py", line 260, in save_checkpoint
    checkpoint = self.dump_checkpoint()
  File "/home/kb/.local/lib/python3.6/site-packages/pytorch_lightning/trainer/training_io.py", line 355, in dump_checkpoint
    f' not {checkpoint["hparams_type"]}'
ValueError: ('The acceptable hparams type is dict or argparse.Namespace,', ' not DictConfig')
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1086, in __del__
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1293, in close
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1471, in display
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1089, in __repr__
  File "/home/kb/.local/lib/python3.6/site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: 'NoneType' object is not iterable
@kaixin-bai
Author

'checkpoint' is a dict with keys 'epoch', 'global_step', 'checkpoint_callback_best', and so on, but it has no 'hparams_type' or 'hparams' keys. That's why the training code raises the ValueError.

But I don't know how to fix it.

@kaixin-bai
Author

After each training epoch the checkpoint is saved, and pytorch-lightning checks whether the checkpoint's hparams_type (taken from model.hparams) is 'dict'. Here it is 'DictConfig', so the code raises the ValueError.

Change pytorch_lightning/trainer/training_io.py line 348 to:

if checkpoint['hparams_type'] in ('dict', 'DictConfig'):

then build pytorch-lightning from source, and the problem is solved.
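A side note on writing this check (a minimal sketch, not code from the patch): in Python, `x == 'dict' or 'DictConfig'` parses as `(x == 'dict') or 'DictConfig'`, and a non-empty string is always truthy, so that form would accept every hparams type. A membership test expresses the intended "one of these two names" check:

```python
# Minimal sketch of the two forms of the type-name check.
hparams_type = "Namespace"

# Pitfall: parses as (hparams_type == "dict") or "DictConfig";
# the non-empty string "DictConfig" is always truthy, so the
# result is truthy for ANY hparams_type value.
always_truthy = hparams_type == "dict" or "DictConfig"
print(bool(always_truthy))  # True even for an unrelated type

# Intended check: membership in the accepted type names.
accepted = hparams_type in ("dict", "DictConfig")
print(accepted)  # False here; True only for those two names
```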

@kaixin-bai
Author

I know this shouldn't be the real solution. It would be appreciated if somebody could help me change the type of model.hparams to 'dict', or find out why this happens.

@kaixin-bai
Author

It seems this pytorch-lightning problem has not been solved yet: https://github.com/PyTorchLightning/pytorch-lightning/issues/2027

@erikwijmans
Owner

Hi, can you try #117 to see if that fixes it?

@kaixin-bai
Author

> Hi, can you try #117 to see if that fixes it?

Thanks for the reply, I'll try.
