
Exception[pytorch_lightning]: You are trying to self.log() but it is not managed by the Trainer control flow #101

Closed
JANGSOONMYUN opened this issue Jun 1, 2022 · 1 comment

Comments

@JANGSOONMYUN

Hi, I hit a crash while training my model.
The problem seems to come from pytorch_lightning, but I don't understand why self.log() cannot be used here.
Please help me solve this.

I ran the project on Colab.
The pytorch_lightning version is 1.5.

Traceback (most recent call last):
  File "train.py", line 99, in
    main(config, args.resume)
  File "train.py", line 75, in main
    trainer.fit(model=model, datamodule=data_module)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 769, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/fit_loop.py", line 268, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 246, in advance
    self.trainer._logger_connector.update_train_step_metrics()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 197, in update_train_step_metrics
    self._log_gpus_metrics()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 226, in _log_gpus_metrics
    key, mem, prog_bar=False, logger=True, on_step=True, on_epoch=False
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/lightning.py", line 386, in log
    "You are trying to self.log() but it is not managed by the Trainer control flow"
pytorch_lightning.utilities.exceptions.MisconfigurationException: You are trying to self.log() but it is not managed by the Trainer control flow
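For context, this exception means self.log() was called while no Trainer-managed hook was active. A minimal pure-Python sketch of that check (hypothetical class names, not the real pytorch_lightning source):

```python
# Simplified sketch: LightningModule.log() only works when a Trainer is
# attached and driving the call, e.g. inside training_step(). The names
# below (LightningModuleSketch, TrainerSketch) are illustrative only.

class MisconfigurationException(Exception):
    pass


class LightningModuleSketch:
    def __init__(self):
        self._trainer = None  # set by Trainer.fit(), like in Lightning

    def log(self, name, value):
        if self._trainer is None:
            # mirrors the error from the traceback above
            raise MisconfigurationException(
                "You are trying to self.log() but it is not managed by the "
                "Trainer control flow"
            )
        self._trainer.metrics[name] = value


class TrainerSketch:
    def __init__(self):
        self.metrics = {}

    def fit(self, model):
        model._trainer = self   # attach the trainer to the module
        model.log("loss", 0.1)  # managed call: accepted


model = LightningModuleSketch()
try:
    model.log("loss", 0.1)      # unmanaged call: raises
except MisconfigurationException as exc:
    print("raised:", exc)

TrainerSketch().fit(model)
print(model._trainer.metrics)
```

In the crash above, the trainer itself triggered the unmanaged call internally while collecting GPU metrics, which is why it surfaced without any self.log() in user code.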

@JANGSOONMYUN
Author

Commented out accelerator='ddp' and log_gpu_memory=config.trainer.log_gpu_memory in the Trainer in train.py, as below, and it worked.

trainer = Trainer(
    logger=wandb_logger,
    callbacks=[checkpoint_callback],
    max_epochs=config.trainer.epochs,
    default_root_dir=root_dir,
    gpus=gpus,
    # accelerator='ddp',
    benchmark=True,
    sync_batchnorm=True,
    precision=config.precision,
    # log_gpu_memory=config.trainer.log_gpu_memory,
    log_every_n_steps=config.trainer.log_every_n_steps,
    overfit_batches=config.trainer.overfit_batches,
    weights_summary='full',
    terminate_on_nan=config.trainer.terminate_on_nan,
    fast_dev_run=config.trainer.fast_dev_run,
    check_val_every_n_epoch=config.trainer.check_val_every_n_epoch,
    resume_from_checkpoint=resume_ckpt)
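As an aside, log_gpu_memory was deprecated around pytorch_lightning 1.5; if GPU memory logging is still wanted, the DeviceStatsMonitor callback is the documented replacement. A sketch under that assumption (wandb_logger, checkpoint_callback, and gpus are the names from the snippet above):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import DeviceStatsMonitor

# DeviceStatsMonitor reports device stats through the attached logger from
# inside the Trainer control flow, so it avoids the self.log() restriction
# that log_gpu_memory hit here.
trainer = Trainer(
    logger=wandb_logger,
    callbacks=[checkpoint_callback, DeviceStatsMonitor()],
    gpus=gpus,
)
```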
