Errors when performing tests #13

Closed
zozni opened this issue Mar 17, 2022 · 3 comments

@zozni

zozni commented Mar 17, 2022

Testing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [05:24<00:00, 2.27it/s]
Evaluation

--- Entity Mentions ---

Traceback (most recent call last):
File "./jerex_test.py", line 20, in test
model.test(cfg)
File "/home/jhj/jerex/jerex/model.py", line 389, in test
trainer.test(model, datamodule=data_module)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 910, in test
results = self.__test_given_model(model, test_dataloaders)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 970, in __test_given_model
results = self.fit(model)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 540, in dispatch
self.accelerator.start_testing(self)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 76, in start_testing
self.training_type_plugin.start_testing(trainer)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 118, in start_testing
self._results = trainer.run_test()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 786, in run_test
eval_loop_results, _ = self.run_evaluation()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in run_evaluation
deprecated_eval_results = self.evaluation_loop.evaluation_epoch_end()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 189, in evaluation_epoch_end
deprecated_results = self.__run_eval_epoch_end(self.num_dataloaders)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 221, in __run_eval_epoch_end
eval_results = model.test_epoch_end(eval_results)
File "/home/jhj/jerex/jerex/model.py", line 155, in test_epoch_end
metrics = self._evaluator.compute_metrics(self._eval_test_gt, predictions)
File "/home/jhj/jerex/jerex/evaluation/joint_evaluator.py", line 76, in compute_metrics
mention_eval = scoring.score(gt_mentions, pred_mentions, print_results=True)
File "/home/jhj/jerex/jerex/evaluation/scoring.py", line 55, in score
metrics = _compute_metrics(gt_flat, pred_flat, labels, labels_str, print_results)
File "/home/jhj/jerex/jerex/evaluation/scoring.py", line 64, in _compute_metrics
per_type = prfs(gt_all, pred_all, labels=labels, average=None, zero_division=0)
TypeError: precision_recall_fscore_support() got an unexpected keyword argument 'zero_division'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Testing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [05:24<00:00, 2.16it/s]

Hi,
the error above occurs during testing, and the results are not saved.
Any ideas?

Thanks

@markus-eberts
Member

Hi,
which scikit-learn version are you using?
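
For context, the `zero_division` keyword of `precision_recall_fscore_support` was added in scikit-learn 0.22, so older releases raise the `TypeError` shown above. A minimal sketch for checking an installed version against that threshold follows; the helper names `parse_version` and `supports_zero_division` are hypothetical and not part of JEREX or scikit-learn.

```python
import re

# Assumption stated in the lead-in: `zero_division` exists from scikit-learn 0.22 on.
MIN_VERSION = (0, 22)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_zero_division(version: str) -> bool:
    """True if this scikit-learn version accepts the `zero_division` keyword."""
    return parse_version(version) >= MIN_VERSION


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        # Report the installed version and whether it is recent enough.
        print(sklearn.__version__, supports_zero_division(sklearn.__version__))
```

Running this inside the same conda environment as the test script shows whether the installed release is recent enough.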

@zozni
Author

zozni commented May 18, 2022

The scikit-learn version was 0.21.3. After reinstalling the environment, the issue above was resolved. Thanks for the support.
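
If upgrading is not an option, one possible workaround is to pass only the keyword arguments that the installed function actually accepts. The sketch below is an assumption-laden illustration, not how JEREX handles this: `call_with_supported_kwargs`, `old_prfs` and `new_prfs` are hypothetical names, and the two stand-in functions only mimic the older and newer call signatures. Dropping `zero_division` means an older release falls back to its own default handling of undefined metrics.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, silently dropping keyword arguments it does not declare."""
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any:
        # Keep only the keywords that appear in the function's signature.
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)


# Hypothetical stand-ins for the pre-0.22 and post-0.22 signatures.
def old_prfs(y_true, y_pred, labels=None, average=None):
    return ("old", labels, average)


def new_prfs(y_true, y_pred, labels=None, average=None, zero_division="warn"):
    return ("new", labels, average, zero_division)


if __name__ == "__main__":
    # The old-style function never sees zero_division; the new-style one does.
    print(call_with_supported_kwargs(old_prfs, [1], [1], labels=[1], zero_division=0))
    print(call_with_supported_kwargs(new_prfs, [1], [1], labels=[1], zero_division=0))
```

Upgrading scikit-learn remains the simpler fix, since it keeps the evaluation behaviour the project expects.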

However, another problem has come up. What could be the cause?

home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/sklearn/utils/validation.py:179: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(joblib_version) < '0.12':
Epoch 0: 94%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 3098/3308 [04:58<00:20, 10.38it/s, loss=0.445, v_num=0_0]
Traceback (most recent call last):
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 637, in run_train
self.train_loop.run_training_epoch()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 577, in run_training_epoch
self.trainer.run_evaluation(on_epoch=True)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 725, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 166, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
return self.training_type_plugin.validation_step(*args)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 131, in validation_step
return self.lightning_module.validation_step(*args, **kwargs)
File "/home/jhj/JEREX/jerex/model.py", line 126, in validation_step
return self._inference(batch, batch_idx)
File "/home/jhj/JEREX/jerex/model.py", line 176, in _inference
output = self(**batch, inference=True)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jhj/JEREX/jerex/model.py", line 106, in forward
max_rel_pairs=max_rel_pairs, inference=inference)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jhj/JEREX/jerex/models/joint_models.py", line 144, in forward
return self._forward_inference(*args, **kwargs)
File "/home/jhj/JEREX/jerex/models/joint_models.py", line 209, in _forward_inference
mention_sample_masks, max_spans=max_spans, max_coref_pairs=max_coref_pairs)
File "/home/jhj/JEREX/jerex/models/joint_models.py", line 81, in _forward_inference_common
mention_reprs = self.mention_representation(h, mention_masks, max_spans=max_spans)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jhj/JEREX/jerex/models/modules/mention_representation.py", line 20, in forward
chunk_mention_reprs = self._forward(chunk_mention_masks, chunk_h)
File "/home/jhj/JEREX/jerex/models/modules/mention_representation.py", line 28, in _forward
mention_reprs = m + h
RuntimeError: CUDA out of memory. Tried to allocate 6.16 GiB (GPU 0; 7.77 GiB total capacity; 1.73 GiB already allocated; 4.80 GiB free; 1.92 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./jerex_train.py", line 24, in <module>
train()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
strict=strict,
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 347, in _run_hydra
lambda: hydra.run(
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
raise ex
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
return func()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 350, in <lambda>
overrides=args.overrides,
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 112, in run
configure_logging=with_log_configuration,
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/core/utils.py", line 127, in run_job
ret.return_value = task_function(task_cfg)
File "./jerex_train.py", line 20, in train
model.train(cfg)
File "/home/jhj/JEREX/jerex/model.py", line 341, in train
trainer.fit(model, datamodule=data_module)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
self.accelerator.start_training(self)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
self._results = trainer.run_train()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 669, in run_train
self.train_loop.on_train_end()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 134, in on_train_end
self.check_checkpoint_callback(should_update=True, is_last=True)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 164, in check_checkpoint_callback
cb.on_validation_end(self.trainer, model)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in on_validation_end
self.save_checkpoint(trainer, pl_module)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 247, in save_checkpoint
self._validate_monitor_key(trainer)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 495, in _validate_monitor_key
raise MisconfigurationException(m)
pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='valid_f1') not found in the returned metrics: ['train_mention_loss', 'train_coref_loss', 'train_entity_loss', 'train_rel_loss', 'train_loss']. HINT: Did you call self.log('valid_f1', value) in the LightningModule?

@zozni
Author

zozni commented May 18, 2022

This also turned out to be a scikit-learn version issue.
I upgraded scikit-learn to 0.23.2 and the problem was solved.

@zozni zozni closed this as completed May 18, 2022