Errors when performing tests #13

zozni · 2022-03-17T08:41:45Z

Testing: 100%|██████████| 700/700 [05:24<00:00, 2.27it/s]
Evaluation

--- Entity Mentions ---

Traceback (most recent call last):
File "./jerex_test.py", line 20, in test
model.test(cfg)
File "/home/jhj/jerex/jerex/model.py", line 389, in test
trainer.test(model, datamodule=data_module)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 910, in test
results = self.__test_given_model(model, test_dataloaders)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 970, in __test_given_model
results = self.fit(model)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 540, in dispatch
self.accelerator.start_testing(self)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 76, in start_testing
self.training_type_plugin.start_testing(trainer)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 118, in start_testing
self._results = trainer.run_test()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 786, in run_test
eval_loop_results, _ = self.run_evaluation()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in run_evaluation
deprecated_eval_results = self.evaluation_loop.evaluation_epoch_end()
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 189, in evaluation_epoch_end
deprecated_results = self.__run_eval_epoch_end(self.num_dataloaders)
File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 221, in __run_eval_epoch_end
eval_results = model.test_epoch_end(eval_results)
File "/home/jhj/jerex/jerex/model.py", line 155, in test_epoch_end
metrics = self._evaluator.compute_metrics(self._eval_test_gt, predictions)
File "/home/jhj/jerex/jerex/evaluation/joint_evaluator.py", line 76, in compute_metrics
mention_eval = scoring.score(gt_mentions, pred_mentions, print_results=True)
File "/home/jhj/jerex/jerex/evaluation/scoring.py", line 55, in score
metrics = _compute_metrics(gt_flat, pred_flat, labels, labels_str, print_results)
File "/home/jhj/jerex/jerex/evaluation/scoring.py", line 64, in _compute_metrics
per_type = prfs(gt_all, pred_all, labels=labels, average=None, zero_division=0)
TypeError: precision_recall_fscore_support() got an unexpected keyword argument 'zero_division'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Testing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [05:24<00:00, 2.16it/s]

Hi,

When I run the test, the error above occurs and the results are not saved.
Any ideas?

Thanks

markus-eberts · 2022-03-17T13:21:44Z

Hi,
which scikit-learn version are you using?
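
One quick way to answer this is to print the installed version and check whether `precision_recall_fscore_support` accepts the `zero_division` keyword that the traceback complains about (as far as I know, scikit-learn added it in release 0.22). This is only a minimal sketch: the helper names `module_version` and `accepts_keyword` are made up for illustration and are not part of JEREX or scikit-learn.

```python
import importlib
import inspect


def module_version(name):
    """Return `<module>.__version__`, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def accepts_keyword(func, name):
    """Report whether `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    print("scikit-learn:", module_version("sklearn"))
    try:
        from sklearn.metrics import precision_recall_fscore_support as prfs
    except ImportError:
        print("scikit-learn is not importable in this environment")
    else:
        print("accepts zero_division:", accepts_keyword(prfs, "zero_division"))
```

If the second line prints `False`, the installed scikit-learn predates the keyword and the call in `scoring.py` will fail as shown above.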

zozni · 2022-05-18T08:18:04Z

The scikit-learn version was 0.21.3. After reinstalling the environment, the above issue was resolved. Thanks for the support.

However, another problem has come up. What is the cause?

home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/sklearn/utils/validation.py:179: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(joblib_version) < '0.12':
Epoch 0: 94%|█████████▎| 3098/3308 [04:58<00:20, 10.38it/s, loss=0.445, v_num=0_0]
Traceback (most recent call last):
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 637, in run_train
self.train_loop.run_training_epoch()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 577, in run_training_epoch
self.trainer.run_evaluation(on_epoch=True)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 725, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 166, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
return self.training_type_plugin.validation_step(*args)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 131, in validation_step
return self.lightning_module.validation_step(*args, **kwargs)
File "/home/jhj/JEREX/jerex/model.py", line 126, in validation_step
return self._inference(batch, batch_idx)
File "/home/jhj/JEREX/jerex/model.py", line 176, in _inference
output = self(**batch, inference=True)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jhj/JEREX/jerex/model.py", line 106, in forward
max_rel_pairs=max_rel_pairs, inference=inference)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jhj/JEREX/jerex/models/joint_models.py", line 144, in forward
return self._forward_inference(*args, **kwargs)
File "/home/jhj/JEREX/jerex/models/joint_models.py", line 209, in _forward_inference
mention_sample_masks, max_spans=max_spans, max_coref_pairs=max_coref_pairs)
File "/home/jhj/JEREX/jerex/models/joint_models.py", line 81, in _forward_inference_common
mention_reprs = self.mention_representation(h, mention_masks, max_spans=max_spans)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jhj/JEREX/jerex/models/modules/mention_representation.py", line 20, in forward
chunk_mention_reprs = self._forward(chunk_mention_masks, chunk_h)
File "/home/jhj/JEREX/jerex/models/modules/mention_representation.py", line 28, in _forward
mention_reprs = m + h
RuntimeError: CUDA out of memory. Tried to allocate 6.16 GiB (GPU 0; 7.77 GiB total capacity; 1.73 GiB already allocated; 4.80 GiB free; 1.92 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./jerex_train.py", line 24, in <module>
train()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
strict=strict,
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 347, in _run_hydra
lambda: hydra.run(
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
raise ex
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
return func()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 350, in <lambda>
overrides=args.overrides,
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 112, in run
configure_logging=with_log_configuration,
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/core/utils.py", line 127, in run_job
ret.return_value = task_function(task_cfg)
File "./jerex_train.py", line 20, in train
model.train(cfg)
File "/home/jhj/JEREX/jerex/model.py", line 341, in train
trainer.fit(model, datamodule=data_module)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
self.accelerator.start_training(self)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
self._results = trainer.run_train()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 669, in run_train
self.train_loop.on_train_end()
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 134, in on_train_end
self.check_checkpoint_callback(should_update=True, is_last=True)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 164, in check_checkpoint_callback
cb.on_validation_end(self.trainer, model)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in on_validation_end
self.save_checkpoint(trainer, pl_module)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 247, in save_checkpoint
self._validate_monitor_key(trainer)
File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 495, in _validate_monitor_key
raise MisconfigurationException(m)
pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='valid_f1') not found in the returned metrics: ['train_mention_loss', 'train_coref_loss', 'train_entity_loss', 'train_rel_loss', 'train_loss']. HINT: Did you call self.log('valid_f1', value) in the LightningModule?

zozni · 2022-05-18T09:19:35Z

This also turned out to be a scikit-learn version issue.
After I upgraded scikit-learn to 0.23.2, the problem was solved.
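
For anyone hitting the same thing, a small startup check can catch an outdated scikit-learn before a long test or training run. The sketch below is an assumption-laden illustration, not something JEREX ships: `version_tuple`, `meets_minimum`, and the `MIN_SKLEARN` threshold of 0.22 (the release that, to my knowledge, introduced `zero_division`) are all hypothetical names and values.

```python
import re


def version_tuple(version):
    """Turn '0.23.2' into (0, 23, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (pre-release tags are ignored)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.22' and '0.22.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Assumed threshold: the first release that knows `zero_division`.
MIN_SKLEARN = "0.22"

if __name__ == "__main__":
    for candidate in ("0.21.3", "0.23.2"):
        status = "ok" if meets_minimum(candidate, MIN_SKLEARN) else "too old"
        print(candidate, status)
```

With the two versions mentioned in this thread, 0.21.3 is reported as too old and 0.23.2 as ok.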

zozni closed this as completed May 18, 2022
