on colab pytorch_lightning v1.2 throws valueerror when following setup step !python training/run_experiment.py --max_epochs=3 #9

Closed
ravindrabharathi opened this issue Feb 19, 2021 · 6 comments


ravindrabharathi commented Feb 19, 2021

While following the setup steps for Colab (https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2021-labs/blob/main/setup/readme.md), the step that installs pytorch_lightning pulls the latest release, v1.2.
This version raises the error below when running !python training/run_experiment.py --max_epochs=3

If pytorch_lightning 1.1.8 is used instead (!pip install pytorch_lightning==1.1.8), the test step works without issues, as shown in the image in the readme.
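
For anyone who wants to detect the problem before launching a run, the version boundary described above could be checked with a small guard. This is only a sketch: the helper names `parse_version` and `is_affected` are hypothetical, and the `(1, 2)` threshold comes solely from the two versions compared in this report, not from any changelog.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.2.0' into (1, 2, 0); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version: str) -> bool:
    """Assumption from this report: 1.1.8 works, 1.2 raises the ValueError."""
    return parse_version(version)[:2] >= (1, 2)


for candidate in ("1.1.8", "1.2.0"):
    status = "affected" if is_affected(candidate) else "ok"
    print(f"pytorch_lightning {candidate}: {status}")
```

In a notebook, the string passed to `is_affected` would typically be `pytorch_lightning.__version__`.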

I haven't explored further to check what causes the difference between the two versions (or whether it is already a known issue).

Links to colab notebooks with pytorch-lightning v1.2 and v1.1.8
v1.2 : https://colab.research.google.com/drive/1DvfGtym_oZRg2q5R78gWm6997LEZj4Ma?usp=sharing
v1.1.8 : https://colab.research.google.com/drive/1DBjpKEMTJ9w6U3rNltLcHsw976AvNX9j?usp=sharing

  File "training/run_experiment.py", line 90, in <module>
    main()
  File "training/run_experiment.py", line 85, in main
    trainer.fit(lit_model, datamodule=data)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 513, in fit
    self.dispatch()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 553, in dispatch
    self.accelerator.start_training(self)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 111, in start_training
    self._results = trainer.run_train()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 614, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 863, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 732, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 164, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 178, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 128, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/content/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/lit_models/base.py", line 61, in validation_step
    self.val_acc(logits, y)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/metric.py", line 152, in forward
    self.update(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/metric.py", line 199, in wrapped_func
    return update(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 139, in update
    preds, target, threshold=self.threshold, top_k=self.top_k, subset_accuracy=self.subset_accuracy
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/functional/accuracy.py", line 25, in _accuracy_update
    preds, target, mode = _input_format_classification(preds, target, threshold=threshold, top_k=top_k)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/helpers.py", line 439, in _input_format_classification
    top_k=top_k,
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/helpers.py", line 296, in _check_classification_inputs
    _basic_input_validation(preds, target, threshold, is_multiclass)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/metrics/classification/helpers.py", line 74, in _basic_input_validation
    raise ValueError("The `preds` should be probabilities, but values were detected outside of [0,1] range.")
ValueError: The `preds` should be probabilities, but values were detected outside of [0,1] range.
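
The last frames of the traceback show the accuracy metric rejecting `preds` because it contains values outside [0, 1], which is what raw logits look like. The thread does not say what the eventual fix was, so the following is only an illustration of the check and of one plausible workaround (normalizing logits with softmax before they reach the metric). It uses only the standard library; the helper names and sample values are made up for the example.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to 1 (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def in_unit_range(values):
    """Mirror of the condition the error message describes: every value in [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in values)


logits = [2.5, -1.0, 0.3]  # hypothetical raw model outputs for one sample
probs = softmax(logits)

print("logits in [0, 1]:", in_unit_range(logits))
print("probs in [0, 1]: ", in_unit_range(probs))
```

Softmax preserves the ordering of the scores, so the predicted class is unchanged. With tensors, the equivalent normalization would be `torch.softmax(logits, dim=1)` applied before calling the metric; whether that matches the change pushed to main is an assumption.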

@AlexHandy1

+1

@wayfarerjing

Same here. Looks like there's a compatibility issue with PL >= 1.2:
Lightning-Universe/lightning-bolts#551

@Daniel8hen

+1

numanai commented Feb 20, 2021

+1

@Tianqiao-Yvonne

+1

@sergeyk
Contributor

sergeyk commented Feb 22, 2021

Thanks for the reports and the fix! Pushed to main branch, closing.

sergeyk closed this as completed Feb 22, 2021