Fix checks integration about pytorch lightning #4322
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##           master    #4322   +/-   ##
=======================================
  Coverage   90.43%   90.43%
=======================================
  Files         172      172
  Lines       13660    13660
=======================================
+ Hits        12353    12354       +1
+ Misses       1307     1306       -1
```
With the changes in this PR, I confirmed that all the mypy checks pass.
I made this PR review-ready. I would appreciate it if you could assign reviewers.
```diff
@@ -96,7 +94,7 @@ def on_validation_end(self, trainer: Trainer, pl_module: LightningModule) -> None:
         if trainer.is_global_zero:
             self._trial.report(current_score.item(), step=epoch)
             should_stop = self._trial.should_prune()
-        should_stop = trainer.training_type_plugin.broadcast(should_stop)
+        should_stop = trainer.strategy.broadcast(should_stop)
```
I think we should update the version check at L64 from `1.5.0` to `1.6.0`, since `pytorch-lightning==1.5.0` does not have `Trainer.strategy`.

| pytorch-lightning version | optuna-examples/pytorch/pytorch_lightning_simple.py |
|---|---|
| 1.5.0 | NG |
| 1.6.0 | OK |
| 1.7.0 | OK |
| 1.8.0 | OK |

The error message with `pytorch-lightning==1.5.0` was as follows:

```
AttributeError: 'Trainer' object has no attribute 'strategy'
```
Thank you for your comment. I updated the version constraint.
Basically, LGTM. Let me ask one question: `on_fit_start` includes logic for when the backend is DDP, but we do not test it. Do we have a plan to support DDP (maybe as a follow-up of this PR)?
Thanks for your comment. I am working on supporting DDP and will create a follow-up PR based on this one.
I added a follow-up PR: #4384.
LGTM.
I'm sorry for the delayed response. `pytorch_lightning_simple.py` in the Optuna examples worked with the latest PyTorch Lightning as expected. We may update the lower bound of pytorch-lightning in the example from 1.5.0 to 1.6.0; that is a follow-up task.
LGTM!
Motivation
Resolve #3418 and #4116
Description of the changes
Refactor deprecated features:

- `trainer.training_type_plugin` is deleted since v1.8 (PR#11239). The attribute `training_type_plugin` is just renamed to `strategy`, so it was refactored as suggested.
- The optional `accelerator` argument of `Trainer` stopped accepting `ddp_cpu`. Instead, we can pass `ddp` to `strategy` and `cpu` to `accelerator`. Also, `num_processes` will be removed, so we give the number of processes to `devices` instead.
- `AcceleratorConnector.distributed_backend` is deleted, but `AcceleratorConnector.is_distributed` is available instead, so it was refactored as suggested.
- `callback.on_init_start()` is deleted since v1.8 (Issue#10894, PR#10940, PR#14867). Although there is no exactly equivalent alternative, it is possible to move this confirmation somewhere else. `strategy.setup_environment` seems to be the right place to implement this check, but implementing it as a method of `Strategy` would affect users' code, so it is more reasonable to implement it in `callback.setup` or `callback.on_fit_start()`.

Stop supporting DDP temporarily:

- When you use DDP and `optuna.TrialPruned()` is raised from a child process, PyTorch Lightning tries to resynchronize to fix the "error" and finally treats it as a `DeadlockDetectedException`, which terminates the whole process. For more details, see reconciliate_processes. To fix this problem, we would need to change an environment variable and a private variable.
- It might be possible to solve this problem by manually raising `optuna.TrialPruned()` from the objective function, as in `CatBoostPruningCallback` (see this comment). If that is possible, I am going to apply the change in another PR.
- `test_pytorch_lightning_pruning_callback_ddp_monitor` and `test_pytorch_lightning_pruning_callback_ddp_unsupported_storage`.
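The manual-raise workaround mentioned above could look roughly like the following sketch. Stub classes stand in for Optuna and PyTorch Lightning types; the class and method names here are illustrative, with `check_pruned` mirroring the pattern used by `CatBoostPruningCallback` (record a flag during training, raise from the main process afterwards):

```python
# Hypothetical sketch: instead of raising TrialPruned inside a DDP worker
# (which PyTorch Lightning treats as a deadlock), the callback records a
# flag, and the objective calls `check_pruned()` after training returns,
# so the exception is raised in the main process only.

class TrialPruned(Exception):
    """Stub standing in for optuna.TrialPruned."""

class PruningCallbackSketch:
    def __init__(self) -> None:
        self._pruned = False

    def on_validation_end(self, should_prune: bool) -> None:
        # In the real callback this decision would come from
        # trial.should_prune(); a boolean stands in for it here.
        if should_prune:
            self._pruned = True  # remember instead of raising immediately

    def check_pruned(self) -> None:
        # Called from the objective function after trainer.fit() returns.
        if self._pruned:
            raise TrialPruned()

cb = PruningCallbackSketch()
cb.on_validation_end(should_prune=True)
try:
    cb.check_pruned()
    print("completed")
except TrialPruned:
    print("pruned")  # prints: pruned
```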