Fix test_pytorch_lightning.py #4305
Conversation
Out of curiosity, is everything working with PL 1.8 on this branch w.r.t. the pruner?
Codecov Report
@@ Coverage Diff @@
## master #4305 +/- ##
==========================================
- Coverage 90.42% 90.38% -0.04%
==========================================
Files 172 172
Lines 13660 13668 +8
==========================================
+ Hits 12352 12354 +2
- Misses 1308 1314 +6
Thanks for the question. That is what we aim to do in this PR. @grburgess
Thanks! I checked out this branch to try it (which I should have done in the first place), and indeed the issue seems fixed.
Thank you for trying the patch. Please note that the problem with distributed processing still remains for now.
Problem

When you use DDP and `optuna.TrialPruned()` is raised from a child process, PyTorch Lightning tries to resynchronize the processes to recover from the "error" and finally treats it as a `DeadlockDetectedException`, which terminates the whole process. For more details, see `reconciliate_processes`. To fix this problem, we need to change an environment variable and a private variable.

Alternative

Stop supporting DDP in order to fix the daily integration tests (#4116) and to support PyTorch Lightning after v1.6.
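To make the workaround concrete, here is a minimal, hedged sketch of the general pattern: instead of raising the pruning exception inside a DDP worker (which PyTorch Lightning would misread as a deadlock), the worker records the decision and the main process re-raises it after training finishes cleanly. All names here (`TrialPruned` stand-in, `PRUNED_ENV_KEY`, `mark_pruned`, `check_pruned`) are illustrative assumptions, not Optuna's or this PR's actual API.

```python
import os


class TrialPruned(Exception):
    """Stand-in for ``optuna.TrialPruned`` so the sketch is self-contained."""


# Hypothetical flag name; the PR itself changes an environment variable
# and a private variable, but the exact names are not shown here.
PRUNED_ENV_KEY = "DEMO_TRIAL_PRUNED"


def mark_pruned():
    # Called where the pruner decides to stop. Setting a flag avoids
    # raising an exception inside a DDP worker process.
    os.environ[PRUNED_ENV_KEY] = "1"


def check_pruned():
    # Called in the main process after training has finished cleanly;
    # only here is the exception actually raised.
    if os.environ.pop(PRUNED_ENV_KEY, "0") == "1":
        raise TrialPruned()
```

This keeps the exception out of the child processes, so PyTorch Lightning's deadlock reconciliation never sees it.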
When I tried to fix this issue before, I also thought that calling
This pull request has not seen any recent activity. |
This pull request was closed automatically because it had not seen any recent activity. If you want to discuss it, you can reopen it freely. |
Motivation
Resolve #3418 and #4116
Description of the changes
Refactor deprecated features:

- `trainer.training_type_plugin` is deleted since v1.8 (PR#11239). The attribute `training_type_plugin` is just renamed to `strategy`, so it is refactored as suggested.
- The optional `Trainer` argument `accelerator` no longer accepts `ddp_cpu`. Instead, we can pass `ddp` to `strategy` and `cpu` to `accelerator`. Also, `num_processes` will be removed, so we give the number of processes to `devices` instead.
- `AcceleratorConnector.distributed_backend` is deleted, but `AcceleratorConnector.is_distributed` is now available instead, so it is refactored as suggested.
- `callback.on_init_start()` is deleted since v1.8 (Issue#10894, PR#10940, PR#14867). Although there is no exactly equivalent alternative, it would be possible to move this confirmation elsewhere. `strategy.setup_environment` seems to be the right place to implement this check, but implementing it as a method of `Strategy` would affect users' code. Therefore it is more reasonable to implement it as `callback.setup` or `callback.on_fit_start()`.
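The argument migration above can be sketched as a small pure-Python helper. This is not part of the PR, only an illustration of the mapping it describes: pre-1.6 `Trainer(accelerator="ddp_cpu", num_processes=N)` becomes `Trainer(accelerator="cpu", strategy="ddp", devices=N)`. The helper name is hypothetical.

```python
def migrate_trainer_kwargs(old_kwargs):
    """Map deprecated pre-1.6 Trainer arguments to their post-1.6 equivalents."""
    new_kwargs = dict(old_kwargs)
    if new_kwargs.get("accelerator") == "ddp_cpu":
        # The device type now goes to ``accelerator`` and the
        # distribution mode to ``strategy``.
        new_kwargs["accelerator"] = "cpu"
        new_kwargs["strategy"] = "ddp"
    if "num_processes" in new_kwargs:
        # ``num_processes`` is deprecated; the process count moves to ``devices``.
        new_kwargs["devices"] = new_kwargs.pop("num_processes")
    return new_kwargs
```

For example, `migrate_trainer_kwargs({"accelerator": "ddp_cpu", "num_processes": 2})` yields the post-1.6 form `{"accelerator": "cpu", "strategy": "ddp", "devices": 2}`.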