
Fix checks integration about pytorch lightning #4322

Conversation

@Alnusjaponica Alnusjaponica commented Jan 12, 2023

Motivation

Resolve #3418 and #4116

Description of the changes

Refactor deprecated features

  • trainer.training_type_plugin has been removed since v1.8 (PR#11239). The attribute training_type_plugin was simply renamed to strategy, so the code has been refactored accordingly.

  • The optional Trainer argument accelerator no longer accepts ddp_cpu. Instead, we pass ddp to strategy and cpu to accelerator. In addition, num_processes will be removed, so the number of processes is now passed via devices.

  • AcceleratorConnector.distributed_backend has been removed, but AcceleratorConnector.is_distributed is available instead, so the code has been refactored accordingly.

  • callback.on_init_start() has been removed since v1.8 (Issue#10894, PR#10940, PR#14867).
    Although no exactly equivalent alternative is provided, this confirmation can be moved elsewhere. strategy.setup_environment seems like the right place to implement the check, but implementing it as a method of Strategy would affect users' code. It therefore seems reasonable to implement it in callback.setup() or callback.on_fit_start() (see the sketch after this list).
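
For concreteness, here is a minimal migration sketch (not code taken from this PR) of how the deprecated constructs above map onto their replacements, assuming pytorch-lightning>=1.6; HypotheticalCallback is an illustrative name, not part of Optuna:

```python
# Illustrative migration sketch (assumes pytorch-lightning>=1.6); this is not
# code taken from this PR.
import pytorch_lightning as pl

# Before (deprecated): pl.Trainer(accelerator="ddp_cpu", num_processes=2)
# After: spell out the strategy, accelerator, and device count explicitly.
trainer = pl.Trainer(
    strategy="ddp",     # replaces accelerator="ddp_cpu"
    accelerator="cpu",
    devices=2,          # replaces num_processes=2
    max_epochs=1,
)

# The renamed attribute: `trainer.training_type_plugin.broadcast(obj)` becomes
# `trainer.strategy.broadcast(obj)`.


class HypotheticalCallback(pl.Callback):
    # `HypotheticalCallback` is an illustrative name only.
    def on_fit_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
        # A check that used to live in `on_init_start()` (removed in v1.8)
        # can be moved here or into `setup()`.
        pass
```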

Stop supporting DDP temporarily

When DDP is used and optuna.TrialPruned() is raised from a child process, PyTorch Lightning tries to resynchronize the processes to recover from the "error" and ultimately treats it as a DeadlockDetectedException, which terminates the whole process. For more details, see reconciliate_processes. Fixing this would require changing an environment variable and a private variable.

It might be possible to solve this problem by manually raising optuna.TrialPruned() from the objective function, as CatBoostPruningCallback does (see this comment); a sketch of this pattern appears below. If that works, I will apply the change in a separate PR.

  • This PR skips test_pytorch_lightning_pruning_callback_ddp_monitor and test_pytorch_lightning_pruning_callback_ddp_unsupported_storage.
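
As a reference for the workaround mentioned above, here is a hedged sketch of the record-then-raise pruning pattern used by CatBoostPruningCallback; the class and method names below are illustrative and were not part of the PyTorch Lightning integration at the time of this PR:

```python
# Hedged sketch of the record-then-raise pruning pattern; illustrative only.
from typing import Optional

import optuna


class RecordedPruning:
    """Record the pruning decision instead of raising inside a worker process,
    then raise optuna.TrialPruned() from the objective in the main process."""

    def __init__(self, trial: optuna.trial.Trial) -> None:
        self._trial = trial
        self._pruned_step: Optional[int] = None

    def report(self, value: float, step: int) -> None:
        self._trial.report(value, step=step)
        if self._trial.should_prune():
            # Remember the decision; do not raise here, because raising from a
            # DDP child process triggers Lightning's deadlock reconciliation.
            self._pruned_step = step

    def check_pruned(self) -> None:
        # Called from the objective function after training finishes, so the
        # exception propagates in the main process where Optuna can handle it.
        if self._pruned_step is not None:
            raise optuna.TrialPruned(f"Trial was pruned at step {self._pruned_step}.")
```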

@github-actions github-actions bot added the optuna.integration Related to the `optuna.integration` submodule. This is automatically labeled by github-actions. label Jan 12, 2023

codecov-commenter commented Jan 12, 2023

Codecov Report

Merging #4322 (eab1a89) into master (765143e) will increase coverage by 0.00%.
The diff coverage is 0.00%.


@@           Coverage Diff           @@
##           master    #4322   +/-   ##
=======================================
  Coverage   90.43%   90.43%           
=======================================
  Files         172      172           
  Lines       13660    13660           
=======================================
+ Hits        12353    12354    +1     
+ Misses       1307     1306    -1     
Impacted Files                            Coverage Δ
optuna/integration/pytorch_lightning.py   0.00% <0.00%> (ø)
optuna/progress_bar.py                    87.69% <0.00%> (+1.53%) ⬆️


@Alnusjaponica Alnusjaponica marked this pull request as ready for review January 13, 2023 09:23
@Alnusjaponica
Collaborator Author

With the changes in this PR, I confirmed that all mypy checks in Checks (integration) pass; see the following CI run on my fork.
Tests (MPI) failed on the fork, but this test already fails on the master branch, so the cause should lie somewhere independent of this change.

@Alnusjaponica
Collaborator Author

I have marked this PR as ready for review. I would appreciate it if you could assign reviewers.
@toshihikoyanase

@toshihikoyanase toshihikoyanase self-assigned this Jan 19, 2023
@github-actions
Contributor

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Jan 26, 2023
@contramundum53 contramundum53 added the CI Continuous integration. label Jan 27, 2023
@@ -96,7 +94,7 @@ def on_validation_end(self, trainer: Trainer, pl_module: LightningModule) -> None:
        if trainer.is_global_zero:
            self._trial.report(current_score.item(), step=epoch)
            should_stop = self._trial.should_prune()
        should_stop = trainer.training_type_plugin.broadcast(should_stop)
Member

I think we should update the version check at L64 from 1.5.0 to 1.6.0 since pytorch-lightning==1.5.0 does not have Trainer.strategy.

pytorch-lightning version    optuna-examples/pytorch/pytorch_lightning_simple.py
1.5.0                        NG
1.6.0                        OK
1.7.0                        OK
1.8.0                        OK

The error message with pytorch-lightning==1.5.0 was as follows:

AttributeError: 'Trainer' object has no attribute 'strategy'

Collaborator Author

Thank you for your comment. I updated the version constraint.
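
A minimal sketch of the kind of import-time version check discussed in this thread (assumed shape only; not necessarily the exact code in this PR), bumping the lower bound because Trainer.strategy does not exist before 1.6.0:

```python
# Hedged sketch of an import-time version guard; assumed shape only.
from packaging import version

import pytorch_lightning as pl

if version.parse(pl.__version__) < version.parse("1.6.0"):
    raise ImportError(
        "pytorch-lightning>=1.6.0 is required because this integration uses "
        "`Trainer.strategy`, which is not available in older versions."
    )
```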

@github-actions github-actions bot removed the stale Exempt from stale bot labeling. label Jan 29, 2023
@HideakiImamura HideakiImamura self-assigned this Jan 30, 2023
@HideakiImamura HideakiImamura (Member) left a comment

Basically, LGTM. Let me ask one question: on_fit_start includes logic for the case where the backend is DDP, but we do not test it. Do we have a plan to support DDP (maybe as a follow-up to this PR)?

@Alnusjaponica
Collaborator Author

Thanks for your comment. I am working on supporting DDP here and will create a follow-up PR based on this one.

@Alnusjaponica
Collaborator Author

I added the follow-up PR #4384.

@HideakiImamura HideakiImamura (Member) left a comment

LGTM.

@HideakiImamura HideakiImamura removed their assignment Feb 6, 2023
@toshihikoyanase toshihikoyanase (Member) left a comment

I'm sorry for the delayed response. pytorch_lightning_simple.py in the optuna-examples repository worked with the latest pytorch-lightning as expected. We may update the lower bound of pytorch-lightning in the example from 1.5.0 to 1.6.0; that is a follow-up task.

LGTM!

@toshihikoyanase toshihikoyanase added code-fix Change that does not change the behavior, such as code refactoring. and removed CI Continuous integration. labels Feb 9, 2023
@toshihikoyanase toshihikoyanase added this to the v3.2.0 milestone Feb 9, 2023
@toshihikoyanase toshihikoyanase merged commit 17ff642 into optuna:master Feb 9, 2023
@Alnusjaponica Alnusjaponica deleted the fix-checks-integration-pytorch-lightning-alternative branch February 10, 2023 07:14
Labels
code-fix: Change that does not change the behavior, such as code refactoring.
optuna.integration: Related to the `optuna.integration` submodule. This is automatically labeled by github-actions.
Development

Successfully merging this pull request may close these issues.

Support latest PyTorch lightning
5 participants