[train] Pin pytorch lightning to 2.0.4 (#37400)
2.0.5 breaks master as the deepspeed device validation fails. We should look into this separately.

Signed-off-by: Kai Fricke <kai@anyscale.com>
krfricke committed Jul 13, 2023
1 parent 49e884d commit 8f737c4
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .buildkite/pipeline.gpu_large.yml
@@ -84,6 +84,6 @@
  - PYTHON=3.8 DOC_TESTING=1 TRAIN_TESTING=1 TUNE_TESTING=1 ./ci/env/install-dependencies.sh
  - pip install -Ur ./python/requirements/ml/dl-gpu-requirements.txt
  - pip uninstall -y pytorch-lightning
- - pip install lightning==2.0.0
+ - pip install lightning==2.0.4 pytorch-lightning==2.0.4 # todo move to requirements-test.txt
  - ./ci/env/env_info.sh
  - bazel test --config=ci $(./scripts/bazel_export_options) --test_tag_filters=ptl_v2 python/ray/train/...
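As a rough illustration of what the pin above guards against, a CI step could verify the installed versions before the tests run. The sketch below is not part of this commit: the helper name `find_pin_mismatches` and the `PINS` mapping are hypothetical, and only the two package names and the `2.0.4` version come from the pipeline change.

```python
from importlib import metadata

# Pins copied from the pipeline step in this commit.
PINS = {"lightning": "2.0.4", "pytorch-lightning": "2.0.4"}


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: installed version or None} for packages that are off their pin.

    `get_version` is injectable so the check can be exercised without
    installing the packages (hypothetical helper, not a Ray or Lightning API).
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = installed
    return mismatches
```

Calling `find_pin_mismatches(PINS)` in the test environment would return an empty dict when both packages are at 2.0.4, and a non-empty dict (for example after an unpinned upgrade to 2.0.5) that a script could turn into an early failure.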
2 changes: 1 addition & 1 deletion python/ray/train/lightning/_lightning_utils.py
@@ -225,7 +225,7 @@ def _session_report(self, trainer: "pl.Trainer", stage: str):
  if isinstance(v, torch.Tensor):
      metrics[k] = v.item()

- # Ensures all workers already finish writing their checkpoints.
+ # Ensures all workers already finish writing their checkpoints
  trainer.strategy.barrier()

  # Create and report the latest checkpoint
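The comment in this hunk describes a barrier placed between writing checkpoints and reporting them. The following sketch shows the same ordering guarantee with the standard library's `threading.Barrier` as a stand-in; it is an analogy only, not how `trainer.strategy.barrier()` is implemented, and `run_workers`, `written`, and `seen_at_report` are hypothetical names.

```python
import threading


def run_workers(num_workers=4):
    """Simulate workers that write a checkpoint, wait at a barrier, then report."""
    barrier = threading.Barrier(num_workers)
    lock = threading.Lock()
    written = []         # ranks whose (simulated) checkpoint has been written
    seen_at_report = {}  # rank -> number of checkpoints visible at report time

    def worker(rank):
        with lock:
            written.append(rank)  # stand-in for writing a checkpoint
        barrier.wait()            # plays the role of trainer.strategy.barrier()
        with lock:
            seen_at_report[rank] = len(written)  # stand-in for reporting

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_at_report
```

Because no thread passes `barrier.wait()` until every thread has reached it, each worker sees all `num_workers` checkpoints when it reports. Without the barrier, a fast worker could report while others are still writing.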