What happened + What you expected to happen
TL;DR: use `pytorch_lightning` instead of `lightning.pytorch`; mixing the two namespaces is the root of the problem.
When using the new unified `TorchTrainer` API with the `lightning` module instead of `pytorch_lightning`, the following error is produced:
...
File ".../lib/python3.11/site-packages/ray/train/_internal/worker_group.py", line 33, in __execute
raise skipped from exception_cause(skipped)
File ".../lib/python3.11/site-packages/ray/train/_internal/utils.py", line 129, in discard_return_wrapper
train_func(*args, **kwargs)
File ".../bug.py", line 44, in train_func
trainer = pl.Trainer(
^^^^^^^^^^^
File ".../lib/python3.11/site-packages/lightning/pytorch/utilities/argparse.py", line 70, in insert_env_defaults
return fn(self, **kwargs)
^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 399, in __init__
self._accelerator_connector = _AcceleratorConnector(
^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 140, in __init__
self._check_config_and_set_final_flags(
File ".../lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 210, in _check_config_and_set_final_flags
raise ValueError(
ValueError: You selected an invalid strategy name: `strategy=<ray.train.lightning._lightning_utils.RayDDPStrategy object at 0x7f7327186290>`. It must be either a string or an instance of `lightning.pytorch.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../bug.py", line 60, in <module>
result = trainer.fit()
...
The error complains that `RayDDPStrategy()` is not an instance of `Strategy`.
A simple test shows that this is most likely a namespace issue, caused by importing `pl.Trainer` from `lightning.pytorch` instead of from the `pytorch_lightning` module:
In [1]: import pytorch_lightning
In [2]: import lightning
In [3]: from ray.train.lightning import RayDDPStrategy
In [4]: isinstance(RayDDPStrategy(), pytorch_lightning.strategies.Strategy)
Out[4]: True
In [5]: isinstance(RayDDPStrategy(), lightning.pytorch.strategies.Strategy)
Out[5]: False
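The mechanism behind this can be shown with plain Python, independent of Lightning: `isinstance` compares against exact class objects, so two separately-created module objects holding textually identical class definitions still fail each other's checks. (The module and class names below are illustrative stand-ins, not real Lightning or Ray internals.)

```python
import types

# Simulate two separately-imported copies of the "same" library,
# as with pytorch_lightning vs. lightning.pytorch.
mod_a = types.ModuleType("pytorch_lightning_sim")
mod_b = types.ModuleType("lightning_pytorch_sim")

class_src = "class Strategy:\n    pass\n"
exec(class_src, mod_a.__dict__)
exec(class_src, mod_b.__dict__)

# A subclass built against one copy of the base class...
class RayDDPStrategySim(mod_a.Strategy):
    pass

# ...is an instance of that copy's Strategy, but not of the other's,
# even though both base classes came from identical source code.
print(isinstance(RayDDPStrategySim(), mod_a.Strategy))  # True
print(isinstance(RayDDPStrategySim(), mod_b.Strategy))  # False
```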
I feel this is a common trap for plenty of people to walk into, as the Lightning docs use `lightning.pytorch` rather than `pytorch_lightning`. Perhaps it should be made explicit in the Ray docs that `pytorch_lightning` should be used throughout one's own code (including when importing other Lightning classes, such as `DataModule`, or the same error will just happen elsewhere). Alternatively, Ray could use `lightning.pytorch`, but that would mean pulling in the whole `lightning` ecosystem as a dependency, which doesn't seem logical. I am uncertain whether there is a hacky Python way of getting this to work with both `pytorch_lightning` and `lightning.pytorch`.
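On the question of a hacky Python way: one candidate (purely a sketch, untested against Ray) is to alias one namespace to the other in `sys.modules` before anything else imports it, so both import paths resolve to the same module object. The helper below demonstrates the aliasing trick with a stdlib module standing in for the Lightning packages; whether registering `lightning.pytorch` under the name `pytorch_lightning` actually works end to end is an unverified assumption.

```python
import importlib
import sys

def alias_module(canonical: str, alias: str) -> None:
    """Register an importable module under a second name so that both
    import paths yield the same module object, making classes defined
    in it satisfy isinstance checks regardless of which name was used."""
    sys.modules[alias] = importlib.import_module(canonical)

# Demonstration with a stdlib module standing in for the Lightning packages:
alias_module("json", "json_alias")

import json
import json_alias  # resolves to the very same module object as `json`

print(json is json_alias)  # True
```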
Versions / Dependencies
ray 2.7.0rc0
pytorch-lightning 2.0.8
lightning 2.0.8
torch 2.0.1
cuda 11.7
python 3.11.5
CMGeldenhuys added the bug and triage labels on Sep 16, 2023. matthewdeng added the P1 and train labels and removed the triage label on Sep 20, 2023.
Reproduction script
Issue Severity
Low: It annoys or frustrates me.