Description
The dvclive.lightning.DVCLiveLogger has initialized the Live object in its __init__ method since this change. This causes a "cannot pickle '_io.BufferedReader'" error when trying to use this logger with the new LightningTrainer of Ray Train.
It works with a custom Live logger for PyTorch Lightning that I implemented a few months ago, which was still based on dvclive 1.3 at the time.
Find below a stack trace of the error when adding logger=DVCLiveLogger() to LightningConfigBuilder.trainer(). This happens because Ray needs to transfer the config to the distributed compute node that executes the Ray Actor running the PyTorch Lightning trainer. An alternative might be for Ray's LightningTrainer to allow more customization of the worker training loop, perhaps including support for callables that set up loggers.
However, in this instance, it seems it would be easier for DVCLive not to initialize the Live object this early.
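For context, here is a minimal sketch of the kind of setup that produces the trace below. MyLightningModule and the ScalingConfig values are placeholders, not taken from the original script:

    from ray.air.config import ScalingConfig
    from ray.train.lightning import LightningConfigBuilder, LightningTrainer
    from dvclive.lightning import DVCLiveLogger

    config = (
        LightningConfigBuilder()
        .module(cls=MyLightningModule)  # placeholder LightningModule
        # Live (and its open file handles) is created right here, in __init__:
        .trainer(max_epochs=1, logger=DVCLiveLogger())
        .build()
    )

    trainer = LightningTrainer(
        lightning_config=config,
        scaling_config=ScalingConfig(num_workers=2),
    )
    # Ray pickles the config to ship it to the worker, which fails:
    result = trainer.fit()  # TypeError: cannot pickle '_io.BufferedReader' object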
# 2023-06-07 16:27:37,549 ERROR ray_train.py:164 -- Traceback (most recent call last):
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/worker.py", line 676, in put_object
# serialized_value = self.get_serialization_context().serialize(value)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/serialization.py", line 466, in serialize
# return self._serialize_to_msgpack(value)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/serialization.py", line 444, in _serialize_to_msgpack
# pickle5_serialized_object = self._serialize_to_pickle5(
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/serialization.py", line 406, in _serialize_to_pickle5
# raise e
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/serialization.py", line 401, in _serialize_to_pickle5
# inband = pickle.dumps(
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 88, in dumps
# cp.dump(obj)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 733, in dump
# return Pickler.dump(self, obj)
# TypeError: cannot pickle '_io.BufferedReader' object
#
# The above exception was the direct cause of the following exception:
#
# Traceback (most recent call last):
# File "/data/aschuh/hf-research-ray-trainer/projects/multiseries/trials/pairwise_selected_target_series/tools/ray_train.py", line 153, in func
# train(
# File "/data/aschuh/hf-research-ray-trainer/projects/multiseries/trials/pairwise_selected_target_series/tools/ray_train.py", line 225, in train
# result = trainer.fit()
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/train/base_trainer.py", line 570, in fit
# trainable = self.as_trainable()
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/train/base_trainer.py", line 800, in as_trainable
# return tune.with_parameters(trainable_cls, **base_config)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 382, in with_parameters
# parameter_registry.put(prefix + k, v)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/tune/registry.py", line 296, in put
# self.flush()
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/tune/registry.py", line 308, in flush
# self.references[k] = ray.put(v)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
# return func(*args, **kwargs)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/worker.py", line 2593, in put
# object_ref = worker.put_object(value, owner_address=serialize_owner_address)
# File "/opt/conda/envs/mult-ray-train/lib/python3.10/site-packages/ray/_private/worker.py", line 685, in put_object
# raise TypeError(msg) from e
# [...]
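To illustrate the suggestion above, here is a sketch of a logger that defers Live creation until first use, so the object stays picklable when Ray serializes the config. This is a hypothetical illustration of the idea, not DVCLive's actual implementation; the class name and method stubs are mine:

    from dvclive import Live
    from pytorch_lightning.loggers.logger import Logger

    class LazyDVCLiveLogger(Logger):
        """Hypothetical sketch: create Live lazily instead of in __init__."""

        def __init__(self, **live_kwargs):
            super().__init__()
            self._live_kwargs = live_kwargs
            self._live = None  # no open file handles yet, so pickling works

        @property
        def experiment(self):
            # Live is created on first access, i.e. on the worker, after the
            # logger has already been serialized and shipped by Ray.
            if self._live is None:
                self._live = Live(**self._live_kwargs)
            return self._live

        @property
        def name(self):
            return "dvclive"

        @property
        def version(self):
            return 0

        def log_hyperparams(self, params):
            self.experiment.log_params(dict(params))

        def log_metrics(self, metrics, step=None):
            if step is not None:
                self.experiment.step = step
            for name, value in metrics.items():
                self.experiment.log_metric(name, value)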