Closed
Description
I get the following error while training with PyTorch Lightning during an Optuna optimization:
```
...
  File "/path/to/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 524, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(uri)]
                                                  ^^^^^^^^^^^^^^^
  File "/path/to/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 1593, in listdir
    return self.ls(path, detail=detail, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/.venv/lib/python3.12/site-packages/fsspec/implementations/local.py", line 63, in ls
    infos = [self.info(f) for f in it]
             ^^^^^^^^^^^^
  File "/path/to/.venv/lib/python3.12/site-packages/fsspec/implementations/local.py", line 106, in info
    result["destination"] = os.readlink(path)
                            ^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/some/path/optuna.log.lock'
```
I think this is related to a race condition: the file might exist during the initial listing of files, but disappear before `os.readlink()` is called. This seems like an incorrect implementation of `ls()`. The `optuna.log.lock` file is a lock file for the `optuna.log` journal, which is a central storage for hyperparameter optimization trials shared by many processes.
fsspec version: 2024.6.1
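
To illustrate the suspected race, here is a minimal sketch of a directory listing that tolerates entries vanishing between the directory scan and the per-entry lookups. It uses only the standard library. `listdir_tolerant` and the shape of its result dicts are hypothetical names for illustration, not fsspec's API or its actual fix.

```python
import os


def listdir_tolerant(path):
    """List entries in `path`, skipping any that disappear mid-listing.

    Hypothetical helper: it mirrors the scan-then-inspect pattern from the
    traceback, but treats a vanished entry as "not there" instead of failing.
    """
    infos = []
    with os.scandir(path) as it:
        for entry in it:
            try:
                st = entry.stat(follow_symlinks=False)
                info = {"name": entry.path, "size": st.st_size}
                if entry.is_symlink():
                    # This is the call that raised in the traceback above.
                    info["destination"] = os.readlink(entry.path)
            except FileNotFoundError:
                # The entry was removed after scandir() yielded it
                # (e.g. a lock file released by another process).
                continue
            infos.append(info)
    return infos
```

Whether silently skipping such entries is the right behaviour for `ls()` is a design choice; the sketch only shows that the window between scanning and inspecting an entry can be handled without raising.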