-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[train/tune] Use posix paths throughout library code #38319
Conversation
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, have a few comments:
The main place where paths are stored now is StorageContext
. I've actually had a TODO about how this handles windows and relative paths, so would be a good time to fix it up.
StorageContext takes in the exact user storage_path
that's passed, as well as an optional storage_filesystem
. Here's what I was imagining the path handling could look like:
class StorageContext:
def __init__(self, storage_path: Tuple[str, os.PathLike], storage_filesystem: Optional[pyarrow.fs.FileSystem]):
if storage_filesystem provided:
# just convert storage_path to posix path (don't convert to an absolute path)
# this is because the path should stay relative to the custom_fs
elif local path (scheme == file or no scheme):
# expand and convert to *absolute* posix path
else: # do nothing for URIs, which will get handled by pyarrow.
assert is_uri(storage_path)
Then, once we've converted it once in the constructor, we can keep using os.path.join
using these paths elsewhere.
I wanted to avoid checking is_local_path
in this class, but it seems like it's necessary to support relative / windows paths, since pyarrow will raise an error from the input to pyarrow.fs.from_uri
.
if ( | ||
self._legacy_local_experiment_path | ||
and self._legacy_remote_experiment_path | ||
and Path(self._legacy_local_experiment_path) | ||
== Path(self._legacy_remote_experiment_path) | ||
): | ||
warnings.warn( | ||
"The local experiment path is the same as the remote " | ||
"experiment path. Set a different `storage_path` or raise an " | ||
"issue on GitHub if this issue persists. Deactivating the" | ||
"remote experiment path." | ||
) | ||
self._legacy_remote_experiment_path = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Windows, we resolved the default storage path to C:/Users/username/ray_results
. When passing the path on, it was interpreted as a remote path in downstream objects (because we used urlparse to check for local/non-local paths and that doesn't work with windows paths).
So we end up in a situation where the remote path is C:/Users/username/ray_results
, and the local path gets resolved again to the same path, C:/Users/username/ray_results
. We write to the path locally and the syncer tries to sync up to that path. This results in various different error messages.
@@ -1153,14 +1153,14 @@ def _create_logger( | |||
def _open_logfiles(self, stdout_file, stderr_file): | |||
"""Create loggers. Open stdout and stderr logfiles.""" | |||
if stdout_file: | |||
stdout_path = os.path.expanduser(os.path.join(self._logdir, stdout_file)) | |||
stdout_path = (Path(self._logdir) / stdout_file).expanduser().as_posix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can just add this Path(...).expanduser().as_posix()
logic once at the user entrypoint, then just assume posix paths everywhere else in our own code, rather than need to patch it everywhere internally.
Ex:
RunConfig(storage_path)
Tuner.restore(...), Trainer.restore(...)
RAY_AIR_LOCAL_CACHE_DIR
ExperimentAnalysis(path)
In the case of self._logdir
, it should already be expanded/as_posix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem comes up when Path
is used again on windows, which will convert a posix path into a WindowsPath
. If we only want to have as_posix()
in one place, we can't use pathlib downstream.
The crux is this line:
the
I'm also not convinced that So yes, let's use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Still need to update StorageContext
for the new codepath, but that can be in the follow-up with more general relative/windows path-handling support.
We're currently using `os.path` in many places, but sometimes mix it with default unix paths, such as `~/ray_results`. This can lead to conflicts in Windows. For instance, `os.path.expanduser("~/ray_results")` will result in `C:\\Users\\username/ray_results` on windows systems. While this works in practice (python's stdlib can handle these mixed paths), it is confusing in the output. Thus, we try to replace most occurrences with a posix-compatible path. Second, and central to this PR, our detection of local/remote paths did not work on Windows. `urllib.parse.urlparse` detects a local windows path such as `C:\\Users\\username` as scheme `C`, so Tune thinks it's a URI. This effectively lead to a situation where windows users experienced that a remote path was detected, but it was in fact a local path. Ray Tune tried to "sync up" local results to the same local folder. This lead to errors in the syncer code. We thus update the `_is_local_windows_path` utility and use it in `_split_remote_local_path` to detect local windows directories. Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: NripeshN <nn2012@hw.ac.uk>
We're currently using `os.path` in many places, but sometimes mix it with default unix paths, such as `~/ray_results`. This can lead to conflicts in Windows. For instance, `os.path.expanduser("~/ray_results")` will result in `C:\\Users\\username/ray_results` on windows systems. While this works in practice (python's stdlib can handle these mixed paths), it is confusing in the output. Thus, we try to replace most occurrences with a posix-compatible path. Second, and central to this PR, our detection of local/remote paths did not work on Windows. `urllib.parse.urlparse` detects a local windows path such as `C:\\Users\\username` as scheme `C`, so Tune thinks it's a URI. This effectively lead to a situation where windows users experienced that a remote path was detected, but it was in fact a local path. Ray Tune tried to "sync up" local results to the same local folder. This lead to errors in the syncer code. We thus update the `_is_local_windows_path` utility and use it in `_split_remote_local_path` to detect local windows directories. Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: harborn <gangsheng.wu@intel.com>
We're currently using `os.path` in many places, but sometimes mix it with default unix paths, such as `~/ray_results`. This can lead to conflicts in Windows. For instance, `os.path.expanduser("~/ray_results")` will result in `C:\\Users\\username/ray_results` on windows systems. While this works in practice (python's stdlib can handle these mixed paths), it is confusing in the output. Thus, we try to replace most occurrences with a posix-compatible path. Second, and central to this PR, our detection of local/remote paths did not work on Windows. `urllib.parse.urlparse` detects a local windows path such as `C:\\Users\\username` as scheme `C`, so Tune thinks it's a URI. This effectively lead to a situation where windows users experienced that a remote path was detected, but it was in fact a local path. Ray Tune tried to "sync up" local results to the same local folder. This lead to errors in the syncer code. We thus update the `_is_local_windows_path` utility and use it in `_split_remote_local_path` to detect local windows directories. Signed-off-by: Kai Fricke <kai@anyscale.com>
We're currently using `os.path` in many places, but sometimes mix it with default unix paths, such as `~/ray_results`. This can lead to conflicts in Windows. For instance, `os.path.expanduser("~/ray_results")` will result in `C:\\Users\\username/ray_results` on windows systems. While this works in practice (python's stdlib can handle these mixed paths), it is confusing in the output. Thus, we try to replace most occurrences with a posix-compatible path. Second, and central to this PR, our detection of local/remote paths did not work on Windows. `urllib.parse.urlparse` detects a local windows path such as `C:\\Users\\username` as scheme `C`, so Tune thinks it's a URI. This effectively lead to a situation where windows users experienced that a remote path was detected, but it was in fact a local path. Ray Tune tried to "sync up" local results to the same local folder. This lead to errors in the syncer code. We thus update the `_is_local_windows_path` utility and use it in `_split_remote_local_path` to detect local windows directories. Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
We're currently using `os.path` in many places, but sometimes mix it with default unix paths, such as `~/ray_results`. This can lead to conflicts in Windows. For instance, `os.path.expanduser("~/ray_results")` will result in `C:\\Users\\username/ray_results` on windows systems. While this works in practice (python's stdlib can handle these mixed paths), it is confusing in the output. Thus, we try to replace most occurrences with a posix-compatible path. Second, and central to this PR, our detection of local/remote paths did not work on Windows. `urllib.parse.urlparse` detects a local windows path such as `C:\\Users\\username` as scheme `C`, so Tune thinks it's a URI. This effectively lead to a situation where windows users experienced that a remote path was detected, but it was in fact a local path. Ray Tune tried to "sync up" local results to the same local folder. This lead to errors in the syncer code. We thus update the `_is_local_windows_path` utility and use it in `_split_remote_local_path` to detect local windows directories. Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
We're currently using
os.path
in many places, but sometimes mix it with default unix paths, such as~/ray_results
. This can lead to conflicts in Windows.For instance,
os.path.expanduser("~/ray_results")
will result inC:\\Users\\username/ray_results
on windows systems.While this works in practice (python's stdlib can handle these mixed paths), it is confusing in the output. Thus, we try to replace most occurrences with a posix-compatible path.
Second, and central to this PR, our detection of local/remote paths did not work on Windows.
urllib.parse.urlparse
detects a local windows path such asC:\\Users\\username
as schemeC
, so Tune thinks it's a URI.This effectively lead to a situation where windows users experienced that a remote path was detected, but it was in fact a local path. Ray Tune tried to "sync up" local results to the same local folder. This lead to errors in the syncer code.
We thus update the
_is_local_windows_path
utility and use it in_split_remote_local_path
to detect local windows directories.Related issue number
Closes #36448
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.