[Datasets] Add clearer actionable error message for AWS S3 credential error #26619

Merged
ericl merged 2 commits into ray-project:master on Jul 18, 2022

Conversation

Contributor

@c21 c21 commented Jul 15, 2022

Why are these changes needed?

In #19799 and #24184, we found that when using Datasets to read an S3 file whose credentials are not set up correctly, the read_xxx API throws a confusing error message containing AWS Error [code 15]: No response body, as shown below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/chengsu/ray/python/ray/data/read_api.py", line 758, in read_binary_files
    return read_datasource(
  File "/Users/chengsu/ray/python/ray/data/read_api.py", line 267, in read_datasource
    requested_parallelism, min_safe_parallelism, read_tasks = ray.get(
  File "/Users/chengsu/ray/python/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/chengsu/ray/python/ray/_private/worker.py", line 2196, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(PermissionError): ray::_get_read_tasks() (pid=80200, ip=127.0.0.1)
  File "pyarrow/_fs.pyx", line 439, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
OSError: When getting information for key 'trainaasdasd' in bucket 'balajis-tiny-imagenet': AWS Error [code 15]: No response body.

The error message says nothing about file credentials, which makes it quite confusing. This PR catches the error and raises a clearer, actionable error message:

ray::_get_read_tasks() (pid=80200, ip=127.0.0.1)
  File "/Users/chengsu/ray/python/ray/data/read_api.py", line 1127, in _get_read_tasks
    reader = ds.create_reader(**kwargs)
  File "/Users/chengsu/ray/python/ray/data/datasource/file_based_datasource.py", line 212, in create_reader
    return _FileBasedDatasourceReader(self, **kwargs)
  File "/Users/chengsu/ray/python/ray/data/datasource/file_based_datasource.py", line 350, in __init__
    self._paths, self._file_sizes = meta_provider.expand_paths(
  File "/Users/chengsu/ray/python/ray/data/datasource/file_meta_provider.py", line 173, in expand_paths
    _handle_read_s3_files_error(e, path)
  File "/Users/chengsu/ray/python/ray/data/datasource/file_meta_provider.py", line 342, in _handle_read_s3_files_error
    raise PermissionError(
PermissionError: Failing to read AWS S3 file(s): "balajis-tiny-imagenet/trainaasdasd". Please check file exists and has proper AWS credential. See https://docs.ray.io/en/latest/data/creating-datasets.html#reading-from-remote-storage for more information.
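
A minimal sketch of the kind of handler this description implies, not the code merged in this PR. The function name and message text are taken from the traceback above. Matching on the substring `AWS Error [code 15]: No response body`, re-raising unrelated errors unchanged, and the `expand_path` / `get_file_info` wrapper are assumptions made for illustration.

```python
def _handle_read_s3_files_error(error, paths):
    """Re-raise an opaque S3 failure with an actionable message (sketch)."""
    # Assumption: the confusing failure is recognised by this substring.
    if "AWS Error [code 15]: No response body" in str(error):
        # Quote a single path for readability; a list already prints
        # as ['foo', 'boo'].
        if isinstance(paths, str):
            paths = f'"{paths}"'
        raise PermissionError(
            f"Failing to read AWS S3 file(s): {paths}. "
            "Please check file exists and has proper AWS credential. "
            "See https://docs.ray.io/en/latest/data/creating-datasets.html"
            "#reading-from-remote-storage for more information."
        )
    # Assumption: any other error is propagated unchanged.
    raise error


def expand_path(get_file_info, path):
    # get_file_info is a hypothetical stand-in for the filesystem call
    # that fails in the first traceback.
    try:
        return get_file_info(path)
    except OSError as e:
        _handle_read_s3_files_error(e, path)
```

Because the new exception is raised while the original `OSError` is still being handled, Python attaches the original as `__context__`, so the AWS error code remains visible in the printed traceback.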

Related issue number

Closes #19799 and #24184

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor Author

c21 commented Jul 15, 2022

cc @jianoaix and @clarkzinzow for review, and also cc @pcmoritz and @matthewdeng FYI.

# readability. List of file paths will be shown up as ['foo', 'boo'],
# so only quote single file path here.
paths = f'"{paths}"'
raise PermissionError(
Contributor Author

Intentionally not adding from None here, so that the original error message is kept. It may still be useful for debugging in the future if errors other than credential problems occur.
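
To illustrate the point about `from None`: in Python, raising a new exception inside an `except` block keeps the original attached as `__context__` and prints both tracebacks, whereas `raise ... from None` hides the original in the printed output. The following is a small standalone demonstration. All function names and the `HINT` text are hypothetical and not part of Ray.

```python
import traceback

HINT = "check file exists and has proper AWS credential"


def failing_read():
    # Stand-in for the underlying filesystem call.
    raise OSError("AWS Error [code 15]: No response body.")


def read_keep_original():
    try:
        failing_read()
    except OSError:
        # Implicit chaining: the OSError stays attached as __context__
        # and is printed ahead of the new error.
        raise PermissionError(HINT)


def read_hide_original():
    try:
        failing_read()
    except OSError:
        # "from None" suppresses the OSError in the printed traceback.
        raise PermissionError(HINT) from None


def format_failure(func):
    """Return the traceback text a user would see for func()."""
    try:
        func()
    except PermissionError as e:
        return "".join(traceback.format_exception(type(e), e, e.__traceback__))
```

Here `format_failure(read_keep_original)` contains the `No response body` text from the original error, while `format_failure(read_hide_original)` does not.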

Contributor

@clarkzinzow clarkzinzow left a comment

Nice!

python/ray/data/datasource/file_meta_provider.py (outdated, resolved)
python/ray/data/datasource/file_meta_provider.py (outdated, resolved)
python/ray/data/tests/test_dataset_formats.py (outdated, resolved)
@@ -2903,6 +2903,23 @@ def get_node_id():
    assert sorted(ds.take()) == ["goodbye", "hello", "world"]


def test_read_s3_file_error(ray_start_regular_shared, s3_path):
    dummy_path = s3_path + "_dummy"
Contributor

So this is not an error like "not found" ?

Contributor Author

File-not-found and missing-credential failures produce the same error. File-not-found is easier to unit test, so the test covers that case here.

Contributor

@clarkzinzow clarkzinzow left a comment

LGTM!

@pcmoritz
Contributor

Thanks for putting this together!

We should probably keep the error as an OSError, since it can be either FileNotFound or PermissionError. Once we can differentiate between those at a finer grain in the future, we can switch.

Let's also make sure the original error is preserved / the error code is not lost.

I realize that this situation is a little tough to deal with, but we should do our best. I also submitted an upstream issue https://issues.apache.org/jira/browse/ARROW-17079, so hopefully we can improve this going forward :)

In my experience, the [code 100] error happens if there is some misspecification in the parameters of the call, but there might be other reasons too :)

Comment on lines +172 to +173
except OSError as e:
    _handle_read_s3_files_error(e, path)
Contributor

nit: At first glance this is a bit confusing - would it make sense to rename this function to something more generic, and just add a comment within the function that only s3 handling is currently supported?

Contributor Author

@matthewdeng - sorry, I don't fully get the point. Do you have a suggested function name?

Contributor

Maybe _handle_read_os_error? The scenario I was thinking about was if we run into the same error when reading from GCS, what would make extending this logic the simplest?

Contributor Author

@matthewdeng - got it, that makes sense. Updated in #26669.

@ericl ericl merged commit 0bb819f into ray-project:master Jul 18, 2022
Contributor

ericl commented Jul 18, 2022

Let's refine this in a separate PR; merging to unblock.

Contributor Author

c21 commented Jul 18, 2022

We should probably keep the error as an OSError, since it can be either FileNotFound or PermissionError. Once we can differentiate between those at a finer grain in the future, we can switch.

@pcmoritz - makes sense, updated to OSError in #26669.

Let's also make sure the original error is preserved / the error code is not lost.

@pcmoritz - yes, the original error and error code (and the original stack trace) are not lost. They are still printed as part of the error, per #26619 (comment).

@c21 c21 deleted the s3 branch July 18, 2022 19:07
scv119 pushed a commit that referenced this pull request Jul 19, 2022
…error (#26669)

As a follow-up to #26619 (comment) and #26619 (comment), this changes the raised exception from PermissionError to OSError, to be consistent with the original error. It also renames the function from _handle_read_s3_files_error to _handle_read_os_error, which is more general and lets us handle other file systems such as GCS in the future.

It also changes the handler to match any error message with the pattern AWS Error [code xxx]: No response body, since a new issue with error code 100 was raised in #26672.
xwjiang2010 pushed a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022
… error (ray-project#26619)

Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
xwjiang2010 pushed a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022
…error (ray-project#26669)

Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
… error (ray-project#26619)

Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
…error (ray-project#26669)

Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Development

Successfully merging this pull request may close these issues.

[Datasets] [Bug] Access error when reading public data from S3 if no local AWS credentials are configured
6 participants