For exists(), more precise error handling. #757

Open - wants to merge 12 commits into main

Conversation

@kasuteru commented Jul 20, 2023

This PR improves the error-handling logic of s3fs.S3FileSystem.exists() and fixes #750.

The first change removes the generic except Exception block, so that e.g. a ConnectionError now correctly raises instead of returning False when querying a bucket.

The second change logs a warning when querying the existence of a bucket that the user does not have access to; this is important since the bucket might actually exist. This is in line with Amazon's proposed behavior, see here.

For more information on this, see related issue #750.
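
To make the intended behavior concrete, here is a minimal sketch of the error handling described above, written against a plain botocore client. It is not the code in this PR; the function name, the use of head_bucket, and the mapping of error codes are illustrative assumptions.

import logging

from botocore.exceptions import ClientError

logger = logging.getLogger("s3fs")


def bucket_exists(s3_client, bucket):
    # Sketch only: map the HEAD response onto the three cases discussed in
    # this PR instead of swallowing everything with a bare ``except Exception``.
    # Anything not handled below (e.g. an EndpointConnectionError caused by a
    # bad endpoint_url) propagates to the caller instead of becoming False.
    try:
        s3_client.head_bucket(Bucket=bucket)
        return True
    except ClientError as e:
        code = e.response["Error"]["Code"]
        if code == "404":
            # We were allowed to check, and the bucket does not exist.
            return False
        if code == "403":
            # The bucket may exist, but we are not allowed to see it.
            logger.warning(
                "Bucket %s doesn't exist or you don't have access to it.", bucket
            )
            return False
        raise

In the PR itself the logic lives in S3FileSystem.exists() and, as the later comments show, works with the translated FileNotFoundError / PermissionError exceptions rather than with raw ClientError codes.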

@kasuteru (Author)

@martindurant this implements solution 2) mentioned in #750. I am happy to change it to another solution, as discussed.

@martindurant (Member)

I think 2) is the right choice.

@martindurant (Member)

Can we make a test which surfaces the ConnectionError? Perhaps just set the endpoint_url to something silly.
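
For illustration, such a test could look roughly like the sketch below; the endpoint URL, test name, and broad exception type are assumptions rather than the PR's actual test.

import pytest
import s3fs


def test_exists_surfaces_connection_error():
    # Point the filesystem at an endpoint that cannot be reached; with the
    # generic except gone, exists() should surface the connection problem
    # instead of returning False.
    fs = s3fs.S3FileSystem(
        client_kwargs={"endpoint_url": "http://localhost:1"},
        skip_instance_cache=True,  # avoid reusing a cached, working instance
    )
    with pytest.raises(Exception):  # placeholder for the concrete connection error
        fs.exists("s3://some-bucket/")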

@kasuteru (Author)

I will try to think of something, and also attempt to add tests for the other changes.

@kasuteru (Author)

I created test coverage for two of the three new cases:

  • A bucket that does not exist but that we have access to check (raises FileNotFoundError, which we catch).
  • A ConnectionError, as requested, standing in for all other errors where we cannot even check whether a bucket exists. This is what this PR is trying to fix (currently, False is returned instead).

However, I cannot figure out how to create good test coverage for the third path, the PermissionError that is raised when the endpoint prohibits listing buckets. I am unsure how to create an S3 file system locally that disallows listing the available buckets (which this test would require). The best I found is to trigger the PermissionError by providing wrong credentials:

import s3fs


def test_exists_bucket_nonexistent_or_no_access(caplog):
    # Ensure that a warning is logged and False is returned if querying a
    # bucket that might not exist or might be private.
    fs = s3fs.S3FileSystem(key="asdfasfsdf", secret="sadfdsfds")
    assert not fs.exists("s3://this-bucket-might-not-exist/")
    assert caplog.records[0].levelname == "WARNING"
    assert "doesn't exist or you don't have access" in caplog.text

If this is good enough, it is already in the PR. But I am happy to improve this one.

@martindurant (Member)

Yes, that test is fine - it is, of course, failing on permissions against real AWS in this case (moto does not enforce credentials, which is why you couldn't test this with moto).

Review thread on s3fs/tests/test_s3fs.py (outdated; resolved)

def test_exists_bucket_nonexistent(s3, caplog):
    # Ensure that NO warning is raised and False is returned if checking bucket
    # existence when we have full access.
@martindurant (Member)

Well, the bucket might still exist, but it's not one we can see. We happen to know for the sake of the test that it doesn't exist.

@kasuteru (Author)

Not sure whether this is true for all S3 systems, but at least for our internal systems I get a PermissionError when trying to access a bucket I cannot see (because I don't have permission to see it), but a FileNotFoundError, as in this test, when I would have permission to see it but it simply doesn't exist. This is why, with this PR's change, the warning is only raised for PermissionError and not for FileNotFoundError. Should I change it to log the warning in both cases?

Review thread on s3fs/tests/test_s3fs.py (outdated; resolved)
@kasuteru commented Jul 24, 2023

I finally managed to set up pre-commit; I had run into SSL errors before. So hopefully the PR should no longer fail because of that. I also added the requested changes. My tests pass locally, but I don't currently have the setup to test all Python versions.

kasuteru and others added 2 commits July 25, 2023 14:54
Co-authored-by: Martin Durant <martindurant@users.noreply.github.com>
@kasuteru (Author)

Merged the suggested change but removed the duplicate endpoint_url (it was raising a different error). In addition, the other test now clears the warnings cache as well - that was the reason for the test failure.

@kasuteru (Author)

Forgot to run pre-commit; this is now fixed... But I am honestly not sure where the warnings are coming from - I even installed Python 3.8 locally and ran the tests there, and they pass...

The problem is the "Unclosed client session" warning, but I am unable to reproduce how s3fs.exists() triggers it...


@martindurant (Member)

The following may remove the warnings:

--- a/s3fs/core.py
+++ b/s3fs/core.py
@@ -541,6 +541,12 @@ class S3FileSystem(AsyncFileSystem):
     @staticmethod
     def close_session(loop, s3):
         if loop is not None and loop.is_running():
+            try:
+                loop = asyncio.get_event_loop()
+                loop.create_task(s3.__aexit__(None, None, None))
+                return
+            except RuntimeError:
+                pass
             try:
                 sync(loop, s3.__aexit__, None, None, None, timeout=0.1)

@martindurant (Member)

Your failures look like genuine problems.

For the client-close warnings (which are not errors), #760 will fix this.

@kasuteru (Author)

If we don't manage to fix the warnings issue, I could also remove the warning from the PR entirely. While I think it would be beneficial to warn users, it would already be an improvement if we got rid of the generic except Exception clause. Let's see whether #760 changes things.

Successfully merging this pull request may close these issues: S3FileSystem.exists returns False instead of error if credentials are wrong.