Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bad interaction with zarr and missing chunks #255

Closed
alimanfoo opened this issue Mar 24, 2020 · 2 comments · Fixed by #259
Closed

Potential bad interaction with zarr and missing chunks #255

alimanfoo opened this issue Mar 24, 2020 · 2 comments · Fixed by #259

Comments

@alimanfoo
Copy link

We've encountered a situation using zarr on GCS via gcsfs where an attempt to retrieve a chunk object fails for some transient reason, but the fsspec gcsfs mapping raises a KeyError, which zarr interprets as meaning the object doesn't exist. Zarr's default behaviour in that case is to fill the chunk entirely with the fill value set for the array.

What this means is that any transient errors in GCS can result in chunks of an array being returned as filled with the fill value, rather than the actual data, without any error being raised. This is obviously a serious issue.

Currently the fsspec FSMap class catches any exception within __getitem__ and reraises as a KeyError:

https://github.com/intake/filesystem_spec/blob/cc7cbc0cc58c96c427708c1e9a98b0d6c99bd1f0/fsspec/mapping.py#L77

I think this should be modified in some way, such that FSMap is able to distinguish between a genuine "this file/object does not exist" error and any other kind of error. In the case where a "this file/object does not exist" error is encountered, it should be safe to reraise that from __getitem__ as a KeyError, but any other type of error I suggest would be safer to raise either by propagating the underlying error as-is, or by reraising as a RuntimeError. This would mean that any transient errors would get propagated all the way up through zarr, preventing any silent replacement of data with fill values.

For the time being we have worked around this with a wrapper class that transforms all KeyError into RuntimeError as described here. Note that this only works for the case where we know ahead of time that all chunks of an array should exist, and therefore we never expect to experience a KeyError for any reason.

@martindurant
Copy link
Member

The appropriate exception to catch would be FileNotFoundError (i.e., we could read the directory listing, but the file is not there)

--- a/fsspec/mapping.py
+++ b/fsspec/mapping.py
@@ -74,7 +74,7 @@ class FSMap(MutableMapping):
         key = self._key_to_str(key)
         try:
             result = self.fs.cat(key)
-        except:  # noqa: E722
+        except FileNotFoundError:
             if default is not None:
                 return default

I wonder where other possible exceptions such as permissions errors (i.e., you cannot access said file) should fall. Your particular case is sort of in-between.

Indeed, it would be good to have more logging here, since I suppose a better retry policy would have solved it, but the general complaint still stands.

@alimanfoo
Copy link
Author

Thanks @martindurant, I think just mapping FileNotFoundError to KeyError is appropriate, I would let all other exceptions propagate, including any permissions errors. Zarr really relies on the semantics of KeyError meaning that the file/object does not exist. If you mapped a permission error to KeyError then any permissions issues would silently turn into chunks getting filled in with the fill value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants