KeyError: 'ETag' #477

Closed
machielg opened this issue May 11, 2021 · 10 comments · Fixed by #480 or #557

Comments

@machielg

What happened:
I get a KeyError when trying to copy files from bucket to local:

>       _, _, parts_suffix = info["ETag"].strip('"').partition("-")
E       KeyError: 'ETag'

What you expected to happen:
The files get copied to the local path.

Minimal Complete Verifiable Example:

from s3fs import S3FileSystem
s3 = S3FileSystem()
s3.copy(input_path, '/tmp/foobar', recursive=True)

Anything else we need to know?:
In the _info() function, when the ls result was cached, the response never includes the 'ETag' key, so I think it's a bug to rely on it always being there.

Environment:

  • s3fs version: 2021.4.0
  • Python version: 3.6.10
  • Operating System: Mac
  • Install method (conda, pip, source): pip
@isidentical
Member

> In the _info() function, when the ls result was cached, the response never includes the 'ETag' key, so I think it's a bug to rely on it always being there.

I think this is the actual issue. @machielg can you share the output of fs.info(input_path) too? As a fix we could just use info.get("ETag", ""), though I wonder why the ETag is missing in the first place.
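
A minimal sketch of the info.get("ETag", "") idea, assuming the info dict has the shape shown in this thread. The helper name etag_parts_suffix is hypothetical and not part of s3fs; it only wraps the parsing expression from the traceback so that a missing key gives an empty suffix instead of a KeyError.

```python
def etag_parts_suffix(info: dict) -> str:
    """Return the text after '-' in the ETag, or '' when there is none."""
    # Hypothetical helper, not s3fs API: a missing ETag is treated as "".
    etag = info.get("ETag", "")
    # Same expression as in the traceback, applied to the safe default.
    _, _, parts_suffix = etag.strip('"').partition("-")
    return parts_suffix


# An ETag with a "-N" suffix yields "N"; a missing ETag yields "".
print(repr(etag_parts_suffix({"ETag": '"abc123-4"'})))
print(repr(etag_parts_suffix({"type": "directory"})))
```

Note that this default only covers an absent key. An entry that carries ETag set to None would still fail on .strip(), so a stricter variant might use info.get("ETag") or "".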

@efiop
Member

efiop commented May 11, 2021

@machielg Are you using AWS S3 or some S3-compatible storage (e.g. MinIO)?

@machielg
Author

machielg commented May 11, 2021

> @machielg Are you using AWS S3 or some S3-compatible storage (e.g. MinIO)?

I'm using moto; this occurs in a unit test.

@machielg
Author

> > In the _info() function, when the ls result was cached, the response never includes the 'ETag' key, so I think it's a bug to rely on it always being there.
>
> I think this is the actual issue. @machielg can you share the output of fs.info(input_path) too? As a fix we could just use info.get("ETag", ""), though I wonder why the ETag is missing in the first place.

s3.info(input_path)
{'Key': 'source_bucket/processed/scoring_set/partitioning_date=2021-04-18', 'name': 'source_bucket/processed/scoring_set/partitioning_date=2021-04-18', 'type': 'directory', 'Size': 0, 'size': 0, 'StorageClass': 'DIRECTORY'}
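
The entry above can be checked for the key directly. This is a small sketch that uses the dict exactly as pasted; has_etag is a hypothetical name used only for illustration, not an s3fs function.

```python
# Entry exactly as pasted in the comment above (a directory placeholder).
info = {
    "Key": "source_bucket/processed/scoring_set/partitioning_date=2021-04-18",
    "name": "source_bucket/processed/scoring_set/partitioning_date=2021-04-18",
    "type": "directory",
    "Size": 0,
    "size": 0,
    "StorageClass": "DIRECTORY",
}


def has_etag(entry: dict) -> bool:
    """True only when the entry carries a non-empty ETag value."""
    return bool(entry.get("ETag"))


# The directory entry has no ETag, so info["ETag"] would raise KeyError.
print(info["type"], has_etag(info))
```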

@machielg
Author

I get the same error when I try it in a Python shell:

 s3.copy("s3://sagemaker-eu-west-1-123456789/cvc/cvc-mail/email/", "/tmp/foobar", recursive=True)

@martindurant
Member

It does seem reasonable not to depend on the existence of ETag, if some implementations do not have it.
I imagine that for moto, this is version-dependent, since we are not seeing the problem in CI.

From the description, I couldn't tell if the problem was with attempting to copy a folder (which is not an S3 thing) or one of the constituent files.

@isidentical
Member

> It does seem reasonable not to depend on the existence of ETag, if some implementations do not have it.

Makes sense. I will replace that part with something more reasonable.

@machielg
Author

> It does seem reasonable not to depend on the existence of ETag, if some implementations do not have it.
> I imagine that for moto, this is version-dependent, since we are not seeing the problem in CI.
>
> From the description, I couldn't tell if the problem was with attempting to copy a folder (which is not an S3 thing) or one of the constituent files.

I can't copy a file or a folder. I always get this ETag error.

@gwen-at-nielsen

I've encountered a similar issue: an object in S3 does not return an ETag when I try to open a Parquet file using pandas in Databricks. The stack trace ultimately points here.

Here's the full stack trace:

/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py in read_parquet(path, engine, columns, **kwargs)
    308 
    309     impl = get_engine(engine)
--> 310     return impl.read(path, columns=columns, **kwargs)
/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py in read(self, path, columns, **kwargs)
    119 
    120     def read(self, path, columns=None, **kwargs):
--> 121         path, _, _, should_close = get_filepath_or_buffer(path)
    122 
    123         kwargs["use_pandas_metadata"] = True
/databricks/python/lib/python3.7/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    183 
    184         return s3.get_filepath_or_buffer(
--> 185             filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
    186         )
    187 
/databricks/python/lib/python3.7/site-packages/pandas/io/s3.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
     46     mode: Optional[str] = None,
     47 ) -> Tuple[IO, Optional[str], Optional[str], bool]:
---> 48     file, _fs = get_file_and_filesystem(filepath_or_buffer, mode=mode)
     49     return file, None, compression, True
/databricks/python/lib/python3.7/site-packages/pandas/io/s3.py in get_file_and_filesystem(filepath_or_buffer, mode)
     27     fs = s3fs.S3FileSystem(anon=False)
     28     try:
---> 29         file = fs.open(_strip_schema(filepath_or_buffer), mode)
     30     except (FileNotFoundError, NoCredentialsError):
     31         # boto3 has troubles when trying to access a public file
/local_disk0/pythonVirtualEnvDirs/virtualEnv-9baeae3e-a195-4f34-9de0-cc931325fa4c/lib/python3.7/site-packages/fsspec/spec.py in open(self, path, mode, block_size, cache_options, **kwargs)
    946                 autocommit=ac,
    947                 cache_options=cache_options,
--> 948                 **kwargs,
    949             )
    950             if not ac and "r" not in mode:
/local_disk0/pythonVirtualEnvDirs/virtualEnv-9baeae3e-a195-4f34-9de0-cc931325fa4c/lib/python3.7/site-packages/s3fs/core.py in _open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, requester_pays, **kwargs)
    507             cache_type=cache_type,
    508             autocommit=autocommit,
--> 509             requester_pays=requester_pays,
    510         )
    511 
/local_disk0/pythonVirtualEnvDirs/virtualEnv-9baeae3e-a195-4f34-9de0-cc931325fa4c/lib/python3.7/site-packages/s3fs/core.py in __init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays)
   1754 
   1755         if "r" in mode:
-> 1756             self.req_kw["IfMatch"] = self.details["ETag"]
   1757 
   1758     def _call_s3(self, method, *kwarglist, **kwargs):
KeyError: 'ETag'

I'm not sure why the object in question returns no ETag. Any info or suggested workarounds here would be appreciated!

@martindurant
Member

The line should probably be skipped if the ETag is missing or empty. Contribution welcome :)
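
A sketch of that guard, assuming the surrounding code is the __init__ shown in the traceback above. build_read_kwargs is a hypothetical stand-alone function written for illustration; it is not the actual s3fs patch.

```python
def build_read_kwargs(details: dict, mode: str = "rb") -> dict:
    """Build request kwargs, adding IfMatch only when an ETag is present."""
    req_kw = {}
    if "r" in mode:
        # .get() returns None for a missing key; "" is also falsy,
        # so both the missing and the empty case skip the assignment.
        etag = details.get("ETag")
        if etag:
            req_kw["IfMatch"] = etag
    return req_kw


# With an ETag the condition is set; without one the dict stays empty.
print(build_read_kwargs({"ETag": '"abc123"'}))
print(build_read_kwargs({"type": "file", "size": 10}))
```

Skipping IfMatch means a read would no longer be tied to one specific object version, which seems an acceptable trade-off when the backend gives no ETag to compare against.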

5 participants