Update comment on s3.walk
Explains that we are not downloading all the keys in one
go and collating locally because that can become an unbounded
problem for large buckets.
timj committed Mar 4, 2021
1 parent 8fc5c78 commit 16e9568
Showing 1 changed file with 7 additions and 3 deletions.
python/lsst/daf/butler/core/_butlerUri/s3.py
@@ -263,9 +263,13 @@ def walk(self, file_filter: Optional[Union[str, re.Pattern]] = None) -> Iterator

  # Limit each query to a single "directory" to match os.walk
  # We could download all keys at once with no delimiter and work
- # it out locally but as yet I'm not sure what the trade off is
- # between doing a single listing of potentially millions of keys
- # or an incremental get per folder.
+ # it out locally but this could potentially lead to large memory
+ # usage for millions of keys. It will also make the initial call
+ # to this method potentially very slow. If making this method look
+ # like os.walk was not required, we could query all keys with
+ # pagination and return them in groups of 1000, but that would
+ # be a different interface since we can't guarantee we would get
+ # them all grouped properly across the 1000 limit boundary.
  prefix_len = len(self.relativeToPathRoot)
  dirnames = []
  filenames = []
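For illustration only: the sketch below shows the per-"directory" listing strategy the updated comment describes, where each S3 request is limited to one prefix with "/" as the delimiter so memory use is bounded by the size of a single "directory" rather than the whole bucket. This is not the actual ButlerURI.walk implementation; it assumes boto3 is available, and the walk_s3 helper, bucket, and prefix names are hypothetical.

import boto3


def walk_s3(bucket: str, prefix: str = ""):
    """Yield (dirpath, dirnames, filenames), loosely mimicking os.walk."""
    client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")

    pending = [prefix]
    while pending:
        current = pending.pop()
        dirnames = []
        filenames = []
        # Delimiter="/" makes S3 group keys at the next "directory" level,
        # so each response covers only the contents of this one prefix.
        for page in paginator.paginate(Bucket=bucket, Prefix=current, Delimiter="/"):
            for common in page.get("CommonPrefixes", []):
                dirnames.append(common["Prefix"][len(current):].rstrip("/"))
            for obj in page.get("Contents", []):
                if obj["Key"] != current:  # skip a zero-byte "directory" marker key
                    filenames.append(obj["Key"][len(current):])
        yield current, dirnames, filenames
        # Descend into each sub-"directory" with its own bounded listing.
        pending.extend(current + d + "/" for d in dirnames)


# Hypothetical usage:
#     for root, dirs, files in walk_s3("my-bucket", "data/"):
#         ...

The alternative mentioned in the comment, a single list_objects_v2 pagination over the whole bucket returning keys in pages of up to 1000, avoids the per-directory round trips but cannot guarantee that keys belonging to one "directory" arrive in the same page, which is why it would require a different interface from os.walk.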
