Update comment on s3.walk
Explains that we are not downloading all the keys in one
go and collating locally because that can become an unbounded
problem for large buckets.
timj committed Mar 4, 2021
1 parent 8fc5c78 commit 16e9568
Showing 1 changed file with 7 additions and 3 deletions.
python/lsst/daf/butler/core/_butlerUri/s3.py
@@ -263,9 +263,13 @@ def walk(self, file_filter: Optional[Union[str, re.Pattern]] = None) -> Iterator

  # Limit each query to a single "directory" to match os.walk
  # We could download all keys at once with no delimiter and work
- # it out locally but as yet I'm not sure what the trade off is
- # between doing a single listing of potentially millions of keys
- # or an incremental get per folder.
+ # it out locally but this could potentially lead to large memory
+ # usage for millions of keys. It will also make the initial call
+ # to this method potentially very slow. If making this method look
+ # like os.walk was not required, we could query all keys with
+ # pagination and return them in groups of 1000, but that would
+ # be a different interface since we can't guarantee we would get
+ # them all grouped properly across the 1000 limit boundary.
  prefix_len = len(self.relativeToPathRoot)
  dirnames = []
  filenames = []
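For illustration only: the sketch below shows the per-"directory" listing strategy the updated comment describes, where each S3 request is limited to one prefix with "/" as the delimiter so memory use is bounded by the size of a single "directory" rather than the whole bucket. This is not the actual ButlerURI.walk implementation; it assumes boto3 is available, and the walk_s3 helper, bucket, and prefix names are hypothetical.

import boto3


def walk_s3(bucket: str, prefix: str = ""):
    """Yield (dirpath, dirnames, filenames), loosely mimicking os.walk."""
    client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")

    pending = [prefix]
    while pending:
        current = pending.pop()
        dirnames = []
        filenames = []
        # Delimiter="/" makes S3 group keys at the next "directory" level,
        # so each response covers only the contents of this one prefix.
        for page in paginator.paginate(Bucket=bucket, Prefix=current, Delimiter="/"):
            for common in page.get("CommonPrefixes", []):
                dirnames.append(common["Prefix"][len(current):].rstrip("/"))
            for obj in page.get("Contents", []):
                if obj["Key"] != current:  # skip a zero-byte "directory" marker key
                    filenames.append(obj["Key"][len(current):])
        yield current, dirnames, filenames
        # Descend into each sub-"directory" with its own bounded listing.
        pending.extend(current + d + "/" for d in dirnames)


# Hypothetical usage:
#     for root, dirs, files in walk_s3("my-bucket", "data/"):
#         ...

The alternative mentioned in the comment, a single list_objects_v2 pagination over the whole bucket returning keys in pages of up to 1000, avoids the per-directory round trips but cannot guarantee that keys belonging to one "directory" arrive in the same page, which is why it would require a different interface from os.walk.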
