DM-52947: Add a ResourcePath.get_info() method#140
Conversation
|
@airnandez this is a first attempt at adding a get_info method. I need to test it on a real S3 but it uses your existing info() implementation for WebDAV. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #140 +/- ##
==========================================
- Coverage 83.63% 83.62% -0.01%
==========================================
Files 36 36
Lines 7601 7958 +357
Branches 916 956 +40
==========================================
+ Hits 6357 6655 +298
- Misses 976 1007 +31
- Partials 268 296 +28 ☔ View full report in Codecov by Sentry. |
This is a standardized way to get the file size, last modified time, and any checksums known to the backend.
Somehow pre-commit finds them but the numpydoc github action does not.
dhirving
left a comment
There was a problem hiding this comment.
I didn't review the GS stuff, but the rest looks good to me.
| resp = self._propfind() | ||
| if resp.status_code != requests.codes.multi_status: | ||
| raise FileNotFoundError( | ||
| f"Resource {self} does not exist, status: {resp.status_code} {resp.reason}" | ||
| ) | ||
|
|
||
| prop = _parse_propfind_response_body(resp.text)[0] | ||
| if not prop.exists: | ||
| raise FileNotFoundError(f"Resource {self} does not exist") | ||
|
|
||
| return ResourceInfo( | ||
| uri=str(self), | ||
| is_file=prop.is_file, | ||
| size=prop.size, | ||
| last_modified=prop.last_modified, | ||
| creation_time=None, | ||
| checksums=dict(prop.checksums), | ||
| ) |
There was a problem hiding this comment.
I think we can probably leave out the WebDAV special case -- the regular HTTP one will still work against WebDAV, and it will be one less thing to back out when we go through and clean all the WebDAV stuff out of this file.
There was a problem hiding this comment.
Removing this code path does lead to a semantic change regarding directories. One of the tests is checking that you can't ask for the size of a directory but now that get_info does work for a directory. I think I should fix the test to expect 0 instead, consistent with what file URI does and non-webdav
There was a problem hiding this comment.
Actually, I'm getting some cascading errors from the test system for the change of behavior so I think I might leave the code in for now if I cant fix them relatively quickly.
| - name: Start fake-gcs-server | ||
| run: | | ||
| docker run -d --name fake-gcs-server -p 4443:4443 \ | ||
| fsouza/fake-gcs-server -scheme http -filesystem-root /tmp/fake-gcs-server | ||
|
|
||
| for i in $(seq 1 30); do | ||
| if curl --silent --fail http://127.0.0.1:4443/storage/v1/b >/dev/null; then | ||
| exit 0 | ||
| fi | ||
| sleep 1 | ||
| done | ||
|
|
||
| docker logs fake-gcs-server | ||
| exit 1 | ||
|
|
There was a problem hiding this comment.
If you do end up keeping this, the testcontainers library is a nice way to embed the docker setup logic in unit tests.
Example: https://github.com/lsst/daf_butler/blob/main/tests_integration/test_docker_container.py
It only exists on macOS APFS and whilst that's great for macOS users it doesn't help at all for our main systems.
In theory non-zero size S3 objects with trailing / can exist even though ResourcePath does not make them. get_info now handles that case and returns is_file=True if the file object has non-zero size.
This is a standardized way to get the file size, last modified time, and any checksums known to the backend.
Checklist
doc/changes