DM-27355: Support Google Cloud Storage #9
Conversation
Codecov Report

```diff
@@            Coverage Diff            @@
##             main       #9      +/-  ##
==========================================
- Coverage   91.34%   86.48%   -4.87%
==========================================
  Files          20       22       +2
  Lines        2485     2685     +200
  Branches      349      397      +48
==========================================
+ Hits         2270     2322      +52
- Misses        145      290     +145
- Partials       70       73       +3
==========================================
```

Continue to review full report at Codecov.
Force-pushed from 5f5279c to 83f8f72.
Kind of a shame so much code needs to be repeated from s3.py.
python/lsst/resources/gs.py (Outdated)
```python
if self.dirLike:
    return 0
# The first time this is called we need to sync from the remote.
# Should we track when this needs to be called again?
```
How frequently is it called? I would hope not very often for the same blob.
Probably not very often for the same blob, but it depends on the user. In butler we may call `.size` a couple of times in places.
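The sync-once pattern being discussed can be sketched as follows. `FakeBlob` and `Resource` are hypothetical stand-ins (not the PR's actual classes) used only to show how the first `size()` call triggers a metadata fetch and later calls reuse the cached value:

```python
class FakeBlob:
    """Stand-in for a remote blob; reload() simulates a metadata fetch."""

    def __init__(self, remote_size):
        self._remote_size = remote_size
        self.size = None  # unknown until synced from the remote

    def reload(self):
        self.size = self._remote_size


class Resource:
    """Caches the blob size after the first remote sync."""

    def __init__(self, blob, dir_like=False):
        self.blob = blob
        self.dirLike = dir_like

    def size(self):
        if self.dirLike:
            return 0
        if self.blob.size is None:
            # The first time this is called we need to sync from the remote.
            self.blob.reload()
        return self.blob.size
```

With this shape, repeated `size()` calls hit the remote only once per blob; the open question above is whether the cache ever needs invalidating.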
```python
# The root must already exist.
return
```

```python
# Should this method do anything at all?
```
This is creating a `/`-terminated empty blob? That could potentially cause issues later on when listing objects with prefixes and delimiters (as you've found below, you have to filter) or if exposing the bucket as a web site.
But I guess you already do this via the S3 interface.
Yes. S3 does it and this matches that behavior. I'm not really sure what the best approach is in an object store. Make `mkdir()` a no-op and then always say a "directory" exists because in theory it could?
Really depends on what semantics you require.
As a side note, git doesn't allow empty directories either.
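The no-op `mkdir()` alternative floated above can be sketched with implicit-directory semantics: a "directory" exists if any object key lives under its prefix, and listing filters out any `/`-terminated marker blob. These helper names are hypothetical, operating on a plain list of key strings rather than a real bucket:

```python
def dir_exists(keys, prefix):
    """Implicit-directory check: a 'directory' exists if any object
    key lives under its '/'-terminated prefix."""
    return any(k.startswith(prefix) for k in keys)


def list_dir(keys, prefix):
    """List entries under a pseudo-directory, filtering out the
    '/'-terminated marker blob itself (the filtering mentioned above)."""
    return [k for k in keys if k.startswith(prefix) and k != prefix]
```

Under these semantics `mkdir()` has nothing to do, at the cost of empty "directories" never existing, which is the trade-off the thread is weighing.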
```python
rewrite_token = None
while True:
    try:
        rewrite_token, bytes_copied, total_bytes = self.blob.rewrite(
```
`Bucket.copy_blob()` might be simpler?
But won't give the progress reporting that `blob.rewrite()` has?
Oh, do I have to be consistent? :)
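For reference, the token-driven loop under discussion has this shape: `Blob.rewrite()` returns a continuation token until the copy completes, which is what enables progress reporting. `FakeRewriteBlob` below fakes that contract so the loop runs without a real bucket; the names are illustrative, not the PR's code:

```python
class FakeRewriteBlob:
    """Mimics the rewrite() token protocol: returns (token, copied, total),
    with token None once the copy is complete."""

    def __init__(self, total_bytes, chunk=100):
        self.total = total_bytes
        self.chunk = chunk
        self.copied = 0

    def rewrite(self, source, token=None):
        self.copied = min(self.copied + self.chunk, self.total)
        next_token = None if self.copied >= self.total else "token"
        return next_token, self.copied, self.total


def copy_with_progress(dest, source, report=print):
    """Loop until rewrite() stops returning a continuation token,
    reporting progress after each chunk."""
    rewrite_token = None
    while True:
        rewrite_token, bytes_copied, total_bytes = dest.rewrite(
            source, token=rewrite_token
        )
        report(f"{bytes_copied}/{total_bytes} bytes copied")
        if rewrite_token is None:
            return bytes_copied
```

A single-shot `copy_blob()` would collapse this to one call but give up the per-chunk progress callbacks, which is the trade-off raised above.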
Force-pushed from a8c013b to 197f81a.
```python
try:
    import google.cloud.storage as storage
    from google.cloud.exceptions import NotFound
except ImportError:
    storage = None
```
I guess due to auth considerations it isn't possible to fall back to `s3://` in this case?
For the IDF the notebook setup would handle that because we have both credentials set up, although notebook environments already have google-cloud-storage installed. It would require some changes to the `__new__` method to understand that a failure to import should change the scheme to s3, but it seems like in the general case that would be more confusing than telling people to install google-cloud-storage.
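The optional-dependency pattern in the hunk above can be generalized as follows. `load_optional` and `require_gs` are hypothetical helpers, not part of the PR; they show the "import once, fail loudly on use" alternative to silently switching schemes:

```python
import importlib


def load_optional(module_name):
    """Return the named module if importable, else None
    (the optional-dependency pattern used for google-cloud-storage)."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Attempted once at import time; callers check for None before use.
storage = load_optional("google.cloud.storage")


def require_gs():
    """Fail with an actionable message rather than falling back to s3://."""
    if storage is None:
        raise ImportError(
            "gs:// URIs require google-cloud-storage; "
            "install it with 'pip install google-cloud-storage'"
        )
    return storage
```

Raising at use time keeps `__new__` simple and makes the missing dependency explicit to the user, per the reasoning above.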
Force-pushed from a141040 to 5d0c68e.
Using black makes them irrelevant.
No test code at the moment.
Checklist
doc/changes