Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-27355: Support Google Cloud Storage #9

Merged
merged 6 commits into from Mar 18, 2022
Merged

DM-27355: Support Google Cloud Storage #9

merged 6 commits into from Mar 18, 2022

Conversation

timj
Copy link
Member

@timj timj commented Mar 15, 2022

No test code at the moment.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Mar 15, 2022

Codecov Report

Merging #9 (9abf898) into main (9fc64b4) will decrease coverage by 4.86%.
The diff coverage is 26.00%.

@@            Coverage Diff             @@
##             main       #9      +/-   ##
==========================================
- Coverage   91.34%   86.48%   -4.87%     
==========================================
  Files          20       22       +2     
  Lines        2485     2685     +200     
  Branches      349      397      +48     
==========================================
+ Hits         2270     2322      +52     
- Misses        145      290     +145     
- Partials       70       73       +3     
Impacted Files Coverage Δ
python/lsst/resources/gs.py 22.16% <22.16%> (ø)
tests/test_gs.py 66.66% <66.66%> (ø)
python/lsst/resources/_resourcePath.py 94.41% <100.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9fc64b4...9abf898. Read the comment docs.

@timj timj force-pushed the tickets/DM-27355 branch 6 times, most recently from 5f5279c to 83f8f72 Compare March 15, 2022 20:24
Copy link
Contributor

@ktlim ktlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a shame so much code needs to be repeated from s3.py.

python/lsst/resources/gs.py Outdated Show resolved Hide resolved
if self.dirLike:
return 0
# The first time this is called we need to sync from the remote.
# Should we track when this needs to be called again?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How frequently is it called? I would hope not very often for the same blob.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not very often for the same blob but it depends on the user. In butler we may call .size a couple of times in places.

.github/workflows/build.yaml Show resolved Hide resolved
python/lsst/resources/gs.py Outdated Show resolved Hide resolved
python/lsst/resources/gs.py Outdated Show resolved Hide resolved
# The root must already exist.
return

# Should this method do anything at all?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is creating a /-terminated empty blob? That could potentially cause issues later on when listing objects with prefixes and delimiters (as you've found below, you have to filter) or if exposing the bucket as a Web site.

But I guess you already do this via the S3 interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. S3 does it and this matches that behavior. I'm not really sure what the best approach is in an object store. Make mkdir() a no-op and then always say a "directory" exists because in theory it could?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really depends on what semantics you require.

As a side note, git doesn't allow empty directories either.

rewrite_token = None
while True:
try:
rewrite_token, bytes_copied, total_bytes = self.blob.rewrite(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bucket.copy_blob() might be simpler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but won't give the progress reporting that blob.rewrite() has?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, do I have to be consistent? :)

python/lsst/resources/gs.py Outdated Show resolved Hide resolved
python/lsst/resources/gs.py Show resolved Hide resolved
python/lsst/resources/gs.py Show resolved Hide resolved
@timj timj force-pushed the tickets/DM-27355 branch 4 times, most recently from a8c013b to 197f81a Compare March 16, 2022 23:40
import google.cloud.storage as storage
from google.cloud.exceptions import NotFound
except ImportError:
storage = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess due to auth considerations it isn't possible to fall back to s3:// in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the IDF the notebook setup would handle that because we have both credentials set up, although notebook environments already have google-cloud-storage installed. It would require some changes to the __new__ method to understand that a failure to import should change the scheme to s3 but it seems like in the general case that would be more confusing than telling people to install google-cloud-storage.

@timj timj force-pushed the tickets/DM-27355 branch 6 times, most recently from a141040 to 5d0c68e Compare March 17, 2022 22:27
Using black makes them irrelevant.
@timj timj merged commit 4c3dfcb into main Mar 18, 2022
@timj timj deleted the tickets/DM-27355 branch March 18, 2022 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants