DM-36720: migrate upload.py to USDF #35
Conversation
Branch updated: 066fad5 to 12744cb
Thanks for doing the conversion work, and for adding unit tests for the uploader! I'm looking forward to seeing all the pieces fit together.
```python
)
storage_client = storage.Client(PROJECT_ID, credentials=credentials)
dest_bucket = storage_client.bucket("rubin-prompt-proto-main")
endpoint_url = "https://s3dfrgw.slac.stanford.edu"
```
Does the endpoint URL need to be passed to boto3? I thought all environments already had `S3_ENDPOINT_URL` set.
It seems I need to pass in the endpoint URL even if the env var `S3_ENDPOINT_URL` is set.
That's surprising, since activator.py didn't need to (it was using `client` instead of `resource`, but I don't think they're supposed to behave differently).
Interesting. For me, using `client` without an explicit endpoint but with the env var `S3_ENDPOINT_URL` set doesn't work either.
To be clear, I can instantiate a `client` without an explicit endpoint, but it will then fail to access anything, likely because the endpoint is not what's intended:
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.
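The fix discussed above can be sketched as follows: resolve the endpoint from `S3_ENDPOINT_URL` when it is set, and always pass it to boto3 explicitly, since boto3 does not read that variable on its own. This is a sketch, not the PR's actual code; the default URL is the one quoted in this thread, and `make_s3_client` is a hypothetical helper name.

```python
import os

# Endpoint from this review thread; assumption: still the current USDF one.
DEFAULT_ENDPOINT = "https://s3dfrgw.slac.stanford.edu"


def s3_endpoint() -> str:
    """Prefer the S3_ENDPOINT_URL env var, fall back to an explicit default.
    boto3 does not consult S3_ENDPOINT_URL itself, so the caller must."""
    return os.environ.get("S3_ENDPOINT_URL", DEFAULT_ENDPOINT)


def make_s3_client():
    """Create an S3 client with the endpoint passed explicitly.
    boto3 is imported lazily so this sketch loads without it installed."""
    import boto3  # third-party

    return boto3.client("s3", endpoint_url=s3_endpoint())
```

Passing `endpoint_url=` explicitly works for both `boto3.client("s3", ...)` and `boto3.resource("s3", ...)`, so the `client`-vs-`resource` distinction above should not matter here.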
python/tester/upload.py (outdated)
```python
"""
_log.info(f"Sending next_visit for group: {group}")
topic_path = publisher.topic_path(PROJECT_ID, "nextVisit")
topic = "nextVisit"
```
Is this still the correct topic name? The `overlays/dev` configuration uses `"next-visit-topic"`, and we may want separate topics for the real-world and development environments.
Or should it be `pp-bucket-notify-topic` from the activator, if I read that correctly? Yeah, we probably want to decide what topics to use for real-world and dev across all components.
I agree that the same question applies to `pp-bucket-notify-topic`, but we probably shouldn't use it for `next_visit` in addition to image-arrival notifications.
I'm changing this to `next-visit-topic` for now. When we find out what name the Summit really uses, it can be changed easily.
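One way to defer the naming decision is to make the topic configurable per environment instead of hard-coding it. A minimal sketch, assuming a hypothetical `NEXT_VISIT_TOPIC` environment variable (not something the PR defines):

```python
import os


def next_visit_topic(default: str = "next-visit-topic") -> str:
    """Hypothetical helper: let each deployment (dev overlay, Summit)
    override the next_visit topic name without editing the script.
    NEXT_VISIT_TOPIC is an assumed variable name, not an existing one."""
    return os.environ.get("NEXT_VISIT_TOPIC", default)
```

With this, the dev overlay's `next-visit-topic` stays the default, and whatever name the Summit settles on becomes a one-line config change rather than a code change.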
```diff
@@ -1,11 +1,15 @@
 __all__ = ["get_last_group", ]

+import boto3
```
What functionality is missing from lsst.resources that keeps it from working here? Flexibility of endpoint (GCS vs. S3 vs. WebDAV) is exactly what lsst.resources was designed to help with, so it would help me to know why you decided against using it. Thanks.
This particular script is not supposed to depend on anything LSST. It was previously only useful in an environment that didn't have the Stack installed (and where installation would have been logistically difficult), and may be so again in the future.
But you are installing boto3, so you are already installing external software. Why can't you install lsst-resources from PyPI as well? Or, if you are using conda-forge, would you like me to add lsst-resources to conda-forge instead? lsst-resources is completely standalone, PyPI-installable, and BSD-licensed.
I simply wasn't aware of `lsst.resources`. Reading more, it makes sense to me to use it instead. Though I'd prefer this ticket to cover just the IDF->USDF port of upload.py. I can create a new ticket for switching to `lsst.resources` here and in other places in the prompt processing codebase, and for handling the dependency.
`lsst.resources` is how the butler datastore can work on S3, WebDAV, GCS, or the local file system without the butler having to care where the files end up.
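The backend-agnostic upload described above might look like the following sketch, assuming the standalone `lsst-resources` PyPI package is installed; `upload_bytes` is a hypothetical helper, not code from this PR:

```python
def upload_bytes(uri: str, payload: bytes) -> None:
    """Sketch of a backend-agnostic write via lsst.resources.
    The same call works for s3://, gs://, webdav, or file:// URIs,
    which is the flexibility argued for in this thread.
    Imported lazily so the sketch loads without the package installed."""
    from lsst.resources import ResourcePath  # third-party (lsst-resources)

    ResourcePath(uri).write(payload)
```

Under this approach, the S3 endpoint question above is also handled for you: for `s3://` URIs, lsst.resources reads `S3_ENDPOINT_URL` itself, which may be where the expectation that boto3 would honor it came from.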
Branch updated: 6f8ee50 to 006dfc2
It was updated from `int` to `str` in cafe9f3
As we switch from Pub/Sub to Kafka, we rename it to better match the terminology in Kafka.
Move the randomness generation to main()
Branch updated: 006dfc2 to a99ff20
This appears to be a limit of the boto3 API; see #35 for discussion.