Conversation

hsinfang
Collaborator

@hsinfang hsinfang commented Nov 4, 2022

No description provided.

@hsinfang hsinfang force-pushed the tickets/DM-36720 branch 2 times, most recently from 066fad5 to 12744cb Compare November 4, 2022 20:43
Member

@kfindeisen kfindeisen left a comment

Thanks for doing the conversion work, and for adding unit tests for the uploader! I'm looking forward to seeing all the pieces fit together.

)
storage_client = storage.Client(PROJECT_ID, credentials=credentials)
dest_bucket = storage_client.bucket("rubin-prompt-proto-main")
endpoint_url = "https://s3dfrgw.slac.stanford.edu"
Member

Does the endpoint URL need to be passed to boto3? I thought all environments already had S3_ENDPOINT_URL set.

Collaborator Author

It seems I need to pass in the endpoint URL even if the env var S3_ENDPOINT_URL is set.

Member

@kfindeisen kfindeisen Nov 7, 2022

That's surprising, since activator.py didn't need to (it was using client instead of resource, but I don't think they're supposed to behave differently).

Collaborator Author

Interesting. For me, using client without an explicit endpoint but with the env var S3_ENDPOINT_URL set doesn't work either.

Collaborator Author

@hsinfang hsinfang Nov 8, 2022

To be clear, I can instantiate a client without an explicit endpoint, but it then fails to access anything, likely because the endpoint is not the intended one:
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.
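
A minimal sketch of the workaround described in this thread, assuming the endpoint is stored in S3_ENDPOINT_URL as in the comments above: read the variable yourself and pass it to boto3 as `endpoint_url`. The helper names `s3_kwargs` and `make_s3_resource` are hypothetical and are not taken from this PR.

```python
import os


def s3_kwargs(env=None):
    """Build keyword arguments for boto3 from the environment.

    S3_ENDPOINT_URL is the variable named in this thread. Per the
    discussion, boto3 did not seem to pick it up on its own, so it is
    forwarded explicitly as ``endpoint_url``.
    """
    env = os.environ if env is None else env
    kwargs = {}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_s3_resource(env=None):
    """Create an S3 resource that talks to the configured endpoint."""
    import boto3  # imported lazily so s3_kwargs works without boto3

    return boto3.resource("s3", **s3_kwargs(env))
```

If the variable is unset, the helper passes no `endpoint_url` and boto3 uses its default endpoint, which would be consistent with the InvalidAccessKeyId error above when the credentials belong to a different service.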

"""
_log.info(f"Sending next_visit for group: {group}")
topic_path = publisher.topic_path(PROJECT_ID, "nextVisit")
topic = "nextVisit"
Member

Is this still the correct topic name? The overlays/dev uses "next-visit-topic", and we may want separate topics for the real-world and development environments.

Collaborator Author

@hsinfang hsinfang Nov 7, 2022

Or should it be pp-bucket-notify-topic from the activator, if I read that correctly?
Yeah, we probably want to decide which topics to use for real-world and dev across all components.

Member

I agree that the same question applies to pp-bucket-notify-topic, but we probably shouldn't use it for next_visit in addition to image arrival notifications.

Collaborator Author

I'm changing this to next-visit-topic for now. When we find out what name Summit really uses it can be changed easily.
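
One way to keep the name easy to change, and to let real-world and development environments use separate topics, is to read it from the environment with a fallback. This is only a sketch: the variable name `NEXT_VISIT_TOPIC` and the function `next_visit_topic` are hypothetical, and the default is the name chosen above.

```python
import os

# Default taken from this thread; the real Summit topic name is not yet known.
DEFAULT_TOPIC = "next-visit-topic"


def next_visit_topic(env=None):
    """Return the next_visit topic name, preferring an environment override.

    NEXT_VISIT_TOPIC is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    return env.get("NEXT_VISIT_TOPIC") or DEFAULT_TOPIC
```

Each deployment could then set its own value without a code change.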

@@ -1,11 +1,15 @@
__all__ = ["get_last_group", ]

import boto3
Member

What functionality is missing from lsst.resources that would make it work for this? Flexibility of endpoint (GCS vs S3 vs WebDAV) is what lsst.resources was designed to help with, so it would help me to know why you decided against using it. Thanks.

Member

@kfindeisen kfindeisen Nov 7, 2022

This particular script is not supposed to depend on anything LSST. It was previously only useful in an environment that didn't have the Stack installed (and where installation would have been logistically difficult), and may be so again in the future.

Member

But you are installing boto3, so you are installing external software. Why can't you install lsst-resources from PyPI as well? Or are you using conda-forge, and so want me to add lsst-resources to conda-forge instead? lsst-resources is completely standalone, installable from PyPI, and BSD-licensed.

Collaborator Author

I simply wasn't aware of lsst.resources. Reading more, it makes sense to me to use it instead, though I'd prefer this ticket to cover just the IDF->USDF port of upload.py. I can create a new ticket for switching to lsst.resources here and in other places in the prompt processing codebase, and for handling the dependency.

Member

lsst.resources is how the butler datastore can work on S3, WebDAV, GCS, or the local file system without butler having to care where the files end up.
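
As a rough illustration of that point, an upload written against `lsst.resources.ResourcePath` only changes its URI scheme when the backend changes. This is a sketch and not code from this PR: the helpers `destination_uri` and `upload` and the bucket name in the test are hypothetical, and it assumes lsst-resources is installed with the relevant backend extras and credentials configured.

```python
def destination_uri(bucket, key, scheme="s3"):
    """Build a destination URI; use scheme "gs" for GCS, "s3" for S3."""
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


def upload(local_path, bucket, key, scheme="s3"):
    """Copy a local file to object storage via lsst.resources.

    The import is deferred so that destination_uri remains usable
    where lsst-resources is not installed.
    """
    from lsst.resources import ResourcePath

    src = ResourcePath(local_path)
    dest = ResourcePath(destination_uri(bucket, key, scheme))
    # transfer="copy" leaves the local file in place.
    dest.transfer_from(src, transfer="copy", overwrite=True)
    return dest
```

The calling code stays the same whichever store is used; only the `scheme` argument differs.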

@hsinfang hsinfang force-pushed the tickets/DM-36720 branch 6 times, most recently from 6f8ee50 to 006dfc2 Compare November 11, 2022 05:53
@hsinfang hsinfang merged commit 0062cc6 into main Nov 11, 2022
@hsinfang hsinfang deleted the tickets/DM-36720 branch November 11, 2022 17:57
kfindeisen added a commit that referenced this pull request Nov 14, 2022
This appears to be a limit of the boto3 API; see #35 for discussion.

3 participants