
Conversation

@kfindeisen (Member) commented Oct 26, 2022

This PR replaces all Google-specific components (including the logger, which doesn't depend on Google APIs as such) with USDF-appropriate components, and documents how to run the activator on our Kubernetes cluster.

The new logger is a placeholder that prioritizes human- over
machine-readability; it can be optimized for USDF's logging support at
a later date.
The USDF system no longer relies on verification tokens to prevent
DoS attacks.
@kfindeisen force-pushed the tickets/DM-36712 branch 3 times, most recently from 18c7d0e to 14a6ba4, on October 27, 2022 at 23:23
@kfindeisen marked this pull request as ready for review October 27, 2022 at 23:26
@kfindeisen requested a review from ktlim October 27, 2022 at 23:35

The service can be controlled with ``kubectl`` from ``rubin-devl``.
You must first `get credentials for the development cluster <https://k8s.slac.stanford.edu/usdf-prompt-processing-dev>`_ on the web; ignore the installation instructions and copy the commands from the second box.
Credentials are good for roughly one work day.
Contributor commented:
I thought if you login with S3DF (Unix) that you can get a refresh token that allows you to use the credentials for much longer.

kfindeisen (Member Author) replied Oct 28, 2022:

IIRC the figure I was given for Unix S3DF was 8 hours. But I'll admit I hadn't tried not getting a new token at the start of the day.

EDIT: just tried running kubectl now (with my last credentials grab Thursday morning) and got:

Unable to connect to the server: failed to refresh token: oauth2: cannot fetch token: 400 Bad Request
Response: {"error":"invalid_request","error_description":"Refresh token is invalid or has already been claimed by another client."}

kafka_group_id = str(uuid.uuid4())
# The topic on which to listen to updates to image_bucket
# bucket_topic = f"{instrument_name}-image"
bucket_topic = "pp-bucket-notify-topic"
Contributor commented:
Should this be configurable (i.e. via env var)?

kfindeisen (Member Author) replied:

I'm worried about how well that would work once we have topics that depend on the instrument, detector, etc.

Contributor replied:

Instrument has to have a different set of workers, I would think. You wouldn't be dynamically switching a given worker to a different instrument.

Detector/raft might be a problem. At least the non-dynamic part should be configurable, then? Or have the configuration specify a template?

Contributor replied:

I guess I'm mostly thinking of a deployment per instrument and so having the instrument be part of the configuration and not the code is important.

kfindeisen (Member Author) replied Oct 31, 2022:

The instrument name is already part of the configuration -- we need it for lots of things!
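One way to reconcile the two positions above is a helper that resolves the topic from configuration, using the instrument name (already part of the service configuration) to fill in a template. This is a hedged sketch only; the environment-variable names (`PP_BUCKET_TOPIC`, `PP_INSTRUMENT`, `PP_BUCKET_TOPIC_TEMPLATE`) and the default template are hypothetical, not part of this PR:

```python
import os

# Hypothetical default; mirrors the commented-out f"{instrument_name}-image".
DEFAULT_TOPIC_TEMPLATE = "{instrument}-image"


def get_bucket_topic() -> str:
    """Resolve the bucket-notification topic from the environment.

    A fixed topic (PP_BUCKET_TOPIC) takes precedence; otherwise a
    template is filled in with the instrument name, which the service
    already requires for other purposes.
    """
    fixed = os.environ.get("PP_BUCKET_TOPIC")
    if fixed:
        return fixed
    instrument = os.environ["PP_INSTRUMENT"]  # required configuration
    template = os.environ.get("PP_BUCKET_TOPIC_TEMPLATE", DEFAULT_TOPIC_TEMPLATE)
    return template.format(instrument=instrument)
```

With this shape, a per-instrument deployment only sets `PP_INSTRUMENT`, while the fixed `pp-bucket-notify-topic` remains available as an override.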

ack_deadline_seconds=60,
)
_log.debug(f"Created subscription '{subscription.name}'")
consumer.subscribe([bucket_topic])
Contributor commented:
Because there's no explicit redelivery on unprocessed messages like Pub/Sub had, there's no need to add subscriptions dynamically, I think. Can't we subscribe once and reuse for each visit?
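The subscribe-once lifecycle suggested here can be sketched with a stub standing in for a real Kafka consumer (e.g. `confluent_kafka.Consumer`), so the pattern is visible without a broker; the stub and `handle_next_visit` are illustrative, not code from this PR:

```python
from collections import deque


class StubConsumer:
    """Minimal stand-in for a Kafka consumer, for illustration only."""

    def __init__(self):
        self._queue = deque()
        self.topics = []

    def subscribe(self, topics):
        # Replaces any previous subscription, like a real Kafka consumer.
        self.topics = list(topics)

    def poll(self, timeout=None):
        # Return the next message, or None if nothing arrived in time.
        return self._queue.popleft() if self._queue else None


# Subscribe once at startup, not once per visit.
consumer = StubConsumer()
consumer.subscribe(["pp-bucket-notify-topic"])


def handle_next_visit():
    """Reuse the long-lived consumer for each visit."""
    return consumer.poll(timeout=1.0)
```

Because Kafka tracks consumer offsets server-side, a single long-lived subscription picks up where it left off, which is the property that makes the per-visit resubscription (a Pub/Sub workaround) unnecessary.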

This transition requires some nontrivial changes to the handler, since
Kafka messages require more filtering than Pub/Sub, and S3 notifications
can (in theory) refer to more than one file.
The prompt prototype no longer uses any Google packages.
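The multi-file point in the commit message can be illustrated with the standard S3 event-notification layout, whose `Records` field is a list, so one message can (in theory) name several objects. This helper is a sketch of that parsing, not the handler from this PR:

```python
import json


def extract_object_keys(notification: str) -> list:
    """Return every object key named in an S3 event notification.

    S3 notifications carry a list of Records, so a single message can
    refer to more than one file; the handler must iterate, not assume
    exactly one key per message.
    """
    payload = json.loads(notification)
    return [record["s3"]["object"]["key"]
            for record in payload.get("Records", [])]
```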
@kfindeisen force-pushed the tickets/DM-36712 branch 2 times, most recently from db591ac to 71fa6eb, on October 31, 2022 at 23:21
@kfindeisen merged commit 0a0a785 into main Oct 31, 2022
@kfindeisen deleted the tickets/DM-36712 branch October 31, 2022 at 23:30

3 participants