62 changes: 24 additions & 38 deletions doc/playbook.rst
@@ -214,44 +214,30 @@ The service can be controlled with ``kubectl`` from ``rubin-devl``.
You must first `get credentials for the development cluster <https://k8s.slac.stanford.edu/usdf-prompt-processing-dev>`_ on the web; ignore the installation instructions and copy the commands from the second box.
Credentials must be renewed if you get a "cannot fetch token: 400 Bad Request" error when running ``kubectl``.

Each time the service container is updated, a new revision of the service must be created and deployed.
(Continuous deployment has not yet been set up.)
To create the service, clone the `slaclab/rubin-usdf-prompt-processing`_ repo and navigate to the ``kubernetes/overlays/dev/prompt-proto-service`` directory.
Edit ``prompt-proto-service.yaml`` to point to the new service container (likely a ticket branch instead of ``latest``), then run ``make apply`` *from the same directory*.
See the readme in that directory for more details.

.. _slaclab/rubin-usdf-prompt-processing: https://github.com/slaclab/rubin-usdf-prompt-processing/

All service configuration is in ``prompt-proto-service.yaml``.
It includes the following required environment variables:

* RUBIN_INSTRUMENT: the "short" instrument name
* PIPELINES_CONFIG: a machine-readable string describing which pipeline(s) should be run for which visits.
Notation is complex and still in flux; see :file:`../python/activator/config.py` for current documentation and examples.
* PUBSUB_VERIFICATION_TOKEN: choose an arbitrary string matching the Pub/Sub endpoint URL below.
This variable is currently unused and may be removed in the future.
* IMAGE_BUCKET: bucket containing raw images
* CALIB_REPO: URI to repo containing calibrations (and templates)
* LSST_DISABLE_BUCKET_VALIDATION: set this to disable validation of S3 bucket names, allowing Ceph multi-tenant colon-separated names (e.g., ``tenant:bucket``) to be used.
* IP_APDB: IP address or hostname and port of the APDB (see `Databases`_, below)
* IP_REGISTRY: IP address or hostname and port of the registry database (see `Databases`_)
Collaborator comment:

    IP_REGISTRY and DB_REGISTRY are also mentioned in the Google Cloud section and may be deleted together?

    Though I recall that in a previous meeting we agreed to terminate the services running on Google Cloud, so I'm not sure how much we care about that section.

Member (author) reply:

    Well, the catch is that this version of the code won't work on Google (we can't get a .pgpass file with the right permissions there). So if you're running on Google, you must be using a version that still uses those variables. 😵‍💫

* DB_APDB: PostgreSQL database name for the APDB
* PSQL_APDB_PASS: secret containing the password for ``USER_APDB`` (see below)
* DB_REGISTRY: PostgreSQL database name for the registry database
* PSQL_REGISTRY_PASS: secret containing the password for ``USER_REGISTRY`` (see below)
* KAFKA_CLUSTER: hostname and port of the Kafka provider

The following environment variables are optional:

* IMAGE_TIMEOUT: how long, in seconds, to wait for raw image arrival after the expected script completion; default 20.
* LOCAL_REPOS: absolute path (in the container) where local repos are created, default ``/tmp``.
* USER_APDB: database user for the APDB, default "postgres"
* USER_REGISTRY: database user for the registry database, default "postgres"
* NAMESPACE_APDB: the database namespace for the APDB, defaults to the DB's default namespace
* SERVICE_LOG_LEVELS: requested logging levels in the format of `Middleware's --log-level argument <https://pipelines.lsst.io/v/daily/modules/lsst.daf.butler/scripts/butler.html#cmdoption-butler-log-level>`_.
Default is to log prompt_prototype at DEBUG, other LSST code at INFO, and third-party code at WARNING.
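
As an illustration of how these variables fit together, here is a minimal sketch of reading them and assembling the APDB connection URI, mirroring the ``_make_apdb_uri`` code removed later in this diff (the host/port value in the comment is hypothetical, and this is a sketch, not the service's actual startup code):

.. code-block:: python

   import os

   # Required variables: a missing one raises KeyError at startup.
   ip_apdb = os.environ["IP_APDB"]  # host:port, e.g. "apdb-host:5432" (hypothetical)
   db_apdb = os.environ["DB_APDB"]

   # Optional variables fall back to their documented defaults.
   user_apdb = os.environ.get("USER_APDB", "postgres")
   image_timeout = int(os.environ.get("IMAGE_TIMEOUT", 20))
   local_repos = os.environ.get("LOCAL_REPOS", "/tmp")

   # These combine into a standard PostgreSQL connection URI; the password is
   # deliberately absent, since it comes from the PSQL_APDB_PASS secret instead.
   apdb_uri = f"postgresql://{user_apdb}@{ip_apdb}/{db_apdb}"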

Secrets are configured through the makefile and ``kustomization.yaml``.
The service container deployment is managed using `Argo CD and Phalanx <https://k8s.slac.stanford.edu/usdf-prompt-processing-dev/argo-cd>`_.
See the `Phalanx`_ docs for information on working with Phalanx in general (including special developer environment setup).

There are two different ways to deploy a development release of the service:

* If you will not be making permanent changes to the Phalanx config, go to the Argo UI, select the specific ``prompt-proto-service-<instrument>`` service, then select the first "svc" node.
Scroll down to the live manifest, click "edit", then update the ``template.spec.containers.image`` key to point to the new service container (likely a ticket branch instead of ``latest``).
The service will immediately redeploy with the new image.
To force an update of the container, edit ``template.metadata.annotations.revision``.
*Do not* click "SYNC" on the main screen, as that will undo all your edits.
* If you will be making permanent changes of any kind, the above procedure would force you to re-enter your changes with each update of the ``phalanx`` branch.
Instead, clone the `lsst-sqre/phalanx`_ repo and navigate to the ``applications/prompt-proto-service-<instrument>`` directory.
Edit ``values-usdfdev-prompt-processing.yaml`` to point to the new service container (likely a ticket branch instead of ``latest``) and push the branch.
You do not need to create a PR.
Then, in the Argo UI, follow the instructions in `the Phalanx docs <https://phalanx.lsst.io/developers/deploy-from-a-branch.html#switching-the-argo-cd-application-to-sync-the-branch>`_.
To force a container update without a corresponding ``phalanx`` update, you need to edit ``template.metadata.annotations.revision`` as described above -- `restarting a deployment <https://phalanx.lsst.io/developers/deploy-from-a-branch.html#restarting-a-deployment>`_ that's part of a service does not check for a newer container, even with the ``Always`` pull policy.

.. _Phalanx: https://phalanx.lsst.io/developers/
.. _lsst-sqre/phalanx: https://github.com/lsst-sqre/phalanx/

The service configuration is in each instrument's ``values.yaml`` (for settings shared between development and production) and ``values-usdfdev-prompt-processing.yaml`` (for development-only settings).
``values.yaml`` and ``README.md`` provide documentation for all settings.
The actual Kubernetes config (and the implementation of new config settings or secrets) is in ``charts/prompt-proto-service/templates/prompt-proto-service.yaml``.
This file fully supports the Go template syntax.

A few useful commands for managing the service:

5 changes: 0 additions & 5 deletions python/activator/activator.py
@@ -37,7 +37,6 @@

 from .config import PipelinesConfig
 from .logger import setup_usdf_logger
-from .make_pgpass import make_pgpass
 from .middleware_interface import get_central_butler, make_local_repo, MiddlewareInterface
 from .raw import (
     get_prefix_from_snap,
@@ -80,10 +79,6 @@


 try:
-    # Write PostgreSQL credentials.
-    # This MUST be done before creating a Butler or accessing the APDB.
-    make_pgpass()
-
     app = Flask(__name__)

     consumer = kafka.Consumer({
61 changes: 0 additions & 61 deletions python/activator/make_pgpass.py

This file was deleted.
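
The deleted module's contents are not shown in this diff, but from its name, the "Write PostgreSQL credentials" comment removed from ``activator.py`` above, and the review discussion, it presumably wrote the database passwords into a ``~/.pgpass`` file with owner-only permissions. A hypothetical sketch of that pattern follows; the field layout is libpq's documented ``hostname:port:database:username:password`` format, and everything else is assumption rather than the deleted file's actual code:

.. code-block:: python

   import os
   import stat


   def make_pgpass():
       """Write PostgreSQL credentials to ~/.pgpass for password-less logins.

       Hypothetical reconstruction; the real module's logic is not shown here.
       """
       path = os.path.join(os.environ["HOME"], ".pgpass")
       # IP_APDB and IP_REGISTRY are "host:port", covering two .pgpass fields.
       entries = [
           f"{os.environ['IP_APDB']}:{os.environ['DB_APDB']}:"
           f"{os.environ.get('USER_APDB', 'postgres')}:{os.environ['PSQL_APDB_PASS']}",
           f"{os.environ['IP_REGISTRY']}:{os.environ['DB_REGISTRY']}:"
           f"{os.environ.get('USER_REGISTRY', 'postgres')}:{os.environ['PSQL_REGISTRY_PASS']}",
       ]
       with open(path, "w") as f:
           f.write("\n".join(entries) + "\n")
       # libpq ignores .pgpass unless only the owner can read/write it (0600).
       os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)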

7 changes: 1 addition & 6 deletions python/activator/middleware_interface.py
@@ -179,7 +179,6 @@ class MiddlewareInterface:
     """

     # Class invariants:
-    # self._apdb_uri is a valid URI that unambiguously identifies the APDB
     # self.image_host is a valid URI with non-empty path and no query or fragment.
     # self._download_store is None if and only if self.image_host is a local URI.
     # self.visit, self.instrument, self.camera, self.skymap, self._deployment
@@ -262,11 +261,7 @@ def _get_deployment(self):
     def _make_apdb_uri(self):
         """Generate a URI for accessing the APDB.
         """
-        # TODO: merge this code with make_pgpass.py
-        ip_apdb = os.environ["IP_APDB"]  # Also includes port
-        db_apdb = os.environ["DB_APDB"]
-        user_apdb = os.environ.get("USER_APDB", "postgres")
-        return f"postgresql://{user_apdb}@{ip_apdb}/{db_apdb}"
+        return os.environ["URL_APDB"]

     def _init_local_butler(self, repo_uri: str, output_collections: list[str], output_run: str):
         """Prepare the local butler to ingest into and process from.
8 changes: 2 additions & 6 deletions tests/test_middleware_interface.py
@@ -117,9 +117,7 @@ class MiddlewareInterfaceTest(unittest.TestCase):
     @classmethod
     def setUpClass(cls):
         cls.env_patcher = unittest.mock.patch.dict(os.environ,
-                                                   {"IP_APDB": "localhost",
-                                                    "DB_APDB": "postgres",
-                                                    "USER_APDB": "postgres",
+                                                   {"URL_APDB": "postgresql://localhost/postgres",
                                                     "K_REVISION": "prompt-proto-service-042",
                                                     })
         cls.env_patcher.start()
@@ -805,9 +803,7 @@ def setUpClass(cls):
         super().setUpClass()

         cls.env_patcher = unittest.mock.patch.dict(os.environ,
-                                                   {"IP_APDB": "localhost",
-                                                    "DB_APDB": "postgres",
-                                                    "USER_APDB": "postgres",
+                                                   {"URL_APDB": "postgresql://localhost/postgres",
                                                     "K_REVISION": "prompt-proto-service-042",
                                                     })
         cls.env_patcher.start()