
Suggested changes stemming from successful implementation of Studios SSH on Enterprise #1194

@gwright99

Description


Background

@gwright99 @schaluva investigated how to deploy the new 25.3.4 Studio SSH feature into a client enterprise deployment. We found parts of the existing documentation to be incomplete/confusing and have opened this ticket to suggest improvements which we believe will facilitate client deployment efforts.

NOTE: This feedback is based on an EKS-type deployment, with AWS ALB Controller and ExternalDNS Controller add-ons deployed. We still need to figure out nuances for AKS, GKE, generic Kubernetes, and Docker-Compose, but the EKS experience should provide content to work from.

Major Items

  • Documentation missing (initial) guidance on how to expose connect-proxy to Layer 4-type SSH traffic.
  • Documentation intermingles mandatory minimal configuration with optional enhancements (e.g. SSH fingerprinting).
  • connect-* manifests load environment variables differently than backend and cron pods.

Layer 4 Initial Guidance

The connect-proxy pod needs to be able to accept Layer 4 traffic. Assuming clients have exposed their Platform via the provided Ingress object, there is likely only an AWS Application Load Balancer (ALB) present (spawned in response to AWS ALB Controller annotations on the Ingress).

An AWS Network Load Balancer (NLB) is required for the Studio SSH deployment in order to:

  1. Provide a stable endpoint to which a DNS record can be tied.
  2. Route SSH traffic to the exposed Studio SSH Service object.

The documentation does touch upon Layer 4 requirements, but only in the midst of the deployment steps. It should instead be called out as something that needs to be figured out before starting the deployment, since it affects downstream PLATFORM/CONNECT configuration and manifest configuration.

AWS ALB Controller Ingress annotations provide a way to deploy an NLB in front of a deployed ALB. We tried this as our initial approach, hoping to reuse as much of the existing manifests as possible and to make the connect. URL dual-use (Layer 4 and Layer 7). Ingresses only support Layer 7 traffic, however, so we needed to expose the Studio SSH (Layer 4) endpoint to the NLB some other way. We found workarounds that functioned temporarily (a hacky mix of Terraformed AWS resources and K8s resources), but the core reconciliation loops of the AWS ALB Controller and ExternalDNS stripped these modifications, seemingly because they were deemed to be "drift".

We eventually settled on the deployment model we propose for the docs: define a separate LoadBalancer-type Service with its own dedicated NLB. Sample manifest:

apiVersion: v1
kind: Service
metadata:
  name: connect-proxy-ssh
  labels:
    app.kubernetes.io/component: connect-proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    external-dns.alpha.kubernetes.io/hostname: "connect-ssh.gw.cxpg.dev-seqera.net"
spec:
  type: LoadBalancer
  ports:
    - name: ssh
      port: 2222
      targetPort: 2222
      protocol: TCP
  selector:
    app.kubernetes.io/component: connect-proxy

Mandatory vs. Optional Configuration

We recommend the existing documentation be modified to address the following issues.

  1. Move SSH fingerprinting to an advanced option.
    We appreciate this is a Production best practice, but it is not mandatory for the underlying feature to work. It is currently the very first step in the process, and thus likely to be a not-always-necessary distraction. In our opinion, it would be better to move it to an advanced option and let deployers focus on the mandatory subset of configuration first.
  2. Provide overarching guidance for Step 2: Configure Platform and Step 3: Configure proxy that makes it clear that several keys need to be aligned.
    • This is somewhat called out in the existing bullets but not immediately evident. A simple table would likely make this much clearer.
    • TOWER_DATA_STUDIO_CONNECT_SSH_PORT and CONNECT_SSH_ADDR must be aligned with each other.
    • TOWER_DATA_STUDIO_CONNECT_SSH_ADDRESS must be aligned to the DNS record assigned to the NLB linked to the SSH Service (assuming you follow the advice in the previous section).
  3. Clarify TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES:
    • The current text says ... # Comma-separated workspace IDs, or leave empty to disable TOWER_SSH_KEYS_MANAGEMENT_ENABLED: "true"
    • In our experience, this needs to be set to TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES: "" to enable the feature in all workspaces, and it is not quite clear how to specify the null value (or whether this is even required and should instead be controlled by CONNECT_SSH_ENABLED: "true").
  4. Modify Step 3: Configure proxy to have settings in a code block.
    • This is a stylistic nitpick, but harmonization would have made initial prose scanning easier.
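To illustrate the alignment called out in point 2, here is a hedged sketch of paired Platform and Connect settings. The values below are examples from our EKS deployment, and the assumption that CONNECT_SSH_ADDR carries the listening port is ours; exact key semantics should be verified against the 25.3.4 documentation.

```yaml
# Platform (backend/cron) side -- values must mirror the Connect side.
TOWER_DATA_STUDIO_CONNECT_SSH_PORT: "2222"           # must match the port Connect listens on
TOWER_DATA_STUDIO_CONNECT_SSH_ADDRESS: "connect-ssh.gw.cxpg.dev-seqera.net"  # DNS record tied to the SSH NLB
TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES: ""         # empty string appeared to enable all workspaces (see point 3)

# Connect proxy side
CONNECT_SSH_ENABLED: "true"
CONNECT_SSH_ADDR: ":2222"                            # assumed: port component must match TOWER_DATA_STUDIO_CONNECT_SSH_PORT
```

A two-column table in the docs mapping each Platform key to its Connect counterpart would convey the same information at a glance.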

Connect Manifest

  • The cron and backend pods source the majority of their application configuration from a mounted ConfigMap. This is a clean way to centralize configuration values and share them in a DRY manner.
  • The connect-proxy and connect-server manifests instead load their environment variables via direct modification of the environment variable definitions in the manifest rather than via a ConfigMap.
  • This difference in behaviour caused initial configuration and troubleshooting confusion. It was sorted quickly enough, but, given that various Platform and Connect values need to be aligned/synced, we think it makes more sense to migrate the CONNECT_-type environment variables from the connect manifests to the already-in-use ConfigMap that serves backend and cron. This would allow all values to be seen in one place and should simplify the steps clients must follow to complete a deployment.
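A sketch of the proposed migration. The ConfigMap name here is hypothetical (substitute whatever name the backend/cron manifests actually mount), and the CONNECT_ values are illustrative:

```yaml
# Add the CONNECT_* values to the ConfigMap already shared by backend and cron.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tower-backend-cfg   # hypothetical name; use the existing shared ConfigMap
data:
  CONNECT_SSH_ENABLED: "true"
  CONNECT_SSH_ADDR: ":2222"
---
# The connect-proxy / connect-server Deployments would then reference it
# instead of inlining env definitions, e.g. in the container spec:
#   envFrom:
#     - configMapRef:
#         name: tower-backend-cfg
```

This keeps all Platform/Connect alignment in a single object, so a deployer edits one resource instead of three manifests.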
