Description
Background
@gwright99 @schaluva investigated how to deploy the new 25.3.4 Studio SSH feature into a client enterprise deployment. We found parts of the existing documentation to be incomplete/confusing and have opened this ticket to suggest improvements which we believe will facilitate client deployment efforts.
NOTE: This feedback is based on an EKS-type deployment, with AWS ALB Controller and ExternalDNS Controller add-ons deployed. We still need to figure out nuances for AKS, GKE, generic Kubernetes, and Docker-Compose, but the EKS experience should provide content to work from.
Major Items
- Documentation is missing (initial) guidance on how to expose `connect-proxy` to Layer 4-type SSH traffic.
- Documentation intermingles mandatory minimal configuration with optional enhancements (e.g. fingerprinting).
- `connect-*` manifests load environment variables differently than the `backend` and `cron` pods.
Layer 4 Initial Guidance
The connect-proxy pod needs to be able to accept Layer 4 traffic. Assuming clients have made their Platform available to traffic via the provided Ingress object, we should assume there is currently only an AWS Application Load Balancer (ALB) present (spawned in response to AWS ALB Controller annotations on the Ingress).
An AWS Network Load Balancer (NLB) is required for the Studio SSH deployment in order to:
- Provide a stable endpoint to tie a DNS record to.
- Route SSH traffic to the exposed Studio SSH Service object.
The documentation does touch upon Layer 4 requirements, but it does so in the midst of the deployment steps rather than calling them out up front as a prerequisite, even though they affect downstream PLATFORM/CONNECT configuration and manifest configuration.
AWS ALB Controller Ingress annotations provide a way to deploy an NLB in front of a deployed ALB. We tried this as our initial approach, hoping to reuse as much of the existing manifests as possible and to make the `connect.` URL dual-use (Layer 4 and Layer 7). Ingresses only support Layer 7 traffic, so we needed to expose the Studio SSH (Layer 4) endpoint to the NLB some other way. We were able to find workarounds that functioned temporarily (with a hacky mix of Terraformed AWS resources and K8s resources), but the core reconciliation loops of the AWS ALB Controller and ExternalDNS stripped these modifications, seemingly because they were deemed to be "drift".
We eventually settled on the deployment model I propose for the docs: define a separate LoadBalancer-type Service with its own dedicated NLB. Sample manifest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: connect-proxy-ssh
  labels:
    app.kubernetes.io/component: connect-proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    external-dns.alpha.kubernetes.io/hostname: "connect-ssh.gw.cxpg.dev-seqera.net"
spec:
  type: LoadBalancer
  ports:
    - name: ssh
      port: 2222
      targetPort: 2222
      protocol: TCP
  selector:
    app.kubernetes.io/component: connect-proxy
```

Mandatory Vs Optional Configuration
We recommend the existing documentation be modified to address the following issues.
- Move SSH fingerprinting to an advanced option.
  - I appreciate this is a Production best practice, but it is not mandatory for the underlying feature to work. It is currently the very first step in the process, however, and thus likely to be a not-always-necessary distraction. IMO it is better to move it to an advanced option and let deployers focus on the mandatory subset of configuration first.
- Provide overarching guidance for `Step 2: Configure Platform` and `Step 3: Configure proxy` that makes it clear that several keys need to be aligned.
  - This is somewhat called out in the existing bullets but not immediately evident. A simple table would likely make this much clearer.
  - `TOWER_DATA_STUDIO_CONNECT_SSH_PORT` and `CONNECT_SSH_ADDR` / `TOWER_DATA_STUDIO_CONNECT_SSH_ADDRESS` must be aligned to the DNS record assigned to the NLB linked to the SSH Service (assuming you follow the advice in the previous section).
- Clarify `TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES`:
  - Current text says `... # Comma-separated workspace IDs, or leave empty to disable TOWER_SSH_KEYS_MANAGEMENT_ENABLED: "true"`.
  - In our experience, this needs to be set to `TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES: ""` to enable the feature in all workspaces, and it's not quite clear how to specify the null value (or if this is even required and should be controlled by `CONNECT_SSH_ENABLED: "true"`).
- Modify `Step 3: Configure proxy` to have settings in a code block.
  - This is a stylistic nitpick, but harmonization would have made initial prose scanning easier.
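To make the alignment concrete, here is a rough sketch of the paired settings under the separate-SSH-Service model described above. The hostname, the `CONNECT_SSH_ADDR` value format, and the exact semantics of each variable are assumptions for illustration and should be confirmed against the official docs:

```yaml
# Illustrative only -- values and semantics are assumptions, not confirmed behaviour.

# Platform (backend/cron) configuration:
TOWER_DATA_STUDIO_CONNECT_SSH_ADDRESS: "connect-ssh.example.com"  # DNS record tied to the SSH NLB
TOWER_DATA_STUDIO_CONNECT_SSH_PORT: "2222"                        # must match the Service port
TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES: ""                      # empty appeared to enable all workspaces
TOWER_SSH_KEYS_MANAGEMENT_ENABLED: "true"

# Connect proxy configuration:
CONNECT_SSH_ENABLED: "true"
CONNECT_SSH_ADDR: ":2222"  # assumed listener format; must align with the Service targetPort
```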
Connect Manifest
- The `cron` and `backend` pods source the majority of their application configuration from a mounted ConfigMap. This is a clean way to centralize configuration values and share them in a DRY manner.
- The Connect `proxy` and `server` manifests load their environment variables via direct modification of environment variable definitions in the manifest rather than via ConfigMap.
- This difference in behaviour caused initial configuration and troubleshooting confusion. It was sorted quickly enough, but - given that various Platform and Connect values need to be aligned / synced - we think it makes more sense to migrate the `CONNECT_`-type environment variables from the Connect manifests to the already-in-use ConfigMap that serves `backend` and `cron`. This will allow all values to be seen in one place and should simplify the steps clients must follow to complete a deployment.
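One possible shape for that migration, sketched below. The ConfigMap name `platform-config` and the specific keys are hypothetical placeholders; the point is the `envFrom`/`configMapRef` pattern that lets the Connect pods consume the same ConfigMap as `backend` and `cron`:

```yaml
# Hypothetical sketch: centralize CONNECT_* values in the shared ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: platform-config              # hypothetical name for the ConfigMap backend/cron already mount
data:
  TOWER_DATA_STUDIO_CONNECT_SSH_PORT: "2222"
  CONNECT_SSH_ENABLED: "true"        # CONNECT_* values move here from the connect manifests
---
# Then, in the connect-proxy (and connect-server) container spec, replace the
# inline env definitions with a reference to the shared ConfigMap:
#
#   envFrom:
#     - configMapRef:
#         name: platform-config
```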