
Suggested changes stemming from successful implementation of Studios SSH on Enterprise #1194

@gwright99

Description


Background

@gwright99 @schaluva investigated how to deploy the new 25.3.4 Studio SSH feature into a client enterprise deployment. We found parts of the existing documentation to be incomplete/confusing and have opened this ticket to suggest improvements which we believe will facilitate client deployment efforts.

NOTE: This feedback is based on an EKS-type deployment, with AWS ALB Controller and ExternalDNS Controller add-ons deployed. We still need to figure out nuances for AKS, GKE, generic Kubernetes, and Docker-Compose, but the EKS experience should provide content to work from.

Major Items

  • Documentation missing (initial) guidance on how to expose connect-proxy to Layer 4-type SSH traffic.
  • Documentation intermingles mandatory minimal configuration with optional enhancements (e.g. SSH fingerprinting).
  • connect-* manifests load environment variables differently than backend and cron pods.

Layer 4 Initial Guidance

The connect-proxy pod needs to be able to accept Layer 4 traffic. Assuming clients have exposed their Platform via the provided Ingress object, there is likely only an AWS Application Load Balancer (ALB) present (spawned in response to AWS ALB Controller annotations on the Ingress).

An AWS Network Load Balancer (NLB) is required for the Studio SSH deployment in order to:

  1. Provide a stable endpoint to which a DNS record can be tied.
  2. Route SSH traffic to the exposed Studio SSH Service object.

The documentation does touch upon Layer 4 requirements, but only in the midst of the deployment steps. It should instead be called out as something that needs to be figured out before starting the deployment, since it affects downstream PLATFORM/CONNECT configuration and manifest configuration.

AWS ALB Controller Ingress annotations provide a way to deploy an NLB in front of a deployed ALB. We tried this as our initial approach, hoping to reuse as much of the existing manifests as possible and to make the connect. URL dual-use (Layer 4 and Layer 7). Ingresses only support Layer 7 traffic, however, so we needed to expose the Studio SSH (Layer 4) endpoint to the NLB some other way. We found workarounds that functioned temporarily (a hacky mix of Terraformed AWS resources and K8s resources), but the core reconciliation loops of the AWS ALB Controller and ExternalDNS stripped these modifications, seemingly because they were deemed to be "drift".

We eventually settled on the deployment model we propose for the docs: define a separate LoadBalancer-type Service with its own dedicated NLB. Sample manifest:

apiVersion: v1
kind: Service
metadata:
  name: connect-proxy-ssh
  labels:
    app.kubernetes.io/component: connect-proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    external-dns.alpha.kubernetes.io/hostname: "connect-ssh.gw.cxpg.dev-seqera.net"
spec:
  type: LoadBalancer
  ports:
    - name: ssh
      port: 2222
      targetPort: 2222
      protocol: TCP
  selector:
    app.kubernetes.io/component: connect-proxy

Mandatory vs. Optional Configuration

We recommend the existing documentation be modified to address the following issues.

  1. Move SSH fingerprinting to an advanced option.
    We appreciate this is a Production best practice, but it is not mandatory for the underlying feature to work. It is currently the very first step in the process, and thus likely to be a not-always-necessary distraction. In our opinion, it would be better to move it to an advanced option and let deployers focus on the mandatory subset of configuration first.
  2. Provide overarching guidance for Step 2: Configure Platform and Step 3: Configure proxy that makes it clear that several keys need to be aligned.
    • This is somewhat called out in the existing bullets but not immediately evident. A simple table would likely make this much clearer.
    • TOWER_DATA_STUDIO_CONNECT_SSH_PORT and CONNECT_SSH_ADDR must be aligned with each other.
    • TOWER_DATA_STUDIO_CONNECT_SSH_ADDRESS must be aligned to the DNS record assigned to the NLB linked to the SSH Service (assuming you follow the advice in the previous section).
  3. Clarify TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES:
    • The current text says ... # Comma-separated workspace IDs, or leave empty to disable TOWER_SSH_KEYS_MANAGEMENT_ENABLED: "true"
    • In our experience, this needs to be set to TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES: "" to enable the feature in all workspaces, and it is not quite clear how to specify the null value (or whether this is even required and should instead be controlled by CONNECT_SSH_ENABLED: "true").
  4. Modify Step 3: Configure proxy to have settings in a code block.
    • This is a stylistic nitpick, but harmonization would have made initial prose scanning easier.
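To illustrate the alignment called out in point 2, here is a hedged sketch of paired Platform and Connect settings. The values below are examples from our EKS deployment, and the assumption that CONNECT_SSH_ADDR carries the listening port is ours; exact key semantics should be verified against the 25.3.4 documentation.

```yaml
# Platform (backend/cron) side -- values must mirror the Connect side.
TOWER_DATA_STUDIO_CONNECT_SSH_PORT: "2222"           # must match the port Connect listens on
TOWER_DATA_STUDIO_CONNECT_SSH_ADDRESS: "connect-ssh.gw.cxpg.dev-seqera.net"  # DNS record tied to the SSH NLB
TOWER_DATA_STUDIO_SSH_ALLOWED_WORKSPACES: ""         # empty string appeared to enable all workspaces (see point 3)

# Connect proxy side
CONNECT_SSH_ENABLED: "true"
CONNECT_SSH_ADDR: ":2222"                            # assumed: port component must match TOWER_DATA_STUDIO_CONNECT_SSH_PORT
```

A two-column table in the docs mapping each Platform key to its Connect counterpart would convey the same information at a glance.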

Connect Manifest

  • The cron and backend pods source the majority of their application configuration from a mounted ConfigMap. This is a clean way to centralize configuration values and share them in a DRY manner.
  • The connect-proxy and connect-server manifests instead load their environment variables via direct modification of the environment variable definitions in the manifest rather than via a ConfigMap.
  • This difference in behaviour caused initial configuration and troubleshooting confusion. It was sorted quickly enough, but, given that various Platform and Connect values need to be aligned/synced, we think it makes more sense to migrate the CONNECT_-type environment variables from the connect manifests to the already-in-use ConfigMap that serves backend and cron. This would allow all values to be seen in one place and should simplify the steps clients must follow to complete a deployment.
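A sketch of the proposed migration. The ConfigMap name here is hypothetical (substitute whatever name the backend/cron manifests actually mount), and the CONNECT_ values are illustrative:

```yaml
# Add the CONNECT_* values to the ConfigMap already shared by backend and cron.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tower-backend-cfg   # hypothetical name; use the existing shared ConfigMap
data:
  CONNECT_SSH_ENABLED: "true"
  CONNECT_SSH_ADDR: ":2222"
---
# The connect-proxy / connect-server Deployments would then reference it
# instead of inlining env definitions, e.g. in the container spec:
#   envFrom:
#     - configMapRef:
#         name: tower-backend-cfg
```

This keeps all Platform/Connect alignment in a single object, so a deployer edits one resource instead of three manifests.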
