Allow SSH access into Notebook Pods #23

Open
thesuperzapper opened this issue Jul 9, 2024 · 6 comments
Labels
kind/enhancement kind - new features or changes project/notebooks-v2 project - kubeflow notebooks v2

Comments

@thesuperzapper
Member

thesuperzapper commented Jul 9, 2024

What's the Goal?

I am trying to figure out how to allow users to SSH into Notebook Pods from their laptop. The benefit of this is supporting tools like Remote VSCode and JetBrains Gateway (for PyCharm) with the resources (e.g. GPUs) of the Pod.

The main issue is how to expose the Notebook Pod via SSH on the Istio Ingress Gateway.

What's the Problem?

SSH runs directly over TCP, so the gateway can't use hostname or HTTP-path routing like we do for the web-based UIs of the Notebooks. The naive approach is to have the Istio Ingress Gateway listen on a unique port for each Notebook (which is obviously neither scalable nor secure).

In my mind there are only TWO ways to make this work:

  1. Use a "jump box" service (with a single IP/port) that listens for SSH connections and routes each incoming connection to a specific Notebook Pod based on the SSH key used to authenticate:
    • This could be implemented by setting the command option in authorized_keys to another ssh -t username@<WORKSPACE_NAME>.<NAMESPACE_NAME>.svc.cluster.local command (see idea here)
    • Or possibly there might be a pre-made opensource ssh-routing tool for this exact use-case.
    • I am not sure what kind of hardening is required on the jump box, but we need to consider stuff like:
      • disabling ssh tunneling
      • ensuring only traffic from the Istio Gateway gets to it (not from Pods inside the mesh)
      • using fail2ban to stop brute forcing
      • regular/automatic rotation of SSH keys
  2. Use some kind of SD-WAN VPN or tunnel service like Tailscale (which can be self-hosted with open-source components), Cloudflare Tunnel, or ngrok:
    • We would run the service both on the laptop and notebook pod, giving the Notebook Pod a special IP address that the laptop can use to access it.
    • This is slightly problematic because it will not be a direct connection from the user to the Pod (and it will probably be slower because traffic might have to be relayed).
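To make the forced-command idea in option 1 concrete, here is a minimal sketch of how a jump box might generate one authorized_keys line per registered key. It is an illustration under assumptions, not a proposed implementation: the function name `authorized_keys_entry`, the default user `jovyan`, and the validation rules are all hypothetical. The `restrict`, `pty`, and `command="..."` options are standard OpenSSH authorized_keys options; `restrict` turns off port, agent, and X11 forwarding, which covers the "disable ssh tunneling" point.

```python
import re

# Kubernetes names used here must be valid DNS-1123 labels.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")
_USERNAME = re.compile(r"[a-z_][a-z0-9_-]*")


def authorized_keys_entry(public_key: str, workspace: str, namespace: str,
                          user: str = "jovyan") -> str:
    """Build one authorized_keys line that forces a hop to a single Notebook Pod.

    `user` defaults to "jovyan" as an assumption about the notebook image.
    """
    # Validate every value that ends up inside the forced command,
    # so a crafted name cannot inject extra shell syntax.
    for label in (workspace, namespace):
        if not _DNS_LABEL.fullmatch(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    if not _USERNAME.fullmatch(user):
        raise ValueError(f"not a valid username: {user!r}")

    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-key> [comment]'")
    key_type, key_body = parts[0], parts[1]  # the comment field is dropped

    target = f"{user}@{workspace}.{namespace}.svc.cluster.local"
    # "restrict" disables forwarding and PTY allocation; "pty" re-enables
    # only the PTY so that interactive sessions still work.
    options = f'restrict,pty,command="ssh -t {target}"'
    return f"{options} {key_type} {key_body}"


if __name__ == "__main__":
    # The key body below is a placeholder, not a real key.
    print(authorized_keys_entry("ssh-ed25519 AAAAC3Nza alice@laptop",
                                "my-notebook", "team-a"))
```

Because the forced command replaces whatever the client asks to run, each key can only ever reach the one workspace it was registered for. How the jump box itself authenticates to the Notebook Pod is left open here.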

Other Notes

While it is technically possible to use kubectl port-forward on the laptop to expose any port that the Notebook Pod is listening on (e.g. the SSH port), I am not sure this is desirable at scale because it requires all users to have the pods/portforward RBAC permission on the profile namespace, which is very privileged.
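For reference, kubectl port-forward is authorized through the pods/portforward subresource (which is separate from pods/exec), together with get on pods. The sketch below builds the kind of namespaced Role that every user would need under this approach. The function name and the Role name are hypothetical, and the rule set reflects my understanding of the minimum required rather than a tested Kubeflow manifest.

```python
import json


def port_forward_role(namespace: str, name: str = "notebook-port-forward") -> dict:
    """Return a Role manifest (as a dict) that permits `kubectl port-forward`."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # kubectl first looks up the Pod it is asked to forward to.
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get"]},
            # Opening the forwarding stream is a "create" on the subresource.
            {"apiGroups": [""], "resources": ["pods/portforward"], "verbs": ["create"]},
        ],
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output can be piped to
    # `kubectl apply -f -` on a test cluster.
    print(json.dumps(port_forward_role("team-a"), indent=2))
```

Note that a Role like this applies to every Pod in the namespace, not just the user's own Notebook, which is part of why it is hard to scope tightly.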

Final Thoughts

There are lots of security considerations to allowing remote SSH access, especially for the people who put Kubeflow on the public internet (NOT advised).

I am interested to hear people's ideas for how we can do this safely.

@thesuperzapper
Member Author

@kimwnasptd @jiridanek @ederign @juliusvonkohout I am interested to know your thoughts on this, as allowing SSH into Notebook Pods is a long-standing request, but it is complex to do safely.

@juliusvonkohout
Member

juliusvonkohout commented Jul 10, 2024

In practice, most people I know use a local VSCode and connect it to the workbench/workspace via VSCode extensions and a kubeconfig. So it works at the Kubernetes layer, not the Kubeflow layer.

@thesuperzapper
Member Author

In practice, most people I know use a local VSCode and connect it to the workbench/workspace via VSCode extensions and a kubeconfig. So it works at the Kubernetes layer, not the Kubeflow layer.

@juliusvonkohout I assume you are talking about the "attach to container" feature:

Or are you talking about using the code tunnel CLI, which relays through Microsoft servers?


If you are talking about the first option, it still has a few problems:

  1. It only supports VSCode
  2. It requires users to have lots of permissions on the cluster (I would need to check exactly which kubectl permissions, but I imagine they at least need pods/exec)
  3. The licence of VSCode Remote is proprietary (this is less of a problem, but I am just raising it)

Hence why I want to figure out a generic solution for SSH into the Notebook Pods without compromising the security of the cluster.

@juliusvonkohout
Member

"Hence why I want to figure out a generic solution for SSH into the Notebook Pods without compromising the security of the cluster." yes, Code-server/vscode is just a workraound

@ederign
Member

ederign commented Jul 18, 2024

@thesuperzapper This is indeed an interesting feature that can open up a bunch of new use cases, and I agree that we should be careful about the security considerations.

I would also start exploring option 1 (jump-box).

One thing we need to figure out is how users will securely add their own SSH keys. The first approach that comes to my mind is to allow them to assign a public key to a given notebook in the spawner UI. Another approach would be a "key per namespace", which would allow a user to SSH into any notebook in a given namespace.
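The two key-assignment approaches can be compared with a small sketch. This is only an illustration: `KeyRegistry`, `is_allowed`, and the in-memory dictionaries are hypothetical stand-ins for wherever the keys would really be stored. The `fingerprint` helper follows the OpenSSH SHA256 fingerprint format (unpadded base64 of the SHA-256 digest of the decoded key blob).

```python
import base64
import hashlib
from dataclasses import dataclass, field


def fingerprint(public_key_line: str) -> str:
    """Compute an OpenSSH-style SHA256 fingerprint from '<type> <base64> [comment]'."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


@dataclass
class KeyRegistry:
    """Hypothetical store supporting both key-assignment schemes."""
    # (namespace, notebook) -> fingerprints allowed on that one notebook
    per_notebook: dict = field(default_factory=dict)
    # namespace -> fingerprints allowed on every notebook in the namespace
    per_namespace: dict = field(default_factory=dict)

    def is_allowed(self, fp: str, namespace: str, notebook: str) -> bool:
        if fp in self.per_notebook.get((namespace, notebook), set()):
            return True
        return fp in self.per_namespace.get(namespace, set())


if __name__ == "__main__":
    # Dummy key material for illustration only; not a real SSH key.
    key = "ssh-ed25519 " + base64.b64encode(b"dummy-key-blob").decode() + " alice@laptop"
    registry = KeyRegistry()
    registry.per_notebook[("team-a", "nb-1")] = {fingerprint(key)}
    print(registry.is_allowed(fingerprint(key), "team-a", "nb-1"))
    print(registry.is_allowed(fingerprint(key), "team-a", "nb-2"))
```

The per-notebook scheme gives a narrower blast radius if a key leaks, while the per-namespace scheme is simpler for users who move between many notebooks.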

thesuperzapper added the project/notebooks-v2 and kind/enhancement labels on Jul 18, 2024
@Kallepan

Kallepan commented Aug 12, 2024

Is relying on the Kubernetes API a scalable and reliable solution for managing workloads? I'm concerned that the kube-apiserver could become a bottleneck if multiple users simultaneously access numerous pods, particularly given that tools like VSCode may generate a high volume of small requests.

Projects
Status: In Discussion