Support executing cells in different gRPC Executor Services - Ephemeral Containers #593

Open
jlewi opened this issue May 26, 2024 · 10 comments

@jlewi
Contributor

jlewi commented May 26, 2024

Feature Request

  • Let the user deploy the gRPC executor service in a container
  • Let the user select in vscode which executor different cells should run in
    • So different cells can run in different executors

Motivation

Frequently when working with Kubernetes and containers you need to kubectl exec into a container and run some commands. This is even more common now with ephemeral containers.

An example is [verifying GKE Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#verify). Typically you would start a pod, then kubectl exec into it and run gcloud commands to test access.
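
As a rough sketch of that flow (the manifest, pod name, and image are illustrative, not taken from the linked doc):

```sh
kubectl apply -f wi-test-pod.yaml      # start a pod that has gcloud available (e.g. google/cloud-sdk:slim)
kubectl exec -it wi-test -- /bin/bash  # open a shell inside the container
gcloud auth list                       # inside the container: check which identity is active
```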

What I'd like is to be able to write a playbook that has a mix of steps that run in different executors, e.g.

  1. kubectl apply -f pod.yaml (runs locally)
  2. gcloud auth list (runs inside the container)

What you can do today

kubectl exec

You can run kubectl exec in a code cell. The output window is interactive and you can enter commands into it. This is pretty nice, especially as it doesn't block other cells from executing. However, there are a couple of disadvantages:

  • The user would have to copy/paste commands into the exec shell to execute them, rather than just executing a code cell
  • vscode has weird UX issues that make this less than ideal
    • It doesn't seem possible to resize the output cell to show more lines of the terminal
    • When you scroll in vscode with the scroll wheel, vscode switches between scrolling through the doc and through the output window, which is very annoying

Set gRPC custom address

Using RunMe settings you can set a custom address for the executor, so you could point it at a gRPC server running in a container.

I think it's a bit cumbersome to constantly switch back and forth to the settings page to change where a cell would execute.

Desired UX

I'd like to be able to easily configure different parts of the document to run on different executors. Ideally I'd like to be able to do this without having to dig through the menus, as that disrupts the flow. I think one option would be to have code blocks that contain RunMe configuration and are identified by a suitable language id; e.g. runmeconfig. In the block you could then have YAML to configure runme, e.g.:

```yaml
grpcExecutor: 1234.1233.123.123
```

This would configure all subsequent cells to use that executor. It could then be switched back to the local executor:

```yaml
grpcExecutor: ""
```

Notably, I don't think you should have to execute these cells in order to apply the configuration. The semantics should be that the configuration automatically applies to all cells that come after them.
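
To make the interleaving concrete, a hedged sketch of how such a playbook might read; the runmeconfig language id and the grpcExecutor key are the hypothetical names proposed above, not an existing Runme feature:

````markdown
```runmeconfig
grpcExecutor: 1234.1233.123.123
```

```sh
gcloud auth list   # runs on the remote executor configured above
```

```runmeconfig
grpcExecutor: ""
```

```sh
kubectl get pods   # runs on the local executor again
```
````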

@sourishkrout
Member

I'd love to learn more about the specific use cases.

For what it's worth, running Runme commands in containers is coming in runnerv2: https://github.com/stateful/runme/blob/main/experimental/runme.yaml#L47-L54. It likely won't satisfy all requirements yet, but we should be able to expand the container/docker support accordingly.

Moreover, I'd like to integrate with the devcontainer.json spec and its CLI.

@jlewi
Contributor Author

jlewi commented Jul 12, 2024

I tried this and it works with Runner V2 but not V1; see #625.

There are a couple of rough spots right now.

  1. If the gRPC service is unavailable you can't serialize/deserialize notebooks.

    • That's not a great experience. In my case, I'm running in a Google Cloud Workstation which automatically gets garbage collected after some amount of idle time, so I could easily lose data if I haven't saved and the workstation goes down.
  2. It looks like the default behavior of the runner is to change to the working directory of the notebook before executing commands. This seems like desirable behavior.

    • However, when using a remote runner the directory of the markdown file might not exist on the remote machine.
    • In this case you get an error like:
      Internal failure executing runner: chdir /Users/jlewi/git_foyle/docs/content/en/docs/integrations: no such file or directory
    • You can work around this by explicitly setting the cwd of the cell to a directory that exists on the remote machine (sketched below).
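
As a sketch of that workaround, assuming Runme's cell-level cwd annotation in the fence metadata is honored by the remote runner (the path is illustrative):

````markdown
```sh { cwd=/tmp }
gcloud auth list
```
````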

So the good news is it more or less works out of the box, but there are a couple of issues that need to be fixed to make this a well-supported path.

Can you explain the UX for the forthcoming "container" support? Does each code block end up starting a new container? How is the lifecycle of the container managed? Does RunMe manage the container or can I manage it manually?

@adambabik
Collaborator

Can you explain the UX for the forthcoming "container" support? Does each code block end up starting a new container? How is the lifecycle of the container managed? Does RunMe manage the container or can I manage it manually?

This is very limited at the moment and implemented as a proof-of-concept. It is also only available via runme.yaml AFAIK which is experimental on its own. Overall, runme builds a Docker image and then executes a container using the cell as a spec. Many features, like env sharing, are not supported. I described a new proposal in #631.

@sourishkrout
Member

It is also only available via runme.yaml AFAIK which is experimental on its own.

aka runner v2

@jlewi
Contributor Author

jlewi commented Jul 16, 2024

Use Case: Melange

I'm currently working with melange to build apks (apks are basically tarballs and are used to build docker images with Chainguard's toolchain).

melange is containerized. The input is a YAML file and then you use docker to run a container that has melange in it.

I need to use a Cloud Workstation because my local machine is underpowered. I have to ssh into the machine.
So my setup looks like the following.

(setup diagram from the original issue: local machine running vscode, connected over ssh to the Cloud Workstation, where melange runs via docker)

I'd like to run vscode locally and execute commands on both my local machine and my cloud workstation. For example, the basic workflow is as follows (a command-level sketch appears after the list):

  1. Start the cloud workstation and create tunnel (this runs locally)
  2. Run melange (via docker run) (this runs on the workstation)
  3. Make changes to the melange YAML file and push them to git (this runs locally)
  4. Run git pull to pull latest changes into the workstation (this runs on the workstation)
  5. Run melange (this runs on the workstation)
  6. If the cloud workstation is GC'd because it is idle, I need to rerun the commands to set up the workstation
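
A hedged, command-level sketch of that workflow; the workstation/cluster/config names, tunnel port, and the exact melange invocation are illustrative assumptions, not taken from this issue:

```sh
# 1. Start the workstation and open a tunnel (runs locally)
gcloud workstations start my-ws --cluster=my-cluster --config=my-config --region=us-central1
gcloud workstations start-tcp-tunnel my-ws 22 --local-host-port=localhost:2222 \
  --cluster=my-cluster --config=my-config --region=us-central1

# 2. Build the apk with melange via docker (runs on the workstation)
docker run --rm --privileged -v "${PWD}":/work cgr.dev/chainguard/melange build package.yaml

# 3. Edit the melange YAML and push (runs locally)
git commit -am "update package.yaml" && git push

# 4./5. Pull the changes and rebuild (runs on the workstation)
git pull
docker run --rm --privileged -v "${PWD}":/work cgr.dev/chainguard/melange build package.yaml
```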

Melange also has an interactive mode. If it encounters an error in a build process it drops you into a shell so you can inspect the build environment and run commands interactively to try to fix things. In this case I'd like to be able to start a runme executor so I could directly execute commands inside the container.

Exploratory/Dev Mode

I'm using RunMe in an "exploratory/dev" mode. Concretely this means each cell will be authored and executed once. I think this is different from using RunMe to author repeatable playbooks where a cell will be authored once but executed multiple times in different sessions.

I think this distinction is important because it means adding a new cell needs to be fast; comparable to entering a new command in a shell. This is why I don't want to have to configure a cell by using a context menu. That seems OK if you're authoring a cell once and expect it to be executed multiple times, because you can amortize the cost. I'd also like to be able to have different configurations for different sections of a notebook so I don't have to constantly repeat it for each cell if I have a sequence of cells that all need a particular configuration.

@sourishkrout
Member

Picking up on this side-issue first:

Exploratory/Dev Mode

I'm using RunMe in an "exploratory/dev" mode. Concretely this means each cell will be authored and executed once. I think this is different from using RunMe to author repeatable playbooks where a cell will be authored once but executed multiple times in different sessions.

I think this distinction is important because it means adding a new cell needs to be fast; comparable to entering a new command in a shell. This is why I don't want to have to configure a cell by using a context menu. That seems OK if you're authoring a cell once and expect it to be executed multiple times, because you can amortize the cost. I'd also like to be able to have different configurations for different sections of a notebook so I don't have to constantly repeat it for each cell if I have a sequence of cells that all need a particular configuration.

The Notebook UX allows you to run a cell and immediately add a new one with the OPTION+RETURN shortcut (I don't know the non-Mac equivalent offhand). I'm fairly certain we could replicate the previous cell's settings/annotations on this newly inserted cell. I'd imagine that would reduce the explore/dev overhead quite a bit. Wdyt?

Beyond that, I believe the "Runme Terminal" could deliver on this dev/exp mode, where a terminal session could add any previously run command as cell input+output to a notebook side-by-side. Granted, this hinges on making it reliable to discriminate input from output in what will just be a character stream (terminal-session unaware).

@sourishkrout
Member

Use Case: Melange

I'm currently working with melange to build apks (apks are basically tarballs and are used to build docker images with Chainguard's toolchain).

melange is containerized. The input is a YAML file and then you use docker to run a container that has melange in it.

I need to use a Cloud Workstation because my local machine is underpowered. I have to ssh into the machine. So my setup looks like the following.

Outset question: Have you already tried using VS Code's Remote SSH Dev support to attach to the remote cloud workstation? I understand that it won't deliver on the desired hybrid setup; however, I'd be curious to hear how far it'll get you before going deep into the proposed hybrid solution and execution specifics.

@jlewi
Contributor Author

jlewi commented Jul 24, 2024

Re: VSCode Remote.

I've been using that and it works really well. I think this is a good solution when ssh is already set up.

The other situation I've been exploring is when setting up ssh to a machine isn't easy. Concretely, I'm running a prebuilt container on GKE and need to execute commands inside the container. In order to do ssh I'd need to

  1. Set up networking to allow ssh (e.g. Tailscale)
  2. Install the ssh daemon inside the container

Rather than doing that I've been using kubectl cp && kubectl exec to upload files to the container and then execute them. I find that with the notebook UX I'm more willing to write long and multi-line commands than I am inside the terminal. I also think AI (Foyle) could help with some of that verbosity.
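
For context, a minimal sketch of that kubectl cp && kubectl exec pattern; the pod, container, and script names are illustrative placeholders:

```sh
# copy a script into the running container, then execute it there
kubectl cp ./setup.sh my-pod:/tmp/setup.sh -c my-container
kubectl exec my-pod -c my-container -- /bin/sh /tmp/setup.sh
```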

So I think the takeaway is that I'd be hard-pressed to make a strong case that support for different gRPC executor services would be a big unlock. There are probably sufficient ways to work around it right now.

Feel free to close this issue.

@sourishkrout
Member

sourishkrout commented Jul 25, 2024

Re: VSCode Remote.

I've been using that and it works really well. I think this is a good solution when ssh is already set up.

Agreed. SSH is in a lot of places; that's why I usually start here.

The other situation I've been exploring is when setting up ssh to a machine isn't easy. Concretely, I'm running a prebuilt container on GKE and need to execute commands inside the container. In order to do ssh I'd need to

  1. Set up networking to allow ssh (e.g. Tailscale)
  2. Install the ssh daemon inside the container

Rather than doing that I've been using kubectl cp && kubectl exec to upload files to the container and then execute them. I find that with the notebook UX I'm more willing to write long and multi-line commands than I am inside the terminal. I also think AI (Foyle) could help with some of that verbosity.

Gotcha. That helps a lot to understand what's in the way. My stance here is actually that I'd rather build on top of the kubectl & docker CLIs, aka the Kube & dockerd/containerd APIs, than go down the stack to a direct integration via gRPC sockets, i.e. network-level. My main driver is that kubectl has solutions for authn/authz and "remote connectivity" & "remote exec" (e.g. attach a sidecar) in a less low-level way than SSH does, and likely something that could be built on top of. Also, Runme's TLS is a poor person's PKI because it mimics the security model of a UDS where file permissions on one single system protect the socket. We largely did this to support Runme's Parser API on Windows, and going "distributed" will expose us to the paper-cuts that come with it.
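
For reference, the kind of kubectl-level primitive this could build on top of; a hedged sketch with the pod/container names and image as illustrative assumptions:

```sh
# attach an ephemeral debug container to a running pod and get an interactive shell
kubectl debug -it my-pod --image=busybox:1.36 --target=my-container

# or exec directly into an existing container
kubectl exec -it my-pod -c my-container -- /bin/sh
```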

So I think the takeaway is that I'd be hard-pressed to make a strong case that support for different gRPC executor services would be a big unlock. There are probably sufficient ways to work around it right now.

I do agree that, in the short term, we lack the resources to pursue what I'm describing above. However, we have started to make inroads on Docker/container support and will likely continue.

Feel free to close this issue.

Let's keep it open for a bit longer until we've had a chance to harvest some of the "nuggets" in here into narrower follow-up issues.

@jlewi
Contributor Author

jlewi commented Jul 26, 2024

My stance here is actually that I'd rather build on top of the kubectl & docker CLIs, aka the Kube & dockerd/containerd APIs, than go down the stack to a direct integration via gRPC sockets, i.e. network-level.

That's interesting. So your thinking is: if I want to execute a command in a container on K8s (e.g. an ephemeral container), rather than starting RunMe in that container and using gRPC to send the command to that container, RunMe would run locally and use kubectl exec (or the underlying API) to execute commands in that container.

kubectl has solutions for authn/authz and "remote connectivity" & "remote exec" (e.g. attach a sidecar) in a less low-level way than SSH does, and likely something that could be built on top of. Also, Runme's TLS is a poor person's PKI

Do you need RunMe to directly support network authn/authz? My assumption was that if RunMe just exposes an HTTP endpoint, then customers likely already have ways to do authz for access to this endpoint at the network layer. For example, I use Tailscale.

Long Running Commands over Flaky Connections

One problem I've been hitting with vscode over ssh is when I fire off long-running commands. Concretely, I'm doing a make build and that build can take hours. A lot of the time the vscode connection seems to go down and I have to restart/reconnect vscode over ssh. As far as I can tell right now, if I was running make build in a RunMe cell and vscode gets disconnected, the command is terminated and I'm not able to reconnect. I generally use screen in this case. I haven't tried running screen with RunMe so I don't know if that'd work or if there is some other way to start it as a background process.
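
For reference, the kind of screen-based workaround meant here, run in the remote shell; the session name and log file are illustrative:

```sh
# start the build detached in a named screen session so it survives disconnects
screen -dmS build bash -c 'make build 2>&1 | tee build.log'

# after reconnecting: reattach to the session, or just follow the log
screen -r build
tail -f build.log
```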

More generally, if I use vscode over ssh, what does the UX end up looking like if I want to

  1. Launch long running command(s) in a remote machine
  2. Close my laptop
  3. Reconnect later to get the results
