
Add Exec to the gateway API. #749

Closed
tonistiigi opened this issue Dec 11, 2018 · 23 comments

Comments

@tonistiigi
Member

gw.Exec should invoke a command similarly to what frontend.Exec does atm. The main difference from Solve is that Exec is never cached. It should be able to take the same arguments that llb.Exec takes (including the special mounts). More complex cases like ReadFile could be implemented with Exec.

The mounts of Exec should take References from completed sources as inputs.

@sipsma
Collaborator

sipsma commented Apr 7, 2020

To clarify the intent here, if you also added a feature to connect the Exec'd container's stdio to a pty, would this issue enable an "interactive" container? I.e. a container specified by LLB that spawns a usable shell/editor/other TTY-based program?

@tonistiigi
Member Author

Yes, I think there is a need to make stdio accessible through Exec. Not 100% sure about the pty, as it is somewhat of a trap.

For stdio in regular LLB the current proposal is to use #1337: llb.Expose(llb.Image(""), llb.Pipe(1, 0))

@hinshun

@tonistiigi
Member Author

Thinking about it more, I think we definitely need the ability for the user to interact with a build result like a regular container. This is needed for debug scenarios. Maybe we don't necessarily need the frontend to have this access? And if the frontend does use Exec, it should show up as a line item in build output (probably with stdio logs).

@hinshun
Collaborator

hinshun commented Jun 26, 2020

We previously started the conversation on Slack, but I'm paraphrasing it here to save the ideas somewhere. cc @coryb

  • At a high-level we want clients to be able to Exec into a container, outside of LLB and not cached.
  • Using this new primitive, you can then build a debugging experience on top, by Execing the result of a failed solve.
  • Let's begin by brainstorming about the UX around Exec; consider the following scenarios and the possible usage.

Scenario 1: Exec from regular LLB

execState := llb.Image("busybox").Run(llb.Shlex("/bin/sh"))

// Approach #1a
p := execState.ToExecProcess()
p.Run(ctx, io, tty)

// Approach #1b
execState.Run(ctx, io, tty)

Scenario 2: Exec into failed solve result

var st llb.State
def, _ := st.Marshal(ctx)

res, err := c.Solve(ctx, client.SolveRequest{Definition: def.ToPB()})
if err != nil {
	var er *ExecError
	if errors.As(err, &er) {
        	// Approach #2a
		c.Exec(ctx, er.Meta(), er.ErrorStateMounts(), io, tty)

		// Approach #2b
		p := ToExecProcess(er)
		p.Run(ctx, io, tty)
	}
}

Note that Approach #2b still needs Approach #2a to be implemented; #2b is an object-oriented UX layered on top of #2a.

How can we make this happen? (Ongoing list)

  • Add Exec to the LLBBridge in gateway.proto:
    service LLBBridge {
  • Modify Executor and ExecOp to not release mounts after error, and return them inside ExecError.
    • If mounts are unhandled, they are released with the gateway api release

@coryb
Collaborator

coryb commented Jul 1, 2020

I am hoping we can agree on a proto definition for this feature; once we have general agreement, I should be able to work out a rough implementation.

As I see it, we will need two gRPC components:

  1. the Exec rpc call to the LLBBridge service
  2. IO bridge streaming service to facilitate stdio and possibly resize events

1 - Exec RPC

I think something like this will work:

service LLBBridge {
	rpc Exec(ExecRequest) returns (ExecResponse)
}

message ExecRequest {
	pb.ExecOp op = 1;
	repeated uint32 openFds = 2;
}

message ExecResponse {
    // What goes here?  Anything?  Task Exit Status?
}

I think pb.ExecOp already has all the information we would need to implement the exec operation via the gateway. The openFds would be a list of fds to open/close for the task. Open fds will be connected to the IO bridge streaming service. We will also need a tty flag, and for that I suggest we modify pb.Meta to include that flag as:

message Meta {
	repeated string args = 1;
	repeated string env = 2;
	string cwd = 3;
	string user = 4;
	ProxyEnv proxy_env = 5;
	repeated HostIP extraHosts = 6;
	bool tty = 7;
}

since that matches the structure already available in the executor.Meta.

2 - IO bridge service

We need some way to stream input/output to the container task. I suggest creating a grpc service on the client side that can be accessed from the server via the session manager, similar to the Attachable SSHForwarder implementation. This is the proto I was thinking of:

service IOBridge {
	rpc IO(stream FdMessage) returns (stream IOInput)
}

message IOInput {
	oneof input {
		FdMessage file = 1;
		ResizeMessage resize = 2;
	}
}

message FdMessage{
	uint32 fd = 1; // what fd the data was from
	bool eof = 2;  // true if eof was reached
	bytes data = 3;
}

message ResizeMessage{
	uint32 rows = 1;
	uint32 cols = 2;
	uint32 xpixel = 3;
	uint32 ypixel = 4;
}

Thoughts? Adjustments? Counter proposals?

@tonistiigi
Member Author

tonistiigi commented Jul 2, 2020

how about

service LLBBridge {
  rpc NewContainer(NewContainerRequest) returns (NewContainerResponse);
  rpc ReleaseContainer(ReleaseContainerRequest) returns (ReleaseContainerResponse);
  rpc ExecProcess(stream ExecMessage) returns (stream ExecMessage);
}

message NewContainerRequest {
	string Ref = 1;
	// For mount input values we can use random identifiers passed with ref
	repeated pb.Mount mounts = 2;
	pb.NetMode network = 3;
	pb.SecurityMode security = 4;
}

message ReleaseContainerRequest {
	string Ref = 1;
}


message ExecMessage {
	oneof input {
		InitMessage init = 1;
		FdMessage file = 2;
		ResizeMessage resize = 3;
		StartedMessage started = 4;
		ExitMessage exit = 5;
	}
}

message InitMessage{
	pb.Meta Meta = 1;
	repeated uint32 fds = 2;
	bool tty = 3;
	// ?? way to control if this is PID1? probably not needed
}

message ExitMessage {
	uint32 Code = 1;
}

message FdMessage{
	uint32 fd = 1; // what fd the data was from
	bool eof = 2;  // true if eof was reached
	bytes data = 3;
}

message ResizeMessage{
	uint32 rows = 1;
	uint32 cols = 2;
	uint32 xpixel = 3;
	uint32 ypixel = 4;
}

I added a "container" concept to support multiple exec processes. Not sure if it's needed initially, but probably better to be safe for future ideas. The complex part is that it does not allow reusing the current executor directly; eg. runc needs to be invoked with create/start/exec calls instead of a single run. Or we mark one process as pid1, and then I think we can use run+exec.

Sending pb.Meta inside the giant oneof object is objectively ugly, but this is a grpc limitation: we can't pass an initial message on streaming endpoints (unless with unsafe context metadata).
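A server-side loop for this stream would switch on the oneof variant of each incoming message. A minimal sketch, with local structs standing in for the generated proto types (real generated code would expose GetInit(), GetFile(), etc. instead of a type switch):

```go
package main

import "fmt"

// Local stand-ins for the oneof variants of ExecMessage above.
type InitMessage struct {
	Args []string
	Fds  []uint32
	TTY  bool
}
type FdMessage struct {
	Fd   uint32
	EOF  bool
	Data []byte
}
type ResizeMessage struct{ Rows, Cols uint32 }
type ExitMessage struct{ Code uint32 }

// handle dispatches one message from the bidirectional stream,
// mirroring how a server loop would switch on the oneof input field.
func handle(msg interface{}) string {
	switch m := msg.(type) {
	case InitMessage:
		return fmt.Sprintf("start process %v (tty=%v)", m.Args, m.TTY)
	case FdMessage:
		return fmt.Sprintf("forward %d bytes to fd %d", len(m.Data), m.Fd)
	case ResizeMessage:
		return fmt.Sprintf("resize pty to %dx%d", m.Cols, m.Rows)
	case ExitMessage:
		return fmt.Sprintf("process exited with code %d", m.Code)
	default:
		return "unknown message"
	}
}

func main() {
	msgs := []interface{}{
		InitMessage{Args: []string{"/bin/sh"}, TTY: true},
		FdMessage{Fd: 0, Data: []byte("ls\n")},
		ExitMessage{Code: 0},
	}
	for _, m := range msgs {
		fmt.Println(handle(m))
	}
}
```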

@tonistiigi
Member Author

These are the changes in the Executor go API proposed in slack

type ProcessInfo struct {
  Meta           Meta
  Stdin          io.ReadCloser
  Stdout, Stderr io.WriteCloser
  // tty, resize
}
type Executor interface {
  Run(ctx context.Context, ref string, rootfs cache.Mountable, mounts []Mount, process *ProcessInfo, allowExec bool) error
  Exec(ctx context.Context, ref string, process ProcessInfo) error
}

Some investigation is needed into whether both runc and containerd support exec without setting the pid1 process. If not, then the process info passed to Run is not a pointer, or the first exec becomes pid1.

Outline:

# Approach 1
1. (LLBBridge).NewContainer
2. (Executor).Run(process=nil, allowExec=true)
3. (LLBBridge).ExecProcess()
4. Send InitMessage(meta, fd, tty)
	4.a. Wait for StartedMessage
5. (Executor).Exec(process{meta, fd, tty})
	5.a. Send StartedMessage
6. Send/recv FdMessage/ResizeMessage
7. (LLBBridge).ReleaseContainer
	7.a. Wait for ExitMessage
8. (Executor) - cancels context
	8.a. Sends ExitMessage
# Approach 2
1. (LLBBridge).NewContainer
2. (LLBBridge).ExecProcess()
3. Send InitMessage()
	3.a. Wait for StartedMessage
4. If first InitMessage: (Executor).Run()
   Else: (Executor).Exec()
	4.a. Send StartedMessage
5. Send/recv FdMessage/ResizeMessage
6. (LLBBridge).ReleaseContainer
	6.a. Wait for ExitMessage
7. (Executor) - cancels context
	7.a. Sends ExitMessage

I think the executor changes can be done in a separate PR before the changes in proto API with some unit tests to ease the review.

@coryb
Collaborator

coryb commented Jul 8, 2020

Runc vs Containerd questions

These seem to be the relevant questions that might influence the implementation:
a) Can we create a container without defining pid1?
b) Can we exec into a container without pid1 running?
c) Can we attach IO to pid1 after it has started?

Runc

a) No, config.json must have args where the first arg is validated to be in PATH when container is created (not started)
b) Yes, runc exec will automatically create a pid1 of runc init if nothing else is running
c) No, you cannot create then start a container and then attach IO to pid1. Further if terminal=true in config.json you cannot call runc create, only runc run.

Containerd

a) Yes, you do not have to explicitly define a pid1 (will assume defaults from image)
b) No, container must be running before exec, pid1 is required
c) Yes, IO for pid1 can be attached to after started

Proposal

So it seems that for a common implementation we will have to assume "No" to all of these questions.

After some discussion with @hinshun this seems like a reasonable approach:

=> NewContainer(ref, rootfs, mounts, network, security) message

  • server records this data (in memory, bolt?), containerd or runc is not called at this point

=> ExecProcess.InitMessage(ref, meta, IO) message

  • containerd:
    • If first:
      • (containerd.Store).Create(container = {meta + recorded data})
      • (containerd.Container).NewTask(IO)
    • If !first:
      • (containerd.Task).Exec(meta, IO)
  • runc:
    • If first:
      • (runc).Run(meta + recorded data, IO)
    • If !first:
      • (runc).Exec(meta, IO)
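The "if first / if !first" branching above can be sketched as a small dispatcher that routes the first InitMessage to Run (pid1) and subsequent ones to Exec. This is illustrative only; containerState, dispatchInit, and execFn are hypothetical names, not the proposed API:

```go
package main

import (
	"fmt"
	"sync"
)

// containerState tracks whether the first InitMessage has already
// started pid1: per the proposal, the first init becomes Run (pid1),
// later inits become Exec into the running container.
type containerState struct {
	mu      sync.Mutex
	started bool
}

// execFn stands in for the executor's Run/Exec entry points.
type execFn func(args []string) string

// dispatchInit routes an init: Run for the first, Exec afterwards.
func (c *containerState) dispatchInit(args []string, run, exec execFn) string {
	c.mu.Lock()
	first := !c.started
	c.started = true
	c.mu.Unlock()
	if first {
		return run(args)
	}
	return exec(args)
}

func main() {
	run := func(args []string) string { return fmt.Sprintf("Run(pid1): %v", args) }
	exec := func(args []string) string { return fmt.Sprintf("Exec: %v", args) }
	var c containerState
	fmt.Println(c.dispatchInit([]string{"/bin/sh"}, run, exec))
	fmt.Println(c.dispatchInit([]string{"ps"}, run, exec))
}
```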

@tonistiigi
Member Author

@coryb LGTM

@hinshun
Collaborator

hinshun commented Jul 8, 2020

This is a summary of some discussion on slack.

How do we specify the inputs for pb.Mount?

When a solve is completed, a ref is returned.

ref, err := c.Solve(ctx, client.SolveRequest{...})

Under the hood, the proto message has a unique identifier.

message Ref {
	string id = 1;
	pb.Definition def = 2;
}

We can extend the proto for mount, so that we can reference this string id instead of an input index:

message Mount {
	# ...
	string inputRef = 6;
}
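The idea of resolving a mount either by input index (regular LLB) or by the proposed inputRef (gateway container case) can be sketched like this. The Mount struct and resolveMountSource are simplified local stand-ins for the proto types, not actual buildkit code:

```go
package main

import "fmt"

// Mount is a simplified stand-in for pb.Mount with the proposed field.
type Mount struct {
	Input    int    // index into the op's inputs; -1 means unset
	InputRef string // proposed: direct reference to a solved result id
}

// resolveMountSource prefers inputRef when set, falling back to the
// usual input-index lookup otherwise.
func resolveMountSource(m Mount, inputs []string, refs map[string]string) (string, error) {
	if m.InputRef != "" {
		src, ok := refs[m.InputRef]
		if !ok {
			return "", fmt.Errorf("unknown ref %q", m.InputRef)
		}
		return src, nil
	}
	if m.Input < 0 || m.Input >= len(inputs) {
		return "", fmt.Errorf("invalid input index %d", m.Input)
	}
	return inputs[m.Input], nil
}

func main() {
	refs := map[string]string{"ref-123": "snapshot-busybox"}
	src, _ := resolveMountSource(Mount{InputRef: "ref-123", Input: -1}, nil, refs)
	fmt.Println(src) // snapshot-busybox
}
```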

What would client usage look like? (pseudo-code)

In this scenario, we have some LLB that runs make on busybox.

rootfs := llb.Image("busybox")
src := llb.Git("git://mygit")

st := rootfs.Run(
	llb.Shlex("make"),
	llb.Dir("/in"),
	llb.Mount("/in", src),
).Root()

However, we want to run it as an exec instead of a solve.

rootfsRef, _ := c.Solve(rootfs.Marshal().ToPB())

srcRef, _ := c.Solve(src.Marshal().ToPB())

execRef := uuid.New()

c.NewContainer(client.NewContainerRequest{
	Ref: execRef,
	Mounts: []mount{
		mount{dest: "/", ref: rootfsRef},
		mount{dest: "/in", ref: srcRef},
	},
})

To make this easier, we may be able to wrap this into a method for llb.ExecState:

rootfs.Run(
	llb.Shlex("make"),
	llb.Dir("/in"),
	llb.Mount("/in", src),
).ToNewContainer()

The ToNewContainer function will be blocking as the solves are being run under the hood.

Alternatively, we could carry the LLB definition in NewContainerRequest so that the solve does not need to be client side, but we'll need the (pb.Mount).inputRef anyway for the error state scenario.

ProgressUI

It would be nice for the progress UI to indicate that exec is being called. There are two ways we thought of:

  1. Determine whether it's long-running by scanning completion times.
  2. Adding a request based status indicator that's sent through statusCh.

This happens at a slightly higher level than the client code, but the user's display should switch between the progress UI and directing stdio to the exec process. Perhaps something like this:

[+] Building 6.6s (5/5) IDLE
 => CACHED foobar           0.0s
 => EXEC /bin/sh
/ # 
/ # (^P^Q)
...reloading
[+] Building 1.6s (5/5) IDLE
 => CACHED barbaz           0.0s
 => mkfile /msg22           0.0s
 => EXEC /bin/sh
/ # 

@coryb
Collaborator

coryb commented Aug 5, 2020

For the client side, afaict it seems like we need to modify the Client interface in frontend/gateway/client.

I was thinking something like this:

type Client interface {
  ...
  NewContainer(ctx context.Context, containerID string, rootFs pb.Mount, opts ...ContainerOpt) error
  NewProcess(ctx context.Context, containerID string, proc executor.ProcessInfo) (status uint32, err error)
  ReleaseContainer(ctx context.Context, containerID string) error
}

type ContainerOpt func(*pb.NewContainerRequest)

The ContainerOpt would be used for adding extra non-rootfs mounts and for setting securityMode and netMode:

func WithMount(pb.Mount) ContainerOpt
func WithSecurityMode(pb.SecurityMode) ContainerOpt
func WithNetMode(pb.NetMode) ContainerOpt

Does this seem like the right approach?
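The functional-option shape proposed here can be sketched self-contained as below. Types are simplified local stand-ins (the real code would use pb.NewContainerRequest, pb.Mount, etc.), and newRequest is an illustrative constructor:

```go
package main

import "fmt"

// Simplified stand-ins for the pb types referenced above.
type Mount struct{ Dest string }
type NewContainerRequest struct {
	Mounts       []Mount
	NetMode      string
	SecurityMode string
}

// ContainerOpt mutates the request, matching the proposed signature.
type ContainerOpt func(*NewContainerRequest)

func WithMount(m Mount) ContainerOpt {
	return func(r *NewContainerRequest) { r.Mounts = append(r.Mounts, m) }
}

func WithNetMode(mode string) ContainerOpt {
	return func(r *NewContainerRequest) { r.NetMode = mode }
}

// newRequest seeds the request with the rootfs mount, then applies
// each option in order.
func newRequest(rootfs Mount, opts ...ContainerOpt) NewContainerRequest {
	req := NewContainerRequest{Mounts: []Mount{rootfs}}
	for _, o := range opts {
		o(&req)
	}
	return req
}

func main() {
	req := newRequest(Mount{Dest: "/"},
		WithMount(Mount{Dest: "/in"}),
		WithNetMode("host"))
	fmt.Println(len(req.Mounts), req.NetMode) // 2 host
}
```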

@coryb
Collaborator

coryb commented Aug 5, 2020

On further thought, even though the code path for each process is similar, I think we need to separate pid1 from exec, since if pid1 dies all the exec processes will also die. To be consistent with the gateway API, maybe rename NewProcess to ExecProcess for the exec usage, but require an executor.ProcessInfo in the NewContainer call. So on the client side, NewContainer sends the NewContainer message and then the InitMessage to start pid1, and only returns when pid1 exits. So maybe something more like this:

type Client interface {
  ...
  NewContainer(ctx context.Context, containerID string, rootFs pb.Mount, proc executor.ProcessInfo, opts ...ContainerOpt) error
  ExecProcess(ctx context.Context, containerID string, proc executor.ProcessInfo) error
  ReleaseContainer(ctx context.Context, containerID string) error
}

It also seems like the return status is unnecessary, since we can just wrap the error with executor.ExitError.

@hinshun
Collaborator

hinshun commented Aug 6, 2020

This looks good to me. I would just change WithMount(pb.Mount) to WithMounts(...pb.Mount) and proc executor.ProcessInfo to init executor.ProcessInfo.

@tonistiigi
Member Author

tonistiigi commented Aug 6, 2020

I'm not sure where to draw the boundary between the llb client and gateway client packages. Maybe the gateway client should mostly copy the protobuf methods, except instead of using the mount ID it would use the Reference type as it does today. Then we don't even need the variadic parameters. But maybe these can be combined into one, and then gateway.client can be more expressive.

The reason for having NewContainer without starting the process was that there could be a synchronization point between the build and starting the process. Eg. this is important for switching client stdio from the buildkit progress bar to the attached process. I'd use the same started channel for making sure pid1 is running.

For the API that users should use I'd expect more user-friendliness. Eg.

c := container()
c.run()
c.release()

Previously we also discussed ExecState.ToContainer()

@coryb
Collaborator

coryb commented Aug 7, 2020

I think a minimal protobuf-ish api would look something like:

	NewContainer(ctx context.Context, req *pb.NewContainerRequest) error
	ReleaseContainer(ctx context.Context, req *pb.ReleaseContainerRequest) error
	ExecProcess(ctx context.Context, containerID string, process executor.ProcessInfo, started chan<- struct{}) error

Not sure we can get much more minimal than that.

To populate the new pb.Mount.resultID I was expecting we would modify the Reference interface to add an ID func like:

type Reference interface {
	...
	ID(ctx context.Context) (string, error)
}

So gateway client code would roughly look something like:

	c.Build(ctx, client.SolveOpt{}, "", func(ctx context.Context, gc gateway.Client) (*gateway.Result, error) {
		st := llb.Image("alpine")
		def := st.Marshal(ctx, llb.LinuxAmd64)
		res := gc.Solve(ctx, gateway.SolveRequest{
			Definition: def.ToPB(),
		})
		rootID := res.Ref.ID(ctx)
		id := identity.NewID()
		gc.NewContainer(ctx, &gateway.NewContainerRequest{
			ContainerID: id,
			Mounts: []pb.Mount{{
				Dest:      "/",
				ResultID:  rootID,
				MountType: pb.MountType_BIND,
			}},
		})
		defer gc.ReleaseContainer(ctx, &gateway.ReleaseContainerRequest{id})
		meta := executor.Meta{
			Args: []string{"/bin/sh"},
			Cwd: "/",
			Tty: true,
		}
		gc.ExecProcess(ctx, id, executor.Process{meta, os.Stdin, os.Stdout, os.Stderr, resizeCh}, startedCh)
		return &gateway.Result{}, nil
	}, displayCh)

@coryb
Collaborator

coryb commented Aug 7, 2020

I was also thinking we could add more user-friendly wrappers to client.Client and/or the ExecState.ToContainer() stuff a bit later as necessary. I think getting gateway.Client in will allow us to experiment a bit in HLB without immediate further changes, although I am sure a few simple patterns will emerge pretty quickly and warrant adding to buildkit.

@tonistiigi
Member Author

Actually, looking at that example, I don't think it makes sense to create wrapping (except ExecState) for gateway.client, as that is what the user has access to. I'd still also prefer using the ref object directly and leaving the string ID hidden.

wdyt?

	c.Build(ctx, client.SolveOpt{}, "", func(ctx context.Context, gc gateway.Client) (*gateway.Result, error) {
		st := llb.Image("alpine")
		def := st.Marshal(ctx, llb.LinuxAmd64)
		res := gc.Solve(ctx, gateway.SolveRequest{
			Definition: def.ToPB(),
		})

		ctr := gc.NewContainer(ctx, &gateway.NewContainerRequest{
			// no ID
			Mounts: []gateway.Mount{{
				Dest:      "/",
				Ref:       res.Ref,
				MountType: pb.MountType_BIND,
			}},
		})
		defer ctr.Release(context.TODO())
		meta := executor.Meta{
			Args: []string{"/bin/sh"},
			Cwd:  "/",
			Tty:  true,
		}
		err := ctr.Run(ctx, executor.Process{meta, os.Stdin, os.Stdout, os.Stderr, resizeCh}, startedCh)
		// could also be ctr.Start() / ctr.Wait(); Start does not take startedCh then
		return &gateway.Result{}, nil
	}, displayCh)

@coryb
Copy link
Collaborator

coryb commented Aug 7, 2020

Okay, works for me. So this is what we are looking at adding to the gateway.client package:

type Client interface {
	...
	NewContainer(ctx context.Context, req NewContainerRequest) (Container, error)
}

type NewContainerRequest struct {
	Mounts []Mount
	NetMode pb.NetMode
	SecurityMode pb.SecurityMode
}

type Mount struct {
	Selector string
	Dest string
	Ref Reference
	Readonly bool
	MountType pb.MountType
}

// Option A)
type Container interface {
	Run(context.Context, executor.ProcessInfo, started <-chan struct{}) error
}

// Option B)
type Container interface {
	Start(context.Context, executor.ProcessInfo) (ContainerProcess, error)
}

type ContainerProcess interface {
	Wait() error
	// B.1) Anything else useful here?
}

Looking at the two options for process management, I am leaning towards option B as it seems easier to use. Not sure if there are any useful abstractions to add for B.1, although without more it is conveniently implemented by errgroup.Group, which I would likely just return from Container.Start.

@tonistiigi
Copy link
Member Author

@coryb sgtm

Missing release on container:

type Container interface {
	Start(context.Context, executor.ProcessInfo) (ContainerProcess, error)
	Release(context.Context)
}

With B we could add sending signals later. Could also do resize as a method there, unless you want to reuse the type from the executor package.

@coryb
Collaborator

coryb commented Aug 7, 2020

@tonistiigi Yeah, I was originally thinking to use the executor.ProcessInfo directly to deal with io and resize, but now I am leaning towards adding Resize to the ContainerProcess.

Last question (for now): Should I add these changes to #1627 along with tests? Or we can merge #1627 without tests (I think I need these gateway client updates to effectively test the server side) and I create a new PR for client + tests?

@tonistiigi
Member Author

Last question (for now): Should I add these changes to #1627 along with tests? Or we can merge #1627 without tests (I think I need these gateway client updates to effectively test the server side) and I create a new PR for client + tests?

I guess depends on how long the client-side takes. #1627 would be somewhat easier to review if there is some code actually calling the new functions.

@coryb
Collaborator

coryb commented Aug 7, 2020

Sounds good, I will update #1627, should be able to get it done next week.

@tonistiigi
Member Author

#1627 is merged and follow-up issues have been opened separately so closing this.
