Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the ECS RunTask API to run attached processes #1043

Merged
merged 1 commit into from
Mar 14, 2017
Merged

Use the ECS RunTask API to run attached processes #1043

merged 1 commit into from
Mar 14, 2017

Conversation

ejholmes
Copy link
Contributor

@ejholmes ejholmes commented Mar 1, 2017

Not ready to merge.

This is a prototype for switching emp run to use the ECS RunTask API, instead of calling out to Docker directly. The primary motivation behind this is to allow containers started with emp run to use AWS Roles for ECS tasks. This is described in https://github.com/remind101/empire/wiki/RFC:-Use-RunTask-API-for-attached-runs.

There's some serious caveats with this implementation at the moment:

  1. Because we use AttachToContainer, the container needs to be started with the Tty and OpenStdin flags set to true. Unfortunately, the ECS API and the ECS agent don't currently provide a method to pass these down. We would currently need to fork the agent.
  2. I haven't yet addressed backwards compatibility. It may be a good idea to provide a flag to use the old method for running attached runs entirely through Docker.

On the plus side, this sets up some infrastructure for adding an emp exec/attach command for starting up a process in an existing container, or attaching to a running process.

TODO

  • Figure out how to pass down Tty/OpenStdin flags.
  • Add the ability to limit attached runs to a group of hosts.
  • Enable this feature with a flag.
  • Add ability to limit normal containers to a group of hosts.
  • Docs
    • Expiremental
    • Document Docker TLS requirements
    • Document need for ECS agent patch.

@ejholmes
Copy link
Contributor Author

ejholmes commented Mar 1, 2017

To expand a bit on the idea of adding a method to provide the Tty and OpenStdin flags to CreateContainer in the ECS agent, now that Go 1.8 is released and has support for plugins, we could add a feature in the ECS agent for "task plugins", to manipulate the task before starting. I'll open an issue on the amazon-ecs-agent repo to see if there would be any interest in supporting something like that.

@phobologic
Copy link
Contributor

Oh that would be awesome if they were down for it.

@ejholmes ejholmes force-pushed the ecs-run branch 2 times, most recently from a66a9f2 to 390fb78 Compare March 7, 2017 01:57
@@ -57,6 +57,8 @@ const (
FlagECSServiceRole = "ecs.service.role"
FlagECSLogDriver = "ecs.logdriver"
FlagECSLogOpts = "ecs.logopt"
FlagECSAttachedEnabled = "ecs.attached.enabled"
FlagECSAttachedPlacement = "ecs.attached.placement"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also allow you to set placement constraints for normal procs and detached procs. If not (and assuming you want to restrict attached processes to a group of hosts), normal tasks could get scheduled onto the hosts that you've designated for attached runs.


// Wait for the task to start running. It will stay in the
// PENDING state while the container is being pulled.
if err := m.ecs.WaitUntilTasksRunning(&ecs.DescribeTasksInput{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually wait until it's RUNNING or STOPPED. If you run something that returns quickly (e.g. emp run echo hello world), the task could hit STOPPED before this method had a change to actually see it was running. It should be fine to AttachToContainer on a stopped task, since the ECS agent waits 3 hours to cleanup containers.

@ejholmes
Copy link
Contributor Author

Ok, @phobologic. If you wanna give this another look, I think this is probably good to go. We've been running this on staging, without any hitches so far.

Primary changes since the last review:

  1. Added docs/caveats about the new method, and requirements for patching the ECS agent.
  2. Fixed some bugs with attaching to tasks that enter RUNNING for a brief period of time.
  3. Stripped out support for placement constraints. This requires a more thoughtful implementation, and isn't necessary for most use cases. I'll follow up on this later.
  4. Added an Attaching to task/<task id>... message, since we no longer see the Docker pull output (yay!).

The first commit can just be ignored, since it's just an update to the AWS SDK.

@ejholmes
Copy link
Contributor Author

Noticed some zombie procs lying around. This should probably try a StopTask after detaching.

Copy link
Contributor

@phobologic phobologic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far!

CHANGELOG.md Outdated
@@ -8,6 +8,7 @@
* An ALIAS record is now created for `<process>.<app>.<zone>` [#1005](https://github.com/remind101/empire/pull/1005)
* You can now provide a `-p` flag to the `emp cert-attach` command to attach a certificate to a specific process (instead of just `web`). [#1014](https://github.com/remind101/empire/pull/1014)
* Empire now supports a SAML authentication backend. [#1017](https://github.com/remind101/empire/pull/1017)
* Empire supports a new (experimental) feature to enabled attached processes to be ran with ECS. [#1043](https://github.com/remind101/empire/pull/1043)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to enabled -> to enable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll update.

@@ -310,6 +311,11 @@ var EmpireFlags = []cli.Flag{
Usage: "Log driver to options. Maps to the --log-opt docker cli arg",
EnvVar: "EMPIRE_ECS_LOG_OPT",
},
cli.BoolFlag{
Name: FlagECSAttachedEnabled,
Usage: "When enabled, indicates that ECS tasks can be attached to, using `docker attach`. When provided, this will also use ECS to run attached processes.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this requiring the patched agent, it might be worth pointing at the doc that details what this means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I'll update.

*ecs.ECS
}

func (c *ECS) WaitUntilTasksNotPending(input *ecs.DescribeTasksInput) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my linter is going to complain about no docstring here :)

// Amazon ECS Container Agent, since the official agent doesn't
// provide a method to pass these down to the `CreateContainer`
// call.
containerDefinition.DockerLabels["docker.config.Tty"] = aws.String("true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tty flag allocates a psuedo TTY for the container, and OpenStdin opens the stdin stream.

Without OpenStdin, running something like emp run bash would just exit as soon as bash hit the prompt. Without Tty, I think most commands that expect a TTY/terminal wouldn't work (If I try docker run -i ubuntu bash (missing -t flag), my terminal just hangs).

Copy link
Contributor

@phobologic phobologic Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 ah, right - not sure why I didn't tie this to docker run --it


// Wait for the task to start running. It will stay in the
// PENDING state while the container is being pulled.
if err := m.ecs.WaitUntilTasksNotPending(&ecs.DescribeTasksInput{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we need some sort of timeout here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will timeout after 10 minutes (6 second delay between refreshes, and 100 retries).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// instance where the task is running.
d, err := m.NewDockerClient(ec2Instance)
if err != nil {
return fmt.Errorf("error establishing docker client connection to %s", aws.StringValue(ec2Instance.InstanceId))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this bubble up with the task id, etc? Might want to put some more info in this so it's easier to figure out what happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not atm. I'll add that information to the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it does already show this, since it's output above the error (Attaching to task/<task id>...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but this is totally missing the err.

@ejholmes ejholmes force-pushed the ecs-run branch 3 times, most recently from aef931d to 2f792d9 Compare March 14, 2017 03:36
@ejholmes
Copy link
Contributor Author

This has been working out pretty well for us in our staging environment without any major issues/bugs. Gonna go ahead and merge.

@ejholmes ejholmes merged commit 5bcfd21 into master Mar 14, 2017
@ejholmes ejholmes deleted the ecs-run branch March 14, 2017 04:31
@ejholmes ejholmes mentioned this pull request Sep 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants