-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ECS placement constraints/strategy #1059
Conversation
One thing worth mentioning is that, I can see cases where you might want these to be runtime configurable per environment. For example, you might have a CPU intensive task that you want to run on a I think, in these cases, the best thing to do is setup "named profiles" using container instance attributes, that map to their production/staging equivalents. For example, in the scenario above, I might add a web:
ecs:
placement_constraints:
- { type: memberOf, expression: "attribute:profile = cpu" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I saw this last night that was going to be my first comment - it seems likely that we'd want to control placement with config, rather than with the code.
I think your method for dealing with that is probably a good one, especially since it keeps things simple.
Only suggestion here is to at least update the Procfile documentation (btw, I don't think we have a Procfile specific doc - if not, its probably getting complicated enough it deserves it's own page :)).
Could you also put your suggestion for managing environment specific placements in that doc?
default: | ||
err = fmt.Errorf("%s is not supported", req.RequestType) | ||
} | ||
func (p *InstancePortsResource) Create(_ context.Context, req customresources.Request) (string, interface{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man - does your linter not freak out at you for not having comments describing exported methods? It gets annoying - I'm wondering if I should just find a way to disable it, or if we should try and get better about listening to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I disabled that a long time ago because you just end up with a bunch of Thing does thing
doc comments to appease the linter.
Good call. I'll update the procfile doc with information on that. The Procfile isn't documented in docs/ but there's a readme in ./procfile where all the options are documented, which the docs point to. |
Another thing to think about here is that, if you're including a bunch of different instance types in a single ECS cluster, you probably want services that don't specify any placement constraints to default to some constraints, so they don't go bouncing around a bunch of different instance types. Some kind of first class "profile" support, that maps to some ECS placement constraints might be a better way to go initially. So, when booting Empire, you'd give it a number of named "profiles", which map to ECS placement constraints, and then you specify a default profile. For example: {
"default": {
"placement_constraints": [
{ "type": "memberOf", "expression": "attributes:ecs.instance-type = m3.*" }
]
},
"memory": {
"placement_constraints": [
{ "type": "memberOf", "expression": "attributes:ecs.instance-type = x1.*" }
]
}
} Users would be able to define what "profile" to use for a process in the Procfile: worker:
profile: memory Or they could change this at runtime: $ emp scale worker --profile memory -a <app> Or provide it via $ emp run ./bin/job --profile memory -a <app> Having a simple "named profile" provides users with a lot of simplicity, but still gives operators enough power to define as many profiles as needed, and document those within their org. Food for thought 🤔 |
c0dca8e
to
bbf2b3b
Compare
Alright, I'm feeling pretty good about this one now. I'll start doing some more testing, and giving it a whirl on our staging env. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good - it's a big change (seems like there was a bit of cleanup in this PR as well?).
return nil, fmt.Errorf("unable to unmarshal placement constraints: %v", err) | ||
} | ||
log.Println(fmt.Sprintf(" DefaultPlacementConstraints: %v", placementConstraints)) | ||
return twelvefactor.Transform(s, setDefaultPlacementConstraints(placementConstraints)), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man, so totally unrelated to this review in particular, but this finally helped me nail down why I really dislike the GoLang pattern of using 1-2 character variables. Take s
for example here. In the review, I can't see what s is without expanding the lines above twice. If it had been named scheduler
, I wouldn't even have to expand it, it would have made perfect sense in it's own context, which makes for easier code reviews. Python has it right ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, this overall method is getting a little long. Worth refactoring to break it up into smaller methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can see how that could make reviewing difficult. I tend to just follow the language communities preference for things like this. In the case of Go: https://github.com/golang/go/wiki/CodeReviewComments#receiver-names.
In regards to breaking it up, there's not much within this that's re-usable, so breaking it up might make it hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't get their dislike of descriptive names. Since when did we run out of bytes for descriptive code?! :)
cmd/empire/main.go
Outdated
@@ -334,6 +335,12 @@ var EmpireFlags = []cli.Flag{ | |||
EnvVar: "EMPIRE_ECS_DOCKER_CERT_PATH", | |||
}, | |||
cli.StringFlag{ | |||
Name: FlagECSPlacementConstraintsDefault, | |||
Value: "", | |||
Usage: "ECS placement constraints to set when a process does not set any.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you include a "If this is not set, then this will happen" bit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'll expand on this before it gets merged.
proc := Process{ | ||
Quantity: 1, | ||
} | ||
var proc Process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about this method - it's getting a little long, worth breaking it up?
}); err != nil { | ||
return fmt.Errorf("error pulling image: %v", err) | ||
} | ||
for _, p := range app.Processes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a pretty big change - there will only ever be a single process, right? Is this just short circuiting needing the p
since that's already represented as an array in app.Processes
and it only has a single value when Run
is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda addressed this below, but the overall change is basically a noop, since Empire core only ever passes 1 process, so the end result is the same. It does mean that Empire core could pass multiple processes to run them at once, but primarily this was done just to simplify the method signature since the manifest already contains all the process information.
@phobologic I just rebased this into smaller logical commits if it makes reviewing a little easier. The actual change to implement ECS placement constraints and strategies is in the last commit, which is relatively small. Commits 1-4 probably don't need too much attention, since they're mostly internal changes to make the last commit easier.
I'll follow up on the rest of the review comments once we've had a chance to test this in our staging env. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked at commit 5 and yeah, looks good.
scheduler/cloudformation/template.go
Outdated
"Expression": c.Expression, | ||
}) | ||
} | ||
serviceProperties["PlacementConstraints"] = placementConstraints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be added to the task definition, rather than the service. Reason being, this isn't updatable on a service, which would require a replacement of the service. If we do it on the task definition, then no replacement required. Placement strategies can only be applied to the service, so that would need to remain.
…visioner. Previously, these resources managed the logic for performing resource replacement themselves. After this change, they are based on the `privisioner` type, which makes it simpler to determine what requires a replacement by using a hash.
This commit simplifies the `twelvefactor.Scheduler.Run` interface method to only take a `twelvefactor.Manifest` instead of both a `twelvefactor.Manifest` and `twelvefactor.Process`. The manifest contains the processes that should be run. The commit also changes the `emp run` functionality, so that when a "named" command is run, it will use the stored process configuration (e.g. constraints) instead of the defaults.
This commit simplifies the `twelvefactor.Scheduler.Restart` method to only pass the App ID, instead of a full manifest. The existing implementation required the full manifest for the legacy ECS scheduler, which is no longer required in the CloudFormation world.
This commit adds a `twelvefactor.Transform` method, which returns a wrapped `twelvefactor.Scheduler` that will transform the `twelvefactor.Manifest` before passing it to the downstream scheduler. This can be used to, for example, add default placement constraints to processes that don't define any.
This commit adds support for specifying ECS placement constraints and placement strategies in the extended Procfile format.
This is working pretty nicely in our staging environment. I'll go ahead and pull this in and fix any issues the crop up before the next release. |
This adds support to the procfile for specifying ECS placement constraints and placement strategies.
I cherry-picked #941 into this, since it makes it easier, so may be better to only look at the second commit.