Support spot fleet requests and container instance draining #40

Merged
merged 9 commits into from Nov 29, 2018

@abicky
Member

abicky commented Nov 26, 2018

This PR introduces EcsDeploy::AutoScaler::SpotFleetRequestConfig to support spot fleet requests.
In addition, I've implemented container instance draining (EcsDeploy::AutoScalerInstanceDrainer). This feature supports spot instances in both spot fleet requests and auto scaling groups.

The README diff should give you a good summary of this PR.

The main changes are the following commits:

@abicky abicky force-pushed the feature/support-spot-fleet-request branch from 50a0307 to 013305b Nov 26, 2018

@abicky abicky force-pushed the feature/support-spot-fleet-request branch from 013305b to ef68076 Nov 26, 2018

@abicky abicky requested review from joker1007 and YAMASHITAHiroki Nov 26, 2018

@abicky abicky self-assigned this Nov 26, 2018

@abicky abicky added the review label Nov 26, 2018

@abicky

Member Author

abicky commented Nov 26, 2018

@joker1007 @YAMASHITAHiroki Please review.


def calculate_active_instance_capacity(cluster)
  cl = ecs_client
  total_cpu = cl.list_container_instances(status: "ACTIVE").sum do |resp|

@joker1007

joker1007 Nov 27, 2018

Collaborator

Why don't you use the wait_until method with the :spot_instance_request_fulfilled waiter?

@abicky

abicky Nov 27, 2018

Author Member

Unfortunately, the :spot_instance_request_fulfilled waiter is for spot instance requests, not spot fleet requests 😢
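
For illustration only (this is not taken from the PR's diff): when no built-in waiter fits, one option is a small polling loop. The sketch below assumes an EC2 client object that responds to describe_spot_fleet_requests and returns configs exposing activity_status, as the AWS SDK for Ruby does. The method name wait_until_spot_fleet_fulfilled and the max_attempts and delay parameters are hypothetical.

```ruby
# Hypothetical helper: polls a spot fleet request until its activity status
# becomes "fulfilled". Returns true on success, false if attempts run out.
# `ec2_client` is assumed to behave like Aws::EC2::Client.
def wait_until_spot_fleet_fulfilled(ec2_client, request_id, max_attempts: 40, delay: 15)
  max_attempts.times do
    resp = ec2_client.describe_spot_fleet_requests(spot_fleet_request_ids: [request_id])
    config = resp.spot_fleet_request_configs.first
    return true if config && config.activity_status == "fulfilled"

    sleep delay # wait before asking again
  end
  false
end
```

A caller could raise or log when the helper returns false, depending on how strictly the scaling loop should treat an unfulfilled request.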

@joker1007

joker1007 Nov 27, 2018

Collaborator

I see. Thanks!!

@joker1007
Collaborator

joker1007 left a comment

I have one question.

asg_config.update_auto_scaling_group(total_service_count, configs[0])
asg_config.detach_and_terminate_orphan_instances(configs[0])
required_capacity = configs.inject(0) { |sum, s| sum + s.desired_count * s.required_capacity }
cluster_scaling_config.update_desired_capacity(required_capacity, configs[0])
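
To make the required_capacity line above concrete, here is a minimal sketch of the same inject expression on made-up data. The ServiceConfig struct, the service names, and the numbers are assumptions for illustration; only the desired_count and required_capacity attribute names come from the snippet.

```ruby
# Hypothetical stand-in for a service config; only the two attributes used
# by the snippet above are modelled.
ServiceConfig = Struct.new(:name, :desired_count, :required_capacity)

# Sums desired_count * required_capacity over all service configs.
def total_required_capacity(configs)
  configs.inject(0) { |sum, s| sum + s.desired_count * s.required_capacity }
end

configs = [
  ServiceConfig.new("web", 4, 0.5),    # 4 tasks, each needing 0.5 units
  ServiceConfig.new("worker", 2, 1.5), # 2 tasks, each needing 1.5 units
]

puts total_required_capacity(configs) # prints 5.0
```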

@joker1007

joker1007 Nov 27, 2018

Collaborator

I think there is one problem.

It seems the autoscaler modifies the desired count of the ECS service before it modifies the desired capacity of the spot fleet.
If the autoscaler decreases the desired count of the ECS service, spot fleet instances are drained after ECS tasks are terminated.
But we cannot control which instances are drained. In other words, instances that still have a running task may be drained.
As a result, instance draining may terminate more ECS tasks than intended.

What do you think about it?

@abicky

abicky Nov 27, 2018

Author Member

You're right, but I don't think it matters in practice, for the following reasons:

  • The cluster has enough resources to launch extra tasks when the autoscaler decreases the desired task count of the ECS service, so new tasks will usually be launched within 1 minute even if some container instances are drained.
  • The service is likely to have a few extra tasks when a downscale is triggered, so it still has enough tasks even if some are temporarily terminated by instance draining.

However, I am concerned that an upscale might be triggered too easily after instance draining. I think users of the autoscaler should adjust their triggers in that case.

@joker1007

joker1007 Nov 27, 2018

Collaborator

OK, this problem is acceptable.
I think we should prioritize keeping the program simple.
Thanks!!

@joker1007
Collaborator

joker1007 left a comment

LGTM

end

ths.each(&:join)

drainer&.stop

@YAMASHITAHiroki

YAMASHITAHiroki Nov 28, 2018

https://github.com/reproio/ecs_deploy/pull/40/files#diff-a841075766b0523896658ad8659a093aR39

I think this raises an exception if @config["spot_instance_intrp_warns_queue_urls"] is nil.

irb(main):001:0> drainer&.stop
Traceback (most recent call last):
        2: from /Users/hirokiyamashita/.rbenv/versions/2.5.1/bin/irb:11:in `<main>'
        1: from (irb):1
NameError (undefined local variable or method `drainer' for main:Object)
irb(main):002:0>

@abicky

abicky Nov 28, 2018

Author Member

I don't think so. drainer will be nil if @config["spot_instance_intrp_warns_queue_urls"] is nil.
For example, the following code outputs nil.

if false
  v = 1
end
p v&.to_s
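
The difference between the two cases can be sketched as follows. Ruby defines a local variable as soon as the parser sees an assignment to it, even inside a branch that never runs, so the variable is nil rather than undefined. FakeDrainer and both method names are hypothetical stand-ins, not code from this PR.

```ruby
# Hypothetical stand-in for the drainer; only #stop matters here.
class FakeDrainer
  attr_reader :stopped

  def stop
    @stopped = true
  end
end

# The assignment below is parsed even when the branch is skipped, so
# `drainer` is a defined local holding nil and `&.` safely does nothing.
def run_with_optional_drainer(queue_urls)
  if queue_urls
    drainer = FakeDrainer.new
  end
  drainer&.stop
  drainer
end

# With no assignment anywhere in scope, the name is treated as a method
# call, which raises NameError (the irb case shown earlier).
def never_assigned
  missing_drainer&.stop
rescue NameError => e
  e.class
end

p run_with_optional_drainer(nil) # prints nil
p never_assigned                 # prints NameError
```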

@YAMASHITAHiroki

YAMASHITAHiroki Nov 29, 2018

Sorry, you are right.

@YAMASHITAHiroki

YAMASHITAHiroki left a comment

LGTM 👍

@abicky abicky added ready and removed review labels Nov 29, 2018

@abicky

Member Author

abicky commented Nov 29, 2018

Thanks for your reviews!

@abicky abicky merged commit 5f3a6b4 into master Nov 29, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@abicky abicky deleted the feature/support-spot-fleet-request branch Nov 29, 2018
