
Feature: Spot Fleet support for worker nodes #112

Closed · 7 tasks done
mumoshu opened this issue Nov 30, 2016 · 17 comments

Comments

mumoshu commented Nov 30, 2016

Quite self-explanatory, but I'd like to add this to kube-aws.

Upstream issue: kubernetes/kubernetes#24472

Initial Implementation in this project: #113
Documentation: https://github.com/coreos/kube-aws/blob/master/Documentation/kubernetes-on-aws-node-pool.md#deploying-a-node-pool-powered-by-spot-fleet

Spot-fleet-backed worker nodes have been supported since v0.9.2-rc.3:

# Launch a main cluster
kube-aws init ...
kube-aws render
kube-aws up ...

# Launch a node pool powered by Spot Fleet
kube-aws node-pools init --node-pool-name mypoolname ...
echo -e "worker:\n  spotFleet:\n    targetCapacity: 3\n" >> node-pools/mypoolname/cluster.yaml
kube-aws node-pools render --node-pool-name mypoolname
kube-aws node-pools up --node-pool-name mypoolname --s3-uri ...
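
For reference, the worker.spotFleet section appended to node-pools/mypoolname/cluster.yaml ends up looking roughly like the sketch below; the optional keys are illustrative, so check the generated cluster.yaml template and the documentation linked above for the exact syntax:

# node-pools/mypoolname/cluster.yaml (sketch; optional keys are illustrative)
worker:
  spotFleet:
    targetCapacity: 3
    # Optional tuning (verify key names against the generated template):
    # spotPrice: "0.06"
    # launchSpecifications:
    # - weightedCapacity: 1
    #   instanceType: m3.medium
    # - weightedCapacity: 2
    #   instanceType: m3.large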

An experimental feature to automatically taint nodes with user-provided taints is supported since v0.9.2-rc.4 (not yet released), so we can ensure that only pods tolerant of frequent node terminations are scheduled to spot instances / spot-fleet-powered nodes.
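
For illustration, tainting a spot-fleet-powered node pool could look like the sketch below (the experimental key names may differ between releases, so treat this as an example rather than the final syntax):

# node-pools/mypoolname/cluster.yaml (sketch; experimental keys may change)
experimental:
  taints:
  - key: dedicated
    value: spotFleet
    effect: NoSchedule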


Utilizing Spot Fleet gives us a chance to dramatically reduce the cost spent on the EC2 instances powering Kubernetes worker nodes.
AWS says the cost reduction is up to 90%. I can confirm that in ap-northeast-1, the region I use daily, it is up to 89% right now, with the cost varying slightly for each instance type.

I believe that, on top of the recent work on Node Pools #46, it is easier than ever to implement a POC of Spot Fleet support.
I'll send a pull request to show it shortly.
I'd appreciate your feedback!

Several concerns I've come up with so far:

  • cluster-autoscaler doesn't support Spot Fleets
    • If you want to make nodes in a spot fleet auto-scaled, you probably need to tinker resulting CloudFormation templates to include appropriate configuration. See https://aws.amazon.com/jp/blogs/aws/new-auto-scaling-for-ec2-spot-fleets/ for the official announcement of autoscaling for fleets.
    • Upstream issue: Cluster-autoscaler: AWS EC2 Spot Fleets support contrib#2066
      • We need to teach cluster-autoscaler how it selects which node pool to expand
        • It shouldn't select an ASG that is suspended, or a spot fleet for which all or some of the instance groups are priced beyond the bid
        • If a pending node could be scheduled onto either an ASG or a spot fleet, it should select one according to user preference
  • It seems there's no way to use cfn-signal, like we did for standard ASG-based worker nodes, to hold back CloudFormation's creation/update completion until e.g. kubelets become ready
  • I'm not yet sure how we can rolling-update nodes in a spot fleet like we do for standard ASG-based worker nodes
  • I'm assuming users already have the aws-ec2-spot-fleet-role IAM role in their AWS accounts, which gets created automatically once they access Spot Fleet in the AWS Console at least once
    • But if a user hasn't, kube-aws node-pools up will fail, relaying a CloudFormation error like "IAM role aws-ec2-spot-fleet-role doesn't exist", which may be useless to the user because it doesn't tell them that they need to visit Spot Fleet in the AWS console at least once
    • We could create such an IAM role ourselves, as described in https://cloudonaut.io/3-simple-ways-of-saving-up-to-90-of-ec2-costs/, instead of assuming/referencing the possibly existing IAM role (see the sketch after this list)
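
A sketch of what creating that service role with CloudFormation could look like, should we go that route (the resource definition and managed policy name are illustrative and should be verified against current AWS documentation):

# Hypothetical CloudFormation snippet for a Spot Fleet service role
Resources:
  SpotFleetServiceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: aws-ec2-spot-fleet-role
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
        - Effect: Allow
          Principal:
            Service: spotfleet.amazonaws.com
          Action: sts:AssumeRole
      ManagedPolicyArns:
      # Verify the exact managed policy name against current AWS docs
      - arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole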

TODOs:

mumoshu commented Dec 1, 2016

Edited several times to cover more TODOs and concerns.

mumoshu commented Dec 5, 2016

Addressed the integration tests in f46c711#diff-17c3b4ff0a8d67faed426a76a03f8430R1

mumoshu commented Dec 5, 2016

I'm going to add support for node labels and taints in another pull request(s).

The context is that if we don't need to mix various types of nodes but just need to use Spot Fleets, labels and taints are not required. So I believe I can cut #113 now and deliver it, so that we can start supporting some use cases.

For another use case like "I want to mix various types of nodes for blah-blah-blah", I can author another pull request addressing labels and taints.

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 6, 2016
See the updated nodepool/config/templates/cluster.yaml for a detailed configuration guide.

This is the initial implementation for kubernetes-retired#112
Beware that this feature may change in backward-incompatible ways

mumoshu commented Dec 6, 2016

The initial implementation for this is now merged into master.
I'm going to work on experimental support for adding custom and probably automatic node labels and taints to node pools next.

mumoshu commented Dec 6, 2016

Btw: I don't mean to show off, but my personal project https://github.com/mumoshu/kube-spot-termination-notice-handler would be useful for anyone who wants to gracefully stop pods running on spot instances when your spot fleet loses a bid.

Deploying it to your spot instances allows you to automatically run kubectl drain on the node two minutes before its termination, giving you more time to gracefully reschedule pods.
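
The general idea boils down to something like the following minimal sketch (not the handler's actual implementation; it assumes kubectl is configured on the node and NODE_NAME holds the node's name):

#!/bin/bash
# Poll the EC2 spot termination notice and drain this node when one appears.
while true; do
  # The metadata endpoint returns 404 until a termination is scheduled,
  # so curl -f only succeeds roughly 2 minutes before termination.
  if curl -sf http://169.254.169.254/latest/meta-data/spot/termination-time > /dev/null; then
    kubectl drain "$NODE_NAME" --ignore-daemonsets --force
    break
  fi
  sleep 5
done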

mumoshu commented Dec 7, 2016

Though I initially thought it would be nice to add, do we really need a feature to add user-provided labels to worker nodes?
I'm not aware of exact use cases for user-provided node labels, as we already have taints to implement dedicated nodes.

pieterlange (Contributor) commented:

I think this is useful; operators might want to restrict pod scheduling to certain node pools because of node capabilities or security domains.

Example use case: I have the majority of nodes in private subnets, but I start a few in public subnets because they need to directly expose some service to the internet. With node labels I restrict those pods to the public nodes.

Ref http://kubernetes.io/docs/user-guide/node-selection/
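
For illustration (the label key here is made up), this boils down to labeling the public-subnet nodes, e.g. kubectl label node <node-name> subnet=public, and then pinning the internet-facing pods to them with a matching nodeSelector:

# Sketch of a pod restricted to labeled public nodes
apiVersion: v1
kind: Pod
metadata:
  name: internet-facing-service
spec:
  nodeSelector:
    subnet: public
  containers:
  - name: app
    image: nginx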

cknowles commented Dec 7, 2016

Do we know the difference between a node selector on a pod and a taint toleration on a pod? It seems like they could achieve similar things as far as I can see in the taint design docs.

mumoshu changed the title from "Proposal: Spot Fleet support" to "Feature: Spot Fleet support for worker nodes" on Dec 8, 2016

mumoshu commented Dec 8, 2016

I'm now looking into an issue where worker nodes brought up from a spot-fleet-enabled node pool often fail to register themselves.

More specifically, if you've created 2 or more nodes backed by a spot fleet, only one of them is registered. Increasing TargetCapacity, and hence adding nodes, seems to consistently result in spot instances being launched successfully but their corresponding nodes remaining unregistered.

kubelet does report that it successfully registered the node. However, immediately after that, kubelet starts complaining that the node it just registered cannot be found.
Running systemctl restart kubelet.service on a problematic node doesn't fix the issue.
Also, it isn't limited to just one of the spot instances in a spot fleet.
It doesn't happen in autoscaling-group-based node pools at all.
Nonsense!


For now, I suspect that missing KubernetesCluster tags on spot instances result in this behavior. That tag is a prerequisite for Kubernetes to work on AWS, according to the upstream docs.

Edit: Bingo! Putting a tag named KubernetesCluster on a problematic spot instance and running systemctl restart kubelet.service worked. The node status became Ready approximately 20 seconds after kubelet restarted.

I'll shortly submit a pull request to address this.

Maybe I'll utilize the quay.io/coreos/awscli Docker image to run a command almost the same as the one @innovia described in his comment on the upstream issue.

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)  # look up this instance's ID via EC2 instance metadata
aws ec2 create-tags --region {{.Region}} --resources $INSTANCE_ID \
  --tags "Key=KubernetesCluster,Value={{.ClusterName}}"

mumoshu commented Dec 8, 2016

Btw, I just noticed that Experimental.LoadBalancer.Names is not taken into account for (i.e. completely ignored in) spot-fleet-backed node pools. Added to TODOs.
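
For context, the configuration in question looks roughly like the sketch below (key names should be verified against the cluster.yaml template), and it currently has no effect when the pool is backed by a spot fleet:

# cluster.yaml (sketch; verify key names against the generated template)
experimental:
  loadBalancer:
    enabled: true
    names:
    - existing-elb-name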

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 8, 2016

mumoshu commented Dec 9, 2016

@pieterlange In that case, wouldn't you prefer to use taints rather than labels? IMHO taints are more fail-safe than labels.

If you've used taints to implement dedicated nodes, pods missing tolerations won't be scheduled onto those nodes, so you can ensure that only the desired pods are scheduled to the desired nodes. On the other hand, if you've used labels, pods missing node selectors will end up with a completely useless deployment: pods get distributed over both private and public nodes.

mumoshu commented Dec 9, 2016

@c-knowles Both taints and labels could be used to select a subset of nodes to schedule pods onto.
But labels seem to be a bit less fail-safe than taints when used to implement dedicated nodes, because they don't reject pods missing node selectors.

mumoshu commented Dec 10, 2016

IMHO it is perfectly fine to use node labels for purposes other than dedicated nodes (i.e. nodes reserved for specific pods).
So I'm not opposed to adding support for user-provided node labels to kube-aws.

An example use case for node labels could be running administrative tasks on a subset of nodes.
In such a case, tolerations could be used in combination with node labels so that users could optionally include dedicated nodes in the subset.
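
For example (a sketch with made-up label and taint keys), such an administrative pod could select nodes by label while also tolerating the dedicated-node taint, so that dedicated nodes are included in the eligible subset:

apiVersion: v1
kind: Pod
metadata:
  name: admin-task
spec:
  # Run only on nodes carrying this (made-up) label...
  nodeSelector:
    role: admin-tasks
  # ...and also tolerate the taint used for dedicated nodes,
  # so those nodes become eligible as well.
  tolerations:
  - key: dedicated
    operator: Equal
    value: spotFleet
    effect: NoSchedule
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "echo running administrative task && sleep 3600"]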

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 14, 2016
This complements Node Pools (kubernetes-retired#46) and Spot Fleet support (kubernetes-retired#112)

The former `experimental.nodeLabel` configuration key is renamed to `experimental.awsNodeLabels` to avoid a collision with the newly added `experimental.nodeLabels` and for consistency with `experimental.awsEnvironment`.
mumoshu added this to the v0.9.3-rc.1 milestone on Dec 15, 2016

mumoshu commented Dec 15, 2016

All the remaining TODOs are going to be addressed in v0.9.3-rc.1

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 16, 2016
… spot fleet

to conform these nodes to the ones powered by an autoscaling group

ref kubernetes-retired#112
see also http://docs.aws.amazon.com/cli/latest/reference/ec2/create-tags.html and http://stackoverflow.com/a/1250279 for implementation details
mumoshu added a commit to mumoshu/kube-aws that referenced this issue Dec 16, 2016

mumoshu commented Dec 16, 2016

All the TODOs have been addressed.


mumoshu commented Dec 19, 2016

Closing this issue, as the initial iterations to deliver the feature have finished.

mumoshu closed this as completed on Dec 19, 2016
kylehodgetts pushed commits to HotelsDotCom/kube-aws that referenced this issue on Mar 27, 2018