Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New integration: Spotinst #5922

Merged
merged 17 commits into from Oct 15, 2018
Merged

Conversation

liranp
Copy link
Contributor

@liranp liranp commented Oct 11, 2018

This PR adds Spotinst as an extension of the existing cloud providers. As part of this implementation, AWS Auto Scaling Groups have been replaced with Spotinst Elastigroups.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 11, 2018
Copy link
Contributor

@chrisz100 chrisz100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liranp - much better!
Some remarks below.

@@ -139,6 +139,10 @@ type CreateClusterOptions struct {
// We can remove this once we support higher versions.
VSphereDatastore string

// Spotinst options
SpotinstProduct string
SpotinstOrientation string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference here? No worries about implementation, but I just want to understand what these things affect.

Copy link
Contributor Author

@liranp liranp Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are cluster-wide options compared to those labels which can be used to toggle on/off some features at the instance group level.

SpotinstProduct is the product description associated with the Spot Instance while SpotinstOrientation is the prediction strategy (Valid values: balanced, cost, equal-distribution and availability. Defaults to balanced).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document this at some point within kops as well? If we just add this and tell people that well if you want to use spotinst look at their docs could be a bad user experience. Just a few words with a link to a more detailed explanation at your own documentation would be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Same here - can you add this to our documentation? A few words and a link to more details on your website would be more then enough!


if featureflag.SpotinstIntegration.Enabled() {
// Spotinst flags
cmd.Flags().StringVar(&options.SpotinstProduct, "spotinst-product", options.SpotinstProduct, "Set the product code.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What product codes do you offer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the product description associated with the Spot Instance (see productDescription).

Valid values:

  • Linux/UNIX
  • Linux/UNIX (Amazon VPC)
  • Windows
  • Windows (Amazon VPC)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a reference to this into the help output there? Without I see many people feeling lost and as it's just four that should be quite straight forward to show them the options

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisz100 Thanks, fixed.

@@ -76,8 +76,10 @@ func awsValidateMachineType(fieldPath *field.Path, machineType string) field.Err
allErrs := field.ErrorList{}

if machineType != "" {
if _, err := awsup.GetMachineTypeInfo(machineType); err != nil {
allErrs = append(allErrs, field.Invalid(fieldPath, machineType, "machine type specified is invalid"))
for _, typ := range strings.Split(machineType, ",") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

machine type seemed to be a single machine string. how can this be multiple types at the same time now? Where does this come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotinst Elastigroup supports multiple instance types (and even weights, if needed), for example:

{
  "group": {
    ...
    "compute": {
      ...
      "instanceTypes": {
        "ondemand": "c3.large",
        "spot": [
          "c3.large",
          "c4.large",
          "m3.large",
          "r3.large"
        ]
      }
      ...
    }
    ...
  }
}

By leveraging multiple instance types (and/or availability zones), Spot Instances are far less volatile as there is a far greater chance that Spotinst will find an unused instance for one of the fitting instance types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested this with kops and if that doesn't backfire when not using spotinst?

Copy link
Contributor Author

@liranp liranp Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works fine for me after installing this fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@@ -77,6 +77,9 @@ var GoogleCloudBucketAcl = New("GoogleCloudBucketAcl", Bool(false))
// EnableNodeAuthorization enables the node authorization features
var EnableNodeAuthorization = New("EnableNodeAuthorization", Bool(false))

// SpotinstIntegration toggles the use of Spotinst integration.
var SpotinstIntegration = New("SpotinstIntegration", Bool(false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this simpler - just call it "spotinst" - as you activate handling via spotinst and don't activate the possibility to use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

// Cloud config.
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this { for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


SecurityLifecycle: &securityLifecycle,
})
if featureflag.SpotinstIntegration.Enabled() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just leave the deleted part intact and overwrite it if spotinst is active. That way we always have it set, no matter what happens and spotinst will just overload it.

@chrisz100
Copy link
Contributor

chrisz100 commented Oct 11, 2018

Also, please have a look at travis as the formatting is incorrect with upup/pkg/fi/cloudup/apply_cluster.go
FAIL: please execute make gofmt

@chrisz100
Copy link
Contributor

/assign @chrislovecnm @justinsb

@chrisz100
Copy link
Contributor

/ok-to-test

Thanks, appreciate it @liranp !

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 11, 2018
LaunchConfigurationName: input.LaunchConfigurationName,
LoadBalancerNames: input.LoadBalancerNames,
MaxSize: input.MaxSize,
MinSize: input.MinSize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI go 1.11 changed gofmt, and we're still validating against go 1.10 gofmt rules (until k8s/kops 1.13 which moves to 1.11)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinsb Thanks, fixed.

// Cloud config.
if cfg := b.Cluster.Spec.CloudConfig; cfg != nil {
// Product.
if cfg.SpotinstProduct != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It looks like you don't need the nil check here (nor for SpotinstOrientation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! Fixed.

group.UtilizeReservedInstances = fi.Bool(true)
} else if v == "false" {
group.UtilizeReservedInstances = fi.Bool(false)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A warning here is the value is "bad" would probably save you a lot of head-scratching. I typically like to strings.TrimSpace & strings.ToLower. Same for InstanceGroupLabelFallbackToOnDemand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ig.Spec.MaxSize != nil {
maxSize = *ig.Spec.MaxSize
} else if ig.Spec.Role == kops.InstanceGroupRoleNode {
maxSize = 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "pure" AWS uses 2 here - any reason to switch the default behaviour?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinsb don't we have some file to store default values like these? I am not a big fan of having default values like that somewhere in the models. Makes the behaviour so unpredictable in some cases.

Copy link
Contributor Author

@liranp liranp Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (same as the AWS implementation).

} else if v == "false" {
autoScalerDisabled = false
}
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have a function to parse these boolean labels (not a blocker)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[[projects]]
branch = "master"
digest = "1:b410da742bf89fa5f255ef1d95bdf42d2a263a0c8f35272b8d5b4faff49017cf"
name = "github.com/spotinst/spotinst-sdk-go"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a LICENSE to this library please? I know Apache 2 is OK for kubernetes projects, not sure what other licenses are OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -325,6 +336,10 @@ func NewEC2Filter(name string, values ...string) *ec2.Filter {

// DeleteGroup deletes an aws autoscaling group
func (c *awsCloudImplementation) DeleteGroup(g *cloudinstances.CloudInstanceGroup) error {
if c.spotinst != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to make it hard to "graduate" spotinst from using a feature-flag - we'll probably need a field somewhere. But we can get this in and then figure it out.. One option that might help is that AWSCloud is an interface, so it might be possible to have a spotinst implementation (even if it delegated most of the functionality to an awsCloudImplementation)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to @justinsb here - flagging spotinst to be active is one way of using it, but in the future it would be nice to just have it as a separate provider you select, that then falls back to the cloud if no special implementation was given. From what I understand you just use your own autoscaling implementation, but that also won't change that much for other clouds as you abstract that away in your own api's?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisz100 You're right about our implementation. I'm not sure I'm understanding your suggestion. How it differs from the implementation introduced in #5195 (where we just wrapped the AWS implementation)?

if lc.UserData != nil {
userData, err := base64.StdEncoding.DecodeString(*lc.UserData)
if err != nil {
return nil, fmt.Errorf("spotinst: error decoding user data: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I think (hope) when this is printed it'll be printed alongside the failing task, so you likely don't need the spotinst: bit. But definitely not a blocker - our error messages repeat themselves a lot already :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinsb
Copy link
Member

Looks really great @liranp - I'm guessing we should do this before #5195.

A few nits, and some small questions, but really great stuff!

One blocker though: I'm reasonably sure we need a license on the library we're adding as a dependency. Ideally it would be Apache 2, though I think others are permitted. (I went looking for the list of approved licenses and couldn't immediately find it.) Do you want to add a license to github.com/spotinst/spotinst-sdk-go - and will it be Apache :-) ?

@@ -98,7 +99,7 @@ func (b *AutoscalingGroupModelBuilder) Build(c *fi.ModelBuilderContext) error {
},
IAMInstanceProfile: link,
ImageID: s(ig.Spec.Image),
InstanceType: s(ig.Spec.MachineType),
InstanceType: s(strings.Split(ig.Spec.MachineType, ",")[0]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test what happens if you have just one or multiple but not with default aws behaviour here as well?

@chrisz100
Copy link
Contributor

As the license is in there now:
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 12, 2018
@chrisz100
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2018
@liranp
Copy link
Contributor Author

liranp commented Oct 12, 2018

Thank you @chrisz100 and @justinsb for being so responsive!

@chrisz100
Copy link
Contributor

@liranp this PR is.the only thing on the list before cutting the 1.11 alpha this weekend. Looking forward to it!

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2018
Copy link
Contributor

@chrisz100 chrisz100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2018
@justinsb
Copy link
Member

/approve
/lgtm

Thanks @liranp

🎉

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrisz100, justinsb, liranp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2018
@chrisz100
Copy link
Contributor

/retest

@justinsb
Copy link
Member

Opened test-infra issue for flake, as it is weird: kubernetes/test-infra#9805

@k8s-ci-robot k8s-ci-robot merged commit 1fbc633 into kubernetes:master Oct 15, 2018
@liranp
Copy link
Contributor Author

liranp commented Oct 16, 2018

Thank you @chrisz100 and @justinsb, much appreciated!

@liranp liranp mentioned this pull request Feb 13, 2019
2 tasks
@liranp liranp deleted the feature-spotinst-aws branch January 22, 2020 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants