New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New integration: Spotinst #5922
New integration: Spotinst #5922
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @liranp - much better!
Some remarks below.
@@ -139,6 +139,10 @@ type CreateClusterOptions struct { | |||
// We can remove this once we support higher versions. | |||
VSphereDatastore string | |||
|
|||
// Spotinst options | |||
SpotinstProduct string | |||
SpotinstOrientation string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference here? No worries about implementation, but I just want to understand what these things affect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are cluster-wide options compared to those labels which can be used to toggle on/off some features at the instance group level.
SpotinstProduct
is the product description associated with the Spot Instance while SpotinstOrientation
is the prediction strategy (Valid values: balanced
, cost
, equal-distribution
and availability
. Defaults to balanced
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document this at some point within kops as well? If we just add this and tell people that well if you want to use spotinst look at their docs could be a bad user experience. Just a few words with a link to a more detailed explanation at your own documentation would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Same here - can you add this to our documentation? A few words and a link to more details on your website would be more then enough!
cmd/kops/create_cluster.go
Outdated
|
||
if featureflag.SpotinstIntegration.Enabled() { | ||
// Spotinst flags | ||
cmd.Flags().StringVar(&options.SpotinstProduct, "spotinst-product", options.SpotinstProduct, "Set the product code.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What product codes do you offer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the product description associated with the Spot Instance (see productDescription).
Valid values:
- Linux/UNIX
- Linux/UNIX (Amazon VPC)
- Windows
- Windows (Amazon VPC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put a reference to this into the help output there? Without I see many people feeling lost and as it's just four that should be quite straight forward to show them the options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisz100 Thanks, fixed.
@@ -76,8 +76,10 @@ func awsValidateMachineType(fieldPath *field.Path, machineType string) field.Err | |||
allErrs := field.ErrorList{} | |||
|
|||
if machineType != "" { | |||
if _, err := awsup.GetMachineTypeInfo(machineType); err != nil { | |||
allErrs = append(allErrs, field.Invalid(fieldPath, machineType, "machine type specified is invalid")) | |||
for _, typ := range strings.Split(machineType, ",") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
machine type seemed to be a single machine string. how can this be multiple types at the same time now? Where does this come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotinst Elastigroup supports multiple instance types (and even weights, if needed), for example:
{
"group": {
...
"compute": {
...
"instanceTypes": {
"ondemand": "c3.large",
"spot": [
"c3.large",
"c4.large",
"m3.large",
"r3.large"
]
}
...
}
...
}
}
By leveraging multiple instance types (and/or availability zones), Spot Instances are far less volatile as there is a far greater chance that Spotinst will find an unused instance for one of the fitting instance types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this with kops and if that doesn't backfire when not using spotinst?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works fine for me after installing this fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
pkg/featureflag/featureflag.go
Outdated
@@ -77,6 +77,9 @@ var GoogleCloudBucketAcl = New("GoogleCloudBucketAcl", Bool(false)) | |||
// EnableNodeAuthorization enables the node authorization features | |||
var EnableNodeAuthorization = New("EnableNodeAuthorization", Bool(false)) | |||
|
|||
// SpotinstIntegration toggles the use of Spotinst integration. | |||
var SpotinstIntegration = New("SpotinstIntegration", Bool(false)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep this simpler - just call it "spotinst" - as you activate handling via spotinst and don't activate the possibility to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
} | ||
|
||
// Cloud config. | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this { for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
upup/pkg/fi/cloudup/apply_cluster.go
Outdated
|
||
SecurityLifecycle: &securityLifecycle, | ||
}) | ||
if featureflag.SpotinstIntegration.Enabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just leave the deleted part intact and overwrite it if spotinst is active. That way we always have it set, no matter what happens and spotinst will just overload it.
Also, please have a look at travis as the formatting is incorrect with |
/assign @chrislovecnm @justinsb |
4bae2a0
to
1a11498
Compare
/ok-to-test Thanks, appreciate it @liranp ! |
631942e
to
a5bf55e
Compare
LaunchConfigurationName: input.LaunchConfigurationName, | ||
LoadBalancerNames: input.LoadBalancerNames, | ||
MaxSize: input.MaxSize, | ||
MinSize: input.MinSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI go 1.11 changed gofmt, and we're still validating against go 1.10 gofmt rules (until k8s/kops 1.13 which moves to 1.11)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinsb Thanks, fixed.
a5bf55e
to
3750ab4
Compare
// Cloud config. | ||
if cfg := b.Cluster.Spec.CloudConfig; cfg != nil { | ||
// Product. | ||
if cfg.SpotinstProduct != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It looks like you don't need the nil check here (nor for SpotinstOrientation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right! Fixed.
group.UtilizeReservedInstances = fi.Bool(true) | ||
} else if v == "false" { | ||
group.UtilizeReservedInstances = fi.Bool(false) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: A warning here is the value is "bad" would probably save you a lot of head-scratching. I typically like to strings.TrimSpace & strings.ToLower. Same for InstanceGroupLabelFallbackToOnDemand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ig.Spec.MaxSize != nil { | ||
maxSize = *ig.Spec.MaxSize | ||
} else if ig.Spec.Role == kops.InstanceGroupRoleNode { | ||
maxSize = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "pure" AWS uses 2 here - any reason to switch the default behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinsb don't we have some file to store default values like these? I am not a big fan of having default values like that somewhere in the models. Makes the behaviour so unpredictable in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed (same as the AWS implementation).
} else if v == "false" { | ||
autoScalerDisabled = false | ||
} | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have a function to parse these boolean labels (not a blocker)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[projects]] | ||
branch = "master" | ||
digest = "1:b410da742bf89fa5f255ef1d95bdf42d2a263a0c8f35272b8d5b4faff49017cf" | ||
name = "github.com/spotinst/spotinst-sdk-go" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a LICENSE to this library please? I know Apache 2 is OK for kubernetes projects, not sure what other licenses are OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -325,6 +336,10 @@ func NewEC2Filter(name string, values ...string) *ec2.Filter { | |||
|
|||
// DeleteGroup deletes an aws autoscaling group | |||
func (c *awsCloudImplementation) DeleteGroup(g *cloudinstances.CloudInstanceGroup) error { | |||
if c.spotinst != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to make it hard to "graduate" spotinst from using a feature-flag - we'll probably need a field somewhere. But we can get this in and then figure it out.. One option that might help is that AWSCloud is an interface, so it might be possible to have a spotinst implementation (even if it delegated most of the functionality to an awsCloudImplementation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to @justinsb here - flagging spotinst to be active is one way of using it, but in the future it would be nice to just have it as a separate provider you select, that then falls back to the cloud if no special implementation was given. From what I understand you just use your own autoscaling implementation, but that also won't change that much for other clouds as you abstract that away in your own api's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisz100 You're right about our implementation. I'm not sure I'm understanding your suggestion. How it differs from the implementation introduced in #5195 (where we just wrapped the AWS implementation)?
if lc.UserData != nil { | ||
userData, err := base64.StdEncoding.DecodeString(*lc.UserData) | ||
if err != nil { | ||
return nil, fmt.Errorf("spotinst: error decoding user data: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I think (hope) when this is printed it'll be printed alongside the failing task, so you likely don't need the spotinst:
bit. But definitely not a blocker - our error messages repeat themselves a lot already :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great @liranp - I'm guessing we should do this before #5195. A few nits, and some small questions, but really great stuff! One blocker though: I'm reasonably sure we need a license on the library we're adding as a dependency. Ideally it would be Apache 2, though I think others are permitted. (I went looking for the list of approved licenses and couldn't immediately find it.) Do you want to add a license to github.com/spotinst/spotinst-sdk-go - and will it be Apache :-) ? |
@@ -98,7 +99,7 @@ func (b *AutoscalingGroupModelBuilder) Build(c *fi.ModelBuilderContext) error { | |||
}, | |||
IAMInstanceProfile: link, | |||
ImageID: s(ig.Spec.Image), | |||
InstanceType: s(ig.Spec.MachineType), | |||
InstanceType: s(strings.Split(ig.Spec.MachineType, ",")[0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test what happens if you have just one or multiple but not with default aws behaviour here as well?
As the license is in there now: |
adc9061
to
0b1d1e8
Compare
/lgtm |
Thank you @chrisz100 and @justinsb for being so responsive! |
@liranp this PR is.the only thing on the list before cutting the 1.11 alpha this weekend. Looking forward to it! |
34aee59
to
e1f85e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve Thanks @liranp 🎉 |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chrisz100, justinsb, liranp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
Opened test-infra issue for flake, as it is weird: kubernetes/test-infra#9805 |
Thank you @chrisz100 and @justinsb, much appreciated! |
This PR adds Spotinst as an extension of the existing cloud providers. As part of this implementation, AWS Auto Scaling Groups have been replaced with Spotinst Elastigroups.