Added OnDemand and Spot Price models addressing #131 #486

Open: wants to merge 30 commits into base: master

mrcrgl commented Nov 23, 2017

No description provided.

mrcrgl added some commits Nov 22, 2017

Fixed broken instance-info test; Added awareness of tenancy and effectivity of the offer provided by the aws pricing api
Merge branch 'master' into issue/131
# Conflicts:
#	cluster-autoscaler/cloudprovider/aws/ec2_instance_types.go

mrcrgl changed the title from "Issue/131" to "Added OnDemand and Spot Price models addressing #131" on Nov 23, 2017

mrcrgl commented Nov 23, 2017

Re-opened PR #484
Related to: #131

import "strconv"
// stringRefToFloat64 converts fields of type *float64 to float64 (fallback to zero value if nil)

mwielgus Nov 23, 2017

Contributor

string to float?

mrcrgl Nov 23, 2017

Yes, it's used when transforming AWS API DTOs to local ones.
For instance, here: https://github.com/kubernetes/autoscaler/pull/486/files#diff-fc9a082c5fda90baeac6f105e7a018b9R61

Would you suggest not putting it in a function?

kokhang May 19, 2018

Member

I think @mwielgus was referring to the comment.

KierranM Oct 8, 2018

To be more specific, I believe this is the change wanted:

of type *float64 to float64 => of type *string to float64

// stringRefToFloat64 converts fields of type *float64 to float64 (fallback to zero value if nil)
// it's mostly used to handle aws api responds nicely
func stringRefToFloat64(p *string) (float64, error) {

mwielgus Nov 23, 2017

Contributor

What is the point of using *string? Strings are already pointers (slices).

mrcrgl Nov 23, 2017

It's the way the AWS APIs are designed: all values in their DTOs are pointers, so they can carry the information whether a value was provided or not.
Do you think it's more transparent for the reader to make that conversion on the caller side?

kokhang May 19, 2018

Member

The aws package has helpers such as aws.String that let you do all these types of conversion. You can use them instead of *string. https://docs.aws.amazon.com/sdk-for-go/api/aws/
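
For readers following the two threads above, here is a minimal sketch of what such a helper might look like. Only the name and signature come from the diff excerpt; the body is an assumption based on the doc comment ("fallback to zero value if nil") and uses only the standard library. In the AWS SDK for Go, aws.StringValue performs the nil-safe dereference step (it returns an empty string for a nil pointer), so a real implementation could build on that instead.

```go
package main

import (
	"fmt"
	"strconv"
)

// stringRefToFloat64 converts a *string to float64.
// Assumption: a nil pointer yields 0 and no error, as the doc comment
// in the PR suggests; the PR's actual body is not shown in this thread.
func stringRefToFloat64(p *string) (float64, error) {
	if p == nil {
		return 0, nil
	}
	return strconv.ParseFloat(*p, 64)
}

func main() {
	price := "0.071"
	v, err := stringRefToFloat64(&price)
	fmt.Println(v, err)

	// A missing value falls back to the zero value.
	v, err = stringRefToFloat64(nil)
	fmt.Println(v, err)
}
```

Returning the error rather than swallowing it keeps malformed API values visible to the caller, while the nil case stays a quiet fallback.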

// EC2LaunchConfiguration holds AWS EC2 Launch Configuration information
type EC2LaunchConfiguration struct {
HasSpotMarkedBid bool

mwielgus Nov 23, 2017

Contributor

Please add comments to the fields.

mrcrgl force-pushed the fid-dev:issue/131 branch from 41f42b5 to bb80b4f on Nov 23, 2017

Contributor

k8s-ci-robot commented Nov 23, 2017

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

mwielgus commented Nov 23, 2017

This PR is HUGE and not reviewable in the current form.

Please:

  • Provide a detailed explanation of what you are doing and how you get the price for both regular and spot instances, so that your code can be validated against the description.

  • Consider splitting the PR into smaller PRs. PRs that are <200 LOC are MUCH easier to understand.
    Maybe spot / non-spot. Maybe submit some utils first.

}
func newLCFakeService(
name string,

mwielgus Nov 23, 2017

Contributor

single line

}
type cases []testCase
var (

mwielgus Nov 23, 2017

Contributor

no need for var, just :=

// EC2AutoscalingGroup holds AWS Autoscaling Group information
type EC2AutoscalingGroup struct {
Name string

mwielgus Nov 23, 2017

Contributor

Comments

// InstanceInfo holds AWS EC2 instance information
type InstanceInfo struct {
InstanceType string

mwielgus Nov 23, 2017

Contributor

Comments.

type instanceInfoService struct {
client httpClient
cache instanceInfoCache
mu sync.RWMutex

mwielgus Nov 23, 2017

Contributor

Don't use single- or two-letter names for fields in a struct.
Moreover, you can do
type xxxx struct {
sync.RWMutex // <<<NO FIELD NAME.
}
and then just xxx.Lock().
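
A small sketch of the embedding suggestion above. The type priceCache and its methods are hypothetical names chosen for illustration; they are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// priceCache is a hypothetical type used only to illustrate embedding.
type priceCache struct {
	sync.RWMutex // embedded: Lock, Unlock, RLock and RUnlock are promoted
	prices       map[string]float64
}

// set stores a price under the write lock.
func (c *priceCache) set(instanceType string, price float64) {
	c.Lock()
	defer c.Unlock()
	c.prices[instanceType] = price
}

// get reads a price under the read lock.
func (c *priceCache) get(instanceType string) (float64, bool) {
	c.RLock()
	defer c.RUnlock()
	p, ok := c.prices[instanceType]
	return p, ok
}

func main() {
	c := &priceCache{prices: map[string]float64{}}
	c.set("m5d.large", 0.03)
	fmt.Println(c.get("m5d.large"))
}
```

One trade-off worth knowing: embedding puts the lock methods in the type's own method set, so any code that can see the value can also lock it. For an unexported type like this one, that is usually acceptable.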

type productOffers map[string]productOffer
type productOffer struct {

mwielgus Nov 23, 2017

Contributor

Can we get these from some official library?

mrcrgl Nov 23, 2017

I didn't find anything about it, just started here (https://aws.amazon.com/blogs/aws/new-aws-price-list-api/) and moved over

mrcrgl Nov 23, 2017

Good point, I'll check that

mwielgus Dec 1, 2017

Contributor

Has this been checked?

mrcrgl Dec 3, 2017

Not yet in detail, I need to find some time. But I briefly checked, and it could be replaced without too many code changes.

kokhang May 19, 2018

Member

+1 on using the pricing API

}
type productPriceDimension struct {
RateCode string `json:"rateCode"`

mwielgus Nov 23, 2017

Contributor

Either all private or all public.

mwielgus Nov 23, 2017

Contributor

Here and elsewhere.

mrcrgl Nov 23, 2017

In my opinion, the struct productPriceDimension is just a local DTO used to represent the JSON structure. Therefore, it's not part of the package's API and doesn't need to be kept stable or backward compatible. Unfortunately, the marshaller is not able to read private fields of foreign structs.
This is the reason why I've decided to do it like this. Does that work for you?
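
To illustrate the point about the marshaller, here is a minimal sketch with the standard encoding/json package. Only the type name and the RateCode field appear in the diff excerpt; hiddenNote and decodeDimension are hypothetical and exist only to show that unexported fields are skipped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// productPriceDimension stays unexported (not part of the package API),
// but the fields that must be decoded have to be exported.
type productPriceDimension struct {
	RateCode   string `json:"rateCode"`
	hiddenNote string // hypothetical unexported field: encoding/json ignores it
}

// decodeDimension is a hypothetical helper that unmarshals one dimension.
func decodeDimension(data []byte) (productPriceDimension, error) {
	var d productPriceDimension
	err := json.Unmarshal(data, &d)
	return d, err
}

func main() {
	d, err := decodeDimension([]byte(`{"rateCode":"ABC.123","hiddenNote":"x"}`))
	fmt.Printf("%+v %v\n", d, err)
}
```

Because the struct type itself is unexported, exporting its fields does not widen the package's public surface.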

mwielgus Dec 1, 2017

Contributor

ok

mwielgus Dec 1, 2017

Contributor

ok, if we are super sure that this cannot be done with the official library.

Contributor

mwielgus commented Nov 23, 2017

mrcrgl commented Nov 23, 2017

@mwielgus thank you for the feedback concerning the PR size. I'm considering writing some documentation on how the code is meant to be structured and how it should work.

pawelprazak commented Jun 22, 2018

is there a way to run cluster-autoscaler with spot instances before this PR is merged?

komljen commented Jun 22, 2018

> is there a way to run cluster-autoscaler with spot instances before this PR is merged?

Sure, but you will not be able to scale instance groups based on pricing.

bhack commented Jun 30, 2018

Is there any specific reason why this PR is stalled here? I've seen that @mrcrgl has always been very responsive in addressing reviews.

Contributor

aleksandra-malinowska commented Jul 2, 2018

> Is there any specific reason why this PR is stalled here? I've seen that @mrcrgl has always been very responsive in addressing reviews.

It needs to be LGTMed by someone familiar with AWS cloud provider code.

bhack commented Jul 2, 2018

There are already AWS employees in this thread.

vietwow commented Jul 6, 2018

Hi,

This looks very useful. Any update on this?

Thanks

jpds commented Jul 20, 2018

Seeing these:

root@8cb052d688ef:/go/src/k8s.io/autoscaler/cluster-autoscaler# go build
# k8s.io/autoscaler/cluster-autoscaler/cloudprovider/aws
cloudprovider/aws/aws_price_model.go:45:40: undefined: Asg
cloudprovider/aws/aws_price_model.go:77:40: cannot use instance (type *AwsInstanceRef) as type *AwsRef in argument to pm.asgs.GetAsgForInstance

jpds commented Jul 20, 2018

This needs documentation on how to actually enable this. I built a docker image with the cluster-autoscaler binary from this, added --expander=price to my configuration and just got:

main.go:278] Failed to create autoscaler: Not implemented

mrcrgl commented Jul 23, 2018

@jpds thank you for taking time to review this.
It must have been broken by merging master into it. I'll spend some time to get it working again.

mrcrgl added some commits Jul 30, 2018

Merge branch 'master' into issue/131
# Conflicts:
#	cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go
#	cluster-autoscaler/cloudprovider/aws/aws_manager.go

mrcrgl commented Jul 30, 2018

@jpds PTAL
I've merged the current master. The current build will be tested on our spare cluster this week.

leosunmo commented Sep 27, 2018

Does anyone have an update on this? It'd be a nice feature.

vietwow commented Sep 27, 2018

@mrcrgl is this merged?

oded-dd commented Oct 3, 2018

+1

vietwow commented Oct 3, 2018

Looks like many people are looking forward to this pull request. Any reviewers? cc @losipiuk @bskiba @mwielgus

mrcrgl commented Oct 3, 2018

@vietwow not yet

jpds commented Oct 5, 2018

Needs documentation in https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws that ec2:DescribeSpotPriceHistory is required in the IAM role.

I created a test cluster with kops, configured two instance groups with min nodes 0, and ran the cluster-autoscaler from this branch with this manifest:

Added --expander=price to args in the above, deployed nginx (using https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/#creating-and-exploring-an-nginx-deployment) to the new cluster and got:

I1005 15:27:51.339738       1 scale_up.go:291] Upcoming 0 nodes
W1005 15:27:51.415165       1 price.go:115] Failed to calculate node price for nodes.test.platform.k8s.local: failed to describe price for asg nodes.test.platform.k8s.local: no spot price information for instance m5d.large in availability zone eu-west-1b
W1005 15:27:51.482996       1 descriptor.go:128] no spot price information newer than 30m0s, using last known price of 0.071000 which is 5h20m1.482985862s old
W1005 15:27:51.483029       1 price.go:115] Failed to calculate node price for nodes-c5d.xlarge.test.platform.k8s.local: failed to describe price for asg nodes-c5d.xlarge.test.platform.k8s.local: no spot price information for instance c5d.xlarge in availability zone eu-west-1c

...but the cluster doesn't autoscale.

Interestingly, the price in the log above appears to correspond to c5d.xlarge (0.07) and not m5d.large (0.03) as the log (ordering) suggests.

derekm commented Oct 11, 2018

There have been code review requests to look at porting this code over to:

https://docs.aws.amazon.com/sdk-for-go/api/aws/

and

http://docs.aws.amazon.com/sdk-for-go/api/service/pricing/

Have those requests been addressed? Is it worth doing?

OleksandrSlobodian commented Nov 16, 2018

Does anyone have an update on this?

vietwow commented Nov 17, 2018

Looks like many people are waiting for this feature but no reviewer cares :(

KierranM commented Nov 21, 2018

Having used a custom image built from this branch for a few weeks, I've noticed a problem when you have a large number of ASGs. We have around 40 worker ASGs (one per instance type per AZ for spot and on-demand), most of which are size 0. In that setup it doesn't seem to consider spot instances when scaling up; it seems to choose the on-demand variant every time. I'll try to get some logs from it the next time it scales up.

rjanovski commented Nov 28, 2018

To all those who wait on this,
Note that AWS added new features to ASG: you can now use a single ASG with multiple instance types and purchase options (on-demand, reserved, and spot, and have policies to decide when to use each, including price considerations).

Having a single "smart" ASG may be a simpler solution than placing the pricing logic inside the autoscaler.

https://aws.amazon.com/blogs/aws/new-ec2-auto-scaling-groups-with-multiple-instance-types-purchase-options/

Contributor

gjtempleton commented Nov 28, 2018

@rjanovski Unfortunately, however, the cluster autoscaler won't be able to work successfully with these new smarter ASGs. This was discussed at the Autoscaling SIG on the 26th of November 2018.

Off the back of that I'm in the process of gathering people's use cases/why they want to use these new ASGs to allow further discussion of how the cluster autoscaler can support these here: https://docs.google.com/document/d/1m-2lQCOwxMrlv1rCz1JqUyBrZAjLOZAELswjuljJVjg

Please add any comments/further use cases you may have.

thejosephstevens commented Dec 12, 2018

So, one data point here: I actually tried to use the new AWS ASGs last week and they really don't solve my problem set. I'm looking for some prioritized ladder of instances to scale up (like you would get in a pricing/value-based model), and the way AWS implemented their heterogeneous ASGs doesn't take that option into account, so I'm still hoping I can get it from the CA. The way the new ASG works with support for on-demand and spot assumes that you have some sort of optimal blend you're looking for (the levers are "start with X on-demand instances baseline" and "after the baseline, provision Y% on-demand"). They don't really do anything with value prioritization or even simple fail-up from spot to on-demand (our most common failure mode is that spot instances just go completely out of availability when on-demand is still plentiful).

That aside, a really optimal configuration for my use cases would be to have prioritized expanders, even with a fully custom, explicit prioritization queue (first spot r5d, next on-demand r5d, finally r5), supporting fully separate expandable node pools (that can scale to 0) in order to alleviate the issues of needing different volume mapping/requests by instance type.

Short story, at the moment I'm trying to figure out if the CA or AWS can solve my problem first, but longer term I really need it to work through the CA in order to support different clouds and for smarter, request-driven scaling.

If anyone has an idea whether an explicit value queue could be hacked in on the kube admin side I'd really appreciate a pointer (spoofing cost values to the CA config?) or if there's a clear place this could be implemented in the service I may be able to snag some free time from my team to work on it.
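
The "explicit prioritization queue" described in this comment can be sketched as a first-match walk over an ordered list. This is purely illustrative: nodeGroup, HasCapacity and pickByPriority are hypothetical names, and nothing here corresponds to an existing cluster-autoscaler expander or API.

```go
package main

import "fmt"

// nodeGroup is a hypothetical stand-in for a scalable node pool.
type nodeGroup struct {
	Name        string
	HasCapacity bool // assumed signal that the group can currently add a node
}

// pickByPriority returns the first group in the ladder that can scale up.
// The slice order is the priority order; earlier entries win.
func pickByPriority(ladder []nodeGroup) (nodeGroup, bool) {
	for _, g := range ladder {
		if g.HasCapacity {
			return g, true
		}
	}
	return nodeGroup{}, false
}

func main() {
	// Order taken from the comment: spot r5d, then on-demand r5d, then r5.
	ladder := []nodeGroup{
		{Name: "spot-r5d", HasCapacity: false},
		{Name: "ondemand-r5d", HasCapacity: true},
		{Name: "ondemand-r5", HasCapacity: true},
	}
	if g, ok := pickByPriority(ladder); ok {
		fmt.Println("scale up:", g.Name)
	}
}
```

The hard part in practice is the capacity signal itself, since spot availability is typically only known after a scale-up attempt fails; the sketch assumes that information is already available.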

mrcrgl commented Dec 13, 2018

@KierranM we've been running this setup in production for about a year. It would be great if you could provide some logs, so I can investigate your case.

@jpds @derekm Thank you for your feedback. I'll spend some time till xmas to address your suggestions and resolve the conflicts.

I'm looking forward to getting this PR merged soon.
