Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set volumeBindingMode as WaitForFirstConsumer #8416

Closed
wants to merge 3 commits into from
Closed

set volumeBindingMode as WaitForFirstConsumer #8416

wants to merge 3 commits into from

Conversation

cpanato
Copy link
Member

@cpanato cpanato commented Jan 27, 2020

In some clusters we have which are very small, 5 worker node and one in each AWS AZ.
we notice that when deploying apps that have a PVC it was creating the PV in on worker node that was at full capacity and then the Pod never starts because the node does not accept more pods but the PV was already attached.

Doing some research we discovered this: https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode and then changed the storageClass called gp2 that Kops deploys and added the volumeBindingMode: WaitForFirstConsumer

Doing that we did not get more failures when scheduling a POD that needs a PV/PVC

So though this is a good fix here. However, don't know if that is the correct file to change.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 27, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cpanato
To complete the pull request process, please assign chrisz100
You can assign the PR to them by writing /assign @chrisz100 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rifelpet
Copy link
Member

Thanks @cpanato! If you run ./hack/updated-expected.sh it should fix the failing tests.

@cpanato
Copy link
Member Author

cpanato commented Jan 27, 2020

@rifelpet got this

./hack/update-expected.sh
fatal: not a git repository (or any of the parent directories): .git
cd /Users/cpanato/go/src/k8s.io/kops; /Users/cpanato/go/src/k8s.io/kops/.build/local/go-bindata -o upup/models/bindata.go -pkg models -ignore="\\.DS_Store" -ignore="bindata\\.go" -ignore="vfs\\.go" -prefix upup/models/ upup/models/... && GO111MODULE=on go run golang.org/x/tools/cmd/goimports -w -v upup/models/bindata.go
bindata: Failed to stat input path 'upup/models': lstat upup/models: no such file or directory
make: *** [upup/models/bindata.go] Error 1

any ideas?

@rifelpet
Copy link
Member

Ah apologies, we really need to get that script into a make target with dependencies...

make kops-gobindata upup/models/bindata.go first should do it.

@cpanato
Copy link
Member Author

cpanato commented Jan 27, 2020

@rifelpet still getting the same error 😓

@rifelpet
Copy link
Member

how about just a make kops ?

@cpanato
Copy link
Member Author

cpanato commented Jan 27, 2020

@rifelpet AH! i need to be in the src/k8s.io/kops and not src/github.com/kubernetes/kops

@johngmyers
Copy link
Member

Please also bump this version to "1.15.1-kops.1"

Signed-off-by: Carlos Panato <ctadeu@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 27, 2020
@cpanato
Copy link
Member Author

cpanato commented Jan 27, 2020

it is failing locally as well when i run /hack/updated-expected.sh

--- FAIL: TestBootstrapChannelBuilder_BuildTasks (0.18s)
    compare.go:51: HACK_UPDATE_EXPECTED_IN_PLACE: writing expected output tests/bootstrapchannelbuilder/simple/manifest.yaml
    compare.go:61: output did not match expected for "tests/bootstrapchannelbuilder/simple/manifest.yaml"
    compare.go:51: HACK_UPDATE_EXPECTED_IN_PLACE: writing expected output tests/bootstrapchannelbuilder/cilium/manifest.yaml
    compare.go:61: output did not match expected for "tests/bootstrapchannelbuilder/cilium/manifest.yaml"
    compare.go:51: HACK_UPDATE_EXPECTED_IN_PLACE: writing expected output tests/bootstrapchannelbuilder/weave/manifest.yaml
    compare.go:61: output did not match expected for "tests/bootstrapchannelbuilder/weave/manifest.yaml"
    compare.go:51: HACK_UPDATE_EXPECTED_IN_PLACE: writing expected output tests/bootstrapchannelbuilder/amazonvpc/manifest.yaml
    compare.go:61: output did not match expected for "tests/bootstrapchannelbuilder/amazonvpc/manifest.yaml"

UPDATE:
after run make kops and then ./hack/update-expected.sh works

@cpanato
Copy link
Member Author

cpanato commented Jan 27, 2020

now are green, thanks
@rifelpet @johngmyers for the support

@johngmyers
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2020
@olemarkus
Copy link
Member

volumeBindingMode is immutable so for existing cluster, this will cause the channel updater to enter a crash loop trying to apply this change.
If this is to be applied, one needs to delete the existing resource first

@johngmyers
Copy link
Member

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2020
@cpanato
Copy link
Member Author

cpanato commented Jan 29, 2020

@olemarkus thanks for your review, I did not know that apply again when it is in the upgrade process.
How we should proceed? there are other resources that have similar thing?

@olemarkus
Copy link
Member

I am not aware of that. but I would really like for this to be added in as well. but someone more senior than me need to weigh in here.

@rifelpet
Copy link
Member

I've added this topic to the agenda for our call this Friday and I'll post any updates here 👍

@cpanato
Copy link
Member Author

cpanato commented Jan 29, 2020

@rifelpet thank you! made a comment, I'm not able to attend this meeting :/

@johngmyers
Copy link
Member

I suspect we may need to remove the storageclass.beta.kubernetes.io/is-default-class annotation from the gp2 StorageClass and create a new StorageClass with a different name that has both a storageclass.beta.kubernetes.io/is-default-class: "true" annotation and volumeBindingMode: WaitForFirstConsumer.

(I hate these inexplicably immutable fields. If deleting and recreating the resource has no repercussions, the field might as well be mutable.)

@joshbranham
Copy link
Contributor

Yeah that is what I have done in my post cluster steps by deleting and re-creating with some changes.

@olemarkus
Copy link
Member

I have also deleted and re-creating storageclass resources. New PVs won't be created in the tiny interval that the SC is gone, but besides that the operation should be safe.

Maybe this would be an opportunity to fix the thing where we have a storageclass called default and a storageclass called gp2 where the gp2 SC is the actual default one 😃

@johngmyers
Copy link
Member

Since the field is immutable, we will now have three storage classes.

@rifelpet
Copy link
Member

@cpanato the result of the discussion today was that we should create a new StorageClass instead of updating the existing one. Its name and whether it becomes the new default storage class will be up for discussion here :)

@johngmyers
Copy link
Member

gp2-wait?

@olemarkus
Copy link
Member

I am thinking it may be a good idea to namespace the name with kops. I can easily see gp2-wait existing already

@cpanato
Copy link
Member Author

cpanato commented Feb 4, 2020

@rifelpet ok thanks! i will work on that!

@joshbranham
Copy link
Contributor

I went down this rabbit hole as well so sorry for the duplication, although I modified a few other pieces. Happy to discuss/merge if needed: #8582

@cpanato cpanato closed this Feb 20, 2020
@cpanato cpanato deleted the update_storage_class branch February 20, 2020 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants