
kubeadm: wait for the etcd cluster to be available when growing it #72984

Merged
merged 1 commit on Jan 20, 2019
Conversation

Contributor

@ereslibre ereslibre commented Jan 16, 2019

What this PR does / why we need it:

When the etcd cluster grows, we need to explicitly wait for it to become
available. This avoids relying on later steps to do that waiting implicitly
when they try to reach the apiserver.
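For context, a minimal Go sketch of the flow this change enforces, based on the calls visible in the diff further down (AddMember followed by WaitForClusterAvailable). The interface, its simplified signatures, and the 8 × 5s values are assumptions for illustration, not the exact kubeadm source:

```go
package etcdjoin

import (
	"fmt"
	"time"
)

// etcdClusterClient captures just the two operations this PR relies on.
// Method names mirror the PR diff; the signatures are simplified here.
type etcdClusterClient interface {
	AddMember(name string, peerAddrs string) error
	WaitForClusterAvailable(retries int, retryInterval time.Duration) (bool, error)
}

// growEtcdAndWait shows the ordering this PR enforces: first grow the etcd
// cluster, then explicitly wait for it to report healthy before any later
// join step talks to the apiserver through it.
func growEtcdAndWait(client etcdClusterClient, name, peerAddrs string) error {
	if err := client.AddMember(name, peerAddrs); err != nil {
		return fmt.Errorf("failed to add etcd member: %v", err)
	}

	fmt.Println("[etcd] Waiting for the etcd cluster to be healthy")
	// 8 retries at a 5-second interval (~40s total) is the budget the
	// reviewers converge on later in this conversation.
	if _, err := client.WaitForClusterAvailable(8, 5*time.Second); err != nil {
		return fmt.Errorf("etcd cluster did not become available: %v", err)
	}
	return nil
}
```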

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#1353

Does this PR introduce a user-facing change?:

kubeadm: explicitly wait for `etcd` to have grown when joining a new control plane

/kind bug

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 16, 2019
@k8s-ci-robot
Contributor

Hi @ereslibre. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 16, 2019
@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 16, 2019
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 17, 2019
@rosti
Contributor

rosti commented Jan 17, 2019

/ok-to-test
/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 17, 2019
@rosti
Contributor

rosti commented Jan 17, 2019

/assign @fabriziopandini @timothysc @neolit123

Member

@neolit123 neolit123 left a comment

thanks for this @ereslibre
i think apart from the retry rate this is good.

also please add a release note instead of NONE
kubeadm: ......

etcdVolumeName  = "etcd-data"
certsVolumeName = "etcd-certs"
etcdHealthyCheckInterval = 1 * time.Second
Member

what do your tests reveal for the interval and n-retries @ereslibre ?

i think the 1 second rate might be too high. i would do something like 5 seconds, with 20 retries.
but let's gather more comments on this one before changing.

Contributor Author

In my environment it's succeeding around the third try, and that's in an environment that has only been alive for a few seconds, so I agree that 5 seconds looks reasonable (if the single-machine cluster is long-lived, more sync time would be needed).

I'll adapt the PR, thanks!

@@ -146,7 +147,7 @@ type Member struct {
}

// AddMember notifies an existing etcd cluster that a new member is joining
-func (c Client) AddMember(name string, peerAddrs string) ([]Member, error) {
+func (c *Client) AddMember(name string, peerAddrs string) ([]Member, error) {
Member

as a note/TODO: in a separate PR we should make all the client methods use pointer receivers.
Go encourages keeping receiver types consistent on a type.
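For illustration only (not the kubeadm code itself), a small sketch of the receiver consistency mentioned above, using a hypothetical client type:

```go
package example

// client is a hypothetical type used only to illustrate receiver consistency.
type client struct {
	endpoints []string
}

// Pointer receiver: the method can mutate the client it is called on.
func (c *client) addEndpoint(ep string) {
	c.endpoints = append(c.endpoints, ep)
}

// Keeping every method on *client, rather than mixing value and pointer
// receivers on the same type, is the "matching pattern" referred to above.
func (c *client) listEndpoints() []string {
	return c.endpoints
}
```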

Contributor

@rosti rosti left a comment

Thanks @ereslibre !
Overall it looks good, though we need to fix the AddMember call.

cmd/kubeadm/app/util/etcd/etcd.go (review thread, resolved)
cmd/kubeadm/app/util/etcd/etcd.go (review thread, outdated, resolved)
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 17, 2019
Contributor

@rosti rosti left a comment

Re-lgtm after things are back to normal.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2019
@ereslibre
Contributor Author

/retest

Member

@fabriziopandini fabriziopandini left a comment

@ereslibre Great work! This is a candidate for a cherry pick!
Only a few small nits, mostly UX related.
Ping me when this is ready for approval.

cmd/kubeadm/app/phases/etcd/local.go (review thread, outdated, resolved)
cmd/kubeadm/app/phases/etcd/local.go (review thread, outdated, resolved)
@@ -121,6 +124,12 @@ func CreateStackedEtcdStaticPodManifestFile(client clientset.Interface, manifest
}

fmt.Printf("[etcd] Wrote Static Pod manifest for a local etcd instance to %q\n", kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.Etcd, manifestDir))

fmt.Println("[etcd] Waiting for the etcd cluster to be healthy")
if _, err := etcdClient.WaitForClusterAvailable(etcdHealthyCheckRetries, etcdHealthyCheckInterval); err != nil {
Member

I don't like this function printing

    [util/etcd] Attempt timed out
    [util/etcd] Waiting 5s until next retry
    [util/etcd] Attempt timed out
    [util/etcd] Waiting 5s until next retry

IMO that output should be removed (or converted into log messages) in order to be consistent with all the other waiters in kubeadm.

However, considering that this requires adding "This can take up to ..." in every place where WaitForClusterAvailable is used, it goes beyond the scope of this PR, so please open an issue to track this as a todo/good first issue.

Contributor Author

I proposed using klog here too, but @rosti didn't want to address that change in this PR, only the message that can get long because it includes the endpoints. I agree with your point of view though, @fabriziopandini.

@rosti, wdyt? Should I change this now that @fabriziopandini has also raised the issue?

Contributor Author

Marked as resolved per the discussion with @fabriziopandini, leaving it as it was, as @rosti proposed.

Contributor

I do think that we need some sort of indication of why we are waiting another 5 seconds. This is tightly coupled with the UX of end users who run kubeadm directly on the command line. For that matter I am not a fan of klogging this. In my opinion it should go out via print.
On the other hand, we can certainly reduce the output here to a single, more descriptive message per retry.
However, as @fabriziopandini mentioned, this will require changes in a few more places, so it may be better done in another PR. We can file a backlog issue for now.

Member

i wanted to get more feedback on the 5-second interval and the 20 retries.
if the check usually passes in less than 5 seconds on average, possibly we can reduce the value?
also, 20 tries is a lot. in reality, we might reach the failed state much sooner.

Contributor Author

In the last run it took 4 retries (at a 5-second interval); this one was way off the charts, and with a clean environment :(

Member

i've mentioned this on slack:

we may want to keep the overall time under 40 seconds to match the kubelet timeout.
how about a 2-second interval with 20 retries.

Member

@fabriziopandini @rosti please give your stamp of approval for the above comment.

Contributor

@rosti rosti Jan 18, 2019

I like the 40 seconds idea, but let's keep the steps at 5 sec. Bear in mind that we have just written out the static pod spec, so the kubelet needs to detect it and spin it up, and etcd needs to become responsive. On some systems this can easily take more than 2 seconds.

Member

ok, @rosti is voting for 5 sec / 8 retries.
@fabriziopandini ?
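For reference, a hedged sketch of what the settings discussed above look like as Go constants; the names mirror those in the diff, but their final values and placement in kubeadm are assumptions here:

```go
package local

import "time"

// Both proposals keep the overall wait budget at roughly 40 seconds,
// matching the kubelet timeout mentioned above:
//   2s interval * 20 retries = 40s, or 5s interval * 8 retries = 40s.
const (
	etcdHealthyCheckInterval = 5 * time.Second
	etcdHealthyCheckRetries  = 8
)
```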

cmd/kubeadm/app/util/etcd/etcd.go (review thread, outdated, resolved)
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 18, 2019
@rosti
Contributor

rosti commented Jan 18, 2019

@ereslibre can you please re-run the update-gofmt.sh script to fix the verify test? Thanks!

@ereslibre
Contributor Author

/retest

@fabriziopandini
Member

Looking at the code I don't see a relation between the kubelet timeout for TLS bootstrap and the timeout this PR sets for waiting for the new etcd member to join the cluster.

That said, IMO 5s * 8 is reasonable for unblocking this fix and starting the cherry-picking process;
so, considering that there seems to be consensus on those settings from the other reviewers as well, and that the proposed solution is definitely an improvement over the current situation:
/approve
/lgtm

Let's continue the discussion in parallel (slack/at the next office hours meeting), possibly asking the broader SIG for more feedback from the field

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 20, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2019
@k8s-ci-robot k8s-ci-robot merged commit f2b133d into kubernetes:master Jan 20, 2019
@MalloZup
Contributor

MalloZup commented Jan 20, 2019

We could also use some kind of backoff algorithm instead of a fixed retry interval. https://en.m.wikipedia.org/wiki/Exponential_backoff?wprov=sfla1
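As an illustration of that suggestion, a minimal sketch using the existing wait.ExponentialBackoff helper from k8s.io/apimachinery; isEtcdClusterAvailable is a hypothetical stand-in for whatever health check kubeadm actually performs:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// isEtcdClusterAvailable is a hypothetical placeholder for the real health
// check (for example, querying the etcd cluster status endpoints).
func isEtcdClusterAvailable() bool {
	return false
}

// waitForEtcdWithBackoff retries the health check with an exponentially
// growing delay instead of a fixed interval.
func waitForEtcdWithBackoff() error {
	backoff := wait.Backoff{
		Duration: 1 * time.Second, // initial delay between attempts
		Factor:   2.0,             // double the delay after each failed attempt
		Jitter:   0.1,             // small randomization to avoid thundering herds
		Steps:    6,               // up to 6 attempts; sleeps of ~1s, 2s, 4s, 8s, 16s between them
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		if isEtcdClusterAvailable() {
			return true, nil // done, stop retrying
		}
		return false, nil // not ready yet, retry after the next backoff step
	})
}

func main() {
	if err := waitForEtcdWithBackoff(); err != nil {
		fmt.Println("etcd cluster did not become available:", err)
	}
}
```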

@ereslibre
Contributor Author

@fabriziopandini I also have the same feeling and discussed it with @neolit123 previously. So, just for the record, WaitForKubeletAndFunc basically waits for two things:

  1. WaitForHealthyKubelet, before some initial timeout (40 seconds)
  2. The function given to WaitForKubeletAndFunc

When either of these two (the given function or WaitForHealthyKubelet) returns, WaitForKubeletAndFunc returns. If the given function takes more than 40 seconds to finish, the sleep in WaitForHealthyKubelet times out and it actually checks that the kubelet is healthy.

I guess I would say WaitForKubeletAndFunc basically means "please run this function I give you; whether it fails or succeeds is fine, but if it takes more than 40 seconds to answer, then we check that the kubelet is running and healthy"
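For readers unfamiliar with that helper, a simplified, self-contained sketch of the pattern described above; it is not the actual kubeadm implementation, and both function names here are placeholders:

```go
package main

import (
	"fmt"
	"time"
)

// checkKubeletHealthy is a hypothetical stand-in for a real kubelet health
// check (for example, hitting its healthz endpoint).
func checkKubeletHealthy() error {
	return nil
}

// waitForFuncOrKubelet mimics the behaviour described above: run the given
// function, and if it has not returned within the timeout, fall back to
// verifying that the kubelet itself is healthy.
func waitForFuncOrKubelet(timeout time.Duration, f func() error) error {
	done := make(chan error, 1)
	go func() { done <- f() }()

	select {
	case err := <-done:
		// The given function finished first; success or failure, we return it.
		return err
	case <-time.After(timeout):
		// The function is taking too long; check the kubelet instead.
		return checkKubeletHealthy()
	}
}

func main() {
	err := waitForFuncOrKubelet(40*time.Second, func() error {
		// Hypothetical long-running join step, e.g. waiting for etcd to grow.
		time.Sleep(2 * time.Second)
		return nil
	})
	fmt.Println("result:", err)
}
```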

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

kubeadm join does not explicitly wait for etcd to have grown when joining secondary control plane
7 participants