Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet should resume csr bootstrap #47856

Merged
merged 1 commit into from
Jun 23, 2017

Conversation

mikedanese
Copy link
Member

Right now the kubelet creates a new csr object with the same key every
time it restarts during the bootstrap process. It should resume with the
old csr object if it exists. To do this the name of the csr object must
be stable.

Issue #47855

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 21, 2017
@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Jun 21, 2017
@gmarek gmarek added this to the v1.7 milestone Jun 21, 2017
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 21, 2017
certificates.UsageDigitalSignature,
certificates.UsageKeyEncipherment,
certificates.UsageClientAuth,
})
}

// RequestCertificate will create a certificate signing request using the PEM
// requestCertificate will create a certificate signing request using the PEM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to include the purpose of the name argument, that existing requests matching the name will be re-used.

hash.Write([]byte(subject.CommonName))
for _, org := range subject.Organization {
hash.Write([]byte(org))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include usages

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a comment about how this hash should be kept up to date to include everything in the CSR, but can't be a hash of the csrData because of the random elements.

case err == nil:
case errors.IsAlreadyExists(err):
glog.Infof("csr for this node already exists, reusing")
req, err = client.Get(name, metav1.GetOptions{})
Copy link
Member

@liggitt liggitt Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to make sure the existing csr object actually matches (subject, usages, and private key), and if it doesn't match, I'm not really sure what the kubelet would do... hard fail, I guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 21, 2017
@marun
Copy link
Contributor

marun commented Jun 21, 2017

/retest

@mikedanese
Copy link
Member Author

unit tests are actually broken.

@mikedanese mikedanese added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Jun 21, 2017
@mikedanese mikedanese force-pushed the bootstrap-resume branch 2 times, most recently from f35f1ae to 46c8085 Compare June 22, 2017 09:56
@mikedanese
Copy link
Member Author

/retest

@gmarek
Copy link
Contributor

gmarek commented Jun 22, 2017

@dchen1107 - this is a release blocker from scalability perspective. Without is it's impossible to start clusters noticeably bigger than 300 Nodes.

@mikedanese
Copy link
Member Author

@liggitt @jcbsmpsn addressed comments

@mikedanese
Copy link
Member Author

I was able to bring up an 1000 node cluster without a problem with this patch.

// certificates and with ensureCompatible.
func digestedName(privateKeyData []byte, subject *pkix.Name, usages []certificates.KeyUsage) string {
hash := sha512.New512_256()
hash.Write(privateKeyData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need separators between fields written to the hash

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@liggitt
Copy link
Member

liggitt commented Jun 22, 2017

actually, looks like we have code that cleans up the generated key file if the CSR bootstrap fails. if we want to reuse the existing private key on the next bootstrap, we need to leave that. can you test the following to make sure this works as expected:

  1. start up bootstrap kubelet without the approver controller running
  2. manually reject the CSR with kubectl (the kubelet should die)
  3. manually approve the CSR with kubectl (the signing controller should sign)
  4. start up the kubelet again, make sure the CSR is stable (reuses key, calculates same name, finds existing CSR and uses the cert)

@mikedanese
Copy link
Member Author

ok. I had not tested with (semi) graceful exit, just sigterms.

Right now the kubelet creates a new csr object with the same key every
time it restarts during the bootstrap process. It should resume with the
old csr object if it exists. To do this the name of the csr object must
be stable. Also using a list watch here eliminates a race condition
where a watch event is missed and the kubelet stalls.
@jcbsmpsn
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 22, 2017
@dchen1107
Copy link
Member

/approve based on @liggitt and @jcbsmpsn review. Thanks!

@mikedanese mikedanese added do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 22, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, jcbsmpsn, mikedanese

Associated issue: 47855

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@gmarek
Copy link
Contributor

gmarek commented Jun 23, 2017

/test pull-kubernetes-e2e-gce-etcd3

@mikedanese
Copy link
Member Author

These look like flakes

@mikedanese
Copy link
Member Author

/retest

@mikedanese mikedanese removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Jun 23, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)

@k8s-github-robot k8s-github-robot merged commit fcfbfec into kubernetes:master Jun 23, 2017
@mikedanese mikedanese deleted the bootstrap-resume branch June 23, 2017 11:28
@dchen1107 dchen1107 added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jun 23, 2017
@caesarxuchao
Copy link
Member

@mikedanese can you try to manually cherrypick this one? It causes conflicts. I'll batch cherry-pick others first.

@caesarxuchao
Copy link
Member

i'll do a manual cherrypick.

caesarxuchao added a commit that referenced this pull request Jun 23, 2017
…#47856-upstream-release-1.7

Automated cherry pick of #47856
@caesarxuchao
Copy link
Member

It's cherrypicked to release-1.7 already. The cherrypick bot should had removed the cherrypick-approved label. I'll manually remove the label.

@caesarxuchao caesarxuchao removed the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jun 27, 2017
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-blocker release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants