Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: upload the ClusterConfiguration during the upgrade #77513

Merged
merged 1 commit into from
May 9, 2019
Merged

kubeadm: upload the ClusterConfiguration during the upgrade #77513

merged 1 commit into from
May 9, 2019

Conversation

ereslibre
Copy link
Contributor

@ereslibre ereslibre commented May 6, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
During the upgrade process, kubeadm will take the current
ClusterConfiguration, update the KubernetesVersion to the latest
version, and call to UploadConfig.

This change makes sure that when the mutation happens, not only the
ClusterStatus is mutated, but the ClusterConfiguration object
inside the kubeadm-config ConfigMap as well; it will contain the
new KubernetesVersion.

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1556

Special notes for your reviewer:
This is the straightforward fix with the least intrusive changes. However, I think the upgrade path could instead of calling to UploadConfig make explicit what it needs to mutate inside the kubeadm-config ConfigMap (I think it's only the KubernetesVersion inside the ClusterConfiguration object).

This can be refactored later if desired, I think it would be better for the upgrade path to be explicit about what it mutates instead of blindly calling to UploadConfig.

Does this PR introduce a user-facing change?:

NONE

/assign @fabriziopandini
/assign @rosti
/assign @neolit123

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 6, 2019
@k8s-ci-robot k8s-ci-robot requested review from liztio and rosti May 6, 2019 16:00
@ereslibre
Copy link
Contributor Author

ereslibre commented May 6, 2019

After this change, I can see the kubeadm-config configmap correctly mutated with the right kubernetesVersion. I upgraded from 1.14.1 to v1.15.0-alpha.2.278+ad57e7408dd918 using a kubeadm that had this patch.

vagrant@master:~$ kubectl get configmap kubeadm-config -n kube-system -o yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    kind: ClusterConfiguration
    kubernetesVersion: v1.15.0-alpha.2.278+ad57e7408dd918
    <snip>
  <snip>
kind: ConfigMap
<snip>

@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2019
During the upgrade process, `kubeadm` will take the current
`ClusterConfiguration`, update the `KubernetesVersion` to the latest
version, and call to `UploadConfiguration`.

This change makes sure that when the mutation happens, not only the
`ClusterStatus` is mutated, but the `ClusterConfiguration` object
inside the `kubeadm-config` ConfigMap as well; it will contain the
new `KubernetesVersion`.
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the quick fix @ereslibre
@fabriziopandini PTAL.

I think the upgrade path could instead of calling to UploadConfig make explicit what it needs to mutate inside the kubeadm-config ConfigMap (I think it's only the KubernetesVersion inside the ClusterConfiguration object).

hm, i think at this point (e.g. during upgrade) the ClusterConfiguration could have changed from v1beta1 to v1beta2, so we need to upload the whole object instead of patching the version field only?

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 6, 2019
@neolit123
Copy link
Member

/retest

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2019
@ereslibre
Copy link
Contributor Author

hm, i think at this point (e.g. during upgrade) the ClusterConfiguration could have change from v1beta1 to v1beta2, so we need to upload the whole object instead of patching the version field only?

You are right, dismiss that comment then, uploading the whole versioned object is the right thing to do. I omitted any reference to that in the commit message on purpose.

@neolit123
Copy link
Member

neolit123 commented May 6, 2019

You are right, dismiss that comment then, uploading the whole versioned object is the right thing to do. I omitted any reference to that in the commit message on purpose.

or we can check if the API object version changed, and only do upload then. but it's probably faster to just re-upload.

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 6, 2019
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ereslibre
If I got it right, probably the cleanest solution is to implement one function:using create or replace for the full upload path (to be used by init, upgrade, config-upload) and one function using create or mutate for the change ClusterStatus path (to be used by join, reset).
That's means that only join/reset operations can be executed in parallel, but I`m fine with this.
However, let's discuss this tomorrow in the kubeadm office hours

@ereslibre
Copy link
Contributor Author

ereslibre commented May 7, 2019

I agree that UploadConfiguration has way too many use cases: init, join, upgrade:

  • init should only care about creating the configmap, and by definition does not need to update or mutate the configmap.
  • join should only care about modifying the cluster status.
  • upgrade should only care about modifying the cluster configuration.

Reset is already split on ResetClusterStatusForNode. The question IMO would be if we should refactor the previous actions in different code paths because they perform different actions. Right now, this PR ensures that the ConfigMap update is safe in terms of concurrency, and it's basically the same logic as before when we were using a mere CreateOrUpdateConfigMap. The reason why this is working is that UploadConfiguration calls to CreateOrMutateConfigMap, and:

  • init will never execute the mutation logic, it will succeed on the creation because by definition in this case there's no ConfigMap at all yet.
  • join mutates the ClusterStatus but fetches the ClusterConfiguration from the cluster, not performing any change to it.
  • upgrade mutates the ClusterConfiguration fetched from the cluster, not performing any change in the ClusterStatus.

They are all safe in terms of concurrency because if you have competing clients, only one will succeed with the change, the rest will be returned a 409 Conflict by the apiserver, and since the CreateOrMutateConfigMap has the retry on conflict logic baked in, the rest of the clients will get the resource again, call to the mutator gofunc, update again [again, at this point, only one will succeed, the rest will perform the same loop again, until all of them succeed, or the current client retry limit of 20 is reached].

@ereslibre
Copy link
Contributor Author

All that being said, I don't know if that refactor should be done on this PR.

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix @ereslibre !
I am fine with merging this now and do a refactoring later on.

@ereslibre
Copy link
Contributor Author

@neolit123 @fabriziopandini, do you think we can remove the hold?

@neolit123
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 8, 2019
@k8s-ci-robot k8s-ci-robot merged commit eda5a29 into kubernetes:master May 9, 2019
@ereslibre ereslibre deleted the upload-cluster-configuration-after-upgrade branch June 6, 2019 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible regression on upgrade on master
5 participants