Release Prep to v0.13.x branch #1589
Conversation
…en away in the api package.
- Update to k8s v1.13.5.
- Put the worker/kubelet and admin certs on the controllers.
- Disable the apiserver insecure port 8080 - only https on 443 is allowed.
- Configure the controllers' kubelet to do TLS bootstrapping the same as the workers (if >=1.14).
- Update networking components (calico v3.6.1, flannel v0.11.0).
- Enable PodPriority by default (see the sketch after this list).
- Enable metrics-server by default and remove heapster.
- Enable CoreDNS for cluster DNS resolution.
- Refactor install-kube-system (group related manifests for clarity and deploy with a single apply/delete for performance).
- Update install-kube-system to clean up deprecated services and objects (e.g. heapster).
- Update kiam to 3.2 - WARNING! The kiam server certificate now needs to be re-generated to include the SAN "kiam-server" (previously it was just kiam-server:443).
- Remove experimental settings for TLSBootstrap, PodPriority, NodeAuthorizer and PersistentVolumeClaimResize.
- Remove the experimental Mutating and Validating Webhooks, which are now enabled by default.
- Update the node role label to node.kubernetes.io/role, which is allowed by the NodeRestriction admission controller.
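Since PodPriority moves from an experimental setting to on-by-default here, priority classes become usable out of the box. A minimal sketch of one (the name, value and API version are my assumptions for a 1.13 cluster, not something this PR ships):

```yaml
# Hedged illustration only: a PriorityClass usable once PodPriority is enabled.
# Name and value are made up; kube-aws itself does not ship this object.
apiVersion: scheduling.k8s.io/v1beta1   # assumption: the v1beta1 group available in k8s 1.13
kind: PriorityClass
metadata:
  name: example-high-priority
value: 100000
globalDefault: false
description: "Example class for pods that should be scheduled ahead of best-effort workloads."
```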
…branch and then switch to using them in the 0.14 release branch. Disable the NodeRestriction admission controller in the 0.13 release branch.
Codecov Report
@@            Coverage Diff             @@
##           v0.13.x     #1589     +/-   ##
===========================================
- Coverage    25.87%    25.67%   -0.21%
===========================================
  Files           98        98
  Lines         5074      5052      -22
===========================================
- Hits          1313      1297      -16
+ Misses        3614      3610       -4
+ Partials       147       145       -2
Continue to review full report at Codecov.
NodeStatusUpdateFrequency is not definable for Controller nodes
…ization Add RBAC objects to allow unauthenticated access to the kubelet's /healthz endpoint (so that cfn-signal can curl it without creds)
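For illustration, a minimal sketch of what such RBAC objects could look like, assuming the kubelet's webhook authorizer maps /healthz onto the nodes/proxy subresource on this Kubernetes version; the object names are made up and the manifests kube-aws actually renders may be scoped differently:

```yaml
# Hedged sketch only - not necessarily the exact objects kube-aws renders.
# Grants the anonymous user read access to the kubelet subresource that
# /healthz is assumed to map to, so cfn-signal can curl it without credentials.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-healthz-anonymous   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-healthz-anonymous   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet-healthz-anonymous
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: system:anonymous
```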
…do things after the plugin manifests and/or helm charts have been deployed)
…ogs to authenticated users
I'm trying to upgrade a test cluster created with kube-aws 0.12.3, and some of the cluster resources in the kube-system namespace refuse to start when the control-plane nodes are being updated, resulting in a stack rollback. The etcd and network stacks did update nicely. The test cluster has three etcd nodes and two controller nodes, plus two regular worker nodes.
Hi, thanks for testing @paalkr! Do you have any more details on what the errors are?
… that controller nodes can create mirror pods. Remove writing the kube-aws version to the motd - it was causing extended rolls just to update a version number which is available on a tag anyway.
@paalkr do you have any further details? I was able to provision a v0.12.3 cluster and then upgrade to v0.13.0, but my cluster.yaml and cluster state could be very different from yours! I'm sure there is an issue here, but we'll have to dig deeper in order to pinpoint what is happening! Could you perhaps try the upgrade and capture the kubelet logs on a node which fails? If it's a controller, can you send me:
Thanks! 🙏
Thanks! I will try to gather more information. BTW, I'm updating the cluster in a sequence, following these steps. I will try a new update now to collect more logs.
The install-kube-system log did reveal a problem with the Kubernetes Dashboard deployment, retrying over and over again. This corresponds with a lot of the pods being started and terminated as shown in my previous screenshot.
My initial dashboard configuration was just
Altering to the values as shown below fixed the dashboard deployment, and made the control plane node start successfully and signal OK to the stack :)
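The altered values from this comment did not survive the copy; purely as an illustration of the general shape of explicit container resource requests/limits (the numbers below are made up, not the ones that fixed the deployment):

```yaml
# Illustrative values only - the actual figures used are not shown in the thread.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 300Mi
```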
My next problem is that the kiam server and client version 3.2 are stuck in a crash loop. My initial kube-aws 0.12.3 cluster ran kiam 2.7; I'm trying to upgrade to 3.2. This is the describe output for the server pod
And this is the client
Looking at the kiam default deployment, the command and args have changed from … and … to just /kiam for both server and client, with a different set of args.
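For reference, the upstream kiam v3 deployments drive both components from the single /kiam binary. A sketch along those lines - flag values and certificate paths are taken from the upstream reference manifests and may well differ from what kube-aws 0.13 actually templates out:

```yaml
# Sketch based on the upstream kiam v3 reference deployments, not necessarily
# the exact args in the kube-aws manifests.
# kiam-server container spec fragment:
command:
  - /kiam
args:
  - server
  - --json-log
  - --bind=0.0.0.0:443
  - --cert=/etc/kiam/tls/server.pem
  - --key=/etc/kiam/tls/server-key.pem
  - --ca=/etc/kiam/tls/ca.pem
  - --role-base-arn-autodetect
  - --sync=1m
---
# kiam-agent container spec fragment:
command:
  - /kiam
args:
  - agent
  - --iptables
  - --host-interface=cali+   # interface pattern depends on the CNI in use
  - --json-log
  - --port=8181
  - --cert=/etc/kiam/tls/agent.pem
  - --key=/etc/kiam/tls/agent-key.pem
  - --ca=/etc/kiam/tls/ca.pem
  - --server-address=kiam-server:443
  - --gateway-timeout-creation=1s
```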
Sorry for spamming you with comments @davidmccormick ;) Here is a helm/tiller issue that I also experience after upgrading the control plane: running helm version fails as described in helm/helm#3104.
Hi - many thanks for the extra info here - I fat fingered a change which set default resource limits on the dashboard and didn't notice because in my cluster.yaml they are set explicitly! I've pushed a commit which will hopefully resolve this one.
Ah great! Thanks for pointing that out! I have checked our manifests against the official versions and updated the command-line args. Can you take another look? Many thanks! 🙏
@paalkr regarding the helm/tiller issue - I just pushed a fix to RBAC that I believe should fix things!
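For context, the commonly used Tiller RBAC shape is a dedicated service account bound to cluster-admin; a sketch of that pattern follows (the fix actually pushed to the branch may be scoped differently):

```yaml
# Common Tiller RBAC pattern, shown for context only - not necessarily the
# exact change pushed in this PR.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
```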
Thanks @davidmccormick. I will make a new build and execute some tests.
So what is the plan to support kiam versions prior to 3.0? The new args and command line you added will not work with older versions, e.g. if someone has specified one in their cluster.yaml. Should you check for the kiam version, or will it be enough to document this as a breaking change?
helm/tiller seems to work as intended now. Thanks for fixing this issue.
The changes you introduced to add localhost to the kiam cert, to make the kiam health check work, force you to generate new certificates using kube-aws render certificates (--kiam). That again introduces a lot of pain, like flannel not starting. I'm not sure if the flannel issue is related to what I describe above, but it started to happen after I regenerated the certs. What is the recommended workflow for upgrading a kube-aws 0.12.3 cluster with kiam 2.7 to kube-aws 0.13 with kiam 3.2? I'm not sure which commands to execute to provide whatever logs you might need.
@paalkr I appreciate that the mechanism is a bit clunky! We actually use our own tool and Vault to manage all of the credentials files and certificates. My only suggestion would be to back up your credentials directory, run the re-generate to get new kiam certs, and then restore and replace just your kiam certs.
I am going to merge this into the branch now and make it a beta release. Please do continue to test, but can you raise an issue for any further bugs that you find (one issue for all would be fine)?
@davidmccormick, thanks. My plan regarding the kiam cert was to do exactly as you described. I will continue to test, and can raise a general ticket for 0.13 beta testing results.
Kube-aws 0.13 Release PR
The 0.13.x release adds new node.kubernetes.io/role labels to all nodes but does not use them. They will become active in the 0.14.x release, where the NodeRestriction admission controller will be enabled, which denies the use of the existing labels.
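As an illustration, this is how the new label would appear on a worker Node object (the node name below is hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.eu-west-1.compute.internal   # hypothetical node name
  labels:
    node.kubernetes.io/role: worker   # added by 0.13.x, relied upon from 0.14.x onward
```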
Changes in this release:
- Remove the experimental settings for TLSBootstrap, PodPriority, PodSecurityPolicy, NodeAuthorizer and PersistentVolumeClaimResize, which are all now enabled by default.