Bug 2044347: bump to k8s 1.23.3 #1145
Conversation
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
In the translation lib RBD plugin, the monitors slice is nil and has to be initialized with an empty value. This commit addresses that. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
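A minimal sketch of the class of fix described in this commit, using a stand-in type rather than the real csi-translation-lib structs (field and function names here are illustrative): a nil monitors slice is replaced with an empty, non-nil slice before it is handed to consumers.

```go
package main

import "fmt"

// rbdVolumeSource is a stand-in for the RBD volume source used by the
// in-tree-to-CSI translation lib; the real type has more fields.
type rbdVolumeSource struct {
	Monitors []string
}

func translateRBD(src *rbdVolumeSource) *rbdVolumeSource {
	out := &rbdVolumeSource{Monitors: src.Monitors}
	// Without this guard, Monitors stays nil when the source had no
	// monitors, which downstream consumers may not expect.
	if out.Monitors == nil {
		out.Monitors = []string{}
	}
	return out
}

func main() {
	translated := translateRBD(&rbdVolumeSource{})
	fmt.Printf("monitors non-nil: %v, len=%d\n", translated.Monitors != nil, len(translated.Monitors))
}
```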
…19 from the rules Signed-off-by: Carlos Panato <ctadeu@gmail.com>
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
In case stacked etcd is used, the code that does expiration checks does not validate whether the etcd CA is "external" (missing key) and whether the etcd CA signed certificates are valid. Add a new function UsingExternalEtcdCA(), similar to the existing functions for the cluster CA and front-proxy CA, that performs the checks for a missing etcd CA key and for certificate validity. This function only runs for stacked etcd, since if etcd is external kubeadm does not track any certs signed by that etcd CA. This fixes a bug where the etcd CA would be reported as local even if etcd/ca.key is missing during "certs check-expiration".
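A rough sketch of the kind of check this commit describes, with hypothetical helper names and file paths (the real implementation lives in kubeadm's certs handling and differs in detail): the etcd CA counts as external when its key is missing, and certificates it issued can be validated against the CA cert alone.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// usingExternalEtcdCA reports whether the etcd CA should be treated as
// "external": the CA certificate exists but its private key does not.
// Paths are illustrative; kubeadm derives them from its certs directory.
func usingExternalEtcdCA(certsDir string) (bool, error) {
	caCert := certsDir + "/etcd/ca.crt"
	caKey := certsDir + "/etcd/ca.key"

	if _, err := os.Stat(caCert); err != nil {
		return false, fmt.Errorf("etcd CA certificate not found: %w", err)
	}
	if _, err := os.Stat(caKey); errors.Is(err, os.ErrNotExist) {
		// Key is missing: kubeadm cannot sign with this CA, so it is external.
		return true, nil
	}
	return false, nil
}

// validateSignedBy checks that a leaf certificate chains to the given CA cert.
func validateSignedBy(leafPEM, caPEM []byte) error {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return errors.New("could not parse CA certificate")
	}
	block, _ := pem.Decode(leafPEM)
	if block == nil {
		return errors.New("could not decode leaf certificate PEM")
	}
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	_, err = leaf.Verify(x509.VerifyOptions{Roots: pool})
	return err
}

func main() {
	external, err := usingExternalEtcdCA("/etc/kubernetes/pki")
	fmt.Println("external etcd CA:", external, err)
	_ = validateSignedBy
}
```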
Apply a small fix to ensure the kubeconfig files that kubeadm manages have a CA when printed in the table of the "check expiration" command. "CAName" is the field used for that. In practice kubeconfig files can contain multiple credentials from different CAs, but this is not supported by kubeadm and there is a single cluster CA that signs the single client cert/key in kubeadm managed kubeconfigs.
When the "kubeadm certs check-expiration" command is used and if the ca.key is not present, regular on disk certificate reads pass fine, but fail for kubeconfig files. The reason for the failure is that reading of kubeconfig files currently requires reading both the CA key and cert from disk. Reading the CA is done to ensure that the CA cert in the kubeconfig is not out of date during renewal. Instead of requiring both a CA key and cert to be read, only read the CA cert from disk, as only the cert is needed for kubeconfig files. This fixes printing the cert expiration table even if the ca.key is missing on a host (i.e. the CA is considered external).
Revert to the previous behavior in 1.21/1.20 of setting the pod phase to failed during graceful node shutdown. Setting pods to the failed phase ensures that external controllers that manage pods, like deployments, will create new pods to replace those that are shut down. Many customers have taken a dependency on this behavior, and it was a breaking change in 1.22, so this change reverts back to the previous behavior. Signed-off-by: David Porter <david@porter.me>
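For illustration, a hedged sketch of what setting the failed phase amounts to at the API level; the real kubelet change goes through the node shutdown manager and the status manager rather than mutating a pod object directly like this, and the Reason/Message strings are illustrative.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// markShutdownFailed illustrates the reverted behavior: a pod terminated by
// a graceful node shutdown is moved to the Failed phase, so workload
// controllers (Deployments, StatefulSets, ...) will replace it elsewhere.
func markShutdownFailed(pod *v1.Pod) {
	pod.Status.Phase = v1.PodFailed
	pod.Status.Reason = "Terminated"
	pod.Status.Message = "Pod was terminated in response to imminent node shutdown."
}

func main() {
	pod := &v1.Pod{}
	markShutdownFailed(pod)
	fmt.Println(pod.Status.Phase, pod.Status.Reason)
}
```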
…-fix-123 kubelet: set failed phase during graceful shutdown
…k-of-#106641-upstream-release-1.23 Automated cherry pick of kubernetes#106641: DelegateFSGroupToCSIDriver e2e: skip tests with chgrp
…pick-of-#106891-origin-release-1.23 Automated cherry pick of kubernetes#106891: kubeadm: validate local etcd certificates during
…pick-of-#106927-origin-release-1.23 Automated cherry pick of kubernetes#106854: kubeadm: avoid requiring a CA key during kubeconfig
…ck-of-#106878-upstream-release-1.23 Automated cherry pick of kubernetes#106878: rbd: initialize ceph monitors slice with an empty value.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Previously these would return lists that were too long because we appended to pre-initialized lists with a specific size. Since the primary place these functions are used is in the mean and standard deviation calculations for the NUMA distribution algorithm, it meant that the results of these calculations were often incorrect. As a result, some of the unit tests we have are actually incorrect (because the results we expect do not actually produce the best balanced distribution of CPUs across all NUMA nodes for the input provided). These tests will be patched up in subsequent commits. Signed-off-by: Kevin Klues <kklues@nvidia.com>
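The diff is not shown here, but the class of bug described is easy to reproduce in isolation: appending to a slice created with a non-zero length (instead of a non-zero capacity) yields leading zero values and an over-long result, which then skews any mean or standard deviation computed over it.

```go
package main

import "fmt"

func main() {
	values := []int{3, 5, 7}

	// Buggy: make([]int, 3) already contains three zeros, so append
	// produces [0 0 0 3 5 7] - six elements instead of three.
	buggy := make([]int, len(values))
	for _, v := range values {
		buggy = append(buggy, v)
	}

	// Fixed: length 0 with capacity pre-allocated, so append fills it
	// with exactly the intended elements.
	fixed := make([]int, 0, len(values))
	for _, v := range values {
		fixed = append(fixed, v)
	}

	fmt.Println(buggy, len(buggy)) // [0 0 0 3 5 7] 6
	fmt.Println(fixed, len(fixed)) // [3 5 7] 3
}
```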
This fixes two related tests to better test our "balanced" distribution algorithm.

The first test originally provided an input with the following number of CPUs available on each NUMA node:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 20

It then attempted to distribute 48 CPUs across them with an expectation that each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node 0 with no more CPUs in the end). This would have resulted in the following amount of CPUs on each node:

Node 0: 0
Node 1: 4
Node 2: 4
Node 3: 20

Which results in a standard deviation of 7.6811

However, a more balanced solution would actually be to pull 16 CPUs from NUMA nodes 1, 2, and 3, and leave 0 untouched, i.e.:

Node 0: 16
Node 1: 4
Node 2: 4
Node 3: 4

Which results in a standard deviation of 5.1961524227066

To fix this test we changed the original number of available CPUs to start with 4 less CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.:

Node 0: 18
Node 1: 20
Node 2: 20
Node 3: 16

So that we end up with a result of:

Node 0: 2
Node 1: 4
Node 2: 4
Node 3: 16

Which pulls the CPUs from where we want and results in a standard deviation of 5.5452

For the second test, we simply reverse the number of CPUs available for Nodes 0 and 3 as:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 18

Which forces the allocation to happen just as it did for the first test, except now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
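For reference, the standard deviations quoted above can be reproduced with a population standard deviation over the per-NUMA-node counts of CPUs left after allocation; a small standalone check, not the actual cpumanager code:

```go
package main

import (
	"fmt"
	"math"
)

// stddev computes the population standard deviation of the per-NUMA-node
// CPU counts, matching the figures quoted in the commit message.
func stddev(counts []float64) float64 {
	var sum float64
	for _, c := range counts {
		sum += c
	}
	mean := sum / float64(len(counts))

	var variance float64
	for _, c := range counts {
		variance += (c - mean) * (c - mean)
	}
	variance /= float64(len(counts))
	return math.Sqrt(variance)
}

func main() {
	fmt.Println(stddev([]float64{0, 4, 4, 20})) // ~7.6811
	fmt.Println(stddev([]float64{16, 4, 4, 4})) // ~5.1962
	fmt.Println(stddev([]float64{2, 4, 4, 16})) // ~5.5453
}
```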
Now that the algorithm for balancing CPU distributions across NUMA nodes is correct, this test actually behaves differently for the "packed" vs. "distributed" allocation algorithms (as it should). In the "packed" case we need to ensure that CPUs are allocated such that they are packed onto cores. Since one CPU is already allocated from a core on NUMA node 0, we want the next CPU to be its hyperthreaded pair (even though the first available CPU id is on Socket 1). In the "distributed" case, however, we want to ensure CPUs are allocated such that we have a balanced distribution of CPUs across all NUMA nodes. This points to allocating from Socket 1 if the only other CPU allocated so far came from Socket 0. To allow CPU allocations to be packed onto full cores, one can use the "distributed" algorithm with a 'cpuGroupSize' equal to the number of hyperthreads per core (in this case 2). We added an explicit test case for this, demonstrating that we get the same result as the "packed" algorithm does, even though the "distributed" algorithm is in use. Signed-off-by: Kevin Klues <kklues@nvidia.com>
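A toy illustration of the 'cpuGroupSize' idea from the last paragraph, with a made-up hyperthread sibling layout: taking CPUs in groups of two (one group per physical core) keeps hyperthread pairs together even while the groups themselves are distributed.

```go
package main

import "fmt"

func main() {
	// Hypothetical hyperthread siblings per physical core on one NUMA node,
	// e.g. CPU 0 and CPU 40 share a core.
	cores := [][]int{{0, 40}, {1, 41}, {2, 42}}

	cpuGroupSize := 2 // = hyperthreads per core, so each group is a whole core
	need := 4

	var allocated []int
	for _, siblings := range cores {
		if len(allocated) >= need {
			break
		}
		// Take CPUs cpuGroupSize at a time so a core is never split.
		allocated = append(allocated, siblings[:cpuGroupSize]...)
	}
	fmt.Println(allocated) // [0 40 1 41]
}
```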
Previously the algorithm was too restrictive because it tried to calculate the minimum based on the number of *available* NUMA nodes and the number of *available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how many CPUs an individual NUMA node happened to have, the average across them was used. Using this value, however, could result in thinking you need more NUMA nodes to possibly satisfy a request than you actually do. By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get the true minimum number of nodes required to satisfy a request. For a given "current" allocation this may not be the true minimum, but it's better to start with fewer and move up than to start with too many and miss out on a better option. Signed-off-by: Kevin Klues <kklues@nvidia.com>
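Assuming every NUMA node has the same total CPU count, the "true minimum" described above reduces to a ceiling division; a minimal sketch, not the cpumanager's actual helper:

```go
package main

import "fmt"

// minNUMANodesNeeded returns the smallest number of NUMA nodes that could
// possibly satisfy a request, based on the *total* CPUs per node rather
// than on how many happen to be free - the relaxation described above.
func minNUMANodesNeeded(requestedCPUs, cpusPerNUMANode int) int {
	return (requestedCPUs + cpusPerNUMANode - 1) / cpusPerNUMANode
}

func main() {
	// 48 CPUs requested on a machine with 20 CPUs per NUMA node: at least
	// 3 nodes are required, even if fewer currently have free CPUs.
	fmt.Println(minNUMANodesNeeded(48, 20)) // 3
}
```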
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a NUMA node that has no more CPUs to allocate. Moreover, it was only considering allocation of remainder CPUs from NUMA nodes such that each NUMA node in the remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these two issues in play, one could end up with an accounting error where not enough CPUs were allocated by the time the algorithm runs to completion. The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from the set of NUMA nodes considered for allocating remainder CPUs. Additionally, we now consider *all* combinations of nodes from the remainder set of size 1..len(remainderSet). This allows us to find a better solution if allocating CPUs from a smaller set leads to a more balanced allocation. Finally, we loop through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have been accounted for and allocated. This ensures that we will not hit an accounting error later on, because we explicitly remove CPUs from the remainder set until there are none left. A follow-on commit adds a set of unit tests that fail before these changes but succeed after them. Signed-off-by: Kevin Klues <kklues@nvidia.com>
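A schematic of the two adjustments described, with made-up types standing in for the cpumanager's internals: NUMA nodes with zero free CPUs are dropped from the remainder set, and all subsets of size 1..len(remainderSet) are considered as candidates.

```go
package main

import "fmt"

// combinations returns all k-element subsets of nodes; a stand-in for the
// helper the allocator would use when evaluating candidate remainder sets.
func combinations(nodes []int, k int) [][]int {
	var result [][]int
	var walk func(start int, current []int)
	walk = func(start int, current []int) {
		if len(current) == k {
			result = append(result, append([]int(nil), current...))
			return
		}
		for i := start; i < len(nodes); i++ {
			walk(i+1, append(current, nodes[i]))
		}
	}
	walk(0, nil)
	return result
}

func main() {
	// Hypothetical free CPUs left on each NUMA node after the main pass.
	freeCPUs := map[int]int{0: 0, 1: 2, 2: 1, 3: 2}
	nodes := []int{0, 1, 2, 3}

	// First adjustment: drop NUMA nodes with no CPUs left to give.
	var remainderSet []int
	for _, node := range nodes {
		if freeCPUs[node] > 0 {
			remainderSet = append(remainderSet, node)
		}
	}

	// Second adjustment: consider every subset size from 1 up to the full
	// remainder set, not just subsets where each node gives one group.
	for k := 1; k <= len(remainderSet); k++ {
		for _, subset := range combinations(remainderSet, k) {
			fmt.Println("candidate remainder subset:", subset)
		}
	}
}
```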
@soltysh: Overrode contexts on behalf of soltysh: ci/prow/verify-commits
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/payload 4.10 ci blocking
@soltysh: trigger 5 jobs of type blocking for the ci release of OCP 4.10
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4b7bcd0-7f5b-11ec-8d7c-d99303b7cdb0-0
trigger 5 jobs of type informing for the ci release of OCP 4.10
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4b7bcd0-7f5b-11ec-8d7c-d99303b7cdb0-1
trigger 7 jobs of type blocking for the nightly release of OCP 4.10
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4b7bcd0-7f5b-11ec-8d7c-d99303b7cdb0-2
trigger 44 jobs of type informing for the nightly release of OCP 4.10
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4b7bcd0-7f5b-11ec-8d7c-d99303b7cdb0-3
/hold cancel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: soltysh, wallylewis
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test e2e-aws-downgrade
1 similar comment
/override ci/prow/e2e-aws-downgrade
@soltysh: Overrode contexts on behalf of soltysh: ci/prow/e2e-aws-downgrade
/override ci/prow/verify-commits
@soltysh: Overrode contexts on behalf of soltysh: ci/prow/verify-commits
/retest-required Please review the full test history for this PR and help us cut down flakes.
6 similar comments
/override ci/prow/e2e-aws-fips
@soltysh: Overrode contexts on behalf of soltysh: ci/prow/e2e-aws-fips
@soltysh: all tests passed! Full PR test history. Your PR dashboard.
@soltysh: Some pull requests linked via external trackers have merged; the following pull requests linked via external trackers have not merged. These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh. Bugzilla bug 2044347 has not been moved to the MODIFIED state.
This PR has the following changes:
Replaces #1139