Kubeadm: check health status of all control plane components in wait-control-plane phase for init #119598
Changes from all commits
b30ec06
823dc89
3f4ad62
a00df8d
b041dcc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,183 @@ | ||
/* | ||
Copyright 2023 The Kubernetes Authors. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
*/ | ||
|
||
package controlplane | ||
|
||
import ( | ||
"context" | ||
"encoding/json" | ||
"fmt" | ||
"io" | ||
"os" | ||
"path/filepath" | ||
"time" | ||
|
||
"github.com/pkg/errors" | ||
"gopkg.in/yaml.v2" | ||
|
||
v1 "k8s.io/api/core/v1" | ||
"k8s.io/apimachinery/pkg/util/wait" | ||
"k8s.io/client-go/rest" | ||
"k8s.io/klog/v2" | ||
kubeletconfig "k8s.io/kubelet/config/v1beta1" | ||
|
||
kubeadmconstants "k8s.io/kubernetes/cmd/kubeadm/app/constants" | ||
"k8s.io/kubernetes/cmd/kubeadm/app/phases/kubeconfig" | ||
"k8s.io/kubernetes/cmd/kubeadm/app/util/staticpod" | ||
) | ||
|
||
// ControlPlaneComponents contains all components in control plane | ||
var ControlPlaneComponents = []string{ | ||
kubeadmconstants.KubeAPIServer, | ||
kubeadmconstants.KubeControllerManager, | ||
kubeadmconstants.KubeScheduler, | ||
} | ||
Comment on lines +42 to +47
let's make this a function instead that returns a private list:
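One way to read this suggestion, as a minimal sketch: keep the slice unexported and hand out a copy through an accessor, so callers cannot mutate the shared list. The names `controlPlaneComponents` and `getControlPlaneComponents` are hypothetical, and the string literals stand in for the `kubeadmconstants` values used in the diff.

```go
package main

// controlPlaneComponents is unexported so other packages cannot modify it.
// The literals are stand-ins for the kubeadmconstants values (assumption).
var controlPlaneComponents = []string{
	"kube-apiserver",
	"kube-controller-manager",
	"kube-scheduler",
}

// getControlPlaneComponents (hypothetical name) returns a copy of the
// private list, so changes to the result never reach the package-level slice.
func getControlPlaneComponents() []string {
	out := make([]string, len(controlPlaneComponents))
	copy(out, controlPlaneComponents)
	return out
}
```

Returning a copy costs one small allocation per call, which seems acceptable for a three-element list read a handful of times.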
|
||
|
||
type component struct { | ||
name string | ||
labels map[string]string | ||
touched bool | ||
} | ||
|
||
// WaitForControlPlaneComponents wait for control plane component to be ready by check pod status returned by kubelet | ||
func WaitForControlPlaneComponents(componentNames []string, timeout time.Duration, manifestDir, kubeletDir, certificatesDir string) error { | ||
certFile := filepath.Join(certificatesDir, kubeadmconstants.APIServerKubeletClientCertName) | ||
keyFile := filepath.Join(certificatesDir, kubeadmconstants.APIServerKubeletClientKeyName) | ||
Comment on lines +57 to +58
can we use the cert and key for the kubelet client stored in /etc/kubernetes/kubelet.conf (they link to files under /var/lib/kubelet...)?
Do you mean file
curl -k --cert /var/lib/kubelet/pki/kubelet-client-current.pem --key /var/lib/kubelet/pki/kubelet-client-current.pem https://127.0.0.1:10250/pods
got the following result:
By the way, why are the cert and key of apiserver-kubelet-client not suitable for fetching pod info from the kubelet?
i was wondering if we can just use the less privileged kubelet cert. apparently not. let's continue with the apiserver cert for now. thanks
|
||
|
||
client, err := rest.HTTPClientFor(&rest.Config{ | ||
TLSClientConfig: rest.TLSClientConfig{ | ||
CertFile: certFile, | ||
KeyFile: keyFile, | ||
Insecure: true, | ||
}, | ||
}) | ||
if err != nil { | ||
return errors.Wrap(err, "failed to create kubelet client") | ||
} | ||
|
||
kubeletEndpoint, err := getKubeletEndpoint(filepath.Join(kubeletDir, kubeadmconstants.KubeletConfigurationFileName)) | ||
if err != nil { | ||
return errors.Wrap(err, "failed to get kubelet endpoint") | ||
} | ||
|
||
components := make([]*component, len(componentNames)) | ||
for i, name := range componentNames { | ||
labels, err := getComponentLabels(name, manifestDir) | ||
if err != nil { | ||
return errors.Wrapf(err, "failed to get pod labels of %s component", name) | ||
} | ||
|
||
components[i] = &component{name, labels, false} | ||
} | ||
|
||
return wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, false, func(ctx context.Context) (bool, error) { | ||
can we use the same retry interval as the legacy waiter for API server:
|
||
klog.V(1).Infoln("[control-plane] polling status of control plane components...") | ||
|
||
resp, err := client.Get(kubeletEndpoint) | ||
if err != nil { | ||
fmt.Printf("[kubelet client] Error getting pods [%v]\n", err) | ||
instead of printing these errors we can store the last error as a
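The idea of remembering the most recent failure instead of printing it on every poll could look roughly like the sketch below. It uses only the standard library rather than `wait.PollUntilContextTimeout`, and unlike the diff it runs the first check immediately; `pollWithLastError` and `errKubeletUnreachable` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errKubeletUnreachable is a hypothetical sentinel used only in this sketch.
var errKubeletUnreachable = errors.New("kubelet unreachable")

// pollWithLastError calls check once per interval until it reports done or
// the timeout expires. Transient errors are stored, not printed, and the
// most recent one is wrapped into the timeout error.
func pollWithLastError(interval, timeout time.Duration, check func() (bool, error)) error {
	deadline := time.After(timeout)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var lastErr error
	for {
		done, err := check()
		if done {
			return nil
		}
		if err != nil {
			lastErr = err
		}
		select {
		case <-deadline:
			if lastErr != nil {
				return fmt.Errorf("timed out waiting for condition: %w", lastErr)
			}
			return errors.New("timed out waiting for condition")
		case <-ticker.C:
			// Next attempt.
		}
	}
}
```

With this shape the user sees a single error at the end that explains why the wait failed, instead of one printed line per retry.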
|
||
return false, nil | ||
} | ||
|
||
defer resp.Body.Close() | ||
unless i'm mistaken the linter in this repo will complain that the return value is not checked.
Yes, the linter did complain about this, but it is not a required check, and I also found some old code that does not check the error returned when closing a response body, so I just left it as it is. I will check this returned error in a new commit.
|
||
|
||
data, err := io.ReadAll(resp.Body) | ||
if err != nil { | ||
fmt.Printf("[kubelet client] Error reading pods from response body [%v]\n", err) | ||
return false, nil | ||
} | ||
|
||
pods := &v1.PodList{} | ||
if err := json.Unmarshal(data, pods); err != nil { | ||
fmt.Printf("[kubelet client] Error parsing pods from response body: %q\n", data) | ||
return false, nil | ||
} | ||
|
||
for _, comp := range components { | ||
labels := comp.labels | ||
match_pod: | ||
for _, pod := range pods.Items { | ||
podLabels := pod.ObjectMeta.Labels | ||
for key, value := range labels { | ||
if podLabels[key] != value { | ||
continue match_pod | ||
} | ||
} | ||
|
||
comp.touched = true | ||
on a quick look it's not obvious why the code is tracking labels and touched state. can you elaborate why status.Ready is not enough?
ok, i will check the comments later.
|
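To make the label and `touched` logic easier to discuss, here is a standalone sketch of what the loop appears to do: a component counts as found only if some pod carries all of its labels, and as ready only if every container of every matching pod is ready. The `pod` type is a pared-down stand-in for `v1.Pod`, and the `component`/`tier` label keys in the usage below are assumptions about the static pod manifests. Note that this sketch requires the label key to be present, whereas the diff compares against the zero value for missing keys.

```go
package main

// pod is a minimal stand-in for v1.Pod with only the fields used here.
type pod struct {
	labels          map[string]string
	containersReady []bool
}

// labelsMatch reports whether every key/value pair in want appears in have.
func labelsMatch(want, have map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// componentReady returns found=true if at least one pod matches the wanted
// labels (the "touched" flag in the diff), and ready=true only if a match
// exists and all containers of all matching pods are ready.
func componentReady(want map[string]string, pods []pod) (found, ready bool) {
	allReady := true
	for _, p := range pods {
		if !labelsMatch(want, p.labels) {
			continue
		}
		found = true
		for _, r := range p.containersReady {
			if !r {
				allReady = false
			}
		}
	}
	return found, found && allReady
}
```

The separate `found` result covers the case where the kubelet has not yet created the static pod at all: an empty pod list would otherwise pass the readiness loop trivially.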
||
|
||
for _, status := range pod.Status.ContainerStatuses { | ||
if !status.Ready { | ||
klog.V(1).Infof("[control-plane] component: %s is not ready\n", comp.name) | ||
return false, nil | ||
} | ||
} | ||
|
||
klog.V(1).Infof("[control-plane] component: %s is ready\n", comp.name) | ||
} | ||
} | ||
|
||
for _, comp := range components { | ||
if !comp.touched { | ||
fmt.Printf("[kubelet client] Couldn`t find pod for component: %s with labels: [%v]\n", comp.name, comp.labels) | ||
return false, nil | ||
} | ||
} | ||
|
||
return true, nil | ||
}) | ||
} | ||
|
||
func getKubeletEndpoint(configFile string) (string, error) { | ||
config := &kubeletconfig.KubeletConfiguration{} | ||
|
||
data, err := os.ReadFile(configFile) | ||
if err != nil { | ||
return "", err | ||
} | ||
|
||
if err := yaml.Unmarshal(data, config); err != nil { | ||
return "", err | ||
} | ||
|
||
if config.Authorization.Mode == kubeletconfig.KubeletAuthorizationModeWebhook { | ||
// make sure cluster admins role binding is created, thus request to kubelet will pass server authorization | ||
if _, err := kubeconfig.EnsureAdminClusterRoleBinding(kubeadmconstants.KubernetesDir, nil); err != nil { | ||
isn't the RB already created at this point?
This RB is created in function
if the CRB is required earlier perhaps we can call a in runWaitControlPlanePhase there is a but, in this discussion we talked about using the apiserver client cert/key i.e.
kubeconfig.EnsureAdminClusterRoleBinding() ensures that the
previously
kubelet server will install auth filter with
Why put this code in Now, I need to revert
there might be a "chicken and egg" problem, because data.ClientWithoutBootstrap was used because data.Client() does not work yet. i.e. the CRB cannot be performed due to apiserver pod not ready. you can try it, of course.
i see, so it's because of this: if it's possible to create the Client() earlier that might be best, but if it's not, perhaps we can:
|
|
||
return "", err | ||
} | ||
} | ||
|
||
kubeletPort := config.Port | ||
if kubeletPort == 0 { | ||
kubeletPort = kubeadmconstants.KubeletPort | ||
} | ||
|
||
return fmt.Sprintf("https://127.0.0.1:%d/pods", kubeletPort), nil | ||
} | ||
|
||
func getComponentLabels(component string, manifestDir string) (map[string]string, error) { | ||
pod, err := staticpod.ReadStaticPodFromDisk(kubeadmconstants.GetStaticPodFilepath(component, manifestDir)) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
labels := pod.ObjectMeta.Labels | ||
if labels == nil { | ||
return nil, errors.New("Empty labels") | ||
} | ||
|
||
return labels, nil | ||
} |
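The port fallback in `getKubeletEndpoint` can be isolated into a small pure function, which makes it easy to test without reading a kubelet config file. A minimal sketch, assuming 10250 as the default secure kubelet port (the same port used in the curl command earlier in this review); `kubeletPodsURL` and `defaultKubeletPort` are hypothetical names.

```go
package main

import "fmt"

// defaultKubeletPort stands in for kubeadmconstants.KubeletPort (assumption).
const defaultKubeletPort = 10250

// kubeletPodsURL builds the local kubelet /pods endpoint, falling back to
// the default port when the configuration leaves the port unset (zero).
func kubeletPodsURL(port int32) string {
	if port == 0 {
		port = defaultKubeletPort
	}
	return fmt.Sprintf("https://127.0.0.1:%d/pods", port)
}
```

For example, `kubeletPodsURL(0)` yields `https://127.0.0.1:10250/pods`, while an explicitly configured port is used unchanged.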
a new hidden phase seems ok.
i guess on join we are not calling WaitForAPI or have a dedicated phase to wait for the apiserver to come up?
Do you mean we need to add the legacy WaitForAPI code in this hidden phase when this feature gate is disabled?
no, i'm just trying to remember what happens on join.
i guess we are not even calling WaitForAPI on join, so adding the new hidden phase seems OK.
later if we decide we can make it non-hidden.