Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduces KubeAPIReadinessChecker used by startup monitor to assess Kube API server readiness/health condition #1180

Merged

Conversation

p0lyn0mial
Copy link
Contributor

@p0lyn0mial p0lyn0mial commented Jul 16, 2021

this PR implements the following checks

  • checks if we are not dealing with the old kas
  • checks kube-apiserver /healthz/etcd endpoint
  • checks kube-apiserver /healthz endpoint
  • checks kube-apiserver /readyz endpoint
  • checks if the kas pod is running at the expected revision
  • checks that kubelet has reporting readiness for the new pod

xref: openshift/enhancements#833

@p0lyn0mial
Copy link
Contributor Author

/assign @sttts

@p0lyn0mial p0lyn0mial force-pushed the sno-readiness-checks branch 2 times, most recently from f4601e9 to 92c73fc Compare July 19, 2021 14:23
return doKubeReadyCheck(ctx, ch.client, ch.baseRawURL, 3, 5*time.Second)
},

// doRevisionCheck in "strict" mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is strict? Better describe here

}

// doRevisionCheck checks if the kas pod is running at the expected revision
func doRevisionCheck(ctx context.Context, podClient corev1client.PodInterface, monitorRevision int, strictMode bool) (bool, string, string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be more descriptive with strictMode. What does it mean?

return !strictMode, "NotReady", "waiting for Kube API server pod to show up"
}
if len(apiServerPods.Items) != 1 {
return !strictMode, "NotReady", fmt.Sprintf("unexpected number of Kube API server pods %d, expected only one pod", len(apiServerPods.Items))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we change the name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiplePods

}

if revision != monitorRevision {
return false, "NotReady", fmt.Sprintf("the running Kube API is at unexpected revision %d, expected %d", revision, monitorRevision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use better reasons for all of these. NotReady should be reserved for the time we see the pod in the API, but it is not ready yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WrongRevision

func doRevisionCheck(ctx context.Context, podClient corev1client.PodInterface, monitorRevision int, strictMode bool) (bool, string, string) {
apiServerPods, err := podClient.List(ctx, metav1.ListOptions{LabelSelector: "apiserver=true"})
if err != nil {
return !strictMode, "NotReadyError", fmt.Sprintf("falied to get Kube API server pod due to %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PodListError

return !strictMode, "NotReadyError", fmt.Sprintf("falied to get Kube API server pod due to %v", err)
}
if len(apiServerPods.Items) == 0 {
return !strictMode, "NotReady", "waiting for Kube API server pod to show up"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PodNotRunning

kasPod := apiServerPods.Items[0]
revisionString, found := kasPod.Labels["revision"]
if !found {
return false, "NotReady", fmt.Sprintf("pod %s doesn't have revision label", kasPod.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PodNotRunning

return false, "NotReady", fmt.Sprintf("pod %s doesn't have revision label", kasPod.Name)
}
if len(revisionString) == 0 {
return false, "NotReady", fmt.Sprintf("empty revision label on %s pod", kasPod.Name)
Copy link
Contributor

@sttts sttts Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidPod

return false, "NotReady", fmt.Sprintf("empty revision label on %s pod", kasPod.Name)
}
revision, err := strconv.Atoi(revisionString)
if err != nil || revision < 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidPod

Transport: utilnet.SetTransportDefaults(&http.Transport{
TLSClientConfig: tlsConfig,
}),
Timeout: responseTimeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http.Client.Timeout may timeout even if the connection was sucesfull

https://cs.opensource.google/go/go/+/refs/tags/go1.16.6:src/net/http/client.go;l=90-94

I fixed recently a bug in kops because of this, https://github.com/kubernetes/kops/pull/11884/files#diff-8a8f23d88ae96f6029ac0558e279ed8850568082c913852bfd45c2b54e26166fR81-R93

We are operating in a 2 seconds timeframe that seems to short, but still can race.
What about having more granularity (I'm making up the values here)?

	httpClient := &http.Client{
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   5 * time.Second,
				KeepAlive: 5 * time.Second,
			}).DialContext,
			TLSHandshakeTimeout:   5 * time.Second,
			ResponseHeaderTimeout: 5 * time.Second,
			IdleConnTimeout:       5 * time.Second,
		},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This http client will be talking to Kube API over localhost. In case of an error/timeout requests will be retired for 5 min.

Do we still think 2 seconds is too short?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok, this is always localhost, maybe I'm being too paranoid 😄
these requests doesn't go through the apf thing, right? there is no chance the server can take more than 2 seconds to reply? The different timeouts help to known which part timed out exactly, but as you say this may not apply here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these requests doesn't go through the apf thing, right?

it does, we will authN as "system:master" which always is privileged.

there is no chance the server can take more than 2 seconds to reply?

there is, for example getting a pod might take much longer if etcd is slow for example. I can increase the timeout to 5 seconds.

The purpose of these checks is to assess the readiness of the Kube API server and fallback to the previous version in case of any issues. If the API server is unable to serve requests because the connection to etcd is "slow" we should fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the Pod.List() and other api calls take 2 seconds on busy CIs, can this be a concern here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, this uses the kubeclient not the http.client


// SetRestConfig called by startup monitor to provide a valid configuration for authN/authZ against Kube API server
func (ch *KubeAPIReadinessChecker) SetRestConfig(restConfig *rest.Config) {
ch.restConfig = restConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this restConfig limited by the client-go parameters QPS, burst, RateLimiter?
if affirmative, should we override it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default values QPS=5, Burst=10 should be enough, we are going to send just request per second.


}
if ch.client == nil {
client, err := createHTTPClient(2*time.Second, ch.restConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, should this be a constant?

const httpClientTimeout = 2* time.Second

}

if ch.kubeClient == nil {
kubeClient, err := kubernetes.NewForConfig(ch.restConfig)
Copy link
Contributor

@aojea aojea Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a timeout here too? I mean, like in the http.client{}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we should, I slightly reworked the PR


// note that we will be talking to Kube API over localhost and in case of an error/timeout requests will be retired for 5 min.
// setting the global timeout to a short value seems to be fine
ch.restConfig.Timeout = 4 * time.Second
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea increased the timeout to 4s and set QPS to 10 and Burst to 15

healthy: false,
reason: "UnhealthyError",
// we don't check the entire rsp from the server
msg: "falied while performing the check due to",
Copy link
Contributor

@aojea aojea Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit , you were carrying over this typo falied, I see it in 4 places

@aojea
Copy link
Contributor

aojea commented Jul 20, 2021

LGTM

// TODO: watch /var/log/kube-apiserver/termination.log for the first start-up attempt (beware of the race of startup-monitor startup and kube-apiserver startup). Set Reason=NeverStartedUp when this times out.
// TODO: watch /var/log/kube-apiserver/termination.log for more than one start-up attempt. Set Reason=CrashLooping if more than one is found and the monitor times out.

// doRevisionCheck in "lazy" mode to avoid false positive - failing readyz check when the previous instance hasn't terminated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is lazy mode?

return doRevisionCheck(ctx, ch.kubeClient.CoreV1().Pods(operatorclient.TargetNamespace), revision, false)
},

// checks etcd health condition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be precise: this is "check etcd health how kube-apiserver sees it"


// doRevisionCheck in "lazy" mode to avoid false positive - failing readyz check when the previous instance hasn't terminated
func() (bool, string, string) {
return doRevisionCheck(ctx, ch.kubeClient.CoreV1().Pods(operatorclient.TargetNamespace), revision, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a revision check if the revision does not matter?! versus line 107

return doRevisionCheck(ctx, ch.kubeClient.CoreV1().Pods(operatorclient.TargetNamespace), revision, true)
},

// checks if kas pod is in PodRunning phase and has PodReady condition set to true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check that kubelet has reporting readiness of the pod

}

// loop through a list of ordered checks for assessing Kube API readiness condition
for _, checkFn := range []func(context.Context) (bool, string, string){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 – nice and short

noOldRevisionPodExists(ch.kubeClient.CoreV1().Pods(operatorclient.TargetNamespace), revision),

// check kube-apiserver /healthz/etcd endpoint
goodETCDEndpoint(ch.client, ch.baseRawURL),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: good(KubeApiserver)HealthzEtcdEndpoint – this sounds like we check etcd directly.

func doHTTPCheckAndTransform(ctx context.Context, client *http.Client, rawURL string, checkName string, httpCheckFn func(ctx context.Context, client *http.Client, rawURL string) (int, string, error)) (bool, string, string) {
statusCode, response, err := httpCheckFn(ctx, client, rawURL)
if err != nil {
errMsg := fmt.Sprintf("failed while performing the check due to %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this helpful. does err have e.g. the path? Would also follow the normal format with colon: .... the check: %v

newRevisionPodExists(ch.kubeClient.CoreV1().Pods(operatorclient.TargetNamespace), revision),

// check that kubelet has reporting readiness of the pod
newPodHasStateRunning(ch.kubeClient.CoreV1().Pods(operatorclient.TargetNamespace)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see the last two linked into one check. It's the final check and it should ensure consistency. Now there is a slight race because we do two client calls.

}

if revision != monitorRevision {
return false, "InvalidPod", fmt.Sprintf("the running Kube API is at unexpected revision %d, expected %d", revision, monitorRevision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnexpectedRevision

@sttts
Copy link
Contributor

sttts commented Jul 22, 2021

/retest
/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 22, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2021
@sttts
Copy link
Contributor

sttts commented Jul 22, 2021

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 22, 2021

@p0lyn0mial: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-single-node abb0257 link /test e2e-aws-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit e1b7cc3 into openshift:master Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants