
[Probes] Incorrect application of the initialDelaySeconds #96614

Open
Kanshiroron opened this issue Nov 16, 2020 · 32 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@Kanshiroron

Hello,

We are experiencing an issue with pod probes: they are not run in sync with the configuration, which suggests that the initialDelaySeconds is not correctly applied (or that some other random delay is introduced).

We have a deployment that contains a livenessProbe and a readinessProbe configured as follows:

        livenessProbe:
          exec:
            command: [ "/opt/prime/bin/prime_health_checker", "control_status" ]
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 10
        readinessProbe:
          exec:
            command: [ "/opt/prime/bin/prime_health_checker", "rabbit_riak_connectivity", "log_blocking_fix" ]
          initialDelaySeconds: 15
          periodSeconds: 30
          timeoutSeconds: 10

According to the configuration, the two probes should run 15 seconds apart from one another, but this is not the case. The actual time difference is variable and random for each pod and replica.
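For reference, a naive reading of the spec (each probe first fires at initialDelaySeconds after container start, then every periodSeconds, with no jitter) would keep the two probes exactly 15 seconds apart. A minimal sketch of that expectation; the helper name is illustrative, not actual kubelet code:

```go
package main

import "fmt"

// expectedProbeTimes returns the first n expected probe times (in seconds
// since container start) for a probe with the given initialDelaySeconds and
// periodSeconds, per a naive reading of the spec (no jitter).
func expectedProbeTimes(initialDelay, period, n int) []int {
	times := make([]int, 0, n)
	for i := 0; i < n; i++ {
		times = append(times, initialDelay+i*period)
	}
	return times
}

func main() {
	fmt.Println("liveness: ", expectedProbeTimes(30, 30, 3)) // [30 60 90]
	fmt.Println("readiness:", expectedProbeTimes(15, 30, 3)) // [15 45 75]
}
```

Under that reading, every liveness/readiness pair would be 15 seconds apart on every replica, which is what the configuration above seems to promise.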

Here are some logs from Kubelet, showing only a 6-second delay between the two probes:

Nov 13 13:44:14 ip-10-213-59-169 kubelet[17547]: I1113 13:44:14.954625   17547 prober.go:162] Exec-Probe Pod: &Pod{ObjectMeta:{taskman-sandbox-0 taskman-sandbox- sandbox /api/v1/namespaces/sandbox/pods/taskman-sandbox-0 0615148c-1210-4cb5-850b-52992f9a7ac9 46134867 0 2020-11-11 04:51:13 +0000 UTC <nil> <nil> map[controller-revision-hash:taskman-sandbox-755dc74dd8 pypestream.com/app:taskman pypestream.com/component:taskman pypestream.com/env:sandbox pypestream.com/group:prime pypestream.com/namespace:sandbox spotinst.io/restrict-scale-down:true statefulset.kubernetes.io/pod-name:taskman-sandbox-0] map[cni.projectcalico.org/podIP:100.119.77.208/32 cni.projectcalico.org/podIPs:100.119.77.208/32 fluentbit.io/parser:prime_log kubernetes.io/config.seen:2020-11-13T10:26:49.001091487Z kubernetes.io/config.source:api] [{apps/v1 StatefulSet taskman-sandbox bfb661c9-4ebc-4c6f-9c2f-98deaec09e56 0xc0005e7c16 0xc0005e7c17}] []  []},Spec:PodSpec{Volumes:[]Volume{Volume{Name:taskman-keys,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:&SecretVolumeSource{SecretName:taskman-keys,Items:[]KeyToPath{},DefaultMode:*420,Optional:nil,},NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,},},Volume{Name:default-token-rcwqc,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:&SecretVolumeSource{SecretName:default-token-rcwqc,Items:[]KeyToPath{},DefaultMode:*420,Optional:nil,},NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,Portw
orxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,},},},Containers:[]Container{Contai
Nov 13 13:44:14 ip-10-213-59-169 kubelet[17547]: ,Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:common-aws-s3-presentation-key,},Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:common-freshchat-token,},Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:taskman-env,},Optional:nil,},SecretRef:nil,},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:taskman-aws-keys,},Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:taskman-rabbitmq-user,},Optional:nil,},},},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:ip-10-213-59-169.us-west-2.compute.internal,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:taskman-sandbox-0,Subdomain:taskman-sandbox,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{Toleration{Key:node.kubernetes.io/not-ready,Operator:Exists,Value:,Effect:NoExecute,TolerationSeconds:*300,},Toleration{Key:node.kubernetes.io/unreachable,Operator:Exists,Value:,Effect:NoExecute,TolerationSeconds:*300,},},HostAliases:[]HostAlias{},PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableS
erviceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]Topolo
Nov 13 13:44:17 ip-10-213-59-169 kubelet[17547]: I1113 13:44:17.678245   17547 prober.go:129] Liveness probe for "taskman-sandbox-0_sandbox(0615148c-1210-4cb5-850b-52992f9a7ac9):taskman-sandbox" succeeded
Nov 13 13:44:20 ip-10-213-59-169 kubelet[17547]: I1113 13:44:20.547366   17547 prober.go:162] Exec-Probe Pod: &Pod{ObjectMeta:{taskman-sandbox-0 taskman-sandbox- sandbox /api/v1/namespaces/sandbox/pods/taskman-sandbox-0 0615148c-1210-4cb5-850b-52992f9a7ac9 46134867 0 2020-11-11 04:51:13 +0000 UTC <nil> <nil> map[controller-revision-hash:taskman-sandbox-755dc74dd8 pypestream.com/app:taskman pypestream.com/component:taskman pypestream.com/env:sandbox pypestream.com/group:prime pypestream.com/namespace:sandbox spotinst.io/restrict-scale-down:true statefulset.kubernetes.io/pod-name:taskman-sandbox-0] map[cni.projectcalico.org/podIP:100.119.77.208/32 cni.projectcalico.org/podIPs:100.119.77.208/32 fluentbit.io/parser:prime_log kubernetes.io/config.seen:2020-11-13T10:26:49.001091487Z kubernetes.io/config.source:api] [{apps/v1 StatefulSet taskman-sandbox bfb661c9-4ebc-4c6f-9c2f-98deaec09e56 0xc0005e7c16 0xc0005e7c17}] []  []},Spec:PodSpec{Volumes:[]Volume{Volume{Name:taskman-keys,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:&SecretVolumeSource{SecretName:taskman-keys,Items:[]KeyToPath{},DefaultMode:*420,Optional:nil,},NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,},},Volume{Name:default-token-rcwqc,VolumeSource:VolumeSource{HostPath:nil,EmptyDir:nil,GCEPersistentDisk:nil,AWSElasticBlockStore:nil,GitRepo:nil,Secret:&SecretVolumeSource{SecretName:default-token-rcwqc,Items:[]KeyToPath{},DefaultMode:*420,Optional:nil,},NFS:nil,ISCSI:nil,Glusterfs:nil,PersistentVolumeClaim:nil,RBD:nil,FlexVolume:nil,Cinder:nil,CephFS:nil,Flocker:nil,DownwardAPI:nil,FC:nil,AzureFile:nil,ConfigMap:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,Portw
orxVolume:nil,ScaleIO:nil,Projected:nil,StorageOS:nil,CSI:nil,},},},Containers:[]Container{Contai
Nov 13 13:44:20 ip-10-213-59-169 kubelet[17547]: ,Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:common-aws-s3-presentation-key,},Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:common-freshchat-token,},Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:taskman-env,},Optional:nil,},SecretRef:nil,},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:taskman-aws-keys,},Optional:nil,},},EnvFromSource{Prefix:,ConfigMapRef:nil,SecretRef:&SecretEnvSource{LocalObjectReference:LocalObjectReference{Name:taskman-rabbitmq-user,},Optional:nil,},},},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,},},RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:ip-10-213-59-169.us-west-2.compute.internal,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[]Sysctl{},WindowsOptions:nil,},ImagePullSecrets:[]LocalObjectReference{},Hostname:taskman-sandbox-0,Subdomain:taskman-sandbox,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{Toleration{Key:node.kubernetes.io/not-ready,Operator:Exists,Value:,Effect:NoExecute,TolerationSeconds:*300,},Toleration{Key:node.kubernetes.io/unreachable,Operator:Exists,Value:,Effect:NoExecute,TolerationSeconds:*300,},},HostAliases:[]HostAlias{},PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableS
erviceLinks:*true,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]Topolo
Nov 13 13:44:23 ip-10-213-59-169 kubelet[17547]: I1113 13:44:23.377094   17547 prober.go:129] Readiness probe for "taskman-sandbox-0_sandbox(0615148c-1210-4cb5-850b-52992f9a7ac9):taskman-sandbox" succeeded

Here is what we noticed:

  • The time difference stays unchanged if the pod is restarted (by k8s, after some liveness probes fail).
  • If the pod is re-created, the timings change.
  • Timings are random and "unique" per replica.
  • This issue is not caused by time drift, as it is observable as soon as the pod is created.

Environment:

  • K8s 1.16.10
  • Kops 1.16.3
  • Debian Stretch
  • Linux 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux

Thanks for the help and support.

@Kanshiroron Kanshiroron added the kind/bug Categorizes issue or PR as related to a bug. label Nov 16, 2020
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 16, 2020
@Kanshiroron
Author

/sig Scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 16, 2020
@clarkmcc

clarkmcc commented Nov 21, 2020

I believe that this is expected behavior. I'll report my findings, and we'll see if someone who knows this better than I do can confirm. InitialDelaySeconds only guarantees that a probe never runs before the delay has passed. So at a minimum, you should be seeing that the probes don't occur before that initial delay.

if int32(time.Since(c.State.Running.StartedAt.Time).Seconds()) < w.spec.InitialDelaySeconds {
	return true
}

What you're probably seeing is a product of this:

func (w *worker) run() {
	probeTickerPeriod := time.Duration(w.spec.PeriodSeconds) * time.Second
	// If kubelet restarted the probes could be started in rapid succession.
	// Let the worker wait for a random portion of tickerPeriod before probing.
	time.Sleep(time.Duration(rand.Float64() * float64(probeTickerPeriod)))

which indicates that the actual first probe can occur anytime after InitialDelaySeconds and before InitialDelaySeconds + PeriodSeconds. Given the code comment, the idea behind this "random timing" is to avoid a flood of probes when a kubelet has just started/restarted; we're basically adding some "jitter" here.
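That window can be sketched as follows: the worker first sleeps a random fraction of periodSeconds, and any probe attempt made before initialDelaySeconds is a no-op until the next tick. This is a simplified model of the worker loop, not the actual kubelet code; the function name and the jitterFrac parameter are illustrative:

```go
package main

import "fmt"

// firstProbeTime models when the first real probe fires, per the worker
// logic quoted above: the worker first sleeps jitterFrac*period, and any
// probe attempt made before initialDelay is skipped, deferring to the next
// ticker fire. Illustrative simplification, not kubelet code.
func firstProbeTime(initialDelay, period, jitterFrac float64) float64 {
	t := jitterFrac * period // initial random sleep, then first attempt
	for t < initialDelay {   // attempts before the delay are no-ops
		t += period // wait for the next ticker fire
	}
	return t
}

func main() {
	// With initialDelaySeconds=15 and periodSeconds=30 (the readiness probe
	// above), the first probe can land anywhere in [15, 45).
	for _, f := range []float64{0.0, 0.5, 0.9} {
		fmt.Printf("jitter fraction %.1f -> first probe at %.0fs\n", f, firstProbeTime(15, 30, f))
	}
}
```

Since the liveness and readiness workers each draw their own jitter, the gap between the two probes is effectively random per pod, which matches the report.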

Again, I didn't write this, but based on your description and what I'm seeing in the code, I believe what you're seeing is expected behavior.

Also, I believe this falls under sig/node.

@clarkmcc

/sig Node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Nov 21, 2020
@Cweiping

I agree with @clarkmcc .

The InitialDelaySeconds of a probe is measured from the container's start-up time; it does not guarantee that the readiness and liveness probes run InitialDelaySeconds apart from each other.

if int32(time.Since(c.State.Running.StartedAt.Time).Seconds()) < w.spec.InitialDelaySeconds {
	return true
}

@Kanshiroron
Author

@clarkmcc Thanks for the answer. I understand the underlying idea of not flooding kubelet with probes when it starts/restarts.
But I think it's quite important to keep the same time difference between the two probes, in case something needs to be locked during the execution of the probe (which is our case).
-> Would it be possible to apply the same random value to both probes, then?

@clarkmcc
Copy link

@Kanshiroron it wouldn't be random if the same value were applied to all probes. Kubernetes wasn't designed to get into the specifics of your application's life cycle. If all instances of your application require coordination with each other, that's a problem that should be solved in the application layer, not the infrastructure layer.

@Kanshiroron
Author

Kanshiroron commented Feb 22, 2021

@clarkmcc That's not what I meant; sorry if that wasn't clear enough.
What I suggested is applying the same random value to both the readiness and the liveness probes of the same pod. The reason for this request is that in some cases those probes hold a shared lock, so if the added random values cause the two probes to run at the same time, some health checks fail.

@matthyx
Contributor

matthyx commented May 21, 2021

/assign

@mengjiao-liu
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 24, 2021
@n4j n4j added this to Triaged in SIG Node Bugs Jul 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 22, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SIG Node Bugs automation moved this from Triaged to Done Nov 21, 2021
@Kanshiroron
Author

/reopen

@k8s-ci-robot
Contributor

@Kanshiroron: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Nov 21, 2021
SIG Node Bugs automation moved this from Done to Triage Nov 21, 2021
@SergeyKanzhelev
Member

SergeyKanzhelev commented Nov 24, 2021

Since we have a PR open #102064, removing rotten:

/remove-lifecycle rotten

@Kanshiroron do you have any means to check if PR addresses the bug?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 24, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Nov 24, 2021
@Kanshiroron
Author

@SergeyKanzhelev sorry, I actually missed @matthyx's PR. Unfortunately I have no easy way to test this, but with some guidance I may be able to.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 22, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2022
@qiutongs
Contributor

qiutongs commented Sep 4, 2022

Reporting a new finding in 1.21+.

The jitter is only added when periodSeconds is greater than the time elapsed since kubelet started. This means the initial invocation of doProbe() for all probes in a container will ALWAYS happen at the same time; initialDelaySeconds only acts as a mask that makes doProbe() return early as a no-op.

See https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/prober/worker.go#L137-L139

4870e64

Say a container has 3 probes with the same periodSeconds but different initialDelaySeconds. The 3 probes will eventually always execute within the same second.
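Under the assumptions above (no jitter sleep, every worker's ticker starting at container start), the alignment can be sketched as follows; probeFireTimes is an illustrative model, not actual kubelet code:

```go
package main

import "fmt"

// probeFireTimes models the post-1.21 behavior described above: when the
// jitter sleep is skipped, every worker's ticker fires at multiples of
// periodSeconds, and initialDelaySeconds only suppresses the early firings
// rather than shifting the schedule.
func probeFireTimes(initialDelay, period, horizon int) []int {
	var fires []int
	for t := period; t <= horizon; t += period { // ticker fires every period
		if t < initialDelay { // masked: doProbe returns early
			continue
		}
		fires = append(fires, t)
	}
	return fires
}

func main() {
	// Three probes with the same period but different initial delays all
	// end up firing at the same instants (multiples of the period).
	for _, d := range []int{10, 15, 30} {
		fmt.Printf("initialDelay=%2d -> %v\n", d, probeFireTimes(d, 30, 120)) // -> [30 60 90 120]
	}
}
```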

@qiutongs
Contributor

qiutongs commented Sep 4, 2022

I see 3 problems with the predictable probe invocation.

  1. initialDelaySeconds doesn't work as the spec says: "Number of seconds after the container has started before probes are initiated." Intuitively, this simply means the first invocation of the probe happens at containerStatuses.state.running.startedAt + initialDelaySeconds. The implementation is far from that; see #96614 (comment).

  2. The jitters given to the probes in the same container are different; see #96614 (comment). #102064 (synchronize readiness and startup workers on kubelet start) is trying to solve it.

  3. Since 1.21, the jitters are only added when kubelet has recently started/restarted; see #96614 (comment).

@matthyx
Contributor

matthyx commented Sep 6, 2022

@qiutongs thanks for your analysis, so then what do you suggest?

@rphillips
Member

rphillips commented Sep 6, 2022

Probes are started immediately, but return without running the probe if the pod's running start time < InitialDelaySeconds [source]. This is probably due to the event handling earlier in the function that detects whether the pod has terminated.

@qiutongs
Contributor

qiutongs commented Sep 14, 2022

So far, I believe the correct logic is:

func (w *worker) run() {
	podStatus := GetPodStatus()

	if shouldNotProbe(podStatus) {
		return
	}

	// This is the current jitter value
	jitter := rand(0, 1) * PeriodSeconds

	// podStatus.startTime is when the pod is acknowledged by Kubelet; based on my test,
	// this value remains the same across a Kubelet restart.
	// w.probeManager.start indicates when Kubelet started.
	//
	// If the pod was acknowledged after Kubelet started, this is a new pod: simply sleep initialDelaySeconds.
	// If Kubelet started after the pod was acknowledged, this is an existing pod and Kubelet just restarted: apply some jitter.
	if podStatus.startTime > w.probeManager.start {
		sleep(initialDelaySeconds)
	} else {
		sleep(jitter)
	}

	for doProbe() {
		...
	}
}

Notes:

  • Based on my test, PodStatus.startTime remains the same across a kubelet restart.
  • I will find time (likely in 2 weeks) to implement this with real code, as @matthyx suggested. In the meantime, please correct my pseudocode if you see problems.

@SergeyKanzhelev
Member

sleep(jitter)

what is the jitter value in this pseudocode?

if "pod is created after kubelet starts" {

Can you explain what that means? Do you mean the pod was created while kubelet was still running, as opposed to the worker being initialized after kubelet has restarted?

I think a good first step in improving this logic is to analyze its test coverage: which scenarios are covered, and which need to be added to ensure consistency between the docs and the probes. For example, is the jitter logic tested? Do we need a test that demonstrates how running all probes at once at start-up is a problem and that the jitter actually helps?

@qiutongs
Contributor

qiutongs commented Sep 15, 2022

what is the jitter value in this pseudocode?

Same as the current value in
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/prober/worker.go#L138; this is the jitter I meant. The purpose of this jitter is to avoid a thundering-herd problem when kubelet restarts.

if "pod is created after kubelet starts" {

Can you explain what it means? Pod is created while kubelet was still running as oppose to worker is being initialized after kubelet has restarted?

No, sorry for the confusion.
I mean comparing PodStatus.startTime with the timestamp at which Kubelet started. For each pod, that tells us whether we are in (1) a Kubelet restart or (2) a new pod. See my update to the pseudocode in #96614 (comment).

I think the good first step in improving this logic is to analyze the test coverage for it

That is a good idea. I believe we don't have tests for this jitter today. Any recommendations for testing such random code?

@matthyx
Contributor

matthyx commented Sep 15, 2022

Do we need a test that demonstrates how running all probes at once at a start is a problem and jitter actually helps?

Testing a thundering-herd problem looks like a performance test to me... not easy.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 13, 2023
@vaibhav2107
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 1, 2023
Projects
Status: Backlog