Allow configuration of readinessProbe and livenessProbe timeouts in linkerd-proxy-injector #11453

Closed
jan-kantert opened this issue Oct 4, 2023 · 3 comments · Fixed by #12053

@jan-kantert (Contributor)

What problem are you trying to solve?

I have an issue when our Kubernetes cluster is under high CPU load. In this case the kubelet is slow to read readiness and liveness probe responses, and we sometimes see Kubernetes restarting linkerd-proxy pods due to failed livenessProbes. To reduce the chance of this happening we would like to increase the probe timeout (from the default 1s) to something like 10s or 20s, which would be enough even under very high load.
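
For context, the probes on the injected linkerd-proxy container look roughly like the sketch below (path and port are shown for illustration and may differ slightly; the relevant part is the fixed 1s timeout we would like to raise):

```yaml
# Rough sketch of the probes on the injected linkerd-proxy container.
# Path/port are illustrative; the point is the hard-coded 1s timeout.
livenessProbe:
  httpGet:
    path: /live
    port: 4191
  timeoutSeconds: 1   # default; we would like to raise this to 10-20s
readinessProbe:
  httpGet:
    path: /ready
    port: 4191
  timeoutSeconds: 1   # default; we would like to raise this to 10-20s
```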

How should the problem be solved?

Add configuration parameters for the probe timeouts to linkerd-proxy-injector. I would also set the default livenessProbe timeout a bit higher than the readinessProbe timeout, in line with Kubernetes best practices.
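
A minimal sketch of how the injector's container template could consume such parameters, assuming hypothetical value names (the final names would be whatever the chart exposes):

```yaml
# Hypothetical template snippet for the injected proxy container.
# The .Values paths are placeholders, not the final chart values.
livenessProbe:
  httpGet:
    path: /live
    port: 4191
  timeoutSeconds: {{ .Values.proxy.livenessProbe.timeoutSeconds | default 1 }}
readinessProbe:
  httpGet:
    path: /ready
    port: 4191
  timeoutSeconds: {{ .Values.proxy.readinessProbe.timeoutSeconds | default 1 }}
```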

Any alternatives you've considered?

This is partially caused by kubernetes/kubernetes#89898, where there have already been some improvements and more are coming. As a workaround we can reserve more CPU for the kubelet, but that hurts resource utilization because less CPU is left for workloads on our nodes.
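
For reference, a sketch of that CPU-reservation workaround via the kubelet configuration; the quantities are examples only, not a recommendation:

```yaml
# Sketch of reserving CPU for the kubelet and system daemons so probe
# handling stays responsive under load. Example quantities only; they
# trade node capacity for kubelet headroom.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: "500m"
systemReserved:
  cpu: "500m"
```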

How would users interact with this feature?

Users can optionally set these timeouts in their Helm chart values or in their linkerd-proxy-injector configuration.
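
For example, a user could put something like the following in a Helm values override (the key names here are placeholders for whatever the chart ends up exposing):

```yaml
# Hypothetical Helm values override; key names are placeholders.
proxy:
  livenessProbe:
    timeoutSeconds: 15
  readinessProbe:
    timeoutSeconds: 15
```

The same values could also be passed with `--set` at install or upgrade time.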

Would you like to work on this feature?

yes

@kflynn (Member) commented Oct 11, 2023

Hey @jan-kantert! This seems like a good thing and if you're willing to work on it, we'd love to support you. What would help you? 🙂

@jan-kantert (Contributor, Author)

> Hey @jan-kantert! This seems like a good thing and if you're willing to work on it, we'd love to support you. What would help you? 🙂

Have a look at my PR #11458. Is that how you expected the change to look?

stale bot commented Jan 25, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jan 25, 2024
stale bot closed this as completed Feb 8, 2024
mateiidavid added a commit that referenced this issue Feb 9, 2024
This release addresses some issues in the destination service that could cause
it to behave unexpectedly when processing updates.

* Fixed a race condition in the destination service that could cause panics
  under very specific conditions ([#12022]; fixes [#12010])
* Changed how updates to a `Server` selector are handled in the destination
  service. When a `Server` that marks a port as opaque no longer selects a
  resource, the resource's opaqueness will be reverted to default settings
  ([#12031]; fixes [#11995])
* Introduced Helm configuration values for liveness and readiness probe
  timeouts and delays ([#11458]; fixes [#11453]) (thanks @jan-kantert!)

[#12010]: #12010
[#12022]: #12022
[#11995]: #11995
[#12031]: #12031
[#11453]: #11453
[#11458]: #11458

Signed-off-by: Matei David <matei@buoyant.io>
github-actions bot locked this issue as resolved and limited conversation to collaborators Mar 10, 2024