What would you like to be added:
When we introduced DWD, we set KCM's --node-monitor-grace-period to 120s. That is three times KCM's default of 40s. We should go back to the default, for both seeds and shoots. See also gardener/dependency-watchdog#79.
Why is this needed:
During "zone outage simulations" we saw that recovery happens very slowly. KCM takes 2m to put a node into Unknown state, and only then do follow-up actions such as endpoint and load balancer deregistration proceed. This is especially notable for our Istio ingress gateway/API server, which appears available for 2m before it gets updated even though it is dead the entire time.
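To make the difference concrete, here is a minimal sketch of the detection delay under both settings. It assumes a deliberately simplified model in which a node is marked Unknown exactly when the grace period has elapsed since its last heartbeat; it ignores KCM's monitor period and kubelet heartbeat intervals. The function name `time_until_unknown` is hypothetical and not part of any Kubernetes or Gardener API.

```python
from datetime import timedelta


def time_until_unknown(
    grace_period: timedelta,
    since_last_heartbeat: timedelta = timedelta(0),
) -> timedelta:
    """Remaining time before a silent node is marked Unknown.

    Simplified model (assumption): the node flips to Unknown as soon as
    the grace period has elapsed since the last heartbeat.
    """
    remaining = grace_period - since_last_heartbeat
    # Never report a negative delay once the grace period has passed.
    return max(remaining, timedelta(0))


# Values taken from this issue: 120s (current) vs. 40s (proposed).
current = time_until_unknown(timedelta(seconds=120))
proposed = time_until_unknown(timedelta(seconds=40))
saved = current - proposed

print(
    f"current: {current.total_seconds():.0f}s, "
    f"proposed: {proposed.total_seconds():.0f}s, "
    f"saved: {saved.total_seconds():.0f}s"
)
```

Under this model, switching to 40s would shorten the window in which a dead node still looks healthy by 80s per outage.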
vlerenc changed the title to "Switch default KCM --node-monitor-grace-period to 40s" on Mar 23, 2023