We mark a node offline after MAX_UNAVAILABLE_INTERVAL_DEFAULT (30 seconds) of failures to respond to heartbeats.
We should be more generous during startup: when a pageserver sends us a re-attach request, we should tip off the heartbeater to extend its tolerance for that node, because the pageserver's processing of the re-attach response can currently be quite time-consuming.
This is similar to the k8s distinction between a readiness check and a status check: we should be more tolerant when waiting for readiness during startup than when checking for responsiveness during normal runtime.
(The actual init_tenant_mgr slowness is addressed in #7553, but this ticket still stands: we should be more tolerant during startup than we are during normal operation.)
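One way the proposal could look is sketched below. This is only an illustration, not the storage controller's actual code: `NodeHeartbeatState`, its methods, and `MAX_WARMING_UP_INTERVAL` (with its 300 second value) are hypothetical names and numbers chosen for the example. Only `MAX_UNAVAILABLE_INTERVAL_DEFAULT` and its 30 second value come from the description above. The idea is that a re-attach request switches the node to a longer grace period, and the first successful heartbeat switches it back to the normal threshold.

```rust
use std::time::{Duration, Instant};

/// Normal-operation threshold: the 30 seconds mentioned in this issue.
const MAX_UNAVAILABLE_INTERVAL_DEFAULT: Duration = Duration::from_secs(30);

/// Hypothetical, more generous threshold applied while a node is starting up.
/// The value is an assumption made for this sketch.
const MAX_WARMING_UP_INTERVAL: Duration = Duration::from_secs(300);

/// Hypothetical per-node bookkeeping kept by the heartbeater.
struct NodeHeartbeatState {
    /// Time of the last successful heartbeat (or of node registration).
    last_success: Instant,
    /// Set when the node sends a re-attach request;
    /// cleared by the first successful heartbeat afterwards.
    reattach_at: Option<Instant>,
}

impl NodeHeartbeatState {
    fn new(now: Instant) -> Self {
        Self {
            last_success: now,
            reattach_at: None,
        }
    }

    /// The "tip-off": called when the pageserver sends a re-attach request.
    fn on_reattach(&mut self, now: Instant) {
        self.reattach_at = Some(now);
    }

    /// Called whenever the node answers a heartbeat; ends the startup grace period.
    fn on_heartbeat_success(&mut self, now: Instant) {
        self.last_success = now;
        self.reattach_at = None;
    }

    /// Decide whether the node should be marked offline at time `now`.
    fn is_offline(&self, now: Instant) -> bool {
        match self.reattach_at {
            // Startup: measure from the re-attach request, with the longer limit.
            Some(t) => now.saturating_duration_since(t) > MAX_WARMING_UP_INTERVAL,
            // Normal runtime: measure from the last successful heartbeat.
            None => {
                now.saturating_duration_since(self.last_success)
                    > MAX_UNAVAILABLE_INTERVAL_DEFAULT
            }
        }
    }
}
```

Keeping the startup limit finite means a pageserver that never finishes starting is still eventually marked offline.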