Downscale webhook fails when currently upscaling #72

jhalterman · 2023-07-18T01:29:49Z

If a pod is still being created for a StatefulSet, the downscale webhook rejects any change to that resource that would scale it down:

level=error ts=2023-07-18T01:23:33.406992007Z name=ingester-zone-a resource=statefulsets namespace=mimir-dev-11 request_gvk="apps/v1, Kind=StatefulSet" old_replicas=225 new_replicas=5 msg="downscale not allowed due to error" err="Post \"http://ingester-zone-a-218.ingester-zone-a.mimir-dev-11.svc.cluster.local:80/ingester/prepare-shutdown\": dial tcp: lookup ingester-zone-a-218.ingester-zone-a.mimir-dev-11.svc.cluster.local on 10.188.0.10:53: no such host"

We discovered this when an HPA was scaling up too aggressively. When we tried to revert the change that caused it, the downscale webhook rejected the revert because the StatefulSet was still scaling up.

56quarters · 2023-07-18T13:24:42Z

What would be the correct behavior in this case? Did the prepare-shutdown call eventually succeed once the pod started?

jhalterman · 2023-07-18T15:48:39Z

> What would be the correct behavior in this case?

We could ignore "no such host" errors when performing this check, since that error implies the pod wasn't running in the first place. This might not be a perfect solution, but it would at least be an improvement.

> Did the prepare-shutdown call eventually succeed once the pod started?

Yes, it would eventually succeed for any given pod, but in this scenario the HPA was regularly creating new pods, so the same error would be hit on a newer pod the next time a resource change was attempted.

jhalterman changed the title ~~Downscale webhook fails if any pod is being created~~ Downscale webhook fails when currently upscaling Jul 18, 2023
