
--allow-dynamic-scaling does not respond to pod disruptions #123

Open

tekicode opened this issue Oct 3, 2023 · 2 comments

Comments

@tekicode
Contributor

tekicode commented Oct 3, 2023

In the readme about --allow-dynamic-scaling:

By default, the controller does not react to voluntary/involuntary disruptions to receiver replicas in the StatefulSet. This flag allows the user to enable this behavior. When enabled, the controller will react to voluntary/involuntary disruptions to receiver replicas in the StatefulSet. When a Pod is marked for termination, the controller will remove it from the hashring and the replica essentially becomes a "router" for the hashring. When a Pod is deleted, the controller will remove it from the hashring. When a Pod becomes unready, the controller will remove it from the hashring. This behaviour can be considered for use alongside the Ketama hashing algorithm.

The two highlighted lines are incorrect: the controller does not have a podInformer subscribed to receive updates from the pods associated with the hashring.

As such, the --allow-dynamic-scaling flag only responds to changes in the StatefulSet's replica count, which only happens when the StatefulSet itself is updated; that is separate from the health of the pods.

I've explored adding a podInformer, updating the configmapInformer, and reworking the logic around how pods are chosen while keeping backwards compatibility.
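For context, a pod informer along those lines could look roughly like the sketch below. This is only a sketch under stated assumptions, not the controller's actual code: the namespace, the label selector (an existence check on controller.receive.thanos.io/hashring on the pods), and the resync wiring are placeholders, and the hashring regeneration itself is elided.

```go
// Sketch: watch receiver pods and trigger a hashring resync on pod
// deletions and on Ready/deletionTimestamp transitions.
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch only the receiver pods in the controller's namespace.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		5*time.Minute,
		informers.WithNamespace("thanos"), // placeholder namespace
		informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			o.LabelSelector = "controller.receive.thanos.io/hashring" // assumed pod label
		}),
	)

	// Coalesce events into a single pending resync signal.
	resync := make(chan struct{}, 1)
	enqueue := func() {
		select {
		case resync <- struct{}{}:
		default: // a resync is already pending
		}
	}

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		// A deleted pod should trigger a hashring regeneration.
		DeleteFunc: func(obj interface{}) { enqueue() },
		// So should a change in readiness or the start of termination.
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldPod, ok1 := oldObj.(*corev1.Pod)
			newPod, ok2 := newObj.(*corev1.Pod)
			if !ok1 || !ok2 {
				return
			}
			if podReady(oldPod) != podReady(newPod) ||
				(oldPod.DeletionTimestamp == nil) != (newPod.DeletionTimestamp == nil) {
				enqueue()
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	for range resync {
		// Here the controller would regenerate the hashring ConfigMap,
		// excluding pods that are terminating or unready (not shown).
	}
}

// podReady reports whether the pod's Ready condition is true.
func podReady(p *corev1.Pod) bool {
	for _, c := range p.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```

One possible design is for delete events and Ready/deletionTimestamp transitions to funnel into whatever resync mechanism the controller already uses for StatefulSet updates, so the existing hashring generation path stays untouched.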

However, I've seen a lot of previous discussion and issues about this and related problems. Is this seen as a problem? (It is to me.) If so, what opinions do others have on how the controller should behave in this situation?

@christopherzli
Contributor

I am also looking into this and am wondering if there has been any follow-up?

@chit786

chit786 commented Jun 4, 2024

+1 on this feature. We also encountered a situation where a receiver was still waiting for WAL replay but was not moved out of the hashring, leading to writes erroring out with:

 msg="failed to handle request" err="get appender: TSDB not ready" 

I wonder if the pod's restartPolicy could be utilised to mark it as "Terminating", and the controller could then pick up from there?
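For illustration, since restartPolicy only governs container restarts inside a pod, another signal the controller could key off is the pod's deletionTimestamp (set while terminating) together with its Ready condition, which stays false during WAL replay assuming the readiness probe targets the receiver's /-/ready endpoint. A minimal, hypothetical eligibility check along those lines (eligibleForHashring is an illustrative name, not an existing function):

```go
package hashring

import (
	corev1 "k8s.io/api/core/v1"
)

// eligibleForHashring reports whether a receiver pod should stay in the
// hashring: it must not be terminating, and its Ready condition must be
// true (which would exclude pods still replaying the WAL, assuming the
// readiness probe points at the receiver's /-/ready endpoint).
func eligibleForHashring(p *corev1.Pod) bool {
	// A non-nil deletionTimestamp means the pod is terminating.
	if p.DeletionTimestamp != nil {
		return false
	}
	for _, c := range p.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```

The controller could drop any pod failing such a check from the generated hashring and re-add it once it becomes Ready again.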
