Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the ring auto-forget distributors which have been in an unhealthy state for too long #571

Closed
replay opened this issue Dec 2, 2021 · 5 comments

Comments

@replay
Copy link
Contributor

replay commented Dec 2, 2021

We should make the distributor ring auto forget the distributor instances which have been unhealthy for a defined amount of time because when a distributor gets hard killed then it has no chance to unregister itself from the ring. Over time this can lead to a situation where the ring is full of old unhealthy instances, growing the size of the ring and putting load on the memberlist client.

Since the distributors have no state there should be no risk associated with forgetting about the unhealthy ones.

Distributors use ring.Lifecycler while the code we already have requires ring.BasicLifecycler. We should move distributors to ring.BasicLifecycler and use the auto-forget feature.

@hi-rustin
Copy link
Contributor

@replay Can you provide more tips please? I would like to implement it, at the moment I don't know how to handle the ingestionRateStrategy properly, it looks like BasicLifecycler doesn't implement ReadLifecycler.

@replay
Copy link
Contributor Author

replay commented Apr 22, 2022

Hi @hi-rustin,

Thanks for looking into this!
I think we could look at the store-gateway as an example for how to use the auto-forget feature of the BasicLifecycler, you can see it here:

delegate = ring.NewAutoForgetDelegate(ringAutoForgetUnhealthyPeriods*gatewayCfg.ShardingRing.HeartbeatTimeout, delegate, logger)

But to use that feature the ring Distributor will have to use a BasicLifecycler instead of a Lifecycler. I guess the tracking of the number of healthy instances would have to be added to the BasicLifecycler somehow, maybe it could even be implemented as a delegate (BasicLifecyclerDelegate) which would be passed into the BasicLifecycler when it gets instantiated?

@hi-rustin
Copy link
Contributor

I guess the tracking of the number of healthy instances would have to be added to the BasicLifecycler somehow, maybe it could even be implemented as a delegate (BasicLifecyclerDelegate) which would be passed into the BasicLifecycler when it gets instantiated?

I wonder if I can keep track of it via OnRingInstanceRegister and OnRingInstanceStopping?

@replay
Copy link
Contributor Author

replay commented Apr 25, 2022

I wonder if I can keep track of it via OnRingInstanceRegister and OnRingInstanceStopping?

I'm not sure that's sufficient, as the health status can also change at other times than when an instance is registering or stopping.
I'd look at how it is implemented in the Lifecycler, it updates the property healthyInstancesCount in updateCounters here, which is called by updateConsul here.

So maybe the same thing could be done in the BasicLifecycler.updateInstance method as well

@pracucci
Copy link
Collaborator

Done in #2154.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants