Adding redundancy with de-duplication to Mimir Ruler? #2434
Replies: 1 comment 7 replies
The HA tracker is implemented through a leader election. A new leader is elected after no data has been received from the current leader for X time (a configurable timeout). Due to the nature of the ruler (rule evaluations happen at regular intervals), I think you can't set a low timeout, so you could end up with missing rule samples anyway. Happy to be proven wrong.
Compactor (as it works today) will not de-duplicate samples, because sample timestamps for series generated by both ruler replicas will be different.
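A small illustration of that timestamp mismatch (assumed numbers, for demonstration only): each replica stamps samples with its own evaluation time, so the two replicas produce identical series whose samples never coincide, leaving exact-duplicate de-duplication with nothing to merge.

```go
package main

import "fmt"

// evalTimestamps returns the sample timestamps (in seconds) produced by a
// ruler replica that started at `start` and evaluates every `interval`
// seconds. Each replica runs on its own schedule.
func evalTimestamps(start, interval, cycles int64) []int64 {
	out := make([]int64, 0, cycles)
	for i := int64(0); i < cycles; i++ {
		out = append(out, start+i*interval)
	}
	return out
}

func main() {
	// Same 60s evaluation interval, but the replicas started at different
	// wall-clock times (hypothetical offsets of 12s and 37s).
	a := evalTimestamps(12, 60, 3)
	b := evalTimestamps(37, 60, 3)
	fmt.Println("replica A timestamps:", a) // [12 72 132]
	fmt.Println("replica B timestamps:", b) // [37 97 157]
	// No (timestamp, value) pair is shared between the two series, so a
	// de-duplication step keyed on exact duplicates keeps both copies.
}
```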
What We're Trying to Achieve
We have set up a Mimir v2.1.0 stack including a Ruler in remote mode. Our
understanding is that in steady state a particular rule group is evaluated by
the Ruler pool exactly once. However, evaluation cycles might be missed during
Ruler replica restarts, resulting in "holes" in the data.
Our goal is to eliminate those holes. To achieve that, we'd like to add
redundancy: evaluate each rule group twice, then de-duplicate the results.
Question
What would be the recommended approach to implementing the redundancy as
described above?
Ideas
We've come up with the following ideas on how to implement the redundancy.
HA Tracker
TL;DR: have two pools of Rulers with the same number of workers; the pair of
corresponding workers (one from each pool) would choose a leader using the HA
tracker -- achieving the de-duplication.
For such de-duplication to work well, a particular rule group would always have
to hash to the same replica in both pools. In other words, each pair of workers
competing for leadership has to be evaluating the same set of rule groups.
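A sketch of that requirement (this is an assumed illustration, not Mimir's actual ring code): as long as both pools use the same hash function and the same number of workers, a given rule group deterministically maps to the same worker index in each pool, so each cross-pool pair of workers competes over an identical set of rule groups.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// workerFor maps a rule group name to a worker index. With identical pool
// sizes and hash functions, pool A and pool B agree on the index, which is
// what lets the corresponding workers pair up for leader election.
func workerFor(ruleGroup string, poolSize uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(ruleGroup))
	return h.Sum32() % poolSize
}

func main() {
	const poolSize = 4 // must be identical in both pools
	for _, g := range []string{"alerts/cpu", "alerts/mem", "recording/latency"} {
		fmt.Printf("%-18s -> worker %d in both pools\n", g, workerFor(g, poolSize))
	}
}
```

Note that this determinism is exactly what breaks under the scenarios mentioned below: if one pool loses a worker (or the pools are resized independently), the mapping diverges and the paired workers no longer evaluate the same rule groups.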
This approach would require changes in how the ring is used by the Ruler pools
(e.g. when a worker is down, other workers in the same pool must not pick up
its work). Also, resizing the pools -- without introducing holes and/or
duplicate data -- would be a challenge.
Compactor
TL;DR: have two pools of Rulers, have each pool (as a whole) write to a disjoint
set of Ingesters (maybe separate pools?), then let the Compactor de-duplicate.
Thanks for any pointers.