duplicate datapoints are handled differently depending on whether re-orderBuffer is used #1201
Comments
You're saying even with ROB enabled, you want to retain the older value even if an update comes in for that point? Why?
I am less worried about whether we persist the first value or the latest value. More important is that we increment the out-of-order counter. Currently, if a user is sending duplicate data and has ROB enabled, we are blind to it.
fixes #1201 Without the reorder_buffer, MT only persists the first value received for each timestamp. With the re-order buffer we should do the same.
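A minimal sketch of what that could look like, assuming a ring-buffer style reorder buffer keyed by aligned timestamp (the type, field, and counter names below are illustrative placeholders, not metrictank's actual identifiers): reject a point whose slot already holds a value for that timestamp, so the first value received wins, and count the rejection so it is visible.

```go
package reorder

import "sync/atomic"

// point and ReorderBuffer are simplified placeholders, not the real metrictank types.
type point struct {
	Ts  uint32
	Val float64
}

type ReorderBuffer struct {
	interval uint32 // series interval in seconds
	buf      []point
}

// counter for discarded duplicate points (placeholder for a real stats counter)
var metricsDuplicate uint64

// Add keeps the first value received for a timestamp: a second point with the
// same aligned timestamp is rejected and counted, mirroring the behavior
// without the reorder buffer.
func (rob *ReorderBuffer) Add(ts uint32, val float64) bool {
	ts -= ts % rob.interval                           // align to interval boundary
	idx := (ts / rob.interval) % uint32(len(rob.buf)) // ring-buffer slot
	if rob.buf[idx].Ts == ts {
		// a value for this timestamp was already received:
		// keep the first value, but account for the discarded duplicate
		atomic.AddUint64(&metricsDuplicate, 1)
		return false
	}
	rob.buf[idx] = point{Ts: ts, Val: val}
	return true
}
```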
It seems weird to me that the ROB feature would allow one to send older data than what's already received, but cannot handle (or refuses to handle) a point coming in with the same timestamp as what's already received. To me this seems like a more benign variant of the same problem. I agree with you though that we don't have enough consistency, or rather that we don't have insight into all the problems being handled by the ROB. How about we allow the duplicate in the ROB, and do an overwrite in that case, but emit a metric along the lines of
Consistency!!!! Without the ROB, we can't overwrite values, so we shouldn't do it with the ROB.
I don't agree. I think that duplicate timestamps are a different problem, with a less obvious default behavior. What if it's a misconfigured publisher? Should duplicate datapoints that aren't the most recent one be treated any differently? In my mind, the ROB is there for any cases where tiny blips might cause out-of-order data to make it into kafka. Duplicate data, on the other hand, is just weird and maybe a particular behavior shouldn't be relied upon?
I don't see this as a good argument. We're talking about a feature that enables us to handle a case that we can't handle without that feature, so consistency for consistency's sake doesn't make sense to me. Also, Anthony, I'm having some trouble understanding your exact goal. You've stated "I am less worried about whether we persist the first value or latest value", "More important is that we increment the out-of-order counter" (I assume here you meant: "when duplicates are received"), "if a user is sending duplicate data and has ROB enabled, we are blind to it", and "Consistency!!!! Without the ROB, we can't overwrite values, so we shouldn't do it with the ROB." All 4 of these seem like different goals and even somewhat contradictory. How about this:
That sounds fine to me. Why do we need to change
Every point we receive should either be persisted or we should increment a counter to say we discarded it. For this concern, whether we persist the first received or last received point doesn't matter: the count of points persisted and the count discarded will still be the same. I 100% believe that we need to be consistent about persisting the first or last received point for each timestamp. Otherwise it becomes complicated to explain to users what the expected behaviour is, as it depends on whether the series have the ROB enabled, which can vary between series on the same instance (as the ROB is set via the storage-schemas file). So, yes, let's do "1" and "2". Updating the metric names is something @fkaleo can probably do as part of the existing discarded-samples change he is already working on.
Lines 463 to 467 in 25c16d3
When currentChunk.Push() fails, it could be either because the ts of the point == the ts of the last point, or because it was older. We don't differentiate these cases, but I suggest we should, in accordance with the differentiation proposed for the ROB metrics.
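A rough sketch of that differentiation, using simplified stand-in types for the chunk and counters (the real code around currentChunk.Push() will look different; this only illustrates the classification):

```go
package sketch

import "errors"

// Simplified stand-ins for the chunk and counters around currentChunk.Push();
// this is a sketch of the proposed differentiation, not the actual code.

type counter struct{ n uint64 }

func (c *counter) Inc() { c.n++ }

var (
	metricsTooOld    counter // point older than the last one in the chunk
	metricsDuplicate counter // point with the same ts as the last one
)

var errPushRejected = errors.New("ts not newer than last point")

type chunk struct {
	lastTs uint32
}

// Push rejects any point whose timestamp is not newer than the last one,
// just like the current chunk does.
func (c *chunk) Push(ts uint32, val float64) error {
	if ts <= c.lastTs {
		return errPushRejected
	}
	c.lastTs = ts
	return nil
}

// addPoint classifies a failed Push into "duplicate" versus "too old"
// instead of lumping both into a single counter.
func addPoint(c *chunk, ts uint32, val float64) {
	if err := c.Push(ts, val); err != nil {
		if ts == c.lastTs {
			metricsDuplicate.Inc() // same timestamp as last point: keep the first value
		} else {
			metricsTooOld.Inc() // genuinely older than what we already have
		}
		return
	}
	// point accepted into the chunk
}
```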
I will take that issue. Just a remark though: tank.metrics_duplicate will only work for a very specific case which probably will not happen very often (possibly slightly more often in the ROB case, as timestamps are aligned on boundaries). Is it going to be a meaningful metric to users? Also, its name 'tank.metrics_duplicate' might need to make it clear that a duplicate data point that was too old will not be included in it.
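To illustrate the boundary-alignment point: if the ROB quantizes timestamps to the series interval (as the remark above suggests), two raw points only a few seconds apart can land on the same boundary and show up as a duplicate. A tiny self-contained example with a hypothetical align helper (not metrictank's actual function):

```go
package main

import "fmt"

// align quantizes a timestamp to the series interval, the way a reorder
// buffer keyed on interval boundaries would (illustrative only).
func align(ts, interval uint32) uint32 {
	return ts - (ts % interval)
}

func main() {
	interval := uint32(10)
	fmt.Println(align(1550000003, interval)) // 1550000000
	fmt.Println(align(1550000007, interval)) // 1550000000 -> same slot: looks like a duplicate
}
```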
I think having a metric showing how many duplicates we're seeing would be quite useful. You're right that the name should be clearer, e.g. include "discarded".
I think we should be consistent with the metric names, e.g. always prefix with "discarded" or perhaps even use the form
Well, there's:
We can only pick one, and either one works for me (though I think I have a slight preference for 2).
Not everything is in tank though. For some of them the decision to discard is made elsewhere, e.g. in the input plugin. But that's fine, graphite users will have to query for
Note that it is not really practical to be exhaustive here. E.g. the kafka-mdm input will soon support batches, and prometheus is already essentially batched, so we can't just expose a counter of metrics discarded because of e.g. batch decode errors.
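For illustration, a hedged sketch of what a consistent "discarded" naming scheme could look like when registering counters. It assumes metrictank's stats package exposes a NewCounter32 constructor (as used for existing counters like tank.metrics_too_old); the metric names themselves are made up for the example and are not what was actually agreed on:

```go
package discarded

import "github.com/grafana/metrictank/stats"

// Hypothetical metric names for illustration only. The point is the shared
// "discarded" component, which makes it easy for graphite users to find and
// sum these counters, even though they are emitted from different places
// (the tank, the input plugins, etc.).
var (
	discardedTooOld      = stats.NewCounter32("tank.discarded.received-too-late")
	discardedDuplicate   = stats.NewCounter32("tank.discarded.duplicate")
	discardedDecodeError = stats.NewCounter32("input.kafka-mdm.discarded.decode-error")
)
```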
When a datapoint arrives for a series with the same timestamp as a point already received, it is handled differently depending on whether the re-orderBuffer is being used.
without re-orderBuffer
Points are discarded and the tank.metrics_too_old counter is incremented. In this case, the value for the first datapoint received for a specific timestamp is the one that is kept.
with re-orderBuffer
The value in the ROB at the specified timestamp is set, regardless of whether a value already exists. In this case, the value for the last point received for a timestamp is the one kept.
We need to update the re-orderBuffer to match the behavior used when there is no re-orderBuffer.
https://github.com/grafana/metrictank/blob/master/mdata/reorder_buffer.go#L59-L61
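For reference, a paraphrased and simplified sketch of the behavior at the linked lines (not the verbatim code), reusing the illustrative ReorderBuffer/point types from the sketch earlier in this thread: the write is unconditional, so a point with a timestamp that is already buffered overwrites it, the last value wins, and nothing is counted.

```go
// Paraphrased, simplified sketch of the linked reorder_buffer.go lines:
// the slot for the timestamp is written unconditionally, so a duplicate
// timestamp silently overwrites the buffered value and no counter is bumped.
func (rob *ReorderBuffer) add(ts uint32, val float64) {
	index := (ts / rob.interval) % uint32(len(rob.buf))
	rob.buf[index].Ts = ts
	rob.buf[index].Val = val
}
```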