
duplicate datapoints are handled differently depending on whether re-orderBuffer is used #1201

Closed · woodsaj opened this issue Jan 7, 2019 · 14 comments · Fixed by #1288

woodsaj (Member) commented Jan 7, 2019

When a datapoint for a series has the same timestamp as a point already received, it is handled differently depending on whether the re-orderBuffer is being used.

without re-orderBuffer

Points are discarded and the tank.metrics_too_old counter is incremented. In this case, the value of the first datapoint received for a specific timestamp is the one that is kept.

with re-orderBuffer

The value in the ROB at the specified timestamp is set, regardless of whether a value already exists. In this case, the value of the last point received for a timestamp is the one that is kept.

We need to update the re-orderBuffer to match the behavior used when there is no re-orderBuffer.
https://github.com/grafana/metrictank/blob/master/mdata/reorder_buffer.go#L59-L61
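
To make the difference concrete, here is a minimal, self-contained Go sketch of the two behaviors described above; the types, function names and counters are hypothetical stand-ins, not metrictank's actual code.

package main

import "fmt"

// point is an illustrative stand-in for a single datapoint.
type point struct {
	ts  uint32
	val float64
}

// addWithoutROB models the no-ROB path: anything at or before the last
// accepted timestamp is rejected and counted, so the first value received
// for a timestamp is the one that is kept.
func addWithoutROB(stored map[uint32]float64, lastTs *uint32, tooOld *int, p point) {
	if p.ts <= *lastTs {
		*tooOld += 1 // duplicate or older: discarded, counter incremented
		return
	}
	stored[p.ts] = p.val
	*lastTs = p.ts
}

// addWithROB models the current ROB path: the slot for the timestamp is set
// unconditionally, so the last value received wins and nothing is counted.
func addWithROB(stored map[uint32]float64, p point) {
	stored[p.ts] = p.val
}

func main() {
	noROB, rob := map[uint32]float64{}, map[uint32]float64{}
	var lastTs uint32
	tooOld := 0
	for _, p := range []point{{ts: 10, val: 1.0}, {ts: 10, val: 2.0}} {
		addWithoutROB(noROB, &lastTs, &tooOld, p)
		addWithROB(rob, p)
	}
	fmt.Println(noROB[10], tooOld) // 1 1 -> first value kept, discard counted
	fmt.Println(rob[10])           // 2   -> last value kept, discard invisible
}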

Dieterbe (Contributor) commented Mar 29, 2019

You're saying even with ROB enabled, you want to retain the older value even if an update comes in for that point? Why?

woodsaj (Member, Author) commented Mar 29, 2019

We need to update the re-orderBuffer to match the behavior used when there is no re-orderBuffer.

I am less worried about whether we persist the first value or the latest value. More important is that we increment the out-of-order counter. Currently, if a user is sending duplicate data and has the ROB enabled, we are blind to it.

woodsaj added a commit that referenced this issue Apr 10, 2019
fixes #1201

Without the reorder_buffer, MT only persists the first value received for each timestamp. With the re-order buffer we should do the same.
Dieterbe pushed a commit that referenced this issue Apr 10, 2019
Dieterbe (Contributor) commented Apr 10, 2019

  • too_old means too old, thus we couldn't handle it. If we could handle it (e.g. an overwrite), we should not use this metric.
  • If we drop the data for a reason other than being too old, e.g. because it is a duplicate of the most current point, we should not abuse the "too_old" counter for this. This is an objection to "handle duplicate datapoints in a consistent manner" #1276.

It seems weird to me that the ROB feature would allow one to send older data than what's already received, but cannot handle (or refuses to handle) a point coming in with the same timestamp as what's already received. To me this seems like a more benign variant of the same problem.
IOW, I think the choice-of-which-point-we-persist behavior may be different between the ROB-enabled and ROB-disabled cases, precisely because the ROB enables us to handle that problem nicely.

I agree with you though that we don't have enough consistency, or rather that we don't have insights into all the problems being handled by the ROB.

How about we allow the duplicate in the ROB, and do an overwrite in that case, but emit a metric along the lines of tank.metrics_overwritten? That way the benefits of the ROB are exposed as both tank.metrics_reordered and tank.metrics_overwritten. And we keep tank.metrics_too_old to always mean too old.
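
A rough sketch of that idea, with assumed counter names standing in for tank.metrics_too_old / tank.metrics_reordered / tank.metrics_overwritten (the real wiring through metrictank's stats package would look different).

package main

import "fmt"

// Hypothetical counters for illustration only.
var metricsReordered, metricsOverwritten, metricsTooOld int

// robAdd sketches the proposal: points older than the ROB window are "too old",
// points that land on an occupied slot are overwritten but counted, and points
// that merely arrive out of order are counted as reordered. The buffer here is
// a plain map keyed by timestamp, not metrictank's circular buffer.
func robAdd(buf map[uint32]float64, oldestTs uint32, newestTs *uint32, ts uint32, val float64) {
	if ts < oldestTs {
		metricsTooOld++ // fell out of the ROB window: cannot be handled
		return
	}
	if _, exists := buf[ts]; exists {
		metricsOverwritten++ // duplicate timestamp: overwrite, but make it visible
	} else if ts < *newestTs {
		metricsReordered++ // out of order, but still inside the window
	}
	buf[ts] = val
	if ts > *newestTs {
		*newestTs = ts
	}
}

func main() {
	buf := map[uint32]float64{}
	var newestTs uint32
	robAdd(buf, 0, &newestTs, 10, 1.0)
	robAdd(buf, 0, &newestTs, 20, 2.0)
	robAdd(buf, 0, &newestTs, 10, 3.0) // duplicate ts: overwritten and counted
	fmt.Println(buf[10], metricsOverwritten, metricsReordered, metricsTooOld) // 3 1 0 0
}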

woodsaj (Member, Author) commented Apr 10, 2019

Consistency!!!!

Without the ROB, we can't overwrite values. So we shouldn't do it with the ROB.

shanson7 (Collaborator) commented

this seems like a more benign variant of the same problem

I don't agree. I think that duplicate timestamps are a different problem, with a less obvious default behavior. What if it's a misconfigured publisher? Should duplicate datapoints that aren't the most recent one be treated any differently? In my mind, the ROB is there for cases where tiny blips might cause out-of-order data to make it into Kafka. Duplicate data, on the other hand, is just weird, and maybe a particular behavior shouldn't be relied upon?

Dieterbe (Contributor) commented Apr 10, 2019

Consistency!!!! Without the ROB, we can't overwrite values. So we shouldn't do it with the ROB.

I don't see this as a good argument. We're talking about a feature that enables us to handle a case that we can't handle without that feature, so consistency for consistency's sake doesn't make sense to me.
I am, however, open to discussing this based on the merits or non-merits of the functionality itself. Sean brings up a good point: going back in time vs sending the same value twice is typically caused by very different things.
I'm open to concrete arguments about why accepting overwrites is bad.
I was also thinking it seems useful to be able to do updates of points, but the ROB doesn't seem like the right solution for this, because by the time you realize you want to do an update, time (and the data in the ROB) has probably moved on significantly. So I think this was an invalid argument.

Also, Anthony, I'm having some trouble understanding your exact goal. You've stated "I am less worried about whether we persist the first value or the latest value", "More important is that we increment the out-of-order counter" (I assume here you meant: "when duplicates are received"), "if a user is sending duplicate data and has the ROB enabled, we are blind to it", and "Consistency!!!! Without the ROB, we can't overwrite values. So we shouldn't do it with the ROB." All four of these seem like different goals, and even somewhat contradictory.

How about this:

  1. we reject all dupes, always (whether for the latest point or not, to Sean's question) ("handle duplicate datapoints in a consistent manner" #1276 does this)
  2. we increment a different metric: not tank.metrics_too_old but tank.metrics_duplicate or something? Note that we will need to make a change in AggMetric.add() as well to differentiate these two cases, but that seems easily done (a rough sketch follows this list).
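
For the ROB side, here is a minimal sketch of combining 1 and 2: exact duplicates are rejected (so the first value received is kept) and counted under a separate counter. The buffer, function and counter names are assumptions for illustration, not metrictank's actual code.

package main

import "fmt"

// Assumed counters; the real names and wiring would come from metrictank's stats package.
var metricsDuplicate, metricsTooOld int

// robAccept decides whether the ROB should accept a point: exact duplicates of
// a timestamp already in the buffer are rejected and counted separately from
// points that fall outside the window entirely.
func robAccept(buf map[uint32]float64, oldestTs, ts uint32) bool {
	if ts < oldestTs {
		metricsTooOld++ // outside the ROB window
		return false
	}
	if _, exists := buf[ts]; exists {
		metricsDuplicate++ // duplicate timestamp: keep the first value, count the drop
		return false
	}
	return true
}

func main() {
	buf := map[uint32]float64{10: 1.0}
	fmt.Println(robAccept(buf, 5, 10)) // false: duplicate, first value kept
	fmt.Println(robAccept(buf, 5, 20)) // true: new timestamp, accepted
	fmt.Println(metricsDuplicate, metricsTooOld) // 1 0
}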

shanson7 (Collaborator) commented

That sounds fine to me. Why do we need to change AggMetric.add?

woodsaj (Member, Author) commented Apr 10, 2019

Also, Anthony, I'm having some trouble understanding your exact goal.

Every point we receive should either be persisted or we should increment a counter to say we discarded it. For this concern, whether we persist the first-received or last-received point doesn't matter; the count of points persisted and the count discarded will still be the same.

I 100% believe that we need to be consistent in persisting either the first or the last received point for each timestamp. Otherwise it becomes complicated to explain to users what the expected behaviour is, as it depends on whether the series has the ROB enabled, which can vary between series on the same instance (as the ROB is set via the storage-schemas file).
As the only option when not using the ROB is to persist the first point, that is the approach we need to use everywhere.

So, yes. Let's do "1" and "2". Updating the metric names is something @fkaleo can probably do as part of the existing discarded-samples change he is already working on.

Dieterbe (Contributor) commented

That sounds fine to me. Why do we need to change AggMetric.add?

if err := currentChunk.Push(ts, val); err != nil {
log.Debugf("AM: failed to add metric to chunk for %s. %s", a.key, err)
metricsTooOld.Inc()
return
}

When currentChunk.Push() fails, it could be either because the ts of the point == the ts of the last point, or because it was older. We don't differentiate these cases, but I suggest we should, in accordance with the differentiation proposed for the ROB metrics.
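
A sketch of what that differentiation could look like when the push is rejected; the lastTs field, the countPushFailure helper and the counter names are assumptions for illustration, not metrictank's actual code.

package main

import "fmt"

// Hypothetical counters standing in for the two discard reasons.
var metricsTooOld, metricsDuplicate int

type aggMetric struct {
	lastTs uint32 // ts of the most recent point accepted into the current chunk
}

// countPushFailure would be called when currentChunk.Push(ts, val) returns an
// error: an equal timestamp is a duplicate, a lower timestamp is genuinely too old.
func (a *aggMetric) countPushFailure(ts uint32) {
	if ts == a.lastTs {
		metricsDuplicate++
	} else {
		metricsTooOld++
	}
}

func main() {
	a := &aggMetric{lastTs: 60}
	a.countPushFailure(60) // duplicate of the most recent point
	a.countPushFailure(30) // older than the most recent point
	fmt.Println(metricsDuplicate, metricsTooOld) // 1 1
}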

fkaleo (Contributor) commented Apr 11, 2019

I will take that issue. Just a remark, though: tank.metrics_duplicate will only cover a very specific case which probably will not happen very often (possibly slightly more often in the ROB case, as timestamps are aligned on boundaries). Is it going to be a meaningful metric to users? Also, its name 'tank.metrics_duplicate' might need to make it clear that if a duplicate datapoint was too old, it will not be included in it.

fkaleo self-assigned this Apr 11, 2019
fkaleo added this to the vnext milestone Apr 11, 2019
Dieterbe (Contributor) commented

I think having a metric showing how many duplicates we're seeing would be quite useful. You're right that the name should be clearer, e.g. include "discarded".

woodsaj (Member, Author) commented Apr 11, 2019

I think we should be consistent with the metric names, e.g. always prefix with "discarded", or perhaps even use the form tank.discarded.<reason> (see the sketch after this list):

  • discarded.duplicate
  • discarded.too_old
  • discarded.invalid_ts
  • discarded.invalid_name
  • discarded.invalid_tags
    etc...
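
A quick sketch of what that naming convention could look like on the registration side, using a stand-in counter type rather than metrictank's real stats package.

package main

import "fmt"

// counter is a stand-in; metrictank's actual stats package would be used instead.
type counter struct {
	name string
	n    uint64
}

func newCounter(name string) *counter { return &counter{name: name} }
func (c *counter) Inc()               { c.n++ }

// One counter per discard reason, all under a common "tank.discarded." prefix
// so users can find them with a single wildcard query.
var (
	discardedDuplicate   = newCounter("tank.discarded.duplicate")
	discardedTooOld      = newCounter("tank.discarded.too_old")
	discardedInvalidTs   = newCounter("tank.discarded.invalid_ts")
	discardedInvalidName = newCounter("tank.discarded.invalid_name")
	discardedInvalidTags = newCounter("tank.discarded.invalid_tags")
)

func main() {
	discardedDuplicate.Inc()
	for _, c := range []*counter{discardedDuplicate, discardedTooOld, discardedInvalidTs, discardedInvalidName, discardedInvalidTags} {
		fmt.Println(c.name, c.n)
	}
}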

Dieterbe (Contributor) commented Apr 15, 2019

I think we should be consistent with the metric names

Well, there's:

  1. consistency between what we expose via carbon and what we expose via prometheus
  2. consistency between how we have been exposing it over carbon so far and how we will expose it via carbon in the future.

We can only pick one, and either one works for me (though I think I have a slight preference for 2).

or perhaps even use the form tank.discarded.<reason>

Not everything is in tank though. For some of them, the decision to discard is made elsewhere, e.g. in the input plugin. But that's fine: graphite users will have to query for X.discarded.* where X is *, *.* and *.*.*, since in the input plugins we break it down by 1/2 extra nodes.

fkaleo added a commit that referenced this issue Apr 17, 2019
- New Carbon metric 'tank.discarded.new_value_for_timestamp'
- Prometheus metric 'discarded_samples_total' has a new reason 'new-value-for-timestamp'

Fixes #1201, #1202, #1203
Dieterbe (Contributor) commented Apr 18, 2019

woodsaj: Every point we receive should either be persisted or we should increment a counter to say we discarded it
...
Dieterbe: Not everything is in tank though. For some of them, the decision to discard is made elsewhere, e.g. in the input plugin. But that's fine: graphite users will have to query for X.discarded.* where X is *, *.* and *.*.*, since in the input plugins we break it down by 1/2 extra nodes.

Note that it is not really practical to be exhaustive here. E.g. the kafka-mdm input will soon support batches, and prometheus is already essentially batched, so we can't just expose a counter of metrics discarded because of, e.g., batch decode errors.
I think this is fine, and it all depends on what exactly "receive" means. E.g. we can just rephrase woodsaj's comment as "every point that made it past the data decoding step should either be persisted or we should increment a counter to say we discarded it". In practical terms, the input plugins do the decoding and then hand off to the DefaultHandler, so that's where this notion should start.
