Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove delta? #1772

Closed
brian-brazil opened this Issue Jun 29, 2016 · 24 comments

Comments

Projects
None yet
4 participants
@brian-brazil
Copy link
Member

brian-brazil commented Jun 29, 2016

As we come up to 1.0, I'm wondering if we should remove delta.
I don't think I've ever seen a use case for it, and it's acting as an attractive nuisance. Every time someone asks about it on IRC we advise them to use something else (usually deriv or increase).

What do others think of removing it?

@brian-brazil brian-brazil added this to the v1.0.0 milestone Jun 29, 2016

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jun 29, 2016

It's used exactly once in all the rules at SoundCloud. ;)
The case can probably be worked around with deriv, but in that case, the user really meant delta.

@matthiasr

This comment has been minimized.

Copy link
Contributor

matthiasr commented Jun 29, 2016

I'm for keeping it. We have a complete, easy to understand matrix of variants:

Counter (compensates resets) Gauge (no compensation)
per second rate derive
per interval increase delta
@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jun 29, 2016

What @matthiasr said. Actually, that matrix should probably just be in the docs.

@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jun 29, 2016

Derive doesn't fit in that matrix, as it's using least squares rather than subtraction.

It's used exactly once in all the rules at SoundCloud. ;)

May I ask what for?

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jun 30, 2016

derive is indeed substantially different from the others.

The one SC usage is essentially about a gauge describing a queue length. You want to know if the queue has grown or shrunk over the last 5m. increase would be plainly wrong, and the averaging of deriv is great for trending but not always desired.

A very similar effect could be created with the offset modifier, with the subtle difference that the first value would be the last before the interval starts and not the first after. In the use case above, it might even be the better answer, but the syntax is much more verbose.

@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jun 30, 2016

You want to know if the queue has grown or shrunk over the last 5m. increase would be plainly wrong, and the averaging of deriv is great for trending but not always desired.

Every time I've seen that use case previously deriv was the right answer, as it's resilient to brief outliers in a way that delta isn't. I'm guessing this is in an alert?

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jun 30, 2016

Yes, it's an alert, and the user is probably interested in brief outliers... That's why I think deriv is undesired. However, one could argue a (xxx) - (xxx offset 5m) is closer to what is desired. It would also give integral results.

@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jun 30, 2016

When I've seen this sort of thing the sign of the delta/deriv was all that was of interest, so I guess I'm thinking of a different use case.

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 1, 2016

The alert fires on the sign, but then the human receiving it is interested in the amount, too.

However, the sign of delta and deriv might be different: If the gauge steadily but slowly moves down but then shoots up for a short time, deriv could still be negative while delta will show positive spikes.

I can totally believe that people use delta where they should use deriv, but that's because they have misunderstood, or simply didn't know about deriv. It doesn't mean we can replace delta by deriv in general. The crucial point is if the non-extrapolating and backwards-looking xxx - (xxx offset 5m) is a good or even better replacement for delta(xxx[5m]), which is limited to the given interval and extrapolates.

If we had no delta, I'd probably point people to xxx - (xxx offset 5m) and would ask them to come back to us if that's not sufficient. Now that we already have delta, I'd be cautious to remove it, given that it is in a subtle but substantial way different from the alternatives.

@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jul 1, 2016

I'd be cautious to remove it, given that it is in a subtle but substantial way different from the alternatives.

The question I'd ask is whether it causes more harm than good. This is practically speaking our last chance to consider whether this function should stick around, so I think we should have a stronger argument than inertia.

I've always found the extrapolation particularly weird too. We could always re-add it again later.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 5, 2016

I'd like to do an RC within this week. Any agreement becoming clearer here?

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 5, 2016

My point about inertia was actually the other way round: If we did not have delta, inertia would keep us from introducing it.

Anyway, I vote for keeping it, but by a narrow margin. (If any of you feels strongly in favor of removing it and nobody feels strongly in favor of keeping it, I'm happy to remove.)

@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jul 5, 2016

I'm for removing it, but not extremely strongly. It has always seemed like a bit of a wart.

@matthiasr

This comment has been minimized.

Copy link
Contributor

matthiasr commented Jul 6, 2016

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 10, 2016

I value everything that makes us more lightweight. Having two solutions to one problem is not great either.

Given that we can always add it back later, removing it seems not like a reasonable decision to me. Especially as @beorn7 suggested he might not add it again today.

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 10, 2016

Having two solutions to one problem is not great either.

We don't have two solution for one problem. The solutions we have each solve different problems. The differences are subtle at time but could have very noticeable effects.

Given that we can always add it back later, removing it seems not like a reasonable decision to me.

One "not" too many?

Especially as @beorn7 suggested he might not add it again today.

As for any new feature, I would ask for a thorough analysis of use cases.

Since we have the feature already, we cannot know how many are already using it legitimately. That's a rather different situation. Removing a feature and see who screams is not the way I would like to treat my users. I see a few legitimate use cases, I'm just not sure how relevant they are. If that's helpful for the decision, we can open a discussion about them. Currently, we seem to have quite a stalemate, so perhaps that's necessary. Although @fabxc 's vote was given based on the assumption that we have multiple ways of solving the same problem, which is not strictly true.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 13, 2016

Okay, I don't feel too strongly. Can you and Brian please agree and either remove it or close the issue so it's removed from the milestone very soon?

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 13, 2016

We chatted a bit about it at the conference. The more I think about it, the stronger I lean towards keeping it. Brian is more like "let's remove it, and if somebody comes up with a legitimate use-case that is broken now, let's just quickly add it back". I understand the concerns about the potential for confusion, but I believe having the functionality is worth it. I accept that we weigh those aspects differently. Not sure how to resolve. I would really like to keep it.

@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jul 13, 2016

I'm not a fan of having multiple subtly different ways of doing things, as that leads to confusion and support overhead in the long run. I've yet to see any use case that's not largely (and arguably better) covered by the offset or deriv approaches so I still think it should be removed.

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 13, 2016

This is the use case:

I am monitoring a work queue via a gauge telling me the length of the queue. I want to know the progress I have made working through my backlog over the last hour.

  • If I use deriv, I get linear trending, which gives me a very different answer if the queue length has a highly non-linear development (which is pretty common). deriv could very well tell me my backlog has grown over the last hour if it has in fact shrunk (but only suddenly at the end of the interval).
  • If I use offset, I don't get any result if the gauge only sprang into existence less than an hour ago.
@brian-brazil

This comment has been minimized.

Copy link
Member Author

brian-brazil commented Jul 13, 2016

deriv could very well tell me my backlog has grown over the last hour if it has in fact shrunk (but only suddenly at the end of the interval).

Hmm. I'd argue that a brief dip in backlog isn't necessarily indicative, and having to wait a few minutes for the trend to be confirmed is okay. I'm not convinced that the difference between deriv and delta is significant here in practice and that you'd make notable different decisions based on it.

If I use offset, I don't get any result if the gauge only sprang into existence less than an hour ago.

I'm not sure this is a problem in practice. A backlog in a brand new queue seems like a niche use case, as queues tend not to be created too often.

There's a more general question here of how the various range vector functions should behave in the face of incomplete data (beyond the rate discussions we've already had) so it'd not implausible that another solution that doesn't have this particular problem may arise as more advanced time-related functions are added.

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Jul 13, 2016

To flesh the use-case out a bit:

  • My queue processor has crashed. I restart it.
  • It comes up with a huge backlog.
  • I want to see the progress it makes soon. No problem with a custom query. But that dashboard that displays queue length change over the last hour displays nothing for an hour with offset. I'd say that's a common and valid use case, and one of those cases where you need your monitoring most.
  • If I had a constantly growing backlog for a while and now do something that unclogs the pipes, and suddenly everything works smoothly, I again want to see that soon. The averaging of deriv is really confusing here, especially if I didn't want an average in the first place.

Brian and I discussed a bit longer in person, but it looks we weigh the different pros and cons differently, so that we continue to come to opposing conclusions. At least we also agreed that there are bigger battles to fight elsewhere, and we shouldn't spend too much time. Anybody up for serving as a convincing tie breaker?

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jul 14, 2016

Okay, this is blocking us for 1.0 so I'm making the call here and close.

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.