Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spring metrics endpoint duplicate metric name handling #6404

Closed
ddewaele opened this issue Jul 16, 2016 · 17 comments
Closed

Spring metrics endpoint duplicate metric name handling #6404

ddewaele opened this issue Jul 16, 2016 · 17 comments
Labels
status: invalid An issue that we don't feel is valid

Comments

@ddewaele
Copy link

Currently the metrics endpoint stores all metrics in a hashmap, thus avoiding duplicates but potentially overwriting important metric values.

For example the following Zuul Monitors expose the following monitors :

MonitorConfig{name=zuul.filter-FormBodyWrapperFilter, tags=filtertype=pre,status=SKIPPED, policy=DefaultPublishingPolicy}
Entry{accessTime=1468705353715, value=StepCounter{config=MonitorConfig{name=zuul.filter-FormBodyWrapperFilter, tags=filtertype=pre,status=SUCCESS,NORMALIZED, policy=DefaultPublishingPolicy}, count=StepLong{init=0, data=[0, 0, 0, 2], lastPollTime=[1468705538951, 0], lastInitPos=[24478425, 146870535]}}}


MonitorConfig{name=zuul.filter-FormBodyWrapperFilter, tags=filtertype=pre,status=SUCCESS, policy=DefaultPublishingPolicy}
Entry{accessTime=1468705538950, value=StepCounter{config=MonitorConfig{name=zuul.filter-FormBodyWrapperFilter, tags=filtertype=pre,status=SKIPPED,NORMALIZED, policy=DefaultPublishingPolicy}, count=StepLong{init=0, data=[8, 12, 9, 3], lastPollTime=[1468705538951, 0], lastInitPos=[24478425, 146870553]}}}

That will results in the following metrics:

Metric [name=normalized.servo.zuul.filter-formbodywrapperfilter, value=0.0, timestamp=Sat Jul 16 23:44:37 CEST 2016]
Metric [name=normalized.servo.zuul.filter-formbodywrapperfilter, value=0.21666666666666667, timestamp=Sat Jul 16 23:44:37 CEST 2016]

Only one will end up in the metrics endpoint.

Not sure if this is really a spring boot issue, as I can imagine that the metrics provider should guarantee uniqueness in terms of names, but the metrics endpoint is silently ignoring (by overwriting) some potentially important metrics this way.

@snicoll
Copy link
Member

snicoll commented Jul 17, 2016

I am not sure I follow. Spring Boot currently supports counters and gauges. That metric is from the latter category where we record one value. If we have two events from the same key, we'll record 0.0 and then 0.21666666666666667 but we'll display the latest value we know about. If you look at other metrics (like the response time of a given path), this is exactly what happens. Feel free to reopen if I missed anything.

@snicoll snicoll closed this as completed Jul 17, 2016
@snicoll snicoll added status: invalid An issue that we don't feel is valid and removed status: waiting-for-triage An issue we've not yet triaged labels Jul 17, 2016
@ddewaele
Copy link
Author

ddewaele commented Jul 18, 2016

I was just wondering how a system should behave when if finds 2 metrics with the same name inside the publicMetrics collection.

(I'm using a prometheus spring boot client lib that assumes that the metric names coming out of that collection are unique. Is that a correct assumption or should it remove the duplicates (by for example storing the collection in a map)

I also wasn't aware this was a gauge. I assumed it was a counter (as shown here)

StepCounter{config=MonitorConfig{name=zuul.filter-FormBodyWrapperFilter, tags=filtertype=pre,status=SKIPPED,NORMALIZED, policy=DefaultPublishingPolicy}, count=StepLong{init=0, data=[0, 0, 0, 32], lastPollTime=[1468824196802, 0], lastInitPos=[24480403, 146882392]}}

This particular metric is coming from Zuul, so a Spring Cloud library is responsible for converting this into a Spring metric. I noticed that the ServoMetricReader is converting this counter into a metric :

  • Original name : zuul.filter-FormBodyWrapperFilter
  • Original tags : SmallTagMap{filtertype=pre,status=SKIPPED,NORMALIZED}

The ServoMetricReader converts this Into normalized.servo.zuul.filter-formbodywrapperfilter (it takes into account some tags, but not the ones specified here

I was just wondering if

  • This particular metric reader shouldn't take into account the tags, and add them to the name to make it unique.
  • If it is ok to have metrics in the publicMetrics collection with the same name at a particular point in time (this is what happens here).

@dsyer
Copy link
Member

dsyer commented Jul 18, 2016

To answer your last 2 questions:

The issue you have is with a particular Spring Cloud adapter (from servo to Spring Boot metrics), so it might be worth raising the issue there, and proposing a fix if you have one.

PublicMetrics is supposed to be a view over all the possible metrics sources. If you have knowledge of those sources that means you know how to disambiguate ones with similar names, then you should encode hat knowledge in your implementation. In any case. I don't think there is anything to do in Spring Boot here.

@brian-brazil
Copy link

Can you please clarify what your intended semantics are?

Is the status here part of the identity of the metric, or is it merely an annotation that should be ignored and thrown away?

If it's part of the identity, why are metrics being incorrectly and silently thrown away?

If it's an annotation, why do you permit users to have duplicate metrics and what is the correct algorithm to select the correct duplicate?

@dsyer
Copy link
Member

dsyer commented Jan 24, 2017

You're asking the question in the wrong place (as already indicated above). If you want to include the servo status in the metric name, please open an issue or maybe a PR in spring cloud netflix.

@brian-brazil
Copy link

My question is about Spring itself - what do you expect providers to put here? How do you expect consumers to process this data?

I'm the maintainer of the Prometheus Java client library, and this came up as a problem when integrating with Spring. Telling us to go talk to another team doesn't really help, as we don't know exactly what is going wrong here. We need to solve this generally and that means understanding the intended semantics of Spring metrics.

@dsyer
Copy link
Member

dsyer commented Jan 24, 2017

The semantics of Spring Boot metrics are simple, and actually pretty obvious from the Java APIs - a metric has a name, a value and a timestamp. That's it. It's intentionally very simple. If people want to attach semantics to particular formats in the metric names, that's understandable, and normal (we copied graphite, basically, and there's a huge community of people doing the same over there), but it isn't contained in the Spring Boot API. You actually do need to talk to the people who designed individual metrics to understand what they intended (and it looks like there might be a bug in the servo metrics, but this isn't the right place to fix that). Does that help?

@brian-brazil
Copy link

So to paraphrase, that the servo people have these duplicate metrics is likely a misunderstanding on their end and not considered to be valid metrics by Spring?
Is this expectation documented anywhere, or are there any exceptions Spring should be throwing due to this invalid data?

@brian-brazil
Copy link

we copied graphite, basically, and there's a huge community of people doing the same over there

To clarify a bit here, the issue as hand is whether Spring is actually following the Graphite data model. Thus far we'd been presuming that you are. This issue is about a case which indicates that either a) this is not the graphite data model or b) there's a bug/design flaw somewhere.

@dsyer
Copy link
Member

dsyer commented Jan 24, 2017

not considered to be valid metrics by Spring?

Anything with a name, value and timestamp is valid, so that's not completely accurate, is it? I think you get it.

@brian-brazil
Copy link

Anything with a name, value and timestamp is valid, so that's not completely accurate, is it?

In the example in the original post we have two metrics with identical names and timestamps but different values. That can't be valid in a metrics-based instrumentation system.

If Spring considers this okay because of the differing tags, then you're following the Prometheus data model. We'll need to update our code to take advantage of that and the spring boot JSON metrics output needs to be fixed as dropping this vital information on the floor would be a major bug.

If Spring considers this okay, by not forbidding it though code or convention, then that's a fundamental flaw and we should try to turn it into one of the other cases.

If Spring doesn't consider this okay, then this is a provider bug combined with a flaw that allows such invalid state to be represented (i.e. PublicMetrics uses a Collection rather than a Map). The docs/sanity checks need improving to avoid confusion in the future and the code we have on the Prometheus end is already correct.

Where duplicate entries would be valid would be an event logging based system, as multiple distinct events with the same name can happen in the same second. I'm fairly sure you didn't intend to build such a system and it's plausible that providers have gotten confused about this, so better docs/sanity checks will also cover that. I'm guessing this is what happened, combined with the 3rd case.

@jzampieron
Copy link

Just to further what @brian-brazil is saying...

There is clearly a disconnect here between what the /metrics end point is showing and what's actually input to the metrics collection system. (In my case prometheus).

At least to the casual user data (myself) data is being overwritten, or omitted and/or then magically created.

If that's how the actuator collector works, that's fine, but it should be documented that the inputs need to ensure proper namespace separation to avoid data corruption.

Let's be clear here, we're basically talking about the potential for silent data corruption in our monitoring system.

If this is a real possibility, then the documentation needs to be updated to clearly define the PreConditions for use. (And it would be good to log warnings as possible).

Agreed that there might also need to be a spring-cloud-netflix update depending on the decision about semantics (and corresponding documentation/code updates).

@dsyer
Copy link
Member

dsyer commented Jan 24, 2017

I still don't see why anything needs to change in Spring Boot (which doesn't rule out ever making changes, I'm just talking about the need with the current codebase). The metrics have a name, and they don't have tags or annotations or dimensions or anything fancy. Why would we need to document something that isn't there?

@jzampieron
Copy link

@dsyer Thanks. I think you just said it.
That's the specification: Folks that write metrics plugins need to know that names must be unique otherwise data will be overwritten.

Might be worth putting that note somewhere in the docs.

@brian-brazil : I think this means that metrics in spring boot aren't intended to be per-se dimensional and therefore uniqueness needs to be determined by name and not the name + tags.

Perhaps then defining the actual name as something like: "counter_xxx{ image = "foo" }" is the appropriate semantics?

@brian-brazil
Copy link

The metrics have a name, and they don't have tags or annotations or dimensions or anything fancy. Why would we need to document something that isn't there?

There's an implicit assumption in the current code that isn't documented, and at least two providers are violating this assumption. Thus documentation is appropriate to help clarify this situation, so we can then go to these providers, point at the clarified docs and get the bugs fixed on their end.

More specifically, we need to clearly define the constraints on what can go in a PublicMetrics. Right now there are no constraints that I can find documented on PublicMetrics.

Perhaps then defining the actual name as something like: "counter_xxx{ image = "foo" }" is the appropriate semantics?

That's a completely separate discussion. Given this is Graphite-like I'd be very wary of making it look Prometheus-like, as I can see that creating much breakage. Better to use the Prometheus client library directly if you want to go down that route, rather than trying to hack it into Spring.

@dsyer
Copy link
Member

dsyer commented Jan 24, 2017

Right now there are no constraints that I can find documented on PublicMetrics.

What kind of constraints would you like? I'm not really getting the point of this discussion yet. It is what it is, and the existing APIs are pretty simple to understand (although I admit maybe we could be more explicit about which ones we consider public).

@brian-brazil
Copy link

What kind of constraints would you like?

That the name must be unique within a PublicMetrics collection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: invalid An issue that we don't feel is valid
Projects
None yet
Development

No branches or pull requests

6 participants