-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
count_values string formatting #8763
Comments
hello, I think this is a breaking change in Prometheus, so not doable as-is in Prometheus 2.x. We would need to think about alternatives, like adding an extra "formatting" parameter for the count_values. I would also need to check what the openmetrics spec for formatting is and what the actual implementation is to see the better solution. Why do you speak about %g but your proposed code does not use %g ? |
Your application could maybe also expose http_apdex_target_seconds with the correct label. |
The "%g" terminology came from some previous bug reports around different bucket le labels in different clients. Before I found the actual code. It's also referenced in the OpenMetrics spec. |
Okay, would not %g render +inf, Nan etc out of the box as we would expect? |
The advantage with using a numeric , instead of a hard coded label, is that I can use max/min to pick if two different versions of the same app happen to provide different values. (I could expose the metrics with both the value and the bucket, I'd actually have to recreate the openmetrics string code in my own app to know how the thing would be rendered). Hopefully it makes sense that my example case here is just an example and that formalising the actually format count_values uses (even if it needs an extra arg or different function), has some value in itself. |
The code in the PR is taken from the prom/common repo so does handle rendering +/-Inf and NaN, and enforces the .0 suffix. |
You could use |
The exact choice of how to pick the highest isn't super relevant (we just use min at present I think). When generating the metric I'd have to generate...
And have to be confident that I'm rendering the le label in the same way that the client library does. Phrased differently... currently the format for the resulting label from |
The original sin here is that we (ab-)use label values (which are strings) as numbers (in this case bucket boundaries). The most striking dissonance here comes from the different formatting conventions in different languages, mostly that Go doesn't append a OpenMetrics tried to address that problem, sadly not by reintroducing sanitation of the format (and, in continuation, keeping open the possibility of a TSDB that can handle those numbers as actual numbers, as Prometheus will do with the new Sparse Histograms), but by requiring the instrumented target to do the "correct" formatting. The latter wouldn't be too bad if the "correct" formatting were actually well defined. But in fact it is not. The OM spec works for integer numbers and certainly covers most cases in practice, but there are numbers that can be correctly represented in up to nine different ways, and the OM spec does not specify which one to pick in those cases. (The usual formatting algorithms and language specs are silent about this. They are all only concerned that the text representation will parse back into exactly the same float, but that isn't enough for our case.) The conclusion is that OM might work in most practical cases, but it is not a proper solution to the problem. (With something like exponential buckets I could see the above described problem to also occur in practice. Needs to be investigated, but my assumption is that this is not purely academical.) Another aspect of the problem is that we have maneuvered Prometheus 2.x into a position where it is hard to get out without significant friction of breaking changes. If you switch Go targets to using OM, all your bucket and quantile timeseries change. If we re-introduced sanitizing, all the Java/Python/... time series would change. And in neither case would the price we pay get us to a proper solution (as formatting of numbers with many decimal places will still not be well defined). We mitigated the situation for With the label value created by |
FWIW, I decided to "bite the bullet" and switch to OpenMetrics for our Go apps, and that is what brought me to this. The decision was based entirely on it being the only way to get exemplar support, and I can imagine a lot of people will likely do the same. It is worth a few days of broken heat maps. I've not seen any discussion of introducing exemplars in the non-OM expfmt (though I get if proto comes back, that's an option). |
I'll crack on with the regexp. |
It feels like this is easily done (openmetrics -> text format):
The other way around (text format -> openmetrics):
|
Cool. Just that it is the other way around, isn't it? The first relabels OM into the old (client_golang) behavior, while the second relabels old (client_golang) behavior into OM. (Note that the old text format didn't specify any particular float format. Prometheus 1.x was sanitizing the floats-as-string anyway.) |
Thanks, I have updated my comment. |
Thanks for the tips. I didn't raise this to fix an unresolvable problem though, if count_values is going to convert numbers to strings, it seems like it should do so in a "known way". I appreciate @beorn7 points, though I would say that there is, so far as I can see, a 1 <-> 1 mapping of the OpenMetrics "%g with trailing .0", and the behaviour should be reproducable by clients, so it seemed like a sensible option. If changing the behaviour is undesirable, it might be worth at least documenting the existing behaviour (as golang %f at least), and possibly mentioning that it is not string matchable against OpenMetrics histogram le labels without regexp. On a side note, purely out of curiosity, would that le relabeling impact exemplar lookup? |
Saying that OM uses %g is incorrect. We are using %g in the text format too: https://github.com/prometheus/common/blob/main/expfmt/text_create.go#L449 |
It should yes. |
I appreciate it does not use %g directly (the PR I've closed included the copied code from client_golang so that the behaviour would match, as mentioned above), the terminology was stuck in my head from reading previous PRs that referred to it as such. It's not terribly relevant to anything if this isn't going ahead. |
What |
It would be worth OM documenting whatever it expects languages to do, probably without reference to "Go %g". |
OM's formatting behavior is documented in their draft spec, and it's even discussing a number of more subtle issues with number formatting. Definitely agree that the
Definitely we'll have exemplars in the new histogram world, too. It just won't be "one per bucket" anymore, so probably a separate section in the protocol. But those are details to flesh out later. |
From what I remember of the code, |
I've just checked it out: It uses Now I'm thinking we might want to add an optional parameter to |
That sounds like a good idea, though I'm not sure if there's literally a sprintf form that would be OM compatible. A reimplementation of a %.. formatter seems ideal but feels like a large job. |
We could use the Go formatter, but add another verb that would format in OpenMetrics style. |
Well today I learned of fmt.Formatter, though there is no obvious way to fall back, so this kind of thing doesn't look super performant. Fun though: |
It's not obvious that the full power of a Printf strings makes that much sense (e.g., there's not much point in multiple values. so maybe it'd be better to just allow something like |
Sounds good to me. We could use |
|
But if we use https://golang.org/pkg/strconv/#FormatFloat , we can stick to the letters allowed there, and there is no |
@beorn7 I can have a go at implementing this, but would it be worth discussing on prometheus-dev? The use of "thing,soemthing" seem like the kind of thing that could prove contentious, and I'd rather not start if it's going to get shot down. |
Yeah, this sounds like a good idea. |
Is prometheus-developers@googlegroups.com the right list? |
yes. |
Just for the record, conversation continued here: |
It seems that the consensus and direction in the mailing list is to introduce %. If someone wants to implement Tristan's suggestion, I would welcome it. |
I didn't get much of a consensus vibe, but such that anything has support beyond myself and @beorn7 , I think a |
I clarified in the mailing list that I agree, indeed. |
A workaround in recording rules (where you can't use metric_relabel_configs) is to use |
Proposal
It would be useful/consistent if count_values used %g formatting, consistent with the le bucket label in histograms. It is useful to be able to assume the two are consistent. This is of practical use when calculating a true apdex as it permits the application to expose a numeric apdex target value, which we can then translate into a bucket name for matching purposes.
The numeric value is useful for establishing which apdex target value to use in the case where a deployment has change the target apdex value.
The technique is documented here:
https://medium.com/@tristan_96324/prometheus-apdex-alerting-d17a065e39d0
but is unreliable, and additional regex stages are likely needed to append a .0 to integer values in the case where %g has been used for the bucket labels, but is not used by count_values.
The text was updated successfully, but these errors were encountered: