Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse created timestamps from OpenMetrics-Text format #13506

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ArthurSens
Copy link
Member

@ArthurSens ArthurSens commented Jan 31, 2024

Fix #12980

The already existing feature-flag 'created-timestamp-zero-ingestion' is extended to also cover OpenMetrics-Text format.
Once enabled, _created lines, that before were treated as a whole new timeseries, will be appended as a synthetic zero to the timeseries they are related to.

@ArthurSens ArthurSens marked this pull request as draft February 2, 2024 22:11
@ArthurSens ArthurSens changed the title [WIP] Implement Created Timestamp parsing for OM text parser Parse created timestamps from OpenMetrics-Text format Feb 4, 2024
@ArthurSens ArthurSens marked this pull request as ready for review February 4, 2024 15:19
// The difference between an usual metric and the created timestamp is that for
// the created timestamp, the metric name ends with "_created".
//
// Created timestamps are always exposed right after the metric they are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right after

Is that part of the OpenMetrics spec? I think in practice I have only seen _created as the last metric in a MetricFamily, but I haven't read anything in the spec that requires that. It does have to be with the other samples, but e.g. _created before _total might be possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the example in their specification, I see _created lines for each metric point and not MetricFamily.

In the section Overall Structure, we have one created line for acme_http_router_request_seconds_created{path="/api/v1",method="GET"} and another for acme_http_router_request_seconds_created{path="/api/v2",method="POST"} (notice the label difference while using the same name)

but e.g. _created before _total might be possible

Do you mean for the same MetricPoint? All the examples I've seen have the _created line as the last line of a MetricPoint, but I couldn't find anything in the spec saying that this order is a MUST. I'm not sure if we should expect something different 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, I meant MetricPoint, sorry!

I agree that all of the samples show _created as the last line, but that doesn't seem enforced so I think it is possible and valid OpenMetrics for _created to come first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😭😭😭😭

That makes things so much more complicated 🥲

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be worth getting an opinion from someone more involved in OpenMetrics than myself as I agree, it would be nice not to have to worry about the order!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not more involved in OpenMetrics, but I think it's totally fine that an experimental feature relies on this accidental property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, the order of _created is not guaranteed to be the last. But also I don't understand you worries @ArthurSens -- if it's last we already have the problem of having to "buffer" previous samples related to the _created metric which is potentially at the end. So it does not matter if it's end, middle or first. We still have to buffer samples (and corresponding series, which should be the same) no matter what, so we can add potential 0 sample OR in future add it to metadata.

Right now CreatedTimestamp() returns number ONLY when parser hits _created metric, which is clearly not enough for OM as we already written sample for related series no? We have to change bit this logic to e.g. buffer series in (OpenMetrics parsing only).

To me that means e.g. having different append function all together for OM, and that's fine. It should motivate, heavy changes to newer OM protocol to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure yet(doing research at the moment), but I believe all official client libraries that offer OpenMetrics support today expose _created lines after the related metric. If I can confirm this, and seeing that this feature is experimental, could we rely on this order for this PR?

Expecting different order for the _created lines will make the code a lot more complex, we could make reviews easier if we split this up in different work streams :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but my point stays Arthur, how it makes the code more complex if it's before , in the middle or after the line? The main challenge is that you have to keep buffering samples somewhere BEFORE appending until you know that _created line. This is because we try to be lean and stream parsing line by line.

This is the main complexity to the code, not searching where that line is or something 🤔 I also don't see that buffering in this PR, can you elaborate a planned algorithm for parsing with OM text then you envision (with and without assumption of _created line being last)? Maybe in pseudocode here?

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few stylistic thoughts.
Note there are some merge clashes , also a CI error.

model/textparse/interface.go Outdated Show resolved Hide resolved
scrape/scrape.go Outdated Show resolved Hide resolved
scrape/scrape.go Outdated Show resolved Hide resolved
if et, err = p.Next(); err != nil {
if errors.Is(err, io.EOF) {
err = nil
if !skipCallNextEntry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it would be simpler to re-arrange so Next has always been called at the start of the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see how that could be done... (or maybe I couldn't understand what you mean)

For OM-text, we need to parse 2 lines to identify created timestamps and the parser can't go back once a line is parsed, therefore we need to skip parsing yet another line if we haven't found a _created line in the last call 🤔

@ArthurSens
Copy link
Member Author

whooops, a bad push accidentally closed the PR 😅

@ArthurSens
Copy link
Member Author

ArthurSens commented Mar 2, 2024

I can see that the changes I've made have broken the fuzz testing, but I do not understand how I'm supposed to fix it 🤔, could someone point me in the right direction?

And the go tests are passing locally... not sure if this one is flaky or not 😬

@SuperQ
Copy link
Member

SuperQ commented Mar 4, 2024

I think the Go test is currently flakey, I'm seeing various build failures on other PRs.

@SuperQ
Copy link
Member

SuperQ commented Mar 7, 2024

@ArthurSens

# github.com/prometheus/prometheus/promql
./fuzz.go:64:16: assignment mismatch: 2 variables but textparse.New returns 3 values

Looks like you just need to update /promql/fuzz.go.

func fuzzParseMetricWithContentType(in []byte, contentType string) int {
  p, _, warning := textparse.New(in, contentType, false, symbolTable)

@ArthurSens
Copy link
Member Author

@ArthurSens

# github.com/prometheus/prometheus/promql
./fuzz.go:64:16: assignment mismatch: 2 variables but textparse.New returns 3 values

Looks like you just need to update /promql/fuzz.go.

func fuzzParseMetricWithContentType(in []byte, contentType string) int {
  p, _, warning := textparse.New(in, contentType, false, symbolTable)

Oh wow, not sure how my IDE didn't catch that 🤔
Thanks for that!

@ArthurSens
Copy link
Member Author

Conflicts resolved again, reviews are appreciated 😅

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! Thanks! Sorry for delays in reviews.

However, something is off with the current flow for OM text, shouldn't we buffer appends or potential series related to _created metric?

Let's figure this out https://github.com/prometheus/prometheus/pull/13506/files#r1506306215

cmd/prometheus/main.go Show resolved Hide resolved
model/textparse/interface.go Outdated Show resolved Hide resolved
model/textparse/openmetricsparse.go Outdated Show resolved Hide resolved
@@ -64,7 +64,10 @@ _metric_starting_with_underscore 1
testmetric{_label_starting_with_underscore="foo"} 1
testmetric{label="\"bar\""} 1
# TYPE foo counter
foo_total 17.0 1520879607.789 # {id="counter-test"} 5`
foo_total 17.0 1520879607.789 # {id="counter-test"} 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change order of _created to explictly show (and test) how it will be handled.

// CT is an experimental feature. For now, we don't need to fail the
// scrape on errors updating the created timestamp, log debug.
level.Debug(sl.l).Log("msg", "Error when appending CT in scrape loop", "series", string(met), "ct", *ctMs, "t", t, "err", err)
if sl.enableCTZeroIngestion {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest burning that _created metric in fire in this mode (not ingest it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't get it 😬 what do you mean with "in this mode"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you are (trying) to do this already with that skip line? (in this mode == when CT zero ingestion, so CT handling is enabled. if enabled we should probably kill _created metrics too once used).

scrape/scrape.go Outdated Show resolved Hide resolved
// The difference between an usual metric and the created timestamp is that for
// the created timestamp, the metric name ends with "_created".
//
// Created timestamps are always exposed right after the metric they are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, the order of _created is not guaranteed to be the last. But also I don't understand you worries @ArthurSens -- if it's last we already have the problem of having to "buffer" previous samples related to the _created metric which is potentially at the end. So it does not matter if it's end, middle or first. We still have to buffer samples (and corresponding series, which should be the same) no matter what, so we can add potential 0 sample OR in future add it to metadata.

Right now CreatedTimestamp() returns number ONLY when parser hits _created metric, which is clearly not enough for OM as we already written sample for related series no? We have to change bit this logic to e.g. buffer series in (OpenMetrics parsing only).

To me that means e.g. having different append function all together for OM, and that's fine. It should motivate, heavy changes to newer OM protocol to fix this.

@ArthurSens ArthurSens force-pushed the om-text-ctparser branch 2 times, most recently from 56772d3 to f189fd0 Compare April 5, 2024 15:02
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse Created timestamps from OpenMetrics text format
5 participants