Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus: Implement Streaming JSON Parser #48477

Merged
merged 34 commits into from
May 13, 2022

Conversation

toddtreece
Copy link
Member

@toddtreece toddtreece commented Apr 29, 2022

What this PR does / why we need it:
This is the second step in a series of PRs that will use the prometheus JSON streaming parser added by #44520.

The current plan is:

  1. Move existing logic to buffered package (Prometheus: Move existing query logic to new buffered package #48668)
  2. Add new feature flag and querydata package that uses a standard HTTP client instead of the prometheus go client and the streaming response parser (this PR)
  3. Remove buffered package, remove feature flag, and rename streaming package (future PR once things have stabilized)
name     old time/op    new time/op    delta
Json-32    79.1ms ±13%    36.8ms ± 5%  -53.51%  (p=0.000 n=10+9)

name     old alloc/op   new alloc/op   delta
Json-32    41.3MB ± 0%    19.7MB ± 0%  -52.29%  (p=0.000 n=10+10)

name     old allocs/op  new allocs/op  delta
Json-32      754k ± 0%      389k ± 0%  -48.40%  (p=0.000 n=8+8)

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

@ryantxu
Copy link
Member

ryantxu commented Apr 29, 2022

Nice -- this will be required for the new sparse heatmap coming down the pike.

Can you implement this with a feature flag, and keep both approaches possible while we evaluate and gain confidence? I am sure we will learn something when deploying to ops :)

@toddtreece toddtreece force-pushed the toddtreece/prom-streaming-matrix-vector branch from d71de33 to 495e414 Compare April 29, 2022 16:18
@toddtreece toddtreece force-pushed the toddtreece/prom-streaming-matrix-vector branch from 224eb09 to 126fc2a Compare May 2, 2022 15:43
return nil, err
}

u.Path = path.Join(u.Path, "api/v1/query")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed, or should we overwrite the path from the user supplied URL string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough to say 🤷‍♂️

@ryantxu
Copy link
Member

ryantxu commented May 3, 2022

Quick scan/review -- are the changes to "buffered" and "query" folders, really just renaming the folder, and no meaningful changes?

Is this really just adding "streaming" as an alternative client?

I wonder if there is a structure we could use that makes the no-feature-flagged path feel less like a big change 🤔. The folder names "buffered" and "streaming" only make sense in contrast to one another -- eventually we do not want the name anywhere (i think)

Maybe we should do an independent PR that makes whatever changes required to the "buffered" client so that swapping the implemenation is easy. In this version, it is hard to tell that the feature flag only affects the "streaming" branch :)

@toddtreece
Copy link
Member Author

toddtreece commented May 3, 2022

are the changes to "buffered" and "query" folders, really just renaming the folder, and no meaningful changes?

@ryantxu buffered is a copy of the original code for building the request, executing the query, and parsing the response. i copied it into a separate package so that there was less chance of introducing a regression. my thinking was that it should allow us to delete that folder and rename streaming once we are ready to remove the feature flag.

query is part of the refactor for streaming. although streaming and query contain some of the same chunks of code as buffered, the structure has changed quite a bit due to the different pre and post processing requirements.

"buffered" and "streaming" only make sense in contrast to one another -- eventually we do not want the name anywhere (i think)

agreed, my thought was that they would be temporary until the feature flag is removed

should do an independent PR that makes whatever changes required to the "buffered" client so that swapping the implemenation is easy.

that's a good idea to make it easier to review. i'll hold off on that until i get your feedback on my responses above

@toddtreece toddtreece force-pushed the toddtreece/prom-streaming-matrix-vector branch from 42810bf to e00bbb7 Compare May 4, 2022 02:46
@toddtreece toddtreece changed the base branch from main to toddtreece/prom-buffered May 4, 2022 02:47
@toddtreece toddtreece added this to the 9.0.0 milestone May 4, 2022
@toddtreece toddtreece added the no-backport Skip backport of PR label May 4, 2022
@toddtreece toddtreece marked this pull request as ready for review May 4, 2022 11:43
@toddtreece toddtreece requested a review from gabor May 4, 2022 11:44
@toddtreece
Copy link
Member Author

@gabor could you take a look at this approach since you have worked with the streaming parser as well?

@toddtreece toddtreece force-pushed the toddtreece/prom-streaming-matrix-vector branch from b5733bc to 115b6f1 Compare May 12, 2022 16:53
@toddtreece toddtreece force-pushed the toddtreece/prom-streaming-matrix-vector branch from 115b6f1 to b5cc5c2 Compare May 12, 2022 17:39
@toddtreece
Copy link
Member Author

@gabor the changes from #48698 were removed on accident during a rebase. the feature flag should be back now.

i think the only thing remaining is to figure out what to do about []float64 vs []*float64 (and NaN vs nil)

@gabor
Copy link
Contributor

gabor commented May 13, 2022

@toddtreece

i think the only thing remaining is to figure out what to do about []float64 vs []*float64 (and NaN vs nil)

i see 3 possible approaches:

  1. we do the NaN->null in javascript.
    • this means we can keep the go-code simple. the disadvantages:
      • in javascript we do extra work. but maybe we can make that one very fast (for cases when no NaN ?)
      • technically we are changing the dataframe-format for alerting-queries. but maybe we can live with that (after all, before grafana8.3.0 the alerting-response did not have NaN->null either)
  2. we do the []float64 -> []*float64 and NaN->nil somewhere around https://github.com/grafana/grafana/blob/toddtreece/prom-streaming-matrix-vector/pkg/tsdb/prometheus/querydata/response.go#L32
    • the advantage is that we can keep the current dataframe-format, and we can keep the converter codebase simple
    • the disadvantage is that we do extra work and also will use the pointer-arrays, which are probably less efficient than value-arrays
    • (maybe we can do something where first we check the values, and if there is no NaN we just keep the data as it is?).. again, this causes a little alerting-mode-backward-incompatibility, and also means technically our dataframe-format changes a little based on content, but maybe that's OK?)
  3. we add a boolean-flag to ReadPrometheusStyleResult (
    func ReadPrometheusStyleResult(iter *jsoniter.Iterator) *backend.DataResponse {
    ) named doWhatGrafanaDoesForPrometheus, and if it's false, it works like now, and if true, it does the whole NaN->null and []float64 -> []*float64 there.
    • the advantage is that we can keep the current dataframe-format, and it will be faster than approach [2]
    • the disadvantage is more complicated code, also, very ugly 😿

i like [1] the most, but it is the most risky (mostly because of the alerting-format change). if [1] is not possible, i vote for [2] .. probably.. not 100% sure.

what do you think?

@toddtreece
Copy link
Member Author

@gabor check out this example on play.grafana.com

image

it seems like NaN behaves the same as null. i'm wondering if we should test this as-is since this behind a feature flag? the change is closer to what prometheus returned, which seems like a good thing to me. thoughts?

@toddtreece
Copy link
Member Author

@gabor it looks like the NaN issue in alerting was fixed in 8.4:

image

@gabor
Copy link
Contributor

gabor commented May 13, 2022

it seems like NaN behaves the same as null. i'm wondering if we should test this as-is since this behind a feature flag? the change is closer to what prometheus returned, which seems like a good thing to me. thoughts?

i like the keep_nan_as_it_is approach 👍 . the problem will be that if we want to keep the behind-feature-flag mode in a way that produces different testdata-snapshots, then we have to update the PR to not run the tests, otherwise we can never merge it 😄

Copy link
Contributor

@gabor gabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend area/frontend datasource/Prometheus no-backport Skip backport of PR no-changelog Skip including change in changelog/release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants