Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

promtool: add a "tsdb dump-openmetrics" to dump in OpemMetrics format. #13194

Merged
merged 1 commit into from Feb 28, 2024

Conversation

machine424
Copy link
Collaborator

@machine424 machine424 commented Nov 27, 2023

This closes the loop, as the output can be fed into "tsdb create-blocks-from openmetrics"

Native histograms are not supported.

Fixes #12022

@machine424 machine424 requested a review from dgl as a code owner November 27, 2023 12:14
cmd/promtool/tsdb_test.go Outdated Show resolved Hide resolved
b.WriteByte('{')
i := 0
ls.Range(func(l Label) {
if i > 0 {
b.WriteByte(',')
b.WriteByte(' ')
if addSpace {
b.WriteByte(' ')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this space only breaks two "formatting" tests. Maybe we could get rid on it in 3.0
Keeping it as it "breaks" promtool tsdb dump

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest moving this into promtool where you need it.
The interface of Labels is too big already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate? Do you mean redefining the same String on promtool side without the b.WriteByte(' ') line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CondensedString, yes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bboreham Could you take a look now? thanks.

@machine424
Copy link
Collaborator Author

cc @bboreham (I feel there are at least some naming improvements to make)
(@dgl already agreed on this here https://cloud-native.slack.com/archives/C01AUBA4PFE/p1700053651502859)

for ss.Next() {
series := ss.At()
lbs := series.Labels()
labelsBuilder := labels.NewBuilder(lbs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe reuse the same builder?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently added labels.DropMetricName() which does this more efficiently.

However it's probably even more efficient to write out what you need, i.e. re-code CondensedString to pull MetricName out first and add the non-name labels after.
It could even print to stdout to save allocating a string, but that is perhaps overkill.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for DropMetricName, I pushed a version using it.

I think it's better to pull out MetricName once per metric than for every time series, even though the util is cheap.

I tested with delegating the whole formatting (even the printing) to CondensedString and on my machine and for my testdata (prometheus' metrics), that didn't yield any improvement: more B/op as we add the metric name to every buffer.

With:

 
 // CondensedString is labels.Labels.String() without spaces after the commas.
-func CondensedString(ls labels.Labels) string {
+func CondensedString(metricName string, ls labels.Labels, val float64, ts int64) string {
        var b bytes.Buffer
 
+       b.WriteString(metricName)
        b.WriteByte('{')
        i := 0
        ls.Range(func(l labels.Label) {
@@ -794,6 +795,7 @@ func CondensedString(ls labels.Labels) string {
                i++
        })
        b.WriteByte('}')
+       b.WriteString(fmt.Sprintf(" %g %.3f\n", val, float64(ts)/1000))
        return b.String()
 }
 
@@ -801,13 +803,13 @@ func formatSeriesSetOpenMetrics(ss storage.SeriesSet) error {
        for ss.Next() {
                series := ss.At()
                lbs := series.Labels()
-               labelsBuilder := labels.NewBuilder(lbs)
                metricName := lbs.Get(labels.MetricName)
-               lbs = labelsBuilder.Del(labels.MetricName).Labels()
+               lbs = lbs.DropMetricName()
                it := series.Iterator(nil)
                for it.Next() == chunkenc.ValFloat {
                        ts, val := it.At()
-                       fmt.Printf("%s%s %g %.3f\n", metricName, CondensedString(lbs), val, float64(ts)/1000)
+                       fmt.Print(CondensedString(metricName, lbs, val, ts))
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/cmd/promtool
       │     pr      │ pr_delegate                      │
       │   sec/op    │     sec/op      vs base          │
XXX-10   551.8m ± 3%      556.8m ± 4%  ~ (p=0.247 n=10)

       │      pr      │   pr_delegate                        │
       │     B/op     │     B/op      vs base                │
XXX-10   110.7Mi ± 0%   188.4Mi ± 0%  +70.21% (p=0.000 n=10)

       │     pr      │  pr_delegate                       │
       │  allocs/op  │  allocs/op   vs base               │
XXX-10   3.473M ± 0%   3.807M ± 0%  +9.61% (p=0.000 n=10)

and after delegating the printing as well:

 // CondensedString is labels.Labels.String() without spaces after the commas.
-func CondensedString(ls labels.Labels) string {
+func CondensedString(metricName string, ls labels.Labels, val float64, ts int64) {
        var b bytes.Buffer
 
+       b.WriteString(metricName)
        b.WriteByte('{')
        i := 0
        ls.Range(func(l labels.Label) {
@@ -794,20 +795,21 @@ func CondensedString(ls labels.Labels) string {
                i++
        })
        b.WriteByte('}')
-       return b.String()
+       b.WriteString(fmt.Sprintf(" %g %.3f\n", val, float64(ts)/1000))
+       fmt.Print(b.String())
 }
 
 func formatSeriesSetOpenMetrics(ss storage.SeriesSet) error {
        for ss.Next() {
                series := ss.At()
                lbs := series.Labels()
-               labelsBuilder := labels.NewBuilder(lbs)
                metricName := lbs.Get(labels.MetricName)
-               lbs = labelsBuilder.Del(labels.MetricName).Labels()
+               lbs = lbs.DropMetricName()
                it := series.Iterator(nil)
                for it.Next() == chunkenc.ValFloat {
                        ts, val := it.At()
-                       fmt.Printf("%s%s %g %.3f\n", metricName, CondensedString(lbs), val, float64(ts)/1000)
+                       CondensedString(metricName, lbs, val, ts)
                }

got

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/cmd/promtool
       │     pr      │ pr_deleg_print                             │
       │   sec/op    │          sec/op           vs base          │
XXX-10   551.8m ± 3%                557.6m ± 4%  ~ (p=0.353 n=10)

       │      pr      │ pr_deleg_print                             │
       │     B/op     │        B/op         vs base                │
XXX-10   110.7Mi ± 0%         188.4Mi ± 0%  +70.24% (p=0.000 n=10)

       │     pr      │ pr_deleg_print                             │
       │  allocs/op  │      allocs/op       vs base               │
XXX-10   3.473M ± 0%           3.807M ± 0%  +9.61% (p=0.000 n=10)

( I tested with strings.Builder, a slightly better sec and B, but 20% more allocs.)

Maybe There is a optimization we can take advantage of, but I didn't spent time on that, especially that the this OM dump looks way efficient than the existing one that no one looks to be complaining about:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/cmd/promtool
       │    dump     │          dump_openmetrics           │
       │   sec/op    │   sec/op     vs base                │
XXX-10   679.3m ± 4%   551.8m ± 3%  -18.77% (p=0.000 n=10)

       │     dump     │           dump_openmetrics           │
       │     B/op     │     B/op      vs base                │
XXX-10   215.9Mi ± 0%   110.7Mi ± 0%  -48.74% (p=0.000 n=10)

       │    dump     │          dump_openmetrics           │
       │  allocs/op  │  allocs/op   vs base                │
XXX-10   4.083M ± 0%   3.473M ± 0%  -14.92% (p=0.000 n=10)

again, the prom dump puts more stuff into buffers, and has only to pass metric names into strconv.Quote.

I could add some benchmarks for the dump, if you want, but it'll be in another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get my point across. Once you know the metric name you don't need to drop it from the Labels, you can just skip it in the loop. But it's not important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this benchmark code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, I can open a PR if you're interested in adding it.
It's just a:

func BenchmarkXXX(b *testing.B) {

	b.ReportAllocs()
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
	
		err := dumpSamples(
			context.Background(),
			"/XXX/github/prometheus/data",
			math.MinInt64,
			math.MaxInt64,
			[]string{"{__name__=~'(?s:.*)'}"},
			formatSeriesSetOpenMetrics,
		)

		if err != nil {
			panic(err)
		}
	
	}
}

I kept it send everything to stdout and grepped benchmarks results.

/XXX/github/prometheus/data containing prometheus metrics that I made it scrape itself every 1s for some minutes. (of course I stopped it before running the benchmarks)

Maybe I shared the first benchstat twice, but I'm sure both the changes yielded inferior perfs than the code that was merged. (you'll be able to validate that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I profiled that code, using TSDB data from TestTSDBDumpOpenMetrics, and it spends all its time in loadDataAsQuerier.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand on that? Are you talking about ‘ formatSeriesSetOpenMetrics’ vs ‘ formatSeriesSet’?

@machine424
Copy link
Collaborator Author

machine424 commented Feb 14, 2024

When you have time to take a look at my PRs @bboreham it'd be great if you start with this one :)
Thanks

@beorn7
Copy link
Member

beorn7 commented Feb 27, 2024

@bboreham @dgl will you be able to finish the review on this one?

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this; seems pretty close but I think it could be a little neater if the output function did what was needed directly.

for ss.Next() {
series := ss.At()
lbs := series.Labels()
labelsBuilder := labels.NewBuilder(lbs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently added labels.DropMetricName() which does this more efficiently.

However it's probably even more efficient to write out what you need, i.e. re-code CondensedString to pull MetricName out first and add the non-name labels after.
It could even print to stdout to save allocating a string, but that is perhaps overkill.

This closes the loop, as the output can be fed into "tsdb create-blocks-from openmetrics"

Native histograms are not supported.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
lbs := series.Labels()
metricName := lbs.Get(labels.MetricName)
lbs = lbs.DropMetricName()
it := series.Iterator(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw the reason for passing a pointer to Iterator is to re-use the memory of the old iterator.

for ss.Next() {
series := ss.At()
lbs := series.Labels()
labelsBuilder := labels.NewBuilder(lbs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get my point across. Once you know the metric name you don't need to drop it from the Labels, you can just skip it in the loop. But it's not important.

Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bboreham bboreham merged commit e79b9ed into prometheus:main Feb 28, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[promtool] Export metrics in openmetrics format
3 participants