Benchmarks #3

Closed · wants to merge 8 commits into from

Conversation

@codebien (Collaborator)

goos: linux
goarch: amd64
pkg: github.com/grafana/xk6-output-influxdb/pkg/influxdb
cpu: AMD Ryzen 7 4800H with Radeon Graphics
BenchmarkWritePoints-16    	     192	   6670961 ns/op
PASS
ok  	github.com/grafana/xk6-output-influxdb/pkg/influxdb	1.930s

With v1 I got ~220 iterations.

@codebien self-assigned this Oct 28, 2021
@codebien mentioned this pull request Oct 28, 2021

@yorugac left a comment

LGTM but I think I'm not fully clear on the goal of the PR: if it's to compare v2 API to v1 API, shouldn't the benchmark for v1 be present as well?

@codebien (Collaborator, Author)

LGTM but I think I'm not fully clear on the goal of the PR: if it's to compare v2 API to v1 API, shouldn't the benchmark for v1 be present as well?

@yorugac thanks for your review. The goal is to understand what we can expect in terms of performance from the integration between this extension and a real InfluxDB server. The v1 comparison is something additional, to get a better overview and to know if we are hitting any important performance differences.

We can evaluate whether it makes sense to push (and maintain) the equivalent version of the code for v1, but first I would like to have a stable and reliable benchmark here.

About the reliability, I still have some doubts:

  • Should we use a different number of samples? If yes, how many? Currently it's pushing a static set of 10 samples, which probably isn't representative of an average k6 iteration.
  • Same story for tags: just a small, static set.

Maybe we should think of a common set of use cases for benchmarking the k6 outputs; that way we could normalize the benchmarks and also compare the different outputs with each other.
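
For illustration, a minimal sketch of a more varied sample generator, assuming the go.k6.io/k6/stats API the existing benchmark already uses; generateSamples, the metric name, and the tag values are made up for the example:

import (
	"fmt"
	"time"

	"go.k6.io/k6/stats"
)

// generateSamples builds n samples spread across a few URL tag values, to be
// a bit closer to what a real k6 run emits than a static set of 10 identical samples.
func generateSamples(n int) stats.Samples {
	metric := stats.New("http_req_duration", stats.Trend, stats.Time)
	samples := make(stats.Samples, n)
	for i := 0; i < n; i++ {
		samples[i] = stats.Sample{
			Metric: metric,
			Time:   time.Now(),
			Tags: stats.NewSampleTags(map[string]string{
				"url":    fmt.Sprintf("https://example.com/%d", i%10),
				"status": "200",
			}),
			Value: float64(i % 400),
		}
	}
	return samples
}

A benchmark could then be parameterized over n to see how the batch size and tag variety affect the numbers.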

@yorugac commented Oct 29, 2021

Got it, thanks. Yes, checking a larger set of samples might be beneficial too. There is one more thing I noticed: this benchmark basically covers only the Influx API, while batchFromSamples remains un-benchmarked. For example, in the PRW output extension the main performance problem is in processing metrics pre-request, and that is also the part we can directly improve. I.e. we cannot do anything about the limitations imposed by the API of Influx or the RW of Prometheus, etc.

So perhaps something like this: benchmark our side of the processing, and have some evaluation of what can reasonably be expected with the real server in question?
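
Something along these lines could cover the extension's side of the processing; a sketch only, assuming a newTestOutput helper equivalent to the setup in the existing benchmark (none of these names are the actual code in this PR):

func BenchmarkBatchFromSamples(b *testing.B) {
	o := newTestOutput(b) // hypothetical: same setup as the existing benchmark
	metric := stats.New("http_reqs", stats.Counter)
	samples := make(stats.Samples, 1000)
	for i := range samples {
		samples[i] = stats.Sample{
			Metric: metric,
			Time:   time.Now(),
			Tags:   stats.NewSampleTags(map[string]string{"status": "200"}),
			Value:  1,
		}
	}
	containers := []stats.SampleContainer{samples}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Only the sample-to-point conversion is measured; no network I/O.
		_ = o.batchFromSamples(containers)
	}
}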

@mstoykov (Contributor) left a comment

I am really surprised that InfluxDB v2 is worse than v1 :( I don't know what the marketed benefits of v2 were supposed to be, but not making ingestion faster seems like ... not a good thing.

I would argue you are currently benchmarking the v2 InfluxDB API, not this extension; for the latter I would be calling the extension's AddMetricSamples. That, though, also needs to take into account the fact that the samples need to be flushed. Which is one of the many reasons I think these things should be tested by just running the whole of k6 with a bunch of custom metrics added each iteration and waiting for k6 to finish, while measuring how many iterations (metric emissions) it was actually possible to send with the given output.

That, though, is kind of ... bad for a Go benchmark IME, so maybe this is fine as a benchmark 🤷. It just isn't really showing us how the InfluxDB v2 output compares to v1 or, let's say, statsd with tags 🤷, which arguably is both the thing users will want to know and the thing we can compare in order to make statements such as "the InfluxDB v2 output is faster/slower than the InfluxDB v1 output for this script".

Again, my biggest reason for wanting a direct comparison of script runs is that that is the thing we actually care about, and currently the InfluxDB output in k6 doesn't ... do well :(. So if this new one does ... worse when it's writing synchronously, maybe it will do better if it writes asynchronously ... or maybe not, but maybe if we change something else ... and so on. But for this we need to see how it goes from one end to the other.

I would argue also that the major problem with outputs currently is that k6 doesn't aggregate anything and most of them don't either, so they end up writing all 20000 updates to a counter every second as 20000 different samples, while arguably no user will care about a resolution under 1s, and if they did we could make it configurable. Once this is fixed I would expect most outputs to be a lot more ... performant ;)
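
Just to make the aggregation idea concrete (nothing in this PR or in k6 does this today), counter samples could be collapsed into 1-second buckets before being turned into points; a rough sketch that ignores tags and non-counter metric types:

// aggregateCounters sums all samples of the same metric that fall into the
// same 1s bucket into a single sample. Real code would also have to key on
// the tag set and treat gauges/trends/rates differently.
func aggregateCounters(samples stats.Samples) stats.Samples {
	type key struct {
		metric *stats.Metric
		bucket int64
	}
	sums := make(map[key]float64)
	for _, s := range samples {
		sums[key{metric: s.Metric, bucket: s.Time.Unix()}] += s.Value
	}
	out := make(stats.Samples, 0, len(sums))
	for k, v := range sums {
		out = append(out, stats.Sample{
			Metric: k.metric,
			Time:   time.Unix(k.bucket, 0),
			Value:  v,
		})
	}
	return out
}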

})
require.NoError(b, err)

samples := make(stats.Samples, 10)
Contributor

10 seems like a really small number. I would probably have done 1000 or even 10000, maybe a table-driven benchmark 🤷.
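
A hedged sketch of the table-driven version being suggested, reusing the hypothetical newTestOutput and generateSamples helpers from the earlier sketches:

func BenchmarkWritePoints(b *testing.B) {
	ctx := context.Background()
	for _, n := range []int{10, 1000, 10000} {
		b.Run(fmt.Sprintf("%d-samples", n), func(b *testing.B) {
			o := newTestOutput(b)         // hypothetical setup helper
			samples := generateSamples(n) // hypothetical sample builder
			batch := o.batchFromSamples([]stats.SampleContainer{samples})

			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				if err := o.pointWriter.WritePoint(ctx, batch...); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}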

Comment on lines +151 to +158
batch := o.batchFromSamples([]stats.SampleContainer{samples})

b.ResetTimer()
for i := 0; i < b.N; i++ {
err := o.pointWriter.WritePoint(ctx, batch...)
if err != nil {
b.Fatal(err)
}
Contributor

I would argue you should be adding metrics the same way the engine would - through AddMetricSamples. This will also test all the other parts of writing metrics (like batchFromSamples) and arguably can help make something like grafana/k6#2208 more doable.
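
For reference, a sketch of what that could look like; the flushMetrics call is a guess at the output's internal flush method and may not match the actual code:

func BenchmarkAddMetricSamples(b *testing.B) {
	o := newTestOutput(b)            // hypothetical setup helper
	samples := generateSamples(1000) // hypothetical sample builder

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Feed samples the same way the engine does, through the Output interface...
		o.AddMetricSamples([]stats.SampleContainer{samples})
		// ...and force a flush so that batchFromSamples and the actual write
		// are both included in the measurement (method name assumed, see above).
		o.flushMetrics()
	}
}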

@codebien (Collaborator, Author) commented Nov 1, 2021

Which is one of the many reasons I think these things should be tested by just running the whole of k6 with a bunch of custom metrics added each iteration and waiting for k6 to finish, while measuring how many iterations (metric emissions) it was actually possible to send with the given output.

@mstoykov I agree, and that is exactly the attempt with TestOutputThroughput. I did it in Go just for two-three reasons:

  • it's reproducible
  • we can see the achieved performance just by looking at the asserted value
  • it's easier to define a metric and track it

Can we achieve the same using the k6 run --out=... command? I guess we could have a bench.js file in the repo and then report, in a section of the README, the latest value seen when running the master branch. WDYT?

@mstoykov (Contributor) commented Nov 3, 2021

Yeah, the idea is that we will likely make a script:

  • constant arrival rate, for consistency and the ability to see if we drop iterations
  • maybe constant VUs as well to put pressure on it, but in a different script/run 🤷
  • I would probably just go with custom metrics of each type and add samples to each of them on each iteration
  • then we look at how long it took to finish and how many iterations actually finished, maybe memory and CPU usage for the constant arrival rate, and whatever else you think of.

p.s. those were three reasons ;)

@yorugac commented Nov 3, 2021

I would argue also that the major problem with outputs currently is that k6 doesn't aggregate anything

Totally agree with this, from my observations of the PRW output behavior.

A suggestion: it might also be possible to estimate a "metrics rate" as the number of metric samples processed by the Output in each flush period. That would allow re-formulating the problem of an output's performance in terms of the metrics rate a given Output can process without errors / dropping samples / etc. For example, the end result would be something like "Output A can handle X samples per second with a standard setup™; the addition of 1 custom metric raises the sample rate by Y".
(The last part about custom metrics is likely independent of the Output in question.)
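
A back-of-the-envelope helper for that kind of estimate could be as simple as the following (illustrative only, names made up):

// samplesPerSecond estimates the "metrics rate" an output sustained, given how
// many samples it processed over how many flush periods of a fixed length.
func samplesPerSecond(totalSamples, flushes int, flushPeriod time.Duration) float64 {
	return float64(totalSamples) / (float64(flushes) * flushPeriod.Seconds())
}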

@codebien (Collaborator, Author) commented Mar 10, 2022

At the moment we don't have the time to focus on this in the short term, so I'm closing this; we may reopen a new, better-defined one in the future. Feel free to reopen or create an issue if you have different opinions and/or ideas.

@codebien codebien closed this Mar 10, 2022