Benchmarks #3

Closed · wants to merge 8 commits into from

Conversation

@codebien (Collaborator)

goos: linux
goarch: amd64
pkg: github.com/grafana/xk6-output-influxdb/pkg/influxdb
cpu: AMD Ryzen 7 4800H with Radeon Graphics
BenchmarkWritePoints-16    	     192	   6670961 ns/op
PASS
ok  	github.com/grafana/xk6-output-influxdb/pkg/influxdb	1.930s

With v1 I got ~220 iterations.

@codebien self-assigned this Oct 28, 2021
@codebien mentioned this pull request Oct 28, 2021

@yorugac left a comment

LGTM but I think I'm not fully clear on the goal of the PR: if it's to compare v2 API to v1 API, shouldn't the benchmark for v1 be present as well?

@codebien (Collaborator, Author)

LGTM but I think I'm not fully clear on the goal of the PR: if it's to compare v2 API to v1 API, shouldn't the benchmark for v1 be present as well?

@yorugac thanks for your review. The goal is to understand what we can expect in terms of performance from the integration between this extension and a real InfluxDB server. The v1 comparison is something additional, to get a better overview and to know if we are hitting any important performance differences.

We can evaluate whether it makes sense to push (and maintain) the equivalent version of the code for v1, but first I would like to have a stable and reliable benchmark here.

About the reliability, I still have some doubts:

  • Should we use a different number of samples? If yes, how many? Currently it's pushing a static set of 10 samples, which probably isn't representative of an average k6 iteration.
  • Same story for tags: just a small, static set.

Maybe we should think of a common set of use cases for benchmarking the k6 outputs; that way we could normalize the benchmarks and also compare the different outputs with each other.
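
For illustration, a minimal sketch of a more varied sample generator, assuming the go.k6.io/k6/stats API the existing benchmark already uses; generateSamples, the metric name, and the tag values are made up for the example:

import (
	"fmt"
	"time"

	"go.k6.io/k6/stats"
)

// generateSamples builds n samples spread across a few URL tag values, to be
// a bit closer to what a real k6 run emits than a static set of 10 identical samples.
func generateSamples(n int) stats.Samples {
	metric := stats.New("http_req_duration", stats.Trend, stats.Time)
	samples := make(stats.Samples, n)
	for i := 0; i < n; i++ {
		samples[i] = stats.Sample{
			Metric: metric,
			Time:   time.Now(),
			Tags: stats.NewSampleTags(map[string]string{
				"url":    fmt.Sprintf("https://example.com/%d", i%10),
				"status": "200",
			}),
			Value: float64(i % 400),
		}
	}
	return samples
}

A benchmark could then be parameterized over n to see how the batch size and tag variety affect the numbers.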

@yorugac commented Oct 29, 2021

Got it, thanks. Yes, checking a larger set of samples might be beneficial too. There is one more thing I noticed: this benchmark basically covers only the Influx API, while batchFromSamples remains un-benchmarked. For example, in the PRW output extension the main performance problem is in processing metrics pre-request, and that is also the part we can directly improve. I.e. we cannot do anything about the limitations imposed by the API of Influx or the RW of Prometheus, etc.

So perhaps something like this: benchmark our side of the processing, and have some evaluation of what can reasonably be expected with the real server in question?
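
Something along these lines could cover the extension's side of the processing; a sketch only, assuming a newTestOutput helper equivalent to the setup in the existing benchmark (none of these names are the actual code in this PR):

func BenchmarkBatchFromSamples(b *testing.B) {
	o := newTestOutput(b) // hypothetical: same setup as the existing benchmark
	metric := stats.New("http_reqs", stats.Counter)
	samples := make(stats.Samples, 1000)
	for i := range samples {
		samples[i] = stats.Sample{
			Metric: metric,
			Time:   time.Now(),
			Tags:   stats.NewSampleTags(map[string]string{"status": "200"}),
			Value:  1,
		}
	}
	containers := []stats.SampleContainer{samples}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Only the sample-to-point conversion is measured; no network I/O.
		_ = o.batchFromSamples(containers)
	}
}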

@mstoykov (Contributor) left a comment

I am really surprised that InfluxDB v2 is worse than v1 :( I don't know what the marketed benefits of v2 were supposed to be, but not making ingestion faster seems like ... not a good thing.

I would argue you are currently benchmarking the v2 InfluxDB API, not this extension; for the latter I would be calling the extension's AddMetricSamples. That, though, also needs to take into account the fact that the samples need to be flushed. Which is one of the many reasons I think these things should be tested by just running the whole of k6 with a bunch of custom metrics added each iteration and waiting for k6 to finish, while measuring how many iterations (metric emissions) it was actually possible to send with the given output.

That, though, is kind of ... bad for a Go benchmark IME, so maybe this is fine as a benchmark 🤷. It just isn't really showing us how the InfluxDB v2 output compares to v1 or, let's say, statsd with tags 🤷, which arguably is both the thing users will want to know and the thing we can compare in order to make statements such as "the InfluxDB v2 output is faster/slower than the InfluxDB v1 output for this script".

Again, my biggest reason for wanting a direct comparison of script runs is that that is the thing we actually care about, and currently the InfluxDB output in k6 doesn't ... do well :(. So if this new one does ... worse when it's writing synchronously, maybe it will do better if it writes asynchronously ... or maybe not, but maybe if we change something else ... and so on. But for this we need to see how it goes from one end to the other.

I would argue also that the major problem with outputs currently is that k6 doesn't aggregate anything and most of them don't either, so they end up writing all 20000 updates to a counter every second as 20000 different samples, while arguably no user will care about a resolution under 1s, and if they did we could make it configurable. Once this is fixed I would expect most outputs to be a lot more ... performant ;)
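
Just to make the aggregation idea concrete (nothing in this PR or in k6 does this today), counter samples could be collapsed into 1-second buckets before being turned into points; a rough sketch that ignores tags and non-counter metric types:

// aggregateCounters sums all samples of the same metric that fall into the
// same 1s bucket into a single sample. Real code would also have to key on
// the tag set and treat gauges/trends/rates differently.
func aggregateCounters(samples stats.Samples) stats.Samples {
	type key struct {
		metric *stats.Metric
		bucket int64
	}
	sums := make(map[key]float64)
	for _, s := range samples {
		sums[key{metric: s.Metric, bucket: s.Time.Unix()}] += s.Value
	}
	out := make(stats.Samples, 0, len(sums))
	for k, v := range sums {
		out = append(out, stats.Sample{
			Metric: k.metric,
			Time:   time.Unix(k.bucket, 0),
			Value:  v,
		})
	}
	return out
}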

})
require.NoError(b, err)

samples := make(stats.Samples, 10)
Contributor

10 seems like a really small number. I would probably have done 1000 or even 10000, maybe a table-driven benchmark 🤷.
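
A hedged sketch of the table-driven version being suggested, reusing the hypothetical newTestOutput and generateSamples helpers from the earlier sketches:

func BenchmarkWritePoints(b *testing.B) {
	ctx := context.Background()
	for _, n := range []int{10, 1000, 10000} {
		b.Run(fmt.Sprintf("%d-samples", n), func(b *testing.B) {
			o := newTestOutput(b)         // hypothetical setup helper
			samples := generateSamples(n) // hypothetical sample builder
			batch := o.batchFromSamples([]stats.SampleContainer{samples})

			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				if err := o.pointWriter.WritePoint(ctx, batch...); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}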

Comment on lines +151 to +158
batch := o.batchFromSamples([]stats.SampleContainer{samples})

b.ResetTimer()
for i := 0; i < b.N; i++ {
err := o.pointWriter.WritePoint(ctx, batch...)
if err != nil {
b.Fatal(err)
}
Contributor

I would argue you should be adding metrics the same way the engine would - through AddMetricSamples. This will also test all the other parts of writing metrics (like batchFromSamples) and arguably can help make something like grafana/k6#2208 more doable.
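
For reference, a sketch of what that could look like; the flushMetrics call is a guess at the output's internal flush method and may not match the actual code:

func BenchmarkAddMetricSamples(b *testing.B) {
	o := newTestOutput(b)            // hypothetical setup helper
	samples := generateSamples(1000) // hypothetical sample builder

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Feed samples the same way the engine does, through the Output interface...
		o.AddMetricSamples([]stats.SampleContainer{samples})
		// ...and force a flush so that batchFromSamples and the actual write
		// are both included in the measurement (method name assumed, see above).
		o.flushMetrics()
	}
}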

@codebien (Collaborator, Author) commented Nov 1, 2021

Which is one of the many reasons I think these things should be tested by just running the whole of k6 with a bunch of custom metrics added each iteration and waiting for k6 to finish, while measuring how many iterations (metric emissions) it was actually possible to send with the given output.

@mstoykov I agree, and that is exactly the attempt with TestOutputThroughput. I did it in Go just for two-three reasons:

  • it's reproducible
  • we can see the achieved performance just by looking at the asserted value
  • it's easier to define a metric and track it

Can we achieve the same using the k6 run --out=... command? I guess we could have a bench.js file in the repo and then report, in a section of the README, the latest value seen when running the master branch. WDYT?

@mstoykov (Contributor) commented Nov 3, 2021

Yeah, the idea is that we will likely make a script:

  • constant arrival rate, for consistency and the ability to see if we drop iterations
  • maybe constant VUs as well to put pressure on it, but in a different script/run 🤷
  • I would probably just go with custom metrics of each type and add samples to each of them on each iteration
  • then we look at how long it took to finish and how many iterations actually finished, maybe memory and CPU usage for the constant arrival rate, and whatever else you think of.

p.s. those were three reasons ;)

@yorugac commented Nov 3, 2021

I would argue also that the major problem with outputs currently is that k6 doesn't aggregate anything

Totally agree with this, from my observations of the PRW output behavior.

A suggestion: it might also be possible to estimate a "metrics rate" as the number of metric samples processed by the Output in each flush period. That would allow re-formulating the problem of an output's performance in terms of the metrics rate a given Output can process without errors / dropping samples / etc. For example, the end result would be something like "Output A can handle X samples per second with a standard setup™; the addition of 1 custom metric raises the sample rate by Y".
(The last part about custom metrics is likely independent of the Output in question.)
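
A back-of-the-envelope helper for that kind of estimate could be as simple as the following (illustrative only, names made up):

// samplesPerSecond estimates the "metrics rate" an output sustained, given how
// many samples it processed over how many flush periods of a fixed length.
func samplesPerSecond(totalSamples, flushes int, flushPeriod time.Duration) float64 {
	return float64(totalSamples) / (float64(flushes) * flushPeriod.Seconds())
}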

@codebien (Collaborator, Author) commented Mar 10, 2022

At the moment we don't have the time to focus on this in the short term, so I'm closing this; we may reopen a new, better-defined one in the future. Feel free to reopen or create an issue if you have different opinions and/or ideas.

@codebien codebien closed this Mar 10, 2022