Add output option for csv format #1067

Sirozha1337 · 2019-06-29T09:40:18Z

Hi, I've noticed that JSON output takes up a lot of disk space for long-running tests. I found issue #321, where a CSV output option was discussed, and decided to implement it. It works similarly to the JSON output, writing one sample per row. It uses about half as much space as JSON, so I've decided to open a pull request for it.

Maybe someone can suggest a way to transform the samples so that there is one line per HTTP request. Then I will make the changes and reopen the pull request.

I'm new to the Go language, so I'm open to criticism.

stats/csv/collector.go

CLAassistant · 2019-06-29T09:40:49Z

All committers have signed the CLA.

codecov · 2019-06-29T09:51:51Z

Codecov Report

Merging #1067 into master will decrease coverage by 0.44%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   72.79%   72.34%   -0.45%     
==========================================
  Files         133      134       +1     
  Lines        9905     9966      +61     
==========================================
  Hits         7210     7210              
- Misses       2278     2339      +61     
  Partials      417      417

Impacted Files	Coverage Δ
cmd/collectors.go	`0% <0%> (ø)`	⬆️
stats/csv/collector.go	`0% <0%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...e9cc0c0. Read the comment docs.

codecov · 2019-06-29T09:51:52Z

Codecov Report

Merging #1067 into master will decrease coverage by 0.05%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   72.79%   72.73%   -0.06%     
==========================================
  Files         133      134       +1     
  Lines        9905     9995      +90     
==========================================
+ Hits         7210     7270      +60     
- Misses       2278     2303      +25     
- Partials      417      422       +5

Impacted Files	Coverage Δ
cmd/collectors.go	`0% <0%> (ø)`	⬆️
stats/csv/collector.go	`68.18% <68.18%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...0dc9707. Read the comment docs.

stats/csv/collector.go

na--

Thanks for this pull request! I've noted some issues inline in the code, but besides fixing those, can you also add some unit tests to this PR? Since the New() function accepts an afero.Fs object, mocking the FS for them should be fairly easy.

stats/csv/collector.go

na-- · 2019-07-02T09:55:30Z

stats/csv/collector.go

+func (c *Collector) Run(ctx context.Context) {
+	log.WithField("filename", c.fname).Debug("CSV: Writing CSV metrics")
+	<-ctx.Done()
+	_ = c.outfile.Close()


Since Collect() never checks whether the context is done, I'm not completely sure there isn't a race condition here. What happens if Collect() is still writing to the file while we're closing it here?
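One way to sidestep the concern raised above is to keep Collect() away from the file entirely. The following is a minimal sketch under that assumption, not the PR's actual code: `bufferedSink`, `flush`, and `runDemo` are hypothetical names, and a `bytes.Buffer` stands in for the output file. Collect() only appends to an in-memory buffer under a mutex, and Run() is the single goroutine that writes, doing a final flush once the context is cancelled.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"sync"
)

// bufferedSink is a hypothetical stand-in for the collector. Collect only
// appends to an in-memory buffer; Run is the sole goroutine touching out.
type bufferedSink struct {
	mu     sync.Mutex
	buffer []string
	out    *bytes.Buffer // stands in for the output file
}

// Collect never touches the output, so it cannot race with closing it.
func (s *bufferedSink) Collect(rows ...string) {
	s.mu.Lock()
	s.buffer = append(s.buffer, rows...)
	s.mu.Unlock()
}

// flush swaps the buffer out under the lock and writes it outside the lock.
func (s *bufferedSink) flush() {
	s.mu.Lock()
	rows := s.buffer
	s.buffer = nil
	s.mu.Unlock()
	for _, r := range rows {
		s.out.WriteString(r + "\n")
	}
}

// Run waits for cancellation, then performs one final flush. A real
// collector would close the file here, after the last write.
func (s *bufferedSink) Run(ctx context.Context) {
	<-ctx.Done()
	s.flush()
}

// runDemo collects two rows, cancels the context, and returns what was written.
func runDemo() string {
	s := &bufferedSink{out: &bytes.Buffer{}}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		s.Run(ctx)
		close(done)
	}()
	s.Collect("a", "b")
	cancel()
	<-done
	return s.out.String()
}

func main() {
	fmt.Print(runDemo())
}
```

In this sketch, rows collected after the final flush simply stay in memory and are never written, which is a data-loss trade-off rather than a write to a closed file.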

stats/csv/collector.go

na-- · 2019-07-02T10:07:44Z

stats/csv/collector.go

+	row = append(row, fmt.Sprintf("%f", sample.Value))
+	sampleTags := sample.Tags.CloneTags()
+
+	for _, tag := range resTags {


This way of implementing the tags in the CSV format would mean that any custom extra tags that are attached to the metrics will be silently discarded, which would probably surprise and annoy a lot of users. conf.SystemTags, which you pass to the constructor, is just a list of the keys for tags that k6 internally emits - users can add their own custom ones.

I can see two ways of fixing this:

- have an option in the constructor that specifically allows users to add extra columns with their custom metric tags

- have a final column "extra tags" that just contains any extra tags, either as a JSON value, or as a URL-encoded key1=val1&key2=val2 map (probably better, since quotes can be escaped)
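The second option can be sketched with the standard library's `net/url` package. This is only an illustration: `encodeExtraTags` and the `columns` set are hypothetical names, not part of the PR. Tags that have their own column are skipped and the rest are packed into one URL-encoded string.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeExtraTags (hypothetical helper) packs every tag that has no dedicated
// column into a single URL-encoded string. url.Values.Encode sorts by key,
// so the result is deterministic even though map iteration order is not.
func encodeExtraTags(tags map[string]string, columns map[string]bool) string {
	extra := url.Values{}
	for k, v := range tags {
		if !columns[k] {
			extra.Set(k, v)
		}
	}
	return extra.Encode()
}

func main() {
	tags := map[string]string{"status": "200", "team": "a&b", "env": "dev"}
	columns := map[string]bool{"status": true} // tags that get their own column
	fmt.Println(encodeExtraTags(tags, columns))
	// prints "env=dev&team=a%26b"
}
```

Because the separator characters are percent-encoded inside values, a reader can recover the tags with `url.ParseQuery`.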

na-- · 2019-07-02T10:09:41Z

Maybe someone can suggest a way to transform the samples so that there is one line per HTTP request. Then I will make the changes and reopen the pull request.

I took a look at the original issue, but one line per metric sample (like how you've currently done it) seems the better approach to me.

stats/csv/collector_test.go

stats/csv/collector.go

codecov-io · 2019-07-05T13:50:37Z

Codecov Report

Merging #1067 into master will increase coverage by 0.07%.
The diff coverage is 77.48%.

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
+ Coverage   72.79%   72.86%   +0.07%     
==========================================
  Files         133      135       +2     
  Lines        9905    10056     +151     
==========================================
+ Hits         7210     7327     +117     
- Misses       2278     2301      +23     
- Partials      417      428      +11

Impacted Files	Coverage Δ
cmd/collectors.go	`0% <0%> (ø)`	⬆️
cmd/config.go	`75.92% <100%> (+0.14%)`	⬆️
stats/csv/config.go	`79.31% <79.31%> (ø)`
stats/csv/collector.go	`83.78% <83.78%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...bd1997a. Read the comment docs.

stats/csv/collector.go

stats/csv/collector_test.go

Sirozha1337 · 2019-07-05T13:58:05Z

@na-- , thank you for your input! I've made some changes and added some tests. I couldn't avoid using the "CloneTags" function, because there's no other way to get the extra-tags supplied by the user.

Also, @golangcibot has found some errors in my code, but I don't understand what they are or how to fix them. If you could point me in the right direction, that would be great. Thanks!

stats/csv/collector_test.go

Sirozha1337 · 2019-07-10T07:22:35Z

@na--, thanks for the advice. I wrapped the code inside the range loops in lambdas, and now scopelint doesn't complain about it. The only remaining issue is that I use a long string of CSV data to test that the collector writes correct CSV to the file.

na-- · 2019-07-10T07:30:32Z

Split it with a concatenation, or add a //nolint: lll comment before it.

You can install golangci-lint locally by running go get -u github.com/golangci/golangci-lint/cmd/golangci-lint and then test if you have any linter issues in your code by running golangci-lint run --out-format=tab --new-from-rev master ./... in your local k6 repo.

Sirozha1337 · 2019-07-10T07:38:02Z

Fixed it. Thanks!

na-- · 2019-07-10T08:51:58Z

Thank you as well for working on this! 🙂 I'll do another top to bottom review of the code in the next few days.

mstoykov

Sorry for the long delay, we were finishing and releasing 0.25.0, and soon 0.25.1 😭

Looks good, but some changes are required IMO.

I noticed you decided that calling a lambda is a good way of keeping your variables from being changed by a for loop. While this will definitely work, it is preferable to just shadow the variable with one of the same name, such as varname := varname. This way you get a new variable that the for loop won't change underneath you, without adding an additional function call and indentation.
I have commented on that in a few places, and just decided to add it here as well.

As a whole I like the PR. It has the benefit over other outputs that we don't translate all the k6 metrics to some other format and then write 20k metrics at the same time, but just translate one at a time, which is definitely better, especially given my testing around #1084.

I did a quick "scientific" benchmark where I ran this script

import { Counter } from "k6/metrics";

var c = new Counter("awesome")

export default function() {
    c.add(1);
}

I ran it with both this code and the JSON output. For 2 seconds and 1 VU (this script generates a lot of metrics) I got:
csv.iterations 124940 62468.459215/s and a 27M file, and
json.iterations 60376 29844.35324/s and a 36M file.
So roughly twice the throughput, and more than twice the storage efficiency, since the CSV file is smaller despite holding about twice as many iterations.
The funny thing is that the iteration duration is not different (or not by enough), but it's apparently just much faster to write the CSV files, so it runs much better ... or maybe the JSON one is much buggier :(

While looking at this PR, I started wondering whether both the JSON and the CSV outputs would benefit from gzip (or other) compression on the fly, as it would lower the amount of data written to disk. @na--, what do you think? A future PR?

mstoykov · 2019-08-09T09:17:31Z

stats/csv/collector.go

+
+	"github.com/loadimpact/k6/lib"
+	"github.com/loadimpact/k6/stats"
+	log "github.com/sirupsen/logrus"


I know that you probably got this from the other collectors, but we actually have an issue (#1016) about not aliasing logrus as log, so I would prefer if the new code doesn't use the alias ;)

mstoykov · 2019-08-09T09:18:09Z

stats/csv/collector.go

+)
+
+const (
+	saveInterval = 1 * time.Second


I would really prefer if this is configurable

mstoykov · 2019-08-09T09:27:47Z

stats/csv/collector.go

+					if err != nil {
+						log.WithField("filename", c.fname).Error("CSV: Error writing to file")
+					}
+				}(sample)


I don't understand why you are making a lambda and then calling it ... Are you worried that the value of sample will change before SampleToRow finishes? I don't think that is possible, but if it were, it would be much better to do
sample := sample
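A small self-contained sketch of the shadowing idiom mentioned above, where `collectValues` is a hypothetical function used only for illustration. Each iteration declares a fresh `sample`, so closures created in the loop keep their own value. Since Go 1.22, range loop variables are already per-iteration, so the extra line is only needed on older toolchains such as the ones in use when this PR was written.

```go
package main

import "fmt"

// collectValues builds one closure per sample and calls them after the loop.
// The shadowing line pins the loop variable for each closure.
func collectValues(samples []float64) []float64 {
	funcs := make([]func() float64, 0, len(samples))
	for _, sample := range samples {
		sample := sample // new variable per iteration; no extra function call
		funcs = append(funcs, func() float64 { return sample })
	}

	out := make([]float64, 0, len(funcs))
	for _, f := range funcs {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(collectValues([]float64{1, 2, 3}))
	// prints "[1 2 3]"
}
```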

mstoykov · 2019-08-09T09:28:33Z

stats/csv/collector.go

+
+// Link returns a dummy string, it's only included to satisfy the lib.Collector interface
+func (c *Collector) Link() string {
+	return ""


Maybe you can return the fname although not sure that will be all that useful

mstoykov · 2019-08-09T09:42:12Z

stats/csv/collector.go

+func SampleToRow(sample *stats.Sample, resTags []string, ignoredTags []string) []string {
+	if sample == nil {
+		return nil
+	}


Can this even happen? And if it does, what will csvWriter.Write do?
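The second question can be checked in isolation with the standard library's `encoding/csv`. In this sketch `writeRecord` is a hypothetical helper that writes one record to an in-memory buffer and returns what was produced, so the behavior for a nil record can be inspected directly.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRecord (hypothetical helper) writes a single record through a
// csv.Writer and returns the bytes that ended up in the buffer.
func writeRecord(record []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(record); err != nil {
		return "", err
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	out, err := writeRecord(nil)
	// A nil record has no fields, so only the line terminator is written.
	fmt.Printf("%q %v\n", out, err)
}
```

So a nil row would not be an error, it would show up as a blank line in the output file.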

mstoykov · 2019-08-09T09:55:00Z

stats/csv/collector.go

+				break
+			}
+			prev = true
+		}


I don't think extra is needed at all :). You can probably try to use sort.SearchStrings on both the resTags and ignoredTags if you sort them as well. Not certain whether this will have beneficial results but it could be benchmarked

mstoykov · 2019-08-09T10:08:51Z

stats/csv/collector_test.go

+				sort.Strings(collector.ignoredTags)
+				assert.Equal(t, expected.ignoredTags, collector.ignoredTags)
+			})
+		}(configs[i], expected[i])


You could just do config, expected := configs[i], expected[i] instead of the function call.
It will both be much shorter and much easier to support - if you want to add a new field you will now need to add it twice

mstoykov · 2019-08-09T10:12:45Z

stats/csv/collector_test.go

+			}),
+		},
+	}
+	t.Run("Collect", func(t *testing.T) {


You don't need to call t.Run if you are not going to make at least two subtests in a test. The same goes for the other times in this file where t.Run is just called once in a test. This is not a problem but it's just adding more indentation

mstoykov · 2019-08-09T10:13:52Z

stats/csv/collector_test.go

+	}
+
+	for testname, tags := range testdata {
+		func(testname string, tags []string) {


This function call can be replaced by testname, tags := testname, tags

mstoykov · 2019-08-09T10:31:39Z

stats/csv/collector.go

+		return nil
+	}
+
+	row := []string{}


You do know how many columns you will have: 3 + len(resTags) + 1 for the extra tags. You can do row := make([]string, 0, 3 + len(resTags) + 1) and not change any of the other code.

On a similar note, you can probably reuse the same slice over and over again, as you are populating it and then writing it. This will give a big performance gain IMHO, as it will remove a big chunk of the allocations.
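Both ideas can be sketched together. In the following example `rowBuilder` and its methods are hypothetical, and the three fixed columns are assumed to be name, timestamp, and value. The slice is allocated once with the known capacity and truncated to length zero before each row, so append never has to grow it.

```go
package main

import "fmt"

// rowBuilder (hypothetical) owns one slice that is reused for every row.
type rowBuilder struct {
	row []string
}

// newRowBuilder sizes the slice for 3 fixed columns, numTags tag columns,
// and 1 extra-tags column.
func newRowBuilder(numTags int) *rowBuilder {
	return &rowBuilder{row: make([]string, 0, 3+numTags+1)}
}

// build refills the shared slice. The result is only valid until the next
// call, so it must be written out (or copied) before building another row.
func (b *rowBuilder) build(name, ts, value string, tags []string, extra string) []string {
	b.row = b.row[:0] // keep the backing array, drop the old contents
	b.row = append(b.row, name, ts, value)
	b.row = append(b.row, tags...)
	b.row = append(b.row, extra)
	return b.row
}

func main() {
	b := newRowBuilder(2)
	fmt.Println(b.build("http_reqs", "1565342251", "1.000000", []string{"GET", "200"}, ""))
	fmt.Println(b.build("awesome", "1565342252", "1.000000", []string{"", ""}, "team=a"))
}
```

Reuse like this is only safe when a single goroutine builds and writes rows, which is an assumption of the sketch.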

Sirozha1337 · 2019-08-14T09:03:07Z

Thank you for code review! I've made some changes to the code. Please check it, when you have the time.

mstoykov

Thanks again, and sorry for the delay, I started reading this code like 20 times ...
Even better now, but I should've added how sort.Search works 😭

I am going to test as I tested #1114 and #1113 to see how it performs once we finalize it :)

stats/csv/collector.go

Sirozha1337 · 2019-08-19T11:42:14Z

Hi, I've added some more tests and fixed the error you pointed out. Please review it, when you have the time.

mstoykov

LGTM! @na-- will need to take a look as well and we are probably not going to merge it for a while, but we are probably going to try to put it in a release with the #1114 and #1113 .

I did some benchmarking and it seems to be doing much better than the json output. Maybe in the future we can add gzip compression to the csv output as well. If you find time, maybe you can add it here but it's not needed.

If you decide to, you can also remove all the appends in SampleToRow, since you can calculate all the indexes. Using append will (even with all the optimizations in Go) generate some new slice values even if it doesn't allocate a new underlying array.
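For comparison, an index-based version might look like the sketch below. The function name `rowByIndex`, its parameters, and the column layout are assumptions for illustration and do not mirror the real SampleToRow signature.

```go
package main

import "fmt"

// rowByIndex (hypothetical) allocates the row at its final length and fills
// every cell by position, so no append calls are needed.
func rowByIndex(name, ts, value string, tagValues []string, extra string) []string {
	row := make([]string, 3+len(tagValues)+1)
	row[0], row[1], row[2] = name, ts, value
	for i, v := range tagValues {
		row[3+i] = v // tag columns start right after the three fixed ones
	}
	row[len(row)-1] = extra // the extra-tags column is always last
	return row
}

func main() {
	fmt.Println(rowByIndex("http_reqs", "1565342251", "1.000000", []string{"GET", "200"}, "team=a"))
}
```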

stats/csv/collector.go

stats/csv/collector_test.go

Sirozha1337 · 2019-08-20T10:57:18Z

I instantiated a slice with initial length and removed "append" functions. Should be good now.
Thank you for your comments!

mstoykov · 2019-08-20T14:01:14Z

Thank you for all the hard work!!! 🎉

imiric

LGTM, thanks for your contribution!

Remember to clean up/squash the commits before the final merge (not necessarily into one commit, but whatever makes sense).

na--

LGTM, thanks for contributing this! I dislike the complexity of the configuration and some other things here, but those can't be improved until we tackle the underlying architectural k6 issues described in #883 and #1075 😞

Add output option for csv format

e9cc0c0

golangcibot reviewed Jun 29, 2019

Fix naming, add comments, fix writing column names

63ede6a

golangcibot reviewed Jun 29, 2019

na-- requested changes Jul 2, 2019

na-- mentioned this pull request Jul 2, 2019

Investigate telegraf integration in k6 #1064

Closed

Sirozha1337 added 2 commits July 5, 2019 15:57

Save extra tags, flush writer after multiple rows

d881ca7

Add test for csv collector

f55b4eb

golangcibot reviewed Jul 5, 2019

Write data to buffer to avoid blocks on slow disk operations

0dc9707

golangcibot reviewed Jul 5, 2019

Attempt to fix scopelint and gosec warnings

2efe635

golangcibot reviewed Jul 9, 2019

Fix incorrect variable pin

7a1b4b7

na-- mentioned this pull request Jul 9, 2019

Refactor the collectors/outputs #1075

Closed

Fix scopelint issues by using lambda functions

2d39019

Break up the long string into smaller ones to avoid lint errors

5364400

Add more test coverage

c7862b2

Fix inconsistent test results due to the nature of a set

f5b82d2

na-- mentioned this pull request Jul 11, 2019

Efficient output of metrics to a binary file #321

Closed

mstoykov requested changes Aug 9, 2019

Sirozha1337 added 2 commits August 12, 2019 12:19

Resolve code review issues

a7f5a49

Make CSV collector more configurable

e137f59

golangcibot reviewed Aug 14, 2019

Fix GolangCI issues

4720f16

mstoykov requested changes Aug 15, 2019

Sirozha1337 added 3 commits August 19, 2019 14:07

Add better unit tests, fix error with sort.SearchStrings

e43bcf9

Optimize function for checking if string is in slice

12754c2

Fix tests being dependent on current time

9363a2b

mstoykov approved these changes Aug 20, 2019

Instantiate slice with length and use indexes instead of append

5c8a878

golangcibot reviewed Aug 20, 2019

Fixed golang-ci errors

bd1997a

mstoykov mentioned this pull request Aug 26, 2019

Output check/threshold results in a machine-readable unit test format to publish test results in CI #1120

Closed

na-- added this to the v0.26.0 milestone Aug 29, 2019

na-- requested review from imiric and cuonglm August 29, 2019 08:50

imiric approved these changes Aug 29, 2019

na-- approved these changes Aug 29, 2019

mstoykov merged commit 780a032 into grafana:master Aug 29, 2019

srguglielmo pushed a commit to srguglielmo/k6 that referenced this pull request Nov 3, 2019

Add output option for csv format (grafana#1067)

28e4eb7

mstoykov mentioned this pull request Jul 12, 2020

csv output isn't mentioned anywhere grafana/k6-docs#54

Closed
