Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add output option for csv format #1067

Merged
merged 19 commits into from Aug 29, 2019
Merged

Add output option for csv format #1067

merged 19 commits into from Aug 29, 2019

Conversation

Sirozha1337
Copy link
Contributor

Hi, I've noticed that json output takes up a lot of disk space for long running tests. I've found this issue #321, the csv output option was discussed there and I've decided to implement it. It works similar to json output, writing one sample per row. It uses twice as less space than json, so I've decided to open a pull request for it.

Maybe someone can suggest a way, how to transform the samples, so it would be one line per http request. Then I will make the changes and reopen the pull request.

I'm new to Go language, so I'm open to criticism.

stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Show resolved Hide resolved
stats/csv/collector.go Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
@CLAassistant
Copy link

CLAassistant commented Jun 29, 2019

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link

codecov bot commented Jun 29, 2019

Codecov Report

Merging #1067 into master will decrease coverage by 0.44%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   72.79%   72.34%   -0.45%     
==========================================
  Files         133      134       +1     
  Lines        9905     9966      +61     
==========================================
  Hits         7210     7210              
- Misses       2278     2339      +61     
  Partials      417      417
Impacted Files Coverage Δ
cmd/collectors.go 0% <0%> (ø) ⬆️
stats/csv/collector.go 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...e9cc0c0. Read the comment docs.

2 similar comments
@codecov
Copy link

codecov bot commented Jun 29, 2019

Codecov Report

Merging #1067 into master will decrease coverage by 0.44%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   72.79%   72.34%   -0.45%     
==========================================
  Files         133      134       +1     
  Lines        9905     9966      +61     
==========================================
  Hits         7210     7210              
- Misses       2278     2339      +61     
  Partials      417      417
Impacted Files Coverage Δ
cmd/collectors.go 0% <0%> (ø) ⬆️
stats/csv/collector.go 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...e9cc0c0. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 29, 2019

Codecov Report

Merging #1067 into master will decrease coverage by 0.44%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   72.79%   72.34%   -0.45%     
==========================================
  Files         133      134       +1     
  Lines        9905     9966      +61     
==========================================
  Hits         7210     7210              
- Misses       2278     2339      +61     
  Partials      417      417
Impacted Files Coverage Δ
cmd/collectors.go 0% <0%> (ø) ⬆️
stats/csv/collector.go 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...e9cc0c0. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 29, 2019

Codecov Report

Merging #1067 into master will decrease coverage by 0.05%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
- Coverage   72.79%   72.73%   -0.06%     
==========================================
  Files         133      134       +1     
  Lines        9905     9995      +90     
==========================================
+ Hits         7210     7270      +60     
- Misses       2278     2303      +25     
- Partials      417      422       +5
Impacted Files Coverage Δ
cmd/collectors.go 0% <0%> (ø) ⬆️
stats/csv/collector.go 68.18% <68.18%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...0dc9707. Read the comment docs.

stats/csv/collector.go Outdated Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pull request! I've noted some issues inline in the code, but besides fixing those, can you also add some unit tests to this PR? Since the New() function accepts an afero.Fs object, mocking the FS for them should be fairly easy.

stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Show resolved Hide resolved
func (c *Collector) Run(ctx context.Context) {
log.WithField("filename", c.fname).Debug("CSV: Writing CSV metrics")
<-ctx.Done()
_ = c.outfile.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since nowhere in the Collect() is it checked if the context is done, I'm not completely sure there isn't a race condition here. What happens if Collect() is still writing in the file while we're closing it here?

stats/csv/collector.go Show resolved Hide resolved
row = append(row, fmt.Sprintf("%f", sample.Value))
sampleTags := sample.Tags.CloneTags()

for _, tag := range resTags {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of implementing the tags in the CSV format would mean that any custom extra tags that are attached to the metrics will be silently discarded, which would probably surprise and annoy a lot of users. conf.SystemTags, which you pass to the constructor, is just a list of the keys for tags that k6 internally emits - users can add their own custom ones.

I can see two ways of fixing this:

  1. have an option in the constructor that specifically allows users to add extra columns with their custom metric tags
  2. have a final column "extra tags" that just contains any extra tags, either as a JSON value, or as an url-encoded key1=val1&key2=map2 map (probably better, since quotes can be escaped)

@na--
Copy link
Member

na-- commented Jul 2, 2019

Maybe someone can suggest a way, how to transform the samples, so it would be one line per http request. Then I will make the changes and reopen the pull request.

I took a look at the original issue, but one line per metric sample (like how you've currently done it) seems the better approach to me.

stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Show resolved Hide resolved
stats/csv/collector_test.go Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Jul 5, 2019

Codecov Report

Merging #1067 into master will increase coverage by 0.07%.
The diff coverage is 77.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
+ Coverage   72.79%   72.86%   +0.07%     
==========================================
  Files         133      135       +2     
  Lines        9905    10056     +151     
==========================================
+ Hits         7210     7327     +117     
- Misses       2278     2301      +23     
- Partials      417      428      +11
Impacted Files Coverage Δ
cmd/collectors.go 0% <0%> (ø) ⬆️
cmd/config.go 75.92% <100%> (+0.14%) ⬆️
stats/csv/config.go 79.31% <79.31%> (ø)
stats/csv/collector.go 83.78% <83.78%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdd417e...bd1997a. Read the comment docs.

stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
@Sirozha1337
Copy link
Contributor Author

@na-- , thank you for your input! I've made some changes and added some tests. I couldn't avoid using the "CloneTags" function, because there's no other way to get the extra-tags supplied by the user.

Also @golangcibot has found some errors in my code, but I don't understand what they are and how to fix them, if you could point me in the right direction, that would be great. Thanks!

stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
@Sirozha1337
Copy link
Contributor Author

@na--, thanks for the advice. I wrapped the code inside ranges into lambda, now the scopelint doesn't say anything about it. The only issue, that remains with it, is that I use a long string with csv data to test that collector writes correct csv to file.

@na--
Copy link
Member

na-- commented Jul 10, 2019

Split it with a concatenation, or add a //nolint: lll comment before it.

You can install golangci-lint locally by running go get -u github.com/golangci/golangci-lint/cmd/golangci-lint and then test if you have any linter issues in your code by running golangci-lint run --out-format=tab --new-from-rev master ./... in your local k6 repo.

@Sirozha1337
Copy link
Contributor Author

Fixed it. Thanks!

@na--
Copy link
Member

na-- commented Jul 10, 2019

Thank you as well for working on this! 🙂 I'll do another top to bottom review of the code in the next few days.

Copy link
Collaborator

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay we were finishing and releasing 0.25.0, and soon 0.25.1 😭

Looks, good, but some changes are required IMO.

I noticed you decide that calling lambda is a good way of not getting your variables changed by a for cycle. While this will definitely work, it is preferable to just shadow the variable with the same such as varname := varname this way you get a new variable that the for cycle won't change underneath, while not adding additional functional call and indentation.
I have commented on that in few places, just decided to add it here as well.

As a whole I like the PR it has the benefit over other outputs that we don't translate all the k6 metrics to some other format and than write 20k metrics at the same time, but just translate one at a time which is definitely better, especially given my testing around #1084 .

I did a quick "scientific" benchmark where I ran this script

import { Counter } from "k6/metrics";

var c = new Counter("awesome")

export default function() {
    c.add(1);
}

With both this code and the json output. the results are that for 2seconds and 1 vu (this code generates A lot of metrics) I got :
csv.iterations 124940 62468.459215/s and 27M of a file and
json.iterations 60376 29844.35324/s and 36M of a file.
So around twice better in performance and more than twice better in storage performance.
The funny thing is that the iteration duration is not different (or not by enough) but it's just apparently much faster to write the csv files so it runs much better ... or maybe the json one is much buggier :(

While looking at this PR I started wondering if both the json and the csv won't benefit from a gzip(or other compression) on the fly, as it will lower the amount of data written to disk. @na-- , what do you think? a future PR ?


"github.com/loadimpact/k6/lib"
"github.com/loadimpact/k6/stats"
log "github.com/sirupsen/logrus"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you probably got this from the other collectors but we actually have an issue to not rename logrus to log #1016 , so I would prefer if the new code doesn't use the alias ;)

)

const (
saveInterval = 1 * time.Second
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really prefer if this is configurable

if err != nil {
log.WithField("filename", c.fname).Error("CSV: Error writing to file")
}
}(sample)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you making a lambda and than calling it ... Are you worried the value of sample will change before SampleToRow finishes ? I don't think that is possible but if it was it would be much better to do
sample := sample


// Link returns a dummy string, it's only included to satisfy the lib.Collector interface
func (c *Collector) Link() string {
return ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can return the fname although not sure that will be all that useful

func SampleToRow(sample *stats.Sample, resTags []string, ignoredTags []string) []string {
if sample == nil {
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this even happen? and if it does, what will csvWriter.Write do ?

break
}
prev = true
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think extra is needed at all :). You can probably try to use sort.SearchStrings on both the resTags and ignoredTags if you sort them as well. Not certain whether this will have beneficial results but it could be benchmarked

sort.Strings(collector.ignoredTags)
assert.Equal(t, expected.ignoredTags, collector.ignoredTags)
})
}(configs[i], expected[i])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just do config, expected := configs[i], expected[i] instead of the function call.
It will both be much shorter and much easier to support - if you want to add a new field you will now need to add it twice

}),
},
}
t.Run("Collect", func(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to call t.Run if you are not going to make at least two subtests in a test. The same goes for the other times in this file where t.Run is just called once in a test. This is not a problem but it's just adding more indentation

}

for testname, tags := range testdata {
func(testname string, tags []string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function call can be replaced by testname, tags := testname, tags

return nil
}

row := []string{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do know how many columns you will have (3 + len(restags) + 1 for the the extra tags. You can do row := make([]string, 0, 3 + len(resTags) + 1) and not change any of the other code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On similar note you can probably reuse the same slice over and over again as you are populating it and than writing it. This will have a lot of performance gain IMHO as it will practically remove a big chunk of allocation

stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/config.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/config.go Outdated Show resolved Hide resolved
stats/csv/config_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
@Sirozha1337
Copy link
Contributor Author

Thank you for code review! I've made some changes to the code. Please check it, when you have the time.

Copy link
Collaborator

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, and sorry for the delay, I started reading this code like 20 times ...
Even better now, but I should've added how sort.Search works 😭

I am going to test as I tested #1114 and #1113 to see how it performs once we finalize it :)

stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
@Sirozha1337
Copy link
Contributor Author

Hi, I've added some more tests and fixed the error you pointed out. Please review it, when you have the time.

Copy link
Collaborator

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! @na-- will need to take a look as well and we are probably not going to merge it for a while, but we are probably going to try to put it in a release with the #1114 and #1113 .

I did some benchmarking and it seems to be doing much better than the json output. Maybe in the future we can add gzip compression to the csv output as well. If you find time, maybe you can add it here but it's not needed.

If you decide you can also possibly remove all the appends in SampleToRow as you can calculate all the indexes and using append will (even with all the optimizations in golang) generate some amount of new slices even if it doesn't generate new underlying array .

stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector.go Outdated Show resolved Hide resolved
stats/csv/collector_test.go Outdated Show resolved Hide resolved
@Sirozha1337
Copy link
Contributor Author

I instantiated a slice with initial length and removed "append" functions. Should be good now.
Thank you for your comments!

@mstoykov
Copy link
Collaborator

Thank you for all the hard work!!! 🎉

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your contribution!

Remember to cleanup/squash the commits before the final merge (not necessarilly into one commit, but whatever makes sense).

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for contributing this! I dislike the complexity of the configuration and some other things here, but those can't be improved until we tackle the underlying architectural k6 issues described in #883 and #1075 😞

@mstoykov mstoykov merged commit 780a032 into grafana:master Aug 29, 2019
srguglielmo pushed a commit to srguglielmo/k6 that referenced this pull request Nov 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants