
Properly handle fractional samples #64

Closed
wants to merge 1 commit

Conversation

jasonrhansen
Collaborator

Fixes #43

@codecov-io

codecov-io commented Feb 27, 2019

Codecov Report

Merging #64 into master will increase coverage by 0.06%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master     #64      +/-   ##
=========================================
+ Coverage   77.54%   77.6%   +0.06%     
=========================================
  Files          15      15              
  Lines        1287    1286       -1     
=========================================
  Hits          998     998              
+ Misses        289     288       -1
Impacted Files Coverage Δ
src/flamegraph/color/mod.rs 73.33% <100%> (ø) ⬆️
src/flamegraph/mod.rs 84.41% <100%> (+0.06%) ⬆️
src/flamegraph/merge.rs 91.46% <100%> (-0.21%) ⬇️
src/collapse/perf.rs 82.41% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4863e01...7af2604. Read the comment docs.

@jonhoo
Owner

jonhoo commented Feb 27, 2019

Neat! I wonder what the performance impact of this change is... Could you try benchmarking with one of the larger data files? Maybe copy its contents into a file several times over and fold that to get a more representative sample?

@jasonrhansen
Collaborator Author

Here are the results using a 2GB test file. I ran it multiple times on both the master and fractional-samples branches.

master

5.55s user 0.85s system 85% cpu 7.510 total
5.42s user 0.98s system 85% cpu 7.453 total
5.34s user 1.14s system 88% cpu 7.362 total
5.56s user 0.92s system 87% cpu 7.372 total

fractional-samples

5.06s user 1.21s system 84% cpu 7.383 total
5.27s user 1.06s system 86% cpu 7.317 total
5.30s user 1.04s system 85% cpu 7.379 total
5.21s user 1.04s system 86% cpu 7.269 total

Looks like there's very little difference in performance, and, surprisingly, the fractional-samples version ran slightly faster for me.

@jonhoo
Owner

jonhoo commented Feb 27, 2019

Oh, that's very interesting... Could you try running it with hyperfine so we also get things like variance? Take a look at compare.sh for inspiration.

@jonhoo
Owner

jonhoo commented Feb 27, 2019

FWIW: I've approved these changes, but I'd still like hyperfine results before we merge.

@jasonrhansen
Collaborator Author

Here are the hyperfine results. These benchmarks were also done on a different machine than the last ones.

master

  Time (mean ± σ):      7.805 s ±  0.037 s    [User: 6.308 s, System: 1.482 s]
  Range (min … max):    7.740 s …  7.865 s    10 runs

fractional-samples

  Time (mean ± σ):      7.850 s ±  0.050 s    [User: 6.346 s, System: 1.493 s]
  Range (min … max):    7.787 s …  7.922 s    10 runs

@jonhoo
Owner

jonhoo commented Feb 27, 2019

Okay, that makes a little more sense -- seems like there's a very slight performance regression. I wonder how often fractional samples come up in practice 🤔 The regression is small enough that I think we should merge this though :)

@jasonrhansen
Collaborator Author

I think the performance difference is too small to really worry about, but now I'm wondering why there would ever be fractional samples in the first place. I assumed they would come up somewhere since the Perl version handles them, but it doesn't make sense to me. You either got a sample or you didn't. How would you ever get a partial sample? Can you think of any reason fractional samples would exist?

@jonhoo
Owner

jonhoo commented Feb 27, 2019

Doing some digging led me to brendangregg/FlameGraph#12, which implemented this feature upstream. Quoting @timbunce from there:

On some platforms the timing data from NYTProf is in tenths of a microsecond.

There's also brendangregg/FlameGraph#79 indicating that someone is using it.

That said, in brendangregg/FlameGraph#18 which introduced --factor, the author of that original PR states that the cumulative error you see from using fractional values is enough of a problem that fractional values should simply not be used. Specifically, the recommendation is to have all sample counts be integers, scale any fractional values up in the producer of the sample data, and then use --factor 0.1 or whatever to scale the value back down at the very end.
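To make that cumulative-error argument concrete, here is a small standalone Rust illustration (the numbers are invented and this is not code from either project): truncating each fractional count to an integer on its own can discard most of the data, while scaling counts up in the producer and applying a factor once at the end preserves the total.

```rust
fn main() {
    // 1000 invented samples worth 0.7 "counts" each (true total: 700).
    let samples = vec![0.7_f64; 1000];

    // Per-sample truncation: every 0.7 truncates to 0, so the total is 0.
    let truncated: u64 = samples.iter().map(|&s| s.trunc() as u64).sum();

    // Producer-side scaling: multiply by 10 so every count is an integer,
    // then apply --factor 0.1 once at the very end.
    let scaled: u64 = samples.iter().map(|&s| (s * 10.0).round() as u64).sum();
    let total = scaled as f64 * 0.1;

    println!("truncated = {truncated}"); // 0: all data lost
    println!("factored  = {total}");     // 700: total preserved
}
```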

I actually kind of like the idea of getting rid of support for fractional sample counts. That suggests we should still parse fractional values, but discard the fractional part (like we do today) and instead just add support for the --factor argument (which'll be an f64). We will probably want to emit a warning if we detect a sample value whose fractional part is not 0 though, just in case someone runs into this!

One alternative would be to set the factor automatically: if we detect fractional sample counts, scale each one up by some arbitrarily chosen factor (1000, maybe). That should allow us to still do all the computation on integers while giving the right end result.
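A rough sketch of the warn-and-truncate idea (function and variable names here are hypothetical, not inferno's actual API): parse integer counts on the fast path, fall back to float parsing, warn when a nonzero fractional part is dropped, and apply the factor once at output time.

```rust
/// Hypothetical helper: parse a sample count, warning if a nonzero
/// fractional part has to be discarded.
fn parse_count(s: &str) -> u64 {
    // Fast path: the count is already a plain integer.
    if let Ok(n) = s.parse::<u64>() {
        return n;
    }
    // Fallback: fractional input; keep only the integer part.
    let f: f64 = s.parse().unwrap_or(0.0);
    if f.fract() != 0.0 {
        eprintln!("warning: discarding fractional part of sample count {s}");
    }
    f.trunc() as u64
}

fn main() {
    // The factor would come from the proposed --factor CLI flag.
    let factor = 0.1_f64;
    let counts = ["120", "35.0", "7.5"];
    // 120 + 35 + 7 = 162; all summing happens on integers.
    let total: u64 = counts.iter().map(|s| parse_count(s)).sum();
    // The factor is applied exactly once, when writing output.
    println!("total = {}", total as f64 * factor);
}
```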

Thoughts?

@jasonrhansen
Collaborator Author

jasonrhansen commented Feb 27, 2019

I think I prefer to simply emit a warning for fractional samples and discard the fractional part. Maybe we should just close this PR and I'll add support for --factor instead.

@jonhoo, are you good with this solution?
