-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Histogram with two values causes failure #24
Comments
schneems
added a commit
to zombocom/derailed_benchmarks
that referenced
this issue
Jan 6, 2020
Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24
schneems
added a commit
to zombocom/derailed_benchmarks
that referenced
this issue
Jan 6, 2020
Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24
mrkn
added a commit
that referenced
this issue
Jan 11, 2020
mrkn
added a commit
that referenced
this issue
Jan 11, 2020
enumerable-statistics 2.0.0 has a bug of hitogram which is the cause of the issue #24.
@schneems I released the version 0.0.4. Please use it.
|
Thank you!
--
Richard Schneeman
https://www.schneems.com
|
schneems
added a commit
to zombocom/derailed_benchmarks
that referenced
this issue
Jan 14, 2020
Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24
schneems
added a commit
to zombocom/derailed_benchmarks
that referenced
this issue
Jan 14, 2020
Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24
schneems
added a commit
to zombocom/derailed_benchmarks
that referenced
this issue
Mar 24, 2020
Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Gives me:
Ruby 2.7.0
It looks like when values get passed to BarChart the arguments include a
nil
:This doesn't appear to happen when rendering a histogram with 1 or 3 items.
Update: It looks like this only happens on specific input numbers, for example this works with no error:
I'm guessing the edge condition has something to do with the calculated bins.
The text was updated successfully, but these errors were encountered: