Histogram with two values causes failure #24

schneems · 2020-01-06T18:20:44Z

# fails
require 'stringio'
require 'unicode_plot'

values = [1, 2]

plot = UnicodePlot.histogram(values, title: "Histogram")
plot.render($stdout)

Gives me:

Traceback (most recent call last):
        7: from /Users/rschneeman/.rubies/ruby-2.7.0/bin/irb:23:in `<main>'
        6: from /Users/rschneeman/.rubies/ruby-2.7.0/bin/irb:23:in `load'
        5: from /Users/rschneeman/.rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/irb-1.2.1/exe/irb:11:in `<top (required)>'
        4: from (irb):5
        3: from /Users/rschneeman/.gem/ruby/2.7.0/gems/unicode_plot-0.0.3/lib/unicode_plot/histogram.rb:42:in `histogram'
        2: from /Users/rschneeman/.gem/ruby/2.7.0/gems/unicode_plot-0.0.3/lib/unicode_plot/barplot.rb:103:in `barplot'
        1: from /Users/rschneeman/.gem/ruby/2.7.0/gems/unicode_plot-0.0.3/lib/unicode_plot/barplot.rb:103:in `min'
ArgumentError (comparison of NilClass with 1 failed)

Ruby 2.7.0

It looks like when values get passed to BarChart the arguments include a nil:

[["\e[90m[\e[0m1.0\e[90m, \e[0m2.0\e[90m)\e[0m", "\e[90m[\e[0m2.0\e[90m, \e[0m3.0\e[90m)\e[0m", "\e[90m[\e[0m3.0\e[90m, \e[0m4.0\e[90m)\e[0m"], [1, nil, 1]]

Notice the nil in the debug output.

This doesn't appear to happen when rendering a histogram with 1 or 3 items.

Update: It looks like this only happens on specific input numbers, for example this works with no error:

# works 
require 'stringio'
require 'unicode_plot'

values = [0.017029, 0.018725]

plot = UnicodePlot.histogram(values, title: "Histogram")
plot.render($stdout)

I'm guessing the edge condition has something to do with the calculated bins.

The text was updated successfully, but these errors were encountered:

Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24

enumerable-statistics 2.0.0 has a bug of hitogram which is the cause of the issue #24.

mrkn · 2020-01-11T04:14:45Z

@schneems I released the version 0.0.4. Please use it.

$ irb -Ilib -runicode_plot --simple-prompt
>> UnicodePlot.histogram([1, 2], title: "Histogram").render
                              Histogram
              ┌                                        ┐
   [1.0, 1.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1
   [1.5, 2.0) ┤ 0
   [2.0, 2.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1
              └                                        ┘
                              Frequency
=> nil
>>

schneems · 2020-01-13T02:21:46Z

Thank you! -- Richard Schneeman https://www.schneems.com

Turns out that reducing a whole bunch of numbers to a single value (average for example, or median) means that we're getting rid of a huge amount of information. One way to add context back in without drowning users in raw data is to include a histogram in the output. With histograms in the output the user can see the distributions of their two runs to make better informed decisions about the validity of the data. Here's an example histogram: ``` Histogram ┌ ┐ [3.1, 3.2) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 23 [3.2, 3.3) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 14 [3.3, 3.4) ┤▇▇▇▇▇▇▇▇ 5 [3.4, 3.5) ┤▇▇▇▇▇ 3 [3.5, 3.6) ┤▇▇ 1 [3.6, 3.7) ┤▇▇▇▇▇▇▇▇ 5 [3.7, 3.8) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇ 8 [3.8, 3.9) ┤▇▇▇ 2 └ ┘ Frequency ``` Here's how the output looks in a test run: ``` 👎👎👎(NOT Statistically Significant) 👎👎👎 [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" - (0.0201745 seconds) SLOWER 🐢🐢🐢 by: 0.8833x [older/newer] -13.2063% [(older - newer) / older * 100] [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" - (0.017821 seconds) Iterations per sample: 10 Samples: 2 Test type: Kolmogorov Smirnov Confidence level: 95.0 % Is significant? (max > critical): false D critical: 1.7308183826022854 D max: 1.0 Histogram - [3054e1d584] "Merge pull request #36506 from kamipo/group_by_with_order_by_virtual_count_attribute" ┌ ┐ [0.015, 0.02 ) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.02 , 0.025) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Histogram - [80f989aece] "Remove duplicated attribute alias resolution in `_select!`" ┌ ┐ [0.017, 0.018) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 [0.018, 0.019) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1 └ ┘ Frequency Output: tmp/compare_branches/2020-01-06-12-23-1578335006-352084000/results.txt ``` > Note it's not that interesting with only two samples ## Caveats The histograms that are presented do not have identical/correct axis. So while the shape of the histograms are correct in comparison to each other, you cannot place them side by side for an accurate representation because they have different bin sizes and start from a different value. In the short term I do think that the visual data being present with this caveat is better than nothing. In the long term support would need to be added to `unicode_plot` to support this behavior. ## Blockers There is a bug that raises an error when 2 values are given to generate a histogram if the values have some special undetermined relationship. This is tracked here: red-data-tools/unicode_plot.rb#24

schneems mentioned this issue Jan 6, 2020

Add histogram to output zombocom/derailed_benchmarks#169

Merged

mrkn mentioned this issue Jan 11, 2020

Histogram may contain nil in weights mrkn/enumerable-statistics#21

Closed

mrkn added a commit that referenced this issue Jan 11, 2020

Add test for examining issue #24

cf5d32c

mrkn mentioned this issue Jan 11, 2020

UnicodePlot.histogram([1, 2]) shouldn't raise error #28

Merged

mrkn added a commit that referenced this issue Jan 11, 2020

Use enumerable-statistics >= 2.0.1

56349cf

enumerable-statistics 2.0.0 has a bug of hitogram which is the cause of the issue #24.

mrkn closed this as completed in #28 Jan 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Histogram with two values causes failure #24

Histogram with two values causes failure #24

schneems commented Jan 6, 2020 •

edited

Loading

mrkn commented Jan 11, 2020

schneems commented Jan 13, 2020 via email

Histogram with two values causes failure #24

Histogram with two values causes failure #24

Comments

schneems commented Jan 6, 2020 • edited Loading

mrkn commented Jan 11, 2020

schneems commented Jan 13, 2020 via email

schneems commented Jan 6, 2020 •

edited

Loading