Proof of concept - Shared Edge histogram #23
Sorry for the messy PR. This isn't intended to be merged as is; instead, I wanted to make a proof of concept, explain my problem, and start a dialog about how we can address it.
I want to use histograms to show how two different distributions compare to one another (schneems/derailed_benchmarks#169). However, if I generate a histogram for each, the edges aren't aligned, so it's impossible to tell at a glance which distribution is "better".
This Solution's Implementation
I had the idea that we could align the edges of the two different distributions so that they can be easily graphed and compared. That's what this PR does. It generates an edge list for each of the two distributions, then merges them into one edge list. To do this, I averaged the bin size of the two edge lists and then built another edge list going from the smallest edge value to the largest edge value using the averaged bin size.
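The merging step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `shared_edges` is a hypothetical helper name, it assumes both edge lists are ascending and evenly spaced, and it assumes the last shared edge may overshoot the largest original edge so that every value stays covered.

```ruby
# Hypothetical helper: merge two edge lists into one evenly spaced list.
# Assumes each list is ascending, evenly spaced, and has at least two edges.
def shared_edges(edges_a, edges_b)
  # Bin width of each edge list, then the average of the two.
  step_a = (edges_a.last - edges_a.first).to_f / (edges_a.length - 1)
  step_b = (edges_b.last - edges_b.first).to_f / (edges_b.length - 1)
  step = (step_a + step_b) / 2.0

  # Smallest and largest edge across both lists.
  lo = [edges_a.first, edges_b.first].min
  hi = [edges_a.last, edges_b.last].max

  # Assumption: round the bin count up so the final edge is >= hi.
  bins = ((hi - lo) / step).ceil
  (0..bins).map { |i| lo + i * step }
end

a = [0.0, 1.0, 2.0, 3.0] # bin width 1.0
b = [1.0, 3.0, 5.0]      # bin width 2.0
p shared_edges(a, b)
# => [0.0, 1.5, 3.0, 4.5, 6.0]
```

Both distributions can then be counted against the same edge list, so their bins line up when graphed side by side.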
I want to thank you a ton for creating and maintaining this library. I would like your feedback on this PR. Please let me know if anything is not clear or if you have any questions.
I made my own histogram library, https://github.com/zombocom/mini_histogram. It's slower, but not by much.
There might be a faster way to count weights than what you're currently doing: https://github.com/zombocom/mini_histogram/blob/b7d5804fe207910d43e1342f97e95be2ff0c4356/lib/mini_histogram.rb#L80-L89.
If you know the step size and the lowest value, then you can calculate the index of the bin directly without having to do any search:
array.each do |x|
  index = ((x - edge_lo) / step).floor
  @weights[index] += 1
end
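A self-contained version of the same idea might look like the sketch below. The method name `count_weights` and its keyword arguments are hypothetical, not part of either library. It also assumes that a value exactly equal to the top edge should land in the last bin and that out-of-range values are skipped, two cases the snippet above does not address.

```ruby
# Hypothetical helper: count values into evenly spaced bins by computing
# each bin index directly instead of searching the edge list.
def count_weights(values, edge_lo:, step:, bins:)
  weights = Array.new(bins, 0)
  edge_hi = edge_lo + bins * step

  values.each do |x|
    # Assumption: a value equal to the top edge belongs in the last bin.
    index = x == edge_hi ? bins - 1 : ((x - edge_lo) / step.to_f).floor
    # Assumption: values outside edge_lo..edge_hi are ignored.
    next if index < 0 || index >= bins

    weights[index] += 1
  end

  weights
end

p count_weights([0.0, 0.4, 1.5, 2.9, 3.0], edge_lo: 0.0, step: 1.0, bins: 3)
# => [2, 1, 2]
```

Each value costs one subtraction, one division, and one floor, so the work per value does not grow with the number of bins.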
This PR isn't really needed anymore for my purposes. Feel free to close, or reply back.