Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram bin width does not match actual bin range #246

Closed
SamuelLKane opened this issue Sep 1, 2020 · 6 comments
Closed

Histogram bin width does not match actual bin range #246

SamuelLKane opened this issue Sep 1, 2020 · 6 comments

Comments

@SamuelLKane
Copy link

Hi all! I'm trying to use ChartFX's Histogram to plot pre-binned data and noticed a discrepancy that is echoed in the sample code (HistogramSample.java).

What are the issues:

  1. The edges of the bin do not reflect the actual range of the bin
  2. The data point is not always the center of the bin, instead it can be another point within the bin

Examples

image
This image shows the green dataset3 from HistogramSample.java which has fixed bins going from 3-4,4-5, and 5-10 (among others). In this case all the visible data points are correctly centered on the bins but the edges of the bins are incorrectly placed (this has the effect of making it look like the data points are incorrectly placed). This is most obvious with the left edge of the bin going from 5-10 which begins at 6, not 5.

image
This image is from our own implementation using pre-binned data and shows the data points which are placed above the lower bound of a bin, resulting in a histogram that is off by 1/2 a bar. The visible bins are -0.5-10.3, 10.3-21.1, 21.1-31.9, 31.9-42.6.

@RalphSteinhagen
Copy link
Member

@SamuelLKane do you use by chance non-equidistant bins (see also discussions in issue #80)?

@SamuelLKane
Copy link
Author

@RalphSteinhagen Thanks for the reference. I've poured over issue 80 and confirmed we're using equidistant bins (all are 10.8 in size). We are using the new Histogram(title, numBins, min, max, horizontal) constructor that should create equidistant bins regardless and I confirmed it was via .isEquiDistant(). I swapped to the new Histogram(title, double[]) and was able to get a correctly binned and rendered histogram. I don't know if it's relevant but we also loose the first bin's data in the render when using the numBins, min, max constructor (it only displays bins 1 - numBins as if they were one bin less; bin 1's height in bin 0's place, bin 2's height in bin 1's place, etc.)

wirew0rm added a commit that referenced this issue Sep 4, 2020
Improves documentation for Histogram, explaining overflow bins and bin
indexing.

Improve the constructors of Histogram to allow the resulting Histogram
to either have the outer bounds or the bin centers of the first and last
bin on the given min/max value. previously, the values where shifted
from the given range by one half of the bin size.

adresses #246
@wirew0rm
Copy link
Member

wirew0rm commented Sep 4, 2020

This is probably a classic of by 1 error, the convention for the Histogram class is that bin 0 and bin nBins+1 contain the under-/overflow data, so the actual data bins start at index 1.

This was only documented in Histogram#getBinContent(int)I've added a note to the interface itself.
I hope this solves your issue, I've tried with a very simple Histogram and all bins where rendered correctly.

I also addressed the issue of the left shift, which was also present in our implementation, it is now possible to choose whether the given range should be interpreted as center to center or outer bound to outer bound of the first and last bin.

wirew0rm added a commit that referenced this issue Sep 7, 2020
Improves documentation for Histogram, explaining overflow bins and bin
indexing.

Improve the constructors of Histogram to allow the resulting Histogram
to either have the outer bounds or the bin centers of the first and last
bin on the given min/max value. previously, the values where shifted
from the given range by one half of the bin size.

adresses #246
RalphSteinhagen pushed a commit that referenced this issue Sep 7, 2020
Improves documentation for Histogram, explaining overflow bins and bin
indexing.

Improve the constructors of Histogram to allow the resulting Histogram
to either have the outer bounds or the bin centers of the first and last
bin on the given min/max value. previously, the values where shifted
from the given range by one half of the bin size.

adresses #246
RalphSteinhagen pushed a commit that referenced this issue Sep 10, 2020
Improves documentation for Histogram, explaining overflow bins and bin
indexing.

Improve the constructors of Histogram to allow the resulting Histogram
to either have the outer bounds or the bin centers of the first and last
bin on the given min/max value. previously, the values where shifted
from the given range by one half of the bin size.

adresses #246
@RalphSteinhagen
Copy link
Member

@SamuelLKane we updated the Histogram description and @wirew0rm kindly added the HistogramBasicSample to clarify and illustrate the functionality:

https://github.com/GSI-CS-CO/chart-fx/blob/1f94103f57e1a7e4cdfb97816ea34bac9e8cc206/chartfx-dataset/src/main/java/de/gsi/dataset/Histogram.java#L3-L14

Does this answer your question/issue?

Thanks again for posting this, this helped in clarifying/improving our interface documentation and samples! 👍

@SamuelLKane
Copy link
Author

Thanks for being so helpful! The clarity you provided is super helpful as is the sample. We found a work around but as soon as we do another pull and build of your repo I'll let y'all know if I find any more related bugs.

@RalphSteinhagen
Copy link
Member

RalphSteinhagen commented Sep 11, 2020

Great! This API convention is coming originally from ROOT, which in turn got it from PAW which in turn got it from a hermit in the swiss alps who claims that this has been passed over eons through word-of-mouth by his ancestors which our little tribe takes for granted now, while other more modern people moved to more modern concepts. ;-)

Let us know if you find things or want to contribute in that area. We are aware that dealing with histograms is still a bit sparse ...

I will close this issue if you are OK with it. Feel free to re-open in case your issue hasn't been solved by this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants