BoxChart goes below 0 although min value is > 0 #410

Closed
beatngu13 opened this issue Jan 24, 2020 · 8 comments

@beatngu13

beatngu13 commented Jan 24, 2020

MWE:

final BoxChart chart = new BoxChartBuilder().build();
chart.addSeries("seriesName", List.of(1_000, 5_000, 60_000));
new SwingWrapper(chart).displayChart();

Creates the following chart:

[Image: Screenshot 2020-01-24 at 16:05:36]

As can be seen in the code example, the min value is 1,000, but the boxplot goes below 0.

Is this a config issue or a bug?

@Mr14huashao
Collaborator

@beatngu13
The calculation formula is as follows (see also readme.md):
Q1 = 1000
Q2 = 5000
Q3 = 60000
Upper limit = Q3 + 1.5 * IQR = Q3 + 1.5 * (Q3 - Q1) = 148500
Lower limit = Q1 - 1.5 * IQR = Q1 - 1.5 * (Q3 - Q1) = -87500
So the boxplot goes below 0.
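
The arithmetic above can be checked with a small sketch. The class and method names (`FenceSketch`, `lowerLimit`, `upperLimit`) are hypothetical and not part of XChart; the only assumption is the 1.5 * IQR rule quoted in this comment.

```java
// Hypothetical helper, not XChart API: applies the 1.5 * IQR rule quoted above.
final class FenceSketch {

    // Lower limit = Q1 - 1.5 * (Q3 - Q1)
    static double lowerLimit(double q1, double q3) {
        return q1 - 1.5 * (q3 - q1);
    }

    // Upper limit = Q3 + 1.5 * (Q3 - Q1)
    static double upperLimit(double q1, double q3) {
        return q3 + 1.5 * (q3 - q1);
    }

    public static void main(String[] args) {
        double q1 = 1000, q3 = 60000; // quartiles as stated in the comment
        System.out.println(lowerLimit(q1, q3)); // prints -87500.0
        System.out.println(upperLimit(q1, q3)); // prints 148500.0
    }
}
```

With Q1 = 1000 and Q3 = 60000 the IQR is 59000, so the lower limit lands far below zero even though every data point is positive.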

@beatngu13
Author

beatngu13 commented Feb 17, 2020

Hey @Mr14huashao,

Thanks for pointing to the underlying formula. I already had a quick look at it, but is the calculation correct this way?

AFAIK, boxplots are based on the five-number summary, so why should the lower limit be less than the minimum?

If I go to e.g. this online calculator and use the example dataset (1000, 5000, 60000), I get:

  • Minimum: 1000
  • Quartile Q1: 3000
  • Quartile Q2 (median): 5000
  • Quartile Q3: 32500
  • Maximum: 60000

Then the lower fence (not limit?) should be:

Lower fence = Q1 - 1.5 * IQR
            = Q1 - 1.5 * (Q3 - Q1)
            = 3000 - 1.5 * (32500 - 3000)
            = 3000 - 1.5 * 29500
            = -41250

Since there are no outliers, the lower and upper limits should correspond to the minimum and maximum values I would say.
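
A minimal sketch of the convention described here, in which each whisker ends at the most extreme data point that still lies inside the fences. `WhiskerSketch` and `whiskers` are hypothetical names, not XChart API, and the quartiles are passed in rather than computed.

```java
import java.util.Arrays;

// Hypothetical helper, not XChart API: whiskers clamped to the data inside the fences.
final class WhiskerSketch {

    // Returns {lowerWhisker, upperWhisker}: the smallest and largest data points
    // that are not outliers under the 1.5 * IQR rule.
    static double[] whiskers(double[] data, double q1, double q3) {
        double iqr = q3 - q1;
        double lowerFence = q1 - 1.5 * iqr;
        double upperFence = q3 + 1.5 * iqr;
        double lower = Arrays.stream(data).filter(v -> v >= lowerFence).min().orElse(Double.NaN);
        double upper = Arrays.stream(data).filter(v -> v <= upperFence).max().orElse(Double.NaN);
        return new double[] {lower, upper};
    }

    public static void main(String[] args) {
        double[] data = {1000, 5000, 60000};
        // Quartiles taken from the online calculator output quoted above.
        double[] w = whiskers(data, 3000, 32500);
        System.out.println(w[0] + " " + w[1]); // prints 1000.0 60000.0
    }
}
```

For the example dataset the fences are -41250 and 76750, no point falls outside them, and the whiskers therefore coincide with the minimum and maximum.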

Also, if you use other online chart makers, you get different results. For example, plot.ly:

Or BoxPlotR:

@Mr14huashao
Collaborator

Mr14huashao commented Feb 18, 2020

@beatngu13 can you spare a few minutes to look at this issue with me?

@beatngu13
Author

@Mr14huashao sure, how can I help?

@Mr14huashao
Collaborator

@beatngu13
I have found the problem and I will fix it.

@timmolter
Member

@Mr14huashao Thanks!

BTW, the original inspiration for this was MATLAB: https://www.mathworks.com/help/stats/boxplot.html

@Mr14huashao
Collaborator

@timmolter
There are several ways to determine the position of quartile Qi in a sorted sequence X, where i = 1, 2, 3 and n is the number of items in the sequence:

  • “n + 1”: the position of Qi is i (n + 1) / 4.
    The quartile is then calculated from the values around that position.
  • “n-1”: the position of Qi is i (n-1) / 4.
    The quartile is then calculated from the values around that position.
  • “np”: the position of Qi is np = (i * n) / 4.
    If np is not an integer, Qi = X [⌊np⌋ + 1] (np rounded down, plus one)
    If np is an integer, Qi = (X [np] + X [np + 1]) / 2
  • “(n-1) / 4 +1”: the position of Qi is i (n-1) / 4 + 1.
    The quartile is then calculated from the values around that position.

Example:
Take the sorted sequence 12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37 (n = 14).

  • Method 1:
    Q1's position = (14 + 1) / 4 = 3.75,
    Q1 = 0.25 × third term + 0.75 × fourth term = 0.25 × 17 + 0.75 × 19 = 18.5;
  • Method 2:
    Q1's position = (14-1) / 4 = 3.25,
    Q1 = 0.75 × third term + 0.25 × fourth term = 0.75 × 17 + 0.25 × 19 = 17.5;
  • Method 3:
    Q1's position = 14 * 0.25 = 3.5, which is not an integer,
    Q1 = fourth term = 19;
  • Method 4:
    Q1's position = (14-1) / 4 + 1 = 4.25,
    Q1 = 0.75 × fourth term + 0.25 × fifth term = 0.75 × 19 + 0.25 × 20 = 19.25.
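
The four position rules can be sketched in Java as follows. This is only an illustration of the rules as described in this comment, not the code that was committed to XChart; `QuartileSketch` and its method names are hypothetical, the input is assumed to be sorted in ascending order, and positions are 1-based as in the example.

```java
// Hypothetical sketch of the four quartile position rules; not XChart API.
final class QuartileSketch {

    // Linear interpolation at a 1-based fractional position in a sorted array.
    static double atPosition(double[] sorted, double pos) {
        int k = (int) Math.floor(pos);
        double frac = pos - k;
        if (k < 1) return sorted[0];                              // clamp below the first item
        if (k >= sorted.length) return sorted[sorted.length - 1]; // clamp above the last item
        return (1 - frac) * sorted[k - 1] + frac * sorted[k];
    }

    // Method 1: position i (n + 1) / 4
    static double nPlus1(double[] x, int i) {
        return atPosition(x, i * (x.length + 1) / 4.0);
    }

    // Method 2: position i (n - 1) / 4
    static double nMinus1(double[] x, int i) {
        return atPosition(x, i * (x.length - 1) / 4.0);
    }

    // Method 3: position np = i * n / 4, without interpolation
    static double np(double[] x, int i) {
        double np = i * x.length / 4.0;
        int k = (int) Math.floor(np);
        if (np == k) return (x[k - 1] + x[k]) / 2; // integer: average X[np] and X[np + 1]
        return x[k];                               // otherwise the next item up (1-based k + 1)
    }

    // Method 4: position i (n - 1) / 4 + 1
    static double nMinus1Plus1(double[] x, int i) {
        return atPosition(x, i * (x.length - 1) / 4.0 + 1);
    }

    public static void main(String[] args) {
        double[] x = {12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37};
        System.out.println(nPlus1(x, 1));       // prints 18.5
        System.out.println(nMinus1(x, 1));      // prints 17.5
        System.out.println(np(x, 1));           // prints 19.0
        System.out.println(nMinus1Plus1(x, 1)); // prints 19.25
    }
}
```

Running it on the 14-item sequence reproduces the four Q1 values from the example, which shows how much the result depends on the chosen position rule.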

Are all four calculation methods implemented?

timmolter added a commit that referenced this issue Feb 28, 2020
[resolves #410] Box plot feature fixes and code optimizations
@beatngu13
Author

Thanks for the quick fix! 🙏
