Boxplot percentiles for whiskers #10357

Closed
mcspritz opened this issue Jan 31, 2018 · 8 comments
Labels
API: consistency Documentation status: closed as inactive Issues closed by the "Stale" Github Action. Please comment on any you think should still be open. status: inactive Marked by the “Stale” Github Action

Comments

@mcspritz

Bug report

I'm plotting a boxplot with the data below and found that when the whiskers are set to the 5th and 95th percentiles, they differ from the values numpy calculates for the same percentiles.

Code for reproduction

from numpy import percentile
import matplotlib.pyplot as plt

d = [166.007971541528, 839.535219535014, 140.828458698247, 171.769452405593, 169.714353600027, 163.837703911465, 153.0957460413, 188.213003041855, 150.988202687771]

fig, ax = plt.subplots()
bp = ax.boxplot(d, vert=False, showmeans=False, whis=[5, 95])

p = {
    5: percentile(d, 5),
    95: percentile(d, 95),
}

for i, t in enumerate(bp['whiskers']):
    print(i, t.get_xdata())
print(p)

ax.grid(ls='dotted')
ax.set_xscale('log')

fig.savefig(
    'boxplot.png',
    format='png',
)

Actual outcome

0 [49.24244394 46.96971576]
1 [ 92.78568024 107.40745037]
{5: 45.01011901414854, 95: 300.2964164148635}

Expected outcome

Matplotlib version

  • Operating system: linux
  • Matplotlib version: 2.1.2
  • Matplotlib backend (print(matplotlib.get_backend())): Qt5Agg
  • Python version: 3.6.4
  • Other libraries: numpy 1.14.0

Installed using manjaro (distro linux) package manager.

@jklymak
Member

jklymak commented Jan 31, 2018

Ping @phobson.

Oddly, I get different results than you. Can you check your actual outcome? I get:

0 [ 153.09574604  150.98820269]
1 [ 171.76945241  188.21300304]
{5: 144.89235629405661, 95: 579.00633293775013}

This says to me that Matplotlib is using the data for the percentiles (i.e. using the last value less than the 95th percentile) whereas numpy is interpolating to where the 95th percentile would be. I don't quite get what algorithm they are using for that, I assume just linear interpolation?
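For reference, the numpy figures quoted above can be reproduced by hand. The following is a minimal sketch, assuming numpy's default method (linear interpolation between the two nearest order statistics); `linear_percentile` is a hypothetical helper written for illustration, not a numpy function.

```python
import numpy as np

d = [166.007971541528, 839.535219535014, 140.828458698247,
     171.769452405593, 169.714353600027, 163.837703911465,
     153.0957460413, 188.213003041855, 150.988202687771]


def linear_percentile(values, q):
    """Hypothetical helper: percentile by linear interpolation."""
    x = np.sort(np.asarray(values, dtype=float))
    pos = (len(x) - 1) * q / 100.0      # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(x) - 1)
    frac = pos - lo
    return x[lo] + frac * (x[hi] - x[lo])


p5 = linear_percentile(d, 5)
p95 = linear_percentile(d, 95)
print(p5, np.percentile(d, 5))
print(p95, np.percentile(d, 95))
```

With nine points, the 95th percentile falls at fractional index 7.6, between 188.2 and the outlier at 839.5, which is why the interpolated value lands far from any observed data point.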

@jklymak jklymak added the status: needs clarification Issues that need more information to resolve. label Jan 31, 2018
@mcspritz
Author

You are right, I get the same

0 [153.09574604 150.98820269]
1 [171.76945241 188.21300304]
{5: 144.8923562940566, 95: 579.0063329377501}

I might have changed the set of values before copying them here.

Apologies.

@jklymak jklymak added API: consistency and removed status: needs clarification Issues that need more information to resolve. labels Jan 31, 2018
@jklymak
Member

jklymak commented Jan 31, 2018

I'll mark as API consistency, but it's really consistency with numpy and/or a documentation issue (where we need to be explicit that we are not consistent w/ numpy). But I don't commonly use whisker plots, so I won't personally comment on what is the "right" thing to be doing here...

@phobson
Member

phobson commented Jan 31, 2018

This is a nuance of boxplots themselves. That nuance is that you don't show any values that you don't actually have. When you provide percentiles as the whis parameter, we compute the high/low values with numpy:

loval = np.percentile(x, whis[0])
hival = np.percentile(x, whis[1])

But then we go through the same compression process to find the most extreme data point within those ranges, e.g.,

        # get high extreme
        wiskhi = np.compress(x <= hival, x)
        if len(wiskhi) == 0 or np.max(wiskhi) < q3:
            stats['whishi'] = q3
        else:
            stats['whishi'] = np.max(wiskhi)

The point of all that is that the fences always represent an actual value in the dataset.
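Applied to the data from the report, that clamping step accounts for the whisker ends printed earlier in the thread. The following is a minimal sketch that follows the quoted snippet for the high side; the low side is assumed to mirror it, and boolean masking stands in for `np.compress`.

```python
import numpy as np

d = [166.007971541528, 839.535219535014, 140.828458698247,
     171.769452405593, 169.714353600027, 163.837703911465,
     153.0957460413, 188.213003041855, 150.988202687771]
x = np.asarray(d)

q1, q3 = np.percentile(x, [25, 75])
loval, hival = np.percentile(x, [5, 95])   # interpolated percentiles

# High whisker: largest data point not exceeding the 95th percentile.
inside_hi = x[x <= hival]
if inside_hi.size == 0 or inside_hi.max() < q3:
    whishi = q3
else:
    whishi = inside_hi.max()

# Low whisker (assumed mirror image): smallest data point not below
# the 5th percentile.
inside_lo = x[x >= loval]
if inside_lo.size == 0 or inside_lo.min() > q1:
    whislo = q1
else:
    whislo = inside_lo.min()

print(loval, hival)    # interpolated, not necessarily data points
print(whislo, whishi)  # clamped to actual data points
```

Here the interpolated 5th and 95th percentiles are roughly 144.9 and 579.0, while the clamped whiskers are the observed values 150.988... and 188.213..., matching the `get_xdata()` output above.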

I'd be willing to entertain the idea that we should special case percentile-based whiskers.

@timhoffm
Member

timhoffm commented Feb 3, 2018

I suspect that compression should not be used when using percentiles.
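Until such a special case exists, whiskers that sit exactly on the interpolated percentiles can be obtained by editing the statistics before drawing. The following is a rough sketch, not a proposed patch: it assumes `matplotlib.cbook.boxplot_stats` and `Axes.bxp` behave as documented, and the choice to recompute the fliers from the new whisker limits is an assumption of this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

d = [166.007971541528, 839.535219535014, 140.828458698247,
     171.769452405593, 169.714353600027, 163.837703911465,
     153.0957460413, 188.213003041855, 150.988202687771]
x = np.asarray(d)

# Start from the statistics Matplotlib computes (whiskers clamped to data).
stats = cbook.boxplot_stats(x, whis=[5, 95])[0]

# Overwrite the whisker ends with the interpolated percentiles.
lo, hi = np.percentile(x, [5, 95])
stats["whislo"], stats["whishi"] = lo, hi
# Assumption: treat everything outside the new whiskers as a flier.
stats["fliers"] = x[(x < lo) | (x > hi)]

fig, ax = plt.subplots()
artists = ax.bxp([stats])  # draw from precomputed statistics

# Each whisker line runs from the box edge to the whisker end.
whisker_ends = [line.get_ydata()[1] for line in artists["whiskers"]]
print(whisker_ends)
```

Whether whiskers drawn at values absent from the data are desirable is exactly the open question in this thread; the sketch only shows that the behaviour can be opted into by hand.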

Does anyone have experience with different whisker usages across fields? Or alternatively, does anyone have access to
doi:10.2307/2683468
doi:10.2307/2685173

@afvincent
Contributor

@timhoffm Well, something like Sci-Hub might be an option? I am likely to have access to those papers but I guess that it would pose some copyright issues posting them directly here (or even on the devel mailing-list) :/...

@story645
Member

story645 commented Feb 8, 2018

@timhoffm I have those papers. How do I share them?

@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Apr 30, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 30, 2023
@rcomer rcomer added the status: closed as inactive Issues closed by the "Stale" Github Action. Please comment on any you think should still be open. label May 30, 2023

7 participants