Boxplot stats w/ equal quartiles #5343

Merged
merged 15 commits into from Mar 14, 2016

6 participants
Member

phobson commented Oct 29, 2015

See #5331

Addresses the concern raised in the issue above and cleans up the docstring.

The previous behavior is still available through the new `autorange` kwarg (`autorange=True`).
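Here is a minimal pure-Python sketch of the edge case this PR targets. It assumes NumPy-style linear-interpolation percentiles. `percentile` and `whisker_ends` are hypothetical helpers, not matplotlib's implementation, and the sketch leaves out the percentile-list form of `whis`. When the 25th and 75th percentiles coincide, the default (`autorange=False`) keeps the whiskers on the box and treats every other point as a flier. With `autorange=True`, it falls back to `whis='range'`.

```python
def percentile(sorted_x, p):
    """Linear-interpolation percentile (NumPy's default), p in [0, 100]."""
    pos = (len(sorted_x) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] + (sorted_x[hi] - sorted_x[lo]) * frac


def whisker_ends(data, whis=1.5, autorange=False):
    """Return (whislo, whishi, fliers) for one dataset (hypothetical helper)."""
    x = sorted(data)
    q1, q3 = percentile(x, 25), percentile(x, 75)
    iqr = q3 - q1
    # The edge case from this PR: equal quartiles plus autorange=True
    # falls back to whis='range' (whiskers at the minimum and maximum).
    if iqr == 0 and autorange:
        whis = 'range'
    if whis == 'range':
        loval, hival = x[0], x[-1]
    else:
        loval, hival = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers reach the most extreme points inside the limits,
    # but never retreat inside the box.
    inside_hi = [v for v in x if v <= hival]
    inside_lo = [v for v in x if v >= loval]
    whishi = max(max(inside_hi), q3) if inside_hi else q3
    whislo = min(min(inside_lo), q1) if inside_lo else q1
    fliers = [v for v in x if v < whislo or v > whishi]
    return whislo, whishi, fliers


data = [1, 2, 2, 2, 2, 2, 3]
print(whisker_ends(data))                  # (2, 2, [1, 3])
print(whisker_ends(data, autorange=True))  # (1, 3, [])
```

For data with a non-zero IQR, `autorange` has no effect. The flag only changes the degenerate case.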

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
@@ -1867,39 +1867,41 @@ def delete_masked_points(*args):
return margs
-def boxplot_stats(X, whis=1.5, bootstrap=None, labels=None):
- '''
- Returns list of dictionaries of staticists to be use to draw a series of
- box and whisker plots. See the `Returns` section below to the required
- keys of the dictionary. Users can skip this function and pass a user-
- defined set of dictionaries to the new `axes.bxp` method instead of
- relying on MPL to do the calcs.
+def boxplot_stats(X, whis=1.5, autorange=False, bootstrap=None,
+ labels=None):
+ """
+ Returns list of dictionaries of staticists to be use to draw a
@mdboom

mdboom Oct 29, 2015

Owner

sp. statistics

"to be use" -> "to use"

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
@@ -1867,39 +1867,41 @@ def delete_masked_points(*args):
return margs
-def boxplot_stats(X, whis=1.5, bootstrap=None, labels=None):
- '''
- Returns list of dictionaries of staticists to be use to draw a series of
- box and whisker plots. See the `Returns` section below to the required
- keys of the dictionary. Users can skip this function and pass a user-
- defined set of dictionaries to the new `axes.bxp` method instead of
- relying on MPL to do the calcs.
+def boxplot_stats(X, whis=1.5, autorange=False, bootstrap=None,
+ labels=None):
+ """
+ Returns list of dictionaries of staticists to be use to draw a
+ series of box and whisker plots. See the `Returns` section below to
+ the required keys of the dictionary. Users can skip this function
+ and pass a user-defined set of dictionaries to the new `axes.bxp`
+ method instead of relying on MPL to do the calcs.
@mdboom

mdboom Oct 29, 2015

Owner

calcs -> calculations

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
- `whis` will be automatically set to 'range'
-
- bootstrap : int or None (default)
- Number of times the confidence intervals around the median should
- be bootstrapped (percentile method).
-
- labels : sequence
- Labels for each dataset. Length must be compatible with dimensions
- of `X`
+ As a float, determines the reach of the whiskers past the first
+ and third quartiles (e.g., Q3 + whis*IQR, QR = interquartile
+ range, Q3-Q1). Beyond the whiskers, data are considered outliers
+ and are plotted as individual points. This can be set this to an
+ ascending sequence of percentile (e.g., [5, 95]) to set the
+ whiskers at specific percentiles of the data. Finally, `whis`
+ can be the string 'range' to force the whiskers to the min and
@mdboom

mdboom Oct 29, 2015

Owner
``'range'``
@mdboom

mdboom Oct 29, 2015

Owner

min -> minimum
max -> maximum

@mdboom mdboom and 1 other commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
- Number of times the confidence intervals around the median should
- be bootstrapped (percentile method).
-
- labels : sequence
- Labels for each dataset. Length must be compatible with dimensions
- of `X`
+ As a float, determines the reach of the whiskers past the first
+ and third quartiles (e.g., Q3 + whis*IQR, QR = interquartile
+ range, Q3-Q1). Beyond the whiskers, data are considered outliers
+ and are plotted as individual points. This can be set this to an
+ ascending sequence of percentile (e.g., [5, 95]) to set the
+ whiskers at specific percentiles of the data. Finally, `whis`
+ can be the string 'range' to force the whiskers to the min and
+ max of the data. In the edge case that the 25th and 75th
+ percentiles are equivalent, `whis` can be automatically set to
+ 'range' via the ``autorange`` option.
@mdboom

mdboom Oct 29, 2015

Owner

Double ticks around 'range'

Single ticks around autorange

@phobson

phobson Oct 29, 2015

Member

so that's

def boxplot_stats(...):
""" ...
``'range'`` via the `autorange` option.
...
... """

?

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
- be bootstrapped (percentile method).
-
- labels : sequence
- Labels for each dataset. Length must be compatible with dimensions
- of `X`
+ As a float, determines the reach of the whiskers past the first
+ and third quartiles (e.g., Q3 + whis*IQR, QR = interquartile
+ range, Q3-Q1). Beyond the whiskers, data are considered outliers
+ and are plotted as individual points. This can be set this to an
+ ascending sequence of percentile (e.g., [5, 95]) to set the
+ whiskers at specific percentiles of the data. Finally, `whis`
+ can be the string 'range' to force the whiskers to the min and
+ max of the data. In the edge case that the 25th and 75th
+ percentiles are equivalent, `whis` can be automatically set to
+ 'range' via the ``autorange`` option.
+ autorange : bool (default = False)
@mdboom

mdboom Oct 29, 2015

Owner

bool, optional (False)

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
-
- labels : sequence
- Labels for each dataset. Length must be compatible with dimensions
- of `X`
+ As a float, determines the reach of the whiskers past the first
+ and third quartiles (e.g., Q3 + whis*IQR, QR = interquartile
+ range, Q3-Q1). Beyond the whiskers, data are considered outliers
+ and are plotted as individual points. This can be set this to an
+ ascending sequence of percentile (e.g., [5, 95]) to set the
+ whiskers at specific percentiles of the data. Finally, `whis`
+ can be the string 'range' to force the whiskers to the min and
+ max of the data. In the edge case that the 25th and 75th
+ percentiles are equivalent, `whis` can be automatically set to
+ 'range' via the ``autorange`` option.
+ autorange : bool (default = False)
+ When True and the data are distributed such that the 25th and
@mdboom

mdboom Oct 29, 2015

Owner

"When True "

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
- labels : sequence
- Labels for each dataset. Length must be compatible with dimensions
- of `X`
+ As a float, determines the reach of the whiskers past the first
+ and third quartiles (e.g., Q3 + whis*IQR, QR = interquartile
+ range, Q3-Q1). Beyond the whiskers, data are considered outliers
+ and are plotted as individual points. This can be set this to an
+ ascending sequence of percentile (e.g., [5, 95]) to set the
+ whiskers at specific percentiles of the data. Finally, `whis`
+ can be the string 'range' to force the whiskers to the min and
+ max of the data. In the edge case that the 25th and 75th
+ percentiles are equivalent, `whis` can be automatically set to
+ 'range' via the ``autorange`` option.
+ autorange : bool (default = False)
+ When True and the data are distributed such that the 25th and
+ 75th percentiles are equal, ``whis`` is set to "range" such that
@mdboom

mdboom Oct 29, 2015

Owner
``'range'``

@mdboom mdboom commented on an outdated diff Oct 29, 2015

lib/matplotlib/cbook.py
- Labels for each dataset. Length must be compatible with dimensions
- of `X`
+ As a float, determines the reach of the whiskers past the first
+ and third quartiles (e.g., Q3 + whis*IQR, QR = interquartile
+ range, Q3-Q1). Beyond the whiskers, data are considered outliers
+ and are plotted as individual points. This can be set this to an
+ ascending sequence of percentile (e.g., [5, 95]) to set the
+ whiskers at specific percentiles of the data. Finally, `whis`
+ can be the string 'range' to force the whiskers to the min and
+ max of the data. In the edge case that the 25th and 75th
+ percentiles are equivalent, `whis` can be automatically set to
+ 'range' via the ``autorange`` option.
+ autorange : bool (default = False)
+ When True and the data are distributed such that the 25th and
+ 75th percentiles are equal, ``whis`` is set to "range" such that
+ the whisker ends are at the min and max of the data.
@mdboom

mdboom Oct 29, 2015

Owner

min -> minimum
max -> maximum

Owner

mdboom commented Oct 29, 2015

Cool. Much improved. 👍

Member

phobson commented Oct 29, 2015

@mdboom thanks for the comments. I think I got them all as they came in 😄

...aaaaaand here's the part where I pitch the idea that we add in an option to pass your own bootstrapper.

Something like:

def boxplot_stats(..., bootstrap_fxn=None):
    if bootstrap_fxn is None:
        bootstrap_fxn = _bootstrap_median

    # ...
    def _compute_conf_interval(data, med, iqr, bootstrap):
        if bootstrap is not None:
            # Do a bootstrap estimate of notch locations:
            # get conf. intervals around the median.
            CI = bootstrap_fxn(data, N=bootstrap)
            notch_min = CI[0]
            notch_max = CI[1]
        else:
            N = len(data)
            notch_min = med - 1.57 * iqr / np.sqrt(N)
            notch_max = med + 1.57 * iqr / np.sqrt(N)

        return notch_min, notch_max

    # yada yada
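Here is a runnable sketch of the pitch above. It assumes the hook is a callable that takes the data and a bootstrap count `N` and returns a `(low, high)` pair. `bootstrap_median` and `compute_conf_interval` are hypothetical stand-ins; the comment only names `_bootstrap_median`. The default bootstrapper is a plain percentile-method estimate. Without a bootstrap count, the sketch falls back to the same asymptotic notch formula, median ± 1.57·IQR/√N.

```python
import math
import random
import statistics


def bootstrap_median(data, N=5000, seed=None):
    """Percentile-method 95% CI of the median from N resamples (hypothetical)."""
    rng = random.Random(seed)
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(N)
    )
    # Take (roughly) the 2.5th and 97.5th percentiles of the resampled medians.
    lo = medians[int(0.025 * (N - 1))]
    hi = medians[int(math.ceil(0.975 * (N - 1)))]
    return lo, hi


def compute_conf_interval(data, med, iqr, bootstrap, bootstrap_fxn=None):
    """Notch limits around the median; bootstrap_fxn is the proposed hook."""
    if bootstrap_fxn is None:
        bootstrap_fxn = bootstrap_median
    if bootstrap is not None:
        # User-supplied (or default) bootstrapper returns (low, high).
        notch_min, notch_max = bootstrap_fxn(data, N=bootstrap)
    else:
        # Gaussian-based asymptotic approximation.
        half_width = 1.57 * iqr / math.sqrt(len(data))
        notch_min, notch_max = med - half_width, med + half_width
    return notch_min, notch_max


# A custom bootstrapper only has to honor the (data, N) -> (low, high) contract:
print(compute_conf_interval([5, 1, 9], 5, 4, 100,
                            bootstrap_fxn=lambda d, N: (min(d), max(d))))
```

Keeping the contract to a single callable means the asymptotic branch is untouched. Users who want a different resampling scheme would not need a new kwarg for each variant.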
Owner

tacaswell commented Oct 29, 2015

I am starting to think that box plots are a complicated enough topic that they should be spun off into a sub-project (which is allowed to do things like require pandas and scipy). Probably take violin plots with them too.

Member

QuLogic commented Oct 29, 2015

Isn't that essentially seaborn?

Member

WeatherGod commented Oct 29, 2015

We would also still want basic boxplot/violinplot functionality for those who do not need to delve into pandas or seaborn.

On Wed, Oct 28, 2015 at 10:28 PM, Elliott Sales de Andrade <
notifications@github.com> wrote:

Isn't that essentially seaborn?



Member

phobson commented Nov 20, 2015

rebased with current master (branch had gotten stale)

@tacaswell tacaswell commented on an outdated diff Nov 24, 2015

lib/matplotlib/axes/_axes.py
If the function should adjust the xlim and xtick locations.
+ autorange : bool, optional (False)
+ When `True` and the data are distributed such that the 25th and
+ 75th percentiles are equal, ``whis`` is set to ``'range'`` such
+ that the whisker ends are at the minimum and maximum of the
+ data.
+ meanline : bool, optional (False)
@tacaswell

tacaswell Nov 24, 2015

Owner

the meanline docs should move back up to be in the right order to match the signature.

@tacaswell

tacaswell Nov 24, 2015

Owner

eh, maybe not, I can go either way on this on further consideration.

Owner

tacaswell commented Jan 26, 2016

tagged this as a bug fix, but I am not sure if that is the correct tag for this.

Member

phobson commented Jan 26, 2016

@tacaswell should I rebase on 1.5.X or 2?

Owner

tacaswell commented Jan 26, 2016

Onto master; then we will backport the merge to wherever we decide this will go.

Member

QuLogic commented Jan 26, 2016

It looks like you committed the conflicts in the SVG files:

.../result_images/test_axes/boxplot_autorange_false_whiskers-expected.svg:470: parser error : StartTag: invalid element name
<<<<<<< HEAD
    ^
Member

phobson commented Jan 26, 2016

gah -- thanks for the heads up. @QuLogic

Member

phobson commented Jan 26, 2016

Any objections to removing the PDF/SVG files for this test entirely?

Owner

tacaswell commented Jan 27, 2016

No objection from me, makes the tests go faster :)

Member

phobson commented Jan 27, 2016

unrelated failure in py3.4

======================================================================
FAIL: matplotlib.tests.test_mathtext.test_mathfont_dejavuserif_02.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/matplotlib/matplotlib/venv/lib/python3.4/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/testing/decorators.py", line 55, in failer
    result = f(*args, **kwargs)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/testing/decorators.py", line 259, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /home/travis/build/matplotlib/matplotlib/result_images/test_mathtext/mathfont_dejavuserif_02.png vs. /home/travis/build/matplotlib/matplotlib/result_images/test_mathtext/mathfont_dejavuserif_02-expected.png (RMS 29.168)
----------------------------------------------------------------------

phobson closed this Jan 27, 2016

phobson reopened this Jan 27, 2016

@mdboom mdboom added needs_review and removed needs_review labels Jan 27, 2016

Member

phobson commented Feb 18, 2016

... another rebase with current master

@QuLogic QuLogic commented on an outdated diff Feb 18, 2016

lib/matplotlib/axes/_axes.py
If the function should adjust the xlim and xtick locations.
+ autorange : bool, optional (False)
+ When `True` and the data are distributed such that the 25th and
+ 75th percentiles are equal, ``whis`` is set to ``'range'`` such
+ that the whisker ends are at the minimum and maximum of the
+ data.
+ meanline : bool, optional (False)
+ If `True` (and ``showmeans`` is `True`), will try to render
+ the mean as a line spanning the full width of the box
+ according to ``meanprops`` (see below). Not recommended if
+ ``shownotches`` is also True. Otherwise, means will be shown
+ as points.
+
+ Additional Options
+ ---------------------
+ The following boolean options toogle the drawing of individual
@QuLogic

QuLogic Feb 18, 2016

Member

toogle -> toggle

@QuLogic QuLogic commented on an outdated diff Feb 18, 2016

lib/matplotlib/axes/_axes.py
+ data.
+ meanline : bool, optional (False)
+ If `True` (and ``showmeans`` is `True`), will try to render
+ the mean as a line spanning the full width of the box
+ according to ``meanprops`` (see below). Not recommended if
+ ``shownotches`` is also True. Otherwise, means will be shown
+ as points.
+
+ Additional Options
+ ---------------------
+ The following boolean options toogle the drawing of individual
+ components of the boxplots:
+ - showcaps: the caps on the ends of whiskers
+ (default is True)
+ - showbox: the central box (default is True)
+ - showfliers: the outlierd beyone the caps (default is True)
@QuLogic

QuLogic Feb 18, 2016

Member

outlierd -> outliers
beyone -> beyond

@QuLogic QuLogic commented on an outdated diff Feb 18, 2016

lib/matplotlib/cbook.py
@@ -1760,39 +1760,42 @@ def delete_masked_points(*args):
return margs
-def boxplot_stats(X, whis=1.5, bootstrap=None, labels=None):
- '''
- Returns list of dictionaries of staticists to be use to draw a series of
- box and whisker plots. See the `Returns` section below to the required
- keys of the dictionary. Users can skip this function and pass a user-
- defined set of dictionaries to the new `axes.bxp` method instead of
- relying on MPL to do the calcs.
+def boxplot_stats(X, whis=1.5, autorange=False, bootstrap=None,
+ labels=None):
+ """
+ Returns list of dictionaries of statistics used to draw a series
+ of box and whisker plots. See the `Returns` section below to the
@QuLogic

QuLogic Feb 18, 2016

Member

to -> for (I think; the sentence is unclear.)

@QuLogic QuLogic commented on an outdated diff Feb 18, 2016

lib/matplotlib/axes/_axes.py
+ that the 25th and 75th percentiles are equivalent, *whis*
+ will be automatically set to ``'range'``.
+ bootstrap : int, optional
+ Specifies whether to bootstrap the confidence intervals
+ around the median for notched boxplots. If bootstrap==None,
+ no bootstrapping is performed, and notches are calculated
+ using a Gaussian-based asymptotic approximation (see McGill,
+ R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and
+ Stuart, 1967). Otherwise, bootstrap specifies the number of
+ times to bootstrap the median to determine its 95%
+ confidence intervals. Values between 1000 and 10000 are
+ recommended.
+ usermedians : array-like, optional
+ An array or sequence whose first dimension (or length) is
+ compatible with ``x``. This overrides the medians computed
+ by matplotlib for each element of *usermedians* that is not
@QuLogic

QuLogic Feb 18, 2016

Member

Missed changing usermedians from asterisks to backticks (and on the line below).

@QuLogic QuLogic commented on an outdated diff Feb 18, 2016

lib/matplotlib/axes/_axes.py
+ will be automatically set to ``'range'``.
+ bootstrap : int, optional
+ Specifies whether to bootstrap the confidence intervals
+ around the median for notched boxplots. If bootstrap==None,
+ no bootstrapping is performed, and notches are calculated
+ using a Gaussian-based asymptotic approximation (see McGill,
+ R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and
+ Stuart, 1967). Otherwise, bootstrap specifies the number of
+ times to bootstrap the median to determine its 95%
+ confidence intervals. Values between 1000 and 10000 are
+ recommended.
+ usermedians : array-like, optional
+ An array or sequence whose first dimension (or length) is
+ compatible with ``x``. This overrides the medians computed
+ by matplotlib for each element of *usermedians* that is not
+ `None`. When an element of *usermedians* == None, the median
@QuLogic

QuLogic Feb 18, 2016

Member

== -> is

@QuLogic QuLogic commented on an outdated diff Feb 18, 2016

lib/matplotlib/axes/_axes.py
+ using a Gaussian-based asymptotic approximation (see McGill,
+ R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and
+ Stuart, 1967). Otherwise, bootstrap specifies the number of
+ times to bootstrap the median to determine its 95%
+ confidence intervals. Values between 1000 and 10000 are
+ recommended.
+ usermedians : array-like, optional
+ An array or sequence whose first dimension (or length) is
+ compatible with ``x``. This overrides the medians computed
+ by matplotlib for each element of *usermedians* that is not
+ `None`. When an element of *usermedians* == None, the median
+ will be computed by matplotlib as normal.
+ conf_intervals : array-like, optional
+ Array or sequence whose first dimension (or length) is
+ compatible with ``x`` and whose second dimension is 2. When
+ the current element of ``conf_intervals`` is not None, the
@QuLogic

QuLogic Feb 18, 2016

Member

What does "current" mean here? I'm guessing "corresponding to the same location in x" is the meaning intended.
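This sketch shows one reading of the `usermedians`/`conf_intervals` docstrings. It assumes "current" means "the element at the same position as the dataset in `x`". Under that reading, each non-`None` entry replaces the computed value for the corresponding dataset, and `None` keeps the computed value. `apply_overrides` is hypothetical, and the `med`/`cilo`/`cihi` keys are an assumed layout for the per-dataset statistics.

```python
def apply_overrides(stats, usermedians=None, conf_intervals=None):
    """Overwrite per-dataset medians and notch limits where the user gave one.

    stats is a list of dicts, one per dataset in x (assumed layout).
    """
    if usermedians is not None:
        if len(usermedians) != len(stats):
            raise ValueError("usermedians length not compatible with x")
        for s, med in zip(stats, usermedians):
            if med is not None:  # None means "keep the computed median"
                s["med"] = med
    if conf_intervals is not None:
        if len(conf_intervals) != len(stats):
            raise ValueError("conf_intervals length not compatible with x")
        for s, ci in zip(stats, conf_intervals):
            if ci is not None:
                if len(ci) != 2:
                    raise ValueError("each confidence interval needs 2 values")
                s["cilo"], s["cihi"] = ci
    return stats


stats = [{"med": 3.0, "cilo": 2.0, "cihi": 4.0},
         {"med": 7.0, "cilo": 6.0, "cihi": 8.0}]
apply_overrides(stats, usermedians=[None, 6.5],
                conf_intervals=[(1.5, 4.5), None])
# stats[0] keeps its median but takes the user interval (1.5, 4.5);
# stats[1] takes the user median 6.5 and keeps its computed interval.
```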

@phobson phobson TST: update boxplot-autorange test images
e4634bc
Member

phobson commented Feb 18, 2016

Bump -- just gave this another rebase to keep it current with master

@phobson phobson DOC: boxplot typos and clarification
14192ca
Member

phobson commented Feb 18, 2016

Build failures are unrelated. Something's wacky with colorbars, e.g.,
FAIL: matplotlib.tests.test_colorbar.test_colorbar_closed_patch.test

Member

WeatherGod commented Feb 18, 2016

Yes, this came about due to a merge of a different PR last night, I think.

On Thu, Feb 18, 2016 at 12:16 PM, Paul Hobson notifications@github.com
wrote:

Build failures are unrelated. Something's wacky with colorbars, e.g.,
FAIL: matplotlib.tests.test_colorbar.test_colorbar_closed_patch.test



tacaswell closed this Feb 18, 2016

tacaswell reopened this Feb 18, 2016

@tacaswell tacaswell added needs_review and removed needs_review labels Feb 18, 2016

Owner

tacaswell commented Feb 18, 2016

Sorry, I broke all the branches: I merged a PR that passed before we put the zero-tolerance in. There were some regions of those images where the 8-bit blue value changed by 1.

@phobson phobson DOC: add whitespace around params
c887ad7
Owner

jenshnielsen commented Feb 19, 2016

The failure on appveyor is

FAIL: Github issue #1256 identified a bug in Line.draw method 
---------------------------------------------------------------------- 
Traceback (most recent call last):
  File "C:\conda\envs\test-environment\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "c:\projects\matplotlib\lib\matplotlib\testing\decorators.py", line 152, in wrapped_callable
    func(*args, **kwargs)
  File "c:\projects\matplotlib\lib\matplotlib\tests\test_lines.py", line 61, in test_invisible_Line_rendering
    assert_true(slowdown_factor < slowdown_threshold)
AssertionError: False is not true 

which is known to be flaky

Member

phobson commented Mar 8, 2016

Bump -- give me a shout if y'all want any changes made to this.

@tacaswell tacaswell commented on an outdated diff Mar 9, 2016

lib/matplotlib/cbook.py
@@ -1760,39 +1760,42 @@ def delete_masked_points(*args):
return margs
-def boxplot_stats(X, whis=1.5, bootstrap=None, labels=None):
- '''
- Returns list of dictionaries of staticists to be use to draw a series of
- box and whisker plots. See the `Returns` section below to the required
- keys of the dictionary. Users can skip this function and pass a user-
- defined set of dictionaries to the new `axes.bxp` method instead of
- relying on MPL to do the calcs.
+def boxplot_stats(X, whis=1.5, autorange=False, bootstrap=None,
@tacaswell

tacaswell Mar 9, 2016

Owner

Is this considered a public function? If so the new arg needs to go last.
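The following sketch shows why the position matters, using stub functions rather than matplotlib's code. If the new argument were inserted before `bootstrap`, any caller that passed `bootstrap` positionally against the old signature would silently have that value bound to `autorange`. Appending the new keyword last, as the later "moved new autorange opt to the end of fxn sig" commit does, keeps old positional calls meaning what they meant.

```python
def old_sig(X, whis=1.5, bootstrap=None, labels=None):
    return {"whis": whis, "bootstrap": bootstrap, "labels": labels}


def inserted_sig(X, whis=1.5, autorange=False, bootstrap=None, labels=None):
    return {"whis": whis, "autorange": autorange, "bootstrap": bootstrap,
            "labels": labels}


def appended_sig(X, whis=1.5, bootstrap=None, labels=None, autorange=False):
    return {"whis": whis, "autorange": autorange, "bootstrap": bootstrap,
            "labels": labels}


data = [[1, 2, 3]]
# An existing caller written against the old signature:
print(old_sig(data, 1.5, 1000)["bootstrap"])       # 1000
# Inserting autorange third steals the positional 1000:
print(inserted_sig(data, 1.5, 1000)["bootstrap"])  # None
print(inserted_sig(data, 1.5, 1000)["autorange"])  # 1000
# Appending it keeps the old call's meaning:
print(appended_sig(data, 1.5, 1000)["bootstrap"])  # 1000
```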

Owner

tacaswell commented Mar 9, 2016

Read through this, other than my one comment 👍.

The docstrings are much better.

Can this get a note in api_changes? To check my understanding:

  • this is a minor API change (i.e., the plot will be different for the same inputs), but
    1. only in a semi-pathological edge case
    2. the change makes the edge case more consistent with the normal case
    3. the old behavior is considered confusing by those skilled in the art of boxplots
    4. there is a way to get the old plots back
Member

phobson commented Mar 13, 2016

@tacaswell your understanding matches mine.

and to make sure I'm clear -- I don't modify api_changes.rst directly, I just add a short, single file: api_changes/2016-03-12-PMH.rst?

Owner

tacaswell commented Mar 13, 2016

Yes. The individual files help prevent thrashing from rebases forced by doc conflicts.

On Sun, Mar 13, 2016 at 6:41 PM Paul Hobson notifications@github.com
wrote:

@tacaswell https://github.com/tacaswell your understanding matches mine.

and to make sure I'm clear -- I don't modify api_changes.rst directly, I
just add a short, single file: api_changes/2016-03-12-PMH.rst?



phobson added some commits Mar 13, 2016

@phobson phobson API: moved new autorange opt to the end of fxn sig a99c023
@phobson phobson DOC: add api-change file describing new autorange behavior
6fe8a72

@jenshnielsen jenshnielsen added a commit that referenced this pull request Mar 14, 2016

@jenshnielsen jenshnielsen Merge pull request #5343 from phobson/bxp-equal-quartiles
Boxplot stats w/ equal quartiles
455cb92

@jenshnielsen jenshnielsen merged commit 455cb92 into matplotlib:master Mar 14, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

mdboom removed the needs_review label Mar 14, 2016

@jenshnielsen jenshnielsen added a commit to jenshnielsen/matplotlib that referenced this pull request Mar 14, 2016

@jenshnielsen @jenshnielsen jenshnielsen + jenshnielsen Merge pull request #5343 from phobson/bxp-equal-quartiles
Boxplot stats w/ equal quartiles
54ec43f
Owner

jenshnielsen commented Mar 14, 2016

Backport wasn't clean (conflict in a removed SVG file), so doing it via #6153

@jenshnielsen jenshnielsen added a commit that referenced this pull request Mar 14, 2016

@jenshnielsen jenshnielsen Merge pull request #6153 from jenshnielsen/backport_5343
Backport #5343 from phobson/bxp-equal-quartiles
7266b4f

phobson deleted the phobson:bxp-equal-quartiles branch Mar 14, 2016

Member

phobson commented Mar 14, 2016

thanks for the merge and help, everyone!

@tacaswell tacaswell added a commit to tacaswell/matplotlib that referenced this pull request May 22, 2016

@jenshnielsen @tacaswell jenshnielsen + tacaswell Merge pull request #5343 from phobson/bxp-equal-quartiles
Boxplot stats w/ equal quartiles
abe9561