Handle nan/masked values Axes.vlines and hlines #7408

Merged
merged 3 commits into from Nov 8, 2016

Conversation

Projects
None yet
5 participants
Member

phobson commented Nov 4, 2016

Closes #7406.

Also applied fixes to Axes.hlines and wrote tests for both.

phobson added this to the 2.0.1 (next bug fix release) milestone Nov 4, 2016

phobson added the needs_review label Nov 4, 2016

@dopplershift

This looks good to me. I'm not holding it up for that line.

lib/matplotlib/axes/_axes.py
@@ -979,14 +979,14 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',
verts = [((thisxmin, thisy), (thisxmax, thisy))
for thisxmin, thisxmax, thisy in zip(xmin, xmax, y)]
- coll = mcoll.LineCollection(verts, colors=colors,
+ lines = mcoll.LineCollection(verts, colors=colors,
linestyles=linestyles, label=label)
@dopplershift

dopplershift Nov 4, 2016

Contributor

Shouldn't this line need to indent to keep things lined up (pep8 and all that...)

@NelleV

NelleV Nov 4, 2016

Contributor

nice catch!

NelleV changed the title from Use np.nanmin/max to set ax limits in Axes.vlines to [MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines Nov 4, 2016

Owner

efiring commented Nov 4, 2016

This supports nans in the input, but not masked arrays; in most cases we try to support both. If you want to handle masked arrays, you could run the arguments through a function analogous to cbook.safe_masked_invalid, but going in the opposite direction by using np.ma.filled(x, np.nan) if x is floating point.

Overall, we have a somewhat confusing mixture of handling missing points via nans (which only works for floating point, and which requires nanmax etc.) and via masked arrays, for which methods work mostly as they do on plain ndarrays.

Owner

efiring commented Nov 4, 2016

Second thought: for these functions, would it make more sense to simply filter out the bad values at the start? I don't see any point in pushing the handling of bad values down into the min and max calculations. Admittedly, I haven't looked closely.

Member

phobson commented Nov 5, 2016

@efiring Filtering was my thought when I first went into this. Let me mess around with feeding masked arrays into this function before we merge.

In either case, handling both masked arrays and sequences with invalid (np.nan) values in a simple way will take some thinking.

phobson changed the title from [MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines to [No Merge] Use np.nanmin/max to set ax limits in Axes.vlines Nov 5, 2016

Member

phobson commented Nov 5, 2016

With the current implementation in this PR, masked arrays does not work as expected.

For example

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

fig3, ax5 = plt.subplots()
x5 = np.ma.masked_equal([2, 4, 6, 8, 10, 12], 8)
ymin5 = np.ma.masked_equal([0, 1, -1, 0, 2, 1], 2)
ymax5 = np.ma.masked_equal([13, 14, 15, 16, 17, 18], 18)
ax5.vlines(x5, ymin5, ymax5, colors='k', linewidth=2)

Produces a plot with 5 lines. Only the line where x5 == 8 is omitted.

I'll get back to this later this weekend.

@phobson phobson added needs_revision and removed needs_review labels Nov 5, 2016

Owner

efiring commented Nov 5, 2016

Filtering a single array is easy: x = np.ma.compressed(cbook.safe_masked_invalid(x)).
For vlines, however, one to three arrays have to be filtered together. This is accomplished by cbook.delete_masked_points. Right after the unit conversion, insert

    x, ymin, ymax = cbook.delete_masked_points(x, ymin, ymax)

and I think that will take care of everything.

Member

phobson commented Nov 5, 2016

@efiring Great. Thanks for the tip. I'll include that and the example from my previous comment as a test.

@phobson phobson Use np.nanmin/max to set ax limits in Axes.vlines
Closes GH #7406. Also applied fixes to Axes.hlines
and wrote tests for both.
747c3cd

phobson changed the title from [No Merge] Use np.nanmin/max to set ax limits in Axes.vlines to [MRG] Use np.nanmin/max to set ax limits in Axes.vlines Nov 7, 2016

complete reworked how nan/masked values are handled. need fresh review.

complete reworked how nan/masked values are handled. need fresh review.

phobson added the needs_review label Nov 7, 2016

phobson changed the title from [MRG] Use np.nanmin/max to set ax limits in Axes.vlines to [MRG] Handle nan/masked values Axes.vlines and hlines Nov 7, 2016

@dopplershift

dopplershift requested changes Nov 7, 2016 edited

Looks good to me, but would like a clarification on the question below.

lib/matplotlib/axes/_axes.py
- minx = min(xmin.min(), xmax.min())
- maxx = max(xmin.max(), xmax.max())
+ minx = np.min([xmin.min(), xmax.min()])
+ maxx = np.max([xmin.max(), xmax.max()])
@dopplershift

dopplershift Nov 7, 2016

Contributor

What's the rationale behind the change from using the builtin min and max to numpy's? If we're talking micro-performance-optimization, the builtins will be faster.

@phobson

phobson Nov 7, 2016

Member

Originally, vlines used the builtin min and max, while hlines used a mix of builtin and numpy. I was going for consistency.

To my eyes, the numpy syntax is more familiar and it feels "safer" (not sure if that's true). I wasn't considering performance with those changes.

My understanding is also that it's faster:

x = numpy.ranndom.normal(size=3700)
%timeit x.max()

The slowest run took 53.27 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.88 µs per loop
where (2.9 * 55 ~ 160 µs)

%timeit max(x)

1000 loops, best of 3: 231 µs per loop

@dopplershift

dopplershift Nov 7, 2016 edited

Contributor

I'm referring specifically to this change:

minx = min(xmin.min(), xmax.min())

to

minx = np.min([xmin.min(), xmax.min()])

and similar for max()--not the change from min(ymin) to ymin.min(), etc. This latter change I agree with. For the former, I dislike using numpy's min just to find the minimum of two scalars. You end up creating an extra list, which then descends into numpy and creates an array. On my system, taking the minimum of two scalars takes 10us with numpy, 4us with the builtin. There's also more for my eye to scan across with how you've changed it.

But I don't want to bikeshed this to death (6us is not going to have any impact on matplotlib's slowness)--but I'd be interested if we as developers have consensus on what the convention is or should be.

@efiring

efiring Nov 7, 2016

Owner

I agree: builtin min is cleaner for comparing two scalars. As @anntzer has pointed out, however, there is a difference in the way NaN is handled. Numpy also has minimum, which can be used with two scalars and is faster than using np.min:

In [1]: import numpy as np
In [2]: min(np.nan, 1.1)
nan
In [3]: min(1.1, np.nan)
1.1
In [7]: timeit np.minimum(2.2, 1.1)
The slowest run took 16.35 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.13 µs per loop

In [8]: timeit min(2.2, 1.1)
The slowest run took 11.75 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 197 ns per loop

In [9]: timeit np.min([2.2, 1.1])
The slowest run took 260.96 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.37 µs per loop

Conclusion: if there is any possibility of nans, use np.minimum; otherwise, builtin min is fine.

@phobson

phobson Nov 7, 2016

Member

I dislike using numpy's min just to find the minimum of two scalars.

Ahh that makes sense. I'll make the change. You've convinced me :)

@phobson

phobson Nov 7, 2016

Member

@efiring since we're using e.g., cbook.delete_masked_points(x, ymin, ymax) I think the possibility for NaNs has been removed.

@phobson phobson Use builtin min/max in hlines/vlines
This is a little simpler since we're finding the min
or max of two scalars (avoids an intermediate list +
diving into numpy).

Also synced up the decorator between the two functions.
61c214e

efiring removed the needs_revision label Nov 7, 2016

Contributor

dopplershift commented Nov 7, 2016

Test failure is a weird linkchecker failure--seems unrelated.

Member

QuLogic commented Nov 7, 2016

Link checker failed because the documentation does not exist; the failure is here and appears to be related to this PR.

Contributor

dopplershift commented Nov 7, 2016

Wow, not sure how I missed that.

Member

QuLogic commented Nov 7, 2016

We should probably change Travis to not run link checker when the doc build fails.

@phobson phobson remove masked/invalid points after forcing to iterable
1b3fb65
Member

phobson commented Nov 8, 2016

That failure is really confusing to me because the tests pass and the docs build on my local branch.

Nonetheless, I think the latest commit (deleting masked/invalid points after we make sure the first arg is an iterable) should do the trick.

Contributor

dopplershift commented Nov 8, 2016

All green--I think we're good here.

@efiring efiring merged commit d7b8992 into matplotlib:master Nov 8, 2016

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.003%) to 61.832%
Details
Contributor

dopplershift commented Nov 8, 2016

If this is actually intended for 2.0, it will need to be cherry-picked.

@efiring efiring added a commit that referenced this pull request Nov 8, 2016

@efiring efiring Merge pull request #7408 from phobson/vlines-with-nan
[MRG] Handle nan/masked values Axes.vlines and hlines
Conflicts:
	lib/matplotlib/axes/_axes.py
We keep the v2.x version on 2 lines: use unpack_labeled_data, instead of
_preprocess_data, which is only in master.
ad192f0
Owner

efiring commented Nov 8, 2016

back-ported to v2.x as ad192f0

QuLogic removed the needs_review label Nov 8, 2016

phobson deleted the phobson:vlines-with-nan branch Nov 8, 2016

phobson changed the title from [MRG] Handle nan/masked values Axes.vlines and hlines to Handle nan/masked values Axes.vlines and hlines Nov 8, 2016

Member

phobson commented Nov 8, 2016

thanks for the guidance and merging everyone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment