Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle nan/masked values Axes.vlines and hlines #7408

Merged
merged 3 commits into from
Nov 8, 2016

Conversation

phobson
Copy link
Member

@phobson phobson commented Nov 4, 2016

Closes #7406.

Also applied fixes to Axes.hlines and wrote tests for both.

@phobson phobson added this to the 2.0.1 (next bug fix release) milestone Nov 4, 2016
dopplershift
dopplershift previously approved these changes Nov 4, 2016
Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'm not holding it up for that line.

@@ -979,14 +979,14 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',

verts = [((thisxmin, thisy), (thisxmax, thisy))
for thisxmin, thisxmax, thisy in zip(xmin, xmax, y)]
coll = mcoll.LineCollection(verts, colors=colors,
lines = mcoll.LineCollection(verts, colors=colors,
linestyles=linestyles, label=label)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this line need to indent to keep things lined up (pep8 and all that...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

@NelleV NelleV changed the title Use np.nanmin/max to set ax limits in Axes.vlines [MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines Nov 4, 2016
NelleV
NelleV previously approved these changes Nov 4, 2016
Copy link
Member

@NelleV NelleV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phobson !

@efiring
Copy link
Member

efiring commented Nov 4, 2016

This supports nans in the input, but not masked arrays; in most cases we try to support both. If you want to handle masked arrays, you could run the arguments through a function analogous to cbook.safe_masked_invalid, but going in the opposite direction by using np.ma.filled(x, np.nan) if x is floating point.

Overall, we have a somewhat confusing mixture of handling missing points via nans (which only works for floating point, and which requires nanmax etc.) and via masked arrays, for which methods work mostly as they do on plain ndarrays.

@efiring
Copy link
Member

efiring commented Nov 4, 2016

Second thought: for these functions, would it make more sense to simply filter out the bad values at the start? I don't see any point in pushing the handling of bad values down into the min and max calculations. Admittedly, I haven't looked closely.

@phobson
Copy link
Member Author

phobson commented Nov 5, 2016

@efiring Filtering was my thought when I first went into this. Let me mess around with feeding masked arrays into this function before we merge.

In either case, handling both masked arrays and sequences with invalid (np.nan) values in a simple way will take some thinking.

@phobson phobson changed the title [MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines [No Merge] Use np.nanmin/max to set ax limits in Axes.vlines Nov 5, 2016
@phobson
Copy link
Member Author

phobson commented Nov 5, 2016

With the current implementation in this PR, masked arrays does not work as expected.

For example

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

fig3, ax5 = plt.subplots()
x5 = np.ma.masked_equal([2, 4, 6, 8, 10, 12], 8)
ymin5 = np.ma.masked_equal([0, 1, -1, 0, 2, 1], 2)
ymax5 = np.ma.masked_equal([13, 14, 15, 16, 17, 18], 18)
ax5.vlines(x5, ymin5, ymax5, colors='k', linewidth=2)

Produces a plot with 5 lines. Only the line where x5 == 8 is omitted.

I'll get back to this later this weekend.

@efiring
Copy link
Member

efiring commented Nov 5, 2016

Filtering a single array is easy: x = np.ma.compressed(cbook.safe_masked_invalid(x)).
For vlines, however, one to three arrays have to be filtered together. This is accomplished by cbook.delete_masked_points. Right after the unit conversion, insert

    x, ymin, ymax = cbook.delete_masked_points(x, ymin, ymax)

and I think that will take care of everything.

@phobson
Copy link
Member Author

phobson commented Nov 5, 2016

@efiring Great. Thanks for the tip. I'll include that and the example from my previous comment as a test.

Closes GH matplotlib#7406. Also applied fixes to Axes.hlines
and wrote tests for both.
@phobson phobson changed the title [No Merge] Use np.nanmin/max to set ax limits in Axes.vlines [MRG] Use np.nanmin/max to set ax limits in Axes.vlines Nov 7, 2016
@phobson phobson dismissed stale reviews from dopplershift and NelleV November 7, 2016 16:10

complete reworked how nan/masked values are handled. need fresh review.

@phobson phobson changed the title [MRG] Use np.nanmin/max to set ax limits in Axes.vlines [MRG] Handle nan/masked values Axes.vlines and hlines Nov 7, 2016
Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but would like a clarification on the question below.

minx = min(xmin.min(), xmax.min())
maxx = max(xmin.max(), xmax.max())
minx = np.min([xmin.min(), xmax.min()])
maxx = np.max([xmin.max(), xmax.max()])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale behind the change from using the builtin min and max to numpy's? If we're talking micro-performance-optimization, the builtins will be faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, vlines used the builtin min and max, while hlines used a mix of builtin and numpy. I was going for consistency.

To my eyes, the numpy syntax is more familiar and it feels "safer" (not sure if that's true). I wasn't considering performance with those changes.

My understanding is also that it's faster:

x = numpy.ranndom.normal(size=3700)
%timeit x.max()

The slowest run took 53.27 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.88 µs per loop
where (2.9 * 55 ~ 160 µs)

%timeit max(x)

1000 loops, best of 3: 231 µs per loop

Copy link
Contributor

@dopplershift dopplershift Nov 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring specifically to this change:

minx = min(xmin.min(), xmax.min())

to

minx = np.min([xmin.min(), xmax.min()])

and similar for max()--not the change from min(ymin) to ymin.min(), etc. This latter change I agree with. For the former, I dislike using numpy's min just to find the minimum of two scalars. You end up creating an extra list, which then descends into numpy and creates an array. On my system, taking the minimum of two scalars takes 10us with numpy, 4us with the builtin. There's also more for my eye to scan across with how you've changed it.

But I don't want to bikeshed this to death (6us is not going to have any impact on matplotlib's slowness)--but I'd be interested if we as developers have consensus on what the convention is or should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: builtin min is cleaner for comparing two scalars. As @anntzer has pointed out, however, there is a difference in the way NaN is handled. Numpy also has minimum, which can be used with two scalars and is faster than using np.min:

In [1]: import numpy as np
In [2]: min(np.nan, 1.1)
nan
In [3]: min(1.1, np.nan)
1.1
In [7]: timeit np.minimum(2.2, 1.1)
The slowest run took 16.35 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.13 µs per loop

In [8]: timeit min(2.2, 1.1)
The slowest run took 11.75 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 197 ns per loop

In [9]: timeit np.min([2.2, 1.1])
The slowest run took 260.96 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.37 µs per loop

Conclusion: if there is any possibility of nans, use np.minimum; otherwise, builtin min is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike using numpy's min just to find the minimum of two scalars.

Ahh that makes sense. I'll make the change. You've convinced me :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiring since we're using e.g., cbook.delete_masked_points(x, ymin, ymax) I think the possibility for NaNs has been removed.

This is a little simpler since we're finding the min
or max of two scalars (avoids an intermediate list +
diving into numpy).

Also synced up the decorator between the two functions.
@dopplershift
Copy link
Contributor

Test failure is a weird linkchecker failure--seems unrelated.

@QuLogic
Copy link
Member

QuLogic commented Nov 7, 2016

Link checker failed because the documentation does not exist; the failure is here and appears to be related to this PR.

@dopplershift
Copy link
Contributor

Wow, not sure how I missed that.

@QuLogic
Copy link
Member

QuLogic commented Nov 7, 2016

We should probably change Travis to not run link checker when the doc build fails.

@phobson
Copy link
Member Author

phobson commented Nov 8, 2016

That failure is really confusing to me because the tests pass and the docs build on my local branch.

Nonetheless, I think the latest commit (deleting masked/invalid points after we make sure the first arg is an iterable) should do the trick.

@dopplershift
Copy link
Contributor

All green--I think we're good here.

@efiring efiring merged commit d7b8992 into matplotlib:master Nov 8, 2016
@dopplershift
Copy link
Contributor

If this is actually intended for 2.0, it will need to be cherry-picked.

efiring added a commit that referenced this pull request Nov 8, 2016
[MRG] Handle nan/masked values Axes.vlines and hlines
Conflicts:
	lib/matplotlib/axes/_axes.py
We keep the v2.x version on 2 lines: use unpack_labeled_data, instead of
_preprocess_data, which is only in master.
@efiring
Copy link
Member

efiring commented Nov 8, 2016

back-ported to v2.x as ad192f0

@phobson phobson deleted the vlines-with-nan branch November 8, 2016 21:29
@phobson phobson changed the title [MRG] Handle nan/masked values Axes.vlines and hlines Handle nan/masked values Axes.vlines and hlines Nov 8, 2016
@phobson
Copy link
Member Author

phobson commented Nov 8, 2016

thanks for the guidance and merging everyone

@QuLogic QuLogic modified the milestones: 2.0.1 (next bug fix release), 2.0 (style change major release) Dec 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NaN causes plt.vlines to not scale y limits
5 participants