Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Handle nan/masked values Axes.vlines and hlines #7408
Conversation
phobson
added this to the
2.0.1 (next bug fix release)
milestone
Nov 4, 2016
phobson
added the
needs_review
label
Nov 4, 2016
| @@ -979,14 +979,14 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid', | ||
| verts = [((thisxmin, thisy), (thisxmax, thisy)) | ||
| for thisxmin, thisxmax, thisy in zip(xmin, xmax, y)] | ||
| - coll = mcoll.LineCollection(verts, colors=colors, | ||
| + lines = mcoll.LineCollection(verts, colors=colors, | ||
| linestyles=linestyles, label=label) |
dopplershift
Nov 4, 2016
Contributor
Shouldn't this line need to indent to keep things lined up (pep8 and all that...)
NelleV
changed the title from
Use np.nanmin/max to set ax limits in Axes.vlines to [MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines
Nov 4, 2016
|
This supports nans in the input, but not masked arrays; in most cases we try to support both. If you want to handle masked arrays, you could run the arguments through a function analogous to Overall, we have a somewhat confusing mixture of handling missing points via nans (which only works for floating point, and which requires nanmax etc.) and via masked arrays, for which methods work mostly as they do on plain ndarrays. |
|
Second thought: for these functions, would it make more sense to simply filter out the bad values at the start? I don't see any point in pushing the handling of bad values down into the min and max calculations. Admittedly, I haven't looked closely. |
|
@efiring Filtering was my thought when I first went into this. Let me mess around with feeding masked arrays into this function before we merge. In either case, handling both masked arrays and sequences with invalid ( |
phobson
changed the title from
[MRG+2] Use np.nanmin/max to set ax limits in Axes.vlines to [No Merge] Use np.nanmin/max to set ax limits in Axes.vlines
Nov 5, 2016
|
With the current implementation in this PR, masked arrays does not work as expected. For example %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig3, ax5 = plt.subplots()
x5 = np.ma.masked_equal([2, 4, 6, 8, 10, 12], 8)
ymin5 = np.ma.masked_equal([0, 1, -1, 0, 2, 1], 2)
ymax5 = np.ma.masked_equal([13, 14, 15, 16, 17, 18], 18)
ax5.vlines(x5, ymin5, ymax5, colors='k', linewidth=2)Produces a plot with 5 lines. Only the line where I'll get back to this later this weekend. |
phobson
added needs_revision and removed needs_review
labels
Nov 5, 2016
|
Filtering a single array is easy: x, ymin, ymax = cbook.delete_masked_points(x, ymin, ymax)and I think that will take care of everything. |
|
@efiring Great. Thanks for the tip. I'll include that and the example from my previous comment as a test. |
phobson
changed the title from
[No Merge] Use np.nanmin/max to set ax limits in Axes.vlines to [MRG] Use np.nanmin/max to set ax limits in Axes.vlines
Nov 7, 2016
phobson
dismissed
dopplershift’s
review
Nov 7, 2016
complete reworked how nan/masked values are handled. need fresh review.
phobson
dismissed
NelleV’s
review
Nov 7, 2016
complete reworked how nan/masked values are handled. need fresh review.
phobson
added the
needs_review
label
Nov 7, 2016
phobson
changed the title from
[MRG] Use np.nanmin/max to set ax limits in Axes.vlines to [MRG] Handle nan/masked values Axes.vlines and hlines
Nov 7, 2016
dopplershift
requested changes
Nov 7, 2016
•
Looks good to me, but would like a clarification on the question below.
| - minx = min(xmin.min(), xmax.min()) | ||
| - maxx = max(xmin.max(), xmax.max()) | ||
| + minx = np.min([xmin.min(), xmax.min()]) | ||
| + maxx = np.max([xmin.max(), xmax.max()]) |
dopplershift
Nov 7, 2016
Contributor
What's the rationale behind the change from using the builtin min and max to numpy's? If we're talking micro-performance-optimization, the builtins will be faster.
phobson
Nov 7, 2016
Member
Originally, vlines used the builtin min and max, while hlines used a mix of builtin and numpy. I was going for consistency.
To my eyes, the numpy syntax is more familiar and it feels "safer" (not sure if that's true). I wasn't considering performance with those changes.
My understanding is also that it's faster:
x = numpy.ranndom.normal(size=3700)
%timeit x.max()
The slowest run took 53.27 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.88 µs per loop
where(2.9 * 55 ~ 160 µs)
%timeit max(x)
1000 loops, best of 3: 231 µs per loop
dopplershift
Nov 7, 2016
•
Contributor
I'm referring specifically to this change:
minx = min(xmin.min(), xmax.min())to
minx = np.min([xmin.min(), xmax.min()])and similar for max()--not the change from min(ymin) to ymin.min(), etc. This latter change I agree with. For the former, I dislike using numpy's min just to find the minimum of two scalars. You end up creating an extra list, which then descends into numpy and creates an array. On my system, taking the minimum of two scalars takes 10us with numpy, 4us with the builtin. There's also more for my eye to scan across with how you've changed it.
But I don't want to bikeshed this to death (6us is not going to have any impact on matplotlib's slowness)--but I'd be interested if we as developers have consensus on what the convention is or should be.
efiring
Nov 7, 2016
Owner
I agree: builtin min is cleaner for comparing two scalars. As @anntzer has pointed out, however, there is a difference in the way NaN is handled. Numpy also has minimum, which can be used with two scalars and is faster than using np.min:
In [1]: import numpy as np
In [2]: min(np.nan, 1.1)
nan
In [3]: min(1.1, np.nan)
1.1
In [7]: timeit np.minimum(2.2, 1.1)
The slowest run took 16.35 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.13 µs per loop
In [8]: timeit min(2.2, 1.1)
The slowest run took 11.75 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 197 ns per loop
In [9]: timeit np.min([2.2, 1.1])
The slowest run took 260.96 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.37 µs per loop
Conclusion: if there is any possibility of nans, use np.minimum; otherwise, builtin min is fine.
phobson
Nov 7, 2016
Member
I dislike using numpy's min just to find the minimum of two scalars.
Ahh that makes sense. I'll make the change. You've convinced me :)
phobson
Nov 7, 2016
Member
@efiring since we're using e.g., cbook.delete_masked_points(x, ymin, ymax) I think the possibility for NaNs has been removed.
efiring
removed the
needs_revision
label
Nov 7, 2016
|
Test failure is a weird linkchecker failure--seems unrelated. |
|
Link checker failed because the documentation does not exist; the failure is here and appears to be related to this PR. |
|
Wow, not sure how I missed that. |
|
We should probably change Travis to not run link checker when the doc build fails. |
|
That failure is really confusing to me because the tests pass and the docs build on my local branch. Nonetheless, I think the latest commit (deleting masked/invalid points after we make sure the first arg is an iterable) should do the trick. |
|
All green--I think we're good here. |
efiring
merged commit d7b8992
into matplotlib:master
Nov 8, 2016
|
If this is actually intended for 2.0, it will need to be cherry-picked. |
efiring
added a commit
that referenced
this pull request
Nov 8, 2016
|
|
efiring |
ad192f0
|
|
back-ported to v2.x as ad192f0 |
QuLogic
removed the
needs_review
label
Nov 8, 2016
phobson
deleted the
phobson:vlines-with-nan branch
Nov 8, 2016
phobson
changed the title from
[MRG] Handle nan/masked values Axes.vlines and hlines to Handle nan/masked values Axes.vlines and hlines
Nov 8, 2016
|
thanks for the guidance and merging everyone |
phobson commentedNov 4, 2016
Closes #7406.
Also applied fixes to Axes.hlines and wrote tests for both.