Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup: broadcasting #7562

Merged
merged 1 commit into from
Aug 27, 2017
Merged

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Dec 5, 2016

  • Rely more on broadcasting.
  • Use power operator / np.hypot where appropriate.
  • Use np.pi instead of math.pi.

@@ -25,7 +25,7 @@
# Generate random data
data = np.random.uniform(0, 1, (64, 75))
X = np.linspace(-1, 1, data.shape[-1])
G = 1.5 * np.exp(-4 * X * X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases I think this construct is marginally faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# float is probably the most common case
In [8]: x = np.arange(100000).astype(float)

In [9]: %timeit x * x
10000 loops, best of 3: 65.5 µs per loop

In [10]: %timeit x ** 2
10000 loops, best of 3: 66.4 µs per loop

(and you can easily get them reversed on another run).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. That is an apparently wrong rule of thumb I have been using for a while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you think this would be the case? (I am honestly puzzled.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It dates back to the early days of computers and compilers, when computers were slow and compilers were not so bright. Multiplication is faster than taking a power, in general. But now compilers recognize things like this and find the best algorithm. In the case of numpy, I'm pretty sure there is explicit internal optimization of small integer powers so that we wouldn't have to use that ancient manual optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I respectfully request the children remain off of my lawn. 😈

Copy link
Contributor

@eric-wieser eric-wieser Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @efiring says, numpy hard-codes power(x, 2) to fall back on square(x), along with a handful of other constants (I think (-1, 0, 0.5, 1, 2))

# make them safe to take len() of
_left = left
left = make_iterable(left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this break unit handling? It is probably better to leave the duck-ish typing here and not force to numpy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what unit handling is actually supposed to do, but in any case if an object array (or list of objects, or single object) is passed in then atleast_1d will return an object array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the top of examples/units/radian_demo.py, I added:

print(x[0])
print(np.atleast_1d(x)[0])
print(np.atleast_1d(x[0]))

and it printed:

[ 0.] in radians
[ 0.] in radians
[0.0]

so I guess it might have broken something, unit-wise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems due to a useless and incorrect implementation of __array__. Tried to fix the example...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is when the unit is carried on the container class, not the values. This will strip the units and turn it into a plain numpy array. These sorts of things tend to be implemented as numpy array sub-classes (ex pint and yt).

This will also force a copy of pandas Series we get in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to stay as they were or the process_unic_info call needs to be moved above these casts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atleast_1d calls asanyarray so it passes subclasses through with no problem. It is true that pandas Series do not subclass numpy arrays and get copied, but given that we're going to generate one Rectangle object per entry in the Series I'd say the additional cost of the copy is rather small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that the current implementation is actually buggy: bar([1, 2], [3, 4], width=[.8]) works but bar([1, 2], [3, 4], width=np.array([.8])) currently raises an exception due to the fact that width *= nbars works elementwise here. (Can you open a separate issue if you don't think this PR can get merged any time soon, so that it doesn't get lost?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Actually asarray doesn't even copy the underlying buffer for pandas series:

In [1]: s = pd.Series([1, 2, 3]); t = np.asarray(s); t[0] = 42; s
Out[1]: 
0    42
1     2
2     3
dtype: int64

so the point is moot.

@anntzer anntzer force-pushed the cleanup-broadcasting branch 6 times, most recently from 7b3171b to f15e8f4 Compare December 5, 2016 04:31
@@ -440,8 +440,8 @@ def transform_non_affine(self, xy):

quarter_x = 0.25 * x
half_y = 0.5 * y
z = np.sqrt(1.0 - quarter_x*quarter_x - half_y*half_y)
longitude = 2 * np.arctan((z*x) / (2.0 * (2.0*z*z - 1.0)))
z = np.sqrt(1 - quarter_x * quarter_x - half_y * half_y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not using **2 on these ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

c = np.sqrt(area)
r = np.sqrt(x*x + y*y)
r = np.sqrt(x * x + y * y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.hypot?

Copy link
Contributor Author

@anntzer anntzer Dec 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to use hypot every time, especially in examples, as the function may be somewhat unknown.

# make them safe to take len() of
_left = left
left = make_iterable(left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the top of examples/units/radian_demo.py, I added:

print(x[0])
print(np.atleast_1d(x)[0])
print(np.atleast_1d(x[0]))

and it printed:

[ 0.] in radians
[ 0.] in radians
[0.0]

so I guess it might have broken something, unit-wise?

@@ -318,7 +318,7 @@ def draw_text(self, gc, x, y, s, prop, angle, ismath=False, mtext=None):
if angle == 0.0:
gfx_ctx.DrawText(s, x, y)
else:
rads = angle / 180.0 * math.pi
rads = math.radians(angle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not deg2rad like the other PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

math only has radians and degrees, not deg2rad and rad2deg. I seems that both names are used in the codebase (even before my earlier PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I meant to go back and check whether that was math or np and forgot about it.

@@ -388,8 +388,8 @@ def transform_non_affine(self, xy):

quarter_x = 0.25 * x
half_y = 0.5 * y
z = np.sqrt(1.0 - quarter_x*quarter_x - half_y*half_y)
longitude = 2 * np.arctan((z*x) / (2.0 * (2.0*z*z - 1.0)))
z = np.sqrt(1 - quarter_x * quarter_x - half_y * half_y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No **2 again?

@@ -1445,8 +1445,8 @@ def bump(a):
y = 2 * np.random.random() - .5
z = 10 / (.1 + np.random.random())
for i in range(m):
w = (i / float(m) - y) * z
a[i] += x * np.exp(-w * w)
w = (i / m - y) * z
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be written as a NumPy expression instead of a loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

raise ValueError(("Argument 'zs' must be of same size as 'xs' "
"and 'ys' or of size 1."))

xs, ys, zs = np.broadcast_arrays(*map(np.ma.ravel, [xs, ys, zs]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This possibly makes any error about incorrect shapes a bit more obscure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broadcast_to explicitly prints the nonmatching shape in the error message, but not broadcast_arrays. I'll open an issue on numpy but I don't think this should hold up this PR.

(Note how just below we gain checking on the size of dx/dy/dz, which was not present before.)

@efiring
Copy link
Member

efiring commented Dec 5, 2016

Looks like lots of nice cleanups here, once again. In future, however, I suggest that you reconsider your policy of always surrounding operators by spaces. This is not required by PEP 8, and although I am a fan of white space, I think that in many cases omitting spaces around operators improves readability. E.g., a * b**2 is better than a * b ** 2 and a*b + c*d is better than a * b + c * d. This is mainly for variables with very short names.

@anntzer
Copy link
Contributor Author

anntzer commented Dec 5, 2016

Sounds like a reasonable policy, thanks for the notice.

@anntzer
Copy link
Contributor Author

anntzer commented Dec 5, 2016

Looks like there the build failure comes from the newly released sphinx 1.5.

@QuLogic
Copy link
Member

QuLogic commented Dec 5, 2016

Yep, #7569.

@@ -20,22 +20,21 @@

# Create the mesh in polar coordinates and compute x, y, z.
radii = np.linspace(min_radius, 0.95, n_radii)
angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False)
angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very strongly opposed to unnecessary whitespace changes of this form, and there are quite a lot of them in this PR.

They are also not related to the title and description of the PR.

@anntzer
Copy link
Contributor Author

anntzer commented Dec 5, 2016

I got rid of all the trivial whitespace changes. (Lines with both whitespace changes and nonwhitespace changes kept their changes.)

@anntzer anntzer force-pushed the cleanup-broadcasting branch 3 times, most recently from 6803631 to f0c3b4f Compare December 5, 2016 19:27
@ianthomas23
Copy link
Member

@anntzer Thankyou, but you do need to be consistent. Changing some is worse than changing all or none.

@anntzer
Copy link
Contributor Author

anntzer commented Dec 6, 2016

I think it's all fairly consistent now.

@anntzer
Copy link
Contributor Author

anntzer commented Dec 6, 2016

There seems to be some issues left with the unit handling code that I need to look into.

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Dec 29, 2016
@@ -91,7 +91,7 @@
fig = plt.figure()
ax = fig.add_subplot(111, projection='polar')
r = np.arange(0, 1, 0.001)
theta = 2*2*np.pi*r
theta = 4 * np.pi * r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2*2 here may have been pedagogical (ex 2 * tau )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted; however I chose to leave the use of **2 instead of X * X above (unless you feel strongly about it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, @efiring explained that one correctly. If it isn't faster, use the more obvious way to write it.

# make them safe to take len() of
_left = left
left = make_iterable(left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is when the unit is carried on the container class, not the values. This will strip the units and turn it into a plain numpy array. These sorts of things tend to be implemented as numpy array sub-classes (ex pint and yt).

This will also force a copy of pandas Series we get in.

# make them safe to take len() of
_left = left
left = make_iterable(left)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to stay as they were or the process_unic_info call needs to be moved above these casts.

y = np.atleast_2d(*args)
elif len(args) > 1:
y = np.row_stack(args)
y = np.row_stack(args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not equivalent

In [18]: np.atleast_2d([1, 2, 3])
Out[18]: array([[1, 2, 3]])

In [19]: np.row_stack([1, 2, 3])
Out[19]: 
array([[1],
       [2],
       [3]])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's np.atleast_2d(*args) (note the unpack) and only in the case where len(args) == 1.

Say args has shape (after casting to an array) (1, x, y, ...).

  • np.atleast_2d(*args) == np.atleast_2d(<shape (x, y, ...)>) == <shape (x, y, ...)>
  • np.row_stack(args) also has shape (x, y, ...).

In the case args is 2D (shape (1, x)), both expressions result in a shape (1, x) as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐑

for i in range(m):
w = (i / float(m) - y) * z
a[i] += x * np.exp(-w * w)
a += x * np.exp(-((np.arange(m) / m - y) * z) ** 2)
a = np.zeros((m, n))
for i in range(n):
for j in range(5):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this inner loop here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because whoever came up with that example decided to add some random numbers five times to each column rather than only once...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐑 Yeah, I am greatly confused by the local operating in-place function...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stackplot_demo2 (from which this function comes) has the slightly more helpful docstring "Return n random Gaussian mixtures, each of length m." I can copy it there, or just inline the function (also in the demo), let me know if you have a preference.

@QuLogic
Copy link
Member

QuLogic commented Jun 1, 2017

@tacaswell uses ⚔️ for conflicts...

@anntzer
Copy link
Contributor Author

anntzer commented Jul 30, 2017

Rebased.

@QuLogic
Copy link
Member

QuLogic commented Jul 31, 2017

Still LGTM, though maybe squash the import fixup into the relevant commits. Might also want to rebase just to get CircleCI+AppVeyor fixes too..

@anntzer
Copy link
Contributor Author

anntzer commented Jul 31, 2017

I squashed-rebased everything as splitting the fixup commit across the rebase seems a bit tricky, plus everything is still more or less related.

@QuLogic
Copy link
Member

QuLogic commented Jul 31, 2017

I forgot; why did the test image change?

@anntzer
Copy link
Contributor Author

anntzer commented Jul 31, 2017

Because the previous behavior was incorrect: see the leftmost green "triangle", which is did not go all the way to the left but now does.

@QuLogic
Copy link
Member

QuLogic commented Jul 31, 2017

Ah, I see it now.

@QuLogic
Copy link
Member

QuLogic commented Aug 23, 2017

Needs a rebase; I'm also going to dismiss @tacaswell's review which seems to be outdated.

@anntzer
Copy link
Contributor Author

anntzer commented Aug 23, 2017

done

@dopplershift
Copy link
Contributor

And of course in that time a PR went in that causes conflicts...

@anntzer
Copy link
Contributor Author

anntzer commented Aug 24, 2017

and fixed again...

@@ -9,7 +9,7 @@
import matplotlib.pyplot as plt

t = np.arange(0.0, 1.0 + 0.01, 0.01)
s = np.cos(2 * 2 * np.pi * t)
s = np.cos(2 * 2*np.pi * t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: why is this the one where you don't have spaces around it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nm...I guess I see it now.

Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question about imports...

import six

from collections import OrderedDict
import warnings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this import necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing actually for six.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably some rebase made the thing irrelevant. Fixed.

Also fixes a bug in fill_between with masked data.  In the modified test
figures, the area in green is supposed to correspond to the part of the
hatched area where the curve is below y=2.  The new behavior is the
correct one.

Also fixes cbook._reshape2D for scalar object inputs.  Before the fix,
`plt.hist(None)` would fail with `x must have 2 or fewer dimensions`,
which it does have.  Now it fails with a bit later with `unsupported
operands type(s) for +: 'NoneType' and 'float'`, which is hopefully
clearer.
Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just waiting on CI to pass...

@QuLogic QuLogic merged commit 7cf904d into matplotlib:master Aug 27, 2017
@anntzer anntzer deleted the cleanup-broadcasting branch August 27, 2017 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants