Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: One Element Array Inputs Return Scalars in np.random #7055

Merged
merged 1 commit into from
Jan 25, 2016
Merged

BUG: One Element Array Inputs Return Scalars in np.random #7055

merged 1 commit into from
Jan 25, 2016

Conversation

gfyoung
Copy link
Contributor

@gfyoung gfyoung commented Jan 18, 2016

Fixes issue raised in #4263 in which np.random methods returned scalars when passed one-element array inputs. This is because one-element ndarrays can be cast to integers / floats, which is what functions like PyFloat_AsDouble do before converting to the intended data type.

This commit changes the check used to determine whether the inputs are purely scalar by, as suggested by @rkern, converting all inputs to arrays and checking if the resulting shape is an empty tuple (scalar) or not (array).

@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 18, 2016

PR is currently WIP because I still need to add a test to make sure that this behavior does not resurface. However, I just want to make sure that my changes aren't breaking any existing tests.

@@ -1551,18 +1551,17 @@ cdef class RandomState:
cdef double flow, fhigh, fscale
cdef object temp

flow = PyFloat_AsDouble(low)
fhigh = PyFloat_AsDouble(high)
if np.array(low).shape == np.array(high).shape == ():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this double comparison valid Cython? Or does it end comparing True (or False) to ()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to have worked. I tested the methods briefly by hand (more formal testing is coming though!), and I didn't have any issues with the double (or even triple) comparison with regards to output data type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add that while I do not know the validity of chained comparisons in Cython, I know it is valid in pure Python. I guess since you can compile pure Python with Cython, it should then be valid by extension? A quick look over the Cython docs does not indicate any issues with handling such chains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it is the Python thing, guess what confused me was the double ==, but it does what it is expected to:

In [184]: 0 == 0 == 0
Out[184]: True

In [185]: (0 == 0) == 0
Out[185]: False

In [186]: 0 == (0 == 0)
Out[186]: False

@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 19, 2016

Tests have been added, and Travis + Appveyor are happy. Should be good to merge as is, as I can uncomment the randint test after a rebase in #6938.

@jaimefrio
Copy link
Member

This change makes a call to np.random.uniform(0, 1) 5x slower in my system:

In [39]: %timeit np.random.uniform(0, 1)
# current master
The slowest run took 19.17 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 199 ns per loop
# this PR
The slowest run took 26.30 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 988 ns per loop

Not sure if this is really relevant, as bulk calling e.g. np.random.uniform(0, 1, 1000), only shows a much smaller slowdown (15us to 16us), but would again like to hear other opinions.

@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 22, 2016

@jaimefrio : Should we wait longer, or can this be merged?

@jaimefrio
Copy link
Member

You can get some of the speed back if you change the condition:

if np.array(low).shape == np.array(high).shape == ():

to:

if (not (isinstance(low, np.ndarray) and low.shape) and
                not (isinstance(high, np.ndarray) and high.shape))
In [2]: %timeit np.random.uniform(0, 1)
# this PR
1000000 loops, best of 3: 1.04 µs per loop
# my changes
1000000 loops, best of 3: 358 ns per loop

In [3]: %timeit np.random.uniform([0], [1])
# this PR
100000 loops, best of 3: 4.95 µs per loop
# my changes
100000 loops, best of 3: 3.76 µs per loop

Not sure if the extra complexity is worth the gains, what do you think?

@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 23, 2016

@jaimefrio : IMHO, simplicity wins in this case. It seems that the performance scales relatively well from the timing analysis you provided earlier, so obfuscating the code in order to optimize the smallest input case possible does not seem worthwhile here.

@jaimefrio
Copy link
Member

If we go this route, I think we should move the array creation up, rather than redoing it:

olow = <ndarray>PyArray_FROM_OTF(low, NPY_DOUBLE, NPY_ARRAY_ALIGNED)
ohigh = <ndarray>PyArray_FROM_OTF(high, NPY_DOUBLE, NPY_ARRAY_ALIGNED)
if olow.shape == ohigh.shape == ():
    ...

It will make a noticeable difference if low and high are large arrays.

@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 23, 2016

Ah, yes, good point. Done.

@homu
Copy link
Contributor

homu commented Jan 24, 2016

☔ The latest upstream changes (presumably #7082) made this pull request unmergeable. Please resolve the merge conflicts.

@jaimefrio
Copy link
Member

This is going to be painful to rebase, but once you get it done, it should be ready to go.

…andom

Fixes bug in np.random methods that would return scalars
when passed one-element array inputs. This is because
one-element ndarrays can be cast to integers / floats, which
is what functions like PyFloat_AsDouble do before converting
to the intended data type.

This commit changes the check used to determine whether the
inputs are purely scalar by converting all inputs to arrays
and checking if the resulting shape is an empty tuple (scalar)
or not (array).

Closes gh-4263.
@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 24, 2016

@jaimefrio : Rebase is done, and Travis + Appveyor are happy.

@gfyoung gfyoung closed this Jan 24, 2016
@gfyoung gfyoung reopened this Jan 24, 2016
@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 24, 2016

Accidentally clicked the wrong button when I wanted to comment. Whoops.

jaimefrio added a commit that referenced this pull request Jan 25, 2016
BUG: One Element Array Inputs Return Scalars in np.random
@jaimefrio jaimefrio merged commit d641eed into numpy:master Jan 25, 2016
@gfyoung gfyoung deleted the single_elt_array_return_matching branch January 25, 2016 00:19
@jaimefrio
Copy link
Member

Thanks Greg, one more in...

@gfyoung
Copy link
Contributor Author

gfyoung commented Jan 25, 2016

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants