
MAINT: Simplify mtrand.pyx helpers #6997

Merged: 1 commit into numpy:master on Jan 20, 2016

Conversation

gfyoung
Contributor

@gfyoung gfyoung commented Jan 12, 2016

Follows up on suggestions/comments in #6938, where it was noted that some of the if...else blocks involved in broadcasting could be simplified.

@gfyoung gfyoung changed the title WIP, MAINT: Simplify mtrand.pyx WIP, MAINT: Simplify mtrand.pyx helpers Jan 12, 2016
@gfyoung gfyoung changed the title WIP, MAINT: Simplify mtrand.pyx helpers MAINT: Simplify mtrand.pyx helpers Jan 12, 2016
@gfyoung
Contributor Author

gfyoung commented Jan 14, 2016

Could someone take a look at this? I think it should be good to merge unless I've missed something in the simplification. I've just been rebasing this PR continuously with master.

array_data = <double *>PyArray_DATA(array)

with lock, nogil:
    for i from 0 <= i < multi.size:
Contributor

Is there some reason not to convert this ancient Pyrex-format loop to a modern for i in range(multi.size)?

Contributor Author

No reason at all. Done.
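For readers unfamiliar with the legacy syntax: `for i from 0 <= i < n` in Pyrex/older Cython iterates i over 0, 1, ..., n-1, exactly like Python's `range(n)`. A minimal pure-Python sketch of that equivalence:

```python
def legacy_loop(n):
    # Emulates the Pyrex form `for i from 0 <= i < n`:
    # start at 0, continue while the condition 0 <= i < n holds, step by 1.
    out = []
    i = 0
    while 0 <= i < n:
        out.append(i)
        i += 1
    return out

def modern_loop(n):
    # The modern Cython/Python form: for i in range(n)
    return list(range(n))

assert legacy_loop(5) == modern_loop(5) == [0, 1, 2, 3, 4]
```

The two forms compile to the same kind of counted C loop in Cython, so the change is purely stylistic.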

@bashtage
Contributor

Have you tried measuring the performance of, say, a 2-input function plus size when size is None and so must be computed from the initial broadcast? I would think that is the only concern; it is possible the refactor might even be faster, since there can be fewer function calls.

I rewrote these here and noticed the same thing in terms of the MultiIter_Next that you have refactored (I also moved virtually all parameter checks upstream, which is the extra code).

@gfyoung
Contributor Author

gfyoung commented Jan 15, 2016

I ran the following command on my local build of numpy and on my PR build of numpy:

python -m timeit -n 100 "import numpy as np;np.random.uniform(low=0, high=np.arange(1000000))"

np.random.uniform calls cont2_array when broadcasting, and this was one of the functions that I had modified.

For my local build (version 1.10.2):

100 loops, best of 3: 55.7 msec per loop

For my PR build:

100 loops, best of 3: 47.5 msec per loop

That's approximately a 15% speed-up.
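The quoted figure follows directly from the two timings: (55.7 − 47.5) / 55.7 ≈ 14.7%, i.e. roughly 15%.

```python
baseline = 55.7  # msec per loop, local numpy 1.10.2 build
patched = 47.5   # msec per loop, PR build

# Relative speed-up of the PR build over the baseline.
speedup = (baseline - patched) / baseline
print(f"{speedup:.1%}")  # ≈ 14.7%
```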

@gfyoung
Contributor Author

gfyoung commented Jan 19, 2016

Travis + Appveyor are happy. If there is nothing else, this should be good to merge.

@jaimefrio
Member

Could you rewrite this using the Python API (instead of the C API) where it won't hurt performance?

For instance, this has no performance loss against current master, and is much more readable for maintainers without a good grasp of C:

cdef object cont2_array(rk_state *state, rk_cont2 func, object size,
                        ndarray oa, ndarray ob, object lock):
    cdef double *array_data
    cdef double *oa_data
    cdef double *ob_data
    cdef ndarray array "arrayObject"
    cdef npy_intp i
    cdef broadcast multi

    if size is None:
        multi = <broadcast>np.broadcast(oa, ob)
        array = <ndarray>np.empty(multi.shape, dtype=np.float64)
    else:
        array = <ndarray>np.empty(size, dtype=np.float64)
        multi = <broadcast>np.broadcast(oa, ob, array)
        if multi.shape != array.shape:
            raise ValueError("size is not compatible with inputs")

    array_data = <double *>PyArray_DATA(array)
    with lock, nogil:
        for i from 0 <= i < multi.size:
            oa_data = <double *>PyArray_MultiIter_DATA(multi, 0)
            ob_data = <double *>PyArray_MultiIter_DATA(multi, 1)
            array_data[i] = func(state, oa_data[0], ob_data[0])
            PyArray_MultiIter_NEXT(multi)

    return array
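For context, the `size is None` branch lets `np.broadcast` compute the output shape from the inputs. The shape rule it applies can be sketched in pure Python (a simplified model of NumPy's broadcasting, assuming well-formed shape tuples):

```python
def broadcast_shape(a, b):
    """Simplified model of NumPy's broadcasting rule for two shapes."""
    # Right-align the shapes, padding the shorter one with 1s.
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError("shapes are not broadcastable")
    return tuple(out)

# A (3, 1) input against a (4,) input broadcasts to (3, 4):
print(broadcast_shape((3, 1), (4,)))  # (3, 4)
```

In the `else` branch, including the preallocated output array in the `np.broadcast` call both validates that `size` is compatible and gives a single iterator over all operands.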

@gfyoung
Contributor Author

gfyoung commented Jan 20, 2016

@jaimefrio : Sure thing. BTW, are we to ignore the failing 3.2 test on Travis?

@jaimefrio
Member

I think so; it seems to be a problem with nose having dropped Python 3.2 support and started using unicode literals, which don't work in 3.2.

@charris
Member

charris commented Jan 20, 2016

The test failure is unrelated to this PR.

@charris
Member

charris commented Jan 20, 2016

In fact, if you rebase -- yes, now would be a good time ;) -- the error should go away as we have dropped Python 3.2 support in master.

@gfyoung
Contributor Author

gfyoung commented Jan 20, 2016

@charris : I still see 3.2 in the .travis.yml file, but I will rebase as soon as #7063 is merged.

array_data[i] = func(state, on_data[0], op_data[0])
PyArray_MultiIter_NEXT(multi)
multi = <broadcast>np.broadcast(on, op)
array = <ndarray>np.empty(multi.shape, dtype=int)
Member

Is dtype=int the correct value to always get a C long? Hopefully AppVeyor will tell us if this breaks for 64-bit Windows and its 32-bit longs...

Contributor Author

I was following what had been done in the else block, but now I see there is a gap in the testing: test_binomial (and, I suspect, other functions as well) never tests broadcasting, so the else block is not reached in the tests.

Is there perhaps a safer dtype we could use? Perhaps np.long?

Member

int always translates to np.int_, which always translates to C long (since this is the Python object, not some Cython construct). That said, np.long is much clearer, and I agree we should prefer it if we mean a C long equivalent.

Contributor Author

👍 - done.

Contributor Author

Hmm...apparently np.long is not the way to go, as Travis failed on the 32-bit build here. Reverting to np.int.

Member

Just use int. np.int is an unnecessary alias.

Contributor Author

Fair enough. Done.
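The portability concern in this thread is that a C long is 4 bytes on 64-bit Windows (LLP64) but 8 bytes on 64-bit Linux/macOS (LP64). This can be checked from Python with the standard ctypes module:

```python
import ctypes

# Size of the platform's C long: 4 on Windows (even 64-bit builds),
# 8 on LP64 Unix platforms.
print(ctypes.sizeof(ctypes.c_long))
```

Because NumPy's default integer dtype follows the C long, an array created with dtype=int holds 32-bit integers on 64-bit Windows, which is what the AppVeyor builds exercise.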

Refactored methods that broadcast arguments
together by finding additional common ground
between code in the if...else branches that
involved a size parameter being passed in.
@bashtage
Contributor

For instance, this has no performance loss against current master, and is much more readable for maintainers without a good grasp of C

Is this change really free? Maybe due to a change coming in NumPy 1.11?

Using Cython 0.23 / NumPy 1.10 shows different generated C code for the C API vs numpy.broadcast.

Using C-API

    __pyx_t_6 = __pyx_f_5numpy_PyArray_MultiIterNew3(((PyObject *)__pyx_v_a_arr), ((PyObject *)__pyx_v_b_arr), ((PyObject *)__pyx_v_randoms)); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 161; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_6);
    if (!(likely(((__pyx_t_6) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_6, __pyx_ptype_5numpy_broadcast))))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 161; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_v_it = ((PyArrayMultiIterObject *)__pyx_t_6);
    __pyx_t_6 = 0;

Using np.broadcast

    __pyx_t_6 = PyTuple_New(2); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 163; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_6);
    __Pyx_INCREF(((PyObject *)__pyx_v_a_arr));
    __Pyx_GIVEREF(((PyObject *)__pyx_v_a_arr));
    PyTuple_SET_ITEM(__pyx_t_6, 0, ((PyObject *)__pyx_v_a_arr));
    __Pyx_INCREF(((PyObject *)__pyx_v_b_arr));
    __Pyx_GIVEREF(((PyObject *)__pyx_v_b_arr));
    PyTuple_SET_ITEM(__pyx_t_6, 1, ((PyObject *)__pyx_v_b_arr));
    __pyx_t_1 = __Pyx_PyObject_Call(((PyObject *)__pyx_ptype_5numpy_broadcast), __pyx_t_6, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 163; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    __pyx_t_6 = __pyx_t_1;
    __Pyx_INCREF(__pyx_t_6);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    __pyx_v_it = ((PyArrayMultiIterObject *)__pyx_t_6);
    __pyx_t_6 = 0;

@jaimefrio
Member

@bashtage No, it is not entirely free, but it is negligibly more expensive in the situations I tested. Most of the extra code in your listing seems to be creating the tuple of arguments to pass to the Python function call, so if you are calling the function with a small size, this will probably be somewhat slower. But it is a one-time overhead cost, and e.g. with size=1000 the difference is unnoticeable.

And it really makes the code accessible to a larger audience, which is one of the selling points of Cython.
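The amortization argument can be illustrated with a simple cost model (hypothetical numbers, chosen only to show the shape of the argument): if a call costs overhead + size * per_element, the overhead's share of the total vanishes as size grows.

```python
def relative_overhead(size, overhead=1.0, per_element=0.01):
    # Hypothetical cost model: a fixed setup cost (e.g. building the
    # argument tuple for a Python-level np.broadcast call) plus linear
    # per-element work inside the sampling loop.
    total = overhead + size * per_element
    return overhead / total

# The fixed cost dominates tiny calls but is negligible for large ones:
print(relative_overhead(1))     # ~0.99
print(relative_overhead(1000))  # ~0.09
```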

@jaimefrio
Member

I'm going to let AppVeyor do its thing, but will merge it as soon as it gives the green light, thanks for your patience, Greg.

@gfyoung
Contributor Author

gfyoung commented Jan 20, 2016

@jaimefrio : I see green everywhere now.

jaimefrio added a commit that referenced this pull request Jan 20, 2016
@jaimefrio jaimefrio merged commit 9ad54ae into numpy:master Jan 20, 2016
@jaimefrio
Member

In it goes, thanks again!

@gfyoung gfyoung deleted the mtrand_helpers_compress branch January 20, 2016 20:40