
Poor ndarray.take performance on Fortran order arrays (Trac #2065) #2657

Open
numpy-gitbot opened this issue Oct 19, 2012 · 7 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2065 on 2012-02-26 by @wesm, assigned to unknown.

3000x slowdown observed:

In [25]: arr = np.random.randn(350000, 5)

In [26]: timeit arr.take(np.arange(5), axis=0)
100000 loops, best of 3: 2.86 us per loop

In [27]: arr = np.random.randn(350000, 5).copy('F')

In [28]: timeit arr.take(np.arange(5), axis=0)
100 loops, best of 3: 9.03 ms per loop
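For reference, the same asymmetry can be reproduced outside IPython with the stdlib timeit module (a minimal sketch; absolute times will vary by machine and NumPy version):

```python
import timeit
import numpy as np

idx = np.arange(5)

arr_c = np.random.randn(350000, 5)   # C (row-major) order
arr_f = arr_c.copy('F')              # same data, Fortran order

# Best of 3 repeats, 20 calls each, reported per call.
t_c = min(timeit.repeat(lambda: arr_c.take(idx, axis=0), number=20, repeat=3))
t_f = min(timeit.repeat(lambda: arr_f.take(idx, axis=0), number=20, repeat=3))

print(f"C order:       {t_c / 20 * 1e6:10.2f} us per call")
print(f"Fortran order: {t_f / 20 * 1e6:10.2f} us per call")
```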
@numpy-gitbot
Author

@mwiebe wrote on 2012-03-12

Confirmed. ndarray.take needs to be rewritten to be CPU cache-aware. In 1.6, element-wise ufuncs got this treatment. In 1.7, the ufunc reduce method has received this treatment. Making NumPy use more algorithms that respect the CPU cache should continue.

@numpy-gitbot
Author

Milestone changed to NumPy 1.8 by @mwiebe on 2012-03-12

@njsmith
Member

njsmith commented Mar 6, 2013

Good plan, but not a 1.8 blocker; removing milestone.

@batterseapower
Contributor

A workaround is to use Numba with this _take_2d function instead of np.take or NumPy fancy indexing:

import numba as nb
import numpy as np

@nb.njit(nogil=True)
def _take_2d_ixs0(rets, ixs, out):
    # Rows in the outer loop: sequential access for C-contiguous input.
    T, N = rets.shape
    M, = ixs.shape
    assert out.shape == (T, M)

    for t in range(T):
        for m in range(M):
            n = ixs[m]
            assert -N <= n < N, 'n'  # negative indices down to -N are valid
            out[t, m] = rets[t, n]

@nb.njit(nogil=True)
def _take_2d_ixs1(rets, ixs, out):
    # Rows in the inner loop: sequential access for Fortran-contiguous input.
    T, N = rets.shape
    M, = ixs.shape
    assert out.shape == (T, M)

    for m in range(M):
        n = ixs[m]
        assert -N <= n < N, 'n'
        for t in range(T):
            out[t, m] = rets[t, n]

def _take_2d(xs, ixs, axis):
    if axis == 0:
        # Transposing flips the memory order, so the axis=1 path below
        # dispatches to the matching kernel.
        return _take_2d(xs.T, ixs, 1).T
    else:
        assert axis == 1

    T, N = xs.shape
    M, = ixs.shape

    order = 'C' if xs.flags.c_contiguous else 'F'
    out = np.empty((T, M), dtype=xs.dtype, order=order)
    (_take_2d_ixs0 if xs.flags.c_contiguous else _take_2d_ixs1)(xs, ixs, out)
    return out

Obviously not brilliant because it only works for the 2D case!
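A pure-NumPy variant of the same idea is also possible (a sketch, not from the thread): each column of a Fortran-ordered 2-D array is contiguous, so gathering rows column by column keeps memory access sequential, and looping over the columns is cheap when the array is tall and narrow:

```python
import numpy as np

def take_rows_f(arr, idx):
    # Take rows `idx` from a Fortran-ordered 2-D array, one column at a
    # time; each arr[:, j] is a contiguous view, so the gather is
    # cache-friendly.
    out = np.empty((len(idx), arr.shape[1]), dtype=arr.dtype, order='F')
    for j in range(arr.shape[1]):
        out[:, j] = arr[:, j][idx]
    return out

arr_f = np.asfortranarray(np.random.randn(350000, 5))
idx = np.arange(5)
result = take_rows_f(arr_f, idx)
```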

@wesm

wesm commented Mar 1, 2019

I think this has been fixed in the last 7 years

@batterseapower
Contributor

batterseapower commented Mar 2, 2019

@wesm my benchmarks in #9450 still show very poor performance for take with recent numpy. Another workaround is to replace ndarray.take with Pandas take_nd (https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L1543): this function is what DataFrame.iloc dispatches to, and is why iloc is usually faster than numpy advanced indexing.

@seberg
Member

seberg commented Mar 2, 2019

I think the typical workaround should normally be advanced indexing anyway, even if there are a few cases where advanced indexing is not super fast, and 2D special cases where it is not quite optimal.
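For readers following along, advanced indexing is a drop-in replacement for take along either axis (results are identical, per the suggestion above):

```python
import numpy as np

arr = np.asfortranarray(np.random.randn(350000, 5))
idx = np.arange(5)

via_take = arr.take(idx, axis=0)
via_fancy = arr[idx]          # advanced indexing along axis 0
via_fancy_ax1 = arr[:, idx]   # and along axis 1
```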
