Original ticket http://projects.scipy.org/numpy/ticket/1779 on 2011-03-23 by trac user Han, assigned to unknown.
Hi,
For a while, a small issue has been bugging me: array.tolist() takes a huge amount of time compared to building the equivalent list in pure Python.
To illustrate, here are some timings (on Windows XP):
>>> timeit.timeit('a.tolist()', 'from numpy import arange; a = arange(1e5)', number=500)
19.07303038231646
The conversion of 500 x 100000 elements takes up to 20(!) seconds.
>>> timeit.timeit('a = arange(1e5)', 'from numpy import arange', number=500)
0.4194663546483639
While creating 500 NumPy arrays with the same number of elements takes only about 0.4 seconds.
>>> timeit.timeit('a = range(int(1e5))', number=500)
0.97656364422391562
And creating the equivalent Python lists takes no more than 1 second, which is about 20x faster than array.tolist(). So where is this discrepancy coming from?
To find out, I did some runs with valgrind on NumPy 1.4.1 dbg (Debian), and used kcachegrind to produce a few calling graphs.
The first thing I noticed was the number of array_alloc and subsequent array_dealloc calls: there is one call per element in the array! PyArray_NewFromDescr is invoked once per array element, which creates a lot of overhead.
In NumPy 1.6.1b1, this overhead still exists; it stems from the PyArray_ToList function in convert.c:
NPY_NO_EXPORT PyObject *
PyArray_ToList(PyArrayObject *self)
{
    PyObject *lp;
    PyArrayObject *v;
    intp sz, i;

    if (!PyArray_Check(self)) {
        return (PyObject *)self;
    }
    if (self->nd == 0) {
        return self->descr->f->getitem(self->data,self);
    }

    sz = self->dimensions[0];
    lp = PyList_New(sz);
    for (i = 0; i < sz; i++) {
        v = (PyArrayObject *)array_big_item(self, i);
        if (PyArray_Check(v) && (v->nd >= self->nd)) {
            PyErr_SetString(PyExc_RuntimeError,
                            "array_item not returning smaller-" \
                            "dimensional array");
            Py_DECREF(v);
            Py_DECREF(lp);
            return NULL;
        }
        PyList_SetItem(lp, i, PyArray_ToList(v));
        Py_DECREF(v);
    }
    return lp;
}
For every element in the array, an array_big_item call is made to create a new array holding just that element, which is then passed recursively to PyArray_ToList to extract the actual Python item.
I added an extra clause to the function to account for 1-dimensional arrays:
if (self->nd == 1) {
    sz = self->dimensions[0];
    lp = PyList_New(sz);
    for (i = 0; i < sz; i++) {
        PyList_SetItem(lp, i, self->descr->f->getitem(index2ptr(self, i), self));
    }
    return lp;
}
This gets the time down to about 2 seconds on Windows.
I'm not sure about the patch, though: it does not account for errors, and it could probably be optimized and streamlined further.
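Just to sketch what the missing error handling might look like, here is one possible variant of the same 1-D fast path. It reuses the 1.6-era internals from the patch above (index2ptr and the descr->f->getitem slot); the NULL checks, the PyList_SET_ITEM call, and the assumption that index2ptr signals failure by returning NULL with an exception set are additions of this sketch, not reviewed NumPy code:

if (self->nd == 1) {
    sz = self->dimensions[0];
    lp = PyList_New(sz);
    if (lp == NULL) {
        return NULL;
    }
    for (i = 0; i < sz; i++) {
        /* assumed: index2ptr sets an exception and returns NULL on failure */
        char *ptr = index2ptr(self, i);
        PyObject *item;
        if (ptr == NULL) {
            Py_DECREF(lp);
            return NULL;
        }
        /* getitem can also fail, e.g. for object dtypes */
        item = self->descr->f->getitem(ptr, self);
        if (item == NULL) {
            Py_DECREF(lp);
            return NULL;
        }
        /* PyList_SET_ITEM steals the reference; every slot is filled exactly once */
        PyList_SET_ITEM(lp, i, item);
    }
    return lp;
}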
Anyway, I hope it can go into 1.6.0 in some form or another, because it really helps with NumPy->Python conversions!
[attached: calling graphs]