C API for appending to arrays #49391

Open
hniksic mannequin opened this issue Feb 3, 2009 · 6 comments
Labels
stdlib Python modules in the Lib dir topic-C-API type-feature A feature request or enhancement

Comments

@hniksic
Mannequin

hniksic mannequin commented Feb 3, 2009

BPO 5141
Nosy @pitrou, @hniksic, @websurfer5

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2009-02-03.09:55:33.274>
labels = ['expert-C-API', 'type-feature', 'library']
title = 'C API for appending to arrays'
updated_at = <Date 2020-08-07.20:24:40.584>
user = 'https://github.com/hniksic'

bugs.python.org fields:

activity = <Date 2020-08-07.20:24:40.584>
actor = 'Jeffrey.Kintscher'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'C API']
creation = <Date 2009-02-03.09:55:33.274>
creator = 'hniksic'
dependencies = []
files = []
hgrepos = []
issue_num = 5141
keywords = []
message_count = 6.0
messages = ['81039', '81168', '81189', '87828', '87838', '87883']
nosy_count = 6.0
nosy_names = ['ggenellina', 'kxroberto', 'pitrou', 'hniksic', 'bfroehle', 'Jeffrey.Kintscher']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue5141'
versions = []

@hniksic
Mannequin Author

hniksic mannequin commented Feb 3, 2009

The array.array type is an excellent type for storing a large amount of
"native" elements, such as integers, chars, doubles, etc., without
involving the heavy machinery of numpy. It's both blazingly fast and
reasonably efficient with memory. The one thing missing from the array
module is the ability to directly access array values from C.

This might seem superfluous, as it's perfectly possible to manipulate
array contents from Python/C using PyObject_CallMethod and friends. The
problem is that it requires the native values to be marshalled to Python
objects, only to be immediately converted back to native values by the
array code. This can be a problem when, for example, a numeric array
needs to be filled with contents, such as in this hypothetical example:

/* error checking and refcounting subtleties omitted for brevity */
PyObject *load_data(Source *src)
{
   PyObject *array_type = get_array_type();
   PyObject *array = PyObject_CallFunction(array_type, "c", 'd');
   PyObject *append = PyObject_GetAttrString(array, "append");
   while (!source_done(src)) {
     double num = source_next(src);
     PyObject *f = PyFloat_FromDouble(num);
     PyObject *ret = PyObject_CallFunctionObjArgs(append, f, NULL);
     if (!ret)
       return NULL;
     Py_DECREF(ret);
     Py_DECREF(f);
   }
   Py_DECREF(array_type);
   return array;
}

The inner loop must convert each C double to a Python Float, only for
the array to immediately extract the double back from the Float and
store it into the underlying array of C doubles. This may seem like a
nitpick, but it turns out that more than half of the time of this
function is spent creating and deleting those short-lived floating-point
objects.

Float creation is already well-optimized, so opportunities for speedup
lie elsewhere. The array object exposes a writable buffer, which can be
used to store values directly. For test purposes I created a faster
"append" specialized for doubles, defined like this:

int array_append(PyObject *array, PyObject *appendfun, double val)
{
   PyObject *ret;
   double *buf;
   Py_ssize_t bufsize;
   static PyObject *zero;
   if (!zero)
     zero = PyFloat_FromDouble(0);

   // append dummy zero value, created only once
   ret = PyObject_CallFunctionObjArgs(appendfun, zero, NULL);
   if (!ret)
     return -1;
   Py_DECREF(ret);

   // append the element directly at the end of the C buffer
   PyObject_AsWriteBuffer(array, (void **) &buf, &bufsize);
   buf[bufsize / sizeof(double) - 1] = val;
   return 0;
}

This hack actually speeds up array creation by a significant percentage
(30-40% in my case, and that's for code that was producing the values by
parsing a large text file).

It turns out that an even faster method of creating an array is by using
the fromstring() method. fromstring() requires an actual string, not a
buffer, so in C++ I created an std::vector<double> with a contiguous
array of doubles, passed that array to PyString_FromStringAndSize, and
called array.fromstring with the resulting string. Despite all the
unnecessary copying, the result was much faster than either of the
previous versions.

Would it be possible for the array module to define a C interface for
the most frequent operations on array objects, such as appending an
item, and getting/setting an item? Failing that, could we at least make
fromstring() accept an arbitrary read buffer, not just an actual string?
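
For readers on current Python 3, here is a minimal sketch of how the second request behaves today. array.frombytes(), which replaced fromstring(), takes any bytes-like object rather than only an actual string. The sample values and variable names are illustrative and are not taken from the code discussed above.

```python
from array import array
import struct

# Raw native doubles, standing in for data produced on the C side.
raw = struct.pack("3d", 1.0, 2.0, 3.0)

values = array("d")
values.frombytes(raw)                   # immutable bytes object
values.frombytes(bytearray(raw))        # mutable buffer
values.frombytes(memoryview(raw)[:8])   # a view over the first double only

print(values.tolist())
```

From C, the same method should be reachable through PyObject_CallMethod with a memoryview object as the argument, which would avoid building an intermediate string object.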

@hniksic hniksic mannequin added the type-feature A feature request or enhancement label Feb 3, 2009
@ggenellina
Mannequin

ggenellina mannequin commented Feb 4, 2009

Arrays already support the buffer interface.

@hniksic
Mannequin Author

hniksic mannequin commented Feb 5, 2009

Yes, and I use it in the second example, but the buffer interface
doesn't really help with adding new elements into the array.
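
A minimal Python-level sketch of that limitation, assuming current Python 3 behaviour: a buffer view allows storing into existing slots, but the array refuses to grow while a view is exported. Variable names are illustrative.

```python
from array import array

arr = array("d", [1.0, 2.0])
view = memoryview(arr)

# Writing into an existing slot through the buffer works without resizing.
view[0] = 10.0

# Growing the array while the buffer is exported is rejected.
try:
    arr.append(3.0)
    grew_while_exported = True
except BufferError:
    grew_while_exported = False

# Once the view is released, appending works again.
view.release()
arr.append(3.0)
```

This is why the buffer has to be re-acquired after every append in the array_append hack: the buffer gives access to storage that already exists, not a way to extend it.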

@kxroberto
Mannequin

kxroberto mannequin commented May 15, 2009

I had a similar problem creating a C-fast array.array interface for Cython.
The array.pxd package here (latest zip file)
http://trac.cython.org/cython_trac/ticket/314
includes an arrayarray.h file, which provides ways for efficient creation
and growth from C (extend, extend_buffer, resize, resize_smart). It's
probably in one of the next Cython distributions anyway, and will be
maintained. And perhaps array2 and arrayM extension subclasses (very
light-weight numpy) with public API coming soon too.
It respects the different Python versions, so it's a lite "quasi API".
And in case there will be a (unlikely) change in future Pythons, the
Cython people will take care as far as there is no official API coming up.
Or perhaps most people with such interest use Cython anyway.

@pitrou
Member

pitrou commented May 15, 2009

This has more chances of seeing some progress if you propose a patch.

@kxroberto
Mannequin

kxroberto mannequin commented May 16, 2009

A first thing would be to select a suitable prefix name for the Array
API. Because the Numpy people have 'stolen' PyArray_ instead of staying
home with PyNDArray_ or so ;-)

In case somebody goes into this:
Other than PyList_-like functions and the existing members, I think that
for speedy access (as in Cython's array.pxd) direct resizing, the buffer
pointer, and something handy like this should be directly exposed:

int 
PyArr_ExtendFromBuffer(PyObject *arr, void* stuff, Py_ssize_t items)
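
A rough Python model of what such a call might do, meant to pin down the semantics rather than the implementation. The function name, the bounds check, and the return value are assumptions for illustration; PyArr_ExtendFromBuffer is only a proposal in this thread.

```python
from array import array

def extend_from_buffer(arr, stuff, items):
    """Append `items` elements of arr's type, read from the start of `stuff`."""
    raw = memoryview(stuff).cast("B")      # view the source as plain bytes
    nbytes = items * arr.itemsize
    if items < 0 or nbytes > len(raw):
        raise ValueError("buffer does not hold `items` elements")
    arr.frombytes(raw[:nbytes])            # one bulk copy, no per-item objects
    return 0                               # mirrors the proposed int return

src = array("d", [1.5, 2.5, 3.5])
dst = array("d", [0.5])
extend_from_buffer(dst, src, 2)
print(dst.tolist())
```

The point of passing an item count instead of a byte count is that the caller never needs to know the element size; a C version would presumably take it from the array's type code.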

@RamchandraApte RamchandraApte mannequin added the stdlib Python modules in the Lib dir label Nov 3, 2012
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022