We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil [1].
numpy-api-bench is a small Python package that compares the speed of NumPy's Python and C APIs and also serves as an example project for writing C extension modules that use the NumPy C API [2].
Linux, macOS, and Windows binary wheels have been built from source on GitHub Actions runners using the excellent cibuildwheel tool, which eases the process of building binary wheels from compiled code for different platforms. cibuildwheel especially helps with building manylinux wheels.
To build locally, you will need numpy>=1.19 and the latest setuptools [3] installed. Your C compiler should be appropriate for your platform, e.g. GCC for Linux or MSVC for Windows, but let setuptools do the work.
First, use git clone or download and unzip the archive to get the repo source code, then install the requirements with [4]
pip3 install -r install_requires.txt
After you cd into the repository root, you can build the C extensions in-place and install the package files with
make inplace && pip3 install .
If you don't have or don't wish to use make, you may instead use
python3 setup.py build_ext --inplace && pip3 install .
64-bit Python 3.6-3.9 binary wheels for Windows, macOS, manylinux1, and manylinux2010 can be installed from PyPI, with 32-bit wheels for Windows (x86) and Linux (i686) also available. Install with
pip3 install numpy-api-bench
The numpy-api-bench package contains a pure Python module and several C extension modules. The pure Python module is npapibench.pyimpl, which contains a single function that centers a numpy.ndarray and scales it to unit variance, implemented in only one line of numpy-enabled Python code. It is the "benchmark" for the C extension module npapibench.cimpl, which implements a near-identical function using the NumPy C API. The other C extension modules are part of the npapibench.functimer subpackage, which provides a callable API for timing the execution of a function with optional arguments in a timeit-like fashion [5].
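To make the "center and scale to unit variance" operation concrete, here is a minimal sketch of what such a one-line function could look like. The function name matches the stdscale name reported by the benchmark, but the signature and the ddof parameter are assumptions for illustration, not taken from the package source.

```python
import numpy as np


def stdscale(ar, ddof=0):
    """Center ar to zero mean and scale it to unit variance.

    Hypothetical stand-in for npapibench.pyimpl.stdscale; the ddof
    parameter is an assumption and may not match the real signature.
    """
    # the whole computation fits on one line of NumPy-enabled Python
    return (ar - ar.mean()) / ar.std(ddof=ddof)


# small random array with nonzero mean and non-unit variance
rng = np.random.default_rng(7)
ar = rng.normal(loc=3.0, scale=2.0, size=(4, 5, 10))
scaled = stdscale(ar)
```

With ddof=0, scaled has the same shape as ar, a mean of zero, and a standard deviation of one, up to floating-point error.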
On installation, setuptools will also create an entry point titled npapibench to access the benchmarking code. Just typing the name of the entry point in the terminal should produce the timeit-like output
numpy.ndarray shape (40, 5, 10, 10, 20, 5), size 2000000
pyimpl.stdscale -- 10 loops, best of 5: 31.9 msec per loop
cimpl.stdscale -- 50 loops, best of 5: 13.6 msec per loop
For usage details, try npapibench --help.
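For readers unfamiliar with how "N loops, best of 5" figures are produced, the following is a rough sketch built on the standard library's timeit.Timer. The helper names time_function and format_result are hypothetical and are not the npapibench.functimer API. The real subpackage is written in C, whereas this sketch pays Python-level call overhead on every loop.

```python
import timeit


def time_function(func, args=(), kwargs=None, number=None, repeat=5):
    """Return (number, best per-loop seconds) for func(*args, **kwargs).

    Hypothetical helper for illustration; not the npapibench.functimer API.
    """
    kwargs = {} if kwargs is None else kwargs
    timer = timeit.Timer(lambda: func(*args, **kwargs))
    if number is None:
        # let timeit pick a loop count so that one trial takes >= 0.2 s
        number, _ = timer.autorange()
    # each trial calls func number times; keep the fastest trial
    trials = timer.repeat(repeat=repeat, number=number)
    return number, min(trials) / number


def format_result(name, number, repeat, best):
    """Format one result line in the style of the benchmark output."""
    return (
        f"{name} -- {number} loops, best of {repeat}: "
        f"{best * 1e3:.3g} msec per loop"
    )


# time sorting a reversed list, with a fixed loop count to keep the run short
number, best = time_function(sorted, args=(list(range(10000, 0, -1)),), number=50)
line = format_result("sorted", number, 5, best)
print(line)
```

Taking the minimum over the trials, rather than the mean, follows the timeit convention: slower trials are usually caused by interference from other processes, not by the function being timed.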
The unit testing requirements for a C extension module are somewhat unusual. Although one is writing C code, the resulting shared object built by setuptools is loaded by the Python interpreter, so it is easier to test Python-accessible functions by using Python unit testing tools. However, the C extension module, which by convention is a single file with all members static except the module initialization function, is likely to contain some internal functions that cannot be accessed directly from Python. So far, there does not seem to be a widely accepted approach to unit testing code in Python C extensions, especially these internal C functions.
For this project, I wrote Python wrappers for the internal functions I wanted to test and placed them in separate C extension modules. The internal functions are exposed to those modules through a C API, using the header file and PyCapsule method described in the official tutorial on writing Python C extensions. I then wrote unit tests in Python using the pytest API and simply invoked pytest to collect and run all unit tests, as it produces far better unit test output than most C unit testing frameworks and is aware of Python objects. If there was a segmentation fault or a need to debug more closely, I would then invoke gdb on the Python interpreter running pytest [6] with
gdb --args python3 -m pytest
Together, pytest and gdb allowed me to hammer out a significant number of bugs.
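As an illustration of the testing style described above, here is a sketch of two pytest-style tests for a stdscale function. The import path npapibench.cimpl.stdscale is inferred from the benchmark output and its call signature is assumed. If the package is not installed, the sketch falls back to a hypothetical NumPy stand-in so that it still runs. The test names and checks are illustrative and are not copied from the package's test suite.

```python
import numpy as np

try:
    # assumed import path, inferred from the name in the benchmark output
    from npapibench.cimpl import stdscale
except ImportError:
    def stdscale(ar, ddof=0):
        # hypothetical stand-in used only when the package is not installed
        return (ar - ar.mean()) / ar.std(ddof=ddof)


def test_stdscale_preserves_shape():
    # pytest collects functions named test_* and reports failed asserts
    ar = np.arange(24, dtype=float).reshape(2, 3, 4)
    assert stdscale(ar).shape == ar.shape


def test_stdscale_zero_mean_unit_variance():
    ar = np.arange(24, dtype=float).reshape(2, 3, 4)
    out = stdscale(ar)
    # compare with tolerances, since the result is floating point
    np.testing.assert_allclose(out.mean(), 0.0, atol=1e-12)
    np.testing.assert_allclose(out.std(), 1.0)
```

Because the tests only call a Python-accessible function, the same file works whether the function under test is pure Python or a wrapper around C code.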
To run the unit tests in the package, pytest>=6.0.1 must be installed. If installing the wheel from PyPI, you can install pytest as an optional dependency alongside the package code with
pip3 install numpy-api-bench[tests]
The unit tests are located in npapibench.tests and npapibench.functimer.tests and can be run with
pytest --pyargs npapibench.tests && pytest --pyargs npapibench.functimer.tests
Other desired flags can be passed to pytest before the --pyargs flag.
If building from source, follow the source build steps above but replace the final pip3 install . with
pip3 install .[tests]
The unit tests can then be run from the repository root by simply calling pytest.
[1] Attributed to Sir Tony Hoare, popularized by Donald Knuth.
[2] This package is a fork of the latest version of an old PyPI project of mine called c-npy-demo, with more concise and efficient code as well as a new name. c-npy-demo was written at a time when I was still struggling with the Python and NumPy C APIs and had no idea how to test them, but now that I am much more practiced with both APIs, I felt it right to use my new knowledge to rework my old code.
[3] setuptools has seen a lot of change, especially post PEP 517, but since C extension modules have to be built in this package, the legacy setup.py method of building distributions still has to be used. Note that the distutils.core.Extension class is present in setuptools as the setuptools.extension.Extension class.
[4] Only Linux users need worry about using pip3. Use pip for Windows and Mac.
[5] Previously, I had used timeit.main for its pretty output, but unlike the callable API provided by timeit, one cannot pass in a global symbol table to avoid repeated setup. Therefore, the numpy.ndarray allocated in the benchmarking code is allocated twice. I thus wrote npapibench.functimer, which provides timeit.main-like capabilities with a callable API intended for use with functions. It is written as a C extension module to reduce the timing measurement error resulting from timing n executions of a statement within a Python loop, which has a higher per-loop overhead than a C for loop.
[6] The pytest entry point is a Python script run by the interpreter, so gdb pytest does not work.