Import NumPy private C-API #26516

Closed
piotr-blaszyk-fyp opened this issue May 24, 2024 · 6 comments
Labels
33 - Question (Question about NumPy usage or development)
57 - Close? (Issues which may be closable unless discussion continued)


@piotr-blaszyk-fyp

Steps to reproduce:

Hi,

I’m working on a C++ project, in which I use the NumPy C-API. Only part of this API is public. I would like to import a function belonging to NumPy's private / internal API. How can I achieve that?

The standard way of importing the NumPy C-API in a project involves including its header files from
/usr/local/lib/python3.11/site-packages/numpy/core/include/numpy.
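As a side note, the include directory does not have to be hard-coded: NumPy's public `numpy.get_include()` reports it for whichever NumPy is installed. The sketch below only lists the public headers shipped with the package; the variable names are illustrative, and the exact path and header set will depend on the NumPy version.

```python
import os
import numpy as np

# Directory to pass to the compiler with -I; it contains a "numpy" subfolder.
include_dir = np.get_include()

# Public headers that an installed NumPy ships for C/C++ extensions.
public_headers = sorted(os.listdir(os.path.join(include_dir, "numpy")))

print(include_dir)
print("arrayobject.h" in public_headers)
```

Only headers that appear in this listing belong to the installed public C-API; files under `numpy/_core/src` in the source repository are not copied here.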

There’s no shared library object that I need to link my executable to.

Hence, I would imagine it necessary to modify NumPy’s source code, build from source and then install using pip. Then the newly exposed header file would appear inside
/usr/local/lib/python3.11/site-packages/numpy/core/include/numpy. (Conversely, including the header files directly from the numpy source repo, i.e. not from the installed python numpy package, throws an error that tells me not to include directly from source repo.)

So I have a high-level idea but I’m not sure about the implementation details of this plan.

Assume I want to import the function

NPY_NO_EXPORT PyObject *
array_richcompare(PyArrayObject *self, PyObject *other, int cmp_op)

from
/path-to-numpy-repo/numpy/_core/src/multiarray/arrayobject.h

What would be the steps necessary to expose / import just array_richcompare?

What would be the steps necessary to expose / import all functions defined in arrayobject.h?

What would be the steps necessary to expose / import all header files in the NumPy repo?

Thanks

What I've tried so far

I removed all NPY_NO_EXPORT modifiers from the repo. Also, I removed all instances of the following clause

#ifndef _MULTIARRAYMODULE
#error You should not include this
#endif

, e.g. from the following file
/path-to-numpy-repo/numpy/_core/src/multiarray/arrayobject.h.
Then I built NumPy using the official instructions, i.e. I ran
pip install ..

The newly generated numpy python package include directory still contained the same header files as before (no new ones).

When I then ran my C++ executable, there was some error. I can't recall precisely what it was but most likely it was a compiler error complaining that the array_richcompare symbol from the private API is undefined.

Error message:

No response

Additional information:

No response

@piotr-blaszyk-fyp piotr-blaszyk-fyp added the 32 - Installation Problems installing or compiling NumPy label May 24, 2024
@seberg
Member

seberg commented May 24, 2024

You would have to properly link against NumPy and AFAIK that is only possible if you bundle your own copy of NumPy and don't interact with any normally installed NumPy.

So it doesn't make sense for, say, a Python module or package, unless you explicitly add the functionality to the public API.

Before diving further, you should probably reconsider this path and/or explain why you are interested in it. NumPy arrays are Python objects and not lightweight; I would be surprised if you even saw a measurable speed improvement from calling array_richcompare directly.
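For context, a sketch of why the private symbol may not be needed at all: as far as I understand, array_richcompare is what NumPy installs as the rich-comparison slot of the ndarray type, so the ordinary `==` operator (and, from C, the public CPython call `PyObject_RichCompare`) should already route through it. The Python snippet below only illustrates that the public spellings agree; the array values are made up.

```python
import operator
import numpy as np

a = np.arange(6)
b = np.array([0, 1, 9, 3, 9, 5])

# Three public ways to request an element-wise equality comparison.
via_operator = a == b            # goes through the type's rich-comparison slot
via_function = operator.eq(a, b)  # same slot, spelled as a function call
via_ufunc = np.equal(a, b)        # the ufunc that does the element-wise work

print(via_operator)
print(np.array_equal(via_operator, via_ufunc))
```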

@seberg seberg added 33 - Question Question about NumPy usage or development and removed 32 - Installation Problems installing or compiling NumPy labels May 24, 2024
@piotr-blaszyk-fyp
Author

piotr-blaszyk-fyp commented May 24, 2024

Thanks for the swift reply!

I’m creating a database query execution engine that uses NumPy. The idea is that the user can do both relational algebra processing (using other engines within the database) and linear algebra / mathematical processing using NumPy. The benefit is that the user doesn’t need to copy the data out of the database for processing, e.g. to a separate Python shell / interpreter.

I know that I can also start the Python interpreter inside a C++ executable and use PyRun_SimpleString to execute arbitrary Python code (including calls to NumPy). This method supports passing NumPy arrays from C++ into Python and back to C++ without copying any data. I was hoping that using the NumPy C-API instead would give higher performance.

Would you say that using numpy inside PyRun_SimpleString is just as fast as calling the NumPy C-API functions?
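A minimal sketch of what "without copying" can look like on the Python side, assuming the engine hands over its column as an object that supports the buffer protocol. Here a standard-library `array.array` is a hypothetical stand-in for engine-owned memory, and `np.frombuffer` wraps it rather than copying it.

```python
import array
import numpy as np

# Stand-in for memory owned by the database engine (an assumption for this sketch).
engine_column = array.array("d", [1.0, 2.0, 3.0, 4.0])

# Wrap the existing memory as a NumPy array; no data is copied.
view = np.frombuffer(engine_column, dtype=np.float64)

# An in-place NumPy operation writes straight into the engine's buffer.
view *= 2.0

print(engine_column.tolist())
```

Because the array is only a view, the owner of the memory has to keep it alive for as long as the NumPy array is in use.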

@ngoldbaum
Member

> Would you say that using numpy inside PyRun_SimpleString is just as fast as calling the NumPy C-API functions?

I would expect ~50% speedups at best, similar to compiling with cython: https://cython.readthedocs.io/en/latest/src/tutorial/pure.html
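One way to check this for a given workload is to time the comparison at several array sizes. The sketch below is an assumption-laden micro-benchmark, not a measurement from this thread: the helper name, sizes, and repeat counts are arbitrary. The idea is that any saving from bypassing the Python layer is a fixed per-call cost, which matters for tiny arrays but is likely to be swamped by the element-wise loop for large ones.

```python
import timeit
import numpy as np

def seconds_per_compare(n, repeats=200):
    """Average wall-clock seconds for one `a == b` on int arrays of length n."""
    a = np.arange(n)
    b = a.copy()
    total = timeit.timeit(lambda: a == b, number=repeats)
    return total / repeats

# Tiny array: dominated by per-call overhead.
small = seconds_per_compare(10)
# Large array: dominated by the element-wise loop.
large = seconds_per_compare(1_000_000, repeats=20)

print(f"n=10:        {small:.2e} s per call")
print(f"n=1,000,000: {large:.2e} s per call")
```

The numbers depend heavily on the machine and NumPy build, so they are best read as a ratio rather than as absolute timings.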

@piotr-blaszyk-fyp
Author

Oh, cython is an interesting choice - didn’t consider this route before.

To sum up I can use

  1. Python interpreter
  2. Cython
  3. NumPy C-API

Given that I care about performance, I would like to use either Cython or the NumPy C-API. Writing plain C / C++ code, though, seems easier than importing a compiled cython module inside C++.

I am fine with building my own NumPy from source (have already done this using the in-place build) as my project is a research one rather than something that's meant to be distributed to end users.

Could you please tell me the steps needed to build NumPy from source so that all private header files and functions get exposed?

@ngoldbaum
Member

> Could you please tell me the steps needed to build NumPy from source so that all private header files and functions get exposed?

I think you're going to have to figure all this out yourself; this is definitely not a use case we support.

Also check out numba, pythran, or transonic.

@ngoldbaum ngoldbaum added the 57 - Close? Issues which may be closable unless discussion continued label May 24, 2024
@seberg
Member

seberg commented May 25, 2024

To reiterate: if you build a modified NumPy, you are on your own and ignoring all recommendations for interacting with NumPy.
I could also say: don't do it unless you know for a fact that you must do it (which still leaves you on your own).

> To sum up I can use

There are many tools, as Nathan also suggested. Additionally, there are nanobind and pybind11; nanobind has no explicit NumPy support, but that usually isn't needed, and it may support the buffer protocol (I'm not sure). No-copy interaction is simple and exceedingly common, in many variations.
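As a small illustration of the buffer protocol mentioned above, the sketch below uses Python's built-in `memoryview` as a generic consumer of an ndarray's memory. It is only meant to show that the exchange involves no copy; a binding library or a C++ host would use the same protocol through its own interface.

```python
import numpy as np

arr = np.arange(4, dtype=np.float64)

# Any buffer-protocol consumer can view the array's memory directly.
mv = memoryview(arr)

# Writing through the exported buffer changes the original array.
mv[0] = 10.0

# Wrapping the buffer again gives another view of the same memory.
roundtrip = np.asarray(mv)

print(arr[0], mv.format, mv.shape)
print(np.shares_memory(arr, roundtrip))
```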

@seberg seberg closed this as not planned May 25, 2024