Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Undefined symbol when compiling shared library for numpy 2.0 #26091

Open
duburcqa opened this issue Mar 20, 2024 · 31 comments
Open

BUG: Undefined symbol when compiling shared library for numpy 2.0 #26091

duburcqa opened this issue Mar 20, 2024 · 31 comments
Labels

Comments

@duburcqa
Copy link

duburcqa commented Mar 20, 2024

Describe the issue:

I am getting some undefined symbol error when compiling a shared library (C Python Module) that is linking against yet another shared library (C Python Module), both compiled against Numpy 2.0. Everything was fine before the update of Numpy to 2.0.0b1.

Does anyone have any idea regarding what is causing this issue?

Reproduce the code example:

I don't have a MRE. I'm running the following Github Action pipeline

I could write one if really necessary. Basically I compile the library Eigenpy and then I linked my own module on it. It is the latter that contains undefined symbols.

Error message:

ImportError: /home/runner/.local/lib/python3.10/site-packages/jiminy_py/core/core.cpython-310-x86_64-linux-gnu.so: undefined symbol: EIGENPY_ARRAY_APIPyArray_RUNTIME_VERSION

Python and NumPy Versions:

Python 3.10, Numpy 2.0.0b1

Runtime Environment:

No response

Context for the issue:

No response

@charris
Copy link
Member

charris commented Mar 20, 2024

Everything was fine before the update.

What update?

@duburcqa
Copy link
Author

Before the migration to Numpy 2.0.0b1.

@ngoldbaum

This comment was marked as outdated.

@duburcqa
Copy link
Author

duburcqa commented Mar 20, 2024

It's hard to say without more info and spending a lot of time to understand the build of your project

Yes, that is what I was afraid of... I was hoping it could be useful for someone involved in the development of numpy 2.0.

using a build of eigen that was compiled against numpy 2.0.

I'm building everything myself from source, so yes I'm sure I'm using the exact same numpy version (2.0.0b1).

@duburcqa
Copy link
Author

What I don't understand is that PyArray_RUNTIME_VERSION should not be a symbol in the first place no ? By looking at the source code it looks like a preprocessor macro.

@ngoldbaum

This comment was marked as outdated.

@duburcqa
Copy link
Author

OK I think the difference is here:

Numpy 2.0.0b1

#if defined(NO_IMPORT) || defined(NO_IMPORT_ARRAY)
extern void **PyArray_API;
extern int PyArray_RUNTIME_VERSION;
#else
#if defined(PY_ARRAY_UNIQUE_SYMBOL)
void **PyArray_API;
int PyArray_RUNTIME_VERSION;
#else
static void **PyArray_API = NULL;
static int PyArray_RUNTIME_VERSION = 0;
#endif
#endif

Numpy 1.26.4

#if defined(PY_ARRAY_UNIQUE_SYMBOL)
#define PyArray_API PY_ARRAY_UNIQUE_SYMBOL
#endif

#if defined(NO_IMPORT) || defined(NO_IMPORT_ARRAY)
extern void **PyArray_API;
#else
#if defined(PY_ARRAY_UNIQUE_SYMBOL)
void **PyArray_API;
#else
static void **PyArray_API=NULL;
#endif
#endif

@duburcqa
Copy link
Author

duburcqa commented Mar 20, 2024

Here Is what Eigenpy is doing:

#ifndef PY_ARRAY_UNIQUE_SYMBOL
#define PY_ARRAY_UNIQUE_SYMBOL EIGENPY_ARRAY_API
#endif

#include <numpy/numpyconfig.h>
#ifdef NPY_1_8_API_VERSION
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#endif

#include <numpy/ndarrayobject.h>
#include <numpy/ufuncobject.h>

Does it make sense to you or is it hard to tell ?

@seberg
Copy link
Member

seberg commented Mar 20, 2024

The point is that you use Eigens(?) PY_ARRAY_UNIQUE_SYMBOL together with NPY_NO_IMPORT from a different compilation unit and Eigen has not been recompiled with NumPy 2.

If that is a serious use-case I have to think a bit how to deal with it.

@ngoldbaum
Copy link
Member

So eigenpy is defining PY_ARRAY_UNIQUE_SYMBOL to be EIGENPY_ARRAY_API, then it #include's the numpy headers. __multiarray_api.h has this block:

#if defined(PY_ARRAY_UNIQUE_SYMBOL)
    #define PyArray_API PY_ARRAY_UNIQUE_SYMBOL
    #define _NPY_VERSION_CONCAT_HELPER2(x, y) x ## y
    #define _NPY_VERSION_CONCAT_HELPER(arg) \
        _NPY_VERSION_CONCAT_HELPER2(arg, PyArray_RUNTIME_VERSION)
    #define PyArray_RUNTIME_VERSION \
        _NPY_VERSION_CONCAT_HELPER(PY_ARRAY_UNIQUE_SYMBOL)
#endif

which presumably your build hits.

I'm not sure offhand if we test this sort of build, ping @seberg since he did this work on adding PyArray_RUNTIME_VERSION

@duburcqa
Copy link
Author

duburcqa commented Mar 20, 2024

The point is that you use Eigens(?) PY_ARRAY_UNIQUE_SYMBOL together with NPY_NO_IMPORT from a different compilation unit

It is indeed defining NO_IMPORT_ARRAY here:

#define NO_IMPORT_ARRAY
#include "eigenpy/numpy.hpp"
#undef NO_IMPORT_ARRAY

and Eigen has not been recompiled with NumPy 2.

Eigenpy has been recompiled with NumPy 2.

If that is a serious use-case I have to think a bit how to deal with it.

It is hard to tell for sure if this is intended or not... But using Eigenpy to expose Eigen for other libraries is definitely a serious usecase. The library is used by pinocchio, hpp-fcl, crocoddyl... and many others INRIA projects.

@duburcqa
Copy link
Author

I think this pattern is coming from Boost::Python which introduces it here. Both Eigenpy and my own library are using Boost::Python over pybind11 to generate bindings for historical reasons.

@seberg
Copy link
Member

seberg commented Mar 20, 2024

Yeah, but boost python is header only. While here it seems this isn't a single compilation unit. At least that is my only explanation.

The only way this makes sense to me is that for some reason you are finding an older eigenpy version at runtime. It is not even clear to me that it makes any sense at all for eigenpy's numpy.hpp to be public API at all.

So, I am not clear whether we need a solution that always works, a work-around for these odder cases where it is necessary for some reason, or nothing at all and some part of the project just has not been recompiled with NumPy 2 (e.g. compilation cache).

@duburcqa
Copy link
Author

The only way this makes sense to me is that for some reason you are finding an older eigenpy version at runtime

Indeed ! This is it... I'm explicitly doing this and I forgot about it... Sorry about that.

@ngoldbaum
Copy link
Member

Closing then, let us know in a new issue if you have any other issues handling numpy 2.0.

@ngoldbaum ngoldbaum closed this as not planned Won't fix, can't repro, duplicate, stale Mar 20, 2024
@duburcqa
Copy link
Author

It is not even clear to me that it makes any sense at all for eigenpy's numpy.hpp to be public API at all.

Good point. Is it difficult to make it private API ?

Maybe I should not support this scenario in the first place, but I would love to be able to let the user use its own version of eigenpy at runtime without causing such undefined symbol error. It was working fine so far.

@duburcqa
Copy link
Author

Closing then, let us know in a new issue if you have any other issues handling numpy 2.0.

You want me to open a new issue to continue this discussion ? If you guys have any advice about how this scenario should be handled.

@ngoldbaum
Copy link
Member

ngoldbaum commented Mar 20, 2024

Ah sorry for closing a little hastily. I thought you didn't want to do this.

So I think what will work going forward is to build against numpy 2.0, and then use those builds to support numpy 1.x and numpy 2.0. The numpy ABI is now backward compatible - we can expose older ABIs if you compile against newer numpy. It's not possible for the other way to work, so I don't think what you're trying to do will work with numpy 2.0. Maybe sebastian can comment further.

@ngoldbaum ngoldbaum reopened this Mar 20, 2024
@duburcqa
Copy link
Author

Yes, this is what I am currently doing, with great success actually ! What does not work is:

  • I build eigenpy against numpy 2
  • I build my library against numpy 2, and linked with eigenpy too
  • At runtime, the user decides to use its own binaries of eigenpy (same version tag obviously) and I want to be able to use its symbol instead of loading my own instance of the shared library concurrently

This scenario makes sense in the context of Boost::Python actually, the "global" registry of from-to python converter works requires to have a unique instance of boost python shared liabry to be loaded at the same time, otherwise inter-operability between modules would be broken.

@seberg
Copy link
Member

seberg commented Mar 20, 2024

Good point. Is it difficult to make it private API ?

I don't know? Within the private API, you can still include ndarraytypes.h which should give you most things you need (in case you need part of the NumPy stuff to be public).
Late EDIT: Actually that part isn't quite true :(, exporting any NumPy symbols has issues at last in theory since we everyone to use the headers you used.

Most of the stuff in the numpy.hpp seems for internal use (e.g. to register new dtypes), but it also is maybe intentional to make call_PyArray_SimpleNew public?! But it is a static inline function so it requires using the NumPy-API via eigenpy if used downstream?

Maybe I should not support this scenario in the first place

I don't know. It seems to me that leaking NumPy API out is bad (i.e. any include of ndarrayobject.h should be confined to a private header). But I am not surprised that this happens.

We can easily work around this, by just defining PyArray_RUNTIME_VERSION to something else (maybe slightly slower).

But a caveat remains: The caller of import_array promises that every user is NumPy 2 ready (and we assume that any user of NumPy headers calls import_array at some point at least to some degree).

That is a potential issue, but I guess one we just have to live with, and honestly, one that I suspect nobody is likely to run into in practice.

@duburcqa
Copy link
Author

duburcqa commented Mar 20, 2024

Thank you for your very insightful rely! Now I understand better how to get around the issue. I will try to propose a MR for eigenpy that makes it private and see if it helps.

That is a potential issue, but I guess one we just have to live with, and honestly, one that I suspect nobody is likely to run into in practice.

I'm not 100% sure if this means that my usecase is not supported, but i can assure you that the problem I'm running into is real. My library is one among many other relying on Eigenpy. Mine is the only one related to INRIA that got some traction. From this perspective, i want to be able to ensure inter-operability with all of them without requiring the user to build my library from source while all the others are already distributed as pre-compiled binaries. It is easy for me to add as requirement a specific eigenpy version, but I have no way to enforce the version of numpy used to compile it. This means that I have to encourage all the maintainers of all these projects to distributed wheels compiled against numpy 2.0 if I want to be able to do the same with mine. That is going to be challenging... i would say a "dirty" workaround would be fine for me.

@seberg
Copy link
Member

seberg commented Mar 20, 2024

Well we'll need to find a solution whether in NumPy or not, but it is still unclear where/what that is.

I think that if you just add a n#include "numpy/ndarrayobject.h" before the other includes things will magically work, since you are probably not using the NumPy API (directly) in a way that requires an explicit import (i.e. you use it through a boost::python::numpy but that seems its own compilation unit).

(I.e. you would get a static define for these symbols, and would be expected to call import_array(), but since you don't actually need that API at all, things are fine.)

I still suspect eigenpy shouldn't transitively expose the NumPy API (which it does by setting unique-symbol and including ndarrayobject.h for you). And I also don't think it needs to at all.
But, while I think eigenpy doesn't really need to, I am not sure that no-one else has a better reason.

(I wish I there was a better way to do this unique symbol dance.)

@duburcqa
Copy link
Author

i.e. you use it through a boost::python::numpy but that seems its own compilation unit

Unfortunately, this is a wrong assumption. Both Eigenpy and my own library are using barebone C-API of Numpy because not everything has been exposed in boost::python::numpy, typically when it comes to custom scalar type registration or memory alignment handling.

@seberg
Copy link
Member

seberg commented Mar 21, 2024

Both Eigenpy and my own library are using barebone C-API

Let me summarize the situation here a bit:

  • NumPy is set up to allow a project to have multiple c/cpp files which share one API table with a single import_array() call. This is the "Unique Symbol + No Import" mechanism.
    • This is really mainly meant for multiple C-files which are bundled into a single .so I think. Not really for sharing across library lines (but nothing prevents that).
  • eigenpy sets the unique symbol, which means it shares it's API table with any project that includes it (i.e. at runtime you must find the eigenpy .so).
    • N.B.: The point of the NumPy setup is exactly to avoid any direct runtime symbol lookup, the API is instead shared through Python API.

This means that:

  • eigenpy transitively exposes NumPy C-API to user (although I doubt they need this, i.e. users could include numpy themselves). However, eigenpy does this even if users do not need any NumPy API.
  • In your situation, you do need the NumPy API, and you could take care of the ndarrayobject.h and import_array() call yourself in principle. In fact, a (maybe strange) approach would be to:
    • Add #include "eigenpy/numpy.hpp" before other includes that might pull in NumPy.
    • Add if (_import_array() < 0) { throw ... } somewhere in the module init (where you call eigenpy::enableEigenPy())
      That will define the symbol locally to your library/project, which, while using the same name, should overshadow those from eigenpy, I think.

"Simple" Solution: We can easily redefine PyArray_RUNTIME_VERSION to be one of:

// First could be 0 in some cases, but its fine 0 means "old", second is a function call.
#define PyArray_RUNTIME_VERSION ((int)PyArray_API[1])
#define PyArray_RUNTIME_VERSION PyArray_GetNDArrayCVersion()

both of which should be fine in practice.

Problem:

  • Due to this transitive export you never call import_array() from your library which means you have no compatibility checks:
    • If your library is compiled with NumPy 1.x it might not notice it is running in NumPy 2 and just break/return wrong things.
    • You cannot really use new NumPy API because you never do the runtime check if that API is actually available.
  • Unfortunately, I guess the NumPy headers make it somewhat easy to do this even unintended.

So we can change the above. Things will work, but you lose those checks and balances. I suspect we can live with it, but...


... What bothers me most, is not what we do right now, but that I would like a way forward so that eigenpy stops transitively exporting the NumPy API, ideally without losing anything.

For example, a pattern of #define DEFINE_EIGENPY_ENABLE which must be added to be able to call eigenpy::enableEigenPy(); and making eigenpy::enableEigenPy a static inline entry-point which calls import_array()/import_numpy() as static inline would work. But in practice, I am not sure if some libraries don't even get away without it entirely, so not sure how well just overshadowing works.

@mattip
Copy link
Member

mattip commented Mar 21, 2024

Maybe we should get some eigenpy people in on this: @jcarpent or @jorisv?

@duburcqa
Copy link
Author

Thank you @seberg , all these details are helping a lot. I will wait for a feedback from eigenpy team before deciding going for one direction or the other.

@seberg
Copy link
Member

seberg commented Mar 21, 2024

One new observation: Windows defaults to private symbols for shared libraries. Thus, it seems like your code is only working because you force static linking of eigenpy on windows.
And I guess there is a good chance the only reason you do this is the NumPy API use!?

@duburcqa
Copy link
Author

Windows defaults to private symbols for shared libraries. Thus, it seems like your code is only working because you force static linking of eigenpy on windows.

Indeed I do, but only for historically reason (it was easier at the beginning to ensure that the C module can be loaded on any machine if it has no external dependency at runtime). Now, I'm "repairing" the generated wheel using machomachomangler (much like what auditwheel does on Linux), so it is not an issue anymore. But since it was working this way, I never tried to go back to shared library on Windows. Especially because the scenario I mentioned before only makes sense on Unix, since everybody that is using Eigenpy for serious research and business is using MacOS or Linux but never Windows, so inter-operability is much less revenant in this case.

@seberg
Copy link
Member

seberg commented Mar 21, 2024

(sorry for the long post...)

but only for historically reason

No, this is the reason for this old PR stack-of-tasks/eigenpy@7a57c19, but that isn't enough at all, since eigenpy makes all of it's internals public and they randomly use the NumPy API. You must link eigenpy statically on windows unless you either:

  • deal with NumPy includes before eigenpy does so.
  • Or just happen to not use any API which might need NumPy.

I also do wonder how much you actually care about this behavior: The wheels you package must include eigenpy so presumably you can make sure it is compiled with the newest NumPy. conda-forge, etc. will also be able to just handle those dependencies, fine I presume.


For NumPy, I think we ideally "break" eigenpy here. That is, default to the windows behavior and hide the import away (up to you to undo that). That doesn't make existing usage safer (it could crash in weird ways if not recompiled for NumPy 2).

Now, since compilers optimize things away that are unused, I could imagine things are actually somewhat fine after that, although it would be nice if eigenpy could at least only include ndarraytypes.h rather than (nd)arrayobject.h if imported as public API (and remove the #define Py_ARRAY_UNIQUE_SYMBOL and #define NPY_NO_IMPORT).
I don't think that works for all files, but maybe for all files which get included by the fwd.hpp.

That will mean that some eigenpy features will fail to link/compile. Most likely nobody will notice that part (you and pinoccio seem to only really call a few initialize functions from it.

Of course jiminy-py uses the NumPy API! So it'll have to implement the import_array call and the unique-symbol, etc. which is currently implicitly done by eigenpy.
If you add that to jiminy-py it will just work perfectly with any eigenpy version.


It might be nice to simplify the include pattern a bit at least for C++, I can imagine that would be easy (i.e. a bit similar to what pybind11 is doing).

I also wish we had a numpy/stubs.h header so that a project like eigenpy could define its function with struct PyArray_Descr *func(struct PyArrayObject *arr) without including the full ndarraytypes.h header.

@duburcqa
Copy link
Author

duburcqa commented Mar 21, 2024

You must link eigenpy statically on windows

Yes, I agree with you. What I meant is that I just never tried to link dynamically because I wanted to link statically from the start to improve portability. I never had in mind to link dynamically. But yes for sure, it would not work anyway.

@seberg
Copy link
Member

seberg commented Mar 21, 2024

I am considering escalating this a bit for you (because I think it clarifies things and is generally better), see gh-26103.
I.e. breaking the linking always, unless you add a define. I am not sure that is the right thing, but this whole windows oddity about windows needing static linking anyway makes me feel this is all very brittle.

But I am not sure... I am also fine with just "helping" you here (could be tied into that same define also). Although I am not even sure that the failing use-case is important (dynamically linking with an eigenpy that was compiled with an older NumPy version than your own library).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants