
Whitelisting of libraries shipped by other wheels #76

Closed
xhochy opened this issue Jun 28, 2017 · 10 comments · Fixed by #368

Comments

xhochy commented Jun 28, 2017

In turbodbc we link against pyarrow in such a fashion that turbodbc picks up the Arrow libraries via a relative RPATH in the same virtualenv. Thus we don't need to include them in the manylinux1 wheel of Turbodbc. With auditwheel's current behaviour they are sadly grafted into turbodbc's wheel. To fix this, two options come to my mind:

  • whitelist all libraries that are contained in the virtualenv
  • explicitly whitelist some libraries via a command-line option

I'm happy to implement one of these approaches but I would like to hear what the preferred one would be.
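
(For context, the relative-RPATH arrangement described above might look roughly like the following setup.py fragment. This is a minimal sketch with hypothetical module and library names, not turbodbc's actual build script.)

```python
# Sketch: link an extension against pyarrow's shared libraries and embed
# a relative RPATH so the dynamic linker finds them at runtime inside
# the same site-packages tree. Module and source names are hypothetical.
import pyarrow
from setuptools import Extension, setup

ext = Extension(
    "turbodbc_arrow_support",
    sources=["turbodbc_arrow_support.cpp"],
    include_dirs=[pyarrow.get_include()],
    library_dirs=pyarrow.get_library_dirs(),
    libraries=["arrow", "arrow_python"],
    # $ORIGIN expands to the directory of the loaded .so at runtime;
    # "../pyarrow" assumes both packages sit side by side in site-packages.
    extra_link_args=["-Wl,-rpath,$ORIGIN/../pyarrow"],
)

setup(name="turbodbc-arrow-sketch", ext_modules=[ext])
```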

@MathMagique

I would like to point out that turbodbc consists of several libraries. Only a supporting library links against pyarrow. If pyarrow is not present in the environment, turbodbc will still work, but with the features that require pyarrow disabled. The user is free to install pyarrow at a later time to enable this feature set.

Thus, it is not desirable to force all users of turbodbc to have pyarrow installed.
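
(A minimal sketch of that optional-dependency pattern at the Python level; all names here are illustrative, not turbodbc's actual layout.)

```python
# Sketch of an optional pyarrow-backed feature: the supporting extension
# is only imported if pyarrow is available; otherwise the feature is off.
try:
    import turbodbc_arrow_support  # hypothetical extension linked to libarrow
    _HAVE_ARROW = True
except ImportError:
    _HAVE_ARROW = False

def fetchallarrow(cursor):
    if not _HAVE_ARROW:
        raise NotImplementedError(
            "pyarrow is not installed; install it to enable this feature"
        )
    return turbodbc_arrow_support.fetch_table(cursor)  # hypothetical helper
```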

ogrisel commented Jun 28, 2017

> whitelist all libraries that are contained in the virtualenv

I am afraid that this could have unintended consequences, although I cannot think of any at the moment.

Maybe we could have a command-line argument to enable this behavior explicitly but keep it disabled by default.

> explicitly whitelist some libraries via a command-line option

That sounds safe to do, as it would not impact the default behavior of auditwheel repair. When documenting the option, one should be careful to describe when it is OK to use.

njsmith commented Jun 28, 2017

I'm not a big fan of the use of RPATH like this, because it's fragile: virtualenvs are only one possible deployment layout for python packages. What if someone puts these two packages into different directories on PYTHONPATH? It's even possible to end up with a situation where you use the C library from one version of pyarrow together with the python code from a different version.

The general solution is something like my pynativelib proposal, but that's beyond the scope of this issue... I'm just bringing this up here to say that I don't think auditwheel should add hardcoded support for this (ab)use of RPATH specifically.

An explicit whitelist seems fine though.

rmcgibbo commented Jun 28, 2017

I don't think this is a good idea.

The big idea in the manylinux1 spec is that you can't ship shared libraries that have ELF-level dependencies on other .so files not included in the wheel (beyond the ones that all supported linux distros are guaranteed to have).

My understanding is that you explicitly don't want this. You want a shared library in the turbodbc wheel to have an ELF-level dependency on a .so that it doesn't ship (one that pyarrow ships). A wheel with such an ELF-level dependency doesn't satisfy PEP 513.

Obviously there's nothing to stop anyone from retagging non-manylinux1 wheels as "manylinux1". PyPI doesn't run auditwheel to verify the accuracy of the tag when you upload a wheel, it just reads whatever tag it finds in the wheel and trusts it.

But I'm -1 on adding features to auditwheel, in its capacity as the "PyPA-blessed manylinux1 retagging tool", whose explicit purpose is to create wheels that are incorrectly labeled as manylinux1 and don't conform to PEP 513. That's exactly what a whitelist would do. It seems like a big footgun to me.

@rmcgibbo

> I'm not a big fan of the use of RPATH like this, because it's fragile: virtualenvs are only one possible deployment layout for python packages. What if someone puts these two packages into different directories on PYTHONPATH?

Exactly. It's more work to implement, but one robust way to get the behavior that you want would be to remove the ELF-level dependency on the Arrow libraries. Instead of having them loaded by the dynamic linker, you could load them yourself using dlopen. This would allow you to resolve the paths at runtime and not depend on an assumed virtualenv directory layout.

njsmith commented Jun 28, 2017

There's a best of both worlds approach possible as well: you could keep the ELF-level dependency (so you don't need to rewrite all your code to use explicit dlopen/dladdr calls), but then before you import the extension that's linked to pyarrow, first figure out where the library is (e.g. by importing pyarrow and looking to see where the python loader found it), and then dlopen the library you want. When the ELF loader is searching for libraries, the first thing it does is to see if it's already got a library with the appropriate name loaded, and if so then it skips its normal searching logic and just uses that. So this effectively lets us replace the normal ELF search path with something based on PYTHONPATH and keep everything consistent.
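
(A minimal sketch of that trick; library and module names are hypothetical.)

```python
# Preload pyarrow's library under its DT_NEEDED name before importing
# the extension that links against it. The ELF loader sees that name is
# already loaded and reuses it instead of walking its search paths.
import ctypes
import os

import pyarrow  # locate the library via the Python import system

libdir = pyarrow.get_library_dirs()[0]
ctypes.CDLL(os.path.join(libdir, "libarrow.so"))  # SONAME is hypothetical

import turbodbc_arrow_support  # extension with DT_NEEDED: libarrow.so  # noqa: E402
```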

Basically the pynativelib idea is this + similar tricks for Windows and MacOS + some conventions to avoid accidental namespace collisions.

anntzer commented Sep 17, 2017

In fact you can just use ctypes for this; see e.g. https://github.com/anntzer/mpl_cairo/blob/master/mpl_cairo/__init__.py (let me know if the example is unclear).

The advantage of using ctypes is that you can write your C code as usual, without any dlopen() calls (which helps when trying to keep things cross-platform); you just don't link the shared object when building a manylinux wheel and instead rely on ctypes to load the needed symbols globally before the extension module is loaded.
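
(A minimal sketch of that ctypes variant, with hypothetical names: unlike the preload trick above, the extension is built without linking libarrow at all, and the symbols are supplied process-wide before it is imported.)

```python
# No DT_NEEDED entry on libarrow here: the extension was deliberately
# not linked against it. RTLD_GLOBAL puts libarrow's symbols into the
# global scope, where the dynamic linker can bind the extension's
# undefined symbols at import time.
import ctypes
import os

import pyarrow

libdir = pyarrow.get_library_dirs()[0]
ctypes.CDLL(os.path.join(libdir, "libarrow.so"), mode=ctypes.RTLD_GLOBAL)

import turbodbc_arrow_support  # noqa: E402
```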

brills commented Jul 18, 2019

Hi All,

In light of pypi/warehouse#5420, what is the recommended way to make a wheel that depends on shared libraries shipped by other wheels pass the audit without pulling those shared libraries in?

Per the suggestion by @njsmith:

> There's a best of both worlds approach possible as well: you could keep the ELF-level dependency (so you don't need to rewrite all your code to use explicit dlopen/dladdr calls), but then before you import the extension that's linked to pyarrow, first figure out where the library is (e.g. by importing pyarrow and looking to see where the python loader found it), and then dlopen the library you want. When the ELF loader is searching for libraries, the first thing it does is to see if it's already got a library with the appropriate name loaded, and if so then it skips its normal searching logic and just uses that. So this effectively lets us replace the normal ELF search path with something based on PYTHONPATH and keep everything consistent.

> Basically the pynativelib idea is this + similar tricks for Windows and MacOS + some conventions to avoid accidental namespace collisions.

If the ELF-level dependencies are kept (I assume that means the DT_NEEDED entries in the dynamic section), I don't see how auditwheel can be satisfied without recursively pulling in everything listed in DT_NEEDED.
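
(For reference, those entries are easy to inspect with pyelftools, which auditwheel itself builds on; a sketch with an illustrative file name.)

```python
# Print the DT_NEEDED entries of an extension module.
from elftools.elf.elffile import ELFFile

with open("turbodbc_arrow_support.so", "rb") as f:
    dynamic = ELFFile(f).get_section_by_name(".dynamic")
    for tag in dynamic.iter_tags():
        if tag.entry.d_tag == "DT_NEEDED":
            print(tag.needed)  # e.g. libarrow.so.X, libstdc++.so.6, ...
```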

The other suggested way, which I assume involves dlopen() with RTLD_GLOBAL, would pollute the global symbol lookup table, which seems dangerous (see also [1]).

[1] Ulrich Drepper, How To Write Shared Libraries, Section 1.5.4 "Lookup Scope", https://www.akkadia.org/drepper/dsohowto.pdf

njsmith commented Jul 18, 2019

@brills Correct, right now auditwheel doesn't have any support for pynativelib-style library dependencies. This hasn't caused any problems so far because several pieces need to be implemented to make such dependencies work, and no one has implemented the other pieces either. But if you want to change that, then yeah, extending auditwheel needs to be one of the steps on your to-do list.

My suggestion for the auditwheel part: come up with a unique naming convention for shared libraries that are using this approach – for example, call them pynativelib-<pypi name>-<version>.so – and then update auditwheel to notice when a DT_NEEDED matches pynativelib-* and skip vendoring those libraries.
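
(That rule would be simple to express; a hypothetical sketch, not auditwheel's actual code.)

```python
# Skip vendoring any DT_NEEDED entry that opts into the proposed
# pynativelib naming convention; vendor everything else as usual.
from fnmatch import fnmatch

def should_vendor(needed_soname: str) -> bool:
    return not fnmatch(needed_soname, "pynativelib-*")

assert should_vendor("libarrow.so.0")                  # vendored as usual
assert not should_vendor("pynativelib-arrow-0.14.so")  # left to its own wheel
```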

mattip commented Oct 9, 2022

I think a command line argument to auditwheel makes more sense than a known prefix. xref #368 for adding an --exclude argument.
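
With such a flag, repairing turbodbc's wheel while leaving the Arrow libraries to the pyarrow wheel would look roughly like `auditwheel repair --exclude libarrow.so dist/turbodbc-*.whl` (library name illustrative).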
