Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Handle python packages that have shared object files #55

Closed
wants to merge 3 commits into from

Conversation

jac-stripe
Copy link

@jac-stripe jac-stripe commented Nov 16, 2017

Some python packages have shared object files that are required to be extracted in order for the system to execute them. This patch extracts all packages that include shared objects to a tmp dir so that the so files can be run properly.

@jac-stripe
Copy link
Author

I managed to get rid of the deps. I think its better to have hack tests than to pollute the WORKSPACE with dependencies, but I left both commits as a comparison.

@duggelz
Copy link
Contributor

duggelz commented Nov 16, 2017

Thanks for this PR. This is very similar to what the internal Google implementation does.

However, I've already been working on a slightly different implementation, with the goal of integrating more closely with the standard setuptools and pkg_resources modules, and the standard dist-info-style metadata. You can see some of the groundwork in #52.

If the pkg_resource approach doesn't work for some reason, or to handle cases where pkg_resources is not available, we could still have something like this as a fallback. But I'm probably not going to merge this PR in this form, unfortunately.

@jmillikin-stripe
Copy link

Hello! I've been helping Josh write this, with the goal of getting a runfiles-like tree so we can par_binary things that use NumPy.

If the pkg_resource approach doesn't work for some reason, or to handle cases where pkg_resources is not available, we could still have something like this as a fallback. But I'm probably not going to merge this PR in this form, unfortunately.

pkg_resources works for data files, but it doesn't seem to support importing C extension modules. The zipimport docs are very clear that C extensions can't be imported directly from a zip. There's a few web pages that suggest pkg_resources is capable of working around this somehow (e.g. http://setuptools.readthedocs.io/en/latest/pkg_resources.html), but we've not been able to get that working even for non-subpar zips.

Do you have any advice on how to proceed here? It would be great if we could somehow extract only the .so files and then inject those into the standard import mechanisms, but we couldn't get NumPy to accept anything less than a full extraction of its subtree.

@duggelz
Copy link
Contributor

duggelz commented Nov 16, 2017

How badly are you blocked here? As a hack, you should be able to take this code and essentially stuff it into your main source file before importing numpy/scipy. I ask, because I'm going to be out of office for the next two weeks, so nothing's going to happen either way for a while.

pkg_resources ties into the various forms of Python metadata to, for example, add shared libraries to the "eager resources" list, and/or use the "zip safe" attribute. The result is the same, extracting to a temp directory (note that considerable additional complexity is added if you try to share extractions between multiple processes safely), but is a standard mechanism, instead of one more "strange Google thing". I need to do more investigation though.

This shared library problem has been a thorn in my side since 2002, so I'm loathe to go down the same path I've been down before. Inside Google, we patched libc to allow importing from shared libraries stored in zip archives, and some other hacks are used both inside Google and some other companies I know of, including various kernel patches. As an additional hurdle, people quite reasonably also want Windows and MacOS support (.dlls have some of their own peculiarities).

@duggelz
Copy link
Contributor

duggelz commented Nov 16, 2017

I'm also having a discussion right now with the Bazel team, and it was brought up that maybe the --build_python_zip flag to Bazel obsoletes this entire discussion. That flag uses the "big hammer" approach of just extracting everything all the time, which I had rejected inside Google on performance/disk space concerns, but ... you know, maybe I'm too hung up on that, maybe it's fine? We arguably went way too far the other way inside Google, with elaborate extraction/fingerprint/verification/garbage collection schemes.

build_python_zip does have a logistical problem that it's a global flag, not per-target, and it's not specified in a BUILD file or WORKSPACE file, it has to be specified in .bazelrc or the command line. But that can presumably be remedied.

CC @meteorcloudy for comment and sanity check what I said above.

@jmillikin-stripe
Copy link

How badly are you blocked here? As a hack, you should be able to take this code and essentially stuff it into your main source file before importing numpy/scipy.

We're not hard-blocked, because we can point Bazel at our local branch until this gets merged into upstream.

I ask, because I'm going to be out of office for the next two weeks, so nothing's going to happen either way for a while.

Are there any other maintainers of this package that could review it in the next two weeks? If you could point us toward some docs on getting C extensions working with pkg_resources temp directories, we could try to make progress while you're out.

build_python_zip doesn't really work for us at the moment, because we're using the par_binary output as inputs to other rules. Thank you for the pointer, though!

@pcn
Copy link

pcn commented Nov 24, 2017

Just thought I'd mention #56 here since it is happening to me when I'm trying to build with --build_python_zip.

Clearly this is not a proper fix, but it does produce a zip with grpc that I can use: https://github.com/pcn/subpar/blob/tooling_around/compiler/cli.py#L106

@meteorcloudy
Copy link

That flag uses the "big hammer" approach of just extracting everything all the time, which I had rejected inside Google on performance/disk space concerns, but ... you know, maybe I'm too hung up on that, maybe it's fine?

build_python_zip will zip everything under runfiles, including transitive dependencies from deps and data. If there are too many and too big files under runfiles directory, it could cause some performance/dis space issues.

This is option is originally introduced to solve the problem of building py_binary on Windows, because we don't have runfiles on Windows(no symlink without admin right). So it's very straightforward, but if there are more requirements from other platforms, we can definitely improve it. For example, adding an attribute to py_binary for the executable zip functionality is already on my to-do list.

@yliu120 yliu120 mentioned this pull request Dec 11, 2017
@duggelz
Copy link
Contributor

duggelz commented Dec 15, 2017

Ok, setuptools eager resources was a dead end. If you don't mind, I'm going to reorganize this a bit, and make it conditional on a zip-safe flag to match PEX and Python distutils.

@jac-stripe
Copy link
Author

Sure! That would be much appreciated!

@hwright
Copy link

hwright commented Dec 15, 2017

Out of curiosity, would these approaches help with issues like bazelbuild/rules_python#48?

@duggelz
Copy link
Contributor

duggelz commented Dec 15, 2017

It addresses the runtime part, we still need to figure out the build-time part. E.g. make a py_extension rule.

@k4y3ff
Copy link

k4y3ff commented Jan 11, 2018

Unfortunately, I don't have anything to contribute to this conversation, though I thought I'd leave a comment to register interest in getting this PR (or something like it) merged! 💞

tl;dr Currently setting up initial builds of some Python projects for internal use, was hoping to use Bazel + subpar, realized within a day or two that almost all the projects we'd be building fall under this use case, maintaining an internal fork of subpar isn't an option right now.

@duggelz
Copy link
Contributor

duggelz commented Mar 12, 2018

Hopefully this is addressed by #65, and will be further address by long-term Bazel changes. Reopen if that is not the case.

@duggelz duggelz closed this Mar 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants