
Building manylinux wheels #19

Merged · 20 commits · Jun 22, 2024
Conversation


@mwydmuch commented Jun 8, 2024

This PR adds support for building manylinux wheels using the cibuildwheel tool, with manylinux2014 compatibility for both x86_64 and aarch64 (ARM) architectures. This will allow installing NLE on almost any Linux distro, as well as on Google Colab and similar platforms that don't support building packages from source.

Changes

  • cibuildwheel config added to pyproject.toml
  • New jobs responsible for building wheels added to test_and_deploy.yml
  • CMakeLists.txt now searches for the bzip2/bz2 lib using find_package
  • Because bzip2 is a 3rd-party dependency, auditwheel places it under nle.libs and ships it inside the wheels, to support environments without bzip2 installed.
    Because nle/nethack/nethack.py makes an unlinked copy of libnethack.so, patchelf is installed and used to fix the rpath of that temporary copy, so it links properly to the libbz2.so shipped with the wheel (see the sketch below).
    I'm not a fan of this solution, but making libnethack.so thread-safe is not a simple change, and the alternative of modifying the environment seems even less elegant.
    Unfortunately, this solution does not work with memfd_create or O_TMPFILE.
    Alternatively, bzip2 could be linked statically, but libnethack is not the only thing linking against it in the project. <- We went with this one.
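
For concreteness, here is a minimal sketch of the patchelf approach described above (since superseded by static linking). The helper name and paths are illustrative, not the actual NLE code:

# Hypothetical sketch: copy libnethack.so to a temp file and rewrite
# its rpath so the loader finds the libbz2 bundled under nle.libs.
import os
import shutil
import subprocess
import tempfile

def make_patched_copy(src_so, libs_dir):
    fd, copy_path = tempfile.mkstemp(suffix=".so")
    os.close(fd)
    shutil.copyfile(src_so, copy_path)
    # patchelf rewrites the DT_RUNPATH entry of the ELF file in place.
    subprocess.check_call(["patchelf", "--set-rpath", libs_dir, copy_path])
    return copy_path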

How to check the changes locally

To check the changes locally on Linux, run the following in the repo's root (requires Docker and cibuildwheel to be installed):

export CIBW_ENVIRONMENT="NLE_RELEASE_BUILD=1"  # For release build
cibuildwheel --platform linux --arch $(uname -m)
pip uninstall -y nle
# Install the freshly built wheel matching the current Python minor version
pip install wheelhouse/nle-*3$(python3 -c "import sys; print(sys.version_info.minor)")*.whl

# Run tests
mkdir -p tmp
cd tmp
python3 -c 'import nle; import gymnasium as gym; e = gym.make("NetHack-v0"); e.reset(); e.step(0)'
python3 -m pytest --import-mode=append -svx ../nle/tests

What was tested

The following tests:

python3 -c 'import nle; import gymnasium as gym; e = gym.make("NetHack-v0"); e.reset(); e.step(0)'
python3 -m pytest --import-mode=append -svx ../nle/tests

were run on images of different Linux distros (Alma, Fedora, Rocky, Debian, Ubuntu; different versions of each) using a script similar to this one:
https://github.com/Farama-Foundation/stable-retro/blob/master/tests/test_cibuildwheel/test_cibuildwheel_linux.sh
All tests passed on all these distros.

Current TODOs

  • verify that these changes don't break anything not covered by the tests [DONE]

@BartekCupial will help with that.

Possible extensions

A small change to test_and_deploy.yml would allow macOS wheels to be built in the same way, but currently GH does not provide macOS ARM runners for free accounts.

@BartekCupial

Tested on my local machine and in colab, LGTM!

@BartekCupial

@heiner @mklissa can you take a look?

@mwydmuch changed the title from "[WIP] Building manylinux wheels" to "Building manylinux wheels" (Jun 9, 2024)
@mwydmuch (Author) commented Jun 9, 2024

> Tested on my local machine and in colab, LGTM!

Thanks @BartekCupial!

@heiner requested a review from StephenOman (June 9, 2024, 12:16)
@heiner (Owner) commented Jun 9, 2024

Adding @StephenOman to get a second opinion.

@heiner (Owner) commented Jun 9, 2024

Thanks a bunch for adding this! Looks very useful - installation woes have been the biggest issue for NLE users.

Could you add some technical explanation, here or in a code comment, of what this does and how?

Re: the memfd_create hack: the purpose of that is to avoid physically copying files (e.g., not writing to disk). Using sendfile under the hood, it should manage not even to duplicate memory, but instead only reference the same pages (the DATA section of which will likely be copied via CoW when used).
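
A rough Python illustration of that trick (Linux-only; a sketch under the stated assumptions, not NLE's actual loader; the function name is made up):

import ctypes
import os

def load_private_copy(so_path):
    # Create an anonymous in-memory file and copy the .so into it
    # without touching disk; sendfile stays inside the kernel.
    size = os.stat(so_path).st_size
    mem_fd = os.memfd_create("libnethack")
    with open(so_path, "rb") as f:
        os.sendfile(mem_fd, f.fileno(), 0, size)
    # dlopen the anonymous file through its /proc handle; the DATA
    # segment gets private pages via copy-on-write once written to.
    return ctypes.CDLL(f"/proc/self/fd/{mem_fd}")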

@mwydmuch (Author) commented Jun 9, 2024

> Could you add some technical explanation, here or in a code comment, of what this does and how?

Sure, I've added some more comments that I hope explain the changes.

@StephenOman (Collaborator) commented

This is a great idea, removing friction for people getting NLE running in their environments.

We also need to check the total binary sizes generated, as I think PyPI has limits on both the size of a single binary and the total size allowed per project.

@mwydmuch (Author) commented Jun 9, 2024

> We also need to check the total binary sizes generated, as I think PyPI has limits on both the size of a single binary and the total size allowed per project.

Actually, NLE wheels are very small; one wheel is ~3.0 MB (here is my GH Actions run that already built all of them: https://github.com/mwydmuch/nle/actions/runs/9437511056; you can download the artifacts and check). I'm not sure if this info is up to date, but in the past the default limit was 100 MB for a single file and 10 GB for the whole project.

@heiner (Owner) commented Jun 9, 2024

As I said above, I'm very much in favor of this.

However, I'm still a bit confused on two accounts:

  1. Why do we need to call the patchelf tool? Isn't finding the .so a matter of LD_LIBRARY_PATH, or some other way of telling the linker where to look for dynamic libraries?
  2. I understand that what patchelf does is read the .so file and write it back to the same path with certain changes (right?). If so, we should rewrite it once and copy the rewritten version thereafter. I don't think we should call an external process every time we open a new environment.

@mwydmuch (Author) commented Jun 9, 2024

> 1. Why do we need to call the patchelf tool? Isn't finding the .so a matter of LD_LIBRARY_PATH, or some other way of telling the linker where to look for dynamic libraries?

It is related. As stated here: https://en.wikipedia.org/wiki/Rpath, the linker first looks in the places specified in the rpath/runpath of the file, then checks the LD_LIBRARY_PATH env variable, then ld.so.cache, and finally default locations like /usr/lib, /lib, etc.

So why do I use patchelf? Because I believe we don't want to ask users to set the LD_LIBRARY_PATH env variable to a specific value before running a script that uses NLE. Setting LD_LIBRARY_PATH inside the running script doesn't work, since something like os.environ['LD_LIBRARY_PATH'] = <some path> only modifies the environment for subprocesses, not for the process itself. We could restart the process to apply the change, but that can have bad consequences for the user in some cases. Alternatively, we could spawn a subprocess with NetHack, but that requires more code modifications. Both solutions seemed less elegant to me than just patching the rpath.
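
To illustrate the restart workaround mentioned above (a sketch, not NLE code; the nle.libs path is a placeholder):

import os
import sys

# The dynamic loader reads LD_LIBRARY_PATH only once, at process
# startup, so mutating os.environ has no effect on the current
# process. Re-exec the interpreter to make the change visible.
if os.environ.get("LD_LIBRARY_PATH") != "/path/to/nle.libs":
    os.environ["LD_LIBRARY_PATH"] = "/path/to/nle.libs"
    os.execv(sys.executable, [sys.executable] + sys.argv)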

> 2. I understand that what patchelf does is read the .so file and write it back to the same path with certain changes (right?). If so, we should rewrite it once and copy the rewritten version thereafter. I don't think we should call an external process every time we open a new environment.

Yes, you are right, and we can do that. But I wasn't sure of the best way to do it, because:

  • Creating such a file in site-packages/nle may not be possible, as the directory may not be writable (for example, if NLE was installed with sudo to be used system-wide).
  • Also, in some rare cases the location of the site-packages dir may change, so I believe it would be good practice to check from time to time whether the modified .so still uses the right rpath, which again requires a call to readelf/patchelf or a similar tool. Sure, this can be done once per script start.
  • So the only possibility I see is to create one patched copy in tmp when nethack.py is imported and reuse it for all created NetHack objects (we duplicate the risk of leaving it there in case of an unclean exit, but maybe that's fine); see the sketch below. What do you think about it, @heiner? Would that be better? Still, I think calling patchelf every time is a simple and pretty good solution. I assume the cost of calling the process is small compared to the cost of later usage of the environment instance. But maybe I'm wrong here. @BartekCupial, can you tell us whether this version has a noticeable performance drop in a real use case, when multiple instances of the environment are used?
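
A minimal sketch of that patch-once-per-import idea (helper and variable names are hypothetical):

import atexit
import os
import shutil
import subprocess
import tempfile

_PATCHED_SO = None

def get_patched_so(src_so, libs_dir):
    # Create a single rpath-patched copy on first use and reuse it
    # for every NetHack instance afterwards.
    global _PATCHED_SO
    if _PATCHED_SO is None:
        tmp_dir = tempfile.mkdtemp(prefix="nle-")
        # atexit only covers clean exits, hence the leftover-file
        # caveat mentioned above.
        atexit.register(shutil.rmtree, tmp_dir, ignore_errors=True)
        _PATCHED_SO = os.path.join(tmp_dir, "libnethack.so")
        shutil.copyfile(src_so, _PATCHED_SO)
        subprocess.check_call(["patchelf", "--set-rpath", libs_dir, _PATCHED_SO])
    return _PATCHED_SO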

@mwydmuch (Author) commented Jun 9, 2024

OK, so I ran a quick benchmark on my machine: calling patchelf every time a new NetHack object is created increases construction time from ~0.003s to ~0.012s, so it is relatively costly. For comparison, a single env.reset takes ~0.002s and a single env.step ~0.00004s (it's fast!). So yeah, maybe it's a good idea to reduce the number of patchelf calls.
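
Something along these lines reproduces the measurement (a sketch; absolute timings will differ per machine):

import time
import gymnasium as gym
import nle  # registers NetHack-v0

n = 50
t0 = time.perf_counter()
envs = [gym.make("NetHack-v0") for _ in range(n)]
print("construction: %.5fs per env" % ((time.perf_counter() - t0) / n))

env = envs[0]
t0 = time.perf_counter()
for _ in range(n):
    env.reset()
print("reset: %.5fs" % ((time.perf_counter() - t0) / n))

t0 = time.perf_counter()
for _ in range(1000):
    _, _, terminated, truncated, _ = env.step(0)
    if terminated or truncated:
        env.reset()  # keep stepping valid if the episode ends
print("step: %.5fs" % ((time.perf_counter() - t0) / 1000))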

@heiner (Owner) commented Jun 10, 2024

Thanks for the explanation. One more question: which specific file do we require the linker to find here? It wouldn't be the .so we patch with patchelf itself; instead, it would be a dependency loaded when that .so is loaded, right?

Now, I don't understand why such a dependency would reside in a special directory, but regardless, these dependencies are not reloaded a second time anyway, are they?

Re: how to modify it - we could "simply" change the code to have a specific src instead of only a dest. At first, the src would be the system-installed copy, which we can copy once (e.g., via the memfd_create hack) and patch; afterwards we can copy that copy and skip the patching. That should leave no risk of leftover files either (that is the point of the memfd_create hack).

@heiner (Owner) commented Jun 12, 2024

I tried using linker namespaces for this back in ~2019 and moved away from it; I couldn't quite get it to work back then. A colleague at FB wrote an ELF interpreter that allowed isolation plus resetting the data section (for restarts). We got that to work, including on macOS via cross-compiling, but it's quite the machinery, and the current system works and is efficient.

The other alternative of course is that we could enforce the singleton nature of NetHack by simply not allowing multiple versions of NLE to run on the same machine.

I'd not do that.

@mwydmuch (Author) commented

Hi @heiner and @StephenOman, thank you for your comments!

I believe I overthought the initial solution to the problem. I should just go with static linking of bzip2. It's a ~100 KB library (while libnethack.so is over 3 MB), and it's linked in two places, libnethack and _pyconverter, so this way we duplicate the code; but after all this discussion I think this is a small price to pay, and maybe an overall better solution.

This way, nothing changes in nethack.py, as libnethack.so no longer links dynamically to 3rd-party libraries. I think it is unlikely that NetHack will start to require some other 3rd-party library in the future, so there is no need for a general solution for linking libraries bundled with binary wheels.

So now, this PR is only about the build changes. I added the newest version of bzip2 1.0.X to the source, as a static version of the library is not available as an RPM package, and I also had to write a simple CMakeLists.txt for it.

I hope you agree with that solution.

@heiner (Owner) commented Jun 12, 2024

If it's at all helpful, we could drop bzip2. It's not a requirement from NetHack; we added it because I thought compressing all the ttyrecs was a good idea. We could move that to Python or whatever.

@heiner (Owner) commented Jun 12, 2024

Anyway, this looks great. 0 objections or questions for this diff.

@heiner (Owner) commented Jun 12, 2024

Looks like in the case of Debian/Ubuntu, bzip2 comes with a libbz2.a. That might not be true on other distros, though. Given that it's not the first third_party dependency, and a simple one at that, I like this solution.

Great work Marek!

@heiner (Owner) commented Jun 12, 2024

It just occurred to me that the files added here might as well be an attack vector, like in the xz hack. How hard would it be to make this a git submodule instead? We can probably create the CMakeLists.txt on the fly?

@heiner (Owner) commented Jun 12, 2024

E.g., we could use https://github.com/heiner/bzip2 (clone of git://sourceware.org/git/bzip2.git, which we could also try to use directly).

@mwydmuch (Author) commented

> Looks like in the case of Debian/Ubuntu, bzip2 comes with a libbz2.a. That might not be true on other distros, though. Given that it's not the first third_party dependency, and a simple one at that, I like this solution.
>
> Great work Marek!

Thank you, @heiner!

Yeah, I find it a bit odd, but CentOS (and, it seems, other distros from the RedHat family) doesn't provide a static version of bzip2. The manylinux2014 image for building these wheels is based on CentOS, and this unfortunately cannot be changed; this version is the most commonly used due to its high compatibility with even older distros.

> It just occurred to me that the files added here might as well be an attack vector, like in the xz hack. How hard would it be to make this a git submodule instead? We can probably create the CMakeLists.txt on the fly?

I think a lot of people add the source of bzip2 directly to their projects, as it's very lightweight and doesn't really need updating.

> E.g., we could use https://github.com/heiner/bzip2 (clone of git://sourceware.org/git/bzip2.git, which we could also try to use directly).

If you prefer it this way, sure. But I think creating CMakeLists.txt on the fly is not very elegant, so maybe we should add a proper CMakeLists.txt to your clone?

@heiner (Owner) commented Jun 12, 2024

As you prefer. I'm not against including it manually, but I'll compare the sha1s at some point to make sure that, wherever they are from, they are the bzip2 we are expecting.

@heiner (Owner) commented Jun 12, 2024

The other option, I guess, is to include the bzip2 logic in the main CMakeLists.txt?

@mwydmuch (Author) commented

@heiner I've replaced the local version with a submodule pointing to the original bzip2 repo at git://sourceware.org/git/bzip2.git. I've added the build logic to the main CMakeLists.txt as you suggested. I think this is overall a better solution than somehow injecting a file into the directory. However, some CMake purists may not like it ;)

@heiner (Owner) left a review comment:

Amazing! :shipit:

@StephenOman (Collaborator) commented

The tests and build-wheels actions all passed successfully this morning, although the aarch64 builds are quite slow (and there are four of them).

Should we restrict the wheel build & test to just the latest Python and x86_64 during the PR test suite, leaving the rest of them for when we're targeting a release to PyPI?

@mwydmuch (Author) commented

This is up to you. Indeed, building aarch64 wheels under QEMU is not very efficient. But it may be worth testing at least one Python version, since C++ changes may break the build on one architecture but not the other.

Should I change that?

@mwydmuch (Author) commented Jun 14, 2024

Also, you may consider adding something like this at the beginning of the workflow file:

on:
  push:
    branches: [main]
    paths-ignore:
      - 'DEVEL/**'
      - 'dat/**'
      - 'doc/**'
      - 'docker/**'
      - '**.md'
      - '**.nh'
  release:
    types: [released]

This will trigger the workflow only when files related to building the package change (I'm not sure I listed the right paths).

@StephenOman (Collaborator) commented

> This is up to you. Indeed, building aarch64 wheels under QEMU is not very efficient. But it may be worth testing at least one Python version, since C++ changes may break the build on one architecture but not the other.

OK, that's a good compromise. The x86 build is very fast, and the Python 3.11 aarch64 build takes ten minutes, which isn't too bad. I just want to avoid a situation where we'd have nearly 50-minute builds after each push.

@StephenOman added the enhancement label (Jun 18, 2024)
@StephenOman (Collaborator) commented

@mwydmuch In the interests of getting our version 1.0.0 out as soon as possible, I'm going to merge this as is (if you have no objections). We can put the suggested build workflow enhancements into a separate issue to be worked on afterwards.

@mwydmuch (Author) commented

@StephenOman Sorry for my lack of activity; I've been quite busy this week. I can add the change to build only the 3.11 wheels when it's not a release now.

@mwydmuch (Author) commented

OK, that should do it. I think it is a good idea to do a separate PR for updating the workflows a little. E.g., test_package.yml runs the same tests as test_and_deploy.yml on macOS runners, for which one minute of running time costs as much as 10 minutes on GH's Ubuntu runners. Some resources could be saved there too.

@StephenOman merged commit d4b7da6 into heiner:main (Jun 22, 2024)
17 checks passed