Building manylinux wheels #19
Conversation
Tested on my local machine and in Colab, LGTM!
Thanks @BartekCupial!
Adding @StephenOman to get a second opinion.
Thanks a bunch for adding this! Looks very useful: installation woes were the biggest issue for NLE users. Could you add some technical explanation, here or in a code comment, of what this does and how? Re: the memfd_create hack: the purpose of that is to not physically copy files (e.g., not write to a disk). Using
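The memfd_create approach mentioned above can be sketched in Python (Linux only; os.memfd_create exists since Python 3.8). The helper name is hypothetical, not code from this PR:

```python
import os


def anonymous_copy(path):
    """Copy a file into an anonymous in-memory file (Linux only).

    Returns the fd and a /proc path that could be handed to dlopen();
    the data never touches disk and vanishes when the fd is closed.
    """
    fd = os.memfd_create(os.path.basename(path))
    with open(path, "rb") as src:
        os.write(fd, src.read())
    return fd, "/proc/self/fd/%d" % fd
```

dlopen()-ing such a /proc path would give each NetHack instance a private copy of the library without writing anything to disk, which is why the thread reports that patchelf-on-a-copy does not combine with this trick.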
Sure, I've added some more comments that I hope explain the changes.
This is a great idea, removing friction for people getting NLE running in their environments. We also need to check the total binary sizes that are generated, as I think PyPI has limits on both the size of a binary and the total project size allowed.
Actually, NLE wheels are very small. One wheel is ~3.0 MB (here is my GH action that already built all of them: https://github.com/mwydmuch/nle/actions/runs/9437511056; you can download the results and check). I'm not sure if this info is up to date, but in the past, the default limit was 100 MB for a single file and 10 GB for the whole project.
As I said above, I'm very much in favor of this. However, I'm still a bit confused on two counts:
It is related. As stated here: https://en.wikipedia.org/wiki/Rpath, the dynamic linker first looks in the places specified in the rpath. So why do I use patchelf? Because I believe we don't want to ask a user to set the LD_LIBRARY_PATH environment variable.
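The rpath fix described here could be sketched as a small wrapper around the patchelf binary. This is a hedged illustration, not the PR's actual code; the function name, rpath value, and the no-patchelf fallback are assumptions:

```python
import shutil
import subprocess


def fix_rpath(lib_copy, libs_dir):
    """Point a copied shared library's rpath at the bundled libs directory.

    Returns True if patchelf was found and ran; False when patchelf is
    not installed (hypothetical fallback, not from the PR).
    """
    patchelf = shutil.which("patchelf")
    if patchelf is None:
        return False
    # Equivalent to: patchelf --set-rpath <libs_dir> <lib_copy>
    subprocess.run([patchelf, "--set-rpath", libs_dir, lib_copy], check=True)
    return True
```

With the rpath pointing at the wheel's nle.libs directory, the linker resolves the bundled libbz2 first, so the user never has to export LD_LIBRARY_PATH themselves.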
Yes, you are right, and we can do that. But I wasn't sure how to do it in the best way, because:
Ok, so I ran a quick benchmark on my machine, and calling patchelf every time a new NetHack object is created increases the construction time from ~0.003s to ~0.012s, so it is relatively costly. For comparison, a single env.reset takes ~0.002s, and a single env.step ~0.00004s (it's fast!). So yeah, maybe it's a good idea to reduce patchelf calls.
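A micro-benchmark of that kind can be reproduced with timeit. The environment class below is a placeholder, since NLE may not be installed; substitute the real NetHack constructor to reproduce the ~0.003s vs ~0.012s comparison from the discussion:

```python
import timeit


def mean_seconds(fn, n=100):
    """Average wall-clock time of fn over n calls."""
    return timeit.timeit(fn, number=n) / n


# Placeholder standing in for nle.nethack.NetHack(); swap in the real
# constructor (and env.reset / env.step) to repeat the measurements.
class FakeEnv:
    def reset(self):
        return 0


construction = mean_seconds(FakeEnv)
reset = mean_seconds(FakeEnv().reset)
```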
Thanks for the explanation. One more question: which specific file do we require the linker to find here? It wouldn't be the

Now, I don't understand why such a dependency would reside in a special directory, but regardless, these dependencies are not reloaded a second time anyway?

Re: how to modify it -- we could "simply" change the code to have a specific
I tried using linker namespaces for this back in ~2019 and moved away from it; I couldn't quite get it to work back then. A colleague at FB wrote an ELF interpreter that allowed isolation + resetting the data section (for restarts). We got that to work, including on macOS via cross-compiling, but it's quite the machinery, and the current system works and is efficient.
I'd not do that.
Hi @heiner and @StephenOman, thank you for your comments! I believe I overthought the initial solution to the problem. I should just go with static linking of bzip2. It's a ~100 KB library (while libnethack.so is over 3 MB), and it's linked in two places, libnethack and _pyconverter, so this way we duplicate the code. But after all this discussion, I think this is a small price to pay and maybe an overall better solution. This way, nothing changes in nethack.py, as libnethack.so no longer links dynamically to third-party libraries. I think it is unlikely that NetHack will start to require some other third-party library in the future, so there is no need for a general solution for linking libraries bundled with binary wheels.

So now, this PR is only about the build changes. I added the newest version of bzip2 1.0.X to the source, as a static version of the library is not available as an RPM package, and I also had to write a simple CMakeLists.txt for it. I hope you agree with that solution.
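A minimal CMakeLists.txt for a vendored static bzip2 might look roughly like this. The target name and directory layout are assumptions, not the PR's actual file; the source list is bzip2 1.0.x's real set of C files:

```cmake
# Hypothetical static bzip2 target, with the sources vendored under bzip2/.
add_library(bz2_static STATIC
    bzip2/blocksort.c
    bzip2/huffman.c
    bzip2/crctable.c
    bzip2/randtable.c
    bzip2/compress.c
    bzip2/decompress.c
    bzip2/bzlib.c)
target_include_directories(bz2_static PUBLIC bzip2)

# Required because the archive is linked into shared libraries
# (libnethack.so and the _pyconverter extension).
set_property(TARGET bz2_static PROPERTY POSITION_INDEPENDENT_CODE ON)

# Both consumers would then link the same static archive, e.g.:
# target_link_libraries(nethack PRIVATE bz2_static)
# target_link_libraries(_pyconverter PRIVATE bz2_static)
```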
If it's at all helpful, we could drop bzip2. It's not a requirement from NetHack; we added it because I thought compressing all the ttyrecs is a good idea. We could move that to Python or whatever. |
Anyway, this looks great. Zero objections or questions for this diff.
Looks like in the case of Debian/Ubuntu, bzip2 comes with a

Great work Marek!
It just occurred to me that the files added here might as well become a vector like in the xz hack. How hard would it be to make this a git submodule instead? We can probably create the CMakeLists.txt on the fly?
E.g., we could use https://github.com/heiner/bzip2 (clone of git://sourceware.org/git/bzip2.git, which we could also try to use directly). |
Thank you, @heiner! Yeah, I find it a bit odd, but CentOS (and, it seems, other distros from the Red Hat family) doesn't provide a static version of bzip2. The manylinux2014 image for building these wheels is based on CentOS, and this, unfortunately, cannot be changed; this version is the most commonly used due to its high compatibility with even older distros.
I think a lot of people add the source of bzip2 directly to their projects, as it's very lightweight, and it doesn't really need updating.
If you prefer it this way, sure. But I think creating CMakeLists.txt files on the fly is not very elegant, so maybe we should add a proper CMakeLists.txt to your clone?
As you prefer. I'm not against including it manually, but I'll compare the SHA-1s at some point to make sure that, wherever they are from, they are the bzip2 we are expecting.
The other option, I guess, is that we include the bzip2 logic in the main CMakeLists.txt?
@heiner I've replaced the local version with a submodule pointing to the original bzip2 repo at git://sourceware.org/git/bzip2.git.
Amazing!
The tests and build-wheels actions all passed successfully this morning, although the aarch64 builds are quite slow (and there are four of them). Should we restrict the wheel build & test to just the latest Python and x86_64 during the PR test suite, leaving the rest for when we're targeting a release to PyPI?
This is up to you. Indeed, building aarch64 wheels on QEMU is not very efficient. But maybe it is worth testing at least one Python version, since C++ changes may break the build on one architecture but not the other. Should I change that?
Also, you may consider adding something like this at the beginning of the workflow file:
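A hypothetical version of that snippet (the listed paths are guesses at the build-related files, not taken from the PR):

```yaml
on:
  push:
    paths:
      - "pyproject.toml"
      - "setup.py"
      - "CMakeLists.txt"
      - "src/**"
      - ".github/workflows/test_and_deploy.yml"
  pull_request:
    paths:
      - "pyproject.toml"
      - "setup.py"
      - "CMakeLists.txt"
      - "src/**"
      - ".github/workflows/test_and_deploy.yml"
```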
It will only trigger the workflow if files related to building the package are changed (I am not sure if I listed the right paths).
Ok, that's a good compromise. The x86 build is very fast, and the Python 3.11 aarch64 build takes ten minutes, which isn't too bad. I just want to avoid a situation where you'd have nearly 50-minute builds after each push.
@mwydmuch In the interests of getting our version 1.0.0 out as soon as possible, I'm going to merge this as is (if you have no objections). We can put the suggested build workflow enhancements into a separate issue to be worked on afterwards. |
@StephenOman Sorry for my lack of activity; I've got quite busy this week. I can add this change now, to build only for 3.11 when it's not a release.
Ok, that should do. I think it is a good idea to do a separate PR for updating the workflows a little bit.
This PR adds support for building manylinux wheels using the cibuildwheel tool, with manylinux2014 compatibility for both x86_64 and aarch64 (ARM) architectures. This will allow installation of NLE on almost any Linux distro as well as Google Colab or similar platforms that don't support building packages from source.

Changes
- cibuildwheel config added to pyproject.toml, and wheel building added to test_and_deploy.yml.
- auditwheel places the bundled bzip2 library under nle.libs and ships it inside the wheels, to support environments without bzip2 installed.
- Because nle/nethack/nethack.py makes an unlinked copy of libnethack.so, to allow proper linking to the bzip2.so shipped with the wheel, patchelf is installed and used to fix the rpath of the temporary copy. I'm not a fan of this solution, but making libnethack.so thread-safe is not a simple change, and the alternative of modifying the env seems even less elegant. Unfortunately, this solution does not work with memfd_create or O_TMPFILE.
- Alternatively, bzip2 could be linked statically, but libnethack is not the only binary linking against it in the project. <- We went with this one.
How to check the changes locally
To check the changes locally on Linux, one can run cibuildwheel in the repo's root (requires docker and cibuildwheel to be installed).

What was tested
The following tests were run on images of different Linux distros (Alma, Fedora, Rocky, Debian, Ubuntu, in different versions) using a script similar to this one:
https://github.com/Farama-Foundation/stable-retro/blob/master/tests/test_cibuildwheel/test_cibuildwheel_linux.sh
All tests passed on all these distros.
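For reference, the local build described under "How to check the changes locally" might be invoked roughly as follows. This is a guess at a standard cibuildwheel invocation, not the PR's exact command line, and it assumes Docker is running:

```shell
# Hypothetical local wheel build; --platform linux forces a Linux build
# from any host, and wheels land in ./wheelhouse.
if command -v cibuildwheel >/dev/null 2>&1; then
    cibuildwheel --platform linux --output-dir wheelhouse
else
    echo "cibuildwheel is not installed; try: pip install cibuildwheel"
fi
```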
Current TODOs
- [DONE] Verify that these changes don't break anything that is not covered by the tests. @BartekCupial will help with that.
Possible extensions
A small change to test_and_deploy.yml will allow macOS wheels to be built in the same way, but currently, GH does not provide macOS ARM runners for free accounts.