CPU specific modules (neon, SSE4, etc). #1723
This is based on some of the discussion in #1711 ... Here's another way we could hack up distutils based on python modules and compilation flags. (obviously these ways would be out of scope for this PR) Currently Setup.SDLX.in has stuff like this.
Adding CPU specific modules could be done by adding flags (
Then the init machinery ( |
Was just quoting that :) I think it would require us to:
All we are really trying to do is isolate the impact of the |
Another possibility is compiling the whole wheel into a separate package. Speaking of other tools: things like cx_freeze and pyinstaller need to be considered. Also, we can distribute our own wheels however we want, but how can other distro packages (such as Debian pygame) take advantage of the solution? |
A discussion of problems Google had with compiler specific modules using AVX: https://randomascii.wordpress.com/2016/12/05/vc-archavx-option-unsafe-at-any-speed/ |
I discussed this a little bit on the end of this issue too: |
There was some more discussion in here about an issue with neon compilation on pi: #2373 |
The arm-neon optimisations are much needed, as they give a significant speed boost for pygame (checked on a Raspberry Pi 3). They might also benefit other ARM-based systems, like Android and the new M1 Macs. Also, my theory is that with Apple shifting to ARM, and the switch proving effective, more companies are likely to follow its lead. Soon we might see more ARM systems entering the market, and one fine day in the future, ARM might even end up overtaking Intel/AMD. Creating separate files, conditional compilation, runtime detection, all that sounds very good. But it's a bit too complicated, and it is a lot of work. I have a much simpler solution, and that is changing this line of code Line 136 in 91df485
With # ref: https://en.wikipedia.org/wiki/Uname
if platform.machine() in ["armv7l", "armv8l", "arm64", "aarch64"]:
Yes, this is compile-time platform detection, and wheels compiled for these platforms will only work on these platforms (isn't that the point of wheels anyway?). Things like creating an armv7l wheel and later renaming it to armv6l will no longer work if this change is applied (that is a bad idea anyway, because you never know what other problems it creates). This change might create some issues for the folks at piwheels, so we must make the decision with that in mind: we do not want to bother the people at piwheels, and if we end up making a change like this, we might have to consider building ARM Linux wheels ourselves or something like that. |
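For readers skimming the thread, here is a minimal sketch of the compile-time check proposed in the comment above. The helper name `should_enable_neon` is hypothetical and is not part of pygame's setup.py; the machine names are copied from the comment, and whether every such machine really supports NEON is that comment's assumption, not something this sketch verifies.

```python
import platform

# Machine names copied from the comment above (ref: https://en.wikipedia.org/wiki/Uname).
ARM_NEON_MACHINES = ("armv7l", "armv8l", "arm64", "aarch64")


def should_enable_neon(machine=None):
    """Return True when the build machine is one of the ARM names listed above.

    Hypothetical helper for illustration only. It inspects the machine the
    build runs on, so the decision is baked into the wheel at compile time.
    """
    if machine is None:
        machine = platform.machine()
    return machine in ARM_NEON_MACHINES


if __name__ == "__main__":
    # Report what the build host looks like and what the check decides.
    print(platform.machine(), should_enable_neon())
```

Because the decision is made on the build host, a wheel built on armv7l and renamed to armv6l would still contain the NEON code paths, which is the piwheels concern raised in the following comments.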
This build issue was discussed a while ago with piwheels here in case anyone else is missing the context: Essentially right now piwheels hosts an 'arm6' and an 'arm7' wheel publicly but in 99% of cases those are actually the same 'arm7' wheel built from source & renamed automatically by script. Having actually separate arm7 and arm6 wheels is instead a manual hand compilation process which (I think) was only being done for new releases of openCV. Why this matters is because on raspberry pi, piwheels is the default repository when users do a So while making separate arm6 & arm7 wheels seems like an easy solution for us working on pygame, it is not for the piwheels maintainers. |
...though I see in this blog post: https://blog.piwheels.org/new-opencv-builds-including-opencv-4-4-x/ ..That piwheels are, as of September last year, building OpenCV automatically too. So maybe there is something to be worked out here with @bennuttall to have separate genuine arm6 and arm7 wheels of pygame. |
I'd be happy to build armv7 wheels separately (because it's pygame) if possible. Is there a way to supply the compiler flag to the pip command? |
Do you mean - does this work:
Because if so, yes. I don't know off the top of my head whether pip passes arguments along to setup.py when it can't find a wheel. I'm generally not much of an expert on our build process, but I did add that one flag. I believe we would just need that neon flag enabled for the arm7 (Pi 3 & Pi 4) wheel and disabled for the arm6 (Pi Zero etc.) wheel. |
I meant something like I ask because that's the normal way we build wheels - using the |
You may need to compile SDL2 itself with the right flags for aarch64 on the Pi 4. The last time I looked at the SDL2 that ships with Raspberry Pi OS, I found that a couple of spicy features were disabled. I don't know if we even need the In another thread, a Gentoo user reported that on aarch64, neon does not work, but it was unclear whether SDL2 or pygame was the culprit. It might be Gentoo itself. In the short term, I quite like @ankith26's idea of detecting that we are running on ARM and doing the right thing by default, but 1. I would keep the command line flag for compatibility, and 2. I would disable these kinds of detection when a certain environment variable is set, so you can cross-compile from an RPi to another target arch (I know, a pretty niche use case, but theoretically a source of bugs). I set PYGAME_CROSS_COMPILE in the p4a build recipe so I could disable detection of ABI or CPU features in setup.py, but then I didn't use that. We should probably document this, or I'll have to make a pull request at kivy to remove it from the p4a recipe. |
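The two conditions in the comment above (keep the flag, skip detection when an environment variable is set) could be combined roughly as follows. This is only a sketch under stated assumptions: `EXAMPLE_NEON_FLAG` is a placeholder because the real setup.py flag is not quoted in this thread, `decide_neon` is a hypothetical helper, and treating any non-empty `PYGAME_CROSS_COMPILE` value as "set" is an assumption about how that variable would be interpreted.

```python
import os
import platform
import sys

# Placeholder flag name; the actual setup.py flag is not quoted in this thread.
EXAMPLE_NEON_FLAG = "--example-enable-neon"
# Machine names from the earlier comment in this thread.
ARM_MACHINES = ("armv7l", "armv8l", "arm64", "aarch64")


def decide_neon(argv, environ, machine):
    """Decide whether to compile the NEON code paths (hypothetical helper)."""
    # 1. An explicit command line flag always wins, kept for compatibility.
    if EXAMPLE_NEON_FLAG in argv:
        return True
    # 2. When cross-compiling, the build host says nothing about the target,
    #    so skip detection entirely (assumption: any non-empty value counts).
    if environ.get("PYGAME_CROSS_COMPILE"):
        return False
    # 3. Otherwise fall back to detecting the build host.
    return machine in ARM_MACHINES


if __name__ == "__main__":
    print(decide_neon(sys.argv[1:], os.environ, platform.machine()))
```

Passing `argv`, `environ` and `machine` in as arguments, rather than reading them inside the function, keeps the decision easy to test for hosts other than the one running the build.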
Ok, one red line I'd have to draw would be creating a Pi4-only wheel and publishing it to piwheels as an armv7 wheel (because Pi 3 users would get it and it wouldn't work for them), as there's no way to distinguish. If there's a way I can build an armv7 wheel that's more optimised but not limited to Pi 4 then I'd be happy to. Otherwise, best bet is for pygame to host its own optimised wheels for individual Pi models. piwheels doesn't build for aarch64 yet but we intend to. Maintainers can already publish aarch64 wheels to PyPI which can be picked up by Pi 4 aarch64 users (e.g. numpy does this). But for packages that don't produce aarch64 wheels, we intend to build them in future. |
Yes, those armv7l wheels would indeed work on the pi2, pi3 and pi4 regardless of which pi it was compiled on. It’s just that an armv7l compiled (optimised) wheel would not work on armv6l by just renaming the wheel. We need to do some work from our end to properly optimize stuff for aarch64, do some platform detection based compilation and also make it configurable via env vars, so that it works even with the pip wheel command. I will try to open a PR for this (soonish) |
Great. That sounds ideal. Ping me when you want me to take a look. |
Okay, to make things easier for everyone (like I already mentioned in the linked issue), runtime detection is the way to go. That would also make armv7 to armv6 wheel renaming work fine, so the piwheels workflow is not disturbed and we don't give you any extra work 😄 We generally need a more robust system of platform detection for SIMD, and even on the Intel codepaths, perhaps refactoring the SIMD code into separate files loaded after a runtime check is the way to go. |
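To illustrate the runtime-check idea from the comment above, here is a rough Python sketch that reads the CPU feature list on Linux and picks a code path by name. It is not how pygame implements or plans to implement this (the real check would more likely live in C); `cpu_features`, `pick_blitter`, `read_cpuinfo` and the returned path names are all hypothetical, and the approach assumes a Linux-style `/proc/cpuinfo` with a `Features` (ARM) or `flags` (x86) line.

```python
def cpu_features(cpuinfo_text):
    """Collect flag names from the 'Features' (ARM) or 'flags' (x86) lines."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def pick_blitter(features):
    """Pick a code path name from the detected features (names are hypothetical)."""
    # 32-bit ARM kernels report 'neon'; aarch64 kernels report 'asimd'.
    if "neon" in features or "asimd" in features:
        return "neon"
    if "sse2" in features:
        return "sse2"
    return "generic"


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the cpuinfo text, or an empty string where the file is unavailable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(pick_blitter(cpu_features(read_cpuinfo())))
```

With a check like this made when the module loads, one armv7 wheel could fall back to the generic path on an armv6 CPU, which is why the renaming workflow would keep working.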
As a user I want to install and distribute pygame where the CPU specific optimizations are included for optimized functions.
As part of the PR "Adds MMX, SSE2 & Arm NEON optimised versions of blit_blend_premultiplied()", some more SIMD-optimized functions were added. However, there is no runtime detection for those functions, so some are disabled by default.