C extension naming doesn't take bitness into account #67169
Currently, C extensions are named something like "_helperlib.cpython-34dm.so". This doesn't take into account the bitness of the interpreter (32- vs. 64-bit), which makes it awkward to use the same working copy with two different interpreters (you have to rebuild everything each time you switch bitnesses). Worse, under Windows it seems ABI tags aren't even used, giving generic names such as "_helperlib.pyd". Why is that? |
See also PEP-3149. |
PEP-3149 says """It is not currently clear that the facilities in this PEP are even useful for Windows""". Well, it seems I have found a use for it :-) |
Ideally, we would use distutils.util.get_platform(). However, there are two cases where it relies on other modules:
Of course, ideally we should be able to hardcode this into the compiled CPython executable... |
As a side note, Python currently misidentifies 32-bit builds under 64-bit Linux:
Python 3.5.0a0 (default:64a54f0c87d7, Nov 2 2014, 17:18:13)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os, sysconfig
>>> sys.maxsize
2147483647
>>> os.uname()
posix.uname_result(sysname='Linux', nodename='fsol', release='3.16.0-25-generic', version='#33-Ubuntu SMP Tue Nov 4 12:06:54 UTC 2014', machine='x86_64')
>>> sysconfig.get_platform()
'linux-x86_64'
AFAIU, sysconfig.get_platform() (or its sibling distutils.util.get_platform()) is used for the naming of binary distributions... |
The MULTIARCH variable can help at least under Linux:
>>> import sysconfig
>>> sysconfig.get_platform()
'linux-x86_64'
>>> sysconfig.get_config_var('MULTIARCH')
'i386-linux-gnu' |
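A minimal sketch of the lookup above, assuming only the standard `sysconfig` module. `describe_platform` is a hypothetical helper name, not something from the thread. MULTIARCH is not defined on every build, so the sketch treats a missing or empty value as "unknown" rather than guessing.

```python
import sysconfig


def describe_platform():
    """Return the platform tag and the multiarch triplet, if the build has one."""
    # MULTIARCH is only defined on some builds (typically multiarch-aware Linux);
    # get_config_var() returns None when the variable is absent.
    multiarch = sysconfig.get_config_var("MULTIARCH")
    return {
        "platform": sysconfig.get_platform(),
        "multiarch": multiarch or None,  # normalise an empty string to None
    }


if __name__ == "__main__":
    print(describe_platform())
```

On the 32-bit build shown in the session above, the two values would disagree ('linux-x86_64' versus 'i386-linux-gnu'), which is the discrepancy being pointed out.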
There is also platform.architecture(). I don't like its implementation: it relies on the external "file" program :-( |
I'm very much in favor of adding this for .pyds on Windows. I assume the hard part will be getting the details for Linux (doesn't bitness have to be compiled in there? For Windows it can be determined at compile-time...), but preferring an extension with the ABI tag and falling back on one without seems easy enough. (Would/could this also work for .py files? So a 2.7/3.x or Jython/CPython/IronPython package could include tags in pure-Python code files?) |
Note that there's a difference between the platform's architecture (which is what get_platform() returns) and the pointer size of the currently running Python executable. On 64-bit Linux, it's rather rare to have an application built as 32-bit executable. On 64-bit Windows, it's rather common to have 32-bit applications running. The best way to determine 32-bit vs. 64-bit is by using the struct module:
# Determine bitsize used by Python (not necessarily the same as
# the one used by the platform)
import struct
bits = struct.calcsize('P') * 8
This should be portable across all platforms and will always refer to the pointer size of the currently running Python executable. |
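As a rough illustration, the struct idiom can be put next to the `sys.maxsize` check used earlier in the thread. The function names are hypothetical. The sketch assumes a mainstream platform where the pointer size and the size of `Py_ssize_t` agree.

```python
import struct
import sys


def pointer_bits():
    # Size of a C pointer ("P") in the running interpreter, in bits.
    return struct.calcsize("P") * 8


def maxsize_bits():
    # sys.maxsize is the largest Py_ssize_t value; it only exceeds 2**32
    # on builds with a 64-bit Py_ssize_t.
    return 64 if sys.maxsize > 2**32 else 32


if __name__ == "__main__":
    print(pointer_bits(), maxsize_bits())
```

Both functions describe the interpreter process, not the host machine, so a 32-bit Python on a 64-bit kernel reports 32 either way.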
Yes, that's pointed out above. |
Nothing new should be necessary for pyc files under Windows:
Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 22 2014, 11:51:45) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.implementation.cache_tag
'cpython-34'
The problem is with C extensions:
>>> import _imp
>>> _imp.extension_suffixes()
['.pyd']
Compare with Linux:
>>> import _imp
>>> _imp.extension_suffixes()
['.cpython-35dm.so', '.abi3.so', '.so'] |
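The same information is reachable without the private `_imp` module. The sketch below assumes `importlib.machinery.EXTENSION_SUFFIXES` (public, and as far as I know populated from `_imp.extension_suffixes()`). `extension_candidates` is a hypothetical helper, and `_helperlib` is just the example name from the first comment.

```python
import importlib.machinery
import sys


def extension_candidates(module_name):
    """List the file names the running interpreter would accept for a C extension."""
    # The suffixes are kept in preference order: most specific tag first,
    # untagged suffix last.
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


if __name__ == "__main__":
    print("bytecode cache tag:", sys.implementation.cache_tag)
    for name in extension_candidates("_helperlib"):
        print(name)
```

The output depends on the interpreter that runs it, so comparing it across a 32-bit and a 64-bit build is a quick way to see whether the two would collide on the same file name.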
Sticking to bitness should be easy (although I wonder if it would be desirable for platforms with fat binaries - Ned?). If we can go the extra mile and include platform identification all the better, of course. |
I was more interested in source file resolution than bytecode caching. If Python 3.5 would prefer "spam.cpython-35.py" or "spam.cpython-3.py" over "spam.py" and Python 2 preferred "spam.py", then I can more easily separate the code that won't parse in the alternative. Happy to be told it's unrelated and I should raise it separately, but from my POV resolving .pyd filenames looks very similar to resolving .py files. |
On 02.12.2014 19:02, Antoine Pitrou wrote:
I hear the "can of worms" alarm ringing :-) Seriously, I think that putting platform infos into the file name I think we should only focus on platforms where fat builds are http://en.wikipedia.org/wiki/Fat_binary Note that on Linux, 32-bit and 64-bit versions are typically placed http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard so I'm not sure whether it's a real problem on Linux. |
But since you pointed out cache-tag, should that distinguish for bitness as well? It seems to be 'cpython-34' for both 32-bit and 64-bit interpreters on Windows, which isn't really a problem now, but may become one if we start allowing/encouraging sharing packages between interpreters. In fact, it probably is an issue now with user site-packages, since that path is the same for both 32-bit and 64-bit... |
Fat binaries seem to exist under:
|
On 02.12.2014 19:40, Steve Dower wrote:
That's an interesting idea, but indeed unrelated to this ticket :-) |
By whom? Our standard installer doesn't (it uses ../lib/python-X.Y for all builds). Also, one of the problems (and actually the problem which triggered this tracker entry) is when doing development inside a working copy (either through "setup.py develop" or "setup.py build_ext --inplace" - both copy C extensions directly into the source tree). |
@Steve: IIRC, pyc files should be portable, so there's no need to differentiate between various bitnesses. |
@antoine: You're right. I hereby withdraw all contributions to this thread after my first statement of support :) |
On 02.12.2014 19:46, Antoine Pitrou wrote:
By the system vendors. Packages (with extensions) will automatically
Fair enough; it's a rare use case, but may be worth supporting. My main point was that we shouldn't start adding tags for e.g. How about using these flags: b0 - 16-bit |
Le 02/12/2014 19:59, Marc-Andre Lemburg a écrit :
Fair enough, although I think we only need 32-bit and 64-bit for now, |
On 02.12.2014 20:10, Antoine Pitrou wrote:
True, I'm just not sure what the parsing requirements are and |
Would it be possible to add something to the sys module, computed Note: There is also the funny x32 platform project :-) |
My initial thought is to add an "abitags" attribute to sys.implementation If we define the algorithm clearly, then setuptools & distlib could make it |
Re PEP-3149 file names: it hadn't struck me until fairly recently that PEP-3149-style extension file names were never implemented for OS X, i.e. they are still of the form _helperlib.so. I'm not sure why that is the case since other aspects of PEP-3149-like file names do exist on OS X, including naming libpython; perhaps it was just erring on the side of caution.
Re bitness: As Marc-Andre points out, Apple addressed the multi-arch problem with the concept of universal (or "fat") binary files, implemented for executables, libs (static and dynamic), and bundles (e.g. .so's). In general, dealing with multiple architectures is abstracted away by the compiler tool chain at build time and the dynamic loader at run time and it's not something either Python or the user have to deal with (usually), as various combinations of architectures (currently up to 4 on OS X) are contained within the same file; for example:
$ file _socket.so
_socket.so: Mach-O universal binary with 3 architectures
_socket.so (for architecture x86_64): Mach-O 64-bit bundle x86_64
_socket.so (for architecture i386): Mach-O bundle i386
_socket.so (for architecture ppc7400): Mach-O bundle ppc
$ file /usr/bin/python
/usr/bin/python: Mach-O universal binary with 3 architectures
/usr/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/python (for architecture i386): Mach-O executable i386
/usr/bin/python (for architecture ppc7400): Mach-O executable ppc
So, I agree with Marc-Andre that adding arch info (like bitness) to extension module file names on OS X would add unneeded complexity for little, if any, benefit. That part works well today. Changing builds on OS X to use today's PEP-3149 file names is a separate question. It could help in the case where one site-packages library is used with multiple Python instances but, even there, that is probably not a big issue outside of developer environments: (1) I don't know of any distributor of Python for OS X who supports multiple ABIs (e.g. non-debug vs debug) in one package; (2) Python OS X framework builds, used by python.org, Apple, and most third-parties, generally have separate install locations including their lib-dynload and site-packages directories so installing multiple instances of the same Python version from different vendors isn't a big deal. It would be nice to be able to allow non-debug vs debug builds to co-exist better (the primary use case I see for PEP-3149 file names for Py3 on OS X) but I don't recall anyone asking for it. If we were to change OS X to use today's PEP-3149 file names, I would only want to do it in a new release, not backport it. |
What can I do to help move this along? It sounds like for Windows builds we could change _imp.extension_suffixes() from ['.pyd'] to ['.{}.pyd'.format(distutils.util.get_platform()), '.pyd'] and update distutils to produce the more specific name (I've got some work to do on distutils anyway for 3.5, so I'm happy to do this part). This would also include somehow hard-coding the get_platform() result into the executable (probably a #define in pyconfig.h). I'm more inclined towards get_platform() than adding new architecture tags. Windows at least doesn't support fat binaries - the closest equivalent is universal apps, which use separate binaries and a naming convention. Adding a debug marker here would also be nice, as I've never been a huge fan of the "_d" suffix we currently have, but it's not a big deal. I suspect any changes here would be completely separate from other platforms, but ISTM that we're looking at a similar change to handle the bitness/debug issue on Linux. I'm not volunteering to do that part :) |
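The suffix list proposed in this comment can be sketched in a few lines. This only illustrates the proposal as worded here, not whatever naming was eventually committed. `proposed_windows_suffixes` is a hypothetical name, and `sysconfig.get_platform()` stands in for `distutils.util.get_platform()`, since distutils is no longer part of the standard library in recent Python versions.

```python
import sysconfig


def proposed_windows_suffixes(platform_tag=None):
    """Suffix list from the proposal: tagged name first, plain '.pyd' as fallback."""
    if platform_tag is None:
        # Assumption: computed at run time here; the proposal would hard-code
        # the value into the executable instead.
        platform_tag = sysconfig.get_platform()
    return [".{}.pyd".format(platform_tag), ".pyd"]


if __name__ == "__main__":
    print(proposed_windows_suffixes("win-amd64"))
    print(proposed_windows_suffixes("win32"))
```

Keeping the untagged '.pyd' at the end of the list is what keeps existing extensions importable under the proposal.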
Le 06/12/2014 21:11, Steve Dower a écrit :
I think committing changes on a per-platform basis is fine here. After So, yes, let's get the ball rolling under Windows. I think you're the |
The attached patch adds platform tags for .pyd files for "win32", "win-arm", "win-amd64" and "win-ia64", which are the known compilers in pyconfig.h and the potential return values from distutils.util.get_platform(). It also fixes a bug where the suffix would be incorrect if building a debug extension. I haven't been able to think of any scenarios where this could break other than perhaps packaging (since distutils defaults to including the tag), and we've got plenty of time to sort those issues out. A quick test installing Cython and some packages built with Cython seemed to be fine. AIUI, MinGW/cygwin builds won't use PC/pyconfig.h, and so they won't see any change. |
On 04/16/2015 05:56 PM, Marc-Andre Lemburg wrote:
I'm disappointed that you discredit any other use case besides what you think as |
On 16.04.2015 18:53, Matthias Klose wrote:
I'm not trying to discredit any use cases, I just don't see them. For package distributions you do need to make your distribution However, for plain .so files that you have on your system (which will Perhaps you can point me to some use cases where the triple |
On 16.04.2015 19:14, Marc-Andre Lemburg wrote:
Antoine's ticket is the first in two decades to request being If you have a need, it's not really hard to build your extensions |
If I understand correctly (and ABI isn't my strong suit), it would be useful in the sense that you could utilize it to create a sort of "fat wheel" that included the .so's for multiple architectures and then pip could simply drop them all into place and have the interpreter decide which one to load. This is useful because maybe you have one .so in a wheel and 30 .py files, it's somewhat wasteful (both disk space and in cache efficiency) to have 10 different wheel files and those 30 .py files duplicated when it could be possible to have a single one serving 10 different architectures. To be clear, this ability doesn't yet exist in Wheel and I don't know of anyone pushing for it, but if Python is smart enough to load the right .so that makes fat wheels significantly easier to implement (in fact, you wouldn't need to add anything else to pip or the wheel spec to handle it I think). |
That's exactly what PEP-3149 was supposed to implement, isn't it? |
On 16.04.2015 19:47, Ned Deily wrote:
No, PEP-3149 is about the Python ABI, following PEP-3147, The intent is to be able to have multiple *Python* ABI/API versions |
On 16.04.2015 19:44, Donald Stufft wrote:
Well, it's even more wasteful if you have to download 100MB wheels This approach has been considered a few times in distutils history Today, you usually have a web installer take care of grabbing |
I think it's going to vary greatly based on how many platforms you're attempting to support and how big your .so's are compared to the rest of the Wheel. You can also mix and match, do a single bundle for the most popular platforms (which will mean that you're almost always serving out of cache) but then do individual wheels for the less popular platforms to keep the file size of the "main" wheel from bloating up with a bunch of .so's for platforms which are unlikely to be needed very often. Another possible (future) benefit - Right now we have executable zip files, but they can only reasonably contain pure Python files. There are rumblings of making it so it's possible to import .so's from inside of an executable zip file. If you bake in the platform ABI into the .so file name, it would mean in that possible future you could have a single executable zip file that just works across multiple platforms as long as you already have Python installed. I do agree that pretty much every place someone would want to do this, could possibly be implemented by having it look inside a per platform directory (you could implement fat wheels for instance by having platform sub dirs, same with a single executable zip file), however doing that causes duplication because every place you deal with .so's then needs to deal with the question of platform ABI and has to come up with its own solution to it, instead of having a central solution which is managed by Python itself and can be re-used by all of these situations. |
Well, for all practical purposes, the platform *is* part of the ABI :=) So, if we have been supporting multiple PEP 3147 extension modules since 3.2, I don't see this as a risky change. I don't think anyone is advocating installing distributions with dozens of extension module variants as a general practice. But it seems like there are times when it would be useful to have the capability to have more than one and this seems like a safe and logical extension to what PEP-3147 already provides. I don't have a strong opinion about other platforms. For OS X, because of the complexity and usefulness of mixing and matching various fat CPU archs and OS X ABIs ("deployment target"), pip already supports selecting the proper wheel to download and wheel creators are tagging with the right metadata to make the right decisions, so I don't think the changes here bring much added value for OS X users, except for two things: (1) we now support PEP-3147 ext file names (which for some reason was never fully implemented on OS X and should have been) which is useful for the original PEP-3147 use cases (for example, if someone wants to distribute non-debug and debug versions of ext modules); (2) the addition of '-darwin' to the PEP-3147 ext file name allows for this additional use case of allowing multiple platform extensions to be stored in the same directory without fear of name clash (even if only one is eventually installed). I think both are reasonable and safe changes for OS X. |
On 16.04.2015 20:17, Donald Stufft wrote:
Whatever you do, you're still going to force all your main users to
Since you need special support for such ZIP files (either using dlopen There's a very real use case for having multiple Python versions
I'm not saying that having a central solution is wrong. All I'm We now have four ways of describing ABI flags in Python (well, actually I can already see all the different OS vendors creating |
On 16.04.2015 20:21, Ned Deily wrote:
Yes, but if all your files on your box share the same ABI, do you All Linux distributions I know place the 32-bit and 64-bit versions Why should Python behave differently ? Just because we can is not |
pip caches downloads by default, many systems are starting to utilize that Looking at a few of the top projects on PyPI in terms of download count we Of those, only really lxml is large enough that adding a second or third or
There are other reasons as have already been mentioned, this is just yet
I don't care if it gets added as part of this ticket, another ticket, or as |
On 17.04.2015 00:51, Donald Stufft wrote:
Sure, but whatever the central implementation is going to be, There's a very simple trick which some packages used in the To simplify this, the platform triplets and other platform ABI flags |
So why do you see this on x86 for 32/64-bit, but not for ARM soft-float/hard-float? The example given was pretty clear.
Well, then at least you don't know Debian, Ubuntu, and any of their derivatives. And even the Python default install on Linux installs into the same place. |
New changeset 558335559383 by doko in branch 'default':
|
I think we should add something to the 3.5 "What's New" document about these changes and which platforms are affected. Otherwise is there anything left to do before closing? |
I sure hope not. |
I'm leaving this open just because we're apparently waiting on some "What's New" docs. |
Here's an attempt at a What's New section for this change. I expect it's wrong! Maybe someone can fix it. Maybe it's actually better than not having one at all. Can we maybe get a round or two of edits on this and get something in for 3.5 final? |
Adding Yury since he and Elvis are working on Doc/whatsnew/3.5.rst and they might want to take a look at the latest patch. |
Only thing I'd add is that the extra tag is optional (on Windows at least), and Python will happily import extensions without it. But extensions with a mismatched tag won't be loaded. |
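The behaviour described here (tag optional, mismatched tag ignored) falls out of a plain "first matching suffix wins" lookup. The sketch below is a simplified model, not the actual import machinery. `pick_extension` and the suffix strings in the usage example are hypothetical placeholders chosen for illustration.

```python
def pick_extension(module_name, filenames, suffixes):
    """Return the first file matching a suffix, trying suffixes in preference order."""
    available = set(filenames)
    for suffix in suffixes:
        candidate = module_name + suffix
        if candidate in available:
            return candidate
    # Nothing matched: files carrying some other platform's tag are simply
    # never considered.
    return None


if __name__ == "__main__":
    # Hypothetical suffix list for one interpreter: tagged first, untagged last.
    suffixes = [".TAG-A.pyd", ".pyd"]
    print(pick_extension("spam", ["spam.pyd"], suffixes))
    print(pick_extension("spam", ["spam.TAG-B.pyd"], suffixes))
    print(pick_extension("spam", ["spam.pyd", "spam.TAG-A.pyd"], suffixes))
```

In this model an untagged file is still found, a file tagged for another platform is skipped, and a correctly tagged file takes precedence over the untagged one.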
Thanks for the draft! I'm not sure how to describe this properly. The extension names are derived from https://wiki.debian.org/Multiarch/Tuples and this again is derived from the GNU triplets/quadruplets. There is no "cpu" and "os" part; depending on the architecture, some ABI parts are encoded either in the "cpu" part or in the "os" part. So what about just enumerating the most common cases (i386-linux-gnu, x86_64-linux-gnu, arm-linux-gnueabi (still used for the old Raspberry Pi), arm-linux-gnueabihf), and then point to the "spec"? The above examples have some irregular cases; most other cases just follow the triplets. I wouldn't mention x86_64-linux-gnux32 explicitly. Until now there are only unreleased or experimental distros. Not sure if it is worth mentioning that this would allow distributing "fat" wheels. |
New changeset 1744d65705d0 by Yury Selivanov in branch '3.5': New changeset cfbcb3a6a848 by Yury Selivanov in branch 'default': |
Larry, Matthias, Steve, Berker - I've mentioned this issue in the whatsnew (applied Larry's patch with some modifications to address comments from Steve and Matthias). Please review. |
There's no dot before the debug marker on Windows. Otherwise, looks good to me. Thanks for writing this up. |
It's fixed! So it's finally closed. |
New changeset 80fc40a9ae47 by Martin Panter in branch '3.5': |