Problems with handling the file command output in platform.architecture() #79529
The code of _syscmd_file() in the platform module does not match its docstring. The "-b" option was removed in 685fffa (bpo-16112), and this leads to the inclusion of the executable path in the file command output. If the executable path contains key strings such as "32-bit" or "PE", platform.architecture() can return an incorrect result. I think that the "-b" option should be restored.

$ python3 -c 'import platform; print(platform.architecture("/usr/bin/python3.6"))'
('64bit', 'ELF')
$ cp /usr/bin/python3.6 /tmp/32-bitPE
$ python3 -c 'import platform; print(platform.architecture("/tmp/32-bitPE"))'
('32bit', 'ELF')

Another problem is that the code tests whether the string "executable" is contained in the file command output, but it is not always present for executables on Linux:

$ file python
python: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d3cfa06c2bdcbf7b6af9e4e6be5061cb8398c086, with debug_info, not stripped
$ file /usr/bin/python2.7
/usr/bin/python2.7: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=fd3904306c73383fb371287416257b82d6a3363b, stripped
$ file /usr/bin/python3.6
/usr/bin/python3.6: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9dae0eec9b3f9cb82612d20dc0c3088feab9e356, stripped
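To make both failure modes concrete, here is a deliberately simplified model of keyword-based detection over file(1) output. `classify_file_output` is a hypothetical helper written for illustration; it is not the actual code of platform.architecture(), only a mimic of the kind of substring checks described above.

```python
def classify_file_output(fileout):
    """Hypothetical, simplified model of keyword-based detection.

    Returns a (bits, linkage) tuple guessed from file(1) output text.
    """
    bits = linkage = ""
    # Problem 2: PIE executables are reported as "shared object",
    # so requiring "executable" rejects them outright.
    if "executable" not in fileout:
        return bits, linkage
    # Problem 1: without "-b" the filename is part of fileout, so a
    # name like "32-bitPE" can match these keywords.
    if "32-bit" in fileout:
        bits = "32bit"
    elif "64-bit" in fileout:
        bits = "64bit"
    if "ELF" in fileout:
        linkage = "ELF"
    elif "PE" in fileout:
        linkage = "PE"
    return bits, linkage


with_name = "/tmp/32-bitPE: ELF 64-bit LSB executable, x86-64"
brief = "ELF 64-bit LSB executable, x86-64"
pie = "ELF 64-bit LSB shared object, x86-64"

print(classify_file_output(with_name))  # ('32bit', 'ELF'): wrong bits
print(classify_file_output(brief))      # ('64bit', 'ELF')
print(classify_file_output(pie))        # ('', ''): PIE binary rejected
```

In this model, brief output (as produced with "-b") fixes the first case, while the second case additionally needs the acceptance test to recognise "shared object".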
Why does platform have to analyze the sys.executable binary to check whether it's 32 or 64 bits? Can't we use sizeof(void*), for example? Is it something related to fat binaries on macOS (a single binary for 32 and 64 bits, or a single binary for PPC and x86)?
I agree with Serhiy. I also found that the function decodes the output with latin-1; I think it would be better to use UTF-8 instead.
I removed the dependency between bpo-35346 and this issue. I don't see how they are related.
Decoding from UTF-8 can fail with UnicodeDecodeError, whereas decoding from latin-1 never fails. file output is ASCII, so I don't see the point of using UTF-8. Currently, the command displays the filename, but I don't think that we should care about the encoding of the filename.
That, or strip/skip the filename in the output?
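The difference between the two codecs is easy to demonstrate: Latin-1 maps every one of the 256 byte values to a code point, so decoding arbitrary bytes never fails, whereas UTF-8 rejects invalid sequences. The byte string below is a made-up example of file(1) output whose filename was stored in a non-UTF-8 encoding.

```python
# Made-up file(1) output: the filename "caf\xe9" was written in Latin-1,
# so the raw bytes are not valid UTF-8.
raw = b"/tmp/caf\xe9: ELF 64-bit LSB executable, x86-64\n"

# Latin-1 maps each byte value to exactly one code point: never fails.
text = raw.decode("latin-1")
print(text.strip())

# UTF-8 rejects 0xE9 here because it is not followed by a valid
# continuation byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# Decoding and re-encoding with Latin-1 round-trips every byte string.
assert text.encode("latin-1") == raw
```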
The -b option was added by: and removed the day after by:
Oh wait, this change is for the 2.7 branch. The change in master (the old "default" branch) didn't add -b, but replaced "-b" with "-b --":
I tested that the "-b" option is supported on Linux, *BSD and OpenIndiana, but it is not part of POSIX. So perhaps we should fall back to "file" without "-b" if "file -b" fails. We can also check that the output starts with executable+': ' and strip this prefix.
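A minimal sketch of the fallback proposed above, assuming only that file(1) exits with a non-zero status when it rejects an option. `describe_file` and its `run` parameter are hypothetical names (the parameter exists so the function can be exercised without a real file(1)), and forcing LC_ALL=C so the output format is predictable is an added assumption, not part of the proposal.

```python
import os
import subprocess


def describe_file(target, run=subprocess.run):
    """Return file(1)'s description of target without the filename, or None.

    Hypothetical helper: tries "file -b" first and falls back to plain
    "file" (stripping the "target: " prefix) if "-b" is rejected.
    """
    # Assumption: force the C locale so the output format is predictable.
    env = dict(os.environ, LC_ALL="C")
    for brief in (True, False):
        args = ["file", "-b", target] if brief else ["file", target]
        try:
            proc = run(args, stdout=subprocess.PIPE,
                       stderr=subprocess.DEVNULL, env=env)
        except OSError:
            return None  # file(1) is not installed at all
        if proc.returncode != 0:
            continue  # e.g. "-b" unsupported: retry without it
        out = proc.stdout.decode("latin-1").rstrip("\n")
        prefix = target + ": "
        if not brief and out.startswith(prefix):
            out = out[len(prefix):]
        return out
    return None
```

Injecting `run` keeps the fallback logic testable on systems where file(1) is missing or only supports one of the two invocations.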
In the 2.7 branch, _syscmd_file() used the -b option for only one day (no Python 2.7.x release used -b):
Python 3.2.0 (Feb 2011) to 3.2.3 (Sep 2012) and Python 3.3.0 (Sep 2012) used -b:
Technically, on UNIX, ':' is valid in a filename. Examples of filenames containing ':' on my Fedora 29:
/usr/share/man/man3/List::Util.3pm.gz
Note: I cannot find a program name that contains ':'.
A conservative approach would be to leave stable branches unchanged and use -b in the master branch.
PR 11160 is an alternative solution which strips the filename from the output. It does not matter if the filename contains ":", because the format of the output in the POSIX locale is strictly specified.
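Why a filename containing ':' is harmless when the known prefix is removed, but not when the output is split on the first colon, can be shown with the Fedora man-page path mentioned above. The output line is made up, and both helpers are hypothetical illustrations, not the code of PR 11160.

```python
def strip_by_split(output):
    # Naive: assume the filename ends at the first ':'.
    return output.split(":", 1)[1].lstrip()


def strip_by_prefix(output, target):
    # In the POSIX locale the format is "<target>: <description>", so remove
    # exactly the known "<target>: " prefix instead of searching for a colon.
    prefix = target + ": "
    if output.startswith(prefix):
        return output[len(prefix):]
    return output


target = "/usr/share/man/man3/List::Util.3pm.gz"
output = target + ": gzip compressed data, max compression, from Unix"

print(strip_by_split(output))           # starts with ":Util.3pm.gz:" (wrong)
print(strip_by_prefix(output, target))  # gzip compressed data, max compression, from Unix
```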
BTW, a related problem with platform.architecture() is that it doesn't know how to deal with fat binaries (such as those found on macOS). As an example:

$ file /usr/bin/python
/usr/bin/python: Mach-O universal binary with 2 architectures: [i386:Mach-O executable i386] [x86_64:Mach-O 64-bit executable x86_64]
/usr/bin/python (for architecture i386): Mach-O executable i386
/usr/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64

This will be reported as "64-bit" by platform.architecture() because '64-bit' appears in the output of file(1). Using sizeof(void*) or sys.maxsize suffers from a similar problem: it only detects the pointer size of the current process, not that the binary is also capable of running with a different pointer size.

P.S. platform.architecture() uses file(1) because you can specify executables other than sys.executable.
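The information that a single substring test collapses is still present in the per-architecture lines of the output quoted above. `bits_per_architecture` is a hypothetical helper that assumes the macOS file(1) line format shown in this comment; treating every slice without "64-bit" as 32-bit holds for this i386/x86_64 example but is not a general rule.

```python
import re

# Output quoted above for a universal (fat) binary on macOS.
FAT_OUTPUT = """\
/usr/bin/python: Mach-O universal binary with 2 architectures: [i386:Mach-O executable i386] [x86_64:Mach-O 64-bit executable x86_64]
/usr/bin/python (for architecture i386): Mach-O executable i386
/usr/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64
"""

# Matches the per-slice lines: "<path> (for architecture <arch>): <desc>"
SLICE_RE = re.compile(r"\(for architecture ([^)]+)\): (.*)$")


def bits_per_architecture(fileout):
    """Map each architecture slice to '32bit' or '64bit' (simplified)."""
    result = {}
    for line in fileout.splitlines():
        match = SLICE_RE.search(line)
        if match:
            arch, desc = match.groups()
            # Assumption: Mach-O slices without "64-bit" are 32-bit.
            result[arch] = "64bit" if "64-bit" in desc else "32bit"
    return result


print(bits_per_architecture(FAT_OUTPUT))  # {'i386': '32bit', 'x86_64': '64bit'}
print("64-bit" in FAT_OUTPUT)             # True: a single substring test says 64-bit
```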
What result of platform.architecture() do you expect for a universal binary?
I don't understand the purpose of the 'linkage' information of platform.architecture(). Does anyone care if Python is an ELF program or a WindowsPE program? Maybe it was useful 20 years ago when there was COFF on Unix, but right now ELF is the de facto standard on Unix, and WindowsPE on Windows. 32-bit and 64-bit information should be enough, no? I would suggest just returning ('%sbit' % bits, '') if executable is not set. Use struct.calcsize('P')*8 or sys.maxsize to get bits.
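A minimal sketch of that suggestion for the case where no executable is given: derive the bits from the running interpreter's pointer size and leave linkage empty. `current_architecture` is a hypothetical name, not part of the platform module.

```python
import struct
import sys


def current_architecture():
    """Return (bits, linkage) for the running interpreter only.

    Hypothetical helper: bits come from the size of a C pointer
    (struct.calcsize("P") is sizeof(void*)); linkage is left empty.
    """
    bits = struct.calcsize("P") * 8
    return ("%sbit" % bits, "")


bits, linkage = current_architecture()
print(bits, repr(linkage))
# On common platforms sys.maxsize agrees: 2**31 - 1 on 32-bit builds,
# 2**63 - 1 on 64-bit builds.
print(sys.maxsize > 2**32)
```

Note that this describes the current process only, which is exactly the limitation raised for fat binaries.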
I honestly don't know. What is the purpose of this functionality in the first place? I have never had a problem where using this function was the right solution. To be honest, I have the same problem with a number of other APIs in this module. As an example, platform.system_alias() looks interesting but has an API that won't work in general (the macOS release version cannot be calculated from platform.uname() information, and likewise for Linux distribution information).
Ronald Oussoren gave more info on my previous PR 10780 ("platform.platform() uses mac_ver() on macOS"):

"""
ronald@Menegroth[0]$ arch -i386 python3.6 -m platform
ronald@Menegroth[0]$ arch -i386 python3.6 -c 'import sys; print(sys.maxsize)'

This platform output includes "64bit" because the binary for python3.6 includes support for both i386 and x86_64, and doesn't show that the command is using i386 instructions.
"""

I made some tests:

$ file /usr/local/bin/python3
/usr/local/bin/python3: Mach-O universal binary with 2 architectures: [i386:Mach-O executable i386] [x86_64:Mach-O 64-bit executable x86_64]
/usr/local/bin/python3 (for architecture i386): Mach-O executable i386
/usr/local/bin/python3 (for architecture x86_64): Mach-O 64-bit executable x86_64
$ /usr/local/bin/python3 -c 'import struct, sys, platform; print(platform.architecture(), struct.calcsize("P"), sys.maxsize)'
('64bit', '') 8 9223372036854775807
$ arch -x86_64 /usr/local/bin/python3 -c 'import struct, sys, platform; print(platform.architecture(), struct.calcsize("P"), sys.maxsize)'
('64bit', '') 8 9223372036854775807
$ arch -i386 /usr/local/bin/python3 -c 'import struct, sys, platform; print(platform.architecture(), struct.calcsize("P"), sys.maxsize)'
('64bit', '') 4 2147483647

IMHO platform.architecture() should return 32bit when running "arch -i386 /usr/local/bin/python3" to be consistent with struct.calcsize("P") == 4 and sys.maxsize == 2147483647. Otherwise, how would you notice that you are using the 32-bit flavor of Python? My PR 11186 implements this fix.

Ronald Oussoren:
Right, but I don't think that it's possible to report that the Python executable is a fat binary in the platform.architecture() result. If you want to provide such information, IMHO you should write a new function or at least add a new parameter to platform.architecture(). IMHO it's more consistent to report "32bit" for "arch -i386 python3" and "64bit" for "arch -x86_64 python3".
I don't agree. platform.architecture() is defined to look at a specified binary, not the currently running process. That can lead to inconsistencies like this and is not something you can avoid.
This doesn't necessarily need a new function; platform.architecture() could also return something like "32bit,64bit". But as I mentioned in my previous message, I don't know why anyone would want to use this function in the first place. There are better ways to determine information about the current process (struct.calcsize, sys.maxsize, sys.byteorder), and I have never had a need to determine information about executable files that I couldn't get in a better way using other libraries (like macholib and pyelftools).
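As an illustration of getting this information without parsing file(1) text, the ELF class can be read directly from the header: an ELF file starts with the magic bytes \x7fELF, and the next byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. This stdlib-only sketch is a substitute for, not an example of, pyelftools or macholib; `elf_bits` is a hypothetical helper that handles ELF only (no Mach-O, no PE, no fat binaries).

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"
EI_CLASS_BITS = {1: "32bit", 2: "64bit"}  # ELFCLASS32, ELFCLASS64


def elf_bits(path):
    """Return '32bit'/'64bit' from an ELF header, or None if not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)  # magic (4 bytes) + EI_CLASS (1 byte)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return EI_CLASS_BITS.get(ident[4])


# Demo on a synthetic header: only the first 5 bytes matter here.
# The filename deliberately contains "32-bitPE"; it cannot confuse this check.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "32-bitPE")
    with open(fake, "wb") as f:
        f.write(ELF_MAGIC + bytes([2]) + bytes(11))
    print(elf_bits(fake))  # 64bit
```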
See also bpo-35516: "platform.system_alias(): add macOS support".
architecture() looks at the running Python executable by default and documents a special case when executable equals sys.executable:
As a user, I don't need this information. architecture() already contains a note:

"""
To get at the “64-bitness” of the current interpreter, it is more reliable to query the sys.maxsize attribute:

is_64bits = sys.maxsize > 2**32
"""
platform.architecture() has multiple issues:
Another solution is to deprecate the function. I agree with Ronald that sys.maxsize is enough for most use cases (getting the "bits"). For more accurate information, platform.architecture() is wrong and a third-party module is required. By the way, platform.architecture() is not used in the stdlib, which is a sign that the function may not be really helpful. Moreover, sysconfig and distutils.util contain the following code:

# We can't use "platform.architecture()[0]" because a
# bootstrap problem. We use a dict to get an error
# if some suspicious happens.
bitness = {2147483647:"32bit", 9223372036854775807:"64bit"}
machine += ".%s" % bitness[sys.maxsize]

Serhiy, Ronald: What do you think of deprecating platform.architecture() instead of trying to fix it?
Guys, please read the docstring of the platform.architecture() function (or ask the person who wrote most of the module). It clearly refers to inspecting a specific executable and only uses the Python interpreter as the default. The running process can provide some sane defaults, but is not necessarily using the same values as the given executable. The function does not support multi-architecture executables. This is simply out of scope for the function. Victor: AFAIK, I still own this module, so if you want to deprecate something, please ping me first.
Ok, I closed my PR 11186 which modified architecture() to only return struct.calcsize('P') if the executable argument is equal to sys.executable.
I see the platform module as a module to get info about the operating system and Python, but it seems that I misunderstood the purpose of the specific case of the architecture() function. I propose a small addition to the doc to avoid confusion:
The initial issue has been fixed, so I am closing the issue. Thanks for the review and feedback!