bpo-40302: Add _Py_bswap32() function to pyport.h #19552
Include/pyport.h (outdated):
#ifdef _PY_HAVE_BUILTIN_BSWAP
    return __builtin_bswap64(word);
#elif defined(_MSC_VER)
    return ( ((word & 0x00000000000000FFUL) << 56) |
It can be done with 4 ANDs (&), 6 shifts and 3 ORs (|) instead of 8 ANDs, 8 shifts and 7 ORs.
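A minimal sketch of the cheaper variant the reviewer describes, assuming the idea is to swap progressively smaller halves of the word (32-bit halves, then 16-bit units, then bytes). The name bswap64_halves() is hypothetical; this is not the code that was merged.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 64-bit word by
   swapping progressively smaller halves.
   Total cost: 4 ANDs, 6 shifts, 3 ORs. */
static inline uint64_t
bswap64_halves(uint64_t word)
{
    /* Swap the two 32-bit halves: 2 shifts, 1 OR, no mask needed
       because each shift pushes the other half out of the word. */
    word = (word >> 32) | (word << 32);
    /* Swap adjacent 16-bit units: 2 ANDs, 2 shifts, 1 OR. */
    word = ((word & UINT64_C(0xFFFF0000FFFF0000)) >> 16)
         | ((word & UINT64_C(0x0000FFFF0000FFFF)) << 16);
    /* Swap adjacent bytes: 2 ANDs, 2 shifts, 1 OR. */
    word = ((word & UINT64_C(0xFF00FF00FF00FF00)) >> 8)
         | ((word & UINT64_C(0x00FF00FF00FF00FF)) << 8);
    return word;
}
```

For example, 0x0102030405060708 becomes 0x0807060504030201. The saving in ANDs comes from the first step, where the shifts alone discard the unwanted bits.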
I copied the code from _ctypes. I cannot guess which code you would prefer.
In practice, efficient builtin functions are used with GCC, clang and Windows (MSC).
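To illustrate that kind of dispatch, here is a sketch for the 16-bit case. It assumes GCC 4.8 or later (which provides __builtin_bswap16()), clang, or MSVC (whose <stdlib.h> declares _byteswap_ushort()). The name bswap16_dispatch() and the fallback branch are illustrative, not the PR's code.

```c
#include <stdint.h>
#if defined(_MSC_VER) && !defined(__clang__)
#  include <stdlib.h>   /* declares _byteswap_ushort() */
#endif

/* Hypothetical helper: prefer a compiler intrinsic, fall back to C. */
static inline uint16_t
bswap16_dispatch(uint16_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Builtin available in GCC 4.8+ and clang (assumption on versions). */
    return __builtin_bswap16(word);
#elif defined(_MSC_VER)
    /* MSVC intrinsic; unsigned short is 16 bits on Windows. */
    return _byteswap_ushort(word);
#else
    /* Portable fallback for other compilers. */
    return (uint16_t)((word << 8) | (word >> 8));
#endif
}
```

The fallback relies on the cast to uint16_t to drop the high byte left over by the left shift after integer promotion.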
Include/pyport.h (outdated):
#ifdef _PY_HAVE_BUILTIN_BSWAP
    return __builtin_bswap32(word);
#else
    word = ( (word & 0xFF00FF00UL) >> 8) |
I am not sure, but there may be a benefit from using the same constant: (word >> 8) & 0x00FF00FFUL. There may also be a benefit from applying & and >> in different orders on the two sides. I did not test this.
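A sketch of the reviewer's suggestion, untested for speed just as in the comment: shifting before masking on one side lets both halves use the same 0x00FF00FF constant. The function name bswap32_shared_mask() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit byte swap that reuses one mask constant. */
static inline uint32_t
bswap32_shared_mask(uint32_t word)
{
    const uint32_t mask = UINT32_C(0x00FF00FF);
    /* Swap adjacent bytes: shift-then-mask on one side,
       mask-then-shift on the other, both with the same constant. */
    word = ((word >> 8) & mask) | ((word & mask) << 8);
    /* Swap the two 16-bit halves. */
    return (word >> 16) | (word << 16);
}
```

With 0x11223344, the first step yields 0x22114433 and the half swap yields 0x44332211. Whether reusing one constant is measurably faster depends on the compiler and target; a compiler that recognizes the idiom may emit a single byte-swap instruction for either form.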
I'll let you test it. I don't really care about performance on this code path, since efficient builtin functions are used on most platforms.
PR updated:
In Objects/stringlib/codecs.h, I left the following macro unchanged, since I don't understand whether it operates on uint16_t or uint32_t:
My _ctypes/cfield.c static inline functions look like overkill. Maybe using _Py_bswapXX() directly would be enough? I'm not sure about conversions between signed and unsigned integers. Moving the new functions to an internal header means that the sha256, sha512 and _ctypes modules must now be compiled with the internal C API to access it.
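On the signed/unsigned question: converting an int32_t to uint32_t is well defined in C (the value is reduced modulo 2^32), but converting a uint32_t above INT32_MAX back to int32_t is implementation-defined. One way to sidestep that, sketched below with the hypothetical names swap_u32() and swap_i32(), is to copy the swapped bit pattern back with memcpy(). Since int32_t is required to be two's complement with no padding bits, the result is well defined. In practice, mainstream compilers document modular wrap-around, so a plain cast usually works too.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical unsigned 32-bit byte swap used by the wrapper below. */
static inline uint32_t
swap_u32(uint32_t w)
{
    return (w >> 24)
         | ((w >> 8) & UINT32_C(0x0000FF00))
         | ((w << 8) & UINT32_C(0x00FF0000))
         | (w << 24);
}

/* Hypothetical wrapper for signed fields: int32_t -> uint32_t is well
   defined, and memcpy() avoids the implementation-defined
   uint32_t -> int32_t conversion for values above INT32_MAX. */
static inline int32_t
swap_i32(int32_t value)
{
    uint32_t bits = swap_u32((uint32_t)value);
    int32_t result;
    memcpy(&result, &bits, sizeof(result));
    return result;
}
```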
Perhaps
I looked at Linux /usr/include/bits/byteswap.h and decided to use a simpler implementation that does not expect "a << b" or "a >> b" to be circular (C shifts discard the bits that leave the word rather than rotating them).
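A simple portable form along those lines can be sketched as follows: isolate each byte with a mask, then shift it to its mirrored position. Because every byte is masked before it moves, the code never depends on bits wrapping around. The name bswap32_portable() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable 32-bit byte swap: mask each byte, then shift it
   into place. No rotation is assumed; shifted-out bits are discarded. */
static inline uint32_t
bswap32_portable(uint32_t word)
{
    return ((word & UINT32_C(0x000000FF)) << 24)
         | ((word & UINT32_C(0x0000FF00)) <<  8)
         | ((word & UINT32_C(0x00FF0000)) >>  8)
         | ((word & UINT32_C(0xFF000000)) >> 24);
}
```

For example, 0x80000001 becomes 0x01000080.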
I can come up with a better name if needed. But first, do you care about having these functions in the public C API, or are you OK with having them in the internal C API? In the public C API, they would be private functions prefixed by _Py, in a new header file, since they add a new include on Windows.
I don't think they need to be in the public API. They're just internal helpers for CPython.
OK, so let's start with pycore_byteswap.h. If we add more bit and byte utilities later, we can rename the header. Since it's the internal C API, we don't have to bother with backward compatibility. The "byteswap.h" name comes from the GNU byteswap.h header name:
Add a new internal pycore_byteswap.h header file with the following functions:

* _Py_bswap16()
* _Py_bswap32()
* _Py_bswap64()

Use these functions in the _ctypes, sha256 and sha512 modules. Also use them in the UTF-32 encoder.

The sha256, sha512 and _ctypes modules are now built with the internal C API.
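As an illustration of how an encoder might use such a helper, here is a sketch of a UTF-32-BE writer that swaps each code unit only on little-endian hosts. All names are hypothetical: swap32_standin() stands in for _Py_bswap32(), which lives in an internal CPython header and is not available to standalone code, and this is not the encoder from Objects/stringlib/codecs.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for _Py_bswap32(): any correct 32-bit swap works. */
static inline uint32_t
swap32_standin(uint32_t w)
{
    w = ((w >> 8) & UINT32_C(0x00FF00FF)) | ((w & UINT32_C(0x00FF00FF)) << 8);
    return (w >> 16) | (w << 16);
}

/* Hypothetical runtime endianness check; real code would more likely
   use a compile-time macro. */
static int
host_is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

/* Hypothetical UTF-32-BE writer: each code point becomes 4 bytes, most
   significant byte first. out must have room for 4 * n bytes. */
static void
encode_utf32_be(const uint32_t *codepoints, size_t n, unsigned char *out)
{
    const int swap = host_is_little_endian();
    for (size_t i = 0; i < n; i++) {
        uint32_t unit = swap ? swap32_standin(codepoints[i]) : codepoints[i];
        memcpy(out + 4 * i, &unit, sizeof(unit));
    }
}
```

On a little-endian host, U+1F600 is written as the bytes 00 01 F6 00.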
glibc managed to unify all of its byteswap.h implementations into a single header file thanks to GCC builtin functions: https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0d40d0ecba3b1e5b8c3b8da01c708fea3948e193 Previously, it had one implementation per architecture, using assembly code! I checked how the _sha256 module is compiled with gcc -O3. longReverse() is inlined and
Oh. GCC looks clever. Even without this PR, it was already able to recognize that longReverse() implements a byte swap and it already used
I added volatile in
GCC is quite smart nowadays!
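The kind of loop discussed here can be reproduced with a small stand-in for longReverse(). reverse_words() below is hypothetical and only mirrors the mask-and-shift pattern quoted earlier in the thread. Compiling it with gcc -O3 -S and searching the assembly for a bswap instruction (on x86-64) is one way to check the observation that GCC recognizes the idiom; results may vary with compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for longReverse(): reverse the bytes of every
   32-bit word in place, using plain masks and shifts. */
static void
reverse_words(uint32_t *buffer, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t value = buffer[i];
        /* Swap adjacent bytes within each 16-bit half. */
        value = ((value & UINT32_C(0xFF00FF00)) >> 8)
              | ((value & UINT32_C(0x00FF00FF)) << 8);
        /* Swap the two 16-bit halves. */
        buffer[i] = (value << 16) | (value >> 16);
    }
}
```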
I know that the "portable" implementation may not be the most efficient, but C compilers are clever nowadays and implement crazy optimizations (see my previous comment). Moreover, this change now uses efficient builtin functions when the C compiler provides them. Overall, this change shouldn't have any impact on performance, but the new functions have a better-defined API: static inline functions with well-defined input and output types.
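A sketch of the API point: the same expression written as a macro takes on whatever type the caller passes, while a static inline function converts its argument to uint32_t and always returns uint32_t. SWAP32_MACRO() and swap32_inline() are hypothetical names, not CPython's SWAB4() or _Py_bswap32().

```c
#include <stdint.h>

/* Hypothetical macro: the outer bytes rely on shifts discarding bits,
   which only holds if x is exactly a 32-bit unsigned value. */
#define SWAP32_MACRO(x) \
    (((x) << 24) | (((x) & 0xFF00u) << 8) | (((x) >> 8) & 0xFF00u) | ((x) >> 24))

/* Hypothetical static inline version: x is always uint32_t here, so bits
   shifted past bit 31 are discarded (assumes a 32-bit int, as on
   mainstream platforms; a wider int would promote uint32_t to int). */
static inline uint32_t
swap32_inline(uint32_t x)
{
    return SWAP32_MACRO(x);
}
```

With a uint64_t argument of 0x11223344, the macro's (x) << 24 keeps bits above bit 31 and yields 0x0011223344332211, whereas swap32_inline() returns 0x44332211.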
* master: (1985 commits)
  bpo-40179: Fix translation of #elif in Argument Clinic (GH-19364)
  bpo-35967: Skip test with `uname -p` on Android (GH-19577)
  bpo-40257: Improve help for the typing module (GH-19546)
  Fix two typos in multiprocessing (GH-19571)
  bpo-40286: Use random.randbytes() in tests (GH-19575)
  bpo-40286: Makes simpler the relation between randbytes() and getrandbits() (GH-19574)
  bpo-39894: Route calls from pathlib.Path.samefile() to os.stat() via the path accessor (GH-18836)
  bpo-39897: Remove needless `Path(self.parent)` call, which makes `is_mount()` misbehave in `Path` subclasses. (GH-18839)
  bpo-40282: Allow random.getrandbits(0) (GH-19539)
  bpo-40302: UTF-32 encoder SWAB4() macro use a|b rather than a+b (GH-19572)
  bpo-40302: Replace PY_INT64_T with int64_t (GH-19573)
  bpo-40286: Add randbytes() method to random.Random (GH-19527)
  bpo-39901: Move `pathlib.Path.owner()` and `group()` implementations into the path accessor. (GH-18844)
  bpo-40300: Allow empty logging.Formatter.default_msec_format. (GH-19551)
  bpo-40302: Add pycore_byteswap.h header file (GH-19552)
  bpo-40287: Fix SpooledTemporaryFile.seek() return value (GH-19540)
  Minor modernization and readability improvement to the tokenizer example (GH-19558)
  bpo-40294: Fix _asyncio when module is loaded/unloaded multiple times (GH-19542)
  Fix parameter names in assertIn() docs (GH-18829)
  bpo-39793: use the same domain on make_msgid tests (#18698)
  ...
Add the following static inline functions to pyport.h:
Use these functions in _ctypes, sha256 and sha512 modules.
https://bugs.python.org/issue40302