-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify initialization of cybuffer
with a builtin array
#13
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When initializing `cybuffer` with an `array, ensure that we only patch up the `format` and associated attributes for the `array` buffer on Python 2/3. This workaround is needed for Python 2 as the `array` type does not support the (new) buffer protocol. So we need to use the old buffer protocol and then the (new) buffer protocol with that result. This loses information about the `array` in the process, which is the bulk of the reason this is needed. Python 3's `array` supports the (new) buffer protocol; so, we can largely ignore these workarounds. However on Python 2/3, we have the `"u"` (unicode) format for the `array`, which is a little broken. So we workaround those issues by casting the array to an unsigned integer of equivalent width to the unicode type. However the `"u"` format for `array` is deprecated and scheduled for removal in Python 4. So there should be no need to do any of this work on Python 4. As such, check whether Python 2/3 is in use when checking if the data is provided is an `array`. That way this whole branch will automatically be removed on Python 4 when it lands.
Previously we dealt with `Py_UNICODE_SIZE` as a C compile-time constant, which would inject the value during the C preprocessor pass. Later compilation optimizations by the C compiler likely removed unneeded branches caused by this. Though this is pretty needless as Cython is generating these branches that we later drop. Not to mention Cython sometimes assigns constants like these to intermediate or stack local variables, which can hinder compiler optimizations. It would be better yet to have Cython not define these branches at all. To that end, we make `Py_UNICODE_SIZE` a Cython compile-time constant that is set in `config.pxi` along with other compile-time constants. As a result, Cython injects these directly in `cybuffer.pxd` and `cybuffer.pxi` before even performing any compilation. Once it goes to compile the Cython code to C, it recognizes `Py_UNICODE_SIZE` is a compile-time constant and evaluates all expressions it is used in. As a result, the Cython branches with `Py_UNICODE_SIZE` are eliminated during compile-time leaving only one unconditional code path in C. In order to discern what `Py_UNICODE_SIZE` should be at the Python level, we inspect the `itemsize` of an empty unicode (i.e. `"u"`) `array`'s `itemsize` in `setup.py`. Based on this we can determine `Py_UNICODE_SIZE`'s value. We then inject this into `config.pxi`. This value is only inspected and added for Python 2/3 as the unicode `array` is scheduled for removal in Python 4. As we only check this value within a branch of code that should be removed on Python 4 anyways, leaving `Py_UNICODE_SIZE` undefined shouldn't be an issue for Python 4 builds. While `Py_UNICODE_SIZE` is defined in Python headers, we don't need to be worried about our definition of `Py_UNICODE_SIZE` clashing with it. The reason being Cython guarantees the assigned value will injected into the source. So the definition will have already been eliminated by the time the C code is generated. Further we only use `Py_UNICODE_SIZE` in comparisons to other constants. So Cython will inject the value and then evaluate these comparisons resulting in them being turned into always `True` or `False` branches. These in turn are optimized away.
On Python 2, the `format` of all `array`s is `uint8` (i.e. `"B"`). The reason being that all `array`s on Python 2 only support the old buffer protocol. As such they need to be converted through that path to get access to use them with the (new) buffer protocol. Doing so results in such objects being treated as if they are of `uint8` type. Since that is how we want character (i.e. `"c"`) arrays treated on Python 2 (they don't exist on Python 3), then there is nothing we need to here. This includes changing the `_format`. So drop that line.
Go ahead and join the different no-op cases with `array` types into one condition. Just makes it easier to see that all of these are just being skipped. Besides Cython will break this out however it sees fit in C code. So there is no benefit to us doing anything special here.
Go ahead and set `itemsize` at the same time that `format` is being set for the Python 2 `array` cases. These two properties are connected and naturally go together. So this makes a bit more sense from a code readability point of view.
On Python 2 when handling the `array` case, the only reason to worry about adjusting the shape and strides is if the `itemsize` changed as a consequence of casting back to the original type. If the itemsize is the same, which on Python 2 is 1 as a consequence of needing to use the old buffer protocol for `array`s, then there is no need to perform this casting. Hence we update the Python 2 shape and stride adjustment case to check to see if the `itemsize` is different from `1`, which is now possible as we assign the `itemsize` when handling the casting above. This is fast check since it is comparing to C integers and only matters for Python 2 as the code is removed for newer Python versions.
Drop the checks for `uint8` and character `array`s in the general case in the casting code. Instead favor handling these checks as part of the Python 2 `elif` case after handling the unicode `array`s. This consolidates the code a bit. Also simplifies the number of checks being done on Python 3. The Python 2 case winds up being about the same other than it makes better usage of its branches. Should improve readability and streamline things a bit.
As there is no Python 4+ yet, we do not need to worry about coverage on the `else` case here. So inform `coverage` to not worry about it. ref: https://coverage.readthedocs.io/en/coverage-4.4.2/branch.html#structurally-partial-branches
Make this comment a bit more informative. Particularly after we have included changing `itemsize`.
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Makes some small adjustments to how
cybuffer
handles thearray
initialization case. Namely only special case handling ofarray
on Python 3 or earlier (as Python 4 shouldn't have any of these issues). Also turnPy_UNICODE_SIZE
into a Cython compile-time constant to cutdown on C code generated in relation to it. Finally cleanup and fuse together some no-op cases for somearray
types. Streamlines things a bit and makes it a bit clearer what is happening in the code.