-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify initialization of cybuffer
with a builtin array
#13
Commits on Nov 24, 2018
-
Only patch-up
array
buffer on Python 2/3When initializing `cybuffer` with an `array, ensure that we only patch up the `format` and associated attributes for the `array` buffer on Python 2/3. This workaround is needed for Python 2 as the `array` type does not support the (new) buffer protocol. So we need to use the old buffer protocol and then the (new) buffer protocol with that result. This loses information about the `array` in the process, which is the bulk of the reason this is needed. Python 3's `array` supports the (new) buffer protocol; so, we can largely ignore these workarounds. However on Python 2/3, we have the `"u"` (unicode) format for the `array`, which is a little broken. So we workaround those issues by casting the array to an unsigned integer of equivalent width to the unicode type. However the `"u"` format for `array` is deprecated and scheduled for removal in Python 4. So there should be no need to do any of this work on Python 4. As such, check whether Python 2/3 is in use when checking if the data is provided is an `array`. That way this whole branch will automatically be removed on Python 4 when it lands.
Configuration menu - View commit details
-
Copy full SHA for c533035 - Browse repository at this point
Copy the full SHA c533035View commit details -
Define
Py_UNICODE_SIZE
at Cython compile-timePreviously we dealt with `Py_UNICODE_SIZE` as a C compile-time constant, which would inject the value during the C preprocessor pass. Later compilation optimizations by the C compiler likely removed unneeded branches caused by this. Though this is pretty needless as Cython is generating these branches that we later drop. Not to mention Cython sometimes assigns constants like these to intermediate or stack local variables, which can hinder compiler optimizations. It would be better yet to have Cython not define these branches at all. To that end, we make `Py_UNICODE_SIZE` a Cython compile-time constant that is set in `config.pxi` along with other compile-time constants. As a result, Cython injects these directly in `cybuffer.pxd` and `cybuffer.pxi` before even performing any compilation. Once it goes to compile the Cython code to C, it recognizes `Py_UNICODE_SIZE` is a compile-time constant and evaluates all expressions it is used in. As a result, the Cython branches with `Py_UNICODE_SIZE` are eliminated during compile-time leaving only one unconditional code path in C. In order to discern what `Py_UNICODE_SIZE` should be at the Python level, we inspect the `itemsize` of an empty unicode (i.e. `"u"`) `array`'s `itemsize` in `setup.py`. Based on this we can determine `Py_UNICODE_SIZE`'s value. We then inject this into `config.pxi`. This value is only inspected and added for Python 2/3 as the unicode `array` is scheduled for removal in Python 4. As we only check this value within a branch of code that should be removed on Python 4 anyways, leaving `Py_UNICODE_SIZE` undefined shouldn't be an issue for Python 4 builds. While `Py_UNICODE_SIZE` is defined in Python headers, we don't need to be worried about our definition of `Py_UNICODE_SIZE` clashing with it. The reason being Cython guarantees the assigned value will injected into the source. So the definition will have already been eliminated by the time the C code is generated. Further we only use `Py_UNICODE_SIZE` in comparisons to other constants. So Cython will inject the value and then evaluate these comparisons resulting in them being turned into always `True` or `False` branches. These in turn are optimized away.
Configuration menu - View commit details
-
Copy full SHA for 071aade - Browse repository at this point
Copy the full SHA 071aadeView commit details -
Drop character array format assignment on Python 2
On Python 2, the `format` of all `array`s is `uint8` (i.e. `"B"`). The reason being that all `array`s on Python 2 only support the old buffer protocol. As such they need to be converted through that path to get access to use them with the (new) buffer protocol. Doing so results in such objects being treated as if they are of `uint8` type. Since that is how we want character (i.e. `"c"`) arrays treated on Python 2 (they don't exist on Python 3), then there is nothing we need to here. This includes changing the `_format`. So drop that line.
Configuration menu - View commit details
-
Copy full SHA for c89c982 - Browse repository at this point
Copy the full SHA c89c982View commit details -
Go ahead and join the different no-op cases with `array` types into one condition. Just makes it easier to see that all of these are just being skipped. Besides Cython will break this out however it sees fit in C code. So there is no benefit to us doing anything special here.
Configuration menu - View commit details
-
Copy full SHA for 3578115 - Browse repository at this point
Copy the full SHA 3578115View commit details -
Set itemsize and format for
array
on Python 2Go ahead and set `itemsize` at the same time that `format` is being set for the Python 2 `array` cases. These two properties are connected and naturally go together. So this makes a bit more sense from a code readability point of view.
Configuration menu - View commit details
-
Copy full SHA for 7b09039 - Browse repository at this point
Copy the full SHA 7b09039View commit details -
Check itemsize before adjusting array on Python 2
On Python 2 when handling the `array` case, the only reason to worry about adjusting the shape and strides is if the `itemsize` changed as a consequence of casting back to the original type. If the itemsize is the same, which on Python 2 is 1 as a consequence of needing to use the old buffer protocol for `array`s, then there is no need to perform this casting. Hence we update the Python 2 shape and stride adjustment case to check to see if the `itemsize` is different from `1`, which is now possible as we assign the `itemsize` when handling the casting above. This is fast check since it is comparing to C integers and only matters for Python 2 as the code is removed for newer Python versions.
Configuration menu - View commit details
-
Copy full SHA for 9b7641e - Browse repository at this point
Copy the full SHA 9b7641eView commit details -
Drop the checks for `uint8` and character `array`s in the general case in the casting code. Instead favor handling these checks as part of the Python 2 `elif` case after handling the unicode `array`s. This consolidates the code a bit. Also simplifies the number of checks being done on Python 3. The Python 2 case winds up being about the same other than it makes better usage of its branches. Should improve readability and streamline things a bit.
Configuration menu - View commit details
-
Copy full SHA for 09dcea2 - Browse repository at this point
Copy the full SHA 09dcea2View commit details -
As there is no Python 4+ yet, we do not need to worry about coverage on the `else` case here. So inform `coverage` to not worry about it. ref: https://coverage.readthedocs.io/en/coverage-4.4.2/branch.html#structurally-partial-branches
Configuration menu - View commit details
-
Copy full SHA for 321dc8b - Browse repository at this point
Copy the full SHA 321dc8bView commit details -
Make this comment a bit more informative. Particularly after we have included changing `itemsize`.
Configuration menu - View commit details
-
Copy full SHA for ae9f899 - Browse repository at this point
Copy the full SHA ae9f899View commit details