Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify initialization of cybuffer with a builtin array #13

Merged
merged 9 commits into from
Nov 24, 2018

Commits on Nov 24, 2018

  1. Only patch-up array buffer on Python 2/3

    When initializing `cybuffer` with an `array, ensure that we only patch
    up the `format` and associated attributes for the `array` buffer on
    Python 2/3. This workaround is needed for Python 2 as the `array` type
    does not support the (new) buffer protocol. So we need to use the old
    buffer protocol and then the (new) buffer protocol with that result.
    This loses information about the `array` in the process, which is the
    bulk of the reason this is needed. Python 3's `array` supports the (new)
    buffer protocol; so, we can largely ignore these workarounds. However on
    Python 2/3, we have the `"u"` (unicode) format for the `array`, which is
    a little broken. So we workaround those issues by casting the array to
    an unsigned integer of equivalent width to the unicode type. However the
    `"u"` format for `array` is deprecated and scheduled for removal in
    Python 4. So there should be no need to do any of this work on Python 4.
    As such, check whether Python 2/3 is in use when checking if the data is
    provided is an `array`. That way this whole branch will automatically be
    removed on Python 4 when it lands.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    c533035 View commit details
    Browse the repository at this point in the history
  2. Define Py_UNICODE_SIZE at Cython compile-time

    Previously we dealt with `Py_UNICODE_SIZE` as a C compile-time constant,
    which would inject the value during the C preprocessor pass. Later
    compilation optimizations by the C compiler likely removed unneeded
    branches caused by this. Though this is pretty needless as Cython is
    generating these branches that we later drop. Not to mention Cython
    sometimes assigns constants like these to intermediate or stack local
    variables, which can hinder compiler optimizations. It would be better
    yet to have Cython not define these branches at all.
    
    To that end, we make `Py_UNICODE_SIZE` a Cython compile-time constant
    that is set in `config.pxi` along with other compile-time constants. As
    a result, Cython injects these directly in `cybuffer.pxd` and
    `cybuffer.pxi` before even performing any compilation. Once it goes to
    compile the Cython code to C, it recognizes `Py_UNICODE_SIZE` is a
    compile-time constant and evaluates all expressions it is used in. As a
    result, the Cython branches with `Py_UNICODE_SIZE` are eliminated during
    compile-time leaving only one unconditional code path in C.
    
    In order to discern what `Py_UNICODE_SIZE` should be at the Python
    level, we inspect the `itemsize` of an empty unicode (i.e. `"u"`)
    `array`'s `itemsize` in `setup.py`. Based on this we can determine
    `Py_UNICODE_SIZE`'s value. We then inject this into `config.pxi`. This
    value is only inspected and added for Python 2/3 as the unicode `array`
    is scheduled for removal in Python 4. As we only check this value
    within a branch of code that should be removed on Python 4 anyways,
    leaving `Py_UNICODE_SIZE` undefined shouldn't be an issue for Python 4
    builds.
    
    While `Py_UNICODE_SIZE` is defined in Python headers, we don't need to
    be worried about our definition of `Py_UNICODE_SIZE` clashing with it.
    The reason being Cython guarantees the assigned value will injected into
    the source. So the definition will have already been eliminated by the
    time the C code is generated. Further we only use `Py_UNICODE_SIZE` in
    comparisons to other constants. So Cython will inject the value and then
    evaluate these comparisons resulting in them being turned into always
    `True` or `False` branches. These in turn are optimized away.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    071aade View commit details
    Browse the repository at this point in the history
  3. Drop character array format assignment on Python 2

    On Python 2, the `format` of all `array`s is `uint8` (i.e. `"B"`). The
    reason being that all `array`s on Python 2 only support the old buffer
    protocol. As such they need to be converted through that path to get
    access to use them with the (new) buffer protocol. Doing so results in
    such objects being treated as if they are of `uint8` type. Since that is
    how we want character (i.e. `"c"`) arrays treated on Python 2 (they
    don't exist on Python 3), then there is nothing we need to here. This
    includes changing the `_format`. So drop that line.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    c89c982 View commit details
    Browse the repository at this point in the history
  4. Fuse no-op array cases

    Go ahead and join the different no-op cases with `array` types into one
    condition. Just makes it easier to see that all of these are just being
    skipped. Besides Cython will break this out however it sees fit in C
    code. So there is no benefit to us doing anything special here.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    3578115 View commit details
    Browse the repository at this point in the history
  5. Set itemsize and format for array on Python 2

    Go ahead and set `itemsize` at the same time that `format` is being set
    for the Python 2 `array` cases. These two properties are connected and
    naturally go together. So this makes a bit more sense from a code
    readability point of view.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    7b09039 View commit details
    Browse the repository at this point in the history
  6. Check itemsize before adjusting array on Python 2

    On Python 2 when handling the `array` case, the only reason to worry
    about adjusting the shape and strides is if the `itemsize` changed as a
    consequence of casting back to the original type. If the itemsize is the
    same, which on Python 2 is 1 as a consequence of needing to use the old
    buffer protocol for `array`s, then there is no need to perform this
    casting. Hence we update the Python 2 shape and stride adjustment case
    to check to see if the `itemsize` is different from `1`, which is now
    possible as we assign the `itemsize` when handling the casting above.
    This is fast check since it is comparing to C integers and only matters
    for Python 2 as the code is removed for newer Python versions.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    9b7641e View commit details
    Browse the repository at this point in the history
  7. Simplify array format checks

    Drop the checks for `uint8` and character `array`s in the general case
    in the casting code. Instead favor handling these checks as part of the
    Python 2 `elif` case after handling the unicode `array`s. This
    consolidates the code a bit. Also simplifies the number of checks being
    done on Python 3. The Python 2 case winds up being about the same other
    than it makes better usage of its branches. Should improve readability
    and streamline things a bit.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    09dcea2 View commit details
    Browse the repository at this point in the history
  8. Ignore coverage of Python 4+

    As there is no Python 4+ yet, we do not need to worry about coverage on
    the `else` case here. So inform `coverage` to not worry about it.
    
    ref: https://coverage.readthedocs.io/en/coverage-4.4.2/branch.html#structurally-partial-branches
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    321dc8b View commit details
    Browse the repository at this point in the history
  9. Update array casting comment

    Make this comment a bit more informative. Particularly after we have
    included changing `itemsize`.
    jakirkham committed Nov 24, 2018
    Configuration menu
    Copy the full SHA
    ae9f899 View commit details
    Browse the repository at this point in the history