Handle array format fixes with C strings #18
Merged
a51503c
to
453eb04
Compare
Makes it a little clearer that these typecodes are intended to represent unsigned integer types as opposed to UCS types per se.
When handling the `array` case, copy the `itemsize` and `format` over up front on Python 2, so that the Python 2 and Python 3 cases are handled essentially the same way from that point on. Then use `_format` to perform the string comparisons in C instead of Python. As the strings are all short, these comparisons are very fast and avoid the overhead of Python string comparison.

Update the unicode strategy to account for UCS2 being represented by the `"u"` format and UCS4 by the `"w"` format. Specify the type cast for each case using the unsigned integer type of the appropriate width, as before. As a result, the format check now occurs at run time instead of build time, which should make binaries more portable.

Explicitly check for the character array format `"c"` on Python 2 and, in that case, patch the format to be the unsigned char format `"B"`. This is basically what we did before, but it is now done explicitly. As this type only exists on Python 2 (not Python 3), there is no need to handle it on other Python versions, so include `PY2K` in the conditional to ensure this check is not included on Python 3.
As `Py_UNICODE_SIZE` is no longer being used in the comparisons, skip writing it to `config.pxi` as part of the build in `setup.py`.
453eb04
to
b30ad91
Compare
Matches nicely with the ordering below. Also suggests implicitly what formats we start with and what they are mapped to.
Shortens the code a bit and improves readability.
1aebe99
to
7af1ecb
Compare
In some cases the unicode typecode `"w"` shows up, but it is not well documented and is on its way out as well. The code assumed this to be UCS4, which may very well be true, but it is a bit unclear. Similarly, on Python 2 it seems that the unicode typecode `"u"` can mean either UCS2 or UCS4. Given the confusion about the unicode typecodes, lump all comparisons of them together and simply handle casting based on `itemsize`. This should avoid getting into any technical issues with these legacy types while still maintaining the intended behavior (casting to an appropriately sized unsigned integer). While we are at it, go ahead and skip defining these legacy types, as they don't have as clear a mapping as one might initially think.
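As a rough illustration of choosing the cast from `itemsize` alone, here is a minimal Python sketch. It is not the Cython code from this PR; the helper name `unsigned_format_for` is hypothetical, and it assumes the native sizes reported by the `struct` module are the ones of interest.

```python
import struct

# Candidate unsigned integer format codes, narrowest first.
UNSIGNED_CODES = "BHILQ"


def unsigned_format_for(itemsize):
    """Return an unsigned integer format code whose native size is `itemsize`.

    Hypothetical helper: it ignores the typecode entirely and picks the
    first unsigned code of matching width, at run time.
    """
    for code in UNSIGNED_CODES:
        if struct.calcsize(code) == itemsize:
            return code
    raise ValueError("no unsigned integer format of size %d" % itemsize)
```

With this approach a 2-byte unicode item would be viewed as an unsigned 2-byte integer and a 4-byte item as an unsigned 4-byte integer, regardless of whether the typecode was `"u"` or `"w"`.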
96bbe67
to
4cd0361
Compare
Instead of using `strcmp`, simply read the first `char` of the `format` string and assign it to `fmt`, a `char` variable. Then compare `fmt` against the different values it can take on. Cython turns this into a `switch`/`case` block with relatively little interference, so the compiler can easily optimize it into the appropriate form.
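The single-character dispatch can be sketched in plain Python as follows. This is only an approximation of the idea, not the PR's Cython: the function name `patch_format`, the `py2` flag (standing in for the `PY2K` compile-time check), and the assumption that `"H"` and `"I"` are 2 and 4 bytes wide are all illustrative.

```python
def patch_format(fmt, itemsize, py2=False):
    """Return the format to use for an array whose format string is `fmt`.

    Only the first byte of `fmt` is inspected, mirroring a switch on a
    single char. Unusual cases are patched; everything else passes through.
    """
    code = fmt[:1]
    if code in (b"u", b"w"):
        # Legacy unicode typecodes: choose the unsigned type by width only.
        patched = {2: b"H", 4: b"I"}.get(itemsize)
        if patched is None:
            raise ValueError("unexpected unicode itemsize: %d" % itemsize)
        return patched
    if py2 and code == b"c":
        # Python 2 only: treat the character format as unsigned bytes.
        return b"B"
    # All other formats are used exactly as the array exported them.
    return fmt
```

For example, `patch_format(b"u", 2)` yields `b"H"`, while `patch_format(b"d", 8)` returns `b"d"` unchanged.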
This makes it a little easier to read the code either in Cython or its C equivalent.
Copies over `format` and `itemsize` information for Python 2 to start with. This largely removes the differences between Python 2/3 `array` format handling (with the exception of Python 2 only formats). Then handles format checks with C strings for all cases. As the strings should all be one `char` long in our case, simply compares the `char` values directly.

As unicode `array`s are represented with either a `"u"` or `"w"` format to indicate UCS2 or UCS4, checks both of these independently at runtime. This drops the use of `Py_UNICODE_SIZE` in the code, which should make binaries a bit more portable as they are not dependent on the underlying unicode character size.

Checks for the Python 2 character array format case `"c"` explicitly. This was handled implicitly before, as the Python 2 code path through the old buffer protocol was casting everything to the unsigned bytes format `"B"`. However, now it needs to be handled explicitly, as we want to have our data treated as unsigned bytes instead of the legacy character format.

All other cases simply leverage whatever the `array` exported, either through the buffer protocol on Python 3 or through what we patched in afterwards from the `array`'s information on Python 2. This makes it a little clearer what the unusual cases are that need patching (i.e. unicode and character arrays). It also makes it a little clearer that on Python 2 we are pulling this information from the `array`. Appears to improve performance as well. Not to mention binaries are a bit more portable.
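To see what an `array` exports through the buffer protocol on Python 3, the following small sketch reads the `format` and `itemsize` back via `memoryview`. The helper name `exported_layout` is hypothetical and is not part of this project.

```python
from array import array


def exported_layout(arr):
    """Return the (format, itemsize) pair an object exports via the buffer protocol."""
    with memoryview(arr) as view:
        return view.format, view.itemsize
```

For ordinary typecodes such as `"i"`, `"B"`, or `"d"`, the exported format matches the array's own typecode and the exported item size matches `arr.itemsize`, which is why only the unicode and character cases need patching.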