New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Allow dtype field names to be ascii encoded unicode in Python2 #10672
Conversation
Allow unicode names in record arrays when datatype specified as tuples. Name is encoded as ascii if possible, raising an exception if not.
if (PyUString_Check(name)) { | ||
#else | ||
if ((PyUString_Check(name) || PyUnicode_Check(name))) { | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is PyBaseString_Check(name)
(from npy_3kcompat.h
)
|
||
#if !defined(NPY_PY3) | ||
/* convert unicode name to ascii on Python 2 if possible */ | ||
if PyUnicode_Check(name){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Syntax error
/* convert unicode name to ascii on Python 2 if possible */ | ||
if PyUnicode_Check(name){ | ||
Py_DECREF(name); | ||
name = PyUnicode_AsASCIIString(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not safe - you need to convert before you decref
The approach looks good, just some implementation details to fix. |
Thanks for the comments! I've made the changes. |
(No review, just changed the title to be meaningful.) |
/* convert unicode name to ascii on Python 2 if possible */ | ||
if (PyUnicode_Check(name)) { | ||
PyObject *tmp; | ||
tmp = PyUnicode_AsASCIIString(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could combine these lines
numpy/core/tests/test_multiarray.py
Outdated
# But raises UnicodeEncodeError if it can't be encoded: | ||
nonencodable_title = u'\uc3bc' | ||
assert_raises(UnicodeEncodeError, np.dtype, [(nonencodable_title, int)]) | ||
assert_raises(UnicodeEncodeError, np.dtype, [(('a', nonencodable_title), int)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this fail on py3 too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure what you're asking, but this part of test_multiarray.py
is within an if statement such that it only is defined on Python 2, if that answers your question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I would have expected a @skipIf
decorator for that, but that's cleanup for another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation and test looks correct to me.
I've leave it to someone else to decide whether we want to fix #2407 at all - I think I'm +0.75 on it.
This maybe needs a release note, but I'd be ok without one. Again, will let someone else decide.
I am in favour of the fix: this particular bug meant that people where discouraged from using Given the above, I think a release note is a good idea. |
Thanks very much! I look forward to removing my workarounds for this once 1.15 is out. |
As in the earlier discussion, I'm also in favor.
|
@chrisjbillington Needs a release note in `doc/release/1.15.0-notes.rst' under Enhancements. |
Release note added! |
Thanks @chrisjbillington . |
(#2407)
Allow unicode names in record arrays when datatype specified as tuples in Python 2.
Name is encoded as ascii if possible, raising an exception if not.
The result passes round-tripping with pickle
Result: