Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Added field name to error message. Closes #6232 #6242

Merged
merged 3 commits into from Aug 25, 2015
Merged

BUG: Added field name to error message. Closes #6232 #6242

merged 3 commits into from Aug 25, 2015

Conversation

yashmehrotra
Copy link
Contributor

Changed error message (Added field name occurring more than once) as discussed in #6232

screenshot from 2015-08-25 03 02 05

@jaimefrio
Copy link
Member

We should add a test that validates the behavior.

Also, for Python 3 PyString_AsString is an alias for PyBytes_AsString, see here, which is likely not going to work very well if the input is a Python 3 Unicode string, will be surprised if you do not have to do some convert to ASCII magic in that case. But if you add the test suggested before, Travis will let us know how bad this is.

@yashmehrotra
Copy link
Contributor Author

@jaimefrio Ya, you were right. I have to convert Python3 Unicode string to ASCII else it fails in python3. Well, now I used PyUnicode_AsASCIIString to convert it. I first check whether its python3 or not, if it is I convert it to ASCII. I am now going to write the test and will update this PR as soon as possible.

@yashmehrotra
Copy link
Contributor Author

@jaimefrio I tested it with python 2.7.9 and python 3.4.3. Same for the test case written.

@njsmith
Copy link
Member

njsmith commented Aug 25, 2015

Do we support non-ASCII field names, and if so do they make things blow up?
(In Python 3 even variable names can legally contain non-ascii.)
.
It looks like in python 3 PyErr_Format uses PyUnicode_FromFormat which
accepts the special strings %S for "call str() on a PyObject_" and %R for
"call repr() on a PyObject_", but on py2 it uses PyString_FromFormat which
doesn't.
.
(I guess it isn't the end of the world if duplicate non-ASCII field names
trigger a misleading error message, but it would be nice.)

On Mon, Aug 24, 2015 at 5:23 PM, Yash Mehrotra notifications@github.com
wrote:

@jaimefrio https://github.com/jaimefrio I tested it with python 2.7.9
and python 3.4.3. Same for the test case written.


Reply to this email directly or view it on GitHub
#6242 (comment).

Nathaniel J. Smith -- http://vorpus.org

@yashmehrotra
Copy link
Contributor Author

@njsmith I made the changes as required, and I used PyUnicode_AsUTF8String. I tested it with a few non-ascii characters and it was working fine.
Example
screenshot from 2015-08-25 08 37 18

PyErr_SetString(PyExc_ValueError,
"two fields with the same name");
#if defined(NPY_PY3K)
name = PyUnicode_AsUTF8String(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyUnicode_AsUTF8String returns a new reference, see here, so you will need to Py_DECREF(next) afterwards. There may be other ways, but a possible solution could look like:

#if defined(NPY_PY3K)
            name = PyUnicode_AsUTF8String(name);
#else
            Py_INCREF(next);
#endif
            PyErr_Format(PyExc_ValueError,
                    "field '%s' occurs more than once", PyString_AsString(name));
            Py_DECREF(next);
            goto fail;

Also, we are assuming that PyUnicode_AsUTF8String will always succeed. This seems to be consistent with other uses of the same function in the NumPy code base, but I am not 100% sure it is a good practice.

@jaimefrio
Copy link
Member

Python 2's dtype barfs on unicode:

In [15]: np.dtype([(u"\u20b9", 'f8')])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-2265ac1c74a8> in <module>()
----> 1 np.dtype([(u"\u20b9", 'f8')])

TypeError: data type not understood

So I guess we do not need to worry about name being unicode instead of str.

Python 3 is happy:

In [11]: np.dtype([("\u20b9", 'f8')])
Out[11]: dtype([('₹', '<f8')])

@yashmehrotra
Copy link
Contributor Author

@jaimefrio Thanks for the help.

#endif
PyErr_Format(PyExc_ValueError,
"field '%s' occurs more than once", PyString_AsString(name));
Py_DECREF(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, if we only put this Py_DECREF in an #if defined(NPY_PY3K) block, the line count of the code is going to be the same, and we spare ourselves a spurious Py_INCREF / Py_DECREF pair in Python 2... Sorry for the noise.

…nting for multiple field names error message
@yashmehrotra
Copy link
Contributor Author

@jaimefrio I hope the latest commit is good to go. I removed the extra Py_INCREF. Also, I squashed my multiple commits to 2 commits. I hope it is merge-ready now.

"two fields with the same name");
#if defined(NPY_PY3K)
name = PyUnicode_AsUTF8String(name);
Py_DECREF(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work: once you decref next, it could be garbage collected before you call PyString_AsString on it. What I had in mind on my other comment was something like:

#if defined(NPY_PY3K)
            name = PyUnicode_AsUTF8String(name);
#endif
            PyErr_Format(PyExc_ValueError,
                    "field '%s' occurs more than once", PyString_AsString(name));
#if defined(NPY_PY3K)
            Py_DECREF(next);
#endif
            goto fail;

It ain't pretty, but neither was the previous version, and at least it puts all the weirdness in the Python 3 code path, and leaves the Python 2 one nice and clean.

@jaimefrio
Copy link
Member

Take a look at the comments I inlined with your code, I think that should be it, and we can merge. Thanks!

@yashmehrotra
Copy link
Contributor Author

@jaimefrio Thanks for your feedback. I have made the changes accordingly.

jaimefrio added a commit that referenced this pull request Aug 25, 2015
BUG: Added field name to error message. Closes #6232
@jaimefrio jaimefrio merged commit f306dd6 into numpy:master Aug 25, 2015
@jaimefrio
Copy link
Member

Thanks for your patience, Yash, in it goes.

Seems to be your first contribution to NumPy: welcome, we hope to see many more! ;-)

@yashmehrotra
Copy link
Contributor Author

@jaimefrio Thanks Jaime. It was a good experience. Yes, its my first contibution, and yes, I plan to contribute more 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants