-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numpy cimport #1406
Numpy cimport #1406
Conversation
changing the ref counting and the implicit cast if possible.
Replace the comment with "raise RuntimeError()" does not prevent the tests from passing. Related to issue h5py#1405
Codecov Report
@@ Coverage Diff @@
## master #1406 +/- ##
=======================================
Coverage 84.48% 84.48%
=======================================
Files 17 17
Lines 2037 2037
=======================================
Hits 1721 1721
Misses 316 316 Continue to review full report at Codecov.
|
(no more performance impact since cython 0.1x)
Only trivial cases were addressed.
I believe you can start reading the stuff... |
Thanks Thomas for pointing the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall, but I've spotted a few things.
I'd ideally like another pair of eyes to review the changes in _conv.pyx
in particular - I don't follow some of the Cython syntax precisely, especially where pointers are involved.
h5py/_conv.pyx
Outdated
if sizes.cset == H5T_CSET_ASCII: | ||
temp_object = bytes(temp_object) | ||
elif sizes.cset == H5T_CSET_UTF8: | ||
temp_object = str(temp_object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we encode this, so that temp_object
always comes out as bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will let @t20100 comment ... I agree it is a big re-write.
If you think it should always be bytes, it is OK for me to change the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it should always be bytes. You're calling .decode()
on it below, which won't work for unicode, and your comment below says "temp_object is bytes".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should be encoded to utf-8 and temp_object should be a bytes
.
I also think there is an issue with the current code before this PR here:
if sizes.cset == H5T_CSET_ASCII:
temp_object = PyObject_Str(buf_obj0)
temp_string = PyBytes_AsString(temp_object)
temp_string_len = PyBytes_Size(temp_object)
In this code temp_object
is a unicode but then accessed with the bytes
API.
I'll look at it again with a fresh eye tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. It was probably overlooked when adding Python 3 support. It seems ot work somehow, though. Maybe Cython does something clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current code in master fails there with a TypeError
.
I fixed the new implementation (PR on @kif branch).
I'll make a small PR with more rework of this function and some tests.
h5py/_conv.pyx
Outdated
temp_string_len = PyBytes_Size(temp_encoded) | ||
if sizes.cset == H5T_CSET_UTF8: | ||
try: | ||
temp_object.decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to skip decoding if we've just encoded it, because then we know it's valid UTF-8. But that doesn't need to be part of this PR. There are some changes planned for string handling in 3.0 anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you speaking of h5py 3.0 ... because all the string handling is already performed in python3 syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant h5py 3.0. See #1338 for what's planned. For this PR, though, let's keep things working the way they do in h5py 2.x, otherwise it will be much harder to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This try:.. except
part can be removed, it was there to replace a test that was comment 8 years ago (db045a1)
The static checks are complaining about trailing whitespace in |
N.B. to run those same checks locally, run |
On Sun, 20 Oct 2019 05:27:00 -0700 James Tocknell ***@***.***> wrote:
aragilar commented on this pull request.
> @@ -18,6 +18,6 @@ def run_tests(args=''):
from shlex import split
from subprocess import call
from sys import executable
- cli = [executable, "-m", "pytest", "--pyargs", "h5py"]
+ cli = [executable, "-m", "pytest", "--pyargs", "h5py", "-x"]
We probably want to allow users to run the full test suite, rather than failing on the first test. Calling `run_tests("-x")` would be equivalent to this change.
My bad ... sorry for committing that.
|
h5py/h5t.pyx
Outdated
@@ -1642,7 +1637,7 @@ cpdef TypeID py_create(object dtype_in, bint logical=0, bint aligned=0): | |||
return _c_complex(dt) | |||
|
|||
# Compound | |||
elif (kind == c'V') and (dtype_in.names is not None): | |||
elif (kind == c'V') and (getattr(dt, "names") is not None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is getattr()
necessary? dt
should always be a dtype object, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dt
gives access to the C struct of dtype, and in there, names
is either a tuple or a NULL pointer instead of None
(see https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L639).
The use of getattr
makes sure names
it is accessed through the Python API in order to get None
.
An alternative way of writing it is casting dt to a python object first: ((<object> dt).names != None)
And an equivalent check but written the C-like way is (<void *> dt.names != NULL)
.
(Those alternatives work).
``` /* * An ordered tuple of field names or NULL * if no fields are defined */ PyObject *names; ``` Like this we enforce the use of Python side of the dtype
On Mon, 21 Oct 2019 06:16:01 -0700 Thomas Kluyver ***@***.***> wrote:
- elif (kind == c'V') and (dtype_in.names is not None):
+ elif (kind == c'V') and (getattr(dt, "names") is not None):
Is `getattr()` necessary? `dt` should always be a dtype object, no?
The real problem is that a dtype, on the c-side, may have "names" being a null
pointer if it is not a compound dtype as explained in the source of numpy:
https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L639
Then our problem is just how to make "names" be accessed on the
pythonic side of the dtype ?
…--
Jérôme Kieffer
|
this test was commented in previous implementation
vlen string converter updates
Travis needs to be restarted as it failed while installing packages with apt. |
Aha, thanks both for explaining why getattr is needed. Can I ask you to put a summary of that in a comment? E.g. "getattr is used to force Python attribute access, as dt.names may be a NULL pointer at the C level" I want to ensure it's clear to someone reading the code that it's not an oversight (like I thought before you explained). |
Other than that, this LGTM |
This PR is the continuation of the cleaning up of numpy's c-import.
The file numpy.pxd has disappeared but there is some of it remaining in _conv.pyx.
The last part to be removed should probably be by using cython's memory-views to build a structure on top of a memory-space coming from hdf5 before creating the numpy array instead of stealing references which is commented as bas practice.
There are likely to be a
related to #1367 #1405
One should notice the large contribution of @t20100 in the cleaning up of _conv.pyx. We are not (yet) completely sure about all the refcounting gym which was completely changed.