Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numpy cimport #1406

Merged
merged 32 commits into from
Oct 23, 2019
Merged

Numpy cimport #1406

merged 32 commits into from
Oct 23, 2019

Conversation

kif
Copy link
Contributor

@kif kif commented Oct 15, 2019

This PR is the continuation of the cleaning up of numpy's c-import.

The file numpy.pxd has disappeared but there is some of it remaining in _conv.pyx.
The last part to be removed should probably be by using cython's memory-views to build a structure on top of a memory-space coming from hdf5 before creating the numpy array instead of stealing references which is commented as bas practice.

There are likely to be a
related to #1367 #1405

One should notice the large contribution of @t20100 in the cleaning up of _conv.pyx. We are not (yet) completely sure about all the refcounting gym which was completely changed.

changing the ref counting and the implicit cast if possible.
Replace the comment with "raise RuntimeError()" does
not prevent the tests from passing.
Related to issue h5py#1405
@codecov
Copy link

codecov bot commented Oct 15, 2019

Codecov Report

Merging #1406 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1406   +/-   ##
=======================================
  Coverage   84.48%   84.48%           
=======================================
  Files          17       17           
  Lines        2037     2037           
=======================================
  Hits         1721     1721           
  Misses        316      316

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30c0da3...c7623be. Read the comment docs.

@kif
Copy link
Contributor Author

kif commented Oct 16, 2019

I believe you can start reading the stuff...
close #1367
I had to replace a cimport in numpy by another-one to properly set the flags. I did not find any alternative solution.

@kif
Copy link
Contributor Author

kif commented Oct 17, 2019

for info:
close #1400
close #1367

Copy link
Member

@takluyver takluyver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, but I've spotted a few things.

I'd ideally like another pair of eyes to review the changes in _conv.pyx in particular - I don't follow some of the Cython syntax precisely, especially where pointers are involved.

h5py/h5t.pyx Outdated Show resolved Hide resolved
h5py/h5t.pyx Outdated Show resolved Hide resolved
h5py/utils.pyx Outdated Show resolved Hide resolved
h5py/_conv.pyx Outdated
if sizes.cset == H5T_CSET_ASCII:
temp_object = bytes(temp_object)
elif sizes.cset == H5T_CSET_UTF8:
temp_object = str(temp_object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we encode this, so that temp_object always comes out as bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let @t20100 comment ... I agree it is a big re-write.
If you think it should always be bytes, it is OK for me to change the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it should always be bytes. You're calling .decode() on it below, which won't work for unicode, and your comment below says "temp_object is bytes".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should be encoded to utf-8 and temp_object should be a bytes.

I also think there is an issue with the current code before this PR here:

                if sizes.cset == H5T_CSET_ASCII:
                    temp_object = PyObject_Str(buf_obj0)
                    temp_string = PyBytes_AsString(temp_object)
                    temp_string_len = PyBytes_Size(temp_object)

In this code temp_object is a unicode but then accessed with the bytes API.
I'll look at it again with a fresh eye tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. It was probably overlooked when adding Python 3 support. It seems ot work somehow, though. Maybe Cython does something clever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current code in master fails there with a TypeError.
I fixed the new implementation (PR on @kif branch).

I'll make a small PR with more rework of this function and some tests.

h5py/_conv.pyx Outdated
temp_string_len = PyBytes_Size(temp_encoded)
if sizes.cset == H5T_CSET_UTF8:
try:
temp_object.decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to skip decoding if we've just encoded it, because then we know it's valid UTF-8. But that doesn't need to be part of this PR. There are some changes planned for string handling in 3.0 anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you speaking of h5py 3.0 ... because all the string handling is already performed in python3 syntax

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant h5py 3.0. See #1338 for what's planned. For this PR, though, let's keep things working the way they do in h5py 2.x, otherwise it will be much harder to review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try:.. except part can be removed, it was there to replace a test that was comment 8 years ago (db045a1)

h5py/_conv.pyx Outdated Show resolved Hide resolved
h5py/_conv.pyx Show resolved Hide resolved
h5py/_conv.pyx Outdated Show resolved Hide resolved
@takluyver
Copy link
Member

The static checks are complaining about trailing whitespace in utils.pyx, _conv.pyx and h5t.pyx.

@takluyver
Copy link
Member

N.B. to run those same checks locally, run tox -e pre-commit

h5py/tests/__init__.py Outdated Show resolved Hide resolved
@kif
Copy link
Contributor Author

kif commented Oct 20, 2019 via email

h5py/_conv.pyx Outdated Show resolved Hide resolved
h5py/h5t.pyx Outdated Show resolved Hide resolved
h5py/h5t.pyx Outdated
@@ -1642,7 +1637,7 @@ cpdef TypeID py_create(object dtype_in, bint logical=0, bint aligned=0):
return _c_complex(dt)

# Compound
elif (kind == c'V') and (dtype_in.names is not None):
elif (kind == c'V') and (getattr(dt, "names") is not None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is getattr() necessary? dt should always be a dtype object, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dt gives access to the C struct of dtype, and in there, names is either a tuple or a NULL pointer instead of None (see https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L639).
The use of getattr makes sure names it is accessed through the Python API in order to get None.
An alternative way of writing it is casting dt to a python object first: ((<object> dt).names != None)
And an equivalent check but written the C-like way is (<void *> dt.names != NULL).
(Those alternatives work).

```
        /*
         * An ordered tuple of field names or NULL
         * if no fields are defined
         */
        PyObject *names;
```
Like this we enforce the use of Python side of the dtype
@kif
Copy link
Contributor Author

kif commented Oct 21, 2019 via email

vlen string converter updates
@t20100
Copy link
Contributor

t20100 commented Oct 23, 2019

Travis needs to be restarted as it failed while installing packages with apt.

@takluyver
Copy link
Member

Aha, thanks both for explaining why getattr is needed. Can I ask you to put a summary of that in a comment? E.g. "getattr is used to force Python attribute access, as dt.names may be a NULL pointer at the C level"

I want to ensure it's clear to someone reading the code that it's not an oversight (like I thought before you explained).

@takluyver
Copy link
Member

Other than that, this LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants