Numpy cimport #1406

kif · 2019-10-15T15:53:08Z

This PR is the continuation of the cleaning up of numpy's c-import.

The file numpy.pxd has been removed, but some of it remains in _conv.pyx.
The last part should probably be removed by using Cython's memoryviews to build a structure on top of the memory coming from HDF5 before creating the numpy array, instead of stealing references, which is commented as bad practice.

There are likely to be a
related to #1367 #1405

One should notice the large contribution of @t20100 in the cleaning up of _conv.pyx. We are not (yet) completely sure about all the refcounting gym which was completely changed.

codecov · 2019-10-15T16:06:18Z

Codecov Report

Merging #1406 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1406   +/-   ##
=======================================
  Coverage   84.48%   84.48%           
=======================================
  Files          17       17           
  Lines        2037     2037           
=======================================
  Hits         1721     1721           
  Misses        316      316


kif · 2019-10-16T16:07:21Z

I believe you can start reading the stuff...
close #1367
I had to replace one numpy cimport with another to properly set the flags. I did not find any alternative solution.

Thanks Thomas for pointing the bug.
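For readers following the flags discussion, here is a minimal Python-level sketch of the kind of failure reported in #1400 and of checking array flags through the Python API. It is an illustration only, not the code from this PR; the helper names `is_c_contiguous_and_writeable` and `bitmask_check_fails` are hypothetical.

```python
import numpy as np


def is_c_contiguous_and_writeable(arr):
    """Check array flags via the Python-level flags object (hypothetical helper)."""
    return bool(arr.flags.c_contiguous and arr.flags.writeable)


def bitmask_check_fails(arr):
    """Return True if treating arr.flags as an integer bitmask raises TypeError."""
    try:
        arr.flags & 1  # the flags object is not an int, so '&' is unsupported
    except TypeError:
        return True
    return False


arr = np.zeros((2, 3))
print(is_c_contiguous_and_writeable(arr))    # C-ordered, freshly allocated array
print(is_c_contiguous_and_writeable(arr.T))  # the transpose is not C-contiguous
print(bitmask_check_fails(arr))
```

At the C level the flags are an integer bit field, which is why a bitwise test only works when the attribute is read from the C struct rather than from the Python object.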

kif · 2019-10-17T16:19:57Z

for info:
close #1400
close #1367

takluyver

This looks good overall, but I've spotted a few things.

I'd ideally like another pair of eyes to review the changes in _conv.pyx in particular - I don't follow some of the Cython syntax precisely, especially where pointers are involved.

h5py/h5t.pyx

h5py/utils.pyx

takluyver · 2019-10-18T09:35:48Z

h5py/_conv.pyx

+            if sizes.cset == H5T_CSET_ASCII:
+                temp_object = bytes(temp_object)
+            elif sizes.cset == H5T_CSET_UTF8:
+                temp_object = str(temp_object)


Shouldn't we encode this, so that temp_object always comes out as bytes?

I will let @t20100 comment ... I agree it is a big re-write.
If you think it should always be bytes, it is OK for me to change the code.

I do think it should always be bytes. You're calling .decode() on it below, which won't work for unicode, and your comment below says "temp_object is bytes".

Yes it should be encoded to utf-8 and temp_object should be a bytes.

I also think there is an issue with the current code before this PR here:

            if sizes.cset == H5T_CSET_ASCII:
                temp_object = PyObject_Str(buf_obj0)
                temp_string = PyBytes_AsString(temp_object)
                temp_string_len = PyBytes_Size(temp_object)

In this code temp_object is a unicode but then accessed with the bytes API.
I'll look at it again with a fresh eye tomorrow.

I think you're right. It was probably overlooked when adding Python 3 support. It seems to work somehow, though. Maybe Cython does something clever.

Current code in master fails there with a TypeError.
I fixed the new implementation (PR on @kif branch).

I'll make a small PR with more rework of this function and some tests.
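The point agreed on in this thread (the temporary object should always end up as bytes) can be sketched in plain Python. This is an illustration under assumptions, not the Cython code in _conv.pyx: the helper name `as_utf8_bytes` is hypothetical, and it encodes every non-bytes object as UTF-8 regardless of the character set.

```python
def as_utf8_bytes(obj):
    """Return obj as bytes; anything that is not bytes is converted to text
    and encoded as UTF-8 (hypothetical helper, not part of h5py)."""
    if isinstance(obj, bytes):
        return obj
    return str(obj).encode("utf-8")


# In Python 3, str has no .decode() method, which is why a str left in
# temp_object would break code that later calls temp_object.decode().
print(hasattr("abc", "decode"))
print(hasattr(as_utf8_bytes("abc"), "decode"))
```

Since ASCII is a subset of UTF-8, encoding as UTF-8 gives the same bytes for pure-ASCII text, so both character-set branches can share one code path in this sketch.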

takluyver · 2019-10-18T09:37:49Z

h5py/_conv.pyx

-                temp_string_len = PyBytes_Size(temp_encoded)
+        if sizes.cset == H5T_CSET_UTF8:
+            try:
+                temp_object.decode('utf-8')


It would be good to skip decoding if we've just encoded it, because then we know it's valid UTF-8. But that doesn't need to be part of this PR. There are some changes planned for string handling in 3.0 anyway.

Are you speaking of h5py 3.0 ... because all the string handling is already performed in python3 syntax

Yes, I meant h5py 3.0. See #1338 for what's planned. For this PR, though, let's keep things working the way they do in h5py 2.x, otherwise it will be much harder to review.

This try: ... except part can be removed; it was there to replace a test that was commented out 8 years ago (db045a1).
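The idea raised above, skipping validation for data that was just encoded, can be illustrated as follows. This is a hedged sketch rather than what the PR does (the check was ultimately removed); `to_checked_utf8` is a hypothetical name.

```python
def to_checked_utf8(obj):
    """Return UTF-8 bytes for obj, validating only data we did not encode
    ourselves (hypothetical helper, not part of h5py)."""
    if isinstance(obj, str):
        # encode() either raises or produces valid UTF-8, so no re-check needed
        return obj.encode("utf-8")
    data = bytes(obj)
    try:
        data.decode("utf-8")  # decode only to validate; the result is discarded
    except UnicodeDecodeError as exc:
        raise ValueError("data is not valid UTF-8") from exc
    return data
```

The decode call is the expensive part of the check, so the str branch avoids one full pass over the data.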

h5py/_conv.pyx

takluyver · 2019-10-18T13:12:08Z

The static checks are complaining about trailing whitespace in utils.pyx, _conv.pyx and h5t.pyx.

takluyver · 2019-10-18T14:02:08Z

N.B. to run those same checks locally, run tox -e pre-commit

h5py/tests/__init__.py

kif · 2019-10-20T16:38:29Z

On Sun, 20 Oct 2019 05:27:00 -0700 James Tocknell ***@***.***> wrote:

> aragilar commented on this pull request.
>
> @@ -18,6 +18,6 @@ def run_tests(args=''):
>      from shlex import split
>      from subprocess import call
>      from sys import executable
> -    cli = [executable, "-m", "pytest", "--pyargs", "h5py"]
> +    cli = [executable, "-m", "pytest", "--pyargs", "h5py", "-x"]
>
> We probably want to allow users to run the full test suite, rather than failing on the first test. Calling `run_tests("-x")` would be equivalent to this change.

My bad ... sorry for committing that.
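To make the reviewer's point concrete, here is a small sketch of how extra arguments could be appended to the pytest command line. It assumes that `run_tests` splits its `args` string with `shlex.split` and appends the result to the base command, which the quoted diff suggests but does not show; `build_pytest_cli` is a hypothetical helper that only builds the list and does not launch anything.

```python
from shlex import split
from sys import executable


def build_pytest_cli(args=""):
    """Build the command line for the test run (hypothetical helper)."""
    cli = [executable, "-m", "pytest", "--pyargs", "h5py"]
    cli.extend(split(args))  # e.g. "-x" stops at the first failing test
    return cli


print(build_pytest_cli())      # full test suite, the default
print(build_pytest_cli("-x"))  # opt-in fail-fast, chosen by the caller
```

Keeping `-x` out of the default command leaves the choice with the caller, which is why the hard-coded flag was reverted.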

h5py/_conv.pyx

h5py/h5t.pyx

takluyver · 2019-10-21T13:15:58Z

h5py/h5t.pyx

@@ -1642,7 +1637,7 @@ cpdef TypeID py_create(object dtype_in, bint logical=0, bint aligned=0):
            return _c_complex(dt)

        # Compound
-        elif (kind == c'V') and (dtype_in.names is not None):
+        elif (kind == c'V') and (getattr(dt, "names") is not None):


Is getattr() necessary? dt should always be a dtype object, no?

dt gives access to the C struct of dtype, and in there, names is either a tuple or a NULL pointer instead of None (see https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L639).
The use of getattr makes sure names is accessed through the Python API in order to get None.
An alternative way of writing it is casting dt to a python object first: ((<object> dt).names != None)
And an equivalent check but written the C-like way is (<void *> dt.names != NULL).
(Those alternatives work).

```
/*
 * An ordered tuple of field names or NULL
 * if no fields are defined
 */
PyObject *names;
```

Like this we enforce the use of Python side of the dtype
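Seen from Python, the behaviour discussed above looks like this: a dtype of kind 'V' without fields reports `names` as None, while a compound dtype reports a tuple. The sketch below only illustrates the Python-level attribute access that `getattr` forces; `is_compound` is a hypothetical helper, not a function from h5py.

```python
import numpy as np


def is_compound(dt):
    """True for a void dtype that defines fields (hypothetical helper)."""
    # Python attribute access turns the C-level NULL pointer into None.
    return dt.kind == "V" and getattr(dt, "names") is not None


opaque = np.dtype("V8")                            # kind 'V', no fields
compound = np.dtype([("x", "i4"), ("y", "f8")])    # kind 'V', two fields

print(opaque.names, is_compound(opaque))
print(compound.names, is_compound(compound))
```

In Cython, with `dt` typed as the C-level dtype struct, `dt.names` reads the struct member directly, which is why the comparison with None does not behave the same way there.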

kif · 2019-10-21T14:58:49Z

On Mon, 21 Oct 2019 06:16:01 -0700 Thomas Kluyver ***@***.***> wrote:

> -        elif (kind == c'V') and (dtype_in.names is not None):
> +        elif (kind == c'V') and (getattr(dt, "names") is not None):
>
> Is `getattr()` necessary? `dt` should always be a dtype object, no?

The real problem is that a dtype, on the C side, may have "names" set to a NULL pointer if it is not a compound dtype, as explained in the numpy source: https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L639 So our problem is just how to make "names" be accessed on the Python side of the dtype.

…

-- Jérôme Kieffer

t20100 · 2019-10-23T09:25:44Z

Travis needs to be restarted as it failed while installing packages with apt.

takluyver · 2019-10-23T10:19:14Z

Aha, thanks both for explaining why getattr is needed. Can I ask you to put a summary of that in a comment? E.g. "getattr is used to force Python attribute access, as dt.names may be a NULL pointer at the C level"

I want to ensure it's clear to someone reading the code that it's not an oversight (like I thought before you explained).

takluyver · 2019-10-23T10:27:14Z

Other than that, this LGTM

kif added 8 commits October 15, 2019 17:21

Remove numpy.pxd

252e3bf

major rewrite of vlen2str,

8e85616

changing the ref counting and the implicit cast if possible.

merge the changes from upstream

2bad5b6

Added comments for untested function

4209de3

Replace the comment with "raise RuntimeError()" does not prevent the tests from passing. Related to issue h5py#1405

remove reference to numpy

688d221

make it run again

1b48963

remove trailing spaces

5d49372

undo quick fail of the tests

5cc0f33

kif added 3 commits October 16, 2019 09:05

Use the pythonic writing of for loops

91b7e19

(no more performance impact since cython 0.1x)

replace memcpy with explicit assignment

64edf49

replace trivial memcpy with direct assignment

dc9f942

Only trivial cases were addressed.

kif added 2 commits October 16, 2019 21:23

right-trim lines

11e1146

Use the python version for checking array flags.

6c029eb

Thanks Thomas for pointing the bug.

t20100 mentioned this pull request Oct 17, 2019

Fix h5py._errors.unsilence_errors #1353

Merged

This was referenced Oct 17, 2019

unsupported operand type(s) for &: 'numpy.flagsobj' and 'int' #1400

Closed

Enable nogil #1412

Merged

takluyver reviewed Oct 18, 2019

kif added 5 commits October 18, 2019 13:40

Correction based on the review of ThomasK: swtch to cnp.dtype

d91a808

Update docstring and explain why the argument is mangled

5cf3f5d

Correct doc-string

ca4a187

update doc-string

e1bd8cc

update comment

05e2a2b

aragilar reviewed Oct 20, 2019

h5py/tests/__init__.py (outdated, resolved)

revert the test execution sequence (@aragilar)

b803a52

r-trim lines

f6d5578

takluyver reviewed Oct 21, 2019

h5py/_conv.pyx (outdated, resolved)

takluyver reviewed Oct 21, 2019

h5py/h5t.pyx (outdated, resolved)

avoid manglinging dtype_in as ThomasK suggested

00e1376

takluyver reviewed Oct 21, 2019

kif added 2 commits October 21, 2019 15:46

Consistency on the signature of functions

5ea6d39

We found in the doc of numpy that the "names" may be a null pointer ...

8d072ff

```
/*
 * An ordered tuple of field names or NULL
 * if no fields are defined
 */
PyObject *names;
```

Like this we enforce the use of Python side of the dtype

t20100 added 5 commits October 22, 2019 10:57

encode str to utf-8

8d7fb22

remove checking utf-8 is correct

e81df83

this test was commented in previous implementation

fix block indentation error

0216511

trim whitespace

05c13d5

Fix converting objects for H5T_CSET_ASCII

5c3a0e2

takluyver mentioned this pull request Oct 22, 2019

vlen string converter updates kif/h5py#1

Merged

Merge pull request #1 from t20100/numpy_cimport

c4727d3

vlen string converter updates

kif and others added 3 commits October 23, 2019 13:35

Provide a couple of comment on why things are done like this.

0428e35

Update apt caches on Azure before trying to install packages

2f60a22

Strip trailing whitespace

c7623be

takluyver merged commit eb670b8 into h5py:master Oct 23, 2019

t20100 mentioned this pull request Oct 23, 2019

Add a few tests for writing vlen string datasets #1420

Merged

takluyver mentioned this pull request Nov 6, 2019

Clean up cimport of numpy #1367

Closed

takluyver added this to the 3.0 milestone Nov 25, 2019

takluyver mentioned this pull request Nov 5, 2020

Error in 3.0 reading attributes from Matlab struct #1742

Closed

takluyver mentioned this pull request Feb 18, 2021

Fix reading data with vlen array of fixed-length strings #1819

Merged

drew-parsons mentioned this pull request Jan 19, 2024

test_compound_vlen_bool intermittently fails on armhf architecture #1927

Open
