Fix bugs around null termination bytes #11

ngoldbaum · 2022-12-07T22:55:16Z

I wasn't accounting properly for null termination bytes in my original implementation. I now append a null byte to the result of getitem.

I've also added tests and fixed some bugs to make sure data are truncated properly, at both array creation time and casting time, when the dtype is too narrow to hold the full input data.
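A minimal pure-Python sketch of the creation-time truncation described above. It assumes each element is stored in a fixed-width slot of `itemsize` bytes. `encode_item` and `build_buffer` are hypothetical helpers, not ASCIIDType's C code.

```python
def encode_item(value: str, itemsize: int) -> bytes:
    """Encode one element into a fixed-width slot (hypothetical helper)."""
    raw = value.encode("ascii")
    # Truncate input that does not fit in the dtype's width...
    raw = raw[:itemsize]
    # ...and pad shorter input with null bytes to fill the slot.
    return raw.ljust(itemsize, b"\x00")


def build_buffer(values: list[str], itemsize: int) -> bytes:
    """Concatenate fixed-width slots, roughly as array creation would."""
    return b"".join(encode_item(v, itemsize) for v in values)


words = ["hello", "this", "is", "an", "array"]
narrow = build_buffer(words, 3)  # every element cut to at most 3 bytes
empty = build_buffer(words, 0)   # a zero-width dtype keeps no data at all
```

The zero-width case corresponds to what the commented-out test further down expects from `ASCIIDType(0)`: every element becomes empty and the raw bytes are `b""`.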

seberg · 2022-12-08T08:51:54Z

Just to note: NumPy's string/bytes dtypes do not store that null terminator; there is an implicit null terminator behind things. Of course, you have to be very careful not to copy it.
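To show why the implicit terminator needs care, here is a small pure-Python model of a fixed-width buffer of this kind. An element that fills its whole slot has no null after it, so reads must be bounded by the itemsize rather than by a search for a null. The buffer contents and both helper names are illustrative assumptions.

```python
ITEMSIZE = 5
# Two elements, "hello" and "hi", stored in 5-byte slots with no stored
# terminator: "hello" fills its slot completely, "hi" is null-padded.
buf = b"hello" + b"hi\x00\x00\x00"


def c_style_read(buf: bytes, index: int, itemsize: int) -> bytes:
    """Read until the first null, ignoring the slot boundary (buggy)."""
    start = index * itemsize
    end = buf.find(b"\x00", start)
    return buf[start:] if end == -1 else buf[start:end]


def bounded_read(buf: bytes, index: int, itemsize: int) -> bytes:
    """Read at most itemsize bytes, then drop the trailing null padding."""
    start = index * itemsize
    return buf[start:start + itemsize].rstrip(b"\x00")


wrong = c_style_read(buf, 0, ITEMSIZE)   # runs into the next element
right = bounded_read(buf, 0, ITEMSIZE)
```

The unbounded read returns `hellohi` for the first element because nothing in the array marks where `hello` ends except the slot width.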

mattip · 2022-12-08T10:25:21Z

I think this is wrong: the dtype itemsize should match the size in the name (for example, ASCIIDType(5) should have an itemsize of 5).

ngoldbaum · 2022-12-08T14:02:22Z

Thanks for the feedback. I’m not a NumPy expert, so I’m going to make mistakes like this, and I really appreciate getting guidance towards the right approach.

When I looked at the bytes type yesterday, I thought there had to be null terminators in the array data, but I guess I was wrong.

Should the dtype getitem add the null terminator when someone accesses a scalar?

seberg · 2022-12-08T14:15:06Z

> Should the dtype getitem add the null terminator when someone accesses a scalar?

I would lean towards yes. Null termination should make some things easier, and wasting one byte really doesn't matter for scalars.

ngoldbaum · 2022-12-08T21:29:38Z

asciidtype/tests/test_asciidtype.py

+    # dtype = ASCIIDType()
+    # arr = np.array(["hello", "this", "is", "an", "array"], dtype=dtype)
+    # assert repr(arr) == ("array(['', '', '', '', ''], dtype=ASCIIDType(0))")
+    # assert arr.tobytes() == b""


These tests are commented out because they depend on numpy/numpy#22763 being merged into NumPy.

ngoldbaum · 2022-12-08T21:46:39Z

Now the array doesn’t store null characters anymore, and getitem pads the result with a null byte.
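A pure-Python model of the getitem behavior described here, assuming elements live in fixed-width slots without stored terminators and that the scalar is decoded as ASCII. The slot is copied into a scratch buffer one byte wider, a null is written into the extra byte, and the string is read up to the first null. `getitem` is a hypothetical stand-in for the C implementation, not its actual code.

```python
def getitem(buf: bytes, index: int, itemsize: int) -> str:
    """Hypothetical model of a fixed-width ASCII getitem."""
    slot = buf[index * itemsize:(index + 1) * itemsize]
    # Copy into a buffer one byte wider so a terminator always exists,
    # even when the stored string fills the whole slot.
    scratch = bytearray(itemsize + 1)
    scratch[:itemsize] = slot
    scratch[itemsize] = 0  # explicit terminating null
    # Read up to the first null, as C string functions would.
    end = scratch.index(0)
    return scratch[:end].decode("ascii")


data = b"hello" + b"hi\x00\x00\x00"  # two elements in 5-byte slots
first = getitem(data, 0, 5)
second = getitem(data, 1, 5)
```

Because the terminator lives in the scratch copy, a value that exactly fills its slot is still terminated, and the array itself never has to reserve a byte for it.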

I also realized the contiguous casting loop isn’t terribly useful because almost all casts aren’t aligned, so I’ve reduced it to a single casting loop.
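A single, alignment-agnostic casting loop can be sketched in Python as a byte-wise strided copy between two fixed-width layouts: each element is truncated when the destination is narrower and null-padded when it is wider. The function name, its signature, and the use of byte strides are illustrative assumptions loosely modeled on NumPy's strided-loop convention, not the actual C loop in this PR.

```python
def strided_cast(src: bytes, src_stride: int, src_size: int,
                 dst: bytearray, dst_stride: int, dst_size: int,
                 n: int) -> None:
    """Copy n fixed-width items from src to dst (hypothetical sketch)."""
    copy_size = min(src_size, dst_size)
    for i in range(n):
        s = i * src_stride
        d = i * dst_stride
        # Byte-wise copy: makes no assumption about pointer alignment.
        dst[d:d + copy_size] = src[s:s + copy_size]
        # Null-pad the remainder when the destination is wider.
        dst[d + copy_size:d + dst_size] = bytes(dst_size - copy_size)


src = b"hello" + b"hi\x00\x00\x00"  # two items of width 5
narrow = bytearray(2 * 3)
strided_cast(src, 5, 5, narrow, 3, 3, 2)  # truncating cast to width 3
wide = bytearray(2 * 7)
strided_cast(src, 5, 5, wide, 7, 7, 2)    # padding cast to width 7
```

Since every copy is byte-wise, the same loop is correct for any width and any alignment, which is what makes one general loop sufficient.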

ngoldbaum · 2022-12-08T21:53:57Z

> because almost all casts aren’t aligned

In particular, I found that the check for whether a cast is aligned calls npy_uint_alignment(dtype->elsize), which only returns a nonzero alignment for dtypes that are 1, 2, 4, 8, or 16 bytes wide.
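A Python model of the check being described, written from our reading of NumPy's `npy_uint_alignment` and `raw_array_is_aligned`; it is a sketch, not NumPy's code. Itemsizes 1, 2, 4, 8 and 16 map to the alignment of the unsigned integer used for copying, and every other width maps to 0, meaning "never aligned". The concrete alignments come from `ctypes.alignment` on the running platform, which assumes the C compiler aligns those types the same way.

```python
import ctypes


def uint_alignment(itemsize: int) -> int:
    """Model of npy_uint_alignment; 0 means 'cannot be aligned'."""
    if itemsize == 1:
        return 1
    table = {
        2: ctypes.alignment(ctypes.c_uint16),
        4: ctypes.alignment(ctypes.c_uint32),
        8: ctypes.alignment(ctypes.c_uint64),
        # 16-byte items are assumed to be copied as two uint64 values.
        16: ctypes.alignment(ctypes.c_uint64),
    }
    return table.get(itemsize, 0)


def is_uint_aligned(data: int, strides: list[int], itemsize: int) -> bool:
    """Model of the aligned check: data address and all strides must be multiples."""
    alignment = uint_alignment(itemsize)
    if alignment == 0:
        return False  # odd widths such as 5 are never treated as aligned
    if alignment == 1:
        return True
    check = data
    for stride in strides:
        check |= stride
    return check & (alignment - 1) == 0
```

For an `ASCIIDType(5)` array the width is 5, so `uint_alignment` returns 0 and the check fails no matter where the data lives.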

seberg · 2022-12-09T07:45:54Z

Eeek, the copying code imposes additional alignment for the sake of complex loops: we copy complex64 (two 32-bit floats) via uint64, which has a larger alignment.

None of that is ideal, but because of it, the alignment passed in is larger than you would expect.
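To make the complex64 case concrete: a complex64 value is two 32-bit floats, so its natural alignment is that of `float`, but copying it as one 64-bit unsigned integer needs the alignment of `uint64`, which is typically larger. The sketch asks `ctypes` for both alignments on the running platform; the `is_multiple` helper and the example address are illustrative.

```python
import ctypes

float_align = ctypes.alignment(ctypes.c_float)    # alignment of one float32 half
uint64_align = ctypes.alignment(ctypes.c_uint64)  # alignment needed by a uint64 copy


def is_multiple(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


# An address that is fine for float32 but sits float_align bytes past
# a multiple of 8.
address = 8 + float_align
ok_for_floats = is_multiple(address, float_align)
ok_for_uint64_copy = is_multiple(address, uint64_align)
```

On platforms where `uint64` is 8-byte aligned, the address passes the float check but fails the uint64 check, so the alignment a copy loop asks for is stricter than the dtype's own.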

ngoldbaum added 6 commits December 7, 2022 15:52

configure maximum line length for black (9e951f6)
add an extra byte for terminating null character (b2e9cdc)
add tests for truncation behavior at creation time (66db4c0)
fix typo in resolve_descriptors (6a37c85)
update casting logic to account for null termination bytes (8dba927)
add casting tests (f20e8e1)
don't store null characters in the array (3abea62)


ngoldbaum merged commit 3b0d427 into numpy:main Dec 9, 2022

ngoldbaum mentioned this pull request on Feb 1, 2023, in "uncomment zero-length test in ASCIIDType #34" (merged).
