-
-
Notifications
You must be signed in to change notification settings - Fork 9
Fix bugs around null termination bytes #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Just to note, NumPy's string/bytes does not store that null terminator, there is an implicit null terminator behind things. Of course you have to be very careful to not copy it. |
I think this is wrong: the dtype itemsize should match the name. |
Thanks for the feedback, I’m not a Numpy expert so I’m going to make mistakes like this and I really appreciate getting guidance towards the right approach. When I took a look at the bytes type yesterday I thought there had to be null terminators in the array data but I guess I was wrong. Should the dtype getitem add the null terminator when someone accesses a scalar? |
I would lean towards yes. Null termination should make some things easier and wasting one byte really doesn't matter for scalars. |
# dtype = ASCIIDType() | ||
# arr = np.array(["hello", "this", "is", "an", "array"], dtype=dtype) | ||
# assert repr(arr) == ("array(['', '', '', '', ''], dtype=ASCIIDType(0))") | ||
# assert arr.tobytes() == b"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are commented out because they depend on numpy/numpy#22763 being merged in numpy.
Now it doesn’t store the null characters anymore and I also realized the contiguous casting loop isn’t terribly useful because almost all casts aren’t aligned, so I’ve made it so there’s only one casting loop. |
In particular, I found that the check for whether a cast is aligned does |
Eeek, the copying code has additional alignment for the purpose of complex loops... Because we copy complex64 (two 32bit floats) via uint64 which has a lrger alignment. None of that is ideal, but due to that the alignment passed in is larger than you would think. |
I wasn't accounting properly for null termination bytes in my original implementation. I now append a null byte to the result of getitem.
I've also added tests and fixed some bugs around making sure data are truncated properly at array creation time and casting time if the dtype is too narrow to contain the full input data.