Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix unicode with byte swap transfer and copyswap #7664

Merged
merged 4 commits into from
Jun 1, 2016

Conversation

seberg
Copy link
Member

@seberg seberg commented May 23, 2016

These fixes should make unicode byteswapping not be
completely broken. The code is not so much designed for absolute
speed.

Tests still needed any comments appreciated. I tried to keep things simple, so used the copyswap function wrappers....

@@ -1086,7 +1086,8 @@ _strings_richcompare(PyArrayObject *self, PyArrayObject *other, int cmp_op,
int val;

/* Cast arrays to a common type */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That whole casting if, could probably be moved to the new iterator, but I think that is good enough on another day maybe.

@seberg
Copy link
Member Author

seberg commented May 24, 2016

OK, added some basic tests. To be honest, I guess nobody ever runs into this. I suppose nobody saves unicode arrays on big endian machines and tries to load them on another one ;). And all the other libs just use string/byte arrays and decode those.

int i;

while (N > 0) {
memcpy(dst, src, dst_itemsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this read too far in src if src_itemsize < dst_itemsize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, good point. I guess the tests just don't care that much about a bit reading too far ;). (I hope valgrind would show it but OK).

Just to note, I did not check the python 3 failure here, and I noticed there is a bug that casting from S -> nonnative U actually seems to reverse the string, and I am not sure why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, checked, the tests show the bad behaviour in valgrind, so I will call it covered by tests already considering that it is non-trivial to test more reliably then that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed both things, but the extra inversion of the fields really throws me off still

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fixed that weird string cast as well, heh.

@seberg seberg force-pushed the fix-unicode-trans branch 4 times, most recently from b60965c to 414bca2 Compare May 29, 2016 11:17
@@ -3628,11 +3672,19 @@ PyArray_GetDTypeTransferFunction(int aligned,
}
}

/* The special types, which have no byte-order */
/* The special types, which have no or subelement byte-order */
switch (src_type_num) {
case NPY_VOID:
Copy link
Member

@ahaldane ahaldane May 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while NPY_VOID will always fail the if-statement in the next NPY_UNICODE case (there's no bug), maybe it is clearer to put the NPY_VOID case below the unicode case?

@ahaldane
Copy link
Member

It looks good to me so far. I just want to check one or two things, I'll get back to it soon.

else if (dtype->kind == 'U') {
return wrap_copy_swap_function(aligned,
src_stride, dst_stride, dtype, 1,
outstransfer, outtransferdata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might this be changed to a call to PyArray_GetStridedZeroPadCopyFn? I would imagine that would be a small performance gain over using the element-wise copy_swap function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, can change, not sure it should be faster, since I would expect this uses the copyswapn function and not copyswap which might actually be quicker, but did not check.

@ahaldane
Copy link
Member

OK, finished reading. Everything else looks good to me.

@ahaldane
Copy link
Member

ahaldane commented Jun 1, 2016

@seberg, good point that it uses copyswapn instead of copyswap. Probably it is pretty fast as-is.

In that case, is there anything else you want to add? If not, it looks good to me and we can merge.

@seberg
Copy link
Member Author

seberg commented Jun 1, 2016

Could still squash (on the run can do tomorrow). Don't have anything to add, can't think of any big bug surrounding this I might have missed. If you have any nitpick will fix of course.

These fixes should make unicode byteswapping not be
completely broken. The code is not so much designed for absolute
speed.

Fixes numpygh-3939
Casting to non-negative unicode used the wrong swapping
functionality.
Just to note, the whole swapping is rather inefficient, since
it is completely unnecessary due to the fact that we go via
python in any case, though likely it does not matter.
@seberg
Copy link
Member Author

seberg commented Jun 1, 2016

OK, squased/rebased from bus, so got to let the test run through (will probably take a while).

@ahaldane
Copy link
Member

ahaldane commented Jun 1, 2016

Tests passed, proving bus-based development works.

Merging now. Thanks Sebastian!

@ahaldane ahaldane merged commit bd82a0d into numpy:master Jun 1, 2016
@seberg seberg deleted the fix-unicode-trans branch June 2, 2016 07:43
@seberg
Copy link
Member Author

seberg commented Jun 2, 2016

Thanks for the review Allan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants