[query] Fix NDArrays of Non-numerics: Part 1 #9503
Merged
CHANGELOG: Fixed a bug where creating NDArrays of non-numeric types would fail. Non-numeric ndarrays still cannot be collected to Python, though.
NDArrays of non-numeric types are broken, and have been for a while. No one seems to use them that way currently, so it hasn't been an issue, but I suspect that with `dndarray` or BlockedMatrixTable experiments it's going to be desirable.

This PR starts to address that problem by doing the following:
- `checkedConvertFrom`, which only supported primitive arrays, is replaced with the more flexible `copyFromType`. As this was the only use of `checkedConvertFrom`, I removed it altogether.
- Add tests that show that it's now possible to make an ndarray of non-numeric types, so long as the only things that get returned in Python are numbers.
The remaining problems all involve conversions to NumPy. If you never convert to NumPy, things should be fine:

- I need to get strides out of the Java ndarray representation. Strides make no sense for non-numeric objects after converting from Java to Python. We say the size of a required tuple of three int32s is 12 bytes, but that's not going to be the size of the Python object.
- Strings are tricky too, since the NumPy string dtype comes with a max length, so we'll have to do a pass over the strings to figure out how large the largest one is before converting.
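
To make the two NumPy issues concrete, here is a minimal sketch in plain NumPy, not the code in this PR. The field names `a`, `b`, `c` and the helper `strings_to_fixed_width` are hypothetical, made up for illustration. It shows that a packed three-int32 element is 12 bytes while an object array only stores pointers to Python tuples, and that a fixed-width string dtype needs a pass over the data to find the longest string.

```python
import sys

import numpy as np

# Packed layout: a struct of three int32 fields occupies exactly 12 bytes,
# which is the element size the Java side would report.
packed = np.dtype([("a", np.int32), ("b", np.int32), ("c", np.int32)])
packed_arr = np.zeros((2, 2), dtype=packed)

# Object layout: each slot holds a pointer to a Python tuple, so strides
# computed from the 12-byte element size do not describe this array.
obj_arr = np.empty((2, 2), dtype=object)
for idx in np.ndindex(obj_arr.shape):
    obj_arr[idx] = (1, 2, 3)

# Size of the Python tuple object itself (implementation dependent).
tuple_size = sys.getsizeof((1, 2, 3))


def strings_to_fixed_width(strings):
    """Hypothetical helper: scan for the longest string, then build a
    fixed-width unicode array wide enough to hold every element."""
    max_len = max(1, max((len(s) for s in strings), default=1))
    return np.array(strings, dtype=f"<U{max_len}")


fixed = strings_to_fixed_width(["a", "hail", "ndarray"])

# Without the scan, a too-narrow dtype silently truncates values.
truncated = np.array(["ndarray"], dtype="<U3")
```

Here `packed_arr.strides` follows from the 12-byte element, while `obj_arr.strides` follows from the pointer size, and `truncated[0]` comes back as `"nda"`, which is why the maximum length has to be known before converting.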