ENH: structured_to_unstructured: view more often #23652
@@ -885,6 +885,52 @@ def count_elem(dt):
                     fields.extend([(d, c, o + i*size) for d, c, o in subfields])
     return fields

+def _common_stride(offsets, counts, itemsize):
+    """
+    Returns the stride between the fields, or None if the stride is not
+    constant. The values in "counts" designate the lengths of
+    sub-arrays. Sub-arrays are treated as many contiguous fields, with
+    always positive stride.
+    """
+
+    if len(offsets) <= 1:
+        return itemsize
+
+    negative = offsets[1] < offsets[0]  # negative stride
+    if negative:
+        # reverse, so offsets will be ascending
+        it = zip(reversed(offsets), reversed(counts))
+    else:
+        it = zip(offsets, counts)
+
+    prev_offset = None
+    stride = None
+    for offset, count in it:
+        if count != 1:  # sub array: always c-contiguous
+            if negative:
+                return None  # sub-arrays can never have a negative stride
+            if stride is None:
+                stride = itemsize
+            if stride != itemsize:
+                return None
+            end_offset = offset + (count - 1) * itemsize
+        else:
+            end_offset = offset
+
+        if prev_offset is not None:
+            new_stride = offset - prev_offset
+            if stride is None:
+                stride = new_stride
+            if stride != new_stride:
+                return None
+
+        prev_offset = end_offset
+
+    if stride is not None:
+        if negative:
+            return -stride
+    return stride
+

 def _structured_to_unstructured_dispatcher(arr, dtype=None, copy=None,
                                            casting=None):
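To make the intent of `_common_stride` easier to follow, here is a simplified, standalone sketch of the same idea. It assumes that no field is a sub-array (every count is 1), and the helper name `common_stride_simple` is hypothetical; it is not part of the PR.

```python
def common_stride_simple(offsets, itemsize):
    """Hypothetical, simplified helper: assumes no sub-array fields."""
    if len(offsets) <= 1:
        # A single field trivially has a constant stride.
        return itemsize
    # Collect the distinct byte distances between consecutive fields.
    steps = {later - earlier for earlier, later in zip(offsets, offsets[1:])}
    # Exactly one distinct distance means the fields are evenly spaced.
    return steps.pop() if len(steps) == 1 else None


print(common_stride_simple([0, 4, 8], 4))   # 4: ascending, evenly spaced
print(common_stride_simple([8, 4, 0], 4))   # -4: descending, evenly spaced
print(common_stride_simple([0, 4, 12], 4))  # None: spacing is not constant
```

A negative result corresponds to fields that are declared in the opposite order to their layout in memory, which is the case discussed in the review comments on this change.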
@@ -960,7 +1006,7 @@ def structured_to_unstructured(arr, dtype=None, copy=False, casting='unsafe'):
     if dtype is None:
         out_dtype = np.result_type(*[dt.base for dt in dts])
     else:
-        out_dtype = dtype
+        out_dtype = np.dtype(dtype)

     # Use a series of views and casts to convert to an unstructured array:
@@ -972,6 +1018,30 @@ def structured_to_unstructured(arr, dtype=None, copy=False, casting='unsafe'):
                                  'itemsize': arr.dtype.itemsize})
     arr = arr.view(flattened_fields)

+    if (not copy) and all(dt.base == out_dtype for dt in dts):
+        # all elements have the right dtype already; if they have a common
+        # stride, we can just return a view
+        common_stride = _common_stride(offsets, counts, out_dtype.itemsize)
+        if common_stride is not None:
+            # ensure that we have a real ndarray; other types (e.g. matrix)
+            # have strange slicing behavior
+            arr = arr.view(type=np.ndarray)
+            new_shape = arr.shape + (sum(counts), out_dtype.itemsize)
+            new_strides = arr.strides + (abs(common_stride), 1)
Why not use a negative

Can you give a suggestion of how you would do that cleanly? I can of course do something like this:

    # first_field_offset is the first byte of the first field
    arr = arr[..., first_field_offset:]
    # new_strides may be negative here
    arr = np.lib.stride_tricks.as_strided(arr, new_shape, new_strides)

If the stride is negative, all but the last element will be cut off by the slicing operation. I don't like that, because

Looking at it again, I think things should just work with a negative stride, as long as you make sure that the offset is guaranteed to be for the first item in the list. For a negative stride, that will be the largest offset, so the slice below will not go out of memory. It may well mean that you have to adjust

You mean the code that I showed in my previous comment, but with

I don't see how that simplifies that function. It would stay identical as far as I can see. Let me illustrate why I don't like using a negative stride here: Let's say we have a structured array with two fields 'a' and 'b'. The first field is 'a', but 'b' is the first field in memory.

Now we perform the slicing operation to set the correct offset.

> arr = arr[..., first_field_offset:]

Now the memory looks like this:

Now we set the correct offset. This is the part I do not like, because we are effectively accessing memory that was (temporarily) out of bounds.

> arr = np.lib.stride_tricks.as_strided(arr,
>                                       new_shape,
>                                       new_strides)

After that, it looks like this:
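The alternative that the diff actually takes (slice from the lowest offset, use a positive stride, then reverse) can be sketched on a small example. This is an illustrative sketch only: the dtype, the field values, and the hard-coded `common_stride = -4` are assumptions chosen to mirror the 'a'/'b' scenario described above, not code from the PR.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Field 'a' is declared first but stored second in memory (offset 4 > 0).
dt = np.dtype({'names': ['a', 'b'],
               'formats': ['<i4', '<i4'],
               'offsets': [4, 0],
               'itemsize': 8})
rec = np.zeros(3, dtype=dt)
rec['a'] = [1, 2, 3]
rec['b'] = [10, 20, 30]

offsets = [dt.fields[name][1] for name in dt.names]  # [4, 0]
elem_size = 4        # bytes per int32 element
common_stride = -4   # assumed: offsets descend by one element per field

# View every record as raw bytes: shape (3, 8).
raw = rec[..., None].view(np.uint8)
# Start at the lowest field offset so a positive stride stays in bounds.
raw = raw[..., min(offsets):]
new_shape = rec.shape + (len(offsets), elem_size)
new_strides = rec.strides + (abs(common_stride), 1)
fields = as_strided(raw, new_shape, new_strides)

# Reinterpret each group of 4 bytes as one int32 and drop the byte axis.
values = fields.view(np.dtype('<i4'))[..., 0]
# Memory order is (b, a); reversing restores the declared order (a, b).
values = values[..., ::-1]

print(values.tolist())
print(np.shares_memory(values, rec))
```

Because the slice starts at `min(offsets)` and the stride passed to `as_strided` is positive, no intermediate array ever points outside the record; the reversal at the end is an ordinary slicing operation.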
+
+            arr = arr[..., None].view(np.uint8)  # view as bytes
I'd use

Done
+            arr = arr[..., min(offsets):]  # remove the leading unused data
+            arr = np.lib.stride_tricks.as_strided(arr,
+                                                  new_shape,
+                                                  new_strides)
+
+            # cast and drop the last dimension again
+            arr = arr.view(out_dtype)[..., 0]
Just noticed something else: Is there a reason I'm missing for not using

    new_shape = arr.shape + (sum(counts),)
    new_strides = arr.strides + (abs(common_stride),)
    arr = np.ndarray(buffer=arr, dtype=out_dtype, shape=new_shape, strides=new_strides, offset=min(offsets)).view(type(a))

I was not aware that constructor existed. I will try that out.

I'm facing two issues with that:

Hmm, thanks for checking. Then the code is fine as-is. Somehow it seems like an oversight in the API that the
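For readers unfamiliar with the constructor mentioned in this thread, the following is a minimal sketch of how `np.ndarray` accepts `buffer`, `offset`, and `strides` directly. The packed two-field dtype and the intermediate `uint8` view are assumptions made for the example; the thread concludes that the PR keeps its `as_strided` approach, so this is not the code that was merged.

```python
import numpy as np

# Packed records with two int32 fields, 8 bytes per record.
dt = np.dtype([('a', '<i4'), ('b', '<i4')])
rec = np.array([(1, 10), (2, 20), (3, 30)], dtype=dt)

# Assumption for this sketch: expose the records as a flat byte buffer first.
raw = rec.view(np.uint8)

unstructured = np.ndarray(shape=rec.shape + (2,),
                          dtype=np.dtype('<i4'),
                          buffer=raw,
                          offset=0,                     # byte offset of the first field
                          strides=rec.strides + (4,))   # 8 bytes per record, 4 per field

print(unstructured.tolist())
print(np.shares_memory(unstructured, rec))
```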
+
+            if common_stride < 0:
+                arr = arr[..., ::-1]  # reverse, if the stride was negative
+            return arr
+
     # next cast to a packed format with all fields converted to new dtype
     packed_fields = np.dtype({'names': names,
                               'formats': [(out_dtype, dt.shape) for dt in dts]})
I think we should trust that subclasses do this right, and instead explicitly exclude `matrix` - this is done elsewhere in the code too, and `matrix` is deprecated anyway.

I only allow the types `ndarray`, `recarray` and `memmap`, at least for now.
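An allowlist like the one described in the last comment could look roughly as follows. This is a sketch under assumptions: the helper name `can_return_view` is hypothetical, and whether the PR uses an exact `type(...)` check or something else is not shown in this excerpt.

```python
import numpy as np

# Hypothetical allowlist; only these exact types may take the view shortcut.
_VIEWABLE_TYPES = (np.ndarray, np.recarray, np.memmap)


def can_return_view(arr):
    """Return True if arr's exact type is on the allowlist (subclasses excluded)."""
    return type(arr) in _VIEWABLE_TYPES


plain = np.zeros(2)
records = np.rec.array([(1, 2)], dtype=[('a', 'i4'), ('b', 'i4')])
masked = np.ma.masked_array([1, 2])

print(can_return_view(plain), can_return_view(records), can_return_view(masked))
```

Using an exact type check rather than `isinstance` means that other subclasses, such as masked arrays, fall back to the copying code path.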