Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Add a proper implementation for structured zerofill #23620

Merged
merged 2 commits into from
May 10, 2023

Conversation

seberg
Copy link
Member

@seberg seberg commented Apr 20, 2023

This reorganizes the generic traversal functions for structured dtypes a bit (before it wasn't quite generic).

Then uses that for zerofilling. We could get the two a bit closer by also supporting func==NULL explicitly for clearing. (I have not includes this here.)

The old FillObjectArray is still in use now, this is not ideal and the new approach should be duplicated to add an "emptyfill" (same semantics, normally just memset to 0, but for objects we place an explicit None object).

This is a necessary follow-up to gh-23591.

This reorganizes the generic traversal functions for structured
dtypes a bit (before it wasn't quite generic).

Then uses that for zerofilling.  We could get the two a bit closer
by also supporting `func==NULL` explicitly for clearing.
(I have not includes this here.)

The old `FillObjectArray` is still in use now, this is not ideal
and the new approach should be duplicated to add an "emptyfill"
(same semantics, normally just memset to 0, but for objects we
place an explicit `None` object).

This is a necessary follow-up to numpygh-23591.
@seberg seberg force-pushed the correct-zerofill-structured branch from 5a2b25f to 534c2e5 Compare April 20, 2023 13:50
Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the stringdtype tests pass with this PR and I didn't see any code issues to comment about, LGTM!

@seberg
Copy link
Member Author

seberg commented Apr 20, 2023

One test we could add there (if you like), is that this should make np.zeros(3, dtype=[("a", np.float64), ("b", StringDType())]) work (although, I suspect there are still alignment issues, so you have to make sure to use an aligned struct dtype).

@ngoldbaum
Copy link
Member

Nice! It looks like it does successfully create the array with this PR (on main it dies in an assert(0) inside refcount.c). However it crashes a little bit later because the struct dtype printing code doesn't know about new dtypes:

In [2]: In [1]: np.zeros(3, dtype=[("a", np.float64), ("b", StringDType())])
Out[2]: 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File ~/.pyenv/versions/3.10.9-debug/lib/python3.10/site-packages/IPython/core/formatters.py:706, in PlainTextFormatter.__call__(self, obj)
    699 stream = StringIO()
    700 printer = pretty.RepresentationPrinter(stream, self.verbose,
    701     self.max_width, self.newline,
    702     max_seq_length=self.max_seq_length,
    703     singleton_pprinters=self.singleton_printers,
    704     type_pprinters=self.type_printers,
    705     deferred_pprinters=self.deferred_printers)
--> 706 printer.pretty(obj)
    707 printer.flush()
    708 return stream.getvalue()

File ~/.pyenv/versions/3.10.9-debug/lib/python3.10/site-packages/IPython/lib/pretty.py:410, in RepresentationPrinter.pretty(self, obj)
    407                         return meth(obj, self, cycle)
    408                 if cls is not object \
    409                         and callable(cls.__dict__.get('__repr__')):
--> 410                     return _repr_pprint(obj, self, cycle)
    412     return _default_pprint(obj, self, cycle)
    413 finally:

File ~/.pyenv/versions/3.10.9-debug/lib/python3.10/site-packages/IPython/lib/pretty.py:778, in _repr_pprint(obj, p, cycle)
    776 """A pprint that just redirects to the normal repr function."""
    777 # Find newlines and replace them with p.break_()
--> 778 output = repr(obj)
    779 lines = output.splitlines()
    780 with p.group():

File ~/Documents/numpy/numpy/core/arrayprint.py:1518, in _array_repr_implementation(arr, max_line_width, precision, suppress_small, array2string)
   1515 if skipdtype:
   1516     return arr_str
-> 1518 dtype_str = "dtype={})".format(dtype_short_repr(arr.dtype))
   1520 # compute whether we should put dtype on a new line: Do so if adding the
   1521 # dtype would extend the last line past max_line_width.
   1522 # Note: This line gives the correct result even when rfind returns -1.
   1523 last_line_len = len(arr_str) - (arr_str.rfind('\n') + 1)

File ~/Documents/numpy/numpy/core/arrayprint.py:1470, in dtype_short_repr(dtype)
   1467     return repr(dtype)
   1468 if dtype.names is not None:
   1469     # structured dtypes give a list or tuple repr
-> 1470     return str(dtype)
   1471 elif issubclass(dtype.type, flexible):
   1472     # handle these separately so they don't give garbage like str256
   1473     return "'%s'" % str(dtype)

File ~/Documents/numpy/numpy/core/_dtype.py:36, in __str__(dtype)
     34 def __str__(dtype):
     35     if dtype.fields is not None:
---> 36         return _struct_str(dtype, include_align=True)
     37     elif dtype.subdtype:
     38         return _subarray_str(dtype)

File ~/Documents/numpy/numpy/core/_dtype.py:310, in _struct_str(dtype, include_align)
    305 def _struct_str(dtype, include_align):
    306     # The list str representation can't include the 'align=' flag,
    307     # so if it is requested and the struct has the aligned flag set,
    308     # we must use the dict str instead.
    309     if not (include_align and dtype.isalignedstruct) and _is_packed(dtype):
--> 310         sub = _struct_list_str(dtype)
    312     else:
    313         sub = _struct_dict_str(dtype, include_align)

File ~/Documents/numpy/numpy/core/_dtype.py:297, in _struct_list_str(dtype)
    292     item += "{}, {}".format(
    293         _construction_repr(base, short=True),
    294         shape
    295     )
    296 else:
--> 297     item += _construction_repr(fld_dtype, short=True)
    299 item += ")"
    300 items.append(item)

File ~/Documents/numpy/numpy/core/_dtype.py:100, in _construction_repr(dtype, include_align, short)
     98     return _subarray_str(dtype)
     99 else:
--> 100     return _scalar_str(dtype, short=short)

File ~/Documents/numpy/numpy/core/_dtype.py:156, in _scalar_str(dtype, short)
    153     return dtype.type.__name__
    155 else:
--> 156     raise RuntimeError(
    157         "Internal error: NumPy dtype unrecognized type number")

RuntimeError: Internal error: NumPy dtype unrecognized type number

@seberg seberg requested a review from mattip April 26, 2023 19:33
@mattip
Copy link
Member

mattip commented Apr 27, 2023

Was the error above fixed?

@seberg
Copy link
Member Author

seberg commented Apr 27, 2023

Was the error above fixed?

The error Nathan referred to was in the printing code and is unrelated to this code. This fixes the assert (since it won't use that function anymore).

@ngoldbaum
Copy link
Member

We can file an issue about the structured dtype printing issue and fix that in a followup.

@charris charris merged commit 82a8297 into numpy:main May 10, 2023
57 checks passed
@charris
Copy link
Member

charris commented May 10, 2023

Thanks Sebastian.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants