ngoldbaum
Member

This brings over some functionality from asciidtype. I also include some asciidtype changes here to keep stringdtype and asciidtype more uniform.

  • Makes StringScalar a str subclass. This makes it possible to use the functions in np.char to manipulate StringDType data.
    • Adds partition and rpartition wrappers so that the results of those functions have uniform types.
  • Adds a missing incref to common_instance. Also simplifies it a bit since StringDType isn't parametric.
  • Allows creating a StringDType with an optional size argument, which is ignored. This makes the API more uniform with numpy's other string dtypes. In particular, this way of creating a string dtype is now used in np.char, so this also allows StringDType to work with functions in np.char. Currently we just ignore the size, but in principle we could use it as a size hint. I'm not sure that's actually useful.
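
As a rough illustration of the first bullet and its sub-bullet, here is a minimal pure-Python sketch. `StringScalarSketch` is a hypothetical stand-in, not the actual StringScalar implementation (which lives in the C extension). It only shows why the wrappers are needed: methods inherited from str return plain str objects, so partition and rpartition have to re-wrap their results to keep the types uniform.

```python
class StringScalarSketch(str):
    """Hypothetical stand-in for StringScalar: a plain str subclass."""

    def partition(self, sep):
        # str.partition returns plain str pieces; re-wrap each piece so
        # every element of the result has the same scalar type.
        return tuple(type(self)(part) for part in super().partition(sep))

    def rpartition(self, sep):
        # Same re-wrapping for the right-to-left variant.
        return tuple(type(self)(part) for part in super().rpartition(sep))


s = StringScalarSketch("key=value")
head, sep, tail = s.partition("=")
print(type(head).__name__, type(sep).__name__, type(tail).__name__)
# prints: StringScalarSketch StringScalarSketch StringScalarSketch

# Methods that are not wrapped still fall back to plain str:
print(type(s.upper()).__name__)
# prints: str
```

Because the sketch subclasses str, any code that accepts a str also accepts it unchanged, which is the property the PR description relies on for np.char.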

@seberg
Member

seberg commented Jan 11, 2023

Honestly, now that I see arbitrarily sized strings, passing a length seems weird. For most functions, the long-term solution would be a ufunc replacement. For the others, I suppose we have no good story; we could hide something away like arr.dtype.array_funcs.char.mod.

Anyway, I don't want to derail the trials here; it just seems like something better is needed if we really want full support for something like np.char.

{
PyObject *ret_bytes = NULL;
PyTypeObject *scalar_type = Py_TYPE(scalar);
// FIXME: handle bytes too
Member

Maybe NumPy bytes only? Handling normal bytes shouldn't be necessary, since Python 3 doesn't do it. NumPy bytes are just weird because they serve the dual purpose of an ASCII/Latin-1 string.

@ngoldbaum
Member Author

Agreed that passing a size is weird. I'd like to come back to that; I agree that it might be better to just expose all the functionality in np.char as ufuncs. That just seemed like a bigger project than I wanted to take on right now.
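
For readers following the size discussion, the accept-and-ignore behavior can be modeled in a few lines of pure Python. This is only a sketch under stated assumptions: `StringDTypeSketch` is a hypothetical name, and the integer check loosely mirrors the C-level parsing of the argument as a long. It is not the extension's actual code.

```python
class StringDTypeSketch:
    """Hypothetical model of a dtype whose optional size argument is ignored."""

    def __init__(self, size=0):
        # Assumption: the size must at least be an integer, loosely
        # mirroring how the C code parses the argument as a long.
        if not isinstance(size, int):
            raise TypeError("size must be an integer")
        # The value is deliberately not stored: the dtype is not parametric.

    def __eq__(self, other):
        # Every instance compares equal, whatever size was passed.
        return isinstance(other, StringDTypeSketch)

    def __hash__(self):
        return hash(StringDTypeSketch)

    def __repr__(self):
        return "StringDTypeSketch()"


print(StringDTypeSketch(10) == StringDTypeSketch())
# prints: True
```

Callers that construct the dtype with a length keep working, while the resulting instances are indistinguishable from ones created without a size.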

"common_instance called on unequal StringDType instances");
return NULL;
}
Py_INCREF(dtype1);
Contributor

👍


long size = 0;

if (!PyArg_ParseTupleAndKeywords(args, kwds, "|l:ASCIIDType", kwargs_strs,
Contributor

Suggested change
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "|l:ASCIIDType", kwargs_strs,
+ if (!PyArg_ParseTupleAndKeywords(args, kwds, "|l:StringDType", kwargs_strs,

Member Author

Good catch!

Contributor

peytondmurray left a comment

Looks good with a minor change.

ngoldbaum merged commit 4648ca5 into numpy:main Jan 12, 2023