MAINT: what should we do about string ufuncs that return lists? #26176
Comments
To me it doesn't make sense to create ufuncs for these. Even if we were to come up with a smart way to return arrays with empty strings wherever necessary, that would be incredibly memory-inefficient. Adding those to |
Agreed that really we do not have good tools for this - it needs ragged arrays of some form (which, if sharing the allocator, might be able to indicate the strings themselves really efficiently, just by adjusting offsets and lengths). My preference would be to not implement these in |
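To make the offsets-and-lengths idea concrete, here is a rough pure-Python sketch of a ragged split result. It is only an illustration: `ragged_split`, `row_ptr`, and the layout itself are hypothetical names and choices, not anything NumPy provides. Every piece is described by a start offset and a length into one shared character buffer, so no piece is ever copied.

```python
import numpy as np

def ragged_split(strings, sep):
    """Hypothetical sketch: describe split pieces as (start, length) pairs
    into one shared buffer, plus row pointers marking each element's pieces."""
    if not sep:
        raise ValueError("empty separator")
    buffer = "".join(strings)        # stand-in for a shared string allocation
    starts, lengths, row_ptr = [], [], [0]
    pos = 0                          # offset of the current string in buffer
    for s in strings:
        i = 0
        while True:
            j = s.find(sep, i)
            end = len(s) if j == -1 else j
            starts.append(pos + i)
            lengths.append(end - i)
            if j == -1:
                break
            i = j + len(sep)
        pos += len(s)
        row_ptr.append(len(starts))  # pieces of row k are row_ptr[k]:row_ptr[k+1]
    return buffer, np.array(starts), np.array(lengths), np.array(row_ptr)

buffer, starts, lengths, row_ptr = ragged_split(["a,bc", "d"], ",")
# Recover the pieces of row 0 by slicing the shared buffer.
row0 = [buffer[s:s + n] for s, n in zip(starts[row_ptr[0]:row_ptr[1]],
                                        lengths[row_ptr[0]:row_ptr[1]])]
```

The catch, as noted above, is that NumPy has no array type that can carry `row_ptr`, which is why this stays a sketch.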
One thing we can do that does make sense for many applications is a padded split. We would need to implement a two-pass algorithm that finds the locations of all the splits, finds the entry with the most splits, and uses that to compute the output shape. |
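For reference, a minimal pure-Python sketch of the two-pass padded split described above. `padded_split` and its `fill` parameter are hypothetical names for illustration; a real implementation would live in C and would not build intermediate Python lists.

```python
import numpy as np

def padded_split(arr, sep, fill=""):
    """Hypothetical sketch: split every element and pad to a common width."""
    arr = np.asarray(arr, dtype=str)
    flat = arr.ravel().tolist()
    # Pass 1: locate the splits and find the entry with the most pieces.
    pieces = [s.split(sep) for s in flat]
    width = max((len(p) for p in pieces), default=0)
    # The itemsize must fit the longest piece (and the fill value).
    itemsize = max([len(x) for p in pieces for x in p] + [len(fill), 1])
    # Pass 2: allocate the padded output and copy the pieces in.
    out = np.full((len(flat), width), fill, dtype=f"<U{itemsize}")
    for i, p in enumerate(pieces):
        out[i, :len(p)] = p
    # The output gains one trailing axis whose length is the max piece count.
    return out.reshape(arr.shape + (width,))

result = padded_split(["a,b,c", "d", "e,f"], ",")
```

Here `result` has shape `(3, 3)`, and the shorter rows are padded with empty strings. The memory concern raised earlier applies: one element with many splits inflates every row.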
I guess this is where @lysnikolaou's comment comes in:
But maybe it is still worth doing? |
That said, this usage wouldn't find a padded split particularly useful: For comparison |
We talked a little bit about this in #25993 but didn't resolve things.
Currently, the following functions don't have implementations in np.strings:
numpy/numpy/_core/strings.py, lines 66 to 67 in ef6299c
We should figure out the appropriate API for adding fast ufuncs for these. I'm actually not sure it's possible to write ufuncs for these: they have an unknown number of outputs for each array element, so you really need ragged arrays for this to make sense. That said, we could also return object arrays of Python lists, or just leave them with the _vec_string implementation.
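To illustrate the object-array option, here is a small sketch of what "an object array of Python lists" would look like from the user's side. `split_to_object_array` is a hypothetical helper written in plain Python, not a proposed implementation.

```python
import numpy as np

def split_to_object_array(arr, sep=None):
    """Hypothetical sketch: one Python list of pieces per input element."""
    arr = np.asarray(arr, dtype=str)
    out = np.empty(arr.size, dtype=object)
    for i, s in enumerate(arr.ravel().tolist()):
        # Each slot holds a list whose length depends on the element.
        out[i] = s.split(sep)
    return out.reshape(arr.shape)

result = split_to_object_array(["a b", "c"])
```

The output shape matches the input shape, which sidesteps the ragged-output problem, but every element is a boxed Python object, so nothing downstream can be vectorized.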