-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
ENH: Add support for numpy 2's string dtype #58394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ndarray[uint8_int64_object_t] out, | ||
| ndarray[object] arr, | ||
| ndarray out, | ||
| ndarray arr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make iteration slower.
Look into numpy C API iteration functions.
(e.g. look at Validator._validate)
If this gives perf improvement, maybe can be split out.
|
|
||
| @classmethod | ||
| def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = False): | ||
| na_mask, any_na = libmissing.isnaobj(np.array(scalars, dtype=object), check_for_any_na=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe can use _lib.convert_nans_to_NA instead?
Would get rid of diff in missing.pyx.
| return super()._str_endswith(pat, na) | ||
| pat = np.asarray(pat, dtype=get_numpy_string_dtype_instance()) | ||
| result = np.strings.endswith(self._ndarray, pat) | ||
| return BooleanArray(result, isna(self._ndarray)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache na mask?
|
|
||
| if ret.dtype == object and ret_dtype is not None: | ||
| # cast from object back to StringDType | ||
| return ret.astype(ret_dtype, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In map_infer_mask there is a convert=True argument, that calls ``maybe_convert_objects on it.
Alternately, we could look at setting the dtype argument to an instance of StringDType, and hope that PyArray_SETITEM will work on it?
| return dtype == object or dtype.kind in "SUT" | ||
|
|
||
| def get_numpy_string_dtype_instance( | ||
| na_object=libmissing.NA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is na_object parameter used?
| ) | ||
| raise TypeError(msg) | ||
| dtype = self._inferred_dtype | ||
| if dtype not in allowed_types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are numpy dtypes hashable?
Might just want to add to allowed_types
| return np.dtype("object") | ||
|
|
||
| return np_find_common_type(*types) | ||
| return np.result_type(*types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: This (other changes in this file) are causing test failures.
| if self.values.dtype.kind == 'T': | ||
| return NumpyStringArray(self.values) | ||
| else: | ||
| return NumpyExtensionArray(self.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because NumpyExtensionArray inherits from ObjectStringArrayMixin, so unless I make this change there's no way to call the string method overrides in NumpyStringArray.
|
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
|
Closing in favor of #58394. |
|
Oh wait I can't, I guess the bot will get to it if Thomas doesn't :) |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.Just submitting this for CI purposes, nothing to review yet.