Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

Merged
merged 2 commits into from
Mar 15, 2024

Conversation

ngoldbaum
Copy link
Member

@ngoldbaum ngoldbaum commented Mar 14, 2024

Without the change to stringdtype_is_known_scalar_type, the second iteration of the for loop in the test would fail. Datetime already fails, so this just makes timedelta consistent with that.

I discovered this running the pandas test suite against my pandas string dtype built on StringDType.

@seberg is there maybe a better way to write is_known_scalar_type than using a bunch of if statements?

@ngoldbaum ngoldbaum changed the title BUG: raise error trying to coerce timedelta64('NaT') BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType Mar 14, 2024
@ngoldbaum ngoldbaum added the 09 - Backport-Candidate PRs tagged should be backported label Mar 14, 2024
@seberg
Copy link
Member

seberg commented Mar 14, 2024

You could even use a Python set in principle. But then you have to build it on the first call or during module init. For best speed a bloom filter :). But no, I don't think we have a super short and specific pattern here.

@mhvk
Copy link
Contributor

mhvk commented Mar 14, 2024

is there maybe a better way to write is_known_scalar_type than using a bunch of if statements?

no real suggestion, but the top of the function seems a bit duplicated with

static inline npy_bool
_is_basic_python_type(PyTypeObject *tp)
{
return (
/* Basic number types */
tp == &PyBool_Type ||
tp == &PyLong_Type ||
tp == &PyFloat_Type ||
tp == &PyComplex_Type ||
/* Basic sequence types */
tp == &PyList_Type ||
tp == &PyTuple_Type ||
tp == &PyDict_Type ||
tp == &PySet_Type ||
tp == &PyFrozenSet_Type ||
tp == &PyUnicode_Type ||
tp == &PyBytes_Type ||
/* other builtins */
tp == &PySlice_Type ||
tp == Py_TYPE(Py_None) ||
tp == Py_TYPE(Py_Ellipsis) ||
tp == Py_TYPE(Py_NotImplemented) ||
/* TODO: ndarray, but we can't see PyArray_Type here */
/* sentinel to swallow trailing || */
NPY_FALSE
);
}

Given short-circuiting, at least writing it as a series of || may be more readable...

p.s. Really getting off-topic, but also somewhat duplicated with

NPY_NO_EXPORT int
python_builtins_are_known_scalar_types(
PyArray_DTypeMeta *NPY_UNUSED(cls), PyTypeObject *pytype)
{
/*
* Always accept the common Python types, this ensures that we do not
* convert pyfloat->float64->integers. Subclasses are hopefully rejected
* as being discovered.
* This is necessary only for python scalar classes which we discover
* as valid DTypes.
*/
if (pytype == &PyFloat_Type) {
return 1;
}
if (pytype == &PyLong_Type) {
return 1;
}
if (pytype == &PyBool_Type) {
return 1;
}
if (pytype == &PyComplex_Type) {
return 1;
}
if (pytype == &PyUnicode_Type) {
return 1;
}
if (pytype == &PyBytes_Type) {
return 1;
}
return 0;
}

@seberg
Copy link
Member

seberg commented Mar 14, 2024

Good point, || does seem a bit more concise and nicer.

@ngoldbaum
Copy link
Member Author

Good call about all these similarities. The stringdtype code predates adding this to numpy. There was definitely some copy/pasting that can be reverted now that the code lives in numpy.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes itself look good! (And maybe best to do any refactoring elsewhere!)

@ngoldbaum
Copy link
Member Author

(And maybe best to do any refactoring elsewhere!)

Maybe.. but I went ahead and did it before I saw this comment 🙃. Was simple enough though...

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That python_builtins_are_known_scalar_types name is not brilliant (is_python_version_of_scalar_type might be better), but that's really out of scope here... So, approving...

@mhvk
Copy link
Contributor

mhvk commented Mar 14, 2024

No idea why the sanitizer job is filling its log with AddressSanitizer:DEADLYSIGNAL - seems unrelated...

@ngoldbaum
Copy link
Member Author

No idea why the sanitizer job is filling its log with AddressSanitizer:DEADLYSIGNAL - seems unrelated...

Yeah, that's #25875 which seems to have gotten particularly annoying today for some reason. I disabled that action and it shouldn't show up in future CI logs until we figure out what's wrong and re-enable it.

@mhvk
Copy link
Contributor

mhvk commented Mar 14, 2024

Indeed, seems good to disable - though hopefully the alternatives work - it caught few bugs in my FFT gufunc implementations!

@seberg
Copy link
Member

seberg commented Mar 15, 2024

Thanks, lets put this in. I just remembered we actually do have a faster version for finding the builtin scalar types I think (it sorts them and does a binary search for finding them when you do np.dtype(np.float64), which we could use here).

But let's fix the bug for now.

@seberg seberg merged commit 99511e0 into numpy:main Mar 15, 2024
62 of 63 checks passed
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Mar 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants