BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

ngoldbaum · 2024-03-14T18:42:08Z

Without the change to stringdtype_is_known_scalar_type, the second iteration of the for loop in the test would fail. Datetime already fails, so this just makes timedelta consistent with that.

I discovered this running the pandas test suite against my pandas string dtype built on StringDType.

@seberg is there maybe a better way to write is_known_scalar_type than using a bunch of if statements?

seberg · 2024-03-14T20:07:12Z

You could even use a Python set in principle. But then you have to build it on the first call or during module init. For best speed a bloom filter :). But no, I don't think we have a super short and specific pattern here.

mhvk · 2024-03-14T20:14:42Z

is there maybe a better way to write is_known_scalar_type than using a bunch of if statements?

no real suggestion, but the top of the function seems a bit duplicated with

numpy/numpy/_core/src/common/get_attr_string.h

Lines 7 to 37 in dda030f

    
           static inline npy_bool 
        
           _is_basic_python_type(PyTypeObject *tp) 
        
           { 
        
               return ( 
        
                   /* Basic number types */ 
        
                   tp == &PyBool_Type || 
        
                   tp == &PyLong_Type || 
        
                   tp == &PyFloat_Type || 
        
                   tp == &PyComplex_Type || 
        
                   /* Basic sequence types */ 
        
                   tp == &PyList_Type || 
        
                   tp == &PyTuple_Type || 
        
                   tp == &PyDict_Type || 
        
                   tp == &PySet_Type || 
        
                   tp == &PyFrozenSet_Type || 
        
                   tp == &PyUnicode_Type || 
        
                   tp == &PyBytes_Type || 
        
                   /* other builtins */ 
        
                   tp == &PySlice_Type || 
        
                   tp == Py_TYPE(Py_None) || 
        
                   tp == Py_TYPE(Py_Ellipsis) || 
        
                   tp == Py_TYPE(Py_NotImplemented) || 
        
                   /* TODO: ndarray, but we can't see PyArray_Type here */ 
        
                   /* sentinel to swallow trailing || */ 
        
                   NPY_FALSE 
        
               ); 
        
           }

Given short-circuiting, at least writing it as a series of || may be more readable...

p.s. Really getting off-topic, but also somewhat duplicated with

numpy/numpy/_core/src/multiarray/dtypemeta.c

Lines 830 to 860 in dda030f

    
           NPY_NO_EXPORT int 
        
           python_builtins_are_known_scalar_types( 
        
                   PyArray_DTypeMeta *NPY_UNUSED(cls), PyTypeObject *pytype) 
        
           { 
        
               /* 
        
                * Always accept the common Python types, this ensures that we do not 
        
                * convert pyfloat->float64->integers. Subclasses are hopefully rejected 
        
                * as being discovered. 
        
                * This is necessary only for python scalar classes which we discover 
        
                * as valid DTypes. 
        
                */ 
        
               if (pytype == &PyFloat_Type) { 
        
                   return 1; 
        
               } 
        
               if (pytype == &PyLong_Type) { 
        
                   return 1; 
        
               } 
        
               if (pytype == &PyBool_Type) { 
        
                   return 1; 
        
               } 
        
               if (pytype == &PyComplex_Type) { 
        
                   return 1; 
        
               } 
        
               if (pytype == &PyUnicode_Type) { 
        
                   return 1; 
        
               } 
        
               if (pytype == &PyBytes_Type) { 
        
                   return 1; 
        
               } 
        
               return 0; 
        
           }

seberg · 2024-03-14T20:16:43Z

Good point, || does seem a bit more concise and nicer.

ngoldbaum · 2024-03-14T20:18:11Z

Good call about all these similarities. The stringdtype code predates adding this to numpy. There was definitely some copy/pasting that can be reverted now that the code lives in numpy.

mhvk

Code changes itself look good! (And maybe best to do any refactoring elsewhere!)

ngoldbaum · 2024-03-14T20:30:16Z

(And maybe best to do any refactoring elsewhere!)

Maybe.. but I went ahead and did it before I saw this comment 🙃. Was simple enough though...

mhvk

That python_builtins_are_known_scalar_types name is not brilliant (is_python_version_of_scalar_type might be better), but that's really out of scope here... So, approving...

mhvk · 2024-03-14T21:01:56Z

No idea why the sanitizer job is filling its log with AddressSanitizer:DEADLYSIGNAL - seems unrelated...

ngoldbaum · 2024-03-14T21:03:40Z

No idea why the sanitizer job is filling its log with AddressSanitizer:DEADLYSIGNAL - seems unrelated...

Yeah, that's #25875 which seems to have gotten particularly annoying today for some reason. I disabled that action and it shouldn't show up in future CI logs until we figure out what's wrong and re-enable it.

mhvk · 2024-03-14T21:07:45Z

Indeed, seems good to disable - though hopefully the alternatives work - it caught few bugs in my FFT gufunc implementations!

seberg · 2024-03-15T06:55:28Z

Thanks, lets put this in. I just remembered we actually do have a faster version for finding the builtin scalar types I think (it sorts them and does a binary search for finding them when you do np.dtype(np.float64), which we could use here).

But let's fix the bug for now.

BUG: raise error trying to coerce timedelta64('NaT')

5257994

github-actions bot added the 00 - Bug label Mar 14, 2024

ngoldbaum changed the title ~~BUG: raise error trying to coerce timedelta64('NaT')~~ BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType Mar 14, 2024

ngoldbaum added the 09 - Backport-Candidate PRs tagged should be backported label Mar 14, 2024

seberg approved these changes Mar 14, 2024

View reviewed changes

mhvk approved these changes Mar 14, 2024

View reviewed changes

MNT: eliminate branching and duplication in is_known_scalar_type

6e11618

mhvk approved these changes Mar 14, 2024

View reviewed changes

seberg merged commit 99511e0 into numpy:main Mar 15, 2024
62 of 63 checks passed

charris mentioned this pull request Mar 16, 2024

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26049

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Mar 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

ngoldbaum commented Mar 14, 2024 •

edited

Loading

seberg commented Mar 14, 2024

mhvk commented Mar 14, 2024

seberg commented Mar 14, 2024

ngoldbaum commented Mar 14, 2024

mhvk left a comment

ngoldbaum commented Mar 14, 2024

mhvk left a comment

mhvk commented Mar 14, 2024

ngoldbaum commented Mar 14, 2024

mhvk commented Mar 14, 2024

seberg commented Mar 15, 2024

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

BUG: raise error trying to coerce object arrays containing timedelta64('NaT') to StringDType #26024

Conversation

ngoldbaum commented Mar 14, 2024 • edited Loading

seberg commented Mar 14, 2024

mhvk commented Mar 14, 2024

seberg commented Mar 14, 2024

ngoldbaum commented Mar 14, 2024

mhvk left a comment

Choose a reason for hiding this comment

ngoldbaum commented Mar 14, 2024

mhvk left a comment

Choose a reason for hiding this comment

mhvk commented Mar 14, 2024

ngoldbaum commented Mar 14, 2024

mhvk commented Mar 14, 2024

seberg commented Mar 15, 2024

ngoldbaum commented Mar 14, 2024 •

edited

Loading