New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equality check between numpy string array and string element fails #6312
Comments
Thanks for the report. This is a bug, the code here: numba/numba/cpython/unicode.py Lines 450 to 470 in 159510a
needs to handle arrays of unicode char seq. |
I've started working on an implementation for comparing unicode arrays. Currently numba supports comparing unicode arrays when the UnicodeCharSeq elements have the same length, but not with different size (eg @register_jitable
def _cmp_unicode(a, b):
# the str() is for UnicodeCharSeq, it's a nop else
a = str(a)
b = str(b)
if len(a) != len(b):
return False
return _cmp_region(a, 0, b, 0, len(a)) == 0
@register_jitable
def _cmp_unicode_with_array(arr, val):
rv = np.zeros_like(arr, dtype=np.bool_)
for i in range(arr.size):
rv.flat[i] = _cmp_unicode(arr.flat[i], val)
return rv
@overload(operator.eq)
def unicode_eq(a, b):
if not (a.is_internal and b.is_internal):
return
accept = (types.UnicodeType, types.StringLiteral, types.UnicodeCharSeq)
a_unicode = isinstance(a, accept)
b_unicode = isinstance(b, accept)
a_unicode_array = isinstance(a, types.Array) and isinstance(a.dtype, accept)
b_unicode_array = isinstance(b, types.Array) and isinstance(b.dtype, accept)
if a_unicode and b_unicode:
return _cmp_unicode
elif a_unicode_array and b_unicode_array:
def eq_impl(a, b):
if a.size != b.size:
raise ValueError("Cannot compare arrays of different size")
rv = np.zeros_like(a, dtype=np.bool_)
for i in range(a.size):
rv.flat[i] = _cmp_unicode(a.flat[i], b.flat[i])
return rv
return eq_impl
elif a_unicode_array and b_unicode:
def eq_impl(a, b):
return _cmp_unicode_with_array(a, b)
return eq_impl
elif a_unicode and b_unicode_array:
def eq_impl(a, b):
return _cmp_unicode_with_array(b, a)
return eq_impl
elif a_unicode ^ b_unicode:
# one of the things is unicode, everything compares False
def eq_impl(a, b):
return False
return eq_impl This works nicely for comparing two arrays (which may have a different unicode length): print(numba.njit(lambda a, b: a == b)(np.array(['a','a'], dtype='<U4'), np.array(['a', 'b'])))
# [True, False] There are however a number of issues that I do not understand:
numba.njit(lambda a, b: a == b)(np.array(['a', 'b']), 'a')
# raises exception:
# numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
# unsupported type for input operand: unicode_type
print(numba.njit(lambda a, b: a == b)(np.array(['a'], dtype='<U4'), np.array(['a', 'b'])))
# [True, False]
Any ideas on how to improve on this? |
Hi. I encountered a situation where comparing a Numpy array of strings with a single string doesn't perform element wise comparison and doesn't match the Numpy results. Here is a reproducer:
Running this example produces the output:
It seems like numba is only performing the comparison between the entire array and the single element and is not doing an element wise comparison.
The text was updated successfully, but these errors were encountered: