Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix incorrect strcmp implementation for unequal length strings #25522

Merged
merged 1 commit into from
Jan 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 2 additions & 8 deletions numpy/_core/src/umath/string_buffer.h
Original file line number Diff line number Diff line change
Expand Up @@ -290,21 +290,15 @@ struct Buffer {
tmp2++;
}
while (tmp1.buf < tmp1.after) {
if (*tmp1 < 0) {
return -1;
}
if (*tmp1 > 0) {
if (*tmp1) {
Copy link
Member

@seberg seberg Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused, wouldn't this make:

In [1]: "a" == "a\0"
Out[1]: False

Return True? For NumPy strings, this should be handled already earlier by the length, and for the new strings embedded 0 characters should be supported.

EDIT: Or I don't know maybe we need strcmp_padded for legacy strings...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm learning new things about numpy strings every day. For some reason I thought that this was True until now, but I'm clearly wrong:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.26.3'

In [3]: np.str_("a") == np.str_("a\0")
Out[3]: False

The scalar repr doesn't include the trailing nulls, which I guess makes sense but definitely added to my confusion here:

In [4]: np.str_("a\0")
Out[4]: 'a'

In [5]: repr(np.str_("a\0")) == repr(np.str_("a"))
Out[5]: True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I guess the above is only true for scalars, for array elements trailing nulls are ignored in Numpy 1.26.3:

In [12]: np.array([np.str_("a\0")]) == np.array([np.str_("a")])
Out[12]: array([ True])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just shows that scalars are broken... they should strip nulls on construction to not inherit the unicode behavior, I guess.

return 1;
}
tmp1++;
}
while (tmp2.buf < tmp2.after) {
if (*tmp2 < 0) {
if (*tmp2) {
return -1;
}
if (*tmp2 > 0) {
return 1;
}
tmp2++;
}
return 0;
Expand Down
37 changes: 22 additions & 15 deletions numpy/_core/tests/test_defchararray.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,28 +157,34 @@ def test_it(self):

class TestComparisons:
def setup_method(self):
self.A = np.array([['abc', '123'],
['789', 'xyz']]).view(np.char.chararray)
self.B = np.array([['efg', '123 '],
['051', 'tuv']]).view(np.char.chararray)
self.A = np.array([['abc', 'abcc', '123'],
['789', 'abc', 'xyz']]).view(np.char.chararray)
self.B = np.array([['efg', 'efg', '123 '],
['051', 'efgg', 'tuv']]).view(np.char.chararray)

def test_not_equal(self):
assert_array_equal((self.A != self.B), [[True, False], [True, True]])
assert_array_equal((self.A != self.B),
[[True, True, False], [True, True, True]])

def test_equal(self):
assert_array_equal((self.A == self.B), [[False, True], [False, False]])
assert_array_equal((self.A == self.B),
[[False, False, True], [False, False, False]])

def test_greater_equal(self):
assert_array_equal((self.A >= self.B), [[False, True], [True, True]])
assert_array_equal((self.A >= self.B),
[[False, False, True], [True, False, True]])

def test_less_equal(self):
assert_array_equal((self.A <= self.B), [[True, True], [False, False]])
assert_array_equal((self.A <= self.B),
[[True, True, True], [False, True, False]])

def test_greater(self):
assert_array_equal((self.A > self.B), [[False, False], [True, True]])
assert_array_equal((self.A > self.B),
[[False, False, False], [True, False, True]])

def test_less(self):
assert_array_equal((self.A < self.B), [[True, False], [False, False]])
assert_array_equal((self.A < self.B),
[[True, True, False], [False, True, False]])

def test_type(self):
out1 = np.char.equal(self.A, self.B)
Expand All @@ -191,17 +197,18 @@ class TestComparisonsMixed1(TestComparisons):

def setup_method(self):
TestComparisons.setup_method(self)
self.B = np.array([['efg', '123 '],
['051', 'tuv']], np.str_).view(np.char.chararray)
self.B = np.array(
[['efg', 'efg', '123 '],
['051', 'efgg', 'tuv']], np.str_).view(np.char.chararray)

class TestComparisonsMixed2(TestComparisons):
"""Ticket #1276"""

def setup_method(self):
TestComparisons.setup_method(self)
self.A = np.array([['abc', '123'],
['789', 'xyz']], np.str_) \
.view(np.char.chararray)
self.A = np.array(
[['abc', 'abcc', '123'],
['789', 'abc', 'xyz']], np.str_).view(np.char.chararray)

class TestInformation:
def setup_method(self):
Expand Down