-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StringArray comparisions return BooleanArray #30231
StringArray comparisions return BooleanArray #30231
Conversation
@@ -59,6 +59,37 @@ | |||
rxor, | |||
) | |||
|
|||
# ----------------------------------------------------------------------------- | |||
# constants | |||
ARITHMETIC_BINOPS: Set[str] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel am I reinventing any wheels here? This seems like something useful enough that it would have been done before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think so, no
Thanks! Fixed all those I think. Tangentially related: we'll need a docs section discussing the behavior of |
I was floating the idea in my head (but didn't make an issue or so yet) to create a "user_guide/data_types" directory and move the integer and boolean files there (so an extra nesting in the left sidebar). And then the main "data types" page could have a short intro to extension dtypes, pointing to those pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you are really testing this. I think you actually need to take some of our existing indexing tests, then paramterize over the existing object based strings and then StringArray, then test indexing. I suspect a fair amount of things will break.
This does not yet add indexing, this only changes the comparison operations. Indexing is one of the next steps on the list in #29556 |
ok that's fair. would at like to at least in this PR put into place an xfail test that assures that if we try to do indexing with a BooleanArray it fails |
Indexing seems independent right? BooleanArray comparisons already return BooleanArray, and AFAIK those aren't tested in indexing yet. I'm happy to write those tests / work on the implementation, but it feels like a separate piece of work. |
Yes, let's keep it separate |
they are separate until sometries to assign a BooleanArray to a column and use it. We have nothing for this, certainly do in a separate PR. What I am suggesting is a single test to validate that this raises (for now) |
Do we know what the defined behavior is yet? In #28778 it sounds like we're contemplating either raising or propagating. |
doc/source/user_guide/text.rst
Outdated
@@ -94,7 +94,12 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>` | |||
2. Some string methods, like :meth:`Series.str.decode` are not available | |||
on ``StringArray`` because ``StringArray`` only holds strings, not | |||
bytes. | |||
|
|||
3. In comparision operations, :class:`arrays.StringArray` and ``Series`` backed | |||
by a ``StringArray`` will return an object with :class:`arrays.BooleanDtype`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by a ``StringArray`` will return an object with :class:`arrays.BooleanDtype`, | |
by a ``StringArray`` will return an object with :class:`BooleanDtype`, |
doc/source/user_guide/text.rst
Outdated
3. In comparision operations, :class:`arrays.StringArray` and ``Series`` backed | ||
by a ``StringArray`` will return an object with :class:`arrays.BooleanDtype`, | ||
rather than a ``bool`` dtype object, depending on whether | ||
there are missing values. Missing values in a ``StringArray`` will propagate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we are not always returning a BooleanArray, or am I mis-reading this? this would be very confusing if this is indeed the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is stated confusingly. When the dtype is "string" (not object), comparisons always return BooleanDtype.
Tom, I think you can removed the ", depending on whether there are missing values"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally lgtm
@@ -59,6 +59,37 @@ | |||
rxor, | |||
) | |||
|
|||
# ----------------------------------------------------------------------------- | |||
# constants | |||
ARITHMETIC_BINOPS: Set[str] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as FYI if you create a container with literal values mypy will infer that it is a Set[str]
for you, so not required to specify. Have to specify if the container is empty on instantiation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know!
This should be good to go. |
Thanks! |
xref #29556
cc @jorisvandenbossche