-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unneeded methods in Column #14730
Remove unneeded methods in Column #14730
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! These kinds of simplifications to the API will definitely help ease the pylibcudf transition!
/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I failed to flush this before merge. Most of the comments are minor but there was one logic mistake in moving from valid_count
to null_count
, will fix.
self, | ||
label, | ||
side: Literal["left", "right"], | ||
kind: Literal["ix", "loc", "getitem", None] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Optional[Literal["ix", "loc", "getitem"]]
?
@@ -1381,7 +1381,9 @@ def _concat( | |||
# improved as the concatenation API is solidified. | |||
|
|||
# Find the first non-null column: | |||
head = next((obj for obj in objs if obj.valid_count), objs[0]) | |||
head = next( | |||
(obj for obj in objs if not obj.null_count != len(obj)), objs[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue I think this logic is wrong. The old code was equivalent to:
(obj for obj in objs if (len(obj) - obj.null_count != 0))
Rearranging the condition:
len(obj) != obj.null_count
So we've gained an extra negation.
(obj for obj in objs if not obj.null_count != len(obj)), objs[0] | |
(obj for obj in objs if obj.null_count != len(obj)), objs[0] |
def has_nulls(self, include_nan: bool = False) -> bool: | ||
return bool(self.null_count != 0) or ( | ||
include_nan and bool(self.nan_count != 0) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def has_nulls(self, include_nan: bool = False) -> bool: | |
return bool(self.null_count != 0) or ( | |
include_nan and bool(self.nan_count != 0) | |
) | |
def has_nulls(self, include_nan: bool = False) -> bool: | |
return self.null_count != 0 or (include_nan and self.nan_count != 0) |
@_cudf_nvtx_annotate | ||
def valid_count(self): | ||
"""Number of non-null values""" | ||
return self._column.valid_count | ||
return len(self) - self._column.null_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not here for API compat (pandas series do not have a valid_count
method), could we deprecate it at the series level too?
The removal of `valid_count` on columns in #14730 had one logic bug, fixed here. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) URL: #14742
Description
valid_count
can be composed ofnull_count
or where checkedhas_nulls
contains_na_entries
is redundant withhas_nulls
searchsorted
Checklist