Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unneeded methods in Column #14730

Merged

Conversation

mroeschke
Copy link
Contributor

Description

  • valid_count can be composed of null_count or where checked has_nulls
  • contains_na_entries is redundant with has_nulls
  • Better typing in searchsorted

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 10, 2024
@mroeschke mroeschke requested a review from a team as a code owner January 10, 2024 00:33
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! These kinds of simplifications to the API will definitely help ease the pylibcudf transition!

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 3f19d04 into rapidsai:branch-24.02 Jan 10, 2024
68 checks passed
@mroeschke mroeschke deleted the ref/columns/clean_methods2 branch January 10, 2024 21:11
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I failed to flush this before merge. Most of the comments are minor but there was one logic mistake in moving from valid_count to null_count, will fix.

self,
label,
side: Literal["left", "right"],
kind: Literal["ix", "loc", "getitem", None] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit Optional[Literal["ix", "loc", "getitem"]] ?

@@ -1381,7 +1381,9 @@ def _concat(
# improved as the concatenation API is solidified.

# Find the first non-null column:
head = next((obj for obj in objs if obj.valid_count), objs[0])
head = next(
(obj for obj in objs if not obj.null_count != len(obj)), objs[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue I think this logic is wrong. The old code was equivalent to:

(obj for obj in objs if (len(obj) - obj.null_count != 0))

Rearranging the condition:

len(obj) != obj.null_count

So we've gained an extra negation.

Suggested change
(obj for obj in objs if not obj.null_count != len(obj)), objs[0]
(obj for obj in objs if obj.null_count != len(obj)), objs[0]

Comment on lines +143 to 146
def has_nulls(self, include_nan: bool = False) -> bool:
return bool(self.null_count != 0) or (
include_nan and bool(self.nan_count != 0)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def has_nulls(self, include_nan: bool = False) -> bool:
return bool(self.null_count != 0) or (
include_nan and bool(self.nan_count != 0)
)
def has_nulls(self, include_nan: bool = False) -> bool:
return self.null_count != 0 or (include_nan and self.nan_count != 0)

Comment on lines 1710 to +1713
@_cudf_nvtx_annotate
def valid_count(self):
"""Number of non-null values"""
return self._column.valid_count
return len(self) - self._column.null_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not here for API compat (pandas series do not have a valid_count method), could we deprecate it at the series level too?

wence- added a commit to wence-/cudf that referenced this pull request Jan 11, 2024
@wence- wence- mentioned this pull request Jan 11, 2024
3 tasks
rapids-bot bot pushed a commit that referenced this pull request Jan 12, 2024
The removal of `valid_count` on columns in #14730 had one logic bug, fixed here.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14742
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants