Remove unneeded methods in Column #14730

mroeschke · 2024-01-10T00:33:03Z

Description

valid_count can be computed from null_count or, where it was only checked for truthiness, replaced with has_nulls
contains_na_entries is redundant with has_nulls
Better typing in searchsorted

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

vyasr

Thanks! These kinds of simplifications to the API will definitely help ease the pylibcudf transition!

mroeschke · 2024-01-10T21:11:25Z

/merge

wence-

Sorry, I failed to flush this before merge. Most of the comments are minor, but there was one logic mistake in moving from valid_count to null_count; I will fix it.

wence- · 2024-01-10T11:36:41Z

python/cudf/cudf/core/_base_index.py

+        self,
+        label,
+        side: Literal["left", "right"],
+        kind: Literal["ix", "loc", "getitem", None] = None,


nit: Optional[Literal["ix", "loc", "getitem"]]?
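
For reference, a small sketch of how the two spellings look at runtime, using only the standard `typing` module. The alias names `KindInline` and `KindOptional` are made up for illustration and are not cuDF names. Type checkers accept the same values for both annotations, so the choice is mostly about readability.

```python
from typing import Literal, Optional, get_args

# Hypothetical aliases for illustration; these names do not come from cuDF.
KindInline = Literal["ix", "loc", "getitem", None]
KindOptional = Optional[Literal["ix", "loc", "getitem"]]

# Inline form: None is simply one of the literal members.
inline_members = get_args(KindInline)

# Optional form: a union of the string literals and NoneType.
literal_part, none_part = get_args(KindOptional)
string_members = get_args(literal_part)
```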

wence- · 2024-01-10T11:42:52Z

python/cudf/cudf/core/column/categorical.py

@@ -1381,7 +1381,9 @@ def _concat(
        # improved as the concatenation API is solidified.

        # Find the first non-null column:
-        head = next((obj for obj in objs if obj.valid_count), objs[0])
+        head = next(
+            (obj for obj in objs if not obj.null_count != len(obj)), objs[0]


issue: I think this logic is wrong. The old code was equivalent to:

(obj for obj in objs if (len(obj) - obj.null_count != 0))

Rearranging the condition:

len(obj) != obj.null_count

So we've gained an extra negation.

Suggested change

-            (obj for obj in objs if not obj.null_count != len(obj)), objs[0]
+            (obj for obj in objs if obj.null_count != len(obj)), objs[0]
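
The three conditions discussed above can be compared on a toy example. This is a minimal sketch, not cuDF code: `FakeColumn` and the three `first_non_null_*` helpers are hypothetical stand-ins that only track a size and a null count.

```python
from dataclasses import dataclass


@dataclass
class FakeColumn:
    """Hypothetical stand-in for a column; tracks only size and null count."""

    size: int
    null_count: int

    def __len__(self) -> int:
        return self.size

    @property
    def valid_count(self) -> int:
        # The removed property: number of non-null rows.
        return self.size - self.null_count


def first_non_null_old(objs):
    # Original condition: truthiness of valid_count.
    return next((obj for obj in objs if obj.valid_count), objs[0])


def first_non_null_merged(objs):
    # Condition as merged, with the extra negation.
    return next(
        (obj for obj in objs if not obj.null_count != len(obj)), objs[0]
    )


def first_non_null_fixed(objs):
    # Condition from the suggested change.
    return next((obj for obj in objs if obj.null_count != len(obj)), objs[0])


all_null = FakeColumn(size=3, null_count=3)
some_valid = FakeColumn(size=3, null_count=1)
objs = [all_null, some_valid]
```

With this input, the old and suggested conditions both skip the all-null column, while the merged condition selects it, because `not a != b` is the same as `a == b`.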

wence- · 2024-01-10T11:45:30Z

python/cudf/cudf/core/column/numerical.py

+    def has_nulls(self, include_nan: bool = False) -> bool:
        return bool(self.null_count != 0) or (
            include_nan and bool(self.nan_count != 0)
        )


Suggested change

-    def has_nulls(self, include_nan: bool = False) -> bool:
-        return bool(self.null_count != 0) or (
-            include_nan and bool(self.nan_count != 0)
-        )
+    def has_nulls(self, include_nan: bool = False) -> bool:
+        return self.null_count != 0 or (include_nan and self.nan_count != 0)
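
A quick way to check that dropping the `bool(...)` wrappers does not change the result is to compare both forms over every combination of inputs. The sketch below assumes the counts are plain Python ints; `FakeNumericalColumn` and `forms_agree` are hypothetical names, and the sketch does not cover counts of any other type.

```python
from itertools import product


class FakeNumericalColumn:
    """Hypothetical stand-in; both counts are assumed to be plain ints."""

    def __init__(self, null_count: int, nan_count: int) -> None:
        self.null_count = null_count
        self.nan_count = nan_count

    def has_nulls_original(self, include_nan: bool = False) -> bool:
        return bool(self.null_count != 0) or (
            include_nan and bool(self.nan_count != 0)
        )

    def has_nulls_suggested(self, include_nan: bool = False) -> bool:
        return self.null_count != 0 or (include_nan and self.nan_count != 0)


def forms_agree() -> bool:
    # Try zero and non-zero counts with include_nan off and on.
    for null_count, nan_count, include_nan in product(
        (0, 2), (0, 3), (False, True)
    ):
        col = FakeNumericalColumn(null_count, nan_count)
        original = col.has_nulls_original(include_nan)
        suggested = col.has_nulls_suggested(include_nan)
        # Both forms must return the same value, and a real bool.
        if original != suggested or type(suggested) is not bool:
            return False
    return True
```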

wence- · 2024-01-10T11:49:34Z

python/cudf/cudf/core/series.py

    @_cudf_nvtx_annotate
    def valid_count(self):
        """Number of non-null values"""
-        return self._column.valid_count
+        return len(self) - self._column.null_count


This is not here for API compat (pandas Series do not have a valid_count method). Could we deprecate it at the series level too?
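
One possible shape for such a deprecation, as a sketch only: `FakeSeries` is a hypothetical list-backed stand-in, and the warning class, the message, and the use of a property are all assumptions rather than anything decided in this thread.

```python
import warnings


class FakeSeries:
    """Hypothetical list-backed stand-in; None marks a null value."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    @property
    def null_count(self) -> int:
        return sum(value is None for value in self._values)

    @property
    def valid_count(self) -> int:
        """Number of non-null values (deprecated in this sketch)."""
        # Assumed warning class and message; stacklevel=2 points the
        # warning at the caller instead of this property.
        warnings.warn(
            "valid_count is deprecated; use len(obj) - obj.null_count.",
            FutureWarning,
            stacklevel=2,
        )
        return len(self) - self.null_count
```

Callers would keep getting the same number while seeing a warning that names the replacement expression.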

The removal of `valid_count` on columns in #14730 had one logic bug, fixed here.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #14742

mroeschke added 3 commits January 9, 2024 15:54

Use has_nulls instead of contains_na_entries

2fe3d48

Remove valid_count usage in favor of has_nulls, null_count

00b4c6e

Use literal in searchsorted typing

b56542a

mroeschke added labels Jan 10, 2024: Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)

mroeschke requested a review from a team as a code owner January 10, 2024 00:33

mroeschke requested review from wence- and charlesbluca January 10, 2024 00:33

mroeschke added 2 commits January 9, 2024 17:32

Merge remote-tracking branch 'upstream/branch-24.02' into ref/columns…

042e534

…/clean_methods2

Fix old valid_count usage

14585f9

vyasr approved these changes Jan 10, 2024

rapids-bot bot merged commit 3f19d04 into rapidsai:branch-24.02 Jan 10, 2024
68 checks passed

mroeschke deleted the ref/columns/clean_methods2 branch January 10, 2024 21:11

wence- reviewed Jan 11, 2024

wence- added a commit to wence-/cudf that referenced this pull request Jan 11, 2024

Fix logic bug introduced in rapidsai#14730

d8c71f7

wence- mentioned this pull request Jan 11, 2024

Fix logic bug introduced in #14730 #14742

Merged

3 tasks
