Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DataFrame|Series.squeeze #15244

Merged
merged 6 commits into from
Mar 13, 2024

Conversation

mroeschke
Copy link
Contributor

Description

closes #15177

Also moved axes from Series to IndexedFrame as it is overwritten by DataFrame anyways. Happy to move back if there was a reason not to include it there

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 6, 2024
@mroeschke mroeschke requested a review from a team as a code owner March 6, 2024 22:52
[RangeIndex(start=0, stop=4, step=1)]

"""
return [self.index]
Copy link
Contributor

@bdice bdice Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in SingleColumnFrame rather than IndexedFrame? This method is overridden in DataFrame.axes. I am not sure of our conventions for inheritance here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I wasn't sure where to put this. I thought to put this here because squeeze needs axes (or another method to get "number of axes"). Thoughts @shwina?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A SingleColumnFrame doesn't have an index, so you can't implement axes there I think, I would put it on IndexedFrame if you want to share between Series and DataFrame.

Copy link
Contributor

@bdice bdice Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A SingleColumnFrame doesn't have an index

Oh. I apologize. This is probably fine as-is, then.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you additionally apply this patch to remove the workaround we have in dask-cudf for lack of squeeze right now?

diff --git a/python/dask_cudf/dask_cudf/expr/_expr.py b/python/dask_cudf/dask_cudf/expr/_expr.py
index cbe7a71cb7..85fd44b610 100644
--- a/python/dask_cudf/dask_cudf/expr/_expr.py
+++ b/python/dask_cudf/dask_cudf/expr/_expr.py
@@ -25,21 +25,6 @@ CumulativeBlockwise._args = PatchCumulativeBlockwise._args
 CumulativeBlockwise._kwargs = PatchCumulativeBlockwise._kwargs
 
 
-# This can be removed if squeeze support is added to cudf,
-# or if squeeze is removed from the dask-expr logic.
-# See: https://github.com/rapidsai/cudf/issues/15177
-def _takelast(a, skipna=True):
-    if not len(a):
-        return a
-    if skipna:
-        a = a.bfill()
-    # Cannot use `squeeze` with cudf
-    return a.tail(n=1).iloc[0]
-
-
-TakeLast.operation = staticmethod(_takelast)
-
-
 # This patch accounts for differences between
 # numpy and cupy behavior. It may make sense
 # to move this logic upstream.

[RangeIndex(start=0, stop=4, step=1)]

"""
return [self.index]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A SingleColumnFrame doesn't have an index, so you can't implement axes there I think, I would put it on IndexedFrame if you want to share between Series and DataFrame.

@mroeschke mroeschke requested a review from a team as a code owner March 13, 2024 18:31
@mroeschke
Copy link
Contributor Author

Can you additionally apply this patch to remove the workaround we have in dask-cudf for lack of squeeze right now?

Sure thing, 3c38f6b

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 85f41df into rapidsai:branch-24.04 Mar 13, 2024
75 checks passed
@mroeschke mroeschke deleted the enh/squeeze branch March 13, 2024 23:05
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[FEA] Implement squeeze API
4 participants