Add polars.Expr.list.drop_nans() #16736

Open
FriedLabJHU opened this issue Jun 5, 2024 · 2 comments
Labels
A-dtype-list/array Area: list/array data type enhancement New feature or an improvement of an existing feature

Comments

@FriedLabJHU

Description

There is currently no method in the Expr.list namespace that removes NaN values the way Expr.list.drop_nulls() removes nulls.
This would be nice to have, since Expr.drop_nans() already exists.

The only workaround appears to involve Expr.list.eval, as shown below:

## Expected behavior
import polars as pl

df = pl.DataFrame({"a": [[None, 2.0, 4.0], [5.0, 2.0, 1.0]]}, strict=False)
>>>
┌──────────────────┐
│ a                │
│ ---              │
│ list[f64]        │
╞══════════════════╡
│ [null, 2.0, 4.0] │
│ [5.0, 2.0, 1.0]  │
└──────────────────┘

df.with_columns(
    pl.col("a").list.drop_nulls().alias("b")
)
>>>
┌──────────────────┬─────────────────┐
│ a                ┆ b               │
│ ---              ┆ ---             │
│ list[f64]        ┆ list[f64]       │
╞══════════════════╪═════════════════╡
│ [null, 2.0, 4.0] ┆ [2.0, 4.0]      │
│ [5.0, 2.0, 1.0]  ┆ [5.0, 2.0, 1.0] │
└──────────────────┴─────────────────┘
## Drop NaN is not supported
df = pl.DataFrame({"a": [[float("nan"), 2.0, 4.0], [5.0, 2.0, 1.0]]}, strict=False)
>>>
┌─────────────────┐
│ a               │
│ ---             │
│ list[f64]       │
╞═════════════════╡
│ [NaN, 2.0, 4.0] │
│ [5.0, 2.0, 1.0] │
└─────────────────┘

print(df.with_columns(
    pl.col("a").list.drop_nans().alias("b")
))

>>> 
AttributeError: 'ExprListNameSpace' object has no attribute 'drop_nans'

# Solution
df.with_columns(
    pl.col("a").list.eval(pl.element().drop_nans()).alias("b")
)
>>>
┌─────────────────┬─────────────────┐
│ a               ┆ b               │
│ ---             ┆ ---             │
│ list[f64]       ┆ list[f64]       │
╞═════════════════╪═════════════════╡
│ [NaN, 2.0, 4.0] ┆ [2.0, 4.0]      │
│ [5.0, 2.0, 1.0] ┆ [5.0, 2.0, 1.0] │
└─────────────────┴─────────────────┘
@FriedLabJHU FriedLabJHU added the enhancement New feature or an improvement of an existing feature label Jun 5, 2024
@jeroenjanssens
Contributor

This also holds for DataFrame and LazyFrame. Here's an overview:

Object .drop_nulls() .drop_nans() .has_nulls() .has_nans()
DataFrame
LazyFrame
Series
Expr
List

@stinodego stinodego added the A-dtype-list/array Area: list/array data type label Jun 5, 2024
@FriedLabJHU
Author

These all seem like useful additions, and they would likely be necessary for a v1.0 release. Thank you, @jeroenjanssens, for this table. I will work on PRs for these in the meantime.

mansanlab added a commit to mansanlab/polars that referenced this issue Jun 17, 2024