Allow for right-open cumulative operations #16272

Open
t-ded opened this issue May 16, 2024 · 0 comments
Labels
enhancement New feature or an improvement of an existing feature

t-ded commented May 16, 2024

Description

Problem description:
I have repeatedly needed to compute a `cum_*` operation that does not take the current row's value into account. I have not found an existing issue addressing this.

Use case:
As a deliberately simple example, suppose I want to compute the cumulative maximum of a column and then check whether the current value is the highest one encountered so far. I could filter with `pl.col('A') == pl.col('A_cum_max')`, but that also matches rows whose value merely ties a maximum seen earlier, so it cannot distinguish a new maximum from a repeated one.

Proposal:
The most straightforward workaround I have found is a shifted cumulative operation (i.e., the cumulative maximum, or whichever op is needed, as of the previous row), which may not be the cheapest step computationally. A parameter such as `right_closed=False` (or perhaps `last_closed`, to avoid confusion when combined with the `reverse` parameter) on the `cum_*` ops would save that extra computation while, I believe (please correct me if I am wrong), not being complicated to implement.
Below is the `cum_max` example from the documentation, extended with my expected result:


```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3, 4]})

df.with_columns(
    pl.col("a").cum_max().alias("cum_max"),
    # `right_closed` is the proposed parameter; it does not exist in Polars yet
    pl.col("a").cum_max(right_closed=False).alias("cum_max_right_open"),
)
```

```
shape: (4, 3)
┌─────┬─────────┬────────────────────┐
│ a   ┆ cum_max ┆ cum_max_right_open │
│ --- ┆ ---     ┆ ---                │
│ i64 ┆ i64     ┆ i64                │
╞═════╪═════════╪════════════════════╡
│ 1   ┆ 1       ┆ null               │
│ 2   ┆ 2       ┆ 1                  │
│ 3   ┆ 3       ┆ 2                  │
│ 4   ┆ 4       ┆ 3                  │
└─────┴─────────┴────────────────────┘
```
@t-ded t-ded added the enhancement New feature or an improvement of an existing feature label May 16, 2024