Problem description:
I have repeatedly needed to compute a `cum_*` operation that excludes the value of the current row. I have not found an existing issue addressing this.
Use case:
As a simple example, suppose I want to compute the cumulative maximum of a column and then check whether the current value is the highest one encountered so far. I could filter by `pl.col('A') == pl.col('A_cum_max')`, but this also matches rows whose value only ties a maximum seen earlier, so it cannot distinguish a new maximum from a repeated one.
Proposal:
The most straightforward workaround I have found is to shift the cumulative result by one row (i.e., use the cumulative maximum as of the previous row), which adds an extra step to the computation. A parameter such as `right_closed=False` (or perhaps `last_closed`, to avoid confusion when the `reverse` parameter is also used) on the `cum_*` ops would save that extra computation and, I believe (please correct me if I am wrong), would not be complicated to implement.
See the example for cum_max below taken from documentation along with my expected result: