feat(python): `Expr.forget()` and assertions #16311
Labels: enhancement (new feature or an improvement of an existing feature)
Description
Assertions by themselves are simple. But fitting them into data processing pipelines is subtle.
TL;DR
Conceptually, I propose to separate:
`/dev/null`, if you want to be poetic.
Technically, I propose to add several methods to `Expr`:
In the future, we may find ergonomic ways to combine those, but for now I think we should introduce them separately.
This feature should be unstable/experimental and Python-only until we have explored the solution space.
Some context
Data pipelines
There are two scenarios with assertions:
For concreteness, let's take a look at this code:
Checking and passing on:
Checking and forgetting:
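A minimal sketch of both scenarios in plain Python, with lists standing in for Series. The data and the helper name `checked_positive` are made up for illustration and are not part of Polars or of this proposal.

```python
prices = [3.5, 2.0, 7.25]
ids = [1, 2, 5, 9]


def checked_positive(values):
    # Checking and passing on: the checked value itself is returned
    # and used by the next processing step.
    assert all(v > 0 for v in values), "all values must be positive"
    return values


total = sum(checked_positive(prices))

# Checking and forgetting: a derived boolean is checked and then discarded;
# processing continues with the original `ids`, not with the boolean.
assert all(a < b for a, b in zip(ids, ids[1:])), "ids must be strictly increasing"
largest_gap = max(b - a for a, b in zip(ids, ids[1:]))
```

In the first case the assertion sits naturally inside the pipeline. In the second, the boolean exists only to be checked, so there is nothing sensible to pass on.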
Fluent interfaces
Syntactically, data pipelines are presented as fluent interfaces: a sequence of expressions/methods connected with the dot operator.
We could re-imagine one of the examples above as:
(But what do we do with the second example?)
(Of course, it really gets interesting when we go from Series to DataFrames and expressions.)
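The fluent style can be sketched in plain Python with a small wrapper class. `Column` and its methods are hypothetical stand-ins for a Series, written only to show an assertion method that returns its receiver so the chain can continue.

```python
class Column:
    """Hypothetical stand-in for a Series; not a Polars class."""

    def __init__(self, values):
        self.values = list(values)

    def gt(self, threshold):
        # Element-wise comparison, returning a new boolean Column.
        return Column(v > threshold for v in self.values)

    def assert_all(self):
        # Fail unless every element is truthy, then pass the same column on.
        assert all(self.values), "assert_all failed"
        return self

    def count(self):
        # Number of elements in the column.
        return len(self.values)


prices = Column([3.5, 2.0, 7.25])
n_checked = prices.gt(0).assert_all().count()
```

Note that the chain continues with the boolean column, not with `prices`. Getting back to the original value after the check is the part this fluent form does not seem to express on its own.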
The proposal
Expr.forget()
This method works for expressions of any type.
The result is also an expression which evaluates to nothing.
For example, `pl.col(pl.Boolean)` evaluates to nothing if the context has no boolean columns.
The result of `.forget()` evaluates to nothing unconditionally.
Some trivial assertion methods
They return ("pass on") the original value.
Two methods for columns of data type `Boolean` (based on `.any()` and `.all()`): `assert_any`, `assert_all`.
One method for non-columnar values of data type `Boolean`: `assert`.
An example
When we have this:
we can write this:
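As a rough model of how the proposed pieces could interact, here is a sketch in plain Python. It assumes an expression is a function from a frame (a dict of columns) to zero or more output columns. All names here (`col`, `is_positive`, `assert_all`, `forget`, `select`) are hypothetical helpers for illustration, not existing Polars functions, and the semantics are my reading of the proposal rather than a definitive implementation.

```python
from typing import Callable

Frame = dict[str, list]          # column name -> values
Expr = Callable[[Frame], Frame]  # an expression yields zero or more columns


def col(name: str) -> Expr:
    # Select a single column by name.
    return lambda frame: {name: frame[name]}


def is_positive(expr: Expr) -> Expr:
    # Map every selected column to a boolean column.
    return lambda frame: {k: [v > 0 for v in vals] for k, vals in expr(frame).items()}


def assert_all(expr: Expr) -> Expr:
    def run(frame: Frame) -> Frame:
        out = expr(frame)
        for name, vals in out.items():
            assert all(vals), f"assert_all failed for column {name!r}"
        return out  # pass the checked columns on unchanged
    return run


def forget(expr: Expr) -> Expr:
    def run(frame: Frame) -> Frame:
        expr(frame)  # still evaluated, so assertions inside it still fire
        return {}    # but it contributes no columns to the result
    return run


def select(frame: Frame, *exprs: Expr) -> Frame:
    # Evaluate each expression and merge whatever columns it yields.
    result: Frame = {}
    for expr in exprs:
        result.update(expr(frame))
    return result


frame = {"price": [3.5, 2.0, 7.25]}
out = select(
    frame,
    col("price"),
    forget(assert_all(is_positive(col("price")))),
)
```

Under these assumptions, `out` contains only the `price` column: the check runs, and its boolean result is dropped. Keeping `assert_all` and `forget` as separate steps mirrors the suggestion to introduce the methods separately before looking for a combined form.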