In a functional language with immutable data, memory management is important. The current implementation utilises `polars`'s eager mode and computes a new dataframe for every function call. Because the dataframes are represented as a ResourceArc, they are only dropped from memory when the GC runs. This can be pretty heavy on memory, to say the least. The most efficient approach would be to treat dataframes as lazy by default, with 'peeking' for inspect. In R, for example, function arguments are only evaluated when their values are actually needed.
An additional benefit of lazy by default is the opportunity to optimise queries. Why evaluate every function call when you can build up a query that may be executed more efficiently altogether?
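To make the idea concrete, here is a minimal sketch of "build up a query, execute it later". It is plain Python rather than Explorer or Polars code, and every name in it (`LazyFrame`, `filter`, `mutate`, `collect`) is hypothetical, chosen only for illustration. Each verb returns a new immutable plan instead of a new dataframe, so nothing is computed or copied until `collect` is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical lazy frame: an immutable source plus a tuple of pending steps."""

    source: tuple
    steps: tuple = ()

    def filter(self, predicate):
        # Record the step; no row is touched yet.
        return LazyFrame(self.source, self.steps + (("filter", predicate),))

    def mutate(self, fn):
        # Record a row-wise transformation; again, nothing is evaluated.
        return LazyFrame(self.source, self.steps + (("mutate", fn),))

    def _rows(self):
        # Chain the steps as iterators so rows stream through the whole
        # pipeline without materialising an intermediate frame per step.
        rows = iter(self.source)
        for kind, fn in self.steps:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return rows

    def collect(self):
        # The only place where the query actually runs.
        return list(self._rows())


df = LazyFrame(({"x": 1}, {"x": 2}, {"x": 3}))
query = df.filter(lambda r: r["x"] > 1).mutate(lambda r: {**r, "y": r["x"] * 10})
result = query.collect()
```

A real backend could inspect `steps` before running them, for example to reorder or fuse operations. The sketch only defers and streams them.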
Polars has `polars_lazy`, which permits exactly this. Making this shift will then permit the use of lazy evaluation for other backends, especially DataFusion/Ballista and Ecto.
For Explorer, we'll need to do a bit of exploration (pun absolutely intended) into how we can achieve this while maintaining the flexibility of pluggable backends. And when looking at a pure Elixir backend, we should consider whether laziness is unnecessarily onerous compared to the benefits.
I'd really love ideas and feedback for making Explorer lazy by default. Is there a good peeking mechanism in other libraries? For example, something I'm going to be exploring is how `tibble`s in R minimise computation for `print` and `head`.
The cool thing about laziness is that the backend can actually be eager if it believes that is more efficient, although I doubt this is the case given immutability. For peeking, it should be very close to setting a limit and then executing the query, right? But it is probably important to keep it as its own callback, in case other backends want to do optimizations (for example, maybe Ballista gets only local data when peeking?).
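As a rough sketch of that suggestion (plain Python, with hypothetical names throughout; this is not Explorer's backend behaviour), the default `peek` below is just "limit, then execute", while a second backend overrides the same callback to look only at data it already holds locally.

```python
from itertools import islice


def run(rows, steps):
    """Lazily apply pending steps, given as ("filter" | "map", function) pairs."""
    rows = iter(rows)
    for kind, fn in steps:
        rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
    return rows


class DefaultBackend:
    """Hypothetical backend with separate collect and peek callbacks."""

    def collect(self, rows, steps):
        return list(run(rows, steps))

    def peek(self, rows, steps, n=5):
        # Default peek: the same query with a limit. islice stops pulling
        # rows from the pipeline once n results have been produced.
        return list(islice(run(rows, steps), n))


class LocalOnlyBackend(DefaultBackend):
    """Hypothetical distributed backend that peeks only at a local partition."""

    def __init__(self, local_rows):
        self.local_rows = local_rows

    def peek(self, rows, steps, n=5):
        # Ignore the full dataset and answer from local data only.
        return list(islice(run(self.local_rows, steps), n))


steps = [("filter", lambda x: x % 2 == 0), ("map", lambda x: x * 10)]
preview = DefaultBackend().peek(range(1_000_000), steps, n=3)
local_preview = LocalOnlyBackend([4, 5, 6]).peek(range(1_000_000), steps, n=3)
```

Keeping `peek` separate from `collect` is what lets the second backend return a cheaper, possibly unrepresentative preview without changing how full queries run.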