Implement lazy by default #54

cigrainger · 2021-09-06T23:20:35Z

In a functional language with immutable data, memory management is important. The current implementation utilises Polars's eager mode and computes a new dataframe for every function call. Because the dataframes are represented as a ResourceArc, they are only dropped from memory when the GC runs. This can be pretty heavy on memory, to say the least. The most efficient approach would be to treat dataframes as lazy by default, with 'peeking' for inspect. In R, for example, function arguments are only evaluated when their values are actually needed, e.g. to show output.

An additional benefit of lazy by default is the opportunity to optimise queries. Why evaluate every function call when you can build up a query that may be executed more efficiently as a whole?
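To make the idea concrete, here is a minimal sketch in Python (not Explorer's or Polars's API; `LazyFrame`, `filter`, `mutate` and `collect` are hypothetical names chosen for illustration). Each call only records a step in a plan, and nothing is computed until `collect/0`-style execution, at which point all steps run in a single streaming pass without materialising an intermediate dataframe per call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical lazy dataframe: immutable source rows plus a recorded plan."""

    source: Tuple[Row, ...]
    plan: Tuple[Tuple[str, Callable], ...] = ()

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Record the step; return a new frame and leave this one untouched.
        return LazyFrame(self.source, self.plan + (("filter", predicate),))

    def mutate(self, fn: Callable[[Row], Row]) -> "LazyFrame":
        # Record a row-wise transformation without running it.
        return LazyFrame(self.source, self.plan + (("mutate", fn),))

    def collect(self) -> List[Row]:
        # Chain every recorded step as an iterator, so rows flow through the
        # whole plan one at a time and only the final result is materialised.
        rows: Iterable[Row] = iter(self.source)
        for kind, fn in self.plan:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return list(rows)


frame = LazyFrame(tuple({"x": i} for i in range(5)))
query = frame.filter(lambda r: r["x"] % 2 == 0).mutate(lambda r: {**r, "y": r["x"] * 10})
result = query.collect()  # the plan is only executed here
```

A real backend could go further and rewrite the plan before running it (reordering filters, dropping unused columns); this sketch only shows the deferred, single-pass part.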

Polars has polars_lazy, which permits exactly this. Making this shift will then permit the use of lazy evaluation with other backends -- esp. DataFusion/Ballista and Ecto.

For Explorer, we'll need to do a bit of exploration (pun absolutely intended) into how we can achieve this while maintaining the flexibility of pluggable backends. And when looking at a pure Elixir backend, we should consider whether laziness is unnecessarily onerous compared to the benefits.

I'd really love ideas and feedback for making Explorer lazy by default. Is there a good peeking mechanism in other libraries? For example, something I'm going to be exploring is how tibbles in R minimise computation for print and head.

josevalim · 2021-09-07T05:42:40Z

The cool thing about laziness is that the backend can actually be eager if it believes that is more efficient, although I doubt this is the case given immutability. For peeking, it should be very close to setting a limit and then executing the query, right? But it is probably important to keep it as its own callback, in case other backends want to do optimizations (for example, maybe Ballista gets only local data when peeking?).
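A rough sketch of that shape, in Python for illustration only (this is not Explorer's backend behaviour; `Backend`, `execute`, `peek`, `StreamingBackend` and `PartitionedBackend` are hypothetical names). The default `peek` is just "set a limit, then execute", but because it is a separate callback, a backend can override it, here by only reading rows it already holds locally.

```python
from abc import ABC, abstractmethod
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Sequence

Row = Dict[str, Any]
Step = Callable[[Iterator[Row]], Iterator[Row]]


class Backend(ABC):
    """Hypothetical backend contract: execute a plan, optionally with a row limit."""

    @abstractmethod
    def execute(self, source: Iterable[Row], steps: Sequence[Step],
                limit: Optional[int] = None) -> List[Row]:
        ...

    def peek(self, source: Iterable[Row], steps: Sequence[Step], n: int = 5) -> List[Row]:
        # Default peeking: set a limit and execute the query.
        return self.execute(source, steps, limit=n)


class StreamingBackend(Backend):
    def execute(self, source, steps, limit=None):
        rows = iter(source)
        for step in steps:
            rows = step(rows)
        if limit is not None:
            # islice stops pulling from upstream once `limit` rows are produced.
            rows = islice(rows, limit)
        return list(rows)


class PartitionedBackend(StreamingBackend):
    """Hypothetical distributed backend that keeps some rows locally."""

    def __init__(self, local_rows: Sequence[Row]):
        self.local_rows = local_rows

    def peek(self, source, steps, n=5):
        # Backend-specific optimisation: ignore the full source when peeking
        # and answer from local data only.
        return self.execute(self.local_rows, steps, limit=n)


def keep_even(rows: Iterator[Row]) -> Iterator[Row]:
    return (r for r in rows if r["x"] % 2 == 0)


preview = StreamingBackend().peek(({"x": i} for i in range(1_000_000)), [keep_even], n=2)
```

With the streaming backend, the peek above stops reading the source as soon as two matching rows have been produced, rather than evaluating the full query first.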

josevalim · 2022-04-30T06:26:33Z

Closing as lazy is a separate backend, not default.

cigrainger added the note:discussion label Sep 6, 2021

cigrainger self-assigned this Sep 6, 2021

cigrainger mentioned this issue Feb 13, 2022

Additional backends #55

Closed

3 tasks

cigrainger mentioned this issue Mar 21, 2022

lazy by default #141

Closed

josevalim closed this as completed Apr 30, 2022
