Implement lazy by default #54

Closed
cigrainger opened this issue Sep 6, 2021 · 2 comments
Labels
note:discussion Further information is requested

Comments

@cigrainger
Member

In a functional language with immutable data, memory management is important. The current implementation utilises Polars's eager mode and computes a new dataframe for every function call. Because the dataframes are represented as a `ResourceArc`, they are only dropped from memory when the GC runs. This can be pretty heavy on memory, to say the least. The most efficient approach would be to treat dataframes as lazy by default, with 'peeking' for `inspect`. In R, for example, function arguments are only evaluated when they are first used.

An additional benefit of lazy by default is the opportunity to optimise queries. Why evaluate every function call eagerly when you can build up a query and execute it more efficiently as a whole?
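To make "build up a query, then execute it once" concrete, here is a minimal Python sketch. Explorer itself is Elixir, and Python is used only for illustration. It assumes a toy, hypothetical `LazyFrame` that records `filter`/`select` steps instead of computing them. It applies one toy optimisation, fusing consecutive filters, and only touches the data in `collect`. None of these names are Explorer's or Polars's API. Real query optimisers do much more than this.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Row = dict
Predicate = Callable[[Row], bool]


def _project(rows: Iterable[Row], cols: tuple) -> Iterator[Row]:
    # Keep only the requested columns, one row at a time.
    for r in rows:
        yield {c: r[c] for c in cols}


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical toy lazy frame: source rows plus a recorded plan of steps."""

    source: tuple  # stands in for data owned by a real backend
    plan: tuple = ()  # (op, arg) pairs; recording them computes nothing

    def filter(self, pred: Predicate) -> "LazyFrame":
        # Returns a new frame with one more step; no intermediate data is built.
        return LazyFrame(self.source, self.plan + (("filter", pred),))

    def select(self, *cols: str) -> "LazyFrame":
        return LazyFrame(self.source, self.plan + (("select", cols),))

    def optimize(self) -> tuple:
        # A single toy rewrite: fuse consecutive filters into one predicate.
        fused = []
        for op, arg in self.plan:
            if op == "filter" and fused and fused[-1][0] == "filter":
                prev = fused[-1][1]
                fused[-1] = ("filter", lambda r, a=prev, b=arg: a(r) and b(r))
            else:
                fused.append((op, arg))
        return tuple(fused)

    def collect(self) -> list:
        # Only here is the (optimized) plan executed, streaming over the rows once.
        rows = iter(self.source)
        for op, arg in self.optimize():
            if op == "filter":
                # builtin filter(): binds `arg` now, unlike a generator expression
                rows = filter(arg, rows)
            elif op == "select":
                rows = _project(rows, arg)
        return list(rows)


data = (
    {"city": "Oslo", "temp": 3},
    {"city": "Lima", "temp": 19},
    {"city": "Rome", "temp": 14},
)
lf = (
    LazyFrame(data)
    .filter(lambda r: r["temp"] > 5)
    .filter(lambda r: r["city"] != "Rome")
    .select("city")
)
print(len(lf.plan), len(lf.optimize()))  # 3 2
print(lf.collect())  # [{'city': 'Lima'}]
```

Each call returns a new immutable frame that holds only a longer plan, so no intermediate dataframes are materialised while the query is being built. The rows are streamed through the fused plan once, when `collect` is called.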

Polars has `polars_lazy`, which permits exactly this. Making this shift would then permit lazy evaluation for other backends too, especially DataFusion/Ballista and Ecto.

For Explorer, we'll need to do a bit of exploration (pun absolutely intended) into how we can achieve this while keeping backends pluggable. And when it comes to a pure Elixir backend, we should consider whether laziness is unnecessarily onerous compared to its benefits.

I'd really love ideas and feedback on making Explorer lazy by default. Is there a good peeking mechanism in other libraries? For example, one thing I'm going to explore is how tibbles in R minimise computation for `print` and `head`.

cigrainger added the note:discussion (Further information is requested) label on Sep 6, 2021
cigrainger self-assigned this on Sep 6, 2021
@josevalim
Member

The cool thing about laziness is that a backend can still be eager if it believes that is more efficient, although I doubt that is the case given immutability. Peeking should be very close to setting a limit and then executing the query, right? But it is probably important to keep it as its own callback, in case other backends want to optimize it (for example, maybe Ballista fetches only local data when peeking?).
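Peeking as "limit, then execute", kept as its own overridable callback, can be sketched as follows. This is a hypothetical Python toy, not Explorer's backend behaviour. A plan is a list of iterator-to-iterator steps, and the default `peek` appends an `islice` limit before executing. A hypothetical `PartitionedBackend` overrides `peek` to scan only its first partition. That override stands in for the "only local data" idea speculated about for Ballista.

```python
from itertools import chain, islice
from typing import Callable, Iterator, List

Step = Callable[[Iterator], Iterator]  # one plan step: rows in, rows out (lazily)


class Backend:
    """Hypothetical backend: executes a lazy plan only when asked to."""

    def scan(self, source) -> Iterator:
        # Produce rows from the source lazily.
        return iter(source)

    def collect(self, source, plan: List[Step]) -> list:
        # Chain every step over the scanned rows, then materialize the result.
        rows = self.scan(source)
        for step in plan:
            rows = step(rows)
        return list(rows)

    def peek(self, source, plan: List[Step], n: int = 5) -> list:
        # Default peek: append a limit step, then execute the plan as usual.
        return self.collect(source, plan + [lambda rows: islice(rows, n)])


class PartitionedBackend(Backend):
    """Hypothetical distributed backend where `source` is a list of partitions."""

    def scan(self, source) -> Iterator:
        return chain.from_iterable(source)

    def peek(self, source, plan: List[Step], n: int = 5) -> list:
        # Override: only read the first (e.g. local) partition when peeking.
        return super().peek(source[:1], plan, n)


# Usage: a step that records which rows it actually processed.
seen = []


def double(rows: Iterator[int]) -> Iterator[int]:
    for r in rows:
        seen.append(r)
        yield r * 2


preview = Backend().peek(range(1_000_000), [double], n=3)
print(preview, len(seen))  # [0, 2, 4] 3

parts = [[1, 2], [3, 4], [5, 6]]
full = PartitionedBackend().collect(parts, [double])  # all partitions
local_peek = PartitionedBackend().peek(parts, [double], n=5)  # first partition only
```

Because the steps are lazy iterators, the limit means only the first `n` rows ever pass through `double`, even though the source has a million rows. A backend that knows a cheaper way to peek can replace the callback without changing `collect`.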

@josevalim
Member

Closing, as lazy is a separate backend, not the default.
