Copy logic-plan from one LazyFrame to another LazyFrame? #16430
Comments
So far, two possible solutions are
In this case, we can hack the JSON path Parquetscan.input to create an lf that joins the server data with the client operations (the side effect is a redundant round trip of Parquet serialisation/deserialisation, which seems a worthwhile trade-off).
There is an interesting test that was recently added: https://github.com/pola-rs/polars/blob/main/py-polars/tests/unit/lazyframe/cuda/test_node_visitor.py It hooks into the plan node iteration and replaces
Another thing I was thinking about: if the DataFrame were to be embedded in JSON, could it be in Arrow IPC format instead of embedding the values as they are?
Hello @ritchie46. As you can see from the discussion on Discord and above, it is probably worthwhile to introduce serialisation/deserialisation of the pure logical plan (without data). Does that make sense to you? Alternatively, would you have any concerns if a PR with a similar feature were proposed?
This would be handy for anyone running polars in a loop, i.e. you have a ring buffer that you create a DataFrame from on each iteration, and then you create a LazyFrame from an already optimized logical plan (I'm not sure whether the expensive part of optimization comes from logical-plan or physical-plan optimization). But this might mitigate the greatly increased cost of the resolution of I.e. this lets you "emulate" what flink/risingwave/arroyo do by letting you, in effect, run polars on a streaming data source. Obviously it's not ideal, but it might mitigate the cost a little if your actual computations aren't too expensive.
Description
Is it possible to serialise/deserialise only the logical plan?
Possible use case:
Suppose that there are:
a large LazyFrame on the server side with ample memory and compute resources (denoted as large_lf)
a small LazyFrame on the client side with limited resources (denoted as small_df, whose schema is identical to that of large_lf)
In this case, the user could implement a few actions and send a request to the server side to apply those actions to large_lf.
I had a quick look at the documentation and Discord, and it looks like this is not yet supported. Could it be supported in the future, or would a related PR be welcome?
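To make the use case concrete, here is a rough sketch of the "record on the client, replay on the server" idea using only the Python standard library. It is not polars API: RecordedPlan and replay are hypothetical names invented for this illustration, and it only handles JSON-friendly arguments such as column names and integers, not polars expressions.

```python
import json


class RecordedPlan:
    """Hypothetical stand-in for a LazyFrame that records calls instead of running them."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the recorded operations.
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args, **kwargs):
            step = {"op": name, "args": list(args), "kwargs": kwargs}
            return RecordedPlan(self.steps + [step])

        return record

    def to_json(self):
        return json.dumps(self.steps)


def replay(steps_json, target):
    """Apply the recorded steps, in order, to `target` (e.g. the server-side frame)."""
    for step in json.loads(steps_json):
        target = getattr(target, step["op"])(*step["args"], **step["kwargs"])
    return target


# Client side: describe the actions without touching any data.
request = RecordedPlan().select("a", "b").head(5).to_json()
```

On the server, something like replay(request, large_lf) would then call select and head on the real frame. A proper solution would serialise the logical plan itself, so that expressions and optimizations are carried across as well.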