Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to special case ldf.sort(pl.col("a").over("b")) so that it works with streaming? #15259

Open
kszlim opened this issue Mar 24, 2024 · 3 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@kszlim
Copy link
Contributor

kszlim commented Mar 24, 2024

Description

Specifically so that:

ldf = pl.LazyFrame({"a": [1,2,3,2,1,5,6], "b": [1,1,1,1,2,2,2]})
ldf = ldf.sort(pl.col("a").over("b"))
ldf.collect(streaming=True)

Could work fully streaming.

It has better runtime and memory properties in general if feasible.

@kszlim kszlim added the enhancement New feature or an improvement of an existing feature label Mar 24, 2024
@kszlim
Copy link
Contributor Author

kszlim commented Mar 24, 2024

Actually that doesn't seem to do what I desire either.

I'm hoping to get something like:

pl.LazyFrame({"a": [1,2,2,3,1,5,6], "b": [1,1,1,1,2,2,2]}

@kszlim
Copy link
Contributor Author

kszlim commented Mar 24, 2024

Instead:

ldf.select(pl.all().sort_by("a").over("b")).collect()

Seems to do what i want, but without working in a streaming context.

@ritchie46
Copy link
Member

Not yet, we are working on the architecture to do more of the expression API streaming. But this requires some patience. There is a lot of pre-work to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

2 participants