Add describe() for LazyFrame #13928

Closed
niccolopetti opened this issue Jan 23, 2024 · 5 comments · Fixed by #13982
Labels: accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

Comments

@niccolopetti

Description

Currently describe is not implemented for LazyFrames, and you have to first materialize into a DataFrame before being able to call describe. However, all the methods that describe calls seem to be supported by LazyFrame:
polars.LazyFrame.count
polars.LazyFrame.max
polars.LazyFrame.mean
polars.LazyFrame.median
polars.LazyFrame.min
polars.LazyFrame.null_count
polars.LazyFrame.quantile
polars.LazyFrame.std
polars.LazyFrame.sum
polars.LazyFrame.var

So why don't we allow LazyFrame.describe()?

niccolopetti added the enhancement label Jan 23, 2024
@stinodego
Member

describe is meant as a way to get some quick insights in your data. It could work on a LazyFrame, but then you would have to call collect() right after anyway, otherwise you're not much wiser. It just doesn't make sense as a lazy method.

Just collect first and then use describe.

stinodego closed this as not planned Jan 23, 2024
@niccolopetti
Author

describe is meant as a way to get some quick insights in your data. It could work on a LazyFrame, but then you would have to call collect() right after anyway, otherwise you're not much wiser. It just doesn't make sense as a lazy method.

Just collect first and then use describe.

But what if the DataFrame is so big that it wouldn't fit in memory?

@stinodego
Member

Fair enough - that seems like a valid use case. I'll re-open to see what others have to say.

stinodego reopened this Jan 23, 2024
@alexander-beedie
Collaborator

Yup, I think it's worthwhile - I can see that there are plenty of cases where you don't want to materialise GBs of data into a local frame just to run describe() on it 🤔

@mcrumiller
Contributor

mcrumiller commented Jan 23, 2024

"looks pretty lazy!" sounds like an accurate description to me, can we just return that?

4 participants