Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually converting LazyGroupBy to LazyFrame breaks printing #338

Closed
etiennebacher opened this issue Jul 28, 2023 · 8 comments
Closed

Manually converting LazyGroupBy to LazyFrame breaks printing #338

etiennebacher opened this issue Jul 28, 2023 · 8 comments

Comments

@etiennebacher
Copy link
Collaborator

I tried to manually convert a LazyGroupBy object to a LazyFrame using class() but it generates an error when I want to print the output. Is this expected?

library(polars)
#> Warning: le package 'polars' a été compilé avec la version R 4.3.1

test1 <- pl$LazyFrame(mtcars)
test1
#> [1] "polars LazyFrame naive plan: (run ldf$describe_optimized_plan() to see the optimized plan)"
#> DF ["mpg", "cyl", "disp", "hp"]; PROJECT */11 COLUMNS; SELECTION: "None"

test2 <- test1$groupby("am")
test2
#> polars LazyGroupBy: 
#> LazyGroupBy (internals are opaque)

class(test2) <- "LazyFrame"
test2
#> [1] "polars LazyFrame naive plan: (run ldf$describe_optimized_plan() to see the optimized plan)"
#> Error in .pr$LazyFrame$print(x): expected LazyFrame

To give some context, in tidypolars if the user has a group_by() then the data is either GroupBy or LazyGroupBy. This is fine to use summarize() (which uses $agg() under the hood) but to be able to use mutate() on grouped data I need to convert GroupBy to DataFrame and LazyGroupBy to LazyFrame. This is where I got the error. If you have a better way to ungroup data, I'm all for it

@etiennebacher
Copy link
Collaborator Author

Note however that there's no problem with DataFrame:

library(polars)

test1 <- pl$DataFrame(mtcars)
test1
#> shape: (32, 11)
#> ┌──────┬─────┬───────┬───────┬───┬─────┬─────┬──────┬──────┐
#> │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ vs  ┆ am  ┆ gear ┆ carb │
#> │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ --- ┆ ---  ┆ ---  │
#> │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64 ┆ f64  ┆ f64  │
#> ╞══════╪═════╪═══════╪═══════╪═══╪═════╪═════╪══════╪══════╡
#> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 1.0  │
#> │ 21.4 ┆ 6.0 ┆ 258.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0  ┆ 1.0  │
#> │ …    ┆ …   ┆ …     ┆ …     ┆ … ┆ …   ┆ …   ┆ …    ┆ …    │
#> │ 15.8 ┆ 8.0 ┆ 351.0 ┆ 264.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 4.0  │
#> │ 19.7 ┆ 6.0 ┆ 145.0 ┆ 175.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 6.0  │
#> │ 15.0 ┆ 8.0 ┆ 301.0 ┆ 335.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 8.0  │
#> │ 21.4 ┆ 4.0 ┆ 121.0 ┆ 109.0 ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 2.0  │
#> └──────┴─────┴───────┴───────┴───┴─────┴─────┴──────┴──────┘

test2 <- test1$groupby("am")
test2
#> shape: (32, 11)
#> ┌──────┬─────┬───────┬───────┬───┬─────┬─────┬──────┬──────┐
#> │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ vs  ┆ am  ┆ gear ┆ carb │
#> │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ --- ┆ ---  ┆ ---  │
#> │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64 ┆ f64  ┆ f64  │
#> ╞══════╪═════╪═══════╪═══════╪═══╪═════╪═════╪══════╪══════╡
#> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 1.0  │
#> │ 21.4 ┆ 6.0 ┆ 258.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0  ┆ 1.0  │
#> │ …    ┆ …   ┆ …     ┆ …     ┆ … ┆ …   ┆ …   ┆ …    ┆ …    │
#> │ 15.8 ┆ 8.0 ┆ 351.0 ┆ 264.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 4.0  │
#> │ 19.7 ┆ 6.0 ┆ 145.0 ┆ 175.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 6.0  │
#> │ 15.0 ┆ 8.0 ┆ 301.0 ┆ 335.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 8.0  │
#> │ 21.4 ┆ 4.0 ┆ 121.0 ┆ 109.0 ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 2.0  │
#> └──────┴─────┴───────┴───────┴───┴─────┴─────┴──────┴──────┘
#> groups: [[1]]
#> [1] "am"
#> 
#> maintain order:  FALSE

class(test2) <- "DataFrame"
test2
#> shape: (32, 11)
#> ┌──────┬─────┬───────┬───────┬───┬─────┬─────┬──────┬──────┐
#> │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ vs  ┆ am  ┆ gear ┆ carb │
#> │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ --- ┆ ---  ┆ ---  │
#> │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64 ┆ f64  ┆ f64  │
#> ╞══════╪═════╪═══════╪═══════╪═══╪═════╪═════╪══════╪══════╡
#> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0  ┆ 4.0  │
#> │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 1.0  │
#> │ 21.4 ┆ 6.0 ┆ 258.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0  ┆ 1.0  │
#> │ …    ┆ …   ┆ …     ┆ …     ┆ … ┆ …   ┆ …   ┆ …    ┆ …    │
#> │ 15.8 ┆ 8.0 ┆ 351.0 ┆ 264.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 4.0  │
#> │ 19.7 ┆ 6.0 ┆ 145.0 ┆ 175.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 6.0  │
#> │ 15.0 ┆ 8.0 ┆ 301.0 ┆ 335.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0  ┆ 8.0  │
#> │ 21.4 ┆ 4.0 ┆ 121.0 ┆ 109.0 ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0  ┆ 2.0  │
#> └──────┴─────┴───────┴───────┴───┴─────┴─────┴──────┴──────┘

@vincentarelbundock
Copy link
Collaborator

Don't have any insight into this, but I wonder if there shouldn't be a better interface to this. Changing the class feels hackish. Perhaps a simple approach would be for pl$DataFrame() to convert the grouped object back to a standard polars data frame?

@etiennebacher
Copy link
Collaborator Author

The thing is there's no function that can be applied on a GroupBy object to ungroup it: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/groupby.html

So even a mechanism like this is implemented in r-polars it will still be a hack and not something Polars support. I also didn't find any issues mentioning "ungroup" on the rust-polars repo.

@etiennebacher
Copy link
Collaborator Author

FYI this is why Python's tidypolars doesn't have group_by() but uses a .by argument instead: markfairbanks/tidypolars#201

@eitsupi
Copy link
Collaborator

eitsupi commented Aug 3, 2023

FYI this is why Python's tidypolars doesn't have group_by() but uses a .by argument instead: markfairbanks/tidypolars#201

Perhaps it would be best to give up reproducing group_by and ungroup in R tidypolars as well?

I believe the arrow package had numerous bugs due to reproducing these. (For example apache/arrow#14175)

@etiennebacher
Copy link
Collaborator Author

FYI I think I found a solution for tidypolars: etiennebacher/tidypolars#22
Just waiting for a PR in r-polars

@sorhawell
Copy link
Collaborator

sorhawell commented Aug 4, 2023

@etiennebacher I think what you are trying will work.

For reference the rust-polars GroupBy is implemented with a lifetime chained to the DataFrame. However, extendr does not support lifetime annotated structs. A rust-polars GroupBy is just a reference to a DataFrame + some few options such maintain_order. In r-polars GroupBy is a "virtual R-side class abstraction" on top of DataFrame + some attributes on R side to remember options.

The r-polars LazyFrameGroupBy in rust-polars has no lifetime annotation and is just wrapped with extendr as is in r-polars .

I think you could get far by not using LazyFrameGroupBy and instead working with attributes as r-polars GroupBy.

I have approved your <LazyFrame>.clone()PR #347 which is needed as it is the only known way to have immutable attributes for the ExternalPtr.

@etiennebacher
Copy link
Collaborator Author

etiennebacher commented Aug 4, 2023

Thanks for approving the LazyFrame$clone() PR.

In r-polars GroupBy is a "virtual R-side class abstraction" on top of DataFrame + some attributes on R side to remember options.

I didn't notice that before, but using attributes only of the input Data/LazyFrame will make handling groups in tidypolars much much simpler and probably more robust.

It's probably not a good practice to manually change from GroupBy to DataFrame, as mentioned above, and I have another way to make this work, so I close this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants