-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add $ungroup() for GroupBy and LazyGroupBy #522
Conversation
eitsupi
commented
Nov 18, 2023
•
edited
Loading
edited
- Allow GroupBy and LazyGroupBy to be ungrouped
- Fix as_polars_df.GroupBy
If Rust Polars were to add features like pola-rs/polars#12562, this workaround would no longer be necessary. |
Looks good to me from my phone. I will review tonight. |
We need to call clone on ExtPtr's if modifying the I will add the line. library(polars)
# immutability except for environment(s), this is what we are used to in R.
x = list("foo") # x is immutable
tracemem(x)
#> [1] "<0x7fb895761c90>"
class(x) = "bar" # if writing to x, a clone will defacto be made
#> tracemem[0x7fb895761c90 -> 0x7fb895826f28]: eval eval eval_with_user_handlers withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers withCallingHandlers process_file <Anonymous> <Anonymous> <Anonymous> <Anonymous> do.call saveRDS withCallingHandlers doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch
y = x # y is a shallow copy of x
class(x) = "foobar" # if write to x, a new clone is made and y is untouched
#> tracemem[0x7fb895826f28 -> 0x7fb8958cc940]: eval eval eval_with_user_handlers withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers withCallingHandlers process_file <Anonymous> <Anonymous> <Anonymous> <Anonymous> do.call saveRDS withCallingHandlers doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch
class(x)
#> [1] "foobar"
class(y)
#> [1] "bar"
# but this is not the case for ExtPtr.
gbx = pl$DataFrame(mtcars)$group_by("cyl")
tracemem(gbx) # not allowed
#> Error in tracemem(gbx): 'tracemem' is not useful for weak reference or external pointer objects
polars:::mem_address(gbx)
#> [1] "0x48c7c243780"
gby = gbx # some user makes a copy and assumes gbx is immutable
# user will call some ungroup method, all seems fine
class(gbx) = "DataFrame"
attr(gbx, "private") = NULL
attributes(gbx)
#> $class
#> [1] "DataFrame"
# and for the surprise behold!
attributes(gby) # attributes to an SEXP ExtPtr are mutable!! dfy was also ungrouped
#> $class
#> [1] "DataFrame"
# I was shocked about this, but came around to believe it is for the better.
# R cannot automatically clone the object ExtPtr is pointing to.
polars:::mem_address(gby)
#> [1] "0x48c7c243780"
# A ExtPtr is one SEXP where zero or more R variables can point to. If zero variables point to one
#ExtPtr then the next gc() will release the ExtPtr.
# The first time and only time the ExtPtr is released it will call desctructor/drop method across
# the foreign interface.
# My take on this is ExtPtr are singletons and are mutable by nature and attributes are associated
# with the SEXP not with the variable. We can disguise this fact andmake ExtPtr appear immutable.
# This is in the DNA of polars api, so it is not much effort.
# any method in the api called "mut" or "in_place" breaks this rule immutability
# ok one more time
gbx = pl$DataFrame(mtcars)$group_by("cyl")
polars:::mem_address(gbx)
#> [1] "0x48c7c242b20"
gby = gbx # some user makes a copy and assumes gbx is immutable
# clone before modifying attributes
gbx = .pr$DataFrame$clone_see_me_macro(gbx)
# this is a new ExtPtr pointing to another DataFrame on rust side
# but in the end it is all pointing to the same Arrow arrays, so cloning is still cheap.
polars:::mem_address(gbx)
#> [1] "0x48c7c242ba0"
class(gbx) = "DataFrame"
attr(gbx, "private") = NULL
attributes(gbx)
#> $class
#> [1] "DataFrame"
# gby was not affacted
polars:::mem_address(gby)
#> [1] "0x48c7c242b20"
attributes(gby)
#> $class
#> [1] "GroupBy"
#>
#> $private
#> $private$groupby_input
#> $private$groupby_input[[1]]
#> [1] "cyl"
#>
#>
#> $private$maintain_order
#> [1] FALSE Created on 2023-11-19 with reprex v2.0.2 |
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to do |
git push git@github.com:eitsupi/r-polars.git
Enumerating objects: 37, done.
Counting objects: 100% (37/37), done.
Delta compression using up to 8 threads
Compressing objects: 100% (19/19), done.
Writing objects: 100% (19/19), 1.95 KiB | 46.00 KiB/s, done.
Total 19 (delta 17), reused 0 (delta 0)
remote: Resolving deltas: 100% (17/17), completed with 17 local objects.
To github.com:eitsupi/r-polars.git
! [remote rejected] ungroup2 -> ungroup2 (permission denied)
error: failed to push some refs to 'git@github.com:eitsupi/r-polars.git' |
I think branch name is not correct. |
Pls read the error. |
@eitsupi I was honestly reading the error msgs :/ I didn't know the meaning of But thank you for spelling that out
|
Thanks for the update! |